Task selection

Choosing the right task type for your model is the crucial first step in the distil labs workflow. Currently, our platform provides specialized support for five fundamental task types:

  • Classification, where the model analyzes the input text and assigns it to one category from a fixed set of options.
  • Open Book Question Answering (RAG), where the model generates answers to questions based on context you already have.
  • Open Book Question Answering (Synthetic Contexts), where the model generates answers to questions based on context you don’t yet have.
  • Information Extraction, where the model extracts structured information from unstructured text.
  • Tool Calling, where the model calls a tool to complete a task.

Choosing Between Tasks

Deciding between these task types often comes down to the nature of your desired output.

If you need to…                                          Choose
Assign categories to text                                Classification
Generate answers from contexts you have                  Open Book QA (RAG)
Generate answers from contexts you don’t yet have        Open Book QA (Synthetic Contexts)
Extract information into structured fields               Information Extraction
Generate tool calls according to a pre-defined schema    Tool Calling (Closed Book)

Classification

Classification models analyze input text and assign it to one category from a fixed set of options. This task type is particularly effective when you need deterministic categorization rather than open-ended generation.
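
To make this concrete, here is a minimal sketch of what classification training examples might look like. The field names ("text", "label") and the label set are illustrative assumptions, not a required distil labs schema.

```python
# Hypothetical example rows for a support-ticket triage classifier.
# Field names and labels are illustrative, not a prescribed format.
LABELS = ["billing", "technical", "account", "general"]

examples = [
    {"text": "I was charged twice for my subscription this month.", "label": "billing"},
    {"text": "The mobile app crashes every time I open settings.", "label": "technical"},
    {"text": "How do I reset the password on my account?", "label": "account"},
]

def is_valid(example: dict) -> bool:
    """Check that an example's label is one of the fixed options."""
    return example["label"] in LABELS

assert all(is_valid(e) for e in examples)
```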

Example Use Cases

Some common application areas for classification include:

  • Intent detection for customer service queries
  • Content moderation (toxic/safe, spam/not spam)
  • Sentiment analysis (positive/negative/neutral)
  • Topic categorization for knowledge bases
  • Triaging support tickets by department

Open Book Question Answering for RAG

Open-book QA trains a model to answer questions using a provided passage (“context”). The goal is to produce answers that are grounded in the text rather than relying on general world knowledge. This task is a natural fit for Retrieval-Augmented Generation (RAG), where a retriever supplies relevant chunks and the model answers strictly from those chunks.

Choose this task when you already have (or can reliably retrieve) the passages you want the model to use at train/validation/inference time.

Typical scenario: you’ve chunked product manuals, knowledge-base pages, tickets, SOPs, or API docs for a RAG pipeline and you want the teacher to draw from those same chunks when fabricating new QA (or grounded classification) examples.
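
As a rough illustration (the field names below are assumptions, not a prescribed schema), a grounded QA example pairs a retrieved chunk with a question and an answer that is supported only by that chunk:

```python
# Hypothetical RAG-style QA record: the answer should be recoverable
# from the "context" chunk alone, not from general world knowledge.
qa_example = {
    "context": (
        "To pair the X200 headset, hold the power button for five seconds "
        "until the LED blinks blue, then select 'X200' in your Bluetooth settings."
    ),
    "question": "How do I put the X200 headset into pairing mode?",
    "answer": "Hold the power button for five seconds until the LED blinks blue.",
}

# Simple sanity check: key answer terms should appear in the context.
assert "power button" in qa_example["context"]
```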

Example Use Cases

Example use cases where open book QA excels include:

  • Customer support systems that answer based on product documentation
  • Legal document analysis and question answering
  • Technical documentation assistants
  • Knowledge base or FAQ automation
  • Research assistants that answer based on specific papers or texts

Open Book Question Answering with Synthetic Contexts

Open-book QA with synthetic contexts trains a QA model for RAG, even when you don’t yet have the real unstructured corpus. During dataset creation, a teacher model fabricates realistic, domain-plausible passages and paired answers; the student is then trained to produce answers grounded in the provided (synthetic) passage rather than general world knowledge.

This is particularly valuable when you need a context-conditioned QA model but your sources are incomplete, proprietary, or still being organized. It lets you bootstrap a RAG-style system early, de-risk prompt/data design, and converge on schema and evaluation before ingestion pipelines are ready.

When to pick this task

Choose this task when you do not yet have a clean, chunked corpus but still want a model that expects a context column at train/validation/prediction time. Typical scenario: you’re planning a RAG pipeline (manuals, tickets, SOPs) but the documents aren’t ready; you want the teacher to fabricate passages that look like those future chunks and generate Q-A pairs grounded in them.
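
The record shape is the same as in the RAG variant, except the passage itself is fabricated by the teacher rather than retrieved from an existing corpus. A sketch follows; all content in it is invented purely for illustration.

```python
# Hypothetical synthetic-context QA record. The passage is written by the
# teacher model to resemble a future documentation chunk; the Q-A pair is
# grounded in that fabricated passage.
synthetic_example = {
    "context": (
        "Refunds for annual plans are prorated by unused full months. "
        "Requests must be submitted within 30 days of the renewal date."
    ),
    "question": "How are refunds calculated for annual plans?",
    "answer": "They are prorated by the number of unused full months.",
}
```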

Example Use Cases

Use cases where synthetic-context QA excels:

  • Prototyping support bots before product docs are cleaned and chunked
  • Building internal assistants while compliance reviews delay data access
  • Simulating knowledge-base style answers for new products/features
  • Training QA models for domains with sparse or scattered documentation
  • Creating evaluation suites that mimic the structure of future corpora
  • Stress-testing prompt/format choices for a planned RAG system
  • Teaching grounding behavior (answer only from passage) from day one

Information Extraction

Information extraction converts unstructured text into structured records (fields, entities, relations). Instead of generating a free-form answer, the model pulls precise facts from documents and outputs them in a predictable format (e.g., JSON).

Information Extraction is the right fit when your application needs consistent, schema-constrained outputs rather than prose, and when correctness/coverage of fields matters more than narrative fluency.
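
For instance, an invoice-extraction record might look like the sketch below. The field names and the validation step are assumptions made for illustration, not a mandated schema.

```python
# Hypothetical information-extraction example: unstructured invoice text
# mapped to a fixed set of fields in a predictable JSON-like shape.
invoice_text = (
    "Invoice INV-2024-0187 from Acme Supplies, dated 2024-03-12. "
    "Total due: 1,240.50 EUR, payable within 30 days."
)

extracted = {
    "invoice_id": "INV-2024-0187",
    "vendor": "Acme Supplies",
    "issue_date": "2024-03-12",
    "total_amount": 1240.50,
    "currency": "EUR",
    "payment_terms_days": 30,
}

# Every expected field is present, so downstream systems can rely on the shape.
expected_fields = {
    "invoice_id", "vendor", "issue_date",
    "total_amount", "currency", "payment_terms_days",
}
assert set(extracted) == expected_fields
```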

Example Use Cases

  • Contracts — turn contract text into a searchable deal-tracking table.
  • Invoices — convert vendor PDFs into normalized payables records.
  • Purchase Orders — standardize emailed POs into ERP-ready entries.
  • Meeting Minutes — convert minutes into decision and action registers.
  • Security Incident Reports — standardize incident write-ups for SOC dashboards.
  • IT Helpdesk Tickets — convert support tickets into analytics-ready records.
  • Sales Chat Transcripts — transform chat logs into CRM updates.

Tool Calling

Closed-book tool calling trains a model to select and invoke the appropriate function or API based solely on the user’s request, without requiring additional context. The model learns to map natural language queries to structured tool calls with correct parameters, relying on its training rather than retrieved documentation.

This task type is ideal when you need a model that can reliably dispatch user intents to specific backend functions, APIs, or services in a deterministic, schema-compliant way.
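
As a sketch of the idea (the tool schema and call format below are generic illustrations, not the platform’s required format), the model’s job is to emit a structured call that matches a pre-defined function signature:

```python
# Hypothetical tool schema in a JSON-Schema-like style, plus the structured
# call a tool-calling model might produce for a user request.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

user_request = "What's the weather like in Berlin right now?"

model_output = {
    "tool": "get_weather",
    "arguments": {"city": "Berlin", "unit": "celsius"},
}

# Minimal validation: the call targets a known tool and supplies required args.
assert model_output["tool"] == weather_tool["name"]
assert all(k in model_output["arguments"] for k in weather_tool["parameters"]["required"])
```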

When to pick this task

Choose closed-book tool calling when:

  • You have a fixed set of tools/APIs the model should learn to invoke
  • The tool selection logic should be memorized during training, not looked up
  • You need structured, validated outputs that match exact function signatures
  • Your application requires routing user requests to backend services

Example Use Cases

Common applications for closed-book tool calling include:

  • Voice assistants — Map spoken commands to smart home APIs
  • Chatbot actions — Convert user intents to CRM/database operations
  • Code generation — Transform natural language to API calls
  • Workflow automation — Route requests to appropriate microservices
  • Command interfaces — Parse user input into system commands
  • Integration layers — Bridge natural language to legacy systems