Task selection
Choosing the right task type for your model is the crucial first step in the distil labs workflow. Currently, our platform provides specialized support for five fundamental task types:
- Classification where the model analyzes the input text and assigns it to one category from a fixed set of options.
- Open Book Question Answering (RAG) where the model generates answers to questions based on context you already have.
- Open Book Question Answering (Synthetic Context) where the model generates answers to questions based on context you don’t yet have.
- Information Extraction where the model extracts structured information from unstructured text.
- Tool Calling where the model calls a tool to complete a task.
Choosing Between Tasks
Deciding between these task types often comes down to the nature of your desired output.
Classification
Classification models analyze input text and assign it to one category from a fixed set of options. This task type is particularly effective when you need deterministic categorization rather than open-ended generation.
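To make the input/output shape concrete, here is a minimal sketch. The label set and field names are illustrative assumptions, not the platform's actual schema:

```python
# Minimal sketch of a classification example: one input text, exactly one
# label from a fixed set. Labels and field names here are hypothetical.
from dataclasses import dataclass

LABELS = {"billing", "technical_support", "account", "other"}

@dataclass
class ClassificationExample:
    text: str   # the input to categorize
    label: str  # must be exactly one of LABELS

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"label must be one of {sorted(LABELS)}")

example = ClassificationExample(
    text="I was charged twice for my subscription this month.",
    label="billing",
)
print(example)
```

The fixed label set is what makes the output deterministic: every prediction is validated against a closed vocabulary rather than generated freely.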
Example Use Cases
Some common application areas for classification include:
- Intent detection for customer service queries
- Content moderation (toxic/safe, spam/not spam)
- Sentiment analysis (positive/negative/neutral)
- Topic categorization for knowledge bases
- Triaging support tickets by department
Open Book Question Answering for RAG
Open-book QA trains a model to answer questions using a provided passage (“context”). The goal is to produce answers that are grounded in the text rather than relying on general world knowledge. This task is a natural fit for Retrieval-Augmented Generation (RAG), where a retriever supplies relevant chunks and the model answers strictly from those chunks.
Choose this task when you already have (or can reliably retrieve) the passages you want the model to use at train/validation/inference time.
Typical scenario: you’ve chunked product manuals, knowledge-base pages, tickets, SOPs, or API docs for a RAG pipeline and you want the teacher to draw from those same chunks when fabricating new QA (or grounded classification) examples.
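As a rough illustration (the field names are assumptions, not the actual dataset schema), an open-book QA record pairs a retrieved chunk with a question and an answer grounded in that chunk:

```python
# Hypothetical open-book QA record: the answer must be supported by the
# provided context chunk, not by general world knowledge.
record = {
    "context": (
        "To reset the device, hold the power button for ten seconds "
        "until the status LED blinks twice."
    ),
    "question": "How do I reset the device?",
    "answer": "Hold the power button for ten seconds until the LED blinks twice.",
}

# In a RAG pipeline, the retriever fills `context` at inference time and
# the model is expected to answer strictly from that passage.
prompt = (
    f"Context:\n{record['context']}\n\n"
    f"Question: {record['question']}\nAnswer:"
)
print(prompt)
```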
Example Use Cases
Example use cases where open book QA excels include:
- Customer support systems that answer based on product documentation
- Legal document analysis and question answering
- Technical documentation assistants
- Knowledge base or FAQ automation
- Research assistants that answer based on specific papers or texts
Open Book Question Answering with Synthetic Contexts
Open-book QA with synthetic contexts trains a QA model for RAG, even when you don’t yet have the real unstructured corpus. During dataset creation, a teacher model fabricates realistic, domain-plausible passages and paired answers; the student is then trained to produce answers grounded in the provided (synthetic) passage rather than general world knowledge.
This is particularly valuable when you need a context-conditioned QA model but your sources are incomplete, proprietary, or still being organized. It lets you bootstrap a RAG-style system early, de-risk prompt/data design, and converge on schema and evaluation before ingestion pipelines are ready.
When to pick this task
Choose this task when you do not yet have a clean, chunked corpus but still want a model that expects a context column at train/validation/prediction time.
Typical scenario: you’re planning a RAG pipeline (manuals, tickets, SOPs) but the documents aren’t ready; you want the teacher to fabricate passages that look like those future chunks and generate Q-A pairs grounded in them.
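The sketch below is purely illustrative: fabricate_training_record is a hypothetical stand-in for the teacher's dataset-creation step. The point is that the passage itself is invented, while the record keeps the same shape a real RAG chunk would have:

```python
# Hypothetical stand-in for the teacher step: the passage is fabricated,
# but the record has the same context/question/answer shape the student
# model will see once real chunks exist.
def fabricate_training_record(topic: str) -> dict:
    passage = (
        f"Policy note ({topic}): warranty claims must be filed within "
        "30 days of delivery and must include the original order number."
    )
    return {
        "context": passage,  # synthetic, shaped like a future corpus chunk
        "question": "What is the deadline for filing a warranty claim?",
        "answer": "Within 30 days of delivery.",  # grounded in the passage
    }

print(fabricate_training_record("returns"))
```

Because the student only ever learns to answer from whatever passage it is handed, swapping synthetic chunks for real ones later does not change the task it was trained on.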
Example Use Cases
Use cases where synthetic-context QA excels:
- Prototyping support bots before product docs are cleaned and chunked
- Building internal assistants while compliance reviews delay data access
- Simulating knowledge-base style answers for new products/features
- Training QA models for domains with sparse or scattered documentation
- Creating evaluation suites that mimic the structure of future corpora
- Stress-testing prompt/format choices for a planned RAG system
- Teaching grounding behavior (answer only from passage) from day one
Information Extraction
Information extraction converts unstructured text into structured records (fields, entities, relations). Instead of generating a free-form answer, the model pulls precise facts from documents and outputs them in a predictable format (e.g., JSON).
Information Extraction is the right fit when your application needs consistent, schema-constrained outputs rather than prose, and when correctness/coverage of fields matters more than narrative fluency.
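As an illustration only (the invoice schema below is invented for this example), the model's job is to fill a fixed record from free text and emit it in a predictable format such as JSON:

```python
# Hypothetical extraction schema: unstructured invoice text in, a fixed,
# validated record out. Field names are illustrative.
import json
from dataclasses import dataclass, asdict

@dataclass
class InvoiceRecord:
    vendor: str
    invoice_number: str
    total_amount: float
    currency: str

source_text = "Invoice #INV-2041 from Acme Supplies, total due: 1,250.00 EUR."

# The expected, schema-constrained output for the text above:
extracted = InvoiceRecord(
    vendor="Acme Supplies",
    invoice_number="INV-2041",
    total_amount=1250.00,
    currency="EUR",
)
print(json.dumps(asdict(extracted), indent=2))
```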
Example Use Cases
- Contracts — turn contract text into a searchable deal-tracking table.
- Invoices — convert vendor PDFs into normalized payables records.
- Purchase Orders — standardize emailed POs into ERP-ready entries.
- Meeting Minutes — convert minutes into decision and action registers.
- Security Incident Reports — standardize incident write-ups for SOC dashboards.
- IT Helpdesk Tickets — convert support tickets into analytics-ready records.
- Sales Chat Transcripts — transform chat logs into CRM updates.
Tool Calling
Closed-book tool calling trains a model to select and invoke the appropriate function or API based solely on the user’s request, without requiring additional context. The model learns to map natural language queries to structured tool calls with correct parameters, relying on its training rather than retrieved documentation.
This task type is ideal when you need a model that can reliably dispatch user intents to specific backend functions, APIs, or services in a deterministic, schema-compliant way.
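A minimal sketch, assuming a hypothetical two-tool registry (none of these names come from the platform): the model emits a structured call, and the application validates it against the registered signatures before executing anything:

```python
# Hypothetical tool registry and dispatch: the model's only job is to
# pick a registered tool and fill its parameters correctly.
import json

TOOLS = {
    "set_thermostat": {"temperature_c": float},
    "create_ticket": {"subject": str, "priority": str},
}

# What the trained model might emit for "Set the living room to 21 degrees":
model_output = '{"tool": "set_thermostat", "arguments": {"temperature_c": 21.0}}'

call = json.loads(model_output)
assert call["tool"] in TOOLS, "model must select a registered tool"
assert set(call["arguments"]) == set(TOOLS[call["tool"]]), "arguments must match the signature"
print(f"dispatching {call['tool']} with {call['arguments']}")
```

Keeping validation in the application layer means a malformed or hallucinated call fails loudly before it reaches any backend service.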
When to pick this task
Choose closed-book tool calling when:
- You have a fixed set of tools/APIs the model should learn to invoke
- The tool selection logic should be memorized during training, not looked up
- You need structured, validated outputs that match exact function signatures
- Your application requires routing user requests to backend services
Example Use Cases
Common applications for closed-book tool calling include:
- Voice assistants — Map spoken commands to smart home APIs
- Chatbot actions — Convert user intents to CRM/database operations
- Code generation — Transform natural language to API calls
- Workflow automation — Route requests to appropriate microservices
- Command interfaces — Parse user input into system commands
- Integration layers — Bridge natural language to legacy systems