Config file
The configuration file controls the training pipeline through four main sections: base, tuning, evaluation, and synthgen. Each section handles a specific aspect of the model training process.
File format
The configuration file supports two formats depending on how you interact with distil labs:
- Webapp: Use JSON format (.json file)
- API: Use YAML format (.yaml file)
Both formats are functionally equivalent, so choose based on your workflow. Examples in this documentation show YAML; the JSON equivalent uses the same keys and values.
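As a rough illustration of that equivalence, the Python sketch below builds a configuration as a plain dictionary and serializes it to JSON with the standard library. The four top-level keys come from the section names above. Placing rlvr_dataset_size under tuning, and the value 0, are assumptions made for illustration and not a documented schema.

```python
import json

# Top-level keys mirror the four sections named above.
# The nested key placement and its value are illustrative assumptions.
config = {
    "base": {},
    "tuning": {"rlvr_dataset_size": 0},
    "evaluation": {},
    "synthgen": {},
}


def to_json(cfg: dict) -> str:
    """Serialize a config dictionary to an indented JSON string."""
    return json.dumps(cfg, indent=2)


json_text = to_json(config)
print(json_text)
```

The same dictionary could be written out as YAML with a YAML library, since both formats describe the same nested mapping.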
Configuration structure
Base configuration
General parameters relevant to the overall task.
Supported task types
Supported student models
Supported teacher models
Tuning configuration
Parameters controlling the finetuning of the student model.
RLVR (Reinforcement Learning with Verifiable Rewards)
RLVR is an optional reinforcement learning stage that runs after supervised finetuning (SFT). It uses reward signals from an LLM-as-a-judge to further improve model performance. Set rlvr_dataset_size to a value greater than 0 to enable it.
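The enabling rule can be sketched as a small check. This is a minimal sketch, not the platform's own logic: it assumes the tuning section is available as a dictionary and that a missing rlvr_dataset_size behaves like 0. The helper name rlvr_enabled is hypothetical.

```python
def rlvr_enabled(tuning: dict) -> bool:
    """Return True when rlvr_dataset_size is greater than 0."""
    # Assumption: a missing key is treated as 0, meaning RLVR is disabled.
    return tuning.get("rlvr_dataset_size", 0) > 0


print(rlvr_enabled({"rlvr_dataset_size": 500}))
print(rlvr_enabled({"rlvr_dataset_size": 0}))
```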
Evaluation configuration
Parameters used in teacher evaluation.
Synthetic generation configuration
Parameters for fine-grained control over synthetic data generation.
Example configuration
Minimal configuration
Full configuration example
Model-specific notes
DeepSeek R1
When using deepseek.r1 as the teacher model, the recommended temperature range is 0.5 to 0.7. Configurations with temperatures outside this range will raise a validation error.
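A local pre-check along these lines may catch the problem before a configuration is submitted. It is only a sketch: the function name and its two arguments are hypothetical, and treating the bounds 0.5 and 0.7 as inclusive is an assumption, since the text above does not say how the endpoints are handled.

```python
def check_teacher_temperature(teacher_model: str, temperature: float) -> None:
    """Raise ValueError if deepseek.r1 is paired with a temperature outside 0.5 to 0.7."""
    # Assumption: both endpoints of the range are accepted.
    if teacher_model == "deepseek.r1" and not (0.5 <= temperature <= 0.7):
        raise ValueError(
            f"deepseek.r1 expects a temperature between 0.5 and 0.7, got {temperature}"
        )


check_teacher_temperature("deepseek.r1", 0.6)          # within range, no error
check_teacher_temperature("openai.gpt-oss-120b", 1.0)  # other models are not checked here
```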
GPT OSS 120B Thinking
The openai.gpt-oss-120b-thinking model uses a medium reasoning effort setting by default for enhanced chain-of-thought capabilities.
Tool Calling
Tool calling tasks have specific model compatibility requirements:
Student models: Only Qwen3 and Llama 3-family models are supported for tool-calling-closed-book and multi-turn-tool-calling-closed-book tasks.
Teacher models for multi-turn: Multi-turn tool calling (multi-turn-tool-calling-closed-book) requires one of the following teacher models:
- Qwen3-235B-A22B-Instruct-2507
- Llama-3.1-405B-Instruct
- openai.gpt-oss-120b
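The compatibility rules above can be expressed as a small pre-flight check. The task names and teacher model names are taken from this section. Everything else is an assumption for illustration: the function name, the idea of passing model identifiers as strings, the detection of Qwen3 and Llama 3-family students by name prefix, and the example student identifier.

```python
MULTI_TURN_TASK = "multi-turn-tool-calling-closed-book"
TOOL_CALLING_TASKS = {"tool-calling-closed-book", MULTI_TURN_TASK}
MULTI_TURN_TEACHERS = {
    "Qwen3-235B-A22B-Instruct-2507",
    "Llama-3.1-405B-Instruct",
    "openai.gpt-oss-120b",
}


def tool_calling_problems(task: str, student: str, teacher: str) -> list:
    """Return a list of compatibility problems; an empty list means none were found."""
    problems = []
    if task not in TOOL_CALLING_TASKS:
        return problems  # the rules only apply to tool calling tasks
    # Assumption: model family is inferred from the identifier's prefix.
    if not student.startswith(("Qwen3", "Llama-3")):
        problems.append(f"student model {student!r} is not Qwen3 or Llama 3-family")
    if task == MULTI_TURN_TASK and teacher not in MULTI_TURN_TEACHERS:
        problems.append(f"teacher model {teacher!r} is not supported for multi-turn")
    return problems


# "Qwen3-0.6B" is a hypothetical student identifier used only for this example.
print(tool_calling_problems(MULTI_TURN_TASK, "Qwen3-0.6B", "deepseek.r1"))
```

Returning a list of messages instead of raising on the first failure lets a caller report every incompatibility in one pass.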
