# Config file
The configuration file defines all parameters for your SLM training process. This page details each available configuration option, including default values and descriptions.
## Format Overview
distil labs uses a YAML configuration file. The sketch below illustrates its overall shape, assembled from the options documented on this page and shown as flat top-level keys with their default values (the exact key layout in your file may differ):
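```yaml
# Illustrative sketch only: keys and defaults are taken from this page;
# see each section below for the full list of options.

# Base configuration
task: classification
random_seed: 123
debug: false

# Setup configuration
student_model_name: meta-llama/Llama-3.2-1B-Instruct
teacher_model_name: "us.meta.llama3-3-70b-instruct-v1:0"

# Tuning configuration
use_lora: true
num_train_epochs: 128

# Synthetic generation configuration
generation_target: 5000
data_generation_strategy: classification-one-class-context
```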
## Base Configuration
### `task`
- **Default:** `classification`
- **Options:** `classification`, `contextual-classification`, `question-answering-open-book`, `information-extraction`
- **Description:** Type of NLP task to be solved.
### `random_seed`
- **Default:** `123`
- **Description:** Random seed used across the platform for operations such as random sampling of data.
### `debug`
- **Default:** `false`
- **Description:** Flag to enable debug mode. If set to `true`, synthetic data generation ends after one iteration.
## Setup Configuration
### `student_model_name`
- **Default:** `meta-llama/Llama-3.2-1B-Instruct`
- **Options:** `meta-llama/Llama-3.2-1B-Instruct`, `meta-llama/Llama-3.2-3B-Instruct`, `meta-llama/Llama-3.1-8B-Instruct`, `HuggingFaceTB/SmolLM2-135M-Instruct`
- **Description:** Base model to use as the student. This is the model that is fine-tuned for your use case.
### `teacher_model_name`
- **Default:** `us.meta.llama3-3-70b-instruct-v1:0`
- **Options:** `meta.llama3-1-405b-instruct-v1:0`, `meta.llama3-8b-instruct-v1:0`, `meta.llama3-1-70b-instruct-v1:0`, `us.meta.llama3-3-70b-instruct-v1:0`
- **Description:** Teacher model used to generate synthetic data and from which we distill knowledge.
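For example, a hypothetical pairing of the 8B student with the largest listed teacher (both identifiers taken from the option lists above):

```yaml
# Illustrative pairing; any listed student and teacher model can be configured.
student_model_name: meta-llama/Llama-3.1-8B-Instruct
teacher_model_name: "meta.llama3-1-405b-instruct-v1:0"
```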
## Tuning Configuration
### `use_lora`
- **Default:** `true`
- **Description:** Flag controlling whether to use LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method, for student training.
### `train_classification_as_textgen`
- **Default:** `false`
- **Description:** Only relevant for classification tasks. When enabled, trains the classification model to generate class names as text instead of using a classification head.
### `per_device_train_batch_size`
- **Default:** `4`
- **Description:** Batch size per GPU/XPU/TPU/MPS/NPU core/CPU for training.
### `per_device_eval_batch_size`
- **Default:** `4`
- **Description:** Batch size per GPU/XPU/TPU/MPS/NPU core/CPU for evaluation.
### `num_train_epochs`
- **Default:** `128`
- **Description:** Total number of training epochs to perform during fine-tuning.
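As a sketch, a tuning setup that keeps LoRA enabled but shortens training and enlarges the batches (the values here are illustrative, not recommendations):

```yaml
use_lora: true
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
num_train_epochs: 32
```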
## Evaluation Configuration
### `batch_size`
- **Default:** `4`
- **Description:** Deprecated. Batch size to use when evaluating the model.
### `num_few_shot_examples`
- **Default:** `1`
- **Description:** Number of few-shot examples to provide when running teacher evaluation. For classification tasks with values above 0, at least one example per class is used.
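For instance, a hypothetical evaluation block asking for more in-context examples:

```yaml
# For classification tasks, at least one example per class is used when this is > 0.
num_few_shot_examples: 3
```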
## Synthetic Generation Configuration
### `generation_target`
- **Default:** `5000`
- **Description:** Target number of synthetic data examples to generate.
### `generation_in_single_call`
- **Default:** `4`
- **Description:** Number of examples to generate per teacher/LLM invocation.
### `generation_iteration_size`
- **Default:** `128`
- **Description:** Number of examples to generate in a single batch; the generate-validate process runs in batches of this size.
### `num_positive_exemplars_per_generation`
- **Default:** `2`
- **Description:** Number of in-context examples for the class/task to be generated.
### `num_negative_exemplars_per_generation`
- **Default:** `2`
- **Description:** Number of in-context examples for classes that are not to be generated. Only used for classification tasks.
### `validation_max_answer_length`
- **Default:** `8192`
- **Description:** Maximum allowable length of generated examples.
### `validation_similarity_threshold`
- **Default:** `0.95`
- **Range:** `0.0` to `1.0`
- **Description:** Threshold determining how similar generated examples may be to the seed data. Generated examples with similarity above this threshold are removed.
### `teacher_temperature`
- **Default:** `1.0`
- **Range:** `0.0` to `1.0`
- **Description:** Controls the balance of predictability versus creativity in the teacher/LLM output; lower values yield more deterministic output, higher values more varied output.
### `teacher_max_tokens`
- **Default:** `4096`
- **Description:** Maximum number of tokens in the generated response.
### `match_generated_distribution_to_seed`
- **Default:** `false`
- **Description:** When enabled, the generated data matches the class distribution of the seed data. Only used for classification tasks.
### `num_unlabelled_exemplars_per_generation`
- **Default:** `2`
- **Description:** Number of unlabelled examples to provide during each teacher/LLM invocation when generating synthetic data.
### `data_generation_strategy`
- **Default:** `classification-one-class-context`
- **Options:**
  - Classification: `classification-one-class`, `classification-all-class`, `classification-all-class-context`, `classification-all-class-weak-labels`, `classification-one-class-context`, `classification-one-class-weak-labels`, `contextual-classification-all-class`
  - Question answering: `qa-open-book`, `qa-open-book-with-synthetic-context`, `qa-open-book-information-extraction`
- **Description:** Strategy to use when generating synthetic training data.
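Putting a few of these options together, a sketch of a generation setup that targets a larger synthetic set and mirrors the seed class distribution (the values are illustrative):

```yaml
generation_target: 10000
generation_in_single_call: 4
generation_iteration_size: 128
num_positive_exemplars_per_generation: 2
num_negative_exemplars_per_generation: 2
match_generated_distribution_to_seed: true
data_generation_strategy: classification-one-class-context
```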
## Example Configuration
This configuration sets up an open-book QA task using a 1B-parameter model with 32 training epochs. The YAML below is a sketch assembled from the options documented on this page; options not shown keep their defaults:
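```yaml
# Open-book question answering with a 1B student, trained for 32 epochs.
task: question-answering-open-book
random_seed: 123

student_model_name: meta-llama/Llama-3.2-1B-Instruct
teacher_model_name: "us.meta.llama3-3-70b-instruct-v1:0"

use_lora: true
num_train_epochs: 32

generation_target: 5000
data_generation_strategy: qa-open-book
```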