Config file

The configuration file defines all parameters for your SLM training process. This page details each available configuration option, including default values and descriptions.

Format overview

distil labs uses a YAML configuration file with the following structure:

base:
  # General parameters (task is required)
  task: classification

tuning:
  # Fine-tuning parameters
  num_train_epochs: 32

evaluation:
  # Evaluation parameters
  num_few_shot_examples: 1

synthgen:
  # Synthetic data generation parameters
  generation_target: 10000

Base configuration

task

Default: none (required)

Options:

  • classification
  • question-answering-open-book
  • information-extraction
  • question-answering-open-book-synthetic-context
  • tool-calling-closed-book

Description: Type of NLP task to be solved. This setting enables task-specific behaviors in tuning and data generation.

random_seed

Default: 123

Description: Random seed used across the platform for operations like random sampling of data and dataset splits.
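For example, a minimal base section combining the two options above (random_seed shown at its default; task has no default and must be set):

base:
  task: classification
  random_seed: 123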

Setup configuration

Note: In the current schema, these fields live under base: (there is no separate setup: section).

student_model_name

Default: Llama-3.2-1B-Instruct

Options:

  • Llama-3.2-1B-Instruct
  • Llama-3.2-3B-Instruct
  • Llama-3.1-8B-Instruct
  • SmolLM2-135M-Instruct
  • granite-3.1-8b-instruct
  • granite-3.3-8b-instruct

Description: Base model to use for the student. This is the model we fine-tune for your use case.

teacher_model_name

Default: Llama-3.3-70B-Instruct

Options:

  • deepseek.r1
  • Llama-3.1-405B-Instruct
  • Llama-3.1-8B-Instruct
  • Llama-3.1-70B-Instruct
  • Llama-3.3-70B-Instruct
  • openai.gpt-oss-120b

Description: Teacher model used to generate synthetic data and from which we distill knowledge.
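Per the note above, these model-selection fields also live under base:. An illustrative snippet using names from the option lists above:

base:
  task: classification
  student_model_name: Llama-3.2-1B-Instruct
  teacher_model_name: Llama-3.3-70B-Instruct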

Tuning configuration

learning_rate

Default: 5e-5

Description: Initial learning rate for the AdamW optimizer. Range: > 0 (commonly 1e-6 to 1e-3)

learning_rate_scheduler

Default: linear

Options: cosine, linear, constant

Description: Learning-rate schedule type.

weight_decay

Default: 0.0

Description: Weight decay for AdamW (excludes bias and LayerNorm weights). Range: ≥ 0 (commonly 0.0–0.1)

warmup_ratio

Default: 0.05

Description: Ratio of total training steps used for a linear warmup from 0 to learning_rate. Typical range: 0.0–0.1
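The optimizer-related options above can be set together; for instance (illustrative values, not recommendations):

tuning:
  learning_rate: 5e-5
  learning_rate_scheduler: cosine
  weight_decay: 0.01
  warmup_ratio: 0.05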

use_lora

Default: true

Description: Flag to enable LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning.

lora_alpha_multiplier

Default: 1

Description: Multiplier for LoRA alpha. Effective alpha is computed as lora_alpha = lora_r * lora_alpha_multiplier. Range: positive integer

lora_r

Default: 64

Description: LoRA rank (attention dimension). Range: positive integer (e.g., 4–256 typical)
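The LoRA options interact: with lora_r: 64 and lora_alpha_multiplier: 1, the effective alpha is 64 * 1 = 64. A sketch using the defaults:

tuning:
  use_lora: true
  lora_r: 64
  lora_alpha_multiplier: 1  # effective lora_alpha = 64 * 1 = 64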

train_classification_as_textgen

Default: false

Description: Only relevant for classification tasks. When enabled, trains the model as text generation that emits class names as text rather than using a classification head.

per_device_train_batch_size

Default: 4

Description: Batch size per GPU/XPU/TPU/MPS/NPU core/CPU for training. Range: positive integer

per_device_eval_batch_size

Default: 4

Description: Batch size per device for evaluation (prefer this over the deprecated evaluation.batch_size). Range: positive integer

num_train_epochs

Default: 32

Description: Total number of epochs for fine-tuning. Range: positive integer

train_eval_split

Default: 0.2

Range: (0.0, 1.0) (exclusive)

Description: Fraction of the training dataset held out for evaluation and best-model selection.
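A batching-and-splitting example (these are the defaults):

tuning:
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
  num_train_epochs: 32
  train_eval_split: 0.2  # 20% of training data held out for evaluation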

Evaluation configuration

batch_size

Default: 4 (deprecated)

Description: Batch size to use when evaluating the model. Prefer tuning.per_device_eval_batch_size.

num_few_shot_examples

Default: 1

Description: Number of examples to provide as few-shot context when running teacher evaluation. If this value is above 0 for classification tasks, at least one example per class is included.
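For example, the evaluation section at its default (for classification, this guarantees at least one example per class):

evaluation:
  num_few_shot_examples: 1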

Synthetic generation configuration

generation_target

Default: 10000

Description: Target number of synthetic data examples to generate. Special case: For question-answering-closed-book, this value is ignored; the effective target is len(unstructured_data) * generation_per_unstructured_context.

generation_in_single_call

Default: 4

Description: Number of examples to generate per teacher/LLM invocation.

generation_iteration_size

Default: 128

Description: Number of examples processed in each generate-validate batch.

generation_per_unstructured_context

Default: 50

Description: Number of examples to generate per context in unstructured data. Usage: Only for question-answering-closed-book.
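Putting the sizing options together (illustrative values; the last line only applies to the closed-book QA task):

synthgen:
  generation_target: 10000
  generation_in_single_call: 4    # examples per teacher/LLM invocation
  generation_iteration_size: 128  # examples per generate-validate batch
  generation_per_unstructured_context: 50  # question-answering-closed-book only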

num_positive_exemplars_per_generation

Default: 2

Description: Number of in-context examples for the target class/task per generation call.

num_negative_exemplars_per_generation

Default: 2

Description: Number of in-context examples from other classes per generation call. Usage: Only for classification tasks.
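For a classification task, the exemplar counts might be set like this (defaults shown):

synthgen:
  num_positive_exemplars_per_generation: 2
  num_negative_exemplars_per_generation: 2  # classification tasks only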

validation_max_answer_length

Default: 8192

Description: Maximum allowable length of a generated example/answer during validation.

validation_similarity_threshold

Default: 0.95

Range: 0.0 to 1.0 (inclusive)

Description: Similarity threshold against seed data. Generated samples with similarity above this threshold are removed to promote novelty.

teacher_temperature

Default: 1.0

Range: 0.0 to 1.0 (inclusive)

Description: Sampling temperature for the teacher/LLM; lower values produce more predictable outputs, higher values more varied and creative ones.

teacher_max_tokens

Default: 4096

Description: Maximum number of tokens in the generated response (provider limits may apply).
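The validation and sampling knobs above might be combined as follows (illustrative values; temperature lowered from the default for more predictable outputs):

synthgen:
  validation_max_answer_length: 8192
  validation_similarity_threshold: 0.95  # drop near-duplicates of seed data
  teacher_temperature: 0.7
  teacher_max_tokens: 4096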

match_generated_distribution_to_seed

Default: false

Description: Match the class distribution of generated examples to that of the seed data. Usage: Only for classification tasks.

num_unlabelled_exemplars_per_generation

Default: 2

Description: Number of unlabeled examples to include as additional context in each teacher/LLM invocation.

num_distractor_context_blocks

Default: 0

Description: Number of distractor context blocks to include with every generated example (enables RAFT-style training when > 0).
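For example, to mirror the seed class distribution and enable RAFT-style training with distractor contexts (illustrative values):

synthgen:
  match_generated_distribution_to_seed: true  # classification tasks only
  num_unlabelled_exemplars_per_generation: 2
  num_distractor_context_blocks: 4  # > 0 enables RAFT-style training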

output_is_json

Default: false

Description: Only relevant for QA tasks. If true, retain only synthetic data whose outputs are valid JSON.
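Putting it all together, a complete configuration might look like this. All keys are documented above; the values are illustrative, not recommendations:

base:
  task: question-answering-open-book
  random_seed: 123
  student_model_name: Llama-3.2-1B-Instruct
  teacher_model_name: Llama-3.3-70B-Instruct

tuning:
  learning_rate: 5e-5
  num_train_epochs: 32
  use_lora: true
  lora_r: 64

evaluation:
  num_few_shot_examples: 1

synthgen:
  generation_target: 10000
  teacher_temperature: 1.0
  output_is_json: true  # QA tasks: keep only outputs that are valid JSON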