Model training | Distil Labs

After validating your teacher model’s performance, the next step is to train your small language model (SLM) using distil labs’ knowledge distillation approach.

Understanding knowledge distillation

Knowledge distillation is the core technology behind distil labs’ ability to create high-performing small models with minimal training data. The process works as follows:

Synthetic Data Generation: The large “teacher” model generates synthetic training data based on your problem definition, task description, and provided examples.
Synthetic Data Validation: We validate generated data to make sure the synthetic set is diverse and high-quality
Knowledge Transfer: The synthetic data is used to train the smaller “student” model with a loss function aligned with your specific task. This process enables the student model to emulate the teacher’s capabilities while maintaining a much smaller size.

Initiating model training

After completing teacher evaluation and confirming satisfactory performance, you can start the training process using the API (get your token):

1 import requests
2 
3 data = {"upload_id": upload_id}
4 response = requests.post(
5     f"https://api.distillabs.ai/models/{model_id}/training",
6     data=json.dumps(data),
7     headers={"content-type": "application/json", **auth_header},
8 )
9 slm_training_job_id = response.json().get("id")
10 print(f"Training model with ID: {slm_training_job_id}")

Monitoring training status

The training process typically takes several hours to complete. You can check the current status of your training job:

1 import requests
2 from pprint import pprint
3 
4 response = requests.get(
5         f"https://api.distillabs.ai/trainings/{slm_training_job_id}/status",
6         headers=auth_header
7     )
8 pprint(response.json())

Possible status values include:

PENDING - Job is waiting to start
RUNNING - Job is currently running
SUCCESS - Job has finished successfully
FAILURE - Job encountered an error

Retrieving evaluation results

When the training is complete (status=SUCCESS), you can retrieve detailed evaluation results to compare the performance of your trained SLM against the teacher model:

1 import requests
2 from pprint import pprint
3 
4 response = requests.get(
5     f"https://api.distillabs.ai/trainings/{slm_training_job_id}/evaluation-results",
6     headers=auth_header
7 )
8 
9 # Display the evaluation results
10 print(response.json())

Display the results with:

1 print(response.json()["message"])
2 for model in ["teacher", "student-base", "student-tuned"]:
3     model_results = response.json()["evaluation_results"][model]
4     print(f"\n---\n\n{model}")
5     print("Accuracy:", model_results["accuracy"])
6     print(pd.DataFrame(model_results).transpose())

Retrieving predictions

For more in-depth analysis, you can download the predictions on individual data points of the test dataset. These predictions are generated using the fine-tuned student model. The URL links to a JSON file that contains the predictions along with other information depending on which task type you have selected (i.e. classification/question-answering/tool-calling).

The URL of this file can be found using:

1 print(response.json()["finetuned_student_evaluation_predictions_download_url"])

You can then download this file from the terminal using:

$ curl -o finetuned_student_evaluation_predictions.jsonl "<DOWNLOAD_URL>"

Note that the file is in JSON Lines format and can be read using:

1 df = pd.read_json("finetuned_student_evaluation_predictions.jsonl", lines=True)

What makes a successful training?

Comparison to Teacher: Your SLM should achieve performance reasonably close to the teacher model (typically within one standard deviation)
Task Requirements: The absolute performance should meet your specific application needs

If your SLM performance is significantly below the teacher model, consider:

Increasing the number of training examples
Adjusting your task description to be more specific
Modifying your configuration parameters (like increasing training epochs)
Using a slightly larger student model