Quickstart | Distil Labs

This guide walks you through training your first specialized small language model (SLM) with distil labs in a few steps.

Prerequisites

A distil labs account
Python 3.7+
tar
Basic understanding of your task requirements

Authentication

First, set up authentication to access the distil labs API. You can get the API key using the snippet in Account and Authentication.

AUTH_HEADER = {"Authorization": "API_KEY"}
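To avoid hardcoding the key in source files, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name DISTIL_API_KEY and the helper build_auth_header are hypothetical, and the header keeps the same shape as the snippet above.

```python
import os


def build_auth_header(env_var: str = "DISTIL_API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    The variable name DISTIL_API_KEY is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Same header shape as in this guide: the key is the header value.
    return {"Authorization": api_key}
```

With the variable set in your shell, `AUTH_HEADER = build_auth_header()` can stand in for the literal assignment.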

Data Preparation

A training job requires three main components:

Job description: JSON file describing your task in plain English
Training and testing data: Small datasets (10-50 examples) showing inputs and expected outputs. Each should be a table with question and answer columns, plus an optional context column.
Configuration file: YAML defining model and training parameters
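As a rough illustration of the table layout described above, the following sketch writes a CSV file with question, answer, and context columns using Python's standard csv module. The example rows and the write_examples helper are hypothetical placeholders, not part of the distil labs tooling.

```python
import csv
from pathlib import Path

# Hypothetical example rows; replace them with examples from your own task.
rows = [
    {
        "question": "What is the capital of France?",
        "answer": "Paris",
        "context": "Paris is the capital and largest city of France.",
    },
    {
        "question": "How many days are in a leap year?",
        "answer": "366",
        "context": "A leap year has 366 days instead of the usual 365.",
    },
]


def write_examples(path, rows):
    """Write rows to a CSV file with question, answer, and context columns."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "answer", "context"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_examples("data/train.csv", rows)` would produce a file in the location the loading code in this guide reads from.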

A simple example that defines the configuration and loads the data files for a question-answering task:

import yaml
from pathlib import Path

# Model and training parameters (serialized to config.yaml below)
config = {
    "base": {"task": "question-answering-open-book"},
    "setup": {"student_model_name": "meta-llama/Llama-3.2-1B-Instruct"},
    "synthgen": {"data_generation_strategy": "qa-open-book"},
}

# Load data from files
data_dir = Path("data")
data = {
    "job_description.json": (data_dir / "job_description.json").read_text(),
    "train.csv": (data_dir / "train.csv").read_text(),
    "test.csv": (data_dir / "test.csv").read_text(),
    "unstructured.csv": (data_dir / "unstructured.csv").read_text(),
    "config.yaml": yaml.dump(config),
}

Step 1: Upload and Validate Data

import json

import requests

# Package and upload your data
response = requests.post(
    "https://api.distillabs.ai/uploads",
    data=json.dumps(data),
    headers={"content-type": "application/json", **AUTH_HEADER},
)
response.raise_for_status()  # Stop here if the upload was rejected

upload_id = response.json().get("id")
print(f"Upload successful. ID: {upload_id}")

Step 2: Teacher Evaluation

Before training an SLM, distil labs validates whether a large language model can solve your task:

# Start teacher evaluation
response = requests.post(
    f"https://api.distillabs.ai/teacher-evaluations/{upload_id}",
    headers=AUTH_HEADER
)

teacher_evaluation_id = response.json().get("id")
print(f"Teacher evaluation started. ID: {teacher_evaluation_id}")

# Check evaluation results
response = requests.get(
    f"https://api.distillabs.ai/teacher-evaluations/{teacher_evaluation_id}/status",
    headers=AUTH_HEADER
)
print(f"Status: {response.json()}")
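A single status request only reports the state at that moment, so long-running jobs usually need to be polled. The helper below is a minimal, generic sketch: wait_until is a hypothetical name, and because this guide does not show the shape of the status payload, the caller supplies both the fetch function and the completion check.

```python
import time
from typing import Any, Callable


def wait_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
) -> Any:
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Pause between requests to avoid hammering the API.
        time.sleep(interval_s)
```

Here `fetch` could wrap the status request shown above and return `response.json()`, while `is_done` inspects whatever field the status payload uses to signal completion. The same helper applies to the training status endpoint.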

Step 3: SLM Training

Once the teacher evaluation completes successfully, start the SLM training:

# Initiate SLM training
response = requests.post(
    f"https://api.distillabs.ai/trainings/{upload_id}",
    headers=AUTH_HEADER
)

slm_training_job_id = response.json().get("id")
print(f"Training job started. ID: {slm_training_job_id}")

# Check training status
response = requests.get(
    f"https://api.distillabs.ai/trainings/{slm_training_job_id}/status",
    headers=AUTH_HEADER
)
print(f"Training status: {response.json()}")

# When complete, check performance
response = requests.get(
    f"https://api.distillabs.ai/trainings/{slm_training_job_id}/evaluation-results",
    headers=AUTH_HEADER
)
print(f"Evaluation results: {response.json()}")

Step 4: Download Your Model

Once training is complete, download your model for deployment:

# Get model download URL
response = requests.get(
    f"https://api.distillabs.ai/trainings/{slm_training_job_id}/model",
    headers=AUTH_HEADER
)

print(f"Model ready for download at: {response.json()}")
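Once the archive is saved locally, it can be unpacked with the tar command or from Python. The sketch below uses the standard tarfile module. The helper name extract_model, the archive file name, and the target directory are assumptions, and the archive's internal layout is not specified in this guide. Only extract archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, target_dir="model"):
    """Unpack a downloaded model archive into target_dir and return that path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # tarfile.open detects the compression format (e.g. gzip) automatically.
    with tarfile.open(archive_path) as archive:
        archive.extractall(target)
    return target
```

For example, `extract_model("model.tar.gz")` would unpack a hypothetical archive of that name into a local `model` directory.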

Step 5: Deploy the Model

After you download and untar the model, you can deploy it with any model-serving library of your choice. The following command starts a vLLM server with the fine-tuned model:

$ vllm serve model --api-key EMPTY

Query the model with input prompts:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
messages = [
    # The system message comes first, followed by the user message
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hey, can you help me?"},
]
chat_response = client.chat.completions.create(
    model="model",
    messages=messages,
)
print(chat_response)
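Printing the whole response object shows metadata along with the answer. If only the generated text is needed, it can be read from the first choice of the chat completion, as in this small sketch (first_reply_text is a hypothetical helper name):

```python
def first_reply_text(chat_response):
    """Return the text of the first choice in a chat completion response."""
    return chat_response.choices[0].message.content
```

With the response from the query above, `print(first_reply_text(chat_response))` prints just the model's answer.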

Next Steps

That’s it! You have successfully trained and deployed a specialized small language model that performs your specific task with high accuracy while being much smaller than general-purpose LLMs. For more advanced usage, explore our comprehensive How to documentation.