Introduction
Welcome to the distil Labs hands‑on tutorial for fine-tuning and deploying your own domain-specialized model. You’ll learn how to fine-tune a small language model (SLM) for a custom open-book question answering task using the Distil Labs platform.
Despite its compact size, the fine-tuned SLM will deliver performance close to much larger models—demonstrating how domain specialization and efficient distillation can unlock powerful capabilities on resource-constrained hardware. By the end, you’ll have a functional, local QA assistant—built with minimal data, no ML expertise, and zero dependency on cloud-based LLMs.
Registration
The first step towards model distillation is creating an account at app.distillabs.ai. Once you sign up, you can use your email/password combination in the authentication section below.
Notebook Setup
Copy over necessary data
Install python libraries
Specialize a Question-Answering Model with distil labs
In this chapter you will transform a compact 1B-parameter “student” model into a domain expert—without writing a single training loop yourself. Distil Labs takes care of every heavy-lifting step:
What you need to supply
- A concise job description that tells the platform what “good” looks like
- Roughly 20–100 labeled (question, answer) pairs for train / test
- Any domain documents you want the teacher to read while inventing synthetic Q&A pairs
Everything else (synthetic generation, distillation, evaluation, and packaging) is automated.
Let’s dive in and see how that looks in practice.
Authentication
The first step towards model distillation is logging into the distil labs account you created at the beginning of this notebook. If you have already registered, you can use your email/password combination in the authentication cell below.
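As a rough sketch, authenticating from Python might look like the snippet below. The base URL, endpoint path, payload fields, and response shape are assumptions for illustration, not the documented distil labs API, so follow the notebook's own authentication cell for the exact calls.

```python
# Hypothetical sketch: exchanging email/password for an API token over HTTP.
# The base URL, endpoint path, and field names are assumptions, not the real distil labs API.
import getpass
import requests

BASE_URL = "https://api.distillabs.ai"  # placeholder base URL

email = input("Email: ")
password = getpass.getpass("Password: ")

resp = requests.post(f"{BASE_URL}/auth/login", json={"email": email, "password": password})
resp.raise_for_status()
token = resp.json()["token"]                       # assumed response field
headers = {"Authorization": f"Bearer {token}"}     # reused by the later requests in this notebook
```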
Register a new model
The first component of the workflow is registering a new model - this helps us keep track of all our experiments down the line.
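A minimal sketch of the registration call is shown below; the endpoint path, payload fields, and response field are assumptions, and `BASE_URL`/`headers` come from the authentication sketch above.

```python
# Hypothetical sketch: registering a model so later artifacts can be attached to it.
# The endpoint path, payload, and response field are illustrative assumptions.
import requests

resp = requests.post(
    f"{BASE_URL}/models",                      # assumed endpoint
    headers=headers,                           # auth headers from the authentication step
    json={"name": "hotpotqa-slm", "task": "open-book question answering"},
)
resp.raise_for_status()
model_id = resp.json()["id"]                   # assumed response field
print("Registered model:", model_id)
```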
Inspect our models
Now that the model is registered, we can take a look at all the models in our repository.
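Listing the models might look like the sketch below; again, the endpoint and the shape of the response are assumptions for illustration.

```python
# Hypothetical sketch: listing every model registered under the account.
# BASE_URL and headers are defined in the authentication sketch above.
import requests

resp = requests.get(f"{BASE_URL}/models", headers=headers)  # assumed endpoint
resp.raise_for_status()
for model in resp.json():                                   # assumed list-of-dicts response
    print(model["id"], model["name"])
```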
Data Upload
The data for this example should be stored in the `data_location` directory. Let's first take a look at the current directory to make sure all files are available. Your current directory should look like:
Train/test set
We need a small train dataset to begin distil labs training and a test dataset to evaluate the performance of the fine-tuned model. Here, we use the train and test datasets from the `data_location` directory, where each is a CSV file with fewer than 100 (question, answer) pairs.
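To sanity-check the format before uploading, you can preview the splits with pandas. The file names below and the assumption that `data_location` holds the dataset path set earlier in the notebook are illustrative.

```python
# Preview the train/test splits; file names are assumed from the directory description above.
import pandas as pd

train_df = pd.read_csv(f"{data_location}/train.csv")  # expected columns: question, answer
test_df = pd.read_csv(f"{data_location}/test.csv")

print(f"train: {len(train_df)} rows, test: {len(test_df)} rows")
print(train_df.head())
```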
Unstructured dataset
The unstructured dataset is used to guide the teacher model in generating diverse, domain-specific data. For this open-book example, we need to provide a realistic document that will be used as context for question answering. Here, we use the unstructured datasets from the `data_location/` directory, where each is a JSON-Lines file with a single column (`context`).
Let’s inspect the available datasets to see the format and a few examples.
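A quick way to peek at the documents is shown below; the file name is an assumption for illustration.

```python
# Inspect the unstructured context documents; the file name is assumed for illustration.
import pandas as pd

docs = pd.read_json(f"{data_location}/unstructured.jsonl", lines=True)  # single "context" column
print(f"{len(docs)} documents")
print(docs["context"].iloc[0][:500])  # first 500 characters of the first document
```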
Data upload
We upload our dataset by attaching it to the model we created; this keeps all the artifacts in one place.
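A rough sketch of the upload step, assuming a multipart file endpoint attached to the model; the endpoint path, form field, and file names are assumptions.

```python
# Hypothetical sketch: attaching the datasets to the registered model.
# The endpoint path, form fields, and file names are illustrative assumptions.
import requests

for name in ("train.csv", "test.csv", "unstructured.jsonl"):
    with open(f"{data_location}/{name}", "rb") as fh:
        resp = requests.post(
            f"{BASE_URL}/models/{model_id}/data",   # assumed endpoint
            headers=headers,
            files={"file": (name, fh)},
        )
        resp.raise_for_status()
        print("Uploaded", name)
```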
Teacher Evaluation
Before training an SLM, distil labs validates whether a large language model can solve your task:
Poll the status endpoint until it completes, then inspect the quality of the generated answers. distil labs shows four scores that tell you how well the “teacher” model answers your test questions. Think of them as different lenses on the same picture: together they give a fuller view than any single number.
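A minimal sketch of that polling loop is below; the endpoint path, status values, and response fields are assumptions for illustration, so rely on the notebook's own polling cell for the exact calls.

```python
# Hypothetical sketch: polling the teacher-evaluation job until it finishes.
# The endpoint, status values, and response fields are assumptions.
import time
import requests

while True:
    resp = requests.get(f"{BASE_URL}/models/{model_id}/teacher-evaluation", headers=headers)
    resp.raise_for_status()
    status = resp.json()["status"]          # assumed field
    print("status:", status)
    if status in ("complete", "failed"):
        break
    time.sleep(30)                          # wait before polling again

print(resp.json().get("scores"))            # assumed: the four metric scores
```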
How to interpret a scorecard
- If Exact-Match is low but LLM-as-a-Judge is high, the answers are probably right but paraphrased—consider adding those paraphrases to your reference set.
- If all four numbers sag, revisit your job description or give the model more context; the task may be under-specified.
Follow the links above for deeper dives if you want to explore the math or research behind each metric.
SLM Training
Once the teacher evaluation completes successfully, start the SLM training:
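As a sketch, the training call might look like the following; the endpoint, payload, and student-model identifier are assumptions for illustration.

```python
# Hypothetical sketch: kicking off SLM training once the teacher evaluation looks good.
# The endpoint, payload, and student identifier are illustrative assumptions.
import requests

resp = requests.post(
    f"{BASE_URL}/models/{model_id}/training-jobs",   # assumed endpoint
    headers=headers,
    json={"student": "1b-base"},                     # assumed payload naming the student model
)
resp.raise_for_status()
job_id = resp.json()["id"]                           # assumed response field
print("Started training job:", job_id)
```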
We can analyze the status of the training job using the `jobs` API. The following code snippet displays the current status of the job we started before:
When the job is finished (`status=complete`), we can use the `jobs` API again to get the benchmarking results for the base and the fine-tuned SLM, scored with the same four metrics as the teacher evaluation. We can achieve this using:
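A rough sketch of checking the job and reading the benchmark scores; the endpoint, status value, and result fields are assumptions for illustration.

```python
# Hypothetical sketch: checking the training job and reading the benchmark scores.
# The endpoint, status value, and result fields are assumptions.
import requests

resp = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=headers)  # assumed endpoint
resp.raise_for_status()
job = resp.json()

if job["status"] == "complete":                      # assumed status value
    print("base SLM:", job["benchmark"]["base"])     # assumed result fields
    print("fine-tuned SLM:", job["benchmark"]["fine_tuned"])
else:
    print("Job still running, status:", job["status"])
```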
Download Your Model
You can list all of your models using the cell below. Once training is complete, download the selected model for deployment.
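Downloading the artifact might look like the sketch below; the endpoint and archive name are assumptions, so use the notebook's download cell for the real call.

```python
# Hypothetical sketch: downloading the trained model artifact as an archive.
# The endpoint and archive name are assumptions.
import requests

resp = requests.get(f"{BASE_URL}/models/{model_id}/download", headers=headers, stream=True)
resp.raise_for_status()
with open("model.tar.gz", "wb") as fh:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        fh.write(chunk)
print("Saved model.tar.gz")
```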
Deploy your fine‑tuned model
Now that we have a small language model fine‑tuned specifically for HotpotQA, we can launch it locally as a lightweight chat model with ollama.
Install ollama in your own system
To install ollama, follow the instructions from https://ollama.com/download and make sure to enable the serving daemon (via `ollama serve`). Once ready, make sure the app is running by executing the following command (the list should be empty since we have not loaded any models yet):
(Optional) Install ollama for Google Colab
If you are running this notebook in Google Colab, you can install Ollama using the following link
Once ollama is installed, we should start the application. You can start the daemon with `ollama serve`, using `nohup` to make sure it stays in the background.
Make sure the app is running by executing the following command (the list should be empty since we have not loaded any models yet):
Register and test the downloaded model
Once your model is trained, it should be unpacked and registered with ollama. The downloaded model directory already contains everything that is needed, and the model can be registered with the command below. Once it is ready, we can test the model through a standard OpenAI-compatible interface.
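If you prefer to script this step from Python, a rough sketch is shown below. The model name, unpacked directory, and Modelfile path are assumptions; ollama does serve an OpenAI-compatible API on localhost:11434 by default.

```python
# Hypothetical sketch: registering the unpacked model with ollama and testing it through
# ollama's OpenAI-compatible endpoint. The model name, directory, and Modelfile path are
# assumptions; adapt them to the artifact you downloaded.
import subprocess
from openai import OpenAI

# Register the model from its Modelfile (assumed to live in the unpacked directory).
subprocess.run(["ollama", "create", "hotpotqa-slm", "-f", "model/Modelfile"], check=True)

# ollama exposes an OpenAI-compatible API on localhost:11434 by default.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="hotpotqa-slm",
    messages=[{"role": "user", "content": "Question: ... Context: ..."}],  # placeholder prompt
)
print(response.choices[0].message.content)
```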