Distil labs inference playground
You can use the distil labs inference playground to test your trained model. The playground provides a hosted deployment endpoint that supports OpenAI-compatible inference.
The inference playground deployments are not intended for production use. Once you’re ready for production, contact us at contact@distillabs.ai and we’ll set you up.
Using the CLI
The distil CLI is the quickest way to deploy, query, and manage your model on distil-managed remote infrastructure.
Activating a deployment
Deploy your trained model with a single command:
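The invocation likely looks something like the following; the `deploy` subcommand name and the model-ID argument are assumptions here, so confirm the exact syntax with `distil --help`:

```shell
# Hypothetical syntax; check `distil --help` for the real subcommand.
distil deploy <model-id>
```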
The CLI will provision your deployment and display the endpoint URL, API key, and a client script you can use to query your model.
To output only the client script (useful for piping to a file):
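A sketch of what this might look like; the flag name below is an assumption, so check `distil --help` for the real one:

```shell
# Hypothetical flag name; confirm with the CLI's help output.
distil deploy <model-id> --client-script-only > client.py
```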
Querying your model
Get the command to invoke your deployed model:
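A sketch of this step, with the subcommand name assumed rather than documented:

```shell
# Hypothetical subcommand; confirm with `distil --help`.
distil invoke <model-id>
```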
This outputs a ready-to-run command that uses uv and points to a client script saved in the CLI's cache. Copy and run it directly:
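The printed command has roughly the following shape; the script path is a placeholder and the --question flag is an assumption, so use the exact command the CLI prints:

```shell
# Illustrative shape only; run the exact command the CLI outputs.
uv run <path-to-cached-client.py> --question "What does the warranty cover?"
```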
For question answering models that require context, use the --context flag:
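For example (the --context flag is from this page; the script path and --question flag are placeholders):

```shell
uv run <path-to-cached-client.py> \
  --question "What does the warranty cover?" \
  --context "Warranty terms: ..."
```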
It’s important to use the correct system prompt and message formatting when querying your SLM. SLMs are specialized and expect exactly the same format as seen during training. Using a different system prompt or formatting will result in poor performance.
Deactivating a deployment
When you’re done testing, deactivate your deployment to conserve credits:
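A sketch of the deactivation step; the subcommand name is an assumption, so confirm it with `distil --help`:

```shell
# Hypothetical subcommand; confirm with `distil --help`.
distil deactivate <model-id>
```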
CLI options reference
Using the API
You can also manage deployments programmatically using the REST API.
Activating a deployment
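A hedged sketch of what the activation request might look like; the base URL, path, HTTP method, and auth scheme below are placeholders, not the documented API, so consult the API reference for the real endpoint:

```shell
# Placeholder endpoint; $DISTIL_API_KEY is your account API key.
curl -X POST "$API_BASE/models/$MODEL_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_API_KEY"
```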
The response includes all the information you need to query your model:
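An illustrative response shape; deployment_status and client_script are the field names confirmed by this page, while the other field names are assumptions:

```json
{
  "deployment_status": "building",
  "endpoint": "https://...",
  "api_key": "...",
  "client_script": "..."
}
```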
The deployment_status field indicates the current state:
- building - Deployment is being provisioned
- active - Ready to accept requests
- inactive - Deployment has been deactivated
- credits_exhausted - No credits remaining
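These states suggest a simple polling loop while a deployment is building. In the sketch below, get_status is a stand-in for your own call to the deployment API (it is not part of any distil labs SDK):

```python
import time

# States from which a deployment will not become active on its own.
TERMINAL = {"inactive", "credits_exhausted"}


def wait_until_active(get_status, poll_seconds=10, max_polls=60):
    """Poll a status callable until the deployment becomes active.

    get_status is a stand-in for your own API call; it must return one
    of: "building", "active", "inactive", "credits_exhausted".
    Returns True once active, False if a terminal state is reached.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "active":
            return True
        if status in TERMINAL:
            return False
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not become active in time")
```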
The client_script field contains example Python code you can use to query your model. It is important that you use the exact prompt format shown in this script when querying your model.
Retrieving deployment information
After your deployment is set up, you can also retrieve information about it (the format will be the same as shown above).
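A hedged sketch of the retrieval request, with the same placeholder base URL and path as above rather than the documented API:

```shell
# Placeholder endpoint; consult the API reference for the real path.
curl "$API_BASE/models/$MODEL_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_API_KEY"
```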
Querying your model
Extract the client script from your deployment and save it to a file (you will need jq installed):
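Assuming you saved the deployment response to a file such as deployment.json, the extraction could look like this; client_script is the field name described above:

```shell
# deployment.json is assumed to hold the deployment response shown earlier.
jq -r '.client_script' deployment.json > client.py
```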
Then run the script with your question and context. You will need the openai Python package available locally.
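The generated client script is authoritative; as a rough orientation, an OpenAI-compatible call might look like the sketch below. The endpoint URL, API key, served model name, and especially the message format are placeholders here: copy the real system prompt and formatting from your deployment's client_script.

```python
def build_messages(system_prompt: str, question: str, context: str) -> list[dict]:
    """Assemble the chat messages.

    The user-message layout below is illustrative only: your SLM expects
    exactly the format used during training, so copy the real system
    prompt and formatting from your deployment's client_script.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
    ]


def main() -> None:
    # Requires `pip install openai`; imported here so the helper above
    # stays usable without the package installed.
    from openai import OpenAI

    # Placeholders: use the endpoint URL and API key from your deployment.
    client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-api-key>")
    resp = client.chat.completions.create(
        model="<served-model-name>",  # taken from your client_script
        messages=build_messages(
            "<system prompt from client_script>",
            "What does the warranty cover?",
            "Warranty terms: ...",
        ),
    )
    print(resp.choices[0].message.content)
```

After filling in the placeholders from your deployment output, call main() to send a request.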
It’s important to use the correct system prompt and message formatting when querying your SLM. SLMs are specialized and expect exactly the same format as seen during training. Using a different system prompt or formatting will result in poor performance.
Deactivating a deployment
When you’re done testing, deactivate your deployment to conserve credits:
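A hedged sketch of the deactivation request; the endpoint path and HTTP method are placeholders, so consult the API reference for the real call:

```shell
# Placeholder endpoint and method; check the API reference.
curl -X DELETE "$API_BASE/models/$MODEL_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_API_KEY"
```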
Using the web dashboard
You can also manage deployments through the web interface:
- Open your model from the distil labs dashboard
- Click “Deploy model” in the left navigation bar
- On the “Deploy on distil labs” tab, click the “Deploy Model” button

After clicking, the deployment process might take a few minutes. Once ready, you will see:
- the deployment endpoint URL,
- the API key,
- an example Python script to make requests against the endpoint.

Credits
Inference playground deployments require credits. When you run out of credits, you won’t be able to create new deployments and your existing deployments will be deactivated. All users get $30 of free starting credits - reach out to us at contact@distillabs.ai when you need more.
