Distil labs inference playground

You can use the distil labs inference playground to test your trained model. The playground provides a hosted deployment endpoint that supports OpenAI-compatible inference.
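
Because the endpoint is OpenAI-compatible, any standard chat-completions client can talk to it. Below is a minimal stdlib-only sketch, assuming the standard /v1/chat/completions route; the endpoint URL, API key, and model name are placeholders, and the real system prompt must be copied from your deployment's client script:

```python
import json
import urllib.request

# Placeholders -- copy the real endpoint URL, API key, and system prompt
# from the client script your deployment provides.
ENDPOINT_URL = "https://your-deployment-endpoint.distillabs.ai"
API_KEY = "your-api-key"

def build_chat_request(system_prompt: str, question: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    The system prompt must be exactly the one used during training.
    """
    return {
        "model": "your-model-id",  # placeholder model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }

def query(payload: dict) -> dict:
    """POST the payload to the deployment's chat-completions route."""
    req = urllib.request.Request(
        f"{ENDPOINT_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a live deployment):
#   payload = build_chat_request(SYSTEM_PROMPT, "Your question here")
#   print(query(payload)["choices"][0]["message"]["content"])
```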

The inference playground deployments are not intended for production use. Once you’re ready for production, contact us at contact@distillabs.ai and we’ll set you up.

Using the CLI

The distil CLI is the quickest way to deploy, query, and manage your model on distil-managed remote infrastructure.

Activating a deployment

Deploy your trained model with a single command:

distil model deploy remote <model-id>

The CLI will provision your deployment and display the endpoint URL, API key, and a client script you can use to query your model.

To output only the client script (useful for piping to a file):

distil model deploy remote --client-script <model-id>

Querying your model

Get the command to invoke your deployed model:

distil model invoke <model-id>

This outputs a ready-to-run command that uses uv and points to the client script saved in the CLI's cache. Copy and run it directly:

$ uv run PATH_TO_CLIENT --question "Your question here"

For question answering models that require context, use the --context flag:

$ uv run PATH_TO_CLIENT --question "Your question here" --context "Your context here"

It’s important to use the correct system prompt and message formatting when querying your SLM. SLMs are specialized and expect exactly the same format as seen during training. Using a different system prompt or formatting will result in poor performance.

Deactivating a deployment

When you’re done testing, deactivate your deployment to conserve credits:

distil model deploy remote --deactivate <model-id>

CLI options reference

Option            Description
--client-script   Output only the client script for the deployment
--deactivate      Deactivate a remote deployment
--output json     Output results in JSON format

Using the API

You can also manage deployments programmatically using the REST API.

Activating a deployment

curl -X POST "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{}"

The response includes all the information you need to query your model:

{
  "id": "deployment-uuid",
  "training_id": "your-training-uuid",
  "deployment_status": "active",
  "url": "https://your-deployment-endpoint.distillabs.ai",
  "client_script": "...",
  "secrets": {
    "api_key": "your-api-key"
  }
}

The deployment_status field indicates the current state:

  • building - Deployment is being provisioned
  • active - Ready to accept requests
  • inactive - Deployment has been deactivated
  • credits_exhausted - No credits remaining

The client_script field contains example Python code you can use to query your model. It is important that you use the exact prompt format shown in this script when querying your model.
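
When activating programmatically, it helps to wait for deployment_status to leave building before sending requests. A small polling sketch, where fetch_status is any zero-argument callable you supply (for example, one that GETs the deployment route and returns the deployment_status string):

```python
import time

# Terminal states from the deployment_status list above.
TERMINAL_STATES = {"active", "inactive", "credits_exhausted"}

def wait_for_deployment(fetch_status, poll_seconds: float = 10,
                        max_polls: int = 60) -> str:
    """Poll until the deployment leaves the 'building' state.

    Returns the first terminal status observed; raises TimeoutError
    if the deployment is still building after the polling window.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not leave 'building' in time")
```

Check the returned status before querying: only active means the endpoint will accept requests.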

Retrieving deployment information

After your deployment is set up, you can also retrieve its details; the response format is the same as shown above.

curl -X GET "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"

Querying your model

Extract the client script from your deployment and save it to a file (you will need jq installed):

$ curl -s "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
>   -H "Authorization: Bearer $DISTIL_TOKEN" \
>   | jq -r '.client_script' > model_client.py
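
If jq isn't available, the same extraction can be done with a stdlib-only Python sketch. The URL and token handling mirror the curl call above; fetch_deployment performs a live request, so it is shown but not invoked:

```python
import json
import urllib.request

def fetch_deployment(training_id: str, token: str) -> dict:
    """GET the deployment record for a training."""
    req = urllib.request.Request(
        f"https://api.distillabs.ai/trainings/{training_id}/deployment",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def extract_client_script(deployment: dict) -> str:
    """Pull the example client code out of a deployment response."""
    return deployment["client_script"]

# Example (requires a live deployment and DISTIL_TOKEN in the environment):
#   import os
#   record = fetch_deployment("YOUR_TRAINING_ID", os.environ["DISTIL_TOKEN"])
#   with open("model_client.py", "w") as f:
#       f.write(extract_client_script(record))
```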

Then run the script with your question and context. You will need the openai Python package available locally.

$ python model_client.py \
>   --question "Your question here" \
>   --context "Your context here"

It’s important to use the correct system prompt and message formatting when querying your SLM. SLMs are specialized and expect exactly the same format as seen during training. Using a different system prompt or formatting will result in poor performance.

Deactivating a deployment

When you’re done testing, deactivate your deployment to conserve credits:

curl -X DELETE "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"
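
The same DELETE can be issued from Python without extra dependencies; a sketch (deactivate performs the live request, so it is defined but not invoked here):

```python
import urllib.request

API_BASE = "https://api.distillabs.ai"

def deployment_url(training_id: str) -> str:
    """Route shared by POST (activate), GET (inspect), and DELETE (deactivate)."""
    return f"{API_BASE}/trainings/{training_id}/deployment"

def deactivate(training_id: str, token: str) -> None:
    """Deactivate the deployment for a training."""
    req = urllib.request.Request(
        deployment_url(training_id),
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
    urllib.request.urlopen(req)  # raises urllib.error.HTTPError on failure
```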

Using the web dashboard

You can also manage deployments through the web interface:

  1. Open your model from the distil labs dashboard
  2. Click “Deploy model” in the left navigation bar
  3. On the “Deploy on distil labs” tab, click the “Deploy Model” button

[Screenshot: Deploy Model button]

After clicking, the deployment process might take a few minutes. Once ready, you will see:

  • the deployment endpoint URL,
  • the API key,
  • an example Python script to make requests against the endpoint.

[Screenshot: Deployed model]

Credits

Inference playground deployments require credits. When you run out of credits, you won't be able to create new deployments, and your existing deployments will be deactivated. All users start with $30 of free credits; reach out to us at contact@distillabs.ai when you need more.