Push to Hugging Face Hub

Once the model is trained, you can upload it directly to your private Hugging Face repository for easy deployment.

To upload your model, you will need:

  1. The training ID of the model you wish to upload (YOUR_TRAINING_ID).
  2. A Hugging Face user access token with write permissions (YOUR_HF_TOKEN).
  3. A name for the model, which will also be the name of the Hugging Face repository (NAME_OF_YOUR_MODEL).
  4. Your distil labs API token.

You can upload the model with the following API call:

import json

import requests

slm_training_job_id = "YOUR_TRAINING_ID"
hf_details = {"hf_token": "YOUR_HF_TOKEN", "repo_id": "NAME_OF_YOUR_MODEL"}

# Push the model to the Hugging Face Hub; `token` is your distil labs API token
response = requests.post(
    f"https://api.distillabs.ai/trainings/{slm_training_job_id}/huggingface_models",
    data=json.dumps(hf_details),
    headers={"content-type": "application/json", "Authorization": f"Bearer {token}"},
)
print(response.json())
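If you call this endpoint more than once, it can help to assemble and sanity-check the request in one place before sending it. A minimal sketch; the helper name `build_push_request` and its return shape are illustrative, not part of the distil labs API:

```python
import json


def build_push_request(training_id: str, hf_token: str, repo_name: str, api_token: str) -> dict:
    """Assemble the URL, body, and headers for the push-to-hub call above."""
    return {
        "url": f"https://api.distillabs.ai/trainings/{training_id}/huggingface_models",
        "data": json.dumps({"hf_token": hf_token, "repo_id": repo_name}),
        "headers": {
            "content-type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    }


# The result can be unpacked straight into requests.post(**request)
request = build_push_request(
    "YOUR_TRAINING_ID", "YOUR_HF_TOKEN", "NAME_OF_YOUR_MODEL", "YOUR_DISTIL_LABS_TOKEN"
)
```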

Once your model has been pushed to Hugging Face, you can run it with your preferred inference framework. Hugging Face currently supports more than ten frameworks, including Ollama and vLLM.

Deploying question-answering models from Hugging Face

Hugging Face provides out-of-the-box support for running your question-answering models with vLLM and Ollama. These frameworks serve your models and let you invoke them via API requests, much like calling ChatGPT through the OpenAI API specification.
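For example, vLLM exposes an OpenAI-compatible HTTP endpoint (by default on port 8000 when started with `vllm serve <USERNAME>/<MODEL_NAME>`). A sketch of building such a chat-completions request; the host, port, and prompt are placeholder assumptions, so the actual POST is left commented out until you have a server running:

```python
def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


payload = chat_payload("<USERNAME>/<MODEL_NAME>", "Who are you?")
# With a running vLLM server, send the request with e.g. the requests library:
#   response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
#   print(response.json()["choices"][0]["message"]["content"])
```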

Note that Ollama requires models in the GGUF format. For this reason, we push each model to two repositories on Hugging Face: one in the GGUF format and one in the safetensors format.
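Ollama can pull GGUF models directly from the Hub using a model reference of the form `hf.co/<username>/<repo>`. A small helper that builds this reference; the repository name is a placeholder, so substitute the name of the GGUF repository that was created for your model:

```python
def ollama_model_ref(username: str, gguf_repo: str) -> str:
    """Build the reference Ollama uses to pull a GGUF repo from Hugging Face."""
    return f"hf.co/{username}/{gguf_repo}"


# The reference can then be used from a shell, e.g.:
#   ollama run hf.co/<USERNAME>/<GGUF_REPO_NAME>
ref = ollama_model_ref("<USERNAME>", "<GGUF_REPO_NAME>")
```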

The following snippet shows how you can load and run the model with the transformers library:

from transformers import pipeline

pipe = pipeline("text-generation", model="<USERNAME>/<MODEL_NAME>")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

Note that when using Ollama, you may need to upload your Ollama SSH key to Hugging Face to authenticate access to your private models; see the Hugging Face documentation for instructions.

Deploying classification models from Hugging Face

Once your classification model has been pushed to the model repository, you can use the following snippet to test the model:

from transformers import pipeline

pipe = pipeline("text-classification", model="<USERNAME>/<MODEL_NAME>")
pipe("<INPUT>")