Multi-turn tool calling data preparation
Before training can start, you need to upload all the necessary ingredients for the training job.
For this example, we will focus on multi-turn tool calling on a file system navigation dataset, where, given a conversation history of user requests and assistant tool calls, the model must respond with the appropriate next tool call. Unlike single-turn tool calling, where each input is independent, multi-turn tool calling maintains context across the conversation. Each assistant turn produces exactly one function call. To train a model for this purpose, we will need the following:
Model compatibility
Tool calling tasks have specific model compatibility requirements:
Student models: Only GPT-OSS, Qwen3, and Llama 3-family models are supported for the tool-calling-closed-book and multi-turn-tool-calling-closed-book tasks.
Teacher models for multi-turn: Multi-turn tool calling (multi-turn-tool-calling-closed-book) requires one of the following teacher models:
- Qwen3-235B-A22B-Instruct-2507
- Llama-3.1-405B-Instruct
- openai.gpt-oss-120b
Job description
Describes the work you expect the model to perform; you can think of it as an LLM prompt that would help the model solve your task. In practice, for a multi-turn tool calling problem we expect two components: task_description, which describes the main task, and tools, a list of JSON Schemas describing the available tools and their parameters (the tool schemas should follow the OpenAI function-calling format).
The expected format is a JSON file:
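A minimal sketch of such a file for the file system task, assuming hypothetical ls and cat tools; only the task_description and tools keys come from the format above, and everything else is illustrative:

```json
{
  "task_description": "You are a file system assistant. Given the conversation so far, respond with the single tool call that fulfills the user's latest request.",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "ls",
        "description": "List the files in a directory.",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string", "description": "Directory to list."}
          },
          "required": ["path"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "cat",
        "description": "Print the contents of a file.",
        "parameters": {
          "type": "object",
          "properties": {
            "file_name": {"type": "string", "description": "File to read."}
          },
          "required": ["file_name"]
        }
      }
    }
  ]
}
```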
Train/test data
We need a training dataset to fine-tune the model for your specific task and a test dataset to evaluate its performance after fine-tuning. The bigger and more diverse the datasets, the better, but for the training stage we need only a few dozen examples (we'll generate many more based on the examples you provide).
The expected format is CSV or JSONL with the following columns:

- question: the conversation history, encoded as a JSON array of turns (see Conversation format below)
- answer: the expected next tool call, encoded as a JSON string
Conversation format
The question field contains a JSON array of conversation turns. Each turn is an object with:
- role: Either "user" or "assistant"
- content: The text content of the message (empty string for assistant turns with tool calls)
- tool_calls: (assistant only) An array containing exactly one tool call made by the assistant
This format allows the model to understand the full context of the conversation before generating the next tool call.
JSONL format
The question field contains a stringified JSON array representing the conversation. The answer field should contain a string representing a JSON object, not an actual JSON object (note the escaped double quotes).
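For instance, a single JSONL record might look like this (a sketch with the conversation shortened to one turn for readability; both fields are strings containing escaped JSON):

```json
{"question": "[{\"role\": \"user\", \"content\": \"Open config.txt for me.\"}]", "answer": "{\"name\": \"cat\", \"parameters\": {\"file_name\": \"config.txt\"}}"}
```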
Understanding the conversation structure

Here’s an expanded view of what a single conversation looks like:
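This is a sketch for the file system task; the ls tool is illustrative, and we assume each tool_calls entry mirrors the name/parameters shape used by the answer field:

```json
[
  {
    "role": "user",
    "content": "What's in the current directory?"
  },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {"name": "ls", "parameters": {"path": "."}}
    ]
  },
  {
    "role": "user",
    "content": "Open config.txt for me."
  }
]
```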
The model receives this conversation history and should output: {"name": "cat", "parameters": {"file_name": "config.txt"}}
Key differences from single-turn tool calling
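- The question field holds an entire conversation history (a JSON array of turns) rather than a single standalone question.
- Earlier turns provide context: the model must read prior user requests and tool calls to choose the next call, whereas single-turn inputs are independent of each other.
- Each assistant turn still produces exactly one tool call, so the answer field keeps the same single-call JSON format.
- Only the teacher models listed above are supported for the multi-turn task.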
Configuration file
The configuration file specifies the task type and training parameters.
The expected format is YAML:
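A minimal sketch, assuming the multi-turn-tool-calling-closed-book task type named above; the other keys and file names are illustrative, not the definitive schema:

```yaml
# Sketch only: the task type comes from this guide; other keys and paths are assumptions.
task: multi-turn-tool-calling-closed-book
student_model: Qwen3-8B                        # assumed: any supported GPT-OSS/Qwen3/Llama 3 student
teacher_model: Qwen3-235B-A22B-Instruct-2507   # must be one of the supported multi-turn teachers
job_description_file: job_description.json     # the JSON file described above
train_file: train.jsonl
test_file: test.jsonl
```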
For additional configuration options, see Configuration file →
