Tool calling data preparation

Before training can start, you need to upload everything the training job requires. This example focuses on closed-book tool calling with a pizza-making dataset: given the recipe and the current state of preparation, the model must respond with the next action needed to complete the pizza. Training a model for this purpose requires the following:

Job description

Describes the work you expect the model to perform; you can think of it as an LLM prompt that would help the model solve your task. For a tool calling problem, we expect two components: task_description, which describes the main task, and tools, a list of JSON Schemas describing the available tools and their parameters (the tool schemas should follow the OpenAI format).

The expected format is a JSON blob. For the pizza-making task, it looks like this:

{
  "task_description": "Respond with the next tool call to make a pizza",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "take_out_of_oven",
        "description": "Remove the pizza from the oven",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "put_in_oven",
        "description": "Place the pizza in the oven",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "set_oven_temp",
        "description": "Set the oven temperature in degrees Celsius",
        "parameters": {
          "type": "object",
          "properties": {
            "degrees_celsius": {
              "type": "integer",
              "description": "Temperature in degrees Celsius",
              "minimum": 50,
              "maximum": 300
            }
          },
          "required": ["degrees_celsius"],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "wait",
        "description": "Wait for a specified number of seconds",
        "parameters": {
          "type": "object",
          "properties": {
            "seconds": {
              "type": "integer",
              "description": "Number of seconds to wait",
              "minimum": 1
            }
          },
          "required": ["seconds"],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "add_topping",
        "description": "Add a topping ingredient to the pizza",
        "parameters": {
          "type": "object",
          "properties": {
            "ingredient": {
              "type": "string",
              "description": "The topping ingredient to add",
              "enum": [
                "mozzarella",
                "gorgonzola",
                "Parmigiano-Reggiano",
                "ricotta",
                "rucola",
                "pineapple",
                "ham",
                "salami",
                "jalapeños",
                "onions",
                "black olives",
                "green olives",
                "anchovies",
                "tuna",
                "mushrooms"
              ]
            }
          },
          "required": ["ingredient"],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "add_sauce",
        "description": "Add sauce to the pizza base",
        "parameters": {
          "type": "object",
          "properties": {
            "sauce": {
              "type": "string",
              "description": "The type of sauce to add",
              "enum": [
                "tomato",
                "alfredo"
              ]
            }
          },
          "required": ["sauce"],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "fold",
        "description": "Fold the pizza in half",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "cut",
        "description": "Cut the pizza into slices",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "serve",
        "description": "Serve the finished pizza",
        "parameters": {
          "type": "object",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    }
  ]
}
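
Before uploading, it can help to sanity-check the job description locally. The following is a minimal sketch that assumes only the structure described above (a task_description string plus a tools list in the OpenAI function format). The function name check_job_description is hypothetical and not part of any platform API, and the checks are illustrative rather than a full JSON Schema validation.

```python
def check_job_description(job: dict) -> list[str]:
    """Return a list of problems found in a job description (empty if none)."""
    problems = []
    task = job.get("task_description")
    if not isinstance(task, str) or not task.strip():
        problems.append("task_description must be a non-empty string")
    tools = job.get("tools")
    if not isinstance(tools, list) or not tools:
        problems.append("tools must be a non-empty list")
        return problems
    seen = set()
    for i, tool in enumerate(tools):
        if not isinstance(tool, dict) or tool.get("type") != "function":
            problems.append(f"tools[{i}]: type must be 'function'")
            continue
        fn = tool.get("function")
        if not isinstance(fn, dict) or not isinstance(fn.get("name"), str):
            problems.append(f"tools[{i}]: function.name is missing")
            continue
        if fn["name"] in seen:
            problems.append(f"tools[{i}]: duplicate tool name {fn['name']!r}")
        seen.add(fn["name"])
        params = fn.get("parameters")
        if not isinstance(params, dict) or params.get("type") != "object":
            problems.append(f"tools[{i}]: parameters.type must be 'object'")
            continue
        # Every required parameter should also be declared under properties.
        undeclared = set(params.get("required", [])) - set(params.get("properties", {}))
        if undeclared:
            problems.append(
                f"tools[{i}]: required names not in properties: {sorted(undeclared)}"
            )
    return problems


# A one-tool excerpt of the pizza-making job description.
example = {
    "task_description": "Respond with the next tool call to make a pizza",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "serve",
                "description": "Serve the finished pizza",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                    "additionalProperties": False,
                },
            },
        }
    ],
}
print(check_job_description(example))  # prints []
```

In practice you would load the full JSON file with json.load and pass the resulting dict to the same function.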

Test/train data

We need a training dataset to fine-tune the model for your specific task and a test dataset to evaluate its performance after fine-tuning. Larger and more diverse datasets are better, but the training stage needs only a few dozen examples (we’ll generate many more based on the examples you provide).

The expected format is CSV or JSON Lines with the following columns:

  1. question should contain the pizza recipe and the current state of preparation
  2. answer is the tool call to be invoked according to the provided schema

The data for the closed-book pizza making task should look like this:

JSONL format

{"question":"Margherita recipe: spread tomato sauce on the base, add fresh mozzarella, bake at 250°C for 9-12 minutes. Margherita assembled. Oven heating to 250°C, currently at 100°C.","answer":"{\"name\":\"wait\",\"parameters\":{\"seconds\":150}}"}
{"question":"Quattro Formaggi recipe: spread alfredo sauce on the base, add mozzarella, gorgonzola, Parmigiano-Reggiano, and ricotta, bake at 220°C for 9 minutes. Quattro Formaggi ready to bake. Oven heated to 220°C.","answer":"{\"name\":\"put_in_oven\",\"parameters\":{}}"}
{"question":"Anchovy recipe: spread tomato sauce on the base, add mozzarella and anchovies, bake at 240°C for 8 minutes. Anchovy pizza waiting. Oven preheated to 240°C.","answer":"{\"name\":\"put_in_oven\",\"parameters\":{}}"}
{"question":"Ham recipe: spread tomato sauce on the base, add mozzarella and diced ham, bake at 220°C for 10 minutes. Ham pizza with tomato sauce spread to the edges.","answer":"{\"name\":\"add_topping\",\"parameters\":{\"ingredient\":\"mozzarella\"}}"}
{"question":"Meat calzone recipe: on half of the dough circle, spread tomato sauce, add mozzarella, salami, and ham, fold over and seal edges, bake at 215°C for 16 minutes. Hearty meat calzone steaming on plate.","answer":"{\"name\":\"serve\",\"parameters\":{}}"}

The answer field should contain a string representing a JSON object, not an actual JSON object (note the escaped double quotes).
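
One way to get the string-encoded answer right is to serialize twice: once for the tool call, and once for the whole row. The sketch below assumes the row layout shown above. The helper names make_row and check_answer are hypothetical, and the schema check covers only tool names, required parameters, and enum values, not the full JSON Schema specification.

```python
import json


def make_row(question: str, tool_name: str, parameters: dict) -> str:
    """Build one JSONL line whose answer field is a JSON-encoded string."""
    # First pass: the tool call becomes a compact JSON string.
    answer = json.dumps(
        {"name": tool_name, "parameters": parameters}, separators=(",", ":")
    )
    # Second pass: the outer dump escapes the inner double quotes.
    return json.dumps(
        {"question": question, "answer": answer},
        ensure_ascii=False,
        separators=(",", ":"),
    )


def check_answer(answer: str, tools: list[dict]) -> list[str]:
    """Compare a string-encoded answer with the tool schemas; return problems."""
    call = json.loads(answer)
    schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    if call.get("name") not in schemas:
        return [f"unknown tool {call.get('name')!r}"]
    schema = schemas[call["name"]]
    params = call.get("parameters", {})
    problems = [
        f"missing parameter {p!r}"
        for p in schema.get("required", [])
        if p not in params
    ]
    for key, value in params.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected parameter {key!r}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}={value!r} not in enum")
    return problems


line = make_row(
    "Oven heating to 250°C, currently at 100°C.", "wait", {"seconds": 150}
)
row = json.loads(line)
# After parsing the row, the answer is still a string and needs a second parse.
assert isinstance(row["answer"], str)
assert json.loads(row["answer"]) == {"name": "wait", "parameters": {"seconds": 150}}
```

Running check_answer over every row with the tools list from the job description is a cheap way to catch misspelled tool names or toppings before uploading.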

CSV format

| question | answer |
| --- | --- |
| Margherita recipe: spread tomato sauce on the base, add fresh mozzarella, bake at 250°C for 9-12 minutes. Margherita assembled. Oven heating to 250°C, currently at 100°C. | "{\"name\":\"wait\",\"parameters\":{\"seconds\":150}}" |
| Quattro Formaggi recipe: spread alfredo sauce on the base, add mozzarella, gorgonzola, Parmigiano-Reggiano, and ricotta, bake at 220°C for 9 minutes. Quattro Formaggi ready to bake. Oven heated to 220°C. | "{\"name\":\"put_in_oven\",\"parameters\":{}}" |
| Anchovy recipe: spread tomato sauce on the base, add mozzarella and anchovies, bake at 240°C for 8 minutes. Anchovy pizza waiting. Oven preheated to 240°C. | "{\"name\":\"put_in_oven\",\"parameters\":{}}" |
| Ham recipe: spread tomato sauce on the base, add mozzarella and diced ham, bake at 220°C for 10 minutes. Ham pizza with tomato sauce spread to the edges. | "{\"name\":\"add_topping\",\"parameters\":{\"ingredient\":\"mozzarella\"}}" |
| Meat calzone recipe: on half of the dough circle, spread tomato sauce, add mozzarella, salami, and ham, fold over and seal edges, bake at 215°C for 16 minutes. Hearty meat calzone steaming on plate. | "{\"name\":\"serve\",\"parameters\":{}}" |
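
Because the questions contain commas and the answers contain double quotes, writing the CSV by hand is error-prone. The sketch below uses Python's standard csv module, which handles quoting automatically. Note one assumption: the csv module escapes embedded quotes by doubling them (RFC 4180 style) rather than with backslashes as displayed above, and the source does not state which variants the upload accepts, so verify this against your own upload. The function name write_csv is hypothetical.

```python
import csv
import io
import json

# (question, tool call) pairs; the tool call is kept as a dict until serialization.
rows = [
    (
        "Ham recipe: spread tomato sauce on the base, add mozzarella and diced ham, "
        "bake at 220°C for 10 minutes. Ham pizza with tomato sauce spread to the edges.",
        {"name": "add_topping", "parameters": {"ingredient": "mozzarella"}},
    ),
]


def write_csv(rows, fh) -> None:
    """Write question/answer rows; the answer is serialized to a JSON string."""
    writer = csv.writer(fh)
    writer.writerow(["question", "answer"])
    for question, call in rows:
        writer.writerow([question, json.dumps(call, separators=(",", ":"))])


# An in-memory buffer stands in for a file here; for a real file, use
# open(path, "w", newline="", encoding="utf-8") as the csv docs recommend.
buf = io.StringIO()
write_csv(rows, buf)

# Round-trip check: each answer parses back into the original tool call.
for record, (_, call) in zip(csv.DictReader(io.StringIO(buf.getvalue())), rows):
    assert json.loads(record["answer"]) == call
```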

Unstructured dataset

The unstructured dataset is used to guide the teacher model in generating diverse, domain-specific data. It can be documentation, unlabelled examples, or even industry literature relevant to the task.

For the closed-book tool calling problem, we will use assorted pizza recipes. We can then generate new questions based on the information in those passages and answer them.

The expected format is CSV or JSON Lines with a single column (context). For the pizza-making task, it should look like this:

JSONL format

{"context": "Classic Margherita Pizza: Start by stretching your dough into a 12-inch circle. Spread 3-4 tablespoons of tomato sauce evenly, leaving a 1-inch border for the crust. Tear fresh mozzarella into small pieces and distribute over the sauce. Drizzle with olive oil, add a pinch of salt, and bake in a preheated oven at 250°C for 9-12 minutes until the crust is golden and cheese is bubbly. Finish with fresh basil leaves after removing from oven."}
{"context": "Quattro Formaggi (Four Cheese) Pizza: This white pizza celebrates Italian cheeses. Begin with a base of creamy alfredo or bechamel sauce instead of tomato. Layer four cheeses: mozzarella for stretch, gorgonzola for tang, Parmigiano-Reggiano for sharpness, and dollops of ricotta for creaminess. The key is balance - use mozzarella as the base, then add the others sparingly. Bake at 220°C for about 9 minutes. The different melting points create a complex texture."}
{"context": "Calzone Preparation Technique: A calzone is essentially a folded pizza that creates a pocket of delicious filling. Roll your dough into an oval shape, slightly thicker than for regular pizza. Place fillings only on one half, leaving a 1-inch border. Common fillings include ricotta, mozzarella, Italian sausage, and vegetables. Brush the edges with water, fold the empty half over, and seal by pressing and crimping the edges. Cut 2-3 small vents on top to release steam. Bake at 215°C for 15-18 minutes until golden brown."}
{"context": "Hawaiian Pizza Assembly: Despite the controversy, Hawaiian pizza has its devoted fans. Start with a traditional tomato sauce base, spreading it evenly across the dough. Add a layer of mozzarella cheese, then distribute cubed ham evenly across the surface. The key to good Hawaiian pizza is properly preparing the pineapple - use fresh or well-drained canned chunks, and pat them dry to prevent excess moisture. Arrange pineapple pieces between the ham. Bake at 230°C for 10-12 minutes."}
{"context": "Pizza Timing and Temperature Guide: Different pizza styles require different baking approaches. Neapolitan pizzas cook best at extremely high temperatures (450°C+) for just 60-90 seconds. New York style benefits from 280-300°C for 5-7 minutes. Thick crust or deep dish pizzas need lower temperatures (200-220°C) for 15-25 minutes to cook through without burning the top. Always preheat your oven fully - at least 30 minutes for home ovens. Use a pizza stone or steel if available, preheating it with the oven for optimal crust development."}
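
If your source material is a collection of free-text passages, a short script can turn it into the single-column layout shown above. This is a minimal sketch with a hypothetical helper name (passages_to_jsonl). It assumes each passage should end up on exactly one line, so it collapses internal newlines and skips empty passages.

```python
import json


def passages_to_jsonl(passages: list[str]) -> str:
    """Serialize text passages as JSON Lines with a single "context" key."""
    lines = []
    for text in passages:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = " ".join(text.split())
        if text:
            # ensure_ascii=False keeps characters such as "°" readable.
            lines.append(json.dumps({"context": text}, ensure_ascii=False))
    return "\n".join(lines)


passages = [
    "Hawaiian Pizza Assembly: Start with a traditional tomato sauce base.\n"
    "Bake at 230°C for 10-12 minutes.",
    "   ",  # empty passages are dropped
]
print(passages_to_jsonl(passages))
```

The result can be written to a file opened with encoding="utf-8" so that the non-ASCII characters survive.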

CSV format

context
Classic Margherita Pizza: Start by stretching your dough into a 12-inch circle. Spread 3-4 tablespoons of tomato sauce evenly, leaving a 1-inch border for the crust. Tear fresh mozzarella into small pieces and distribute over the sauce. Drizzle with olive oil, add a pinch of salt, and bake in a preheated oven at 250°C for 9-12 minutes until the crust is golden and cheese is bubbly. Finish with fresh basil leaves after removing from oven.
Quattro Formaggi (Four Cheese) Pizza: This white pizza celebrates Italian cheeses. Begin with a base of creamy alfredo or bechamel sauce instead of tomato. Layer four cheeses: mozzarella for stretch, gorgonzola for tang, Parmigiano-Reggiano for sharpness, and dollops of ricotta for creaminess. The key is balance - use mozzarella as the base, then add the others sparingly. Bake at 220°C for about 9 minutes. The different melting points create a complex texture.
Calzone Preparation Technique: A calzone is essentially a folded pizza that creates a pocket of delicious filling. Roll your dough into an oval shape, slightly thicker than for regular pizza. Place fillings only on one half, leaving a 1-inch border. Common fillings include ricotta, mozzarella, Italian sausage, and vegetables. Brush the edges with water, fold the empty half over, and seal by pressing and crimping the edges. Cut 2-3 small vents on top to release steam. Bake at 215°C for 15-18 minutes until golden brown.
Hawaiian Pizza Assembly: Despite the controversy, Hawaiian pizza has its devoted fans. Start with a traditional tomato sauce base, spreading it evenly across the dough. Add a layer of mozzarella cheese, then distribute cubed ham evenly across the surface. The key to good Hawaiian pizza is properly preparing the pineapple - use fresh or well-drained canned chunks, and pat them dry to prevent excess moisture. Arrange pineapple pieces between the ham. Bake at 230°C for 10-12 minutes.
Pizza Timing and Temperature Guide: Different pizza styles require different baking approaches. Neapolitan pizzas cook best at extremely high temperatures (450°C+) for just 60-90 seconds. New York style benefits from 280-300°C for 5-7 minutes. Thick crust or deep dish pizzas need lower temperatures (200-220°C) for 15-25 minutes to cook through without burning the top. Always preheat your oven fully - at least 30 minutes for home ovens. Use a pizza stone or steel if available, preheating it with the oven for optimal crust development.