
3.4.17.2. Fine-tuning OpenAI models with ChronoScan applications

 
Since v1.0.3.14
 
ChronoScan's OpenAI integration allows you to fine-tune OpenAI models with data from ChronoScan applications.
This feature lets you tailor models to your specific tasks, achieving greater accuracy for your particular requirements.
 
These are the main concepts, explained in this topic, for fine-tuning and using models with ChronoScan:
 
 
 

1. Key information and recommendations for fine-tuning with ChronoScan

 
 
 

2. Preparing the right data

 
To train a model, we first need a configured ChronoScan job and a working batch containing the data to train on.
This means we need a working application, with the best validated data possible, and a clear definition of the task we want to train for.
When we fine-tune from a ChronoScan batch, the formatted data is sent to OpenAI along with the desired base model, so it is important that the data is validated and that the batch contains the information we want the fine-tuned model to improve on.
 
The most important thing is to have the best possible, validated data in your batch, with a minimum of 10 documents. OpenAI recommends fine-tuning with 50 to 100 examples (documents) before expecting significant improvement.
 

3. Creating the necessary files for fine-tuning

 
Fine-tuning is a complex, advanced feature, so we take for granted that you know how to use ChronoScan and how to create and configure good-quality working jobs and batches.
 
We assume you have a configured job and a working batch with at least 10 documents of valid data.
Now we want to train a ChatGPT model tailored to our needs, one that has learned from our own specific validated data.
 
First of all, we need a working ChatGPT model configuration. You already have one if the job runs inference with a ChatGPT model; if that is not the case, you must create one.
It is very important that your ChatGPT model is well configured, especially with well-engineered system and user prompts, since these define the task we want to train for.
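For illustration, a hypothetical prompt pair for an invoice-extraction task could look like the following (the wording and field names are examples, not ChronoScan defaults):

System prompt: You are an invoice data-extraction assistant. Return the requested fields as JSON.
User prompt: Extract InvoiceNumber, InvoiceDate and TotalAmount from the following document text: <document text>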
 
*If you want to know how structured data training works, you can refer to this link.
 
Now that we have our job, a valid batch, and a properly configured ChatGPT model, we can open the fine-tuning manager interface in ChronoScan.
 
For this example we use a demo batch of 11 documents, with some of its fields extracted with ChatGPT.
 
3.1. Open the ChatGPT model configurator:
 
1. Click here to open the ChatGPT model configurator
2. ChatGPT model configurator
This is the ChatGPT model configurator. Before fine-tuning, you need at least one valid model configuration with the system and user prompts for the training.
 
3.2. Open the Fine-tuning manager
 
 
 
3.2.1. Fine-tuning dialog
 
At this point we are ready to create a dataset from our batch and ChatGPT model configuration.
We will focus on part 1 of the dialog, "Dataset Generation", and the grid table "Datasets for this batch".
 
1. Select the desired ChatGPT model configuration
Normally you would choose the one the batch has been working with.
2. Optional: Generate a testing file from the dataset
Choose whether to generate a testing file from the batch, and the percentage of its examples to include.
3. Data dumping
Data dumping refers to the fields you want to include in the training and testing files.
Dump all fields: includes all job fields/values in the files.
Dump only mapped fields: includes only those fields that we have previously mapped from the request response. (recommended)
4. Base model to train from
Select the desired base model here.
5. Generate dataset
Click here after configuring the options above to generate the training files (dataset).
 
3.2.2. Generated datasets
 
When we generate a dataset, the necessary files (training and testing) are automatically created in the format OpenAI requires for training.
These files are built from the system and user prompts configured in our ChronoScan ChatGPT model plus the document field data:
(system + user prompts) + assistant
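For illustration, each line of the generated training file is a JSON object in OpenAI's chat fine-tuning JSONL format. A hypothetical example for an invoice-extraction task might look like this (field names and prompt wording are illustrative, not the exact ChronoScan output):

{"messages": [{"role": "system", "content": "You are an invoice data-extraction assistant. Return the requested fields as JSON."}, {"role": "user", "content": "Extract InvoiceNumber, InvoiceDate and TotalAmount from: <document text>"}, {"role": "assistant", "content": "{\"InvoiceNumber\": \"INV-0042\", \"InvoiceDate\": \"2024-03-01\", \"TotalAmount\": \"118.50\"}"}]}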
 
Newly created datasets are shown in the grid like this:
 
 
As we can see, the dataset is stored locally in a folder inside our ChronoScan installation directory. At this point we can move on to step 4, uploading the training files.
 

4. Upload the training and testing files

 
The next step is to upload the dataset to the OpenAI servers so it can be used by the training job we are about to create.
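ChronoScan performs this upload for you. For reference, it corresponds to OpenAI's file upload endpoint; a minimal sketch with the OpenAI Python SDK, assuming illustrative file names, looks like this:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training and (optional) testing files generated by ChronoScan.
# The .jsonl paths below are illustrative placeholders.
train_file = client.files.create(
    file=open("batch_dataset_train.jsonl", "rb"),
    purpose="fine-tune",
)
test_file = client.files.create(
    file=open("batch_dataset_test.jsonl", "rb"),
    purpose="fine-tune",
)

print(train_file.id)  # a "file-..." ID, like the one shown in the "Train file uploaded" column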
 
 
Once sent, we will see the training file ID in the "Train file uploaded" column, like this:
 
 
Now it is important to notice the "total tokens (est)" column: this is an estimate of the total number of tokens in the training files.
When we create the fine-tuning job, ChronoScan first uses this estimate to check whether your license has enough credits to do the training.
 
You can take a look at ChronoScan fine-tuning credit costs here.
 
Now we can create the fine-tuning job for our dataset.
 

5. Creating training jobs

 
To create the fine-tuning job, we click again on the dataset we want to train (it has to be one whose training files have been uploaded, as explained above) and configure our training job.
 
 
5.1. Fine-tuning job configuration buttons:
 
The resulting fine-tuned model name follows this pattern: ft:{model}:{company_name}:{alias/suffix}:{fine_tuned_hash}
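For example, a fine-tuned model name could look like ft:gpt-3.5-turbo-0125:acme:invoices:9qZ3xY2A (the company name, alias and hash here are purely illustrative).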
5.2. Creating the fine-tuning job
 
Now we can click on "Create Fine-tuning job". This creates the fine-tuning job for the selected dataset on the OpenAI servers.
From this point, the fine-tuning process depends on OpenAI and normally goes through three stages.
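For reference, creating the job corresponds to OpenAI's fine-tuning jobs endpoint. A minimal sketch with the OpenAI Python SDK (the base model and file IDs are illustrative):

from openai import OpenAI

client = OpenAI()

# Create a fine-tuning job from the uploaded dataset files.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",           # base model selected in the dialog
    training_file=train_file.id,     # ID from the upload step
    validation_file=test_file.id,    # optional testing file
)

print(job.id, job.status)  # e.g. an "ftjob-..." ID and an initial status such as "validating_files"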
 
 
We can monitor the fine-tuning job(s) status in this grid:
 
 
 
From this moment, the fine-tuning process is managed by openai.com.
In parallel, ChronoScan's automatic cloud services check the status of the fine-tuning job roughly three times per hour (every ~20 minutes); when it finishes, they notify the configured email addresses (if any) and charge the corresponding amount of ChronoScan credits, according to the number of trained tokens reported by openai.com.
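For reference, this monitoring corresponds to polling the job status through the OpenAI API; a sketch of what ChronoScan's cloud services do on your behalf (the job ID is a placeholder):

import time

from openai import OpenAI

client = OpenAI()

# Poll the fine-tuning job until it reaches a terminal status.
while True:
    job = client.fine_tuning.jobs.retrieve("ftjob-...")  # your job ID
    print(job.status, job.trained_tokens)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(20 * 60)  # roughly the ~20-minute interval mentioned above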
 
 
 
 

8. Completions and inference with the fine-tuned models

 
When a fine-tuning job finishes successfully, we will see the status "succeeded" and the actual number of trained tokens.
At this moment the model is trained, but it won't be available until the ChronoScan cloud service charges the corresponding credits and sets the job as "available".
Once the model is available, we can "activate" it in order to use it in ChronoScan the same way we would use any other OpenAI model.
 
 
 
 
8.1. Using a fine-tuned model
 
When the model is "active", it will be listed in the ChatGPT configurator dialog, and we can select and configure it the same way as a regular ChatGPT model.
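Outside ChronoScan, a fine-tuned model is called like any other chat model, simply by passing its ft:... name. A minimal sketch with the OpenAI Python SDK (the model name and prompts are the illustrative ones used earlier):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0125:acme:invoices:9qZ3xY2A",  # hypothetical fine-tuned model name
    messages=[
        {"role": "system", "content": "You are an invoice data-extraction assistant. Return the requested fields as JSON."},
        {"role": "user", "content": "Extract InvoiceNumber, InvoiceDate and TotalAmount from: <document text>"},
    ],
)

print(response.choices[0].message.content)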
 
 
 
For easier identification, we recommend activating only those fine-tunes you are happy with and discarding the ones that don't meet your goals.
 
 

9. Fine-tunes report in ChronoScanHub

 
If you have a ChronoScan user account, you can access the ChronoScanHub cloud portal. There you can see the report of your fine-tunes under the "cloud licenses" section.
You will first have to link your ChronoScan license; after that you can click on it and the following report will be displayed.
 
 
 
 
Here you get extra information about your fine-tuning jobs, can activate/deactivate any model from any job/batch, find your jobs more easily, and access your fine-tuning job information from anywhere.
You can also view your ChronoScan credits usage log via this action button:
 
 
 
 

10. Pricing for fine-tuning and usage of fine-tuned models in ChronoScan

 
You can estimate the cost of the different services provided by the ChronoScan service account here: