Guide Large Language Models
By: Kolena Editorial Team
Mistral Fine-Tuning: The Basics and a Quick Tutorial
Mistral Fine-Tuning: The Basics and a Quick Tutorial
Oct 10, 2024
Oct 10, 2024

What Is Mistral Fine-Tuning? 

Mistral, based in France, is a developer of large language models (LLMs), several of which are offered as open source. Mistral’s open source models offer performance that is competitive with state of the art commercial models, making them a popular choice for experimentation and fine tuning.

LLM fine-tuning refers to the process of adjusting a pre-trained model to enhance its performance on specific tasks or datasets. By fine-tuning Mistral models, you can tweak the model parameters, typically weights, on a smaller, task-specific dataset after the model has been trained on a large, general dataset. This targeted training helps improve the accuracy of the model when deployed in specialized applications.

Use Cases for Fine-Tuning Mistral Models 

There are several reasons to fine-tune a Mistral model.

Specific Tone

Fine-tuning Mistral models can help establish a particular tone in generated content, tailoring responses to match a desired personality or style. For example, by training the model on a dataset that captures the distinctive speech patterns of a character like Professor Dumbledore from the Harry Potter series, the model can consistently respond in a similar, recognizable tone. 

This technique can be applied to create more engaging and immersive conversational agents that reflect specific characters, brand voices, or emotional tones suited for various applications, such as customer support, interactive storytelling, and educational tools.

Specific Format

Fine-tuning aids in generating outputs in a specific format, ensuring that the model produces data structured in the desired way. For example, in medical applications, the model can be trained to extract information from clinical notes and output it in a JSON format that categorizes conditions and interventions. 

This structured output can support downstream processing, such as integrating with electronic health records (EHR) systems, improving the efficiency and accuracy of data entry, and supporting clinical decision-making processes. Such fine-tuning ensures the model adheres to formatting requirements, which may be necessary in fields like finance, legal documentation, and technical reporting.

Coding

Fine-tuning Mistral models for coding tasks involves training the model on domain-specific datasets to improve its ability to generate code snippets or complete programming tasks from natural language descriptions. For example, fine-tuning on a dataset of SQL queries paired with natural language questions can enable the model to convert plain text queries into accurate SQL commands. 

This capability is particularly useful in automating routine coding tasks, generating code documentation, or assisting in educational environments where students can receive instant feedback on coding assignments. 

Domain-Specific Augmentation in RAG

In Retrieval-Augmented Generation (RAG) workflows, fine-tuning Mistral models can significantly enhance question-answering performance by training the model to focus on relevant documents and ignore irrelevant ones. This involves fine-tuning the embedding model used for retrieving relevant documents as well as the language model used for generating answers. 

For example, in a legal domain, the model can be trained with legal texts and related Q&A pairs, improving its ability to provide contextually accurate answers to legal queries. This targeted training is useful for knowledge management, research assistance, and information retrieval across industries such as healthcare, finance, and academia.

Understanding Mistral 7B LLM 

Mistral offers several model versions: Mistral 7B, Mixtral 8x7B, and Mixtral 8x22B. We’ll focus our discussion on Mistral 7B which is the most lightweight model, and thus the simplest and most efficient to fine tune. However the process is quite similar for the larger models.

The Mistral 7B LLM is a large language model architecture that extends the accessibility of machine learning tools to wider audiences, including small businesses and developers without extensive computational resources. Mistral 7B can deliver advanced AI functionalities, such as coherent text generation and language comprehension, without the prohibitive costs associated with larger models.

Mistral 7B’s design is optimized for scalability and customization, making it suitable for applications requiring adaptability and cost efficiency. The model supports applications ranging from automating routine tasks to generating interactive conversational agents.

Quick Tutorial: Fine-Tuning Mistral 7B

Here’s an overview of how to fine-tune the Mistral 7B LLM for your specific use case. This tutorial is adapted from the official Mistral documentation.

Preparing a Dataset

To begin fine-tuning the Mistral 7B model, we need to prepare our dataset. For this tutorial, we will use the ultrachat_200k dataset. The following code snippet demonstrates how to load a chunk of this dataset into Pandas DataFrames, split it into training and validation sets, and save these sets in the required JSONL format:

import pandas as pd

# Load dataset
df = pd.read_parquet('https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k/resolve/main/data/test_gen-00000-of-00001-3d4cd8309148a71f.parquet')

# Split dataset into training and evaluation sets
df_train = df.sample(frac=0.995, random_state=200)
df_eval = df.drop(df_train.index)

# Save datasets in JSONL format
df_train.to_json("ultrachat_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("ultrachat_chunk_eval.jsonl", orient="records", lines=True)

This script reads the data from a Parquet file, splits it into training and evaluation sets, and then saves these sets in JSONL format, which is required for fine-tuning with Mistral’s API.

Reformatting the Dataset

Before uploading the dataset, it needs to be reformatted to ensure compatibility with the Mistral API. The provided script reformat_data.py helps validate and reformat both the training and evaluation data:

# Download the reformat script
wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py

# Validate and reformat training data
python reformat_data.py ultrachat_chunk_train.jsonl

# Validate and reformat the evaluation data
python reformat_data.py ultrachat_chunk_eval.jsonl

Running these commands will check and reformat the data files, removing any problematic cases that could cause errors during the fine-tuning process.

Uploading the Dataset

Next, upload the reformatted datasets to the Mistral Client to make them available for fine-tuning jobs:

import os
from mistralai.client import MistralClient

# Initialize the client
api_key = os.environ.get("MISTRAL_API_KEY")
client = MistralClient(api_key=api_key)

# Upload training data
with open("ultrachat_chunk_train.jsonl", "rb") as f:
    ultrachat_chunk_train = client.files.create(file=("ultrachat_chunk_train.jsonl", f))

# Upload evaluation data
with open("ultrachat_chunk_eval.jsonl", "rb") as f:
    ultrachat_chunk_eval = client.files.create(file=("ultrachat_chunk_eval.jsonl", f))

After uploading, you will receive file IDs for the training and evaluation datasets, which are needed for creating the fine-tuning job.

Creating a Fine-Tuning Job

With the datasets uploaded, you can create a fine-tuning job. This involves specifying the model to fine-tune, the dataset files, and the hyperparameters for the training process:

from mistralai.models.jobs import TrainingParameters

# Create a fine-tuning job
created_jobs = client.jobs.create(
    model="open-mistral-7b",
    training_files=[ultrachat_chunk_train.id],
    validation_files=[ultrachat_chunk_eval.id],
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
    )
)

This script sets up a job to fine-tune the Mistral 7B model using the specified training steps and learning rate.

Using the Fine-Tuned Model

Once the fine-tuning job is completed, you can use the fine-tuned model for various applications. Here’s an example of using the model for a chat application:

from mistralai.models.chat_completion import ChatMessage

# Use the fine-tuned model
chat_response = client.chat(
    model=retrieved_job.fine_tuned_model,
    messages=[ChatMessage(role='user', content='What are the best places to visit in Italy?')]
)

This code sends a user query to the fine-tuned model and retrieves a response.

Analyzing and Evaluating the Fine-Tuned Model

To monitor the performance of the fine-tuned model, you can retrieve metrics such as training loss, validation loss, and validation token accuracy:

# Retrieve job details
retrieved_jobs = client.jobs.retrieve(created_jobs.id)
print(retrieved_jobs)

These metrics provide insights into how well the model has learned from the training data and its ability to generalize to new, unseen data.

AI Testing & Validation with Kolena

Kolena is an AI/ML testing & validation platform that solves one of AI’s biggest problems: the lack of trust in model effectiveness. The use cases for AI are enormous, but AI lacks trust from both builders and the public. It is our responsibility to build that trust with full transparency and explainability of ML model performance, not just from a high-level aggregate ‘accuracy’ number, but from rigorous testing and evaluation at scenario levels.

With Kolena, machine learning engineers and data scientists can uncover hidden machine learning model behaviors, easily identify gaps in the test data coverage, and truly learn where and why a model is underperforming, all in minutes not weeks. Kolena’s AI / ML model testing and validation solution helps developers build safe, reliable, and fair systems by allowing companies to instantly stitch together razor-sharp test cases from their data sets, enabling them to scrutinize AI/ML models in the precise scenarios those models will be unleashed upon the real world. Kolena platform transforms the current nature of AI development from experimental into an engineering discipline that can be trusted and automated.

Learn more about Kolena