Mistral has upgraded Codestral, its open-source coding model that has gained significant popularity among developers, intensifying competition in the arena of coding-focused models.
In a recent blog post, Mistral unveiled Codestral 25.01, an upgraded version with a more efficient architecture. The company claims the new model is the “undisputed leader in its weight class” for coding and runs roughly twice as fast as its predecessor.
Enhanced Capabilities with Codestral 25.01
Building on the strengths of the original model, Codestral 25.01 is tailored for low-latency, high-frequency tasks, excelling in areas such as:
- Code correction
- Test generation
- Fill-in-the-middle tasks
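Fill-in-the-middle (FIM) asks the model to complete the code between a prefix (what comes before the cursor) and a suffix (what comes after). A minimal sketch of how such a request might be framed follows; the field names loosely mirror Mistral's public FIM completion API, but treat the exact names and defaults as assumptions, not a reference:

```python
def build_fim_request(prefix: str, suffix: str, model: str = "codestral-latest") -> dict:
    """Assemble a fill-in-the-middle request body.

    The model sees the code before and after the gap and generates
    only the missing middle. Field names are illustrative.
    """
    return {
        "model": model,
        "prompt": prefix,    # code before the gap
        "suffix": suffix,    # code after the gap
        "max_tokens": 64,
        "temperature": 0.0,  # deterministic output suits code completion
    }

# Example: ask the model to fill in the body of a function.
prefix = "def fibonacci(n: int) -> int:\n    "
suffix = "\n\nprint(fibonacci(10))"
request = build_fim_request(prefix, suffix)
```

Because the model only generates the gap, FIM is well suited to the low-latency, in-editor completions the section describes.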
Mistral also highlights its utility for enterprises with data-intensive workflows and model residency requirements.
Performance Benchmarks
Benchmark tests show that Codestral 25.01 surpasses its predecessor and competitors such as Codellama 70B Instruct and DeepSeek Coder 33B Instruct, particularly in Python. It scored 86.6% on the HumanEval benchmark, solidifying its position as a top performer among coding models.
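HumanEval-style scores like the 86.6% above are usually reported as pass@k: the probability that at least one of k sampled completions passes the problem's unit tests. As an illustration, the unbiased estimator commonly used with such benchmarks can be computed as follows (a sketch, not Mistral's published evaluation harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct.

    Probability that at least one of k randomly drawn samples passes:
    1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        return 1.0  # fewer failing samples than draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem, 5 of which pass, pass@1 reduces to c/n = 0.5,
# while drawing more samples raises the chance of at least one success.
print(pass_at_k(10, 5, 1))  # 0.5
```

For k = 1 the estimator collapses to the simple fraction of passing samples, which is why pass@1 is the headline number in the tables below.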
Overview
The benchmark columns group into Python benchmarks, SQL (Spider), and averages over several languages (the final two columns).

| Model | Context length | HumanEval | MBPP | CruxEval | LiveCodeBench | RepoBench | Spider | CanItEdit | HumanEval (average) | HumanEvalFIM (average) |
|---|---|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | – |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |
Per-language
| Model | HumanEval Python | HumanEval C++ | HumanEval Java | HumanEval JavaScript | HumanEval Bash | HumanEval TypeScript | HumanEval C# | HumanEval (average) |
|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 86.6% | 78.9% | 72.8% | 82.6% | 43.0% | 82.4% | 53.2% | 71.4% |
| Codestral-2405 22B | 81.1% | 68.9% | 78.5% | 71.4% | 40.5% | 74.8% | 43.7% | 65.6% |
| Codellama 70B instruct | 67.1% | 56.5% | 60.8% | 62.7% | 32.3% | 61.0% | 46.8% | 55.3% |
| DeepSeek Coder 33B instruct | 77.4% | 65.8% | 73.4% | 73.3% | 39.2% | 77.4% | 49.4% | 65.1% |
| DeepSeek Coder V2 lite | 83.5% | 68.3% | 65.2% | 80.8% | 34.2% | 82.4% | 46.8% | 65.9% |
FIM (single line exact match)
| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 80.2% | 89.6% | 87.96% | 85.89% |
| Codestral-2405 22B | 77.0% | 83.2% | 86.08% | 82.07% |
| OpenAI FIM API* | 80.0% | 84.8% | 86.5% | 83.7% |
| DeepSeek Chat API | 78.8% | 89.2% | 85.78% | 84.63% |
| DeepSeek Coder V2 lite | 78.7% | 87.8% | 85.90% | 84.13% |
| DeepSeek Coder 33B instruct | 80.1% | 89.0% | 86.80% | 85.3% |
FIM pass@1

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 92.5% | 97.1% | 96.1% | 95.3% |
| Codestral-2405 22B | 90.2% | 90.1% | 95.0% | 91.8% |
| OpenAI FIM API* | 91.1% | 91.8% | 95.2% | 92.7% |
| DeepSeek Chat API | 91.7% | 96.1% | 95.3% | 94.4% |
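The two FIM tables use different scoring rules: single-line exact match requires the generated line to reproduce the reference verbatim, while pass@1 counts a completion as correct if it passes the tests. A minimal, illustrative sketch of exact-match scoring shows why the pass@1 numbers run higher:

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Single-line exact match: the generated middle must equal the
    reference line after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()

# Functionally identical completions can still fail exact match,
# which is one reason pass@1 scores exceed exact-match scores.
print(exact_match("return a + b", "  return a + b  "))   # True
print(exact_match("return b + a", "return a + b"))        # False, though equivalent
```

Under pass@1, `return b + a` would be scored correct whenever addition is commutative for the tested inputs, while exact match rejects it outright.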
How to Access Codestral 25.01
Developers can access Codestral 25.01 through multiple channels:
- Mistral’s IDE plugin partners
- Local deployment via the Continue code assistant
- API access through Mistral’s la Plateforme and Google Vertex AI
- Azure AI Foundry preview
- Amazon Bedrock (availability expected soon)
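For the API route, a chat-style request to Codestral can be sketched as follows. The endpoint and field names follow Mistral's public chat-completions API conventions, but verify them against the current documentation before use; the model name and API key are placeholders:

```python
import json

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder: set from your la Plateforme account

# Request body: a single user message asking for code.
payload = {
    "model": "codestral-latest",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,  # low temperature for more deterministic code
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)
# To send: POST body to API_URL with these headers, e.g. via the
# `requests` package: requests.post(API_URL, headers=headers, data=body)
```

The same payload shape applies when the model is served through Vertex AI or Azure AI Foundry, with provider-specific endpoints and authentication.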
A Growing Family of Coding Models
Mistral initially launched Codestral in May 2024, introducing a 22B-parameter model capable of coding in 80 programming languages. It set a new standard for code-centric models and has since been joined by Codestral-Mamba, a code generation model leveraging the Mamba architecture to handle longer code strings and more extensive inputs.
Now, with the release of Codestral 25.01, the model is already climbing the leaderboards on platforms like Copilot Arena, indicating strong interest and adoption within hours of its announcement.
The Proliferation of Coding-Specific Models
The rise of coding-specific models reflects their growing importance in the developer ecosystem. While general-purpose foundation models like OpenAI’s o3 and Anthropic’s Claude include coding capabilities, recent years have seen specialized coding models outperform these general-purpose systems on programming tasks. Notable releases include:
- Qwen2.5-Coder by Alibaba (November 2024)
- DeepSeek Coder, the first model to outperform GPT-4 Turbo (June 2024)
- Microsoft’s GRIN-MoE, a mixture-of-experts model capable of coding and solving math problems
The Debate: General-Purpose vs. Coding-Specific Models
The choice between a general-purpose model and a coding-specific model remains contentious. While general-purpose models like Claude offer broader applications, coding-focused models like Codestral demonstrate superior performance in their domain-specific tasks. By focusing solely on coding, Codestral provides developers with a highly optimized tool for tasks like debugging, code generation, and testing, albeit at the expense of versatility in non-coding tasks like email drafting.
Mistral’s commitment to improving coding-focused models underscores the growing demand for specialized AI solutions. With Codestral 25.01, the company has not only raised the bar for coding efficiency and accuracy but also provided developers with an accessible and powerful tool for tackling coding challenges.