Mistral has upgraded Codestral, its open-source coding model that has gained significant popularity among developers, intensifying competition in the arena of coding-focused models.
In a recent blog post, Mistral unveiled Codestral 25.01, an upgraded version with a more efficient architecture. The company claims the new model is the “undisputed leader in its weight class” for coding and runs roughly twice as fast as its predecessor.
Enhanced Capabilities with Codestral 25.01
Building on the strengths of the original model, Codestral 25.01 is tailored for low-latency, high-frequency tasks, excelling in areas such as:
- Code correction
- Test generation
- Fill-in-the-middle tasks
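Fill-in-the-middle (FIM) asks the model to complete the code between a prefix (what comes before the cursor) and a suffix (what comes after). A minimal sketch of how such a request might be framed follows; the field names loosely mirror Mistral's public FIM completion API, but treat the exact names and defaults as assumptions, not a reference:

```python
def build_fim_request(prefix: str, suffix: str, model: str = "codestral-latest") -> dict:
    """Assemble a fill-in-the-middle request body.

    The model sees the code before and after the gap and generates
    only the missing middle. Field names are illustrative.
    """
    return {
        "model": model,
        "prompt": prefix,    # code before the gap
        "suffix": suffix,    # code after the gap
        "max_tokens": 64,
        "temperature": 0.0,  # deterministic output suits code completion
    }

# Example: ask the model to fill in the body of a function.
prefix = "def fibonacci(n: int) -> int:\n    "
suffix = "\n\nprint(fibonacci(10))"
request = build_fim_request(prefix, suffix)
```

Because the model only generates the gap, FIM is well suited to the low-latency, in-editor completions the section describes.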
Mistral also highlights its utility for enterprises with data-intensive workflows and model residency requirements.
Performance Benchmarks
Benchmark tests show that Codestral 25.01 surpasses its predecessor and competitors such as Codellama 70B Instruct and DeepSeek Coder 33B Instruct, particularly in Python. It scored 86.6% on the HumanEval benchmark, solidifying its position as a top performer among coding models.
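HumanEval-style scores like the 86.6% above are usually reported as pass@k: the probability that at least one of k sampled completions passes the problem's unit tests. As an illustration, the unbiased estimator commonly used with such benchmarks can be computed as follows (a sketch, not Mistral's published evaluation harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem, c of them correct.

    Probability that at least one of k randomly drawn samples passes:
    1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        return 1.0  # fewer failing samples than draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem, 5 of which pass, pass@1 reduces to c/n = 0.5,
# while drawing more samples raises the chance of at least one success.
print(pass_at_k(10, 5, 1))  # 0.5
```

For k = 1 the estimator collapses to the simple fraction of passing samples, which is why pass@1 is the headline number in the tables below.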
Overview
The benchmark columns group into Python benchmarks, SQL (Spider), and averages over several languages (the final two columns).

| Model | Context length | HumanEval | MBPP | CruxEval | LiveCodeBench | RepoBench | Spider | CanItEdit | HumanEval (average) | HumanEvalFIM (average) |
|---|---|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | – |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |
Per-language
| Model | HumanEval Python | HumanEval C++ | HumanEval Java | HumanEval JavaScript | HumanEval Bash | HumanEval TypeScript | HumanEval C# | HumanEval (average) |
|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 86.6% | 78.9% | 72.8% | 82.6% | 43.0% | 82.4% | 53.2% | 71.4% |
| Codestral-2405 22B | 81.1% | 68.9% | 78.5% | 71.4% | 40.5% | 74.8% | 43.7% | 65.6% |
| Codellama 70B instruct | 67.1% | 56.5% | 60.8% | 62.7% | 32.3% | 61.0% | 46.8% | 55.3% |
| DeepSeek Coder 33B instruct | 77.4% | 65.8% | 73.4% | 73.3% | 39.2% | 77.4% | 49.4% | 65.1% |
| DeepSeek Coder V2 lite | 83.5% | 68.3% | 65.2% | 80.8% | 34.2% | 82.4% | 46.8% | 65.9% |
FIM (single line exact match)
| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 80.2% | 89.6% | 87.96% | 85.89% |
| Codestral-2405 22B | 77.0% | 83.2% | 86.08% | 82.07% |
| OpenAI FIM API* | 80.0% | 84.8% | 86.5% | 83.7% |
| DeepSeek Chat API | 78.8% | 89.2% | 85.78% | 84.63% |
| DeepSeek Coder V2 lite | 78.7% | 87.8% | 85.90% | 84.13% |
| DeepSeek Coder 33B instruct | 80.1% | 89.0% | 86.80% | 85.3% |
FIM pass@1

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 92.5% | 97.1% | 96.1% | 95.3% |
| Codestral-2405 22B | 90.2% | 90.1% | 95.0% | 91.8% |
| OpenAI FIM API* | 91.1% | 91.8% | 95.2% | 92.7% |
| DeepSeek Chat API | 91.7% | 96.1% | 95.3% | 94.4% |
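The two FIM tables use different scoring rules: single-line exact match requires the generated line to reproduce the reference verbatim, while pass@1 counts a completion as correct if it passes the tests. A minimal, illustrative sketch of exact-match scoring shows why the pass@1 numbers run higher:

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Single-line exact match: the generated middle must equal the
    reference line after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()

# Functionally identical completions can still fail exact match,
# which is one reason pass@1 scores exceed exact-match scores.
print(exact_match("return a + b", "  return a + b  "))   # True
print(exact_match("return b + a", "return a + b"))        # False, though equivalent
```

Under pass@1, `return b + a` would be scored correct whenever addition is commutative for the tested inputs, while exact match rejects it outright.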
How to Access Codestral 25.01
Developers can access Codestral 25.01 through multiple channels:
- Mistral’s IDE plugin partners
- Local deployment via the Continue code assistant
- API access through Mistral’s la Plateforme and Google Vertex AI
- Azure AI Foundry preview
- Amazon Bedrock (availability expected soon)
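For the API route, a chat-style request to Codestral can be sketched as follows. The endpoint and field names follow Mistral's public chat-completions API conventions, but verify them against the current documentation before use; the model name and API key are placeholders:

```python
import json

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder: set from your la Plateforme account

# Request body: a single user message asking for code.
payload = {
    "model": "codestral-latest",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,  # low temperature for more deterministic code
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)
# To send: POST body to API_URL with these headers, e.g. via the
# `requests` package: requests.post(API_URL, headers=headers, data=body)
```

The same payload shape applies when the model is served through Vertex AI or Azure AI Foundry, with provider-specific endpoints and authentication.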
A Growing Family of Coding Models
Mistral initially launched Codestral in May 2024, introducing a 22B-parameter model capable of coding in 80 programming languages. It set a new standard for code-centric models and has since been joined by Codestral-Mamba, a code generation model leveraging the Mamba architecture to handle longer code strings and more extensive inputs.
Now, with the release of Codestral 25.01, the model is already climbing the leaderboards on platforms like Copilot Arena, indicating strong interest and adoption within hours of its announcement.
The Proliferation of Coding-Specific Models
The rise of coding-specific models reflects their growing importance in the developer ecosystem. While general-purpose foundation models like OpenAI’s o3 and Anthropic’s Claude include coding capabilities, recent years have seen specialized coding models outperform these general-purpose systems on programming tasks. Notable releases include:
- Qwen2.5-Coder by Alibaba (November 2024)
- DeepSeek Coder, the first model to outperform GPT-4 Turbo (June 2024)
- Microsoft’s GRIN-MoE, a mixture-of-experts model capable of coding and solving math problems
The Debate: General-Purpose vs. Coding-Specific Models
The choice between a general-purpose model and a coding-specific model remains contentious. While general-purpose models like Claude offer broader applications, coding-focused models like Codestral demonstrate superior performance in their domain-specific tasks. By focusing solely on coding, Codestral provides developers with a highly optimized tool for tasks like debugging, code generation, and testing, albeit at the expense of versatility in non-coding tasks like email drafting.
Mistral’s commitment to improving coding-focused models underscores the growing demand for specialized AI solutions. With Codestral 25.01, the company has not only raised the bar for coding efficiency and accuracy but also provided developers with an accessible and powerful tool for tackling coding challenges.