
Mistral’s new Codestral code completion model

Mistral has upgraded Codestral, its widely used open-weight coding model, intensifying the competition among coding-focused models built for developers.

In a recent blog post, Mistral unveiled Codestral 25.01, an upgraded version featuring a more efficient architecture. The company claims this new model will be the “undisputed leader in its weight class” for coding, boasting speeds twice as fast as its predecessor.

Enhanced Capabilities with Codestral 25.01

Building on the strengths of the original model, Codestral 25.01 is tailored for low-latency, high-frequency tasks, excelling in areas such as:

  • Code correction
  • Test generation
  • Fill-in-the-middle tasks

Mistral also highlights its utility for enterprises with data-intensive workflows and model residency requirements.
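Fill-in-the-middle (FIM) means the model receives the code before and after a gap and generates only the missing middle. As an illustration only, here is a minimal sketch of how a FIM prompt is typically assembled; the sentinel token names below are placeholders, not Codestral's actual control tokens (hosted APIs usually accept a plain prefix/suffix pair instead):

```python
# Illustrative fill-in-the-middle (FIM) prompt construction.
# The sentinel tokens below are hypothetical placeholders; each FIM-capable
# model defines its own control tokens.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix so the model generates the middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))",
)
# The model would be asked to continue after <fim_middle>, e.g. with "a + b".
```

The key design point is that the suffix gives the model right-hand context an ordinary left-to-right completion would never see, which is what makes FIM suited to in-editor completions.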

Performance Benchmarks

Benchmark tests show Codestral 25.01 surpassing its predecessor and competitors such as Codellama 70B Instruct and DeepSeek Coder 33B Instruct, particularly in Python. It scored 86.6% on the Python HumanEval benchmark, solidifying its position as a top performer among coding models.


Overview

Python benchmarks: HumanEval, MBPP, CruxEval, LiveCodeBench, RepoBench. SQL benchmark: Spider. The final two columns average across several languages.

| Model | Context length | HumanEval | MBPP | CruxEval | LiveCodeBench | RepoBench | Spider | CanItEdit | HumanEval (avg) | HumanEvalFIM (avg) |
|---|---|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | — |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |


Per-language

HumanEval scores by language:

| Model | Python | C++ | Java | JavaScript | Bash | TypeScript | C# | Average |
|---|---|---|---|---|---|---|---|---|
| Codestral-2501 | 86.6% | 78.9% | 72.8% | 82.6% | 43.0% | 82.4% | 53.2% | 71.4% |
| Codestral-2405 22B | 81.1% | 68.9% | 78.5% | 71.4% | 40.5% | 74.8% | 43.7% | 65.6% |
| Codellama 70B instruct | 67.1% | 56.5% | 60.8% | 62.7% | 32.3% | 61.0% | 46.8% | 55.3% |
| DeepSeek Coder 33B instruct | 77.4% | 65.8% | 73.4% | 73.3% | 39.2% | 77.4% | 49.4% | 65.1% |
| DeepSeek Coder V2 lite | 83.5% | 68.3% | 65.2% | 80.8% | 34.2% | 82.4% | 46.8% | 65.9% |


FIM (single line exact match)

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 80.2% | 89.6% | 87.96% | 85.89% |
| Codestral-2405 22B | 77.0% | 83.2% | 86.08% | 82.07% |
| OpenAI FIM API* | 80.0% | 84.8% | 86.5% | 83.7% |
| DeepSeek Chat API | 78.8% | 89.2% | 85.78% | 84.63% |
| DeepSeek Coder V2 lite | 78.7% | 87.8% | 85.90% | 84.13% |
| DeepSeek Coder 33B instruct | 80.1% | 89.0% | 86.80% | 85.3% |
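"Single line exact match" is a strict FIM metric: a completion counts as correct only if it reproduces the missing line verbatim. A minimal sketch of the idea, assuming whitespace-trimmed string comparison (exact normalization details vary between evaluation harnesses):

```python
# Sketch of the "single line exact match" FIM metric: the completion is
# correct only if its first line, trimmed of surrounding whitespace, is
# identical to the ground-truth missing line.

def single_line_exact_match(completion: str, reference: str) -> bool:
    # Keep only the first generated line; FIM models may generate past it.
    first_line = completion.split("\n", 1)[0]
    return first_line.strip() == reference.strip()

hits = [
    single_line_exact_match("    return a + b", "return a + b"),  # match
    single_line_exact_match("return a - b", "return a + b"),      # miss
]
score = sum(hits) / len(hits)  # fraction of exact matches
```

Because even a semantically equivalent completion (e.g. `b + a`) scores zero here, exact match is a harsher test than the pass@1 numbers in the next table.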


FIM pass@1:

| Model | HumanEvalFIM Python | HumanEvalFIM Java | HumanEvalFIM JS | HumanEvalFIM (average) |
|---|---|---|---|---|
| Codestral-2501 | 92.5% | 97.1% | 96.1% | 95.3% |
| Codestral-2405 22B | 90.2% | 90.1% | 95.0% | 91.8% |
| OpenAI FIM API* | 91.1% | 91.8% | 95.2% | 92.7% |
| DeepSeek Chat API | 91.7% | 96.1% | 95.3% | 94.4% |
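Pass@1, unlike exact match, counts a completion as correct if it passes the task's unit tests. The standard unbiased estimator comes from the HumanEval paper: draw n samples per task, count the c that pass, and compute pass@k = 1 − C(n−c, k) / C(n, k). A short sketch (the sampling setup used for the table above is not specified in this article):

```python
from math import comb

# Unbiased pass@k estimator: with n samples per task of which c pass the
# unit tests, pass@k = 1 - C(n-c, k) / C(n, k).
# For k=1 this reduces to the fraction of passing samples, c/n.

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # fewer than k failures: every size-k draw contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

p1 = pass_at_k(n=10, c=4, k=1)  # 4/10 = 0.4
```

Averaging this per-task estimate over the benchmark's tasks yields the percentages reported in the tables.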

How to Access Codestral 25.01

Developers can access Codestral 25.01 through multiple channels:

  • Mistral’s IDE plugin partners
  • Local deployment via the Continue code assistant
  • API access through Mistral’s la Plateforme and Google Vertex AI
  • Azure AI Foundry preview
  • Amazon Bedrock (availability expected soon)
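For the API route, Mistral documents a dedicated FIM completions endpoint. As a sketch only, the request body looks roughly like the following; the endpoint path, the `codestral-latest` model alias, and the field names are taken from Mistral's public API documentation at the time of writing and should be verified before use:

```python
import json

# Sketch of a request body for Mistral's hosted FIM endpoint
# (POST https://api.mistral.ai/v1/fim/completions). Field names and the
# model alias are assumptions based on Mistral's public docs.
payload = {
    "model": "codestral-latest",          # documented alias for the newest Codestral
    "prompt": "def fibonacci(n: int):",   # code before the gap
    "suffix": "n = fibonacci(10)",        # code after the gap
    "max_tokens": 64,
    "temperature": 0,
}
body = json.dumps(payload)  # send with an Authorization: Bearer <API key> header
```

The same prompt/suffix pair drives the in-editor completion experience offered through the IDE plugin partners listed above.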

A Growing Family of Coding Models

Mistral first launched Codestral in May 2024, introducing a 22B-parameter model that could code in more than 80 programming languages. It set a new standard for code-centric models and has since been joined by Codestral-Mamba, a code-generation model built on the Mamba architecture to handle longer code sequences and larger inputs.

Now, with the release of Codestral 25.01, the model is already climbing the leaderboards on platforms like Copilot Arena, indicating strong interest and adoption within hours of its announcement.

The Proliferation of Coding-Specific Models

The rise of coding-specific models reflects their growing importance in the developer ecosystem. While general-purpose foundation models from OpenAI and Anthropic, such as the GPT series and Claude, have long included coding capabilities, recent years have seen specialized coding models outperform these general-purpose systems on coding tasks. Notable releases include:

  • Qwen2.5-Coder by Alibaba (November 2024)
  • DeepSeek Coder V2, billed as the first open model to outperform GPT-4 Turbo on coding benchmarks (June 2024)
  • Microsoft’s GRIN-MoE, a mixture-of-experts model capable of coding and solving math problems

The Debate: General-Purpose vs. Coding-Specific Models

The choice between a general-purpose model and a coding-specific model remains contentious. While general-purpose models like Claude offer broader applications, coding-focused models like Codestral demonstrate superior performance in their domain-specific tasks. By focusing solely on coding, Codestral provides developers with a highly optimized tool for tasks like debugging, code generation, and testing, albeit at the expense of versatility in non-coding tasks like email drafting.


Mistral’s commitment to improving coding-focused models underscores the growing demand for specialized AI solutions. With Codestral 25.01, the company has not only raised the bar for coding efficiency and accuracy but also provided developers with an accessible and powerful tool for tackling coding challenges.
