LMSYS Chatbot Arena Leaderboard: Today’s Live Elo Rankings (Feb 22, 2026)


Daily Brief: Feb 22, 2026 Key Takeaways

  • The New King: Claude Opus 4.6 has officially claimed the overall #1 spot with 1505 Elo.
  • The 1500+ Club: Gemini 3.1 Pro and Claude 4.6 Thinking have both broken the 1500 barrier this week.
  • PhD-Level Peak: Gemini 3.1 Pro sets a new ceiling on Humanity's Last Exam (HLE) with a no-tools score of 44.4%.
  • Coding Divergence: In technical tasks, Claude 4.6 shatters the ceiling at 1561 Elo, leading GPT-5.3-Codex by 90 points.


Today's Top 5: Live Arena Leaderboard (Feb 22, 2026)

| Rank | Model | Elo Score | Primary Strength | Status |
|------|-------|-----------|------------------|--------|
| 🏆 #1 | Claude Opus 4.6 | 1505 | Agentic Planning & Research | |
| 🥇 #2 | Claude 4.6 Thinking | 1504 | Self-Correction Logic | |
| 🥇 #3 | Gemini 3.1 Pro | 1500 | PhD-level Science (HLE) | Stable |
| 🥈 #4 | Gemini 3 Pro | 1486 | Multimodal (1M Context) | Stable |
| 🥉 #5 | Seed 2.0 Pro | 1475 | Real-time Video Synthesis | New Entry |

The 1500 Elo Era: Anthropic vs. Google

The LMSYS Chatbot Arena rankings for February 22, 2026, mark a turning point in AI evolution. We have officially transitioned from conversational assistants to "Reasoning Agents." Claude Opus 4.6 has solidified its lead, achieving a verified 1505 Elo that reflects its superior performance in multi-step agentic tasks and legal reasoning.

However, Google’s Gemini 3.1 Pro remains the leader in abstract visual puzzles, scoring a massive 77.1% on ARC-AGI-2—a benchmark lead that OpenAI's models have yet to match. This divergence suggests that relying on brand loyalty costs you productivity; matching the model's "Primary Strength" to your task is the surest way to stay competitive in 2026.
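For readers curious how the Elo numbers above move, here is a minimal sketch of the pairwise update that arena-style leaderboards are based on. The K-factor of 32 is an illustrative assumption; the real Arena fits a Bradley–Terry model over all votes at once rather than applying sequential updates.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_wins: float, k: float = 32.0):
    """Return updated ratings after one head-to-head vote.

    a_wins is 1.0 for an A win, 0.0 for a B win, 0.5 for a tie.
    k is an assumed K-factor; real leaderboards tune or avoid it.
    """
    e_a = expected_score(r_a, r_b)
    # Winner gains what the loser forfeits: updates are zero-sum.
    r_a_new = r_a + k * (a_wins - e_a)
    r_b_new = r_b + k * ((1.0 - a_wins) - (1.0 - e_a))
    return r_a_new, r_b_new

# Example: a 1505-rated model beats a 1500-rated one. Because the two
# are near-equal, the expected score is close to 0.5 and the ratings
# shift by roughly half the K-factor.
new_a, new_b = elo_update(1505, 1500, a_wins=1.0)
```

This is why a 5-point gap at the top of the table is fragile: near-equal models trade roughly K/2 points on every decided vote, so a short losing streak can reorder the podium.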

Coding & Terminal Breakthroughs

While the overall leaderboard is a tight race, the specialized Coding Arena has seen a total blowout. Anthropic’s new architecture has pushed the technical ceiling to a record **1561 Elo**. This matches findings from our internal audits, where error rates for complex Python refactors dropped to near-zero with the Claude 4.6 series.

For DevOps pipelines, GPT-5.3-Codex has emerged as the specialist leader for terminal-based tasks, jumping to a **77.3% Terminal-Bench 2.0** score.



