# LMSYS Chatbot Arena Leaderboard: Today’s Live Elo Rankings (Feb 22, 2026)
## Daily Brief: Key Takeaways
- The New King: Claude Opus 4.6 has officially claimed the overall #1 spot with 1505 Elo.
- The 1500+ Club: Gemini 3.1 Pro and Claude 4.6 Thinking have both broken the 1500 barrier this week.
- PhD-Level Peak: Gemini 3.1 Pro sets a new ceiling on Humanity's Last Exam (HLE) with a no-tools score of 44.4%.
- Coding Divergence: In technical tasks, Claude 4.6 shatters the ceiling at 1561 Elo, leading GPT-5.3-Codex by 90 points.
## Today's Top 5: Live Arena Leaderboard (Feb 22, 2026)
| Rank | Model | Elo Score | Primary Strength | Status |
|---|---|---|---|---|
| 🏆 #1 | Claude Opus 4.6 | 1505 | Agentic Planning & Research | ↑ New Global King |
| 🥈 #2 | Claude 4.6 Thinking | 1504 | Self-Correction Logic | ↑ Trending Up |
| 🥉 #3 | Gemini 3.1 Pro | 1500 | PhD-level Science (HLE) | Stable |
| #4 | Gemini 3 Pro | 1486 | Multimodal (1M Context) | Stable |
| #5 | Seed 2.0 Pro | 1475 | Real-time Video Synthesis | New Entry |
## The 1500 Elo Era: Anthropic vs. Google
The LMSYS Chatbot Arena rankings for February 22, 2026, mark a turning point: the frontier has officially shifted from conversational assistants to "Reasoning Agents." Claude Opus 4.6 has solidified its lead, achieving a verified 1505 Elo that reflects its superior performance in multi-step agentic tasks and legal reasoning.
However, Google’s Gemini 3.1 Pro remains the leader in abstract visual puzzles, scoring 77.1% on ARC-AGI-2 and establishing a benchmark lead that OpenAI's proprietary models have yet to match. This divergence shows that no single model wins everywhere: matching a model's "Primary Strength" to your task now matters more than brand loyalty.
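For readers unfamiliar with how these scores move, here is a minimal sketch of the classic online Elo update that the arena's pairwise votes feed into. (LMArena's published pipeline actually fits a Bradley-Terry model over all battles rather than updating sequentially, but the intuition is the same; the K-factor of 32 below is an illustrative assumption, not the arena's setting.)

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One pairwise battle between models A and B.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    Returns the updated (rating_a, rating_b) pair.
    """
    # Expected win rate for A given the current rating gap.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two evenly matched 1500-rated models: the winner gains k/2 = 16 points.
a, b = elo_update(1500, 1500, 1.0)
print(a, b)  # 1516.0 1484.0
```

Because the update is zero-sum and scaled by surprise, an upset over a much higher-rated model moves the ratings far more than a win over a peer.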
## Coding & Terminal Breakthroughs
While the overall leaderboard is a tight race, the specialized Coding Arena has seen a total blowout. Anthropic’s new architecture has pushed the technical ceiling to a record **1561 Elo**. This matches findings from our internal audits, where error rates for complex Python refactors dropped to near-zero with the Claude 4.6 series.
For DevOps pipelines, GPT-5.3-Codex has emerged as the specialist leader for terminal-based tasks, jumping to a **77.3% Terminal-Bench 2.0** score.
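To put the 90-point coding gap in concrete terms, the standard Elo formula converts a rating difference into an expected head-to-head win rate. This is the generic logistic mapping used by Elo systems, not a number published by the arena:

```python
def expected_win_rate(elo_gap):
    # Probability that the higher-rated model wins a single head-to-head vote,
    # given an Elo gap of `elo_gap` points (logistic curve with base-10 / 400 scaling).
    return 1 / (1 + 10 ** (-elo_gap / 400))

p = expected_win_rate(90)
print(round(p, 2))  # ≈ 0.63
```

In other words, a 90-point lead implies winning roughly 63% of blind pairwise comparisons: decisive over thousands of battles, but far from a shutout on any single prompt.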
## Sources & References
- LMArena: Live Crowdsourced Leaderboard – Data verified Feb 22, 2026.
- Google DeepMind: Gemini 3.1 Pro Technical Evaluation and ARC scores.
- Anthropic: Introducing Opus 4.6 and agentic reasoning metrics.
- Technical Breakdown: How Elo is Calculated in LMSYS.
- Methodology: Arena Hard vs Standard Arena Comparison.