@dev_reviewer_17
5 reviews · Top use case: General Chat
Reviews
Grok 4.1
Coding · xai
#1 on LMArena with 1483 Elo. Coding is decent, but the official benchmarks suspiciously skip SWE-bench.
Mistral Large
Coding · mistral
92% on HumanEval. Doesn't blabber: gives code and fixes in fewer tokens. Excellent function calling.
Gemini 2.5 Pro
General Chat · google
High variance in reliability. Sometimes amazing, sometimes frustrating. Previous gen now.
GPT-4o
General Chat · openai
Still loved by companion/roleplay users. Developers moved on to 5.x.
Claude Sonnet 4.6
Tool Use · anthropic
Actually edges out Opus on MCP-Atlas (61.3% vs 60.3%). Exceptional tau2-bench scores.