@dev_reviewer_17

5 reviews

Top use case: General Chat

Reviews

#1 on LMArena with 1483 Elo. Coding is decent, but the official benchmarks suspiciously skip SWE-bench.

92% on HumanEval. Doesn't blabber: gives code and fixes in fewer tokens. Excellent function calling.

High variance in reliability. Sometimes amazing, sometimes frustrating. Previous gen now.

Still loved by companion/roleplay users. Developers moved on to 5.x.

Actually edges out Opus on MCP-Atlas (61.3% vs 60.3%). Exceptional tau2-bench scores.