@dev_reviewer_16

5 reviews

Top use case: Coding

Reviews

Ranked #1 on LMArena with 1483 Elo. Coding is decent, but the official benchmarks suspiciously skip SWE-bench.

92% on HumanEval. Doesn't blabber: gives code and fixes in fewer tokens. Excellent function calling.

High variance in reliability. Sometimes amazing, sometimes frustrating. Previous gen now.

Retired from ChatGPT. Hallucinates modern APIs. Was great in 2024, now obsolete for serious work.

Actually edges out Opus on MCP-Atlas (61.3% vs 60.3%). Exceptional tau2-bench scores.