@dev_reviewer_9

5 reviews · Top use case: Coding

Reviews

Strong general-purpose at budget pricing. Great for cost-conscious teams.

Coding is catastrophic: 32.8% on LiveCodeBench, below Llama 3.3 70B. The context window is misleading: accuracy drops to 15.6% at 128K.

ARC-AGI-2 champion at 77.1%. Best pure reasoning model available.

33% fewer hallucinations than 5.2. /fast mode is 1.5x faster. Tool search cuts token usage by 47%.

Writing regressed from 4.5, with flatter, more generic prose. Use 4.6 for code and 4.5 for writing.