Ranked by developer community ratings. Minimum 3 reviews to qualify.
“ARC-AGI-2 champion at 77.1%. Best pure reasoning model available.” — @dev_reviewer_9
“Crushes competitive programming but fumbles real-world engineering. 450s reasoning for Tetris and still broken code.” — @dev_reviewer_4