Vibe Check

Tell us how LLMs work for you and see if everyone else feels the same way

Metrics Check

Continuously runs coding tasks against LLMs to track their performance over time

IsItNerfed Dataset

Chart: Success Rate (higher is better). Series: Claude Code (Opus 4.6), Claude Code (Sonnet 4.5), GPT-5 Nano.

Shows how well LLMs perform on the IsItNerfed dataset over time. Higher = better performance, lower = worse performance.

Aider Dataset

Chart: Success Rate (higher is better). Series: Claude Code (Opus 4.6), GPT-5 Nano.

Shows how well LLMs perform on the Aider Polyglot dataset over time. Higher = better performance, lower = worse performance.