Vibe Check
Tell us how well LLMs are working for you and see whether everyone else feels the same way
Metrics Check
Continuously runs coding tasks against LLMs to track their performance over time
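The page does not say how the success rate is computed. As a minimal sketch, assuming each run records a pass/fail result for every coding task, a per-model daily success rate could be derived as shown here. The `TaskResult` record, its fields, and `daily_success_rate` are hypothetical illustrations, not the site's actual pipeline, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class TaskResult:
    # Hypothetical record: one coding task run against one model on one day.
    model: str
    day: date
    passed: bool


def daily_success_rate(results):
    """Return {model: {day: success rate in percent}}.

    Success rate = passed tasks / attempted tasks * 100, computed per
    model per day so the values can be plotted as a time series.
    """
    counts = defaultdict(lambda: [0, 0])  # (model, day) -> [passed, total]
    for r in results:
        c = counts[(r.model, r.day)]
        c[0] += int(r.passed)
        c[1] += 1
    rates = defaultdict(dict)
    # Sort by (model, day) so each model's series is in chronological order.
    for (model, day), (passed, total) in sorted(counts.items()):
        rates[model][day] = 100.0 * passed / total
    return dict(rates)


# Usage with made-up results for a hypothetical model.
results = [
    TaskResult("model-a", date(2025, 1, 1), True),
    TaskResult("model-a", date(2025, 1, 1), False),
    TaskResult("model-a", date(2025, 1, 2), True),
]
print(daily_success_rate(results)["model-a"])
# {datetime.date(2025, 1, 1): 50.0, datetime.date(2025, 1, 2): 100.0}
```

Grouping by model and day ties each chart point to one batch of runs; a real tracker might instead group by run ID or smooth over a rolling window.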
IsItNerfed Dataset
[Chart: success rate over time (higher is better) for Claude Code (Opus 4.6), Claude Code (Sonnet 4.5), and GPT-5 Nano]
Shows how well LLMs perform on the IsItNerfed dataset over time. Higher values mean better performance.
Aider Dataset
[Chart: success rate over time (higher is better) for Claude Code (Opus 4.6) and GPT-5 Nano]
Shows how well LLMs perform on the Aider Polyglot dataset over time. Higher values mean better performance.