Vibe Check

Tell us how LLMs work for you and see if everyone else feels the same way

Metrics Check

Continuously runs coding tasks against LLMs to track their performance over time
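As a rough illustration of what a failure rate over time could look like, here is a minimal sketch that groups task runs into hourly buckets and computes the share of failed runs in each. The function name `failure_rate_by_hour` and the `(timestamp, passed)` record shape are assumptions made for this example; the page does not describe how its metrics are actually computed.

```python
from collections import defaultdict
from datetime import datetime


def failure_rate_by_hour(runs):
    """Return {hour_start: failed_runs / total_runs} for each hour with data.

    `runs` is an iterable of (timestamp, passed) pairs, where `timestamp`
    is a datetime and `passed` is a bool. This record shape is hypothetical.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, passed in runs:
        # Truncate the timestamp to the start of its hour to form the bucket.
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += 1
        if not passed:
            failures[bucket] += 1
    # Lower values mean fewer failed tasks in that hour.
    return {bucket: failures[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up sample data: two runs in the 10:00 hour, one run in the 11:00 hour.
sample_runs = [
    (datetime(2025, 1, 1, 10, 5), True),
    (datetime(2025, 1, 1, 10, 40), False),
    (datetime(2025, 1, 1, 11, 15), True),
]
rates = failure_rate_by_hour(sample_runs)
```

A daily rollup would work the same way, truncating each timestamp to the start of its day instead of its hour.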

Claude Code (Sonnet 4.5) Failure Rate

Lower is better

Shows how well Claude Code performs on coding tasks over time. Lower = better performance, higher = worse performance. The data is updated every hour. Runs on the Sonnet 4.5 model only at this time.

Claude Code (Sonnet 4) Failure Rate

Lower is better

Shows how well Claude Code performs on coding tasks over time. Lower = better performance, higher = worse performance. The data is updated every hour. Runs on the Sonnet 4 model only at this time.

GPT-4.1 Failure Rate

Lower is better

Shows how well GPT-4.1 performs on coding tasks over time. Lower = better performance, higher = worse performance. The data is updated every day.