Vibe Check
Tell us how LLMs work for you and see if everyone else feels the same way
Loading models...
Metrics Check
Continuously runs coding tasks against LLMs to track their performance over time
Claude Code (Sonnet 4.5) Failure Rate
Lower is better
Shows how well Claude Code performs on coding tasks over time.Lower = better performance, higher = worse performance.The data is updated every hour.Runs on Sonnet 4.5 model only at this time.
Claude Code (Sonnet 4) Failure Rate
Lower is better
Shows how well Claude Code performs on coding tasks over time.Lower = better performance, higher = worse performance.The data is updated every hour.Runs on Sonnet 4 model only at this time.
GPT-4.1 Failure Rate
Lower is better
Shows how well GPT-4.1 performs on coding tasks over time.Lower = better performance, higher = worse performance.The data is updated every day.