IsItNerfed? Sonnet 4.5 tested!
Sonnet 4.5 benchmark results show a 46% failure rate compared to Sonnet 4's 37% on our dataset
Over the past few weeks, we've been working hard on ideas and feedback from the community. Here are the new features we've added:
- More models and AI agents: Sonnet 4.5, Gemini CLI, Gemini 2.5, GPT-4o
- Vibe Check: now separates AI agents from LLMs
- Charts: beautiful new charts with zoom, panning, multiple chart types, and an average indicator
- CSV export: You can now export chart data to a CSV file
- New theme
- New tooltips explaining "Vibe Check" and "Metrics Check" features
- Roadmap page where you can track our progress

And yes, we finally tested Sonnet 4.5, and here are our results.

It turns out that while Sonnet 4 averages a failure rate of around 37%, Sonnet 4.5 averages around 46% on our dataset, a gap of about 9 percentage points. Lower is better, so Sonnet 4 is currently outperforming Sonnet 4.5 on our data.
Sonnet 4.5's numbers do seem to have been improving over the last 12 hours, though, so we're hoping to see it beat Sonnet 4 soon.
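For anyone who wants to put the two averages side by side, here is a minimal Python sketch of the arithmetic. It only uses the approximate figures quoted in this post (37% and 46%); the function name and the output keys are made up for illustration, and this is not how the site computes its metrics.

```python
def compare_failure_rates(name_a: str, rate_a: float, name_b: str, rate_b: float) -> dict:
    """Compare two failure rates given in percent. Lower is better.

    Hypothetical helper for illustration only; not part of IsItNerfed.
    """
    gap_pp = rate_b - rate_a                  # difference in percentage points
    relative_pct = gap_pp / rate_a * 100      # change relative to the first model
    if rate_a < rate_b:
        better = name_a
    elif rate_b < rate_a:
        better = name_b
    else:
        better = "tie"
    return {
        "absolute_gap_pp": gap_pp,
        "relative_change_pct": relative_pct,
        "better": better,
    }


# Approximate averages quoted in this post, not exact measurements.
result = compare_failure_rates("sonnet-4", 37.0, "sonnet-4.5", 46.0)
print(f"Gap: {result['absolute_gap_pp']:.1f} percentage points")
print(f"Relative change: {result['relative_change_pct']:.1f}%")
print(f"Currently better: {result['better']}")
```

With these inputs the gap is 9 percentage points, which means Sonnet 4.5 fails roughly 24% more often than Sonnet 4 in relative terms.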
Please join our subreddit to stay up to date with the latest testing results: https://www.reddit.com/r/isitnerfed
We're grateful for the community's comments and ideas! We'll keep improving the service for you.