Did Claude really get dumber again? — Summary & Key Points
TL;DR
Claude Opus 4.7 is experiencing measurable performance regressions, not just a placebo effect from higher user expectations. The author identifies three root causes: poorly engineered Claude Code harness logic, a new tokenizer that inflates token counts by 1.35x to 1.47x, and Anthropic's decision to hide the 'thinking' process, which removes crucial context.
Key Quotes
"The harness is, to be frank, pretty [__] poorly coded."
"The 1 million token versions of the model... are behaving worse."
The argument
The regression is real
The drop shows up in benchmarks, not only in user impressions. Benchmarks from Margin Labs show a drop from 57% to 55%, and Terminal Bench shows Opus in Claude Code scoring 58%, compared to 75-82% in other harnesses.
The Claude Code harness is broken
The Claude Code harness is poorly coded, forcing the model to make unnecessary API calls for simple tasks like reading files. This pollutes the context with useless data, wastes tokens, and costs users money.
Tokenization changes bloat context
Opus 4.7 uses a new tokenizer that produces 1.35x to 1.47x as many tokens for the same text. This effectively bloats the context window: the model has to process more 'empty' tokens, and its ability to find information in files drops.
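To make the arithmetic concrete, here is a minimal sketch of how much old-tokenizer text still fits in a fixed window once token counts inflate by the 1.35x to 1.47x range quoted above. The 200,000-token window and the function name `effective_capacity` are illustrative assumptions, not figures or APIs from the video.

```python
def effective_capacity(window_tokens: int, inflation: float) -> int:
    """Return how many old-tokenizer tokens' worth of text fit in the window
    when the new tokenizer emits `inflation` times as many tokens."""
    if inflation <= 0:
        raise ValueError("inflation must be positive")
    return int(window_tokens / inflation)


# Hypothetical window size, chosen only for illustration.
WINDOW = 200_000

for factor in (1.35, 1.47):  # inflation range quoted in the summary
    capacity = effective_capacity(WINDOW, factor)
    lost_share = 1 - capacity / WINDOW
    print(f"{factor}x inflation: {capacity} tokens of old text fit "
          f"({lost_share:.1%} of the window lost)")
```

Under these assumptions, roughly a quarter to a third of the window is consumed by inflation alone, before any file contents or tool output are added.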
Infrastructure and routing issues
Infrastructure issues like context window routing errors and potential routing to 'dumber' 1 million token context versions degrade performance. Anthropic has confirmed the 1M context version behaves worse.
Hiding the thinking process hurts performance
Anthropic's decision to redact the 'thinking' process removes crucial context from API requests. As a result, the model thinks less deeply, exhibits more lazy behaviors such as refusals, and needs 80x more API requests to achieve worse results.
Anthropic's engineering is the root cause
The regressions stem from Anthropic's engineering failures rather than the model weights themselves. The author concludes that other tools like OpenAI's Codex do not suffer from these regressions and recommends switching providers.

