Did Claude really get dumber again? — Summary & Key Points

Theo - t3.gg · Apr 20, 2026 · 44:15 · 91K views

TL;DR

Claude Opus 4.7 is experiencing measurable performance regressions, not just a placebo effect from higher user expectations. Theo identifies the root causes as poorly engineered Claude Code harness logic, a new tokenizer that inflates token counts by 1.35x to 1.47x, and Anthropic hiding the 'thinking' process, which removes crucial context.

Key Quotes

"The harness is, to be frank, pretty [__] poorly coded."
Theo
"The 1 million token versions of the model... are behaving worse."
Theo

The argument

The regression is real

Claude Opus 4.7 is experiencing measurable performance regressions, not just a placebo effect from higher user expectations. Benchmarks from Margin Labs show a drop from 57% to 55%, and on Terminal Bench, Opus running inside Claude Code scores 58% versus 75-82% in other harnesses.

The Claude Code harness is broken

The Claude Code harness is poorly coded, forcing the model to make unnecessary API calls for simple tasks like reading files. This pollutes the context with useless data, wastes tokens, and costs users money.
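The cost of those redundant round trips can be sketched with simple arithmetic. All numbers below are hypothetical illustrations, not measured Claude Code figures:

```python
# Sketch: rough token cost of redundant tool-call round trips in an
# agent harness. The call counts, overhead, and file sizes here are
# hypothetical, chosen only to illustrate the accounting.
def wasted_tokens(calls: int, overhead_per_call: int, file_tokens: int) -> int:
    """Each unnecessary call injects the tool result plus protocol
    scaffolding into the context, and the model carries that weight
    on every subsequent turn."""
    return calls * (overhead_per_call + file_tokens)

# e.g. 10 redundant reads of a 2,000-token file, each wrapped in
# ~150 tokens of tool-call scaffolding:
print(wasted_tokens(calls=10, overhead_per_call=150, file_tokens=2_000))
```

Even at these modest assumed sizes, the pollution adds up to tens of thousands of tokens that the user pays for and the model must read past.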

Tokenization changes bloat context

Opus 4.7 uses a new tokenizer that increases token counts by 1.35x to 1.47x. This effectively bloats the context window, forcing the model to process more 'empty' tokens and reducing its ability to locate information in files.
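The effective shrinkage follows directly from those bloat factors. The 1.35x-1.47x figures come from the summary above; the 200k-token window is an assumed example, not a confirmed spec:

```python
# Sketch: how tokenizer bloat shrinks the *effective* context window.
# Bloat factors (1.35x-1.47x) are from the summary above; the 200k
# window size is an illustrative assumption.
def effective_context(window_tokens: int, bloat_factor: float) -> int:
    """If the same text now costs bloat_factor times as many tokens,
    the amount of old-tokenizer content that fits shrinks accordingly."""
    return int(window_tokens / bloat_factor)

window = 200_000  # assumed context window size
for bloat in (1.35, 1.47):
    print(f"{bloat}x bloat -> ~{effective_context(window, bloat):,} effective tokens")
```

Under these assumptions, roughly a quarter to a third of the window is eaten before the model sees any additional content.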

Infrastructure and routing issues

Infrastructure problems, such as context-window routing errors and possible routing to the 'dumber' 1-million-token context variants, further degrade performance. Anthropic has confirmed that the 1M context version behaves worse.

Hiding the thinking process hurts performance

Anthropic's decision to redact the 'thinking' process removes crucial context from API requests. This forces the model to think less deeply, exhibit lazier behaviors such as refusals, and can require 80x more API requests to achieve worse results.

Anthropic's engineering is the root cause

The regressions stem from Anthropic's engineering failures rather than the model weights themselves. Theo concludes that other tools like OpenAI's Codex do not suffer from these regressions and recommends switching providers.
