Too Much Thinking Can Break LLMs: Inverse Scaling in Test-Time Compute
Recent advances in large language models (LLMs) have encouraged the idea that letting models “think longer” during inference usually improves their accuracy and robustness. Practices like chain-of-thought prompting, step-by-step explanations, and increasing “test-time compute” are now standard techniques in the field. However, the Anthropic-led study “Inverse Scaling in Test-Time Compute” delivers a compelling counterpoint: on certain tasks, giving models a larger reasoning budget actually makes them perform worse, an inverse scaling relationship between test-time compute and accuracy.
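To make “test-time compute” concrete, here is a minimal sketch of the kind of probe this finding suggests: ask a model the same question under increasing reasoning-token budgets and watch whether its answers degrade. It assumes the Anthropic Python SDK and its extended-thinking parameter; the model name and task are illustrative placeholders, not taken from the paper.

```python
# Minimal sketch: vary the reasoning-token budget for one fixed question.
# Assumes the Anthropic Python SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment. Model name and task are illustrative.
import anthropic

client = anthropic.Anthropic()

QUESTION = "I have an apple and two oranges. How many fruits do I have?"

def answer_with_budget(budget_tokens: int) -> str:
    """Ask QUESTION with a given extended-thinking budget and return the answer."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",          # illustrative model name
        max_tokens=budget_tokens + 512,            # leave room for the final answer
        thinking={"type": "enabled", "budget_tokens": budget_tokens},
        messages=[{"role": "user", "content": QUESTION}],
    )
    # The response interleaves "thinking" blocks with a final "text" block;
    # only the text block carries the user-visible answer.
    return next(block.text for block in response.content if block.type == "text")

# Probe for inverse scaling: does the answer get worse as the budget grows?
for budget in (1024, 4096, 16384):
    print(f"budget={budget:>6} -> {answer_with_budget(budget)}")
```

Running a sweep like this over a task set, rather than a single question, is the shape of experiment the study uses to separate “more compute helps” from “more compute hurts.”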
