ParaThinker: Scaling LLM Test-Time Compute with Native Parallel Thinking to Overcome Tunnel Vision in Sequential Reasoning
Why Do Sequential LLMs Hit a Bottleneck? Test-time compute scaling in LLMs has historically relied on extending single reasoning paths. While this strategy improves reasoning within a limited range, performance plateaus quickly. Experiments on DeepSeek-R1-distill-Qwen-1.5B show that increasing token budgets beyond 32K (up to 128K) yields negligible accuracy gains. The bottleneck arises from early…
