SwiReasoning: Entropy-Driven Alternation of Latent and Explicit Chain-of-Thought for Reasoning LLMs

SwiReasoning is a decoding-time framework that lets a reasoning LLM decide when to think in latent space and when to write explicit chain-of-thought, using block-wise confidence estimated from entropy trends in next-token distributions. The method is training-free, model-agnostic, and targets Pareto-superior accuracy/efficiency trade-offs on mathematics and STEM benchmarks. Reported results show +1.5%–2.8% average accuracy improvements with unlimited token budgets and +56%–79% average token-efficiency gains under constrained budgets; on AIME’24/’25, it reaches maximum reasoning accuracy earlier than standard CoT.
What does SwiReasoning change at inference time?
The controller monitors the decoder's next-token entropy to form a block-wise confidence signal. When confidence is low (entropy trending upward), it enters latent reasoning: the model continues to reason without emitting tokens. When confidence recovers (entropy trending downward), it switches back to explicit reasoning, emitting CoT tokens to consolidate and commit to a single path. A switch count control caps the maximum number of thinking-block transitions, suppressing overthinking before the answer is finalized. This dynamic alternation is the core mechanism behind the reported accuracy-per-token gains.
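To make the mechanism concrete, below is a minimal Python sketch of a confidence-driven mode switcher. The EMA smoothing, the default values of `alpha` and `max_switch_count`, and the trend test are illustrative assumptions; the paper's exact block-wise confidence estimator may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SwitchController:
    """Toggle latent/explicit reasoning on entropy-trend reversals."""
    max_switch_count: int = 4   # cap on thinking-block transitions
    alpha: float = 0.9          # EMA smoothing factor (assumed)
    mode: str = "explicit"
    switches: int = 0
    ema: Optional[float] = None

    def step(self, entropy: float) -> str:
        """Consume the entropy of one next-token distribution, return mode."""
        prev = self.ema
        self.ema = entropy if prev is None else self.alpha * prev + (1 - self.alpha) * entropy
        if prev is not None and self.switches < self.max_switch_count:
            if self.mode == "explicit" and self.ema > prev:
                # Entropy rising = confidence falling: go silent and explore.
                self.mode, self.switches = "latent", self.switches + 1
            elif self.mode == "latent" and self.ema < prev:
                # Entropy falling = confidence recovering: emit CoT tokens.
                self.mode, self.switches = "explicit", self.switches + 1
        return self.mode
```

In a decode loop, `step` would be called once per generated position; in latent mode the model state still advances, but no token is committed to the output.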

Results: accuracy and efficiency on standard suites
The authors report improvements across mathematics and STEM reasoning tasks:
- Pass@1 (unlimited budget): accuracy lifts of up to +2.8% (math) and +2.0% (STEM) in Figure 1 and Table 1, with a +2.17% average over baselines (CoT with sampling, greedy CoT, and Soft Thinking).
- Token efficiency (limited budgets): average improvements of up to +79% (Figure 2). A comprehensive comparison shows SwiReasoning attains the highest token efficiency in 13/15 evaluations, with an +84% average improvement over CoT across these settings (Figure 4).
- Pass@k dynamics: with Qwen3-8B on AIME 2024/2025, maximum reasoning accuracies are achieved 50% earlier than CoT on average (Figure 5), indicating faster convergence to the ceiling with fewer sampled trajectories; a standard Pass@k estimator is sketched after this list for reference.
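For context, Pass@k numbers of this kind are conventionally computed with the unbiased estimator of Chen et al. (2021); the snippet below is that standard formula, not a detail confirmed from the SwiReasoning paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts with c correct, succeeds."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Faster convergence means this curve approaches its ceiling at smaller k, i.e., fewer sampled trajectories are needed to find a correct solution.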
Why does switching help?
Explicit CoT is discrete and readable but locks in a single path prematurely, which can discard useful alternatives. Latent reasoning is continuous and information-dense per step, but purely latent methods may diffuse probability mass and hinder convergence. SwiReasoning adds a confidence-guided alternation: latent phases broaden exploration when the model is uncertain; explicit phases exploit rising confidence to solidify a solution and commit tokens only when useful. The switch count control regularizes the process by capping oscillations and limiting prolonged "silent" wandering, addressing both the accuracy loss from diffusion and the token waste from overthinking cited as challenges for training-free latent methods.
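The contrast between the two step types can be sketched directly. The latent update below uses the probability-weighted mixture of token embeddings, in the spirit of training-free latent methods such as Soft Thinking; whether SwiReasoning uses exactly this update is an assumption, not a confirmed detail.

```python
import torch

def latent_step(logits: torch.Tensor, embedding: torch.nn.Embedding) -> torch.Tensor:
    """One 'silent' step: keep alternatives alive by feeding back the
    expected token embedding instead of committing to a discrete token."""
    probs = torch.softmax(logits, dim=-1)  # full next-token distribution
    return probs @ embedding.weight        # [vocab] @ [vocab, d] -> [d]

def explicit_step(logits: torch.Tensor) -> torch.Tensor:
    """One explicit CoT step: commit all probability mass to one token."""
    return logits.argmax(dim=-1)
```

The latent step preserves information about competing continuations at each position, while the explicit step collapses the distribution to a single choice, which is exactly the exploration/commitment trade-off the alternation is designed to balance.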
Positioning vs. baselines
The project compares against CoT with sampling, greedy CoT, and Soft Thinking, reporting a +2.17% average accuracy lift at unlimited budgets (Table 1) and consistent efficiency-per-token advantages under budget constraints. The visualized Pareto frontier shifts outward: either higher accuracy at the same budget, or comparable accuracy with fewer tokens, across different model families and scales. On AIME’24/’25, the Pass@k curves show that SwiReasoning reaches the performance ceiling with fewer samples than CoT, reflecting improved convergence behavior rather than only better raw ceilings.


Key Takeaways
- Training-free controller: SwiReasoning alternates between latent reasoning and explicit chain-of-thought using block-wise confidence derived from next-token entropy trends.
- Efficiency gains: Reports +56–79% average token-efficiency improvements under constrained budgets versus CoT, with larger gains as budgets tighten.
- Accuracy lifts: Achieves +1.5–2.8% average Pass@1 improvements on mathematics/STEM benchmarks at unlimited budgets.
- Faster convergence: On AIME 2024/2025, reaches maximum reasoning accuracy earlier than CoT (improved Pass@k dynamics).
Editorial Comments
SwiReasoning is a useful step toward pragmatic "reasoning policy" control at decode time: it is training-free, slots in behind the tokenizer, and shows measurable gains on math/STEM suites by toggling between latent and explicit CoT using an entropy-trend confidence signal with a capped switch count. The open-source BSD implementation and clear flags (--max_switch_count, --alpha) make replication straightforward and lower the barrier to stacking with orthogonal efficiency layers (e.g., quantization, speculative decoding, KV-cache tricks). The method's value proposition is "accuracy per token" rather than raw SOTA accuracy, which is operationally important for budgeted inference and batching.
Check out the Paper and Project Page.