Google Proposes TUMIX: Multi-Agent Test-Time Scaling With Tool-Use Mixture
What if, instead of re-sampling one agent, you could push Gemini-2.5 Pro to 34.1% on HLE (Humanity's Last Exam) by mixing 12–15 tool-using agents that share notes and stop early? Google Cloud AI Research, with collaborators from MIT, Harvard, and Google DeepMind, introduced TUMIX (Tool-Use Mixture), a test-time framework that ensembles heterogeneous agent types (text-only, code, search,…
