Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models
Pre-training large language models is expensive enough that even modest efficiency improvements can translate into meaningful cost and time savings. Nous Research is releasing Token Superposition Training (TST), a method that significantly reduces pre-training wall-clock time at fixed compute without touching the model architecture, optimizer, tokenizer, parallelism strategy, or training data. At the…
