Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate
Perplexity AI’s research team reimplemented its Unigram tokenizer from scratch in Rust and open-sourced the code in pplx-garden, the company’s inference technology repository. At production input lengths, the new encoder cuts p50 latency by roughly 5x versus the Hugging Face tokenizers crate, ~2x versus SentencePiece (C++), and ~1.5x versus IREE’s tokenizer (C), with zero steady-state…
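For context on what such an encoder computes: a Unigram tokenizer (the Unigram language model popularized by SentencePiece) assigns every vocabulary piece a log-probability and encodes text by picking the segmentation whose pieces have the highest total score. That segmentation is usually found with Viterbi-style dynamic programming. The Rust sketch below illustrates the search under stated assumptions and is not the pplx-garden code. The vocabulary and scores in `toy_vocab` are made up, and `unigram_encode` and `max_piece_len` are hypothetical names. Production encoders commonly look pieces up in a trie rather than hashing every candidate substring, and they fall back to an unknown-token piece where this sketch simply returns `None`.

```rust
use std::collections::HashMap;

/// Hypothetical toy vocabulary: piece -> log-probability (scores are made up).
fn toy_vocab() -> HashMap<&'static str, f64> {
    HashMap::from([
        ("h", -5.0), ("e", -5.0), ("l", -5.0), ("o", -5.0),
        ("he", -3.0), ("ll", -3.0), ("lo", -3.0),
        ("llo", -4.0), ("hell", -4.0),
    ])
}

/// Viterbi search for the highest-scoring segmentation of `text`.
/// Pieces are at most `max_piece_len` characters long.
/// Returns None if the text cannot be covered by vocabulary pieces.
fn unigram_encode<'a>(
    text: &'a str,
    vocab: &HashMap<&str, f64>,
    max_piece_len: usize,
) -> Option<Vec<&'a str>> {
    // Byte offsets of every char boundary, so multi-byte UTF-8 is never split.
    let bounds: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let n = bounds.len() - 1; // number of chars

    // best[j]: best total score for the first j chars; back[j]: where its last piece starts.
    let mut best = vec![f64::NEG_INFINITY; n + 1];
    let mut back = vec![0usize; n + 1];
    best[0] = 0.0;

    for end in 1..=n {
        for start in end.saturating_sub(max_piece_len)..end {
            if best[start] == f64::NEG_INFINITY {
                continue; // this prefix is unreachable
            }
            let piece = &text[bounds[start]..bounds[end]];
            if let Some(&score) = vocab.get(piece) {
                let candidate = best[start] + score;
                if candidate > best[end] {
                    best[end] = candidate;
                    back[end] = start;
                }
            }
        }
    }

    if best[n] == f64::NEG_INFINITY {
        return None;
    }

    // Follow back-pointers from the end of the text to recover the pieces.
    let mut pieces = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = back[end];
        pieces.push(&text[bounds[start]..bounds[end]]);
        end = start;
    }
    pieces.reverse();
    Some(pieces)
}

fn main() {
    let vocab = toy_vocab();
    for text in ["hello", "hex"] {
        match unigram_encode(text, &vocab, 4) {
            Some(pieces) => println!("{text:?} -> {pieces:?}"),
            None => println!("{text:?} -> no segmentation"),
        }
    }
}
```

Running it prints `"hello" -> ["he", "llo"]`, since `he` + `llo` scores -7 and beats alternatives such as `hell` + `o` at -9, and reports no segmentation for `"hex"` because `x` is not in the toy vocabulary.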
