Meet mmBERT: An Encoder-only Language Model Pretrained on 3T Tokens of Multilingual Text in over 1800 Languages and 2–4× Faster than Previous Models
Table of contents

- Why was a new multilingual encoder needed?
- Understanding the architecture of mmBERT
- What training data and phases were used?
- What new training strategies were introduced?
- How does mmBERT perform on benchmarks?
- How does mmBERT handle low-resource languages?
- What efficiency gains does mmBERT achieve?
- Summary

Why was a new multilingual encoder needed?