IBM AI Research Releases Two English Granite Embedding Models, Both Based on the ModernBERT Architecture

IBM has quietly built a strong presence in the open-source AI ecosystem, and its newest launch shows why it shouldn't be overlooked. The company has released two new embedding models—granite-embedding-english-r2 and granite-embedding-small-english-r2—designed specifically for high-performance retrieval and RAG (retrieval-augmented generation) systems. These models are not only compact and efficient but also licensed under Apache 2.0, making them ready for commercial deployment.
What Models Did IBM Release?
The two models target different compute budgets. The larger granite-embedding-english-r2 has 149 million parameters with an embedding dimension of 768, built on a 22-layer ModernBERT encoder. Its smaller counterpart, granite-embedding-small-english-r2, comes in at just 47 million parameters with an embedding dimension of 384, using a 12-layer ModernBERT encoder.
Despite their difference in size, both support a maximum context length of 8192 tokens, a major upgrade over the first-generation Granite embeddings. This long-context capability makes them well suited to enterprise workloads involving long documents and complex retrieval tasks.
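As a concrete starting point, here is a minimal sketch of encoding documents and a query with the larger model via the sentence-transformers library. It assumes the checkpoints are published under the ibm-granite organization on Hugging Face; swap in granite-embedding-small-english-r2 for the 384-dimensional variant.

```python
# Minimal retrieval sketch, assuming the checkpoints live on Hugging Face
# under the "ibm-granite" organization.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")

docs = [
    "Granite Embedding R2 supports a maximum context length of 8192 tokens.",
    "The small variant uses a 12-layer ModernBERT encoder with 47M parameters.",
]
query = "What is the context length of Granite R2?"

doc_vecs = model.encode(docs, normalize_embeddings=True)    # shape: (2, 768)
query_vec = model.encode(query, normalize_embeddings=True)  # shape: (768,)

# On normalized vectors, cosine similarity reduces to a dot product.
scores = doc_vecs @ query_vec
print(scores)  # higher score = better match for retrieval / RAG
```

Because the vectors are normalized, the dot product is the cosine similarity, which is the scoring most RAG retrievers use.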

What’s Inside the Architecture?
Both models are built on the ModernBERT backbone, which introduces several optimizations (inspected in the short sketch after this list):
- Alternating global and local attention to balance efficiency with long-range dependencies.
- Rotary positional embeddings (RoPE) tuned for positional interpolation, enabling longer context windows.
- FlashAttention 2 to improve memory usage and throughput at inference time.
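To see these settings on an actual checkpoint, the configuration can be inspected directly. The attribute names below follow the ModernBertConfig class in recent versions of Hugging Face transformers (v4.48+); treat them as assumptions to verify against the published model card.

```python
# Inspecting the ModernBERT settings described above; attribute names follow
# transformers' ModernBertConfig and are an assumption about this checkpoint.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ibm-granite/granite-embedding-english-r2")

print(cfg.num_hidden_layers)           # e.g. 22 for the larger model
print(cfg.global_attn_every_n_layers)  # how often a global-attention layer appears
print(cfg.local_attention)             # sliding-window size of local-attention layers
print(cfg.global_rope_theta, cfg.local_rope_theta)  # RoPE bases per attention type
```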
IBM also trained these models with a multi-stage pipeline. The process began with masked-language pretraining on a two-trillion-token dataset sourced from the web, Wikipedia, PubMed, BookCorpus, and internal IBM technical documents. This was followed by context extension from 1k to 8k tokens, contrastive learning with distillation from Mistral-7B, and domain-specific tuning for conversational, tabular, and code retrieval tasks.
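IBM has not published its training code, so the following is only a generic sketch of the two objectives named above: an in-batch InfoNCE contrastive loss combined with a distillation term that nudges the student's similarity distribution toward a teacher's (e.g., Mistral-7B-derived) scores. All names and hyperparameters are illustrative.

```python
# Generic contrastive + distillation objective; not IBM's actual recipe.
import torch
import torch.nn.functional as F

def contrastive_distill_loss(q, d, teacher_scores, tau=0.05, alpha=0.5):
    """q, d: (batch, dim) L2-normalized query/document embeddings.
    teacher_scores: (batch, batch) similarity logits from the teacher."""
    logits = q @ d.T / tau                                # in-batch similarities
    labels = torch.arange(q.size(0), device=q.device)     # positive = matching index
    nce = F.cross_entropy(logits, labels)                 # InfoNCE over in-batch negatives
    # KL distillation: match the student's similarity distribution to the teacher's.
    kd = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(teacher_scores / tau, dim=-1),
                  reduction="batchmean")
    return alpha * nce + (1 - alpha) * kd
```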
How Do They Perform on Benchmarks?
The Granite R2 models deliver strong results across widely used retrieval benchmarks. On MTEB-v2 and BEIR, the larger granite-embedding-english-r2 outperforms similarly sized models like BGE Base, E5, and Arctic Embed. The smaller model, granite-embedding-small-english-r2, achieves accuracy close to that of models two to three times its size, making it particularly attractive for latency-sensitive workloads.

Both models also perform well in specialized domains:
- Long-document retrieval (MLDR, LongEmbed), where 8k context support is critical.
- Table retrieval tasks (OTT-QA, FinQA, OpenWikiTables), where structured reasoning is required.
- Code retrieval (CoIR), handling both text-to-code and code-to-text queries.
Are They Fast Enough for Large-Scale Use?
Efficiency is one of the standout aspects of these models. On an Nvidia H100 GPU, granite-embedding-small-english-r2 encodes nearly 200 documents per second, significantly faster than BGE Small and E5 Small. The larger granite-embedding-english-r2 reaches 144 documents per second, outperforming many ModernBERT-based alternatives.
Crucially, the models remain practical even on CPUs, allowing enterprises to run them in less GPU-intensive environments. This balance of speed, compact size, and retrieval accuracy makes them highly adaptable for real-world deployment.
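Throughput figures like those above depend heavily on hardware, batch size, and sequence length, so it is worth measuring on your own setup. A minimal sketch, assuming the same Hugging Face checkpoints as earlier:

```python
# Rough throughput check on CPU; numbers will vary with batch size,
# sequence length, and hardware.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-small-english-r2",
                            device="cpu")
docs = ["An example passage about enterprise retrieval."] * 256

start = time.perf_counter()
model.encode(docs, batch_size=32)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:.1f} docs/sec on CPU")
```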
What Does This Mean for Retrieval in Practice?
IBM’s Granite Embedding R2 models demonstrate that embedding systems don’t need massive parameter counts to be effective. They combine long-context support, benchmark-leading accuracy, and high throughput in compact architectures. For companies building retrieval pipelines, knowledge management systems, or RAG workflows, Granite R2 provides a production-ready, commercially viable alternative to existing open-source options.

Summary
In brief, IBM’s Granite Embedding R2 models strike an effective balance between compact design, long-context capability, and strong retrieval performance. With throughput optimized for both GPU and CPU environments, and an Apache 2.0 license that permits unrestricted commercial use, they present a practical alternative to bulkier open-source embeddings. For enterprises deploying RAG, search, or large-scale knowledge systems, Granite R2 stands out as an efficient and production-ready option.
Check out the Paper, granite-embedding-small-english-r2, and granite-embedding-english-r2 for further details.