Liquid AI Introduces LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M: Dense Bi-Encoder and Late-Interaction Models for Fast Multilingual Search Across 11 Languages

This week, Liquid AI released two new retrieval models: LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M. Both have 350M parameters, and both are the first bidirectional members of the LFM family. They build on LFM2.5-350M-Base, released in March. The pair targets fast multilingual and cross-lingual search across 11 languages. Their footprint is small enough to run almost anywhere. Both are available now on Hugging Face under the LFM Open License v1.0.

LFM2.5 Retrievers

The two models share one backbone but represent text differently. LFM2.5-Embedding-350M is a dense bi-encoder. It turns each document into a single vector. Pick it when you need the fastest search and the smallest, cheapest index.

LFM2.5-ColBERT-350M is a late-interaction model. It converts each token into a vector rather than producing one vector per document. This lets it match queries word by word for higher accuracy and better generalization. The trade-off is a larger index. Pick it when accuracy matters more than storage. Its query length is capped at 32 tokens. It can also rerank a first-stage retriever’s results without building an index.

Both target short-context search. Good fits include product catalogs, FAQ knowledge bases, and support docs. Liquid AI positions both as a drop-in replacement for an existing RAG pipeline.
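
The index trade-off between the two models can be estimated with simple arithmetic. The sketch below assumes raw, uncompressed fp16 storage (2 bytes per value), the 1024-dim dense vector and 128-dim per-token vectors described in this article, and a hypothetical corpus of 100,000 documents averaging 256 tokens. The function names are illustrative, not part of any library.

```python
def dense_index_bytes(n_docs, dim=1024, bytes_per_value=2):
    """Dense bi-encoder: one vector per document."""
    return n_docs * dim * bytes_per_value


def colbert_index_bytes(n_docs, avg_tokens, dim=128, bytes_per_value=2):
    """Late interaction: one vector per token, so size grows with length."""
    return n_docs * avg_tokens * dim * bytes_per_value


def human(n_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1024 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024


# Hypothetical corpus: 100,000 documents, 256 tokens each on average.
n_docs, avg_tokens = 100_000, 256
dense = dense_index_bytes(n_docs)
late = colbert_index_bytes(n_docs, avg_tokens)

print("dense:  ", human(dense))   # 195.3 MB
print("ColBERT:", human(late))    # 6.1 GB
print("ratio:  ", late / dense)   # 32.0
```

These are upper bounds for the uncompressed case. Production late-interaction indexes are typically quantized or compressed, so real footprints are smaller.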

The Architecture Change: Causal to Bidirectional

Both models start from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. Liquid AI applies a small set of bidirectional patches to the LFM2 architecture. These adapt it from a causal decoder to a bidirectional encoder.

In a causal setup, each token uses only itself and earlier tokens. That suits left-to-right generation but is less natural for retrieval. The team replaces the causal attention mask with a bidirectional one, so every token can attend to both left and right context. They also make the LFM2 short convolutions non-causal. These mix local information symmetrically around each token, not only from the past.
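
The two changes can be illustrated with a toy example. This is a minimal sketch in plain Python, not Liquid AI's implementation: the mask functions show which positions may attend to which, and `conv1d` is a hypothetical helper that contrasts left-only padding (causal) with centred padding (non-causal). The actual LFM2 convolution block, its kernel size, and its gating are not described in this article.

```python
def causal_mask(n):
    """1 where query position i may attend to key position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def conv1d(x, kernel, causal):
    """Short 1-D convolution with zero padding; output has the length of x.

    causal=True  pads only on the left, so output[i] sees x[i-k+1 .. i].
    causal=False pads both sides, so output[i] sees a window centred on i
    (assumes an odd kernel length).
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(w * padded[i + t] for t, w in enumerate(kernel))
        for i in range(len(x))
    ]


x = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]  # toy kernel that sums its window

print(causal_mask(3))             # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))      # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(conv1d(x, kernel, True))    # [1.0, 3.0, 6.0, 9.0]  (past only)
print(conv1d(x, kernel, False))   # [3.0, 6.0, 9.0, 7.0]  (left and right)
```

In the causal output, position 0 sees only itself. In the non-causal output, it also sees its right neighbour, which is the symmetric mixing the paragraph above describes.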

This preserves the LFM2 backbone’s efficiency while producing the full-context representations retrieval needs. Each model has 17 layers: 10 convolution, 6 attention, and 1 pooling or dense. Context length reaches 32,768 tokens, though documents are tuned to 512 tokens. From the shared encoder, the two models differ only in output. Embedding uses CLS-style pooling to produce one 1024-dim vector. ColBERT keeps 128-dim per-token embeddings for MaxSim late interaction.
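
The difference between the two output heads can be sketched with toy vectors. The example below is illustrative only: it uses 2-dim vectors instead of the real 1024-dim and 128-dim outputs, assumes that CLS-style pooling keeps the first token's vector, and feeds the same toy token vectors to both scoring functions even though the real models have separate output layers.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def dense_score(query_tokens, doc_tokens):
    """CLS-style pooling (assumed: first token), then cosine similarity."""
    return dot(normalize(query_tokens[0]), normalize(doc_tokens[0]))


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: for each query token, take the best-matching
    document token, then sum those maxima over the query."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy per-token vectors: 2 query tokens, 3 document tokens.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

print(dense_score(query_tokens, doc_tokens))   # 0.0
print(maxsim_score(query_tokens, doc_tokens))  # 2.0
```

Each query token finds an exact match somewhere in the document, so MaxSim gives the maximum possible score of 2.0, while the single pooled vectors happen to be orthogonal. The toy case is deliberately extreme, but it shows why per-token matching can recover matches that one vector per text misses.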

Training and Data

Both models follow the same three-stage recipe:

Stage one is large-scale contrastive pretraining in English.
Stage two is multilingual and cross-lingual distillation from a strong teacher across all 11 languages.
Stage three is final fine-tuning on hard-mined negatives.

The Embedding model receives slightly more cross-lingual data than ColBERT, because cross-lingual retrieval emerges more naturally in the late-interaction setup. Training data combines curated internal data with open-source English retrieval datasets. LLM-based translation expands the multilingual and cross-lingual pairs.
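
The article does not describe how the hard negatives in stage three were mined. One common approach, shown here purely as an assumption, is to score candidates with an existing retriever and keep the highest-scoring documents that are not labelled relevant. The function name, document ids, and scores below are hypothetical.

```python
def mine_hard_negatives(scores, positives, k=2):
    """Return the k highest-scoring document ids that are not positives.

    scores    : dict mapping doc id -> similarity from a first-stage retriever
    positives : set of doc ids labelled relevant for this query
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if doc_id not in positives][:k]


# Hypothetical retriever scores for one query; d1 is the labelled positive.
scores = {"d1": 0.91, "d2": 0.88, "d3": 0.35, "d4": 0.80}
print(mine_hard_negatives(scores, positives={"d1"}))  # ['d2', 'd4']
```

Documents d2 and d4 score almost as high as the true positive, which is what makes them harder training examples than the low-scoring d3.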

Benchmark

Liquid AI evaluated two capabilities. The first is multilingual retrieval with NanoBEIR. The second is cross-lingual open-domain QA with MKQA-11. Both report results across all 11 languages: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish.

On average, both models lead their class. Here are the comparison details:

Model	Type	NanoBEIR ML (NDCG@10)	MKQA-11 (Recall@20)
LFM2.5-ColBERT-350M	late interaction	0.605	0.694
LFM2.5-Embedding-350M	dense	0.577	0.691
Qwen/Qwen3-Embedding-0.6B	dense	0.556	0.638
LFM2-ColBERT-350M	late interaction	0.540	0.646
Alibaba-NLP/gte-multilingual-base	dense	0.528	0.675
lightonai/GTE-ModernColBERT-v1	late interaction	0.489	0.459
BAAI/bge-large-en-v1.5	dense	0.359	0.413

ColBERT leads on both averages. Embedding is close behind on MKQA-11 at 0.691. Both beat Qwen3-Embedding-0.6B, a larger model. The new ColBERT also improves on the earlier LFM2-ColBERT-350M, from 0.540 to 0.605 on NanoBEIR. Liquid AI also notes that NanoBEIR English tracks the more expensive full BEIR. The two stay highly correlated, with NanoBEIR scoring a near-constant ~15% higher. The research team therefore uses NanoBEIR as a practical proxy during training runs.
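
For readers unfamiliar with the two metrics in the table, the sketch below gives their standard textbook definitions in plain Python. It is not the evaluation harness used by Liquid AI, whose exact protocol is not described here. The NDCG version uses linear gain, which coincides with the exponential-gain variant when relevance is binary.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """Normalized discounted cumulative gain over the top k results.

    relevance maps doc id -> gain; ids missing from the map count as 0.
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy ranking: relevant docs "a" and "c" are returned at ranks 1 and 3.
ranked = ["a", "b", "c"]
print(round(ndcg_at_k(ranked, {"a": 1, "c": 1}), 4))   # 0.9197
print(ndcg_at_k(["a", "c", "b"], {"a": 1, "c": 1}))    # 1.0
print(recall_at_k(ranked, {"a", "c", "z"}))            # 2 of 3 relevant found
```

NDCG rewards placing relevant documents near the top, while recall only checks whether they appear within the cutoff at all.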

Latency and Edge Deployment

Liquid AI released GGUF variants for llama.cpp. These let both models run on CPUs, laptops, and edge devices. The figures below use a MacBook Pro M4 Max at FP16. Queries are 32 tokens; documents are 256 tokens.

Model	Stage	Docs cached	p50
LFM2.5-Embedding-350M	Query embedding	yes	7.3 ms
LFM2.5-ColBERT-350M	Query embedding + MaxSim	yes	8.2 ms
LFM2.5-ColBERT-350M	Query + Doc embedding + MaxSim	no	34.3 ms

When document embeddings are pre-computed, median (p50) query latency stays under 10 ms. Encoding documents at query time pushes ColBERT to 34.3 ms. For enterprise scale, Liquid AI also built an internal GPU stack. On an H100 at FP16, it observes latencies as low as 1 ms. Embedding query latency there is 1.5 ms p50.
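
A p50 figure like the ones above is the median of many timed runs. The sketch below shows one way to measure it with the Python standard library. It is not Liquid AI's benchmark code, and `fake_encode` is a hypothetical stand-in for whatever query-encoding call is being timed.

```python
import statistics
import time


def p50_latency_ms(fn, warmup=3, runs=50):
    """Median wall-clock time of fn() in milliseconds, after a short warm-up."""
    for _ in range(warmup):
        fn()  # warm caches before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_encode():
    """Hypothetical workload standing in for a query-encoding call."""
    return sum(i * i for i in range(10_000))


print(f"p50: {p50_latency_ms(fake_encode):.3f} ms")
```

Reporting the median rather than the mean keeps a few slow outliers, such as a first call that loads weights, from dominating the number.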

Use Cases With Examples

E-commerce: Search a product catalog across many languages with one index. A user types a Korean query and the system surfaces an English product listing. Cross-lingual retrieval makes this work without per-language indexes.
FAQ and support knowledge bases: Retrieve the right answer reliably across customer-facing surfaces. A French support question maps to an English help article.
On-device semantic search: Search files, emails, and notes locally on consumer hardware. The GGUF build keeps data on the device at near-zero cost.
Enterprise knowledge assistants: Retrieve internal legal, financial, and technical documents across languages. ColBERT suits this when answer accuracy outranks index size.

Code: Getting Started

The Embedding model runs through sentence-transformers. Always pass the asymmetric prompts, query: and document:. Omitting them silently degrades retrieval quality.

from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the model's custom code to run.
model = SentenceTransformer(
    "LiquidAI/LFM2.5-Embedding-350M",
    trust_remote_code=True,
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital and largest city of France."]

# Use the asymmetric prompts: one for queries, one for documents.
q_emb = model.encode(queries,   prompt_name="query",    normalize_embeddings=True)
d_emb = model.encode(documents, prompt_name="document", normalize_embeddings=True)

# With normalized embeddings, the dot product equals cosine similarity.
scores = q_emb @ d_emb.T  # shape: (n_queries, n_documents)
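
The score matrix from the snippet above still has to be turned into a ranking. A minimal sketch of that step in plain Python follows. The helper name `top_k`, the document ids, and the score values are hypothetical and chosen only for illustration.

```python
def top_k(score_rows, doc_ids, k=3):
    """For each query row, return the k best (doc_id, score) pairs, best first."""
    results = []
    for row in score_rows:
        ranked = sorted(zip(doc_ids, row), key=lambda pair: pair[1], reverse=True)
        results.append(ranked[:k])
    return results


# Hypothetical cosine similarities: 2 queries x 3 documents.
scores = [
    [0.12, 0.83, 0.40],
    [0.77, 0.05, 0.31],
]
doc_ids = ["doc-a", "doc-b", "doc-c"]

for hits in top_k(scores, doc_ids, k=2):
    print(hits)
# [('doc-b', 0.83), ('doc-c', 0.4)]
# [('doc-a', 0.77), ('doc-c', 0.31)]
```

A full sort per query is fine for small candidate sets. For a large corpus, an approximate nearest-neighbour index would normally replace this brute-force step.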

The ColBERT model runs through PyLate. Its PLAID index uses FastPLAID for efficient similarity search.

from pylate import indexes, models, retrieve

# trust_remote_code=True allows the model's custom code to run.
model = models.ColBERT(
    model_name_or_path="LiquidAI/LFM2.5-ColBERT-350M",
    trust_remote_code=True,
)
# Reuse the end-of-sequence token for padding.
model.tokenizer.pad_token = model.tokenizer.eos_token

# Create a PLAID index on disk; override=True replaces any existing index.
index = indexes.PLAID(index_folder="pylate-index", index_name="index", override=True)

# Encode documents as per-token embeddings and add them to the index.
docs_emb = model.encode(["document 1 text", "document 2 text"], is_query=False)
index.add_documents(documents_ids=["1", "2"], documents_embeddings=docs_emb)

# Encode the query and retrieve the top 10 matches.
retriever = retrieve.ColBERT(index=index)
q_emb = model.encode(["a search query"], is_query=True)
scores = retriever.retrieve(queries_embeddings=q_emb, k=10)

To rerank an existing first-stage pipeline instead, skip the index and use rank.rerank.

from pylate import models, rank

model = models.ColBERT(model_name_or_path="LiquidAI/LFM2.5-ColBERT-350M", trust_remote_code=True)

# One list of candidate documents (and ids) per query.
queries = ["query A"]
documents = [["candidate doc 1", "candidate doc 2"]]
documents_ids = [[1, 2]]

q_emb = model.encode(queries, is_query=True)
d_emb = model.encode(documents, is_query=False)

# Reorder each query's candidates by late-interaction score; no index needed.
reranked = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=q_emb,
    documents_embeddings=d_emb,
)

You can also fine-tune either model on your own data. The Embedding card provides snippets using sentence-transformers and MultipleNegativesRankingLoss.
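
MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor. The plain-Python sketch below reproduces that idea as cross-entropy over scaled cosine similarities, so the mechanics are visible without a training framework. It is an illustration, not the library's code. The function name is hypothetical, and the scale of 20.0 is the library default as far as I know.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for
    anchors[i] and every other positives[j] in the batch is a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-dim embeddings: each anchor lines up with its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned pairs: {aligned:.6f}")  # close to 0
print(f"swapped pairs: {swapped:.6f}")  # close to 20
```

The loss is near zero when each query embedding sits closest to its own document and grows when another document in the batch is closer. Larger batches therefore supply more negatives per step.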

Key Takeaways

Liquid AI’s LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M are the first bidirectional LFMs, built for multilingual search across 11 languages.
Both 350M models lead their class on NanoBEIR and MKQA-11, beating the larger Qwen3-Embedding-0.6B.
Embedding gives the smallest, cheapest index; ColBERT trades a larger index for higher per-token accuracy.
GGUF builds run on CPUs, laptops, and edge devices via llama.cpp, with cached p50 query latency under 10 ms.
They drop into existing RAG pipelines through sentence-transformers and PyLate, under the LFM Open License v1.0.

Interactive Explainer

The original post embeds an interactive widget with four panels. The simulator and MaxSim panels are illustrative: similarity there is a lightweight token/concept heuristic, not the real 350M model weights. The latency and benchmark panels use the published figures.

01 · Retrieval simulator: Dense vs ColBERT on the same query. Type a query in any of the 11 languages and both models rank a small multilingual corpus. Dense scores a single vector with cosine similarity. ColBERT scores per-token vectors with MaxSim, so it can match across languages word by word.

02 · Late interaction: How MaxSim actually scores. ColBERT keeps one 128-dim vector per token. For each query token it takes the maximum similarity over all document tokens, then sums those maxima.

03 · Cost & speed: Index footprint and query latency. Dense stores one 1024-dim vector per document. ColBERT stores one 128-dim vector per token, so its index grows with document length. Raw fp16 estimates for N documents of T tokens each are dense = N·1024·2 B and ColBERT = N·T·128·2 B. Production ColBERT indexes are quantized or compressed, so real footprints are smaller. Latencies use a 32-token query and a 256-token document at FP16.

Hardware	Dense query p50	ColBERT query + MaxSim p50
MacBook M4 Max · llama.cpp	7.3 ms	8.2 ms
H100 · GPU stack	1.5 ms	2.5 ms

04 · Benchmarks: Multilingual and cross-lingual scores. Published results across all 11 languages. NanoBEIR Multilingual Extended reports NDCG@10. MKQA-11 reports Recall@20. Higher is better.

NanoBEIR (NDCG@10)
Model	AVG	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LFM2.5-ColBERT-350M	0.605	0.551	0.606	0.687	0.607	0.622	0.606	0.614	0.590	0.570	0.613	0.586
LFM2.5-Embedding-350M	0.577	0.529	0.581	0.644	0.581	0.592	0.583	0.575	0.563	0.557	0.581	0.566
Qwen/Qwen3-Embedding-0.6B	0.556	0.514	0.560	0.649	0.568	0.565	0.565	0.551	0.530	0.516	0.571	0.525
Alibaba-NLP/gte-multilingual-base	0.528	0.477	0.523	0.624	0.537	0.542	0.528	0.511	0.494	0.516	0.534	0.526
BAAI/bge-large-en-v1.5	0.359	0.059	0.419	0.642	0.445	0.475	0.431	0.198	0.132	0.358	0.434	0.353

MKQA-11 (Recall@20)
Model	AVG	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LFM2.5-ColBERT-350M	0.694	0.608	0.709	0.748	0.711	0.715	0.707	0.703	0.640	0.689	0.703	0.700
LFM2.5-Embedding-350M	0.691	0.610	0.709	0.738	0.708	0.715	0.703	0.685	0.630	0.691	0.710	0.708
Alibaba-NLP/gte-multilingual-base	0.675	0.567	0.692	0.741	0.705	0.703	0.697	0.655	0.563	0.698	0.700	0.699
Qwen/Qwen3-Embedding-0.6B	0.638	0.520	0.671	0.723	0.678	0.672	0.671	0.635	0.543	0.620	0.667	0.620
BAAI/bge-large-en-v1.5	0.413	0.133	0.471	0.748	0.450	0.531	0.461	0.208	0.172	0.456	0.443	0.467

Data from Liquid AI and Hugging Face model cards · LFM Open License v1.0 · arXiv:2511.23404

The post Liquid AI Introduces LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M: Dense Bi-Encoder and Late-Interaction Models for Fast Multilingual Search Across 11 Languages appeared first on MarkTechPost.