
Liquid AI Releases LFM2-ColBERT-350M: A New Small Model that brings Late Interaction Retrieval to Multilingual and Cross-Lingual RAG

Can a compact late interaction retriever index documents once and deliver accurate cross-lingual search with fast inference? Liquid AI has released LFM2-ColBERT-350M, a compact late interaction retriever for multilingual and cross-lingual search. Documents can be indexed in one language, queries can be written in many languages, and the system retrieves with high accuracy. The Liquid AI team reports inference speed on par with models that are 2.3 times smaller, which it attributes to the LFM2 backbone. The model is available with a Hugging Face demo and a detailed model card for integration into retrieval-augmented generation (RAG) systems.

https://www.liquid.ai/blog/lfm2-colbert-350m-one-model-to-embed-them-all

What late interaction means and why it matters

Most production systems use bi-encoders for speed or cross-encoders for accuracy. Late interaction aims to combine both advantages. Queries and documents are encoded separately at the token level, and the system compares token vectors at query time using operations such as MaxSim. This preserves fine-grained token interactions without the full cost of joint cross-attention. It allows pre-computation of document embeddings and improves precision at ranking time, so the model can serve as a first-stage retriever and also as a ranker in a single pass.
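To make the MaxSim operation concrete, here is a minimal sketch in plain PyTorch. The token embeddings are random placeholders rather than the model's actual outputs; only the 128-dimensional vector size mirrors the model card.

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late-interaction (ColBERT-style) MaxSim score.

    query_emb: (num_query_tokens, dim) L2-normalized token vectors
    doc_emb:   (num_doc_tokens, dim)   L2-normalized token vectors
    Returns a scalar relevance score.
    """
    # Cosine similarity of every query token against every document token.
    sim = query_emb @ doc_emb.T               # (num_query_tokens, num_doc_tokens)
    # For each query token, keep its best-matching document token ...
    per_token_max = sim.max(dim=1).values     # (num_query_tokens,)
    # ... and sum those maxima to get the query-document score.
    return per_token_max.sum()

# Toy example with random, pre-normalized token embeddings (dim 128).
torch.manual_seed(0)
query = torch.nn.functional.normalize(torch.randn(6, 128), dim=-1)
docs = [torch.nn.functional.normalize(torch.randn(n, 128), dim=-1) for n in (40, 80)]

# Document token embeddings can be computed and stored ahead of time;
# only the query needs to be encoded at search time.
scores = [maxsim_score(query, d) for d in docs]
print(scores)
```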

Model specification

LFM2-ColBERT-350M has 350 million total parameters. There are 25 layers, with 18 convolution blocks, 6 attention blocks, and 1 dense layer. The context length is 32k tokens. The vocabulary size is 65,536. The similarity function is MaxSim. The output dimensionality is 128. Training precision is BF16. The license is LFM Open License v1.0.

https://huggingface.co/LiquidAI/LFM2-ColBERT-350M
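For orientation, the sketch below shows one way such a late-interaction model is typically wired into an index-once, query-in-many-languages workflow using the PyLate library. This is an assumption-laden sketch, not the official recipe: the example documents, IDs, index names, and the choice of PyLate itself are illustrative, and the exact API may differ from the model card, which remains the authoritative reference.

```python
# Hypothetical end-to-end sketch with the PyLate library (pip install pylate).
# Assumes PyLate's ColBERT wrapper and Voyager index; check the model card for
# the exact, up-to-date usage.
from pylate import indexes, models, retrieve

# Load the late-interaction retriever from the Hugging Face Hub.
model = models.ColBERT(model_name_or_path="LiquidAI/LFM2-ColBERT-350M")

# Documents are indexed once, in a single language (English here).
index = indexes.Voyager(index_folder="lfm2-colbert-index", index_name="docs", override=True)
documents_ids = ["en-1", "en-2"]
documents = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with a receipt.",
]
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

# Queries can arrive in another supported language (German in this example).
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["Wie lange gilt die Garantie?"], is_query=True)
print(retriever.retrieve(queries_embeddings=queries_embeddings, k=2))
```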

Languages supported and evaluated

The model supports 8 languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. The evaluation adds Italian and Portuguese, which brings the matrix to 9 languages for cross comparisons of document and query languages. This distinction matters when planning deployments that must cover specific customer markets.

https://www.liquid.ai/blog/lfm2-colbert-350m-one-model-to-embed-them-all

Evaluation setup and key results

Liquid AI extends the NanoBEIR benchmark with Japanese and Korean and publishes the extension for reproducibility. In this setup, LFM2-ColBERT-350M shows stronger multilingual capability than the baseline late interaction model in this class, GTE-ModernColBERT-v1 at 150M parameters. The largest gains appear in German, Arabic, Korean, and Japanese, while English performance is maintained.

Key Takeaways

  1. Token-level scoring with MaxSim preserves fine-grained interactions while keeping separate encoders, so document embeddings can be precomputed and queried efficiently.
  2. Documents can be indexed in one language and retrieved in many. The model card lists 8 supported languages, while evaluations span 9 languages for cross-lingual pairs.
  3. On the NanoBEIR multilingual extension, LFM2-ColBERT-350M outperforms the prior late-interaction baseline (GTE-ModernColBERT-v1 at 150M) and maintains English performance.
  4. Inference speed is reported on par with models 2.3× smaller across batch sizes, attributed to the LFM2 backbone.

Editorial Notes

Liquid AI’s LFM2-ColBERT-350M applies late interaction ColBERT with MaxSim: it encodes queries and documents separately, then scores token vectors at query time, which preserves token-level interactions and allows precomputed document embeddings at scale. It targets multilingual and cross-lingual retrieval, indexing once and querying in many languages, with evaluations reported on a NanoBEIR multilingual extension. The Liquid AI team reports inference speed on par with models 2.3 times smaller, attributed to the LFM2 backbone. Overall, late interaction at the nano scale looks production-ready for multilingual RAG trials.


Check out the Model Weights, Demo and Technical details.
