Apple Researchers Release CLaRa: A Continuous Latent Reasoning Framework for Compression‑Native RAG with 16x–128x Semantic Document Compression
How do you keep RAG systems accurate and efficient when every query tries to stuff thousands of tokens into the context window and the retriever and generator are still optimized as two separate, disconnected systems? A team of researchers from Apple and the University of Edinburgh released CLaRa, Continuous Latent Reasoning (CLaRa-7B-Base, CLaRa-7B-Instruct and CLaRa-7B-E2E)…
