OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG and AI Agents
AI applications rarely deal with one clean table. They mix user profiles, chat logs, JSON metadata, embeddings, and sometimes spatial data. Most teams answer this with a patchwork of an OLTP database, a vector store, and a search engine. OceanBase has released seekdb, an open source, AI focused database under the Apache 2.0 license. seekdb is described as an AI native search database that unifies relational data, vector data, text, JSON, and GIS in a single engine and exposes hybrid search and in-database AI workflows.
What is seekdb?
seekdb is positioned as the lightweight, embedded version of the OceanBase engine, aimed at AI applications rather than general purpose distributed deployments. It runs as a single node database, supports both embedded mode and client/server mode, and stays compatible with MySQL drivers and SQL syntax.
In the capability matrix, seekdb is marked as:
- Embedded database supported
- Standalone database supported
- Distributed database not supported
while the full OceanBase product covers the distributed case.
From a data model perspective, seekdb supports:
- Relational data with standard SQL
- Vector search
- Full text search
- JSON data
- Spatial GIS data
all within one storage and indexing layer.
Hybrid search as the core feature
The main feature OceanBase promotes is hybrid search: search that combines vector based semantic retrieval, full text keyword retrieval, and scalar filters in a single query and a single ranking step.
seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH with two entry points:
- DBMS_HYBRID_SEARCH.SEARCH, which returns results as JSON, sorted by relevance
- DBMS_HYBRID_SEARCH.GET_SQL, which returns the concrete SQL string used for execution
The hybrid search path can run:
- pure vector search
- pure full text search
- combined hybrid search
and can push relational filters and joins down into storage. It also supports query reranking strategies such as weighted scores and reciprocal rank fusion, and can plug in large language model based re-rankers.
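To make the two named reranking strategies concrete, here is a minimal Python sketch of reciprocal rank fusion and weighted score fusion over the outputs of a vector retriever and a keyword retriever. It illustrates the general techniques only, not seekdb's internal implementation; the function names, the constant k=60, the min-max scaling step, and the sample document ids are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def weighted_score_fusion(score_maps, weights):
    """Combine per-retriever scores with fixed weights after min-max scaling.

    Assumes every score map is non-empty.
    """
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in scores.items():
            fused[doc_id] += weight * (score - lo) / span
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a vector search and a full text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]

# Hypothetical raw scores on different scales, weighted 0.7 / 0.3.
weighted = weighted_score_fusion(
    [{"doc_a": 0.9, "doc_b": 0.5}, {"doc_a": 1.0, "doc_b": 3.0}],
    [0.7, 0.3],
)
weighted_ids = [doc_id for doc_id, _ in weighted]
```

Rank fusion only needs positions, so it works even when the two retrievers produce scores on incomparable scales; weighted fusion needs some normalization first, which is why the sketch rescales each score map before mixing.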
For retrieval augmented generation (RAG) and agent memory, this means you can write a single SQL query that does semantic matching on embeddings, exact matching on product codes or proper nouns, and relational filtering on user or tenant scopes.
Vector and full text engine details
At its core, seekdb exposes a modern vector and full text stack.
For vectors, seekdb:
- supports dense vectors and sparse vectors
- supports Manhattan, Euclidean, inner product, and cosine distance metrics
- provides in-memory index types such as HNSW, HNSW SQ, and HNSW BQ
- provides disk based index types including IVF and IVF PQ
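For readers less familiar with the four metrics listed above, the following plain Python sketch spells out their standard definitions on small dense vectors. The function names are illustrative, and how seekdb itself names, normalizes, or orders these metrics is not specified here.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product. This is a similarity: larger means closer."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for same direction, 1 for orthogonal vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


p, q = [1.0, 2.0], [4.0, 6.0]
l1 = manhattan(p, q)
l2 = euclidean(p, q)
dot = inner_product([1.0, 2.0], [3.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
```

Cosine distance ignores vector length, which is why it is a common choice for text embeddings; inner product coincides with cosine similarity only when the vectors are normalized to unit length.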
Hybrid vector indexes let you store raw text, have seekdb call an embedding model automatically, and have the system maintain the corresponding vector index without a separate preprocessing pipeline.
For text, seekdb offers full text search with:
- keyword, phrase, and Boolean queries
- BM25 ranking for relevance
- multiple tokenizer modes
The key point is that full text and vector indexes are first class and are integrated into the same query planner as scalar indexes and GIS indexes, so hybrid search does not need external orchestration.
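As background on the BM25 ranking mentioned above, here is a minimal sketch of the standard Okapi BM25 formula in Python. It shows the textbook scoring function only: the parameters k1=1.5 and b=0.75, the smoothed IDF variant, and the whitespace tokenization are assumptions for illustration and are not taken from seekdb.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    `documents` is a non-empty list of token lists.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical mini corpus, tokenized by whitespace.
docs = [
    "seekdb supports hybrid search".split(),
    "hybrid cars use two engines".split(),
    "full text search with bm25 ranking".split(),
]
scores = bm25_scores(["hybrid", "search"], docs)
no_match = bm25_scores(["missing"], docs)
```

The first document matches both query terms and is the shortest, so it ranks first; the other two each match one term, and the longer one scores slightly lower because of length normalization.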
AI functions inside the database
seekdb includes built-in AI function expressions that let you call models directly from SQL, without a separate application service mediating every call. The main functions are:
- AI_EMBED to convert text into embeddings
- AI_COMPLETE for text generation using a chat or completion model
- AI_RERANK to rerank a list of candidates
- AI_PROMPT to assemble prompt templates and dynamic values into a JSON object for AI_COMPLETE
Model metadata and endpoints are managed by the DBMS_AI_SERVICE package, which lets you register external providers, set URLs, and configure keys, all on the database side.
Multimodal data and workloads
seekdb is built to handle multiple data modalities in a single node. It has a multimodal data and indexing layer that covers vectors, text, JSON, and GIS, and a multi-model compute layer for hybrid workloads across vector, full text, and scalar conditions.
It also provides JSON indexes for metadata queries and GIS indexes for spatial conditions. This allows queries like:
- find semantically similar documents
- filter by JSON metadata like tenant, region, or category
- constrain by spatial vary or polygon
without leaving the same engine.
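The combined query shape described above (metadata filter, spatial constraint, then semantic ranking) can be illustrated with a small in-memory Python sketch. This is a conceptual model only: in seekdb the same logic would be expressed in SQL and served by indexes, and every name here (Doc, search, the tenant key, the bounding box tuple) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    embedding: list
    metadata: dict
    lon: float
    lat: float


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_bounding_box(doc, min_lon, min_lat, max_lon, max_lat):
    """Simple rectangular spatial range check."""
    return min_lon <= doc.lon <= max_lon and min_lat <= doc.lat <= max_lat


def search(docs, query_embedding, tenant, box, top_k=2):
    # 1. Scalar / JSON metadata filter and 2. spatial filter narrow the candidates.
    candidates = [
        d for d in docs
        if d.metadata.get("tenant") == tenant and in_bounding_box(d, *box)
    ]
    # 3. Semantic ranking on what is left, most similar first.
    candidates.sort(
        key=lambda d: cosine_similarity(d.embedding, query_embedding),
        reverse=True,
    )
    return [d.doc_id for d in candidates[:top_k]]


# Hypothetical documents: "c" has the wrong tenant, "d" lies outside the box.
docs = [
    Doc("a", [1.0, 0.0], {"tenant": "acme"}, 13.4, 52.5),
    Doc("b", [0.6, 0.8], {"tenant": "acme"}, 13.5, 52.4),
    Doc("c", [1.0, 0.0], {"tenant": "other"}, 13.4, 52.5),
    Doc("d", [1.0, 0.0], {"tenant": "acme"}, 2.35, 48.85),
]
result = search(docs, [1.0, 0.0], "acme", (13.0, 52.0, 14.0, 53.0))
```

Doing the filtering and ranking in one place is the point of the single-engine design: documents excluded by tenant or location never reach the ranking step, so there is no result set to reconcile across separate systems.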
Because seekdb is derived from the OceanBase engine, it inherits ACID transactions, row and column hybrid storage, and vectorized execution, although large scale distributed deployments remain a job for the full OceanBase database.
Key Takeaways
- AI native hybrid search: seekdb unifies vector search, full text search, and relational filtering in a single SQL and DBMS_HYBRID_SEARCH interface, so RAG and agent workloads can run multi-signal retrieval in one query instead of stitching together multiple engines.
- Multimodal data in one engine: seekdb stores and indexes relational data, vectors, text, JSON, and GIS in the same engine, which lets AI applications keep documents, embeddings, and metadata consistent without maintaining separate databases.
- In-database AI functions for RAG: With AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT, seekdb can call embedding models, LLMs, and rerankers directly from SQL, which simplifies RAG pipelines and moves more orchestration logic into the database layer.
- Single node, embedded friendly design: seekdb is a single node, MySQL compatible engine that supports embedded and standalone modes, which makes it suitable for local, edge, and service embedded AI workloads. Distributed, large scale deployments remain the role of full OceanBase.
- Open source and tool ecosystem: seekdb is open sourced under Apache 2.0 and integrates with a growing ecosystem of AI tools and frameworks, with Python support via pyseekdb and MCP based integration for code assistants and agents, so it can act as a unified data plane for AI applications.
The post OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG and AI Agents appeared first on MarkTechPost.
