Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent
Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session search. But a new community project argues that the built-in memory is too shallow for serious work. A new library named ‘Memory OS’ has been released under an MIT license by a developer (ClaudioDrews). It stacks six memory layers onto Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new, but it appears to have potential, and its architecture shows how agent memory can be layered.
Memory OS
Memory OS isn’t a Hermes plugin you toggle on. It is a layered system that sits beside Hermes Agent’s own memory. Hermes already provides workspace files and a session database. Memory OS keeps those and adds four more layers above them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README frames it as a “memory operating system,” not a single feature.
The Six Layers, From Files to Vectors
- Layer 1 is Workspace. It holds MEMORY.md, USER.md, and CREATIVE.md, which are injected into the system prompt every turn.
- Layer 2 is Sessions. It uses state.db, a SQLite database with FTS5 full-text search across conversation history.
- Layer 3 is Structured Facts. It stores durable facts in memory_store.db, using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts these trust scores over time, alongside entity resolution.
- Layer 4 is Fabric, a heavily forked version of the Icarus Plugin. The fork adds LLM-powered session extraction on top of the upstream esaradev/icarus-plugin. It handles cross-session recall through 16 tools, including fabric_recall, fabric_write, and fabric_brief.
- Layer 5 is the Vector Database, built on Qdrant. It uses 4096-dimensional vectors compared by cosine similarity, plus BM25 sparse search, a keyword-style ranking method.
- Layer 6 is an LLM Wiki, an auto-curated vault of concepts, entities, and comparisons. That wiki is continuously ingested back into Qdrant through a process called wiki-continuous-ingest.
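To make the "keyword-style ranking" in Layer 5 concrete, here is a minimal sketch of the standard Okapi BM25 formula in plain Python. It is only an illustration of the scoring idea: it does not show how Qdrant or Memory OS implement sparse search, and the function name `bm25_scores`, the sample notes, and the `k1` and `b` defaults are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical memory notes, used only to exercise the function.
notes = [
    "user prefers qdrant for vector search",
    "weekly decay scanner removes stale entries",
    "redis queue feeds the worker",
]
scores = bm25_scores("qdrant vector", notes)
best = notes[scores.index(max(scores))]
```

Dense vectors capture meaning while a sparse score like this rewards exact term overlap, which is why hybrid setups combine the two.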
How the Retrieval Flow Works
The flow hinges on when memory is read and written. On pre_llm_call, Memory OS runs what it calls surgical recall. It pulls from four sources at once: Fabric, Qdrant, Sessions, and Facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication stops the same context from appearing twice. A social-closer filter skips trivial messages, such as a plain “thanks.” On post_llm_call and on_session_end, the system extracts and captures new learnings automatically. The stated goal is token efficiency, not stuffing the context window.
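The recall step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code: the names `Hit`, `RecallSession`, `surgical_recall`, and `SOCIAL_CLOSERS`, the threshold values, and the stub sources are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical list of trivial messages that should not trigger recall.
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "great", "bye"}


@dataclass
class Hit:
    source: str
    text: str
    score: float


@dataclass
class RecallSession:
    # Normalized texts already injected during this session.
    seen: set[str] = field(default_factory=set)


def is_social_closer(message: str) -> bool:
    return message.strip().lower().rstrip("!. ") in SOCIAL_CLOSERS


def surgical_recall(message, sources, thresholds, session):
    """Query every source, gate by per-source threshold, dedupe per session."""
    if is_social_closer(message):
        return []
    kept = []
    for name, search in sources.items():
        for hit in search(message):
            if hit.score < thresholds[name]:
                continue  # below this source's relevance gate
            key = hit.text.strip().lower()
            if key in session.seen:
                continue  # already surfaced, by this or another source
            session.seen.add(key)
            kept.append(hit)
    return sorted(kept, key=lambda h: h.score, reverse=True)


# Stub sources standing in for Fabric, Qdrant, Sessions, and Facts.
sources = {
    "fabric": lambda q: [Hit("fabric", "User deploys with Docker", 0.81)],
    "qdrant": lambda q: [
        Hit("qdrant", "User deploys with Docker", 0.77),
        Hit("qdrant", "Unrelated note", 0.20),
    ],
    "sessions": lambda q: [],
    "facts": lambda q: [Hit("facts", "Prefers Python 3.11", 0.90)],
}
thresholds = {"fabric": 0.5, "qdrant": 0.6, "sessions": 0.5, "facts": 0.7}

session = RecallSession()
first = surgical_recall("How do I deploy?", sources, thresholds, session)
second = surgical_recall("How do I deploy again?", sources, thresholds, session)
closer = surgical_recall("thanks!", sources, thresholds, session)
```

In this sketch the second call returns nothing because every qualifying snippet was already injected earlier in the session, which is the token-saving behavior the design aims for.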
The Fallback Cascade and Cleanup
Layer 5’s retrieval uses a four-level fallback. It tries hybrid search first, then dense vectors, then lexical search, then SQLite. If one method fails or returns nothing, the next takes over. This design keeps recall working even when the vector database struggles. Memory OS also runs a weekly decay scanner to age out stale entries. Semantic dedup merges near-identical memories when cosine similarity exceeds 0.92. These housekeeping steps aim to stop memory from bloating over months of use.
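A rough sketch of both ideas, the fallback cascade and threshold-based semantic dedup, might look like the following. The function names, the stub retrievers, and the toy two-dimensional vectors are assumptions for illustration. The dedup here also simplifies "merging" to keeping the first of each near-duplicate group, which may differ from what Memory OS actually does.

```python
import math
from typing import Callable

Retriever = Callable[[str], list[str]]


def cascade_search(query: str, retrievers: list[tuple[str, Retriever]]) -> tuple[str, list[str]]:
    """Try each retriever in order; return the first non-empty result."""
    for name, retrieve in retrievers:
        try:
            results = retrieve(query)
        except Exception:
            continue  # this level failed, fall through to the next one
        if results:
            return name, results
    return "none", []


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_dedup(memories: list[tuple[str, list[float]]], threshold: float = 0.92):
    """Keep a memory only if it is not too similar to one already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in memories:
        if any(cosine(vec, kept_vec) > threshold for _, kept_vec in kept):
            continue  # near-duplicate of an earlier memory
        kept.append((text, vec))
    return kept


def broken_hybrid(query: str) -> list[str]:
    # Simulates the vector database being unreachable.
    raise ConnectionError("vector database unavailable")


level, results = cascade_search("deploy", [
    ("hybrid", broken_hybrid),
    ("dense", lambda q: []),
    ("lexical", lambda q: ["User deploys with Docker"]),
    ("sqlite", lambda q: ["fallback row"]),
])

unique = semantic_dedup([
    ("likes dark mode", [1.0, 0.0]),
    ("prefers dark mode", [0.99, 0.05]),
    ("uses Redis", [0.0, 1.0]),
])
```

Here the hybrid level raises, the dense level returns nothing, and the lexical level answers, so the SQLite level is never reached.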
Local-First, And Deliberately So
Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. Its pitch is that memory infrastructure should run on your own machine. The memory data stays local, with no memory subscription. LLM calls still go to whichever provider you choose. Hermes itself already supports eight external memory providers, including mem0 and Honcho. Memory OS isn’t one of those official providers. It is a separate, community-built stack layered directly on Hermes. For teams with data-residency rules, a local memory store can matter.
Strengths and Limitations
Strengths:
- Clear layered design separating files, sessions, facts, vectors, and a wiki
- Fully local infrastructure with no cloud memory subscription
- Provider-agnostic, matching Hermes Agent’s own flexibility
- Token-efficient retrieval by design, via gated sources and per-session deduplication
Limitations:
- Brand new, with few commits
- A forked Icarus Plugin that the author says isn’t upstream-compatible
- Heavier setup: Docker, Qdrant, Redis, and an ARQ Worker are all required
- No published benchmarks on recall quality, latency, or token savings
Key Takeaways
- Memory OS is a community-built, MIT-licensed stack that adds six memory layers on top of Hermes Agent.
- It combines workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and an auto-curated LLM wiki.
- Retrieval runs on pre_llm_call with gated, deduplicated recall from four sources; capture runs on post_llm_call and on_session_end.
- Memory infrastructure is fully local and provider-agnostic, but LLM calls still go to your chosen provider.
The post Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.
