Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent

Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session search. But a new community project argues that built-in memory is too shallow for serious work. A new library named 'Memory OS' has been released under an MIT license by a developer (ClaudioDrews). It builds a six-layer memory stack on Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new, but it appears to have potential, and its architecture shows how agent memory can be layered.

Memory OS

Memory OS isn’t a Hermes plugin you toggle on. It is a layered system that sits beside Hermes Agent’s own memory. Hermes already provides workspace files and a session database. Memory OS keeps these and adds four more layers above them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README frames it as a “memory operating system,” not a single feature.

The Six Layers, From Files to Vectors

Layer 1 is Workspace. It holds MEMORY.md, USER.md, and CREATIVE.md, injected into the system prompt every turn.
Layer 2 is Sessions. It uses state.db, a SQLite database with FTS5 full-text search across conversation history.
Layer 3 is Structured Facts. It stores durable facts in memory_store.db, using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts these trust scores over time, alongside entity resolution.
Layer 4 is Fabric, a heavily forked version of the Icarus Plugin. This fork adds LLM-powered session extraction over the upstream esaradev/icarus-plugin. It handles cross-session recall through 16 tools, including fabric_recall, fabric_write, and fabric_brief.
Layer 5 is the Vector Database, built on Qdrant. It uses 4096d Cosine vectors plus BM25 sparse search, a keyword-style ranking method.
Layer 6 is an LLM Wiki, an auto-curated vault of concepts, entities, and comparisons. That wiki is continuously ingested back into Qdrant through a process called wiki-continuous-ingest.
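
To make the Sessions layer concrete, here is a minimal sketch of FTS5 full-text search over conversation history using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions for illustration only; the article does not describe the actual schema of state.db, and the sketch assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_session_index(rows):
    """Create an in-memory FTS5 index from (session_id, role, content) rows.

    The 'messages' table and its columns are hypothetical, not the
    real state.db schema.
    """
    conn = sqlite3.connect(":memory:")
    # session_id is stored but not tokenized, so queries only match text.
    conn.execute(
        "CREATE VIRTUAL TABLE messages "
        "USING fts5(session_id UNINDEXED, role, content)"
    )
    conn.executemany(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def search_sessions(conn, query, limit=5):
    """Return (session_id, content) pairs matching an FTS5 query, best first."""
    cur = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()
```

Calling search_sessions(conn, "qdrant") on an index built this way returns only the rows whose text mentions Qdrant, ranked by FTS5's built-in relevance score.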

How the Retrieval Flow Works

The flow hinges on when memory is read and written. On pre_llm_call, Memory OS runs what it calls surgical recall. It pulls from four sources at once: Fabric, Qdrant, Sessions, and Facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication stops the same context from appearing twice. A social-closer filter skips trivial messages, such as a plain “thanks.” On post_llm_call and on_session_end, the system extracts and captures new learnings automatically. The stated goal is token efficiency, not stuffing the context window.
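
The read path described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the Candidate and SessionState types, the threshold values, the list of social closers, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical list of trivial messages that should not trigger recall.
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "okay", "bye", "got it"}


@dataclass
class Candidate:
    source: str   # e.g. "fabric", "qdrant", "sessions", "facts"
    text: str
    score: float  # relevance score, assumed to lie in [0, 1]


@dataclass
class SessionState:
    seen: set = field(default_factory=set)  # normalized texts already injected


def is_social_closer(message):
    """True for trivial messages such as a plain 'thanks'."""
    return message.strip().lower().rstrip("!.") in SOCIAL_CLOSERS


def surgical_recall(message, candidates, state, thresholds, default_threshold=0.5):
    """Gate candidates per source, then drop anything already seen this session."""
    if is_social_closer(message):
        return []
    kept = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        # Relevance gate: each source may have its own threshold.
        if cand.score < thresholds.get(cand.source, default_threshold):
            continue
        # Per-session dedup on a normalized form of the text.
        key = cand.text.strip().lower()
        if key in state.seen:
            continue
        state.seen.add(key)
        kept.append(cand)
    return kept
```

Because the SessionState persists across turns, a memory injected once is not injected again later in the same session, which is where the token savings in this sketch come from.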

The Fallback Cascade and Cleanup

Layer 5’s retrieval uses a four-level fallback. It tries hybrid search first, then dense vectors, then lexical, then SQLite. If one method fails or returns nothing, the next takes over. This design keeps recall working even when the vector database struggles. Memory OS also runs a weekly decay scanner to age out stale entries. Semantic dedup merges near-identical memories when cosine similarity exceeds 0.92. These housekeeping steps aim to stop memory from bloating over months of use.
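
A rough sketch of both ideas follows: a cascade that moves to the next backend when one fails or returns nothing, and a cosine-based dedup pass using the 0.92 threshold mentioned above. The function names and the backend callables are hypothetical, and "merging" is simplified here to keeping the first memory of each near-duplicate group; the project may combine entries differently.

```python
import math


def cascade_search(query, backends):
    """Try (name, search_fn) pairs in order; return the first non-empty result.

    backends would be ordered hybrid, dense, lexical, SQLite. A backend that
    raises or returns nothing hands over to the next one.
    """
    for name, search in backends:
        try:
            hits = search(query)
        except Exception:
            continue  # treat any backend failure as a miss
        if hits:
            return name, hits
    return None, []


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedup(memories, threshold=0.92):
    """Keep the first (text, vector) of each group whose similarity exceeds threshold."""
    kept = []
    for text, vec in memories:
        if any(cosine(vec, kept_vec) > threshold for _, kept_vec in kept):
            continue  # near-identical to something already kept
        kept.append((text, vec))
    return kept
```

The dedup pass compares each memory against everything already kept, which is quadratic in the worst case; a real deployment would more likely lean on the vector database's own nearest-neighbor search.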

Local-First, And Deliberately So

Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. Its pitch is that memory infrastructure should run on your own machine. The memory data stays local, with no memory subscription. LLM calls still go to whichever provider you choose. Hermes itself already supports eight external memory providers, including mem0 and Honcho. Memory OS isn’t one of those official providers. It is a separate, community-built stack layered on Hermes directly. For teams with data-residency rules, a local memory store can matter.

Strengths and Limitations

Strengths:

Clear layered design separating files, sessions, facts, vectors, and a wiki
Fully local infrastructure with no cloud memory subscription
Provider-agnostic, matching Hermes Agent’s own flexibility
Token-efficient retrieval by design, via gated sources and per-session deduplication

Limitations:

Brand new, with few commits
A forked Icarus Plugin that the author says isn’t upstream-compatible
Heavier setup: Docker, Qdrant, Redis, and an ARQ Worker all required
No published benchmarks on recall quality, latency, or token savings

Key Takeaways

Memory OS is a community-built, MIT-licensed stack that builds six memory layers on top of Hermes Agent.
It combines workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and an auto-curated LLM wiki.
Retrieval runs on pre_llm_call with gated, deduplicated recall from four sources; capture runs on post_llm_call and on_session_end.
Memory infrastructure is fully local and provider-agnostic, but LLM calls still go to your chosen provider.

The post Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.
