A Coding Implementation to Build Agent-Native Memory Infrastructure with Memori for Persistent Multi-User and Multi-Session LLM Applications

In this tutorial, we demonstrate how Memori serves as an agent-native memory infrastructure layer for building more persistent, context-aware LLM applications. We start by setting up Memori in a Google Colab environment and connecting it to both synchronous and asynchronous OpenAI clients, so that every model call can automatically pass through the memory layer. We then move on to practical examples that show how user data is stored, retrieved, and separated across different identities, agent roles, and sessions. We also test streaming responses, async calls, and a small customer-support agent workflow to understand how memory behaves in realistic multi-turn applications. By the end of the tutorial, we gain a clear understanding of how Memori helps us build AI agents that don't treat each conversation in isolation but instead retain useful context across interactions.

import subprocess, sys
def _pip(*pkgs):
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])
_pip("memori>=3.3.0", "openai>=1.40.0", "nest_asyncio")
import os, getpass, time, uuid, asyncio
import nest_asyncio; nest_asyncio.apply()
if not os.getenv("OPENAI_API_KEY"):
   os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")
if not os.getenv("MEMORI_API_KEY"):
   v = getpass.getpass("MEMORI_API_KEY (leave blank for rate-limited tier): ")
   if v.strip():
       os.environ["MEMORI_API_KEY"] = v.strip()
   else:
       print("→ No MEMORI_API_KEY set. Continuing with rate-limited tier.")

We install Memori, OpenAI, and nest_asyncio so the tutorial runs smoothly inside Google Colab. We load the required Python modules and prepare the notebook to handle async execution without runtime issues. We also collect the OpenAI API key and the optional Memori API key, allowing the workflow to run either with authenticated Memori access or on the rate-limited tier.

from memori import Memori
from openai import OpenAI, AsyncOpenAI
client       = OpenAI()
async_client = AsyncOpenAI()
mem = Memori()
mem.llm.register(client)
mem.llm.register(async_client)
MODEL        = "gpt-4o-mini"
WRITE_DELAY  = 6
def ask(prompt, system=None):
   msgs = []
   if system: msgs.append({"role": "system", "content": system})
   msgs.append({"role": "user", "content": prompt})
   r = client.chat.completions.create(model=MODEL, messages=msgs)
   return r.choices[0].message.content
def banner(t): print("\n" + "="*78 + f"\n {t}\n" + "="*78)

We import Memori and create both synchronous and asynchronous OpenAI clients for different LLM interaction patterns. We register both clients with Memori so that memory can automatically intercept and enrich chat completion calls. We also define a reusable ask() helper and a banner() function to keep the tutorial output clean and organized.

banner("Section 1 — Basic memory: facts persist across turns")
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("My name is Alice. I love hiking, Italian food, and I'm allergic to peanuts.")
time.sleep(WRITE_DELAY)
print("[Alice]", ask("What do you know about me? Be specific."))
banner("Section 2 — Multi-tenant memory: Bob's facts don't leak into Alice's recall")
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("I'm Bob. Vegetarian, write Rust for a living, live in Berlin.")
time.sleep(WRITE_DELAY)
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
print("[Alice]", ask("What's my favorite cuisine and any dietary issues?"))
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
print("[Bob]  ", ask("Which programming language do I write professionally?"))

We begin by testing basic memory persistence: Alice shares personal facts, and the model later recalls them. We then switch to Bob and store a separate set of details to demonstrate multi-tenant memory isolation. We return to Alice and Bob separately to confirm that each user's facts stay scoped to the correct entity.
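The attribution-switching pattern above can be wrapped in a small helper so every call is explicitly scoped to one tenant before it runs. This is a minimal sketch built only on the mem.attribution(entity_id=..., process_id=...) call and the ask() helper shown earlier; the scoped_ask name is ours, not part of the Memori API.

```python
def scoped_ask(mem, ask_fn, entity_id, process_id, prompt):
    """Route a prompt through a specific user/agent scope.

    Sets Memori attribution immediately before the call so one
    tenant's facts never leak into another tenant's recall.
    mem     : a Memori instance (assumed to expose .attribution)
    ask_fn  : any callable taking a prompt and returning a reply
    """
    mem.attribution(entity_id=entity_id, process_id=process_id)
    return ask_fn(prompt)
```

With this helper, the Alice/Bob checks in Section 2 collapse to one-liners, and forgetting to switch attribution between users becomes impossible by construction.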

banner("Section 3 — Same user, different agent personas via process_id")
mem.attribution(entity_id="[email protected]", process_id="fitness-coach")
ask("Goal: sub-25-minute 5K by June. Currently I run 30 minutes flat.")
time.sleep(WRITE_DELAY)
mem.attribution(entity_id="[email protected]", process_id="meal-planner")
ask("Prefer low-carb dinners on weekdays.")
time.sleep(WRITE_DELAY)
mem.attribution(entity_id="[email protected]", process_id="fitness-coach")
print("[fitness-coach]", ask("Remind me of my running goal."))
mem.attribution(entity_id="[email protected]", process_id="meal-planner")
print("[meal-planner] ", ask("Suggest tonight's dinner."))
banner("Section 4 — Sessions group related turns")
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
project_session = f"project-fastapi-{uuid.uuid4().hex[:8]}"
mem.set_session(project_session)
ask("Notes: building a FastAPI app called 'Lighthouse', Python 3.12, "
   "deploying to Fly.io.")
time.sleep(WRITE_DELAY)
ask("Decision: SQLAlchemy + Alembic for the data layer.")
time.sleep(WRITE_DELAY)
mem.new_session()
ask("Random aside: I just adopted a puppy named Mochi.")
time.sleep(WRITE_DELAY)
mem.set_session(project_session)
print("[project session]",
     ask("Summarize what we've decided about Lighthouse so far."))

We show how the same user can have different memories across different agent personas using separate process_id values. We store Alice's fitness goal under a fitness coach and her dinner preference under a meal planner, then verify that each agent recalls only its relevant context. We also create a project-specific session for a FastAPI app and show how session management keeps related project decisions separate from unrelated personal details.
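The set_session / new_session pair above can also be packaged as a context manager that activates a named session for a block of calls and rotates to a fresh session on exit. A minimal sketch, assuming only the two session methods used in this tutorial; session_scope is a hypothetical helper, not part of Memori.

```python
import uuid
from contextlib import contextmanager

@contextmanager
def session_scope(mem, session_id=None):
    """Activate a named Memori session for the enclosed calls.

    On exit, starts a fresh session so later turns don't keep
    writing into the scoped one. If no session_id is given, a
    short random one is generated.
    """
    sid = session_id or f"session-{uuid.uuid4().hex[:8]}"
    mem.set_session(sid)
    try:
        yield sid
    finally:
        mem.new_session()
```

The Lighthouse example would then read as `with session_scope(mem, project_session): ...`, making it harder to forget the new_session() call that keeps the puppy aside out of the project context.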

banner("Section 5 — Streaming")
mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
stream = client.chat.completions.create(
   model=MODEL,
   messages=[{"role": "user",
              "content": "In two sentences, what do you remember about me?"}],
   stream=True,
)
print("[stream] ", end="")
for chunk in stream:
   d = chunk.choices[0].delta.content
   if d: print(d, end="", flush=True)
print(); time.sleep(WRITE_DELAY)
banner("Section 6 — Async LLM calls")
async def async_demo():
   r = await async_client.chat.completions.create(
       model=MODEL,
       messages=[{"role": "user",
                  "content": "What dietary restriction do I have? (asked async)"}],
   )
   return r.choices[0].message.content
print("[async]", asyncio.run(async_demo()))
banner("Section 7 — Mini support agent across multiple sessions")
def support(user_id, prompt):
   mem.attribution(entity_id=user_id, process_id="support-bot")
   return ask(prompt, system=(
       "You are a calm, helpful customer support agent. "
       "Use what you remember about the user. If you don't know, say so."
   ))
USER = "[email protected]"
mem.attribution(entity_id=USER, process_id="support-bot")
mem.new_session()
print("[support T1]", support(USER,
   "Hi! I'm Charlie, on the Pro plan. Email: [email protected]. "
   "Billing question for next month."))
time.sleep(WRITE_DELAY)
mem.new_session()
print("[support T2]", support(USER,
   "Hey, me again. What plan am I on and what's my email of record?"))
banner("Done. Open https://app.memorilabs.ai to inspect memories, "
      "or use Memori BYODB to point at your own Postgres.")

We test Memori with streaming responses to confirm that memory continues working when tokens arrive incrementally. We then run an asynchronous OpenAI call and verify that the async client can also access stored user context. Finally, we build a mini support-agent flow that remembers Charlie's plan and email across separate sessions, demonstrating how Memori supports realistic, long-term customer interactions.
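Because the async client is registered with Memori as well, several memory-aware prompts can be fanned out concurrently with asyncio.gather. A minimal sketch: ask_many is our own helper, and async_ask stands in for any coroutine that wraps the registered AsyncOpenAI client (e.g. an async version of the ask() helper above).

```python
import asyncio

async def ask_many(async_ask, prompts):
    """Run several memory-aware prompts concurrently.

    async_ask : a coroutine function that takes a prompt string
                and returns the model's reply.
    Returns the replies in the same order as the prompts.
    """
    return await asyncio.gather(*(async_ask(p) for p in prompts))
```

In a notebook with nest_asyncio applied, this runs via asyncio.run(ask_many(...)); keep in mind that the WRITE_DELAY pause is still needed afterward before querying what the calls wrote to memory.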

In conclusion, we built and tested a complete Memori-powered memory workflow for LLM agents. We saw how Memori stores basic user preferences, keeps Alice's and Bob's memories isolated, and allows the same user to maintain different memories across separate agent personas, such as a fitness coach and a meal planner. We also explored how sessions help us group project-specific conversations, while unrelated details stay outside the active session context. Beyond basic recall, we verified that Memori continues to work with streaming outputs, asynchronous OpenAI calls, and a mini support-agent scenario in which a user's plan and email are remembered across new conversations. In doing so, we created a practical foundation for building personalized AI assistants, support bots, workflow agents, and multi-agent systems that remember important context while keeping memory organized, scoped, and reusable.




The post A Coding Implementation to Build Agent-Native Memory Infrastructure with Memori for Persistent Multi-User and Multi-Session LLM Applications appeared first on MarkTechPost.
