Build a Hybrid-Memory Autonomous Agent with Modular Architecture and Tool Dispatch Using OpenAI
In this tutorial, we explore the structure of a hybrid-memory autonomous agent. The system combines semantic vector search, keyword-based retrieval, and a modular tool-dispatching loop to create an agent that can reason, remember, and act autonomously. We walk through every layer of the design from the bottom up, beginning with abstract interfaces that enforce a clean separation of concerns, all the way to a live agent that manages its own long-term memory.
!pip install openai numpy rank_bm25 --quiet

import os, json, math, re, time, getpass
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np
from rank_bm25 import BM25Okapi
from openai import OpenAI

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or getpass.getpass("\nEnter your OpenAI API key (hidden): ")
client = OpenAI(api_key=OPENAI_API_KEY)
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"
print("\nOpenAI client ready.")
We kick things off by installing the required dependencies and configuring our Python environment with the necessary imports. We collect the OpenAI API key with getpass so the key is never echoed to the terminal or notebook output. We also define the two global constants, the embedding model and the chat model, that every subsequent snippet depends on.
class MemoryBackend(ABC):
    @abstractmethod
    def store(self, text: str, metadata: Dict[str, Any]) -> str: ...
    @abstractmethod
    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def list_all(self) -> List[Dict[str, Any]]: ...


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: List[Dict], tools: Optional[List] = None) -> Dict: ...


class Tool(ABC):
    name: str
    description: str
    @abstractmethod
    def run(self, **kwargs) -> str: ...
    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        }


@dataclass
class MemoryChunk:
    id: str
    text: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = field(default=None, repr=False)


def _embed(texts: List[str]) -> List[np.ndarray]:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vecs = [np.array(d.embedding, dtype=np.float32) for d in resp.data]
    return [v / (np.linalg.norm(v) + 1e-10) for v in vecs]


def _tokenise(text: str) -> List[str]:
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


class HybridMemory(MemoryBackend):
    RRF_K = 60

    def __init__(self):
        self._chunks: List[MemoryChunk] = []
        self._bm25: Optional[BM25Okapi] = None
        self._counter = 0

    def store(self, text: str, metadata: Dict[str, Any] | None = None) -> str:
        metadata = metadata or {}
        self._counter += 1
        chunk_id = f"mem_{self._counter:04d}"
        [vec] = _embed([text])
        chunk = MemoryChunk(id=chunk_id, text=text, metadata=metadata, embedding=vec)
        self._chunks.append(chunk)
        corpus = [_tokenise(c.text) for c in self._chunks]
        self._bm25 = BM25Okapi(corpus)
        print(f"\nStored [{chunk_id}]: {text[:60]}…" if len(text) > 60 else f"\nStored [{chunk_id}]: {text}")
        return chunk_id

    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        if not self._chunks:
            return []
        n = len(self._chunks)
        top_k = min(top_k, n)
        [q_vec] = _embed([query])
        cos_scores = np.array([np.dot(q_vec, c.embedding) for c in self._chunks])
        vec_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-cos_scores))}
        bm25_scores = self._bm25.get_scores(_tokenise(query))
        kw_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-bm25_scores))}
        rrf: Dict[str, float] = {}
        for chunk in self._chunks:
            cid = chunk.id
            rrf[cid] = (1.0 / (self.RRF_K + vec_ranks.get(cid, n + 1)) +
                        1.0 / (self.RRF_K + kw_ranks.get(cid, n + 1)))
        ranked_ids = sorted(rrf, key=lambda x: rrf[x], reverse=True)[:top_k]
        results = []
        ids = [c.id for c in self._chunks]
        for cid in ranked_ids:
            chunk = next(c for c in self._chunks if c.id == cid)
            results.append({
                "id": chunk.id,
                "text": chunk.text,
                "metadata": chunk.metadata,
                "rrf_score": round(rrf[cid], 6),
                "cosine": round(float(cos_scores[ids.index(cid)]), 4),
                "bm25": round(float(bm25_scores[ids.index(cid)]), 4),
            })
        return results

    def list_all(self) -> List[Dict[str, Any]]:
        return [{"id": c.id, "text": c.text, "metadata": c.metadata} for c in self._chunks]


class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = CHAT_MODEL, temperature: float = 0.2):
        self.model = model
        self.temperature = temperature

    def complete(self, messages: List[Dict], tools: Optional[List] = None) -> Dict:
        kwargs: Dict[str, Any] = dict(model=self.model, messages=messages, temperature=self.temperature)
        if tools:
            kwargs["tools"] = tools
            kwargs["tool_choice"] = "auto"
        response = client.chat.completions.create(**kwargs)
        msg = response.choices[0].message
        result: Dict[str, Any] = {"role": "assistant", "content": msg.content or ""}
        if msg.tool_calls:
            result["tool_calls"] = [
                {
                    "id": tc.id,
                    "type": "function",
                    "function": {"name": tc.function.name, "arguments": tc.function.arguments},
                }
                for tc in msg.tool_calls
            ]
        return result

print("\nInterfaces, HybridMemory, and OpenAIProvider ready.")
We define the three core abstract base classes, MemoryBackend, LLMProvider, and Tool, which serve as the interface contracts every concrete component must honour. We then implement HybridMemory, which stores embeddings for vector search and maintains a live BM25 index for keyword matching, merging both result sets with Reciprocal Rank Fusion. We close the snippet with OpenAIProvider, a concrete LLMProvider that normalises the OpenAI response into a provider-agnostic dictionary the agent can consume without knowing which model sits underneath.
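The fusion step is worth isolating, because Reciprocal Rank Fusion only needs each document's rank in each list, never the raw scores, which is exactly why it can combine cosine similarity and BM25 without any score calibration. Here is a minimal standalone sketch of the same formula (the document IDs and orderings are made up for illustration; k=60 matches RRF_K above):

```python
# Minimal Reciprocal Rank Fusion sketch: fuse two ranked lists of doc IDs.
# IDs and orderings below are invented for illustration; k=60 matches RRF_K.

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked ID lists (best first). Returns IDs sorted by fused score."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1/(k + rank); high-but-not-top in both lists can win.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking = ["mem_0003", "mem_0001", "mem_0002"]   # semantic order
keyword_ranking = ["mem_0002", "mem_0003", "mem_0001"]  # BM25 order
print(rrf_fuse([vector_ranking, keyword_ranking]))
# → ['mem_0003', 'mem_0002', 'mem_0001']
```

Note how mem_0003, ranked first and second, beats mem_0002, which is first in one list but last in the other; this rank-consensus behaviour is what makes RRF a robust default for hybrid retrieval.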
class MemoryStoreTool(Tool):
    name = "memory_store"
    description = "Save an important fact or piece of information to long-term memory."

    def __init__(self, memory: MemoryBackend):
        self._mem = memory

    def run(self, text: str, category: str = "general") -> str:
        chunk_id = self._mem.store(text, {"category": category})
        return f"Stored as {chunk_id}."

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string", "description": "The fact to remember."},
                        "category": {"type": "string", "description": "Category tag, e.g. 'user_pref', 'task', 'fact'."},
                    },
                    "required": ["text"],
                },
            },
        }


class MemorySearchTool(Tool):
    name = "memory_search"
    description = "Search long-term memory for information relevant to a query."

    def __init__(self, memory: MemoryBackend):
        self._mem = memory

    def run(self, query: str, top_k: int = 3) -> str:
        results = self._mem.search(query, top_k=top_k)
        if not results:
            return "No relevant memories found."
        lines = [f"[{r['id']}] (score={r['rrf_score']}) {r['text']}" for r in results]
        return "Relevant memories:\n" + "\n".join(lines)

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "What to search for."},
                        "top_k": {"type": "integer", "description": "Max results (default 3)."},
                    },
                    "required": ["query"],
                },
            },
        }


class CalculatorTool(Tool):
    name = "calculator"
    description = "Evaluate a safe mathematical expression, e.g. '2 ** 10 + sqrt(144)'."

    def run(self, expression: str) -> str:
        allowed = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
        allowed.update({"abs": abs, "round": round})
        try:
            result = eval(expression, {"__builtins__": {}}, allowed)
            return str(result)
        except Exception as exc:
            return f"Error: {exc}"

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {"type": "string", "description": "Math expression to evaluate."},
                    },
                    "required": ["expression"],
                },
            },
        }


class WebSnippetTool(Tool):
    name = "web_search"
    description = "Search the web for current information on a topic (simulated)."
    _KB = {
        "openai": "OpenAI is an AI safety company that develops the GPT family of models.",
        "rag": "Retrieval-Augmented Generation (RAG) combines a retrieval system with an LLM to ground answers in external documents.",
        "bm25": "BM25 (Best Match 25) is a probabilistic keyword ranking function used in search engines.",
    }

    def run(self, query: str) -> str:
        q = query.lower()
        for kw, snippet in self._KB.items():
            if kw in q:
                return f"Web snippet for '{query}': {snippet}"
        return f"No snippet found for '{query}'. (Mock tool — integrate a real search API here.)"

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query."},
                    },
                    "required": ["query"],
                },
            },
        }


@dataclass
class AgentPersona:
    name: str
    role: str
    traits: List[str]
    forbidden_phrases: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)

    def compile_system_prompt(self, extra_context: str = "") -> str:
        lines = [
            f"You are {self.name}, {self.role}.",
            "",
            "## Core Traits",
            *[f"- {t}" for t in self.traits],
        ]
        if self.goals:
            lines += ["", "## Goals", *[f"- {g}" for g in self.goals]]
        if self.forbidden_phrases:
            lines += ["", "## Forbidden Phrases (never say these)", *[f'- "{p}"' for p in self.forbidden_phrases]]
        if extra_context:
            lines += ["", "## Live Context", extra_context]
        lines += [
            "",
            "## Behaviour",
            "- Always reason step-by-step before answering.",
            "- Use available tools proactively; never guess when you can look up.",
            "- After using memory_search, quote the retrieved ID in your answer.",
            "- Keep answers concise unless depth is explicitly requested.",
        ]
        return "\n".join(lines)


ARIA = AgentPersona(
    name="Aria",
    role="a precise, helpful research assistant with a hybrid memory system",
    traits=["Methodical", "Curious", "Transparent about uncertainty", "Concise"],
    goals=[
        "Remember and connect information across conversations",
        "Use tools whenever they can improve accuracy",
    ],
    forbidden_phrases=["I cannot", "As an AI language model"],
)

print("\nTools and AgentPersona ready.")
We implement four tools, MemoryStoreTool, MemorySearchTool, CalculatorTool, and WebSnippetTool, each implementing the Tool interface and exposing an OpenAI-compatible JSON schema for automatic function invocation. We then introduce AgentPersona, a dataclass that compiles traits, goals, and forbidden phrases into a fully deterministic system prompt at runtime. We instantiate our demo persona, Aria, whose compiled prompt is injected at the top of every conversation turn to ensure a consistent identity across all interactions.
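The calculator's safety hinges on one detail worth understanding in isolation: passing an empty `__builtins__` to `eval` removes access to `open`, `__import__`, and friends, while the locals dict whitelists only math names. A standalone sketch of the same restriction (outside the class, same logic as CalculatorTool.run) shows what gets through and what is blocked:

```python
import math

def safe_eval(expression: str) -> str:
    # Whitelist public math functions/constants; empty __builtins__ blocks
    # dangerous names such as open(), __import__(), and exec().
    allowed = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
    allowed.update({"abs": abs, "round": round})
    try:
        return str(eval(expression, {"__builtins__": {}}, allowed))
    except Exception as exc:
        return f"Error: {exc}"

print(safe_eval("2 ** 10 + sqrt(144)"))  # → 1036.0
print(safe_eval("__import__('os')"))     # blocked: returns an "Error: ..." string
```

This is still `eval`, so treat it as a demo-grade sandbox rather than a hardened one; for production use, an expression parser such as a small AST-based evaluator is the safer choice.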
class AutonomousAgent:
    MAX_TOOL_ROUNDS = 8

    def __init__(self, persona: AgentPersona, llm: LLMProvider, memory: MemoryBackend, tools: List[Tool]):
        self.persona = persona
        self._llm = llm
        self._memory = memory
        self._tools = {t.name: t for t in tools}
        self._history: List[Dict] = []

    def chat(self, user_message: str, verbose: bool = True) -> str:
        if verbose:
            print(f"\n{'═'*60}")
            print(f"USER: {user_message}")
            print(f"{'═'*60}")
        memory_context = self._build_memory_context(user_message)
        system_prompt = self.persona.compile_system_prompt(memory_context)
        messages = [{"role": "system", "content": system_prompt}]
        messages += self._history
        messages.append({"role": "user", "content": user_message})
        tool_schemas = [t.schema() for t in self._tools.values()]
        for round_num in range(self.MAX_TOOL_ROUNDS):
            reply = self._llm.complete(messages, tools=tool_schemas if tool_schemas else None)
            if "tool_calls" not in reply:
                final_text = reply["content"]
                if verbose:
                    print(f"\nARIA: {final_text}")
                self._history.append({"role": "user", "content": user_message})
                self._history.append({"role": "assistant", "content": final_text})
                return final_text
            messages.append(reply)
            for tc in reply["tool_calls"]:
                tool_name = tc["function"]["name"]
                try:
                    args = json.loads(tc["function"]["arguments"])
                except json.JSONDecodeError:
                    args = {}
                if verbose:
                    print(f"\nTOOL CALL → {tool_name}({args})")
                result = self._tools[tool_name].run(**args) if tool_name in self._tools else f"Error: unknown tool '{tool_name}'."
                if verbose:
                    print(f"  ↳ RESULT: {result}")
                messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
        return "[Agent reached tool round limit — please rephrase your request.]"

    def register_tool(self, tool: Tool) -> None:
        self._tools[tool.name] = tool
        print(f"\nTool registered: {tool.name}")

    def list_tools(self) -> List[str]:
        return list(self._tools.keys())

    def memory_dump(self) -> List[Dict]:
        return self._memory.list_all()

    def clear_history(self) -> None:
        self._history.clear()

    def _build_memory_context(self, query: str) -> str:
        results = self._memory.search(query, top_k=3)
        if not results:
            return ""
        snippets = "\n".join(f"- [{r['id']}] {r['text']}" for r in results)
        return f"Recalled memories related to this query:\n{snippets}"


memory = HybridMemory()
llm = OpenAIProvider(model=CHAT_MODEL)
tools = [
    MemoryStoreTool(memory),
    MemorySearchTool(memory),
    CalculatorTool(),
    WebSnippetTool(),
]
agent = AutonomousAgent(persona=ARIA, llm=llm, memory=memory, tools=tools)
print(f"\nAgent '{ARIA.name}' bootstrapped with tools: {agent.list_tools()}")
We build the AutonomousAgent class, which owns the agentic loop: it repeatedly sends messages to the LLM, detects tool calls, dispatches them to the right tool, and feeds the results back until a plain-text answer is produced. We wire all prior components, HybridMemory, OpenAIProvider, the four tools, and the Aria persona, into a single bootstrapped agent instance ready to receive user messages. We also expose utility methods, such as register_tool for runtime hot-swapping and memory_dump for inspecting the full state of long-term memory.
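Because the loop only consumes the normalised dictionary shape that OpenAIProvider emits, any provider honouring that contract can drive it. The cycle can be exercised without an API key using a scripted stub provider; everything here (the stub, the arithmetic, the reply text) is invented for illustration and mirrors the loop structure above in miniature:

```python
import json

class StubLLM:
    """Scripted provider: first requests the calculator tool, then answers."""
    def __init__(self):
        self._turn = 0
    def complete(self, messages, tools=None):
        self._turn += 1
        if self._turn == 1:
            # Same normalised shape OpenAIProvider returns for a tool call.
            return {"role": "assistant", "content": "",
                    "tool_calls": [{"id": "call_1", "type": "function",
                                    "function": {"name": "calculator",
                                                 "arguments": json.dumps({"expression": "6.5 * 22"})}}]}
        tool_result = messages[-1]["content"]  # tool output appended last round
        return {"role": "assistant", "content": f"Alice has {tool_result} hours left."}

def run_agent(llm, tools, user_message, max_rounds=8):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = llm.complete(messages)
        if "tool_calls" not in reply:          # plain text → done
            return reply["content"]
        messages.append(reply)                 # keep the assistant turn
        for tc in reply["tool_calls"]:         # dispatch each requested tool
            args = json.loads(tc["function"]["arguments"])
            result = tools[tc["function"]["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    return "[round limit reached]"

tools = {"calculator": lambda expression: str(eval(expression, {"__builtins__": {}}, {}))}
print(run_agent(StubLLM(), tools, "How many hours does Alice have left?"))
# → Alice has 143.0 hours left.
```

Swapping StubLLM for any real provider is the whole point of the LLMProvider seam: the loop, dispatch, and message bookkeeping stay untouched.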
print("\n" + "═"*60)
print("DEMO 1 — Pre-seeding long-term memory")
print("═"*60)
facts = [
    ("Alice's favourite programming language is Rust.", {"category": "user_pref"}),
    ("Alice is working on a distributed key-value store called 'VelocityDB'.", {"category": "task"}),
    ("VelocityDB uses the Raft consensus algorithm for replication.", {"category": "fact"}),
    ("Alice has a meeting with the infrastructure team on Friday at 2 PM.", {"category": "calendar"}),
    ("The project deadline for VelocityDB v1.0 is March 31.", {"category": "task"}),
    ("Alice prefers concise answers without unnecessary preamble.", {"category": "user_pref"}),
    ("Order #4821 was placed by Alice for 32 GB of DDR5 RAM modules.", {"category": "order"}),
]
for text, meta in facts:
    memory.store(text, meta)

print("\n" + "═"*60)
print("DEMO 2 — Hybrid Memory Search Showdown")
print("═"*60)
test_queries = [
    "What consensus algorithm does VelocityDB use?",
    "order 4821",
    "Alice's language preference",
]
for q in test_queries:
    print(f"\nQuery: '{q}'")
    results = memory.search(q, top_k=2)
    for r in results:
        print(f"  [{r['id']}] cosine={r['cosine']:.3f} bm25={r['bm25']:.2f} rrf={r['rrf_score']:.5f}")
        print(f"    → {r['text']}")

print("\n" + "═"*60)
print("DEMO 3 — Autonomous Agent Conversations")
print("═"*60)
agent.chat("What do you know about Alice's project? What's the deadline and which algorithm does it rely on?")
agent.chat("Can you find the details on order number 4821?")
agent.chat(
    "There are 22 working days until March 31. "
    "If Alice works 6.5 hours per day on VelocityDB, "
    "how many total hours does she have left?"
)
agent.chat(
    "Alice just decided to switch the storage engine of VelocityDB from LSM-tree to B-tree. "
    "Please remember this decision."
)
agent.chat("What storage engine decision did Alice make for VelocityDB?")

print("\n" + "═"*60)
print("DEMO 4 — Runtime Tool Hot-Swap (vtable pattern)")
print("═"*60)
class UpgradedWebSnippetTool(WebSnippetTool):
    _KB = {
        **WebSnippetTool._KB,
        "lsm-tree": "An LSM-tree (Log-Structured Merge-tree) optimises write throughput at the cost of read amplification.",
    }
agent.register_tool(UpgradedWebSnippetTool())
agent.chat(
    "Can you search the web for a brief explanation of B-tree storage engines "
    "and tell me if it is a good fit for Alice's project?"
)

print("\n" + "═"*60)
print("FINAL — Full Memory Dump")
print("═"*60)
for chunk in agent.memory_dump():
    print(f"  [{chunk['id']}] ({chunk['metadata'].get('category','?')}) {chunk['text']}")
We run four progressive demo scenarios that exercise every layer of the architecture we have built: seeding long-term memory with structured facts, running direct hybrid search queries to observe how vector and BM25 scores combine, conducting a multi-turn autonomous conversation in which the agent recalls, computes, and stores facts on its own, and finally hot-swapping a tool at runtime to demonstrate the vtable pattern in action. We close by dumping the full memory state to verify that every autonomously stored decision was correctly persisted.
In conclusion, we have walked through the complete construction of a hybrid-memory autonomous agent, from abstract interface contracts and dual-path retrieval all the way to a self-directing agent loop that stores, recalls, and reasons over facts without any hard-coded logic. We have seen how the modular design allows any component, whether the memory backend, the language-model provider, or an individual tool, to be swapped or extended at runtime with no changes to the agent core, a property that makes the architecture straightforward to adapt toward production use.
The publish Build a Hybrid-Memory Autonomous Agent with Modular Architecture and Tool Dispatch Using OpenAI appeared first on MarkTechPost.
