Build a Hybrid-Memory Autonomous Agent with Modular Architecture and Tool Dispatch Using OpenAI
In this tutorial, we explore the structure of a hybrid-memory autonomous agent. The system combines semantic vector search, keyword-based retrieval, and a modular tool-dispatching loop to create an agent that can reason, remember, and act autonomously. We walk through every layer of the design from the bottom up, beginning with abstract interfaces that enforce a clean separation of concerns, all the way to a live agent that manages its own long-term memory.
!pip install openai numpy rank_bm25 --quiet

import os, json, math, re, time, getpass
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np
from rank_bm25 import BM25Okapi
from openai import OpenAI

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or getpass.getpass("\nEnter your OpenAI API key (hidden): ")
client = OpenAI(api_key=OPENAI_API_KEY)
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"
print("\nOpenAI client ready.")
We kick things off by installing the required dependencies and configuring our Python environment with the necessary imports. We collect the OpenAI API key with getpass so the key is never echoed to the terminal or notebook output. We also define the two global constants, the embedding model and the chat model, that every subsequent snippet depends on.
class MemoryBackend(ABC):
    @abstractmethod
    def store(self, text: str, metadata: Dict[str, Any]) -> str: ...
    @abstractmethod
    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def list_all(self) -> List[Dict[str, Any]]: ...


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: List[Dict], tools: Optional[List] = None) -> Dict: ...


class Tool(ABC):
    name: str
    description: str
    @abstractmethod
    def run(self, **kwargs) -> str: ...
    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {"type": "object", "properties": {}, "required": []},
            },
        }


@dataclass
class MemoryChunk:
    id: str
    text: str
    metadata: Dict[str, Any]
    embedding: Optional[np.ndarray] = field(default=None, repr=False)


def _embed(texts: List[str]) -> List[np.ndarray]:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vecs = [np.array(d.embedding, dtype=np.float32) for d in resp.data]
    return [v / (np.linalg.norm(v) + 1e-10) for v in vecs]


def _tokenise(text: str) -> List[str]:
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


class HybridMemory(MemoryBackend):
    RRF_K = 60

    def __init__(self):
        self._chunks: List[MemoryChunk] = []
        self._bm25: Optional[BM25Okapi] = None
        self._counter = 0

    def store(self, text: str, metadata: Dict[str, Any] | None = None) -> str:
        metadata = metadata or {}
        self._counter += 1
        chunk_id = f"mem_{self._counter:04d}"
        [vec] = _embed([text])
        chunk = MemoryChunk(id=chunk_id, text=text, metadata=metadata, embedding=vec)
        self._chunks.append(chunk)
        corpus = [_tokenise(c.text) for c in self._chunks]
        self._bm25 = BM25Okapi(corpus)
        print(f"\nStored [{chunk_id}]: {text[:60]}…" if len(text) > 60 else f"\nStored [{chunk_id}]: {text}")
        return chunk_id

    def search(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        if not self._chunks:
            return []
        n = len(self._chunks)
        top_k = min(top_k, n)
        [q_vec] = _embed([query])
        cos_scores = np.array([np.dot(q_vec, c.embedding) for c in self._chunks])
        vec_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-cos_scores))}
        bm25_scores = self._bm25.get_scores(_tokenise(query))
        kw_ranks = {self._chunks[i].id: rank + 1 for rank, i in enumerate(np.argsort(-bm25_scores))}
        rrf: Dict[str, float] = {}
        for chunk in self._chunks:
            cid = chunk.id
            rrf[cid] = (1.0 / (self.RRF_K + vec_ranks.get(cid, n + 1)) +
                        1.0 / (self.RRF_K + kw_ranks.get(cid, n + 1)))
        ranked_ids = sorted(rrf, key=lambda x: rrf[x], reverse=True)[:top_k]
        results = []
        ids = [c.id for c in self._chunks]
        for cid in ranked_ids:
            chunk = next(c for c in self._chunks if c.id == cid)
            results.append({
                "id": chunk.id,
                "text": chunk.text,
                "metadata": chunk.metadata,
                "rrf_score": round(rrf[cid], 6),
                "cosine": round(float(cos_scores[ids.index(cid)]), 4),
                "bm25": round(float(bm25_scores[ids.index(cid)]), 4),
            })
        return results

    def list_all(self) -> List[Dict[str, Any]]:
        return [{"id": c.id, "text": c.text, "metadata": c.metadata} for c in self._chunks]


class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = CHAT_MODEL, temperature: float = 0.2):
        self.model = model
        self.temperature = temperature

    def complete(self, messages: List[Dict], tools: Optional[List] = None) -> Dict:
        kwargs: Dict[str, Any] = dict(model=self.model, messages=messages, temperature=self.temperature)
        if tools:
            kwargs["tools"] = tools
            kwargs["tool_choice"] = "auto"
        response = client.chat.completions.create(**kwargs)
        msg = response.choices[0].message
        result: Dict[str, Any] = {"role": "assistant", "content": msg.content or ""}
        if msg.tool_calls:
            result["tool_calls"] = [
                {
                    "id": tc.id,
                    "type": "function",
                    "function": {"name": tc.function.name, "arguments": tc.function.arguments},
                }
                for tc in msg.tool_calls
            ]
        return result

print("\nInterfaces, HybridMemory, and OpenAIProvider ready.")
We define the three core abstract base classes, MemoryBackend, LLMProvider, and Tool, which serve as the interface contracts every concrete component must honour. We then implement HybridMemory, which stores embeddings for vector search and maintains a live BM25 index for keyword matching, merging both result sets with Reciprocal Rank Fusion. We close the snippet with OpenAIProvider, a concrete LLMProvider that normalises the OpenAI response into a provider-agnostic dictionary the agent can consume without knowing which model sits underneath.
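The fusion step is worth isolating, because Reciprocal Rank Fusion only needs each document's rank in each list, never the raw scores, which is exactly why it can combine cosine similarity and BM25 without any score calibration. Here is a minimal standalone sketch of the same formula (the document IDs and orderings are made up for illustration; k=60 matches RRF_K above):

```python
# Minimal Reciprocal Rank Fusion sketch: fuse two ranked lists of doc IDs.
# IDs and orderings below are invented for illustration; k=60 matches RRF_K.

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked ID lists (best first). Returns IDs sorted by fused score."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1/(k + rank); high-but-not-top in both lists can win.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking = ["mem_0003", "mem_0001", "mem_0002"]   # semantic order
keyword_ranking = ["mem_0002", "mem_0003", "mem_0001"]  # BM25 order
print(rrf_fuse([vector_ranking, keyword_ranking]))
# → ['mem_0003', 'mem_0002', 'mem_0001']
```

Note how mem_0003, ranked first and second, beats mem_0002, which is first in one list but last in the other; this rank-consensus behaviour is what makes RRF a robust default for hybrid retrieval.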
class MemoryStoreTool(Tool):
    name = "memory_store"
    description = "Save an important fact or piece of information to long-term memory."

    def __init__(self, memory: MemoryBackend):
        self._mem = memory

    def run(self, text: str, category: str = "general") -> str:
        chunk_id = self._mem.store(text, {"category": category})
        return f"Stored as {chunk_id}."

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string", "description": "The fact to remember."},
                        "category": {"type": "string", "description": "Category tag, e.g. 'user_pref', 'task', 'fact'."},
                    },
                    "required": ["text"],
                },
            },
        }


class MemorySearchTool(Tool):
    name = "memory_search"
    description = "Search long-term memory for information relevant to a query."

    def __init__(self, memory: MemoryBackend):
        self._mem = memory

    def run(self, query: str, top_k: int = 3) -> str:
        results = self._mem.search(query, top_k=top_k)
        if not results:
            return "No relevant memories found."
        lines = [f"[{r['id']}] (score={r['rrf_score']}) {r['text']}" for r in results]
        return "Relevant memories:\n" + "\n".join(lines)

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "What to search for."},
                        "top_k": {"type": "integer", "description": "Max results (default 3)."},
                    },
                    "required": ["query"],
                },
            },
        }


class CalculatorTool(Tool):
    name = "calculator"
    description = "Evaluate a safe mathematical expression, e.g. '2 ** 10 + sqrt(144)'."

    def run(self, expression: str) -> str:
        allowed = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
        allowed.update({"abs": abs, "round": round})
        try:
            result = eval(expression, {"__builtins__": {}}, allowed)
            return str(result)
        except Exception as exc:
            return f"Error: {exc}"

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {"type": "string", "description": "Math expression to evaluate."},
                    },
                    "required": ["expression"],
                },
            },
        }


class WebSnippetTool(Tool):
    name = "web_search"
    description = "Search the web for current information on a topic (simulated)."
    _KB = {
        "openai": "OpenAI is an AI safety company that develops the GPT family of models.",
        "rag": "Retrieval-Augmented Generation (RAG) combines a retrieval system with an LLM to ground answers in external documents.",
        "bm25": "BM25 (Best Match 25) is a probabilistic keyword ranking function used in search engines.",
    }

    def run(self, query: str) -> str:
        q = query.lower()
        for kw, snippet in self._KB.items():
            if kw in q:
                return f"Web snippet for '{query}': {snippet}"
        return f"No snippet found for '{query}'. (Mock tool — integrate a real search API here.)"

    def schema(self) -> Dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query."},
                    },
                    "required": ["query"],
                },
            },
        }


@dataclass
class AgentPersona:
    name: str
    role: str
    traits: List[str]
    forbidden_phrases: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)

    def compile_system_prompt(self, extra_context: str = "") -> str:
        lines = [
            f"You are {self.name}, {self.role}.",
            "",
            "## Core Traits",
            *[f"- {t}" for t in self.traits],
        ]
        if self.goals:
            lines += ["", "## Goals", *[f"- {g}" for g in self.goals]]
        if self.forbidden_phrases:
            lines += ["", "## Forbidden Phrases (never say these)", *[f'- "{p}"' for p in self.forbidden_phrases]]
        if extra_context:
            lines += ["", "## Live Context", extra_context]
        lines += [
            "",
            "## Behaviour",
            "- Always reason step-by-step before answering.",
            "- Use available tools proactively; never guess when you can look up.",
            "- After using memory_search, quote the retrieved ID in your answer.",
            "- Keep answers concise unless depth is explicitly requested.",
        ]
        return "\n".join(lines)


ARIA = AgentPersona(
    name="Aria",
    role="a precise, helpful research assistant with a hybrid memory system",
    traits=["Methodical", "Curious", "Transparent about uncertainty", "Concise"],
    goals=[
        "Remember and connect information across conversations",
        "Use tools whenever they can improve accuracy",
    ],
    forbidden_phrases=["I cannot", "As an AI language model"],
)

print("\nTools and AgentPersona ready.")
We implement four tools, MemoryStoreTool, MemorySearchTool, CalculatorTool, and WebSnippetTool, each implementing the Tool interface and exposing an OpenAI-compatible JSON schema for automatic function invocation. We then introduce AgentPersona, a dataclass that compiles traits, goals, and forbidden phrases into a fully deterministic system prompt at runtime. We instantiate our demo persona, Aria, whose compiled prompt is injected at the top of every conversation turn to ensure a consistent identity across all interactions.
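The calculator's safety hinges on one detail worth understanding in isolation: passing an empty `__builtins__` to `eval` removes access to `open`, `__import__`, and friends, while the locals dict whitelists only math names. A standalone sketch of the same restriction (outside the class, same logic as CalculatorTool.run) shows what gets through and what is blocked:

```python
import math

def safe_eval(expression: str) -> str:
    # Whitelist public math functions/constants; empty __builtins__ blocks
    # dangerous names such as open(), __import__(), and exec().
    allowed = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
    allowed.update({"abs": abs, "round": round})
    try:
        return str(eval(expression, {"__builtins__": {}}, allowed))
    except Exception as exc:
        return f"Error: {exc}"

print(safe_eval("2 ** 10 + sqrt(144)"))  # → 1036.0
print(safe_eval("__import__('os')"))     # blocked: returns an "Error: ..." string
```

This is still `eval`, so treat it as a demo-grade sandbox rather than a hardened one; for production use, an expression parser such as a small AST-based evaluator is the safer choice.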
class AutonomousAgent:
    MAX_TOOL_ROUNDS = 8

    def __init__(self, persona: AgentPersona, llm: LLMProvider, memory: MemoryBackend, tools: List[Tool]):
        self.persona = persona
        self._llm = llm
        self._memory = memory
        self._tools = {t.name: t for t in tools}
        self._history: List[Dict] = []

    def chat(self, user_message: str, verbose: bool = True) -> str:
        if verbose:
            print(f"\n{'═'*60}")
            print(f"USER: {user_message}")
            print(f"{'═'*60}")
        memory_context = self._build_memory_context(user_message)
        system_prompt = self.persona.compile_system_prompt(memory_context)
        messages = [{"role": "system", "content": system_prompt}]
        messages += self._history
        messages.append({"role": "user", "content": user_message})
        tool_schemas = [t.schema() for t in self._tools.values()]
        for round_num in range(self.MAX_TOOL_ROUNDS):
            reply = self._llm.complete(messages, tools=tool_schemas if tool_schemas else None)
            if "tool_calls" not in reply:
                final_text = reply["content"]
                if verbose:
                    print(f"\nARIA: {final_text}")
                self._history.append({"role": "user", "content": user_message})
                self._history.append({"role": "assistant", "content": final_text})
                return final_text
            messages.append(reply)
            for tc in reply["tool_calls"]:
                tool_name = tc["function"]["name"]
                try:
                    args = json.loads(tc["function"]["arguments"])
                except json.JSONDecodeError:
                    args = {}
                if verbose:
                    print(f"\nTOOL CALL → {tool_name}({args})")
                result = self._tools[tool_name].run(**args) if tool_name in self._tools else f"Error: unknown tool '{tool_name}'."
                if verbose:
                    print(f"  ↳ RESULT: {result}")
                messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
        return "[Agent reached tool round limit — please rephrase your request.]"

    def register_tool(self, tool: Tool) -> None:
        self._tools[tool.name] = tool
        print(f"\nTool registered: {tool.name}")

    def list_tools(self) -> List[str]:
        return list(self._tools.keys())

    def memory_dump(self) -> List[Dict]:
        return self._memory.list_all()

    def clear_history(self) -> None:
        self._history.clear()

    def _build_memory_context(self, query: str) -> str:
        results = self._memory.search(query, top_k=3)
        if not results:
            return ""
        snippets = "\n".join(f"- [{r['id']}] {r['text']}" for r in results)
        return f"Recalled memories related to this query:\n{snippets}"


memory = HybridMemory()
llm = OpenAIProvider(model=CHAT_MODEL)
tools = [
    MemoryStoreTool(memory),
    MemorySearchTool(memory),
    CalculatorTool(),
    WebSnippetTool(),
]
agent = AutonomousAgent(persona=ARIA, llm=llm, memory=memory, tools=tools)
print(f"\nAgent '{ARIA.name}' bootstrapped with tools: {agent.list_tools()}")
We build the AutonomousAgent class, which owns the agentic loop: it repeatedly sends messages to the LLM, detects tool calls, dispatches them to the right tool, and feeds the results back until a plain-text answer is produced. We wire all prior components, HybridMemory, OpenAIProvider, the four tools, and the Aria persona, into a single bootstrapped agent instance ready to receive user messages. We also expose utility methods, such as register_tool for runtime hot-swapping and memory_dump for inspecting the full state of long-term memory.
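Because the loop only consumes the normalised dictionary shape that OpenAIProvider emits, any provider honouring that contract can drive it. The cycle can be exercised without an API key using a scripted stub provider; everything here (the stub, the arithmetic, the reply text) is invented for illustration and mirrors the loop structure above in miniature:

```python
import json

class StubLLM:
    """Scripted provider: first requests the calculator tool, then answers."""
    def __init__(self):
        self._turn = 0
    def complete(self, messages, tools=None):
        self._turn += 1
        if self._turn == 1:
            # Same normalised shape OpenAIProvider returns for a tool call.
            return {"role": "assistant", "content": "",
                    "tool_calls": [{"id": "call_1", "type": "function",
                                    "function": {"name": "calculator",
                                                 "arguments": json.dumps({"expression": "6.5 * 22"})}}]}
        tool_result = messages[-1]["content"]  # tool output appended last round
        return {"role": "assistant", "content": f"Alice has {tool_result} hours left."}

def run_agent(llm, tools, user_message, max_rounds=8):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = llm.complete(messages)
        if "tool_calls" not in reply:          # plain text → done
            return reply["content"]
        messages.append(reply)                 # keep the assistant turn
        for tc in reply["tool_calls"]:         # dispatch each requested tool
            args = json.loads(tc["function"]["arguments"])
            result = tools[tc["function"]["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    return "[round limit reached]"

tools = {"calculator": lambda expression: str(eval(expression, {"__builtins__": {}}, {}))}
print(run_agent(StubLLM(), tools, "How many hours does Alice have left?"))
# → Alice has 143.0 hours left.
```

Swapping StubLLM for any real provider is the whole point of the LLMProvider seam: the loop, dispatch, and message bookkeeping stay untouched.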
print("\n" + "═"*60)
print("DEMO 1 — Pre-seeding long-term memory")
print("═"*60)
facts = [
    ("Alice's favourite programming language is Rust.", {"category": "user_pref"}),
    ("Alice is working on a distributed key-value store called 'VelocityDB'.", {"category": "task"}),
    ("VelocityDB uses the Raft consensus algorithm for replication.", {"category": "fact"}),
    ("Alice has a meeting with the infrastructure team on Friday at 2 PM.", {"category": "calendar"}),
    ("The project deadline for VelocityDB v1.0 is March 31.", {"category": "task"}),
    ("Alice prefers concise answers without unnecessary preamble.", {"category": "user_pref"}),
    ("Order #4821 was placed by Alice for 32 GB of DDR5 RAM modules.", {"category": "order"}),
]
for text, meta in facts:
    memory.store(text, meta)

print("\n" + "═"*60)
print("DEMO 2 — Hybrid Memory Search Showdown")
print("═"*60)
test_queries = [
    "What consensus algorithm does VelocityDB use?",
    "order 4821",
    "Alice's language preference",
]
for q in test_queries:
    print(f"\nQuery: '{q}'")
    results = memory.search(q, top_k=2)
    for r in results:
        print(f"  [{r['id']}] cosine={r['cosine']:.3f} bm25={r['bm25']:.2f} rrf={r['rrf_score']:.5f}")
        print(f"    → {r['text']}")

print("\n" + "═"*60)
print("DEMO 3 — Autonomous Agent Conversations")
print("═"*60)
agent.chat("What do you know about Alice's project? What's the deadline and which algorithm does it rely on?")
agent.chat("Can you find the details on order number 4821?")
agent.chat(
    "There are 22 working days until March 31. "
    "If Alice works 6.5 hours per day on VelocityDB, "
    "how many total hours does she have left?"
)
agent.chat(
    "Alice just decided to switch the storage engine of VelocityDB from LSM-tree to B-tree. "
    "Please remember this decision."
)
agent.chat("What storage engine decision did Alice make for VelocityDB?")

print("\n" + "═"*60)
print("DEMO 4 — Runtime Tool Hot-Swap (vtable pattern)")
print("═"*60)
class UpgradedWebSnippetTool(WebSnippetTool):
    _KB = {
        **WebSnippetTool._KB,
        "lsm-tree": "An LSM-tree (Log-Structured Merge-tree) optimises write throughput at the cost of read amplification.",
    }
agent.register_tool(UpgradedWebSnippetTool())
agent.chat(
    "Can you search the web for a brief explanation of B-tree storage engines "
    "and tell me if it is a good fit for Alice's project?"
)

print("\n" + "═"*60)
print("FINAL — Full Memory Dump")
print("═"*60)
for chunk in agent.memory_dump():
    print(f"  [{chunk['id']}] ({chunk['metadata'].get('category','?')}) {chunk['text']}")
We run four progressive demo scenarios that exercise every layer of the architecture we have built: seeding long-term memory with structured facts, running direct hybrid search queries to observe how vector and BM25 scores combine, conducting a multi-turn autonomous conversation in which the agent recalls, computes, and stores facts on its own, and finally hot-swapping a tool at runtime to demonstrate the vtable pattern in action. We close by dumping the full memory state to verify that every autonomously stored decision was correctly persisted.
In conclusion, we have walked through the complete construction of a hybrid-memory autonomous agent, from abstract interface contracts and dual-path retrieval all the way to a self-directing agent loop that stores, recalls, and reasons over facts without any hard-coded logic. We have seen how the modular design allows any component, whether the memory backend, the language-model provider, or an individual tool, to be swapped or extended at runtime with no changes to the agent core, a property that makes the architecture straightforward to adapt toward production use.
The publish Build a Hybrid-Memory Autonomous Agent with Modular Architecture and Tool Dispatch Using OpenAI appeared first on MarkTechPost.
