
A Coding Guide to Build a Procedural Memory Agent That Learns, Stores, Retrieves, and Reuses Skills as Neural Modules Over Time

In this tutorial, we explore how an intelligent agent can gradually form procedural memory by learning reusable skills directly from its interactions with an environment. We design a minimal yet powerful framework in which skills behave like neural modules: they store action sequences, carry contextual embeddings, and are retrieved by similarity whenever a new situation resembles a past experience. As we run our agent through multiple episodes, we watch its behaviour become more efficient, moving from primitive exploration to leveraging a library of skills it has learned on its own. Check out the FULL CODES here.

import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict


class Skill:
    def __init__(self, name, preconditions, action_sequence, embedding, success_count=0):
        self.name = name
        self.preconditions = preconditions
        self.action_sequence = action_sequence
        self.embedding = embedding
        self.success_count = success_count
        self.times_used = 0

    def is_applicable(self, state):
        # A skill applies only if every precondition matches the current state
        for key, value in self.preconditions.items():
            if state.get(key) != value:
                return False
        return True

    def __repr__(self):
        return f"Skill({self.name}, used={self.times_used}, success={self.success_count})"


class SkillLibrary:
    def __init__(self, embedding_dim=8):
        self.skills = []
        self.embedding_dim = embedding_dim
        self.skill_stats = defaultdict(lambda: {"attempts": 0, "successes": 0})

    def add_skill(self, skill):
        # Deduplicate near-identical skills by embedding similarity
        for existing_skill in self.skills:
            if self._similarity(skill.embedding, existing_skill.embedding) > 0.9:
                existing_skill.success_count += 1
                return existing_skill
        self.skills.append(skill)
        return skill

    def retrieve_skills(self, state, query_embedding=None, top_k=3):
        applicable = [s for s in self.skills if s.is_applicable(state)]
        if query_embedding is not None and applicable:
            similarities = [self._similarity(query_embedding, s.embedding) for s in applicable]
            # Sort by similarity only, so ties never try to compare Skill objects
            sorted_skills = [s for _, s in sorted(zip(similarities, applicable), key=lambda pair: pair[0], reverse=True)]
            return sorted_skills[:top_k]
        return sorted(applicable, key=lambda s: s.success_count / max(s.times_used, 1), reverse=True)[:top_k]

    def _similarity(self, emb1, emb2):
        # Cosine similarity with a small epsilon for numerical safety
        return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8)

    def get_stats(self):
        return {
            "total_skills": len(self.skills),
            "total_uses": sum(s.times_used for s in self.skills),
            "avg_success_rate": np.mean([s.success_count / max(s.times_used, 1) for s in self.skills]) if self.skills else 0
        }

We define how skills are represented and stored in a memory structure. We implement similarity-based retrieval so that the agent can match a new state against past experience using cosine similarity. As we work through this layer, we see how skill reuse becomes possible once skills acquire metadata, embeddings, and usage statistics. Check out the FULL CODES here.
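
To make the retrieval mechanics concrete, here is a minimal, self-contained sketch (not part of the tutorial's pipeline): it adds one hand-built skill to a fresh library and retrieves it by similarity. The skill name, preconditions, and embedding values below are illustrative placeholders.

# Sketch: add a hand-built skill and retrieve it by similarity (illustrative values only)
demo_library = SkillLibrary(embedding_dim=8)
demo_embedding = np.ones(8) / np.sqrt(8)   # placeholder unit vector
demo_skill = Skill("demo_navigate", {"has_key": False}, ["move_right", "move_up"], demo_embedding)
demo_library.add_skill(demo_skill)
demo_state = {"has_key": False, "door_open": False}
print(demo_library.retrieve_skills(demo_state, query_embedding=demo_embedding, top_k=1))
print(demo_library.get_stats())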

class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.agent_pos = [0, 0]
        self.goal_pos = [self.size - 1, self.size - 1]
        self.objects = {"key": [2, 2], "door": [3, 3], "box": [1, 3]}
        self.inventory = []
        self.door_open = False
        return self.get_state()

    def get_state(self):
        return {
            "agent_pos": tuple(self.agent_pos),
            "has_key": "key" in self.inventory,
            "door_open": self.door_open,
            "at_goal": self.agent_pos == self.goal_pos,
            "objects": {k: tuple(v) for k, v in self.objects.items()}
        }

    def step(self, action):
        reward = -0.1   # small step cost unless an event overrides it
        if action == "move_up":
            self.agent_pos[1] = min(self.agent_pos[1] + 1, self.size - 1)
        elif action == "move_down":
            self.agent_pos[1] = max(self.agent_pos[1] - 1, 0)
        elif action == "move_left":
            self.agent_pos[0] = max(self.agent_pos[0] - 1, 0)
        elif action == "move_right":
            self.agent_pos[0] = min(self.agent_pos[0] + 1, self.size - 1)
        elif action == "pickup_key":
            if self.agent_pos == self.objects["key"] and "key" not in self.inventory:
                self.inventory.append("key")
                reward = 1.0
        elif action == "open_door":
            if self.agent_pos == self.objects["door"] and "key" in self.inventory:
                self.door_open = True
                reward = 2.0
        done = self.agent_pos == self.goal_pos and self.door_open
        if done:
            reward = 10.0
        return self.get_state(), reward, done

We construct a simple environment in which the agent learns tasks such as picking up a key, opening a door, and reaching a goal. We use this environment as a playground for our procedural memory system, allowing us to observe how primitive actions evolve into more complex, reusable skills. The environment's structure helps us track clear, interpretable improvements in behaviour across episodes. Check out the FULL CODES here.
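
As a quick sanity check of the environment's mechanics, the short sketch below (illustrative, not part of the training loop) drives the agent by hand from the start cell to the key at (2, 2) and picks it up.

# Sketch: manually step the environment toward the key and pick it up
env = GridWorld(size=5)
state = env.reset()
print("start:", state["agent_pos"], "has_key:", state["has_key"])
for action in ["move_right", "move_right", "move_up", "move_up", "pickup_key"]:
    state, reward, done = env.step(action)
    print(action, "->", state["agent_pos"], "reward:", reward)
print("has_key:", state["has_key"])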

class ProceduralMemoryAgent:
    def __init__(self, env, embedding_dim=8):
        self.env = env
        self.skill_library = SkillLibrary(embedding_dim)
        self.embedding_dim = embedding_dim
        self.episode_history = []
        self.primitive_actions = ["move_up", "move_down", "move_left", "move_right", "pickup_key", "open_door"]

    def create_embedding(self, state, action_seq):
        # Encode the state context plus a prefix of the action sequence as a fixed-size vector
        state_vec = np.zeros(self.embedding_dim)
        state_vec[0] = hash(str(state["agent_pos"])) % 1000 / 1000
        state_vec[1] = 1.0 if state.get("has_key") else 0.0
        state_vec[2] = 1.0 if state.get("door_open") else 0.0
        for i, action in enumerate(action_seq[:self.embedding_dim - 3]):
            state_vec[3 + i] = hash(action) % 1000 / 1000
        return state_vec / (np.linalg.norm(state_vec) + 1e-8)

    def extract_skill(self, trajectory):
        # Turn a successful trajectory segment into a named, reusable skill
        if len(trajectory) < 2:
            return None
        start_state = trajectory[0][0]
        actions = [a for _, a, _ in trajectory]
        preconditions = {"has_key": start_state.get("has_key", False), "door_open": start_state.get("door_open", False)}
        end_state = self.env.get_state()
        if end_state.get("has_key") and not start_state.get("has_key"):
            name = "acquire_key"
        elif end_state.get("door_open") and not start_state.get("door_open"):
            name = "open_door_sequence"
        else:
            name = f"navigate_{len(actions)}_steps"
        embedding = self.create_embedding(start_state, actions)
        return Skill(name, preconditions, actions, embedding, success_count=1)

    def execute_skill(self, skill):
        # Replay a stored action sequence and record whether it finishes the task
        skill.times_used += 1
        trajectory = []
        total_reward = 0
        for action in skill.action_sequence:
            state = self.env.get_state()
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            total_reward += reward
            if done:
                skill.success_count += 1
                return trajectory, total_reward, True
        return trajectory, total_reward, False

    def explore(self, max_steps=20):
        # Heuristic exploration with primitive actions only
        trajectory = []
        state = self.env.get_state()
        for _ in range(max_steps):
            action = self._choose_exploration_action(state)
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                return trajectory, True
        return trajectory, False

We focus on building embeddings that encode the context of a state-action sequence, enabling us to compare skills meaningfully. We also extract skills from successful trajectories, transforming raw experience into reusable behaviours. As we run this code, we observe how simple exploration gradually yields structured knowledge that the agent can apply later. Check out the FULL CODES here.
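
To see how these embeddings behave, here is a small standalone sketch (illustrative only, and using nothing beyond the methods defined so far) that embeds the reset state with and without an action sequence and compares the two vectors using the library's cosine similarity.

# Sketch: compare a state-only embedding with a state-plus-actions embedding
agent = ProceduralMemoryAgent(GridWorld(size=5))
start_state = agent.env.reset()
emb_with_actions = agent.create_embedding(start_state, ["move_right", "move_right", "pickup_key"])
emb_query = agent.create_embedding(start_state, [])
print("cosine similarity:", agent.skill_library._similarity(emb_with_actions, emb_query))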

    def _choose_exploration_action(self, state):
        # Simple goal-directed heuristic: reach the key first, then the door, then the goal
        agent_pos = state["agent_pos"]
        if not state.get("has_key"):
            key_pos = state["objects"]["key"]
            if agent_pos == key_pos:
                return "pickup_key"
            if agent_pos[0] < key_pos[0]:
                return "move_right"
            if agent_pos[0] > key_pos[0]:
                return "move_left"
            if agent_pos[1] < key_pos[1]:
                return "move_up"
            return "move_down"
        if state.get("has_key") and not state.get("door_open"):
            door_pos = state["objects"]["door"]
            if agent_pos == door_pos:
                return "open_door"
            if agent_pos[0] < door_pos[0]:
                return "move_right"
            if agent_pos[0] > door_pos[0]:
                return "move_left"
            if agent_pos[1] < door_pos[1]:
                return "move_up"
            return "move_down"
        goal_pos = (4, 4)
        if agent_pos[0] < goal_pos[0]:
            return "move_right"
        if agent_pos[1] < goal_pos[1]:
            return "move_up"
        return np.random.choice(self.primitive_actions)

    def run_episode(self, use_skills=True):
        self.env.reset()
        total_reward = 0
        steps = 0
        trajectory = []
        while steps < 50:
            state = self.env.get_state()
            if use_skills and self.skill_library.skills:
                query_emb = self.create_embedding(state, [])
                skills = self.skill_library.retrieve_skills(state, query_emb, top_k=1)
                if skills:
                    skill_traj, skill_reward, success = self.execute_skill(skills[0])
                    trajectory.extend(skill_traj)
                    total_reward += skill_reward
                    steps += len(skill_traj)
                    if success:
                        return trajectory, total_reward, steps, True
                    continue
            action = self._choose_exploration_action(state)
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            total_reward += reward
            steps += 1
            if done:
                return trajectory, total_reward, steps, True
        return trajectory, total_reward, steps, False

    def train(self, episodes=10):
        stats = {"rewards": [], "steps": [], "skills_learned": [], "skill_uses": []}
        for ep in range(episodes):
            trajectory, reward, steps, success = self.run_episode(use_skills=True)
            if success and len(trajectory) >= 3:
                # Carve a skill out of the last few steps of the successful trajectory
                segment = trajectory[-min(5, len(trajectory)):]
                skill = self.extract_skill(segment)
                if skill:
                    self.skill_library.add_skill(skill)
            stats["rewards"].append(reward)
            stats["steps"].append(steps)
            stats["skills_learned"].append(len(self.skill_library.skills))
            stats["skill_uses"].append(self.skill_library.get_stats()["total_uses"])
            print(f"Episode {ep+1}: Reward={reward:.1f}, Steps={steps}, Skills={len(self.skill_library.skills)}, Success={success}")
        return stats

We define how the agent chooses between using known skills and exploring with primitive actions. We train the agent across multiple episodes and record the evolution of learned skills, usage counts, and success rates. As we examine this part, we observe that skill reuse reduces episode length and improves overall rewards. Check out the FULL CODES here.
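
As an optional follow-up (a sketch, not part of the tutorial's main loop), we can contrast an episode that reuses the skill library with one restricted to primitive actions after a brief warm-up; the exact numbers will vary with which skills happen to be extracted.

# Sketch: contrast skill reuse against primitives-only behaviour after a short warm-up
env = GridWorld(size=5)
agent = ProceduralMemoryAgent(env)
agent.train(episodes=3)   # bootstrap a few skills
_, reward_skills, steps_skills, _ = agent.run_episode(use_skills=True)
_, reward_raw, steps_raw, _ = agent.run_episode(use_skills=False)
print(f"with skills:     {steps_skills} steps, reward {reward_skills:.1f}")
print(f"primitives only: {steps_raw} steps, reward {reward_raw:.1f}")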

def visualize_training(stats):
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    axes[0, 0].plot(stats["rewards"])
    axes[0, 0].set_title("Episode Rewards")
    axes[0, 1].plot(stats["steps"])
    axes[0, 1].set_title("Steps per Episode")
    axes[1, 0].plot(stats["skills_learned"])
    axes[1, 0].set_title("Skills in Library")
    axes[1, 1].plot(stats["skill_uses"])
    axes[1, 1].set_title("Cumulative Skill Uses")
    plt.tight_layout()
    plt.savefig("skill_learning_stats.png", dpi=150, bbox_inches='tight')
    plt.show()


if __name__ == "__main__":
    print("=== Procedural Memory Agent Demo ===\n")
    env = GridWorld(size=5)
    agent = ProceduralMemoryAgent(env)
    print("Training agent to learn reusable skills...\n")
    stats = agent.train(episodes=15)
    print("\n=== Learned Skills ===")
    for skill in agent.skill_library.skills:
        print(f"{skill.name}: {len(skill.action_sequence)} actions, used {skill.times_used} times, {skill.success_count} successes")
    lib_stats = agent.skill_library.get_stats()
    print("\n=== Library Statistics ===")
    print(f"Total skills: {lib_stats['total_skills']}")
    print(f"Total skill uses: {lib_stats['total_uses']}")
    print(f"Avg success rate: {lib_stats['avg_success_rate']:.2%}")
    visualize_training(stats)
    print("\n✓ Skill learning complete! Check the visualization above.")

We bring everything together by running training, printing the learned skills, and plotting behaviour statistics. We visualize the trend in rewards and how the skill library grows over time. By running this snippet, we complete the lifecycle of procedural memory formation and confirm that the agent learns to behave more intelligently with experience.

In conclusion, we see how procedural memory emerges naturally when an agent learns to extract skills from its own successful trajectories. We observe how skills gain structure, metadata, embeddings, and usage patterns, allowing the agent to reuse them efficiently in future situations. Lastly, we appreciate how even a small environment and simple heuristics lead to meaningful learning dynamics, giving us a concrete understanding of what it means for an agent to develop reusable internal competencies over time.


Check out the FULL CODES here.

