A Coding Implementation to Build Neural Memory Agents with Differentiable Memory, Meta-Learning, and Experience Replay for Continual Adaptation in Dynamic Environments

In this tutorial, we discover how neural reminiscence brokers can be taught constantly with out forgetting previous experiences. We design a memory-augmented neural community that integrates a Differentiable Neural Computer (DNC) with expertise replay and meta-learning to adapt rapidly to new duties whereas retaining prior information. By implementing this strategy in PyTorch, we show how content-based reminiscence addressing and prioritized replay allow the mannequin to overcome catastrophic forgetting and preserve efficiency throughout a number of studying duties. Check out the FULL CODES here.

Copy Code

import torch
import torch.nn as nn
import torch.nn.purposeful as F
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
from dataclasses import dataclass


@dataclass
class MemoryConfig:
   memory_size: int = 128
   memory_dim: int = 64
   num_read_heads: int = 4
   num_write_heads: int = 1

We start by importing all of the important libraries and defining the configuration class for our neural reminiscence system. Here, we set parameters similar to reminiscence dimension, dimensionality, and the variety of learn/write heads that form how the differentiable reminiscence behaves all through coaching. This setup acts as the muse upon which our memory-augmented structure is constructed. Check out the FULL CODES here.

Copy Code

class NeuralMemoryBank(nn.Module):
   def __init__(self, config: MemoryConfig):
       tremendous().__init__()
       self.memory_size = config.memory_size
       self.memory_dim = config.memory_dim
       self.num_read_heads = config.num_read_heads
       self.register_buffer('reminiscence', torch.zeros(config.memory_size, config.memory_dim))
       self.register_buffer('utilization', torch.zeros(config.memory_size))
   def content_addressing(self, key, beta):
       key_norm = F.normalize(key, dim=-1)
       mem_norm = F.normalize(self.reminiscence, dim=-1)
       similarity = torch.matmul(key_norm, mem_norm.t())
       return F.softmax(beta * similarity, dim=-1)
   def write(self, write_key, write_vector, erase_vector, write_strength):
       write_weights = self.content_addressing(write_key, write_strength)
       erase = torch.outer(write_weights.squeeze(), erase_vector.squeeze())
       self.reminiscence = (self.reminiscence * (1 - erase)).detach()
       add = torch.outer(write_weights.squeeze(), write_vector.squeeze())
       self.reminiscence = (self.reminiscence + add).detach()
       self.utilization = (0.99 * self.utilization + write_weights.squeeze()).detach()
   def learn(self, read_keys, read_strengths):
       reads = []
       for i in vary(self.num_read_heads):
           weights = self.content_addressing(read_keys[i], read_strengths[i])
           read_vector = torch.matmul(weights, self.reminiscence)
           reads.append(read_vector)
       return torch.cat(reads, dim=-1)


class MemoryController(nn.Module):
   def __init__(self, input_dim, hidden_dim, memory_config: MemoryConfig):
       tremendous().__init__()
       self.hidden_dim = hidden_dim
       self.memory_config = memory_config
       self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
       total_read_dim = memory_config.num_read_heads * memory_config.memory_dim
       self.read_keys = nn.Linear(hidden_dim, memory_config.num_read_heads * memory_config.memory_dim)
       self.read_strengths = nn.Linear(hidden_dim, memory_config.num_read_heads)
       self.write_key = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.write_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.erase_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.write_strength = nn.Linear(hidden_dim, 1)
       self.output = nn.Linear(hidden_dim + total_read_dim, input_dim)
   def ahead(self, x, memory_bank, hidden=None):
       lstm_out, hidden = self.lstm(x.unsqueeze(0), hidden)
       controller_state = lstm_out.squeeze(0)
       read_k = self.read_keys(controller_state).view(self.memory_config.num_read_heads, -1)
       read_s = F.softplus(self.read_strengths(controller_state))
       write_k = self.write_key(controller_state)
       write_v = torch.tanh(self.write_vector(controller_state))
       erase_v = torch.sigmoid(self.erase_vector(controller_state))
       write_s = F.softplus(self.write_strength(controller_state))
       read_vectors = memory_bank.learn(read_k, read_s)
       memory_bank.write(write_k, write_v, erase_v, write_s)
       mixed = torch.cat([controller_state, read_vectors], dim=-1)
       output = self.output(mixed)
       return output, hidden

We implement the Neural Memory Bank and the Memory Controller, which collectively type the core of the agent’s differentiable reminiscence mechanism. The Neural Memory Bank shops and retrieves info via content-based addressing, whereas the controller community dynamically interacts with this reminiscence utilizing learn and write operations. This setup allows the agent to recall related info and adapt to new inputs effectively. Check out the FULL CODES here.

Copy Code

class ExperienceReplay:
   def __init__(self, capability=10000, alpha=0.6):
       self.capability = capability
       self.alpha = alpha
       self.buffer = deque(maxlen=capability)
       self.priorities = deque(maxlen=capability)
   def push(self, expertise, precedence=1.0):
       self.buffer.append(expertise)
       self.priorities.append(precedence ** self.alpha)
   def pattern(self, batch_size, beta=0.4):
       if len(self.buffer) == 0:
           return [], []
       probs = np.array(self.priorities)
       probs = probs / probs.sum()
       indices = np.random.selection(len(self.buffer), min(batch_size, len(self.buffer)), p=probs, change=False)
       samples = [self.buffer[i] for i in indices]
       weights = (len(self.buffer) * probs[indices]) ** (-beta)
       weights = weights / weights.max()
       return samples, torch.FloatTensor(weights)


class MetaLearner(nn.Module):
   def __init__(self, mannequin):
       tremendous().__init__()
       self.mannequin = mannequin
   def adapt(self, support_x, support_y, num_steps=5, lr=0.01):
       adapted_params = {identify: param.clone() for identify, param in self.mannequin.named_parameters()}
       for _ in vary(num_steps):
           pred, _ = self.mannequin(support_x, self.mannequin.memory_bank)
           loss = F.mse_loss(pred, support_y)
           grads = torch.autograd.grad(loss, self.mannequin.parameters(), create_graph=True)
           adapted_params = {identify: param - lr * grad for (identify, param), grad in zip(adapted_params.gadgets(), grads)}
       return adapted_params

We design the Experience Replay and Meta-Learner parts to strengthen the agent’s capacity to be taught constantly. The replay buffer allows the mannequin to revisit previous experiences via prioritized sampling, thereby lowering forgetting, whereas the Meta-Learner makes use of MAML-style adaptation for speedy studying on new duties. Together, these modules convey stability and flexibility to the agent’s coaching course of. Check out the FULL CODES here.

Copy Code

class ContinualLearningAgent:
   def __init__(self, input_dim=64, hidden_dim=128):
       self.config = MemoryConfig()
       self.memory_bank = NeuralMemoryBank(self.config)
       self.controller = MemoryController(input_dim, hidden_dim, self.config)
       self.replay_buffer = ExperienceReplay(capability=5000)
       self.meta_learner = MetaLearner(self.controller)
       self.optimizer = torch.optim.Adam(self.controller.parameters(), lr=0.001)
       self.task_history = []
   def train_step(self, x, y, use_replay=True):
       self.optimizer.zero_grad()
       pred, _ = self.controller(x, self.memory_bank)
       current_loss = F.mse_loss(pred, y)
       self.replay_buffer.push((x.detach().clone(), y.detach().clone()), precedence=current_loss.merchandise() + 1e-6)
       total_loss = current_loss
       if use_replay and len(self.replay_buffer.buffer) > 16:
           samples, weights = self.replay_buffer.pattern(8)
           for (replay_x, replay_y), weight in zip(samples, weights):
               with torch.enable_grad():
                   replay_pred, _ = self.controller(replay_x, self.memory_bank)
                   replay_loss = F.mse_loss(replay_pred, replay_y)
                   total_loss = total_loss + 0.3 * replay_loss * weight
       total_loss.backward()
       torch.nn.utils.clip_grad_norm_(self.controller.parameters(), 1.0)
       self.optimizer.step()
       return total_loss.merchandise()
   def consider(self, test_data):
       self.controller.eval()
       total_error = 0
       with torch.no_grad():
           for x, y in test_data:
               pred, _ = self.controller(x, self.memory_bank)
               total_error += F.mse_loss(pred, y).merchandise()
       self.controller.practice()
       return total_error / len(test_data)

We assemble a Continual Learning Agent that integrates reminiscence, controller, replay, and meta-learning right into a single, adaptive framework. In this step, we outline how the agent trains on every batch, replays previous knowledge, and evaluates its efficiency. The implementation ensures that the mannequin can retain prior information whereas studying new info with out catastrophic forgetting. Check out the FULL CODES here.

Copy Code

def create_task_data(task_id, num_samples=100):
   torch.manual_seed(task_id)
   x = torch.randn(num_samples, 64)
   if task_id == 0:
       y = torch.sin(x.imply(dim=1, keepdim=True).increase(-1, 64))
   elif task_id == 1:
       y = torch.cos(x.imply(dim=1, keepdim=True).increase(-1, 64)) * 0.5
   else:
       y = torch.tanh(x * 0.5 + task_id)
   return [(x[i], y[i]) for i in vary(num_samples)]


def run_continual_learning_demo():
   print(" Neural Memory Agent - Continual Learning Demon")
   print("=" * 60)
   agent = ContinualLearningAgent()
   num_tasks = 4
   outcomes = {'duties': [], 'without_memory': [], 'with_memory': []}
   for task_id in vary(num_tasks):
       print(f"n Learning Task {task_id + 1}/{num_tasks}")
       train_data = create_task_data(task_id, num_samples=50)
       test_data = create_task_data(task_id, num_samples=20)
       for epoch in vary(20):
           total_loss = 0
           for x, y in train_data:
               loss = agent.train_step(x, y, use_replay=(task_id > 0))
               total_loss += loss
           if epoch % 5 == 0:
               avg_loss = total_loss / len(train_data)
               print(f"  Epoch {epoch:second}: Loss = {avg_loss:.4f}")
       print(f"n   Evaluation on all duties:")
       for eval_task_id in vary(task_id + 1):
           eval_data = create_task_data(eval_task_id, num_samples=20)
           error = agent.consider(eval_data)
           print(f"    Task {eval_task_id + 1}: Error = {error:.4f}")
           if eval_task_id == task_id:
               outcomes['tasks'].append(eval_task_id + 1)
               outcomes['with_memory'].append(error)
   fig, axes = plt.subplots(1, 2, figsize=(14, 5))
   ax = axes[0]
   memory_matrix = agent.memory_bank.reminiscence.detach().numpy()
   im = ax.imshow(memory_matrix, facet='auto', cmap='viridis')
   ax.set_title('Neural Memory Bank State', fontsize=14, fontweight='daring')
   ax.set_xlabel('Memory Dimension')
   ax.set_ylabel('Memory Slots')
   plt.colorbar(im, ax=ax)
   ax = axes[1]
   ax.plot(outcomes['tasks'], outcomes['with_memory'], marker='o', linewidth=2, markersize=8, label='With Memory Replay')
   ax.set_title('Continual Learning Performance', fontsize=14, fontweight='daring')
   ax.set_xlabel('Task Number')
   ax.set_ylabel('Test Error')
   ax.legend()
   ax.grid(True, alpha=0.3)
   plt.tight_layout()
   plt.savefig('neural_memory_results.png', dpi=150, bbox_inches='tight')
   print("n Results saved to 'neural_memory_results.png'")
   plt.present()
   print("n" + "=" * 60)
   print(" Key Insights:")
   print("  • Memory financial institution shops compressed activity representations")
   print("  • Experience replay mitigates catastrophic forgetting")
   print("  • Agent maintains efficiency on earlier duties")
   print("  • Content-based addressing allows environment friendly retrieval")


if __name__ == "__main__":
   run_continual_learning_demo()

We conduct a complete demonstration of the continuous studying course of, producing artificial duties to consider the agent’s adaptability throughout a number of environments. As we practice and visualize the outcomes, we observe how reminiscence replay improves stability and maintains accuracy throughout duties. The experiment concludes with graphical insights that spotlight how differentiable reminiscence enhances the agent’s long-term studying functionality.

In conclusion, we constructed and educated a neural reminiscence agent able to continuous adaptation throughout evolving duties. We noticed how the differentiable reminiscence allows environment friendly storage and retrieval of discovered representations, whereas the replay mechanism reinforces stability and information retention. By combining these parts with meta-learning, we noticed how such brokers pave the way in which for extra resilient, self-adapting neural methods that may keep in mind, motive, and evolve with out shedding what they’ve already mastered.

Check out the FULL CODES here. Feel free to try our GitHub Page for Tutorials, Codes and Notebooks. Also, be happy to comply with us on Twitter and don’t overlook to be a part of our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

The submit A Coding Implementation to Build Neural Memory Agents with Differentiable Memory, Meta-Learning, and Experience Replay for Continual Adaptation in Dynamic Environments appeared first on MarkTechPost.

A Coding Implementation to Build Neural Memory Agents with Differentiable Memory, Meta-Learning, and Experience Replay for Continual Adaptation in Dynamic Environments

Comparing the Top 4 Agentic AI Browsers in 2025: Atlas vs Copilot Mode vs Dia vs Comet

Building Advanced MCP (Model Context Protocol) Agents with Multi-Agent Coordination, Context Awareness, and Gemini Integration

Implementing OAuth 2.1 for MCP Servers with Scalekit: A Step-by-Step Coding Tutorial

Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents

Google vs OpenAI vs Anthropic: The Agentic AI Arms Race Breakdown

A Coding Implementation for an Agentic AI Framework that Performs Literature Analysis, Hypothesis Generation, Experimental Planning, Simulation, and Scientific Reporting

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!

Similar Posts

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!