
How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents

In this tutorial, we build a Meta-Agent that designs other agents automatically from a simple task description. We implement a system that analyzes the task, selects tools, chooses a memory architecture, configures a planner, and then instantiates a fully working agent runtime. We go beyond static agent templates and instead build a dynamic, self-configuring architecture that can evaluate its own performance and refine itself as needed. We also demonstrate how agent design automation, tool selection, memory strategy, and iterative self-improvement can be unified into a cohesive, Colab-ready framework.

import os, re, json, math, time, textwrap, traceback, random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Callable, Tuple


def _pip_install():
   try:
       import pydantic
       import transformers
       return
   except Exception:
       pass
   import sys, subprocess
   pkgs = [
       "pydantic>=2.6.0",
       "transformers>=4.41.0",
       "accelerate>=0.30.0",
       "sentencepiece",
       "torch",
       "numpy",
       "scikit-learn",
       "pandas",
   ]
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)


_pip_install()


import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors


try:
   from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
   _HAS_TRANSFORMERS = True
except Exception:
   _HAS_TRANSFORMERS = False


class ToolSpec(BaseModel):
   name: str
   description: str
   inputs_schema: Dict[str, Any] = Field(default_factory=dict)


class MemorySpec(BaseModel):
   kind: str = Field(default="scratchpad", description="scratchpad | retrieval_tfidf")
   max_items: int = 200
   retrieval_k: int = 5


class PlannerSpec(BaseModel):
   kind: str = Field(default="react", description="react | plan_execute")
   max_steps: int = 10
   temperature: float = 0.2


class AgentConfig(BaseModel):
   agent_name: str = "DesignedAgent"
   objective: str
   planner: PlannerSpec
   memory: MemorySpec
   tools: List[ToolSpec]
   output_style: str = "concise"
   safety_rules: List[str] = Field(default_factory=lambda: [
       "Do not execute arbitrary OS commands.",
       "Refuse harmful/illegal instructions; suggest safe alternatives.",
       "If uncertain, ask for missing inputs or state assumptions.",
   ])

We set up the complete foundational environment for our meta-agent system. We install required dependencies, import all necessary libraries, and define the core configuration schemas using Pydantic. We formalize structured specifications for tools, memory, planner, and the overall agent configuration to enable typed, automated agent construction.
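As a quick illustration of why the typed specs matter, here is a minimal, self-contained sketch (assuming Pydantic v2, as installed above; `MiniPlannerSpec` and `MiniAgentConfig` are simplified stand-ins, not the tutorial's classes) showing how a configuration validates, serializes, and rejects incomplete input before any agent is built:

```python
# Minimal sketch of typed agent configuration with Pydantic v2.
# MiniPlannerSpec / MiniAgentConfig are illustrative stand-ins.
from typing import List
from pydantic import BaseModel, Field, ValidationError

class MiniPlannerSpec(BaseModel):
    kind: str = "react"
    max_steps: int = 10

class MiniAgentConfig(BaseModel):
    objective: str  # required: no default, so validation enforces it
    planner: MiniPlannerSpec = Field(default_factory=MiniPlannerSpec)
    tools: List[str] = Field(default_factory=list)

cfg = MiniAgentConfig(objective="profile a CSV", tools=["calc"])
payload = cfg.model_dump()                      # plain dict, safe to serialize
restored = MiniAgentConfig.model_validate(payload)
assert restored == cfg                          # lossless round-trip

try:
    MiniAgentConfig()                           # missing required 'objective'
except ValidationError as e:
    print("rejected:", e.error_count(), "error(s)")
```

Because every spec is a model, the meta-agent can copy, mutate, and re-validate configurations programmatically during refinement.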

class LocalLLM:
   def __init__(self, model_name: str = "google/flan-t5-small", device: int = -1):
       self.model_name = model_name
       self.device = device
       self._pipe = None
       self._fallback = False


       if not _HAS_TRANSFORMERS:
           self._fallback = True
           return


       try:
           tok = AutoTokenizer.from_pretrained(model_name)
           mdl = AutoModelForSeq2SeqLM.from_pretrained(model_name)
           self._pipe = pipeline(
               "text2text-generation",
               model=mdl,
               tokenizer=tok,
               device=device,
           )
       except Exception:
           self._fallback = True


   def generate(self, prompt: str, max_new_tokens: int = 256, temperature: float = 0.2) -> str:
       if self._fallback or self._pipe is None:
           return self._heuristic(prompt)


       try:
           out = self._pipe(
               prompt,
               max_new_tokens=max_new_tokens,
               do_sample=temperature > 0,
               temperature=max(temperature, 1e-6),
               num_return_sequences=1,
           )[0]["generated_text"]
           return out.strip()
       except Exception:
           return self._heuristic(prompt)


   def _heuristic(self, prompt: str) -> str:
       p = prompt.lower()
       if "tool" in p and "json" in p:
           return '{"action":"final","final":"(fallback) I can’t load the model. Provide more details or enable internet in Colab to download the model."}'
       return "(fallback) I can’t load the model. Please ensure Colab has internet access and retry."


class ScratchpadMemory:
   def __init__(self, max_items: int = 200):
       self.max_items = max_items
       self.items: List[Dict[str, str]] = []


   def add(self, role: str, content: str):
       self.items.append({"role": role, "content": content})
       if len(self.items) > self.max_items:
           self.items = self.items[-self.max_items:]


   def recent(self, n: int = 12) -> List[Dict[str, str]]:
       return self.items[-n:]


   def retrieve(self, query: str, k: int = 5) -> List[Dict[str, str]]:
       return self.recent(k)


class TfidfRetrievalMemory:
   def __init__(self, max_items: int = 200, retrieval_k: int = 5):
       self.max_items = max_items
       self.retrieval_k = retrieval_k
       self.items: List[Dict[str, str]] = []
       self._vectorizer = TfidfVectorizer(stop_words="english")
       self._nn = None
       self._X = None


   def add(self, role: str, content: str):
       self.items.append({"role": role, "content": content})
       if len(self.items) > self.max_items:
           self.items = self.items[-self.max_items:]
       self._rebuild_index()


   def _rebuild_index(self):
       docs = [it["content"] for it in self.items] or [""]
       self._X = self._vectorizer.fit_transform(docs)
       n_neighbors = min(self.retrieval_k, self._X.shape[0])
       self._nn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine")
       self._nn.fit(self._X)


   def recent(self, n: int = 12) -> List[Dict[str, str]]:
       return self.items[-n:]


   def retrieve(self, query: str, k: Optional[int] = None) -> List[Dict[str, str]]:
       if not self.items:
           return []
       if self._nn is None:
           self._rebuild_index()
       k = k or self.retrieval_k
       q = self._vectorizer.transform([query])
       n_neighbors = min(k, self._X.shape[0])
       dists, idx = self._nn.kneighbors(q, n_neighbors=n_neighbors)
       hits = [self.items[i] for i in idx[0].tolist()]
       return hits

We implement the LocalLLM wrapper that powers reasoning and tool-selection behavior. We configure a lightweight open-source model with a safe fallback mechanism to ensure robustness in Colab. We also define both scratchpad and retrieval-based memory systems to support contextual and semantic recall.
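The retrieval idea can be seen in isolation with a stdlib-only stand-in (bag-of-words cosine similarity in place of sklearn's `TfidfVectorizer`/`NearestNeighbors`; `TinyRetrievalMemory` is illustrative, not the class used above):

```python
# Simplified stand-in for TfidfRetrievalMemory: rank stored items by
# bag-of-words cosine similarity to the query.
import math
from collections import Counter

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyRetrievalMemory:
    def __init__(self):
        self.items = []
    def add(self, role, content):
        self.items.append({"role": role, "content": content})
    def retrieve(self, query, k=2):
        q = _vec(query)
        ranked = sorted(self.items,
                        key=lambda it: _cosine(q, _vec(it["content"])),
                        reverse=True)
        return ranked[:k]

mem = TinyRetrievalMemory()
mem.add("tool", "csv_profile shape (100, 4) columns price date")
mem.add("assistant", "loan payment computed with amortization formula")
hits = mem.retrieve("profile the csv columns", k=1)
print(hits[0]["role"])  # the CSV-related memory ranks first
```

The sklearn version above does the same thing with proper TF-IDF weighting and a fitted nearest-neighbors index, which scales better as memory grows.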

class ToolResult(BaseModel):
   ok: bool
   output: str
   data: Optional[Any] = None


class Tool:
   def __init__(self, name: str, description: str, fn: Callable[..., ToolResult], inputs_schema: Dict[str, Any]):
       self.name = name
       self.description = description
       self.fn = fn
       self.inputs_schema = inputs_schema


   def call(self, **kwargs) -> ToolResult:
       try:
           return self.fn(**kwargs)
       except Exception as e:
            return ToolResult(ok=False, output=f"Tool error: {e}\n{traceback.format_exc()}")


class ToolRegistry:
   def __init__(self):
       self._tools: Dict[str, Tool] = {}


   def register(self, tool: Tool):
       self._tools[tool.name] = tool


   def has(self, name: str) -> bool:
       return name in self._tools


   def specs(self) -> List[ToolSpec]:
       return [
           ToolSpec(name=t.name, description=t.description, inputs_schema=t.inputs_schema)
           for t in self._tools.values()
       ]


   def call(self, name: str, args: Dict[str, Any]) -> ToolResult:
       if name not in self._tools:
           return ToolResult(ok=False, output=f"Unknown tool: {name}")
       return self._tools[name].call(**args)


_ALLOWED_MATH = {
   "abs": abs, "round": round, "min": min, "max": max,
   "sqrt": math.sqrt, "log": math.log, "exp": math.exp,
   "sin": math.sin, "cos": math.cos, "tan": math.tan,
   "pi": math.pi, "e": math.e
}
def tool_calc(expression: str) -> ToolResult:
   expr = expression.strip()
   if not expr:
       return ToolResult(ok=False, output="Empty expression.")
   if re.search(r"[A-Za-z_]\w*", expr):
       names = set(re.findall(r"[A-Za-z_]\w*", expr))
       bad = [n for n in names if n not in _ALLOWED_MATH]
       if bad:
           return ToolResult(ok=False, output=f"Disallowed names in expression: {bad}")
   if re.search(r"__|import|exec|eval|open|os\.|sys\.", expr):
       return ToolResult(ok=False, output="Disallowed tokens in expression.")
   try:
       val = eval(expr, {"__builtins__": {}}, dict(_ALLOWED_MATH))
       return ToolResult(ok=True, output=str(val), data=val)
   except Exception as e:
       return ToolResult(ok=False, output=f"Failed to evaluate: {e}")


def tool_text_stats(text: str) -> ToolResult:
   s = text or ""
   words = re.findall(r"\w+", s)
   lines = s.splitlines() if s else []
   out = {
       "chars": len(s),
       "words": len(words),
       "lines": len(lines),
       "unique_words": len(set(w.lower() for w in words)),
   }
   return ToolResult(ok=True, output=json.dumps(out, indent=2), data=out)


def tool_csv_profile(path: str, n_rows: int = 5) -> ToolResult:
   try:
       df = pd.read_csv(path)
   except Exception as e:
       return ToolResult(ok=False, output=f"Could not read CSV: {e}")
   head = df.head(n_rows)
   desc = df.describe(include="all").transpose().head(30)
   out = (
       f"Shape: {df.shape}\n\n"
       f"Columns: {list(df.columns)}\n\n"
       f"Head({n_rows}):\n{head}\n\n"
       f"Describe(top 30 cols):\n{desc}\n"
   )
   return ToolResult(ok=True, output=out, data={"shape": df.shape, "columns": list(df.columns)})


def default_tool_registry() -> ToolRegistry:
   reg = ToolRegistry()
   reg.register(Tool(
       name="calc",
       description="Evaluate a safe mathematical expression (no arbitrary code).",
       fn=lambda expression: tool_calc(expression),
       inputs_schema={"type":"object","properties":{"expression":{"type":"string"}}, "required":["expression"]}
   ))
   reg.register(Tool(
       name="text_stats",
       description="Compute basic statistics about a text blob (words, lines, unique words).",
       fn=lambda text: tool_text_stats(text),
       inputs_schema={"type":"object","properties":{"text":{"type":"string"}}, "required":["text"]}
   ))
   reg.register(Tool(
       name="csv_profile",
       description="Load a CSV from a local path and print a quick profile (head, describe).",
       fn=lambda path, n_rows=5: tool_csv_profile(path, n_rows),
       inputs_schema={"type":"object","properties":{"path":{"type":"string"},"n_rows":{"type":"integer"}}, "required":["path"]}
   ))
   return reg

We build the full tool infrastructure including tool registration, safe execution, and structured outputs. We implement secure mathematical evaluation, text statistics analysis, and CSV profiling capabilities. We design the ToolRegistry abstraction to allow the meta-agent to dynamically select and invoke tools during runtime.
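The safety pattern behind `tool_calc` is worth isolating: reject any identifier outside a fixed math namespace, then evaluate with empty builtins. A self-contained sketch (with `safe_calc` and a shortened `ALLOWED` table as illustrative names):

```python
# Allowlist-then-eval pattern used by tool_calc: any identifier not in the
# approved math namespace causes rejection before eval runs.
import math
import re

ALLOWED = {"sqrt": math.sqrt, "pi": math.pi, "min": min, "max": max}

def safe_calc(expr: str):
    names = set(re.findall(r"[A-Za-z_]\w*", expr))
    bad = names - set(ALLOWED)
    if bad:
        return (False, f"disallowed names: {sorted(bad)}")
    # Empty __builtins__ blocks access to open, __import__, etc.
    value = eval(expr, {"__builtins__": {}}, dict(ALLOWED))
    return (True, value)

print(safe_calc("sqrt(16) + max(1, 2)"))  # (True, 6.0)
print(safe_calc("__import__('os')"))      # rejected by the allowlist
```

This is defense in depth rather than a full sandbox: the identifier allowlist catches attacks before `eval`, and the stripped builtins limit what slips through.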

class AgentRuntime:
   def __init__(self, config: AgentConfig, llm: LocalLLM, tools: ToolRegistry, memory):
       self.config = config
       self.llm = llm
       self.tools = tools
       self.memory = memory


   def _tool_prompt(self) -> str:
       specs = self.config.tools
       lines = []
       for t in specs:
           lines.append(f"- {t.name}: {t.description} | inputs_schema={json.dumps(t.inputs_schema)}")
       return "\n".join(lines)


   def _format_context(self, task: str) -> str:
       retrieved = self.memory.retrieve(task, k=getattr(self.config.memory, "retrieval_k", 5))
       recent = self.memory.recent(8)


       def pack(items):
           return "\n".join([f"[{it['role']}] {it['content']}" for it in items])


       return (
           f"OBJECTIVE:\n{self.config.objective}\n\n"
           f"TASK:\n{task}\n\n"
           "SAFETY RULES:\n- " + "\n- ".join(self.config.safety_rules) + "\n\n"
           f"AVAILABLE TOOLS:\n{self._tool_prompt()}\n\n"
           f"RETRIEVED MEMORY (may be relevant):\n{pack(retrieved) if retrieved else '(none)'}\n\n"
           f"RECENT CONTEXT:\n{pack(recent) if recent else '(none)'}\n"
       )


   def _react_step_prompt(self, task: str, scratch: str) -> str:
       ctx = self._format_context(task)
       return textwrap.dedent(f"""
       You are an expert tool-using agent.
       Use the following JSON-only protocol (no extra text):
       {{
         "action": "tool" | "final",
         "tool_name": "name" (if action=tool),
         "tool_args": {{...}} (if action=tool),
         "final": "answer" (if action=final)
       }}


       Rules:
       - If a tool is needed, pick ONE tool call per step.
       - Keep args strictly matching the tool schema.
       - If you can answer directly, output action="final".
       - Output valid JSON only.


       {ctx}


       SCRATCHPAD (internal notes, may be incomplete):
       {scratch}
       """ ).strip()


   def run(self, task: str, verbose: bool = True) -> str:
       scratch = ""
       self.memory.add("user", task)


       for step in range(1, self.config.planner.max_steps + 1):
           prompt = self._react_step_prompt(task, scratch)
           raw = self.llm.generate(prompt, max_new_tokens=256, temperature=self.config.planner.temperature)


           m = re.search(r"{.*}", raw, re.DOTALL)
           raw_json = m.group(0).strip() if m else raw.strip()


           try:
               action = json.loads(raw_json)
           except Exception:
               final = f"(Parser fallback) I couldn't parse a tool plan. Here is what I can do:\n- Clarify your goal\n- Use available tools: {[t.name for t in self.config.tools]}\nRaw model output:\n{raw}"
               self.memory.add("assistant", final)
               return final


           if verbose:
               print(f"\n--- Step {step}/{self.config.planner.max_steps} ---")
               print("Model JSON:", json.dumps(action, indent=2))


           if action.get("action") == "tool":
               name = action.get("tool_name", "")
               args = action.get("tool_args", {}) or {}
               res = self.tools.call(name, args)
               if verbose:
                   print(f"Tool call: {name}({args})")
                   print("Tool ok:", res.ok)
                   print("Tool output:\n", res.output[:2000])


               scratch += f"\n[tool:{name}] args={args}\nresult_ok={res.ok}\nresult={res.output}\n"
               self.memory.add("tool", f"{name} args={args}\n{res.output}")


               if not res.ok:
                   scratch += "\nNOTE: tool failed; consider alternative approach or ask for missing input.\n"


           elif action.get("action") == "final":
               final = action.get("final", "").strip()
               if not final:
                   final = "I’m missing the final answer text. Please restate the task or provide more details."
               self.memory.add("assistant", final)
               return final
           else:
               final = f"Unknown action type in model output: {action}"
               self.memory.add("assistant", final)
               return final


       final = "Reached max steps without a final answer. Provide missing inputs or simplify the request."
       self.memory.add("assistant", final)
       return final

We implement the core AgentRuntime that executes the designed agent configuration. We construct the structured ReAct-style prompting loop, enforce a strict JSON-based tool-calling protocol, and integrate memory retrieval into reasoning. We manage iterative use of tools, scratchpad updates, and controlled final answer generation.
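The most fragile step in that loop is turning raw model text into a structured action. A minimal sketch of the parsing stage (with `parse_action` as an illustrative name; the runtime above inlines the same logic):

```python
# Extract the first {...} span from raw model output, parse it as JSON,
# and fall back to a "final" action when parsing fails.
import json
import re

def parse_action(raw: str):
    m = re.search(r"\{.*\}", raw, re.DOTALL)
    if not m:
        return {"action": "final", "final": "(unparseable)"}
    try:
        return json.loads(m.group(0))
    except json.JSONDecodeError:
        return {"action": "final", "final": "(invalid JSON)"}

raw = 'Sure, here is my plan: {"action": "tool", "tool_name": "calc", "tool_args": {"expression": "2+2"}}'
action = parse_action(raw)
print(action["tool_name"])  # calc
```

The greedy `\{.*\}` with `re.DOTALL` spans from the first opening brace to the last closing brace, which tolerates chatty preambles around the JSON while still capturing nested objects like `tool_args`.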

class MetaAgent:
   def __init__(self, llm: Optional[LocalLLM] = None):
       self.llm = llm or LocalLLM()


   def _capability_heuristics(self, task: str) -> Dict[str, Any]:
       t = task.lower()


       needs_data = any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table", "excel"])
       needs_math = any(k in t for k in ["calculate", "compute", "probability", "equation", "optimize", "derivative", "integral"])
       needs_writing = any(k in t for k in ["write", "draft", "email", "cover letter", "proposal", "summarize", "rewrite"])
       needs_analysis = any(k in t for k in ["analyze", "insights", "trend", "compare", "benchmark"])
       needs_memory = any(k in t for k in ["long", "multi-step", "remember", "plan", "workflow", "pipeline"])


       return {
           "needs_data": needs_data,
           "needs_math": needs_math,
           "needs_writing": needs_writing,
           "needs_analysis": needs_analysis,
           "needs_memory": needs_memory,
       }


   def design(self, task_description: str) -> AgentConfig:
       caps = self._capability_heuristics(task_description)
       tools = default_tool_registry()


       selected: List[ToolSpec] = []
       selected.append(ToolSpec(
           name="calc",
           description="Evaluate a safe mathematical expression (no arbitrary code).",
           inputs_schema={"type":"object","properties":{"expression":{"type":"string"}}, "required":["expression"]}
       ))
       selected.append(ToolSpec(
           name="text_stats",
           description="Compute basic statistics about a text blob (words, lines, unique words).",
           inputs_schema={"type":"object","properties":{"text":{"type":"string"}}, "required":["text"]}
       ))
       if caps["needs_data"]:
           selected.append(ToolSpec(
               name="csv_profile",
               description="Load a CSV from a local path and print a quick profile (head, describe).",
               inputs_schema={"type":"object","properties":{"path":{"type":"string"},"n_rows":{"type":"integer"}}, "required":["path"]}
           ))


       if caps["needs_memory"] or caps["needs_analysis"] or caps["needs_data"]:
           mem = MemorySpec(kind="retrieval_tfidf", max_items=250, retrieval_k=6)
       else:
           mem = MemorySpec(kind="scratchpad", max_items=120, retrieval_k=5)


       if caps["needs_analysis"] or caps["needs_data"] or caps["needs_memory"]:
           planner = PlannerSpec(kind="react", max_steps=12, temperature=0.2)
       else:
           planner = PlannerSpec(kind="react", max_steps=8, temperature=0.2)


       objective = "Solve the user task with tool use when helpful; produce a clean final response."
       cfg = AgentConfig(
           agent_name="AutoDesignedAgent",
           objective=objective,
           planner=planner,
           memory=mem,
           tools=selected,
           output_style="concise",
       )


       for ts in selected:
           if not tools.has(ts.name):
               raise RuntimeError(f"Tool selected but not registered: {ts.name}")


       return cfg


   def instantiate(self, cfg: AgentConfig) -> AgentRuntime:
       tools = default_tool_registry()
       if cfg.memory.kind == "retrieval_tfidf":
           mem = TfidfRetrievalMemory(max_items=cfg.memory.max_items, retrieval_k=cfg.memory.retrieval_k)
       else:
           mem = ScratchpadMemory(max_items=cfg.memory.max_items)
       return AgentRuntime(config=cfg, llm=self.llm, tools=tools, memory=mem)


   def evaluate(self, task: str, answer: str) -> Dict[str, Any]:
       a = (answer or "").strip().lower()
       flags = {
           "empty": len(a) == 0,
           "generic": any(p in a for p in ["i can't", "cannot", "missing", "provide more details", "parser fallback"]),
           "mentions_max_steps": "max steps" in a,
       }
       score = 1.0
       if flags["empty"]: score -= 0.6
       if flags["generic"]: score -= 0.25
       if flags["mentions_max_steps"]: score -= 0.2
       score = max(0.0, min(1.0, score))
       return {"score": score, "flags": flags}


   def refine(self, cfg: AgentConfig, eval_report: Dict[str, Any], task: str) -> AgentConfig:
       new_cfg = cfg.model_copy(deep=True)


       if eval_report["flags"]["generic"] or eval_report["flags"]["mentions_max_steps"]:
           new_cfg.planner.max_steps = min(18, new_cfg.planner.max_steps + 6)
           new_cfg.planner.temperature = min(0.35, new_cfg.planner.temperature + 0.05)
           if new_cfg.memory.kind != "retrieval_tfidf":
               new_cfg.memory.kind = "retrieval_tfidf"
               new_cfg.memory.max_items = max(new_cfg.memory.max_items, 200)
               new_cfg.memory.retrieval_k = max(new_cfg.memory.retrieval_k, 6)


       t = task.lower()
       if any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table"]):
           if not any(ts.name == "csv_profile" for ts in new_cfg.tools):
               new_cfg.tools.append(ToolSpec(
                   name="csv_profile",
                   description="Load a CSV from a local path and print a quick profile (head, describe).",
                   inputs_schema={"type":"object","properties":{"path":{"type":"string"},"n_rows":{"type":"integer"}}, "required":["path"]}
               ))


       return new_cfg


   def build_and_run(self, task: str, improve_rounds: int = 1, verbose: bool = True) -> Tuple[str, AgentConfig]:
       cfg = self.design(task)
       agent = self.instantiate(cfg)


       if verbose:
           print("\n==============================")
           print("META-AGENT: DESIGNED CONFIG")
           print("==============================")
           print(cfg.model_dump_json(indent=2))


       ans = agent.run(task, verbose=verbose)
       report = self.evaluate(task, ans)


       if verbose:
           print("\n==============================")
           print("EVALUATION REPORT")
           print("==============================")
           print(json.dumps(report, indent=2))
           print("\n==============================")
           print("FINAL ANSWER")
           print("==============================")
           print(ans)


       for r in range(improve_rounds):
           if report["score"] >= 0.85:
               break
           cfg = self.refine(cfg, report, task)
           agent = self.instantiate(cfg)
           if verbose:
               print("\n\n==============================")
               print(f"SELF-IMPROVEMENT ROUND {r+1}: UPDATED CONFIG")
               print("==============================")
               print(cfg.model_dump_json(indent=2))
           ans = agent.run(task, verbose=verbose)
           report = self.evaluate(task, ans)
           if verbose:
               print("\nEVAL:", json.dumps(report, indent=2))
               print("\nANSWER:\n", ans)


       return ans, cfg


meta = MetaAgent()


examples = [
   "Design an agent workflow to summarize a long meeting transcript and extract action items. Keep it concise.",
   "I have a local CSV at /content/sample.csv. Profile it and tell me the top 3 insights.",
   "Compute the monthly payment for a $12,000 loan at 8% APR over 36 months. Show the formula briefly.",
]


print("\n==============================")
print("RUNNING A QUICK DEMO TASK")
print("==============================")
demo_task = examples[2]
_ = meta.build_and_run(demo_task, improve_rounds=1, verbose=True)

We implement MetaAgent, which analyzes tasks, designs agent configurations, instantiates runtimes, evaluates performance, and refines the architecture as needed. We apply capability heuristics to dynamically choose tools, memory strategy, and planner depth. We then demonstrate the full build-and-run pipeline, including optional self-improvement, to complete the automated agent design lifecycle.
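The design decision at the heart of `MetaAgent.design` condenses to a few keyword triggers. A self-contained sketch (with `design_sketch` as an illustrative name and a reduced keyword set):

```python
# Condensed version of the capability heuristics: keyword matches decide
# which tools and memory kind a newly designed agent receives.
def design_sketch(task: str):
    t = task.lower()
    needs_data = any(k in t for k in ["csv", "dataframe", "dataset"])
    tools = ["calc", "text_stats"] + (["csv_profile"] if needs_data else [])
    memory = "retrieval_tfidf" if needs_data else "scratchpad"
    return {"tools": tools, "memory": memory}

print(design_sketch("Profile the CSV at /content/sample.csv"))
print(design_sketch("Compute a loan payment"))
```

Data-flavored tasks get the CSV tool and retrieval memory; simple computations get the lighter scratchpad. The refinement loop then revisits these choices whenever evaluation flags a weak answer.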

In conclusion, we demonstrated how a Meta-Agent can move from passive task execution to active architecture construction. We designed agents programmatically, instantiated them automatically, evaluated their outputs, and refined their configurations through a self-improvement loop. We showed that agentic systems can reason not only about tasks but also about their own structure, capabilities, and limitations. This approach pushes us toward self-evolving AI systems in which the architecture becomes adaptive, automated, and increasingly autonomous, bringing us closer to fully self-designing agent ecosystems.



The post How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents appeared first on MarkTechPost.
