A Coding Guide to Building a Brain-Inspired Hierarchical Reasoning AI Agent with Hugging Face Models

On this tutorial, we got down to recreate the spirit of the Hierarchical Reasoning Mannequin (HRM) utilizing a free Hugging Face mannequin that runs domestically. We stroll by way of the design of a light-weight but structured reasoning agent, the place we act as each architects and experimenters. By breaking issues into subgoals, fixing them with Python, critiquing the outcomes, and synthesizing a remaining reply, we will expertise how hierarchical planning and execution can improve reasoning efficiency. This course of permits us to see, in real-time, how a brain-inspired workflow will be applied with out requiring huge mannequin sizes or costly APIs. Try the Paper and FULL CODES.

Copy Code

!pip -q set up -U transformers speed up bitsandbytes wealthy


import os, re, json, textwrap, traceback
from typing import Dict, Any, Checklist
from wealthy import print as rprint
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline


MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
DTYPE = torch.bfloat16 if torch.cuda.is_available() else torch.float32

We start by putting in the required libraries and loading the Qwen2.5-1.5B-Instruct mannequin from Hugging Face. We set the information kind primarily based on GPU availability to make sure environment friendly mannequin execution in Colab.

Copy Code

tok = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
mannequin = AutoModelForCausalLM.from_pretrained(
   MODEL_NAME,
   device_map="auto",
   torch_dtype=DTYPE,
   load_in_4bit=True
)
gen = pipeline(
   "text-generation",
   mannequin=mannequin,
   tokenizer=tok,
   return_full_text=False
)

We load the tokenizer and mannequin, configure it to run in 4-bit for effectivity, and wrap every little thing in a text-generation pipeline so we will work together with the mannequin simply in Colab. Try the Paper and FULL CODES.

Copy Code

def chat(immediate: str, system: str = "", max_new_tokens: int = 512, temperature: float = 0.3) -> str:
   msgs = []
   if system:
       msgs.append({"position":"system","content material":system})
   msgs.append({"position":"consumer","content material":immediate})
   inputs = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
   out = gen(inputs, max_new_tokens=max_new_tokens, do_sample=(temperature>0), temperature=temperature, top_p=0.9)
   return out[0]["generated_text"].strip()


def extract_json(txt: str) -> Dict[str, Any]:
   m = re.search(r"{[sS]*}$", txt.strip())
   if not m:
       m = re.search(r"{[sS]*?}", txt)
   strive:
       return json.hundreds(m.group(0)) if m else {}
   besides Exception:
       # fallback: strip code fences
       s = re.sub(r"^```.*?n|n```$", "", txt, flags=re.S)
       strive:
           return json.hundreds(s)
       besides Exception:
           return {}

We outline helper capabilities: the chat perform permits us to ship prompts to the mannequin with optionally available system directions and sampling controls, whereas extract_json helps us parse structured JSON outputs from the mannequin reliably, even when the response consists of code fences or extra textual content. Try the Paper and FULL CODES.

Copy Code

def extract_code(txt: str) -> str:
   m = re.search(r"```(?:python)?s*([sS]*?)```", txt, flags=re.I)
   return (m.group(1) if m else txt).strip()


def run_python(code: str, env: Dict[str, Any] | None = None) -> Dict[str, Any]:
   import io, contextlib
   g = {"__name__": "__main__"}; l = {}
   if env: g.replace(env)
   buf = io.StringIO()
   strive:
       with contextlib.redirect_stdout(buf):
           exec(code, g, l)
       out = l.get("RESULT", g.get("RESULT"))
       return {"okay": True, "end result": out, "stdout": buf.getvalue()}
   besides Exception as e:
       return {"okay": False, "error": str(e), "hint": traceback.format_exc(), "stdout": buf.getvalue()}


PLANNER_SYS = """You're the HRM Planner.
Decompose the TASK into 2–4 atomic, code-solvable subgoals.
Return compact JSON solely: {"subgoals":[...], "final_format":"<one-line reply format>"}."""


SOLVER_SYS = """You're the HRM Solver.
Given SUBGOAL and CONTEXT vars, output a single Python snippet.
Guidelines:
- Compute deterministically.
- Set a variable RESULT to the reply.
- Hold code quick; stdlib solely.
Return solely a Python code block."""


CRITIC_SYS = """You're the HRM Critic.
Given TASK and LOGS (subgoal outcomes), resolve if remaining reply is prepared.
Return JSON solely: "revise","critique":"...", "fix_hint":"<if revise>"."""


SYNTH_SYS = """You're the HRM Synthesizer.
Given TASK, LOGS, and final_format, output solely the ultimate reply (no steps).
Observe final_format precisely."""

We add two essential items: utility capabilities and system prompts. The extract_code perform pulls Python snippets from the mannequin’s output, whereas run_python safely executes these snippets and captures their outcomes. Alongside, we outline 4 position prompts, Planner, Solver, Critic, and Synthesizer, which information the mannequin to interrupt duties into subgoals, clear up them with code, confirm correctness, and at last produce a clear reply. Try the Paper and FULL CODES.

Copy Code

def plan(process: str) -> Dict[str, Any]:
   p = f"TASK:n{process}nReturn JSON solely."
   return extract_json(chat(p, PLANNER_SYS, temperature=0.2, max_new_tokens=300))


def solve_subgoal(subgoal: str, context: Dict[str, Any]) -> Dict[str, Any]:
   immediate = f"SUBGOAL:n{subgoal}nCONTEXT vars: {listing(context.keys())}nReturn Python code solely."
   code = extract_code(chat(immediate, SOLVER_SYS, temperature=0.2, max_new_tokens=400))
   res = run_python(code, env=context)
   return {"subgoal": subgoal, "code": code, "run": res}


def critic(process: str, logs: Checklist[Dict[str, Any]]) -> Dict[str, Any]:
   pl = [{"subgoal": L["subgoal"], "end result": L["run"].get("end result"), "okay": L["run"]["ok"]} for L in logs]
   out = chat("TASK:n"+process+"nLOGS:n"+json.dumps(pl, ensure_ascii=False, indent=2)+"nReturn JSON solely.",
              CRITIC_SYS, temperature=0.1, max_new_tokens=250)
   return extract_json(out)


def refine(process: str, logs: Checklist[Dict[str, Any]]) -> Dict[str, Any]:
   sys = "Refine subgoals minimally to repair points. Return identical JSON schema as planner."
   out = chat("TASK:n"+process+"nLOGS:n"+json.dumps(logs, ensure_ascii=False)+"nReturn JSON solely.",
              sys, temperature=0.2, max_new_tokens=250)
   j = extract_json(out)
   return j if j.get("subgoals") else {}


def synthesize(process: str, logs: Checklist[Dict[str, Any]], final_format: str) -> str:
   packed = [{"subgoal": L["subgoal"], "end result": L["run"].get("end result")} for L in logs]
   return chat("TASK:n"+process+"nLOGS:n"+json.dumps(packed, ensure_ascii=False)+
               f"nfinal_format: {final_format}nOnly the ultimate reply.",
               SYNTH_SYS, temperature=0.0, max_new_tokens=120).strip()


def hrm_agent(process: str, context: Dict[str, Any] | None = None, funds: int = 2) -> Dict[str, Any]:
   ctx = dict(context or {})
   hint, plan_json = [], plan(process)
   for round_id in vary(1, funds+1):
       logs = [solve_subgoal(sg, ctx) for sg in plan_json.get("subgoals", [])]
       for L in logs:
           ctx_key = f"g{len(hint)}_{abs(hash(L['subgoal']))%9999}"
           ctx[ctx_key] = L["run"].get("end result")
       verdict = critic(process, logs)
       hint.append({"spherical": round_id, "plan": plan_json, "logs": logs, "verdict": verdict})
       if verdict.get("motion") == "submit": break
       plan_json = refine(process, logs) or plan_json
   remaining = synthesize(process, hint[-1]["logs"], plan_json.get("final_format", "Reply: <worth>"))
   return {"remaining": remaining, "hint": hint}

We implement the total HRM loop: we plan subgoals, clear up every by producing and working Python (capturing RESULT), then we critique, optionally refine the plan, and synthesize a clear remaining reply. We orchestrate these rounds in hrm_agent, carrying ahead intermediate outcomes as context so we iteratively enhance and cease as soon as the critic says “submit.” Try the Paper and FULL CODES.

Copy Code

ARC_TASK = textwrap.dedent("""
Infer the transformation rule from prepare examples and apply to check.
Return precisely: "Reply: <grid>", the place <grid> is a Python listing of lists of ints.
""").strip()
ARC_DATA = {
   "prepare": [
       {"inp": [[0,0],[1,0]], "out": [[1,1],[0,1]]},
       {"inp": [[0,1],[0,0]], "out": [[1,0],[1,1]]}
   ],
   "check": [[0,0],[0,1]]
}
res1 = hrm_agent(ARC_TASK, context={"TRAIN": ARC_DATA["train"], "TEST": ARC_DATA["test"]}, funds=2)
rprint("n[bold]Demo 1 — ARC-like Toy[/bold]")
rprint(res1["final"])


WM_TASK = "A tank holds 1200 L. It leaks 2% per hour for 3 hours, then is refilled by 150 L. Return precisely: 'Reply: <liters>'."
res2 = hrm_agent(WM_TASK, context={}, funds=2)
rprint("n[bold]Demo 2 — Phrase Math[/bold]")
rprint(res2["final"])


rprint("n[dim]Rounds executed (Demo 1):[/dim]", len(res1["trace"]))

We run two demos to validate the agent: an ARC-style process the place we infer a metamorphosis from prepare pairs and apply it to a check grid, and a word-math downside that checks numeric reasoning. We name hrm_agent with every process, print the ultimate solutions, and in addition show the variety of reasoning rounds the ARC run takes.

In conclusion, we acknowledge that what we have now constructed is greater than a easy demonstration; it’s a window into how hierarchical reasoning could make smaller fashions punch above their weight. By layering planning, fixing, and critiquing, we empower a free Hugging Face mannequin to carry out duties with stunning robustness. We depart with a deeper appreciation of how brain-inspired constructions, when paired with sensible, open-source instruments, allow us to discover reasoning benchmarks and experiment creatively with out incurring excessive prices. This hands-on journey exhibits us that superior cognitive-like workflows are accessible to anybody keen to tinker, iterate, and be taught.

Try the Paper and FULL CODES. Be happy to take a look at our GitHub Page for Tutorials, Codes and Notebooks. Additionally, be happy to comply with us on Twitter and don’t overlook to hitch our 100k+ ML SubReddit and Subscribe to our Newsletter.

The submit A Coding Guide to Building a Brain-Inspired Hierarchical Reasoning AI Agent with Hugging Face Models appeared first on MarkTechPost.