
Building a Context-Folding LLM Agent for Long-Horizon Reasoning with Memory Compression and Tool Use

In this tutorial, we explore how to build a Context-Folding LLM Agent that efficiently solves long, complex tasks by intelligently managing limited context. We design the agent to break a large task down into smaller subtasks, perform reasoning or calculations when needed, and then fold each completed sub-trajectory into a concise summary. By doing this, we preserve essential knowledge while keeping the active memory small.

import os, re, sys, math, random, json, textwrap, subprocess, shutil, time
from typing import List, Dict, Tuple

# Install transformers (and friends) on first run, e.g. in a fresh Colab session.
try:
    import transformers
except ImportError:
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "transformers", "accelerate", "sentencepiece"], check=True)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

MODEL_NAME = os.environ.get("CF_MODEL", "google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
llm = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device_map="auto")

def llm_gen(prompt: str, max_new_tokens=160, temperature=0.0) -> str:
    out = llm(prompt, max_new_tokens=max_new_tokens, do_sample=temperature > 0.0, temperature=temperature)[0]["generated_text"]
    return out.strip()

We start by setting up our environment and loading a lightweight Hugging Face model. We use this model to generate and process text locally, ensuring the agent runs smoothly on Google Colab without any API dependencies.
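Before building on top of it, we can sanity-check the generation helper with a one-off call. This is a minimal illustrative snippet (not part of the original pipeline), assuming the cell above has run; flan-t5-small produces short, rough completions, which is fine for our purposes.

# Illustrative smoke test: confirm the local pipeline responds (assumes the cell above ran).
sample = llm_gen("Summarize in one line why compressing old context helps an agent.", max_new_tokens=32)
print(sample)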

import ast, operator as op

# Whitelisted AST operators for the safe calculator tool.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg, ast.FloorDiv: op.floordiv, ast.Mod: op.mod}

def _eval_node(n):
    if isinstance(n, ast.Num): return n.n
    if isinstance(n, ast.UnaryOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.operand))
    if isinstance(n, ast.BinOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.left), _eval_node(n.right))
    raise ValueError("Unsafe expression")

def calc(expr: str):
    node = ast.parse(expr, mode="eval").body
    return _eval_node(node)

class FoldingMemory:
    def __init__(self, max_chars: int = 800):
        self.active = []; self.folds = []; self.max_chars = max_chars
    def add(self, text: str):
        self.active.append(text.strip())
        # Once the active context exceeds the budget, fold the oldest entries into short stubs.
        while len(self.active_text()) > self.max_chars and len(self.active) > 1:
            popped = self.active.pop(0)
            fold = f"- Folded: {popped[:120]}..."
            self.folds.append(fold)
    def fold_in(self, summary: str): self.folds.append(summary.strip())
    def active_text(self) -> str: return "\n".join(self.active)
    def folded_text(self) -> str: return "\n".join(self.folds)
    def snapshot(self) -> Dict: return {"active_chars": len(self.active_text()), "n_folds": len(self.folds)}

We define a simple calculator tool for basic arithmetic and create a memory system that dynamically folds past context into concise summaries. This helps us maintain a manageable active memory while retaining essential information.
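To see both pieces in isolation, here is a small illustrative check (not in the original code): we evaluate an expression through the AST-based calculator, then force a tiny memory budget so folding kicks in after two additions.

# Illustrative usage of the tool and memory above (assumes the previous cell ran).
print(calc("(799.99 + 149.5 + 23.75) * 1.08"))   # safe arithmetic, no eval()

mem = FoldingMemory(max_chars=60)                 # tiny budget to trigger folding quickly
mem.add("First note: gather requirements for the study plan.")
mem.add("Second note: allocate morning blocks to ML theory.")
print(mem.snapshot())        # the oldest entry has been folded out of active memory
print(mem.folded_text())     # "- Folded: First note: ..."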

SUBTASK_DECOMP_PROMPT = """You are an expert planner. Decompose the task below into 2-4 crisp subtasks.
Return each subtask as a bullet starting with '- ' in priority order.
Task: "{task}" """

SUBTASK_SOLVER_PROMPT = """You are a precise problem solver with minimal steps.
If a calculation is needed, write one line 'CALC(expr)'.
Otherwise write 'ANSWER: <final>'.
Think briefly; avoid chit-chat.


Task: {task}
Subtask: {subtask}
Notes (folded context):
{notes}


Now respond with either CALC(...) or ANSWER: ..."""

SUBTASK_SUMMARY_PROMPT = """Summarize the subtask outcome in <=3 bullets, total <=50 tokens.
Subtask: {title}
Steps:
{trace}
Final: {final}
Return only bullets starting with '- '."""

FINAL_SYNTH_PROMPT = """You are a senior agent. Synthesize a final, coherent solution using ONLY:
- The original task
- Folded summaries (below)
Avoid repeating steps. Be concise and actionable.


Task: {task}
Folded summaries:
{folds}


Final answer:"""

def parse_bullets(text: str) -> List[str]:
    return [ln[2:].strip() for ln in text.splitlines() if ln.strip().startswith("- ")]

We design prompt templates that guide the agent in decomposing tasks, solving subtasks, and summarizing results. These structured prompts enable clear communication between reasoning steps and the model's responses.
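The glue between these templates is parse_bullets, which turns a planner response into a subtask list. A quick illustrative call (with a made-up planner output) shows the expected shape:

# Illustrative: how a raw decomposition response becomes a list of subtasks.
plan_text = """- Draft the daily time blocks
- Assign ML topics to each block
- Add workouts and meals"""
print(parse_bullets(plan_text))
# ['Draft the daily time blocks', 'Assign ML topics to each block', 'Add workouts and meals']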

def run_subtask(task: str, subtask: str, memory: FoldingMemory, max_tool_iters: int = 3) -> Tuple[str, str, List[str]]:
    notes = (memory.folded_text() or "(none)")
    trace = []; final = ""
    for _ in range(max_tool_iters):
        prompt = SUBTASK_SOLVER_PROMPT.format(task=task, subtask=subtask, notes=notes)
        out = llm_gen(prompt, max_new_tokens=96); trace.append(out)
        m = re.search(r"CALC\((.+?)\)", out)
        if m:
            try:
                val = calc(m.group(1))
                trace.append(f"TOOL:CALC -> {val}")
                out2 = llm_gen(prompt + f"\nTool result: {val}\nNow produce 'ANSWER: ...' only.", max_new_tokens=64)
                trace.append(out2)
                if out2.strip().startswith("ANSWER:"):
                    final = out2.split("ANSWER:", 1)[1].strip(); break
            except Exception as e:
                trace.append(f"TOOL:CALC ERROR -> {e}")
        if out.strip().startswith("ANSWER:"):
            final = out.split("ANSWER:", 1)[1].strip(); break
    if not final:
        final = "No definitive answer; partial reasoning:\n" + "\n".join(trace[-2:])
    summ = llm_gen(SUBTASK_SUMMARY_PROMPT.format(title=subtask, trace="\n".join(trace), final=final), max_new_tokens=80)
    summary_bullets = "\n".join(parse_bullets(summ)[:3]) or f"- {subtask}: {final[:60]}..."
    return final, summary_bullets, trace

class ContextFoldingAgent:
    def __init__(self, max_active_chars: int = 800):
        self.memory = FoldingMemory(max_chars=max_active_chars)
        self.metrics = {"subtasks": 0, "tool_calls": 0, "chars_saved_est": 0}
    def decompose(self, task: str) -> List[str]:
        plan = llm_gen(SUBTASK_DECOMP_PROMPT.format(task=task), max_new_tokens=96)
        subs = parse_bullets(plan)
        return subs[:4] if subs else ["Main solution"]
    def run(self, task: str) -> Dict:
        t0 = time.time()
        self.memory.add(f"TASK: {task}")
        subtasks = self.decompose(task)
        self.metrics["subtasks"] = len(subtasks)
        folded = []
        for st in subtasks:
            self.memory.add(f"SUBTASK: {st}")
            final, fold_summary, trace = run_subtask(task, st, self.memory)
            self.memory.fold_in(fold_summary)
            folded.append(f"- {st}: {final}")
            self.memory.add(f"SUBTASK_DONE: {st}")
        final = llm_gen(FINAL_SYNTH_PROMPT.format(task=task, folds=self.memory.folded_text()), max_new_tokens=200)
        t1 = time.time()
        return {"task": task, "final": final.strip(), "folded_summaries": self.memory.folded_text(),
                "active_context_chars": len(self.memory.active_text()),
                "subtask_finals": folded, "runtime_sec": round(t1 - t0, 2)}

We implement the agent's core logic, in which each subtask is executed, summarized, and folded back into memory. This step demonstrates how context folding lets the agent reason iteratively without losing track of prior progress.
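We can also exercise the subtask loop on its own before running the full agent. This is an illustrative call (not in the original post) assuming the cells above have run; the exact output depends on what the small model generates.

# Illustrative single-subtask run; model output will vary.
mem = FoldingMemory(max_chars=400)
final, summary, trace = run_subtask(
    task="Compute a small budget",
    subtask="Add 799.99, 149.5 and 23.75",
    memory=mem,
)
print("FINAL:", final)         # extracted from 'ANSWER: ...' or via a CALC round-trip
print("SUMMARY:\n" + summary)  # the <=3 bullets that would be folded into memory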

DEMO_TASKS = [
    "Plan a 3-day study schedule for ML with daily workouts and simple meals; include time blocks.",
    "Compute a small project budget with 3 items (laptop 799.99, course 149.5, snacks 23.75), add 8% tax and 5% buffer, and present a one-paragraph recommendation."
]

def pretty(d): return json.dumps(d, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    agent = ContextFoldingAgent(max_active_chars=700)
    for i, task in enumerate(DEMO_TASKS, 1):
        print("=" * 70)
        print(f"DEMO #{i}: {task}")
        res = agent.run(task)
        print("\n--- Folded Summaries ---\n" + (res["folded_summaries"] or "(none)"))
        print("\n--- Final Answer ---\n" + res["final"])
        print("\n--- Diagnostics ---")
        diag = {k: res[k] for k in ["active_context_chars", "runtime_sec"]}
        diag["n_subtasks"] = len(agent.decompose(task))
        print(pretty(diag))

We run the agent on sample tasks to observe how it plans, executes, and synthesizes final results. Through these examples, we see the complete context-folding process in action, producing concise and coherent outputs.
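To try the agent on a prompt of your own, instantiate it directly. This is a usage sketch under the same assumptions as the demo above; any task string works, and the result dict exposes the compressed memory alongside the answer.

# Illustrative: run the agent on a custom task and inspect the compressed memory.
agent = ContextFoldingAgent(max_active_chars=700)
res = agent.run("Outline a weekend plan to refactor a small Python project, with rough time estimates.")
print(res["final"])
print(res["folded_summaries"])  # the compact knowledge the final answer was built from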

In conclusion, we demonstrate how context folding enables long-horizon reasoning while avoiding memory overload. We see how each subtask is planned, executed, summarized, and distilled into compact knowledge, mimicking how an intelligent agent would handle complex workflows over time. By combining decomposition, tool use, and context compression, we create a lightweight yet powerful agentic system that scales reasoning efficiently.

