
How to Design a Fully Local Multi-Agent Orchestration System Using TinyLlama for Intelligent Task Decomposition and Autonomous Collaboration


In this tutorial, we explore how to orchestrate a team of specialized AI agents locally using an efficient manager-agent architecture powered by TinyLlama. We walk through how we build structured task decomposition, inter-agent collaboration, and autonomous reasoning loops without relying on any external APIs. By running everything directly through the transformers library, we create a fully offline, lightweight, and transparent multi-agent system that we can customize, inspect, and extend. Through the snippets, we observe how each component, from task structures to agent prompts to result synthesis, comes together to form a coherent human-AI workflow that we control end-to-end. Check out the FULL CODES here.

!pip install transformers torch accelerate bitsandbytes -q


import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import json
import re
from typing import List, Dict, Any
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class Task:
    id: str
    description: str
    assigned_to: str = None
    status: str = "pending"
    result: Any = None
    dependencies: List[str] = None

    def __post_init__(self):
        if self.dependencies is None:
            self.dependencies = []


@dataclass
class Agent:
    name: str
    role: str
    expertise: str
    system_prompt: str

We set up all the core imports and define the fundamental data structures needed to manage tasks and agents. We define Task and Agent as structured entities to cleanly orchestrate work. By doing this, we ensure that every part of the system rests on a consistent and reliable foundation. Check out the FULL CODES here.
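As a quick illustration (this snippet is not part of the original notebook), we can instantiate these dataclasses directly to see their defaults and how they serialize, assuming Task and Agent are defined as above.

demo_task = Task(id="task_1", description="Summarize binary search")
print(demo_task.status)        # "pending" by default
print(demo_task.dependencies)  # [] thanks to __post_init__

demo_agent = Agent(
    name="researcher",
    role="Research Specialist",
    expertise="Information gathering",
    system_prompt="You are a research specialist."
)
print(asdict(demo_task))       # dataclasses convert cleanly to dicts for logging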

AGENT_REGISTRY = {
    "researcher": Agent(
        name="researcher",
        role="Research Specialist",
        expertise="Information gathering, analysis, and synthesis",
        system_prompt="You are a research specialist. Provide thorough research on topics."
    ),
    "coder": Agent(
        name="coder",
        role="Software Engineer",
        expertise="Writing clean, efficient code with best practices",
        system_prompt="You are an expert programmer. Write clean, well-documented code."
    ),
    "writer": Agent(
        name="writer",
        role="Content Writer",
        expertise="Clear communication and documentation",
        system_prompt="You are a professional writer. Create clear, engaging content."
    ),
    "analyst": Agent(
        name="analyst",
        role="Data Analyst",
        expertise="Data interpretation and insights",
        system_prompt="You are a data analyst. Provide clear insights from data."
    )
}


class LocalLLM:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        ) if torch.cuda.is_available() else None
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def generate(self, prompt: str, max_tokens: int = 300) -> str:
        formatted_prompt = f"<|system|>\nYou are a helpful AI assistant.</s>\n<|user|>\n{prompt}</s>\n<|assistant|>\n"
        inputs = self.tokenizer(
            formatted_prompt,
            return_tensors="pt",
            truncation=True,
            max_length=1024,
            padding=True
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                temperature=0.7,
                do_sample=True,
                top_p=0.9,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                use_cache=True
            )
        full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if "<|assistant|>" in full_response:
            return full_response.split("<|assistant|>")[-1].strip()
        return full_response[len(formatted_prompt):].strip()

We register all our specialized agents and implement the local LLM wrapper that powers the system. We load TinyLlama in an efficient 4-bit mode so we can run everything smoothly on Colab or local hardware. With this, we give ourselves a flexible and fully local way to generate responses for each agent. Check out the FULL CODES here.
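Before wiring the wrapper into the manager, a quick smoke test like the one below (our addition, not from the original code) confirms the model loads and responds; the exact output will vary because sampling is enabled.

llm = LocalLLM()  # downloads TinyLlama-1.1B-Chat-v1.0 on first run
reply = llm.generate("Explain in one sentence what a manager agent does.", max_tokens=60)
print(reply)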

class ManagerAgent:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.llm = LocalLLM(model_name)
        self.agents = AGENT_REGISTRY
        self.tasks: Dict[str, Task] = {}
        self.execution_log = []

    def log(self, message: str):
        timestamp = datetime.now().strftime("%H:%M:%S")
        log_entry = f"[{timestamp}] {message}"
        self.execution_log.append(log_entry)
        print(log_entry)

    def decompose_goal(self, goal: str) -> List[Task]:
        self.log(f"🎯 Decomposing goal: {goal}")
        agent_info = "\n".join([f"- {name}: {agent.expertise}" for name, agent in self.agents.items()])
        prompt = f"""Break down this goal into 3 specific subtasks. Assign each to the best agent.

Goal: {goal}

Available agents:
{agent_info}

Respond ONLY with a JSON array."""
        response = self.llm.generate(prompt, max_tokens=250)
        try:
            json_match = re.search(r'\[\s*\{.*?\}\s*\]', response, re.DOTALL)
            if json_match:
                tasks_data = json.loads(json_match.group())
            else:
                raise ValueError("No JSON found")
        except:
            tasks_data = self._create_default_tasks(goal)

        tasks = []
        for i, task_data in enumerate(tasks_data[:3]):
            task = Task(
                id=task_data.get('id', f'task_{i+1}'),
                description=task_data.get('description', f'Work on: {goal}'),
                assigned_to=task_data.get('assigned_to', list(self.agents.keys())[i % len(self.agents)]),
                dependencies=task_data.get('dependencies', [] if i == 0 else [f'task_{i}'])
            )
            self.tasks[task.id] = task
            tasks.append(task)
            self.log(f"  ✓ {task.id}: {task.description[:50]}... → {task.assigned_to}")

        return tasks

We begin building the ManagerAgent class and focus on how we decompose a high-level goal into well-defined subtasks. We generate structured JSON-based tasks and automatically assign them to the right agent. By doing this, we allow the system to think step by step and organize work just like a human project manager. Check out the FULL CODES here.
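For reference, the snippet below (illustrative only, not part of the original post) shows the kind of JSON array decompose_goal tries to extract from the model's response; when parsing fails, the hand-written fallback tasks defined next are used instead.

example_response = '''
[
  {"id": "task_1", "description": "Research binary search", "assigned_to": "researcher", "dependencies": []},
  {"id": "task_2", "description": "Implement binary search in Python", "assigned_to": "coder", "dependencies": ["task_1"]},
  {"id": "task_3", "description": "Document the implementation", "assigned_to": "writer", "dependencies": ["task_2"]}
]
'''
tasks_data = json.loads(example_response)
print([t["assigned_to"] for t in tasks_data])  # ['researcher', 'coder', 'writer']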

    def _create_default_tasks(self, goal: str) -> List[Dict]:
        if any(word in goal.lower() for word in ['code', 'program', 'implement', 'algorithm']):
            return [
                {"id": "task_1", "description": f"Research and explain the concept: {goal}", "assigned_to": "researcher", "dependencies": []},
                {"id": "task_2", "description": f"Write code implementation for: {goal}", "assigned_to": "coder", "dependencies": ["task_1"]},
                {"id": "task_3", "description": "Create documentation and examples", "assigned_to": "writer", "dependencies": ["task_2"]}
            ]
        return [
            {"id": "task_1", "description": f"Research: {goal}", "assigned_to": "researcher", "dependencies": []},
            {"id": "task_2", "description": "Analyze findings and structure content", "assigned_to": "analyst", "dependencies": ["task_1"]},
            {"id": "task_3", "description": "Write a comprehensive response", "assigned_to": "writer", "dependencies": ["task_2"]}
        ]
  
    def execute_task(self, task: Task, context: Dict[str, Any] = None) -> str:
        self.log(f"🤖 Executing {task.id} with {task.assigned_to}")
        task.status = "in_progress"
        agent = self.agents[task.assigned_to]
        context_str = ""
        if context and task.dependencies:
            context_str = "\n\nContext from previous tasks:\n"
            for dep_id in task.dependencies:
                if dep_id in context:
                    context_str += f"- {context[dep_id][:150]}...\n"

        prompt = f"""{agent.system_prompt}

Task: {task.description}{context_str}

Provide a clear, concise response:"""
        result = self.llm.generate(prompt, max_tokens=250)
        task.result = result
        task.status = "completed"
        self.log(f"  ✓ Completed {task.id}")
        return result

We define fallback task logic and the full execution flow for each task. We guide each agent with its own system prompt and supply contextual information to keep the results coherent. This allows us to execute tasks intelligently while respecting dependency order. Check out the FULL CODES here.
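To make the dependency handling concrete, here is a small hypothetical sketch (not in the original notebook) that calls execute_task directly, assuming a ManagerAgent instance named manager already exists; the output of task_1 is passed in so the coder agent sees it as context.

t1 = Task(id="task_1", description="Research binary search", assigned_to="researcher")
t2 = Task(id="task_2", description="Implement binary search in Python", assigned_to="coder",
          dependencies=["task_1"])
manager.tasks.update({t1.id: t1, t2.id: t2})

r1 = manager.execute_task(t1)                  # no dependencies, runs immediately
r2 = manager.execute_task(t2, {"task_1": r1})  # task_1's output is injected as context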

    def synthesize_results(self, goal: str, results: Dict[str, str]) -> str:
        self.log("🔄 Synthesizing final results")
        results_text = "\n\n".join([f"Task {tid}:\n{res[:200]}" for tid, res in results.items()])
        prompt = f"""Combine these task results into one final coherent answer.

Original Goal: {goal}

Task Results:
{results_text}

Final comprehensive answer:"""
        return self.llm.generate(prompt, max_tokens=350)

    def execute_goal(self, goal: str) -> Dict[str, Any]:
        self.log(f"\n{'='*60}\n🎬 Starting Manager Agent\n{'='*60}")
        tasks = self.decompose_goal(goal)
        results = {}
        completed = set()
        max_iterations = len(tasks) * 2
        iteration = 0

        while len(completed) < len(tasks) and iteration < max_iterations:
            iteration += 1
            for task in tasks:
                if task.id in completed:
                    continue
                deps_met = all(dep in completed for dep in task.dependencies)
                if deps_met:
                    result = self.execute_task(task, results)
                    results[task.id] = result
                    completed.add(task.id)

        final_output = self.synthesize_results(goal, results)
        self.log(f"\n{'='*60}\n✅ Execution Complete!\n{'='*60}\n")

        return {
            "goal": goal,
            "tasks": [asdict(task) for task in tasks],
            "final_output": final_output,
            "execution_log": self.execution_log
        }

We synthesize the outputs from all subtasks and turn them into one unified final answer. We also implement an orchestration loop that ensures each task runs only after its dependencies are complete. This snippet shows how we bring everything together into a simple multi-step reasoning pipeline. Check out the FULL CODES here.
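The dictionary returned by execute_goal makes the whole run auditable; a minimal inspection sketch (our addition, again assuming a ManagerAgent instance named manager) looks like this.

summary = manager.execute_goal("Explain binary search with a simple example")

for t in summary["tasks"]:
    print(t["id"], t["assigned_to"], t["status"])    # each task should end up "completed"
print(summary["final_output"][:300])                 # synthesized answer
print(len(summary["execution_log"]), "log entries")  # timestamped trace of the run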

def demo_basic():
    manager = ManagerAgent()
    goal = "Explain binary search algorithm with a simple example"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result


def demo_coding():
    manager = ManagerAgent()
    goal = "Implement a function to find the maximum element in a list"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result


def demo_custom(custom_goal: str):
    manager = ManagerAgent()
    result = manager.execute_goal(custom_goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result


if __name__ == "__main__":
    print("🤖 Manager Agent Tutorial - API-less Local Version")
    print("="*60)
    print("Using TinyLlama (1.1B) - Fast & efficient!\n")
    result = demo_basic()
    print("\n\n💡 Try more:")
    print("  - demo_coding()")
    print("  - demo_custom('your goal here')")

We provide demonstration functions to easily test our system with different goals. We run sample tasks to observe how the manager decomposes, executes, and synthesizes work in real time. This gives us an interactive way to understand the entire workflow and refine it further.
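For example, in a notebook cell we might run a custom goal and persist the execution log for later review (a usage sketch, not part of the original script).

result = demo_custom("Compare quicksort and merge sort for nearly sorted data")

with open("execution_log.txt", "w") as f:
    f.write("\n".join(result["execution_log"]))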

In conclusion, we demonstrate how to design and operate a full multi-agent orchestration system locally with minimal dependencies. We now understand how the manager breaks down goals, routes tasks to the right expert agents, collects their outputs, resolves dependencies, and synthesizes the final result. This implementation lets us appreciate how modular, predictable, and powerful local agentic patterns can be when built from scratch.



