|

How to Build a Fully Self-Verifying Data Operations AI Agent Using Local Hugging Face Models for Automated Planning, Execution, and Testing


In this tutorial, we build a self-verifying DataOps AI Agent that can plan, execute, and check data operations automatically using local Hugging Face models. We design the agent with three intelligent roles: a Planner that creates an execution strategy, an Executor that writes and runs pandas code, and a Tester that validates the results for accuracy and consistency. By running Microsoft's Phi-2 model locally in Google Colab, we keep the workflow efficient, reproducible, and privacy-preserving while demonstrating how LLMs can automate complex data-processing tasks end-to-end. Check out the FULL CODES here.

!pip install -q transformers accelerate bitsandbytes scipy
import json, pandas as pd, numpy as np, torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig


MODEL_NAME = "microsoft/phi-2"


class LocalLLM:
    def __init__(self, model_name=MODEL_NAME, use_8bit=False):
        print(f"Loading model: {model_name}")
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        model_kwargs = {"device_map": "auto", "trust_remote_code": True}
        if use_8bit and torch.cuda.is_available():
            model_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
        else:
            model_kwargs["torch_dtype"] = torch.float16 if torch.cuda.is_available() else torch.float32
        self.model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
        self.pipe = pipeline("text-generation", model=self.model, tokenizer=self.tokenizer,
                             max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9,
                             pad_token_id=self.tokenizer.eos_token_id)
        print("✓ Model loaded successfully!\n")


    def generate(self, prompt, system_prompt="", temperature=0.3):
        if system_prompt:
            full_prompt = f"Instruct: {system_prompt}\n\n{prompt}\nOutput:"
        else:
            full_prompt = f"Instruct: {prompt}\nOutput:"
        output = self.pipe(full_prompt, temperature=temperature, do_sample=temperature > 0,
                           return_full_text=False, eos_token_id=self.tokenizer.eos_token_id)
        result = output[0]['generated_text'].strip()
        # Phi-2 sometimes starts a new "Instruct:" turn; keep only the first answer
        if "Instruct:" in result:
            result = result.split("Instruct:")[0].strip()
        return result

We install the required libraries and load the Phi-2 model locally using Hugging Face Transformers. We create a LocalLLM class that initializes the tokenizer and model, supports optional 8-bit quantization, and defines a generate method that produces text outputs. We make sure the model runs smoothly on both CPU and GPU, making it well suited for Colab. Check out the FULL CODES here.
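
Before building the agent around the wrapper, a quick sanity check confirms that it loads and generates text. This is a minimal sketch; the prompt and system prompt below are illustrative choices of our own, not part of the agent itself.

# Minimal sanity check for the LocalLLM wrapper defined above
llm = LocalLLM(use_8bit=False)
reply = llm.generate("Summarize what pandas is in one sentence.",
                     system_prompt="You are a concise technical assistant.",
                     temperature=0.2)
print(reply)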

PLANNER_PROMPT = """You are a Data Operations Planner. Create a detailed execution plan as valid JSON.


Return ONLY a JSON object (no other text) with this structure:
{"steps": ["step 1","step 2"],"expected_output":"description","validation_criteria":["criteria 1","criteria 2"]}"""


EXECUTOR_PROMPT = """You are a Data Operations Executor. Write Python code using pandas.


Requirements:
- Use pandas (imported as pd) and numpy (imported as np)
- Store the final result in a variable named 'result'
- Return ONLY Python code, no explanations or markdown"""


TESTER_PROMPT = """You are a Data Operations Tester. Verify execution results.


Return ONLY a JSON object (no other text) with this structure:
{"passed":true,"issues":["any issues found"],"recommendations":["suggestions"]}"""


class DataOpsAgent:
    def __init__(self, llm=None):
        self.llm = llm or LocalLLM()
        self.history = []


    def _extract_json(self, text):
        try:
            return json.loads(text)
        except Exception:
            # Fall back to the outermost {...} span if the model added extra prose
            start, end = text.find('{'), text.rfind('}') + 1
            if start >= 0 and end > start:
                try:
                    return json.loads(text[start:end])
                except Exception:
                    pass
        return None

We define the system prompts for the Planner, Executor, and Tester roles of our DataOps Agent. We then initialize the DataOpsAgent class with helper methods and a JSON-extraction utility for parsing structured responses. We lay the foundation for the agent's reasoning and execution pipeline. Check out the FULL CODES here.
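
To see why the fallback in _extract_json matters, here is a standalone illustration of the same parse-then-slice idea. The messy reply is invented for demonstration; small local models often wrap their JSON in extra prose like this.

# Strict parse first, then fall back to the outermost {...} span
messy = 'Sure! Here is the plan:\n{"steps": ["load df", "group by product"], "expected_output": "totals"}\nHope this helps.'
try:
    parsed = json.loads(messy)
except Exception:
    start, end = messy.find('{'), messy.rfind('}') + 1
    parsed = json.loads(messy[start:end])
print(parsed)  # {'steps': ['load df', 'group by product'], 'expected_output': 'totals'}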

    def plan(self, task, data_info):
        print("\n" + "="*60)
        print("PHASE 1: PLANNING")
        print("="*60)
        prompt = f"Task: {task}\n\nData Information:\n{data_info}\n\nCreate an execution plan as JSON with steps, expected_output, and validation_criteria."
        plan_text = self.llm.generate(prompt, PLANNER_PROMPT, temperature=0.2)
        self.history.append(("PLANNER", plan_text))
        plan = self._extract_json(plan_text) or {"steps": [task], "expected_output": "Processed data", "validation_criteria": ["Result generated", "No errors"]}
        print("\n📋 Plan Created:")
        print(f"  Steps: {len(plan.get('steps', []))}")
        for i, step in enumerate(plan.get('steps', []), 1):
            print(f"    {i}. {step}")
        print(f"  Expected: {plan.get('expected_output', 'N/A')}")
        return plan


    def execute(self, plan, data_context):
        print("\n" + "="*60)
        print("PHASE 2: EXECUTION")
        print("="*60)
        steps_text = '\n'.join(f"{i}. {s}" for i, s in enumerate(plan.get('steps', []), 1))
        prompt = f"Task Steps:\n{steps_text}\n\nData available: DataFrame 'df'\n{data_context}\n\nWrite Python code to execute these steps. Store the final result in the 'result' variable."
        code = self.llm.generate(prompt, EXECUTOR_PROMPT, temperature=0.1)
        self.history.append(("EXECUTOR", code))
        # Strip markdown fences the model may add despite instructions
        if "```python" in code: code = code.split("```python")[1].split("```")[0]
        elif "```" in code: code = code.split("```")[1].split("```")[0]
        lines = []
        for line in code.split('\n'):
            s = line.strip()
            # Drop blank and comment-only lines, except comments mentioning imports
            if s and (not s.startswith('#') or 'import' in s):
                lines.append(line)
        code = '\n'.join(lines).strip()
        code_lines = code.split('\n')
        print("\n💻 Generated Code:\n" + "-"*60)
        for i, line in enumerate(code_lines[:15], 1):
            print(f"{i:2}. {line}")
        if len(code_lines) > 15:
            print(f"    ... ({len(code_lines) - 15} more lines)")
        print("-"*60)
        return code

We implement the Planning and Execution phases of the agent. We let the Planner create detailed task steps and validation criteria, and the Executor then generates the corresponding pandas code to carry out the task. We observe how the agent autonomously transitions from reasoning to producing actionable code; the fence-stripping step is isolated in the sketch below. Check out the FULL CODES here.
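
The sanitization inside execute is easy to miss, so the snippet below runs just that logic on a made-up model response, showing exactly what survives before the code reaches exec.

# Made-up model output: code wrapped in markdown fences despite instructions
raw = "```python\nresult = df.groupby('product')['sales'].sum()\n```"
code = raw
if "```python" in code:
    code = code.split("```python")[1].split("```")[0]
elif "```" in code:
    code = code.split("```")[1].split("```")[0]
print(code.strip())  # -> result = df.groupby('product')['sales'].sum()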

    def test(self, plan, result, execution_error=None):
        print("\n" + "="*60)
        print("PHASE 3: TESTING & VERIFICATION")
        print("="*60)
        result_desc = f"EXECUTION ERROR: {execution_error}" if execution_error else f"Result type: {type(result).__name__}\n"
        if not execution_error:
            if isinstance(result, pd.DataFrame):
                result_desc += f"Shape: {result.shape}\nColumns: {list(result.columns)}\nSample:\n{result.head(3).to_string()}"
            elif isinstance(result, (int, float, str)):
                result_desc += f"Value: {result}"
            else:
                result_desc += f"Value: {str(result)[:200]}"
        criteria_text = '\n'.join(f"- {c}" for c in plan.get('validation_criteria', []))
        prompt = f"Validation Criteria:\n{criteria_text}\n\nExpected: {plan.get('expected_output', 'N/A')}\n\nActual Result:\n{result_desc}\n\nEvaluate if the result meets the criteria. Return JSON with passed (true/false), issues, and recommendations."
        test_result = self.llm.generate(prompt, TESTER_PROMPT, temperature=0.2)
        self.history.append(("TESTER", test_result))
        test_json = self._extract_json(test_result) or {"passed": execution_error is None, "issues": ["Could not parse test result"], "recommendations": ["Review manually"]}
        print(f"\n✓ Test Results:\n  Status: {'✅ PASSED' if test_json.get('passed') else '❌ FAILED'}")
        if test_json.get('issues'):
            print("  Issues:")
            for issue in test_json['issues'][:3]:
                print(f"    • {issue}")
        if test_json.get('recommendations'):
            print("  Recommendations:")
            for rec in test_json['recommendations'][:3]:
                print(f"    • {rec}")
        return test_json


    def run(self, task, df=None, data_info=None):
        print("\n🤖 SELF-VERIFYING DATA-OPS AGENT (Local HF Model)")
        print(f"Task: {task}\n")
        if data_info is None and df is not None:
            data_info = f"Shape: {df.shape}\nColumns: {list(df.columns)}\nSample:\n{df.head(2).to_string()}"
        plan = self.plan(task, data_info)
        code = self.execute(plan, data_info)
        result, error = None, None
        try:
            # Generated code only sees pandas, numpy, and the input DataFrame
            local_vars = {'pd': pd, 'np': np, 'df': df}
            exec(code, local_vars)
            result = local_vars.get('result')
        except Exception as e:
            error = str(e)
            print(f"\n⚠  Execution Error: {error}")
        test_result = self.test(plan, result, error)
        return {'plan': plan, 'code': code, 'result': result, 'test': test_result, 'history': self.history}

We focus on the Testing and Verification phase of our workflow. We let the agent evaluate its own output against predefined validation criteria and summarize the outcome as structured JSON. We then integrate all three phases, planning, execution, and testing, into a single self-verifying pipeline that ensures full automation. Check out the FULL CODES here.
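
The execution sandbox inside run is worth pausing on: generated code runs via exec against a namespace holding only pd, np, and df, and must leave its answer in a result variable. Here is a minimal sketch of that pattern, with a hand-written string standing in for model-generated code.

# Stand-in for model-generated code; the dict mirrors run()'s sandbox
demo_df = pd.DataFrame({'product': ['A', 'B', 'A'], 'sales': [10, 20, 30]})
local_vars = {'pd': pd, 'np': np, 'df': demo_df}
exec("result = df.groupby('product')['sales'].sum()", local_vars)
print(local_vars.get('result'))
# product
# A    40
# B    20
# Name: sales, dtype: int64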

def demo_basic(agent):
    print("\n" + "#"*60)
    print("# DEMO 1: Sales Data Aggregation")
    print("#"*60)
    df = pd.DataFrame({'product': ['A','B','A','C','B','A','C'],
                       'sales': [100,150,200,80,130,90,110],
                       'region': ['North','South','North','East','South','West','East']})
    task = "Calculate total sales by product"
    output = agent.run(task, df)
    if output['result'] is not None:
        print(f"\n📊 Final Result:\n{output['result']}")
    return output


def demo_advanced(agent):
    print("\n" + "#"*60)
    print("# DEMO 2: Customer Age Analysis")
    print("#"*60)
    df = pd.DataFrame({'customer_id': range(1, 11),
                       'age': [25,34,45,23,56,38,29,41,52,31],
                       'purchases': [5,12,8,3,15,7,9,11,6,10],
                       'spend': [500,1200,800,300,1500,700,900,1100,600,1000]})
    task = "Calculate average spend by age group: young (under 35) and mature (35+)"
    output = agent.run(task, df)
    if output['result'] is not None:
        print(f"\n📊 Final Result:\n{output['result']}")
    return output


if __name__ == "__main__":
    print("🚀 Initializing Local LLM...")
    print("Using CPU mode for maximum compatibility\n")
    try:
        llm = LocalLLM(use_8bit=False)
        agent = DataOpsAgent(llm)
        demo_basic(agent)
        print("\n\n")
        demo_advanced(agent)
        print("\n" + "="*60)
        print("✅ Tutorial Complete!")
        print("="*60)
        print("\nKey Features:")
        print("  • 100% Local - No API calls required")
        print("  • Uses Phi-2 from Microsoft (2.7B params)")
        print("  • Self-verifying 3-phase workflow")
        print("  • Runs on free Google Colab CPU/GPU")
    except Exception as e:
        print(f"\n❌ Error: {e}")
        print("Troubleshooting:\n1. pip install -q transformers accelerate scipy\n2. Restart runtime\n3. Try a different model")

We build two demo examples to test the agent's capabilities on simple sales and customer datasets. We initialize the model, execute the DataOps workflow, and observe the complete cycle from planning to validation. We conclude the tutorial by summarizing the key benefits and encouraging further experimentation with local models.
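
One low-effort experiment is swapping in another small model. Note that the Instruct:/Output: template in LocalLLM.generate is a Phi-style format, so other model families may need a different prompt template; treat the following as an assumption-laden starting point rather than a drop-in replacement.

# Hypothetical swap: microsoft/phi-1_5 is a smaller sibling of Phi-2 and
# uses a similar prompt style; other architectures may not.
llm_alt = LocalLLM(model_name="microsoft/phi-1_5")
agent_alt = DataOpsAgent(llm_alt)
agent_alt.run("Count rows per region", df=pd.DataFrame({'region': ['North', 'South', 'North']}))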

In conclusion, we created a fully autonomous and self-verifying DataOps system powered by a local Hugging Face model. We see how each stage, planning, execution, and testing, interacts seamlessly to produce reliable results without relying on any cloud APIs. This workflow highlights the power of local LLMs such as Phi-2 for lightweight automation and invites us to extend this architecture to more advanced data pipelines, validation frameworks, and multi-agent data systems in the future.

