How to Build Ethically Aligned Autonomous Agents through Value-Guided Reasoning and Self-Correcting Decision-Making Using Open-Source Models
In this tutorial, we explore how to build an autonomous agent that aligns its actions with ethical and organizational values. We use open-source Hugging Face models running locally in Colab to simulate a decision-making process that balances goal achievement with moral reasoning. Through this implementation, we demonstrate how to combine a "policy" model that proposes actions with an "ethics judge" model that evaluates and aligns them, allowing us to see value alignment in practice without relying on any APIs.
!pip install -q transformers torch accelerate sentencepiece
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM
def generate_seq2seq(model, tokenizer, prompt, max_new_tokens=128):
    # Move inputs to the same device as the model so GPU execution works.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else tokenizer.pad_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
def generate_causal(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id if tokenizer.eos_token_id is not None else tokenizer.pad_token_id,
        )
    full_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Strip the prompt so only the newly generated continuation is returned.
    return full_text[len(prompt):].strip()
We begin by setting up our environment and importing the essential libraries from Hugging Face. We define two helper functions that generate text with sequence-to-sequence and causal models, which lets us produce both reasoning-based and creative outputs later in the tutorial.
policy_model_name = "distilgpt2"
judge_model_name = "google/flan-t5-small"
policy_tokenizer = AutoTokenizer.from_pretrained(policy_model_name)
policy_model = AutoModelForCausalLM.from_pretrained(policy_model_name)
judge_tokenizer = AutoTokenizer.from_pretrained(judge_model_name)
judge_model = AutoModelForSeq2SeqLM.from_pretrained(judge_model_name)
gadget = "cuda" if torch.cuda.is_available() else "cpu"
policy_model = policy_model.to(gadget)
judge_model = judge_model.to(gadget)
if policy_tokenizer.pad_token is None:
policy_tokenizer.pad_token = policy_tokenizer.eos_token
if judge_tokenizer.pad_token is None:
judge_tokenizer.pad_token = judge_tokenizer.eos_token
We load two small open-source models: distilgpt2 as our action generator and flan-t5-small as our ethics reviewer. We prepare both models and tokenizers for CPU or GPU execution, ensuring smooth performance in Colab. This setup provides the foundation for the agent's reasoning and ethical evaluation.
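Before building the agent, it can help to run a quick sanity check to confirm that both helpers produce text on the selected device. The snippet below is not part of the original walkthrough; the test_prompt string is arbitrary, and it simply calls the two helper functions defined above.
# Optional sanity check (assumes the helpers and models defined above are in scope).
test_prompt = "List one next step for improving customer trust:"
print("Causal (distilgpt2):")
print(generate_causal(policy_model, policy_tokenizer, test_prompt, max_new_tokens=30))
print("Seq2seq (flan-t5-small):")
print(generate_seq2seq(judge_model, judge_tokenizer, test_prompt, max_new_tokens=30))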
class EthicalAgent:
    def __init__(self, policy_model, policy_tok, judge_model, judge_tok):
        self.policy_model = policy_model
        self.policy_tok = policy_tok
        self.judge_model = judge_model
        self.judge_tok = judge_tok

    def propose_actions(self, user_goal, context, n_candidates=3):
        base_prompt = (
            "You are an autonomous operations agent. "
            "Given the goal and context, list a specific next action you would take:\n\n"
            f"Goal: {user_goal}\nContext: {context}\nAction:"
        )
        candidates = []
        for _ in range(n_candidates):
            action = generate_causal(self.policy_model, self.policy_tok, base_prompt, max_new_tokens=40)
            # Keep only the first line of the completion as the candidate action.
            action = action.split("\n")[0]
            candidates.append(action.strip())
        # Deduplicate while preserving order.
        return list(dict.fromkeys(candidates))

    def judge_action(self, action, org_values):
        judge_prompt = (
            "You are the Ethics & Compliance Reviewer.\n"
            "Evaluate the proposed agent action.\n"
            "Return fields:\n"
            "RiskLevel (LOW/MED/HIGH),\n"
            "Issues (short bullet-style text),\n"
            "Recommendation (approve / modify / reject).\n\n"
            f"ORG_VALUES:\n{org_values}\n\n"
            f"ACTION:\n{action}\n\n"
            "Answer in this format:\n"
            "RiskLevel: ...\nIssues: ...\nRecommendation: ..."
        )
        verdict = generate_seq2seq(self.judge_model, self.judge_tok, judge_prompt, max_new_tokens=128)
        return verdict.strip()

    def align_action(self, action, verdict, org_values):
        align_prompt = (
            "You are an Ethics Alignment Assistant.\n"
            "Your job is to FIX the proposed action so it follows ORG_VALUES.\n"
            "Keep it effective but safe, legal, and respectful.\n\n"
            f"ORG_VALUES:\n{org_values}\n\n"
            f"ORIGINAL_ACTION:\n{action}\n\n"
            f"VERDICT_FROM_REVIEWER:\n{verdict}\n\n"
            "Rewrite ONLY IF NEEDED. If the original is fine, return it unchanged. "
            "Return just the final aligned action:"
        )
        aligned = generate_seq2seq(self.judge_model, self.judge_tok, align_prompt, max_new_tokens=128)
        return aligned.strip()
We define the core agent class that generates, evaluates, and refines actions. Here, we design methods for proposing candidate actions, evaluating their ethical compliance, and rewriting them to align with values. This structure modularizes reasoning, judgment, and correction into clear, separate steps.
    def decide(self, user_goal, context, org_values, n_candidates=3):
        proposals = self.propose_actions(user_goal, context, n_candidates=n_candidates)
        scored = []
        for act in proposals:
            verdict = self.judge_action(act, org_values)
            aligned_act = self.align_action(act, verdict, org_values)
            scored.append({"original_action": act, "review": verdict, "aligned_action": aligned_act})

        def extract_risk(vtext):
            # Map the reviewer's RiskLevel field to a sortable score (lower is safer).
            for line in vtext.splitlines():
                if "RiskLevel" in line:
                    lvl = line.split(":", 1)[-1].strip().upper()
                    if "LOW" in lvl:
                        return 0
                    if "MED" in lvl:
                        return 1
                    if "HIGH" in lvl:
                        return 2
            return 3

        scored_sorted = sorted(scored, key=lambda x: extract_risk(x["review"]))
        final_choice = scored_sorted[0]
        report = {
            "goal": user_goal,
            "context": context,
            "org_values": org_values,
            "candidates_evaluated": scored,
            "final_plan": final_choice["aligned_action"],
            "final_plan_rationale": final_choice["review"],
        }
        return report
We implement the complete decision-making pipeline that links generation, judgment, and alignment. We assign a risk score to each candidate action and automatically choose the most ethically aligned one. This section shows how the agent can assess and improve its own decisions before committing to an action.
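Before running the full demo, it can be useful to watch a single, deliberately risky action pass through the judge and alignment steps. The sketch below is illustrative and not part of the original notebook: the sample values and action strings are made up, and parse_risk simply mirrors the nested extract_risk helper inside decide.
# Illustrative sketch: evaluate one hypothetical risky action in isolation.
# Assumes EthicalAgent and the models loaded earlier are in scope.
def parse_risk(verdict_text):
    # Mirrors extract_risk inside decide(): lower scores are safer.
    for line in verdict_text.splitlines():
        if "RiskLevel" in line:
            level = line.split(":", 1)[-1].strip().upper()
            if "LOW" in level:
                return 0
            if "MED" in level:
                return 1
            if "HIGH" in level:
                return 2
    return 3  # unparseable verdicts are treated as worst-case

sample_values = "- Follow all laws.\n- Be honest with customers."  # made-up values for illustration
risky_action = "Call customers after midnight and downplay the product's fees."  # hypothetical action
demo_agent = EthicalAgent(policy_model, policy_tokenizer, judge_model, judge_tokenizer)
verdict = demo_agent.judge_action(risky_action, sample_values)
aligned = demo_agent.align_action(risky_action, verdict, sample_values)
print("Verdict:\n", verdict)
print("Risk score:", parse_risk(verdict))
print("Aligned action:\n", aligned)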
org_values_text = (
    "- Respect privacy; do not access personal data without consent.\n"
    "- Follow all laws and safety policies.\n"
    "- Avoid discrimination, harassment, or harmful manipulation.\n"
    "- Be transparent and honest with stakeholders.\n"
    "- Prioritize user well-being and long-term trust over short-term gain."
)
demo_goal = "Increase customer adoption of the new financial product."
demo_context = (
    "The agent works for a bank outreach team. The target customers are small family businesses. "
    "Regulations require honest disclosure of risks and fees. Cold-calling minors or lying about terms is illegal."
)
agent = EthicalAgent(policy_model, policy_tokenizer, judge_model, judge_tokenizer)
report = agent.decide(demo_goal, demo_context, org_values_text, n_candidates=4)
def pretty_report(r):
    print("=== ETHICAL DECISION REPORT ===")
    print(f"Goal: {r['goal']}\n")
    print(f"Context: {r['context']}\n")
    print("Org Values:")
    print(r["org_values"])
    print("\n--- Candidate Evaluations ---")
    for i, cand in enumerate(r["candidates_evaluated"], 1):
        print(f"\nCandidate {i}:")
        print("Original Action:")
        print(" ", cand["original_action"])
        print("Ethics Review:")
        print(cand["review"])
        print("Aligned Action:")
        print(" ", cand["aligned_action"])
    print("\n--- Final Plan Selected ---")
    print(r["final_plan"])
    print("\nWhy this plan is acceptable (review snippet):")
    print(r["final_plan_rationale"])

pretty_report(report)
We define the organizational values, set up a realistic scenario, and run the ethical agent to produce its final plan. Finally, we print a detailed report showing the candidate actions, their reviews, and the chosen ethical decision, letting us observe how the agent integrates ethics directly into its reasoning process.
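If we want to keep an audit trail beyond the printed summary, the report dictionary can also be written to disk. This is an optional extension rather than part of the original tutorial; the filename is arbitrary.
import json

# Optional: persist the full decision report for later auditing (filename is arbitrary).
with open("ethical_decision_report.json", "w") as f:
    json.dump(report, f, indent=2)
print("Saved report with", len(report["candidates_evaluated"]), "candidate evaluations.")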
In conclusion, we see how an agent can reason not only about what to do but also about whether to do it. The system learns to identify risks, correct itself, and align its actions with human and organizational principles. This exercise shows that value alignment and ethics are not abstract ideas but practical mechanisms we can embed into agentic systems to make them safer, fairer, and more trustworthy.
