Build an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization

In this tutorial, we build a Wet-Lab Protocol Planner & Validator that acts as an intelligent agent for experimental design and execution. We design the system in Python and integrate Salesforce’s CodeGen-350M-mono model for natural language reasoning. We structure the pipeline into modular components: ProtocolParser for extracting structured data, such as steps, durations, and temperatures, from textual protocols; InventoryManager for validating reagent availability and expiry; SchedulePlanner for generating timelines and parallelization; and SafetyValidator for identifying biosafety or chemical hazards. The LLM is then used to generate optimization suggestions, effectively closing the loop between perception, planning, validation, and refinement.

import re, json, pandas as pd
from datetime import datetime, timedelta
from collections import defaultdict
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


MODEL_NAME = "Salesforce/codegen-350M-mono"
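print("Loading CodeGen model (30 seconds)...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# CodeGen ships without a pad token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
# device_map="auto" relies on the accelerate package being installed
model = AutoModelForCausalLM.from_pretrained(
   MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)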
print("✓ Model loaded!")

We begin by importing essential libraries and loading the Salesforce CodeGen-350M-mono model locally for lightweight, API-free inference. We initialize both the tokenizer and model with float16 precision and automatic device mapping to ensure compatibility and speed on Colab GPUs.

class ProtocolParser:
   def read_protocol(self, text):
       steps = []
       lines = text.split('\n')
       for i, line in enumerate(lines, 1):
           # A step header looks like "3. Sample Incubation"
           step_match = re.search(r'^(\d+)\.\s+(.+)', line.strip())
           if step_match:
               num, name = step_match.groups()
               # i is 1-based, so this slice covers the (up to) four lines after the header
               context = '\n'.join(lines[i:min(i+4, len(lines))])
               duration = self._extract_duration(context)
               temp = self._extract_temp(context)
               safety = self._check_safety(context)
               steps.append({
                   'step': int(num), 'name': name, 'duration_min': duration,
                   'temp': temp, 'safety': safety, 'line': i, 'details': context[:200]
               })
       return steps
  
   def _extract_duration(self, text):
       text = text.lower()
       if 'overnight' in text: return 720
       match = re.search(r'(\d+)\s*(?:hour|hr|h)(?:s)?(?!\w)', text)
       if match: return int(match.group(1)) * 60
       # Check ranges such as "10-15 minutes" before single values;
       # otherwise the single-value pattern matches the upper bound first
       match = re.search(r'(\d+)-(\d+)\s*(?:min|minute)', text)
       if match: return (int(match.group(1)) + int(match.group(2))) // 2
       match = re.search(r'(\d+)\s*(?:min|minute)(?:s)?', text)
       if match: return int(match.group(1))
       return 30  # default when no duration is stated
  
   def _extract_temp(self, text):
       text = text.lower()
       if '4°c' in text or '4 °c' in text or '4°' in text: return '4C'
       if '37°c' in text or '37 °c' in text: return '37C'
       if '-20°c' in text or '-80°c' in text: return 'FREEZER'
       if 'room temp' in text or 'rt' in text or 'ambient' in text: return 'RT'
       return 'RT'
  
   def _check_safety(self, text):
       flags = []
       text_lower = text.lower()
       if re.search(r'bsl-[23]|biosafety', text_lower): flags.append('BSL-2/3')
       if re.search(r'caution|corrosive|hazard|toxic', text_lower): flags.append('HAZARD')
       if 'sharp' in text_lower or 'needle' in text_lower: flags.append('SHARPS')
       if 'dark' in text_lower or 'light-sensitive' in text_lower: flags.append('LIGHT-SENSITIVE')
       if 'flammable' in text_lower: flags.append('FLAMMABLE')
       return flags


class InventoryManager:
   def __init__(self, csv_text):
       from io import StringIO
       self.df = pd.read_csv(StringIO(csv_text))
       self.df['expiry'] = pd.to_datetime(self.df['expiry'])
  
   def check_availability(self, reagent_list):
       issues = []
       for reagent in reagent_list:
           reagent_clean = reagent.lower().replace('_', ' ').replace('-', ' ')
           # Fuzzy match: any inventory row containing one of the first two words
           matches = self.df[self.df['reagent'].str.lower().str.contains(
               '|'.join(reagent_clean.split()[:2]), na=False, regex=True
           )]
           if matches.empty:
               issues.append(f" {reagent}: NOT IN INVENTORY")
           else:
               row = matches.iloc[0]
               if row['expiry'] < datetime.now():
                   issues.append(f"  {reagent}: EXPIRED on {row['expiry'].date()} (lot {row['lot']})")
               elif (row['expiry'] - datetime.now()).days < 30:
                   issues.append(f"  {reagent}: Expires soon ({row['expiry'].date()}, lot {row['lot']})")
               if row['quantity'] < 10:
                   issues.append(f"  {reagent}: LOW STOCK ({row['quantity']} {row['unit']} remaining)")
       return issues
  
   def extract_reagents(self, protocol_text):
       reagents = set()
       patterns = [
           r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:antibody|buffer|solution)',
           r'\b([A-Z]{2,}(?:-[A-Z0-9]+)?)\b',
           r'(?:add|use|prepare|dilute)\s+([a-z-]+\s*(?:antibody|buffer|substrate|solution))',
       ]
       for pattern in patterns:
           matches = re.findall(pattern, protocol_text, re.IGNORECASE)
           reagents.update(m.strip() for m in matches if len(m) > 2)
       return list(reagents)[:15]

We define the ProtocolParser and InventoryManager classes to extract structured experimental details and verify reagent inventory. We parse each protocol step for duration, temperature, and safety markers, while the inventory manager validates stock levels, expiry dates, and reagent availability through fuzzy matching.
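
To make the duration heuristics concrete, here is a minimal standalone sketch of the same idea. The function name `estimate_minutes`, its `default` parameter, and the sample strings are illustrative assumptions rather than part of the tutorial code. It checks minute ranges before single values so that a range is averaged instead of being read as its upper bound.

```python
import re

def estimate_minutes(text: str, default: int = 30) -> int:
    """Hypothetical helper: estimate a step duration in minutes from free text."""
    text = text.lower()
    if "overnight" in text:
        return 720  # treat "overnight" as 12 hours
    # Ranges such as "10-15 minutes" are averaged (integer division).
    m = re.search(r"(\d+)\s*-\s*(\d+)\s*min", text)
    if m:
        return (int(m.group(1)) + int(m.group(2))) // 2
    # Hours are converted to minutes.
    m = re.search(r"(\d+)\s*(?:hours?|hrs?|h)\b", text)
    if m:
        return int(m.group(1)) * 60
    # Plain minute values.
    m = re.search(r"(\d+)\s*min", text)
    if m:
        return int(m.group(1))
    return default  # nothing recognizable: fall back to the default

for sample in [
    "Incubate 2 hours at room temperature",
    "Incubate 10-15 minutes",
    "Incubate at 4°C overnight (12-16 hours)",
    "Wash plate",
]:
    print(f"{sample!r} -> {estimate_minutes(sample)} min")
```

The order of the checks matters more than the individual patterns: whichever rule runs first wins, so the most specific phrasing should be tested first.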

class SchedulePlanner:
   def make_schedule(self, steps, start_time="09:00"):
       schedule = []
       current = datetime.strptime(f"2025-01-01 {start_time}", "%Y-%m-%d %H:%M")
       day = 1
       for step in steps:
           end = current + timedelta(minutes=step['duration_min'])
           if step['duration_min'] > 480:
               # Steps longer than 8 hours roll over to 09:00 on the next day;
               # timedelta keeps this valid beyond day 9
               day += 1
               current = datetime(2025, 1, 1, 9, 0) + timedelta(days=day - 1)
               end = current
           schedule.append({
               'step': step['step'], 'name': step['name'][:40],
               'start': current.strftime("%H:%M"), 'end': end.strftime("%H:%M"),
               'duration': step['duration_min'], 'temp': step['temp'],
               'day': day, 'can_parallelize': step['duration_min'] > 60,
               'safety': ', '.join(step['safety']) if step['safety'] else 'None'
           })
           if step['duration_min'] <= 480:
               current = end
       return schedule
  
   def optimize_parallelization(self, schedule):
       parallel_groups = []
       idle_time = 0
       for i, step in enumerate(schedule):
           if step['can_parallelize'] and i + 1 < len(schedule):
               next_step = schedule[i+1]
               if step['temp'] == next_step['temp']:
                   saved = min(step['duration'], next_step['duration'])
                   parallel_groups.append(
                       f" Steps {step['step']} & {next_step['step']} can overlap → Save {saved} min"
                   )
                   idle_time += saved
       return parallel_groups, idle_time


class SafetyValidator:
   RULES = {
       'ph_range': (5.0, 11.0),
       'temp_limits': {'4C': (2, 8), '37C': (35, 39), 'RT': (20, 25)},
       'max_concurrent_instruments': 3,
   }
  
   def validate(self, steps):
       risks = []
       for step in steps:
           ph_match = re.search(r'ph\s*(\d+\.?\d*)', step['details'].lower())
           if ph_match:
               ph = float(ph_match.group(1))
               if not (self.RULES['ph_range'][0] <= ph <= self.RULES['ph_range'][1]):
                   risks.append(f"  Step {step['step']}: pH {ph} OUT OF SAFE RANGE")
           if 'BSL-2/3' in step['safety']:
               risks.append(f"  Step {step['step']}: BSL-2 cabinet REQUIRED")
           if 'HAZARD' in step['safety']:
               risks.append(f" Step {step['step']}: Full PPE + chemical hood REQUIRED")
           if 'SHARPS' in step['safety']:
               risks.append(f" Step {step['step']}: Sharps container + needle safety")
           if 'LIGHT-SENSITIVE' in step['safety']:
               risks.append(f" Step {step['step']}: Work in dark/amber tubes")
       return risks

We implement the SchedulePlanner and SafetyValidator to design efficient experiment timelines and enforce lab safety standards. We dynamically generate daily schedules, identify parallelizable steps, and validate potential risks, such as unsafe pH levels, hazardous chemicals, or biosafety-level requirements.
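
As a rough illustration of the scheduling arithmetic, the sketch below chains steps back to back and then estimates overlap savings with the same adjacent-step heuristic (first step longer than 60 minutes, both steps at the same temperature). The function names `build_timeline` and `overlap_savings` and the example durations are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timedelta

def build_timeline(durations_min, start="09:00"):
    """Hypothetical sketch: return (start, end) clock strings for consecutive steps."""
    current = datetime.strptime(start, "%H:%M")
    timeline = []
    for minutes in durations_min:
        end = current + timedelta(minutes=minutes)
        timeline.append((current.strftime("%H:%M"), end.strftime("%H:%M")))
        current = end  # the next step starts when this one ends
    return timeline

def overlap_savings(steps):
    """Hypothetical sketch: minutes saved if qualifying adjacent steps overlap.

    Each step is a dict with 'duration' (minutes) and 'temp' keys.
    """
    saved = 0
    for first, second in zip(steps, steps[1:]):
        if first["duration"] > 60 and first["temp"] == second["temp"]:
            saved += min(first["duration"], second["duration"])
    return saved

print(build_timeline([60, 120, 30]))
# [('09:00', '10:00'), ('10:00', '12:00'), ('12:00', '12:30')]

example = [
    {"duration": 120, "temp": "RT"},
    {"duration": 60, "temp": "RT"},
    {"duration": 30, "temp": "RT"},
]
print(overlap_savings(example))  # 60
```

The savings figure is an optimistic upper bound: it assumes the shorter step fits entirely inside the longer one, which a person at the bench would still need to confirm.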

def llm_call(prompt, max_tokens=200):
   try:
       inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
       outputs = model.generate(
           **inputs, max_new_tokens=max_tokens, do_sample=True,
           temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id
       )
       # The decoded text echoes the prompt, so slice it off and keep the continuation
       return tokenizer.decode(outputs[0], skip_special_tokens=True)[len(prompt):].strip()
   except Exception:
       # Fall back to a canned tip if generation fails
       return "Batch similar temperature steps together. Pre-warm instruments."


def agent_loop(protocol_text, inventory_csv, start_time="09:00"):
   print("\n AGENT STARTING PROTOCOL ANALYSIS...\n")
   parser = ProtocolParser()
   steps = parser.read_protocol(protocol_text)
   print(f" Parsed {len(steps)} protocol steps")
   inventory = InventoryManager(inventory_csv)
   reagents = inventory.extract_reagents(protocol_text)
   print(f" Identified {len(reagents)} reagents: {', '.join(reagents[:5])}...")
   inv_issues = inventory.check_availability(reagents)
   validator = SafetyValidator()
   safety_risks = validator.validate(steps)
   planner = SchedulePlanner()
   schedule = planner.make_schedule(steps, start_time)
   parallel_opts, time_saved = planner.optimize_parallelization(schedule)
   total_time = sum(s['duration'] for s in schedule)
   optimized_time = total_time - time_saved
   opt_prompt = f"Protocol has {len(steps)} steps, {total_time} min total. Key bottleneck optimization:"
   optimization = llm_call(opt_prompt, max_tokens=80)
   return {
       'steps': steps, 'schedule': schedule, 'inventory_issues': inv_issues,
       'safety_risks': safety_risks, 'parallelization': parallel_opts,
       'time_saved': time_saved, 'total_time': total_time,
       'optimized_time': optimized_time, 'ai_optimization': optimization,
       'reagents': reagents
   }

We assemble the agent loop, integrating perception, planning, validation, and revision into a single, coherent flow. We use CodeGen for reasoning-based optimization to refine step sequencing and propose practical improvements for efficiency and parallel execution.
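
The generate-then-fall-back pattern used for the LLM call can be tried without loading a model. The sketch below is a hypothetical wrapper (`safe_generate` and the stub generators are assumptions for illustration) that accepts any text-generating callable, strips the echoed prompt, and returns a fallback string when generation fails or produces nothing.

```python
def safe_generate(generate_fn, prompt, fallback):
    """Hypothetical wrapper around any callable that maps a prompt to generated text."""
    try:
        full_text = generate_fn(prompt)
    except Exception:
        return fallback  # generation failed: use the canned suggestion
    # Causal LMs typically return prompt + continuation; keep only the continuation.
    # Decoded text can differ slightly from the prompt, so check before slicing.
    if full_text.startswith(prompt):
        full_text = full_text[len(prompt):]
    return full_text.strip() or fallback

def echo_model(prompt):
    # Stub standing in for a real model: echoes the prompt and appends a tip.
    return prompt + " Batch the wash steps."

def broken_model(prompt):
    # Stub that simulates a failure such as an out-of-memory error.
    raise RuntimeError("simulated generation failure")

print(safe_generate(echo_model, "Key optimization:", "No suggestion."))
print(safe_generate(broken_model, "Key optimization:", "No suggestion."))
```

Passing the generator in as a callable keeps the rest of the agent loop testable on machines without a GPU.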

def generate_checklist(results):
   md = "#  WET-LAB PROTOCOL CHECKLIST\n\n"
   md += f"**Total Steps:** {len(results['schedule'])}\n"
   md += f"**Estimated Time:** {results['total_time']} min ({results['total_time']//60}h {results['total_time']%60}m)\n"
   md += f"**Optimized Time:** {results['optimized_time']} min (save {results['time_saved']} min)\n\n"
   md += "##  TIMELINE\n"
   current_day = 1
   for item in results['schedule']:
       if item['day'] > current_day:
           md += f"\n### Day {item['day']}\n"
           current_day = item['day']
       parallel = " " if item['can_parallelize'] else ""
       md += f"- [ ] **{item['start']}-{item['end']}** | Step {item['step']}: {item['name']} ({item['temp']}){parallel}\n"
   md += "\n##  REAGENT PICK-LIST\n"
   for reagent in results['reagents']:
       md += f"- [ ] {reagent}\n"
   md += "\n##  SAFETY & INVENTORY ALERTS\n"
   all_issues = results['safety_risks'] + results['inventory_issues']
   if all_issues:
       for issue in all_issues:
           md += f"- {issue}\n"
   else:
       md += "-  No critical issues detected\n"
   md += "\n##  OPTIMIZATION TIPS\n"
   for tip in results['parallelization']:
       md += f"- {tip}\n"
   md += f"-  AI Suggestion: {results['ai_optimization']}\n"
   return md


def generate_gantt_csv(schedule):
   df = pd.DataFrame(schedule)
   return df.to_csv(index=False)

We create output generators that transform results into human-readable Markdown checklists and Gantt-compatible CSVs. We ensure that every execution produces clear summaries of reagents, time savings, and safety or inventory alerts for streamlined lab operations.
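
For readers who want to inspect the two output formats in isolation, here is a small sketch that renders one schedule row as a Markdown checklist item and as CSV using only the standard library. The helper names `checklist_line` and `schedule_to_csv` and the sample row are assumptions for illustration; the tutorial itself relies on pandas for the CSV export.

```python
import csv
import io

def checklist_line(item):
    """Hypothetical helper: format one schedule row as a Markdown task-list item."""
    return (f"- [ ] **{item['start']}-{item['end']}** | "
            f"Step {item['step']}: {item['name']} ({item['temp']})")

def schedule_to_csv(schedule):
    """Hypothetical helper: serialize schedule rows (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schedule[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(schedule)
    return buffer.getvalue()

rows = [{"step": 1, "name": "Blocking", "start": "09:00", "end": "10:00", "temp": "RT"}]
print(checklist_line(rows[0]))
# - [ ] **09:00-10:00** | Step 1: Blocking (RT)
print(schedule_to_csv(rows))
```

Because each row already carries `start`, `end`, and `day` fields, the CSV can be loaded into a spreadsheet or plotting tool to draw a Gantt chart without further processing.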

SAMPLE_PROTOCOL = """ELISA Protocol for Cytokine Detection


1. Coating (Day 1, 4°C overnight)
  - Dilute capture antibody to 2 μg/mL in coating buffer (pH 9.6)
  - Add 100 μL per well to 96-well plate
  - Incubate at 4°C overnight (12-16 hours)
  - BSL-2 cabinet required


2. Blocking (Day 2)
  - Wash plate 3× with PBS-T (200 μL/well)
  - Add 200 μL blocking buffer (1% BSA in PBS)
  - Incubate 1 hour at room temperature


3. Sample Incubation
  - Wash 3× with PBS-T
  - Add 100 μL diluted samples/standards
  - Incubate 2 hours at room temperature


4. Detection Antibody
  - Wash 5× with PBS-T
  - Add 100 μL biotinylated detection antibody (0.5 μg/mL)
  - Incubate 1 hour at room temperature


5. Streptavidin-HRP
  - Wash 5× with PBS-T
  - Add 100 μL streptavidin-HRP (1:1000 dilution)
  - Incubate 30 minutes at room temperature
  - Work in dark


6. Development
  - Wash 7× with PBS-T
  - Add 100 μL TMB substrate
  - Incubate 10-15 minutes (monitor color development)
  - Add 50 μL stop solution (2M H2SO4) - CAUTION: corrosive
"""


SAMPLE_INVENTORY = """reagent,quantity,unit,expiry,lot
capture antibody,500,μg,2025-12-31,AB123
blocking buffer,500,mL,2025-11-30,BB456
PBS-T,1000,mL,2026-01-15,PT789
detection antibody,8,μg,2025-10-15,DA321
streptavidin HRP,10,mL,2025-12-01,SH654
TMB substrate,100,mL,2025-11-20,TM987
stop solution,250,mL,2026-03-01,SS147
BSA,100,g,2024-09-30,BS741"""


results = agent_loop(SAMPLE_PROTOCOL, SAMPLE_INVENTORY, start_time="09:00")
print("\n" + "="*70)
print(generate_checklist(results))
print("\n" + "="*70)
print("\n GANTT CSV (first 400 chars):\n")
print(generate_gantt_csv(results['schedule'])[:400])
print("\n Time Savings:", f"{results['time_saved']} minutes through parallelization")

We conduct a complete test run using a sample ELISA protocol and a reagent inventory dataset. We visualize the agent’s outputs, optimized schedule, parallelization gains, and AI-suggested improvements, demonstrating how our planner functions as a self-contained, intelligent lab assistant.

In conclusion, we demonstrated how agentic AI principles can improve reproducibility and safety in wet-lab workflows. By parsing free-form experimental text into structured, actionable plans, we automated protocol validation, reagent management, and temporal optimization in a single pipeline. The integration of CodeGen enables on-device reasoning about bottlenecks and safety conditions, allowing for self-contained, data-secure operations. We concluded with a fully functional planner that generates Gantt-compatible schedules, Markdown checklists, and AI-driven optimization suggestions, establishing a strong foundation for autonomous laboratory planning systems.

The post Build an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization appeared first on MarkTechPost.
