Build an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization
In this tutorial, we build a Wet-Lab Protocol Planner & Validator that acts as an intelligent agent for experimental design and execution. We design the system in Python and integrate Salesforce’s CodeGen-350M-mono model for natural language reasoning. We structure the pipeline into modular components: ProtocolParser for extracting structured data, such as steps, durations, and temperatures, from textual protocols; InventoryManager for validating reagent availability and expiry; SchedulePlanner for generating timelines and parallelization; and SafetyValidator for identifying biosafety or chemical hazards. The LLM is then used to generate optimization suggestions, effectively closing the loop between perception, planning, validation, and refinement.
import re, json
import pandas as pd
from datetime import datetime, timedelta
from collections import defaultdict
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_NAME = "Salesforce/codegen-350M-mono"
print("Loading CodeGen model (30 seconds)...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token
# device_map="auto" relies on the accelerate package being installed
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
print("✓ Model loaded!")
We begin by importing essential libraries and loading the Salesforce CodeGen-350M-mono model locally for lightweight, API-free inference. We initialize both the tokenizer and the model with float16 precision and automatic device mapping to ensure compatibility and speed on Colab GPUs.
class ProtocolParser:
    def read_protocol(self, text):
        steps = []
        lines = text.split('\n')
        for i, line in enumerate(lines, 1):
            # A step header looks like "1. Coating (...)"
            step_match = re.search(r'^(\d+)\.\s+(.+)', line.strip())
            if step_match:
                num, name = step_match.groups()
                # i is 1-based, so lines[i:] starts on the line after the header
                context = '\n'.join(lines[i:min(i+4, len(lines))])
                duration = self._extract_duration(context)
                temp = self._extract_temp(context)
                safety = self._check_safety(context)
                steps.append({
                    'step': int(num), 'name': name, 'duration_min': duration,
                    'temp': temp, 'safety': safety, 'line': i, 'details': context[:200]
                })
        return steps

    def _extract_duration(self, text):
        text = text.lower()
        if 'overnight' in text: return 720
        match = re.search(r'(\d+)\s*(?:hour|hr|h)(?:s)?(?!\w)', text)
        if match: return int(match.group(1)) * 60
        # Check ranges such as "10-15 min" before single values; otherwise the
        # single-value pattern matches the upper bound and the range is never averaged
        match = re.search(r'(\d+)-(\d+)\s*(?:min|minute)', text)
        if match: return (int(match.group(1)) + int(match.group(2))) // 2
        match = re.search(r'(\d+)\s*(?:min|minute)(?:s)?', text)
        if match: return int(match.group(1))
        return 30  # default when no duration is stated

    def _extract_temp(self, text):
        text = text.lower()
        # The lookbehind keeps values such as "24°C" from being read as 4°C
        if re.search(r'(?<!\d)4\s*°', text): return '4C'
        if '37°c' in text or '37 °c' in text: return '37C'
        if '-20°c' in text or '-80°c' in text: return 'FREEZER'
        if 'room temp' in text or 'rt' in text or 'ambient' in text: return 'RT'
        return 'RT'

    def _check_safety(self, text):
        flags = []
        text_lower = text.lower()
        if re.search(r'bsl-[23]|biosafety', text_lower): flags.append('BSL-2/3')
        if re.search(r'caution|corrosive|hazard|toxic', text_lower): flags.append('HAZARD')
        if 'sharp' in text_lower or 'needle' in text_lower: flags.append('SHARPS')
        if 'dark' in text_lower or 'light-sensitive' in text_lower: flags.append('LIGHT-SENSITIVE')
        if 'flammable' in text_lower: flags.append('FLAMMABLE')
        return flags
class InventoryManager:
    def __init__(self, csv_text):
        from io import StringIO
        self.df = pd.read_csv(StringIO(csv_text))
        self.df['expiry'] = pd.to_datetime(self.df['expiry'])

    def check_availability(self, reagent_list):
        issues = []
        for reagent in reagent_list:
            reagent_clean = reagent.lower().replace('_', ' ').replace('-', ' ')
            # Fuzzy match on the first two words; escape them so they are literal in the regex
            pattern = '|'.join(re.escape(w) for w in reagent_clean.split()[:2])
            matches = self.df[self.df['reagent'].str.lower().str.contains(
                pattern, na=False, regex=True
            )]
            if matches.empty:
                issues.append(f"{reagent}: NOT IN INVENTORY")
            else:
                row = matches.iloc[0]
                if row['expiry'] < datetime.now():
                    issues.append(f"{reagent}: EXPIRED on {row['expiry'].date()} (lot {row['lot']})")
                elif (row['expiry'] - datetime.now()).days < 30:
                    issues.append(f"{reagent}: Expires soon ({row['expiry'].date()}, lot {row['lot']})")
                if row['quantity'] < 10:
                    issues.append(f"{reagent}: LOW STOCK ({row['quantity']} {row['unit']} remaining)")
        return issues

    def extract_reagents(self, protocol_text):
        reagents = set()
        patterns = [
            r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:antibody|buffer|solution)',
            r'\b([A-Z]{2,}(?:-[A-Z0-9]+)?)\b',
            r'(?:add|use|prepare|dilute)\s+([a-z-]+\s*(?:antibody|buffer|substrate|solution))',
        ]
        for pattern in patterns:
            matches = re.findall(pattern, protocol_text, re.IGNORECASE)
            reagents.update(m.strip() for m in matches if len(m) > 2)
        # Sort so the 15-item cut-off is reproducible between runs
        return sorted(reagents)[:15]
We define the ProtocolParser and InventoryManager classes to extract structured experimental details and verify reagent inventory. We parse each protocol step for duration, temperature, and safety markers, while the inventory manager validates stock levels, expiry dates, and reagent availability through fuzzy matching.
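To make the duration heuristic concrete, here is a minimal standalone sketch of the same idea using only the standard library. The function name `parse_duration_minutes` is hypothetical; the 720-minute value for "overnight" and the 30-minute default follow the parser above. The regular expressions are a simplified assumption, not the exact patterns used by ProtocolParser.

```python
import re

def parse_duration_minutes(text, default=30):
    """Estimate a duration in minutes from one free-text protocol line."""
    text = text.lower()
    if "overnight" in text:
        return 720  # treat overnight as 12 hours
    hours = re.search(r"(\d+)\s*(?:hours?|hrs?|h)\b", text)
    if hours:
        return int(hours.group(1)) * 60
    # Ranges are checked before single values so "10-15 min" is averaged
    # instead of being read as 15 minutes.
    span = re.search(r"(\d+)\s*-\s*(\d+)\s*min", text)
    if span:
        return (int(span.group(1)) + int(span.group(2))) // 2
    single = re.search(r"(\d+)\s*min", text)
    if single:
        return int(single.group(1))
    return default  # nothing recognizable was found

if __name__ == "__main__":
    for line in [
        "Incubate 1 hour at room temperature",
        "Incubate 10-15 minutes",
        "Incubate at 4°C overnight (12-16 hours)",
        "Wash 3x with PBS-T",
    ]:
        print(line, "->", parse_duration_minutes(line))
```

The order of the checks matters: the coarsest cue (overnight) wins first, and the range pattern must run before the single-value pattern, since the latter would otherwise match the upper bound of any range.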
class SchedulePlanner:
    def make_schedule(self, steps, start_time="09:00"):
        schedule = []
        current = datetime.strptime(f"2025-01-01 {start_time}", "%Y-%m-%d %H:%M")
        day = 1
        for step in steps:
            end = current + timedelta(minutes=step['duration_min'])
            if step['duration_min'] > 480:
                # Steps longer than 8 h are treated as overnight: resume at 09:00 the next day.
                # Date arithmetic avoids building an invalid date string once day reaches 10.
                day += 1
                current = datetime(2025, 1, 1, 9, 0) + timedelta(days=day - 1)
                end = current
            schedule.append({
                'step': step['step'], 'name': step['name'][:40],
                'start': current.strftime("%H:%M"), 'end': end.strftime("%H:%M"),
                'duration': step['duration_min'], 'temp': step['temp'],
                'day': day, 'can_parallelize': step['duration_min'] > 60,
                'safety': ', '.join(step['safety']) if step['safety'] else 'None'
            })
            if step['duration_min'] <= 480:
                current = end
        return schedule

    def optimize_parallelization(self, schedule):
        parallel_groups = []
        idle_time = 0
        for i, step in enumerate(schedule):
            if step['can_parallelize'] and i + 1 < len(schedule):
                next_step = schedule[i+1]
                # Only adjacent steps at the same temperature are considered for overlap
                if step['temp'] == next_step['temp']:
                    saved = min(step['duration'], next_step['duration'])
                    parallel_groups.append(
                        f"Steps {step['step']} & {next_step['step']} can overlap → Save {saved} min"
                    )
                    idle_time += saved
        return parallel_groups, idle_time
class SafetyValidator:
    RULES = {
        'ph_range': (5.0, 11.0),
        'temp_limits': {'4C': (2, 8), '37C': (35, 39), 'RT': (20, 25)},
        'max_concurrent_instruments': 3,
    }

    def validate(self, steps):
        risks = []
        for step in steps:
            # Look for a pH value such as "pH 9.6" in the step details
            ph_match = re.search(r'ph\s*(\d+\.?\d*)', step['details'].lower())
            if ph_match:
                ph = float(ph_match.group(1))
                if not (self.RULES['ph_range'][0] <= ph <= self.RULES['ph_range'][1]):
                    risks.append(f"Step {step['step']}: pH {ph} OUT OF SAFE RANGE")
            if 'BSL-2/3' in step['safety']:
                risks.append(f"Step {step['step']}: BSL-2 cabinet REQUIRED")
            if 'HAZARD' in step['safety']:
                risks.append(f"Step {step['step']}: Full PPE + chemical hood REQUIRED")
            if 'SHARPS' in step['safety']:
                risks.append(f"Step {step['step']}: Sharps container + needle safety")
            if 'LIGHT-SENSITIVE' in step['safety']:
                risks.append(f"Step {step['step']}: Work in dark/amber tubes")
        return risks
We implement the SchedulePlanner and SafetyValidator to design efficient experiment timelines and enforce lab safety standards. We dynamically generate daily schedules, identify parallelizable steps, and validate potential risks, such as unsafe pH levels, hazardous chemicals, or biosafety-level requirements.
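As an illustration of the day-rollover idea, the sketch below lays out a list of step durations with plain `datetime` arithmetic. It is a variant under stated assumptions rather than the planner's exact logic: the function name `build_timeline` is hypothetical, an overnight step is recorded on the day it starts, and work resumes at the same start time the next morning. The 480-minute threshold is the one used by the planner.

```python
from datetime import datetime, timedelta

def build_timeline(durations_min, start="09:00", overnight_threshold=480):
    """Place steps back to back; steps above the threshold run overnight."""
    day_start = datetime.strptime(f"2025-01-01 {start}", "%Y-%m-%d %H:%M")
    current = day_start
    rows = []
    for idx, minutes in enumerate(durations_min, 1):
        begin = current
        day_index = (begin.date() - day_start.date()).days  # 0 for the first day
        if minutes > overnight_threshold:
            # Overnight step: it ends when work resumes the following morning.
            end = day_start + timedelta(days=day_index + 1)
        else:
            end = begin + timedelta(minutes=minutes)
        rows.append({
            "step": idx,
            "day": day_index + 1,
            "start": begin.strftime("%H:%M"),
            "end": end.strftime("%H:%M"),
        })
        current = end
    return rows

if __name__ == "__main__":
    for row in build_timeline([720, 60, 120]):
        print(row)
```

Deriving the day number from the date difference keeps the schedule valid for protocols of any length, since no date string is ever assembled by hand.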
def llm_call(prompt, max_tokens=200):
    try:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
        outputs = model.generate(
            **inputs, max_new_tokens=max_tokens, do_sample=True,
            temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id
        )
        # Decode only the newly generated tokens instead of slicing the decoded string
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    except Exception:
        # Fall back to a fixed tip if generation fails
        return "Batch similar temperature steps together. Pre-warm instruments."

def agent_loop(protocol_text, inventory_csv, start_time="09:00"):
    print("\nAGENT STARTING PROTOCOL ANALYSIS...\n")
    # 1. Perception: parse the protocol text into structured steps
    parser = ProtocolParser()
    steps = parser.read_protocol(protocol_text)
    print(f"Parsed {len(steps)} protocol steps")
    # 2. Inventory: extract reagents and check them against stock
    inventory = InventoryManager(inventory_csv)
    reagents = inventory.extract_reagents(protocol_text)
    print(f"Identified {len(reagents)} reagents: {', '.join(reagents[:5])}...")
    inv_issues = inventory.check_availability(reagents)
    # 3. Validation: flag safety risks
    validator = SafetyValidator()
    safety_risks = validator.validate(steps)
    # 4. Planning: build the schedule and look for overlaps
    planner = SchedulePlanner()
    schedule = planner.make_schedule(steps, start_time)
    parallel_opts, time_saved = planner.optimize_parallelization(schedule)
    total_time = sum(s['duration'] for s in schedule)
    optimized_time = total_time - time_saved
    # 5. Refinement: ask the LLM for an optimization suggestion
    opt_prompt = f"Protocol has {len(steps)} steps, {total_time} min total. Key bottleneck optimization:"
    optimization = llm_call(opt_prompt, max_tokens=80)
    return {
        'steps': steps, 'schedule': schedule, 'inventory_issues': inv_issues,
        'safety_risks': safety_risks, 'parallelization': parallel_opts,
        'time_saved': time_saved, 'total_time': total_time,
        'optimized_time': optimized_time, 'ai_optimization': optimization,
        'reagents': reagents
    }
We assemble the agent loop, integrating perception, planning, validation, and revision into a single, coherent flow. We use CodeGen for reasoning-based optimization to refine step sequencing and propose practical improvements for efficiency and parallel execution.
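Because a small code model can fail to load or return unusable text, the suggestion step benefits from a defensive wrapper. The following is a minimal sketch of that fallback pattern with the generator passed in as a plain callable; `safe_suggestion`, `broken_model`, and `stub_model` are hypothetical names used for illustration and stand in for the real model call.

```python
def safe_suggestion(generate_fn, prompt, fallback):
    """Call a text generator; return a fixed tip if it fails or yields nothing usable."""
    try:
        text = generate_fn(prompt)
    except Exception:
        # Generation can fail for many reasons (out of memory, missing weights).
        return fallback
    # Keep only the first non-empty line so the checklist entry stays compact.
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return fallback

def broken_model(prompt):
    """Stand-in for a model call that raises."""
    raise RuntimeError("model unavailable")

def stub_model(prompt):
    """Stand-in for a model call that returns noisy multi-line text."""
    return "\n  Run washes during incubations.\nunrelated trailing text"

if __name__ == "__main__":
    tip = "Batch similar temperature steps together."
    print(safe_suggestion(broken_model, "Key bottleneck:", tip))
    print(safe_suggestion(stub_model, "Key bottleneck:", tip))
```

Passing the generator as an argument keeps the rest of the pipeline testable without loading any model weights.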
def generate_checklist(results):
    md = "# WET-LAB PROTOCOL CHECKLIST\n\n"
    md += f"**Total Steps:** {len(results['schedule'])}\n"
    md += f"**Estimated Time:** {results['total_time']} min ({results['total_time']//60}h {results['total_time']%60}m)\n"
    md += f"**Optimized Time:** {results['optimized_time']} min (save {results['time_saved']} min)\n\n"
    md += "## TIMELINE\n"
    current_day = 1
    for item in results['schedule']:
        # Start a new sub-heading whenever the schedule moves to a later day
        if item['day'] > current_day:
            md += f"\n### Day {item['day']}\n"
            current_day = item['day']
        parallel = " (parallelizable)" if item['can_parallelize'] else ""
        md += f"- [ ] **{item['start']}-{item['end']}** | Step {item['step']}: {item['name']} ({item['temp']}){parallel}\n"
    md += "\n## REAGENT PICK-LIST\n"
    for reagent in results['reagents']:
        md += f"- [ ] {reagent}\n"
    md += "\n## SAFETY & INVENTORY ALERTS\n"
    all_issues = results['safety_risks'] + results['inventory_issues']
    if all_issues:
        for risk in all_issues:
            md += f"- {risk}\n"
    else:
        md += "- No critical issues detected\n"
    md += "\n## OPTIMIZATION TIPS\n"
    for tip in results['parallelization']:
        md += f"- {tip}\n"
    md += f"- AI Suggestion: {results['ai_optimization']}\n"
    return md

def generate_gantt_csv(schedule):
    df = pd.DataFrame(schedule)
    return df.to_csv(index=False)
We create output generators that transform results into human-readable Markdown checklists and Gantt-compatible CSVs. We ensure that every execution produces clear summaries of reagents, time savings, and safety or inventory alerts for streamlined lab operations.
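For readers who want the export step without pandas, the sketch below shows the two small pieces involved: formatting a minute count as hours and minutes, and serializing schedule rows to CSV with the standard library. The helper names `format_minutes` and `schedule_to_csv` are hypothetical, and the row keys in the demo are only an example.

```python
import csv
import io

def format_minutes(total):
    """Render a minute count as 'Xh Ym', the format used in the checklist header."""
    return f"{total // 60}h {total % 60}m"

def schedule_to_csv(rows):
    """Serialize a list of schedule dicts to CSV text; columns follow the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

if __name__ == "__main__":
    demo = [
        {"step": 1, "start": "09:00", "end": "10:00"},
        {"step": 2, "start": "10:00", "end": "12:00"},
    ]
    print(format_minutes(180))
    print(schedule_to_csv(demo))
```

Each CSV row carries a start and end time per step, which is the minimum most spreadsheet-based Gantt templates need.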
SAMPLE_PROTOCOL = """ELISA Protocol for Cytokine Detection
1. Coating (Day 1, 4°C overnight)
- Dilute capture antibody to 2 μg/mL in coating buffer (pH 9.6)
- Add 100 μL per well to 96-well plate
- Incubate at 4°C overnight (12-16 hours)
- BSL-2 cabinet required
2. Blocking (Day 2)
- Wash plate 3× with PBS-T (200 μL/well)
- Add 200 μL blocking buffer (1% BSA in PBS)
- Incubate 1 hour at room temperature
3. Sample Incubation
- Wash 3× with PBS-T
- Add 100 μL diluted samples/standards
- Incubate 2 hours at room temperature
4. Detection Antibody
- Wash 5× with PBS-T
- Add 100 μL biotinylated detection antibody (0.5 μg/mL)
- Incubate 1 hour at room temperature
5. Streptavidin-HRP
- Wash 5× with PBS-T
- Add 100 μL streptavidin-HRP (1:1000 dilution)
- Incubate 30 minutes at room temperature
- Work in dark
6. Development
- Wash 7× with PBS-T
- Add 100 μL TMB substrate
- Incubate 10-15 minutes (monitor color development)
- Add 50 μL stop solution (2M H2SO4) - CAUTION: corrosive
"""
SAMPLE_INVENTORY = """reagent,quantity,unit,expiry,lot
capture antibody,500,μg,2025-12-31,AB123
blocking buffer,500,mL,2025-11-30,BB456
PBS-T,1000,mL,2026-01-15,PT789
detection antibody,8,μg,2025-10-15,DA321
streptavidin HRP,10,mL,2025-12-01,SH654
TMB substrate,100,mL,2025-11-20,TM987
stop solution,250,mL,2026-03-01,SS147
BSA,100,g,2024-09-30,BS741"""
results = agent_loop(SAMPLE_PROTOCOL, SAMPLE_INVENTORY, start_time="09:00")
print("\n" + "="*70)
print(generate_checklist(results))
print("\n" + "="*70)
print("\nGANTT CSV (first 400 chars):\n")
print(generate_gantt_csv(results['schedule'])[:400])
print("\nTime Savings:", f"{results['time_saved']} minutes via parallelization")
We conduct a comprehensive test run using a sample ELISA protocol and a reagent inventory dataset. We visualize the agent’s outputs, optimized schedule, parallelization gains, and AI-suggested improvements, demonstrating how our planner functions as a self-contained, intelligent lab assistant.
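To see how individual rows of the sample inventory would be classified, here is a small worked sketch. It mirrors the 30-day and 10-unit thresholds from `check_availability`, but takes the reference date as an argument so the result does not depend on when it is run. The function name `classify_reagent` and the reference date 2025-10-01 are assumptions made for the example.

```python
from datetime import date

def classify_reagent(quantity, expiry, today, soon_days=30, low_stock=10):
    """Return alert labels for one inventory row, given an explicit reference date."""
    alerts = []
    if expiry < today:
        alerts.append("EXPIRED")
    elif (expiry - today).days < soon_days:
        alerts.append("EXPIRES SOON")
    if quantity < low_stock:
        alerts.append("LOW STOCK")
    return alerts

if __name__ == "__main__":
    reference = date(2025, 10, 1)  # assumed reference date for this example
    sample_rows = {
        "BSA": (100, date(2024, 9, 30)),
        "detection antibody": (8, date(2025, 10, 15)),
        "PBS-T": (1000, date(2026, 1, 15)),
    }
    for name, (qty, expiry) in sample_rows.items():
        print(name, classify_reagent(qty, expiry, reference))
```

Injecting the date instead of calling `datetime.now()` inside the function makes the check reproducible, which matters when the same inventory file is validated on different days.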
In conclusion, we demonstrated how agentic AI principles can enhance reproducibility and safety in wet-lab workflows. By parsing free-form experimental text into structured, actionable plans, we automated protocol validation, reagent management, and temporal optimization in a single pipeline. The integration of CodeGen enables on-device reasoning about bottlenecks and safety conditions, allowing for self-contained, data-secure operations. We concluded with a fully functional planner that generates Gantt-compatible schedules, Markdown checklists, and AI-driven optimization suggestions, establishing a strong foundation for autonomous laboratory planning systems.
The post Build an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization appeared first on MarkTechPost.
