Build a Multi-Agent AI Workflow for Biological Network Modeling, Protein Interactions, Metabolism, and Cell Signaling Simulation
In this tutorial, we construct a multi-agent workflow for biological systems modeling and explore how different computational components work together within one unified systems biology pipeline. We generate synthetic biological data, analyze gene regulatory structure, predict protein-protein interactions, optimize metabolic pathway activity, and simulate a dynamic cell signaling cascade, all within a Colab environment that remains practical and reproducible. We also use an OpenAI model to act as a principal investigator, synthesizing the outputs of all specialised agents into a single expert-style biological interpretation that connects regulation, interaction networks, metabolism, and signaling into a broader scientific story.
import sys, subprocess, importlib.util

def _install_if_missing(packages):
    # Collect the pip names of packages whose import name cannot be found
    # (importlib.util.find_spec replaces the deprecated pkgutil.find_loader)
    missing = []
    for p in packages:
        import_name = p["import"]
        if importlib.util.find_spec(import_name) is None:
            missing.append(p["pip"])
    if missing:
        print("Installing:", ", ".join(missing))
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + missing)

_install_if_missing([
    {"pip": "openai", "import": "openai"},
    {"pip": "numpy", "import": "numpy"},
    {"pip": "pandas", "import": "pandas"},
    {"pip": "matplotlib", "import": "matplotlib"},
    {"pip": "networkx", "import": "networkx"},
    {"pip": "scikit-learn", "import": "sklearn"},
])

import os
import json
import math
import textwrap
import random
import getpass
from dataclasses import dataclass
from typing import Dict, List, Tuple, Any

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from openai import OpenAI

# Fix the random seeds so the synthetic data are reproducible
np.random.seed(42)
random.seed(42)

# Load the API key from Colab Secrets if available, otherwise prompt for it
OPENAI_API_KEY = None
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
    if OPENAI_API_KEY:
        print("Loaded OPENAI_API_KEY from Colab Secrets.")
except Exception:
    pass

if not OPENAI_API_KEY:
    try:
        OPENAI_API_KEY = getpass.getpass("Enter OPENAI_API_KEY (hidden input): ").strip()
    except Exception:
        OPENAI_API_KEY = input("Enter OPENAI_API_KEY: ").strip()

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI(api_key=OPENAI_API_KEY)
OPENAI_MODEL = "gpt-4o-mini"
We prepare the Colab environment and make sure all required libraries are available before the workflow begins. We import the scientific computing, machine learning, graph analysis, plotting, and OpenAI libraries that support the entire biological systems pipeline from start to finish. We also load the OpenAI API key securely, either from Colab Secrets or through hidden input. We then initialize the client and define the model so the notebook is ready for later LLM-based synthesis.
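The key lookup above tries Colab Secrets first and then falls back to an interactive prompt. Below is a minimal sketch of the same fallback chain as a reusable function. The names `resolve_api_key`, `_from_env`, and `_from_colab` are hypothetical helpers, not part of the original notebook. Checking the `OPENAI_API_KEY` environment variable first is an added assumption that is convenient when running outside Colab.

```python
import os
from typing import Callable, Iterable, Optional


def resolve_api_key(sources: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Return the first non-empty key produced by `sources`, tried in order.

    A source that raises (for example, importing google.colab outside Colab)
    is skipped rather than aborting the whole lookup.
    """
    for source in sources:
        try:
            value = source()
        except Exception:
            continue
        if value and value.strip():
            return value.strip()
    return None


def _from_env() -> Optional[str]:
    # Hypothetical first source: a key already exported in the environment
    return os.environ.get("OPENAI_API_KEY")


def _from_colab() -> Optional[str]:
    # Raises ImportError outside Colab, which resolve_api_key treats as "not available"
    from google.colab import userdata
    return userdata.get("OPENAI_API_KEY")
```

A possible call is `resolve_api_key([_from_env, _from_colab, lambda: getpass.getpass("Enter OPENAI_API_KEY: ")])`. With this ordering, a key already set in the shell never triggers a prompt.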
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def pretty(title: str, body: str, width: int = 100):
    # Print a titled section framed by horizontal rules
    print("\n" + "=" * width)
    print(title)
    print("=" * width)
    print(body)

def safe_float(x):
    try:
        return float(x)
    except Exception:
        return None

def generate_gene_regulatory_network(n_genes: int = 14, edge_prob: float = 0.18):
    # Random signed, directed weights: W[i, j] is the effect of gene i on gene j
    genes = [f"G{i+1}" for i in range(n_genes)]
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(n_genes):
            if i != j and np.random.rand() < edge_prob:
                W[i, j] = np.random.uniform(-1.5, 1.5)
    return genes, W

def simulate_gene_expression(W: np.ndarray, n_steps: int = 70, noise: float = 0.10):
    # Discrete-time update: x_t = sigmoid(x_{t-1} @ W + Gaussian noise)
    n = W.shape[0]
    X = np.zeros((n_steps, n))
    X[0] = np.random.uniform(0.2, 0.8, size=n)
    for t in range(1, n_steps):
        signal = X[t-1] @ W
        X[t] = sigmoid(signal + np.random.normal(0, noise, size=n))
    return X

def generate_protein_features(n_proteins: int = 40, feature_dim: int = 10):
    proteins = [f"P{i+1}" for i in range(n_proteins)]
    features = np.random.normal(size=(n_proteins, feature_dim))
    families = np.random.randint(0, 5, size=n_proteins)
    localization = np.random.randint(0, 4, size=n_proteins)
    return proteins, features, families, localization

def generate_ppi_dataset(proteins, features, families, localization):
    # One row per unordered protein pair: (protein_a, protein_b, feature_vector, label)
    rows = []
    n = len(proteins)
    hidden_w = np.random.normal(size=features.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            fi, fj = features[i], features[j]
            sim = np.dot(fi, fj) / (np.linalg.norm(fi) * np.linalg.norm(fj) + 1e-8)
            fam_same = 1 if families[i] == families[j] else 0
            loc_same = 1 if localization[i] == localization[j] else 0
            feat = np.concatenate([
                np.abs(fi - fj),
                fi * fj,
                [sim, fam_same, loc_same]
            ])
            # Latent interaction score; the label is a Bernoulli draw with p = sigmoid(score)
            score = 1.4 * sim + 1.0 * fam_same + 0.8 * loc_same + 0.15 * np.dot((fi + fj) / 2, hidden_w)
            prob = sigmoid(score)
            y = 1 if np.random.rand() < prob else 0
            rows.append((proteins[i], proteins[j], feat, y))
    return rows

def generate_metabolic_network():
    metabolites = ["Glucose", "Pyruvate", "AcetylCoA", "ATP", "Biomass", "Lactate", "Ethanol"]
    reactions = [
        {"name": "R1_Glucose_Uptake",   "yield_biomass": 0.0,  "yield_atp": 0.3,  "substrate_cost": 1.0, "oxygen_need": 0.0},
        {"name": "R2_Glycolysis",       "yield_biomass": 0.2,  "yield_atp": 1.6,  "substrate_cost": 0.7, "oxygen_need": 0.0},
        {"name": "R3_TCA",              "yield_biomass": 1.0,  "yield_atp": 2.4,  "substrate_cost": 0.8, "oxygen_need": 1.4},
        {"name": "R4_Fermentation",     "yield_biomass": 0.1,  "yield_atp": 0.9,  "substrate_cost": 0.4, "oxygen_need": 0.0},
        {"name": "R5_Ethanol_Path",     "yield_biomass": 0.15, "yield_atp": 0.8,  "substrate_cost": 0.5, "oxygen_need": 0.0},
        {"name": "R6_Biomass_Assembly", "yield_biomass": 1.3,  "yield_atp": -0.9, "substrate_cost": 0.6, "oxygen_need": 0.2},
    ]
    return metabolites, reactions

def simulate_cell_signaling(T=200, dt=0.05, ligand_level=1.2):
    # Forward-Euler integration of a receptor -> kinase -> TF cascade
    # with TF-induced phosphatase feedback on the kinase
    t = np.arange(T) * dt  # exactly T time points spaced dt apart
    ligand = np.ones_like(t) * ligand_level
    receptor = np.zeros_like(t)
    kinase = np.zeros_like(t)
    tf = np.zeros_like(t)
    phosphatase = np.zeros_like(t)
    receptor[0] = 0.05
    kinase[0] = 0.02
    tf[0] = 0.01
    phosphatase[0] = 0.30
    for i in range(1, len(t)):
        dR = 1.6 * ligand[i-1] * (1 - receptor[i-1]) - 0.9 * receptor[i-1]
        dK = 1.8 * receptor[i-1] * (1 - kinase[i-1]) - 1.1 * phosphatase[i-1] * kinase[i-1]
        dTF = 1.4 * kinase[i-1] * (1 - tf[i-1]) - 0.55 * tf[i-1]
        dP = 0.2 + 0.5 * tf[i-1] - 0.4 * phosphatase[i-1]
        receptor[i] = np.clip(receptor[i-1] + dt * dR, 0, 1)
        kinase[i] = np.clip(kinase[i-1] + dt * dK, 0, 1)
        tf[i] = np.clip(tf[i-1] + dt * dTF, 0, 1)
        phosphatase[i] = np.clip(phosphatase[i-1] + dt * dP, 0, 1.5)
    return pd.DataFrame({
        "time": t,
        "ligand": ligand,
        "receptor_active": receptor,
        "kinase_active": kinase,
        "tf_active": tf,
        "phosphatase": phosphatase,
    })
We define the main helper utilities and all the synthetic data generation functions that power the notebook's biological tasks. We create functions for:

- gene regulatory network construction;
- gene expression simulation;
- protein feature generation;
- protein interaction dataset creation;
- metabolic network setup;
- cell signaling dynamics.

Together these provide four distinct biological perspectives for analysis. This snippet forms the computational backbone of the tutorial by creating the structured inputs that each specialised agent will later process and interpret.
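The receptor equation in `simulate_cell_signaling` is linear in the receptor activity R, because the ligand level L is held constant: dR/dt = 1.6*L*(1 - R) - 0.9*R. Setting the derivative to zero gives a closed-form steady state, R* = 1.6L / (1.6L + 0.9), which is a handy sanity check on the forward-Euler loop. The sketch below integrates only the receptor equation, with the same rate constants, and compares the result with R*. The function names are illustrative and are not from the original notebook.

```python
def receptor_steady_state(ligand: float, k_on: float = 1.6, k_off: float = 0.9) -> float:
    # Solve k_on * L * (1 - R) - k_off * R = 0 for R
    return k_on * ligand / (k_on * ligand + k_off)


def euler_receptor(ligand: float, r0: float = 0.05, dt: float = 0.05,
                   n_steps: int = 400, k_on: float = 1.6, k_off: float = 0.9) -> float:
    # Same explicit update as the notebook: R <- R + dt * dR/dt
    r = r0
    for _ in range(n_steps):
        r = r + dt * (k_on * ligand * (1 - r) - k_off * r)
    return r


r_star = receptor_steady_state(1.2)
r_euler = euler_receptor(1.2)
print(f"analytic R* = {r_star:.6f}, Euler after 400 steps = {r_euler:.6f}")
# prints: analytic R* = 0.680851, Euler after 400 steps = 0.680851
```

Each Euler step multiplies the distance to R* by 1 - dt*(1.6L + 0.9), which equals 0.859 for L = 1.2 and dt = 0.05. The iteration therefore converges whenever that factor has magnitude below 1, and it does so without oscillation while the factor stays positive.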
@dataclass
class AgentResult:
    name: str
    summary: Dict[str, Any]

class GeneRegulatoryNetworkAgent:
    def run(self, genes, W, X) -> AgentResult:
        # Correlation-based association inference across the expression time series
        corr = np.corrcoef(X.T)
        inferred_edges = []
        true_edges = []
        n = len(genes)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if abs(corr[i, j]) > 0.35:
                    inferred_edges.append((genes[i], genes[j], float(corr[i, j])))
                if abs(W[i, j]) > 1e-8:
                    true_edges.append((genes[i], genes[j], float(W[i, j])))
        # Directed graph of the true regulatory edges, used for degree analysis
        centrality_graph = nx.DiGraph()
        for gi in genes:
            centrality_graph.add_node(gi)
        for i in range(n):
            for j in range(n):
                if abs(W[i, j]) > 1e-8:
                    centrality_graph.add_edge(genes[i], genes[j], weight=float(W[i, j]))
        out_deg = dict(centrality_graph.out_degree())
        in_deg = dict(centrality_graph.in_degree())
        hubs = sorted(out_deg.items(), key=lambda x: x[1], reverse=True)[:5]
        sinks = sorted(in_deg.items(), key=lambda x: x[1], reverse=True)[:5]
        # Rank genes by the variance of their expression trajectories
        dynamic_var = X.var(axis=0)
        most_dynamic = sorted(zip(genes, dynamic_var), key=lambda x: x[1], reverse=True)[:5]
        summary = {
            "num_genes": n,
            "num_true_regulatory_edges": len(true_edges),
            "num_inferred_associations": len(inferred_edges),
            "top_hub_genes": [{"gene": g, "out_degree": int(d)} for g, d in hubs],
            "top_sink_genes": [{"gene": g, "in_degree": int(d)} for g, d in sinks],
            "most_dynamic_genes": [{"gene": g, "variance": round(float(v), 4)} for g, v in most_dynamic],
            "sample_inferred_edges": [
                {"source": a, "target": b, "association": round(c, 3)}
                for a, b, c in inferred_edges[:10]
            ],
            "expression_tail_mean": round(float(X[-10:].mean()), 4),
        }
        return AgentResult(name="GeneRegulatoryNetworkAgent", summary=summary)
We define the shared result container and the gene regulatory network agent, which analyzes regulatory structure and expression behavior. We combine several techniques to identify hub, sink, and highly dynamic genes across the simulated expression trajectories:

- correlation-based association inference;
- extraction of the true edges from W;
- degree-based graph analysis, where hubs are the genes with the highest out-degree in the true regulatory graph and sinks are those with the highest in-degree;
- variance-based ranking.

This gives us a network-level picture of how regulatory influence may be distributed within the system and helps us identify important candidate regulators for downstream interpretation.
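The agent collects both the correlation-based associations and the true edges of W, but it never compares them. Because the data are synthetic, the ground truth is available, so one natural extension is to score the inferred associations against it. The sketch below is an assumption, not part of the original agent. `edge_recovery` is a hypothetical helper that reuses the agent's 0.35 threshold and reports precision and recall over off-diagonal gene pairs. Correlation is symmetric while W is directed, so a one-way true edge i -> j also yields a predicted j -> i pair, which tends to lower precision.

```python
import numpy as np


def edge_recovery(W: np.ndarray, X: np.ndarray, threshold: float = 0.35):
    """Precision and recall of |corr| > threshold against the nonzero entries of W.

    W is the (n_genes, n_genes) regulatory matrix; X is the (n_steps, n_genes) expression matrix.
    """
    n = W.shape[0]
    corr = np.corrcoef(X.T)
    off_diag = ~np.eye(n, dtype=bool)
    predicted = (np.abs(corr) > threshold) & off_diag
    actual = (np.abs(W) > 1e-8) & off_diag
    tp = int((predicted & actual).sum())
    n_pred, n_true = int(predicted.sum()), int(actual.sum())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return precision, recall
```

In the notebook, this could be called as `edge_recovery(W, X_expr)` after the expression simulation.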
class ProteinInteractionPredictionAgent:
    def run(self, ppi_rows) -> AgentResult:
        X = np.vstack([r[2] for r in ppi_rows])
        y = np.array([r[3] for r in ppi_rows])
        # Held-out split for evaluation (test_size=0.25 is an assumed value)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, random_state=42, stratify=y
        )
        # Fit the scaler on training pairs only so test statistics do not leak
        scaler = StandardScaler()
        X_train_s = scaler.fit_transform(X_train)
        X_test_s = scaler.transform(X_test)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train_s, y_train)
        probs = clf.predict_proba(X_test_s)[:, 1]
        auc = roc_auc_score(y_test, probs)
        ap = average_precision_score(y_test, probs)
        # Score every pair (training pairs included) to rank candidate interactions
        scored_pairs = []
        Xt_full = scaler.transform(X)
        full_probs = clf.predict_proba(Xt_full)[:, 1]
        for (p1, p2, _, label), pr in zip(ppi_rows, full_probs):
            scored_pairs.append((p1, p2, float(pr), int(label)))
        top_candidates = sorted(scored_pairs, key=lambda x: x[2], reverse=True)[:10]
        positive_rate = float(y.mean())
        summary = {
            "num_pairs": int(len(ppi_rows)),
            "positive_interaction_rate": round(positive_rate, 4),
            "test_roc_auc": round(float(auc), 4),
            "test_average_precision": round(float(ap), 4),
            "top_predicted_interactions": [
                {"protein_a": a, "protein_b": b, "pred_prob": round(pr, 4), "label": lab}
                for a, b, pr, lab in top_candidates
            ],
        }
        return AgentResult(name="ProteinInteractionPredictionAgent", summary=summary)

class MetabolicOptimizationAgent:
    def run(self, reactions, oxygen_budget=3.5, substrate_budget=4.0):
        best_score = -1e9
        best_flux = None
        trace = []
        for _ in range(8000):
            # Random flux vector: Dirichlet proportions scaled by a total flux in [1.5, 5.0]
            flux = np.random.dirichlet(np.ones(len(reactions))) * np.random.uniform(1.5, 5.0)
            oxygen = sum(f["oxygen_need"] * v for f, v in zip(reactions, flux))
            substrate = sum(f["substrate_cost"] * v for f, v in zip(reactions, flux))
            atp = sum(f["yield_atp"] * v for f, v in zip(reactions, flux))
            biomass = sum(f["yield_biomass"] * v for f, v in zip(reactions, flux))
            # Soft constraints: budget overruns are penalized, not rejected
            penalty = 0.0
            if oxygen > oxygen_budget:
                penalty += 6.0 * (oxygen - oxygen_budget)
            if substrate > substrate_budget:
                penalty += 6.0 * (substrate - substrate_budget)
            score = 2.2 * biomass + 0.6 * atp - penalty
            trace.append(score)
            if score > best_score:
                best_score = score
                best_flux = {
                    "oxygen": oxygen,
                    "substrate": substrate,
                    "atp": atp,
                    "biomass": biomass,
                    "fluxes": {reactions[i]["name"]: float(flux[i]) for i in range(len(reactions))}
                }
        ranked_fluxes = sorted(best_flux["fluxes"].items(), key=lambda x: x[1], reverse=True)
        summary = {
            "oxygen_budget": oxygen_budget,
            "substrate_budget": substrate_budget,
            "best_objective_score": round(float(best_score), 4),
            "best_biomass": round(float(best_flux["biomass"]), 4),
            "best_atp": round(float(best_flux["atp"]), 4),
            "oxygen_used": round(float(best_flux["oxygen"]), 4),
            "substrate_used": round(float(best_flux["substrate"]), 4),
            "dominant_reactions": [
                {"reaction": name, "flux": round(val, 4)} for name, val in ranked_fluxes[:6]
            ],
        }
        return AgentResult(name="MetabolicOptimizationAgent", summary=summary), trace
We define the protein interaction prediction agent and the metabolic optimization agent, which together extend the analysis beyond regulation into interaction biology and pathway allocation. We train a logistic regression classifier on synthetic pairwise protein features to estimate interaction probabilities. We then evaluate its predictive performance with ROC AUC and average precision on a held-out split, and rank the strongest candidate protein pairs. We also run a randomized flux search under oxygen and substrate constraints to identify metabolically favorable reaction allocations, which lets us study how the system balances biomass growth, ATP production, and resource limitations. The budgets are soft: samples that exceed them are penalized in the score rather than discarded.
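Every quantity in the metabolic score is linear in the flux vector, so the same allocation problem can also be posed as a linear program, in the spirit of flux balance analysis. The sketch below swaps the random search for `scipy.optimize.linprog`; SciPy is assumed to be present because scikit-learn depends on it. `lp_flux_allocation` is a hypothetical helper, and it differs from the notebook's search in two ways:

- It treats the oxygen and substrate budgets as hard constraints, whereas the search penalizes overruns.
- It drops the sampler's implicit cap on total flux (at most 5.0 per sample) unless `max_total_flux` is passed.

Its optimum is therefore not directly comparable with `best_objective_score`.

```python
import numpy as np
from scipy.optimize import linprog


def lp_flux_allocation(reactions, oxygen_budget=3.5, substrate_budget=4.2,
                       w_biomass=2.2, w_atp=0.6, max_total_flux=None):
    # Objective gain per unit flux, using the same weights as the random-search score
    gain = np.array([w_biomass * r["yield_biomass"] + w_atp * r["yield_atp"] for r in reactions])
    # Resource rows: total oxygen use and total substrate use must stay within budget
    A_ub = [[r["oxygen_need"] for r in reactions],
            [r["substrate_cost"] for r in reactions]]
    b_ub = [oxygen_budget, substrate_budget]
    if max_total_flux is not None:
        A_ub.append([1.0] * len(reactions))
        b_ub.append(max_total_flux)
    # linprog minimizes, so negate the gain; every flux is bounded below by zero
    res = linprog(-gain, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * len(reactions), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    fluxes = {r["name"]: float(v) for r, v in zip(reactions, res.x)}
    return float(-res.fun), fluxes
```

Called as `lp_flux_allocation(reactions)` with the list from `generate_metabolic_network()`, this returns the best linear score and a per-reaction flux dictionary. Passing `max_total_flux=5.0` makes the feasible set closer to what the random sampler explores.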
class CellSignalingSimulationAgent:
    def run(self, df_signal: pd.DataFrame) -> AgentResult:
        peak_receptor = float(df_signal["receptor_active"].max())
        peak_kinase = float(df_signal["kinase_active"].max())
        peak_tf = float(df_signal["tf_active"].max())
        # idxmax returns the first row at the maximum; look up its time stamp
        t_receptor = float(df_signal.loc[df_signal["receptor_active"].idxmax(), "time"])
        t_kinase = float(df_signal.loc[df_signal["kinase_active"].idxmax(), "time"])
        t_tf = float(df_signal.loc[df_signal["tf_active"].idxmax(), "time"])
        final_state = df_signal.iloc[-1].to_dict()
        summary = {
            "peak_receptor_activity": round(peak_receptor, 4),
            "peak_kinase_activity": round(peak_kinase, 4),
            "peak_tf_activity": round(peak_tf, 4),
            "time_to_peak_receptor": round(t_receptor, 4),
            "time_to_peak_kinase": round(t_kinase, 4),
            "time_to_peak_tf": round(t_tf, 4),
            "final_state": {k: round(float(v), 4) for k, v in final_state.items()},
        }
        return AgentResult(name="CellSignalingSimulationAgent", summary=summary)

class PrincipalInvestigatorAgent:
    def __init__(self, client, model=OPENAI_MODEL):
        self.client = client
        self.model = model

    def synthesize(self, results: List[AgentResult]) -> str:
        # Bundle every agent's summary into one JSON payload for the LLM
        payload = {r.name: r.summary for r in results}
        prompt = f"""
You are a principal investigator in computational systems biology.
Given the outputs of 4 specialised AI agents:
1. gene regulatory network analysis
2. protein interaction prediction
3. metabolic pathway optimization
4. cell signaling simulation
Write a rigorous but readable report with these sections:
- Executive Summary
- Key Findings by Agent
- Cross-System Biological Interpretation
- Hypotheses Worth Testing in Wet Lab
- Model Limitations
- Next Computational Extensions
Use concise scientific language.
Do not fabricate datasets beyond what is shown.
When useful, connect regulation, signaling, metabolism, and protein interactions into a single systems biology story.
Agent outputs:
{json.dumps(payload, indent=2)}
"""
        try:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "user", "content": prompt},
                ],
                temperature=0.4,
            )
            return resp.choices[0].message.content
        except Exception as e:
            return f"OpenAI synthesis failed: {e}"

# Run the four specialist agents on freshly generated synthetic data
genes, W = generate_gene_regulatory_network(n_genes=14, edge_prob=0.20)
X_expr = simulate_gene_expression(W, n_steps=80, noise=0.08)
grn_agent = GeneRegulatoryNetworkAgent()
grn_result = grn_agent.run(genes, W, X_expr)

proteins, prot_features, prot_families, prot_localization = generate_protein_features(n_proteins=40, feature_dim=10)
ppi_rows = generate_ppi_dataset(proteins, prot_features, prot_families, prot_localization)
ppi_agent = ProteinInteractionPredictionAgent()
ppi_result = ppi_agent.run(ppi_rows)

metabolites, reactions = generate_metabolic_network()
met_agent = MetabolicOptimizationAgent()
met_result, met_trace = met_agent.run(reactions, oxygen_budget=3.5, substrate_budget=4.2)

df_signal = simulate_cell_signaling(T=220, dt=0.05, ligand_level=1.2)
sig_agent = CellSignalingSimulationAgent()
sig_result = sig_agent.run(df_signal)

all_results = [grn_result, ppi_result, met_result, sig_result]
for r in all_results:
    pretty(r.name, json.dumps(r.summary, indent=2))

# 2x2 dashboard: weight matrix, expression traces, signaling cascade, search trace
fig = plt.figure(figsize=(18, 14))
ax1 = plt.subplot(2, 2, 1)
im = ax1.imshow(W, cmap="coolwarm", aspect="auto")
ax1.set_title("Gene Regulatory Weight Matrix")
ax1.set_xticks(range(len(genes)))
ax1.set_yticks(range(len(genes)))
ax1.set_xticklabels(genes, rotation=90)
ax1.set_yticklabels(genes)
plt.colorbar(im, ax=ax1, fraction=0.046, pad=0.04)

ax2 = plt.subplot(2, 2, 2)
for i in range(min(6, X_expr.shape[1])):
    ax2.plot(X_expr[:, i], label=genes[i])
ax2.set_title("Sample Gene Expression Dynamics")
ax2.set_xlabel("Time step")
ax2.set_ylabel("Expression")
ax2.legend(loc="upper right", fontsize=8)

ax3 = plt.subplot(2, 2, 3)
ax3.plot(df_signal["time"], df_signal["receptor_active"], label="Receptor")
ax3.plot(df_signal["time"], df_signal["kinase_active"], label="Kinase")
ax3.plot(df_signal["time"], df_signal["tf_active"], label="Transcription Factor")
ax3.plot(df_signal["time"], df_signal["phosphatase"], label="Phosphatase")
ax3.set_title("Cell Signaling Simulation")
ax3.set_xlabel("Time")
ax3.set_ylabel("Activity")
ax3.legend()

ax4 = plt.subplot(2, 2, 4)
ax4.plot(met_trace)
ax4.set_title("Metabolic Search Objective Trace")
ax4.set_xlabel("Iteration")
ax4.set_ylabel("Objective score")
plt.tight_layout()
plt.show()

# Directed GRN graph restricted to strong edges (|weight| > 0.4)
G_grn = nx.DiGraph()
for g in genes:
    G_grn.add_node(g)
for i in range(len(genes)):
    for j in range(len(genes)):
        if abs(W[i, j]) > 0.4:
            G_grn.add_edge(genes[i], genes[j], weight=W[i, j])
plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G_grn, seed=42)
edge_colors = ["green" if G_grn[u][v]["weight"] > 0 else "red" for u, v in G_grn.edges()]
nx.draw_networkx(G_grn, pos, with_labels=True, node_size=900, font_size=9, arrows=True, edge_color=edge_colors)
plt.title("Gene Regulatory Network Graph (green=activation, red=repression)")
plt.axis("off")
plt.show()

# Subnetwork of the highest-probability predicted interactions; edge width scales with probability
top_ppi = ppi_result.summary["top_predicted_interactions"][:12]
G_ppi = nx.Graph()
for row in top_ppi:
    a, b, p = row["protein_a"], row["protein_b"], row["pred_prob"]
    G_ppi.add_edge(a, b, weight=p)
plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G_ppi, seed=7)
widths = [2 + 4 * G_ppi[u][v]["weight"] for u, v in G_ppi.edges()]
nx.draw_networkx(G_ppi, pos, with_labels=True, node_size=1000, font_size=9, width=widths)
plt.title("Top Predicted Protein Interaction Subnetwork")
plt.axis("off")
plt.show()

# Tidy summary tables
grn_table = pd.DataFrame(grn_result.summary["most_dynamic_genes"])
ppi_table = pd.DataFrame(ppi_result.summary["top_predicted_interactions"])
met_table = pd.DataFrame(met_result.summary["dominant_reactions"])
sig_table = pd.DataFrame([sig_result.summary])
pretty("Most Dynamic Genes", grn_table.to_string(index=False))
pretty("Top Predicted PPIs", ppi_table.to_string(index=False))
pretty("Dominant Metabolic Reactions", met_table.to_string(index=False))

# LLM synthesis by the principal investigator agent
pi_agent = PrincipalInvestigatorAgent(client=client, model=OPENAI_MODEL)
final_report = pi_agent.synthesize(all_results)
pretty("OPENAI SYSTEMS BIOLOGY REPORT", final_report)

# Persist all summaries and the report as a JSON artifact
artifact = {
    "grn": grn_result.summary,
    "ppi": ppi_result.summary,
    "metabolic": met_result.summary,
    "signaling": sig_result.summary,
    "llm_report": final_report,
}
with open("bio_agents_tutorial_results.json", "w") as f:
    json.dump(artifact, f, indent=2)
print("\nSaved results to: bio_agents_tutorial_results.json")
We define the cell signaling simulation agent and the principal investigator agent, and then execute the entire end-to-end workflow. We run all four biological modules, print structured outputs, generate plots and network visualizations, and build tidy summary tables. Finally, we use the OpenAI model to write an expert-style report that integrates the findings across all subsystems. This brings everything together into a full pipeline for biological systems modeling and shows how multi-agent AI can support scientific interpretation, visualization, and hypothesis generation.
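The signaling agent reports time-to-peak via `idxmax`, which returns the first time step at the maximum. Some species rise monotonically toward a plateau, as the receptor does in this model under constant ligand. For those, the maximum sits at or near the end of the run, so the metric mostly reflects simulation length. A half-maximum time is one common alternative for comparing how quickly each stage of a cascade responds. The sketch below is an assumed extension, and `half_max_time` is a hypothetical helper rather than part of the original agent.

```python
import numpy as np


def half_max_time(t, y) -> float:
    """First time at which y reaches half of its maximum value (a simple rise-time proxy)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # argmax on a boolean array returns the index of the first True
    idx = int(np.argmax(y >= 0.5 * y.max()))
    return float(t[idx])
```

Applied as `{col: half_max_time(df_signal["time"], df_signal[col]) for col in ["receptor_active", "kinase_active", "tf_active"]}`, this gives one time per stage. Comparing those times shows whether activation propagates from receptor to kinase to transcription factor in the expected order.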
In conclusion, we created a full computational biology workflow that demonstrates how agent-based AI can be used to study multiple layers of biological organization in a structured and interpretable way. We moved from data generation to modeling, optimization, simulation, visualization, and final scientific synthesis. This shows how specialised agents can collaborate to produce richer biological insight than any single isolated analysis. In the end, we have a strong foundation for extending this notebook toward more realistic omics datasets, experimental priors, mechanistic constraints, and deeper biological hypothesis generation for advanced systems biology research.
The post Build a Multi-Agent AI Workflow for Biological Network Modeling, Protein Interactions, Metabolism, and Cell Signaling Simulation appeared first on MarkTechPost.
