A Coding Implementation on Microsoft SkillOpt for Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison
In this tutorial, we implement an instrumented workflow for Microsoft SkillOpt. We set up the SkillOpt repository, connect it to OpenAI-compatible model access, configure the optimizer and target models, and run the SearchQA optimization pipeline with a controlled sample limit to keep costs manageable. We first evaluate the original seed skill as a baseline, then run a real optimization loop in which SkillOpt improves the skill through rollout, reflection, aggregation, selection, updating, and validation-based gating. Along the way, we inspect the training history, visualize changes in accuracy, review edit-budget behavior, track cumulative token usage, and compare the evolved skill with the original baseline.
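To make the last of those stages concrete before touching the real pipeline, here is a toy sketch of validation-based gating. It is not SkillOpt's implementation: `gated_optimize` is our name, and `propose_edit` and `validate` are hypothetical callables standing in for the rollout-to-update stages and for a validation evaluation. Whether SkillOpt accepts ties (`>=`) or requires a strict gain is also an assumption here.

```python
from typing import Callable

def gated_optimize(
    seed_skill: str,
    propose_edit: Callable[[str, int], str],  # hypothetical stand-in for rollout/reflect/aggregate/select/update
    validate: Callable[[str], float],         # hypothetical validation accuracy of a skill text
    num_steps: int,
) -> tuple[str, float, list[bool]]:
    """Toy loop: a candidate replaces the best skill only if validation does not drop."""
    best_skill, best_score = seed_skill, validate(seed_skill)
    accepted: list[bool] = []
    for step in range(num_steps):
        candidate = propose_edit(best_skill, step)
        score = validate(candidate)
        if score >= best_score:  # the gate (non-strict comparison is an assumption)
            best_skill, best_score = candidate, score
            accepted.append(True)
        else:
            accepted.append(False)  # rejected: keep building on the previous best skill
    return best_skill, best_score, accepted

# Toy usage: each step appends one rule; the fake scorer rewards two rules and penalizes one.
rules = ["Answer with a short span.", "Never guess.", "Quote the context verbatim."]

def toy_propose(skill: str, step: int) -> str:
    return skill + "\n- " + rules[step % len(rules)]

def toy_validate(skill: str) -> float:
    return 0.1 * skill.count("short span") + 0.1 * skill.count("verbatim") - 0.2 * skill.count("Never guess")

best, score, accepted = gated_optimize("You answer questions.", toy_propose, toy_validate, num_steps=3)
print(accepted)  # [True, False, True]
```

Because a rejected candidate never becomes the new base, the harmful "Never guess" rule cannot accumulate, which is the kind of protection a validation gate offers.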
SkillOpt Environment Setup
import os, re, json, glob, subprocess, pathlib, difflib
try:
    from google.colab import userdata
    OPENAI_KEY = userdata.get("OPENAI_API_KEY")
except Exception:
    OPENAI_KEY = os.environ.get("OPENAI_API_KEY", "")
OPENAI_KEY = OPENAI_KEY or "sk-PASTE-YOUR-KEY-HERE"
# The placeholder itself starts with "sk-", so reject it explicitly.
assert OPENAI_KEY.startswith("sk-") and "PASTE" not in OPENAI_KEY, \
    "Set a real OpenAI key (Colab Secrets -> OPENAI_API_KEY)."
OPTIMIZER_MODEL = "gpt-4o"      # stronger model that proposes skill edits
TARGET_MODEL = "gpt-4o-mini"    # smaller agent model that executes the skill
RUN = "outputs/searchqa_adv"
LIMIT = 24                      # sample cap to keep API costs manageable
RUN_KNOBS = dict(num_epochs=2, batch_size=8, minibatch=4, merge_batch=4,
                 workers=2, lr=4, lr_sched="cosine", limit=LIMIT)
# Clone and install SkillOpt once per Colab session.
if not pathlib.Path("/content/SkillOpt/scripts/train.py").exists():
    subprocess.run("git clone --depth 1 https://github.com/microsoft/SkillOpt.git",
                   shell=True, cwd="/content", check=True)
    subprocess.run('pip -q install -e . && pip -q install "openai>=1.0" pandas matplotlib',
                   shell=True, cwd="/content/SkillOpt", check=True)
os.chdir("/content/SkillOpt")
# Route SkillOpt's azure_openai backend to the public OpenAI endpoint in compatible mode.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://api.openai.com/v1"
os.environ["AZURE_OPENAI_API_KEY"] = OPENAI_KEY
os.environ["AZURE_OPENAI_AUTH_MODE"] = "openai_compatible"
SPLIT = "data/searchqa_id_split"
CFG = "configs/searchqa/default.yaml"
COMMON = ["--azure_openai_endpoint", "https://api.openai.com/v1",
          "--cfg-options", "model.backend=azure_openai",
          "model.azure_openai_auth_mode=openai_compatible"]
We prepare the complete Colab environment for running SkillOpt. We load the OpenAI API key, define the optimizer and target models, clone the SkillOpt repository, and install the required dependencies. We also configure the OpenAI-compatible backend so the SkillOpt scripts can communicate with the chosen models.
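A prefix check such as `startswith("sk-")` alone cannot tell a placeholder like `sk-PASTE-YOUR-KEY-HERE` from a real key. A minimal sketch of a stricter resolver follows; `resolve_openai_key` is a hypothetical helper (not part of SkillOpt), and the Colab secret lookup is passed in as a callable so the same function also runs outside Colab.

```python
import os
from typing import Callable, Mapping, Optional

PLACEHOLDER_KEY = "sk-PASTE-YOUR-KEY-HERE"

def resolve_openai_key(colab_lookup: Optional[Callable[[str], Optional[str]]] = None,
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer a Colab secret, fall back to the environment, reject missing or placeholder keys."""
    key = ""
    if colab_lookup is not None:
        try:
            key = colab_lookup("OPENAI_API_KEY") or ""
        except Exception:
            key = ""  # secret not defined, or notebook access not granted
    if not key:
        source = os.environ if env is None else env
        key = source.get("OPENAI_API_KEY", "")
    if not key.startswith("sk-") or key == PLACEHOLDER_KEY:
        raise ValueError("Set a real OpenAI key (Colab Secrets -> OPENAI_API_KEY).")
    return key
```

In Colab you would call `resolve_openai_key(userdata.get)` after `from google.colab import userdata`, since that lookup may raise when the secret is missing; elsewhere, `resolve_openai_key()` simply reads `OPENAI_API_KEY` from the environment.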
Baseline Skill Evaluation
def run_cli(args, tag):
    """Run a SkillOpt script, echo its output live, and return the full text."""
    print("\n" + "#"*80 + f"\n# {tag}\n# $ " + " ".join(args) + "\n" + "#"*80)
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    buf = []
    for line in p.stdout:
        print(line, end=""); buf.append(line)
    p.wait(); return "".join(buf)

def parse_acc(txt):
    """Extract hard (exact-match) and soft accuracy from a script's log output."""
    m = re.search(r"Results:\s*hard=([\d.]+)\s+soft=([\d.]+)", txt)
    if m: return {"hard": float(m.group(1)), "soft": float(m.group(2))}
    g = re.findall(r"hard=([\d.]+)", txt)  # fallback: last hard= value in the log
    return {"hard": float(g[-1]), "soft": None} if g else None

seed = "skillopt/envs/searchqa/skills/initial.md"
if not pathlib.Path(seed).exists():
    # Fallback seed if the environment's default skill file is not at this path.
    seed = "baseline_skill.md"; pathlib.Path(seed).write_text("You answer questions from the given context.\n")
base_out = run_cli(["python","scripts/eval_only.py","--config",CFG,
                    "--skill",seed,"--split","valid_unseen","--split_dir",SPLIT,
                    "--target_model",TARGET_MODEL,*COMMON,
                    "env.workers=1",f"env.limit={LIMIT}"],
                   "BASELINE EVAL (env seed skill, no training)")
base = parse_acc(base_out)
We define helper functions to run SkillOpt commands and extract evaluation accuracy from their output. We then locate the initial seed skill used by the SearchQA environment and evaluate it on the unseen validation split. This gives us a baseline result before any optimization or training takes place.
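The `run_cli` helper streams output but discards the exit status, so a crashed script simply yields text that `parse_acc` cannot parse, and the result is a silent `None`. A minimal variant that fails loudly instead is sketched below; `run_checked` is our name for a hypothetical drop-in, not anything shipped with SkillOpt.

```python
import subprocess
import sys

def run_checked(args: list[str], tag: str) -> str:
    """Stream combined stdout/stderr live, return it as text, and raise on a non-zero exit."""
    print(f"\n# {tag}\n# $ {' '.join(args)}")
    with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True) as proc:
        lines = []
        for line in proc.stdout:  # echo as we go so long training runs show progress
            print(line, end="")
            lines.append(line)
        returncode = proc.wait()
    out = "".join(lines)
    if returncode != 0:
        raise RuntimeError(f"{tag} failed with exit code {returncode}; last output:\n{out[-2000:]}")
    return out
```

It takes the same arguments as `run_cli`, so a call such as `run_checked(["python", "scripts/eval_only.py", ...], "BASELINE EVAL")` surfaces a missing file or a bad config override immediately instead of at the plotting step.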
Training And Visualization
k = RUN_KNOBS
train_out = run_cli(["python","scripts/train.py","--config",CFG,"--split_dir",SPLIT,
                     "--optimizer_model",OPTIMIZER_MODEL,"--target_model",TARGET_MODEL,"--out_root",RUN,
                     *COMMON,
                     "train.train_size=0",
                     f"train.num_epochs={k['num_epochs']}", f"train.batch_size={k['batch_size']}",
                     f"gradient.minibatch_size={k['minibatch']}", f"gradient.merge_batch_size={k['merge_batch']}",
                     f"gradient.analyst_workers={k['workers']}",
                     f"optimizer.learning_rate={k['lr']}", f"optimizer.lr_scheduler={k['lr_sched']}",
                     "optimizer.use_slow_update=true", "optimizer.use_meta_skill=true",
                     f"env.workers={k['workers']}", f"env.limit={k['limit']}"],
                    "TRAIN (rollout->reflect->aggregate->select->update->gate; slow-update + meta-skill)")

import pandas as pd, matplotlib.pyplot as plt
hist = json.loads(pathlib.Path(f"{RUN}/history.json").read_text())
df = pd.json_normalize(hist)
print("\nhistory.json columns:", list(df.columns))

def col(*cands):
    """Return the first column whose lowercased name contains one of the candidate substrings."""
    for c in cands:
        for actual in df.columns:
            if c in actual.lower(): return actual
    return None

c_step = col("step")
x = df[c_step] if c_step else range(len(df))
c_tr, c_va = col("train_acc","train_hard","train"), col("val_acc","val_hard","valid","val")
c_lr, c_tok = col("edit_budget","lr","learning_rate","budget"), col("token","cost")
fig, ax = plt.subplots(1, 3, figsize=(16,4))
if c_tr: ax[0].plot(x, df[c_tr], "o-", label="train acc")
if c_va: ax[0].plot(x, df[c_va], "s-", label="val acc (gate)")
if base and base["hard"] is not None: ax[0].axhline(base["hard"], ls="--", c="gray", label="baseline (seed)")
ax[0].set_title("Skill accuracy over steps"); ax[0].set_xlabel("step"); ax[0].legend(); ax[0].grid(alpha=.3)
if c_lr: ax[1].plot(x, df[c_lr], "d-", c="purple")
ax[1].set_title("Edit-budget / LR schedule (cosine)"); ax[1].set_xlabel("step"); ax[1].grid(alpha=.3)
if c_tok: ax[2].plot(x, pd.to_numeric(df[c_tok], errors="coerce").cumsum(), c="darkorange")
ax[2].set_title("Cumulative token usage"); ax[2].set_xlabel("step"); ax[2].grid(alpha=.3)
plt.tight_layout(); plt.savefig(f"{RUN}/training_dashboard.png", dpi=120); plt.show()
We run the main SkillOpt training loop with the chosen optimizer and target models. We configure the important training settings: epochs, batch size, minibatch size, learning rate, slow update, meta-skill, and the data limit. We then read the training history and plot accuracy, edit-budget behavior, and cumulative token usage on a dashboard.
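The dashboard treats the learning rate and the edit budget as one series, and the knobs pair `optimizer.learning_rate=4` with `optimizer.lr_scheduler=cosine`, which reads as a budget of roughly four edits per step that decays over training. SkillOpt's exact schedule is not shown in this tutorial, so the sketch below assumes a standard half-cosine decay from the peak budget to a floor of one edit, rounded to whole edits; `cosine_edit_budget` and `min_budget` are hypothetical names.

```python
import math

def cosine_edit_budget(step: int, total_steps: int, max_budget: int = 4, min_budget: int = 1) -> int:
    """Half-cosine decay from max_budget (first step) to min_budget (last step), in whole edits."""
    if total_steps <= 1:
        return max_budget
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    value = min_budget + 0.5 * (max_budget - min_budget) * (1 + math.cos(math.pi * progress))
    return max(min_budget, round(value))

# Illustrative six-step horizon (not necessarily this run's step count).
print([cosine_edit_budget(s, 6) for s in range(6)])  # [4, 4, 3, 2, 1, 1]
```

Rounding matters because a fractional budget has no meaning when the unit is a discrete edit; the schedule stays near the peak early, when large rewrites are most useful, and settles at small refinements late in training.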
Inspecting Skill Evolution
snaps = sorted(glob.glob(f"{RUN}/skills/skill_v*.md"))
best = pathlib.Path(f"{RUN}/best_skill.md").read_text()
print("\n" + "="*80 + f"\nSKILL EVOLUTION: {len(snaps)} snapshots; diff v0 -> best_skill\n" + "="*80)
if snaps:
    # Unified diff from the first saved snapshot to the final best skill (first 120 lines).
    diff = difflib.unified_diff(pathlib.Path(snaps[0]).read_text().splitlines(),
                                best.splitlines(), pathlib.Path(snaps[0]).name, "best_skill.md", lineterm="")
    print("\n".join(list(diff)[:120]) or "(no textual diff captured)")
# The protected SLOW_UPDATE block only appears after an epoch boundary; capture it to end of file.
prot = re.search(r"(SLOW_UPDATE.*?)$", best, re.S)
print("\n--- protected SLOW_UPDATE block ---\n",
      prot.group(1)[:1500] if prot else "(none: it appears only after an epoch boundary)")
# First aggregated patch and first raw reflection analysis across all steps.
patch = (sorted(glob.glob(f"{RUN}/steps/step_*/patches/*.json")) or [None])[0]
analy = (sorted(glob.glob(f"{RUN}/steps/step_*/analysis/*")) or [None])[0]
print("\n" + "="*80 + "\nTEXTUAL GRADIENT: one aggregated patch (clipped to edit budget):\n" + "="*80)
print(pathlib.Path(patch).read_text()[:1500] if patch else "(no patch files)")
print("\n--- one raw Reflect-stage analysis ---\n",
      pathlib.Path(analy).read_text()[:1000] if analy else "(no analysis files)")
# List the epoch-level slow-update and meta-skill artifacts.
for name in ("slow_update", "meta_skill"):
    files = sorted(glob.glob(f"{RUN}/{name}/epoch_*/*"))
    print(f"\n[{name}] {len(files)} artifact(s):", [pathlib.Path(f).name for f in files[:6]])
We inspect how the skill evolves during optimization. We compare the first saved skill snapshot with the final best skill, check whether a protected slow-update block appears, and review one generated patch and one reflection analysis. We also list the slow-update and meta-skill artifacts created during epoch-level training.
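A unified diff shows what changed; reducing each snapshot-to-snapshot change to a few numbers makes trends easier to spot, for example whether edits shrink as the edit budget decays. A small sketch using only the standard library's `difflib` (`diff_stats` is our name, not a SkillOpt utility):

```python
import difflib

def diff_stats(old: str, new: str) -> dict:
    """Summarize a skill edit: lines added, lines removed, and a 0-1 line-level similarity."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # lines present only in the old skill
        if tag in ("replace", "insert"):
            added += j2 - j1    # lines present only in the new skill
    return {"added": added, "removed": removed, "similarity": round(matcher.ratio(), 3)}
```

Working on line opcodes rather than on `+`/`-` prefixes of a unified diff avoids miscounting skill lines that themselves begin with `-` (Markdown bullets, for instance). Assuming `snaps` is in step order, `for a, b in zip(snaps, snaps[1:]): print(pathlib.Path(b).name, diff_stats(pathlib.Path(a).read_text(), pathlib.Path(b).read_text()))` prints one summary per saved step.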
Final Evaluation Comparison
best_path = pathlib.Path(f"{RUN}/best_skill.md")
best_out = run_cli(["python","scripts/eval_only.py","--config",CFG,
                    "--skill",str(best_path),"--split","valid_unseen","--split_dir",SPLIT,
                    "--target_model",TARGET_MODEL,*COMMON,"env.workers=1",f"env.limit={LIMIT}"],
                   "FINAL EVAL (best_skill on valid_unseen)")
trained = parse_acc(best_out)
print("\n" + "="*80 + "\nRESULT (hard = exact match, the gated metric)\n" + "="*80)
print(f"baseline seed skill : {base}")
print(f"trained best_skill  : {trained}")
if base and trained:
    print(f"hard-match lift     : {trained['hard'] - base['hard']:+.4f}")
print(f"\nDeployable artifact: {best_path} ({len(best_path.read_text())} chars)")
We evaluate the final optimized best_skill.md file on the unseen validation split. We compare the trained skill's hard-match score with the original baseline score to measure the improvement, and finish by printing the lift and the path to the deployable optimized skill artifact.
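The result banner describes `hard` as exact match. How SkillOpt normalizes answers before comparing them is not shown here, so the sketch below assumes the common SQuAD-style convention (lowercase, strip punctuation and English articles, collapse whitespace); `normalize_answer`, `exact_match`, and `hard_accuracy` are illustrative names. The same framing explains why small lifts deserve caution: assuming `env.limit=24` caps the evaluation at 24 questions, each flipped answer moves hard accuracy by 1/24 ≈ 0.042.

```python
import re
import string

PUNCT = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))

def hard_accuracy(preds: list[str], golds: list[list[str]]) -> float:
    """Mean exact match over a set of questions."""
    return sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(preds)
```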
Conclusion
In conclusion, we built a complete SkillOpt experiment that goes beyond simply launching a training command. We measured the baseline seed skill, optimized it using a stronger model as the optimizer and a smaller model as the target agent, and inspected how the skill evolved across training steps through saved snapshots, patches, reflections, slow updates, and meta-skill artifacts. We also generated a training dashboard that shows whether the optimization process is improving performance and how much token usage accumulates during the run. By the end, we have a deployable best_skill.md file, a final evaluation on the unseen validation split, and a clear comparison between the original and optimized skills.
The post A Coding Implementation on Microsoft SkillOpt for Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison appeared first on MarkTechPost.
