Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks
In this tutorial, we implement an end-to-end workflow for Salesforce CodeGen. We load a CodeGen model from Hugging Face, prepare it for code generation, and use it to generate Python functions from natural-language prompts. We then move beyond basic inference by adding function extraction, syntax checking, static safety checks, unit-test-based validation, best-of-N candidate reranking, multi-step program synthesis, prompt-style experimentation, benchmark visualization, and artifact export. Through this workflow, we learn how CodeGen can be used not only as a code completion model but also as part of a structured code-generation pipeline that evaluates, filters, and organizes generated solutions.
Loading the Salesforce CodeGen Model from Hugging Face
import os, sys, subprocess, textwrap, json, re, time, math, ast, tempfile, multiprocessing as mp
from pathlib import Path

def sh(cmd):
    # Echo the shell command, then run it and fail loudly on a non-zero exit code.
    print(f"\n$ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

sh(f"{sys.executable} -m pip install -q -U transformers accelerate safetensors einops datasets evaluate pandas matplotlib tqdm rich radon tiktoken")

import torch
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from rich import print
from rich.panel import Panel
from rich.syntax import Syntax
from IPython.display import display  # explicit import so DataFrame display also works outside a notebook cell
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
from radon.complexity import cc_visit

OUT_DIR = Path("/content/codegen_advanced_tutorial")
OUT_DIR.mkdir(parents=True, exist_ok=True)
set_seed(42)

print(Panel.fit("Salesforce CodeGen Advanced Tutorial", style="bold green"))
print("\nRuntime info")
print("Python:", sys.version.split()[0])
print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA memory GB:", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 2))

# The checkpoint can be overridden through the CODEGEN_MODEL_ID environment variable.
MODEL_ID = os.environ.get("CODEGEN_MODEL_ID", "Salesforce/codegen-350M-mono")
MODEL_OPTIONS = {
    "easy_colab_default": "Salesforce/codegen-350M-mono",
    "larger_codegen1": "Salesforce/codegen-2B-mono",
    "codegen2_1b": "Salesforce/codegen2-1B_P",
    "codegen25_7b_mono": "Salesforce/codegen25-7b-mono_P",
}
print("\nSelected model:", MODEL_ID)
print("Available model examples:", MODEL_OPTIONS)

# CodeGen2 and CodeGen2.5 checkpoints ship custom code, so they need trust_remote_code.
trust_remote_code = any(x in MODEL_ID.lower() for x in ["codegen2", "codegen25"])
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

print("\nLoading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    trust_remote_code=trust_remote_code
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("Loading model...")
load_kwargs = {
    "trust_remote_code": trust_remote_code,
    "low_cpu_mem_usage": True,
}
if torch.cuda.is_available():
    load_kwargs["torch_dtype"] = dtype
    load_kwargs["device_map"] = "auto"
else:
    load_kwargs["torch_dtype"] = torch.float32

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs)
if not torch.cuda.is_available():
    model.to(device)
model.eval()

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

print(f"Loaded {MODEL_ID}")
print(f"Parameter count: {count_parameters(model)/1e6:.1f}M")

def generate_text(
    prompt,
    max_new_tokens=180,
    temperature=0.35,
    top_p=0.92,
    top_k=50,
    do_sample=True,
    num_return_sequences=1,
    repetition_penalty=1.05,
):
    # Tokenize the prompt and move every tensor to the device that holds the model.
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=do_sample,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            num_return_sequences=num_return_sequences,
            repetition_penalty=repetition_penalty,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Each decoded string contains the prompt followed by the completion.
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return decoded

def print_code(title, code):
    print(Panel.fit(title, style="bold cyan"))
    print(Syntax(code, "python", theme="monokai", line_numbers=True))
We install all required libraries and prepare the environment for running Salesforce CodeGen. We check the runtime, detect GPU availability, select the CodeGen model, and load both the tokenizer and model from Hugging Face. We also define helper functions for text generation and for displaying formatted code so that the rest of the tutorial is easier to follow.
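To make the model-selection logic easier to reason about before downloading any weights, here is a minimal sketch that reproduces the same decisions without importing torch. The helper name `resolve_load_settings` and the string dtype labels are assumptions made for illustration; they are not part of the tutorial code or of the transformers API.

```python
import os

# Hypothetical helper: mirrors the selection rules without loading a model.
def resolve_load_settings(model_id, cuda_available):
    """Return the loading options implied by a checkpoint name and GPU availability."""
    needs_remote_code = any(tag in model_id.lower() for tag in ("codegen2", "codegen25"))
    return {
        "model_id": model_id,
        "trust_remote_code": needs_remote_code,
        "device": "cuda" if cuda_available else "cpu",
        "dtype": "float16" if cuda_available else "float32",  # labels only, not torch dtypes
        "device_map": "auto" if cuda_available else None,
    }

# The environment variable wins over the small default checkpoint.
default_id = os.environ.get("CODEGEN_MODEL_ID", "Salesforce/codegen-350M-mono")
print("Default checkpoint:", default_id)

small_cpu = resolve_load_settings("Salesforce/codegen-350M-mono", cuda_available=False)
large_gpu = resolve_load_settings("Salesforce/codegen25-7b-mono_P", cuda_available=True)
print(small_cpu)
print(large_gpu)
```

Note that the substring test is case-insensitive and that `codegen-2B-mono` does not match `codegen2` because of the hyphen, so the CodeGen1 checkpoints load without remote code.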
Building Extraction, Safety, and Unit-Test Validation Utilities
def extract_function_source(full_text, function_name):
    text = full_text.replace("\r\n", "\n")
    # Prefer the contents of a fenced code block when the model emitted one.
    fence = re.search(r"```(?:python)?\n(.*?)```", text, flags=re.S | re.I)
    if fence:
        text = fence.group(1)
    pattern = rf"^def\s+{re.escape(function_name)}\s*\("
    match = re.search(pattern, text, flags=re.M)
    if not match:
        return ""
    chunk = text[match.start():]
    lines = chunk.splitlines()
    collected = []
    # Collect lines until the next top-level definition, main guard, or assignment.
    for i, line in enumerate(lines):
        if i > 0:
            if line.startswith("def ") or line.startswith("class "):
                break
            if line.startswith("if __name__"):
                break
            if line and not line.startswith((" ", "\t", "#")) and re.match(r"^[A-Za-z_][A-Za-z0-9_]*\s*=", line):
                break
        collected.append(line)
    source = "\n".join(collected).rstrip()
    try:
        ast.parse(source)
        return source
    except SyntaxError:
        # Fall back to the longest prefix of the collected lines that still parses.
        fixed_lines = []
        for line in collected:
            fixed_lines.append(line)
            candidate = "\n".join(fixed_lines).rstrip()
            try:
                ast.parse(candidate)
                source = candidate
            except SyntaxError:
                pass
        return source if source.strip().startswith("def ") else ""

def syntax_ok(source):
    try:
        ast.parse(source)
        return True, ""
    except SyntaxError as e:
        return False, str(e)

FORBIDDEN_NAMES = {
    "eval", "exec", "compile", "open", "input", "__import__",
    "globals", "locals", "vars", "dir", "getattr", "setattr", "delattr",
    "help", "breakpoint", "exit", "quit"
}

FORBIDDEN_NODES = (
    ast.Import,
    ast.ImportFrom,
    ast.Global,
    ast.Nonlocal,
    ast.With,
    ast.AsyncWith,
    ast.AsyncFunctionDef,
    ast.ClassDef,
    ast.Delete,
    ast.Raise,
)

ALLOWED_BUILTINS = {
    "abs": abs,
    "all": all,
    "any": any,
    "bool": bool,
    "dict": dict,
    "enumerate": enumerate,
    "float": float,
    "int": int,
    "isinstance": isinstance,
    "len": len,
    "list": list,
    "map": map,
    "max": max,
    "min": min,
    "pow": pow,
    "range": range,
    "reversed": reversed,
    "round": round,
    "set": set,
    "sorted": sorted,
    "str": str,
    "sum": sum,
    "tuple": tuple,
    "zip": zip,
}

def static_safety_check(source):
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return False, f"SyntaxError: {e}"
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            return False, f"Forbidden AST node: {type(node).__name__}"
        if isinstance(node, ast.Name):
            if node.id in FORBIDDEN_NAMES or node.id.startswith("__"):
                return False, f"Forbidden name: {node.id}"
        if isinstance(node, ast.Attribute):
            if node.attr.startswith("__"):
                return False, f"Forbidden attribute: {node.attr}"
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_NAMES:
                return False, f"Forbidden call: {node.func.id}"
    return True, "passed"

def _worker_run_tests(source, function_name, tests, queue):
    try:
        # Use a single namespace so generated functions can call themselves and each other;
        # with separate globals and locals, recursion and composed pipelines raise NameError.
        namespace = {"__builtins__": ALLOWED_BUILTINS}
        compiled = compile(source, "<generated_code>", "exec")
        exec(compiled, namespace)
        fn = namespace.get(function_name)
        if fn is None:
            queue.put({"ok": False, "error": f"{function_name} not found", "passed": 0, "total": len(tests)})
            return
        passed = 0
        details = []
        for test in tests:
            args = test.get("args", [])
            kwargs = test.get("kwargs", {})
            expected = test["expected"]
            result = fn(*args, **kwargs)
            ok = result == expected
            passed += int(ok)
            details.append({
                "args": args,
                "kwargs": kwargs,
                "expected": expected,
                "result": result,
                "ok": ok,
            })
        queue.put({"ok": passed == len(tests), "error": "", "passed": passed, "total": len(tests), "details": details})
    except Exception as e:
        # Any exception raised by the generated code counts as zero tests passed.
        queue.put({"ok": False, "error": repr(e), "passed": 0, "total": len(tests)})

def run_unit_tests_safely(source, function_name, tests, timeout_seconds=3):
    safe, reason = static_safety_check(source)
    if not safe:
        return {"ok": False, "error": reason, "passed": 0, "total": len(tests), "details": []}
    # The "fork" start method is available on Linux (including Colab) but not on Windows.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    process = ctx.Process(target=_worker_run_tests, args=(source, function_name, tests, queue))
    process.start()
    process.join(timeout_seconds)
    if process.is_alive():
        # Kill candidates that loop forever or run too long.
        process.terminate()
        process.join()
        return {"ok": False, "error": "timeout", "passed": 0, "total": len(tests), "details": []}
    if queue.empty():
        return {"ok": False, "error": "no result returned", "passed": 0, "total": len(tests), "details": []}
    return queue.get()

def code_complexity(source):
    try:
        blocks = cc_visit(source)
        if not blocks:
            return 1
        return max(block.complexity for block in blocks)
    except Exception:
        return None

def score_candidate(source, test_result):
    # Syntax and safety are worth 1 point each, tests up to 3, minus a capped complexity penalty.
    syntax_score = 1 if syntax_ok(source)[0] else 0
    safety_score = 1 if static_safety_check(source)[0] else 0
    passed = test_result.get("passed", 0)
    total = max(test_result.get("total", 1), 1)
    test_score = passed / total
    complexity = code_complexity(source)
    complexity_penalty = 0 if complexity is None else min(complexity / 20, 0.25)
    return syntax_score + safety_score + 3 * test_score - complexity_penalty
We build the utility layer that extracts generated Python functions from raw model outputs. We add syntax validation, static safety checks, restricted execution, unit-test execution, and timeout handling to make generated code easier to evaluate. We also calculate code complexity and create a scoring function to rank generated candidates by correctness, safety, and simplicity.
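To illustrate what a static safety check of this kind catches, the following is a reduced sketch that inspects only the AST and never executes the snippet. The function `quick_safety_check` and the short `BLOCKED_NAMES` set are illustrative assumptions, a small subset of the rules described above, and such a filter should be treated as a first line of defense rather than a real sandbox.

```python
import ast

BLOCKED_NAMES = {"eval", "exec", "open", "__import__"}  # illustrative subset only

def quick_safety_check(source):
    """Return (is_safe, reason) by walking the AST; the code is never run."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg}"
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False, "import statement"
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            return False, f"blocked name: {node.id}"
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False, f"dunder attribute: {node.attr}"
    return True, "passed"

samples = {
    "clean": "def add(a, b):\n    return a + b",
    "imports": "import os\ndef cwd():\n    return os.getcwd()",
    "eval_call": "def run(expr):\n    return eval(expr)",
    "dunder": "def peek(x):\n    return x.__class__",
}
results = {name: quick_safety_check(src) for name, src in samples.items()}
for name, verdict in results.items():
    print(name, verdict)
```

Only the first sample passes; the others are rejected for an import, a blocked builtin, and a double-underscore attribute respectively.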
Generating Code and Defining Benchmark Tasks
print("\n" + "=" * 90)
print("Demo 1: Basic natural-language-to-code completion")
print("=" * 90)

basic_prompt = """# Write a Python function that returns the area of a circle.
# The function should be named circle_area and should accept radius as input.
# Do not print anything. Return the numeric result.
def circle_area(radius):
"""

basic_output = generate_text(
    basic_prompt,
    max_new_tokens=120,
    temperature=0.25,
    do_sample=True,
    num_return_sequences=1,
)[0]
print_code("Raw CodeGen output", basic_output)

circle_source = extract_function_source(basic_output, "circle_area")
print_code("Extracted function", circle_source if circle_source else "# No function extracted")

circle_tests = [
    {"args": [1], "expected": math.pi},
    {"args": [2], "expected": 4 * math.pi},
]

if circle_source:
    print("Syntax:", syntax_ok(circle_source))
    print("Safety:", static_safety_check(circle_source))
    print("Complexity:", code_complexity(circle_source))

print("\n" + "=" * 90)
print("Demo 2: Best-of-N generation with test-based reranking")
print("=" * 90)

# Each task pairs a function signature and instruction with the unit tests used for reranking.
TASKS = [
    {
        "name": "factorial",
        "signature": "def factorial(n):",
        "instruction": "Return n factorial for a non-negative integer n. Use 1 for factorial(0).",
        "tests": [
            {"args": [0], "expected": 1},
            {"args": [1], "expected": 1},
            {"args": [5], "expected": 120},
            {"args": [7], "expected": 5040},
        ],
    },
    {
        "name": "is_palindrome",
        "signature": "def is_palindrome(text):",
        "instruction": "Return True if text is a palindrome after removing spaces and ignoring case, otherwise return False.",
        "tests": [
            {"args": ["Race car"], "expected": True},
            {"args": ["hello"], "expected": False},
            {"args": ["Never odd or even"], "expected": True},
        ],
    },
    {
        "name": "fibonacci",
        "signature": "def fibonacci(n):",
        "instruction": "Return the nth Fibonacci number where fibonacci(0)=0 and fibonacci(1)=1.",
        "tests": [
            {"args": [0], "expected": 0},
            {"args": [1], "expected": 1},
            {"args": [8], "expected": 21},
            {"args": [10], "expected": 55},
        ],
    },
    {
        "name": "dedupe_keep_order",
        "signature": "def dedupe_keep_order(items):",
        "instruction": "Return a list with duplicate values removed while preserving the first occurrence order.",
        "tests": [
            {"args": [[1, 2, 1, 3, 2]], "expected": [1, 2, 3]},
            {"args": [["a", "b", "a", "c"]], "expected": ["a", "b", "c"]},
            {"args": [[]], "expected": []},
        ],
    },
]
We begin with a simple natural-language-to-code generation example using a circle area function. We generate raw CodeGen output, extract the function, and check its syntax, safety, and complexity. We then define several programming tasks that later help us benchmark CodeGen across different function-generation problems.
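Because the circle-area tests compare floating-point values, a strict equality check can reject a correct solution that multiplies in a different order. The sketch below shows one possible tolerance-aware runner for test dictionaries with "args" and "expected" keys. The helpers `values_match` and `run_tests` are hypothetical names introduced here, and the reference `circle_area` is hand-written rather than model output.

```python
import math

def values_match(result, expected, rel_tol=1e-9):
    """Compare floats with a relative tolerance and everything else with ==."""
    if isinstance(result, float) or isinstance(expected, float):
        try:
            return math.isclose(result, expected, rel_tol=rel_tol)
        except TypeError:
            # One side is not numeric (for example None), so it cannot match a float.
            return False
    return result == expected

def run_tests(fn, tests):
    """Return (passed, total) for a list of test dictionaries."""
    passed = 0
    for test in tests:
        result = fn(*test.get("args", []), **test.get("kwargs", {}))
        passed += int(values_match(result, test["expected"]))
    return passed, len(tests)

# Hand-written reference solution standing in for generated code.
def circle_area(radius):
    return math.pi * radius ** 2

circle_tests = [
    {"args": [1], "expected": math.pi},
    {"args": [2], "expected": 4 * math.pi},
    {"args": [0.1], "expected": 0.031415926535897934},
]
outcome = run_tests(circle_area, circle_tests)
print(outcome)
```

Running a trusted reference solution through the same tests is also a cheap way to validate the tests themselves before using them to judge generated candidates.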
Best-of-N Candidate Generation and Test-Based Reranking
def build_prompt(task):
    # Show the first two tests as comment examples so the model sees the expected behavior.
    examples = []
    for t in task["tests"][:2]:
        examples.append(f"# Example: {task['name']}(*{t['args']}) -> {repr(t['expected'])}")
    example_block = "\n".join(examples)
    return f'''# You are writing clean Python 3 code.
# Task: {task["instruction"]}
# Rules:
# - Do not import packages.
# - Do not print anything.
# - Return the answer from the function.
# - Keep the implementation compact and readable.
{example_block}
{task["signature"]}
'''

def generate_candidates_for_task(task, n=3, max_new_tokens=160):
    prompt = build_prompt(task)
    outputs = generate_text(
        prompt,
        max_new_tokens=max_new_tokens,
        temperature=0.45,
        top_p=0.92,
        do_sample=True,
        num_return_sequences=n,
        repetition_penalty=1.07,
    )
    candidates = []
    for i, out in enumerate(outputs):
        source = extract_function_source(out, task["name"])
        syntax_pass, syntax_error = syntax_ok(source) if source else (False, "no source extracted")
        test_result = run_unit_tests_safely(source, task["name"], task["tests"]) if source else {
            "ok": False,
            "error": "no source extracted",
            "passed": 0,
            "total": len(task["tests"]),
            "details": [],
        }
        candidates.append({
            "task": task["name"],
            "candidate_id": i,
            "prompt": prompt,
            "raw_output": out,
            "source": source,
            "syntax_ok": syntax_pass,
            "syntax_error": syntax_error,
            "safety": static_safety_check(source)[0] if source else False,
            "tests_passed": test_result.get("passed", 0),
            "tests_total": test_result.get("total", len(task["tests"])),
            "test_ok": test_result.get("ok", False),
            "test_error": test_result.get("error", ""),
            "complexity": code_complexity(source) if source else None,
            "score": score_candidate(source, test_result) if source else -999,
        })
    # Highest score first, so candidates[0] is the reranked winner.
    candidates = sorted(candidates, key=lambda x: x["score"], reverse=True)
    return candidates

all_candidates = []
best_solutions = {}
CANDIDATES_PER_TASK = 2

for task in tqdm(TASKS, desc="Generating and evaluating"):
    candidates = generate_candidates_for_task(task, n=CANDIDATES_PER_TASK)
    all_candidates.extend(candidates)
    best_solutions[task["name"]] = candidates[0]

results_df = pd.DataFrame([
    {
        "task": c["task"],
        "candidate_id": c["candidate_id"],
        "syntax_ok": c["syntax_ok"],
        "safety": c["safety"],
        "tests_passed": c["tests_passed"],
        "tests_total": c["tests_total"],
        "test_ok": c["test_ok"],
        "complexity": c["complexity"],
        "score": round(c["score"], 3),
        "test_error": c["test_error"],
    }
    for c in all_candidates
]).sort_values(["task", "score"], ascending=[True, False])

print("\nCandidate summary")
display(results_df)

for task_name, best in best_solutions.items():
    print_code(f"Best solution for {task_name}", best["source"] if best["source"] else "# No valid source")
    print({
        "task": task_name,
        "tests_passed": f'{best["tests_passed"]}/{best["tests_total"]}',
        "score": best["score"],
        "test_error": best["test_error"],
    })
We create structured prompts for each task and generate multiple candidate solutions using CodeGen. We evaluate each candidate with unit tests, syntax checks, safety checks, complexity analysis, and a scoring system. We then summarize the results in a DataFrame and display the best-generated solution for each task.
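The reranking arithmetic can be followed by hand on a few made-up candidates. This sketch assumes the scoring rule used in the tutorial (one point each for syntax and safety, up to three points for the test pass rate, minus a complexity penalty of complexity / 20 capped at 0.25). The `score` helper and the candidate records are illustrative, not real model output.

```python
def score(syntax_ok, safe, passed, total, complexity):
    """Syntax and safety: 1 point each; tests: up to 3; minus a capped complexity penalty."""
    test_rate = passed / max(total, 1)
    penalty = 0 if complexity is None else min(complexity / 20, 0.25)
    return int(syntax_ok) + int(safe) + 3 * test_rate - penalty

# Made-up evaluation records for three candidates of the same task.
candidates = [
    {"id": 0, "syntax_ok": True, "safe": True, "passed": 3, "total": 4, "complexity": 3},
    {"id": 1, "syntax_ok": True, "safe": True, "passed": 4, "total": 4, "complexity": 2},
    {"id": 2, "syntax_ok": True, "safe": False, "passed": 0, "total": 4, "complexity": 8},
]
for c in candidates:
    c["score"] = score(c["syntax_ok"], c["safe"], c["passed"], c["total"], c["complexity"])

# Best-of-N selection: sort by score and take the first entry.
ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
for c in ranked:
    print(c["id"], round(c["score"], 3))
```

Candidate 1 scores 1 + 1 + 3 - 0.10 = 4.90, candidate 0 scores 1 + 1 + 2.25 - 0.15 = 4.10, and candidate 2 scores 1 + 0 + 0 - 0.25 = 0.75, so the fully passing and simpler candidate wins. Because the test term carries the most weight, a passing solution outranks a simpler failing one.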
Multi-Turn Program Synthesis and Prompt-Style Experiments
print("\n" + "=" * 90)
print("Demo 3: Multi-turn program synthesis")
print("=" * 90)

multi_turn_prompts = [
    {
        "name": "normalize_words",
        "prompt": """# Step 1.
# Write a Python function normalize_words(text).
# It should lowercase text, remove punctuation characters .,!?:;, and split into words.
# Do not import packages.
def normalize_words(text):
""",
        "tests": [
            {"args": ["Hello, HELLO world!"], "expected": ["hello", "hello", "world"]},
            {"args": ["A test: yes."], "expected": ["a", "test", "yes"]},
        ],
    },
    {
        "name": "word_counts",
        "prompt": """# Step 2.
# Write a Python function word_counts(words).
# It receives a list of words and returns a dictionary mapping each word to its frequency.
# Do not import packages.
def word_counts(words):
""",
        "tests": [
            {"args": [["a", "b", "a"]], "expected": {"a": 2, "b": 1}},
            {"args": [[]], "expected": {}},
        ],
    },
    {
        "name": "top_word",
        "prompt": """# Step 3.
# Write a Python function top_word(counts).
# It receives a dictionary of word frequencies.
# Return the word with the highest count.
# If counts is empty, return None.
# If there is a tie, return the alphabetically smallest word.
# Do not import packages.
def top_word(counts):
""",
        "tests": [
            {"args": [{"a": 2, "b": 1}], "expected": "a"},
            {"args": [{"b": 2, "a": 2}], "expected": "a"},
            {"args": [{}], "expected": None},
        ],
    },
]

# Generate and test each step independently, keeping whatever source was extracted.
multi_turn_sources = []
for spec in multi_turn_prompts:
    out = generate_text(
        spec["prompt"],
        max_new_tokens=150,
        temperature=0.35,
        top_p=0.92,
        do_sample=True,
        num_return_sequences=1,
    )[0]
    src = extract_function_source(out, spec["name"])
    res = run_unit_tests_safely(src, spec["name"], spec["tests"]) if src else {"ok": False, "error": "no extraction"}
    multi_turn_sources.append(src)
    print_code(f"Generated {spec['name']}", src if src else "# No source extracted")
    print("Test result:", res)

# Join the generated steps and append a hand-written function that chains them.
pipeline_code = "\n\n".join([s for s in multi_turn_sources if s])
pipeline_code += """

def most_common_word(text):
    words = normalize_words(text)
    counts = word_counts(words)
    return top_word(counts)
"""

pipeline_tests = [
    {"args": ["Hello hello, world!"], "expected": "hello"},
    {"args": ["B b a a"], "expected": "a"},
]
pipeline_result = run_unit_tests_safely(pipeline_code, "most_common_word", pipeline_tests)
print_code("Composed multi-turn pipeline", pipeline_code)
print("Pipeline result:", pipeline_result)

print("\n" + "=" * 90)
print("Demo 4: Prompt styles for different CodeGen workflows")
print("=" * 90)

PROMPT_LIBRARY = {
    "docstring_to_code": '''def group_by_first_letter(words):
    """
    Given a list of strings, return a dictionary where keys are first letters
    and values are lists of words beginning with that letter.
    Preserve input order.
    """
''',
    "partial_code_completion": '''def moving_average(values, window):
    result = []
    for i in range(len(values)):
''',
    "test_generation": '''# Write pytest-style tests for this function.
def clamp(x, low, high):
    return max(low, min(x, high))

def test_clamp():
''',
    "refactor_request": '''# Refactor the following code into a clean function called count_positive.
# x = [1, -2, 5, 0]
# c = 0
# for i in x:
#     if i > 0:
#         c = c + 1
# print(c)
def count_positive(values):
''',
}

for name, prompt in PROMPT_LIBRARY.items():
    print("\nWorkflow:", name)
    out = generate_text(
        prompt,
        max_new_tokens=120,
        temperature=0.35,
        top_p=0.92,
        do_sample=True,
        num_return_sequences=1,
    )[0]
    print_code(name, out)
We demonstrate multi-turn program synthesis by generating smaller functions that work together as a pipeline. We create functions for word normalization, word counting, and top-word selection, then compose them into a complete most-common-word workflow. We also test different prompt styles such as docstring-to-code, partial completion, test generation, and refactoring.
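The composition step can be tried without a model by substituting hand-written sources for the three generated functions. The sketch below is an assumption-laden stand-in: the function bodies in `STEP_SOURCES` are written by hand for illustration, and the code is executed with full builtins rather than in the restricted environment described earlier. It also shows why all pieces should be executed into one shared namespace: the glue function looks up its helpers as globals.

```python
# Hand-written stand-ins for the three generated steps (not model output).
STEP_SOURCES = [
    '''def normalize_words(text):
    cleaned = "".join(ch for ch in text.lower() if ch not in ".,!?:;")
    return cleaned.split()''',
    '''def word_counts(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts''',
    '''def top_word(counts):
    if not counts:
        return None
    # Highest count first; ties broken by the alphabetically smallest word.
    return min(counts, key=lambda w: (-counts[w], w))''',
]

GLUE = '''def most_common_word(text):
    return top_word(word_counts(normalize_words(text)))'''

# Join the pieces into one module-like source string.
pipeline_source = "\n\n".join(STEP_SOURCES + [GLUE])

# Execute everything into a single dictionary so the functions can see each other.
namespace = {}
exec(compile(pipeline_source, "<pipeline>", "exec"), namespace)
most_common_word = namespace["most_common_word"]

print(most_common_word("Hello hello, world!"))
print(most_common_word("B b a a"))
```

If any generated step is missing or wrong, the composed function fails even when the other steps pass their own tests, which is why the tutorial tests the pipeline end to end as well as step by step.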
print("\n" + "=" * 90)
print("Demo 5: Mini benchmark aggregation and visualization")
print("=" * 90)

benchmark_rows = []
for task in TASKS:
    task_candidates = [c for c in all_candidates if c["task"] == task["name"]]
    best = max(task_candidates, key=lambda x: x["score"])
    # pass_at_n is True when at least one of the N candidates passes every test.
    pass_at_n = any(c["test_ok"] for c in task_candidates)
    benchmark_rows.append({
        "task": task["name"],
        "best_tests_passed": best["tests_passed"],
        "tests_total": best["tests_total"],
        "best_pass_rate": best["tests_passed"] / max(best["tests_total"], 1),
        "pass_at_n": pass_at_n,
        "best_complexity": best["complexity"],
        "best_score": best["score"],
    })

benchmark_df = pd.DataFrame(benchmark_rows)
display(benchmark_df)

plt.figure(figsize=(9, 4))
plt.bar(benchmark_df["task"], benchmark_df["best_pass_rate"])
plt.ylim(0, 1.05)
plt.ylabel("Best candidate pass rate")
plt.xlabel("Task")
plt.title("CodeGen mini benchmark: best-of-N unit-test pass rate")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()

print("\n" + "=" * 90)
print("Exporting artifacts")
print("=" * 90)

candidates_path = OUT_DIR / "codegen_candidates.jsonl"
summary_path = OUT_DIR / "benchmark_summary.csv"
solutions_path = OUT_DIR / "best_solutions.py"
pipeline_path = OUT_DIR / "multi_turn_pipeline.py"

# One JSON object per line; default=str covers values json cannot serialize directly.
with open(candidates_path, "w", encoding="utf-8") as f:
    for c in all_candidates:
        serializable = dict(c)
        f.write(json.dumps(serializable, ensure_ascii=False, default=str) + "\n")

benchmark_df.to_csv(summary_path, index=False)

with open(solutions_path, "w", encoding="utf-8") as f:
    f.write("# Best generated solutions from Salesforce CodeGen tutorial\n\n")
    for task_name, best in best_solutions.items():
        f.write(f"# ---- {task_name} ----\n")
        f.write(best["source"] if best["source"] else "# No source generated")
        f.write("\n\n")

with open(pipeline_path, "w", encoding="utf-8") as f:
    f.write(pipeline_code)

print("Saved files:")
print(candidates_path)
print(summary_path)
print(solutions_path)
print(pipeline_path)

print("\n" + "=" * 90)
print("Optional: interactive single-prompt helper")
print("=" * 90)

def codegen_assistant(user_task, function_signature, max_new_tokens=180, candidates=2):
    prompt = f'''# Write clean Python 3 code.
# Task: {user_task}
# Rules:
# - Do not import packages unless absolutely necessary.
# - Do not print anything.
# - Return values from the function.
# - Keep the function readable.
{function_signature}
'''
    outputs = generate_text(
        prompt,
        max_new_tokens=max_new_tokens,
        temperature=0.45,
        top_p=0.92,
        do_sample=True,
        num_return_sequences=candidates,
    )
    extracted = []
    # Pull the function name out of the signature so the extractor knows what to look for.
    fn_match = re.search(r"def\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(", function_signature)
    fn_name = fn_match.group(1) if fn_match else None
    for i, out in enumerate(outputs):
        src = extract_function_source(out, fn_name) if fn_name else out
        extracted.append(src)
        print_code(f"Candidate {i+1}", src if src else out)
    return extracted

custom_candidates = codegen_assistant(
    user_task="Return the second largest unique number in a list. If fewer than two unique numbers exist, return None.",
    function_signature="def second_largest_unique(values):",
    max_new_tokens=160,
    candidates=2,
)

print("\nTutorial complete.")
print("Tip: change MODEL_ID near the top or set os.environ['CODEGEN_MODEL_ID'] before running to try larger CodeGen variants.")
We aggregate benchmark results and visualize the best candidate pass rates across all tasks. We export generated candidates, benchmark summaries, best solutions, and the composed pipeline as reusable files. We finish by adding an interactive helper function that lets us generate new CodeGen solutions from custom user-defined programming tasks.
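The aggregation and export steps can be sketched with the standard library alone. The records below are made-up values shaped like the tutorial's candidate dictionaries, and `summarize` is a hypothetical helper; the point is only to show how the best pass rate and the pass-at-N flag are derived per task and how a JSONL file round-trips.

```python
import json
import tempfile
from pathlib import Path

# Made-up candidate records, not real benchmark results.
records = [
    {"task": "factorial", "tests_passed": 4, "tests_total": 4, "test_ok": True, "score": 4.9},
    {"task": "factorial", "tests_passed": 2, "tests_total": 4, "test_ok": False, "score": 3.4},
    {"task": "fibonacci", "tests_passed": 1, "tests_total": 4, "test_ok": False, "score": 2.6},
    {"task": "fibonacci", "tests_passed": 3, "tests_total": 4, "test_ok": False, "score": 4.1},
]

def summarize(records):
    """Group by task, keep the top-scoring candidate, and flag whether any candidate passed."""
    by_task = {}
    for rec in records:
        by_task.setdefault(rec["task"], []).append(rec)
    summary = {}
    for task, group in by_task.items():
        best = max(group, key=lambda r: r["score"])
        summary[task] = {
            "best_pass_rate": best["tests_passed"] / max(best["tests_total"], 1),
            "pass_at_n": any(r["test_ok"] for r in group),
        }
    return summary

summary = summarize(records)

# Write one JSON object per line, then read the file back to confirm the round trip.
out_path = Path(tempfile.mkdtemp()) / "candidates.jsonl"
with open(out_path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
with open(out_path, encoding="utf-8") as f:
    reloaded = [json.loads(line) for line in f]

print(summary)
print(len(reloaded), "records reloaded from", out_path.name)
```

Here the best pass rate and the pass-at-N flag can disagree in spirit: the fibonacci task reaches a 0.75 pass rate with its best candidate, yet no candidate passes every test, so its flag stays False.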
Conclusion
In conclusion, we built a practical, advanced Salesforce CodeGen tutorial and demonstrated how to turn raw model outputs into more reliable code. We started with simple code completion, then strengthened the workflow with automated extraction, safety checks, unit tests, reranking, multi-turn composition, prompt templates, and benchmark reporting. Finally, we have a complete mini-framework for experimenting with CodeGen, evaluating generated candidates, validating their correctness, and exporting useful results for further analysis or integration into larger code-generation systems.
The post Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks appeared first on MarkTechPost.
