Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks

In this tutorial, we implement an end-to-end workflow for Salesforce CodeGen. We load a CodeGen model from Hugging Face, prepare it for code generation, and use it to generate Python functions from natural-language prompts. We then move beyond basic inference by adding function extraction, syntax checking, static safety checks, unit-test-based validation, best-of-N candidate reranking, multi-step program synthesis, prompt-style experimentation, benchmark visualization, and artifact export. Through this workflow, we learn how CodeGen can be used not only as a code completion model but also as part of a structured code-generation pipeline that evaluates, filters, and organizes generated solutions.

Loading the Salesforce CodeGen Model from Hugging Face

import os, sys, subprocess, textwrap, json, re, time, math, ast, tempfile, multiprocessing as mp
from pathlib import Path
def sh(cmd):
   print(f"\n$ {cmd}")
   subprocess.run(cmd, shell=True, check=True)
sh(f"{sys.executable} -m pip install -q -U transformers accelerate safetensors einops datasets evaluate pandas matplotlib tqdm rich radon tiktoken")
import torch
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from rich import print
from rich.panel import Panel
from rich.syntax import Syntax
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
from radon.complexity import cc_visit
OUT_DIR = Path("/content/codegen_advanced_tutorial")
OUT_DIR.mkdir(parents=True, exist_ok=True)
set_seed(42)
print(Panel.fit("Salesforce CodeGen Advanced Tutorial", style="bold green"))
print("\nRuntime info")
print("Python:", sys.version.split()[0])
print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
   print("GPU:", torch.cuda.get_device_name(0))
   print("CUDA memory GB:", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 2))
MODEL_ID = os.environ.get("CODEGEN_MODEL_ID", "Salesforce/codegen-350M-mono")
MODEL_OPTIONS = {
   "easy_colab_default": "Salesforce/codegen-350M-mono",
   "larger_codegen1": "Salesforce/codegen-2B-mono",
   "codegen2_1b": "Salesforce/codegen2-1B_P",
   "codegen25_7b_mono": "Salesforce/codegen25-7b-mono_P",
}
print("\nSelected model:", MODEL_ID)
print("Available model examples:", MODEL_OPTIONS)
# CodeGen2 and CodeGen2.5 checkpoints ship custom code, so they need trust_remote_code.
trust_remote_code = any(x in MODEL_ID.lower() for x in ["codegen2", "codegen25"])
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
print("\nLoading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
   MODEL_ID,
   trust_remote_code=trust_remote_code
)
if tokenizer.pad_token is None:
   tokenizer.pad_token = tokenizer.eos_token
print("Loading model...")
load_kwargs = {
   "trust_remote_code": trust_remote_code,
   "low_cpu_mem_usage": True,
}
if torch.cuda.is_available():
   load_kwargs["torch_dtype"] = dtype
   load_kwargs["device_map"] = "auto"
else:
   load_kwargs["torch_dtype"] = torch.float32
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs)
if not torch.cuda.is_available():
   model.to(device)
model.eval()
def count_parameters(model):
   return sum(p.numel() for p in model.parameters())
print(f"Loaded {MODEL_ID}")
print(f"Parameter count: {count_parameters(model)/1e6:.1f}M")
def generate_text(
   prompt,
   max_new_tokens=180,
   temperature=0.35,
   top_p=0.92,
   top_k=50,
   do_sample=True,
   num_return_sequences=1,
   repetition_penalty=1.05,
):
   inputs = tokenizer(prompt, return_tensors="pt")
   inputs = {k: v.to(model.device) for k, v in inputs.items()}
   with torch.no_grad():
       outputs = model.generate(
           **inputs,
           max_new_tokens=max_new_tokens,
           do_sample=do_sample,
           temperature=temperature,
           top_p=top_p,
           top_k=top_k,
           num_return_sequences=num_return_sequences,
           repetition_penalty=repetition_penalty,
           pad_token_id=tokenizer.eos_token_id,
           eos_token_id=tokenizer.eos_token_id,
       )
   decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
   return decoded
def print_code(title, code):
   print(Panel.fit(title, style="bold cyan"))
   print(Syntax(code, "python", theme="monokai", line_numbers=True))

We install all required libraries and prepare the environment for running Salesforce CodeGen. We check the runtime, detect GPU availability, select the CodeGen model, and load both the tokenizer and model from Hugging Face. We also define helper functions for text generation and for displaying formatted code so that the rest of the tutorial is easier to follow.
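
The loading decisions described above can be examined without downloading any weights. The sketch below isolates how the keyword arguments are chosen for a given model ID and runtime. The helper name `build_load_kwargs` and the `dtype_name` key are assumptions made for illustration and are not part of the tutorial code; a caller would translate the name into a real dtype, for example with `getattr(torch, name)`.

```python
def build_load_kwargs(model_id, cuda_available):
    """Mirror the tutorial's loading choices as a pure function (hypothetical helper)."""
    # CodeGen2 / CodeGen2.5 checkpoints are the ones the tutorial loads with remote code.
    needs_remote_code = any(tag in model_id.lower() for tag in ("codegen2", "codegen25"))
    kwargs = {
        "trust_remote_code": needs_remote_code,
        "low_cpu_mem_usage": True,
    }
    if cuda_available:
        # Half precision plus automatic device placement on GPU.
        kwargs["dtype_name"] = "float16"
        kwargs["device_map"] = "auto"
    else:
        # Full precision on CPU; the model is moved to the CPU explicitly afterwards.
        kwargs["dtype_name"] = "float32"
    return kwargs


print(build_load_kwargs("Salesforce/codegen-350M-mono", cuda_available=False))
print(build_load_kwargs("Salesforce/codegen2-1B_P", cuda_available=True))
```

Keeping this choice in one small function makes it straightforward to check which checkpoints will execute repository code before anything is loaded.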

Building Extraction, Safety, and Unit-Test Validation Utilities

def extract_function_source(full_text, function_name):
   text = full_text.replace("\r\n", "\n")
   # Prefer the contents of a fenced code block if the model produced one.
   fence = re.search(r"```(?:python)?\n(.*?)```", text, flags=re.S | re.I)
   if fence:
       text = fence.group(1)
   pattern = rf"^def\s+{re.escape(function_name)}\s*\("
   match = re.search(pattern, text, flags=re.M)
   if not match:
       return ""
   chunk = text[match.start():]
   lines = chunk.splitlines()
   collected = []
   for i, line in enumerate(lines):
       if i > 0:
           # Stop at the next top-level definition, main guard, or assignment.
           if line.startswith("def ") or line.startswith("class "):
               break
           if line.startswith("if __name__"):
               break
           if line and not line.startswith((" ", "\t", "#")) and re.match(r"^[A-Za-z_][A-Za-z0-9_]*\s*=", line):
               break
       collected.append(line)
   source = "\n".join(collected).rstrip()
   try:
       ast.parse(source)
       return source
   except SyntaxError:
       # Fall back to the longest prefix of lines that still parses.
       fixed_lines = []
       for line in collected:
           fixed_lines.append(line)
           candidate = "\n".join(fixed_lines).rstrip()
           try:
               ast.parse(candidate)
               source = candidate
           except SyntaxError:
               pass
       return source if source.strip().startswith("def ") else ""
def syntax_ok(source):
   try:
       ast.parse(source)
       return True, ""
   except SyntaxError as e:
       return False, str(e)
FORBIDDEN_NAMES = {
   "eval", "exec", "compile", "open", "input", "__import__",
   "globals", "locals", "vars", "dir", "getattr", "setattr", "delattr",
   "help", "breakpoint", "exit", "quit"
}
FORBIDDEN_NODES = (
   ast.Import,
   ast.ImportFrom,
   ast.Global,
   ast.Nonlocal,
   ast.With,
   ast.AsyncWith,
   ast.AsyncFunctionDef,
   ast.ClassDef,
   ast.Delete,
   ast.Raise,
)
ALLOWED_BUILTINS = {
   "abs": abs,
   "all": all,
   "any": any,
   "bool": bool,
   "dict": dict,
   "enumerate": enumerate,
   "float": float,
   "int": int,
   "isinstance": isinstance,
   "len": len,
   "list": list,
   "map": map,
   "max": max,
   "min": min,
   "pow": pow,
   "range": range,
   "reversed": reversed,
   "round": round,
   "set": set,
   "sorted": sorted,
   "str": str,
   "sum": sum,
   "tuple": tuple,
   "zip": zip,
}
def static_safety_check(source):
   try:
       tree = ast.parse(source)
   except SyntaxError as e:
       return False, f"SyntaxError: {e}"
   for node in ast.walk(tree):
       if isinstance(node, FORBIDDEN_NODES):
           return False, f"Forbidden AST node: {type(node).__name__}"
       if isinstance(node, ast.Name):
           if node.id in FORBIDDEN_NAMES or node.id.startswith("__"):
               return False, f"Forbidden name: {node.id}"
       if isinstance(node, ast.Attribute):
           if node.attr.startswith("__"):
               return False, f"Forbidden attribute: {node.attr}"
       if isinstance(node, ast.Call):
           if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_NAMES:
               return False, f"Forbidden call: {node.func.id}"
   return True, "passed"
def _worker_run_tests(source, function_name, tests, queue):
   try:
       safe_globals = {"__builtins__": ALLOWED_BUILTINS}
       compiled = compile(source, "<generated_code>", "exec")
       # Use a single namespace: with separate globals and locals, generated
       # functions could not call themselves or each other by name.
       exec(compiled, safe_globals)
       fn = safe_globals.get(function_name)
       if fn is None:
           queue.put({"ok": False, "error": f"{function_name} not found", "passed": 0, "total": len(tests)})
           return
       passed = 0
       details = []
       for test in tests:
           args = test.get("args", [])
           kwargs = test.get("kwargs", {})
           expected = test["expected"]
           result = fn(*args, **kwargs)
           ok = result == expected
           passed += int(ok)
           details.append({
               "args": args,
               "kwargs": kwargs,
               "expected": expected,
               "result": result,
               "ok": ok,
           })
       queue.put({"ok": passed == len(tests), "error": "", "passed": passed, "total": len(tests), "details": details})
   except Exception as e:
       queue.put({"ok": False, "error": repr(e), "passed": 0, "total": len(tests)})
def run_unit_tests_safely(source, function_name, tests, timeout_seconds=3):
   safe, reason = static_safety_check(source)
   if not safe:
       return {"ok": False, "error": reason, "passed": 0, "total": len(tests), "details": []}
   # The "fork" start method is available on Linux (including Colab), not on Windows.
   ctx = mp.get_context("fork")
   queue = ctx.Queue()
   process = ctx.Process(target=_worker_run_tests, args=(source, function_name, tests, queue))
   process.start()
   process.join(timeout_seconds)
   if process.is_alive():
       # Kill candidates that loop forever or run too long.
       process.terminate()
       process.join()
       return {"ok": False, "error": "timeout", "passed": 0, "total": len(tests), "details": []}
   if queue.empty():
       return {"ok": False, "error": "no result returned", "passed": 0, "total": len(tests), "details": []}
   return queue.get()
def code_complexity(source):
   try:
       blocks = cc_visit(source)
       if not blocks:
           return 1
       return max(block.complexity for block in blocks)
   except Exception:
       return None
def score_candidate(source, test_result):
   syntax_score = 1 if syntax_ok(source)[0] else 0
   safety_score = 1 if static_safety_check(source)[0] else 0
   passed = test_result.get("passed", 0)
   total = max(test_result.get("total", 1), 1)
   test_score = passed / total
   complexity = code_complexity(source)
   # Cap the complexity penalty so it can only break ties, not outweigh tests.
   complexity_penalty = 0 if complexity is None else min(complexity / 20, 0.25)
   return syntax_score + safety_score + 3 * test_score - complexity_penalty

We build the utility layer that extracts generated Python functions from raw model outputs. We add syntax validation, static safety checks, restricted execution, unit-test execution, and timeout handling to make generated code easier to evaluate. We also calculate code complexity and create a scoring function to rank generated candidates by correctness, safety, and simplicity.
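
To make the static safety check concrete, here is a reduced sketch of the same idea: parse the source into an AST, walk every node, and report anything on a deny list. The names `BLOCKED_NAMES` and `find_violations` are illustrative assumptions, and the deny list is deliberately shorter than the one in the tutorial. A check like this is a filter for obviously unwanted constructs, not a security sandbox.

```python
import ast

# Illustrative deny list (an assumption; the tutorial uses a longer one).
BLOCKED_NAMES = {"eval", "exec", "open", "__import__"}


def find_violations(source):
    """Return a sorted list of rule violations found in a piece of Python source."""
    tree = ast.parse(source)
    violations = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.add("import")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            violations.add(f"name:{node.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attribute access is a common way to escape restrictions.
            violations.add(f"attr:{node.attr}")
    return sorted(violations)


clean = "def f(x):\n    return x + 1\n"
risky = "import os\ndef f(p):\n    return open(p).read()\n"
print(find_violations(clean))
print(find_violations(risky))
```

Collecting every violation, instead of returning at the first one, is a design choice that makes the report more useful when debugging prompts; the tutorial's version stops at the first hit.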

Generating Code and Defining Benchmark Tasks

print("\n" + "=" * 90)
print("Demo 1: Basic natural-language-to-code completion")
print("=" * 90)
basic_prompt = """# Write a Python function that returns the area of a circle.
# The function should be named circle_area and should accept radius as input.
# Do not print anything. Return the numeric result.
def circle_area(radius):
"""
basic_output = generate_text(
   basic_prompt,
   max_new_tokens=120,
   temperature=0.25,
   do_sample=True,
   num_return_sequences=1,
)[0]
print_code("Raw CodeGen output", basic_output)
circle_source = extract_function_source(basic_output, "circle_area")
print_code("Extracted function", circle_source if circle_source else "# No function extracted")
circle_tests = [
   {"args": [1], "expected": math.pi},
   {"args": [2], "expected": 4 * math.pi},
]
if circle_source:
   print("Syntax:", syntax_ok(circle_source))
   print("Safety:", static_safety_check(circle_source))
   print("Complexity:", code_complexity(circle_source))
print("\n" + "=" * 90)
print("Demo 2: Best-of-N generation with test-based reranking")
print("=" * 90)
TASKS = [
   {
       "name": "factorial",
       "signature": "def factorial(n):",
       "instruction": "Return n factorial for a non-negative integer n. Use 1 for factorial(0).",
       "tests": [
           {"args": [0], "expected": 1},
           {"args": [1], "expected": 1},
           {"args": [5], "expected": 120},
           {"args": [7], "expected": 5040},
       ],
   },
   {
       "name": "is_palindrome",
       "signature": "def is_palindrome(text):",
       "instruction": "Return True if text is a palindrome after removing spaces and ignoring case, otherwise return False.",
       "tests": [
           {"args": ["Race car"], "expected": True},
           {"args": ["hello"], "expected": False},
           {"args": ["Never odd or even"], "expected": True},
       ],
   },
   {
       "name": "fibonacci",
       "signature": "def fibonacci(n):",
       "instruction": "Return the nth Fibonacci number where fibonacci(0)=0 and fibonacci(1)=1.",
       "tests": [
           {"args": [0], "expected": 0},
           {"args": [1], "expected": 1},
           {"args": [8], "expected": 21},
           {"args": [10], "expected": 55},
       ],
   },
   {
       "name": "dedupe_keep_order",
       "signature": "def dedupe_keep_order(items):",
       "instruction": "Return a list with duplicate values removed while preserving the first occurrence order.",
       "tests": [
           {"args": [[1, 2, 1, 3, 2]], "expected": [1, 2, 3]},
           {"args": [["a", "b", "a", "c"]], "expected": ["a", "b", "c"]},
           {"args": [[]], "expected": []},
       ],
   },
]

We start with a simple natural-language-to-code generation example using a circle area function. We generate raw CodeGen output, extract the function, and check its syntax, safety, and complexity. We then define several programming tasks that later help us benchmark CodeGen across different function-generation problems.
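
The test format used by these tasks is a list of dictionaries with `args`, optional `kwargs`, and `expected`. As a minimal sketch of how such a list is consumed, the snippet below runs it against a hand-written reference function that stands in for a model-generated candidate. The helper name `run_tests` is an assumption for illustration, and unlike the tutorial's runner it executes in the current process with no isolation.

```python
import math


def run_tests(fn, tests):
    """Return (passed, total) for a list of {"args", "kwargs", "expected"} dicts."""
    passed = 0
    for test in tests:
        result = fn(*test.get("args", []), **test.get("kwargs", {}))
        passed += int(result == test["expected"])
    return passed, len(tests)


# Hand-written reference, standing in for a generated candidate.
def circle_area(radius):
    return math.pi * radius * radius


circle_tests = [
    {"args": [1], "expected": math.pi},
    {"args": [2], "expected": 4 * math.pi},
]

print(run_tests(circle_area, circle_tests))
```

Because the comparison is exact equality, a candidate that approximates pi as 3.14159 would fail these tests. For floating-point tasks, comparing with `math.isclose` is a reasonable alternative.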

Best-of-N Candidate Generation and Test-Based Reranking

def build_prompt(task):
   examples = []
   for t in task["tests"][:2]:
       examples.append(f"# Example: {task['name']}(*{t['args']}) -> {repr(t['expected'])}")
   example_block = "\n".join(examples)
   return f'''# You are writing clean Python 3 code.
# Task: {task["instruction"]}
# Rules:
# - Do not import packages.
# - Do not print anything.
# - Return the answer from the function.
# - Keep the implementation compact and readable.
{example_block}
{task["signature"]}
'''
def generate_candidates_for_task(task, n=3, max_new_tokens=160):
   prompt = build_prompt(task)
   outputs = generate_text(
       prompt,
       max_new_tokens=max_new_tokens,
       temperature=0.45,
       top_p=0.92,
       do_sample=True,
       num_return_sequences=n,
       repetition_penalty=1.07,
   )
   candidates = []
   for i, out in enumerate(outputs):
       source = extract_function_source(out, task["name"])
       syntax_pass, syntax_error = syntax_ok(source) if source else (False, "no source extracted")
       test_result = run_unit_tests_safely(source, task["name"], task["tests"]) if source else {
           "ok": False,
           "error": "no source extracted",
           "passed": 0,
           "total": len(task["tests"]),
           "details": [],
       }
       candidates.append({
           "task": task["name"],
           "candidate_id": i,
           "prompt": prompt,
           "raw_output": out,
           "source": source,
           "syntax_ok": syntax_pass,
           "syntax_error": syntax_error,
           "safety": static_safety_check(source)[0] if source else False,
           "tests_passed": test_result.get("passed", 0),
           "tests_total": test_result.get("total", len(task["tests"])),
           "test_ok": test_result.get("ok", False),
           "test_error": test_result.get("error", ""),
           "complexity": code_complexity(source) if source else None,
           "score": score_candidate(source, test_result) if source else -999,
       })
   # Highest score first, so candidates[0] is the best one.
   candidates = sorted(candidates, key=lambda x: x["score"], reverse=True)
   return candidates
all_candidates = []
best_solutions = {}
CANDIDATES_PER_TASK = 2
for task in tqdm(TASKS, desc="Generating and evaluating"):
   candidates = generate_candidates_for_task(task, n=CANDIDATES_PER_TASK)
   all_candidates.extend(candidates)
   best_solutions[task["name"]] = candidates[0]
results_df = pd.DataFrame([
   {
       "task": c["task"],
       "candidate_id": c["candidate_id"],
       "syntax_ok": c["syntax_ok"],
       "safety": c["safety"],
       "tests_passed": c["tests_passed"],
       "tests_total": c["tests_total"],
       "test_ok": c["test_ok"],
       "complexity": c["complexity"],
       "score": round(c["score"], 3),
       "test_error": c["test_error"],
   }
   for c in all_candidates
]).sort_values(["task", "score"], ascending=[True, False])
print("\nCandidate summary")
# display() is provided by notebook environments such as Colab.
display(results_df)
for task_name, best in best_solutions.items():
   print_code(f"Best solution for {task_name}", best["source"] if best["source"] else "# No valid source")
   print({
       "task": task_name,
       "tests_passed": f'{best["tests_passed"]}/{best["tests_total"]}',
       "score": best["score"],
       "test_error": best["test_error"],
   })

We create structured prompts for each task and generate multiple candidate solutions using CodeGen. We evaluate each candidate with unit tests, syntax checks, safety checks, complexity analysis, and a scoring system. We then summarize the results in a DataFrame and display the best generated solution for each task.
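
The reranking step reduces to computing one number per candidate and sorting. The sketch below applies the tutorial's scoring rule (one point for valid syntax, one for passing the safety check, three times the test pass rate, minus a complexity penalty capped at 0.25) to three candidates. The candidate records are invented inputs for illustration, not model results, and the function name `score` is an assumption.

```python
def score(syntax_ok, safe, passed, total, complexity):
    """Score = syntax + safety + 3 * pass rate - capped complexity penalty."""
    pass_rate = passed / max(total, 1)
    penalty = 0 if complexity is None else min(complexity / 20, 0.25)
    return int(syntax_ok) + int(safe) + 3 * pass_rate - penalty


# Invented example candidates (not real model output).
candidates = [
    {"id": 0, "syntax_ok": True, "safe": True, "passed": 2, "total": 4, "complexity": 2},
    {"id": 1, "syntax_ok": True, "safe": True, "passed": 4, "total": 4, "complexity": 6},
    {"id": 2, "syntax_ok": False, "safe": False, "passed": 0, "total": 4, "complexity": None},
]

for c in candidates:
    c["score"] = score(c["syntax_ok"], c["safe"], c["passed"], c["total"], c["complexity"])

# Best-of-N: sort descending and keep the first entry.
ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
best = ranked[0]
print([c["id"] for c in ranked], round(best["score"], 2))
```

Candidate 1 scores 1 + 1 + 3 - 0.25 = 4.75 and candidate 0 scores 1 + 1 + 1.5 - 0.1 = 3.4, so correctness dominates and the complexity penalty only separates candidates with similar pass rates.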

Multi-Turn Program Synthesis and Prompt-Style Experiments

print("\n" + "=" * 90)
print("Demo 3: Multi-turn program synthesis")
print("=" * 90)
multi_turn_prompts = [
   {
       "name": "normalize_words",
       "prompt": """# Step 1.
# Write a Python function normalize_words(text).
# It should lowercase text, remove punctuation characters .,!?:;, and split into words.
# Do not import packages.
def normalize_words(text):
""",
       "tests": [
           {"args": ["Hello, HELLO world!"], "expected": ["hello", "hello", "world"]},
           {"args": ["A test: yes."], "expected": ["a", "test", "yes"]},
       ],
   },
   {
       "name": "word_counts",
       "prompt": """# Step 2.
# Write a Python function word_counts(words).
# It receives a list of words and returns a dictionary mapping each word to its frequency.
# Do not import packages.
def word_counts(words):
""",
       "tests": [
           {"args": [["a", "b", "a"]], "expected": {"a": 2, "b": 1}},
           {"args": [[]], "expected": {}},
       ],
   },
   {
       "name": "top_word",
       "prompt": """# Step 3.
# Write a Python function top_word(counts).
# It receives a dictionary of word frequencies.
# Return the word with the highest count.
# If counts is empty, return None.
# If there is a tie, return the alphabetically smallest word.
# Do not import packages.
def top_word(counts):
""",
       "tests": [
           {"args": [{"a": 2, "b": 1}], "expected": "a"},
           {"args": [{"b": 2, "a": 2}], "expected": "a"},
           {"args": [{}], "expected": None},
       ],
   },
]
multi_turn_sources = []
for spec in multi_turn_prompts:
   out = generate_text(
       spec["prompt"],
       max_new_tokens=150,
       temperature=0.35,
       top_p=0.92,
       do_sample=True,
       num_return_sequences=1,
   )[0]
   src = extract_function_source(out, spec["name"])
   res = run_unit_tests_safely(src, spec["name"], spec["tests"]) if src else {"ok": False, "error": "no extraction"}
   multi_turn_sources.append(src)
   print_code(f"Generated {spec['name']}", src if src else "# No source extracted")
   print("Test result:", res)
# Compose the generated helpers with a hand-written wrapper function.
pipeline_code = "\n\n".join([s for s in multi_turn_sources if s])
pipeline_code += """
def most_common_word(text):
   words = normalize_words(text)
   counts = word_counts(words)
   return top_word(counts)
"""
pipeline_tests = [
   {"args": ["Hello hello, world!"], "expected": "hello"},
   {"args": ["B b a a"], "expected": "a"},
]
pipeline_result = run_unit_tests_safely(pipeline_code, "most_common_word", pipeline_tests)
print_code("Composed multi-turn pipeline", pipeline_code)
print("Pipeline result:", pipeline_result)
print("\n" + "=" * 90)
print("Demo 4: Prompt styles for different CodeGen workflows")
print("=" * 90)
PROMPT_LIBRARY = {
   "docstring_to_code": '''def group_by_first_letter(words):
   """
   Given a list of strings, return a dictionary where keys are first letters
   and values are lists of words beginning with that letter.
   Preserve input order.
   """
''',
   "partial_code_completion": '''def moving_average(values, window):
   result = []
   for i in range(len(values)):
''',
   "test_generation": '''# Write pytest-style tests for this function.
def clamp(x, low, high):
   return max(low, min(x, high))
def test_clamp():
''',
   "refactor_request": '''# Refactor the following code into a clean function called count_positive.
# x = [1, -2, 5, 0]
# c = 0
# for i in x:
#     if i > 0:
#         c = c + 1
# print(c)
def count_positive(values):
''',
}
for name, prompt in PROMPT_LIBRARY.items():
   print("\nWorkflow:", name)
   out = generate_text(
       prompt,
       max_new_tokens=120,
       temperature=0.35,
       top_p=0.92,
       do_sample=True,
       num_return_sequences=1,
   )[0]
   print_code(name, out)

We demonstrate multi-turn program synthesis by generating smaller functions that work together as a pipeline. We create functions for word normalization, word counting, and top-word selection, then compose them into a complete most-common-word workflow. We also test different prompt styles such as docstring-to-code, partial completion, test generation, and refactoring.
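
For reference, the three steps and their composition can be written by hand as follows. These are hand-written implementations that satisfy the specifications given in the step prompts; they are not CodeGen output, and a generated version may look quite different. They are useful as a baseline when judging what the model produces.

```python
def normalize_words(text):
    # Lowercase, drop the punctuation characters named in the step 1 prompt, split on whitespace.
    cleaned = "".join(ch for ch in text.lower() if ch not in ".,!?:;")
    return cleaned.split()


def word_counts(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_word(counts):
    if not counts:
        return None
    # Highest count first; ties are broken by alphabetical order.
    return min(counts, key=lambda w: (-counts[w], w))


def most_common_word(text):
    return top_word(word_counts(normalize_words(text)))


print(most_common_word("Hello hello, world!"))
print(most_common_word("B b a a"))
```

Note that the composed function only works if all helpers live in the same namespace, which is why the pipeline source is concatenated into a single string before it is tested.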

print("Demo 5: Mini benchmark aggregation and visualization")
print("=" * 90)
benchmark_rows = []
for task in TASKS:
   task_candidates = [c for c in all_candidates if c["task"] == task["name"]]
   best = max(task_candidates, key=lambda x: x["score"])
   # pass@n here means: at least one of the n candidates passed every test.
   pass_at_n = any(c["test_ok"] for c in task_candidates)
   benchmark_rows.append({
       "task": task["name"],
       "best_tests_passed": best["tests_passed"],
       "tests_total": best["tests_total"],
       "best_pass_rate": best["tests_passed"] / max(best["tests_total"], 1),
       "pass_at_n": pass_at_n,
       "best_complexity": best["complexity"],
       "best_score": best["score"],
   })
benchmark_df = pd.DataFrame(benchmark_rows)
display(benchmark_df)
plt.figure(figsize=(9, 4))
plt.bar(benchmark_df["task"], benchmark_df["best_pass_rate"])
plt.ylim(0, 1.05)
plt.ylabel("Best candidate pass rate")
plt.xlabel("Task")
plt.title("CodeGen mini benchmark: best-of-N unit-test pass rate")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()
print("\n" + "=" * 90)
print("Exporting artifacts")
print("=" * 90)
candidates_path = OUT_DIR / "codegen_candidates.jsonl"
summary_path = OUT_DIR / "benchmark_summary.csv"
solutions_path = OUT_DIR / "best_solutions.py"
pipeline_path = OUT_DIR / "multi_turn_pipeline.py"
with open(candidates_path, "w", encoding="utf-8") as f:
   for c in all_candidates:
       serializable = dict(c)
       # One JSON object per line; default=str covers non-JSON-native values.
       f.write(json.dumps(serializable, ensure_ascii=False, default=str) + "\n")
benchmark_df.to_csv(summary_path, index=False)
with open(solutions_path, "w", encoding="utf-8") as f:
   f.write("# Best generated solutions from Salesforce CodeGen tutorial\n\n")
   for task_name, best in best_solutions.items():
       f.write(f"# ---- {task_name} ----\n")
       f.write(best["source"] if best["source"] else "# No source generated")
       f.write("\n\n")
with open(pipeline_path, "w", encoding="utf-8") as f:
   f.write(pipeline_code)
print("Saved files:")
print(candidates_path)
print(summary_path)
print(solutions_path)
print(pipeline_path)
print("\n" + "=" * 90)
print("Optional: interactive single-prompt helper")
print("=" * 90)
def codegen_assistant(user_task, function_signature, max_new_tokens=180, candidates=2):
   prompt = f'''# Write clean Python 3 code.
# Task: {user_task}
# Rules:
# - Do not import packages unless absolutely necessary.
# - Do not print anything.
# - Return values from the function.
# - Keep the function readable.
{function_signature}
'''
   outputs = generate_text(
       prompt,
       max_new_tokens=max_new_tokens,
       temperature=0.45,
       top_p=0.92,
       do_sample=True,
       num_return_sequences=candidates,
   )
   extracted = []
   # Pull the function name out of the signature so the extractor knows what to look for.
   fn_match = re.search(r"def\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(", function_signature)
   fn_name = fn_match.group(1) if fn_match else None
   for i, out in enumerate(outputs):
       src = extract_function_source(out, fn_name) if fn_name else out
       extracted.append(src)
       print_code(f"Candidate {i+1}", src if src else out)
   return extracted
custom_candidates = codegen_assistant(
   user_task="Return the second largest unique number in a list. If fewer than two unique numbers exist, return None.",
   function_signature="def second_largest_unique(values):",
   max_new_tokens=160,
   candidates=2,
)
print("\nTutorial complete.")
print("Tip: change MODEL_ID near the top or set os.environ['CODEGEN_MODEL_ID'] before running to try larger CodeGen variants.")

We aggregate benchmark results and visualize the best candidate pass rates across all tasks. We export generated candidates, benchmark summaries, best solutions, and the composed pipeline as reusable files. We finish by adding an interactive helper function that lets us generate new CodeGen solutions from custom user-defined programming tasks.
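
The aggregation step can also be expressed without pandas. The sketch below groups candidate records by task and derives the best pass rate and a pass@n flag for each group. The records are invented placeholder values chosen to illustrate the calculation, not measured results, and the helper name `summarize` is an assumption.

```python
from collections import defaultdict

# Invented placeholder records (not real benchmark results).
records = [
    {"task": "factorial", "tests_passed": 4, "tests_total": 4, "test_ok": True, "score": 4.9},
    {"task": "factorial", "tests_passed": 1, "tests_total": 4, "test_ok": False, "score": 2.65},
    {"task": "fibonacci", "tests_passed": 2, "tests_total": 4, "test_ok": False, "score": 3.4},
    {"task": "fibonacci", "tests_passed": 0, "tests_total": 4, "test_ok": False, "score": 1.9},
]


def summarize(records):
    """Group candidates by task and report the best candidate plus pass@n."""
    by_task = defaultdict(list)
    for record in records:
        by_task[record["task"]].append(record)
    rows = []
    for task, group in by_task.items():
        best = max(group, key=lambda r: r["score"])
        rows.append({
            "task": task,
            "best_pass_rate": best["tests_passed"] / max(best["tests_total"], 1),
            # True if any candidate for this task passed all of its tests.
            "pass_at_n": any(r["test_ok"] for r in group),
        })
    return rows


for row in summarize(records):
    print(row)
```

With only a few candidates per task, pass@n is a coarse signal, so it is worth reading alongside the best pass rate instead of on its own.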

Conclusion

In conclusion, we built a practical, advanced Salesforce CodeGen tutorial and demonstrated how to turn raw model outputs into more reliable code. We started with simple code completion, then strengthened the workflow with automated extraction, safety checks, unit tests, reranking, multi-turn composition, prompt templates, and benchmark reporting. Finally, we have a complete mini-framework for experimenting with CodeGen, evaluating generated candidates, validating their correctness, and exporting useful results for further analysis or integration into larger code-generation systems.

The post Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks appeared first on MarkTechPost.