A Coding Guide for Property-Based Testing Using Hypothesis with Stateful, Differential, and Metamorphic Test Design

In this tutorial, we explore property-based testing using Hypothesis and build a rigorous testing pipeline that goes far beyond conventional unit testing. We implement invariants, differential testing, metamorphic testing, targeted exploration, and stateful testing to validate both the functional correctness and the behavioral guarantees of our programs. Instead of manually crafting edge cases, we let Hypothesis generate structured inputs, shrink failures to minimal counterexamples, and systematically uncover hidden bugs. We also show how modern testing practices can be integrated directly into experimental and research-driven workflows.


import sys, textwrap, subprocess, os, re, math
# Notebook shell magic: install the dependencies into the running kernel's environment.
!{sys.executable} -m pip -q install hypothesis pytest


test_code = r'''
import re, math
import pytest
from hypothesis import (
   given, assume, example, settings, note, target,
   HealthCheck, Phase
)
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant, initialize, precondition


def clamp(x: int, lo: int, hi: int) -> int:
   # Restrict x to the closed interval [lo, hi]; callers must pass lo <= hi.
   if x < lo:
       return lo
   if x > hi:
       return hi
   return x


def normalize_whitespace(s: str) -> str:
   # Collapse every run of whitespace to one space and trim both ends.
   return " ".join(s.split())


def is_sorted_non_decreasing(xs):
   return all(xs[i] <= xs[i+1] for i in range(len(xs)-1))


def merge_sorted(a, b):
   # Two-pointer merge of two already-sorted lists.
   i = j = 0
   out = []
   while i < len(a) and j < len(b):
       if a[i] <= b[j]:
           out.append(a[i]); i += 1
       else:
           out.append(b[j]); j += 1
   # At most one of these tails is non-empty.
   out.extend(a[i:])
   out.extend(b[j:])
   return out


def merge_sorted_reference(a, b):
   # Trusted (if slower) oracle used for differential testing.
   return sorted(list(a) + list(b))

We set up the environment by installing Hypothesis and pytest and importing all required modules. We begin building the full test suite by defining core utility functions such as clamp, normalize_whitespace, and merge_sorted. These establish the functional foundation that our property-based tests will rigorously validate in later snippets.
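
Before handing these helpers to Hypothesis, it can help to pin down their intended behavior with a few hand-picked examples. The sketch below restates the three helpers so it runs on its own; the specific inputs are illustrative choices, not part of the original notebook.

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Restrict x to the closed interval [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def normalize_whitespace(s: str) -> str:
    """Collapse whitespace runs to single spaces and trim both ends."""
    return " ".join(s.split())


def merge_sorted(a, b):
    """Two-pointer merge of two already-sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


# Conventional example-based checks: one hand-written input per behavior.
assert clamp(5, 0, 3) == 3      # above the range
assert clamp(-2, 0, 3) == 0     # below the range
assert clamp(2, 0, 3) == 2      # already inside
assert normalize_whitespace("  a\t\tb \n c  ") == "a b c"
assert merge_sorted([1, 3, 5], [2, 3, 6]) == [1, 2, 3, 3, 5, 6]
```

Each assertion covers exactly one input. The property-based tests in the rest of the tutorial state the same expectations as rules that must hold for every generated input.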


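def safe_parse_int(s: str):
   t = s.strip()
   # Accept an optional sign followed by one or more digits.
   if re.fullmatch(r"[+-]?\d+", t) is None:
       return (False, "not_an_int")
   # Reject very long digit strings instead of building huge integers.
   if len(t.lstrip("+-")) > 2000:
       return (False, "too_big")
   try:
       return (True, int(t))
   except Exception:
       return (False, "parse_error")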


def safe_parse_int_alt(s: str):
   # Independent hand-written parser used as the second side of the differential test.
   t = s.strip()
   if not t:
       return (False, "not_an_int")
   sign = 1
   if t[0] == "+":
       t = t[1:]
   elif t[0] == "-":
       sign = -1
       t = t[1:]
   if not t or any(ch < "0" or ch > "9" for ch in t):
       return (False, "not_an_int")
   if len(t) > 2000:
       return (False, "too_big")
   val = 0
   for ch in t:
       val = val * 10 + (ord(ch) - 48)
   return (True, sign * val)


bounds = st.tuples(st.integers(-10_000, 10_000), st.integers(-10_000, 10_000)).map(
   lambda t: (t[0], t[1]) if t[0] <= t[1] else (t[1], t[0])
)


@st.composite
def int_like_strings(draw):
   # Build "<whitespace><sign><ASCII digits><whitespace>" strings.
   sign = draw(st.sampled_from(["", "+", "-"]))
   digits = draw(st.text(alphabet=st.characters(min_codepoint=48, max_codepoint=57), min_size=1, max_size=300))
   left_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
   right_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
   return f"{left_ws}{sign}{digits}{right_ws}"


sorted_lists = st.lists(st.integers(-10_000, 10_000), min_size=0, max_size=200).map(sorted)

We implement parsing logic and define structured strategies that generate constrained, meaningful test inputs. We create composite strategies such as int_like_strings to precisely control the input space for property validation. We also prepare sorted-list generators and a bounds strategy that enable differential and invariant-based testing.
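
A composite strategy is also useful when one drawn value has to constrain the next. The sketch below is a hypothetical alternative to the tutorial's `bounds` strategy (the names `ordered_bounds` and `check_bounds_are_ordered` are illustrative, not from the original): rather than swapping the pair after the fact, it draws `hi` from a range that already starts at `lo`.

```python
from hypothesis import given, settings, strategies as st


@st.composite
def ordered_bounds(draw, low=-10_000, high=10_000):
    """Hypothetical variant of `bounds`: the second draw depends on the first."""
    lo = draw(st.integers(min_value=low, max_value=high))
    hi = draw(st.integers(min_value=lo, max_value=high))
    return lo, hi


@settings(max_examples=100, database=None, derandomize=True)
@given(b=ordered_bounds())
def check_bounds_are_ordered(b):
    lo, hi = b
    assert -10_000 <= lo <= hi <= 10_000


# A @given-decorated function can be called directly, outside pytest.
check_bounds_are_ordered()
```

Both approaches produce valid `(lo, hi)` pairs. The `.map` version in the tutorial is shorter, while the composite version keeps the ordering constraint visible in the generator itself.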


@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_within_bounds(x, b):
   lo, hi = b
   y = clamp(x, lo, hi)
   assert lo <= y <= hi


@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_idempotent(x, b):
   lo, hi = b
   y = clamp(x, lo, hi)
   assert clamp(y, lo, hi) == y


@settings(max_examples=250)
@given(s=st.text())
@example("   a\t\tb \n c  ")
def test_normalize_whitespace_is_idempotent(s):
   t = normalize_whitespace(s)
   # Normalizing twice changes nothing.
   assert normalize_whitespace(t) == t
   # Extra leading or trailing whitespace does not affect the result.
   assert normalize_whitespace(" \n\t " + s + "  \t") == normalize_whitespace(s)


@settings(max_examples=250, suppress_health_check=[HealthCheck.too_slow])
@given(a=sorted_lists, b=sorted_lists)
def test_merge_sorted_matches_reference(a, b):
   out = merge_sorted(a, b)
   ref = merge_sorted_reference(a, b)
   assert out == ref
   assert is_sorted_non_decreasing(out)

We define core property tests that validate correctness and idempotence across multiple functions. We use Hypothesis decorators to automatically explore edge cases and verify behavioral guarantees such as boundary constraints and deterministic normalization. We also implement differential testing to ensure our merge implementation matches a trusted reference.
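
To see what differential testing catches, it helps to break the implementation on purpose. The sketch below assumes a deliberately buggy merge (`merge_sorted_buggy`, a hypothetical name) that forgets the leftover tail of `b`, and uses `hypothesis.find` to search for the smallest input pair on which it disagrees with the `sorted` reference.

```python
from hypothesis import find, settings, strategies as st


def merge_sorted_buggy(a, b):
    """Deliberately broken merge: the leftover tail of b is dropped."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    # BUG (intentional): out.extend(b[j:]) is missing here.
    return out


sorted_lists = st.lists(st.integers(-10_000, 10_000), max_size=200).map(sorted)


def disagrees(pair):
    """True when the buggy merge differs from the trusted reference."""
    a, b = pair
    return merge_sorted_buggy(a, b) != sorted(a + b)


# find() returns a shrunk example satisfying the condition,
# or raises NoSuchExample if the search finds none.
counterexample = find(
    st.tuples(sorted_lists, sorted_lists),
    disagrees,
    settings=settings(max_examples=500, database=None),
)
print("shrunk counterexample:", counterexample)
```

The reported pair should be far smaller than the 200-element lists the strategy can generate, which is the practical benefit of shrinking: the failure arrives in a form that is easy to read and debug.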


@settings(max_examples=250, deadline=200, suppress_health_check=[HealthCheck.too_slow])
@given(s=int_like_strings())
def test_two_parsers_agree_on_int_like_strings(s):
   ok1, v1 = safe_parse_int(s)
   ok2, v2 = safe_parse_int_alt(s)
   assert ok1 and ok2
   assert v1 == v2


@settings(max_examples=250)
@given(s=st.text(min_size=0, max_size=200))
def test_safe_parse_int_rejects_non_ints(s):
   t = s.strip()
   m = re.fullmatch(r"[+-]?\d+", t)
   ok, val = safe_parse_int(s)
   if m is None:
       assert ok is False
   else:
       if len(t.lstrip("+-")) > 2000:
           assert ok is False and val == "too_big"
       else:
           assert ok is True and isinstance(val, int)


def variance(xs):
   if len(xs) < 2:
       return 0.0
   mu = sum(xs) / len(xs)
   return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


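# Phase.target is included so that the target() call below can steer the search.
@settings(max_examples=250, phases=[Phase.generate, Phase.target, Phase.shrink])
@given(xs=st.lists(st.integers(-1000, 1000), min_size=0, max_size=80))
def test_statistics_sanity(xs):
   # Nudge Hypothesis toward inputs with large variance.
   target(variance(xs))
   if len(xs) == 0:
       assert variance(xs) == 0.0
   elif len(xs) == 1:
       assert variance(xs) == 0.0
   else:
       v = variance(xs)
       assert v >= 0.0
       # Metamorphic relation: shifting every element by k leaves the variance unchanged.
       k = 7
       assert math.isclose(variance([x + k for x in xs]), v, rel_tol=1e-12, abs_tol=1e-12)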

We extend our validation to parsing robustness and statistical correctness using targeted exploration. We verify that two independent integer parsers agree on structured inputs and enforce rejection rules on invalid strings. We further implement metamorphic testing by validating that variance is invariant under a constant shift of the data.
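
The shift relation holds exactly in real arithmetic; the tolerance in `math.isclose` is only there to absorb floating-point rounding. The sketch below makes that visible with a hypothetical `exact_variance` helper built on `fractions.Fraction`, and also checks a second metamorphic relation that the tutorial does not test (scaling the data by k scales the variance by k squared). The sample list is an arbitrary illustration.

```python
import math
from fractions import Fraction


def variance(xs):
    """Sample variance in floating point, as in the tutorial."""
    if len(xs) < 2:
        return 0.0
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def exact_variance(xs):
    """Same formula over exact rationals (hypothetical helper for illustration)."""
    if len(xs) < 2:
        return Fraction(0)
    mu = Fraction(sum(xs), len(xs))
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


xs = [3, -1, 4, 1, -5, 9, 2, -6]
k = 7
shifted = [x + k for x in xs]
scaled = [k * x for x in xs]

# With exact arithmetic both relations hold with equality.
assert exact_variance(shifted) == exact_variance(xs)
assert exact_variance(scaled) == k ** 2 * exact_variance(xs)

# With floats, the same relations are checked up to a tolerance.
assert math.isclose(variance(shifted), variance(xs), rel_tol=1e-12, abs_tol=1e-12)
assert math.isclose(variance(scaled), k ** 2 * variance(xs), rel_tol=1e-9)
```

Neither relation needs the true variance of any input, which is what makes metamorphic testing useful when no independent oracle is available.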


class Bank:
   def __init__(self):
       self.balance = 0
       self.ledger = []

   def deposit(self, amt: int):
       if amt <= 0:
           raise ValueError("deposit must be positive")
       self.balance += amt
       self.ledger.append(("dep", amt))

   def withdraw(self, amt: int):
       if amt <= 0:
           raise ValueError("withdraw must be positive")
       if amt > self.balance:
           raise ValueError("insufficient funds")
       self.balance -= amt
       self.ledger.append(("wd", amt))

   def replay_balance(self):
       # Recompute the balance from scratch using only the ledger.
       bal = 0
       for typ, amt in self.ledger:
           bal += amt if typ == "dep" else -amt
       return bal


class BankMachine(RuleBasedStateMachine):
   def __init__(self):
       super().__init__()
       self.bank = Bank()

   @initialize()
   def init(self):
       assert self.bank.balance == 0
       assert self.bank.replay_balance() == 0

   @rule(amt=st.integers(min_value=1, max_value=10_000))
   def deposit(self, amt):
       self.bank.deposit(amt)

   # Only attempt withdrawals once there is money in the account.
   @precondition(lambda self: self.bank.balance > 0)
   @rule(amt=st.integers(min_value=1, max_value=10_000))
   def withdraw(self, amt):
       assume(amt <= self.bank.balance)
       self.bank.withdraw(amt)

   # Invariants are checked after every step of every generated sequence.
   @invariant()
   def balance_never_negative(self):
       assert self.bank.balance >= 0

   @invariant()
   def ledger_replay_matches_balance(self):
       assert self.bank.replay_balance() == self.bank.balance


TestBankMachine = BankMachine.TestCase
'''


path = "/tmp/test_hypothesis_advanced.py"
with open(path, "w", encoding="utf-8") as f:
   f.write(test_code)


print("Hypothesis version:", __import__("hypothesis").__version__)
print("\nRunning pytest on:", path, "\n")


res = subprocess.run([sys.executable, "-m", "pytest", "-q", path], capture_output=True, text=True)
print(res.stdout)
if res.returncode != 0:
   print(res.stderr)


# pytest exit codes: 0 = all tests passed, 5 = no tests were collected.
if res.returncode == 0:
   print("\nAll Hypothesis tests passed.")
elif res.returncode == 5:
   print("\nPytest collected no tests.")
else:
   print("\nSome tests failed.")

We implement a stateful system using Hypothesis’s rule-based state machine to simulate a bank account. We define rules, preconditions, and invariants to guarantee balance consistency and ledger integrity under arbitrary operation sequences. We then execute the entire test suite via pytest, allowing Hypothesis to automatically discover counterexamples and verify system correctness.
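
The invariants earn their keep when the implementation is wrong. The sketch below assumes a hypothetical `LeakyBank` whose `withdraw` forgets to record a ledger entry, and runs its state machine directly with `run_state_machine_as_test` instead of going through pytest. It also draws the withdrawal amount with `st.data()` so the amount is always within the current balance, as an alternative to filtering with `assume`.

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    precondition,
    rule,
    run_state_machine_as_test,
)


class LeakyBank:
    """Hypothetical buggy bank: withdrawals never reach the ledger."""

    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt):
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt):
        self.balance -= amt  # BUG (intentional): no ledger entry is recorded

    def replay_balance(self):
        return sum(amt if typ == "dep" else -amt for typ, amt in self.ledger)


class LeakyBankMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.bank = LeakyBank()

    @rule(amt=st.integers(min_value=1, max_value=100))
    def deposit(self, amt):
        self.bank.deposit(amt)

    @precondition(lambda self: self.bank.balance > 0)
    @rule(data=st.data())
    def withdraw(self, data):
        # Draw an amount that is valid for the current state.
        amt = data.draw(st.integers(min_value=1, max_value=self.bank.balance))
        self.bank.withdraw(amt)

    @invariant()
    def ledger_replay_matches_balance(self):
        assert self.bank.replay_balance() == self.bank.balance


try:
    run_state_machine_as_test(
        LeakyBankMachine,
        settings=settings(max_examples=100, deadline=None, database=None),
    )
    bug_found = False
except AssertionError:
    bug_found = True

print("invariant violation detected:", bug_found)
```

When the invariant fails, Hypothesis also prints the shrunk sequence of rule calls that triggers it, which for a bug like this should be a short deposit-then-withdraw trace.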

In conclusion, we built a comprehensive property-based test suite that validates pure functions, parsing logic, statistical behavior, and even stateful systems with invariants. We leveraged Hypothesis’s shrinking, targeted search, and state machine testing capabilities to move from example-based testing to behavior-driven verification. This lets us reason about correctness at a higher level of abstraction while maintaining strong guarantees for edge cases and system consistency.


The post A Coding Guide for Property-Based Testing Using Hypothesis with Stateful, Differential, and Metamorphic Test Design appeared first on MarkTechPost.
