Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios
How do you audit frontier LLMs for misaligned conduct in real looking multi-turn, tool-use settings—at scale and past coarse mixture scores? Anthropic launched Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework that automates alignment audits by orchestrating an auditor agent to probe a goal mannequin throughout multi-turn, tool-augmented interactions and a choose mannequin…
