Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents

Agentic methods are stochastic, context-dependent, and policy-bounded. Conventional QA—unit assessments, static prompts, or scalar “LLM-as-a-judge” scores—fails to expose multi-turn vulnerabilities and offers weak audit trails. Developer groups want protocol-accurate conversations, express coverage checks, and machine-readable proof that may gate releases with confidence.

Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI brokers over the Agent-to-Agent (A2A) protocol. Rogue converts enterprise insurance policies into executable eventualities, drives multi-turn interactions in opposition to a goal agent, and outputs deterministic reviews appropriate for CI/CD and compliance critiques.

Quick Start

Prerequisites

uvx – If not put in, observe uv installation guide
Python 3.10+
An API key for an LLM supplier (e.g., OpenAI, Google, Anthropic).

Installation

Option 1: Quick Install (Recommended)

Use our automated set up script to rise up and operating rapidly:

Copy Code

# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli

Option 2: Manual Installation

(a) Clone the repository:

Copy Code

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

(b) Install dependencies:

If you might be utilizing uv:

Copy Code

uv sync

Or, if you’re utilizing pip:

Copy Code

pip set up -e .

(c) OPTIONALLY: Set up your atmosphere variables: Create a .env file in the root listing and add your API keys. Rogue makes use of LiteLLM, so you may set keys for varied suppliers.

Copy Code

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."

Running Rogue

Rogue operates on a client-server structure the place the core analysis logic runs in a backend server, and varied shoppers join to it for various interfaces.

Default Behavior

When you run uvx rogue-ai with none mode specified, it:

Starts the Rogue server in the background
Launches the TUI (Terminal User Interface) consumer

Copy Code

uvx rogue-ai

Available Modes

Default (Server + TUI): uvx rogue-ai – Starts server in background + TUI consumer
Server: uvx rogue-ai server – Runs solely the backend server
TUI: uvx rogue-ai tui – Runs solely the TUI consumer (requires server operating)
Web UI: uvx rogue-ai ui – Runs solely the Gradio internet interface consumer (requires server operating)
CLI: uvx rogue-ai cli – Runs non-interactive command-line analysis (requires server operating, superb for CI/CD)

Mode Arguments

Server Mode

Copy Code

uvx rogue-ai server [OPTIONS]

Options:

–host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
–port PORT – Port to run the server on (default: 8000 or PORT env var)
–debug – Enable debug logging

TUI Mode

Copy Code

uvx rogue-ai tui [OPTIONS]
Web UI Mode
uvx rogue-ai ui [OPTIONS]

Options:

–rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
–port PORT – Port to run the UI on
–workdir WORKDIR – Working listing (default: ./.rogue)
–debug – Enable debug logging

Example: Testing the T-Shirt Store Agent

This repository features a easy instance agent that sells T-shirts. You can use it to see Rogue in motion.

Install instance dependencies:

If you might be utilizing uv:

Copy Code

 uv sync --group examples

or, if you’re utilizing pip:

Copy Code

pip set up -e .[examples]

(a) Start the instance agent server in a separate terminal:

If you might be utilizing uv:

Copy Code

uv run examples/tshirt_store_agent

If not:

Copy Code

python examples/tshirt_store_agent

This will begin the agent on http://localhost:10001.

(b) Configure Rogue in the UI to level to the instance agent:

Agent URL: http://localhost:10001
Authentication: no-auth

(c) Run the analysis and watch Rogue check the T-Shirt agent’s insurance policies!

You can use both the TUI (uvx rogue-ai) or Web UI (uvx rogue-ai ui) mode.

Where Rogue Fits: Practical Use Cases

Safety & Compliance Hardening: Validate PII/PHI dealing with, refusal conduct, secret-leak prevention, and regulated-domain insurance policies with transcript-anchored proof.
E-Commerce & Support Agents: Enforce OTP-gated reductions, refund guidelines, SLA-aware escalation, and tool-use correctness (order lookup, ticketing) beneath adversarial and failure situations.
Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff conduct, and unsafe command prevention.
Multi-Agent Systems: Verify plannerexecutor contracts, functionality negotiation, and schema conformance over A2A; consider interoperability throughout heterogeneous frameworks.
Regression & Drift Monitoring: Nightly suites in opposition to new mannequin variations or immediate adjustments; detect behavioral drift and implement policy-critical cross standards earlier than launch.

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Rogue is an end-to-end testing framework designed to consider the efficiency, compliance, and reliability of AI brokers. Rogue synthesizes enterprise context and danger into structured assessments with clear targets, ways and success standards. The EvaluatorAgent runs protocol right conversations in quick single flip or deep multi flip adversarial modes. Bring your individual mannequin, or let Rogue use Qualifire’s bespoke SLM judges to drive the assessments. Streaming observability and deterministic artifacts: stay transcripts,cross/fail verdicts, rationales tied to transcript spans, timing and mannequin/model lineage.

Under the Hood: How Rogue Is Built

Rogue operates on a client-server structure:

Rogue Server: Contains the core analysis logic
Client Interfaces: Multiple interfaces that join to the server:
- TUI (Terminal UI): Modern terminal interface constructed with Go and Bubble Tea
- Web UI: Gradio-based internet interface
- CLI: Command-line interface for automated analysis and CI/CD

This structure permits for versatile deployment and utilization patterns, the place the server can run independently and a number of shoppers can join to it concurrently.

Summary

Rogue helps developer groups check agent conduct the method it truly runs in manufacturing. It turns written insurance policies into concrete eventualities, workouts these eventualities over A2A, and information what occurred with transcripts you may audit. The result’s a transparent, repeatable sign you need to use in CI/CD to catch coverage breaks and regressions earlier than they ship.

Find Rogue on GitHub

Thanks to the Qualifire workforce for the thought management/ Resources for this text. Qualifire workforce has supported this content material/article.

The put up Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents appeared first on MarkTechPost.

Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents

Quick Start

Prerequisites

Installation

Option 1: Quick Install (Recommended)

Option 2: Manual Installation

Running Rogue

Default Behavior

Available Modes

Mode Arguments

Server Mode

Where Rogue Fits: Practical Use Cases

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Under the Hood: How Rogue Is Built

Summary

NVIDIA AI Releases Universal Deep Research (UDR): A Prototype Framework for Scalable and Auditable Deep Research Agents

Salesforce AI Released GTA1: A Test-Time Scaled GUI Agent That Outperforms OpenAI’s CUA

How to Build a Fully Self-Verifying Data Operations AI Agent Using Local Hugging Face Models for Automated Planning, Execution, and Testing

Google Brings Gemini CLI to GitHub Actions: Secure, Free, and Enterprise-Ready AI Integration

Scaling Global Trade with AI-Powered Tools for SMBs – with Kuo Zhang of Alibaba.com

Google AI Introduces VISTA: A Test Time Self Improving Agent for Text to Video Generation

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!

Quick Start

Prerequisites

Installation

Option 1: Quick Install (Recommended)

Option 2: Manual Installation

Running Rogue

Default Behavior

Available Modes

Mode Arguments

Server Mode

Where Rogue Fits: Practical Use Cases

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Under the Hood: How Rogue Is Built

Summary

Similar Posts

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!