Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size
As LLM-powered applications move into production, and as AI agents take on more consequential tasks like browsing the web, writing and executing code, and interacting with external services, safety moderation has quietly become one of the most operationally expensive parts of the stack.
Most developers who have deployed a production LLM system know the problem: you have to evaluate every user prompt before it reaches the model, and every model response before it reaches the user. That means your guardrail model runs on every single request, at every turn of a conversation. The guardrail latency compounds. The cost compounds. And the current generation of open-source guardrail models (LlamaGuard4 at 12B, WildGuard at 7B, ShieldGemma at 27B, NemoGuard at 8B) are all decoder-only models with billions of parameters, built for flexibility but not for speed.
Fastino Labs released GLiGuard, a 300 million parameter open-source safety moderation model designed to address this specific problem. GLiGuard evaluates multiple safety dimensions in a single pass, and across nine safety benchmarks its accuracy matches or exceeds that of models 23 to 90 times its size while running up to 16 times faster.

Why Decoder LLMs May Not Be the Right Tool for Safety Moderation
To understand what makes GLiGuard different, it helps to know why current guardrail models are slow. Most leading guardrail models are built on decoder-only transformer architectures: they generate their safety verdicts autoregressively, one token at a time, the same way a large language model generates a response to a chat message.
This design made sense when safety requirements were fluid. Decoder models can interpret natural language task descriptions and adapt to new safety policies without retraining. But autoregressive generation is inherently sequential, which makes it slow and computationally expensive.
There is a compounding problem on top of that. Most guardrail models need to assess inputs across multiple safety dimensions: what type of harm is present, whether the user prompt is attempting to bypass safety training, whether the model's response is itself unsafe, and so on. Because decoder models generate output sequentially, these assessments are often produced one after another, and latency compounds as more criteria are evaluated.
In other words, the architecture that makes decoder models flexible is also the architecture that makes them the wrong tool for what is fundamentally a classification problem.
What GLiGuard Actually Does
GLiGuard is a small encoder-based model that reframes safety moderation as a text classification problem rather than a text generation problem. Encoder models process the entire input at once and output a classification label from a fixed label set, while decoder models generate their output one token at a time, left to right.
The key architectural insight is in how GLiGuard handles multiple tasks simultaneously. Instead of generating tokens, GLiGuard encodes both the input text and the task definitions (labels) together. These are fed to the model, which scores every label simultaneously in a single forward pass and returns the highest-scoring label for each task. Because all tasks and their candidate labels are part of the input itself, evaluating more safety dimensions doesn't add latency; it simply means including more labels in the input.

GLiGuard runs four moderation tasks concurrently in a single forward pass:
- Safety classification (safe / unsafe), applied to both user prompts before generation and model responses after generation.
- Jailbreak technique detection across 11 techniques, including prompt injection, roleplay bypass, instruction override, and social engineering. If any jailbreak technique is detected, the prompt is automatically flagged as unsafe.
- Harm category detection across 14 categories: violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright violation, and others. A single input can trigger multiple categories at once.
- Refusal detection (compliance / refusal), tracked separately to help measure over-refusal (when a model refuses safe requests) and detect false compliance (when a model appears to comply but doesn't). If a refusal is detected, the response is automatically marked as safe.
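The two override rules in the list above can be captured in a few lines. The function names and the `"none"` / `"refusal"` label strings are assumptions for illustration; the source only states the overrides themselves:

```python
def prompt_verdict(safety_label, jailbreak_label):
    # Any detected jailbreak technique overrides the base safety label.
    return "unsafe" if jailbreak_label != "none" else safety_label

def response_verdict(safety_label, refusal_label):
    # A detected refusal overrides: the model declined, so the exchange is safe.
    return "safe" if refusal_label == "refusal" else safety_label
```

Note that the jailbreak override applies on the prompt side and the refusal override on the response side, matching how the tasks are described above.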
Training Data and Fine-Tuning
GLiGuard was trained on a mixture of human-annotated and synthetically generated training data. For prompt safety, response safety, and refusal detection, the team used WildGuardTrain, a dataset of 87,000 human-annotated examples. For harm category and jailbreak technique detection, labels for the unsafe samples were generated using GPT-4.1.
During early training, the model struggled to distinguish between similar harm categories like toxic speech and violence, so the team used Pioneer to generate supplemental synthetic data with edge cases targeting these fine-grained distinctions.
On the architecture side, GLiGuard was trained via full fine-tuning of the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. GLiNER2 is Fastino's own architecture for multi-task text classification, a natural starting point for a model designed to score multiple label sets in a single pass.
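The stated training setup can be summarized as a configuration sketch. Only the checkpoint, tuning mode, epoch count, and optimizer come from the write-up; the remaining hyperparameters are placeholders, not reported values:

```python
# Entries marked "assumed" are NOT from the source.
TRAIN_CONFIG = {
    "base_checkpoint": "GLiNER2-base-v1",  # from the source
    "tuning": "full-finetune",             # from the source: all weights updated
    "epochs": 20,                          # from the source
    "optimizer": "AdamW",                  # from the source
    "learning_rate": 2e-5,                 # assumed placeholder
    "weight_decay": 0.01,                  # assumed placeholder
}
```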

Benchmark Results: Accuracy and Speed
The research team evaluated GLiGuard across nine established safety benchmarks. These benchmarks cover both prompt and response classification, testing whether a model can identify harmful content, withstand adversarial attacks, distinguish between different types of harm, and avoid over-flagging safe content. Results use macro-averaged F1, a standard metric that balances precision and recall.
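Macro-averaged F1 computes F1 independently per class and then takes the unweighted mean, so a rare class (e.g. "unsafe") counts as much as a common one. A minimal sketch of the metric:

```python
def f1(tp, fp, fn):
    # Standard F1 from true positives, false positives, false negatives.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(per_class_counts):
    # Unweighted mean of per-class F1 scores: each class, however rare,
    # contributes equally to the final number.
    return sum(f1(*c) for c in per_class_counts) / len(per_class_counts)
```

This is why macro F1 is the usual choice for moderation benchmarks, where unsafe examples are typically a small minority of the data.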
On accuracy:
- GLiGuard scores 87.7 average F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4).
- It achieves the second-highest average F1 on response classification (82.7), behind only Qwen3Guard-8B (84.1).
- It outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.

On throughput and latency, benchmarked on a single NVIDIA A100 GPU:
- GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4).
- GLiGuard achieves up to 16.6× lower latency: 26 ms vs. 426 ms at sequence length 64.
These are not marginal improvements. At 26 ms per request versus 426 ms, the difference is meaningful in any real-time user-facing application, and the compounding effect across a multi-turn conversation makes the gap even larger in practice.
Key Takeaways
- GLiGuard is a 300M parameter encoder-based safety moderation model that handles four tasks (safety classification, jailbreak detection, harm categorization, and refusal detection) in a single forward pass.
- Unlike decoder-only guardrail models that generate verdicts autoregressively, GLiGuard reframes safety moderation as a text classification problem, eliminating the sequential latency bottleneck.
- Benchmarked on a single NVIDIA A100 GPU, GLiGuard achieves up to 16.2× higher throughput and 16.6× lower latency (26 ms vs. 426 ms) compared to existing SOTA models like ShieldGemma-27B.
- Across nine safety benchmarks, GLiGuard scores 87.7 average F1 on prompt classification and 82.7 on response classification, outperforming LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.
- Model weights are available under Apache 2.0 on Hugging Face (fastino/gliguard-LLMGuardrails-300M), making it deployable on a single GPU without heavy infrastructure.
Check out the Paper, Model Weights on Hugging Face, GitHub Repo, and technical details.
The post Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size appeared first on MarkTechPost.
