Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems
In this tutorial, we introduce a Jailbreak Defense that we constructed step-by-step to detect and safely deal with policy-evasion prompts. We generate life like assault and benign examples, craft rule-based alerts, and mix these with TF-IDF options into a compact, interpretable classifier so we will catch evasive prompts with out blocking official requests. We exhibit…
