How to Design a Fully Functional Enterprise AI Assistant with Retrieval Augmentation and Policy Guardrails Using Open Source AI Models

In this tutorial, we build a compact but capable enterprise AI assistant that runs comfortably on Colab. We start with retrieval-augmented generation (RAG), using FAISS for document retrieval and FLAN-T5 for text generation, both fully open source and free. As we progress, we embed enterprise policies such as data redaction, access control, and PII protection directly into the workflow, so that the system is both capable and compliant.

!pip -q install faiss-cpu transformers==4.44.2 accelerate sentence-transformers==3.0.1


from typing import List, Dict, Tuple
import re, textwrap, numpy as np, torch
from sentence_transformers import SentenceTransformer
import faiss
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM


GEN_MODEL = "google/flan-t5-base"
EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


gen_tok = AutoTokenizer.from_pretrained(GEN_MODEL)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL, device_map="auto")
generate = pipeline("text2text-generation", model=gen_model, tokenizer=gen_tok)


emb_device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = SentenceTransformer(EMB_MODEL, device=emb_device)

We start by setting up our environment and loading the required models. We initialize FLAN-T5 for text generation and MiniLM for embeddings. Both models are configured to use the GPU automatically when one is available, so the pipeline runs efficiently.

DOCS = [
 {"id":"policy_sec_001","title":"Data Security Policy",
  "text":"All customer data must be encrypted at rest (AES-256) and in transit (TLS 1.2+). Access is role-based (RBAC). Secrets are stored in a managed vault. Backups run nightly with 35-day retention. PII includes name, email, phone, address, PAN/Aadhaar."},
 {"id":"policy_ai_002","title":"Responsible AI Guidelines",
  "text":"Use internal models for confidential data. Retrieval sources must be logged. No customer decisioning without human-in-the-loop. Redact PII in prompts and outputs. All model prompts and outputs are stored for audit for 180 days."},
 {"id":"runbook_inc_003","title":"Incident Response Runbook",
  "text":"If a suspected breach occurs, page on-call SecOps. Rotate keys, isolate affected services, perform forensic capture, notify DPO within regulatory SLA. Communicate via the incident room only."},
 {"id":"sop_sales_004","title":"Sales SOP - Enterprise Deals",
  "text":"For RFPs, use the approved security questionnaire responses. Claims must match policy_sec_001. Custom clauses need Legal sign-off. Keep records in CRM with deal room links."}
]


def chunk(text:str, chunk_size=600, overlap=80):
   # Split on whitespace and emit windows of chunk_size words that share `overlap` words.
   w = text.split()
   if len(w) <= chunk_size: return [text]
   out=[]; i=0
   while i < len(w):
       j=min(i+chunk_size, len(w)); out.append(" ".join(w[i:j]))
       if j==len(w): break
       i = j - overlap
   return out


CORPUS=[]
for d in DOCS:
   for i,c in enumerate(chunk(d["text"])):
       CORPUS.append({"doc_id":d["id"],"title":d["title"],"chunk_id":i,"text":c})

We create a small enterprise-style document set to simulate internal policies and procedures. We then split long texts into overlapping, word-based chunks so they can be embedded and retrieved effectively. Chunking helps the assistant handle contextual information with better precision.
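To make the overlap behaviour concrete, here is a minimal sketch of the same sliding-window idea with deliberately tiny settings. The function name `chunk_words`, the sample string, and the values `chunk_size=5` and `overlap=2` are illustrative assumptions, not the tutorial's configuration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into windows of chunk_size words that share `overlap` words.

    Hypothetical helper for illustration; it assumes overlap < chunk_size,
    otherwise the window would never advance.
    """
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks


pieces = chunk_words("a b c d e f g h i j")
for piece in pieces:
    print(piece)

# A text shorter than the window is returned unchanged as a single chunk.
short = chunk_words("one two three")
```

With the tutorial's defaults of 600 words per chunk and an 80-word overlap, each of the four sample documents is far shorter than one window, so every document ends up as a single chunk with `chunk_id` 0.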

def build_index(chunks:List[Dict]) -> Tuple[faiss.IndexFlatIP, np.ndarray]:
   vecs = emb_model.encode([c["text"] for c in chunks], normalize_embeddings=True, convert_to_numpy=True)
   index = faiss.IndexFlatIP(vecs.shape[1]); index.add(vecs); return index, vecs


INDEX, VECS = build_index(CORPUS)


PII_PATTERNS = [
   (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),
   (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),
   (re.compile(r"\b\d{12}\b"), "<REDACTED_ID12>"),
   (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "<REDACTED_PAN>")
]
def redact(t:str)->str:
   for p,r in PII_PATTERNS: t = p.sub(r, t)
   return t


POLICY_DISALLOWED = [
   re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
   re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]
def policy_check(q:str):
   for r in POLICY_DISALLOWED:
       if r.search(q): return False, "Request violates security policy (data exfiltration/encryption tampering)."
   return True, ""

We embed all chunks using Sentence Transformers and store them in a FAISS index for fast retrieval. We introduce PII redaction rules and policy checks to prevent misuse of data. In this way, the assistant adheres to enterprise security and compliance guidelines.
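The redaction and policy rules can be exercised without loading any model. The sketch below reuses two of the PII patterns and the exfiltration pattern from the listing; the helper names `redact_demo` and `is_allowed`, the phone number, and the email address are made up for illustration.

```python
import re

# Two of the tutorial's PII patterns, restated so this snippet runs on its own.
PII = [
    (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),
]
# The tutorial's data-exfiltration pattern.
BLOCKED = re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I)


def redact_demo(text: str) -> str:
    """Apply each PII pattern in turn (hypothetical helper name)."""
    for pattern, placeholder in PII:
        text = pattern.sub(placeholder, text)
    return text


def is_allowed(query: str) -> bool:
    """Return False when the query matches the blocked pattern."""
    return BLOCKED.search(query) is None


masked = redact_demo("Call 9876543210 or mail jane.doe@example.com")
print(masked)

blocked_ok = is_allowed("Is it allowed to share all raw customer data externally for testing?")
normal_ok = is_allowed("What encryption and backup rules do we follow for customer data?")
# A paraphrase that avoids the listed keywords slips through the regex.
paraphrase_ok = is_allowed("Please send every customer record to a vendor")
print(blocked_ok, normal_ok, paraphrase_ok)
```

The last query shows the limit of keyword-based guardrails: a rephrased request is not caught, so rules like these are probably best treated as a first filter rather than a complete control.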

def retrieve(query:str, k=4)->List[Dict]:
   qv = emb_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)
   scores, idxs = INDEX.search(qv, k)
   # FAISS pads results with -1 when k exceeds the number of indexed vectors; skip those slots.
   return [{**CORPUS[i], "score": float(s)} for s,i in zip(scores[0], idxs[0]) if i != -1]


SYSTEM = ("You are an enterprise AI assistant.\n"
         "- Answer strictly from the provided CONTEXT.\n"
         "- If missing info, say what is unknown and suggest the correct policy/runbook.\n"
         "- Keep it concise and cite titles + doc_ids inline like [Title (doc_id:chunk)].")
def build_prompt(user_q:str, ctx_blocks:List[Dict])->str:
   ctx = "\n\n".join(f"[{i+1}] {b['title']} (doc:{b['doc_id']}:{b['chunk_id']})\n{b['text']}" for i,b in enumerate(ctx_blocks))
   uq = redact(user_q)
   return f"SYSTEM:\n{SYSTEM}\n\nCONTEXT:\n{ctx}\n\nUSER QUESTION:\n{uq}\n\nINSTRUCTIONS:\n- Cite sources inline.\n- Keep to 5-8 sentences.\n- Preserve redactions."


def answer(user_q:str, k=4, max_new_tokens=220)->Dict:
   # Refuse early when the query trips a policy rule; otherwise retrieve, prompt, and generate.
   ok,msg = policy_check(user_q)
   if not ok: return {"answer": msg, "ctx":[]}
   ctx = retrieve(user_q, k=k); prompt = build_prompt(user_q, ctx)
   out = generate(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"].strip()
   return {"answer": out, "ctx": ctx}

We design the retrieval function to fetch relevant document sections for each user query. We then construct a structured prompt that combines the context and the question so FLAN-T5 can generate precise answers. This step keeps the assistant's responses grounded and policy-compliant.
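Because the embeddings are L2-normalized, the inner product that `IndexFlatIP` computes equals cosine similarity, and the search is an exhaustive comparison against every stored vector. The pure-Python sketch below mimics that behaviour on toy 2-D vectors; the function names and the vectors are assumptions for illustration, not part of FAISS.

```python
import math
from typing import List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[float, int]]:
    """Brute-force inner-product search returning (score, index) pairs, best first."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy stand-ins for chunk embeddings (real MiniLM vectors have 384 dimensions).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([2.0, 0.1], toy_vectors, k=2)
for score, idx in results:
    print(idx, round(score, 3))
```

For a corpus of a few chunks, this exhaustive scan is all that is needed; approximate indexes only start to matter at much larger scales.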

def eval_query(user_q:str, ctx:List[Dict])->Dict:
   # Crude lexical check: share of query words (4+ letters) that appear in the retrieved text.
   terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
   ctx_text = " ".join(c["text"].lower() for c in ctx)
   hits = sum(t in ctx_text for t in terms)
   return {"terms": len(terms), "hits": hits, "hit_rate": round(hits/max(1,len(terms)), 2)}


QUERIES = [
   "What encryption and backup rules do we follow for customer data?",
   "Can we auto-answer RFP security questionnaires? What should we cite?",
   "If there is a suspected breach, what are the first three steps?",
   "Is it allowed to share all raw customer data externally for testing?"
]
for q in QUERIES:
   res = answer(q, k=3)
   print("\n" + "="*100); print("Q:", q); print("\nA:", res["answer"])
   if res["ctx"]:
       ev = eval_query(q, res["ctx"]); print("\nRetrieved Context (top 3):")
       for r in res["ctx"]: print(f"- {r['title']} [{r['doc_id']}:{r['chunk_id']}] score={r['score']:.3f}")
       print("Eval:", ev)

We evaluate the system using sample enterprise queries that test encryption, RFPs, and incident procedures, plus one request that the policy check should refuse. We display the retrieved documents, the answers, and simple hit-rate scores to check relevance. In this demo, we observe the enterprise AI assistant performing retrieval-augmented reasoning securely and accurately.
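The hit-rate score is a purely lexical check, which is worth seeing on a small case. The sketch below applies the same logic to a single context string; the function name `keyword_hit_rate` and the sample question are assumptions chosen for illustration.

```python
import re
from typing import Dict


def keyword_hit_rate(question: str, context: str) -> Dict[str, float]:
    """Share of question words (4+ letters) found as substrings of the context."""
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", question)]
    ctx = context.lower()
    hits = sum(t in ctx for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}


report = keyword_hit_rate(
    "What encryption and backup rules apply?",
    "All customer data must be encrypted at rest. Backups run nightly.",
)
print(report)
```

Here only "backup" counts as a hit, because it is a substring of "backups", while "encryption" does not match "encrypted". A low hit rate therefore does not necessarily mean the retrieved context is irrelevant; it is a rough sanity check rather than a measure of answer quality.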

In conclusion, we created a self-contained enterprise AI system that retrieves, analyzes, and responds to business queries while maintaining strong guardrails. We see how readily FAISS for retrieval, Sentence Transformers for embeddings, and FLAN-T5 for generation combine to simulate an internal enterprise knowledge engine. This simple Colab-based implementation can serve as a blueprint for scalable, auditable, and compliant enterprise deployments.

The post How to Design a Fully Functional Enterprise AI Assistant with Retrieval Augmentation and Policy Guardrails Using Open Source AI Models appeared first on MarkTechPost.
