
How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers

In this tutorial, we build our own custom GPT-style chat system from scratch using a local Hugging Face model. We begin by loading a lightweight instruction-tuned model that understands conversational prompts, then wrap it inside a structured chat framework that includes a system role, user memory, and assistant responses. We define how the agent interprets context, constructs messages, and optionally uses small built-in tools to fetch local data or simulated search results. By the end, we have a fully functional, conversational model that behaves like a personalized GPT running locally. Check out the FULL CODES here

!pip install transformers accelerate sentencepiece --quiet
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List, Tuple, Optional
import textwrap, json, os

We begin by installing the essential libraries and importing the required modules. We ensure that the environment has all necessary dependencies, such as transformers, torch, and sentencepiece, ready for use. This setup allows us to work seamlessly with Hugging Face models inside Google Colab. Check out the FULL CODES here
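
As a quick optional check (our addition, not part of the original flow), we can confirm whether Colab has given us a GPU before loading anything, since that determines the dtype and speed we get later.

import torch

# Optional sanity check: report the device we will run on.
if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; the model will load on CPU in float32.")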

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
BASE_SYSTEM_PROMPT = (
    "You are a custom GPT running locally. "
    "Follow user instructions carefully. "
    "Be concise and structured. "
    "If something is unclear, say it is unclear. "
    "Prefer practical examples over corporate examples unless explicitly asked. "
    "When asked for code, give runnable code."
)
MAX_NEW_TOKENS = 256

We configure our model name, define the system prompt that governs the assistant’s behavior, and set token limits. We establish how our custom GPT should respond: concise, structured, and practical. This section defines the foundation of our model’s identity and instruction style. Check out the FULL CODES here
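
If we want to specialize the assistant, one minimal sketch (our assumption, not part of the original code; the variable name is hypothetical) is to append extra rules to the base prompt before the chat starts.

# Hypothetical persona variant: extend the base rules for a coding-focused chat.
CODING_SYSTEM_PROMPT = BASE_SYSTEM_PROMPT + (
    " Focus on Python. Include a short usage example with any code you give."
)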

print("Loading mannequin...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token_id is None:
   tokenizer.pad_token_id = tokenizer.eos_token_id
mannequin = AutoModelForCausalLM.from_pretrained(
   MODEL_NAME,
   torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
   device_map="auto"
)
mannequin.eval()
print("Model loaded.")

We load the tokenizer and model from Hugging Face into memory and prepare them for inference. We automatically adjust the device mapping based on the available hardware, ensuring GPU acceleration when possible. Once loaded, our model is ready to generate responses. Check out the FULL CODES here
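
As another optional check (our addition), we can report where the weights actually landed and roughly how much memory they occupy; get_memory_footprint() is available on recent transformers releases.

# Optional: confirm placement and approximate weight size.
print("Device:", model.device)
print(f"Approx. footprint: {model.get_memory_footprint() / 1e9:.2f} GB")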

ConversationHistory = List[Tuple[str, str]]
history: ConversationHistory = [("system", BASE_SYSTEM_PROMPT)]


def wrap_text(s: str, w: int = 100) -> str:
    return "\n".join(textwrap.wrap(s, width=w))


def build_chat_prompt(history: ConversationHistory, user_msg: str) -> str:
    prompt_parts = []
    for role, content in history:
        if role == "system":
            prompt_parts.append(f"<|system|>\n{content}\n")
        elif role == "user":
            prompt_parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            prompt_parts.append(f"<|assistant|>\n{content}\n")
    prompt_parts.append(f"<|user|>\n{user_msg}\n")
    prompt_parts.append("<|assistant|>\n")
    return "".join(prompt_parts)

We initialize the conversation history, starting with a system role, and create a prompt builder to format messages. We define how user and assistant turns are arranged in a consistent conversational structure. This ensures the model always understands the dialogue context correctly. Check out the FULL CODES here
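
To sanity-check the builder (our addition), we can print the prompt it produces for a sample turn. Note that recent transformers releases (4.34+) also expose tokenizer.apply_chat_template, which renders Phi-3’s official chat template and could replace the manual formatter; the sketch below shows both, under that version assumption.

# Inspect the manually formatted prompt for a sample user turn.
print(build_chat_prompt(history, "What can you do?"))

# Alternative sketch, assuming transformers >= 4.34: let the tokenizer render
# the model's own chat template instead of hand-writing the role tags.
messages = [
    {"role": "system", "content": BASE_SYSTEM_PROMPT},
    {"role": "user", "content": "What can you do?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))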

def local_tool_router(user_msg: str) -> Optional[str]:
    msg = user_msg.strip().lower()
    if msg.startswith("search:"):
        query = user_msg.split(":", 1)[-1].strip()
        return f"Search results about '{query}':\n- Key point 1\n- Key point 2\n- Key point 3"
    if msg.startswith("docs:"):
        topic = user_msg.split(":", 1)[-1].strip()
        return f"Documentation extract on '{topic}':\n1. The agent orchestrates tools.\n2. The model consumes output.\n3. Responses become memory."
    return None

We add a lightweight tool router that extends our GPT’s capability to simulate tasks like search or documentation retrieval. We define logic to detect special prefixes such as “search:” or “docs:” in user queries. This simple agentic design gives our assistant contextual awareness. Check out the FULL CODES here
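
A quick usage example (our addition) shows how the router behaves: it returns simulated context only for recognized prefixes and None otherwise, which is what lets generate_reply fall through to a plain chat turn.

# The router only fires on recognized prefixes.
print(local_tool_router("search: local llm agents"))  # simulated search results
print(local_tool_router("docs: tool routing"))        # simulated docs extract
print(local_tool_router("hello there"))               # None: no tool context added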

def generate_reply(history: ConversationHistory, user_msg: str) -> str:
    tool_context = local_tool_router(user_msg)
    if tool_context:
        user_msg = user_msg + "\n\nUseful context:\n" + tool_context
    prompt = build_chat_prompt(history, user_msg)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=MAX_NEW_TOKENS,
            do_sample=True,
            top_p=0.9,
            temperature=0.6,
            pad_token_id=tokenizer.eos_token_id
        )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    reply = decoded.split("<|assistant|>")[-1].strip() if "<|assistant|>" in decoded else decoded[len(prompt):].strip()
    history.append(("user", user_msg))
    history.append(("assistant", reply))
    return reply


def save_history(history: ConversationHistory, path: str = "chat_history.json") -> None:
    data = [{"role": r, "content": c} for (r, c) in history]
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def load_history(path: str = "chat_history.json") -> ConversationHistory:
    if not os.path.exists(path):
        return [("system", BASE_SYSTEM_PROMPT)]
    with open(path, "r") as f:
        data = json.load(f)
    return [(item["role"], item["content"]) for item in data]

We define the primary reply-generation function, which combines history, context, and model inference to produce coherent outputs. We also add functions to save and load past conversations for persistence. This snippet forms the operational core of our custom GPT. Check out the FULL CODES here
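
A short round-trip check (our addition) verifies the persistence helpers: we write the current history to disk, reload it, and confirm the system prompt survived; the file name is the default used above.

# Round-trip the conversation through disk to verify persistence.
save_history(history, "chat_history.json")
restored = load_history("chat_history.json")
assert restored[0] == ("system", BASE_SYSTEM_PROMPT)
print(f"Restored {len(restored)} messages from chat_history.json.")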

print("n--- Demo flip 1 ---")
demo_reply_1 = generate_reply(historical past, "Explain what this practice GPT setup is doing in 5 bullet factors.")
print(wrap_text(demo_reply_1))


print("n--- Demo flip 2 ---")
demo_reply_2 = generate_reply(historical past, "search: agentic ai with native fashions")
print(wrap_text(demo_reply_2))


def interactive_chat():
    print("\nChat ready. Type 'exit' to stop.")
    while True:
        try:
            user_msg = input("\nUser: ").strip()
        except EOFError:
            break
        if user_msg.lower() in ("exit", "quit", "q"):
            break
        reply = generate_reply(history, user_msg)
        print("\nAssistant:\n" + wrap_text(reply))


# interactive_chat()
print("\nCustom GPT initialized successfully.")

We test the complete setup by running the demo prompts and displaying the generated responses. We also create an optional interactive chat loop to converse directly with the assistant. By the end, we confirm that our custom GPT runs locally and responds intelligently in real time.

In conclusion, we designed and ran a custom conversational agent that mirrors GPT-style reasoning without relying on any external services. We saw how local models can be made interactive through prompt orchestration, lightweight tool routing, and conversational memory management. This approach enables us to understand the internal logic behind commercial GPT systems, and it empowers us to experiment with our own rules, behaviors, and integrations in a transparent and fully offline manner.

