AI Support Copilot

Technical guide

How AI Support Copilot works — RAG pipeline, agent workflow, APIs, and the code behind InterviewPro.info support chat.

Next.js 16 · React 19 · OpenAI · Hybrid RAG · SSE · Zustand · Tailwind v4

Reading guide

  • What it is — core concept in one paragraph.
  • In this app — how the copilot uses it end to end.
  • Key files — where to read the implementation.

System architecture

Four layers — client, API, retrieval, and trust — with OpenAI calls only on the server.

Client

Next.js chat UI streams answers over SSE. Zustand holds messages and multi-turn history.

  • components/SupportCopilot.jsx
  • stores/chatStore.js
  • lib/chatClient.js

API

Route handlers orchestrate retrieval, guardrails, tool calls, and streaming generation.

  • app/api/chat/route.js
  • app/api/structured/route.js
  • app/api/ingest/route.js

Retrieval

Help articles are chunked, embedded, and searched with hybrid vector + BM25 retrieval.

  • data/supportDocs.js
  • lib/hybridRetrieval.js
  • lib/vectorStore.js
  • lib/chunking.js

Trust layer

Injection checks, score guardrails, allowlisted tools, and golden eval regression tests.

  • lib/security.js
  • lib/prompt.js
  • lib/eval.js
  • data/goldenQuestions.js

Request flow

  1. Browser POSTs question + chat history to /api/chat.

  2. Server runs prompt-injection check; blocks unsafe input.

  3. Hybrid RAG retrieves top Help Center chunks (vector + BM25, reranked).

  4. Guardrail refuses if retrieval confidence is too low.

  5. ReAct planner may call get_subscription_status for billing questions.

  6. OpenAI generates a grounded answer; SSE streams tokens + source chips to UI.

Browser (/chat)
  │  POST /api/chat { question, history }
  ▼
Security check → Hybrid RAG → Guardrail
  │                      │
  │                      ├─ low score → refuse
  ▼                      ▼
ReAct + tools ──► OpenAI stream (SSE)
  │
  ▼
UI: tokens + source chips
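The branch order in the diagram can be sketched as a small decision function. This is only an illustration: planRequest, its input fields, and the 0.35 threshold are hypothetical names and values, not taken from the app's code.

```javascript
// Hypothetical threshold; the app's real guardrail value is not documented here.
const MIN_RETRIEVAL_SCORE = 0.35;

// Decide which branch of the request flow applies.
// The three inputs are assumed to come from earlier steps:
// the security check, hybrid retrieval, and the ReAct planner.
function planRequest({ injectionDetected, topScore, isBillingQuestion }) {
  if (injectionDetected) {
    return { action: "block", steps: ["security"] };
  }
  if (topScore < MIN_RETRIEVAL_SCORE) {
    return { action: "refuse", steps: ["security", "retrieve", "guardrail"] };
  }
  const steps = ["security", "retrieve", "guardrail", "plan"];
  if (isBillingQuestion) steps.push("tool:get_subscription_status");
  steps.push("generate");
  return { action: "answer", steps };
}
```

Keeping the decision separate from the I/O makes the order of checks easy to unit test: unsafe input never reaches retrieval, and a low score never reaches the model.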

API reference

Server routes called by the chat UI and dev tooling.

Method     Path
POST       /api/chat
POST       /api/structured
POST       /api/ingest
GET/POST   /api/eval
GET        /api/tools/subscription-status

Foundation

1. LLM Fundamentals

What it is

An LLM (large language model) is the AI that reads text and writes answers, like the models behind ChatGPT. It processes text as small pieces called tokens, and it can only read a limited number of them at once (the context window).

In this app

We do not paste the whole help center into the model. We turn your question and help articles into number lists called embeddings, then only send the best matching pieces to the model.

Flow

  • Your question is converted to an embedding (numbers that capture meaning).
  • Help articles were embedded the same way when the app started.
  • Only the top matching chunks go into the prompt — saves cost and reduces wrong answers.
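The "only the top matching chunks" step can be sketched as a budgeted selection. This is a minimal sketch, assuming chunks shaped { text, score } and a rough 4-characters-per-token estimate; estimateTokens and selectChunks are hypothetical names, and a real tokenizer would give exact counts.

```javascript
// Rough heuristic: about 4 characters per token for English text.
// This is an estimate only, not an exact tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the highest-scoring chunks that fit within a token budget.
// `chunks` is assumed to be an array of { text, score }.
function selectChunks(chunks, maxTokens, topK = 4) {
  const ranked = [...chunks].sort((a, b) => b.score - a.score).slice(0, topK);
  const selected = [];
  let used = 0;
  for (const chunk of ranked) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > maxTokens) continue; // skip chunks that would overflow the budget
    selected.push(chunk);
    used += cost;
  }
  return { selected, used };
}
```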

Key files

  • lib/embeddings.js

    Calls OpenAI to create embeddings

  • lib/llm.js

    Checks how big the prompt is (tokens)

  • lib/openai.js

    Connects to OpenAI API

2. LLM Integration

What it is

Integration means connecting your app to the AI: sending questions, getting answers back, sometimes word-by-word (streaming), and sometimes calling extra functions (tools).

In this app

The website talks to our server. The server talks to OpenAI. The browser never sees the secret API key. Chat streams live; billing questions can trigger a subscription check tool.

Flow

  • You type in /chat → browser sends question to POST /api/chat.
  • Server streams the answer back (SSE) so words appear one by one.
  • For “Is my Pro active?” the server may call get_subscription_status, then answer.
  • POST /api/structured returns JSON (ticket summary) instead of chat text.
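On the browser side, a streamed SSE response arrives as text fragments that have to be split into events. The parser below is a minimal sketch of that step: parseSseBuffer is a hypothetical helper, it assumes "\n" line endings, and the event names in the usage note are illustrative rather than the app's confirmed wire format.

```javascript
// Split an SSE text buffer into complete events plus the unfinished remainder.
// Frames are separated by a blank line; each frame has "event:" and "data:" lines.
function parseSseBuffer(buffer) {
  const frames = buffer.split("\n\n");
  const rest = frames.pop(); // the last piece may be an incomplete frame
  const events = [];
  for (const frame of frames) {
    let event = "message"; // default event name when none is given
    const dataLines = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).trimStart());
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

A client would append each decoded network chunk to `rest`, call the parser again, and render every completed event, so a frame split across two chunks is never lost.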

Key files

  • app/api/chat/route.js

    Main chat API — stream + RAG + tools

  • app/api/structured/route.js

    Returns JSON ticket shape

  • lib/tools.js

    Defines allowed tools

  • lib/chatClient.js

    Browser code that calls /api/chat

3. Prompt Engineering

What it is

A prompt is the instruction you give the AI. Good prompts tell it how to think, what sources to trust, and when to say “I don’t know.”

In this app

We use chain-of-thought (CoT) grounding prompts, ReAct (reason + act) tool planning, multi-turn chat history, and a support persona (mateshwari). The model must answer only from retrieved Help Center chunks and refuse when retrieval confidence is low.

Flow

  • System prompt defines support voice; user prompt wraps TRUSTED_CONTEXT + question.
  • Follow-up questions merge prior turns via lib/chatHistory.js and buildRetrievalQuery().
  • If retrieval score is too low → refuse instead of guessing.
  • npm run eval verifies golden questions still retrieve the right articles.
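The first two bullets can be sketched as two small string builders. The guide names buildRetrievalQuery() and TRUSTED_CONTEXT, but the signatures, the { role, content } history shape, the chunk fields, and the exact prompt wording below are assumptions for illustration; buildUserPrompt is a hypothetical name.

```javascript
// Merge recent user turns into the retrieval query so a follow-up such as
// "How do I cancel it?" still carries the earlier topic.
function buildRetrievalQuery(question, history = [], maxTurns = 2) {
  const priorUserTurns = history
    .filter((turn) => turn.role === "user")
    .slice(-maxTurns)
    .map((turn) => turn.content);
  return [...priorUserTurns, question].join("\n");
}

// Wrap retrieved chunks in a TRUSTED_CONTEXT block ahead of the question.
// `chunks` is assumed to be an array of { title, text }.
function buildUserPrompt(question, chunks) {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.title}\n${chunk.text}`)
    .join("\n\n");
  return [
    "TRUSTED_CONTEXT:",
    context,
    "",
    "Answer only from TRUSTED_CONTEXT. If it does not contain the answer, say you don't know.",
    "",
    `QUESTION: ${question}`,
  ].join("\n");
}
```

Numbering the chunks gives the model stable labels to cite, which the UI can map back to source chips.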

Key files

  • lib/prompt.js

    System and CoT/ReAct prompt text

  • lib/security.js

    Blocks prompt injection

  • lib/eval.js

    Runs golden question tests

  • data/goldenQuestions.js

    List of test questions + expected docs

Retrieval

4. RAG Pipelines

What it is

RAG stands for Retrieval-Augmented Generation. Instead of relying on the model’s memory, you search your own documents first, then ask the model to answer using only what you found.

In this app

Help articles live in data/supportDocs.js. They are split into chunks, embedded, and stored in memory. When you ask a question, we retrieve the best chunks, then generate an answer with citations.

Flow

  • Ingest: split articles into chunks (lib/chunking.js).
  • Embed each chunk and save in vector store.
  • On question: hybrid search finds top chunks.
  • Rerank boosts important topics (billing, indexing).
  • Model writes answer using those chunks + shows sources in UI.
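The ingest step's chunking can be sketched as a sliding window over words. The guide does not state the strategy used in lib/chunking.js, so the word-based windows, the overlap, and the default sizes here are assumptions; chunkText is a hypothetical name.

```javascript
// Split text into word-based chunks with overlap between neighbors.
// Overlap keeps a sentence that straddles a boundary visible in both chunks.
function chunkText(text, chunkSize = 120, overlap = 20) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}
```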

Key files

  • data/supportDocs.js

    All help article text

  • lib/chunking.js

    Splits docs into pieces

  • lib/hybridRetrieval.js

    Finds best chunks for a question

  • lib/vectorStore.js

    Stores embeddings in memory

  • app/api/ingest/route.js

    Rebuild index after doc changes

5. Vector Embeddings

What it is

An embedding turns text into a list of numbers. Similar meanings get similar numbers, so the computer can find “related” sentences without exact keyword match.

In this app

Every help chunk and every user question gets an embedding. We compare them with cosine similarity (how “close” two number lists are) to pick the best chunks.

Flow

  • OpenAI model text-embedding-3-small creates the vectors.
  • Similar questions like “cancel Pro” and “stop subscription” land near the right article.
  • lib/cache.js remembers embeddings so repeat questions are faster and cheaper.
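Cosine similarity and the embedding cache can each be sketched in a few lines. The cosine formula is standard; withCache is a hypothetical helper, and its normalized cache key (trimmed, lowercased text) is an assumption rather than what lib/cache.js necessarily does. The embedFn argument stands in for the real API call.

```javascript
// Cosine similarity: dot product divided by the product of vector lengths.
// Returns 1 for the same direction and 0 for orthogonal vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Memoize an embedding function so repeated questions reuse the stored result.
// If embedFn returns a promise, the promise itself is cached.
function withCache(embedFn) {
  const cache = new Map();
  return (text) => {
    const key = text.trim().toLowerCase();
    if (!cache.has(key)) cache.set(key, embedFn(key));
    return cache.get(key);
  };
}
```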

Key files

  • lib/embeddings.js

    Creates embeddings via API

  • lib/vectorStore.js

    Stores chunks + vectors

  • lib/cache.js

    Caches question embeddings

6. Semantic Search

What it is

Semantic search finds meaning (e.g. “refund” matches “money back”). Keyword search finds exact words (e.g. “Privacy Mode”). Hybrid search combines both.

In this app

We mix 60% vector (meaning) + 40% BM25 (exact words). That helps when users type product names, settings paths, or error codes that must match exactly.

Flow

  • BM25 scores chunks by the exact terms they share with the question, weighting rarer terms more heavily.
  • Vector search scores chunks that are similar in meaning.
  • Scores are combined, then lib/rerank.js boosts track-specific topics (ResumePro, EvalPro, DSA, etc.).
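The 60/40 mix can be sketched as a weighted sum. BM25 scores are unbounded, so this sketch first scales them by the maximum score in the candidate set; that normalization is an assumption, since the guide does not say how lib/hybridRetrieval.js puts the two scores on one scale. hybridRank and the candidate fields are hypothetical names.

```javascript
const VECTOR_WEIGHT = 0.6;
const BM25_WEIGHT = 0.4;

// Scale scores into [0, 1] by dividing by the maximum (assumed normalization).
function normalize(scores) {
  const max = Math.max(...scores, 0);
  return scores.map((s) => (max > 0 ? s / max : 0));
}

// Combine vector and BM25 scores, then sort best-first.
// `candidates` is assumed to be [{ id, vectorScore, bm25Score }],
// with vectorScore already in [0, 1].
function hybridRank(candidates) {
  const bm25 = normalize(candidates.map((c) => c.bm25Score));
  return candidates
    .map((c, i) => ({
      id: c.id,
      score: VECTOR_WEIGHT * c.vectorScore + BM25_WEIGHT * bm25[i],
    }))
    .sort((a, b) => b.score - a.score);
}
```

With these weights, a chunk with a strong exact-term match can outrank one that is only closer in meaning, which is the behavior wanted for product names and error codes.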

Key files

  • lib/bm25.js

    Keyword-style search

  • lib/hybridRetrieval.js

    Combines vector + BM25

  • lib/rerank.js

    Final ranking tweaks

Agents & UI

7. AI Agents & Workflows

What it is

An agent is an AI that can take steps: check safety, search docs, maybe call a tool, then answer. It is more than a one-shot reply.

In this app

Each chat request runs a fixed workflow: security → retrieve docs → check score → plan (docs or tool?) → optional subscription lookup → stream answer.

Flow

  • Step order is logged in lib/agent.js for debugging.
  • Only one tool is allowed: get_subscription_status (Pro billing).
  • The model cannot run arbitrary code — the server validates every tool call against the allowlist.
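Server-side validation of a tool call can be sketched as an allowlist lookup plus an argument check. The tool name comes from this guide; validateToolCall, the userId argument, and the call shape (a name plus a JSON string of arguments) are assumptions for illustration.

```javascript
// Allowlist of tools the model may request. The `userId` argument is hypothetical.
const ALLOWED_TOOLS = {
  get_subscription_status: { requiredArgs: ["userId"] },
};

// Validate a model-proposed tool call before the server executes anything.
// `call` is assumed to look like { name: string, arguments: string (JSON) }.
function validateToolCall(call) {
  if (!Object.prototype.hasOwnProperty.call(ALLOWED_TOOLS, call.name)) {
    return { ok: false, error: `tool not allowed: ${call.name}` };
  }
  let args;
  try {
    args = JSON.parse(call.arguments);
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }
  if (args === null || typeof args !== "object") {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const missing = ALLOWED_TOOLS[call.name].requiredArgs.filter(
    (key) => typeof args[key] !== "string" || args[key] === ""
  );
  if (missing.length > 0) {
    return { ok: false, error: `missing arguments: ${missing.join(", ")}` };
  }
  return { ok: true, args };
}
```

The own-property check matters: a plain `ALLOWED_TOOLS[call.name]` lookup would accept inherited keys such as "constructor".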

Key files

  • lib/agent.js

    Step names and trace logging

  • app/api/chat/route.js

    Runs the full workflow

  • lib/tools.js

    Subscription tool logic

  • app/api/tools/subscription-status/route.js

    Tool HTTP endpoint

8. AI-powered UI Systems

What it is

The chat screen users see: typing box, streaming text, stop button, example questions, and links to sources under the answer.

In this app

The /chat page is a support desk UI: sidebar with example questions, streaming replies, citation chips, multi-turn memory, and stop control.

Flow

  • Zustand store holds messages; buildHistoryFromMessages sends prior turns to /api/chat.
  • SupportCopilot.jsx renders the thread and streams tokens via lib/chatClient.js.
  • Enter sends; Shift+Enter adds a new line. User can stop mid-stream.
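The history step in the first bullet can be sketched as a filter over the store's messages. The function name buildHistoryFromMessages appears in this guide, but the message shape ({ role, content, status }), the turn limit, and the filtering rules below are assumptions.

```javascript
// Convert UI messages into the compact history sent with each request.
// Drops non-chat roles, empty messages, and any reply that is still streaming,
// then keeps only the most recent turns and strips UI-only fields.
function buildHistoryFromMessages(messages, maxTurns = 6) {
  return messages
    .filter((m) => m.role === "user" || m.role === "assistant")
    .filter((m) => m.status !== "streaming" && m.content.trim() !== "")
    .slice(-maxTurns)
    .map(({ role, content }) => ({ role, content }));
}
```

Capping the number of turns keeps the request body and the prompt small as a conversation grows.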

Key files

  • components/SupportCopilot.jsx

    Main chat window

  • components/ChatSidebar.jsx

    Example questions

  • components/ChatWelcome.jsx

    First screen in chat

  • lib/chatHistory.js

    Multi-turn history for follow-ups

  • stores/chatStore.js

    Message + stream state

Engineering

9. AI Backend Engineering

What it is

Backend work: how the server sends data (streaming), saves money (cache), rebuilds the search index (ingest), and logs what happened (telemetry).

In this app

Answers stream over SSE (Server-Sent Events). Embeddings are cached. You can rebuild the doc index with POST /api/ingest. Logs include trace ID and retrieval scores.

Flow

  • SSE sends events: meta → sources → tokens → done.
  • Cache avoids paying twice for the same question embedding.
  • Ingest re-reads supportDocs.js and rebuilds the search index.
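The event order in the first bullet can be sketched as a frame formatter. The "event:" / "data:" / blank-line layout is the standard SSE wire format; the event names, payload fields, and the helpers formatSseEvent and buildSseFrames are assumptions, not the app's confirmed schema.

```javascript
// Format one Server-Sent Event frame: an event name, one JSON data line,
// and a blank line that terminates the frame.
// JSON.stringify never emits raw newlines, so the data stays on a single line.
function formatSseEvent(event, payload) {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Build the frames for one answer in the order: meta, sources, tokens, done.
function buildSseFrames({ traceId, sources, tokens }) {
  return [
    formatSseEvent("meta", { traceId }),
    formatSseEvent("sources", { sources }),
    ...tokens.map((text) => formatSseEvent("token", { text })),
    formatSseEvent("done", {}),
  ];
}
```

In a route handler, each frame would be encoded with TextEncoder and enqueued on a ReadableStream returned with the Content-Type header set to text/event-stream.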

Key files

  • app/api/chat/route.js

    SSE stream implementation

  • lib/cache.js

    Embedding cache

  • lib/telemetry.js

    Logs trace + scores

  • app/api/ingest/route.js

    Rebuild index API

10. AI System Design

What it is

How you split responsibility: what runs in the browser vs server, how to keep one customer’s data separate (multi-tenant), and how to scale later.

In this app

All secrets and AI calls stay on the server. Docs are tagged tenantId: interviewpro so only InterviewPro.info help articles are searched. Real apps would add login, a database, and rate limits.

Flow

  • Browser = UI only. Server = OpenAI + search + tools.
  • Every retrieve filters chunks where tenantId matches.
  • Designed so you can swap in pgvector or Pinecone later.
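The tenant filter and the "swap the store later" idea can be sketched together: callers depend only on a search method, so an in-memory store could later be replaced by one backed by pgvector or Pinecone. createMemoryStore, the search signature, and the chunk fields are hypothetical; scoreFn stands in for the real hybrid scoring.

```javascript
// In-memory store that filters by tenant before scoring.
// Filtering first means another tenant's chunks are never scored or returned.
function createMemoryStore(chunks) {
  return {
    search(scoreFn, { tenantId, topK = 3 }) {
      return chunks
        .filter((chunk) => chunk.tenantId === tenantId)
        .map((chunk) => ({ ...chunk, score: scoreFn(chunk) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
    },
  };
}
```

A usage example, with made-up chunks and a stand-in scoring function:

```javascript
const store = createMemoryStore([
  { id: "billing-1", tenantId: "interviewpro", text: "Cancel Pro in Settings." },
  { id: "other-1", tenantId: "another-tenant", text: "Unrelated article." },
]);
const results = store.search((chunk) => chunk.text.length, { tenantId: "interviewpro" });
// results contains only the "billing-1" chunk
```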

Key files

  • app/api/chat/route.js

    Server-side orchestration

  • lib/hybridRetrieval.js

    tenantId filter on search

  • data/supportDocs.js

    Each doc has tenantId

11. AI Performance & Security

What it is

Making sure the app is safe (it blocks prompts that try to hijack the model) and good (automated tests confirm the right docs are retrieved before users notice a regression).

In this app

We block jailbreak-style messages. Low search score → no fake answer. Golden questions in data/goldenQuestions.js are tested with npm run eval.

Flow

  • detectPromptInjection runs before search or AI.
  • If top chunk score is too low, user gets a polite refusal.
  • The eval script checks that questions about cancelling Pro, privacy, and indexing still retrieve the correct docs.
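The first two checks can be sketched as a pattern match and a score threshold. The name detectPromptInjection comes from this guide, but the patterns, the return shape, the 0.35 threshold, the refusal wording, and checkRetrievalConfidence are all hypothetical. Pattern lists like this catch only known phrasings and are a first filter, not a complete defense.

```javascript
// Illustrative patterns only; a real list would be longer and maintained over time.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /reveal (your |the )?(system|hidden) prompt/i,
  /you are now (in )?(dan|developer mode)/i,
];

// Runs before retrieval or any model call.
function detectPromptInjection(text) {
  const match = INJECTION_PATTERNS.find((pattern) => pattern.test(text));
  return { blocked: Boolean(match), pattern: match ? match.source : null };
}

const MIN_TOP_SCORE = 0.35; // hypothetical threshold

// Refuse when the best retrieved chunk is not confident enough.
// `results` is assumed to be sorted best-first as [{ score }].
function checkRetrievalConfidence(results) {
  const topScore = results.length > 0 ? results[0].score : 0;
  if (topScore >= MIN_TOP_SCORE) return { ok: true, topScore };
  return {
    ok: false,
    topScore,
    message: "Sorry, I could not find that in the Help Center.",
  };
}
```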

Key files

  • lib/security.js

    Injection detection

  • lib/eval.js

    Eval runner

  • data/goldenQuestions.js

    Test questions

  • scripts/run-eval.mjs

    npm run eval script
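A golden-question eval can be sketched as a loop that checks whether the expected doc appears in the top results. This is a minimal sketch under assumptions: runGoldenEval, the { question, expectedDocId } record shape, and the synchronous retrieve function returning [{ docId }] are hypothetical, and the real retrieval call would be asynchronous.

```javascript
// Run golden questions against a retrieval function and collect failures.
// A question passes when its expected doc is among the top K retrieved docs.
function runGoldenEval(goldenQuestions, retrieve, topK = 3) {
  const failures = [];
  for (const { question, expectedDocId } of goldenQuestions) {
    const retrievedIds = retrieve(question)
      .slice(0, topK)
      .map((result) => result.docId);
    if (!retrievedIds.includes(expectedDocId)) {
      failures.push({ question, expectedDocId, retrievedIds });
    }
  }
  const total = goldenQuestions.length;
  return { total, passed: total - failures.length, failures };
}
```

A script wrapping this could exit with a non-zero status when `failures` is non-empty, so a retrieval regression fails CI instead of reaching users.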

Try it

Validate retrieval, tools, and guardrails in Chat.

  • ResumePro

    How does ResumePro score my resume against a job description?

  • EvalPro

    How does EvalPro evaluate my machine coding submission?

  • Uses a tool

    Is my InterviewPro.info Pro still active?

  • Should refuse

    What is the weather in Tokyo?

Test retrieval quality

npm run eval

Reload help articles

curl -X POST http://localhost:3010/api/ingest

Re-index after editing data/supportDocs.js.