AI Support Copilot
How AI Support Copilot works — RAG pipeline, agent workflow, APIs, and the code behind InterviewPro.info support chat.
Reading guide
Four layers — client, API, retrieval, and trust — with OpenAI calls only on the server.
Client
Next.js chat UI streams answers over SSE. Zustand holds messages and multi-turn history.
components/SupportCopilot.jsx, stores/chatStore.js, lib/chatClient.js

API
Route handlers orchestrate retrieval, guardrails, tool calls, and streaming generation.
app/api/chat/route.js, app/api/structured/route.js, app/api/ingest/route.js

Retrieval
Help articles are chunked, embedded, and searched with hybrid vector + BM25 retrieval.
data/supportDocs.js, lib/hybridRetrieval.js, lib/vectorStore.js, lib/chunking.js

Trust
Injection checks, score guardrails, allowlisted tools, and golden eval regression tests.
lib/security.js, lib/prompt.js, lib/eval.js, data/goldenQuestions.js

Browser POSTs question + chat history to /api/chat.
Server runs prompt-injection check; blocks unsafe input.
Hybrid RAG retrieves top Help Center chunks (vector + BM25, reranked).
Guardrail refuses if retrieval confidence is too low.
ReAct planner may call get_subscription_status for billing questions.
OpenAI generates a grounded answer; SSE streams tokens + source chips to UI.
Browser (/chat)
    │  POST /api/chat { question, history }
    ▼
Security check → Hybrid RAG → Guardrail
    │                             │
    │                             ├─ low score → refuse
    ▼                             ▼
ReAct + tools ──► OpenAI stream (SSE)
                      │
                      ▼
          UI: tokens + source chips

Server routes called by the chat UI and dev tooling.
| Method | Path | Purpose |
|---|---|---|
| POST | /api/chat | Send a question, get a streaming answer + sources |
| POST | /api/structured | Get a JSON summary (like a support ticket) |
| POST | /api/ingest | Reload help articles into search index |
| GET/POST | /api/eval | Run automatic quality tests |
| GET | /api/tools/subscription-status | Check Pro subscription (demo data) |
What it is
An LLM (Large Language Model) is the AI that reads text and writes answers — like ChatGPT. It works with small pieces called tokens, and it can only read a limited amount of text at once (context window).
In this app
We do not paste the whole help center into the model. We turn your question and help articles into number lists called embeddings, then only send the best matching pieces to the model.
Flow
Key files
lib/embeddings.js: Calls OpenAI to create embeddings
lib/llm.js: Checks how big the prompt is (tokens)
lib/openai.js: Connects to OpenAI API
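To make the context-window limit concrete, here is a minimal sketch of a prompt-size check. It assumes a rough rule of thumb of about 4 characters per token for English text; `estimateTokens` and `fitChunksToBudget` are hypothetical names, not the contents of lib/llm.js, and a real tokenizer would give different counts.

```javascript
// Rough token estimate: about 4 characters per token for English text.
// This is a heuristic, not a tokenizer; real counts vary by model.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep adding chunks (already sorted best-first) until the budget is used up.
function fitChunksToBudget(chunks, maxTokens) {
  const selected = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > maxTokens) break; // the next chunk would overflow the budget
    selected.push(chunk);
    used += cost;
  }
  return { selected, used };
}
```

Stopping at the first chunk that does not fit keeps the best-ranked chunks and never sends a lower-ranked chunk in place of a higher-ranked one.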
What it is
Integration means connecting your app to the AI: sending questions, getting answers back, sometimes word-by-word (streaming), and sometimes calling extra functions (tools).
In this app
The website talks to our server. The server talks to OpenAI. The browser never sees the secret API key. Chat streams live; billing questions can trigger a subscription check tool.
Flow
Key files
app/api/chat/route.js: Main chat API (stream + RAG + tools)
app/api/structured/route.js: Returns JSON ticket shape
lib/tools.js: Defines allowed tools
lib/chatClient.js: Browser code that calls /api/chat
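The idea of allowlisted tools can be sketched as a lookup table that the server consults before running anything the model asks for. Only the tool name `get_subscription_status` comes from this page; the demo data, the `userId` argument, and `runTool` are assumptions for illustration and may not match lib/tools.js.

```javascript
// Hypothetical demo data; the real lookup may read from somewhere else.
const DEMO_SUBSCRIPTIONS = new Map([
  ["user_123", { plan: "pro", active: true }],
]);

// Only tools registered here can run, whatever name the model requests.
const ALLOWED_TOOLS = new Map([
  [
    "get_subscription_status",
    ({ userId }) =>
      DEMO_SUBSCRIPTIONS.get(userId) ?? { plan: "free", active: false },
  ],
]);

function runTool(name, args) {
  const handler = ALLOWED_TOOLS.get(name);
  if (!handler) {
    // Unknown tool names are rejected instead of being executed.
    return { ok: false, error: `Tool not allowed: ${name}` };
  }
  return { ok: true, result: handler(args ?? {}) };
}
```

A call such as `runTool("get_subscription_status", { userId: "user_123" })` returns the demo record, while any name outside the table comes back with `ok: false`.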
What it is
A prompt is the instruction you give the AI. Good prompts tell it how to think, what sources to trust, and when to say “I don’t know.”
In this app
We use CoT grounding prompts, ReAct tool planning, multi-turn chat history, and a support persona (mateshwari). The model must answer only from retrieved Help Center chunks and refuse when retrieval confidence is low.
Flow
Key files
lib/prompt.js: System and CoT/ReAct prompt text
lib/security.js: Blocks prompt injection
lib/eval.js: Runs golden question tests
data/goldenQuestions.js: List of test questions + expected docs
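As a rough illustration of a grounding prompt with multi-turn history, the sketch below assembles a chat message list from retrieved chunks. The wording of the system text, the `title` and `text` fields, and `buildMessages` are assumptions; the actual prompt in lib/prompt.js is not shown on this page.

```javascript
function buildMessages(question, history, chunks) {
  // Number each chunk so the model can cite it as [1], [2], ...
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.title}\n${chunk.text}`)
    .join("\n\n");

  const system = [
    "You are a support assistant for InterviewPro.info.",
    "Answer only from the Help Center excerpts below and cite them like [1].",
    "If the excerpts do not contain the answer, say you don't know.",
    "",
    "Help Center excerpts:",
    context,
  ].join("\n");

  return [
    { role: "system", content: system },
    ...history, // earlier { role, content } turns, oldest first
    { role: "user", content: question },
  ];
}
```

Putting the excerpts in the system message and the history between it and the new question keeps follow-up questions in context while the grounding rules stay at the top.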
What it is
RAG = Retrieval-Augmented Generation. Instead of relying on the model’s memory, you search your own documents first, then ask the model to answer using only what you found.
In this app
Help articles live in data/supportDocs.js. They are split into chunks, embedded, and stored in memory. When you ask a question, we retrieve the best chunks, then generate an answer with citations.
Flow
Key files
data/supportDocs.js: All help article text
lib/chunking.js: Splits docs into pieces
lib/hybridRetrieval.js: Finds best chunks for a question
lib/vectorStore.js: Stores embeddings in memory
app/api/ingest/route.js: Rebuild index after doc changes
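Splitting articles into chunks could look something like the sketch below, which uses fixed-size character windows with a small overlap so that a sentence cut at a boundary still appears whole in one chunk. The sizes and the `chunkText` name are assumptions; lib/chunking.js may split on paragraphs or headings instead.

```javascript
// Split text into windows of `size` characters that overlap by `overlap`.
function chunkText(text, size = 500, overlap = 50) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields three windows in which each one starts with the last character of the previous one.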
What it is
An embedding turns text into a list of numbers. Similar meanings get similar numbers, so the computer can find “related” sentences without exact keyword match.
In this app
Every help chunk and every user question gets an embedding. We compare them with cosine similarity (how “close” two number lists are) to pick the best chunks.
Flow
Key files
lib/embeddings.js: Creates embeddings via API
lib/vectorStore.js: Stores chunks + vectors
lib/cache.js: Caches question embeddings
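Cosine similarity itself is a short formula: the dot product of two vectors divided by the product of their lengths. The sketch below shows it together with a simple top-k pick; the `vector` field and the `topK` helper are hypothetical and only illustrate how a store like lib/vectorStore.js might rank chunks.

```javascript
// Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Score every stored item against the query vector and keep the best k.
function topK(queryVector, items, k) {
  return items
    .map((item) => ({ ...item, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Scanning every item is fine for an in-memory index of help articles; larger collections usually need an approximate nearest-neighbor index instead.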
What it is
Semantic search finds meaning (e.g. “refund” matches “money back”). Keyword search finds exact words (e.g. “Privacy Mode”). Hybrid search combines both.
In this app
We mix 60% vector (meaning) + 40% BM25 (exact words). That helps when users type product names, settings paths, or error codes that must match exactly.
Flow
Key files
lib/bm25.js: Keyword-style search
lib/hybridRetrieval.js: Combines vector + BM25
lib/rerank.js: Final ranking tweaks
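One way to picture the 60/40 mix is a weighted sum of the two scores after putting them on the same scale. The sketch below assumes min-max normalization, because raw BM25 scores are unbounded while cosine scores are not; whether lib/hybridRetrieval.js normalizes this way is not stated here, and the field names are hypothetical.

```javascript
// Scale scores to 0..1 so vector and BM25 scores are comparable.
function minMaxNormalize(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0); // no spread, no signal
  return scores.map((s) => (s - min) / (max - min));
}

// Blend the two normalized scores and sort best-first.
function hybridRank(candidates, vectorWeight = 0.6, bm25Weight = 0.4) {
  const v = minMaxNormalize(candidates.map((c) => c.vectorScore));
  const b = minMaxNormalize(candidates.map((c) => c.bm25Score));
  return candidates
    .map((c, i) => ({ id: c.id, score: vectorWeight * v[i] + bm25Weight * b[i] }))
    .sort((x, y) => y.score - x.score);
}
```

With this blend, a chunk that is only a moderate semantic match can still rank first when it contains the exact product name or error code the user typed.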
What it is
An agent is an AI that can take steps (check safety, search docs, maybe call a tool, then answer) rather than give a single one-shot reply.
In this app
Each chat request runs a fixed workflow: security → retrieve docs → check score → plan (docs or tool?) → optional subscription lookup → stream answer.
Flow
Key files
lib/agent.js: Step names and trace logging
app/api/chat/route.js: Runs the full workflow
lib/tools.js: Subscription tool logic
app/api/tools/subscription-status/route.js: Tool HTTP endpoint
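The fixed workflow can be sketched as a small decision function that records which steps ran. This is a simplified stand-in: the threshold value is made up, and a keyword pattern replaces the ReAct planner, which in the app is model-driven. `planRequest` and the step labels are hypothetical.

```javascript
// Hypothetical threshold; the real guardrail value is not given on this page.
const MIN_RETRIEVAL_SCORE = 0.35;

// Stand-in for the ReAct planner: a keyword check for billing questions.
const BILLING_PATTERN = /\b(subscription|billing|plan|renew|pro)\b/i;

function planRequest({ question, injectionDetected, topScore }) {
  const trace = ["security"];
  if (injectionDetected) return { action: "block", trace };

  trace.push("retrieve", "guardrail");
  if (topScore < MIN_RETRIEVAL_SCORE) return { action: "refuse", trace };

  trace.push("plan");
  if (BILLING_PATTERN.test(question)) {
    trace.push("tool:get_subscription_status");
  }
  trace.push("stream");
  return { action: "answer", trace };
}
```

Returning the trace alongside the action makes it easy to log exactly where a request stopped, which is the kind of information a trace log is meant to capture.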
What it is
The chat screen users see: typing box, streaming text, stop button, example questions, and links to sources under the answer.
In this app
The /chat page is a support desk UI: sidebar with example questions, streaming replies, citation chips, multi-turn memory, and stop control.
Flow
Key files
components/SupportCopilot.jsx: Main chat window
components/ChatSidebar.jsx: Example questions
components/ChatWelcome.jsx: First screen in chat
lib/chatHistory.js: Multi-turn history for follow-ups
stores/chatStore.js: Message + stream state
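On the browser side, streaming replies arrive as partial text that has to be split into complete SSE events before tokens can be appended to the message. The sketch below shows one way to do that; the event names `token` and `sources` in the usage note are assumptions, and it handles only `\n` line endings, not the `\r\n` variant the SSE format also allows.

```javascript
// Split buffered SSE text into complete events; keep any partial tail.
function parseSseBuffer(buffer) {
  const events = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop(); // last piece may be an incomplete frame

  for (const frame of frames) {
    let event = "message"; // default event type when none is given
    const dataLines = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        let value = line.slice(5);
        if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
        dataLines.push(value);
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

A client would append each network chunk to `rest`, call `parseSseBuffer` again, and route `token` events to the streaming message and `sources` events to the citation chips.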
What it is
Backend work: how the server sends data (streaming), saves money (cache), rebuilds search index (ingest), and logs what happened (telemetry).
In this app
Answers stream over SSE (Server-Sent Events). Embeddings are cached. You can rebuild the doc index with POST /api/ingest. Logs include trace ID and retrieval scores.
Flow
Key files
app/api/chat/route.js: SSE stream implementation
lib/cache.js: Embedding cache
lib/telemetry.js: Logs trace + scores
app/api/ingest/route.js: Rebuild index API
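Caching embeddings saves money because a repeated question does not trigger another API call. A minimal sketch, assuming an in-memory `Map` keyed by the normalized question text with oldest-first eviction; `createEmbeddingCache` and its options are hypothetical, and lib/cache.js may work differently.

```javascript
// Wrap an embedding function so repeated questions reuse the stored result.
function createEmbeddingCache(embedFn, maxEntries = 500) {
  const cache = new Map(); // Map keeps insertion order, oldest key first
  let hits = 0;
  let misses = 0;

  // Treat "Refund policy" and "  refund   POLICY " as the same question.
  const normalize = (text) => text.trim().toLowerCase().replace(/\s+/g, " ");

  return {
    get(text) {
      const key = normalize(text);
      if (cache.has(key)) {
        hits++;
        return cache.get(key);
      }
      misses++;
      const vector = embedFn(text); // may also be a Promise; it is stored as-is
      cache.set(key, vector);
      if (cache.size > maxEntries) {
        cache.delete(cache.keys().next().value); // evict the oldest entry
      }
      return vector;
    },
    stats() {
      return { hits, misses, size: cache.size };
    },
  };
}
```

Because the cache lives in process memory, it is emptied on every server restart and is not shared between server instances.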
What it is
How you split responsibility: what runs in the browser vs server, how to keep one customer’s data separate (multi-tenant), and how to scale later.
In this app
All secrets and AI calls stay on the server. Docs are tagged tenantId: interviewpro so only InterviewPro.info help articles are searched. Real apps would add login, a database, and rate limits.
Flow
Key files
app/api/chat/route.js: Server-side orchestration
lib/hybridRetrieval.js: tenantId filter on search
data/supportDocs.js: Each doc has tenantId
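The tenant filter can be pictured as a step that runs before any scoring, so chunks from another tenant never enter the ranking. This is a sketch under assumptions: `filterByTenant` is a hypothetical helper, and only the `tenantId` field and the `interviewpro` value come from this page.

```javascript
// Keep only chunks that belong to the given tenant before searching.
function filterByTenant(chunks, tenantId) {
  if (!tenantId) {
    // Fail closed: a missing tenant must never mean "search everything".
    throw new Error("tenantId is required");
  }
  return chunks.filter((chunk) => chunk.tenantId === tenantId);
}
```

Filtering before ranking, and not after it, means another tenant's chunk cannot crowd out a valid result or leak through a top-k cutoff.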
What it is
Making sure the app is safe (block hacking prompts) and good (automated tests that the right docs are found before users complain).
In this app
We block jailbreak-style messages. Low search score → no fake answer. Golden questions in data/goldenQuestions.js are tested with npm run eval.
Flow
Key files
lib/security.js: Injection detection
lib/eval.js: Eval runner
data/goldenQuestions.js: Test questions
scripts/run-eval.mjs: npm run eval script
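Both halves of this section, blocking jailbreak-style input and testing retrieval against golden questions, can be sketched in a few lines. The patterns, the `expectedDocId` and `docId` fields, and both function names are assumptions for illustration; the real checks in lib/security.js and lib/eval.js are not shown here, and pattern matching alone will not catch every injection attempt.

```javascript
// Hypothetical patterns for common jailbreak phrasing.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /reveal (your|the) (system )?prompt/i,
  /you are now (in )?(dan|developer mode)/i,
];

function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(text));
}

// Fraction of golden questions whose expected doc appears in the top k results.
function retrievalHitRate(goldenQuestions, retrieve, k = 3) {
  if (goldenQuestions.length === 0) return 0;
  let hits = 0;
  for (const { question, expectedDocId } of goldenQuestions) {
    const topIds = retrieve(question).slice(0, k).map((r) => r.docId);
    if (topIds.includes(expectedDocId)) hits++;
  }
  return hits / goldenQuestions.length;
}
```

Tracking the hit rate across changes to chunking or ranking weights turns retrieval quality into a regression test, so a drop is visible before users notice it.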
Validate retrieval, tools, and guardrails in Chat.
ResumePro
“How does ResumePro score my resume against a job description?”
EvalPro
“How does EvalPro evaluate my machine coding submission?”
Uses a tool
“Is my InterviewPro.info Pro still active?”
Should refuse
“What is the weather in Tokyo?”
Test retrieval quality
npm run eval
Reload help articles
curl -X POST http://localhost:3010/api/ingest
Re-index after editing data/supportDocs.js.