Select Interactive
AI · Tech Stack & Tools · Web Strategy · 10 min read

Building Ask: How We Put Our Entire Website Into an AI Assistant

A technical deep dive into how we built the Ask AI widget on this site using TanStack AI and xAI Grok, covering the knowledge base strategy that keeps it accurate, the security stack that keeps it safe, and why we think every serious website should consider this approach.

Jeremy Burton

Partner, Select Interactive

If you have visited this site in the last few months, you have probably noticed the Ask widget in the corner. It is not a chatbot bolted on for appearances. It is a purpose-built AI assistant that knows everything published on this site and nothing that is not, and it was designed that way on purpose.

This post is a full account of how it was built: the user experience problem it solves, the technology stack powering it, how we structured the knowledge base that keeps it accurate, and the layered security and guardrail system that keeps it from going off-script. If you are thinking about adding an AI assistant to your own site, this is the blueprint.

The Problem: Visitors Should Not Have to Hunt

Most business websites have a real navigation problem that analytics rarely surfaces cleanly. Visitors land with a specific question, something concrete and answerable, and they spend far too long clicking through pages to find the answer. Services pages. About pages. Case study archives. FAQ sections. Every click is friction, every wrong turn is a reason to leave.

The typical response to this is to simplify navigation or restructure the information architecture. Those things help. But they do not solve the underlying issue: web pages are optimized for browsing, not for answering a specific question someone has right now.

A good AI assistant does not replace your website. It gives visitors a faster path to the answer they already came looking for.

The goal for Ask was simple: let someone ask any question about Select Interactive and get an accurate, immediate answer without having to know where the information lives on the site. What services do you offer? Do you work with healthcare companies? What does the process look like? All of that should be a single question away.

The diagram above shows the contrast clearly. Without Ask, a visitor has to navigate through multiple pages to find what they need, and there is no guarantee they find it at all. With Ask, that same visitor types one question and gets the answer directly. The information on the site has not changed; the path to it has.

The Knowledge Base: Teaching the Model What It Needs to Know

The most important decision in building an AI assistant for a website is not which model to use. It is how you control what the model knows. A general-purpose LLM knows a lot about the world, but almost nothing specific about your company, your services, your pricing philosophy, your case studies, or your team. Left on its own, it will fill those gaps with plausible-sounding guesses.

We solved this with a structured knowledge base that is passed to the model as the system context on every single request. Before Ask ever generates a word, it has been given a complete, structured representation of the site: services with full descriptions, case studies with client and outcome details, team information, contact details, office locations, navigation paths, testimonials, and more.

How the knowledge base is structured

All of the site content already lives in well-typed TypeScript data modules, which is one of the advantages of building on a modern stack rather than pulling content from a CMS at runtime. We have a server-side builder function that imports from those data modules, serializes the relevant content to structured JSON, and assembles it into a knowledge base document that becomes the model's system context for every conversation.

  • Services. Each service area with its full description, what problems it solves, who it is for, and what the outcome looks like.
  • Case studies. Real project outcomes, client context, the challenges we addressed, and the solutions we built.
  • About and team. Company background, values, recognition, and what makes Select Interactive different.
  • Canonical paths. A map of every meaningful URL on the site so the assistant can link directly to the right page when relevant.
  • Contact and offices. How to reach us, where we are located, and what the engagement process looks like.

The key insight here is that the model is not browsing the site. It is not crawling URLs or fetching pages at query time. The entire relevant knowledge base is in the system context before the first token is generated. This means Ask is fast, deterministic in what it knows, and impossible to manipulate into fetching content from outside the approved knowledge set.

When we update site content, the knowledge base updates automatically on the next request because the builder function imports live from the same data modules that power the site itself. There is no separate sync process, no stale index to worry about.
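To make this concrete, here is a minimal sketch of what a knowledge base builder like this can look like. The module names, type shapes, and sample data below are illustrative stand-ins, not the actual site code; the real builder imports from the same typed data modules that render the pages.

```typescript
// Hypothetical sketch of a knowledge base builder. In the real site these
// arrays would be imports from the data modules that power the pages,
// e.g. `import { services } from "~/data/services"` (illustrative path).

type Service = { name: string; description: string; outcomes: string[] };
type CaseStudy = { client: string; challenge: string; solution: string };

const services: Service[] = [
  {
    name: "Web Applications",
    description: "Custom application design and development",
    outcomes: ["Faster delivery", "Lower maintenance cost"],
  },
];
const caseStudies: CaseStudy[] = [];
const canonicalPaths: Record<string, string> = {
  services: "/services",
  contact: "/contact",
};

// Serialize the site content into one structured document that becomes
// the model's system context on every request. Because this runs per
// request against live imports, there is no separate sync step.
export function buildKnowledgeBase(): string {
  const kb = { services, caseStudies, paths: canonicalPaths };
  return [
    "KNOWLEDGE BASE:",
    JSON.stringify(kb, null, 2),
  ].join("\n");
}
```

Because the builder reads the same modules the pages render from, a content edit anywhere on the site is reflected in the assistant's context on the very next request.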

The Architecture: From Browser to Model

Ask is built on TanStack AI, which handles the streaming infrastructure between the browser and the server, and the xAI Grok adapter for the model itself. The combination gives us a clean, type-safe request pipeline with Server-Sent Events (SSE) for real-time token streaming, and a model that is fast, capable, and well-suited to factual Q&A within a bounded domain.

Why TanStack AI

We already use TanStack Start, Router, Query, and Form throughout our stack, and TanStack AI fits naturally into that ecosystem. It provides the client-side hooks for managing chat state, the server-side streaming utilities, and a clean adapter pattern that lets us swap models without touching application code. It is headless and type-safe, which matches how we build everything else.

Why xAI Grok

Grok is fast and accurate for factual, conversational tasks. For an assistant that is answering questions from a bounded knowledge base, speed matters as much as capability. A slow assistant is a bad assistant, no matter how smart it is. Grok's response times keep the interaction feeling close to a real conversation.

The request pipeline

Every Ask conversation follows the same path. The browser uses a TanStack AI chat hook that connects to the server via SSE. The server validates the incoming request through a strict Zod schema, trims the conversation history to a safe size, builds the full system context from the knowledge base, and sends everything to the model. Tokens stream back to the browser in real time as the model generates them.

One deliberate design choice: the assistant uses streaming by default. Rather than waiting for the entire response to generate before showing anything, visitors see the answer appear word by word. This makes the interaction feel immediate and responsive even on longer answers, and it allows the user to start reading before the full response is complete.
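The browser side of that stream can be reduced to a small parsing problem. The sketch below is generic, not TanStack AI's actual hook API (which handles this for us); the `data: {"token": ...}` event shape is an assumption for illustration.

```typescript
// Pull individual tokens out of an SSE-style chunk. The event payload
// shape (`data: {"token": "..."}`) is assumed for this sketch; a real
// client would also buffer frames split across chunk boundaries.
export function parseSseTokens(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    try {
      const payload = JSON.parse(line.slice("data: ".length));
      if (typeof payload.token === "string") tokens.push(payload.token);
    } catch {
      // Ignore partial or malformed frames in this sketch.
    }
  }
  return tokens;
}

// Consume the stream and surface tokens as they arrive, so the visitor
// can start reading before the full answer exists. Endpoint name assumed.
export async function streamAnswer(
  question: string,
  onToken: (token: string) => void,
): Promise<void> {
  const res = await fetch("/api/ask", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: question }] }),
  });
  if (!res.ok || !res.body) throw new Error(`Ask request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const token of parseSseTokens(decoder.decode(value, { stream: true }))) {
      onToken(token);
    }
  }
}
```

In practice the chat hook manages this loop plus conversation state; the point of the sketch is that streaming is just incremental parsing of a long-lived response body.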

Rate Limiting: Protecting Availability Without Blocking Real Users

An AI assistant with a public API endpoint is an obvious target for abuse, whether from bad actors trying to extract value at scale or automated traffic that has nothing to do with your actual visitors. Without protection, a poorly configured endpoint can burn through API quota and drive up costs in minutes. We built rate limiting in from the start, not as an afterthought.

Sliding window, not hard quotas

The rate limiter uses a sliding window algorithm stored in Firestore. Rather than resetting a counter on the hour, the window looks at recent request history and evaluates current usage against a rolling period. This is more accurate and more fair than hourly buckets, which can either starve a legitimate user who hits the boundary at the wrong moment or allow a burst that the hourly model misses.

We deliberately chose not to publish the specific limits in this article. The limits are configured through environment variables so they can be tuned without a code deployment, and exposing exact numbers would only be useful to someone trying to stay just under the threshold.

Fail open, not fail closed

One of the most important design decisions in the rate limiter was how it handles its own failure. If the Firestore check times out or returns an error, the request is allowed through, not rejected. This is sometimes called a fail-open design, and we chose it deliberately. The risk of blocking a real user because our rate limiter had a transient error is worse than the risk of a few extra requests slipping through during an outage. The assistant still works; it just temporarily operates without its safety net.

Rate limit responses look like real responses

When a visitor does hit the rate limit, they receive a response through the same SSE stream as a normal answer. From the browser's perspective, the response format is identical. The only difference is the content, which is a polite message explaining the limit. This means the rate limit handling requires no special-case code in the frontend, and the experience remains consistent.

Defense in Depth: Four Layers Before the Model Sees Anything

Security for an AI endpoint is not a single thing you add at the end. It is a sequence of independent checkpoints, each one catching a different class of problem. By the time a request reaches the model, it has already passed through rate limiting, server-side credential isolation, schema validation, and input trimming. The model is the last step in the chain, not the first.

Layer 1: Rate limiting

As described above, the rate limiter fires before any other processing. A request that is blocked at this layer never touches the API key, never runs a schema check, never reaches the model. This is intentional: the cheapest check runs first.

Layer 2: API key isolation

The xAI API key lives exclusively in the server environment. It is never bundled into the client, never exposed in a response header, never readable from the browser. The client does not need to know which model provider is in use. The API key is a server secret, full stop.

Layer 3: Schema validation and input truncation

Every request body is validated with a Zod schema before any processing occurs. The schema enforces the exact structure the API expects: a typed array of messages with defined roles and content shapes. Anything that does not match returns a 400 immediately. No model call is made, no error is logged to an external service with user data; the request simply stops.

After validation, the message history is trimmed. The server enforces a maximum number of recent messages and a per-message length cap. This has two effects: it limits the token budget sent to the model (cost control), and it prevents prompt injection attacks that rely on burying instructions deep in a long conversation history.

Layer 4: System prompt guardrails

The last layer before the model is the system prompt itself. This is where behavior is constrained at the model level, which we cover in detail in the next section.

Prompt-Level Guardrails: Constraining the Model to What We Know

Even a well-configured model with a rich knowledge base needs explicit behavioral rules. Without them, a model will answer questions that are outside its intended scope, speculate on things it does not know, adopt whatever persona the user suggests, or confirm information about the underlying technology stack. The system prompt is the place where intent becomes instruction.

The persona block

The system prompt opens with a strict persona definition. Ask is introduced to the model as a knowledgeable assistant for Select Interactive specifically, not a general assistant, not a coding helper, not a creative writing tool. The persona block establishes what Ask is for and, equally importantly, what it is not for.

Fact-only responses from the knowledge base

The most critical rule in the system prompt is that the model must answer only from the knowledge base provided. If the answer to a question is not in the knowledge base, Ask says so. It does not invent plausible answers. It does not draw on its general training data to fill gaps. It acknowledges the limit of what it knows and, where appropriate, directs the visitor to the right page or contact method.

Hard refusals

Several categories of question receive an explicit refusal regardless of context:

  • Pricing and estimates. Ask never quotes prices or gives cost estimates. Pricing is project-specific and context-dependent, and a model-generated number in a chat widget is not a substitute for a real scoping conversation.
  • Off-topic requests. Anything outside the domain of Select Interactive's services, work, and approach is politely declined. Ask does not write code for the visitor, does not answer general trivia, and does not engage with requests designed to redirect it.
  • Model and provider disclosure. Ask does not confirm which AI model or provider is powering it. This is a deliberate policy, not a technical limitation.
  • Jailbreak and persona override attempts. The system prompt explicitly addresses attempts to instruct the model to ignore its rules, pretend to be a different assistant, or reveal its instructions. These instructions take precedence over anything in the conversation.

Brand voice

Ask speaks in the same voice as the rest of the site: direct, knowledgeable, and professional without being corporate. The tone rules are part of the system prompt. Every conversation, regardless of what the visitor asks or how they phrase it, stays on-brand and on-message. There is no variance between sessions, no off day, no inconsistent answer from a different team member. The assistant is as consistent as a written style guide because it follows one.

Ask Is a Product Decision, Not a Feature Toggle

The version of Ask on this site did not get built by dropping an API key into a third-party chat widget. It required deliberate decisions about what the assistant knows, how it behaves, and what it protects against. The knowledge base strategy, the security stack, the prompt guardrails, and the rate limiting are all necessary parts of a system that works in production.

The result is an assistant that genuinely helps visitors, stays accurate, and cannot be steered off course. It has reduced the friction between a visitor's question and the answer they need. We have seen it handle questions about services, project approaches, team background, and process details, all accurately and without requiring a page reload.

We are building this kind of purposeful AI integration for clients as well. The same principles, from knowledge-base-grounded responses to layered security and brand-voice control, apply whether the use case is a marketing site, a product documentation portal, or a customer support interface. If you are thinking about adding AI to your digital experience and want it done correctly, this is the kind of work we do.

Work With Us

Have a project in mind?

We build the web's most demanding applications. Let's talk about yours.

Get in Touch