
How We Built 'Prizma': Lessons in Training an Agency AI Assistant

A behind-the-scenes look at building Prizma, our in-house AI assistant—from prompt engineering to voice synthesis and real-time chat architecture.


Prizmstack Team

January 19, 2026

4 min read
648 words

At Prizmstack, we don't just build AI solutions for clients—we use them ourselves. Prizma is our AI assistant that greets visitors, answers questions about our services, and qualifies leads before they ever speak to a human. This post shares the technical decisions, trade-offs, and lessons learned.


Why Build an In-House AI Assistant?

  1. Proof of concept: Clients want to see that we practice what we preach.
  2. Lead qualification: Prizma handles initial conversations, freeing our team for deeper engagements.
  3. 24/7 availability: Visitors from any timezone get instant responses.
  4. Data ownership: We control the training data, prompts, and conversation logs.

Architecture Overview

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Browser   │ ───▶ │  Next.js    │ ───▶ │  OpenAI     │
│  (React UI) │      │  API Routes │      │  Agents SDK │
└─────────────┘      └──────┬──────┘      └─────────────┘
                            │
                            ▼
                     ┌─────────────┐
                     │  Supabase   │
                     │  (sessions) │
                     └─────────────┘
                            │
                            ▼
                     ┌─────────────┐
                     │ ElevenLabs  │
                     │  (TTS)      │
                     └─────────────┘
  • Frontend: React with Valtio for state, Framer Motion for animations.
  • Backend: Next.js API routes proxying to OpenAI's Agents SDK.
  • Persistence: Supabase stores session IDs, message history, and rate-limit counters.
  • Voice: ElevenLabs converts text responses to natural-sounding speech.

Key Technical Decisions

1. Streaming Responses

Users expect instant feedback. We stream tokens from OpenAI to the browser using Server-Sent Events (SSE):

// Simplified streaming handler
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  // The SDK's toReadableStream() emits newline-delimited JSON chunks
  return new Response(stream.toReadableStream(), {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}
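On the client, the browser reads the response body incrementally and extracts token text as it arrives. A minimal, framework-free sketch of that side (the `parseSseTokens` helper and the `data:` framing are illustrative — the exact wire format depends on how the handler serializes chunks):

```typescript
// Hypothetical helper: extract data payloads from an SSE-style text buffer.
// Assumes one `data: <payload>` line per event and an OpenAI-style
// `[DONE]` sentinel; adjust to match the server's actual framing.
export function parseSseTokens(buffer: string): string[] {
  return buffer
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice('data: '.length).trim())
    .filter((payload) => payload.length > 0 && payload !== '[DONE]');
}

// Browser side: read chunks from the fetch body and hand tokens to the UI.
export async function readChatStream(
  res: Response,
  onToken: (token: string) => void,
): Promise<void> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const token of parseSseTokens(decoder.decode(value, { stream: true }))) {
      onToken(token);
    }
  }
}
```

A production version would also buffer partial lines across chunk boundaries; this sketch parses each chunk independently for brevity.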

2. Session Management with JWT

To prevent abuse, we issue a signed JWT on first visit. The token encodes:

  • Session ID
  • Message count
  • Expiration timestamp

Each request validates the token and increments the counter. After 10 messages, we prompt users to book a call.
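The token mechanics can be sketched with Node's built-in HMAC primitives (the claim shape mirrors the fields above; `signSession`/`verifySession` are illustrative names, and a real deployment would likely use a vetted JWT library such as `jose` instead):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Claims carried by the session token, as described above.
interface SessionClaims {
  sessionId: string;
  messageCount: number;
  exp: number; // expiration as a unix timestamp (seconds)
}

// Sign the JSON-encoded claims with HMAC-SHA256: payload.signature
export function signSession(claims: SessionClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Verify the signature and expiry; return the claims or null if invalid.
export function verifySession(token: string, secret: string): SessionClaims | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;
  const expected = createHmac('sha256', secret).update(payload).digest('base64url');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const claims: SessionClaims = JSON.parse(
    Buffer.from(payload, 'base64url').toString(),
  );
  if (claims.exp < Date.now() / 1000) return null;
  return claims;
}
```

On each request the server would call `verifySession`, bump `messageCount`, and re-issue the token — rejecting the request once the count passes the limit.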

3. Voice Synthesis

ElevenLabs provides low-latency, high-quality TTS. We proxy requests through our API to keep the API key server-side:

// /api/elevenlabs/tts/route.ts
import { ElevenLabsClient } from 'elevenlabs';

const elevenlabs = new ElevenLabsClient(); // reads ELEVENLABS_API_KEY server-side

export async function POST(req: Request) {
  const { text, voiceId } = await req.json();

  // Convert text to an audio stream with the selected voice
  const audio = await elevenlabs.textToSpeech.convert(voiceId, { text });

  return new Response(audio, {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
}

4. Prompt Engineering

Prizma's personality is defined in a system prompt:

You are Prizma, a friendly AI assistant for Prizmstack, a full-spectrum software development agency.

Your goals:
1. Greet visitors warmly and learn their name.
2. Understand their software challenges.
3. Explain how Prizmstack can help (AI, Product Development, Infrastructure).
4. Encourage them to book a discovery call.

Tone: Professional yet approachable. Concise answers. No jargon unless the user is technical.

We iterate on this prompt weekly based on conversation logs.
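"Iterate weekly" works best when prompts live in version control like any other code. One way to make that concrete is a registry keyed by version, so a change is a reviewable diff and the active version is pinned explicitly (a sketch — the registry, `buildMessages`, and the abbreviated prompt strings are all illustrative):

```typescript
// Hypothetical versioned prompt registry; prompt text abbreviated for the sketch.
const SYSTEM_PROMPTS: Record<string, string> = {
  v1: 'You are Prizma, a friendly AI assistant for Prizmstack...',
  v2: 'You are Prizma, a friendly AI assistant for Prizmstack, a full-spectrum software development agency. Tone: professional yet approachable.',
};

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Build the messages array for a chat completion using a pinned prompt version.
export function buildMessages(version: string, userText: string): ChatMessage[] {
  const system = SYSTEM_PROMPTS[version];
  if (!system) throw new Error(`Unknown prompt version: ${version}`);
  return [
    { role: 'system', content: system },
    { role: 'user', content: userText },
  ];
}
```

Logging the prompt version alongside each conversation also makes it possible to compare behavior across iterations.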


Lessons Learned

| Challenge | Solution |
|-----------|----------|
| Hallucinations about services | Added retrieval step with curated FAQ docs |
| Latency spikes | Moved to streaming + edge functions |
| Voice sounding robotic | Tuned ElevenLabs stability/similarity settings |
| Users gaming the system | JWT-based rate limiting + CAPTCHA fallback |
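A retrieval step over a small curated corpus can start far simpler than an embedding pipeline — e.g., keyword-overlap scoring over the FAQ docs, with the best match prepended to the model's context (a sketch with illustrative names; a production version would likely use embeddings and handle ties and stop words):

```typescript
// Hypothetical curated FAQ document.
interface FaqDoc {
  title: string;
  body: string;
}

// Score docs by how many query terms appear in them; return the best match.
// Naive substring matching is deliberate — this is a sketch, not production.
export function retrieveFaq(query: string, docs: FaqDoc[]): FaqDoc | null {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  let best: FaqDoc | null = null;
  let bestScore = 0;
  for (const doc of docs) {
    const text = `${doc.title} ${doc.body}`.toLowerCase();
    const score = terms.filter((t) => text.includes(t)).length;
    if (score > bestScore) {
      bestScore = score;
      best = doc;
    }
  }
  return best;
}
```

The retrieved doc is then injected into the system context so the model answers from curated text rather than inventing service details.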


Metrics After 3 Months

  • 2,500+ conversations handled autonomously.
  • 35% of visitors engage with Prizma.
  • 12% conversion to booked discovery calls (up from 4% with static forms).
  • Average response time: 1.2 seconds (including TTS).

What's Next

  • Multi-modal input: Allow users to upload screenshots or documents.
  • Memory: Persist context across sessions for returning visitors.
  • Agent tools: Let Prizma query our CRM or schedule meetings directly.

Key Takeaways

  • Dogfooding builds credibility and surfaces real-world edge cases.
  • Streaming + voice dramatically improve perceived responsiveness.
  • Rate limiting is essential for any public-facing AI.
  • Prompt iteration is ongoing—treat it like code, version it, and review it.

Want to build an AI assistant for your business? Reach out and let's explore what's possible.

Topics covered

AI · LLM · Voice AI · ElevenLabs · OpenAI · Automation

