
How We Built 'Prizma': Lessons in Training an Agency AI Assistant

A behind-the-scenes look at building Prizma, our in-house AI assistant—from prompt engineering to voice synthesis and real-time chat architecture.


Prizmstack Team

January 19, 2026

4 min read
648 words

At Prizmstack, we don't just build AI solutions for clients—we use them ourselves. Prizma is our AI assistant that greets visitors, answers questions about our services, and qualifies leads before they ever speak to a human. This post shares the technical decisions, trade-offs, and lessons learned.


Why Build an In-House AI Assistant?

  1. Proof of concept: Clients want to see that we practice what we preach.
  2. Lead qualification: Prizma handles initial conversations, freeing our team for deeper engagements.
  3. 24/7 availability: Visitors from any timezone get instant responses.
  4. Data ownership: We control the training data, prompts, and conversation logs.

Architecture Overview

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Browser   │ ───▶ │  Next.js    │ ───▶ │  OpenAI     │
│  (React UI) │      │  API Routes │      │  Agents SDK │
└─────────────┘      └──────┬──────┘      └─────────────┘
                            │
                            ▼
                     ┌─────────────┐
                     │  Supabase   │
                     │  (sessions) │
                     └─────────────┘
                            │
                            ▼
                     ┌─────────────┐
                     │ ElevenLabs  │
                     │  (TTS)      │
                     └─────────────┘
  • Frontend: React with Valtio for state, Framer Motion for animations.
  • Backend: Next.js API routes proxying to OpenAI's Agents SDK.
  • Persistence: Supabase stores session IDs, message history, and rate-limit counters.
  • Voice: ElevenLabs converts text responses to natural-sounding speech.

Key Technical Decisions

1. Streaming Responses

Users expect instant feedback. We stream tokens from OpenAI to the browser using Server-Sent Events (SSE):

// Simplified streaming handler
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  // The SDK's toReadableStream() emits newline-delimited JSON chunks
  return new Response(stream.toReadableStream(), {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}
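On the client, the browser reads the response body incrementally and extracts token text as it arrives. A minimal, framework-free sketch of that side (the `parseSseTokens` helper and the `data:` framing are illustrative — the exact wire format depends on how the handler serializes chunks):

```typescript
// Hypothetical helper: extract data payloads from an SSE-style text buffer.
// Assumes one `data: <payload>` line per event and an OpenAI-style
// `[DONE]` sentinel; adjust to match the server's actual framing.
export function parseSseTokens(buffer: string): string[] {
  return buffer
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice('data: '.length).trim())
    .filter((payload) => payload.length > 0 && payload !== '[DONE]');
}

// Browser side: read chunks from the fetch body and hand tokens to the UI.
export async function readChatStream(
  res: Response,
  onToken: (token: string) => void,
): Promise<void> {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const token of parseSseTokens(decoder.decode(value, { stream: true }))) {
      onToken(token);
    }
  }
}
```

A production version would also buffer partial lines across chunk boundaries; this sketch parses each chunk independently for brevity.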

2. Session Management with JWT

To prevent abuse, we issue a signed JWT on first visit. The token encodes:

  • Session ID
  • Message count
  • Expiration timestamp

Each request validates the token and increments the counter. After 10 messages, we prompt users to book a call.
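The token mechanics can be sketched with Node's built-in HMAC primitives (the claim shape mirrors the fields above; `signSession`/`verifySession` are illustrative names, and a real deployment would likely use a vetted JWT library such as `jose` instead):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Claims carried by the session token, as described above.
interface SessionClaims {
  sessionId: string;
  messageCount: number;
  exp: number; // expiration as a unix timestamp (seconds)
}

// Sign the JSON-encoded claims with HMAC-SHA256: payload.signature
export function signSession(claims: SessionClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Verify the signature and expiry; return the claims or null if invalid.
export function verifySession(token: string, secret: string): SessionClaims | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;
  const expected = createHmac('sha256', secret).update(payload).digest('base64url');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  const claims: SessionClaims = JSON.parse(
    Buffer.from(payload, 'base64url').toString(),
  );
  if (claims.exp < Date.now() / 1000) return null;
  return claims;
}
```

On each request the server would call `verifySession`, bump `messageCount`, and re-issue the token — rejecting the request once the count passes the limit.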

3. Voice Synthesis

ElevenLabs provides low-latency, high-quality TTS. We proxy requests through our API to keep the API key server-side:

// /api/elevenlabs/tts/route.ts
import { ElevenLabsClient } from 'elevenlabs';

const elevenlabs = new ElevenLabsClient(); // reads ELEVENLABS_API_KEY server-side

export async function POST(req: Request) {
  const { text, voiceId } = await req.json();

  // Convert text to an audio stream with the selected voice
  const audio = await elevenlabs.textToSpeech.convert(voiceId, { text });

  return new Response(audio, {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
}

4. Prompt Engineering

Prizma's personality is defined in a system prompt:

You are Prizma, a friendly AI assistant for Prizmstack, a full-spectrum software development agency.

Your goals:
1. Greet visitors warmly and learn their name.
2. Understand their software challenges.
3. Explain how Prizmstack can help (AI, Product Development, Infrastructure).
4. Encourage them to book a discovery call.

Tone: Professional yet approachable. Concise answers. No jargon unless the user is technical.

We iterate on this prompt weekly based on conversation logs.
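"Iterate weekly" works best when prompts live in version control like any other code. One way to make that concrete is a registry keyed by version, so a change is a reviewable diff and the active version is pinned explicitly (a sketch — the registry, `buildMessages`, and the abbreviated prompt strings are all illustrative):

```typescript
// Hypothetical versioned prompt registry; prompt text abbreviated for the sketch.
const SYSTEM_PROMPTS: Record<string, string> = {
  v1: 'You are Prizma, a friendly AI assistant for Prizmstack...',
  v2: 'You are Prizma, a friendly AI assistant for Prizmstack, a full-spectrum software development agency. Tone: professional yet approachable.',
};

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Build the messages array for a chat completion using a pinned prompt version.
export function buildMessages(version: string, userText: string): ChatMessage[] {
  const system = SYSTEM_PROMPTS[version];
  if (!system) throw new Error(`Unknown prompt version: ${version}`);
  return [
    { role: 'system', content: system },
    { role: 'user', content: userText },
  ];
}
```

Logging the prompt version alongside each conversation also makes it possible to compare behavior across iterations.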


Lessons Learned

| Challenge | Solution |
|-----------|----------|
| Hallucinations about services | Added retrieval step with curated FAQ docs |
| Latency spikes | Moved to streaming + edge functions |
| Voice sounding robotic | Tuned ElevenLabs stability/similarity settings |
| Users gaming the system | JWT-based rate limiting + CAPTCHA fallback |
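A retrieval step over a small curated corpus can start far simpler than an embedding pipeline — e.g., keyword-overlap scoring over the FAQ docs, with the best match prepended to the model's context (a sketch with illustrative names; a production version would likely use embeddings and handle ties and stop words):

```typescript
// Hypothetical curated FAQ document.
interface FaqDoc {
  title: string;
  body: string;
}

// Score docs by how many query terms appear in them; return the best match.
// Naive substring matching is deliberate — this is a sketch, not production.
export function retrieveFaq(query: string, docs: FaqDoc[]): FaqDoc | null {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  let best: FaqDoc | null = null;
  let bestScore = 0;
  for (const doc of docs) {
    const text = `${doc.title} ${doc.body}`.toLowerCase();
    const score = terms.filter((t) => text.includes(t)).length;
    if (score > bestScore) {
      bestScore = score;
      best = doc;
    }
  }
  return best;
}
```

The retrieved doc is then injected into the system context so the model answers from curated text rather than inventing service details.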


Metrics After 3 Months

  • 2,500+ conversations handled autonomously.
  • 35% of visitors engage with Prizma.
  • 12% conversion to booked discovery calls (up from 4% with static forms).
  • Average response time: 1.2 seconds (including TTS).

What's Next

  • Multi-modal input: Allow users to upload screenshots or documents.
  • Memory: Persist context across sessions for returning visitors.
  • Agent tools: Let Prizma query our CRM or schedule meetings directly.

Key Takeaways

  • Dogfooding builds credibility and surfaces real-world edge cases.
  • Streaming + voice dramatically improve perceived responsiveness.
  • Rate limiting is essential for any public-facing AI.
  • Prompt iteration is ongoing—treat it like code, version it, and review it.

Want to build an AI assistant for your business? Reach out and let's explore what's possible.

Topics covered

AI · LLM · Voice AI · ElevenLabs · OpenAI · Automation

