# Executive Summary

The morty-voice project — a Rust-based voice assistant powered by Gemini 3.1 Flash Live — has proven the concept but exposed critical failures in both the voice pipeline and the development lifecycle surrounding it. Over a single sprint session, we logged 50+ human interventions, 9 process failures, and roughly 6 hours lost to rate limits and pipeline bugs.

This proposal replaces the custom audio pipeline with Pipecat (the emerging standard for local voice + LLM orchestration), migrates development infrastructure to Linear Agent API + cloud CI + CodeRabbit, introduces automated voice quality testing via Hamming AI, and wraps deployment in Kamal for zero-touch, rollback-capable deploys.

The net cost increase is approximately $84/month. The expected result: a voice assistant that ships reliably without requiring Paul to babysit every merge.

# Current State: What's Broken

Let's be honest about where we are. The prototype works — Paul can talk to Gemini, trigger smart home commands, and query Linear. But the system around it is held together with duct tape and manual intervention.

## By the Numbers

- **50+** human interventions per sprint session
- **9** process failures in one evening
- **~6h** lost to rate limits and pipeline bugs
- **0** automated audio tests: regressions ship silently
- **0** automated deploys: every deploy is manual
- **Zombie PRs** polluting the repo

# Proposed Architecture

## Voice Pipeline: Pipecat + Gemini Live

The problem: The current custom Rust audio pipeline handles recording, playback, VAD, echo cancellation, and LLM communication in a monolithic architecture. Every change risks breaking the audio path, and there's no standard framework for testing or extending it.

The solution: Pipecat — a Python framework purpose-built for real-time voice AI pipelines. It's the emerging reference architecture for exactly what morty-voice does: local mic/speaker → VAD → LLM → tool calls → speech output.

### Architecture

┌─────────────────────────────────────────────────────┐
│                   Pipecat Pipeline                    │
│                                                       │
│  Shure MV7+ ──→ LocalTransport (16kHz input)         │
│       │                                               │
│       ▼                                               │
│  aec3-rs (Echo Cancellation) ──→ Silero VAD          │
│                                       │               │
│                                       ▼               │
│                    GeminiMultimodalLiveLLMService     │
│                         (Gemini 3.1 Flash Live)       │
│                              │                        │
│                    ┌─────────┴─────────┐              │
│                    │                   │              │
│                    ▼                   ▼              │
│              Speech Output      Async Tool Dispatch   │
│            (24kHz PCM → DAC)     ┌────┴────┐         │
│                                  │    │    │         │
│                                Hue  Linear OpenClaw  │
└─────────────────────────────────────────────────────┘
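
Conceptually, this is a chain of frame processors, the abstraction Pipecat builds on. The sketch below models that flow in plain Python; the class names and frame shape are illustrative stand-ins for this proposal, not Pipecat's actual API:

```python
# Illustrative frame-processor chain, loosely modeled on Pipecat's
# pipeline abstraction. All names here are invented for illustration.

class Processor:
    """One stage: takes a frame, returns a frame or None (dropped)."""
    def process(self, frame):
        raise NotImplementedError

class EchoCancel(Processor):
    # Stand-in for aec3-rs: subtract the scaled speaker reference signal.
    def __init__(self, reference_gain=1.0):
        self.reference_gain = reference_gain
    def process(self, frame):
        mic, ref = frame["mic"], frame["speaker_ref"]
        cleaned = [m - self.reference_gain * r for m, r in zip(mic, ref)]
        return {**frame, "mic": cleaned}

class VADGate(Processor):
    # Stand-in for Silero VAD: pass frames whose mean energy clears a threshold.
    def __init__(self, threshold=0.01):
        self.threshold = threshold
    def process(self, frame):
        energy = sum(s * s for s in frame["mic"]) / len(frame["mic"])
        return frame if energy >= self.threshold else None

def run_pipeline(stages, frames):
    out = []
    for frame in frames:
        for stage in stages:
            frame = stage.process(frame)
            if frame is None:
                break  # frame dropped by this stage
        else:
            out.append(frame)
    return out

frames = [
    {"mic": [0.5, -0.4, 0.6], "speaker_ref": [0.0, 0.0, 0.0]},  # speech
    {"mic": [0.3, 0.3, 0.3], "speaker_ref": [0.3, 0.3, 0.3]},   # pure echo
]
speech = run_pipeline([EchoCancel(), VADGate()], frames)
print(len(speech))  # 1: the echo-only frame is cancelled, then gated out
```

The real pipeline substitutes Pipecat's LocalTransport, Silero VAD, and the Gemini Live service for these stubs; the point of the framework is that each stage is swappable and testable in isolation.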

### Key Design Decisions

### What Stays in Rust

Not everything moves to Python. The aec3-rs echo cancellation stays as a Rust library called via PyO3 or as a subprocess. Audio-critical path processing benefits from Rust's performance guarantees. The Pipecat pipeline orchestrates; Rust handles the hot path.
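
At the boundary, the Python side can try the compiled extension and degrade gracefully when it isn't built. A minimal sketch, assuming aec3-rs is exposed to Python as an `aec3_rs` module with a `process` function (both names are placeholders for this proposal, not a published API):

```python
# Hypothetical Python/Rust boundary. "aec3_rs" as a PyO3 extension module
# is an assumption of this proposal; module and function names are placeholders.

try:
    import aec3_rs  # compiled Rust extension (PyO3): the hot path

    def cancel_echo(mic, speaker_ref):
        return aec3_rs.process(mic, speaker_ref)
except ImportError:
    # Dev-machine fallback: passthrough keeps the pipeline runnable
    # (with no echo cancellation) when the extension isn't built.
    def cancel_echo(mic, speaker_ref):
        return list(mic)
```

Keeping the fallback as a passthrough means the pipeline stays runnable on a dev machine without the Rust toolchain, at the cost of uncancelled echo.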


## Development Pipeline: Linear Agent API + Cloud CI

The problem: The current pipeline uses webhook scraping and GitHub comment parsing to bridge Linear and the development workflow. CI runs on the Mac mini. There's no automated code review, and PRs accumulate as zombies.

The solution: A proper structured pipeline using Linear's Agent API, cloud CI, and automated review.

### Linear Agent API

Linear launched their Agent API — a structured interface designed for exactly this use case: AI agents that need to read, create, update, and transition issues programmatically.

This eliminates the entire class of "phantom state" bugs where the system thinks an issue is in one state but Linear shows another.
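
Linear's API is GraphQL, so a state transition becomes one structured mutation instead of a parsed comment. A sketch of the request the agent would POST to `https://api.linear.app/graphql` (the `issueUpdate` mutation and `stateId` input follow Linear's public schema; the IDs and helper function are illustrative):

```python
import json

# Sketch: transition an issue via a single structured GraphQL mutation,
# instead of scraping webhooks or parsing GitHub comments. Issue and
# state IDs below are invented for illustration.

ISSUE_UPDATE = """
mutation($id: String!, $stateId: String!) {
  issueUpdate(id: $id, input: { stateId: $stateId }) {
    success
    issue { id state { name } }
  }
}
"""

def build_transition_request(issue_id: str, state_id: str) -> dict:
    """Build the POST body for Linear's GraphQL endpoint."""
    return {
        "query": ISSUE_UPDATE,
        "variables": {"id": issue_id, "stateId": state_id},
    }

body = build_transition_request("MOR-42", "state-in-progress")
print(json.dumps(body["variables"]))
```

Because the response reports the resulting state, the agent can confirm the transition actually happened rather than assuming it did.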

### Cloud CI

| Runner Type | What It Runs | Why |
| --- | --- | --- |
| Cloud (ubuntu-latest) | Linting, formatting, unit tests, integration tests, CodeRabbit review | Scalable, isolated, doesn't compete with the voice assistant |
| Self-hosted (Mac mini) | Audio hardware integration tests only | Needs physical access to the Shure MV7+ and speakers |

### CodeRabbit for Automated Review

CodeRabbit provides AI-powered code review at $12-30/user/month. For a single-developer project with an AI coding agent, this is the highest-leverage spend available.

### Human Approval Gate

Every PR that passes CI and CodeRabbit review still requires human approval before merge to production. This is non-negotiable.


## Deployment: Kamal + Health Checks

The problem: After a PR merges, nothing happens. Deployment is manual: SSH into the Mac mini, pull the latest code, rebuild, restart.

The solution: Kamal — Basecamp's zero-downtime deployment tool, designed for deploying Docker containers to a single machine via SSH, with health checks and automatic rollback.

### Deployment Flow

git push (merge to main)
        │
        ▼
GitHub Actions (cloud)
  ├── Build Docker image
  ├── Run test suite
  ├── Push to GitHub Container Registry
  │
  ▼
Kamal Deploy (via SSH to Mac mini)
  ├── Pull new container image
  ├── Start new container alongside old one
  ├── Run health check (HTTP + audio device probe)
  │
  ├── ✅ Health check passes → Route traffic to new container
  │                           → Stop old container
  │                           → Done
  │
  └── ❌ Health check fails  → Kill new container
                              → Keep old container running
                              → Create Linear issue automatically
                              → Alert via Discord
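
The health check in this flow can be one aggregation over named probes: HTTP 200 means route traffic, anything else means roll back. A sketch with stubbed probes (the probe set itself is an assumption of this design):

```python
# Sketch of the health endpoint Kamal would poll after starting the new
# container. Each probe returns True/False; the endpoint returns 200 only
# when every probe passes.

def check_health(probes: dict) -> tuple[int, dict]:
    """Run all probes; 200 if all pass, 503 otherwise (Kamal rolls back)."""
    results = {name: bool(probe()) for name, probe in probes.items()}
    status = 200 if all(results.values()) else 503
    return status, results

# Stub probes standing in for the real checks
# (audio device present, Gemini session reachable, tool registry loaded):
status, results = check_health({
    "audio_device": lambda: True,
    "gemini_session": lambda: True,
    "tool_dispatch": lambda: False,  # simulate a failing probe
})
print(status)  # 503: one probe failed, so the deploy rolls back
```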


## Testing: Hamming AI + PESQ

The problem: The core product is a voice assistant and there are zero automated tests for voice quality. Audio regressions ship silently.

The solution: Automated voice quality testing using Hamming AI for end-to-end conversation testing and PESQ/MOS metrics for audio signal quality.

### Hamming AI — Conversation Quality

Hamming AI provides automated voice agent testing with 95-96% agreement with human evaluators. It runs synthetic conversations, evaluates response quality, detects regressions, and runs as part of CI/CD.

### PESQ/MOS — Audio Signal Quality

PESQ (Perceptual Evaluation of Speech Quality) provides an objective MOS score. If the score drops below MOS 3.5, the deploy is blocked, catching echo cancellation regressions, audio gain issues, sample-rate conversion bugs, and VAD cutting off speech.

### Deploy Gate

Deploy Pipeline:
  ├── Unit tests pass?          → ✅ Continue / ❌ Block
  ├── Integration tests pass?   → ✅ Continue / ❌ Block
  ├── CodeRabbit review clean?  → ✅ Continue / ❌ Block
  ├── Hamming AI score ≥ 90%?   → ✅ Continue / ❌ Block
  ├── PESQ MOS ≥ 3.5?           → ✅ Continue / ❌ Block
  ├── Human approval?           → ✅ Deploy  / ❌ Block
  └── Health check post-deploy? → ✅ Live    / ❌ Rollback
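
The gate above reduces to an ordered list of checks where the first failure blocks the deploy and names itself. A minimal sketch, with the thresholds mirroring the gate (Hamming ≥ 90%, PESQ MOS ≥ 3.5):

```python
# Deploy gate as ordered checks: evaluate in sequence, first failure blocks.

def deploy_gate(checks):
    """checks: list of (name, passed) in gate order.
    Returns (ok, blocker): blocker names the first failing check."""
    for name, passed in checks:
        if not passed:
            return False, name
    return True, None

hamming_score, pesq_mos = 0.93, 3.2
ok, blocker = deploy_gate([
    ("unit_tests", True),
    ("integration_tests", True),
    ("coderabbit_review", True),
    ("hamming_score", hamming_score >= 0.90),
    ("pesq_mos", pesq_mos >= 3.5),
    ("human_approval", True),
])
print(ok, blocker)  # False pesq_mos: blocked by the audio-quality gate
```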

## Rate Limit Resilience

The problem: Cyrus (Claude Code) hits Anthropic rate limits and goes silent. No fallback, no alert, no queue. Work stops. This single failure mode has cost ~6 hours in one sprint.

The solution: LiteLLM proxy for multi-provider routing, plus operational guardrails.

### LiteLLM Proxy
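
What the proxy buys us, in miniature: try providers in order, fall back on rate limits, and park work when every provider is exhausted. This models the routing behavior in plain Python; it is not LiteLLM's actual API, and the provider stubs and prompt are invented:

```python
# Conceptual sketch of multi-provider routing with graceful degradation.
# Not LiteLLM's API; provider stubs are invented for illustration.

class RateLimited(Exception):
    pass

def route(prompt, providers, queue):
    """providers: ordered list of (name, call). Returns (provider, reply),
    or queues the prompt and returns (None, None) when all are exhausted."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited:
            continue  # fall through to the next provider
    queue.append(prompt)  # park the work item, resume when capacity returns
    return None, None

def anthropic(prompt):
    raise RateLimited()       # simulate: primary is rate-limited

def openai(prompt):
    return f"ok: {prompt}"    # fallback answers

queue = []
provider, reply = route("triage issue", [("anthropic", anthropic), ("openai", openai)], queue)
print(provider)  # openai: routed around the rate limit
```

In production this logic lives in the proxy's configuration (fallback lists and budget caps), so application code only ever talks to one endpoint.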

### Operational Guardrails

| Strategy | Implementation |
| --- | --- |
| Off-peak scheduling | Batch work runs during off-peak hours (2-6am PT) |
| Token budget alerts | Daily budget cap per consumer. At 80%, alert. At 100%, queue non-critical work. |
| Graceful degradation | When rate-limited: queue work items in Linear, notify via Discord, resume when capacity returns |
| Provider diversity | Critical path (voice) uses Gemini. Development uses Anthropic/OpenAI via LiteLLM. |

## Monitoring: Sentry + Grafana

The problem: When something breaks, the detection mechanism is "Paul notices." There's no crash reporting, no metrics dashboard, no automated alerting.

### Sentry — Error Tracking

Sentry (free tier: 5K errors/month) provides automatic crash detection with full context, auto-created Linear issues for new error classes, and release tracking per Kamal deploy.

### Grafana — Operational Dashboard

| Metric | Source | Alert Threshold |
| --- | --- | --- |
| Voice response latency (P50, P95, P99) | Pipecat metrics | P95 > 2s |
| Audio quality (PESQ MOS) | Test pipeline | MOS < 3.5 |
| Gemini session reconnects/hour | Pipecat logs | > 6/hour |
| Tool dispatch success rate | Application logs | < 95% |
| Echo cancellation effectiveness | AEC metrics | Residual echo > -40dB |
| Rate limit events/hour | LiteLLM proxy | > 5/hour |
| Memory/CPU utilization | System metrics | CPU > 80% sustained |
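
Each row is a threshold rule over a metric window. The latency alert, for instance, is just a percentile compare; a dependency-free sketch (nearest-rank percentile, sample values invented):

```python
# One alert rule from the table: P95 voice-response latency vs the 2 s threshold.

def percentile(samples, p):
    """Nearest-rank percentile (simple, dependency-free)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_s = [0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.4]
p95 = percentile(latencies_s, 95)
alert = p95 > 2.0
print(p95, alert)  # 2.4 True: the tail latency trips the alert
```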

### Alert Flow

Error detected (Sentry/Grafana)
        │
        ▼
  Create Linear issue (automated)
        │
        ▼
  Send Discord notification
        │
        ▼
  If critical (voice assistant down):
    → Page Paul via Discord DM
    → Auto-rollback via Kamal if health check fails

# Human-in-the-Loop Gates

The contrarian research is unambiguous: zero-human-review is not viable for AI-generated code. 88% of AI agent projects fail before production. AI PRs carry 1.7x more issues. 43% of AI patches that pass CI introduce new failures under adversarial conditions.

This doesn't mean AI coding agents are useless — it means they need guardrails. Here are the five non-negotiable human gates:

## Architecture Decisions

**Who:** Paul · **When:** Before any structural change

AI agents optimize locally. They'll refactor a function beautifully while introducing a dependency that breaks the deployment model. Architecture requires system-level thinking that current AI can't reliably provide.

## Security-Sensitive Changes

**Who:** Paul · **When:** Any change touching auth, secrets, permissions

AI-generated code has security bugs at 1.5-2x the human rate. For a system that controls smart home devices and has access to Linear/OpenClaw, a security regression isn't just a bug — it's a liability.

## Production Deploy Approval

**Who:** Paul · **When:** After all automated checks pass

The final human checkpoint. A 30-second review of "what changed, do I trust it?" This is the cheapest gate with the highest expected value.

## Test Scenario Authoring

**Who:** Paul · **When:** New features or gaps in test coverage

AI can't test what it doesn't know is wrong. The subtle, domain-specific scenarios that catch real bugs require human authorship. AI can write the tests, but humans must design what to test.

## Weekly Code Quality Audit

**Who:** Paul · **When:** 30 minutes, Monday morning

Drift happens slowly. A weekly scan of merged PRs, error trends, and code quality metrics catches the gradual degradation that no individual PR review would flag.

# Migration Path

Four phases, one week each. Each phase is independently valuable — if we stop after Phase 1, we're still better off.

## Phase 1 — Week 1: Cloud CI + Linear Agent API + CodeRabbit

Goal: Fix the development pipeline. Stop the bleeding.

| Task | Effort | Dependency |
| --- | --- | --- |
| Migrate CI to GitHub cloud runners | 4h | None |
| Keep self-hosted runner for audio tests only | 2h | CI migration |
| Integrate Linear Agent API (replace webhook scraping) | 8h | None |
| Set up CodeRabbit on the repo | 1h | None |
| Fix PM cron to target Linear (not GitHub) | 1h | Linear API |
| Clean up zombie PRs | 2h | None |

Success criteria: CI runs in the cloud. Linear issues update via structured API. Every PR gets automated review. Zero zombie PRs.

## Phase 2 — Week 2: Pipecat Voice Pipeline Migration

Goal: Replace the custom audio pipeline with Pipecat.

| Task | Effort | Dependency |
| --- | --- | --- |
| Set up Pipecat with LocalTransport | 8h | None |
| Integrate GeminiMultimodalLiveLLMService | 8h | Pipecat setup |
| Wire aec3-rs into Pipecat pipeline (PyO3) | 6h | Pipecat setup |
| Migrate tool dispatch to Pipecat async pattern | 4h | Gemini integration |
| Implement session reconnection with resumption tokens | 4h | Gemini integration |
| Integration test: full voice conversation loop | 4h | All above |

Success criteria: Full voice conversation via Pipecat with equivalent or better quality. Tool calls, echo cancellation, and barge-in all working.

## Phase 3 — Week 3: Kamal Deployment + Hamming AI Testing

Goal: Zero-touch deployment with automated quality gates.

| Task | Effort | Dependency |
| --- | --- | --- |
| Dockerize morty-voice | 4h | Phase 2 complete |
| Set up Kamal config for Mac mini | 4h | Docker |
| Implement health checks (audio device, Gemini, tools) | 4h | Docker |
| Create synthetic voice test corpus | 4h | None |
| Integrate Hamming AI for conversation testing | 6h | Test corpus |
| Set up PESQ scoring in CI pipeline | 4h | Test corpus |
| Wire deploy gate (all checks must pass) | 2h | All above |

Success criteria: git push → full test suite → human approval → deploy → health check → live. Automatic rollback if anything fails.

## Phase 4 — Week 4: Sentry Integration + Monitoring Dashboard

Goal: Know when things break before Paul does.

| Task | Effort | Dependency |
| --- | --- | --- |
| Integrate Sentry SDK into morty-voice | 2h | None |
| Configure Sentry → Linear issue creation | 2h | Sentry |
| Set up LiteLLM proxy | 4h | None |
| Configure multi-provider routing + budget alerts | 4h | LiteLLM |
| Build Grafana dashboard (key metrics) | 6h | Sentry, LiteLLM |
| Configure alerting (Discord notifications) | 2h | Grafana |
| Document runbook for common failures | 4h | All above |

Success criteria: Errors auto-create tickets. Rate limits trigger fallback routing. Dashboard shows voice quality, latency, and error rates in real time.

# Cost Analysis

## Current Monthly Costs

| Item | Cost |
| --- | --- |
| Anthropic Max (20x plan) | $200 |
| GitHub Actions | $0 |
| Gemini API | ~$10 |
| Paul's time (babysitting) | Priceless |
| **Total** | **~$210/mo** |

## Proposed Monthly Costs

| Item | Cost |
| --- | --- |
| Anthropic Max (20x plan) | $200 |
| GitHub Actions (cloud) | ~$10 |
| Gemini API | ~$10 |
| CodeRabbit (Pro) | $24 |
| Hamming AI | ~$50 |
| Sentry / Grafana / Kamal / LiteLLM | $0 |
| **Total** | **~$294/mo** |

**Additional monthly cost: +$84/mo**

This eliminates 50+ human interventions, ~6 hours of lost productivity per sprint, silent failures, manual deploys, and zombie PRs.
At Paul's effective hourly rate, the 6 hours saved per sprint pays for ~18 months of additional tooling — every single sprint.
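
The totals above are simple sums; checking the arithmetic:

```python
# Sanity check on the cost tables: proposed total and the monthly delta.

current = {"anthropic_max": 200, "github_actions": 0, "gemini_api": 10}
proposed = {"anthropic_max": 200, "github_actions": 10, "gemini_api": 10,
            "coderabbit": 24, "hamming_ai": 50}  # Sentry/Grafana/Kamal/LiteLLM: $0

delta = sum(proposed.values()) - sum(current.values())
print(sum(proposed.values()), delta)  # 294 84
```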

# Risk Assessment

The contrarian research demands we acknowledge what could go wrong. Enthusiasm doesn't ship software; realism does.

## Risk 1: Pipecat Immaturity

**Likelihood:** Medium · **Impact:** High

Pipecat is early-stage. Breaking changes, incomplete documentation, and edge cases in Gemini Live integration could stall Phase 2.

Mitigation: Pin Pipecat version. Maintain ability to fall back to v1 pipeline for 30 days post-migration. Keep the Rust audio path compilable as an escape hatch.

## Risk 2: Over-Automation Leading to Quality Erosion

**Likelihood:** Medium · **Impact:** High

43% of AI patches that pass CI introduce new failures. More automation doesn't fix this if the test suite doesn't cover the right scenarios.

Mitigation: Human-authored test scenarios. Weekly code quality audit. PESQ regression detection as a hard deploy gate.

## Risk 3: Tooling Sprawl

**Likelihood:** Medium · **Impact:** Medium

Six new systems to maintain. That's cognitive overhead.

Mitigation: Every tool has a free tier or is open source. If any tool creates more problems than it solves, rip it out. The architecture is modular.

## Risk 4: Docker Overhead on Mac mini

**Likelihood:** Low · **Impact:** Medium

Running in Docker on macOS adds virtualization overhead. Audio device passthrough is not trivial.

Mitigation: Test early in Phase 3. Kamal can deploy to bare metal with a process manager if Docker audio is problematic.

## Risk 5: Cost Creep from AI Services

**Likelihood:** High · **Impact:** Medium

One unsupervised AI agent burned $5,623 in a month. Usage-based pricing can surprise you.

Mitigation: Hard budget caps in LiteLLM. Monthly cost review. Every service has known, bounded costs.

## Risk 6: Single-Machine Dependency

**Likelihood:** Low · **Impact:** Critical

Everything runs on one Mac mini. Hardware failure means total outage.

Mitigation: Accepted risk for a personal voice assistant. Kamal's Docker setup makes migration to a new machine a single command. Back up configuration and secrets offsite.

# Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                         MORTY-VOICE v2 — FULL ARCHITECTURE                      │
└─────────────────────────────────────────────────────────────────────────────────┘

  VOICE PIPELINE (Runtime — Mac mini)
  ═══════════════════════════════════

  Paul speaks         ┌──────────────────────────────────────────────┐
       │              │            Pipecat Pipeline                  │
       ▼              │                                              │
  ┌──────────┐        │  ┌────────┐   ┌─────────┐   ┌───────────┐  │
  │ Shure    │───16kHz──▶│ aec3-rs│──▶│ Silero  │──▶│ Gemini    │  │
  │ MV7+ Mic │        │  │ (AEC)  │   │  VAD    │   │ 3.1 Flash │  │
  └──────────┘        │  └────────┘   └─────────┘   │   Live    │  │
                      │       ▲                      └─────┬─────┘  │
  ┌──────────┐        │       │                            │        │
  │ Built-in │◀─24kHz─│───────┼────────────────────────────┤        │
  │ Speakers │        │       │                            │        │
  └──────────┘        │  Speaker output                    │        │
       │              │  (reference signal                 ▼        │
  Paul hears          │   for AEC)              ┌──────────────┐   │
                      │                         │ Async Tool   │   │
                      │                         │  Dispatch    │   │
                      │                         └──────┬───────┘   │
                      └────────────────────────────────┼───────────┘
                                                       │
                                    ┌──────────────────┼──────────────────┐
                                    │                  │                  │
                                    ▼                  ▼                  ▼
                              ┌──────────┐      ┌──────────┐      ┌──────────┐
                              │ Philips  │      │  Linear  │      │ OpenClaw │
                              │   Hue    │      │   API    │      │ (Claude) │
                              └──────────┘      └──────────┘      └──────────┘


  DEVELOPMENT PIPELINE (CI/CD)
  ════════════════════════════

  ┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────┐
  │  Cyrus   │───▶│  GitHub  │───▶│ GitHub       │───▶│ CodeRab  │
  │ (Claude  │ PR │   Repo   │    │ Actions      │    │  bit     │
  │  Code)   │    │          │    │ (Cloud)      │    │ Review   │
  └──────────┘    └──────────┘    │              │    └────┬─────┘
       ▲               │          │ • Lint/fmt   │         │
       │               │          │ • Unit tests │         │
  ┌──────────┐         │          │ • Int. tests │         ▼
  │ LiteLLM  │         │          │ • PESQ/MOS   │    ┌──────────┐
  │  Proxy   │         │          │ • Hamming AI │    │  Human   │
  │(fallback)│         │          └──────────────┘    │ Approval │
  └──────────┘         │                              └────┬─────┘
                       │                                   │
                       │          ┌──────────────┐         │
                       │          │  Self-hosted  │         │
                       └─────────▶│  Runner (Mac) │         │
                                  │ Audio HW tests│         │
                                  └──────────────┘         │
                                                           ▼
  DEPLOYMENT                                        ┌──────────┐
  ══════════                                        │  Kamal   │
                                                    │  Deploy  │
  ┌──────────────────────────────┐                  └────┬─────┘
  │          Mac mini            │                       │
  │  ┌────────────┐  ┌────────┐ │◀── SSH ────────────────┘
  │  │ morty-voice│  │  old   │ │
  │  │ (new)      │  │ (kept  │ │    Health Check:
  │  │            │  │  until │ │    ✅ → swap & go live
  │  │ Docker     │  │  new   │ │    ❌ → rollback, create
  │  │ container  │  │  is    │ │         Linear issue,
  │  │            │  │  live) │ │         alert Discord
  │  └────────────┘  └────────┘ │
  └──────────────────────────────┘


  MONITORING
  ══════════

  ┌────────────┐         ┌────────────┐         ┌────────────┐
  │   Sentry   │────────▶│   Linear   │         │  Discord   │
  │ (errors)   │ auto-   │  (issues)  │         │  (alerts)  │
  └────────────┘ create  └────────────┘         └────────────┘
       ▲                                              ▲
       │              ┌────────────┐                  │
       └──────────────│  Grafana   │──────────────────┘
                      │ (metrics)  │   threshold
                      │            │   alerts
                      │ • Latency  │
                      │ • MOS      │
                      │ • Sessions │
                      │ • Errors   │
                      └────────────┘


  LINEAR INTEGRATION
  ══════════════════

  ┌──────────┐  Agent API   ┌──────────┐  assign   ┌──────────┐
  │  Morty   │─────────────▶│  Linear  │──────────▶│  Cyrus   │
  │  (PM)    │  structured  │  Agent   │  issues   │  (Dev)   │
  │          │◀─────────────│   API    │◀──────────│          │
  └──────────┘  status      └──────────┘  updates  └──────────┘

# Appendix: Key Technical Specs

| Parameter | Value | Source |
| --- | --- | --- |
| Gemini input sample rate | 16kHz mono PCM | Gemini Live API docs |
| Gemini output sample rate | 24kHz mono PCM | Gemini Live API docs |
| Gemini session lifetime | 10-15 minutes | Gemini Live API docs |
| AEC algorithm | WebRTC AEC3 (via aec3-rs) | WebRTC project |
| VAD model | Silero VAD v5 | Silero models |
| PESQ quality threshold | MOS ≥ 3.5 (Good) | ITU-T P.862 |
| Hamming AI accuracy | 95-96% vs human evaluators | Hamming AI benchmarks |
| CodeRabbit pricing | $12-30/user/month | CodeRabbit.ai |
| Sentry free tier | 5,000 errors/month | Sentry pricing |
| Kamal | Free (open source, MIT) | kamal-deploy.org |

This document is a living proposal. It will be updated as implementation progresses and assumptions are validated or invalidated. The architecture is designed to be modular — every component can be replaced independently without rebuilding the system.