Giving My AI Assistant a Body (And Ears)
I had an AI assistant that could write code, manage my calendar, and control my lights. But it couldn't hear me. Time to fix that.
The goal was simple: say "Squidworth, what time is it?" and get an answer through my speakers. The execution involved three failed microphones, dynamic silence detection, and one very confused wake-word engine.
The recording problem: Fixed 5-second clips meant either waiting out the clip after you'd finished speaking or getting cut off mid-sentence. The solution: dynamic recording that stops after 1.5 seconds of silence. I added visual feedback (plus signs while recording, dots during silence) so I knew it was listening.
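The silence-detection loop can be sketched as below. This is a minimal illustration, not the actual code: the chunk size, sample rate, and RMS threshold are assumptions, and the chunks here are plain lists of 16-bit samples rather than a live mic stream.

```python
import math

SAMPLE_RATE = 16000        # assumed mic sample rate (Hz)
CHUNK_SAMPLES = 1024       # samples per chunk read from the mic
SILENCE_RMS = 500          # assumed amplitude threshold for "silence"
SILENCE_STOP_SECS = 1.5    # stop after this much continuous silence

def rms(chunk):
    """Root-mean-square amplitude of a chunk of 16-bit samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def record_until_silence(chunks):
    """Consume audio chunks and return everything recorded,
    stopping once SILENCE_STOP_SECS of continuous silence is seen."""
    chunk_secs = CHUNK_SAMPLES / SAMPLE_RATE
    silent_secs = 0.0
    recorded = []
    for chunk in chunks:
        recorded.append(chunk)
        if rms(chunk) < SILENCE_RMS:
            silent_secs += chunk_secs
            print(".", end="", flush=True)   # dots during silence
            if silent_secs >= SILENCE_STOP_SECS:
                break
        else:
            silent_secs = 0.0
            print("+", end="", flush=True)   # plus signs while speaking
    return recorded
```

The key detail is that the silence counter resets on any loud chunk, so a pause mid-sentence shorter than 1.5 seconds doesn't end the recording.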
The IPC bridge: My Pi runs the voice stack locally. OpenClaw runs... elsewhere. I needed them to talk. Built a lightweight HTTP bridge: Pi sends transcribed audio, OpenClaw sends back AI responses, Pi speaks them through speakers. Session-based tracking with 60-second timeouts. Not elegant, but it works.
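The session bookkeeping behind the bridge might look like the sketch below. Everything here is my own naming (`SessionStore`, `SESSION_TTL`), not OpenClaw's actual API; the clock is injected so the timeout logic is testable without real waiting.

```python
import time
import uuid

SESSION_TTL = 60.0  # seconds of inactivity before a session is dropped

class SessionStore:
    """Tracks active bridge sessions by last-activity time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}  # session_id -> last-activity timestamp

    def open(self):
        """Start a new session and return its id."""
        sid = uuid.uuid4().hex
        self._sessions[sid] = self._clock()
        return sid

    def touch(self, sid):
        """Refresh a session on each request.
        Returns False if it expired or never existed."""
        self._prune()
        if sid not in self._sessions:
            return False
        self._sessions[sid] = self._clock()
        return True

    def _prune(self):
        """Drop any session idle longer than SESSION_TTL."""
        now = self._clock()
        for sid, last in list(self._sessions.items()):
            if now - last > SESSION_TTL:
                del self._sessions[sid]
```

Each HTTP round trip (Pi posts a transcription, OpenClaw replies) would call `touch()` on its session id; a stale id simply forces a fresh session rather than an error.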
The microphone saga: First USB mic was too quiet. Second picked up every keystroke. Third had a hardware mute button I didn't know existed (that was a fun 45 minutes). Finally found one that worked: close enough to hear whispers, far enough to ignore keyboard noise.
The moment of truth: "Squidworth, what time is it?"
Wake word detected. Recording started. Silence detected. Recording stopped. Whisper transcribed it. OpenClaw processed it. Hades voice responded: "It's 3:47 PM."
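That chain of steps is really just a four-stage pipeline. A sketch, with every stage injected as a callable (the stage names are illustrative, not the real functions):

```python
def handle_utterance(record, transcribe, ask, speak):
    """One wake-word-triggered turn: record -> transcribe -> ask -> speak.
    Stages are injected so the pipeline itself stays testable."""
    audio = record()          # dynamic recording, stops on silence
    text = transcribe(audio)  # speech-to-text, e.g. Whisper
    reply = ask(text)         # HTTP bridge over to the assistant
    speak(reply)              # text-to-speech through the speakers
    return reply
```

Keeping the stages decoupled like this meant each one (mic capture, Whisper, the bridge, TTS) could be swapped or tested in isolation.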
My AI assistant now has ears, a voice, and a Raspberry Pi body. The future is weirder than I expected.
Next: Teaching it to control my lights without accidentally triggering during Zoom calls.