Voice Tasks
Siri for Sites using MCP
TL;DR
Let people talk to your web app. The SDK records, transcribes in under a second, and dispatches your MCP tools. You only define what each tool does on the server.
Over the past two weeks I built a proof of concept showing how MCP slots into any website, especially sites with complex workflows or e-commerce flows where a single call to action limits cross-sell opportunities.
Voice intent has mostly lived on phones with Siri, or desktop apps that transcribe voice into prompts (my daily workflow is Superwhisper to Cursor). I wanted to know if the same back-and-forth could happen inside the browser: not just recording audio, but parsing intent and executing it Siri-style.
MCP ended up being the missing piece. On the web you can blend user context (current page), recent history (last action), and chained tasks ("do this, then that"). That lets someone say "buy this", "take me back", or "add credit, then send me a receipt as PDF", without learning your UI first.
Here I share two demos: the maze below shows raw voice control moving a ball to the goal (try to win, see what happens), and the video above walks through how I navigate the Memoreco prototype, add credit, and send links to recorded videos just by speaking. Both use the same SDK.
On the backend I analyze every attempted command, cluster the misses, and show you which MCP tools to add next. The goal is to keep improving the voice catalog based on what people say.
I am adding more use cases soon. If you want updates, hop on the waitlist or say hi @andupoto.
Hands-on Demo
Move the ball with your voice
Hold P and say "move the ball two boxes down and one left", and watch the MCP move the ball.
Suggested prompts
- "Move the ball three boxes up."
- "Move the ball two boxes right."
- "Move one box down."
Under the hood the maze is powered by the same stack you see on Memoreco's dashboard: the SDK opens an audio-only session, streams it to Speechmatics for sub-second transcription, hands the text to Groq for intent parsing, and calls an MCP server you configure.
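The flow above can be sketched as five small functions. This is a self-contained illustration, not the SDK's actual code: the transcription and intent-parsing stages are local stubs standing in for Speechmatics and Groq, and the tool names are hypothetical.

```typescript
// Hedged sketch of the record → transcribe → parse → execute → result flow.
// Every stage here is a local stub; the real SDK streams to external services.

type Intent = { tool: string; params: Record<string, unknown> };

// Stages 1-2: record and transcribe (stubbed with a fixed transcript).
async function transcribe(_audio: unknown): Promise<string> {
  return "move the ball two boxes down and one left";
}

// Stage 3: turn the transcript into a tool call (a toy rule, not an LLM).
function parseIntent(transcript: string): Intent {
  return /move the ball/.test(transcript)
    ? { tool: "move_ball", params: { raw: transcript } }
    : { tool: "unknown", params: {} };
}

// Stages 4-5: call the MCP server and return its structured reply.
// A real implementation would POST to your configured MCP endpoint.
async function execute(intent: Intent): Promise<{ ok: boolean; tool: string }> {
  return { ok: true, tool: intent.tool };
}

async function runPipeline(): Promise<{ ok: boolean; tool: string }> {
  const transcript = await transcribe(null);
  return execute(parseIntent(transcript));
}
```

The useful property is that each stage hands a plain value to the next, so any stage can be swapped (a different transcriber, a different model) without touching the rest.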
Replay the pipeline timeline
The simulator below replays the five pipeline stages. Click any step to inspect the SDK's state.
- Go to recordings
- Create a recording request and send it to john@example.com
- Take me to billing and add $10 credit
- Take me back
Pipeline
1. Record: the SDK opens an audio-only session and records a short clip.
2. Transcribe: streaming speech-to-text returns words within a few hundred milliseconds.
3. Parse: AI Assist extracts parameters and the LLM selects a tool.
4. Execute: your MCP endpoint receives the tool name and parameters.
5. Result: a structured payload comes back to your UI for the next step.
Voice commands feel natural
Voice commands understand context, not just words. The system knows where the user is, which resource they've selected, and any metadata you decide to pass in. So when someone says "share this recording", it already knows what "this" means.
Every interaction is stored in a lightweight history layer. That's how "undo that" and "go back" work: simple lookups paired with the undo logic you provide in your MCP.
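A minimal version of that history layer might look like this. All names here (`CommandHistory`, `undoLast`, the `add_credit` tool) are illustrative, not the SDK's real API; the point is that each executed tool call records an inverse action it can replay later.

```typescript
// Illustrative history layer: each action carries the undo logic
// supplied by the MCP tool that performed it.

type Action = { tool: string; undo: () => void };

class CommandHistory {
  private stack: Action[] = [];

  push(action: Action): void {
    this.stack.push(action);
  }

  // "undo that": pop the most recent action and run its undo logic.
  undoLast(): boolean {
    const last = this.stack.pop();
    if (!last) return false;
    last.undo();
    return true;
  }
}

// Example: adding credit records its inverse alongside the call.
const history = new CommandHistory();
let credit = 0;
credit += 10;
history.push({ tool: "add_credit", undo: () => { credit -= 10; } });
history.undoLast(); // credit is back to 0
```

Because the undo closure comes from your MCP tool, the SDK never needs to know what "undo" means for your domain; it only replays what you registered.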
As shown in the dashboard walkthrough, the LLM turns speech into a sequence of actions, the SDK pauses when confirmation's needed, and each MCP response can trigger the next step.
Quick start in code
The SDK is currently published as @memoreco/memoreco-js@0.2.3. Drop it into your app, point it at your MCP endpoints, and listen for the structured replies. Reach out for API access.
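As a rough sketch of the shape integration takes, here is a self-contained stand-in. The class and method names (`VoiceTasks`, `onToolResult`, `mcpEndpoint`) are assumptions for illustration, not the published `@memoreco/memoreco-js` API, and the client below is a local stub rather than an import of the real package.

```typescript
// Hypothetical integration shape: configure an MCP endpoint, then
// listen for structured tool results. All names are illustrative.

type ToolResult = { tool: string; payload: unknown };
type Listener = (result: ToolResult) => void;

// Local stand-in for the SDK client so this sketch runs on its own.
class VoiceTasks {
  private listeners: Listener[] = [];
  constructor(readonly config: { mcpEndpoint: string }) {}

  onToolResult(fn: Listener): void {
    this.listeners.push(fn);
  }

  // The real SDK would record, transcribe, and call your MCP server;
  // here we just emit a canned structured reply.
  simulateCommand(tool: string): void {
    const result: ToolResult = { tool, payload: { ok: true } };
    this.listeners.forEach((fn) => fn(result));
  }
}

const client = new VoiceTasks({ mcpEndpoint: "https://example.com/mcp" });
const seen: string[] = [];
client.onToolResult((r) => seen.push(r.tool));
client.simulateCommand("add_credit");
```

The structured reply is what lets your UI react to a voice command the same way it reacts to a button click.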
Multi-language support
If you want multi-language support, flip transcriptionMode to "streaming" and specify the language when you initialise the provider. Everything else comes built-in.
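A config sketch of what that might look like. `transcriptionMode` and `language` come from the text above; the surrounding shape (`ProviderConfig`, `initProvider`, the defaults) is an assumption for illustration.

```typescript
// Illustrative provider config: streaming transcription plus an
// explicit language code. Field names beyond transcriptionMode and
// language are assumptions, not the SDK's documented API.

interface ProviderConfig {
  transcriptionMode: "batch" | "streaming";
  language: string; // BCP 47-style code, e.g. "es" for Spanish
}

function initProvider(overrides: Partial<ProviderConfig> = {}): ProviderConfig {
  // Assumed defaults; override what you need at init time.
  return { transcriptionMode: "batch", language: "en", ...overrides };
}

const spanish = initProvider({ transcriptionMode: "streaming", language: "es" });
```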
Let's add voice commands to your product
The SDK is ready, the MCP templates are reusable, and the Speechmatics transcription is live. DM me and we'll set up Voice Tasks for your site, or share the explainer with your team if you want to spark ideas internally.