Voice Messages
Enable speech-to-text for voice notes using Groq's free Whisper API
Overview
Send voice messages to your OpenClaw instance instead of typing. OpenClaw transcribes each voice note and responds as if you had typed the message.
Supported channels:
- Telegram voice notes
- WhatsApp voice messages
- Slack voice clips
- Any chat platform that supports audio attachments
This guide covers setting up speech-to-text using Groq's free Whisper API, which gives you 8 hours of transcription per day at no cost.
Why Groq Whisper?
Groq runs OpenAI's Whisper model on its custom LPU chips, which makes transcription very fast, and it offers a generous free tier:
| Feature | Value |
|---|---|
| Free daily quota | 8 hours of audio |
| Speed | ~200x real-time (1 min audio transcribes in under 1 sec) |
| Languages | 50+ languages, auto-detected |
| Cost | Free tier, then $0.04/hour |
| API format | OpenAI-compatible |
For comparison, OpenAI Whisper costs $0.006/minute ($0.36/hour). Groq's free tier covers most personal use cases entirely.
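The arithmetic behind that comparison can be sketched in a few lines. The rates are the ones quoted above; the 30 minutes of voice notes per day is an assumed usage figure, and the sketch assumes (as the table suggests) that audio beyond the free daily quota is billed at the paid rate.

```python
# Rates quoted in this guide (USD). The usage figure below is an illustrative assumption.
OPENAI_PER_MINUTE = 0.006
GROQ_PAID_PER_HOUR = 0.04
GROQ_FREE_HOURS_PER_DAY = 8


def monthly_cost(minutes_per_day: float, days: int = 30) -> dict:
    """Estimate monthly transcription cost for a steady daily usage."""
    hours_per_day = minutes_per_day / 60
    # Assumption: only audio beyond the free daily quota is billed.
    billable_hours = max(0.0, hours_per_day - GROQ_FREE_HOURS_PER_DAY)
    return {
        "openai_usd": round(minutes_per_day * OPENAI_PER_MINUTE * days, 2),
        "groq_usd": round(billable_hours * GROQ_PAID_PER_HOUR * days, 2),
    }


print(monthly_cost(30))
```

At 30 minutes per day the Groq figure stays at zero, because daily usage is far below the 8-hour quota.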
Step 1: Get Your Groq API Key
- Go to console.groq.com
- Sign up with Google, GitHub, or email
- Navigate to API Keys in the left sidebar
- Click Create API Key
- Name it something memorable (e.g., "openclaw-voice")
- Copy the key immediately - you won't see it again
- Store it securely (password manager recommended)
Your key will look like: gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Step 2: Configure OpenClaw
Choose your preferred method:
For Agents (Let OpenClaw Configure It)
The easiest way - just ask OpenClaw to set it up for you:
Enable voice message transcription using my Groq API key: gsk_your_key_here
Use the whisper-large-v3-turbo model.
OpenClaw will update the configuration automatically. Skip to Step 3.
For Humans (Manual Configuration)
Connect to your instance via Web Terminal or SSH.
Option A: Edit the config file
nano ~/.openclaw/openclaw.json
Add or update the tools.media.audio section:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" }
        ]
      }
    }
  }
}
Save and exit (Ctrl+X, then Y, then Enter).
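If your config already has other settings, the audio section needs to be merged in rather than pasted over them. Here is a minimal sketch of that merge on an in-memory dictionary. The `image` and `other` keys are hypothetical placeholders, and because the real file may be JSON5 with comments, Python's `json` module may not be able to read it directly.

```python
import copy
import json

# The audio section from this guide.
AUDIO_SECTION = {
    "enabled": True,
    "models": [{"provider": "groq", "model": "whisper-large-v3-turbo"}],
}


def with_audio(config: dict) -> dict:
    """Return a copy of config with tools.media.audio set, keeping other keys."""
    merged = copy.deepcopy(config)
    media = merged.setdefault("tools", {}).setdefault("media", {})
    media["audio"] = copy.deepcopy(AUDIO_SECTION)
    return merged


# Hypothetical existing config, for illustration only.
existing = {"tools": {"media": {"image": {"enabled": True}}}, "other": 1}
print(json.dumps(with_audio(existing), indent=2))
```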
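Option B: Set environment variable
Add your Groq API key to the environment. The config in Option A does not contain the key itself, and the Troubleshooting table checks this variable, so set it even if you also edited the config file: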
echo 'export GROQ_API_KEY="gsk_your_key_here"' >> ~/.bashrc
source ~/.bashrc
Then restart OpenClaw:
openclaw restart
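To confirm the key is visible in your shell, a small check like the one below can help. It only relies on the `GROQ_API_KEY` variable name and the `gsk_` prefix shown earlier in this guide; the function name and status strings are made up for illustration.

```python
import os


def check_groq_key(env=os.environ) -> str:
    """Report whether GROQ_API_KEY is present and has the expected prefix."""
    key = env.get("GROQ_API_KEY", "")
    if not key:
        return "missing"
    if not key.startswith("gsk_"):
        return "unexpected format"
    return "ok"


print(check_groq_key())
```

A result of "missing" in a fresh terminal usually means the `export` line did not reach your shell profile or the profile has not been reloaded.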
Step 3: Test It
- Open Telegram and find your OpenClaw bot
- Tap and hold the microphone icon to record a voice note
- Say something like: "What can you help me with today?"
- Send the voice note
- OpenClaw should respond to your spoken message
If it works, you're all set! If not, check the Troubleshooting section below.
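If the bot stays silent, it can be useful to test the Groq key on its own, without OpenClaw in the loop. The sketch below builds a request for Groq's OpenAI-compatible transcription endpoint using only the standard library. Treat it as an illustration under assumptions: the `User-Agent` value is a made-up client name, the generic `application/octet-stream` content type assumes the service identifies the format from the file name, and nothing here reflects how OpenClaw itself calls the API.

```python
import json
import os
import urllib.request
import uuid

GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_request(audio: bytes, filename: str, api_key: str,
                  model: str = "whisper-large-v3-turbo") -> urllib.request.Request:
    """Build a multipart/form-data POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        GROQ_URL,
        data=model_part + file_header + audio + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Hypothetical client name; some gateways reject urllib's default agent.
            "User-Agent": "openclaw-voice-check/0.1",
        },
    )


def transcribe(path: str) -> str:
    """Send one audio file to Groq and return the transcript text."""
    with open(path, "rb") as f:
        req = build_request(f.read(), os.path.basename(path), os.environ["GROQ_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["text"]
```

Calling `transcribe("note.ogg")` with a saved voice note (the file name is just an example) should return the spoken text if the key and quota are fine, which narrows any remaining problem down to the OpenClaw side.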
Telegram Voice Note Tips
Since Telegram is the most popular way to use voice with OpenClaw, here are some tips:
Recording:
- Tap and hold the mic icon to record
- Slide up to lock recording mode (hands-free)
- Slide left to cancel
Mention detection in groups: In group chats, OpenClaw transcribes the audio before checking for @mentions, so you can say:
"Hey @YourBot, what's the weather in New York?"
...and it will work, even with requireMention: true enabled.
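The ordering is the important part: the mention check runs on the transcript, not on the (empty) message text. Here is a rough sketch of that flow. The function name, the injected `transcribe` callable, and the simple case-insensitive name match are all assumptions for illustration, not OpenClaw's actual implementation.

```python
from typing import Callable, Optional


def handle_group_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    bot_name: str,
    require_mention: bool = True,
) -> Optional[str]:
    """Return the transcript to answer, or None if the note should be ignored."""
    # Transcription happens first, so the mention check sees the spoken words.
    transcript = transcribe(audio)
    if require_mention and bot_name.lower() not in transcript.lower():
        return None
    return transcript


# Stand-in for a real speech-to-text call.
def fake_transcribe(audio: bytes) -> str:
    return "Hey YourBot, what's the weather in New York?"


print(handle_group_voice_note(b"", fake_transcribe, "YourBot"))
```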
Language: Whisper auto-detects language, but for better accuracy with a specific language, you can set it in config:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo", "language": "en" }
        ]
      }
    }
  }
}
Model Options
Groq offers two Whisper models:
| Model | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
| whisper-large-v3-turbo | Faster | Good (12% WER) | $0.04/hr | Daily use, most users |
| whisper-large-v3 | Slower | Best (10.3% WER) | $0.111/hr | Accents, noisy audio, accuracy-critical |
Recommendation: Start with whisper-large-v3-turbo. Switch to whisper-large-v3 if you notice transcription issues.
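To get a feel for what the table's numbers mean in practice, the sketch below turns the word error rates (WER) into expected misheard words and compares the hourly prices. It uses only the figures from the table; the 1,000-word message length is an arbitrary example.

```python
# Figures from the table above.
MODELS = {
    "whisper-large-v3-turbo": {"wer": 0.12, "usd_per_hour": 0.04},
    "whisper-large-v3": {"wer": 0.103, "usd_per_hour": 0.111},
}


def expected_errors(model: str, words: int) -> int:
    """Approximate number of wrong words in a transcript of the given length."""
    return round(MODELS[model]["wer"] * words)


def price_ratio() -> float:
    """How many times more the larger model costs per hour of audio."""
    return MODELS["whisper-large-v3"]["usd_per_hour"] / MODELS["whisper-large-v3-turbo"]["usd_per_hour"]


for name in MODELS:
    print(name, expected_errors(name, 1000))
print(f"price ratio: {price_ratio():.2f}x")
```

On these figures the larger model saves roughly 17 errors per 1,000 words at a little under three times the price.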
Free Tier Limits
Groq's free tier is generous for personal use:
| Limit | Value | What It Means |
|---|---|---|
| Requests per minute | 20 | ~20 voice notes per minute |
| Requests per day | 2,000 | ~2,000 voice notes per day |
| Audio seconds per hour | 7,200 | 2 hours of audio per hour |
| Audio seconds per day | 28,800 | 8 hours of audio per day |
| Max file size | 25 MB | ~20-30 minutes per voice note |
For most personal use, you'll never hit these limits. If you do, Groq's paid tier is very affordable.
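If you want to check your own usage against these limits, a short calculation is enough. The sketch below uses the limits from the table. The usage numbers passed in are assumptions you supply, and the 128 kbps bitrate in the example is an assumed value chosen only to show how the 25 MB cap translates into minutes.

```python
# Limits from the table above.
LIMITS = {
    "requests_per_day": 2000,
    "audio_seconds_per_hour": 7200,
    "audio_seconds_per_day": 28800,
}
MAX_FILE_BYTES = 25 * 1024 * 1024


def exceeded_limits(notes_per_day: int, avg_seconds: float, peak_notes_per_hour: int) -> list:
    """Return the names of the limits a usage pattern would exceed."""
    usage = {
        "requests_per_day": notes_per_day,
        "audio_seconds_per_day": notes_per_day * avg_seconds,
        "audio_seconds_per_hour": peak_notes_per_hour * avg_seconds,
    }
    return [name for name, used in usage.items() if used > LIMITS[name]]


def max_minutes(max_bytes: int, kbps: float) -> float:
    """Longest recording that fits in max_bytes at a constant bitrate."""
    return max_bytes * 8 / (kbps * 1000) / 60


print(exceeded_limits(notes_per_day=50, avg_seconds=30, peak_notes_per_hour=10))
print(round(max_minutes(MAX_FILE_BYTES, kbps=128), 1))
```

Fifty 30-second notes a day come to 25 minutes of audio, which is a small fraction of the daily quota.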
Advanced Configuration
Here's a more complete configuration with fallbacks and scope control:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "maxBytes": 20971520,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" },
          { "provider": "openai", "model": "whisper-1" }
        ],
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "deny", "match": { "chatType": "group" } }
          ]
        }
      }
    }
  }
}
Note: OpenClaw's config file uses JSON5, so comments are allowed. The examples here are plain JSON for clarity.
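One way to read the `scope` block is as a list of rules checked in order, with `default` applied when nothing matches. The sketch below models that reading. First-match-wins is an assumption, and the `"direct"` chat type is a hypothetical value; only `"group"` appears in this guide.

```python
def scope_action(scope: dict, chat: dict) -> str:
    """Return "allow" or "deny" for a chat, using the first matching rule."""
    for rule in scope.get("rules", []):
        conditions = rule.get("match", {})
        # A rule matches when every field it names equals the chat's value.
        if all(chat.get(key) == value for key, value in conditions.items()):
            return rule["action"]
    return scope.get("default", "allow")


# The scope block from the configuration above.
scope = {
    "default": "allow",
    "rules": [{"action": "deny", "match": {"chatType": "group"}}],
}

print(scope_action(scope, {"chatType": "group"}))
print(scope_action(scope, {"chatType": "direct"}))
```

Under this reading, the example configuration transcribes voice notes everywhere except group chats.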
Configuration options:
| Option | Description | Default |
|---|---|---|
| enabled | Enable/disable voice transcription | true (auto-detect) |
| maxBytes | Max audio file size in bytes | 20MB |
| models | Ordered list of transcription providers | Auto-detect |
| scope | Control which chats allow voice | All allowed |
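Because `models` is an ordered list, a natural reading is that each provider is tried in turn until one succeeds, after the `maxBytes` check. Here is a minimal sketch of that behavior with stand-in backends. The function names and the error handling are assumptions for illustration, not OpenClaw's internals.

```python
from typing import Callable, Dict, List

Backend = Callable[[bytes, str], str]


def transcribe_with_fallback(audio: bytes, models: List[dict],
                             backends: Dict[str, Backend],
                             max_bytes: int = 20971520) -> str:
    """Try each configured model in order and return the first transcript."""
    if len(audio) > max_bytes:
        raise ValueError(f"audio is {len(audio)} bytes, limit is {max_bytes}")
    errors = []
    for entry in models:
        try:
            return backends[entry["provider"]](audio, entry["model"])
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{entry['provider']}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in backends: the first simulates a failure, the second succeeds.
def groq_down(audio: bytes, model: str) -> str:
    raise TimeoutError("rate limit exceeded")


def openai_ok(audio: bytes, model: str) -> str:
    return f"transcript via {model}"


models = [
    {"provider": "groq", "model": "whisper-large-v3-turbo"},
    {"provider": "openai", "model": "whisper-1"},
]
print(transcribe_with_fallback(b"audio", models, {"groq": groq_down, "openai": openai_ok}))
```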
Troubleshooting
| Issue | Solution |
|---|---|
| No transcription | Verify GROQ_API_KEY is set: echo $GROQ_API_KEY |
| "Rate limit exceeded" | Wait 1 minute, or check daily quota at console.groq.com |
| "File too large" | Record shorter voice notes. Groq accepts files up to 25 MB (~20-30 min); OpenClaw's maxBytes defaults to 20 MB |
| Inaccurate transcription | Switch to whisper-large-v3 for better accuracy |
| Works in DMs, not groups | Check requireMention setting or scope rules |
| Garbled output | Reduce background noise, speak clearly |
Check logs for details:
openclaw logs --tail 50
Look for lines containing audio or transcription to diagnose issues.
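If the log output is long, a small filter can pull out just the relevant lines. The sketch below keeps lines that mention audio or transcription, ignoring case. The sample lines are made up for illustration; real OpenClaw log output will look different.

```python
import re

KEYWORDS = re.compile(r"audio|transcription", re.IGNORECASE)


def relevant_lines(lines):
    """Keep only log lines that mention audio or transcription."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


# Made-up sample lines, not real OpenClaw log output.
sample = [
    "12:00:01 gateway started\n",
    "12:00:07 audio attachment received\n",
    "12:00:08 Transcription failed: rate limit\n",
]
for line in relevant_lines(sample):
    print(line)
```

To filter real output, you could pass `sys.stdin` to `relevant_lines` and pipe the `openclaw logs --tail 50` command into the script.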
Next Steps
- Web Terminal - Configure OpenClaw from your browser
- Auth-Protected Apps - Share your apps securely
- Security Layers - Understand your instance's security
You can now talk to your AI agent. Send a voice note, get an intelligent response - no typing required.