Voice Messages

Overview

Send voice messages to your OpenClaw instance and get responses as if you had typed them. Instead of typing, just talk: OpenClaw transcribes your voice note and replies to the transcript the same way it would reply to a typed message.

Supported channels:

Telegram voice notes
WhatsApp voice messages
Slack voice clips
Any chat platform that supports audio attachments

This guide covers setting up speech-to-text using Groq's free Whisper API, which gives you 8 hours of transcription per day at no cost.

Why Groq Whisper?

Groq runs OpenAI's Whisper models on its custom LPU (Language Processing Unit) hardware. This makes transcription very fast, and Groq also offers a generous free tier:

Feature	Value
Free daily quota	8 hours of audio
Speed	~200x real-time (1 minute of audio transcribes in under 1 second)
Languages	50+ languages, auto-detected
Cost	Free tier, then $0.04/hour (whisper-large-v3-turbo)
API format	OpenAI-compatible

For comparison, OpenAI Whisper costs $0.006/minute ($0.36/hour). Groq's free tier covers most personal use cases entirely.
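The price comparison above is simple unit conversion. The sketch below uses the rates quoted in this guide, which you should re-check against current provider pricing. It converts OpenAI's per-minute price to a per-hour price and estimates a month of usage at one hour of voice notes per day. The function and constant names are illustrative only.

```python
# Rates quoted in this guide, in USD; check current provider pricing before relying on them.
OPENAI_WHISPER_PER_MINUTE = 0.006
GROQ_TURBO_PER_HOUR = 0.04
GROQ_FREE_HOURS_PER_DAY = 8


def per_minute_to_per_hour(rate_per_minute: float) -> float:
    """Convert a per-minute price to a per-hour price."""
    return rate_per_minute * 60


def monthly_cost(hours_per_day: float, rate_per_hour: float, days: int = 30) -> float:
    """Cost of transcribing hours_per_day of audio every day for `days` days."""
    return hours_per_day * rate_per_hour * days


usage_hours = 1.0  # example: one hour of voice notes per day
openai_per_hour = per_minute_to_per_hour(OPENAI_WHISPER_PER_MINUTE)
print(f"OpenAI Whisper: ${openai_per_hour:.2f}/hour, "
      f"${monthly_cost(usage_hours, openai_per_hour):.2f}/month")
print(f"Groq turbo at the paid rate: ${monthly_cost(usage_hours, GROQ_TURBO_PER_HOUR):.2f}/month")
print("Fits in Groq's free daily quota:", usage_hours <= GROQ_FREE_HOURS_PER_DAY)
```

At one hour a day, this works out to $10.80 a month on OpenAI versus $1.20 at Groq's paid rate (about 9x cheaper). That hour also fits comfortably inside Groq's 8-hour free daily quota.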

Step 1: Get Your Groq API Key

Go to console.groq.com
Sign up with Google, GitHub, or email
Navigate to API Keys in the left sidebar
Click Create API Key
Name it something memorable (e.g., "openclaw-voice")
Copy the key immediately - you won't see it again
Store it securely (password manager recommended)

Your key will look like: gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Step 2: Configure OpenClaw

Choose your preferred method:

For Agents (Let OpenClaw Configure It)

The easiest way is to ask OpenClaw to set it up for you. Send it a message like this:

Enable voice message transcription using my Groq API key: gsk_your_key_here

Use the whisper-large-v3-turbo model.

OpenClaw will update the configuration automatically. Skip to Step 3.

For Humans (Manual Configuration)

Connect to your instance via Web Terminal or SSH.

Option A: Edit the config file

nano ~/.openclaw/openclaw.json

Add or update the tools.media.audio section:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" }
        ]
      }
    }
  }
}
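If you would rather script this change than edit the file in nano, the sketch below merges the same tools.media.audio section into an existing config and keeps every other key. It assumes the file is plain JSON. Python's json module cannot parse JSON5 extras such as comments, so if your config uses them, edit it by hand instead. The helper name enable_groq_audio and the .bak backup file are hypothetical, not part of OpenClaw.

```python
import copy
import json
from pathlib import Path

# The same section shown in the config example above.
AUDIO_SECTION = {
    "enabled": True,
    "models": [{"provider": "groq", "model": "whisper-large-v3-turbo"}],
}


def enable_groq_audio(config_path: Path) -> dict:
    """Merge AUDIO_SECTION into tools.media.audio, keeping all other settings.

    Assumes the file is plain JSON. JSON5 extras such as comments make
    json.loads raise json.JSONDecodeError, and nothing is written.
    """
    original = config_path.read_text() if config_path.exists() else ""
    config = json.loads(original) if original.strip() else {}
    # setdefault creates missing parent objects without touching existing siblings.
    audio = config.setdefault("tools", {}).setdefault("media", {}).setdefault("audio", {})
    audio.update(copy.deepcopy(AUDIO_SECTION))
    if original:
        # Keep a copy of the previous file next to it before overwriting.
        config_path.with_name(config_path.name + ".bak").write_text(original)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it as `enable_groq_audio(Path.home() / ".openclaw" / "openclaw.json")`, then restart OpenClaw.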

Save and exit (Ctrl+X, then Y, then Enter).

Option B: Set environment variable

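OpenClaw needs your Groq API key to call the transcription API. Add it to the environment as GROQ_API_KEY. Do this even if you edited the config file in Option A, because that snippet sets the model but not the key: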

echo 'export GROQ_API_KEY="gsk_your_key_here"' >> ~/.bashrc
source ~/.bashrc

After making either change, restart OpenClaw so it picks up the new settings:

openclaw restart
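Before or after restarting, you can confirm that the key is visible without printing it in full. This sketch only checks for the gsk_ prefix shown in Step 1 and prints the last four characters. It inspects your current shell's environment, so if OpenClaw is started some other way (for example as a background service), that process may not inherit the variable. The function name is hypothetical.

```python
import os
from typing import Mapping


def check_groq_key(env: Mapping[str, str] = os.environ) -> str:
    """Return a short status for GROQ_API_KEY without revealing the secret."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        return "GROQ_API_KEY is not set in this shell"
    if not key.startswith("gsk_"):
        return "GROQ_API_KEY is set but does not start with gsk_; check for a copy/paste error"
    # Show only the prefix and the last 4 characters.
    return f"GROQ_API_KEY looks OK: gsk_...{key[-4:]}"


print(check_groq_key())
```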

Step 3: Test It

Open Telegram and find your OpenClaw bot
Tap and hold the microphone icon to record a voice note
Say something like: "What can you help me with today?"
Send the voice note
OpenClaw should respond to your spoken message

If it works, you're all set! If not, check the Troubleshooting section below.

Telegram Voice Note Tips

Since Telegram is the most popular way to use voice with OpenClaw, here are some tips:

Recording:

Tap and hold the mic icon to record
Slide up to lock recording mode (hands-free)
Slide left to cancel

Mention detection in groups: In group chats, OpenClaw transcribes a voice note before checking it for @mentions, so you can say:

"Hey @YourBot, what's the weather in New York?"

...and it will work, even with requireMention: true enabled.
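The order matters because raw audio contains no text for a mention check to search. The following is a simplified, hypothetical sketch of that ordering, not OpenClaw's actual code. The `transcribe` argument stands in for the Groq call, and the @-mention test is a plain case-insensitive substring match.

```python
from typing import Callable, Optional


def handle_group_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    bot_handle: str,
    require_mention: bool,
) -> Optional[str]:
    """Return the text to answer, or None if the note should be ignored."""
    # 1. Transcribe first: the audio itself has no text to search for a mention.
    text = transcribe(audio)
    # 2. Only then apply the mention gate, to the transcript.
    if require_mention and f"@{bot_handle}".lower() not in text.lower():
        return None
    return text


def fake_transcribe(_audio: bytes) -> str:
    """Stand-in for a real transcription call."""
    return "Hey @YourBot, what's the weather in New York?"


print(handle_group_voice_note(b"...", fake_transcribe, "YourBot", require_mention=True))
```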

Language: Whisper auto-detects the spoken language. If you mostly speak one language, setting it explicitly as an ISO-639-1 code (such as "en") can improve accuracy:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo", "language": "en" }
        ]
      }
    }
  }
}

Model Options

Groq offers two Whisper models:

Model	Speed	Accuracy (WER = word error rate)	Cost	Best For
`whisper-large-v3-turbo`	Faster	Good (12% WER)	$0.04/hr	Daily use, most users
`whisper-large-v3`	Slower	Best (10.3% WER)	$0.111/hr	Accents, noisy audio, accuracy-critical

Recommendation: Start with whisper-large-v3-turbo. Switch to whisper-large-v3 if you notice transcription issues.
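To make the WER and price columns concrete, the sketch below turns them into rough expected word errors per transcript and cost per block of audio hours. WER figures come from benchmarks, so your accuracy will vary with accents, noise, and microphone. Treat the output as an order-of-magnitude comparison. The names are illustrative.

```python
# Figures from the model table above; benchmark WER, so real-world rates vary.
MODELS = {
    "whisper-large-v3-turbo": {"wer": 0.12, "usd_per_hour": 0.04},
    "whisper-large-v3": {"wer": 0.103, "usd_per_hour": 0.111},
}


def expected_word_errors(model: str, words: int) -> float:
    """Rough average number of word errors in a transcript of `words` words."""
    return MODELS[model]["wer"] * words


def cost_for(model: str, audio_hours: float) -> float:
    """Paid-tier cost in USD for `audio_hours` of audio."""
    return MODELS[model]["usd_per_hour"] * audio_hours


for name in MODELS:
    print(f"{name}: ~{expected_word_errors(name, 100):.1f} errors per 100 words, "
          f"${cost_for(name, 10):.2f} per 10 hours of audio")
```

Put simply, whisper-large-v3 costs about 2.8x more per hour in exchange for roughly 1.7 fewer word errors per 100 words.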

Free Tier Limits

Groq's free tier is generous for personal use:

Limit	Value	What It Means
Requests per minute	20	~20 voice notes per minute
Requests per day	2,000	~2,000 voice notes per day
Audio seconds per hour	7,200	2 hours of audio per hour
Audio seconds per day	28,800	8 hours of audio per day
Max file size	25 MB	About 26 minutes at 128 kbps; longer for low-bitrate voice notes

For most personal use, you'll never hit these limits. If you do, Groq's paid tier costs $0.04 per hour of audio for whisper-large-v3-turbo.
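To check a usage pattern against these limits, or to see how much audio fits in the file-size cap, you can do the arithmetic directly. The sketch below takes its limits from the table. It treats 25 MB as 25,000,000 bytes and assumes a constant bitrate, so the duration figure is an approximation. The function names are hypothetical.

```python
# Groq free-tier limits from the table above.
LIMITS = {
    "requests_per_minute": 20,
    "requests_per_day": 2_000,
    "audio_seconds_per_hour": 7_200,
    "audio_seconds_per_day": 28_800,
}
MAX_FILE_BYTES = 25_000_000  # 25 MB, counted here as 10**6 bytes per MB


def exceeded_limits(notes_per_day: int, avg_seconds: float,
                    peak_notes_per_hour: int, peak_notes_per_minute: int) -> list:
    """Return the names of the free-tier limits a usage pattern would exceed."""
    usage = {
        "requests_per_minute": peak_notes_per_minute,
        "requests_per_day": notes_per_day,
        "audio_seconds_per_hour": peak_notes_per_hour * avg_seconds,
        "audio_seconds_per_day": notes_per_day * avg_seconds,
    }
    return [name for name, used in usage.items() if used > LIMITS[name]]


def max_minutes_for_size(size_bytes: int, bitrate_kbps: float) -> float:
    """Approximate minutes of audio that fit in size_bytes at a constant bitrate."""
    return size_bytes * 8 / (bitrate_kbps * 1000) / 60


# 50 half-minute notes a day, at most 20 in any hour and 3 in any minute.
print(exceeded_limits(50, 30, 20, 3))  # []
print(f"{max_minutes_for_size(MAX_FILE_BYTES, 128):.0f} minutes fit in 25 MB at 128 kbps")
```

A heavier pattern, such as 1,000 one-minute notes a day (at most 100 per hour and 5 per minute), returns `['audio_seconds_per_day']`. That is because 60,000 seconds exceeds the 28,800-second daily cap.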

Advanced Configuration

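Here's a more complete configuration. It tries Groq first and falls back to OpenAI's whisper-1 model (which needs an OpenAI API key), caps audio files at 20 MB (20,971,520 bytes), and uses a scope rule to turn off transcription in group chats: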

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "maxBytes": 20971520,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" },
          { "provider": "openai", "model": "whisper-1" }
        ],
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "deny", "match": { "chatType": "group" } }
          ]
        }
      }
    }
  }
}

Note: Comments are supported in OpenClaw's config (JSON5 format), but shown here as standard JSON for clarity.
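The models list acts as an ordered fallback chain. The sketch below illustrates the idea under one stated assumption: when a provider's call fails, the next provider in the list is tried. This is not OpenClaw's implementation. The provider functions are stand-ins that simulate a Groq rate limit and a successful OpenAI call.

```python
from typing import Callable, List, Tuple

Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(audio: bytes,
                             providers: List[Tuple[str, Transcriber]]) -> Tuple[str, str]:
    """Try each (name, function) pair in order; return (provider name, transcript)."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # e.g. rate limit, network error, missing API key
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription providers failed: " + "; ".join(errors))


def groq_rate_limited(_audio: bytes) -> str:
    """Stand-in for a Groq call that hits the rate limit."""
    raise RuntimeError("429 rate limit exceeded")


def openai_ok(_audio: bytes) -> str:
    """Stand-in for a successful OpenAI whisper-1 call."""
    return "What can you help me with today?"


print(transcribe_with_fallback(b"...", [("groq", groq_rate_limited), ("openai", openai_ok)]))
```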

Configuration options:

Option	Description	Default
`enabled`	Enable/disable voice transcription	`true` (auto-detect)
`maxBytes`	Max audio file size in bytes	20MB
`models`	Ordered list of transcription providers	Auto-detect
`scope`	Control which chats allow voice	All allowed
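The scope block pairs a default action with a list of rules. The sketch below evaluates the example scope using assumed semantics: rules are checked in order, the first rule whose match keys all equal the chat's attributes wins, and otherwise the default applies. OpenClaw's real matching may differ. The chatType value "direct" is a hypothetical example.

```python
from typing import Any, Dict

# The scope section from the advanced configuration above.
SCOPE = {
    "default": "allow",
    "rules": [{"action": "deny", "match": {"chatType": "group"}}],
}


def scope_decision(scope: Dict[str, Any], chat: Dict[str, Any]) -> str:
    """Return "allow" or "deny", assuming first-matching-rule-wins semantics."""
    for rule in scope.get("rules", []):
        # A rule matches when every key in its "match" object equals the chat's value.
        if all(chat.get(key) == value for key, value in rule.get("match", {}).items()):
            return rule["action"]
    return scope.get("default", "allow")


print(scope_decision(SCOPE, {"chatType": "direct"}))  # allow
print(scope_decision(SCOPE, {"chatType": "group"}))   # deny
```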

Troubleshooting

Issue	Solution
No transcription	Verify `GROQ_API_KEY` is set without printing the whole key: `echo ${GROQ_API_KEY:0:4}` should print `gsk_`
"Rate limit exceeded"	Wait 1 minute, or check daily quota at console.groq.com
"File too large"	Record shorter voice notes: stay under `maxBytes` (20 MB by default) and Groq's 25 MB limit
Inaccurate transcription	Switch to `whisper-large-v3` for better accuracy
Works in DMs, not groups	Check `requireMention` setting or scope rules
Garbled output	Reduce background noise, speak clearly

Check logs for details:

openclaw logs --tail 50

Look for lines containing audio or transcription to diagnose issues.

Next Steps

Web Terminal - Configure OpenClaw from your browser
Auth-Protected Apps - Share your apps securely
Security Layers - Understand your instance's security

You can now talk to your AI agent: send a voice note and get a response, no typing required.