Voice Messages

Enable speech-to-text for voice notes using Groq's free Whisper API

Overview

Send voice messages to your OpenClaw instance instead of typing. OpenClaw transcribes each voice note and responds as if you had typed the message.

Supported channels:

  • Telegram voice notes
  • WhatsApp voice messages
  • Slack voice clips
  • Any chat platform that supports audio attachments

This guide covers setting up speech-to-text using Groq's free Whisper API, which gives you 8 hours of transcription per day at no cost.

Why Groq Whisper?

Groq runs OpenAI's Whisper model on its custom LPU chips, which makes transcription fast, and it offers a generous free tier:

| Feature | Value |
|---|---|
| Free daily quota | 8 hours of audio |
| Speed | ~200x real-time (1 min of audio transcribes in under 1 sec) |
| Languages | 50+ languages, auto-detected |
| Cost | Free tier, then $0.04/hour |
| API format | OpenAI-compatible |

For comparison, OpenAI Whisper costs $0.006/minute ($0.36/hour). Groq's free tier covers most personal use cases entirely.
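The comparison above is simple arithmetic, sketched here for other usage levels. The prices are the ones quoted in this guide and may be out of date; the function names are illustrative only.

```python
# Prices as quoted in this guide (USD). Check current provider pricing
# before relying on them.
OPENAI_WHISPER_PER_MINUTE = 0.006
GROQ_TURBO_PER_HOUR = 0.04


def openai_cost(hours: float) -> float:
    """Cost of transcribing `hours` of audio with OpenAI Whisper."""
    return round(hours * 60 * OPENAI_WHISPER_PER_MINUTE, 4)


def groq_paid_cost(hours: float) -> float:
    """Cost of transcribing `hours` of audio on Groq's paid tier (turbo model)."""
    return round(hours * GROQ_TURBO_PER_HOUR, 4)
```

For 10 hours of paid audio, `openai_cost(10)` gives 3.6 and `groq_paid_cost(10)` gives 0.4.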

Step 1: Get Your Groq API Key

  1. Go to console.groq.com
  2. Sign up with Google, GitHub, or email
  3. Navigate to API Keys in the left sidebar
  4. Click Create API Key
  5. Name it something memorable (e.g., "openclaw-voice")
  6. Copy the key immediately - you won't see it again
  7. Store it securely (password manager recommended)

Your key will look like: gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
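A quick sanity check can catch copy/paste mistakes before the key goes into any config. This sketch only tests what the example above shows (the `gsk_` prefix) plus the absence of whitespace. The function name is hypothetical, and passing the check does not mean Groq will accept the key.

```python
def looks_like_groq_key(key: str) -> bool:
    """Return True if `key` has the shape shown in this guide.

    Assumption: only the "gsk_" prefix is checked, because the guide does
    not state the key length or alphabet.
    """
    if any(ch.isspace() for ch in key):
        return False  # stray space or newline from copy/paste
    return key.startswith("gsk_") and len(key) > len("gsk_")
```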

Step 2: Configure OpenClaw

Choose your preferred method:

For Agents (Let OpenClaw Configure It)

The easiest way - just ask OpenClaw to set it up for you:

Enable voice message transcription using my Groq API key: gsk_your_key_here

Use the whisper-large-v3-turbo model.

OpenClaw will update the configuration automatically. Skip to Step 3.

For Humans (Manual Configuration)

Connect to your instance via Web Terminal or SSH.

Option A: Edit the config file

nano ~/.openclaw/openclaw.json

Add or update the tools.media.audio section:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" }
        ]
      }
    }
  }
}

Save and exit (Ctrl+X, then Y, then Enter).
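If you would rather script the edit than use nano, the merge can be sketched as below. It assumes the config has already been loaded into a dictionary. The helper name is hypothetical and this is not an OpenClaw tool.

```python
import copy

# The audio settings from the snippet above.
AUDIO_SETTINGS = {
    "enabled": True,
    "models": [{"provider": "groq", "model": "whisper-large-v3-turbo"}],
}


def with_audio_settings(config: dict) -> dict:
    """Return a copy of `config` with tools.media.audio added or updated.

    Other keys are preserved. Existing audio keys are kept unless
    AUDIO_SETTINGS overrides them ("enabled" and "models").
    """
    updated = copy.deepcopy(config)
    media = updated.setdefault("tools", {}).setdefault("media", {})
    media["audio"] = {**media.get("audio", {}), **copy.deepcopy(AUDIO_SETTINGS)}
    return updated
```

To apply it, read `~/.openclaw/openclaw.json` with `json.load`, pass the result through `with_audio_settings`, and write it back with `json.dump`. This only works while the file is strict JSON; Python's `json` module cannot parse JSON5 comments.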

Option B: Set environment variable

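Add your Groq API key to the environment. The config in Option A does not contain the key, and the Troubleshooting section checks this variable, so you will likely need this step even if you also edit the config file: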

echo 'export GROQ_API_KEY="gsk_your_key_here"' >> ~/.bashrc
source ~/.bashrc

Then restart OpenClaw:

openclaw restart

Step 3: Test It

  1. Open Telegram and find your OpenClaw bot
  2. Tap and hold the microphone icon to record a voice note
  3. Say something like: "What can you help me with today?"
  4. Send the voice note
  5. OpenClaw should respond to your spoken message

If it works, you're all set! If not, check the Troubleshooting section below.
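If the bot stays silent, you can test the key and model without OpenClaw by calling Groq's OpenAI-compatible transcription endpoint directly. The sketch below uses only the Python standard library. The function names are illustrative, and the explicit User-Agent header is a precaution rather than a documented requirement.

```python
import json
import urllib.request
import uuid

# Groq's OpenAI-compatible transcription endpoint.
GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(api_key: str, audio: bytes,
                                filename: str = "note.ogg",
                                model: str = "whisper-large-v3-turbo"):
    """Build a multipart/form-data POST carrying the audio and model name."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Assumption: set explicitly in case the default urllib agent is rejected.
        "User-Agent": "openclaw-voice-check/0.1",
    }
    return urllib.request.Request(GROQ_TRANSCRIPTION_URL, data=body,
                                  headers=headers, method="POST")


def transcribe(request) -> str:
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["text"]
```

A typical call reads a short recording into bytes, passes it to `build_transcription_request` along with the value of `GROQ_API_KEY`, and prints the result of `transcribe`. An HTTP 401 error points at the key. A successful transcript means the problem is on the OpenClaw side.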

Telegram Voice Note Tips

Since Telegram is the most popular way to use voice with OpenClaw, here are some tips:

Recording:

  • Tap and hold the mic icon to record
  • Slide up to lock recording mode (hands-free)
  • Slide left to cancel

Mention detection in groups: In group chats, OpenClaw transcribes the audio before checking for @mentions, so you can say:

"Hey @YourBot, what's the weather in New York?"

...and it will work, even with requireMention: true enabled.
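The ordering described above (transcribe first, check for a mention second) can be illustrated with a small sketch. It shows the idea only and is not OpenClaw's implementation. The function name and the case-insensitive substring match are assumptions.

```python
from typing import Callable, Optional


def handle_group_voice_note(audio: bytes,
                            transcribe: Callable[[bytes], str],
                            bot_username: str,
                            require_mention: bool) -> Optional[str]:
    """Return the transcript if the bot should respond, otherwise None."""
    # Step 1: turn speech into text before any mention check.
    transcript = transcribe(audio)
    # Step 2: apply the mention rule to the transcript, not the raw message.
    if require_mention and f"@{bot_username}".lower() not in transcript.lower():
        return None
    return transcript
```

Checking for the mention before transcription would always fail for voice notes, because the raw message contains audio and no text.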

Language: Whisper auto-detects the language. If you always speak the same language, setting it explicitly in the config can improve accuracy:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo", "language": "en" }
        ]
      }
    }
  }
}

Model Options

Groq offers two Whisper models:

| Model | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
| whisper-large-v3-turbo | Faster | Good (12% WER) | $0.04/hr | Daily use, most users |
| whisper-large-v3 | Slower | Best (10.3% WER) | $0.111/hr | Accents, noisy audio, accuracy-critical |

Recommendation: Start with whisper-large-v3-turbo. Switch to whisper-large-v3 if you notice transcription issues.

Free Tier Limits

Groq's free tier is generous for personal use:

| Limit | Value | What It Means |
|---|---|---|
| Requests per minute | 20 | ~20 voice notes per minute |
| Requests per day | 2,000 | ~2,000 voice notes per day |
| Audio seconds per hour | 7,200 | 2 hours of audio per hour |
| Audio seconds per day | 28,800 | 8 hours of audio per day |
| Max file size | 25 MB | ~20-30 minutes per voice note |

For most personal use, you'll never hit these limits. If you do, Groq's paid tier starts at $0.04 per hour of audio for whisper-large-v3-turbo.
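To see which daily limit applies first for your own usage, the table's numbers can be combined as below. The limits are the ones listed in this guide and the function name is illustrative.

```python
# Daily free-tier limits as listed in this guide.
REQUESTS_PER_DAY = 2000
AUDIO_SECONDS_PER_DAY = 28800  # 8 hours


def max_notes_per_day(avg_note_seconds: int) -> int:
    """Voice notes per day before the request or audio-seconds limit is hit."""
    if avg_note_seconds <= 0:
        raise ValueError("avg_note_seconds must be positive")
    return min(REQUESTS_PER_DAY, AUDIO_SECONDS_PER_DAY // avg_note_seconds)
```

With 30-second notes the audio quota is the binding limit (960 notes). With 10-second notes the request quota is (2,000 notes).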

Advanced Configuration

Here's a more complete configuration. It caps audio files at 20 MB, lists OpenAI's whisper-1 as a fallback after Groq, and denies voice transcription in group chats:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "maxBytes": 20971520,
        "models": [
          { "provider": "groq", "model": "whisper-large-v3-turbo" },
          { "provider": "openai", "model": "whisper-1" }
        ],
        "scope": {
          "default": "allow",
          "rules": [
            { "action": "deny", "match": { "chatType": "group" } }
          ]
        }
      }
    }
  }
}

Note: Comments are supported in OpenClaw's config (JSON5 format), but shown here as standard JSON for clarity.
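One plausible reading of the scope block above is "the first matching rule wins, otherwise use the default". The sketch below implements that reading so you can reason about your own rules. The first-match semantics and the function name are assumptions; OpenClaw's actual evaluation may differ.

```python
def voice_allowed(scope: dict, chat: dict) -> bool:
    """Evaluate a scope block against a chat's attributes.

    Assumption: rules are checked in order and the first rule whose
    "match" fields all equal the chat's fields decides the outcome.
    """
    for rule in scope.get("rules", []):
        match = rule.get("match", {})
        if all(chat.get(key) == value for key, value in match.items()):
            return rule["action"] == "allow"
    return scope.get("default", "allow") == "allow"
```

With the scope from the configuration above, a chat with `chatType` set to `"group"` is denied and any other chat falls through to the `"allow"` default.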

Configuration options:

| Option | Description | Default |
|---|---|---|
| enabled | Enable/disable voice transcription | true (auto-detect) |
| maxBytes | Max audio file size in bytes | 20MB |
| models | Ordered list of transcription providers | Auto-detect |
| scope | Control which chats allow voice | All allowed |
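The `maxBytes` and `models` options suggest a size check followed by an ordered fallback. A minimal sketch of that pattern follows. The `transcribers` mapping and the function name are hypothetical stand-ins, not OpenClaw internals.

```python
DEFAULT_MAX_BYTES = 20 * 1024 * 1024  # 20971520, the value used in the config above


def transcribe_with_fallback(audio: bytes, models: list, transcribers: dict,
                             max_bytes: int = DEFAULT_MAX_BYTES) -> str:
    """Try each configured model in order and return the first transcript.

    `transcribers` maps a provider name to a callable(audio, model) -> str.
    """
    if len(audio) > max_bytes:
        raise ValueError("File too large")
    errors = []
    for entry in models:
        transcriber = transcribers[entry["provider"]]
        try:
            return transcriber(audio, entry["model"])
        except Exception as exc:  # fall through to the next provider
            errors.append(f"{entry['provider']}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Listing Groq first keeps everyday traffic on the free tier. The second entry is only reached when the first one raises an error.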

Troubleshooting

| Issue | Solution |
|---|---|
| No transcription | Verify GROQ_API_KEY is set: echo $GROQ_API_KEY |
| "Rate limit exceeded" | Wait 1 minute, or check daily quota at console.groq.com |
| "File too large" | Record shorter voice notes. Groq accepts files up to 25 MB (~20 min), and OpenClaw's maxBytes setting defaults to 20 MB |
| Inaccurate transcription | Switch to whisper-large-v3 for better accuracy |
| Works in DMs, not groups | Check requireMention setting or scope rules |
| Garbled output | Reduce background noise, speak clearly |

Check logs for details:

openclaw logs --tail 50

Look for lines containing audio or transcription to diagnose issues.
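Filtering the log output for those two keywords can be scripted. The helper below is an illustrative sketch that works on any list of log lines. It assumes nothing about OpenClaw's log format beyond the keywords named above.

```python
def relevant_log_lines(lines):
    """Keep only lines that mention audio or transcription (case-insensitive)."""
    keywords = ("audio", "transcription")
    return [line for line in lines if any(word in line.lower() for word in keywords)]
```

One way to feed it is to capture the output of `openclaw logs --tail 50` with `subprocess.run(..., capture_output=True, text=True)` and pass `stdout.splitlines()` to the function.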

Next Steps

You can now talk to your AI agent. Send a voice note, get an intelligent response - no typing required.