SuperWhisper Review
Local-first dictation with cloud AI modes
SuperWhisper is a voice dictation app for macOS and Windows offering both local (offline) and cloud AI transcription. It is priced at $9/month or $299 for a lifetime license. This independent review covers WER/CER accuracy across 6 test recordings, a privacy analysis, and a UX verdict.
SuperWhisper Verdict
Powerful local engine buried under a broken cloud flagship
SuperWhisper version 1.4.0 scores 6.6/10 overall in VoiceTools independent testing (tested 2026-05-30). The best local model (Whisper Standard) achieves 2.4% aggregate WER across 6 recordings. The best cloud model (Ultra) achieves 2.6% aggregate WER.
Works well for
- Whisper Standard (hidden): best-in-class local accuracy at 2.4% WER
- Genuine offline mode — no audio leaves device in local configuration
- Lifetime license option at $299 — no subscription required
Watch out for
- S1-Voice (cloud flagship default) shows 15-37% WER across test recordings
- Trial is 15 minutes total — blocks even offline models after limit
- In addition to audio, cloud mode sends the app name, clipboard contents, and focused text to Modal.com
Best for
- Power users willing to dig past default settings and switch to Whisper Standard
Not for
- Anyone who installs and expects the default model to work well
SuperWhisper Accuracy & Speed
Language: English. Local models were tested on CPU (Ryzen AI 9 HX · 32 GB RAM).

| Type | Model | Size | WER | CER | PER | Accuracy score | Post-stop latency (range) | Speed score |
|---|---|---|---|---|---|---|---|---|
| Local | Parakeet (default) | 1.1 GB | 12.4% | 8.1% | 22% | 4 / 10 | ~3s (2–5s) | 8 / 10 |
| Local | Whisper Standard (best accuracy, hidden in UI) | 500 MB | 2.4% | 1.6% | 18% | 8 / 10 | ~8s (6–12s) | 5 / 10 |
| Cloud | S1-Voice (default) | n/a | 22.4% | 14.2% | 38% | 3 / 10 | ~2s (1–4s) | 9 / 10 |
| Cloud | Ultra (best cloud) | n/a | 2.6% | 1.8% | 20% | 8 / 10 | ~3s (2–5s) | 8 / 10 |

- WER (Word Error Rate): what % of words the model got wrong. 0% = every word correct.
- CER (Character Error Rate): same as WER but measured letter-by-letter. Usually lower than WER.
- PER (Punctuation Error Rate): how accurately the model placed commas, periods, and other punctuation.
- Post-stop latency: seconds from pressing Stop to the final text appearing in your active app. Average across all test recordings.

Model notes
- Parakeet: NVIDIA Parakeet TDT 0.6B, the default local model in SuperWhisper 1.4. Fast and good on clean speech, but it has no ITN (inverse text normalization): numbers and dates come out as spoken words. This is the local model users see first.
- Whisper Standard: OpenAI Whisper large-v2 running locally. Best accuracy of all tested models, but hidden from the main model picker. To find it, go to Settings > Library and search "Whisper Standard".
- S1-Voice: SuperWhisper's proprietary cloud model, presented as the headline AI feature. Applies aggressive rewriting that causes large content losses on some recordings. This is the default cloud model, so users land here without changing settings.
- Ultra: Cloud Whisper-class model offered as the "Ultra" tier. More conservative post-processing than S1-Voice; accurate and stable across all recording types. Better than S1-Voice on every recording, but less prominently surfaced in the UI.
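To make the WER and CER columns concrete, here is a minimal sketch of how such scores are commonly computed: an edit distance over words (WER) or characters (CER), divided by the reference length. The review does not publish its scoring script, so the function names and the normalization used here (lowercasing, whitespace splitting) are assumptions for illustration only. The sample pair reuses the "load balancer" → "load balance" substitution reported later in this review.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (works for word lists and strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word errors divided by the number of reference words (assumed normalization)."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character errors divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference.lower(), hypothesis.lower()) / len(reference)


reference = "restart the load balancer now"
hypothesis = "restart the load balance now"
print(f"WER: {wer(reference, hypothesis):.1%}")  # 1 wrong word out of 5
print(f"CER: {cer(reference, hypothesis):.1%}")  # 1 wrong character out of 29
```

One wrong word out of five gives 20% WER, while the same mistake is only one character out of 29, which is why CER is usually the lower number.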
SuperWhisper for Coding & IT
Recommended: Whisper Standard, Ultra
Coding
- snake_case identifiers mostly preserved
- CLI flags correct
- No hallucinations
- "Tauri" → "Tory"
- "tokio::runtime" → "Toko runtime"
Conference
- Best on accented English speaker — 0.45% WER
- "Kubernetes", "PostgreSQL", "microservices" exact
- Zero dropped sentences
- "load balancer" → "load balance" (once)
Coding
- snake_case identifiers intact
- CLI flags like "--release" correct
- No hallucinations
- "Tauri" → "Tarry"
- "axum" → "axm"
Conference
- Strong on accented English
- "PostgreSQL" and "Kubernetes" exact
- Only 1 minor substitution across 89s
- "schema" case flipped once
Coding
- Handles prose segments cleanly
- No hallucinations
- "cargo.toml" → "Cargo .toml"
- "tokio::spawn" → "tokio spawn"
- "impl Trait" → "imp Trait"
Conference
- Clean transcription of accented English speaker
- Technical terms like "API" and "SDK" correct
- "Kubernetes" → "Cubernetes"
Coding
- Faster response than local models (~2s)
- "impl AsyncRead" → "RUMP_INSTAL"
- Dropped entire code block (lines 14–17)
- "CloudFace" hallucinated (not in source)
Conference
- Fast turnaround even on longer clip
- "distributed systems" → "destructive systems"
- Dropped 3 full sentences mid-recording
- Non-deterministic: 37% WER run 1 vs 16% run 2 — same audio
SuperWhisper for Everyday & Long-form
Recommended: Whisper Standard, Ultra
Casual
- Perfect: zero word errors
- Disfluencies handled cleanly
Long-form
- Best long-form accuracy tested — no drift over 3:42
- Zero hallucinations across full recording
- "whisper.cpp" partially garbled near end
- Occasional sentence-boundary missed
Casual
- Perfect: zero errors on casual speech
- Natural disfluency handling
Long-form
- No drift — consistent quality start to finish
- Zero hallucinations across 3:42
- App name spacing inconsistent once
- "whisper.cpp" spacing once
Casual
- Perfect: zero word errors on casual speech
- Disfluencies (um, uh) preserved naturally
Long-form
- No quality drift over 3+ minutes
- Consistent pace throughout — no mid-recording degradation
- "whisper.cpp" split across sentences once
- Occasional dropped filler mid-paragraph
Casual
- Lost 2 of 3 sections — only opening paragraph survived
- Heavy rewriting distorts meaning of what remains
Long-form
- Manages to complete the full recording without timeout
- Large sections rewritten — 14% WER from aggressive post-processing
- Brand names and app identifiers mangled or dropped in later paragraphs
SuperWhisper for Numbers & Structured Data
Recommended: Whisper Standard, S1-Voice
Numbers/ITN
- Dates and currency nearly exact
- "$12,400.75" and phone number correct
- "March 15th, 2026" → "March 15, 2026" (minor format)
Numbers/ITN
- "$12,400.75" exact
- Phone number format correct
- "March 15th, 2026" → "March 15, 2026"
- "ABC-123456" → "ABC 123456" (hyphen dropped)
Numbers/ITN
- Phone number and date format correct
- "$12,400.75" → "$12400.75" (comma dropped)
- "Order ID" label partially dropped
Numbers/ITN
- No ITN — numbers output as spoken words throughout
- "$12,400.75" → "twelve thousand four hundred dollars and seventy five cents"
- Phone number and order ID completely garbled
SuperWhisper: Noise Resistance
Recommended: Parakeet, Ultra
Noisy Cafe
- Noise has zero effect — identical output to clean version
- Café background at SNR 5 dB not detected
Noisy Cafe
- Noise has zero effect — identical output to clean version
Noisy Cafe
- Near-perfect under café noise — only 1 minor substitution
- "in-between" split once
Noisy Cafe
- Handles café noise better than casual clean — rewriting helps here
- "in-between" → "in between"
- Some filler words not stripped
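For readers unfamiliar with the "SNR 5 dB" figure above: signal-to-noise ratio in decibels is 10·log10(speech power / noise power), so 5 dB means the speech carries roughly 3.16 times the power of the café background. The review does not describe how its noisy recording was produced; the sketch below only shows the standard way to scale a noise track to a target SNR, and all function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a signal given as a list of float samples."""
    return sum(s * s for s in samples) / len(samples)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that speech power / noise power matches snr_db."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech, sample by sample (assumes equal sample rates)."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for real audio: a sine "voice" and a deterministic pseudo-noise.
speech = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=5)
print(f"power ratio at 5 dB: {10 ** (5 / 10):.2f}")
```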
SuperWhisper UX & Integration
Getting started & flow
Reached first successful dictation in about a minute — nothing superfluous.
Default shortcut is comfortable and remappable, no system conflicts — but the push-to-talk option does not actually work.
Shows a center-screen message when the trial runs out, but there is no fallback — and settings navigation is scattered across sections.
Recording experience
Clear recording pill / overlay — recording state is obvious.
Easy to cancel a bad dictation; cancel hotkey included.
Pastes reliably into every app tested.
Auto-inserts the text and can restore your previous clipboard afterwards.
Managing your work
Browsable history with search; you can open a recording to see its mode, duration and even the prompt used. No export.
Fast switching by hotkey and from the pill UI.
~160 MB RAM · 0.3% CPU at rest (cloud).
SuperWhisper Features
Text processing
Cloud AI modes rewrite text — many models, BYOK for several providers. But S1-Voice over-rewrites and drops content.
Per-word replacements applied at transcription.
Bundled into the custom-dictionary feature, not a separate snippets UI — and also doable via LLM post-processing instructions.
Output & extras
Hidden behind the tray icon. Broken on LLM modes (returns a stale buffer); only works on the Voice mode, and the UX is so confusing it barely counts.
No txt / srt / json export, and history cannot be bulk-exported.
Pause, lower, or fully mute media while recording.
Local recognition
Genuine offline mode — no audio leaves device in local configuration.
Parakeet, Whisper Standard (hidden), S1-Voice, Ultra — but best local model is buried.
SuperWhisper Privacy
SuperWhisper keeps audio on-device when using local models. Cloud models upload audio to modal.com only after you press Stop.
Endpoints: modal.com, api.superwhisper.com
Nothing is uploaded until you confirm by pressing Stop. Cancel before then and the audio never leaves.
In cloud mode: active app name, focused element text, clipboard contents, computer name, locale, timezone
Your recordings are not used to train models.
You can turn off product analytics and telemetry.
You can set the app to never store your transcription history.
From the privacy policy (not scored)
- Privacy policy guarantees data is never used to train AI models and is not retained on SuperWhisper servers — all storage is local.
- States it collects no usage data and uses no cookies or tracking technologies.
- Note: in testing, cloud mode still sent app context and clipboard contents to Modal.com, which is more data than the policy implies, so cloud users should not assume "local-only".
Pricing
Methodology
Accuracy scores use WER (Word Error Rate) computed against multi-reference ground truth with {a|b} alternates for valid transcription variants (e.g. "48%" and "forty-eight percent" are both accepted). Audio was generated with ElevenLabs TTS and delivered to the app through a virtual audio cable. All results come from a single test session on 2026-05-30.
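The multi-reference scoring described above can be sketched as follows: expand every {a|b} group into all valid reference variants, score the transcript against each, and keep the lowest WER. This is an illustrative reconstruction, not the review's actual scoring code; taking the minimum over variants, the lowercasing step, and all function names are assumptions.

```python
import itertools
import re

ALTERNATE = re.compile(r"\{([^{}]*)\}")


def expand_alternates(template):
    """Turn 'was {48%|forty-eight percent}' into every plain-text reference variant."""
    parts = ALTERNATE.split(template)
    # With one capture group, odd indices hold the 'a|b' contents, even indices literal text.
    choices = [part.split("|") if i % 2 else [part] for i, part in enumerate(parts)]
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


def word_edit_distance(ref, hyp):
    """Levenshtein distance over word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def multi_reference_wer(template, hypothesis):
    """Lowest WER of the hypothesis against any accepted reference variant."""
    hyp = hypothesis.lower().split()
    scores = []
    for variant in expand_alternates(template):
        ref = variant.lower().split()
        scores.append(word_edit_distance(ref, hyp) / len(ref))
    return min(scores)


template = "growth was {48%|forty-eight percent} this year"
print(multi_reference_wer(template, "growth was forty-eight percent this year"))
print(multi_reference_wer(template, "growth was 48% this here"))
```

With this scheme a model is not penalized for choosing either the digit form or the spoken form, so formatting preferences do not inflate WER; only genuine word errors (such as "year" heard as "here") count.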