Willow Voice Review (2026) — 99% Accuracy, Cloud-Only

Willow Voice Verdict

7.2 out of 10

Accuracy (Cloud): 10 / 10
Speed (Cloud): 10 / 10
UX: 8 / 10
Features: 4.4 / 10
Privacy: 1.5 / 10


A near-identical Wispr Flow clone: very fast, very clean, cloud by default

Willow Voice 2.1.2 scores 7.2/10 overall in Voice-list's independent testing (tested 2026-06-09). Its cloud model achieves 0.9% aggregate WER across 6 recordings.

Works well for

Near-instant results: 0–2s, audio streamed as you speak
Excellent accuracy out of the box: 0.9% aggregate WER, flawless on numbers and dates
Polished, genuinely pleasant Mac-style UX and onboarding

Watch out for

Heavy tracking that cannot be disabled — Sentry, PostHog, S3, Loom all fire even while idle
Free tier is cloud-only; a local/offline mode exists but is locked behind Pro and could not be verified
Windows build has no platform-specific polish — it is the Mac app in a Windows window

Best for

Mac or Windows users who are happy in the cloud and want polished English dictation with near-instant results and zero model-picking

Not for

Privacy-conscious users — tracking is extensive and cannot be turned off — and anyone who needs an offline mode they can actually verify (it exists only on Pro and we could not test it)

Willow Voice Accuracy & Speed

English · Cloud (free tier)

Willow Voice's cloud model is the only one available on the free tier. There is no model picker, just a language list. A local/offline mode is advertised on Pro, but it is locked, so we could not test it. Auto-cleanup (disfluency removal, capitalisation, punctuation, inverse text normalisation or ITN) is always on. A separate Scribe mode applies a full LLM rewrite (paid).

Accuracy: 10 / 10
99.1% word accuracy: share of words transcribed correctly (100% − WER)
0.9% WER (word error rate): share of words transcribed wrongly; 0% means every word is correct
0.3% CER (character error rate): the same measure computed letter by letter; usually lower than WER
26% PER (punctuation error rate): how often commas, periods and other punctuation were placed incorrectly

Speed: 10 / 10
~1 s average post-stop latency (range 0–2 s): seconds from pressing Stop to the final text appearing in your active app, averaged across all test recordings
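Word accuracy, WER and CER are all edit-distance ratios. The sketch below shows how they relate, using the "last_seen_at" → "lastscene_at" miss from the Coding test as input. It assumes whitespace tokenisation, lowercasing and stripping of punctuation at token edges. The review does not publish its exact normalisation rules, and these function names are illustrative, not taken from Willow Voice or the test harness.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # reference token deleted
                cur[j - 1] + 1,          # extra token inserted
                prev[j - 1] + (r != h),  # substitution (costs 0 if tokens match)
            ))
        prev = cur
    return prev[-1]


def normalize(text: str) -> list[str]:
    # Assumption: case-insensitive, with sentence punctuation at token edges ignored
    tokens = (t.strip(".,!?;:\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def wer(reference: str, hypothesis: str) -> float:
    ref = normalize(reference)
    return edit_distance(ref, normalize(hypothesis)) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    # Same distance, but over characters of the normalised text
    ref = " ".join(normalize(reference))
    hyp = " ".join(normalize(hypothesis))
    return edit_distance(list(ref), list(hyp)) / len(ref)


reference = "Update last_seen_at now."
hypothesis = "Update lastscene_at now."
print(f"WER {wer(reference, hypothesis):.1%}")                # WER 33.3% (1 of 3 words)
print(f"CER {cer(reference, hypothesis):.1%}")                # CER 13.0% (3 of 23 chars)
print(f"Word accuracy {1 - wer(reference, hypothesis):.1%}")  # Word accuracy 66.7%
```

One wrong word costs a whole word in WER but only a few characters in CER. That is why CER usually comes out lower.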
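The scorecard reports two figures for speed: an average and a range. The sketch below shows how both come from per-recording timestamps. The (stop pressed, text shown) pairs are hypothetical values, not the review's raw measurements.

```python
from statistics import mean


def post_stop_latencies(events: list[tuple[float, float]]) -> list[float]:
    """Seconds from pressing Stop to the final text appearing, one value per recording."""
    return [text_shown - stop_pressed for stop_pressed, text_shown in events]


def summarize(latencies: list[float]) -> tuple[float, float, float]:
    """Average latency plus the (min, max) range reported on the scorecard."""
    return mean(latencies), min(latencies), max(latencies)


# Hypothetical (stop_pressed, text_shown) timestamps in seconds, one pair per recording
events = [(12.0, 12.5), (30.0, 31.5), (8.0, 8.0), (45.0, 46.0)]
avg, fastest, slowest = summarize(post_stop_latencies(events))
print(f"~{avg:.2f}s average, {fastest:.1f}-{slowest:.1f}s range")
```

A single slow recording barely moves the average, but it is fully visible in the range. That is why both are worth reporting.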

Willow Voice for Coding & IT

Model: Cloud

Coding 95.9% 11 err / 219w

Conference 99.5% 1 err / 214w

Coding

Auto-cleanup: punctuation and capitalisation correct
No hallucinations or dropped segments

"Tauri" → "Atari"
"last_seen_at" → "lastscene_at"
"ImagePullBackOff" and "rust-toolchain.toml" split apart

Conference

Handles the accented speaker almost perfectly
Zero dropped sentences
Tech terms (whisper.cpp, Parakeet TDT, Gerganov) all correct

"per-minute" → "permanent" (single misheard term)

Willow Voice for Everyday & Long-form

Model: Cloud

Casual 100.0% 0 err / 184w

Long-form 99.3% 4 err / 537w

Casual

Near-perfect — only one dropped connector
Auto-cleanup: caps, punctuation and list formatting all correct

Dropped one spoken "so"

Long-form

No drift across the 4-minute recording; consistent throughout
All numbers, currency and percentages formatted correctly
Zero hallucinations

Auto-cleanup removed several spoken "so"/"and" connectors


okay. i want to walk through what we learned this quarter about organic search versus paid acquisition, because i think the numbers genuinely changed my mind, and i want the whole team aligned before we set the budget for next year. quick context: for the last three years, we have been spending roughly 70% of our marketing budget on paid channels, google ads, a bit of meta, some linkedin for the enterprise segment. the remaining 30% went into content and seo. and the assumption, honestly, was that paid is the reliable engine and seo is a slow, nice-to-have thing on the side. turns out that framing was wrong. let me give you the actual numbers. on paid, our blended cost per acquisition climbed from $41 in january to $68 by september. that is a 66% increase in nine months, and nothing about our targeting changed. the auction just got more expensive, more competitors bidding on the same keywords, plus the platform raising minimum bids. meanwhile, our organic traffic went from about 12,000 sessions a month to 41,000. and the cost per acquisition on that channel, if you amortize the content investment, was around $9. nine versus 68. that is not a small gap. now, the honest counterargument is timing. paid converts today. you spend $1,000 on tuesday, you get leads on tuesday. seo is a delayed engine. the articles we published in february did not really start ranking until may or june.
so there is a real cash flow difference. and if you are a startup that needs pipeline this month, you cannot just turn off paid and wait two quarters for organic to compound. i get that. but here is the thing that surprised me. when we looked at lead quality, not just volume, the organic leads had a 31% higher trial-to-paid conversion rate. the theory is that someone who finds you by searching for a specific problem is further along in intent than someone who clicks an ad in their feed. they are actively looking. so not only is organic cheaper per lead, the leads are actually better. so what are we doing differently next year? three things: 1. we are flipping the ratio, moving to roughly 50/50 between paid and organic over the next two quarters, not all at once, because we still need the near-term pipeline. 2. we are doubling the content team from two writers to four, and we are focusing on what we call bottom-of-funnel comparison content, because that is where the intent and the conversion rate are highest. and 3. we are going to treat paid as an accelerant for content that is already ranking instead of a standalone channel. so when an article hits page one organically, we put paid behind it to compress the timeline. the goal by end of next year is to get our blended cost per acquisition back under $30 and to have organic driving more than half of all qualified pipeline. right now, it is at about 22%. that is a big gap to close, but the trajectory over the last six months tells me it is achievable. anyway, that is the short version. we can dig into the channel-level breakdown in a separate session.

Willow Voice for Numbers & Structured Data

Model: Cloud

Numbers/ITN 100.0% 0 err / 40w

Numbers/ITN

Perfect ITN: "$12,400.75", "1-800-555-0123 ext. 479", "ABC-123456" all exact
Date "March 15th, 2026 at 3:30 PM" formatted correctly
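ITN (inverse text normalisation) is the step that turns spoken forms such as "twelve thousand four hundred dollars and seventy-five cents" into written forms such as "$12,400.75". The review does not describe how Willow Voice implements it. The sketch below is a minimal rule-based illustration for English cardinals and dollar amounts only. Real ITN also handles dates, phone numbers, ordinals and much more, and the function names here are hypothetical.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(phrase: str) -> int:
    """Convert an English cardinal such as 'twelve thousand four hundred' to 12400."""
    total, current = 0, 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue                        # as in "one hundred and five"
        elif word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]  # close off the group before the scale word
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def spoken_money(phrase: str) -> str:
    """'X dollars and Y cents' -> '$X.YY' with thousands separators."""
    m = re.fullmatch(r"(.+?) dollars(?: and (.+?) cents)?", phrase.strip().lower())
    if m is None:
        raise ValueError(f"not a dollar amount: {phrase!r}")
    dollars = words_to_int(m.group(1))
    cents = words_to_int(m.group(2)) if m.group(2) else 0
    return f"${dollars:,}.{cents:02d}"


print(spoken_money("twelve thousand four hundred dollars and seventy-five cents"))  # $12,400.75
print(words_to_int("forty-eight"))                                                  # 48
```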

Willow Voice: Noise Resistance

Model: Cloud

Noisy Cafe 99.5% 0 err / 184w

Noisy Cafe

Café noise has no effect — identical to clean version

Dropped one spoken "so" (same as clean)

Tested on Windows 11 26H2 · AMD Ryzen AI 9 HX 370 · 32 GB RAM

Willow Voice UX & Integration

Getting started & flow

Onboarding flow

2–3 minutes, but several of the interactive tests ran unreliably.

3 / 5

Hotkey customization

Sensible defaults, but limited variety — some combos are blocked, mouse buttons and double-taps cannot be bound, and it froze during setup once.

4 / 5

Error messages

Clear, understandable error messages.

5 / 5

Recording experience

Recording overlay UX

The recording pill is clear and well done.

5 / 5

Stop / cancel UX

Works, but the cancel key is not shown.

4 / 5

Text insertion reliability

Auto-insert works everywhere tested.

5 / 5

Auto-insert vs clipboard

Auto-inserts by default, with a hotkey to re-insert the last text, but it does not restore your previous clipboard contents afterwards.

3 / 5

Managing your work

Recording history

A history list exists on the home screen, but there is no export.

3 / 5

Mode / model switching

No preset list — style adapts per app — and a dedicated hotkey toggles Scribe (LLM) mode.

5 / 5

Idle resource use

~100 MB RAM · 0.3% CPU at rest (cloud).

3 / 5

Willow Voice Features

Text processing

AI post-processing

Cloud LLM: both per-app style adaptation and a full-rewrite Scribe mode (the rewrite key does nothing on the free tier).

Custom vocabulary / dictionary

Per-word auto-replace before insertion.

Text snippets / expansion

Text expansion via "Personal shortcuts".
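Custom vocabulary and personal shortcuts both amount to text substitution applied before the text is inserted. The review does not say how Willow Voice matches entries (whole words or substrings, case sensitivity, multi-word triggers). The sketch below therefore assumes whole-word, case-insensitive matching. The helper and the sample entries are hypothetical: two fixes for misses seen in the Coding test and one invented shortcut.

```python
import re
from typing import Callable


def build_replacer(mapping: dict[str, str]) -> Callable[[str], str]:
    """Return a function that applies whole-word, case-insensitive replacements."""
    if not mapping:
        return lambda text: text
    # Longest keys first, so a multi-word trigger wins over any shorter key inside it
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    lookup = {k.lower(): v for k, v in mapping.items()}
    return lambda text: pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical entries: vocabulary fixes plus one "personal shortcut" expansion
fix_up = build_replacer({
    "Atari": "Tauri",
    "lastscene_at": "last_seen_at",
    "my sign off": "Thanks,\nAlex",
})
print(fix_up("Ship the Atari build and update lastscene_at. my sign off"))
```

Post-hoc replacement like this only repairs words that the model already gets wrong in a predictable way. Whether Willow Voice's vocabulary also biases recognition itself is not covered by this review.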

Output & extras

Music auto-mute

Translation mode

No built-in translation mode.

Ask / Q&A mode

No Ask / Q&A LLM mode.

File transcription

Export (txt / srt / json)

No txt / srt / json export.

Voice commands

Local recognition

Offline / local inference

Offline is advertised but Pro-locked, so it could not be verified.

Multiple model options

A single cloud model — nothing to pick.

Willow Voice Privacy

Willow Voice streams audio to api.willowvoice.com on every recording, and the upload begins while you are still speaking, before you press Stop. Beyond audio, a training opt-out is offered, but tracking cannot be disabled and is extensive: Sentry crash data, PostHog product analytics, S3 uploads and Loom all fire, even while the app is idle.

Audio uploaded on every recording

Endpoints: api.willowvoice.com, sentry.io, posthog, s3-r-w.us-east-1.amazonaws.com, cdn.loom.com

Audio streamed before you press Stop

Recording is streamed to the server while you talk — if you cancel, it has already left your device.

Account required

You must create an account (email) to use the app at all — your dictation is tied to an identity.

Sends more than audio

Training opt-out is offered, but tracking cannot be disabled and is extensive: Sentry crash data, PostHog product analytics, S3 uploads and Loom all fire — including while idle.

Opt out of training on your data

An opt-out is offered during onboarding; with it selected, your recordings are not used to train models.

Disable analytics & tracking

Analytics and tracking cannot be disabled: Sentry, PostHog, S3 and Loom traffic continues regardless of settings.

Turn off history storage

History is always stored — there is no way to disable it.

From the privacy policy (not scored)

We tested the free tier, which is cloud-only: audio is streamed live to api.willowvoice.com before you press Stop.
An offline/local mode is advertised on Pro but is locked on the free tier, so we could not confirm whether it truly keeps audio on-device.
Telemetry endpoints (Sentry, PostHog, S3, Loom) are contacted even while the app is idle with the mic on.
Training opt-out is presented during onboarding, but it only blocks model training — not analytics or tracking.
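Whether audio is streamed during recording or uploaded after Stop determines what Cancel can still prevent. The toy model below simulates both approaches with a hypothetical in-memory server. It does not reproduce Willow Voice's client or protocol; it only illustrates why a cancelled streaming recording has already left the device.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeServer:
    """Stand-in for a transcription endpoint; records which chunks arrived."""
    received: list[bytes] = field(default_factory=list)

    def send(self, chunk: bytes) -> None:
        self.received.append(chunk)


def record_streaming(chunks: list[bytes], server: FakeServer, cancel_at: Optional[int] = None) -> None:
    """Streaming upload: every captured chunk is sent immediately."""
    for i, chunk in enumerate(chunks):
        if i == cancel_at:
            return  # Cancel pressed: chunks before i have already been uploaded
        server.send(chunk)


def record_batch(chunks: list[bytes], server: FakeServer, cancel_at: Optional[int] = None) -> None:
    """Batch upload: audio stays in a local buffer until Stop."""
    buffer = []
    for i, chunk in enumerate(chunks):
        if i == cancel_at:
            return  # Cancel pressed: the local buffer is discarded, nothing was sent
        buffer.append(chunk)
    for chunk in buffer:  # Stop pressed: upload the whole recording at once
        server.send(chunk)


audio = [b"chunk0", b"chunk1", b"chunk2", b"chunk3"]
streamed, batched = FakeServer(), FakeServer()
record_streaming(audio, streamed, cancel_at=3)
record_batch(audio, batched, cancel_at=3)
print(len(streamed.received), len(batched.received))  # 3 0
```

Streaming is what makes the near-instant post-stop latency possible, because most of the audio is already on the server when you press Stop. The privacy cost shown here is the other side of that design.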

Pricing

Free $0 No credit card

2,000 words per week
Instant dictation & formatting
Custom vocabulary
Works across apps (Slack, Gmail, Cursor…)

Subscription $15/mo Individual · $12/mo billed annually · Team $10/seat (min 3 seats)

Unlimited dictation words
Full personalization across apps and tasks
Smart memory of your writing style
Increased dictation length

Lifetime Not offered

No lifetime / one-time option — subscription only
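The plan prices above translate into yearly totals as follows. This sketch assumes the Team price is per seat per month, which the pricing table implies but does not state explicitly.

```python
MONTHLY = 15          # Individual, billed monthly ($/month)
ANNUAL_MONTHLY = 12   # Individual, billed annually ($/month equivalent)
TEAM_SEAT = 10        # Team, $ per seat (assumed per month)
TEAM_MIN_SEATS = 3

yearly_on_monthly = MONTHLY * 12
yearly_on_annual = ANNUAL_MONTHLY * 12
savings = yearly_on_monthly - yearly_on_annual
team_minimum = TEAM_SEAT * TEAM_MIN_SEATS

print(f"Individual: ${yearly_on_monthly}/yr monthly vs ${yearly_on_annual}/yr annual "
      f"(saves ${savings}, {savings / yearly_on_monthly:.0%})")
print(f"Team minimum: ${team_minimum}/month for {TEAM_MIN_SEATS} seats")
```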

Willow Voice — Pricing — Free trial, Individual $15/mo, Team $12/mo, Enterprise (as of 2026-07-09)

Willow Voice on the free tier

How far Willow Voice gets you without paying — the basis for its Best free option ranking.

Free limit: 2,000 words per week
Account required: Yes — sign-up needed
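To put the 2,000-word weekly limit in perspective, the quota can be converted into speaking time at the pace of the Long-form test recording (537 words in about four minutes, roughly 134 words per minute). That pace comes from a TTS recording and is only an assumption about real speakers. Faster speakers will get proportionally fewer minutes, and slower speakers more.

```python
def minutes_per_week(word_quota: int, words: int, seconds: float) -> float:
    """Weekly dictation minutes a word quota allows at a given speaking pace."""
    words_per_minute = words / (seconds / 60)
    return word_quota / words_per_minute


# Pace taken from the Long-form test: 537 words in roughly 4:00
print(f"{minutes_per_week(2000, 537, 240):.1f} minutes per week")  # 14.9 minutes per week
```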



Methodology

Accuracy scores use WER (word error rate) computed against multi-reference ground truth, with {a|b} alternates for valid transcription variants (e.g. both "48%" and "forty-eight percent" are accepted). Test audio was generated with ElevenLabs TTS and fed to the app through a virtual audio cable. All results come from a single test session on 2026-06-09.
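The {a|b} notation means that a reference can be written once and still accept several valid spellings. One straightforward way to score against it is to expand every alternate group into concrete references and keep the lowest WER. This sketch assumes that approach; the review does not say how its harness resolves alternates. Expansion grows exponentially with the number of groups, which is fine for a handful per sentence.

```python
import itertools
import re

ALTERNATES = re.compile(r"\{([^{}]*)\}")


def expand_references(template: str) -> list[str]:
    """Expand every {a|b} group into all concrete reference strings."""
    parts = ALTERNATES.split(template)
    # re.split with one capture group alternates literal text (even) and group contents (odd)
    options = [[part] if i % 2 == 0 else part.split("|") for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*options)]


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)


def multi_reference_wer(template: str, hypothesis: str) -> float:
    """Score against whichever expansion of the template the hypothesis matches best."""
    return min(wer(ref, hypothesis) for ref in expand_references(template))


template = "growth was {48%|forty-eight percent} this quarter"
print(multi_reference_wer(template, "growth was forty-eight percent this quarter"))  # 0.0
print(multi_reference_wer(template, "growth was 48% this quarter"))                  # 0.0
```

Without the alternates, an app that applies ITN would be penalised for writing "48%" against a spelled-out reference, and vice versa.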


Limitations of this test

TTS source, not human voice; real-world WER is likely to be higher
Single session, no variance measurement across multiple runs
Punctuation (PER) not shown in this table — see raw data
Numbers WER may be overstated for apps that apply ITN (converting spoken to digit form)