# Aqua Voice Overview

Aqua Voice is *AI-Native* Dictation for Mac and Windows. It lets you talk into *any* text field -- Cursor, Gmail, Slack, even your terminal. It starts up in under 50ms, inserts text in about a second (sometimes as fast as 450ms), and has state-of-the-art accuracy. Aqua uses a fusion transcription architecture with a client context engine for the most accurate output. It makes ~17x fewer mistakes than Siri and Google Voice typing and will save you time and energy when writing and prompting.

## What is AI-Native Dictation?

It means two things:

1. The AI models that power it: transcription and large language models are fused to deliver better output quality, contextual awareness, and fluidity.
2. The AIs you talk to with it: using your voice is the best way to prompt.

## Quick Start (3 min)

1. Download Aqua (see below for direct links).
2. Install and sign in.
3. Press **Fn** (Mac) or **Alt** (Windows) to start dictation.
4. Speak naturally; Aqua Voice formats text to match the context.
5. Press the hotkey again to stop.

## Links

- [Try In Browser](https://withaqua.com/sandbox)
- [Demo Video (3:29)](https://withaqua.com/watch)
- [Download](https://withaqua.com/download)
- [Benchmarks](https://withaqua.com/blog/benchmark-nov-2024)
- [Changelog](https://withaqua.com/changelog)
- [FAQ](https://withaqua.com/faq)
- [Privacy Policy](https://withaqua.com/privacy)
- [System Status](https://status.withaqua.com/)

## External Links

- [Twitter](https://twitter.com/@aquavoice_)
- [Aqua Voice YC](https://www.ycombinator.com/companies/aqua-voice)
- [Hacker News Post #1](https://news.ycombinator.com/item?id=39828686)
- [Hacker News Post #2](https://news.ycombinator.com/item?id=43634005)
- [Product Hunt](https://www.producthunt.com/products/aqua)

## Tech Specs

### Download Links

- macOS Apple Silicon [Download](https://d1a1dx1sgvjqrz.cloudfront.net/aqua-voice-updates/darwin/arm64/Aqua+Voice-0.5.1-arm64.dmg)
- macOS Intel [Download](https://d1a1dx1sgvjqrz.cloudfront.net/aqua-voice-updates/darwin/x64/Aqua+Voice-0.5.1-x64.dmg)
- Windows 10+ [Download](https://d1a1dx1sgvjqrz.cloudfront.net/aqua-voice-updates/win32/x64/Aqua+Voice-0.5.1+Setup.exe)

### Languages Supported (49)

Arabic, Belarusian, Bengali, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Maltese, Mandarin, Marathi, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh

### Benchmarks (Speed)

Aqua is the fastest transcription app in our latency tests.

#### End of Speech -> Text Latency (ms)

- Aqua Voice - 965ms
- Wispr Flow - 1399ms (Aqua 31% faster)
- Superwhisper - 2407ms (Aqua 59% faster)

Note: Testing was performed on clips under 30 seconds on a MacBook Pro in Northern California.

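The "faster" percentages in the latency list can be reproduced from the raw numbers. Below is a minimal sketch in Python. It assumes "X% faster" means the reduction in latency relative to the slower app, and `percent_faster` is a hypothetical helper name used only for illustration (it is not part of Aqua).

```python
def percent_faster(baseline_ms: float, candidate_ms: float) -> float:
    """Latency reduction of the candidate relative to the baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# End-of-speech -> text latencies in milliseconds, copied from the list above.
aqua_ms = 965
other_apps_ms = [1399, 2407]

for other_ms in other_apps_ms:
    reduction = percent_faster(other_ms, aqua_ms)
    print(f"{other_ms}ms vs {aqua_ms}ms: {reduction:.1f}% lower latency")
```

Under that assumption the results are about 31.0% and 59.9%, which matches the listed figures if those are rounded down.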
### Benchmarks (Accuracy, lower is better)

Values are word error rate (WER).

#### Human-Scribe (Email)

- Aqua Voice - 0.9%
- Wispr Flow - 10.5%
- Whisper-large-v3 - 32.8%
- Superwhisper - 20.4%
- Dragon Dictation 16 Pro - 12.2%
- Rev - 11.9%
- Apple Dictation - 17.8%
- Otter - 28.0%

Human transcription is ~4% WER. Aqua Voice achieves 0.9–4.4% WER across email, technical, notes, lecture, and book tasks in our human-scribe benchmark.

#### LibriSpeech (test-clean)

- **Aqua Voice (Streaming)** - 3.22%
- Google Realtime API - 5.63%
- Amazon - 6.42%

[Details here](https://withaqua.com/blog/benchmark-nov-2024)

## Features

### Modes

One thing that sets Aqua apart is its two operation modes:

- **Instant Mode**: (Press Key -> Talk -> Release -> See Text) startup <200ms, results ~450ms.
- **Streaming Mode**: (Press Key -> Talk -> See Text in real time -> Release) continuous output, ~850ms latency, best with Deep Context.

Different apps require different levels of precision and control over text. You wouldn't spend as much time writing a prompt in Cursor as you would on an important work email. Depending on the task, you can select the mode that suits you best.

Instant Mode is great for shorter clips or when you want to "chain" Aqua together many times in a row. It has the lowest possible latency and shows you the text when you're done speaking.

Streaming Mode is great for more complex tasks that require maximum contextual understanding and precise control over text and formatting (for example, an email).

There's no right or wrong way to use these modes; it's a matter of personal preference. Think of it like texting style: some people write many short messages, others one longer message.

### Deep Context

Deep Context is a powerful feature that uses what's on your screen to improve the accuracy of the text. It is especially useful for coding, messaging, and document editing. This feature is optional and off by default. Your data is processed securely and never stored.

Deep Context enables syntax highlighting in your transcript. For example:

- You say: Can you modify the canonical title on the context response model to be either an object or a string?
- Aqua w/ Deep Context: Can you modify the `canonical_title` on the `ContextResponse` model to be either an object or a string?

This works in any app but is especially powerful when prompting LLMs to generate code in apps like Cursor, Windsurf, VSCode, Zed, JetBrains IDEs, etc.

### Custom Instructions

Custom Instructions let you fine-tune your output with natural language instructions. For example, you might put:

```txt
In iMessage, Slack, & WhatsApp, use all lowercase (gen-z style) except for proper nouns and the pronoun "I".

# PUNCTUATION & SYMBOLS
- When I say "checkbox", insert a Markdown checkbox: - [ ]

# DICTIONARY & MACROS
- My Company: {{My Company Name}} (often misheard as {{common mispronunciation}})
- Our Product: {{Product Name}}
```

As with ChatGPT custom instructions, you can tailor your output to your workflows. See our Discord for a recommended format and starter packs.

### Dictionary

'Impossible' words are no problem. Enter names, technical terms (YC W24, factorio, package.json), or custom phrases (Jeffersonian Computing, OPEN-RISOP) and Aqua will recognize them in speech while preserving the casing.

Aqua also adapts casing to context: if you enter the word "factorio" and then use it in the sentence "Factorio is a great game.", Aqua capitalizes the "F" because the word begins the sentence.

### Additional Features

- Language Autodetect - Select "Auto" in the language picker to let Aqua detect the language automatically.
- Local History - Aqua stores a local history of your transcripts and audio, so you never lose any text. Open the "History" tab in settings.
- Multiple Hotkeys - Bind multiple hotkeys to activate Aqua (useful when switching keyboards or if there is a conflict).

## Feedback

If you have suggestions or want to share results, email us or join our [Discord](https://discord.gg/aqua-voice).

(Updated June 19th, 2025)