Captures your mic and the far side of the call (Zoom, Teams, Meet), and turns them into speaker-attributed transcripts that never leave your machine. 100% open-source, open-model, airgapped.
Cloud meeting AI means uploading the raw audio of every conversation to someone else's servers. For anything confidential, that's a non-starter.
So I built the opposite: transcription that runs entirely on-device, on open models, with nothing phoning home. And the transcripts aren't just notes — they're private context my own AI agents can draw on for total recall, without renting my memory to anyone. parley covers calls and meetings; mailrag covers email. Independent tools; my agents know about both and use what fits.
Records your mic and system audio as separate streams, so local and remote voices stay distinguishable. No virtual drivers.
Automatic who-said-what (pyannote + WeSpeaker + VBx), with a quality score per segment.
Transcribes with FluidAudio / Parakeet (fastest, 25 European languages) or Apple SpeechAnalyzer. Switch engines in Settings.
Strips the far-end voice that bleeds into your mic so it isn't mistaken for a phantom speaker.
Survives UI and XPC crashes with auto-relaunch, silent re-attach, and multi-segment stitching.
Exports JSON, SRT, and TXT with timestamps, speaker labels, confidence scores, and local/remote tags.
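To make the export fields concrete, here is a minimal sketch of how one transcript segment could be turned into an SRT cue. It is not parley's exporter: the `Segment` class, its field names, and the `[local]`/`[remote]` prefix are all assumptions for illustration, and the real JSON schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical fields; parley's actual JSON schema may differ.
    start: float        # seconds from the start of the recording
    end: float          # seconds from the start of the recording
    speaker: str        # diarization label, e.g. "Speaker 1"
    source: str         # "local" (mic) or "remote" (system audio)
    confidence: float   # assumed range 0.0 to 1.0
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment], min_confidence: float = 0.0) -> str:
    """Render segments as SRT cues, skipping any below min_confidence."""
    kept = [s for s in segments if s.confidence >= min_confidence]
    cues = []
    for index, seg in enumerate(kept, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.source}] {seg.speaker}: {seg.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Filtering on the per-segment score is one way an agent could ignore low-quality attributions before using a transcript as context. The threshold is yours to choose.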
| Component | Model | License |
|---|---|---|
| Speech recognition | NVIDIA Parakeet TDT 0.6B (CoreML) | CC-BY-4.0 |
| Speaker diarization | pyannote segmentation + WeSpeaker | CC-BY-4.0 |
| Voice activity | Silero VAD | MIT |
| Engine SDK | FluidAudio | Apache-2.0 |
Get started: `git clone https://github.com/fmasi/parley.git && cd parley && bash package_app.sh --install`