i wanted to read a book but reading is hard, so i set up a screen reader to do it for me. the default voices on linux are pretty bad, so i looked for better ones — this is that story.
i tried coqui-ai and piper. coqui-ai was too slow to be usable, so i settled on piper.
here is a coqui-ai voice sample:
and here is a piper voice sample:
note that this is not in any way safe to run as your daily driver. it is not suitable if you rely on your text-to-speech to not break.
how?
prerequisites and trying it out
first you need text-to-speech tools on your system. installing the espeak-ng and orca packages is usually enough: they pull in all the required dependencies and integrate with popular desktop environments.
next, grab a piper release and a voice for it from piper's releases page. pick the amd64 or arm64 prebuilt binaries depending on your architecture and one of the voices. the initial release has the voice download links. i went with en-us-amy-low because it was the first one in order.
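if you're not sure which build matches your machine, here is a quick sketch. uname -m reports the architecture; the mapping to the amd64/arm64 names is just the usual naming convention, so double-check it against the file names on the releases page.

```shell
# map the kernel's architecture name to the amd64/arm64 naming used for the prebuilt binaries
case "$(uname -m)" in
  x86_64)        arch="amd64" ;;
  aarch64|arm64) arch="arm64" ;;
  *)             arch="unknown" ;;   # anything else: no matching prebuilt binary mentioned in this post
esac
echo "pick the $arch piper build"
```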
extract the piper release and the voice anywhere; i put both in the same folder. you're now done with the requirements. we'll assume you also went with en-us-amy-low for the commands below — if not, replace it with whatever you picked.
test that speech-dispatcher is working:
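something along these lines should do, assuming spd-say (the command-line client that comes with speech-dispatcher) is installed. the sentence is arbitrary.

```shell
# speak a short test sentence with whatever voice speech-dispatcher currently defaults to
text="hello from speech-dispatcher"
if command -v spd-say >/dev/null 2>&1; then
  spd-say "$text" || echo "spd-say ran but reported an error" >&2
else
  echo "spd-say not found, install speech-dispatcher first" >&2
fi
```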
ideally you get audio output (with a terrible voice).
test piper from the folder where you extracted everything:
echo "it's that shrimple" \
| ./piper --model en-us-amy-low.onnx --output_file - \
| paplay
--output_file - tells piper to send the audio to stdout instead of a file, and we pipe that into paplay (the pulseaudio version of aplay). it should work on most distros, even if you're on pipewire.
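if the pipeline stays silent, a small preflight check can narrow things down. this is only a sketch: PIPER_DIR is a made-up variable for wherever you extracted things, and the .onnx.json file is the voice config that, as far as i know, piper expects next to the model.

```shell
# check that every piece of the pipeline is where we expect it
piper_dir="${PIPER_DIR:-.}"        # hypothetical variable, defaults to the current folder
model="en-us-amy-low.onnx"
missing=0

[ -x "$piper_dir/piper" ] || { echo "missing or not executable: $piper_dir/piper"; missing=1; }
for f in "$piper_dir/$model" "$piper_dir/$model.json"; do
  [ -f "$f" ] || { echo "missing: $f"; missing=1; }
done
command -v paplay >/dev/null 2>&1 || { echo "paplay not found"; missing=1; }

if [ "$missing" -eq 0 ]; then
  echo "all pieces found, the pipeline should run"
fi
```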
if those commands worked, time to wire it up. create a new module for speech-dispatcher:
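the exact file isn't shown here, so treat this as an assumption: on most distros speech-dispatcher's module configs live under /etc/speech-dispatcher/modules, and the file name matches the piper-generic module name used later on.

```shell
# assumed location of speech-dispatcher module configs; check your distro if it differs
module_dir="/etc/speech-dispatcher/modules"
module_conf="$module_dir/piper-generic.conf"
echo "create and edit this file as root: $module_conf"
# for example: sudo nano "$module_conf"
```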
add these two lines:
GenericExecuteSynth "echo '$DATA' | /home/user/Documents/piper/piper --model /home/user/Documents/piper/en-us-amy-low.onnx --output_raw | $PLAY_COMMAND"
AddVoice "en" "FEMALE1" "en_UK/apope_low"
the first line is the piper command from earlier with variables instead of hardcoded values. $DATA is filled by speech-dispatcher and $PLAY_COMMAND defaults to aplay/paplay on most systems. the bits you care about are the full paths to the piper executable and the voice model: replace those with yours.
the second line teaches speech-dispatcher about the voice so we can select it later.
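to get a feel for what speech-dispatcher ends up running, here is a rough imitation of the substitution in plain shell. the real substitution happens inside speech-dispatcher; the values for DATA and PLAY_COMMAND below are made up for illustration.

```shell
# imitate how $DATA and $PLAY_COMMAND get filled into the GenericExecuteSynth line
DATA="hello there"          # example text, normally supplied by speech-dispatcher
PLAY_COMMAND="paplay"       # example player, normally picked by speech-dispatcher
PIPER_DIR="/home/user/Documents/piper"

expanded="echo '$DATA' | $PIPER_DIR/piper --model $PIPER_DIR/en-us-amy-low.onnx --output_raw | $PLAY_COMMAND"
echo "$expanded"
```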
you can use a graphical interface like orca --setup to pick piper-generic as the speech engine and select the new voice. or set it system-wide by editing:
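the file in question is most likely speech-dispatcher's main config. assuming the usual /etc/speech-dispatcher/speechd.conf location (an assumption, your distro may keep it elsewhere), this sketch peeks at any defaults already set before you append anything:

```shell
# assumed system-wide config location
conf="/etc/speech-dispatcher/speechd.conf"
if [ -f "$conf" ]; then
  # show uncommented default settings that the new lines would compete with
  grep -nE '^[[:space:]]*Default(Module|VoiceType|Language)' "$conf" \
    || echo "no uncommented defaults set yet"
else
  echo "$conf not found; your distro may keep it elsewhere"
fi
```

if it prints existing Default* lines, you may prefer to edit those instead of appending duplicates.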
append these three lines:
DefaultVoiceType "FEMALE1"
DefaultLanguage "en"
DefaultModule piper-generic
save and exit.
wrapping up
that's pretty much it — assuming all of the various bricks that make up linux decide to cooperate.
run spd-say again to confirm:
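a minimal sketch, assuming the module ended up being called piper-generic as above. -O lists the output modules speech-dispatcher knows about and -o asks for one explicitly, which helps tell "module not loaded" apart from "default not applied".

```shell
module="piper-generic"   # the module name used in this post
if command -v spd-say >/dev/null 2>&1; then
  spd-say -O || true     # list available output modules; piper-generic should show up
  spd-say -o "$module" "it's that shrimple" || echo "spd-say reported an error" >&2
else
  echo "spd-say not found, install speech-dispatcher first" >&2
fi
```

if the module doesn't show up, speech-dispatcher may still be running with the old config; restarting it (or logging out and back in) is worth a try.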
now anything that integrates with speech-dispatcher (which is a lot of stuff on linux — firefox, calibre reader, system-wide TTS) uses your new piper module. finally no more microsoft sam voice.
what next?
to make this properly usable there's still some stuff to figure out:
- adjusting diction speed
- making the setup more reliable
- packaging this as a speech-dispatcher module installable from a package manager
and generally polishing this post.
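for the speed item, one possible starting point (not something i have wired into the module yet): piper builds that list a --length_scale option in their help output let you stretch or shrink phoneme length. values below 1.0 should speed speech up and values above should slow it down; 0.8 here is an arbitrary pick.

```shell
# hypothetical speed tweak for the test pipeline from earlier
length_scale="0.8"
cmd="./piper --model en-us-amy-low.onnx --length_scale $length_scale --output_file -"
echo "would run: echo 'some text' | $cmd | paplay"

# only actually run it if the pieces are present in the current folder
if [ -x ./piper ] && [ -f en-us-amy-low.onnx ] && command -v paplay >/dev/null 2>&1; then
  echo "it's that shrimple" | $cmd | paplay || echo "pipeline failed" >&2
fi
```

if it sounds right, the same flag could presumably go into the GenericExecuteSynth line.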
anyway, thank you for your time — i hope some of this was useful.