
this post is still a work in progress

2023-06-07

more natural screen reader voice on linux

using piper for a more natural sounding voice with speech-dispatcher


why?


we recently wanted to read a book but are not very good at it. so we tried to set up a screen reader to do it for us. the default voices on linux are pretty bad so we tried to find better ones. this is the story.


we first tried coqui-ai and piper but coqui-ai was too slow to be usable, so we settled on piper.

here is a coqui-ai voice sample

and here is a piper voice sample


note that this is not in any way safe to run as a daily driver. it is not suitable if you rely on your text to speech to not break.

how?


prerequisites and trying it out


first we need text to speech tools on our system. usually having the espeak-ng and orca packages installed should be enough. they pull in all the required dependencies and integrate with popular desktop environments.


next we want to grab a release of piper and a voice for it. pick the amd64 or arm64 prebuilt binary depending on your architecture, and one of the voices. the initial release has the voice download links. we went with en-us-amy-low because it was the first one listed.

now extract the piper release anywhere you please and extract the voice next to it. you can put the voice anywhere, but keeping everything in one folder makes the paths easier later.
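for reference, a minimal sketch of the extraction step, assuming you already downloaded the two archives into ~/Documents/piper (the archive names here are illustrative and may differ per release) :

```shell
# sketch: extract piper and the voice into one shared folder.
# archive names are assumptions; check what the releases page actually gave you.
mkdir -p "$HOME/Documents/piper"
cd "$HOME/Documents/piper"
for archive in piper_amd64.tar.gz en-us-amy-low.tar.gz; do
  # only extract archives that are actually present
  [ -f "$archive" ] && tar -xzf "$archive"
done
echo "extracted into $PWD"
```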

you are now done with the requirements part.


we will assume you also went with the voice en-us-amy-low to make commands that will come up easier. make sure to replace this with the actual voice you picked if different.

you can test that speech dispatcher is working with all the defaults by running :

spd-say "it's that shrimple"

ideally you get some audio output (with a terrible voice).

you can test piper by going into the folder where you put everything and :

echo "it's that shrimple" \
| ./piper --model en-us-amy-low.onnx --output_file - \
| paplay

passing a dash as the output file tells piper to write the audio to stdout, and we pipe (|) that into paplay. paplay is the pulseaudio version of aplay (alsa play), meaning it should work on most distributions, even those on pipewire.
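if paplay isn't available, or you just want to keep the audio around, a minimal variation of the same test writes to a wav file instead (guarded so it does nothing if piper isn't in the current folder) :

```shell
# sketch: same test but saving the audio to a file instead of playing it,
# assuming piper and the en-us-amy-low voice sit in the current folder
PIPER=./piper
MODEL=en-us-amy-low.onnx
if [ -x "$PIPER" ]; then
  echo "it's that shrimple" | "$PIPER" --model "$MODEL" --output_file test.wav
else
  echo "piper not found at $PIPER, nothing to do"
fi
```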

system configuration


if the commands ran fine you are now ready to configure everything. the next step is to create a new module for speech-dispatcher, so we create a new file and edit it :

sudo touch /etc/speech-dispatcher/modules/piper-generic.conf
sudo $EDITOR /etc/speech-dispatcher/modules/piper-generic.conf

we then add these two lines :

GenericExecuteSynth "echo '$DATA' | /home/user/Documents/piper/piper --model /home/user/Documents/piper/en-us-amy-low.onnx --output_raw | $PLAY_COMMAND"
AddVoice "en" "FEMALE1" "en-us-amy-low"

the first line is the piper test command we ran earlier, but with variables instead of hardcoded values. $DATA is filled in by speech-dispatcher, and $PLAY_COMMAND should default to aplay or paplay on most systems.
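one thing to watch for with that first line: speech-dispatcher pastes the text to read between the single quotes around $DATA, so an apostrophe in the text can close the quote early and mangle the shell command. a tiny illustration of the problem (the piper call and model name here are just placeholders) :

```shell
# the text speech-dispatcher wants to read
DATA="it's that shrimple"
# what the generic module would actually hand to the shell:
# the apostrophe in "it's" terminates the single-quoted string early
echo "echo '$DATA' | piper --model voice.onnx --output_raw"
```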


the things you care about are the full paths to the piper executable as well as the full path to the voice model that you extracted earlier. make sure to replace those.


the second line is there because speech-dispatcher is built to juggle many different voices, so we register a new one that we can select later.


save the file and exit.

you can probably (maybe after a reboot or logging out) already use a graphical interface like orca --setup to select piper-generic as the speech engine module and pick the voice we added.


another way is to change the system-wide defaults for speech-dispatcher so it automatically uses the new module and voice. you want to edit :

sudo $EDITOR /etc/speech-dispatcher/speechd.conf

add the 3 following lines at the end of the file :

DefaultVoiceType  "FEMALE1"
DefaultLanguage "en"
DefaultModule piper-generic

if you edited any of the voice names on the second line of the previous module file, make sure to adjust these accordingly.


save the file and exit.


trying it out and wrapping up


that's pretty much it. hopefully. if all of the various bricks that make up linux decided to cooperate.


you can now try running the spd-say command again :

spd-say "it's that shrimple"

now anything that integrates with speech-dispatcher (which is actually a bunch of stuff on linux, so that's nice), like firefox or the calibre reader, will use the new piper module we just configured, system wide.


finally no more Microsoft Sam voice.

what next?


to actually make this properly usable system wide there's still some stuff we need to figure out, such as :


  • adjusting diction speed
  • making this setup more reliable
  • packaging this as some sort of speech-dispatcher package that can just be installed with a package manager

and more generally adding more polish to this post.
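for the diction speed item, piper itself exposes a --length_scale option (values above 1 slow the voice down, below 1 speed it up); a sketch of trying it out, guarded in case piper isn't where this post assumes :

```shell
# sketch: slower diction via piper's --length_scale (1.0 is the normal speed),
# assuming the same folder layout used earlier in the post
PIPER="$HOME/Documents/piper/piper"
MODEL="$HOME/Documents/piper/en-us-amy-low.onnx"
if [ -x "$PIPER" ]; then
  echo "it's that shrimple, but slower" \
    | "$PIPER" --model "$MODEL" --length_scale 1.4 --output_file - \
    | paplay
else
  echo "piper not found at $PIPER"
fi
```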


anyways, thank you for taking the time to read this. we hope some of the knowledge in here can be of use to you.