Text to Speech synthesis software

From LinuxReviews

There are in principle three software alternatives for converting text to speech on Linux, but in practice there is just one. It is called espeak-ng and it is not very good. It can convert text from a text file or a pipe into understandable audio. Well, understandable as long as it's English, anyway.

The free software alternatives for converting Text to Speech on Linux

Your distribution will probably have a program called festival, and many old web pages recommend it. It was last updated in 2004, and running echo 'hello' | festival --tts prints Segmentation fault (core dumped) on modern distributions. Why they still include it is a good question.

flite is another common recommendation, and it was last updated in 2005. It won't crash and it will execute, but it won't actually output any audio.

There's a Java alternative called freetts which was last updated in 2009. Good luck with that.

That leaves espeak which was last updated in 2014 and a fork of espeak called espeak-ng which is actively developed.

espeak-ng is the best choice because it's the only alternative that is still updated. And it works.

HOWTO use espeak-ng

espeak-ng is a command-line tool which, like most command-line tools, accepts piped input. It will happily turn any piped input, whether it's a file you cat or text you echo, into spoken audio. Example:

echo 'Hello, this is a test of the emergency broadcasting system' | espeak-ng

This is what it sounds like - twice:

espeak-ng does have quite a lot of options for "enhancing" the audio. You can set things like speed, pause between words and amplitude, and there are several different voices available for it. So you can play around with it, but don't expect "professional" results no matter what you do.
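As a rough sketch of how the speed, pause and amplitude options fit together, the wrapper below passes -s (speed in words per minute), -g (gap between words, in units of 10 ms) and -a (amplitude, 0 to 200) to espeak-ng. The function name say_tuned, the ESPEAK variable and the default values are made up for illustration; only the three flags come from the espeak-ng manual page.

```shell
# Hypothetical wrapper around espeak-ng; not part of espeak-ng itself.
# ESPEAK can be overridden, for example to point at plain espeak.
ESPEAK="${ESPEAK:-espeak-ng}"

# Usage: say_tuned TEXT [SPEED] [GAP] [AMPLITUDE]
say_tuned() {
    text=$1
    speed=${2:-140}      # -s: words per minute (illustrative default)
    gap=${3:-2}          # -g: pause between words, units of 10 ms
    amplitude=${4:-120}  # -a: volume, 0 to 200 (illustrative default)
    "$ESPEAK" -s "$speed" -g "$gap" -a "$amplitude" "$text"
}
```

Calling say_tuned 'Hello, this is a test' 120 5 should speak the text a little slower and with longer pauses. It will still sound like espeak-ng.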

The most interesting options to try with espeak-ng are espeak-ng --voices and espeak-ng --voices=mb, which list all the available voices for the default and the MBROLA voice synthesizer respectively. The list for --voices will be long and look like this:

 5  mt              --/M      Maltese            sem/mt
 5  my              --/M      Burmese            sit/my
 5  nb              --/M      Norwegian_Bokmål   gmq/nb
 5  nci             --/M      Nahuatl_(Classical) azc/nci

(That's just 4 lines picked randomly; espeak-ng outputs a much longer list.)

These voices can then be used with the -v option. To make it say something with the Norwegian voice you could do:

echo 'Nei takk ikke fiskeboller' | espeak-ng -v gmq/nb
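If you don't want to read through the whole list by hand, a small filter can pick the voice for you. This is only a sketch: it assumes the --voices output keeps the column layout shown above (language code in the second column, voice file in the fifth), and voice_file_for is a made-up helper name.

```shell
# Hypothetical helper: reads `espeak-ng --voices` output on stdin and prints
# the File column (5th field) of the first row whose Language column
# (2nd field) equals the given code. Prints nothing if there is no match.
voice_file_for() {
    awk -v code="$1" '$2 == code { print $5; exit }'
}
```

It could then be combined with the -v option like this:

voice=$(espeak-ng --voices | voice_file_for nb)
echo 'Nei takk ikke fiskeboller' | espeak-ng -v "$voice"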

MBROLA voices - don't bother

espeak-ng supports using MBROLA as a back-end. The list of MBROLA-supported voices can be generated by espeak-ng --voices=mb and it will look similar to the list of regular voices. However, using them will only work if you have the mbrola binary installed. It is non-free and not available in distributions. You can download and install it from http://tcts.fpms.ac.be/synthesis/mbrola.html if you want to. It is not worth the trouble. The voices available to it are different from espeak-ng's stock voices, but they are not better. If anything, they sound worse.

The espeak-ng manual page lists a lot more options. But as said, it won't sound great no matter what you do.

You can follow development at https://github.com/espeak-ng/espeak-ng/

More realistic alternatives

The sad truth is that the best alternative is Amazon Polly. It is botnet text to speech and Stallman would absolutely not approve. baby WOGUE uses it to make YouTube Videos about free software and it's quite good. It is better than espeak-ng and worth looking into if you find proprietary software to be acceptable when there is no free alternative. You could check out AWS: Getting Started with Amazon Polly if you are interested.
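For the curious, Polly can be driven from the command line through the AWS CLI. The sketch below assumes the aws tool is installed and already configured with credentials for an account that may use Polly. The function name polly_say, the AWS variable and the choice of Joanna as default voice are illustrative assumptions, not something this article or baby WOGUE prescribes.

```shell
# Hypothetical wrapper around the AWS CLI; AWS can be overridden for testing.
AWS="${AWS:-aws}"

# Usage: polly_say TEXT OUTFILE [VOICE]
# Asks Polly to synthesize TEXT and saves the result as an MP3 in OUTFILE.
polly_say() {
    "$AWS" polly synthesize-speech \
        --output-format mp3 \
        --voice-id "${3:-Joanna}" \
        --text "$1" \
        "$2"
}
```

Something like polly_say 'Hello from the botnet' hello.mp3 should leave you with an MP3 file you can play with any audio player.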