Text to Speech synthesis software

From LinuxReviews

There are in principle many free software alternatives for converting text to speech on Linux, but in practice there are just two, and they are rather poor compared to proprietary alternatives. They can be used to make the computer read text aloud in very artificial-sounding voices.

The free software alternatives for converting Text to Speech on Linux

Free Text To Speech Synthesis Software
[Table with the columns program, rating and example voice. The ratings were images and the example voices were audio clips.]

The practically usable alternatives for converting text to speech using free software on GNU/Linux desktop and laptop machines are:

  • mimic from Mycroft, forked off an early version of the flite software, is the best choice if you are only interested in the English language.
  • festival is actively developed and works fine, but it is not great and it does not sound as good as mimic. festival may be the better choice for non-English languages. Festival is developed by the British at the University of Edinburgh. The project was dead for many years, which is why some GNU/Linux distributions still ship an ancient version from 2004 even though there have been several releases since the project was somewhat revived in 2017. If echo 'hello' | festival --tts results in a Segmentation fault (core dumped), it is likely because your distribution gave you an outdated version.

mimic and festival are not what you could call "natural-sounding". They do produce acceptable and, more importantly, understandable results even though both sound very artificial.

There are several other alternatives, but they are not very good and, in most cases, not usable. Many web pages, notably older pages and pages made by people who didn't test anything and just cut and pasted from older pages, will recommend the following programs:

  • flite, or "festival lite", is another widely recommended program. The version most GNU/Linux distributions ship is from 2005. That version won't crash; it will execute, but it won't actually output any audio. This may be due to it being written before PulseAudio was a thing. It can be used to create wav files that can be played using programs like aplay or mpv.
    • The flite project is not dead; there was a release (2.5.1) in July 2020. You can acquire the source from github.com/festvox/flite and compile it yourself if you want to try a newer flite version. Why GNU/Linux distributions ship an ancient 2005 version is unclear.
  • There's a Java alternative called freetts which was last updated in 2009. Good luck getting that one working. We tried and gave up after wasting too much time on it.
  • espeak, last updated in 2014, is another widely recommended alternative that isn't usable on modern distributions. Its fork, espeak-ng, is actively developed and quite usable.
  • espeak-ng (espeak next generation) can be used but it doesn't sound very good. All the distributions have a working version available in their repositories, so there's that.
  • There is also a GNU project for voice synthesis called gnuspeech. It was last updated in 2015. You can view the code at git.savannah.gnu.org: gnuspeech and you may be able to get it to compile if you have a lot of patience and willingness to change the code so it compiles against modern libraries. Getting it to work is not easy and it isn't very good.

GNU/Linux systems have a layer called speech-dispatcher that sits between applications with text-to-speech features and the programs that provide those features. speech-dispatcher can be configured to use any of the above-mentioned programs.
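
speech-dispatcher comes with a small command-line client, spd-say, which is a quick way to check which back-end actually ends up speaking. The following is only a sketch: spd_speak is a made-up helper name, and the module name espeak-ng in the example is an assumption, so use whatever spd-say -O lists on your system.

```shell
# Speak text through speech-dispatcher, optionally forcing an output module.
# Usage: spd_speak "text" [module]
# (spd_speak is a hypothetical helper; spd-say is the real client.)
spd_speak() {
    text=$1
    module=${2:-}
    if ! command -v spd-say >/dev/null 2>&1; then
        echo "spd-say not found; install speech-dispatcher first" >&2
        return 1
    fi
    if [ -n "$module" ]; then
        # -o selects the output module (the synthesizer back-end)
        spd-say -o "$module" "$text"
    else
        spd-say "$text"
    fi
}

# Examples:
#   spd-say -O                           # list the configured output modules
#   spd_speak "Hello world"              # use the default module
#   spd_speak "Hello world" espeak-ng    # module name is an assumption
```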

HOWTO use Mimic

A video explaining the four essential freedoms software must have to qualify as free software made in kdenlive using mimic -voice slt to create the audio.

mimic from Mycroft is available as a package called mimic on most GNU/Linux distributions. It is a pure command-line tool; there is no GUI. Using it is straightforward:

mimic -t "Hello world" makes it say "Hello world".

-f filename.txt makes it read a text file. Adding -o output.wav makes mimic write the voice output to a .wav formatted audio file.
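
Combining the two options turns a text file into an audio file in one step. A minimal sketch, assuming mimic is installed; mimic_file_to_wav is a made-up helper name and the file names are placeholders:

```shell
# Render a text file to a .wav file with mimic.
# Usage: mimic_file_to_wav input.txt output.wav
# (mimic_file_to_wav is a hypothetical helper name.)
mimic_file_to_wav() {
    input=$1
    output=$2
    if [ ! -r "$input" ]; then
        echo "cannot read $input" >&2
        return 1
    fi
    # -f reads the text from a file, -o writes the audio to a file
    mimic -f "$input" -o "$output"
}

# Example:
#   echo "Hello world" > example.txt
#   mimic_file_to_wav example.txt example.wav
```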

This is what mimic -t 'Hello, this is a test of the emergency broadcasting system' -o mimic-test.wav ; oggenc mimic-test.wav sounds like:

The mimic package comes with several built-in voices. There is also support for voice files. One voice file comes pre-installed in /usr/share/mimic/voices. There are no additional voice files available on the mimic website at mimic.mycroft.ai/, but the GitHub page at https://github.com/MycroftAI/mimic1 has some flitevox files in a voices/ folder that are not included in the packages distributions ship.

The internal voices in mimic can be used by passing the -voice option. The available built-in voices can be listed with mimic -lv.

This will, when using mimic v1.3.0, output: Voices available: ap slt slt_hts kal awb kal16 rms awb_time

The slt and slt_hts voices are female voices. Here is a test of slt made using:

mimic -t 'Hello, this is a test of the emergency broadcasting system' -voice slt -o mimic-slt-test.wav

  • ap, awb, kal and rms are male voices. awb is probably British. kal is probably a drunk. rms does not sound anything like Richard Stallman.
  • slt and slt_hts are female voices.
  • awb_time and kal16 seem to be broken; using them does not produce any understandable output.
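
A quick way to compare the voices is to render the same sentence once per voice. A minimal sketch, assuming mimic is installed; mimic_voice_samples is a made-up helper name, the output file naming is arbitrary, and the voice names come from the mimic -lv listing:

```shell
# Write one sample .wav per mimic voice.
# Usage: mimic_voice_samples "text" voice [voice ...]
# (mimic_voice_samples is a hypothetical helper name.)
mimic_voice_samples() {
    text=$1
    shift
    for voice in "$@"; do
        # One output file per voice, e.g. mimic-slt-test.wav
        mimic -t "$text" -voice "$voice" -o "mimic-$voice-test.wav" || return 1
    done
}

# Example, skipping the two voices that seem to be broken:
#   mimic_voice_samples 'Hello, this is a test' ap slt slt_hts kal awb rms
```

The resulting files can be played with aplay or mpv.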

Run mimic --help to see all the available command-line options.

See Mimic for additional information about Mimic.

HOWTO use espeak-ng

espeak-ng is a command-line tool which, like most command-line tools, accepts piped input. It will happily turn all piped input, whether it's a file you cat or text you echo, into spoken audio. Example:

echo 'Hello, this is a test of the emergency broadcasting system' | espeak-ng

This is what it sounds like - twice:

espeak-ng does have quite a lot of options for "enhancing" the audio. You can set things like speed, pause between words and amplitude, and there are several different voices available for it. You can play around with it, but don't expect "professional" results no matter what you do.
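
The speed, word gap and amplitude settings map to the -s, -g and -a options. A minimal sketch, assuming espeak-ng is installed; espeak_tuned is a made-up helper name, and the fallback values are assumptions picked to be close to espeak-ng's own defaults:

```shell
# Speak text with explicit speed, word gap and amplitude.
# Usage: espeak_tuned "text" [speed_wpm] [gap_10ms] [amplitude]
# (espeak_tuned is a hypothetical helper name; the fallback values
# below are assumptions, not values taken from the article.)
espeak_tuned() {
    text=$1
    speed=${2:-175}      # -s: speed in words per minute
    gap=${3:-0}          # -g: pause between words, in units of 10 ms
    amplitude=${4:-100}  # -a: amplitude, 0 to 200
    espeak-ng -s "$speed" -g "$gap" -a "$amplitude" "$text"
}

# Example: slower, with longer pauses and a bit louder
#   espeak_tuned 'Hello, this is a test' 140 2 120
```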

The most interesting options to try with espeak-ng are espeak-ng --voices and espeak-ng --voices=mb, which list all the available voices for the default and the MBROLA voice synthesizer respectively. The list for --voices will be long and look like this:

 5  mt              --/M      Maltese            sem/mt
 5  my              --/M      Burmese            sit/my
 5  nb              --/M      Norwegian_Bokmål   gmq/nb
 5  nci             --/M      Nahuatl_(Classical) azc/nci

(That's just 4 lines picked randomly; espeak-ng outputs a much longer list.)

These voices can then be used with the -v option. To make it say something with the Norwegian voice you could do:

echo 'Nei takk ikke fiskeboller' | espeak-ng -v gmq/nb
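
Since the --voices list is long, it can help to filter it by voice name instead of scrolling. A minimal sketch using awk; find_voice is a made-up helper name, and it assumes the column layout shown in the listing above (priority, language code, age/gender, voice name, file):

```shell
# Print the language code and voice file of every voice whose name
# contains the given pattern (case-insensitive).
# Reads `espeak-ng --voices` output on stdin.
# (find_voice is a hypothetical helper name.)
find_voice() {
    awk -v pat="$1" 'tolower($4) ~ tolower(pat) { print $2, $5 }'
}

# Example:
#   espeak-ng --voices | find_voice norwegian
# With the sample lines shown above this prints:
#   nb gmq/nb
```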

espeak-ng is developed at github.com/espeak-ng/espeak-ng/.

Adding MBROLA voices: don't bother

espeak-ng supports using MBROLA as a back-end. The list of MBROLA-supported voices can be generated with espeak-ng --voices=mb, and it will look similar to the list of regular voices. However, using them will only work if you have the mbrola binary installed. It is non-free and not available in distributions. You can download and install it from http://tcts.fpms.ac.be/synthesis/mbrola.html if you want to. It is not worth the trouble. The voices available to it are different from espeak-ng's stock voices, but they are not better. If anything, they sound worse.

The espeak-ng manual page lists a lot more options. But as said, it won't sound great no matter what you do.

HOWTO use festival

festival will say whatever is piped to it if you have a working version and you add the --tts option:

echo 'hello' | festival --tts

You can pipe files to festival and have them read:

echo "Hello world" > example.txt
festival --tts < example.txt


echo 'Hello, this is a test of the emergency broadcasting system' | festival --tts
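
festival --tts plays the audio directly. If you want a file instead, festival normally comes with a helper script called text2wave. This is a sketch under that assumption (text2wave is not covered above and some distributions may package it separately); festival_file_to_wav is a made-up helper name:

```shell
# Render a text file to a .wav file using festival's text2wave script.
# Usage: festival_file_to_wav input.txt output.wav
# (festival_file_to_wav is a hypothetical helper name.)
festival_file_to_wav() {
    if ! command -v text2wave >/dev/null 2>&1; then
        echo "text2wave not found; it normally ships with festival" >&2
        return 1
    fi
    # -o names the output file; the input text file is the last argument
    text2wave -o "$2" "$1"
}

# Example:
#   echo "Hello world" > example.txt
#   festival_file_to_wav example.txt example.wav
```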

Many GNU/Linux distributions ship wildly outdated versions of festival. You may find that the version your distribution includes segfaults and exits when you try to use it. You can acquire the source code from github.com/festvox/festival and compile it yourself if that's the case.

See festival for additional information about festival.

HOWTO use flite

All the GNU/Linux distributions ship flite 1.3 from 2005 for some reason we can't begin to imagine. There are several newer releases available, v2.5.1 was released in July 2020.

The text you want flite to say can be specified with -t.

flite 1.3 will not produce any audio, or anything else, if you tell it to say something with -t. It does support file output and that works.

flite -t 'Hello, this is a test of the emergency broadcasting system' -o flite-1.3-test.wav

will produce a flite-1.3-test.wav file you can play with aplay or mpv.
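
The two steps can be wrapped in a small function so that flite 1.3 behaves as if it could speak directly. A minimal sketch, assuming flite and aplay are installed; flite_say is a made-up helper name:

```shell
# Work around flite 1.3 producing no sound: render to a temporary
# .wav file, play it with aplay, then clean up.
# Usage: flite_say "text"
# (flite_say is a hypothetical helper name.)
flite_say() {
    wav=$(mktemp "${TMPDIR:-/tmp}/flite-XXXXXX") || return 1
    # aplay -q suppresses its status messages
    flite -t "$1" -o "$wav" && aplay -q "$wav"
    status=$?
    rm -f "$wav"
    return $status
}

# Example:
#   flite_say 'Hello, this is a test of the emergency broadcasting system'
```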

You will want to compile and install a recent version (source at github.com/festvox/flite) if you want to use flite because the version Linux distributions ship is typically wildly outdated and outright horrible.

Proprietary alternatives

Amazon Polly is the best proprietary alternative if you want text-to-speech functionality in a non-free software project. It is a botnet text-to-speech cloud service operated by the very evil American Amazon corporation. Stallman would absolutely not approve. baby WOGUE uses it to make YouTube videos about free software. You can check that channel out to get an idea of how Amazon Polly sounds. It is better than mimic and espeak-ng for practical purposes and worth looking into if you think evil proprietary software tied to cloud services is acceptable when there is no superb free alternative. You could check out AWS: Getting Started with Amazon Polly if you are interested. Most of the Android "apps" for text-to-speech use the Amazon Polly API.

Read Aloud: A Text to Speech Voice Reader is a plug-in for the Mozilla Firefox web browser which lets you do text-to-speech in that web browser using server-side services. The "standard" voices available are all generated using Google services. A Google account is required to use some of the "premium" voices. There are also many other "premium" voices available that use other third party services. You need to buy a subscription in order to use those voices.

Natural Reader is a plug-in for the Chrome and Chromium web browsers which lets you do text-to-speech in those browsers using a server-side service.

Read Aloud and Natural Reader are both decent alternatives if you want something read aloud. The obvious downsides with those are that a) they are limited to in-browser text-to-speech only and b) they use proprietary cloud services to do the actual text to speech synthesis. Everything you ask them to read is sent to the cloud.


