The pyTTS module wraps the text-to-speech services of the Microsoft Speech API (SAPI) for use in Python. It relies on the win32com library to obtain and communicate with the SAPI COM interfaces. This tutorial presents examples of common text-to-speech tasks: speaking simple text, changing voice parameters, speaking to or from a WAV file, correcting pronunciation, and handling speech events.
This tutorial does not cover the full pyTTS API. See the pyTTS documentation for the full API spec.
Prerequisites
- pyTTS 3.0 or higher
- Microsoft SAPI 5.1 redistributable
- Extra Microsoft voices
- Mark Hammond’s Python win32all extensions
- Example code
Simple Speech
Producing speech is very easy. The following example demonstrates how to speak a string of words.
    import pyTTS

    tts = pyTTS.Create()
    tts.Speak('This is the sound of my voice.')
Not much else can be said about this example. It speaks for itself. (Hyuck hyuck.)
Properties
Various voice properties can be modified, including the rate of speech, the speech volume, and the voice itself (and with it the gender). The next example shows how these properties can be changed.
    import pyTTS

    tts = pyTTS.Create()

    #set the speech rate
    tts.Rate = 4

    #set the speech volume percentage (0-100%)
    tts.Volume = 40

    #get a list of all the available voice actors
    print(tts.GetVoiceNames())

    #explicitly set a voice
    tts.SetVoiceByName('MSMary')

    #speak the text
    tts.Speak('This is the sound of my voice.')
The Speak method accepts additional flags that determine, for instance, whether speech is synchronous or asynchronous and if it contains XML markup or not. The following example demonstrates some of these flags.
    import pyTTS
    from time import sleep

    tts = pyTTS.Create()

    #the tts_async flag causes the program to continue immediately
    #after starting speech; the speak function will not block
    tts.Speak('The rain in Spain falls mainly on the plain.', pyTTS.tts_async)

    #wait for one second
    sleep(1)

    #now begin speaking the next stream, purging the remainder of the first stream
    tts.Speak('This is the sound of my melodious voice!', pyTTS.tts_purge_before_speak)
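The XML markup mentioned above can also carry rate and volume changes inline. The following is a minimal sketch of building such a string, assuming the SAPI 5 `<rate absspeed="...">` and `<volume level="...">` tags; the helper name `sapi_markup` is hypothetical and not part of pyTTS.

```python
from xml.sax.saxutils import escape


def sapi_markup(text, rate=None, volume=None):
    """Wrap plain text in SAPI 5 rate and volume tags (hypothetical helper)."""
    # escape &, < and > so the text cannot be mistaken for markup
    marked = escape(text)
    if rate is not None:
        # SAPI documents absolute speeds from -10 (slowest) to 10 (fastest)
        if not -10 <= rate <= 10:
            raise ValueError('rate must be between -10 and 10')
        marked = '<rate absspeed="%d">%s</rate>' % (rate, marked)
    if volume is not None:
        # volume is a percentage from 0 to 100
        if not 0 <= volume <= 100:
            raise ValueError('volume must be between 0 and 100')
        marked = '<volume level="%d">%s</volume>' % (volume, marked)
    return marked


markup = sapi_markup('Fish & chips, please.', rate=4, volume=40)
print(markup)
```

The resulting string would presumably be spoken with `tts.Speak(markup, pyTTS.tts_is_xml)`, in the same way the pronunciation example later in this tutorial passes XML to the engine.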
WAV Files
The pyTTS class can speak to a file in WAV format on disk. The file can be played back later by the pyTTS class or any other audio playback software. The following example shows how to write and play back WAV files.
    import pyTTS

    tts = pyTTS.Create()

    #write to a wave file
    tts.SpeakToWave('spain.wav', 'The rain in Spain falls mainly on the plain.')

    #speak from the wave file
    tts.SpeakFromWave('spain.wav')
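Because the result is an ordinary WAV file, other tools can read it as well. The sketch below uses Python's standard `wave` module to measure the length of a recording. It assumes the file holds uncompressed PCM audio, which is what the `wave` module requires; the helper `wav_duration` is illustrative, and the silent stand-in file only exists so the example runs without SAPI.

```python
import os
import tempfile
import wave


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    reader = wave.open(path, 'rb')
    try:
        return reader.getnframes() / float(reader.getframerate())
    finally:
        reader.close()


# stand-in for a file written by SpeakToWave: half a second of silence,
# 16-bit mono at 22050 Hz (an assumed format, not necessarily what pyTTS writes)
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
writer = wave.open(demo_path, 'wb')
writer.setnchannels(1)
writer.setsampwidth(2)
writer.setframerate(22050)
writer.writeframes(b'\x00\x00' * 11025)
writer.close()

duration = wav_duration(demo_path)
print('%.2f seconds' % duration)
```

To inspect real output, replace `demo_path` with the path given to `SpeakToWave`, such as `'spain.wav'`.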
Memory buffers
The pyTTS class can also speak to a memory buffer. The buffer can then be modified or played later by another sound library. See the pySonic tutorial for an example of this feature.
Pronunciation
The Microsoft speech engine has difficulty pronouncing some words, especially when the pronunciation depends on the semantic context. There are two ways of correcting pronunciation:
- Misspelling. Simply misspell the mispronounced word until it sounds right when spoken by pyTTS.
- Phonetic spelling. Insert an XML pronunciation tag into the text that spells the word phonetically.
Both of these approaches are shown in the example below.
    import pyTTS

    tts = pyTTS.Create()

    #MSSam mispronounces the word 'sonified'
    tts.Speak('Sonified.')

    #we can fix it by misspelling
    tts.Speak('Sahnified.')

    #or we can use XML to tell the speech engine to use the provided phonetics
    tts.Speak('<pron sym="s aa n ih f ay d" />.', pyTTS.tts_is_xml)
The pyTTS module includes a class that makes pronunciation correction easier. The class stores pairs of mispronounced words and their pronunciation corrections to disk. An example of how to use the class is shown below.
    import pyTTS

    tts = pyTTS.Create()

    #create an instance of the pronunciation corrector
    p = pyTTS.Pronounce()

    #add an entry for the phonetic pronunciation of the abbreviation Alt
    p.AddPhonetic('Alt', 'ao l t 1')

    #add an entry for the purposeful misspelling of the abbreviation Ctrl
    p.AddMisspelled('Ctrl', 'Control')

    #now quickly correct a sentence using pronunciations in the dictionary
    text = p.Correct('The alt key is fun, but the Ctrl key is cooler!.')

    #print the text to see what it looks like and then speak it
    print(text)
    tts.Speak(text, pyTTS.tts_is_xml)

    #the pronunciation dictionary can be saved to disk too
    p.Save('my.dict')
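To make the idea of a pronunciation dictionary concrete, here is a rough pure-Python sketch of how such a corrector could work. It is not the pyTTS implementation: the class `SimpleCorrector` and its method names are hypothetical, and whole-word, case-insensitive matching is an assumption made for illustration.

```python
import re


class SimpleCorrector(object):
    """Hypothetical stand-in for a pronunciation corrector; not pyTTS code."""

    def __init__(self):
        # maps a lowercased word to the text that should replace it
        self.entries = {}

    def add_phonetic(self, word, phonemes):
        # phonetic entries become SAPI <pron> tags
        self.entries[word.lower()] = '<pron sym="%s"/>' % phonemes

    def add_misspelled(self, word, respelling):
        # misspelled entries are substituted as plain text
        self.entries[word.lower()] = respelling

    def correct(self, text):
        if not self.entries:
            return text
        # match whole words only, longest first, ignoring case
        words = sorted(self.entries, key=len, reverse=True)
        pattern = re.compile(
            r'\b(%s)\b' % '|'.join(re.escape(w) for w in words),
            re.IGNORECASE)
        return pattern.sub(lambda m: self.entries[m.group(1).lower()], text)


corrector = SimpleCorrector()
corrector.add_phonetic('Alt', 'ao l t 1')
corrector.add_misspelled('Ctrl', 'Control')
fixed = corrector.correct('The alt key is fun, but the Ctrl key is cooler!')
print(fixed)
```

Because the corrected text may contain `<pron>` tags, it has to be spoken with the XML flag, which is why the pyTTS example above passes `pyTTS.tts_is_xml` to `Speak`.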
The pyTTS installation also includes a module that has the start of a wxPython GUI that makes creating pronunciation dictionaries easy. The GUI panel can be integrated into other apps to support pronunciation creation and use. A simple application that uses this GUI can be seen by running the PronuncationEditor.py in your Python/libs/site-packages folder.
Events
The Microsoft speech engine also supports speech event callbacks. The pyTTS module lets you register callback functions that are notified when speech events occur. These events include end-of-sentence signals, end-of-stream signals, and bookmark signals. The available event callbacks are described in detail in the pyTTS documentation.
This next example demonstrates how to register callback functions for some simple signals. It requires wxPython or some other library with an event processing loop in order to run properly.
    import pyTTS
    import wx
    import time

    #create a wxPython frame with a single button in it
    class myFrame(wx.Frame):
        def __init__(self):
            wx.Frame.__init__(self, None, -1, 'My Frame', size = wx.Size(120, 80))

            #create the button
            id = wx.NewId()
            wx.Button(self, id, 'Press me')

            #create the TTS object and assign the callback functions
            self.tts = pyTTS.Create()
            self.tts.OnBookmark = self.OnBookmark
            self.tts.OnWord = self.OnWordSentence
            self.tts.OnSentence = self.OnWordSentence
            wx.EVT_BUTTON(self, id, self.OnSpeak)

        #when the button is pressed, speak the current time with a bookmark in it
        #(the mark name 'time' is an illustrative choice)
        def OnSpeak(self, event):
            self.tts.Speak('The current time is <bookmark mark="time"/>'+
                           time.asctime()+'. End of line.',
                           pyTTS.tts_is_xml, pyTTS.tts_async)

        #print when the bookmark is encountered
        def OnBookmark(self, event):
            print('%s %s' % (event.Kind, event.Bookmark))

        #print whenever a word or sentence boundary is encountered
        def OnWordSentence(self, event):
            print('%s %s' % (event.Kind, event.CharacterPosition))

    if __name__ == '__main__':
        app = wx.PySimpleApp(0)
        frame = myFrame()
        app.SetTopWindow(frame)
        frame.Show()
        app.MainLoop()
It is worth noting that events are fired when speaking from a WAV file or from a memory buffer previously created by pyTTS or another SAPI program.
As a final note, bookmark events could be particularly useful for building interactive speech interfaces. For instance, the speech engine could read a list of menu items prefixed with bookmarks. As the items are read aloud, the program can monitor bookmark events to determine the active menu item. When the user presses a key, the program can identify the current menu item by the last bookmark encountered and take the appropriate action.
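The menu idea can be sketched without a speech engine. In the example below, the `MenuTracker` and `FakeEvent` classes are hypothetical; the only things borrowed from the tutorial are the SAPI `<bookmark mark="..."/>` tag and the assumption that a bookmark event exposes its mark through a `Bookmark` attribute, as the callback in the events example suggests.

```python
class MenuTracker(object):
    """Tracks the active menu item from bookmark events (illustrative only)."""

    def __init__(self, items):
        self.items = list(items)
        self.active = None  # index of the item currently being read

    def markup(self):
        # prefix each item with a SAPI bookmark whose mark is the item index
        parts = ['<bookmark mark="%d"/>%s.' % (i, item)
                 for i, item in enumerate(self.items)]
        return ' '.join(parts)

    def on_bookmark(self, event):
        # assumes the event exposes the mark text as event.Bookmark
        self.active = int(event.Bookmark)

    def select(self):
        # return the item that was being read when the user pressed a key
        if self.active is None:
            return None
        return self.items[self.active]


class FakeEvent(object):
    """Stands in for a speech event so the sketch runs without SAPI."""

    def __init__(self, bookmark):
        self.Bookmark = bookmark


menu = MenuTracker(['Open', 'Save', 'Quit'])
text = menu.markup()
print(text)

# simulate the engine reaching the first two bookmarks, then a key press
menu.on_bookmark(FakeEvent('0'))
menu.on_bookmark(FakeEvent('1'))
choice = menu.select()
print(choice)
```

With pyTTS, one would presumably assign `menu.on_bookmark` to the `OnBookmark` attribute of the TTS object and speak `menu.markup()` with the `tts_is_xml` and `tts_async` flags, following the pattern of the events example above.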
Conclusion
This tutorial introduces the basics of the pyTTS library. Please refer to the full pyTTS API documentation for more information.