The pySonic library provides a Pythonic interface to the cross-platform FMOD sound library. This tutorial demonstrates how to use pySonic to play sampled waveform data from files, memory, and the Web; play synthesized or sample-based music; play audio in a 3D soundspace; play speech audio generated by pyTTS without going to disk; and record, modify, and play back audio. Issues with multithreading are also discussed.
This tutorial does not cover the full pySonic API. See the pySonic documentation for the full API spec.
Prerequisites
- pySonic 0.8 or higher
- FMOD 3.74: Extract the fmod.dll from the zip file to your windows/system32 folder.
- pyTTS 2.4 or higher (needed for the MemorySample example): Run the installer.
- Example code and sounds
World
To start using pySonic, you must first create an instance of the World class. The World object allows you to specify sound output drivers, output devices, mixing rates, number of channels, etc. to be used by pySonic. It will default to sane values if you don’t provide it with any parameters.
import pySonic, time
# create the world
w = pySonic.World()
Only one instance of the World can exist at a time. If, for instance, you want to select a different output device, you have to destroy your current World instance before creating a new one.
Sources
Once you’ve created a World object, you can create one or more Sources. Sources are responsible for the playback of audio. As you’ll see later, Sources can be positioned and moved in a 3D soundspace at any time during playback.
import pySonic, time
# create the world
w = pySonic.World()
# create two sources
src1 = pySonic.Source()
src2 = pySonic.Source()
Samples and Streams
The last object you’ll need before you can make some noise is a Sound. Sound objects hold audio data for playback by a Source. There are five classes that derive from the Sound class in pySonic, each with its own purpose. The classes are split into two types: Samples and Streams. The primary difference between these types is that a Sample can be playing on multiple Sources simultaneously while a Stream can only be playing on one Source at a time.
To keep things simple, you can start by using a FileSample object to load an entire audio file from disk into memory, decoding it if necessary. After creating this object, you can associate it with a Source and tell the source to begin playback.
import pySonic, time
# create the world
w = pySonic.World()
# create two sources
src1 = pySonic.Source()
src2 = pySonic.Source()
# load audio from disk
src1.Sound = pySonic.FileSample('short.wav')
# sleep while playing
src1.Play()
while src1.IsPlaying():
    time.sleep(0.5)
If the audio file is large, you can stream it from disk during playback instead of loading it entirely into memory beforehand. In this situation, you can use a FileStream object to stream audio from disk, buffering only a portion in memory at a time and decoding just in time if necessary.
import pySonic, time
# create the world
w = pySonic.World()
# create two sources
src1 = pySonic.Source()
src2 = pySonic.Source()
# load audio from disk
src1.Sound = pySonic.FileSample('short.wav')
src2.Sound = pySonic.FileStream('long.mp3')
# sleep while playing
src1.Play()
src2.Play()
while src1.IsPlaying() or src2.IsPlaying():
    time.sleep(0.5)
If the audio is already in memory as a raw block of PCM data, you can use a MemorySample or MemoryStream object to prepare it for playback. A noteworthy difference between these two objects is that MemorySample copies all the sample data into a separate buffer for playback while MemoryStream plays directly from the memory buffer you hand it.
import pySonic, time
import pyTTS
s = '''You are utterly powerless against me now. Ha ha ha ha ha! Et cetera.'''
# speak to memory
tts = pyTTS.pyTTS()
m = tts.SpeakToMemory(s)
format = m.Format.GetWaveFormatEx()
data = m.GetData()
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.MemorySample(data, format.Channels, format.BitsPerSample, format.SamplesPerSec)
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
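If no speech engine is at hand, a raw PCM buffer can also be built by hand. The following is a minimal sketch, not part of pySonic: make_sine_pcm is a hypothetical helper, and it assumes that MemorySample accepts 16-bit signed little-endian mono data, which this tutorial does not confirm.

```python
import math
import struct

def make_sine_pcm(freq_hz, duration_s, rate=44100, amplitude=0.5):
    """Return mono 16-bit signed little-endian PCM for a sine tone (hypothetical helper)."""
    n_samples = int(duration_s * rate)
    peak = int(amplitude * 32767)
    frames = [
        int(peak * math.sin(2.0 * math.pi * freq_hz * i / rate))
        for i in range(n_samples)
    ]
    # '<h' packs each value as a little-endian signed 16-bit integer
    return struct.pack('<%dh' % n_samples, *frames)

data = make_sine_pcm(440, 0.25)
# Assumed usage, mirroring the argument order of the example above:
# src.Sound = pySonic.MemorySample(data, 1, 16, 44100)
```

The channel count, bit depth, and rate passed to MemorySample would have to match the values used to generate the buffer.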
You can even stream audio directly from the Web using the WebStream class. A remote audio file must be given in the form of a URL starting with the http:// protocol prefix and ending with either a .mp3 or .ogg suffix. (This is a limitation of FMOD.)
import pySonic, time
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.WebStream('http://www.cs.unc.edu/~parente/tiny.mp3')
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
Choosing the right Sound class for your application and audio data can be a little tricky. The following table captures the critical aspects of the five classes with respect to memory behavior, processor usage, and playback features. Use it as a reference when you must decide how to load and play sampled audio.
| Class | Memory | Processor | Playback |
|---|---|---|---|
| FileSample | Loads all from disk before playback | Decodes all before playback | Supports simultaneous playback, raw data access |
| MemorySample | Copies all to new buffer before playback | Decodes all before playback | Supports simultaneous playback, raw data access |
| FileStream | Loads segments during playback | Decodes segments during playback | Supports callbacks, sync, seek to time |
| MemoryStream | Loads segments during playback | Decodes segments during playback | Supports callbacks, sync, seek to time |
| WebStream | Loads segments during playback | Decodes segments during playback | Supports callbacks, sync, seek to time, remote data |
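The table can be condensed into a small decision helper. This is only a sketch of one reasonable reading of the table: choose_sound_class and its parameters are hypothetical, and it returns a class name as a string rather than touching pySonic.

```python
def choose_sound_class(origin, large=False, simultaneous=False, callbacks=False):
    """Suggest a pySonic Sound class name from the table above (hypothetical helper).

    origin is 'file', 'memory', or 'web'.
    """
    if simultaneous and callbacks:
        raise ValueError('Samples play simultaneously; only Streams support callbacks')
    if origin == 'web':
        if simultaneous:
            raise ValueError('Web audio is only available as a Stream')
        return 'WebStream'
    if origin not in ('file', 'memory'):
        raise ValueError('unknown origin: %r' % (origin,))
    prefix = 'File' if origin == 'file' else 'Memory'
    # Streams load and decode in segments; Samples are prepared in full up front
    wants_stream = (large or callbacks) and not simultaneous
    return prefix + ('Stream' if wants_stream else 'Sample')

choice = choose_sound_class('file', large=True)
```

When simultaneous playback is requested, the sketch prefers a Sample even for large audio, since a Stream can only play on one Source at a time.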
Spatial Sound
Source objects can be positioned in a three-dimensional audio space around the listener. One-channel Samples and Streams played by a Source that is not positioned at the origin (0,0,0) will appear to come from the Source’s position in space around the listener.
Source position can be read or set using a Source property named Position. Likewise, the instantaneous velocity of a Source can be read or set using the Velocity property of a source. The values of both properties are given as 3-tuples (x,y,z).
Take note that setting the velocity to a non-zero tuple does not mean that the Source will change position for you. You have to move the Source by hand in a timed loop to actually change its position in space. The velocity values you set are only used for computing Doppler effects.
import pySonic, time
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.FileSample('short.wav')
src.Position = (0.1, 0, 0.25)
src.Velocity = (0, 0, 0)
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
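Because setting Velocity does not move a Source, the position has to be advanced by hand. The sketch below shows one way to do that in a timed loop. FakeSource is a hypothetical stand-in for pySonic.Source so the code runs without a sound card, advance is a hypothetical helper, and the sketch assumes velocity is expressed in position units per second.

```python
import time

class FakeSource(object):
    """Stand-in for pySonic.Source with only Position and Velocity (hypothetical)."""
    def __init__(self):
        self.Position = (0.0, 0.0, 0.0)
        self.Velocity = (0.0, 0.0, 0.0)

def advance(source, dt):
    """Move the source by Velocity * dt, as the timed loop must do by hand."""
    x, y, z = source.Position
    vx, vy, vz = source.Velocity
    source.Position = (x + vx * dt, y + vy * dt, z + vz * dt)

src = FakeSource()
src.Position = (-1.0, 0.0, 0.25)
src.Velocity = (2.0, 0.0, 0.0)   # assumed units per second, along +x

step = 0.01
for _ in range(10):
    time.sleep(step)
    advance(src, step)
```

With a real Source, the same loop would run while src.IsPlaying() is true, keeping Position consistent with the Velocity used for Doppler computation.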
The properties of the listener can be set using the Listener object. A reference to the Listener can be obtained through a property of the World object with the same name. The Listener has properties that define his position, velocity, orientation, and up direction. By default, the listener is at location (0,0,0), has velocity (0,0,0), is looking at (0,0,1) or down the positive z-axis, and has his head pointing up at (0,1,0) or up the y-axis.
import pySonic, time
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.FileSample('short.wav')
src.Position = (0.1, 0, 0.25)
src.Velocity = (0, 0, 0)
# put listener to the right of the source
w.Listener.Position = (0.5, 0, 0)
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
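To reason about where a Source sits relative to the Listener, the listener's right-hand direction can be derived from its up and forward vectors. The following sketch uses plain tuples and hypothetical helper names; it assumes that +x is the listener's right under the default orientation, which matches the comment in the example above.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def side_of_listener(source_pos, listener_pos, forward=(0, 0, 1), up=(0, 1, 0)):
    """Return 'left', 'right', or 'center' for a source relative to the listener."""
    # With the default axes, up x forward = (1, 0, 0), so +x is the listener's right
    right = cross(up, forward)
    offset = tuple(s - l for s, l in zip(source_pos, listener_pos))
    d = dot(offset, right)
    if d > 0:
        return 'right'
    if d < 0:
        return 'left'
    return 'center'

side = side_of_listener((0.1, 0, 0.25), (0.5, 0, 0))
```

For the positions used above, the source ends up on the listener's left, which is the same as saying the listener is to the right of the source.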
By default, all spatial sound effects (spatialization, Doppler effects) are computed in software. If you have a soundcard that supports 3D mixing in hardware, you can use it instead by specifying an optional flag on the loaded stream or sample: FSOUND_HW3D.
import pySonic, time
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.FileSample('short.wav', pySonic.Constants.FSOUND_HW3D)
src.Position = (0.1, 0, 0.25)
src.Velocity = (0, 0, 0)
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
If you have a soundcard that supports the Creative EAX extensions, you can also get reverberation effects for sounds mixed in hardware. Reverberation can be specified for all sounds in the World or per Source by accessing the Reverb property of either type of object. Since setting the reverberation parameters by hand can be difficult, some presets are provided for World reverberation: sewerpipe, mountains, alley, carpettedhallway, hangar, concerthall, city, bathroom, livingroom, generic, cave, forest, hallway, quarry, underwater, parkinglot, stonecorridor, paddedcell, arena, off, room, stoneroom, plain, auditorium.
import pySonic, time
# create world and source; prepare audio
w = pySonic.World()
w.Reverb.SetPreset('concerthall')
src = pySonic.Source()
src.Sound = pySonic.FileSample('short.wav', pySonic.Constants.FSOUND_HW3D)
src.Position = (0.1, 0, 0.25)
src.Velocity = (0, 0, 0)
# play until done
src.Play()
while src.IsPlaying():
    time.sleep(0.5)
Stream Callbacks
Sources playing Stream objects trigger callbacks when the end of a stream is reached, when a synchronization point in the stream is reached, and when metadata is received for a Web stream. A Source will remember the callback functions you set across stream assignments.
import pySonic, time
# define a callback function
flag = False
def stream_done(source):
    global flag
    flag = True
    print 'stream ended'
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.FileStream('long.mp3')
src.SetEndStreamCallback(stream_done)
# play until callback
src.Play()
while not flag:
    time.sleep(0.5)
Synchronization points can be added to a stream in a sound editor or programmatically using pySonic. All Stream objects have the methods AddSyncPoint and RemoveSyncPoint for adding and removing sync points.
import pySonic, time
# define a callback function
flag = False
def stream_done(source):
    global flag
    flag = True
    print 'stream ended'
# define a sync point callback
def stream_sync(source, name):
    print 'sync point:', name
    print src.CurrentTime
# create world and source; prepare audio
w = pySonic.World()
src = pySonic.Source()
src.Sound = pySonic.FileStream('long.mp3')
src.SetEndStreamCallback(stream_done)
src.SetSyncStreamCallback(stream_sync)
# add a sync point halfway through the stream
src.Sound.AddSyncPoint(src.Sound.NumSamples/2, 'halfway')
# play until callback
src.Play()
while not flag:
    time.sleep(0.5)
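The example above places the sync point at NumSamples/2, which suggests that AddSyncPoint takes an offset measured in samples. Under that assumption, a sync point at a given time can be computed as sketched below. seconds_to_sample_offset is a hypothetical helper, and the sample rate must be known from elsewhere, since this tutorial does not show how to query it from a Stream.

```python
def seconds_to_sample_offset(seconds, rate, total_samples=None):
    """Convert a time in seconds to a sample offset (hypothetical helper)."""
    if seconds < 0:
        raise ValueError('seconds must be non-negative')
    offset = int(round(seconds * rate))
    if total_samples is not None:
        # clamp so the sync point never lands past the end of the stream
        offset = min(offset, total_samples)
    return offset

offset = seconds_to_sample_offset(2.5, 44100)
# Assumed usage: src.Sound.AddSyncPoint(offset, 'two and a half seconds')
```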
Songs
The Song class plays music files stored on disk. The music may either be synthesized (i.e. MIDI) or sample-based (e.g. MOD, S3M, XM, IT, etc.). Songs do not support spatial sound effects, so you do not associate them with a Source for playback. Instead, you simply tell the Song itself to play, stop, or pause.
import pySonic, time
# play a MIDI for five seconds
w = pySonic.World()
song = pySonic.Song('flourish.mid')
song.Play()
time.sleep(5)
# play a S3M for five seconds
song = pySonic.Song('goblin.s3m')
song.Play()
time.sleep(5)
Recording
A Recorder object can be used to acquire audio from any device that supports audio input. Audio can be recorded until an EmptySample of the size you specify is full.
import pySonic, time
# create a Recorder
w = pySonic.World()
rec = pySonic.Recorder()
# set characteristics for the recording
rate = 44100
bits = 16
channels = 1
buffer_duration = 5
# allocate an EmptySample to hold buffer_duration seconds of the incoming audio
sample = pySonic.EmptySample(buffer_duration*rate, channels, bits, rate)
# record for buffer_duration seconds and do not loop
print 'Recording...'
rec.Start(sample, loopit=False)
while rec.CurrentSample < buffer_duration*rate:
    time.sleep(0.1)
# stop recording
rec.Stop()
# play the sample
print 'Playing...'
src = pySonic.Source()
src.Sound = sample
src.Play()
while src.IsPlaying():
    time.sleep(0.1)
Alternatively, the Recorder can treat the EmptySample as a ring buffer and record continuously. In either case, once the EmptySample has been filled with some data, you can access and modify the bytes of the audio recording.
import pySonic, time
# create a Recorder
w = pySonic.World()
rec = pySonic.Recorder()
# set characteristics for the recording
rate = 44100
bits = 16
channels = 1
total_duration = 5
buffer_duration = 1
# allocate an EmptySample to hold buffer_duration seconds of the incoming audio
# notice buffer_duration is less than total_duration
sample = pySonic.EmptySample(buffer_duration*rate, channels, bits, rate)
# create a list that will hold all the sample data
data = []
# fill the buffer repeatedly for total_duration seconds
print 'Recording...'
now_time = start_time = time.time()
last_pos = 0
rec.Start(sample, loopit=True)
while (now_time - start_time) < total_duration:
    time.sleep(0.1)
    pos = rec.CurrentSample
    n = pos - last_pos
    # watch for looping
    if n < 0:
        n += buffer_duration * rate
    # copy all new data to a separate buffer from last_pos to last_pos+n
    data.append(sample.GetBytes(last_pos, n))
    # recompute the last position and the current time
    last_pos = pos
    now_time = time.time()
# stop recording
rec.Stop()
print 'Playing...'
# join the sample segments into one buffer and play it
bytes = ''.join(data)
src = pySonic.Source()
src.Sound = pySonic.MemorySample(bytes, channels, bits, rate)
src.Play()
while src.IsPlaying():
    time.sleep(0.1)
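The wrap-around arithmetic in the ring-buffer loop above can be isolated and checked without any audio hardware. This tutorial does not say whether GetBytes itself handles a read that runs past the end of the buffer; in case it does not, the sketch below splits such a read into two contiguous segments. ring_segments is a hypothetical helper, not part of pySonic.

```python
def ring_segments(last_pos, pos, size):
    """Return (start, count) pairs covering samples recorded between two positions."""
    n = pos - last_pos
    if n < 0:
        # the recorder wrapped past the end of the buffer
        n += size
    first = min(n, size - last_pos)
    segments = []
    if first > 0:
        segments.append((last_pos, first))
    if n - first > 0:
        # remainder starts again at the beginning of the buffer
        segments.append((0, n - first))
    return segments

no_wrap = ring_segments(100, 300, 1000)
wrapped = ring_segments(900, 50, 1000)
```

Each pair could then be passed to sample.GetBytes(start, count) in turn. Note that this arithmetic, like the loop above, cannot tell how many times the recorder has wrapped between polls, so the polling interval must stay shorter than the buffer duration.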
If the driver used for recording supports input from more than one device, audio is acquired from the source currently set using the configuration utility for the driver. For example, audio will be recorded from the microphone if the microphone is selected as the input device for the default soundcard in the Windows sound mixer. FMOD offers no way to select a specific input device (e.g. line in, microphone), only a specific driver (e.g. soundcard); consequently, neither does pySonic.
Multithreading
A quick perusal of the FMOD forums turns up a number of posts stating “FMOD is not 100% thread safe.” To allow FMOD to be used in multithreaded applications, pySonic holds the Python Global Interpreter Lock (GIL) while making calls to FMOD functions. This behavior effectively serializes calls to FMOD and prevents multiple Python threads from making calls into FMOD simultaneously.
While holding the GIL avoids most thread safety problems, it does have one undesirable side effect: the pySonic Stream and Song callbacks can cause deadlock. Consider the following scenario:
- A Python thread, T1, registers a Stream callback.
- T1 plays the Stream.
- T1 sleeps, waiting for the callback.
- Another Python thread, T2, creates a new pySonic Stream.
- pySonic holds the GIL in the context of T2.
- pySonic makes a call in the context of T2 to an underlying FMOD function.
- While T2 is initializing the stream, FMOD makes a callback to T1.
- pySonic tries to grab the GIL in the context of T1. (The GIL must be held to make calls from C code into Python.)
- T1 waits indefinitely to acquire the GIL (which T2 is holding).
- Meanwhile, because of the design of FMOD, T2 stalls. The FMOD function it called cannot complete until FMOD returns from the callback to T1.
- Deadlock! T1 cannot proceed because it cannot acquire the GIL. T2 cannot proceed because the callback to T1 will never complete.
To avoid deadlock in practice, callback functions should not be registered on Streams and Songs in Python applications where more than one thread makes calls into pySonic. Instead, a thread can poll the state of a Stream or Song to detect most of the events that a callback would report.
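A polling strategy of this kind might look like the following sketch. FakeSource is a hypothetical stand-in for a pySonic Source so that the code runs without FMOD, and wait_for_end is a hypothetical helper. The only pySonic call it relies on is IsPlaying, which appears throughout this tutorial.

```python
import threading
import time

class FakeSource(object):
    """Stand-in for pySonic.Source that 'plays' for a fixed number of polls (hypothetical)."""
    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def IsPlaying(self):
        self._remaining -= 1
        return self._remaining > 0

def wait_for_end(source, done, interval=0.01):
    """Poll the source and set the event once playback has stopped."""
    while source.IsPlaying():
        time.sleep(interval)
    done.set()

done = threading.Event()
poller = threading.Thread(target=wait_for_end, args=(FakeSource(5), done))
poller.start()
poller.join(2.0)
finished = done.is_set()
```

Other threads can wait on or check the event instead of registering an end-of-stream callback, so FMOD never has to call back into Python.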
Conclusion
This tutorial introduces the basics of the pySonic library. Please refer to the full pySonic API documentation for more information.