JSonic: Speech and sound using HTML5
I've released a new library called JSonic for text-to-speech synthesis and sound playback in browsers supporting HTML5 <audio>. The code is on GitHub along with full documentation of the JS and REST APIs.
The client API is implemented as a Dojo dijit._Widget subclass. Other client implementations are possible as long as they provide the same JS interface. The TTS synthesis is implemented server-side using espeak and Tornado. Other server implementations are possible as long as they adhere to the REST API, and other speech engines can be plugged in rather easily.
The UNC Open Web group is looking to use JSonic to build self-voicing web games for kids with disabilities. I've already ported my Spaceship! game (also available on GitHub) to use it instead of Outfox, and hope to deploy it somewhere in the near future.
Bug reports, bug fixes, comments, questions, uses, and so on are welcome. Please use the issue tracker on the GitHub project page when reporting bugs.
HTML5 audio caching
One of my latest coding endeavors is a text-to-speech interface for JavaScript using HTML5 <audio> elements to output synthesized speech from a server. To reduce the latency between a speech request and actual speech output, I'm using various levels of caching. One of these is the regular browser disk cache based on HTTP headers.
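To illustrate what a caching level above the browser's disk cache might look like, here is a minimal sketch of an in-memory, least-recently-used cache that maps an utterance to the URL of its synthesized audio. This is not JSonic's actual code: the `UtteranceCache` class, its `key` helper, and the example property names (`rate`, `voice`) are all hypothetical and exist only for illustration.

```javascript
// Hypothetical in-memory cache: utterance key -> URL of synthesized audio.
// Illustrative only; this is not taken from the JSonic source.
class UtteranceCache {
  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  // Build a stable key from the text plus any speech properties that
  // change the audio. Property names are sorted so order does not matter.
  static key(text, props = {}) {
    const sorted = Object.keys(props).sort().map((k) => `${k}=${props[k]}`);
    return `${text}|${sorted.join('&')}`;
  }

  // Return the cached URL, or undefined on a miss.
  get(key) {
    if (!this.entries.has(key)) return undefined;
    const url = this.entries.get(key);
    // Re-insert so this entry becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, url);
    return url;
  }

  // Store a URL, evicting the least recently used entry when full.
  set(key, url) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, url);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}
```

On a hit, the client can point an <audio> element straight at the cached URL and skip the synthesis request; on a miss, it asks the server and stores the result. Whether the audio bytes themselves are then refetched is still up to the browser's HTTP cache.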
It turns out that caching behavior for <audio> data varies wildly among browsers. The following table shows what I observed in each one. I tested all of them on OS X 10.6, with the standard Mac Apache server hosting the audio files.
| Browser | <audio> Behavior |
|---|---|
| Firefox 3.6 | Respects cache headers for the sound data. Only contacts the server when the cache item expires. <audio> elements pointing to the same src reuse the cache data. |
| Chrome 5.0.322.2 | Contacts the server on every load(). When it receives a 304 response, does not refetch content.* |
| Safari 4.0.4 | Contacts the server on every load(). First requests the first two bytes of the audio file and receives a 206 response with partial content. Then requests additional bytes from the file and receives another 206 response with partial content. Performs another fetch and receives a 304 response with no data. Continues to alternate between fetches that receive 206 partial content responses and fetches that receive 304 not modified responses. Nothing appears to get cached. |
| Webkit r54921 | Same behavior as Safari 4.0.4. |
* Though not cache related, audio output in Chrome is often clipped before the end of the actual audio data. When this occurs, Chrome fires the onended event even before the audible output finishes.
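The 206 responses in the Safari row are answers to HTTP range requests: a request for the first two bytes of a file carries the header `Range: bytes=0-1`. The sketch below shows how a server might interpret a single-range header and build the matching `Content-Range` value. The function names `parseByteRange` and `contentRange` are hypothetical, multi-range headers are deliberately not handled, and none of this is the code Apache actually runs.

```javascript
// Parse a single-range header such as "bytes=0-1" against a file of
// `size` bytes. Returns inclusive {start, end} offsets, or null when the
// header is malformed or cannot be satisfied.
// Assumption: only one range per header (no "bytes=0-1,5-9").
function parseByteRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || size <= 0) return null;
  const [, first, last] = match;
  if (first === '' && last === '') return null;

  let start;
  let end;
  if (first === '') {
    // Suffix form "bytes=-N": the final N bytes of the file.
    const count = parseInt(last, 10);
    if (count === 0) return null;
    start = Math.max(size - count, 0);
    end = size - 1;
  } else {
    start = parseInt(first, 10);
    // Open-ended form "bytes=N-" runs to the end of the file.
    end = last === '' ? size - 1 : Math.min(parseInt(last, 10), size - 1);
  }
  if (start >= size || start > end) return null;
  return { start, end };
}

// Build the Content-Range header value that accompanies a 206 response.
function contentRange(range, size) {
  return `bytes ${range.start}-${range.end}/${size}`;
}
```

For a 1000-byte file, `parseByteRange('bytes=0-1', 1000)` yields offsets 0 through 1, and `contentRange` turns that into `bytes 0-1/1000`.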
Except for Firefox 3.6, all of these browsers seem to exhibit pretty terrible caching behavior when it comes to audio. I've reported bugs where I thought appropriate, but maybe I'm missing something. Am I supposed to include additional headers in the server-side response? Or maybe I'm glossing over some key part of the <audio> API? If so, please let me know. If not, yikes: <audio> support has definite room for improvement.
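For reference, here is a minimal sketch of the kind of response headers a server might send to make audio explicitly cacheable, plus the check that decides whether a conditional request can be answered with a 304. The helper names `cacheHeaders` and `isNotModified` are hypothetical, and I have not verified that sending these headers changes the Chrome or Safari behavior described above.

```javascript
// Build explicit caching headers for a response.
// `etag` should be a quoted entity tag such as '"abc123"'.
function cacheHeaders(etag, maxAgeSeconds, now = new Date()) {
  const expires = new Date(now.getTime() + maxAgeSeconds * 1000);
  return {
    'Cache-Control': `public, max-age=${maxAgeSeconds}`,
    'Expires': expires.toUTCString(), // HTTP-date format, always GMT
    'ETag': etag,
  };
}

// Decide whether a request's If-None-Match header matches the current
// entity tag, in which case the server can reply 304 with no body.
// Simplification: assumes entity tags contain no commas.
function isNotModified(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  // If-None-Match uses weak comparison, so ignore any "W/" prefix.
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return ifNoneMatch.split(',').some((tag) => strip(tag) === strip(etag));
}
```

A browser that honors `max-age` should not need to contact the server at all until the entry expires, which matches the Firefox 3.6 row; a browser that revalidates on every load() would at least get a cheap 304 from the `ETag` check.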