mindtrove Collecting ideas since 1980

19 Feb 2010 (2 comments)

HTML5 audio caching

One of my latest coding endeavors is a text-to-speech interface for JavaScript that plays speech synthesized on a server through HTML5 <audio> elements. To reduce the latency between a speech request and audible output, I'm using several levels of caching. One of these is the regular browser disk cache, governed by HTTP headers.
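Because HTTP caches key stored responses on the request URL, the disk cache only helps if the same utterance always maps to the same URL. Below is a minimal sketch of two cache levels under that constraint, written in Python for brevity. It assumes a hypothetical `/speech` endpoint taking `text`, `voice`, and `rate` query parameters. The endpoint, the parameter names, and the `speech_url` and `SpeechCache` helpers are illustrative and do not come from the actual interface. A small in-memory LRU answers repeat requests first. On a miss, it calls a `loader`, which in a browser would be the HTTP fetch that the disk cache may satisfy locally.

```python
from collections import OrderedDict
from typing import Callable
from urllib.parse import urlencode

# Hypothetical speech endpoint; the post does not name the real server URL.
SPEECH_ENDPOINT = "http://localhost/speech"


def speech_url(text: str, voice: str = "default", rate: int = 200) -> str:
    """Map one speech request to exactly one URL.

    HTTP caches key stored responses on the request URL, so identical
    utterances must yield byte-identical URLs: collapse whitespace and emit
    the query parameters in a fixed (sorted) order.
    """
    normalized = " ".join(text.split())
    params = sorted([("text", normalized), ("voice", voice), ("rate", str(rate))])
    return f"{SPEECH_ENDPOINT}?{urlencode(params)}"


class SpeechCache:
    """First level: a small in-memory LRU of synthesized audio.

    A miss falls through to `loader`, which in a browser would be the HTTP
    fetch that the disk cache (the second level) may answer locally.
    """

    def __init__(self, loader: Callable[[str], bytes], capacity: int = 32) -> None:
        self._loader = loader
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, text: str, voice: str = "default", rate: int = 200) -> bytes:
        url = speech_url(text, voice, rate)
        if url in self._items:
            self._items.move_to_end(url)  # mark as most recently used
            return self._items[url]
        data = self._loader(url)  # miss: go to the HTTP layer
        self._items[url] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return data
```

With this design, `"hello   world"` and `" hello world "` produce the same URL, and therefore the same cache entry, at both levels.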

It turns out that browser caching behavior for <audio> data varies wildly. The following table shows the HTML5 <audio> caching behavior of several browsers. I tested all of them on Mac OS X 10.6, with the Apache server that ships with OS X hosting all of the test audio files.

Browser | <audio> caching behavior
------- | ------------------------
Firefox 3.6 | Respects the cache headers for the sound data and contacts the server only when the cached item expires. <audio> elements pointing to the same src reuse the cached data.
Chrome 5.0.322.2 | Contacts the server on every load(). When it receives a 304 response, it does not refetch the content.*
Safari 4.0.4 | Contacts the server on every load() to fetch the first two bytes of the audio file and receives a 206 response with partial content. Fetches additional bytes from the file and receives another 206 response with partial content. Performs another fetch and receives a 304 response with no data. Continues to alternate between fetches that receive 206 partial-content responses and 304 Not Modified responses. Nothing appears to get cached.
WebKit r54921 | Same behavior as Safari 4.0.4.

* Though not cache-related, audio output in Chrome is often clipped before the end of the actual audio data. When this happens, Chrome fires the onended event even before the audible output finishes.
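For reference when reading the table, a 206 Partial Content response answers a `Range` request. A 304 Not Modified response answers a conditional request whose cached copy is still current. The Python sketch below is a simplified model of how a static file server might choose among 200, 206, 304, and 416 for a single GET. It loosely follows the HTTP/1.1 rule that preconditions are evaluated before `Range`. It is not Apache's implementation. The `respond` function, the one-day `max-age`, the case-sensitive header lookup, the exact-match `If-None-Match` comparison, and the single-range support are all simplifying assumptions.

```python
def respond(request_headers: dict, body: bytes, etag: str, max_age: int = 86400):
    """Choose (status, headers, payload) for a GET of a static audio file.

    Simplifications (assumptions, not Apache's behavior): header names are
    matched case-sensitively, If-None-Match must equal the ETag exactly, and
    only a single byte range is honored.
    """
    headers = {
        "ETag": etag,
        "Accept-Ranges": "bytes",
        "Cache-Control": f"max-age={max_age}",  # freshness lifetime in seconds
    }
    # 1. Conditional request: the client's cached copy is current, send no body.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    # 2. Range request, e.g. "bytes=0-1" for the first two bytes.
    value = request_headers.get("Range", "")
    if value.startswith("bytes=") and "," not in value:
        first, _, last = value[len("bytes="):].partition("-")
        size = len(body)
        try:
            if first == "":  # suffix form "bytes=-N": the final N bytes
                start, end = max(size - int(last), 0), size - 1
            else:
                start = int(first)
                if last and int(last) < start:
                    return 200, headers, body  # invalid spec: ignore Range
                end = min(int(last), size - 1) if last else size - 1
        except ValueError:
            return 200, headers, body  # malformed Range: ignore it
        if start >= size:
            return 416, {**headers, "Content-Range": f"bytes */{size}"}, b""
        headers["Content-Range"] = f"bytes {start}-{end}/{size}"
        return 206, headers, body[start:end + 1]
    # 3. Plain request: the whole file.
    return 200, headers, body
```

A browser that honors `Cache-Control: max-age` can reuse its copy without contacting the server until it expires, as Firefox 3.6 did above. The other browsers in the table revalidated on every load() regardless.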

Except for Firefox 3.6, all of these browsers seem to exhibit pretty terrible caching behavior when it comes to audio. I've reported bugs where I thought appropriate, but maybe I'm missing something. Am I supposed to include additional headers in the server-side response? Or maybe I'm glossing over some key part of the <audio> API? If so, please let me know. If not, yikes: <audio> support has definite room for improvement.

11 Mar 2009 (2 comments)

iPod Shuffle with TTS

It's about time! Now when can I expect it on the other iPods? Just because they have screens doesn't mean I'm always looking at them.

Article at Gizmodo

29 May 2008 (1 comment)

Rich Audio MUDs

Gary has mentioned that sound adventure games like Descent into Madness and The Last Crusade have served as effective rewards in some local schools. Kids with visual impairments work hard in order to earn time playing them.

I've been brainstorming a bit about open-ended multi-user dungeons (MUDs) with rich sound and speech. Gary thinks it would be beneficial to use the MUD for educational purposes, not just as a reward after the work is done. I tend to agree, as long as it's easy for teachers, older students, parents, and so on to translate lessons into in-game puzzles and adventures.

Here are some ideas I think could go into such a system to make it fun and rewarding for the kids, and an interesting platform for games.

  • Rooms and items. A simple model of the dungeon, with arbitrarily connected rooms, items in rooms, and users in rooms, should account for a large number of adventure game designs.
  • A basic command set. Take, drop, give, use, go, and a few other very simple commands can be supported everywhere to let the user navigate and interact with the environment.
  • An extensible parser. Items can define additional supported commands, even to the extent where they become...
  • In-world games. To go beyond exploring rooms, picking up items, and using items, items and rooms themselves can become full games. Imagine a puzzle game packaged as an item, or a room where everyone participates in a guessing game.
  • Other input methods. Items can reconfigure the keyboard to support simpler methods of interaction. For example, a multiple-choice game might require only the arrow keys instead of full typed sentences. A game might enable other devices too (e.g., a DDR dance pad).
  • A rich audio client. Most MUD clients are text-only. When used with a screen reader, the game experience is entirely spoken. With a custom client, the game logic can provide responses the client renders as speech and sound in any number of streams.
  • A client/server configuration over XMPP. The dungeon lives on a server, though not necessarily on the same machine as the XMPP server. It is just another XMPP client with a well-known JID: a bot of sorts. The rooms and items can be objects managed by that bot, or even become other bots themselves. The dungeon can exist across multiple machines.
  • Collaboration. Rooms in the dungeon are like chat rooms where some text entered in the client is broadcast to everyone, and other commands are addressed to items in rooms. Going beyond simple text, clients could implement XMPP Jingle to support voice chat.
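The first few ideas in the list fit in a small in-process data model: arbitrarily connected rooms, items, a basic command set, and items that contribute their own commands. The Python sketch below shows one possible shape for that model. It is not a design from an existing system. The `Item`, `Room`, and `World` classes, the `handle` method, and the bell example are hypothetical. The sketch leaves out XMPP, audio output, and broadcasting between players.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A custom verb handler: (world, user, argument) -> reply text.
Handler = Callable[["World", str, str], str]


@dataclass
class Item:
    name: str
    verbs: Dict[str, Handler] = field(default_factory=dict)  # extra commands


@dataclass
class Room:
    name: str
    exits: Dict[str, str] = field(default_factory=dict)  # direction -> room name
    items: List[Item] = field(default_factory=list)


class World:
    def __init__(self, rooms: List[Room]) -> None:
        self.rooms = {room.name: room for room in rooms}
        self.location: Dict[str, str] = {}  # user -> room name
        self.inventory: Dict[str, List[Item]] = {}  # user -> carried items

    def add_user(self, user: str, room: str) -> None:
        self.location[user] = room
        self.inventory[user] = []

    def _find(self, items: List[Item], name: str) -> Optional[Item]:
        return next((item for item in items if item.name == name), None)

    def handle(self, user: str, line: str) -> str:
        """Parse one command line such as 'go north' or 'take lamp'."""
        verb, _, arg = line.strip().partition(" ")
        arg = arg.strip()
        room = self.rooms[self.location[user]]
        carried = self.inventory[user]
        # Basic command set, available everywhere.
        if verb == "go":
            if arg not in room.exits:
                return "You can't go that way."
            self.location[user] = room.exits[arg]
            return f"You are in the {room.exits[arg]}."
        if verb == "take":
            item = self._find(room.items, arg)
            if item is None:
                return f"There is no {arg} here."
            room.items.remove(item)
            carried.append(item)
            return f"You take the {arg}."
        if verb == "drop":
            item = self._find(carried, arg)
            if item is None:
                return f"You are not carrying a {arg}."
            carried.remove(item)
            room.items.append(item)
            return f"You drop the {arg}."
        # Extensible parser: any other verb is offered to the named item,
        # whether it lies in the room or in the user's inventory.
        item = self._find(room.items + carried, arg)
        if item is not None and verb in item.verbs:
            return item.verbs[verb](self, user, arg)
        return "I don't understand."
```

For example, an `Item("bell", {"ring": ...})` answers `ring bell` in the room where it lies and keeps answering after a player takes it elsewhere. An in-world game could hook in the same way, through verbs its item defines.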

Is anyone working on a similar project? Heard of a similar project?