Parente's Mindtrove

Ph.D. Thesis @ UNC

August 01, 2008

Clique: Perceptually Based, Task Oriented Auditory Display for GUI Applications

Screen reading is the prevalent approach for presenting graphical desktop applications in audio. The primary function of a screen reader is to describe widgets the user encounters when interacting with a graphical user interface (GUI). This straightforward method allows people with visual impairments to hear exactly what is on the screen, but it creates significant usability problems in a multitasking environment. Screen reader users must infer the state of ongoing tasks spanning multiple graphical windows from a single, serial stream of speech describing one widget after another.

In this dissertation, I explore a new approach to enabling auditory display of GUI programs. With this method, the display describes concurrent application tasks using a small set of simultaneous speech and sound streams. The user listens to and interacts solely with this display, never with the underlying graphical interfaces. Scripts support this level of adaptation by mapping GUI widgets to task definitions. Evaluation of this approach shows improvements in user efficiency, satisfaction, and understanding with relatively little development effort.
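To make the idea of a script that maps GUI widgets to task definitions more concrete, here is a minimal sketch in Python. It is not taken from the Clique source: the `Widget` and `Task` classes, the widget paths, the `email_script` function, and the stream names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    """Stand-in for a GUI widget, identified by a path in a window hierarchy."""
    path: str
    text: str = ""


@dataclass
class Task:
    """Hypothetical task definition: a name plus the widgets that hold its content."""
    name: str
    widget_paths: list[str]
    stream: str = "content"  # assumed name of the audio stream that reports this task

    def describe(self, widgets: dict[str, Widget]) -> str:
        # Collect the text of every mapped widget that currently exists.
        parts = [widgets[p].text for p in self.widget_paths if p in widgets]
        return f"{self.name}: " + ", ".join(parts)


def email_script() -> list[Task]:
    """Hypothetical adaptation script for an email program."""
    return [
        Task("Read message", ["/main/subject", "/main/body"]),
        Task("Check new mail count", ["/main/status"], stream="summary"),
    ]


def render(tasks: list[Task], widgets: dict[str, Widget]) -> dict[str, list[str]]:
    """Group task descriptions by the stream that would present them."""
    streams: dict[str, list[str]] = {}
    for task in tasks:
        streams.setdefault(task.stream, []).append(task.describe(widgets))
    return streams


if __name__ == "__main__":
    gui = {
        "/main/subject": Widget("/main/subject", "Lunch?"),
        "/main/body": Widget("/main/body", "Noon at the cafe"),
        "/main/status": Widget("/main/status", "3 unread"),
    }
    for stream, lines in render(email_script(), gui).items():
        print(stream, lines)
```

In this sketch the user-facing layer only ever sees tasks and streams, while the script alone knows which widgets sit behind each task. That mirrors the separation described above, under the stated assumptions.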

To develop this method, I studied the literature on existing auditory displays, working user behavior, and theories of human auditory perception and processing. I then conducted a user study to observe problems encountered and techniques employed by users interacting with an ideal auditory display: another human being. Based on my findings, I designed and implemented a prototype auditory display, called Clique, along with scripts adapting seven GUI applications. I concluded my work by conducting a variety of evaluations on Clique. The results of these studies show the following benefits of Clique over the state of the art for users with visual impairments (1-5) and mobile sighted users (6):

  1. Faster, accurate access to speech utterances through concurrent speech streams.
  2. Better awareness of peripheral information via concurrent speech and sound streams.
  3. Increased information bandwidth through concurrent streams.
  4. More efficient information seeking enabled by ubiquitous tools for browsing and searching.
  5. Greater accuracy in describing unfamiliar applications learned using a consistent, task-based user interface.
  6. Faster completion of email tasks in a standard GUI after exposure to those tasks in audio.

Documents

Example Movie

The following video gives a sample of the Clique user experience. In the video, a user works to complete a task assigned by email using multiple programs. The speech and sounds heard are all generated by Clique, and all changes in the visual GUIs are performed by Clique as it carries out the user commands. The captions in the video explain what the user is currently doing.

An accessible alternative to the Flash player embedded below is also available. Click here to download and automatically play the movie in QuickTime.

Example Sounds

The following sounds are examples of various concepts described in the dissertation document.

Table of example sounds in OGG and MP3 formats

| Description | Reference | Audio |
|---|---|---|
| Concatenative speech synthesis | Chapter 2, Section 2.1.1, Page 16 | OGG, MP3 |
| Formant speech synthesis | Chapter 2, Section 2.1.1, Page 17 | OGG, MP3 |
| Auditory icons | Chapter 2, Section 2.1.2, Page 18 | OGG, MP3 |
| Familial earcons | Chapter 2, Section 2.1.3, Page 20 | OGG, MP3 |
| Ambient sound | Chapter 2, Section 2.1.4, Page 22 | OGG, MP3 |
| Audio mixing | Chapter 2, Section 2.1.5, Page 23 | OGG, MP3 |
| HRTF spatialized sound | Chapter 2, Section 2.1.6, Page 24 | External link |
| Screen reading a Web page | Chapter 2, Sections 2.5.2 and 2.5.3, Pages 60-67 | OGG, MP3, Screenshot |
| Ideal display interaction | Chapter 3, Section 3.2.1, Page 91, List item #3 | OGG, MP3 |
| Temporal stream integration | Chapter 4, Section 4.1.2, Pages 133-134 | External link |
| Spectral stream integration | Chapter 4, Section 4.1.2, Pages 133-134 | External link |
| Content assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Summary assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Related assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Environmental sound theme in isolation | Chapter 5, Section 5.1.2, Pages 165-168 | OGG, MP3 |
| Program menu | Chapter 5, Section 5.1.3, Pages 168-169 | OGG, MP3 |
| Task menu | Chapter 5, Section 5.1.3, Pages 168-169 | OGG, MP3 |
| Assistant response to task navigation | Chapter 5, Section 5.1.4, Pages 170-171 | OGG, MP3 |
| Assistant response to content browsing | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |
| Assistant response to content searching | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |
| Assistant response to content editing | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |

Source Code

The Clique source code is BSD licensed. I provide it as a reference implementation of a task-based, multichannel auditory display in the hope that developers will revise and extend its core concepts.

The source includes the following sounds licensed under various Creative Commons licenses:

Acknowledgement

This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
