Ph.D. Thesis @ UNC

August 01, 2008

Clique: Perceptually Based, Task Oriented Auditory Display for GUI Applications

Screen reading is the prevalent approach for presenting graphical desktop applications in audio. The primary function of a screen reader is to describe widgets the user encounters when interacting with a graphical user interface (GUI). This straightforward method allows people with visual impairments to hear exactly what is on the screen, but with significant usability problems in a multitasking environment. Screen reader users must infer the state of on-going tasks spanning multiple graphical windows, from a single, serial stream of speech describing one widget after another.

In this dissertation, I explore a new approach to enabling auditory display of GUI programs. With this method, the display describes concurrent application tasks using a small set of simultaneous speech and sound streams. The user listens to and interacts solely with this display, never with the underlying graphical interfaces. Scripts support this level of adaptation by mapping GUI widgets to task definitions. Evaluation of this approach shows improvements in user efficiency, satisfaction, and understanding with relatively little development effort.
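As a rough illustration of what mapping GUI widgets to a task definition could look like, here is a minimal Python sketch. It is not taken from the Clique source code and does not reflect its actual script format. The `Widget` and `Task` classes, their fields, and the email example data are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    """Hypothetical stand-in for a GUI widget seen through an accessibility API."""
    role: str        # e.g. "edit" or "button"
    name: str        # accessible name of the widget
    value: str = ""  # current text content, if any


@dataclass
class Task:
    """Hypothetical task definition: task-level labels mapped to widget names."""
    name: str
    fields: dict  # task-level label -> accessible name of the backing widget

    def describe(self, widgets):
        """Build one spoken-style summary of the task from the widgets it maps."""
        by_name = {w.name: w for w in widgets}
        parts = [self.name]
        for label, widget_name in self.fields.items():
            widget = by_name.get(widget_name)
            if widget is not None:
                parts.append(f"{label}: {widget.value}")
        return ". ".join(parts)


# Assumed example data: widgets of an email composition window.
widgets = [
    Widget("edit", "To:", "advisor@example.edu"),
    Widget("edit", "Subject:", "Thesis draft"),
    Widget("button", "Send"),
]

# The task exposes only the widgets relevant to the user's goal.
write_email = Task(
    name="Write email",
    fields={"recipient": "To:", "subject": "Subject:"},
)

print(write_email.describe(widgets))
```

The point of the sketch is the indirection: the user hears a task ("Write email") and its fields, while the widget names and layout stay hidden behind the mapping.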

To develop this method, I studied the literature on existing auditory displays, working user behavior, and theories of human auditory perception and processing. I then conducted a user study to observe problems encountered and techniques employed by users interacting with an ideal auditory display: another human being. Based on my findings, I designed and implemented a prototype auditory display, called Clique, along with scripts adapting seven GUI applications. I concluded my work by conducting a variety of evaluations on Clique. The results of these studies show the following benefits of Clique over the state of the art for users with visual impairments (1-5) and mobile sighted users (6):

1. Faster, accurate access to speech utterances through concurrent speech streams.
2. Better awareness of peripheral information via concurrent speech and sound streams.
3. Increased information bandwidth through concurrent streams.
4. More efficient information seeking enabled by ubiquitous tools for browsing and searching.
5. Greater accuracy in describing unfamiliar applications learned using a consistent, task-based user interface.
6. Faster completion of email tasks in a standard GUI after exposure to those tasks in audio.

Documents

Parente, Peter. Clique: Perceptually Based, Task Oriented Auditory Display for GUI Applications. Ph.D. Thesis. University of North Carolina-Chapel Hill. July, 2008.
The proposal document introducing Clique as my dissertation topic, approved in December 2004.

Example Movie

The following video gives a sample of the Clique user experience. In the video, a user works to complete a task assigned by email using multiple programs. The speech and sounds heard are all generated by Clique, and all changes in the visual GUIs are performed by Clique as it carries out the user commands. The captions in the video explain what the user is currently doing.

An accessible alternative to the Flash player embedded below is also available. Click here to download and automatically play the movie in Quicktime.

Example Sounds

The following sounds are examples of various concepts described in the dissertation document.

Table of example sounds in OGG and MP3 formats
Description	Reference	Audio
Concatenative speech synthesis	Chapter 2, Section 2.1.1, Page 16	OGG, MP3
Formant speech synthesis	Chapter 2, Section 2.1.1, Page 17	OGG, MP3
Auditory icons	Chapter 2, Section 2.1.2, Page 18	OGG, MP3
Familial earcons	Chapter 2, Section 2.1.3, Page 20	OGG, MP3
Ambient sound	Chapter 2, Section 2.1.4, Page 22	OGG, MP3
Audio mixing	Chapter 2, Section 2.1.5, Page 23	OGG, MP3
HRTF spatialized sound	Chapter 2, Section 2.1.6, Page 24	External link
Screen reading a Web page	Chapter 2, Sections 2.5.2 and 2.5.3, Pages 60-67	OGG, MP3, Screenshot
Ideal display interaction	Chapter 3, Section 3.2.1, Page 91, List item #3	OGG, MP3
Temporal stream integration	Chapter 4, Section 4.1.2, Pages 133-134	External link
Spectral stream integration	Chapter 4, Section 4.1.2, Pages 133-134	External link
Content assistant in isolation	Chapter 5, Section 5.1.1, Pages 160-165	OGG, MP3
Summary assistant in isolation	Chapter 5, Section 5.1.1, Pages 160-165	OGG, MP3
Related assistant in isolation	Chapter 5, Section 5.1.1, Pages 160-165	OGG, MP3
Environmental sound theme in isolation	Chapter 5, Section 5.1.2, Pages 165-168	OGG, MP3
Program menu	Chapter 5, Section 5.1.3, Pages 168-169	OGG, MP3
Task menu	Chapter 5, Section 5.1.3, Pages 168-169	OGG, MP3
Assistant response to task navigation	Chapter 5, Section 5.1.4, Pages 170-171	OGG, MP3
Assistant response to content browsing	Chapter 5, Section 5.1.5, Page 173	OGG, MP3
Assistant response to content searching	Chapter 5, Section 5.1.5, Page 173	OGG, MP3
Assistant response to content editing	Chapter 5, Section 5.1.5, Page 173	OGG, MP3

Source Code

The Clique source code is BSD licensed. I provide it as a reference implementation of a task-based, multichannel auditory display in hope that developers will revise and extend its core concepts.

The source includes the following sounds licensed under various Creative Commons licenses:

Acknowledgement

This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
