HTML5 & JavaScript

For more than 20 years, audio descriptions (also known as video descriptions) have been delivered using human narration. Traditionally, a describer will first write a script that describes key visual elements, such as costumes, scenery, scene changes, on-screen text, etc., that would otherwise not be available to viewers unable to see the screen. These descriptions are normally carefully timed to fit into the natural pauses of the dialog or narration. This script is recorded by a human narrator, and the description audio track is then mixed with the regular program-audio soundtrack before the program or movie is broadcast. In television programming, descriptions are usually delivered to the viewer via a separate audio channel and can be turned on and off; in theatrical presentations, such as first-run movies, descriptions can be delivered via wireless transmitter to patrons wearing special headsets; in an online environment, descriptions are often delivered as part of the regular program-audio soundtrack and cannot be turned off (these open-described movies are often offered as alternatives to the undescribed versions).

IBM-Research Tokyo recently partnered with NCAM to research ways to deliver online audio descriptions via text-to-speech (TTS) methods, rather than using human recordings. IBM and NCAM explored two approaches that exploit new HTML5 media elements, as well as JavaScript and TTML:

1. Writing and time-stamping a description script, then delivering the descriptions as hidden text in real time in such a way that a user's screen reader will read them aloud. The descriptions remain otherwise invisible and inaudible to non-screen-reader users.
2. Writing and time-stamping descriptions, then recording them using TTS technology. At the time of playback, each description is individually retrieved and played aloud at intervals corresponding to the time-stamped script.
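
Both approaches depend on the same timing step: as the video plays, work out which time-stamped description has just become current and deliver it once. The JavaScript below is a minimal sketch of that step, not the code IBM and NCAM used. The `descriptions` array, its `clip` file names, the helper names `latestCueIndex` and `createDescriber`, and the `description-region` element id are all hypothetical. It assumes the page contains a `<video>` element and a visually hidden element marked with `aria-live="polite"`, which is one common way to have a screen reader speak text that sighted users do not see.

```javascript
// Hypothetical time-stamped description script (start times in seconds).
// The `clip` names are placeholders for pre-recorded TTS audio files.
const descriptions = [
  { start: 4.0, text: "A woman in a red coat enters the station.", clip: "desc-001.mp3" },
  { start: 12.5, text: "Title card: Three years later.", clip: "desc-002.mp3" },
];

// Returns the index of the latest description whose start time has passed,
// or -1 if none has started yet. Assumes `cues` is sorted by `start`.
function latestCueIndex(cues, currentTime) {
  let index = -1;
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].start <= currentTime) {
      index = i;
    } else {
      break;
    }
  }
  return index;
}

// Builds a function to call on every playback-time update. It invokes
// `deliver(cue)` once each time playback moves onto a different description.
function createDescriber(cues, deliver) {
  let lastIndex = -1;
  return function onTime(currentTime) {
    const index = latestCueIndex(cues, currentTime);
    if (index !== lastIndex) {
      lastIndex = index;
      if (index >= 0) {
        deliver(cues[index]);
      }
    }
  };
}

// Browser wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const video = document.querySelector("video");
  // Assumed markup: a visually hidden element with aria-live="polite".
  const region = document.getElementById("description-region");
  if (video && region) {
    const announce = createDescriber(descriptions, (cue) => {
      // Approach 1: update the hidden text so a screen reader speaks it.
      region.textContent = cue.text;
      // Approach 2 (alternative): play the pre-recorded TTS clip instead:
      // new Audio(cue.clip).play();
    });
    video.addEventListener("timeupdate", () => announce(video.currentTime));
  }
}
```

In this sketch only the `deliver` callback differs between the two approaches, so the same timing logic serves both. A description is delivered again if the viewer seeks backward onto it. Because browsers fire `timeupdate` only a few times per second, delivery can lag the time stamp slightly.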
