Urbana Campus Research Calendar (OVCRI)

Mark Hasegawa-Johnson, a professor of electrical and computer engineering, will present "Inclusive speech technology" at the Beckman Institute Director's Seminar at noon on Thursday, March 23, in 1005 Beckman and on Zoom. Lunch will be provided.

Registration is required to attend.

"Inclusive speech technology"

Disability is a social construct. The population of a country is divided into abled versus disabled people on the basis of arbitrary decisions by architects, engineers, and organizations about the methods that will be considered standard for performing the activities of daily living.

Speech is the fastest and most versatile way to communicate among humans, and most of us are able to immediately understand the speech of our close friends and family members, regardless of dialect or disability. Speech technology, however, is just coming out of a period in which it had to be highly standardized for the purpose of rapid technology development; consequently, most speech-enabled devices can now understand your speech if and only if your voice sounds like the voice of a typical public-domain audiobook narrator.

The algorithmic innovations behind modern speech technology are inspiring leaps across the gap between science and engineering, including sequence-to-sequence learning algorithms inspired by human processes of bottom-up attention, and self-supervised learning algorithms based on categorical perception as a strategy for remembering the past and predicting the future. Training data for these machine learning algorithms comes not from industrial or government sources, but from idealism-driven public communities such as the community of public-domain audiobook narrators. The important next step in speech technology development, therefore, is not an algorithmic innovation, but an innovation in community pride: it is the promotion of speech variety as a standard part of public discourse that should be represented in audiobooks, in film, in the news, and in the training data for automatic speech technology.

The Speech Accessibility Project is intended as a communication tool that connects people with speech disabilities to the computer scientists and engineers who train speech technology. The goal of the Speech Accessibility Project is to collect 1,000 hours of transcribed training speech from 2,000 people with speech affected by neurological conditions. The project is designed to be large enough, varied enough, and secure enough to permit universities and companies to train high-accuracy speech transcription tools for people with speech affected by neurological conditions. The long-term vision for this project is a virtuous cycle in which speech technology works well for people with disabilities, thereby encouraging people with a variety of speech patterns to contribute their speech to the public sphere, thereby encouraging public acceptance and understanding of speech variety.

Mark Hasegawa-Johnson is a William L. Everitt Faculty Scholar of Electrical and Computer Engineering at the University of Illinois Urbana-Champaign. He is a fellow of the Acoustical Society of America (for contributions to vocal tract and speech modeling) and of the IEEE (for contributions to speech processing of under-resourced languages), and has published in areas including signal generation from perceptual latent spaces, audio source separation using deep recurrent neural networks, zero-shot voice conversion using separable representations of pitch, rhythm, content and timbre, and acoustic modeling of dysarthria for universally accessible speech technology.
