Abstract: In everyday life, speech communication takes place in less-than-ideal conditions. The major concern for the successful delivery of target information is speech intelligibility. In Section I of this talk, state-of-the-art speech intelligibility enhancement algorithms are introduced. As opposed to traditional speech enhancement techniques, which process the noise-corrupted speech, this type of algorithm operates on the clean speech signal and alters it before the signal undergoes potential noise contamination during transmission. Instead of increasing speech intensity, these algorithms largely shift energy from less important regions of the speech to those considered important for speech perception in noise, reallocating the existing energy across the time and frequency domains. Extensive validation using perceptual listening experiments in adverse listening conditions demonstrates that human listeners understand speech modified by some intelligibility enhancement algorithms significantly better than both ordinary unmodified speech and even naturally adapted speech (i.e. Lombard speech, which is produced by human talkers when speaking in noise). In Section II, the talk moves on to the computational modelling of speech intelligibility in noise. The limitations of current objective intelligibility models (OIMs) are raised, followed by the theories, mechanisms and development of glimpse-based intelligibility metrics, which are intended to improve the predictive accuracy of OIMs for algorithmically modified speech and synthetic speech. The talk further explains how monaural OIMs can be extended to deal with more realistic situations in which binaural listening is engaged.
By comparing model predictions to listener keyword recognition rates from three listening experiments, it is demonstrated that the binaural version of a glimpse-based OIM provides reasonable intelligibility estimates across a range of noise conditions and room acoustics. The talk further exemplifies how contemporary machine learning techniques, e.g. neural networks and deep learning, can be used to solve practical problems that traditional methods fall short of tackling. Finally, two demonstrations are presented to show practical applications of OIMs, e.g. intelligibility control and maintenance in audio production and in non-ideal listening conditions. In the final section, future work and possible research directions are envisaged in light of the achievements of research in intelligibility enhancement and modelling, focusing on the computational modelling of informational masking in speech perception, perceptually inspired signal processing, and the ecological evaluation of model and algorithm development.
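To give a flavour of the glimpse-based approach discussed in the abstract: a glimpse metric counts spectro-temporal regions where the speech energy sufficiently exceeds the noise energy. The following is a minimal illustrative sketch only, not the metric presented in the talk; the function name, the use of raw power spectrograms, and the 3 dB local SNR criterion (in the spirit of Cooke's glimpsing model) are assumptions made here for illustration.

```python
import numpy as np

def glimpse_proportion(speech_power, noise_power, threshold_db=3.0):
    """Toy glimpse-based score: the fraction of time-frequency cells
    whose local SNR exceeds a criterion (here 3 dB).

    speech_power, noise_power: same-shaped arrays (frequency x time)
    of non-negative power-spectrogram values.
    """
    eps = 1e-12  # guard against log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return float(np.mean(local_snr_db > threshold_db))

# Toy example: speech dominates half the cells by more than 3 dB.
speech = np.array([[10.0, 1.0],
                   [10.0, 1.0]])
noise = np.ones((2, 2))
print(glimpse_proportion(speech, noise))  # 0.5
```

A fuller model would apply an auditory filterbank and temporal smoothing before thresholding; the sketch above only conveys the core idea that intelligibility is estimated from the proportion of "glimpsed" regions rather than from a global SNR.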
Bio: Yan Tang has been a Research Fellow at the Acoustics Research Centre at the University of Salford, UK since 2014. He completed his PhD in Applied Linguistics with Sobresaliente Cum Laude at Universidad del País Vasco, Spain in 2014, focusing on computational speech and hearing. He also received a master's degree in Software Systems and Internet Technology with Distinction from the University of Sheffield, UK in 2008. Prior to that, he worked on environmental chemistry, obtaining a master's degree in Environmental Science with Excellent Graduate Honour from Sichuan University, China in 2007, and his first degree in Environmental Engineering from Chengdu University of Technology, China in 2004. His current research topics lie in the computational modelling of speech perception in noise, context-sensitive speech intelligibility enhancement and perceptually motivated signal processing. His research interests also include psychoacoustics, speech production in noise, source separation, robust automatic speech recognition and machine learning.