Talk title: Deep Learning for Prosody-Based Irony Classification in Spontaneous Speech
Recognizing irony in speech and text can be challenging even for humans. For natural language processing applications, irony recognition presents a unique challenge, as the presence of irony reverses the meaning and sentiment of the words themselves. Combining phonological insights with machine learning methods, this presentation details work that aims to capture as holistic an acoustic impression as possible of how ironic speech differs from non-ironic speech, and to use the resulting insights to inform the construction of deep learning-based irony classification models.
The Sad Boyz Corpus, consisting of 4.68 hours of irony-annotated, naturalistic, conversational speech data, has been constructed for the purposes of this research and is publicly available to future researchers in this space. A wide array of utterance-level and time-series acoustic features is extracted from this data and analyzed using PCA, Logistic Mixed Effect Regression Models, and Generalized Additive Models. The inferential statistical results reveal statistically significant differences (p<0.05) between ironic and non-ironic speech in fundamental frequency (F0), both at the utterance level and in the time-series contour; in utterance-level timing/duration features; and in time-series HNR, MFCCs, and RASTA-PLP. These results are then used to inform the training and fine-tuning of a series of deep learning approaches to irony classification, achieving an AUC of 0.8 in the speaker-dependent condition and an AUC of 0.77 in the speaker-independent condition, outperforming the results of most irony classification models in the existing literature. In addition to the myriad real-world applications for such a system, its contribution to the understanding of prosodically encoded augmentation of semantic content constitutes a significant step forward for research in the fields of linguistics and NLP.