Talk title: Granular Text Classification for Biomedical Natural Language Processing
Abstract: The vast amount of textual information available in digital form, and the difficulty of processing it manually, have made Natural Language Processing (NLP) tools widely popular in recent years. NLP systems often comprise multiple components that together enable automatic processing of language input. One common component is text classification, with applications in document organization, sentiment analysis, spam detection, and intent/domain classification in modern conversational AI systems. Text classification has been studied as an independent machine learning task for texts of different lengths in different application domains. Given the granular nature of language discourse, however, the present research studies domain-specific granular text classification for biomedical natural language processing. I define granular text classification as performing classification on fine-grained text by leveraging training data at higher levels of granularity. Because labeled data at multiple levels of granularity is difficult to obtain for training machine learning text classifiers, I explore approaches to synthesizing training data at different levels of granularity for biomedical text classification and present a qualitative error analysis of the corresponding classification results.
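The core idea, classifying fine-grained text using training data from a coarser level of granularity, can be sketched as follows. This is a minimal illustration, not the method presented in the talk: it assumes a naive synthesis strategy (propagating each document's label to all of its sentences) and uses a tiny hand-rolled Naive Bayes classifier on made-up biomedical-sounding examples.

```python
from collections import Counter, defaultdict
import math

# Hypothetical document-level training data: (document_text, domain_label).
docs = [
    ("The patient reports chest pain. An ECG was ordered.", "cardiology"),
    ("The rash spread overnight. A topical steroid was prescribed.", "dermatology"),
]

# Naive synthesis step: propagate each document's label down to its
# sentences, yielding fine-grained (sentence-level) training examples.
sentences = [
    (sent.strip(), label)
    for doc, label in docs
    for sent in doc.split(".")
    if sent.strip()
]

# Train a small multinomial Naive Bayes model on the synthesized sentences.
label_counts = Counter(label for _, label in sentences)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in sentences:
    for word in text.lower().split():
        word_counts[label][word] += 1
        vocab.add(word)

def classify(text):
    """Return the most likely label for a fine-grained input text."""
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / len(sentences))  # log prior
        total = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace-smoothed log likelihood over the shared vocabulary
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("An ECG was ordered"))  # → cardiology
```

The synthesis step here is deliberately simplistic; a document's label need not hold for every sentence in it, which is one source of the noise a qualitative error analysis of such classifiers would surface.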