Large amounts of text are written and published daily. As a result, systems that read documents and automatically extract useful, structured information from text are increasingly needed to help people absorb information efficiently. Such systems are essential for applications such as question answering, information retrieval, and knowledge base population.
In this talk, I will focus on the challenges of finding and organizing information about events and introduce my research on document-level information extraction. In the first part, I’ll introduce methods for better modeling knowledge from context: (1) multi-granularity encoding of information beyond the sentence level, which reads each sentence from the perspectives of both local and global context; and (2) generative learning of output structures that better model the dependencies between extracted events, enabling more coherent extraction of information (for example, event A in an earlier part of a long document is usually correlated with event B in a later part).
In the second part, to reduce the cost of the human annotation needed for training data and to better access relevant knowledge encoded in large models, we propose a new question-answering formulation for the extraction problem. I will conclude by outlining a research agenda for building the next generation of efficient machine reading systems with close to human-level reasoning capabilities.
Xinya Du is a Postdoctoral Research Associate at the University of Illinois at Urbana-Champaign, working with Prof. Heng Ji. He earned a Ph.D. in Computer Science from Cornell University, advised by Prof. Claire Cardie. Before Cornell, he received a bachelor's degree in Computer Science from Shanghai Jiao Tong University. His research focuses on natural language processing, especially methods that enable learning with fewer annotations for document-level information extraction. His work has been published in leading NLP conferences such as ACL, EMNLP, NAACL, and EACL, and has been covered by New Scientist and TechRepublic. He has received awards including the CDAC Spotlight Rising Star award and the SJTU National Scholarship.