Abstract: In recent years, generative neural network models in natural language processing and computer vision have become the frontier for malicious actors to controllably generate misinformation at scale. This realistic-looking AI-generated “fake news” has been shown to easily deceive humans, so it is critical to develop robust verification techniques against machine-generated fake news. Current misinformation detection approaches mainly focus on document-level fake news detection using lexical features and semantic embedding representations. However, fake news is often generated by manipulating (misusing, exaggerating, or falsifying) only a small part of the true information, namely the knowledge elements (KEs, including entities, relations, and events). Moreover, recent news often makes claims that do not yet have verified evidence, and evaluating the truthfulness of these real-time claims depends more on their consistency with information conveyed in other data modalities. In this talk, I propose to extend research on Information Extraction to evaluate the veracity of news stories and change the consumption of news media around the world. Such a system would extend traditional event extraction to future event prediction, use cross-lingual, cross-media information extraction as a basis to analyze media reports from across the world, identify fine-grained falsified information, correct it, and prioritize information for analyst review. I will present a new "Information Surgeon" model, which takes full advantage of state-of-the-art multimedia joint knowledge extraction techniques to analyze fine-grained event, entity, and relation elements, as well as whether these extracted knowledge elements align consistently across modalities and with background knowledge. We propose a novel probabilistic graphical neural network model to fuse the outputs from these indicators to detect misinformation and make the results highly explainable.
A major challenge in performing knowledge-element-level misinformation detection is the lack of training data. Hence, we additionally propose a novel graph-to-text generation approach that generates noisy training data automatically by knowledge element manipulation. Experimental results show that our approach achieves 92%-95% detection accuracy, 16.8% (absolute) higher than the state-of-the-art approach.
Bio: Heng Ji is a professor in the Computer Science Department at the University of Illinois Urbana-Champaign. She is an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Multimedia Multilingual Information Extraction, Knowledge Base Population, and Knowledge-driven Generation. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. The awards she has received include the "AI's 10 to Watch" Award by IEEE Intelligent Systems in 2013, the NSF CAREER Award in 2009, the Google Research Award in 2009 and 2014, the IBM Watson Faculty Award in 2012 and 2014, the Bosch Research Award in 2014-2018, the Best-of-ICDM2013 Paper, the Best-of-SDM2013 Paper, the ACL2020 Best Demo Paper Award, and the NAACL2021 Best Demo Paper Award. She was invited by the Secretary of the U.S. Air Force and AFRL to join the Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030. She is the lead of many multi-institution projects and tasks, including the U.S. ARL projects on information fusion and knowledge networks construction, the DARPA DEFT Tinker Bell team, and the DARPA KAIROS RESIN team. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2021. She has served as the Program Committee Co-Chair of many conferences, including NAACL-HLT2018, and she has been the coordinator for the NIST TAC Knowledge Base Population track since 2010. Her research has been widely supported by U.S. government agencies (DARPA, ARL, IARPA, NSF, AFRL, DHS) and industry (Amazon, Google, Facebook, Bosch, IBM, Disney).