Siebel School Speaker Series Master Calendar

Radu Florian "The Road Towards Language Agnostic Information Extraction"

Event Type

Seminar/Symposium

Sponsor

The Department of Computer Science, University of Illinois, BLENDER Lab

Virtual

Join online

Date

Apr 30, 2021 11:00 am

Speaker

Radu Florian, Distinguished Research Scientist and Senior Manager, Multilingual Natural Language Processing Group at IBM Watson Research Center

Contact

Candice Steidinger

E-Mail

steidin2@illinois.edu

Phone

217-300-8564

Originating Calendar

Siebel School Speakers Calendar

Abstract:

In this talk I will present my view on the evolution of Information Extraction across multiple languages, in particular mention detection, coreference resolution, relation extraction, and entity linking. The talk goes from the language-specific systems of CoNLL'02 and CoNLL'03 to current research that produces models able to process a large set of languages with one engine. If time permits, I will also present some newer experiments that allow a user to take a system in English (for instance) and produce good models in other languages, further enabling true multi-language Information Extraction.

Bio:

Radu Florian wears two hats as Distinguished Research Scientist and Senior Manager, managing the Multilingual Natural Language Processing Group at IBM Watson Research Center in Yorktown Heights, NY. His research interests include multi-language statistical information extraction, question answering, semantic parsing, and machine learning. He has participated in and led teams in several competitions, including CoNLL information extraction, ACE, and TAC-KBP, as well as DARPA projects such as GALE, BOLT, MRP, and KAIROS.

One recent focus of Radu's research is building models that can operate cross-language: using multilingual language models such as multilingual BERT or XLM-RoBERTa to build information extraction, dependency parsing, and question answering models that can take a wide variety of languages as input and produce good output, even on languages they were not trained on. Not only can these models perform extraction on input in any of these languages (unfortunately, not on Klingon yet), but they usually work better than models built on one language alone, and they degrade gracefully when tested on completely new languages.