Computer Science Speakers Calendar

Radu Florian "The Road Towards Language Agnostic Information Extraction"

Event Type
Seminar/Symposium
Sponsor
The Department of Computer Science, University of Illinois, BLENDER Lab
Location
https://illinois.zoom.us/j/8167899060?pwd=YkZrQ09zODRzL0txRGF5bnhWdmk0UT09
Virtual
Date
Apr 30, 2021   11:00 am  
Speaker
Radu Florian, Distinguished Research Scientist and Senior Manager, Multilingual Natural Language Processing Group at the IBM Watson Research Center
Contact
Candice Steidinger
E-Mail
steidin2@illinois.edu
Phone
217-300-8564

Abstract:

In this talk I will present my view of the evolution of Information Extraction, in particular mention detection, coreference resolution, relation extraction, and entity linking across multiple languages, from the language-specific systems of CoNLL'02 and CoNLL'03 to current research that produces models able to process a large set of languages with one engine. If time permits, I will also present some newer experiments that allow a user to take a system built in one language (English, for instance) and produce good models in other languages, further enabling true multi-language Information Extraction.

Bio: 

Radu Florian wears two hats as Distinguished Research Scientist and Senior Manager, managing the Multilingual Natural Language Processing Group at the IBM Watson Research Center in Yorktown Heights, NY. His research interests include multi-language statistical information extraction, question answering, semantic parsing, and machine learning. He has participated in and led teams in several competitions, including CoNLL information extraction, ACE, and TAC-KBP, as well as DARPA projects such as GALE, BOLT, MRP, and KAIROS.


One recent focus of Radu's research is building models that operate across languages: using multilingual language models such as multilingual BERT or XLM-RoBERTa to build information extraction, dependency parsing, and question answering models that can take a wide variety of languages as input and produce good output, even on languages they were not trained on. Not only can these models perform extraction on input in any of these languages (unfortunately, not on Klingon yet), but they usually work better than models built on one language alone, and they degrade gracefully when tested on completely new languages.
