Please note that this talk will take place on Thursday.
Title: What neural models tell us about linguistic knowledge: insights from cross-linguistic investigations
Abstract: Is linguistic data enough to model human linguistic knowledge? In this talk, I will describe computational experiments designed to show how cross-linguistic variation offers unique insight into this question. I will draw on two key contributions of natural language processing: i) computational models that scale to large amounts of data, and ii) tests of linguistically naïve models on particular linguistic phenomena (e.g., subject-verb agreement) that probe aspects of linguistic knowledge. The field has uncovered considerable overlap between the behavior of humans and that of neural models, suggesting that raw linguistic data can yield human-like linguistic knowledge. However, a direct link between model behavior and human linguistic knowledge is hard to establish because these investigations are based primarily on a single language: English. By drawing on findings from psycholinguistics, I will compare the performance of neural models and humans in two case studies: ambiguous relative clause attachment and implicit causality. While my results show that models trained and tested on English succeed in capturing human behavior, models trained and tested on Spanish (ambiguous relative clause attachment) or Italian (implicit causality) fall short. I will argue that discrepancies between humans and neural models not only serve to advance the development of computational models beyond English, but also reveal crucial cases where linguistic data alone is not enough to yield human linguistic knowledge, moving us closer to grasping the fundamental human capacity for language.