European Union Center

Linguistics Seminar Series: Kristopher Kyle (Oregon)

Event Type

Lecture

Sponsor

Department of Linguistics and the School of Literatures, Cultures, and Linguistics

Location

LCLB Lucy Ellis Lounge

Date

Nov 17, 2025 4:00 pm

Originating Calendar

Linguistics Event Calendar

Join us for this exciting installment of the Linguistics Seminar Series!

Talk Title: Leveraging "AI" to bootstrap linguistic annotation in applied linguistic research
Abstract: Corpus linguistics is broadly concerned with describing language use based on representative samples of spoken, written, and/or signed texts. Corpus linguistic methods are used in applied linguistic research to describe particular language use domains (e.g., in research related to discourse analysis and language for specific purposes) and to investigate differences in language use across time and/or proficiency and/or between people groups. Although there are exceptions (e.g., Hunston, 2022; Sinclair, 1991), most corpus linguistic research leverages linguistic annotation of some sort (part of speech tags, syntactic dependency information, and/or other information). As the growth of the internet has made the collection of extremely large corpora feasible for most researchers, linguistic annotation has increasingly been automated (e.g., Davies, 2009; Kyle, 2021; Schäfer, 2015; Wenzek et al., 2020). Advances in machine learning, such as the development of neural network architectures, pre-trained language models (PLMs), and large language models (LLMs), have also improved the accuracy of well-known annotation tasks while requiring less training data. Part of speech (POS) tagging has reached accuracy rates of 98%, and syntactic dependency parsing has reached accuracy rates above 95%. Accordingly, (applied) corpus linguists have leveraged automatic POS tagging and syntactic parsing annotation for a number of research purposes. However, there are many linguistic features that have not been investigated in large datasets because automatic annotation tools have, for multiple reasons, been unavailable.
While most previous applications of automated annotation have relied on the adaptation of traditional annotation tools such as POS taggers and syntactic parsers, advances in machine learning in the form of pre-trained language models (PLMs) and generative large language models (LLMs) have brought the creation of bespoke, contextually aware automated annotation tools within reach for most researchers in applied linguistics. In this talk, I provide an accessible introduction to PLMs and LLMs (often referred to as “AI”) and empirically examine two concrete ways that they can be used to bootstrap linguistic annotation of corpora for use in applied linguistic research.