Abstract - Measures of linguistic variation, also called linguistic distances, are one of the prominent topics in the growing field of dialectometry, which is concerned with quantifying linguistic differences and similarities and often links them to the geographical distances between the areas where the relevant languages are spoken (Nerbonne and Kretzschmar, 2003). In this presentation, I will report on an ongoing project that develops computational metrics of linguistic variation, quantifying the lexical, pronunciation, and morphosyntactic distances between three varieties of Arabic from different regions of the Arabic-speaking world. The distances are based on data elicited from native speakers of the varieties under consideration. Each speaker, and each speaker's parents, were born and raised in the city whose variety is under consideration.
To evaluate the lexical and pronunciation distances, the Swadesh list (207 items) was elicited and phonetically transcribed for all varieties. In the elicitation sessions, subjects were given an appropriate context for a lexical item whenever an ambiguity was expected, and the researcher made some adaptations to the Swadesh list to make it suitable for Arabic varieties. The lexical distance is based on the number of cognate words in the list. The pronunciation distance is based on the Levenshtein distance, a variant of the minimum edit distance, between pairs of corresponding words in the parallel lists. Levenshtein distance was introduced to computational linguistics by Kessler (1995) and has subsequently been used by other researchers. The morphosyntactic distance depends on the variation in patterns of morphosyntactic agreement. To quantify the amount of variation, we investigated different patterns of agreement depending on the number, gender, and person of the subject. We elicited data for different classes of verbs (a sound verb, a geminate verb, three verbs in which one of the root radicals is a glide, and a verb with two glides). In addition to the verbal paradigms that express subject-verb agreement, we extended the investigation to object and possessive clitics, or bound pronouns.
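As an illustration of the lexical and pronunciation measures described above, the following Python sketch computes a Levenshtein distance between two transcriptions and aggregates it over parallel word lists. It is not the project's implementation. The function names, the unit cost for every edit operation, the normalisation by the length of the longer word, and the definition of lexical distance as the share of non-cognate items are all assumptions made for the example, and the sample strings are toy placeholders rather than elicited data.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i segments of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pronunciation_distance(word_pairs):
    """Mean Levenshtein distance over parallel transcriptions.
    Assumption: each distance is divided by the longer word's length."""
    scores = []
    for a, b in word_pairs:
        longest = max(len(a), len(b))
        scores.append(levenshtein(a, b) / longest if longest else 0.0)
    return sum(scores) / len(scores)


def lexical_distance(cognate_flags):
    """Assumption: lexical distance is the share of list items
    that are NOT cognate between the two varieties."""
    return 1 - sum(cognate_flags) / len(cognate_flags)


if __name__ == "__main__":
    # Toy placeholder strings, not data from the project.
    print(levenshtein("kitten", "sitting"))          # 3
    print(pronunciation_distance([("kalb", "kelb")]))
    print(lexical_distance([True, True, False, True]))
```

Transcriptions can be passed either as strings or as lists of segments. A list of segments is the safer choice when a single sound is written with more than one character, since each list element is then compared as one unit.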