UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies
2024 (English)
In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, p. 16919-16932
Conference paper, Published paper (Refereed)
Abstract [en]
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements (for example, interrogative sentences with special markers and/or word orders) are not labeled holistically. We argue for (i) augmenting UD annotations with a “UCxn” annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.
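The abstract describes finding construction instances in UD treebanks by matching morphosyntactic patterns. The acknowledgments mention Grew infrastructure, so the study's actual patterns were presumably Grew queries. What follows is only a minimal plain-Python sketch of the general idea, not the paper's patterns. The function names, the toy wh-question criterion (an interrogative `PronType` feature plus a sentence-final "?"), and the two sample sentences are all hypothetical and invented for illustration.

```python
from typing import Dict, List

# The ten tab-separated columns of the CoNLL-U format used by UD treebanks.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id and text
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'PronType=Int|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def is_wh_question(sentence: List[Dict[str, str]]) -> bool:
    """Hypothetical toy pattern: an interrogative pro-form plus a final '?'."""
    has_int = any(
        "Int" in parse_feats(tok["feats"]).get("PronType", "").split(",")
        for tok in sentence
    )
    ends_with_qmark = bool(sentence) and sentence[-1]["form"] == "?"
    return has_int and ends_with_qmark


# Two invented English sentences (not taken from any UD treebank).
SAMPLE = """# sent_id = 1
# text = What did you see?
1\tWhat\twhat\tPRON\tWP\tPronType=Int\t4\tobj\t_\t_
2\tdid\tdo\tAUX\tVBD\tMood=Ind|Tense=Past|VerbForm=Fin\t4\taux\t_\t_
3\tyou\tyou\tPRON\tPRP\tCase=Nom|Person=2|PronType=Prs\t4\tnsubj\t_\t_
4\tsee\tsee\tVERB\tVB\tVerbForm=Inf\t0\troot\t_\tSpaceAfter=No
5\t?\t?\tPUNCT\t.\t_\t4\tpunct\t_\t_

# sent_id = 2
# text = You saw it.
1\tYou\tyou\tPRON\tPRP\tCase=Nom|Person=2|PronType=Prs\t2\tnsubj\t_\t_
2\tsaw\tsee\tVERB\tVBD\tMood=Ind|Tense=Past|VerbForm=Fin\t0\troot\t_\t_
3\tit\tit\tPRON\tPRP\tCase=Acc|Number=Sing|Person=3|PronType=Prs\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_
"""

matches = [s for s in parse_conllu(SAMPLE) if is_wh_question(s)]
print(len(matches))  # 1: only "What did you see?" matches
```

A real construction pattern would also constrain dependency relations and word order (for instance, whether the interrogative word is fronted), which this toy check ignores; tools like Grew express such constraints directly over the dependency graph.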
Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024. p. 16919-16932
Keywords [en]
Case-studies; Corpus annotations; Grammatical construction; Interrogative sentences; Treebanks; Typology; Universal dependency; Word orders
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:ri:diva-74948
Scopus ID: 2-s2.0-85195888534
ISBN: 9782493814104 (electronic)
OAI: oai:DiVA.org:ri-74948
DiVA, id: diva2:1892981
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024. Hybrid, Torino, Italy. 20 May 2024 through 25 May 2024
Note
This work was initiated by the Dagstuhl Seminar 23191 “Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics” (https://www.dagstuhl.de/23191). In addition to the authors, our discussion group also included Francis Bond, Jorg Bücker, Mathieu Constant, Daniel Flickinger, Sylvain Kahane, Peter Ljunglöf, Teresa Lynn, Alexandre Rademaker, Manfred Sailer, and Agata Savary. We are grateful for Grew infrastructure support from Bruno Guillaume; for feedback from members of the NERT lab at Georgetown and anonymous reviewers; and for discussion with Natália Sathler Sigliano about some of the constructions in Portuguese. This work was supported in part by Israeli Ministry of Science and Technology grant No. 0002336 (Nurit Melnik, PI), CAPES PROEX grant No. 88887.816228/2023-00 (Arthur Lorenzi, PhD) and NSF award IIS-2144881 (Nathan Schneider, PI).
Available from: 2024-08-28. Created: 2024-08-28. Last updated: 2025-09-23. Bibliographically approved.