Nucleus Composition in Transition-based Dependency Parsing
RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden. ORCID iD: 0000-0002-7873-3971
Linköping University, Sweden.
RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden. ORCID iD: 0000-0003-3246-1664
Uppsala University, Sweden.
2022 (English). In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 48, no. 4, p. 849-886. Article in journal (Refereed). Published.
Abstract [en]

Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type. 
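
For illustration, a minimal sketch of the kind of composition function referred to in the abstract: when the parser creates a dependency arc inside a nucleus (for example, an aux relation in Universal Dependencies), the head's vector can be updated as a function of the head and dependent vectors, so that later parsing decisions see the nucleus as a single unit. The tanh-affine form, the function name compose, and the parameters W, b, and DIM below are illustrative assumptions, not the specific model evaluated in the article, which may use a different (for example, recurrent) composition function.

```python
# Minimal sketch (not the authors' implementation) of a composition
# function updating a head's vector when a nucleus-internal arc is
# created in a transition-based parser. All names and the tanh-affine
# form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM = 4                                   # toy embedding size

# Hypothetical parameters of the composition function.
W = rng.standard_normal((DIM, 2 * DIM)) * 0.1
b = np.zeros(DIM)

def compose(head_vec: np.ndarray, dep_vec: np.ndarray) -> np.ndarray:
    """Return an updated head representation that absorbs the dependent."""
    return np.tanh(W @ np.concatenate([head_vec, dep_vec]) + b)

# Toy example: a nucleus such as "has eaten", where the auxiliary "has"
# is a function word attached to the content head "eaten".
eaten = rng.standard_normal(DIM)
has = rng.standard_normal(DIM)

# When the parser adds the arc eaten -> has, the head's vector is
# replaced by the composed vector, so subsequent transitions condition
# on a representation of the whole nucleus rather than the bare word.
eaten = compose(eaten, has)
print(eaten.shape)  # (4,)
```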

Place, publisher, year, edition, pages
MIT Press Journals, 2022. Vol. 48, no. 4, p. 849-886
Keywords [en]
Abstracting, Computational linguistics, Structure (composition), Abstract notions, Computational modelling, Coordination structures, Dependency parser, Dependency parsing, Dependency relation, Dimensionality reduction, Nuclei composition, Syntactic analysis, Syntactic structure, Syntactics
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:ri:diva-61573
DOI: 10.1162/coli_a_00450
Scopus ID: 2-s2.0-85143253082
OAI: oai:DiVA.org:ri-61573
DiVA, id: diva2:1721054
Note

Funding details: Vetenskapsrådet, VR, 2016-01817; Funding text 1: We are grateful to Miryam de Lhoneux, Artur Kulmizev, and Sara Stymne for valuable comments and suggestions. We thank the action editor and the three reviewers for constructive comments that helped us improve the final version. The research presented in this article was supported by the Swedish Research Council (grant 2016-01817).

Available from: 2022-12-20. Created: 2022-12-20. Last updated: 2023-10-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Nivre, Joakim; Dürlich, Luise
