Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
RISE, Swedish ICT, SICS. Department of Linguistics and Philology.
Number of Authors: 1
2013 (English)
Doctoral thesis, monograph (Other academic)
Abstract [en]

Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide range of target languages, in the setting where no annotated training data is available in the target language.

Place, publisher, year, edition, pages
Uppsala University, 2013, 7. p. xii+215
Series
SICS dissertation series, ISSN 1101-1335
Keywords [en]
linguistic structure prediction, structured prediction, latent-variable model, semi-supervised learning, multilingual learning, cross-lingual learning, indirect supervision, partial supervision, ambiguous supervision, part-of-speech tagging, dependency parsing, named-entity recognition, sentiment analysis
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-24193
ISBN: 978-91-554-8631-0 (print)
OAI: oai:DiVA.org:ri-24193
DiVA, id: diva2:1043272
Available from: 2016-10-31
Created: 2016-10-31
Last updated: 2020-12-01
Bibliographically approved

Open Access in DiVA

fulltext (2892 kB), 337 downloads
File information
File name: FULLTEXT01.pdf
File size: 2892 kB
Checksum (SHA-512): 34d74a16d7299bcef1e9f6662654822119bd7bee77d8881f46af0c4c177afca49e55d1ac007acedac85028072d2298cc259eddedce302a8e817c8a78d007c46d
Type: fulltext
Mimetype: application/pdf
