Representation learning for natural language
RISE SICS, Sweden; Chalmers University of Technology, Sweden. ORCID iD: 0000-0002-9567-2218
2018 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Artificial neural networks have obtained astonishing results in a wide variety of tasks. One reason for this success is their ability to learn the whole task at once (end-to-end learning), including the representations for the data. This thesis investigates representation learning for natural language through the study of a number of tasks, ranging from automatic multi-document summarization to named entity recognition and the transformation of words into morphological forms specified by analogies.

In the first two papers, we investigate whether automatic multi-document summarization can benefit from learned representations, and how best to incorporate learned representations into an extractive summarization system. We propose a novel summarization approach that represents sentences using word embeddings, and a strategy for aggregating multiple sentence similarity scores to compute summaries that take multiple aspects into account. The approach is evaluated quantitatively using the de facto standard evaluation system ROUGE, and obtains state-of-the-art results on standard benchmark datasets for generic multi-document summarization.

The rest of the thesis studies models trained end-to-end for specific tasks, and investigates how to train the models to perform well and to learn internal representations of data that explain the factors of variation in the data. Specifically, we investigate whether character-based recurrent neural networks (RNNs) can learn the necessary representations for tasks such as named entity recognition (NER) and morphological analogies, and how best to learn the representations needed to solve these tasks. We devise a novel character-based recurrent neural network model that recognizes medical terms in health record data. The model is trained on openly available data, and evaluated using standard metrics on sensitive medical health record data in Swedish. We conclude that the model learns to solve the task and is able to generalize from the training data domain to the test domain.

We then present a novel recurrent neural model that transforms a query word into the morphological form demonstrated by another word. The model is trained and evaluated using word analogies and takes as input the raw character sequences of the words, with no explicit features needed. We conclude that character-based RNNs can successfully learn good representations internally, and that the proposed model performs well on the analogy task, beating the baseline by a large margin. As the model learns to transform words, it learns internal representations that disentangle morphological relations using only cues from the training objective, which is to perform well on the word transformation task.

In other settings, such cues may not be available at training time, and we therefore present a regularizer that improves disentanglement in the learned representations by penalizing the correlation between activations in a layer. In the second part of the thesis, we thus propose models and associated training strategies that solve their tasks while simultaneously learning informative internal representations; in Paper V this is enforced by an explicit regularization signal, suitable for when such a signal is missing from the training data (e.g. in the case of autoencoders).
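To make the summarization approach above concrete, the following is a minimal Python sketch of representing sentences as averaged word embeddings and aggregating several pairwise similarity scores multiplicatively. It is an illustrative reconstruction, not the exact formulation of Papers I and II: the averaging, the shifted cosine similarity, and the multiplicative aggregation are example choices made here for the sketch.

```python
import numpy as np

def sentence_embedding(tokens, embeddings, dim=100):
    """Average word embeddings to get a fixed-size sentence vector.

    tokens: list of word strings; embeddings: dict mapping word -> np.ndarray.
    Out-of-vocabulary words are skipped; an empty sentence maps to zeros.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity, shifted into [0, 1] so that scores can be multiplied."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return (cos + 1.0) / 2.0

def aggregate_similarities(scores):
    """Multiplicative aggregation: a sentence pair gets a high combined
    score only if every individual similarity measure rates it highly."""
    return float(np.prod(scores))

# Usage: combine an embedding-based score with another measure, e.g. a
# (hypothetical) word-overlap score, before building the summary:
# combined = aggregate_similarities([
#     cosine_similarity(sentence_embedding(s1, emb), sentence_embedding(s2, emb)),
#     word_overlap_score(s1, s2),
# ])
```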
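The character-based NER model is described above only at a high level. The sketch below shows the general shape of a character-level bidirectional LSTM tagger; the layer sizes, the tag set, and the per-character classification head are assumptions made for illustration, not the exact architecture of Paper III.

```python
import torch
import torch.nn as nn

class CharBiLSTMTagger(nn.Module):
    """Character-level BiLSTM sequence tagger.

    Characters are embedded, run through a bidirectional LSTM, and each
    position receives a tag score (e.g. inside/outside a medical term).
    All hyperparameters here are illustrative.
    """

    def __init__(self, n_chars, n_tags, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.classify = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len) integer character indices
        states, _ = self.bilstm(self.embed(char_ids))
        return self.classify(states)  # (batch, seq_len, n_tags) logits

# Training would minimize a per-character cross-entropy loss over the tags.
```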
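For the Paper V regularizer described in the abstract, a minimal sketch of a decorrelation penalty is given below, assuming a PyTorch setup. The squared off-diagonal penalty and the weighting factor in the usage comment are illustrative choices, not necessarily the exact formulation in the paper.

```python
import torch

def decorrelation_penalty(activations, eps=1e-8):
    """Penalty on correlation between units in one layer.

    activations: (batch_size, num_units) tensor from a forward pass.
    Returns the mean squared off-diagonal entry of the empirical
    correlation matrix; it is zero when the units are uncorrelated
    over the batch, so minimizing it encourages disentangled activations.
    """
    a = activations - activations.mean(dim=0, keepdim=True)   # center each unit
    a = a / (a.std(dim=0, keepdim=True) + eps)                # unit variance
    corr = (a.T @ a) / (a.shape[0] - 1)                       # correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).mean()

# Usage: add to the task loss with an illustrative weight.
# loss = task_loss + 0.1 * decorrelation_penalty(hidden_layer_output)
```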

Place, publisher, year, edition, pages
2018.
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-37733
ISBN: 978-91-7597-675-4 (print)
OAI: oai:DiVA.org:ri-37733
DiVA, id: diva2:1286686
Note

Doktorsavhandlingar vid Chalmers Tekniska Högskola (Doctoral theses at Chalmers University of Technology), new series no. 4356

ISSN 0346-718X

Technical Report No. 155D

Paper I
M. Kågebäck, O. Mogren, N. Tahmasebi, and D. Dubhashi (2014). "Extractive summarization using continuous vector space models". Second Workshop on Continuous Vector Space Models and their Compositionality.

Paper II
O. Mogren, M. Kågebäck, and D. Dubhashi (2015). "Extractive summarization by aggregating multiple similarities". Proceedings of Recent Advances in Natural Language Processing, pp. 451–457.

Paper III
S. Almgren, S. Pavlov, and O. Mogren (2016). "Named Entity Recognition in Swedish Health Records with Character-Based Deep Bidirectional LSTMs". Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining.

Paper IV
O. Mogren and R. Johansson (2018). "Character-based recurrent neural networks for morphological relational reasoning". Submitted draft. Preliminary version published at the EMNLP 2017 Workshop on Subword and Character-level Models in NLP.

Paper V
M. Kågebäck and O. Mogren (2018). "Disentangled activations in deep networks". Submitted draft. Preliminary version pre

Available from: 2019-02-08. Created: 2019-02-07. Last updated: 2023-06-02. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Mogren, Olof
