An integration of vector-based semantic analysis and simple recurrent networks for the automatic acquisition of lexical representations from unlabeled corpora
RISE - Research Institutes of Sweden, ICT, SICS. ORCID iD: 0000-0001-5100-0535
2002 (English). Conference paper, published paper (refereed)
Abstract [en]

This study presents an integration of Simple Recurrent Networks to extract grammatical knowledge and Vector-Based Semantic Analysis to acquire semantic information from large corpora. Starting from a large, untagged sample of English text, we use Simple Recurrent Networks to extract morpho-syntactic vectors in an unsupervised way. These vectors are then used in place of random vectors to perform Vector-Based Semantic Analysis. In this way, we obtain rich lexical representations in the form of high-dimensional vectors that integrate morpho-syntactic and semantic information about words. Apart from incorporating data from the different levels, we argue how these vectors can be used to account for the particularities of each different word token of a given word type. The amount of lexical knowledge acquired by the technique is evaluated both by statistical analyses comparing the information contained in the vectors with existing 'hand-crafted' lexical resources such as CELEX and WordNet, and by performance in language proficiency tests. We conclude by outlining the cognitive implications of this model and its potential use in the bootstrapping of lexical resources.

Place, publisher, year, edition, pages
2002, 1.
National subject category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-22530
OAI: oai:DiVA.org:ri-22530
DiVA, id: diva2:1042095
Conference
Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data Workshop at LREC 2002, 1 June 2002, Las Palmas, Spain
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2025-09-23 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Person

Sahlgren, Magnus
