Streaming word similarity mining on the cheap
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-9244-4546
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-8952-3542
2018 (English) Conference paper, Published paper (Other academic)
Abstract [en]

Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings.
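
The abstract gives only a high-level description of the method, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes a fixed-size FIFO buffer per context word, a symmetric sliding window, and a simple cosine-style normalization of the counts; the names `second_order_counts`, `most_similar`, `window`, and `buffer_size` are all hypothetical. When a word co-occurs with a context word, every word already in that context's buffer has shared the same context, so the pair receives one second-order count.

```python
from collections import defaultdict, deque
from math import sqrt


def second_order_counts(tokens, window=2, buffer_size=4):
    """Count second-order co-occurrences in a single pass over a token stream.

    Assumption (not from the paper): each context word keeps a FIFO buffer
    of the words most recently seen next to it.
    """
    buffers = defaultdict(lambda: deque(maxlen=buffer_size))
    pair_counts = defaultdict(int)  # sorted word pair -> second-order count
    word_totals = defaultdict(int)  # word -> sum of counts it takes part in
    history = deque(maxlen=window)  # the previous `window` tokens

    for word in tokens:
        for context in history:
            if context == word:
                continue
            # Treat the first-order co-occurrence symmetrically.
            for a, b in ((word, context), (context, word)):
                # Words buffered for b have also co-occurred with b,
                # so each of them shares a context with a.
                for other in buffers[b]:
                    if other != a:
                        pair = (a, other) if a < other else (other, a)
                        pair_counts[pair] += 1
                        word_totals[a] += 1
                        word_totals[other] += 1
                buffers[b].append(a)  # oldest entry is dropped when full
        history.append(word)
    return pair_counts, word_totals


def most_similar(word, pair_counts, word_totals, top_n=3):
    """Rank other words by normalized second-order count (assumed scoring)."""
    scores = {}
    for (u, v), n in pair_counts.items():
        if word in (u, v):
            other = v if u == word else u
            scores[other] = n / sqrt(word_totals[word] * word_totals[other])
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    stream = "the cat sat on the mat the dog sat on the rug".split()
    counts, totals = second_order_counts(stream)
    print(most_similar("cat", counts, totals))
```

Bounding each buffer keeps memory and per-token work constant, which is what would make a single streaming pass feasible; the buffer replacement policy and the scoring function are the parts most likely to differ from the published method.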

Place, publisher, year, edition, pages
2018.
National subject category
Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:ri:diva-35186 OAI: oai:DiVA.org:ri-35186 DiVA, id: diva2:1249038
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Available from: 2018-09-18 Created: 2018-09-18 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Görnerup, Olof; Gillblad, Daniel
