Streaming word similarity mining on the cheap
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-9244-4546
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-8952-3542
2018 (English) Conference paper, Published paper (Other academic)
Abstract [en]

Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings.
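
The buffer-based counting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `second_order_counts`, the parameters `window` and `buffer_size`, and the choice to treat only preceding tokens as context are assumptions made for illustration. Each context word keeps a small bounded buffer of words it has recently co-occurred with. When a new word co-occurs with that context, it is paired with every word already in the buffer, which registers a second-order co-occurrence in a single pass and without any pairwise similarity computation. Turning the raw counts into similarity scores is left out here.

```python
from collections import defaultdict, deque


def second_order_counts(tokens, window=2, buffer_size=4):
    """Count second-order co-occurrences in one pass over a token stream.

    Illustrative sketch only; names and parameters are assumptions.
    """
    # For each context word: bounded buffer of words recently seen with it.
    buffers = defaultdict(lambda: deque(maxlen=buffer_size))
    # Unordered word pair (stored as a sorted tuple) -> second-order count.
    counts = defaultdict(int)
    # The preceding `window` tokens act as the context of the current word.
    history = deque(maxlen=window)

    for word in tokens:
        for context in history:
            # `word` and every buffered word share `context`, so each
            # such pair is a second-order co-occurrence.
            for other in buffers[context]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
            # Remember `word` as a recent co-occurrence of `context`.
            buffers[context].append(word)
        history.append(word)
    return counts


if __name__ == "__main__":
    stream = "drink tea drink coffee".split()
    print(dict(second_order_counts(stream, window=1)))
    # prints {('coffee', 'tea'): 1}
```

In this toy stream, "tea" and "coffee" never appear next to each other, yet both follow "drink", so they receive one second-order count. The bounded buffers keep the work per token constant, which is what makes a single streaming pass feasible in this sketch.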

Place, publisher, year, edition, pages
2018.
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:ri:diva-35186
OAI: oai:DiVA.org:ri-35186
DiVA, id: diva2:1249038
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Available from: 2018-09-18 Created: 2018-09-18 Last updated: 2018-09-18 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records BETA

Görnerup, Olof; Gillblad, Daniel
