Streaming word similarity mining on the cheap
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-9244-4546
RISE - Research Institutes of Sweden, ICT, SICS. (RISE AI) ORCID iD: 0000-0001-8952-3542
2018 (English) Conference paper, Published paper (Other academic)
Abstract [en]

Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings.
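To make the idea of buffer-based second-order counting concrete, here is a minimal Python sketch. It is not the authors' implementation: the record gives only the abstract, so the function names, the `window` and `buffer_size` parameters, the bounded-deque buffers, and the normalisation used for ranking are all assumptions chosen for illustration. Two words receive a second-order count each time they are seen next to the same context word.

```python
from collections import Counter, defaultdict, deque
from math import sqrt


def second_order_counts(tokens, window=2, buffer_size=50):
    """Single pass over `tokens`; counts[w][u] is how often w and u shared a context word.

    Hypothetical sketch: parameter names and buffer policy are assumptions.
    """
    # buffers[c] keeps the most recent words observed near c (bounded in size).
    buffers = defaultdict(lambda: deque(maxlen=buffer_size))
    counts = defaultdict(Counter)
    history = deque(maxlen=window)  # the previous `window` tokens

    def observe(word, context):
        # Every word already buffered for `context` shares that context with `word`.
        for other in buffers[context]:
            if other != word:
                counts[word][other] += 1
                counts[other][word] += 1
        buffers[context].append(word)

    for word in tokens:
        for context in history:
            if context != word:
                observe(word, context)  # word seen near context
                observe(context, word)  # and the reverse direction
        history.append(word)
    return counts


def most_similar(counts, word, k=5):
    """Rank candidates by second-order count, normalised by total counts (an assumption)."""
    totals = {w: sum(c.values()) for w, c in counts.items()}
    scored = [
        (other, n / sqrt(totals[word] * totals[other]))
        for other, n in counts[word].items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


tokens = "the cat sat . the dog sat . the cat ran . the dog ran .".split()
counts = second_order_counts(tokens, window=1)
neighbours = most_similar(counts, "cat", k=3)
```

The bounded buffers are what keep the pass cheap in this sketch: each observation touches at most `buffer_size` entries, so no all-pairs similarity computation is needed. How the paper itself sizes or samples its buffers, and how it turns counts into similarity scores, is not stated in the abstract.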

Place, publisher, year, edition, pages
2018.
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:ri:diva-35186
OAI: oai:DiVA.org:ri-35186
DiVA, id: diva2:1249038
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Available from: 2018-09-18 Created: 2018-09-18 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Görnerup, Olof; Gillblad, Daniel
