Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity
RISE - Research Institutes of Sweden (2017-2019), ICT, SICS. ORCID iD: 0000-0001-5100-0535
RISE, Swedish ICT, SICS. ORCID iD: 0000-0003-4042-4919
2005 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.

Place, publisher, year, edition, pages
2005, 1.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-20963
OAI: oai:DiVA.org:ri-20963
DiVA, id: diva2:1040997
Conference
String Processing and Information Retrieval: 12th International Conference, SPIRE 2005, Buenos Aires, Argentina
Available from: 2016-10-31. Created: 2016-10-31. Last updated: 2020-12-02. Bibliographically approved.


Authors
Sahlgren, Magnus
Karlgren, Jussi
