Compound terms and their constituent elements in information retrieval
RISE, Swedish ICT, SICS. ORCID iD: 0000-0003-4042-4919
Number of Authors: 1
2005 (English)
Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Compounds pose challenges to simple text retrieval algorithms, especially in languages where compounds are formed by concatenation without intervening whitespace between elements. Search queries that include compounds may not retrieve texts where elements of those compounds occur in uncompounded form; search queries that lack compounds will not retrieve texts where the salient elements are buried inside compounds. This study explores the distributional characteristics of compounds and their constituent elements using Swedish, a compounding language, as a test case. The compounds studied are taken from experimental search topics given for CLEF, the Cross-Language Evaluation Forum. Their distributions are related to relevance assessments made on the collection under study and evaluated in terms of divergence from the expected random distribution over documents. The observations have direct ramifications for, e.g., query analysis and term weighting approaches in information retrieval system design.
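
The retrieval mismatch described in the abstract can be illustrated with a minimal sketch. It is not the method of the paper: the one-entry decompounding table `SPLITS`, the helper names, and the example strings are all hypothetical, and a real system would need a lexicon-based splitter and morphological analysis.

```python
# Hypothetical toy decompounding table (an assumption for illustration only):
# maps a Swedish compound to its constituent elements.
SPLITS = {"järnvägsstation": ["järnväg", "station"]}


def tokens(text):
    """Lowercase and split on whitespace (deliberately naive tokenization)."""
    return text.lower().split()


def expand(toks):
    """Keep each token and append its constituents if it is a known compound."""
    out = []
    for t in toks:
        out.append(t)
        out.extend(SPLITS.get(t, []))
    return out


def matches(query, doc, decompound=False):
    """Return True if the query and document share at least one term."""
    q, d = tokens(query), tokens(doc)
    if decompound:
        q, d = expand(q), expand(d)
    return bool(set(q) & set(d))


# A query lacking the compound misses a document that contains it ...
print(matches("station", "ny järnvägsstation i Kista"))                   # False
# ... unless the compound in the document is split into its elements.
print(matches("station", "ny järnvägsstation i Kista", decompound=True))  # True
# A compound query misses a document with the elements in uncompounded form ...
print(matches("järnvägsstation", "station vid järnväg"))                  # False
# ... unless the compound in the query is split.
print(matches("järnvägsstation", "station vid järnväg", decompound=True)) # True
```

Both failure directions from the abstract appear here: exact token matching fails whether the compound sits in the query or in the document, and splitting recovers the match in this toy case.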

Place, publisher, year, edition, pages
2005, 1.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-20956
OAI: oai:DiVA.org:ri-20956
DiVA, id: diva2:1040990
Conference
15th Nordic Conference of Computational Linguistics
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2020-12-02
Bibliographically approved

Author/editor
Karlgren, Jussi