Stylistic Experiments for Information Retrieval
Karlgren, Jussi
RISE, Swedish ICT, SICS. Stockholm University. ORCID iD: 0000-0003-4042-4919
Number of Authors: 1
2000 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Information retrieval systems are built to handle texts as topical items: texts are tabulated by the occurrence frequencies of their content words, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate whether stylistic information is distinguishable using simple language engineering methods and, if so, whether this type of information can be used to improve information retrieval systems. A first set of experiments shows that simple measures of stylistic variation can be used to distinguish genres from each other quite adequately; how well depends on what the genres in question are. A second set of experiments evaluates the utility of stylistic measures for the purposes of information retrieval, to identify common characteristics of relevant and non-relevant documents. The conclusion is that requests for information as typically expressed to retrieval systems are too terse and unspecific for non-topical information to improve retrieval results. Systems for information access need to be designed from the beginning to handle richer information about the texts and documents at hand: information about stylistic variation cannot easily be added to an existing system. A third set of experiments explores how an interactive system can be designed to incorporate stylistic information in the interface between user and system. These experiments resulted in the design of an interface for categorizing retrieval results by genre and displaying the retrieval results using this categorization. This interface is integrated into a prototype for retrieving information from the World Wide Web.

Place, publisher, year, edition, pages
Stockholm University, 2000, 1.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-20982
OAI: oai:DiVA.org:ri-20982
DiVA, id: diva2:1041016
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2020-12-02 Bibliographically approved

Open Access in DiVA

fulltext (4565 kB), 478 downloads
File information
File name: FULLTEXT01.pdf
File size: 4565 kB
Checksum SHA-512:
f4f53135ea94a51943f499a0ecd8f18cdbf88a48ae7e8a60fbbf23585e057a960b72aaea503fad3f901866e4ce114b6fbf3fe9bb6d74bfae59f9a470c850e429
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Karlgren, Jussi
By organisation
SICS
Computer and Information Sciences
