Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Natural language information retrieval: TREC-5 report
RISE, Swedish ICT, SICS.ORCID iD: 0000-0003-4042-4919
Show others and affiliations
Number of Authors: 9
1996 (English)In: NIST Special Publication 500-238: The Fifth Text REtrieval Conference (TREC-5), November 1996, Gaithersburg, Maryland, USA, 1996, 1, , 28 p.Conference paper, (Refereed)
Abstract [en]

In this paper we report on the joint GE/Lockheed Martin/Rutgers/NYU natural language information retrieval project as related to the 5th Text Retrieval Conference (TREC-5). The main thrust of this project is to use natural language processing techniques to enhance the effectiveness of full-text document retrieval. Since our first TREC entry in 1992 (as NYU team) the basic premise of our research was to demonstrate that robust if relatively shallow NLP can help to derive a better representation of text documents for statistical search. TREC-5 marks a shift in this approach away from text representation issues and towards query development problems. While our TREC-5 system still performs extensive text processing in order to extract phrasal and other indexing terms, our main focus this year was on query construction using words, sentences, and entire passages to expand initial topic specifications in an attempt to cover their various angles, aspects and contexts. Based on our earlier TREC results indicating that NLP is more effective when long, descriptive queries are used, we allowed for liberal expansion with long passages from related documents imported verbatim into the queries. This method appears to have produced a dramatic improvement in the performance of two different statistical search engines that we tested (Cornell’s SMART and NIST’s Prise) boosting the average precision by at least 40%. The overall architecture of TREC-5 system has also changed in a number of ways from TREC-4. The most notable new feature is the stream architecture in which several independent, parallel indexes are built for a given collection, each index reflecting a different representation strategy for text documents. Stream indexes are built using a mixture of different indexing approaches, term extracting, and weighting strategies. We used both SMART and Prise base indexing engines, and selected optimal term weighting strategies for each stream, based on a training collection of approximately 500 MBytes. The final results are produced by a merging procedure that combines ranked list of documents obtained by searching all stream indexes with appropriately preprocessed queries. This allows for an effective combination of alternative retrieval and filtering methods, creating into a meta-search where the contribution of each stream can be optimized through training.

Place, publisher, year, edition, pages
1996, 1. , 28 p.
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:ri:diva-22469OAI: oai:DiVA.org:ri-22469DiVA: diva2:1042034
Conference
Fifth Text REtrieval Conference (TREC-5)
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2017-01-02Bibliographically approved

Open Access in DiVA

No full text

Other links

http

Search in DiVA

By author/editor
Karlgren, Jussi
By organisation
SICS
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar

Total: 952 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
v. 2.26.0