Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Protein name tagging for browsing support, active database cross linking, and information retrieval
RISE., Swedish ICT, SICS.ORCID-id: 0000-0001-6949-6380
2002 (engelsk)Konferansepaper, Publicerat paper (Fagfellevurdert)
Abstract [en]

Whereas many applications of natural language processing for molecular biology focus on protein name tagging for the purpose high-level information extraction from large corpuses of scientific text, such as automatic identification of protein-protein interactions, high quality protein name tagging has a value in itself. The aim of this study was to design, implement, and evaluate a high-accuracy protein name tagger, and give proof-of-concept for some of the most basic applications of protein name tagging in an information retrieval setting, namely browsing support, active database cross linking, and enhanced query functionality. A combination of heuristics, dictionary look-up, syntactic analysis, and the application of a local dynamic dictionary were used to create a protein name tagger. This tagger outperforms a previously published similar system when benchmarked on a corpus of manually annotated Medline abstracts. In addition to evaluating the tagging performance, the implemented algorithm was used to add mark-up to a corpus of approximately 10000 Medline abstracts, which were indexed in a state-of-the-art information retrieval system. Indexing highlights many basic benets of adding named entity mark-up such as protein names. One obvious benet is that the search process is enhanced by the addition of a search eld. Furthermore, the mark-up can be used for providing active hyperlinks between protein entities in presented documents and protein sequence databases, such as SwissProt, when both databases are indexed in the same information retrieval system. Efficient links can also be constructed in the opposite direction providing high precision retrieval of documents relevant for protein entries. Fast and accurate cross linking can be obtained by using an efficient implementation of the eld based approximate cosine measure, which is a simple standard information retrieval technique for document similarity searching. This poster presents methods, results, implementation details, and features of a prototype system.

sted, utgiver, år, opplag, sider
2002, 2.
HSV kategori
Identifikatorer
URN: urn:nbn:se:ri:diva-22517OAI: oai:DiVA.org:ri-22517DiVA, id: diva2:1042082
Konferanse
Bioinformatics 2002
Merknad

Poster presentation.

Tilgjengelig fra: 2016-10-31 Laget: 2016-10-31 Sist oppdatert: 2020-12-02bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Søk i DiVA

Av forfatter/redaktør
Eriksson, Gunnar
Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric

urn-nbn
Totalt: 35 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
v. 2.41.0