Protein name tagging for browsing support, active database cross linking, and information retrieval
RISE, Swedish ICT, SICS. ORCID iD: 0000-0001-6949-6380
2002 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Whereas many applications of natural language processing for molecular biology focus on protein name tagging for the purpose of high-level information extraction from large corpora of scientific text, such as the automatic identification of protein-protein interactions, high-quality protein name tagging has value in itself. The aim of this study was to design, implement, and evaluate a high-accuracy protein name tagger, and to give a proof of concept for some of the most basic applications of protein name tagging in an information retrieval setting, namely browsing support, active database cross-linking, and enhanced query functionality. A combination of heuristics, dictionary look-up, syntactic analysis, and a local dynamic dictionary was used to create the protein name tagger. This tagger outperforms a previously published similar system when benchmarked on a corpus of manually annotated Medline abstracts. In addition to evaluating the tagging performance, we used the implemented algorithm to add mark-up to a corpus of approximately 10,000 Medline abstracts, which were then indexed in a state-of-the-art information retrieval system. The indexing highlights several basic benefits of adding named-entity mark-up, such as protein names. One obvious benefit is that the search process is enhanced by the addition of a search field. Furthermore, when both the abstract corpus and a protein sequence database such as SwissProt are indexed in the same information retrieval system, the mark-up can be used to provide active hyperlinks from protein entities in presented documents to the database. Efficient links can also be constructed in the opposite direction, providing high-precision retrieval of documents relevant to protein entries. Fast and accurate cross-linking can be obtained with an efficient implementation of the field-based approximate cosine measure, a simple, standard information retrieval technique for document similarity searching.
This poster presents methods, results, implementation details, and features of a prototype system.
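The abstract does not give the tagger's rules, its dictionaries, or the exact approximation behind the cosine measure, so the Python sketch below is only a toy illustration under stated assumptions. It shows two ideas the abstract names. The first is a two-pass tagger with a document-local dynamic dictionary: names found confidently in one place are also tagged where they appear bare elsewhere in the same abstract. The second is cross-linking by cosine similarity restricted to a single "protein" field. `GLOBAL_DICT`, `CONTEXT_CUES`, `NAME_SHAPE`, every function name, and the sample abstracts are hypothetical. The ranking step uses an exact TF-IDF cosine over one field, not the poster's approximate variant.

```python
import math
import re
from collections import Counter

# --- Tagging (toy sketch; dictionaries and rules are hypothetical) ---
GLOBAL_DICT = {"p53", "mdm2"}                     # stand-in for a protein-name dictionary
CONTEXT_CUES = {"protein", "kinase", "receptor"}  # words that signal a preceding name
NAME_SHAPE = re.compile(r"^[A-Za-z]+\d+[A-Za-z]?$")  # e.g. "RAD51", "BRCA2"


def tokenize(text):
    """Split text into alphanumeric/hyphen tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9-]+", text)


def tag_proteins(text):
    """Return tokens tagged as protein names, in document order."""
    tokens = tokenize(text)
    local_dict = set()  # document-local dynamic dictionary
    # Pass 1: confident hits (global dictionary, or a name-shaped token
    # followed by a context cue) are added to the local dictionary.
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if tok.lower() in GLOBAL_DICT or (NAME_SHAPE.match(tok) and nxt in CONTEXT_CUES):
            local_dict.add(tok.lower())
    # Pass 2: tag every occurrence of a locally known name, including bare
    # mentions elsewhere in the same document that had no context cue.
    return [tok for tok in tokens if tok.lower() in local_dict]


# --- Cross-linking via cosine similarity over one field ---
def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_field(query_terms, docs, field):
    """Rank docs by TF-IDF cosine between query_terms and docs[i][field]."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d[field]))

    def vec(terms):
        # Terms unseen in the collection get no weight (avoids log(n / 0)).
        tf = Counter(terms)
        return {t: c * math.log(n / df[t]) for t, c in tf.items() if df[t]}

    q = vec(query_terms)
    scored = [(cosine(q, vec(d[field])), d["id"]) for d in docs]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Usage: tag three made-up abstracts, store the tags in a "protein" field,
# then find abstracts relevant to a database entry named "rad51".
abstracts = {
    "doc1": "RAD51 protein binds BRCA2. RAD51 also interacts with p53.",
    "doc2": "p53 is regulated by MDM2.",
    "doc3": "BRCA2 protein levels were measured.",
}
docs = [{"id": k, "protein": [t.lower() for t in tag_proteins(v)]}
        for k, v in abstracts.items()]
ranked = rank_by_field(["rad51"], docs, "protein")
```

In `doc1`, the second, cue-less mention of RAD51 is tagged only because the first mention put it in the local dictionary. BRCA2 in the same abstract stays untagged because it never appears with a cue there, which shows both the benefit and the recall limit of this kind of heuristic. Restricting the vectors to the tagged field is what makes the link from a database entry to abstracts precise: words outside the protein mark-up do not contribute to the score.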

Place, publisher, year, edition, pages
2002, 2.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-22517
OAI: oai:DiVA.org:ri-22517
DiVA, id: diva2:1042082
Conference
Bioinformatics 2002
Note

Poster presentation.

Available from: 2016-10-31
Created: 2016-10-31
Last updated: 2020-12-02
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Author/editor
Eriksson, Gunnar