On Privacy Preservation and Document-based Active Learning for Named Entity Recognition
RISE, Swedish ICT, SICS. Userware.
Number of Authors: 1
2009 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The preservation of the privacy of persons mentioned in text requires the ability to automatically recognize and identify names. Named entity recognition is a mature field and most current approaches are based on supervised machine learning techniques. Such learning requires the presence of labeled examples on which to train; training examples are usually provided to the learner in the form of annotated corpora. Creating and annotating corpora is a tedious, meticulous and error-prone process; obtaining good training examples is a hard task in itself. This paper describes the development and in-depth empirical investigation of a method, called BootMark, for bootstrapping the marking up of named entities in textual documents. Experimental results show that BootMark requires a human annotator to manually annotate fewer documents in order to produce a named entity recognizer with a given performance than would be needed if the documents forming the basis for the recognizer were randomly drawn from the same corpus. The investigation further indicates that the primary gain obtained by BootMark compared to passive learning is in terms of higher recall. Thus, it is argued, the recognizers are suitable for use in privacy preservation applications.

Place, publisher, year, edition, pages
2009, 7.
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:ri:diva-23588
OAI: oai:DiVA.org:ri-23588
DiVA: diva2:1042664
Conference
ACM First International Workshop on Privacy and Anonymity for Very Large Datasets
Projects
COMPANIONS
Note
Workshop held in conjunction with The 18th ACM Conference on Information and Knowledge Management (CIKM 2009).
Available from: 2016-10-31. Created: 2016-10-31. Bibliographically approved.

Open Access in DiVA

No full text
