Tagging and Morphological Processing in the SVENSK System
RISE, Swedish ICT, SICS. Department of Linguistics.
Number of Authors: 1
1998 (English)
Independent thesis Advanced level (degree of Master (Two Years))
Student thesis
Abstract [en]

This thesis describes the work of providing separate morphological processing and part-of-speech tagging modules in the SVENSK system by integrating the Uppsala Chart Processor (UCP) and a Brill tagger into the system. SVENSK employs GATE (General Architecture for Text Engineering) as the platform in which the components are integrated. Two pre-processing modules, a tokeniser and a sentence splitter for Swedish, were developed to facilitate the preparation of the texts to be analysed by UCP and the Brill tagger. These four components were then integrated in GATE together with a newly developed viewer for displaying the results produced by UCP. The thesis introduces the reader to the SVENSK project, the GATE system and its underlying parts, especially the database architecture, which is based on the TIPSTER annotation model. Further, the issues connected with the design and development of the tokeniser and the sentence splitter for Swedish are elaborated on. The mechanisms behind transformation-based error-driven learning, as employed by the Brill tagger, are introduced, as are the principles of chart processing in general and UCP in particular. The greater part of the thesis is devoted to the process of integrating the natural language (NL) modules in GATE using the Tcl/Tk application programming interface (API) and a so-called loose coupling. The results of the integration of the NL modules are very encouraging: it is possible to mix modules written in programming languages from completely different paradigms (in this case Common LISP, Perl and C) and to have them interact with each other, thus maintaining a high degree of reuse of algorithmic resources. However, using Tcl/Tk and the associated API to process structurally relatively complex data, i.e. the output from UCP, is time-consuming and considerably slows processing in GATE.

Place, publisher, year, edition, pages
1998, 1., p. 110
Keywords [en]
gate, natural language engineering, software architecture
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-20990
OAI: oai:DiVA.org:ri-20990
DiVA, id: diva2:1041024
Projects
SVENSK
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-01-14 Bibliographically approved

Open Access in DiVA

No full text in DiVA
