Introducing the notion of ‘contrast’ features for language technology
RISE - Research Institutes of Sweden, ICT, SICS. ORCID iD: 0000-0002-5737-8149
Linköping University, Sweden.
RISE - Research Institutes of Sweden, ICT, SICS. Linköping University, Sweden.
2019 (English). In: International Conference on Database and Expert Systems Applications, DEXA 2019: Database and Expert Systems Applications, Springer Verlag, 2019, pp. 189-198. Conference paper, published paper (refereed)
Abstract [en]

In this paper, we explore whether there exist ‘contrast’ features that help recognize if a text variety is a genre or a domain. We carry out our experiments on the text varieties included in the Swedish national corpus, the Stockholm-Umeå Corpus (SUC), and build several text classification models based on text complexity features, grammatical features, bag-of-words features and word embeddings. Results show that text complexity features and grammatical features systematically perform better on genres than on domains. This indicates that these features can be used as ‘contrast’ features because, when the nature of a text category is in doubt, they help bring it to light.
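To make the idea of text complexity features concrete, here is a minimal sketch in Python. It is not the feature set used in the paper, which this record does not specify. The three surface measures (average sentence length, average word length, type-token ratio) and the names `complexity_features` and `contrast_gap` are assumptions chosen for illustration only.

```python
import re


def complexity_features(text: str) -> dict:
    """Compute a few surface-level text complexity features (illustrative only)."""
    # Crude sentence split on terminal punctuation; a real pipeline would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased runs of Unicode letters, so Swedish characters such as å, ä, ö are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words or not sentences:
        return {"avg_sentence_length": 0.0, "avg_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def contrast_gap(genre_accuracy: float, domain_accuracy: float) -> float:
    """Hypothetical helper, not a measure defined in the paper: how much better
    a feature set scores on a genre labelling than on a domain labelling."""
    return genre_accuracy - domain_accuracy


if __name__ == "__main__":
    print(complexity_features("Katten sover. Katten äter fisk."))
```

Under these assumptions, a feature set would behave as a ‘contrast’ feature if the gap stays positive across the text varieties tested. The accuracies themselves would come from whatever classifier is trained on the features, which this sketch leaves out.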

Place, publisher, year, edition, pages
Springer Verlag, 2019. pp. 189-198
Keywords [en]
Domain, Features, Genre, Supervised classification, Character recognition, Text processing, Bag of words, Language technology, Stockholm, Text classification models, Classification (of information)
National subject category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-39929
DOI: 10.1007/978-3-030-27684-3_24
Scopus ID: 2-s2.0-85071879330
OAI: oai:DiVA.org:ri-39929
DiVA, id: diva2:1361853
Conference
International Conference on Database and Expert Systems Applications DEXA 2019: Database and Expert Systems Applications
Note

 Funding text 1: Acknowledgements. This research was supported by E-care@home, a “SIDUS – Strong Distributed Research Environment” project, funded by the Swedish Knowledge Foundation [kk-stiftelsen, Diarienr: 20140217]. Project website: http://ecareathome.se/

Available from: 2019-10-17. Created: 2019-10-17. Last updated: 2019-10-17. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Person

Santini, Marina
