NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Emory University, USA; Amelia R&D, USA.
RISE Research Institutes of Sweden, Digital Systems, Data Science. University of California, USA. ORCID iD: 0000-0002-6032-6155
Westlake Institute for Advanced Study, USA.
Number of Authors: 125
2023 (English)
In: NEJLT Northern European Journal of Language Technology, ISSN 2000-1533, Vol. 9, no 1, p. 1-41
Article in journal (Refereed) Published
Abstract [en]

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.

Place, publisher, year, edition, pages
2023. Vol. 9, no 1, p. 1-41
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:ri:diva-67824
DOI: 10.3384/nejlt.2000-1533.2023.4725
OAI: oai:DiVA.org:ri-67824
DiVA, id: diva2:1812497
Available from: 2023-11-16 Created: 2023-11-16 Last updated: 2023-12-12 Bibliographically approved

Authority records

Kleyko, Denis
