1 - 23 of 23
  • 1.
    Amrhein, Chantal
    et al.
    Textshuttle, Switzerland; University of Zurich, Switzerland.
    Moghe, Nikita
    University of Edinburgh, UK.
    Guillou, Liane
    RISE Research Institutes of Sweden, Digital Systems, Industrial Systems.
    ACES: Translation Accuracy Challenge Sets at WMT 2023 (2023). In: Conference on Machine Translation: Proceedings / [ed] Barry Haddow, Tom Kocmi, Philipp Koehn & Christof Monz, Association for Computational Linguistics, 2023, p. 693-710, article id 194371. Conference paper (Refereed)
    Abstract [en]

    We benchmark the performance of segment-level metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples representing challenges from 68 phenomena and covering 146 language pairs. The phenomena range from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. For each metric, we provide a detailed profile of performance over a range of error categories as well as an overall ACES-Score for quick comparison. We also measure the incremental performance of the metrics submitted to both WMT 2023 and 2022. We find that 1) there is no clear winner among the metrics submitted to WMT 2023, and 2) performance change between the 2023 and 2022 versions of the metrics is highly variable. Our recommendations are similar to those from WMT 2022. Metric developers should focus on: building ensembles of metrics from different design families, developing metrics that pay more attention to the source and rely less on surface-level overlap, and carefully determining the influence of multilingual embeddings on MT evaluation.

  • 2.
    Bashir, Sarmad
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Industrial Systems. Mälardalen University, Sweden.
    Abbas, Muhammad
    RISE Research Institutes of Sweden, Digital Systems, Industrial Systems. Mälardalen University, Sweden.
    Saadatmand, Mehrdad
    RISE Research Institutes of Sweden, Digital Systems, Industrial Systems.
    Enoiu, Eduard
    Mälardalen University, Sweden.
    Bohlin, Markus
    Mälardalen University, Sweden.
    Lindberg, Pernilla
    Alstom, Sweden.
    Requirement or Not, That is the Question: A Case from the Railway Industry (2023). In: Lecture Notes in Computer Science, Volume 13975, Springer Science and Business Media Deutschland GmbH, 2023, p. 105-121. Conference paper (Refereed)
    Abstract [en]

    Requirements in tender documents are often mixed with other supporting information. Identifying requirements in large tender documents could aid the bidding process and help estimate the risk associated with the project. Manual identification of requirements in large documents is a resource-intensive activity that is prone to human error and limits scalability. This study compares various state-of-the-art approaches for requirements identification in an industrial context. For generalizability, we also present an evaluation on a real-world public dataset. We formulate requirement identification as a binary text classification problem. Various state-of-the-art classifiers based on traditional machine learning, deep learning, and few-shot learning are evaluated for requirements identification based on accuracy, precision, recall, and F1 score. Results from the evaluation show that the transformer-based BERT classifier performs best, with an average F1 score of 0.82 and 0.87 on the industrial and public datasets, respectively. Our results also confirm that few-shot classifiers can achieve comparable results, with an average F1 score of 0.76, from significantly fewer samples, i.e., only 20% of the data. There is little empirical evidence on the use of large language models and few-shot classifiers for requirements identification. This paper fills this gap by presenting an industrial empirical evaluation of state-of-the-art approaches for requirements identification in large tender documents. We also provide a running tool and a replication package for further experimentation to support future research in this area. © 2023, The Author(s)
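
    A minimal sketch of the binary formulation described above, using a TF-IDF representation with a linear SVM (one of the traditional machine-learning baselines the study covers); the sentences and labels are invented for illustration, not drawn from the paper's datasets:

    ```python
    # Hypothetical requirement-vs-other classifier; the data is illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = [
        "The system shall log every failed login attempt.",   # requirement
        "Chapter 3 describes the maintenance organisation.",  # other text
        "Trains must support regenerative braking.",          # requirement
        "See Appendix B for contact details.",                # other text
    ]
    labels = [1, 0, 1, 0]  # 1 = requirement, 0 = supporting information

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["The onboard unit shall report its position."]))
    ```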

  • 3.
    Buljan, Maja
    et al.
    University of Oslo, Norway.
    Nirve, Joakim
    RISE Research Institutes of Sweden. Uppsala University, Sweden.
    Oepen, Stephan
    University of Oslo, Norway.
    Øvrelid, Lilja
    University of Oslo, Norway.
    A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing (2022). In: Language resources and evaluation, ISSN 1574-020X, E-ISSN 1574-0218, Vol. 56, p. 1075-1102. Article in journal (Refereed)
    Abstract [en]

    We discuss methodological choices in diagnostic evaluation and error analysis in meaning representation parsing (MRP), i.e. mapping from natural language utterances to graph-based encodings of semantic structure. We expand on a pilot quantitative study in contrastive diagnostic evaluation, inspired by earlier work in syntactic dependency parsing, and propose a novel methodology for qualitative error analysis. This two-pronged study is performed using a selection of submissions, data, and evaluation tools featured in the 2019 shared task on MRP. Our aim is to devise methods for identifying strengths and weaknesses in different broad families of parsing techniques, as well as investigating the relations between specific parsing approaches, different meaning representation frameworks, and individual linguistic phenomena—by identifying and comparing common error patterns. Our preliminary empirical results suggest that the proposed methodologies can be meaningfully applied to parsing into graph-structured target representations, as a side-effect uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.

  • 4.
    Capshaw, Riley
    et al.
    Linköping University, Sweden.
    Blomqvist, Eva
    Linköping University, Sweden.
    Santini, Marina
    RISE Research Institutes of Sweden, Digital Systems, Prototyping Society.
    Alirezaie, Marjan
    Örebro University, Sweden.
    BERT is as Gentle as a Sledgehammer: Too Powerful or Too Blunt? It Depends on the Benchmark (2021). Conference paper (Other academic)
    Abstract [en]

    In this position statement, we wish to contribute to the discussion about how to assess quality and coverage of a model.

    We believe that BERT's prominence as a single-step pipeline for contextualization and classification highlights the need for benchmarks to evolve concurrently with models. Much recent work has touted BERT's raw power for solving natural language tasks, so we used a 12-layer uncased BERT pipeline with a linear classifier as a quick-and-dirty model to score well on the SemEval 2010 Task 8 dataset for relation classification between nominals. We initially expected there to be significant enough bias from BERT's training to influence downstream tasks, since it is well-known that biased training corpora can lead to biased language models (LMs). Gender bias is the most common example, where gender roles are codified within language models. To handle such training data bias, we took inspiration from work in the field of computer vision. Tang et al. (2020) mitigate human reporting bias over the labels of a scene graph generation task using a form of causal reasoning based on counterfactual analysis. They extract the total direct effect of the context image on the prediction task by "blanking out" detected objects, intuitively asking "What if these objects were not here?" If the system still predicts the same label, then the original prediction is likely caused by bias in some form. Our goal was to remove any effects from biases learned during BERT's pre-training, so we analyzed total effect (TE) instead. However, across several experimental configurations we found no noticeable effects from using TE analysis. One disappointing possibility was that BERT might be resistant to causal analysis due to its complexity. Another was that BERT is so powerful (or blunt?) that it can find unanticipated trends in its input, rendering any human-generated causal analysis of its predictions useless. We nearly concluded that what we expected to be delicate experimentation was more akin to trying to carve a masterpiece sculpture with a self-driven sledgehammer. We then found related work where BERT fooled humans by exploiting unexpected characteristics of a benchmark. When we used BERT to predict a relation for random words in the benchmark sentences, it guessed the same label as it would have for the corresponding marked entities roughly half of the time. Since the task had nineteen roughly-balanced labels, we expected much less consistency. This finding repeated across all pipeline configurations; BERT was treating the benchmark as a sequence classification task! Our final conclusion was that the benchmark is inadequate: all sentences appeared exactly once with exactly one pair of entities, so the task was equivalent to simply labeling each sentence. We passionately claim from our experience that the current trend of using larger and more complex LMs must include concurrent evolution of benchmarks. We as researchers need to be diligent in keeping our tools for measuring as sophisticated as the models being measured, as any scientific domain does.

  • 5.
    Carlsson, Fredrik
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Öhman, Joey
    Liu, Fangyu
    Verlinden, Severine
    Nivre, Joakim
    RISE Research Institutes of Sweden.
    Sahlgren, Magnus
    Fine-Grained Controllable Text Generation Using Non-Residual Prompting (2022). In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, p. 6837-6857. Conference paper (Refereed)
    Abstract [en]

    The introduction of immensely large Causal Language Models (CLMs) has rejuvenated interest in open-ended text generation. However, controlling the generative process for these Transformer-based models remains largely an unsolved problem. Earlier work has explored either plug-and-play decoding strategies, or more powerful but blunt approaches such as prompting. There hence currently exists a trade-off between fine-grained control and the capability for more expressive high-level instructions. To alleviate this trade-off, we propose an encoder-decoder architecture that enables intermediate text prompts at arbitrary time steps. We propose a resource-efficient method for converting a pre-trained CLM into this architecture, and demonstrate its potential in various experiments, including the novel task of contextualized word inclusion. Our method provides strong results in multiple experimental settings, proving itself to be both expressive and versatile.

  • 6.
    Danielsson, Benjamin
    et al.
    Linköping University, Sweden.
    Santini, Marina
    RISE Research Institutes of Sweden, Digital Systems, Prototyping Society.
    Lundberg, Peter
    Linköping University, Sweden.
    Al-Abasse, Yosef
    Linköping University, Sweden.
    Jönsson, Arne
    Linköping University, Sweden.
    Eneling, Emma
    Linköping University, Sweden.
    Stridsman, Magnus
    Linköping University, Sweden.
    Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GAN-BERT (2022). In: 2022 Language Resources and Evaluation Conference, LREC 2022, European Language Resources Association (ELRA), 2022, p. 5428-5435. Conference paper (Refereed)
    Abstract [en]

    In this paper, we compare the performance of two BERT-based text classifiers whose task is to classify patients (more precisely, their medical histories) as having or not having implant(s) in their body. One classifier is a fully-supervised BERT classifier. The other one is a semi-supervised GAN-BERT classifier. Both models are compared against a fully-supervised SVM classifier. Since fully-supervised classification is expensive in terms of data annotation, with the experiments presented in this paper, we investigate whether we can achieve a competitive performance with a semi-supervised classifier based only on a small amount of annotated data. Results are promising and show that the semi-supervised classifier has a competitive performance when compared with the fully-supervised classifier. © licensed under CC-BY-NC-4.0.

  • 7.
    Dhole, Kaustubh
    et al.
    Emory University, USA; Amelia R&D, USA.
    Kleyko, Denis
    RISE Research Institutes of Sweden, Digital Systems, Data Science. University of California, USA.
    Zhang, Yue
    Westlake Institute for Advanced Study, China.
    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation (2023). In: NEJLT Northern European Journal of Language Technology, ISSN 2000-1533, Vol. 9, no 1, p. 1-41. Article in journal (Refereed)
    Abstract [en]

    Data augmentation is an important method for evaluating the robustness of, and enhancing the diversity of, training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.
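
    To make the transformation/filter distinction concrete, here is a toy pair in the spirit of the framework; this is not NL-Augmenter's actual class interface, only an illustration of the two concepts it standardizes:

    ```python
    # Toy "transformation" (perturbs data) and "filter" (selects a data split).
    import random

    def butter_fingers(text: str, prob: float = 0.1, seed: int = 0) -> str:
        """Transformation: inject keyboard-neighbour typos with probability prob."""
        neighbours = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "s": "ad"}
        rng = random.Random(seed)
        out = []
        for ch in text:
            if ch in neighbours and rng.random() < prob:
                out.append(rng.choice(neighbours[ch]))
            else:
                out.append(ch)
        return "".join(out)

    def short_sentence_filter(text: str, max_tokens: int = 10) -> bool:
        """Filter: keep only examples with a small whitespace token count."""
        return len(text.split()) <= max_tokens

    print(butter_fingers("data augmentation is useful", prob=0.3))
    print(short_sentence_filter("data augmentation is useful"))  # True
    ```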

  • 8.
    Dong, Guojun
    et al.
    University of Copenhagen, Denmark.
    Bate, Andrew
    GSK, United Kingdom; London School of Hygiene and Tropical Medicine, United Kingdom.
    Haguinet, François
    GSK, United Kingdom.
    Westman, Gabriel
    Uppsala University, Sweden.
    Dürlich, Luise
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    Hviid, Anders
    University of Copenhagen, Denmark; Statens Serum Institut, Denmark.
    Sessa, Maurizio
    University of Copenhagen, Denmark.
    Optimizing Signal Management in a Vaccine Adverse Event Reporting System: A Proof-of-Concept with COVID-19 Vaccines Using Signs, Symptoms, and Natural Language Processing (2024). In: Drug Safety, ISSN 0114-5916, E-ISSN 1179-1942, Vol. 47, no 2, p. 173-. Article in journal (Refereed)
    Abstract [en]

    Introduction: The Vaccine Adverse Event Reporting System (VAERS) has already been challenged by an extreme increase in the number of individual case safety reports (ICSRs) after the market introduction of coronavirus disease 2019 (COVID-19) vaccines. Evidence from scientific literature suggests that when there is an extreme increase in the number of ICSRs recorded in spontaneous reporting databases (such as the VAERS), an accompanying increase in the number of disproportionality signals (sometimes referred to as ‘statistical alerts’) generated is expected. Objectives: The objective of this study was to develop a natural language processing (NLP)-based approach to optimize signal management by excluding disproportionality signals related to listed adverse events following immunization (AEFIs). COVID-19 vaccines were used as a proof-of-concept. Methods: The VAERS was used as a data source, and the Finding Associated Concepts with Text Analysis (FACTA+) was used to extract signs and symptoms of listed AEFIs from MEDLINE for COVID-19 vaccines. Disproportionality analyses were conducted according to guidelines and recommendations provided by the US Centers for Disease Control and Prevention. By using signs and symptoms of listed AEFIs, we computed the proportion of disproportionality signals dismissed for COVID-19 vaccines using this approach. Nine NLP techniques, including Generative Pre-Trained Transformer 3.5 (GPT-3.5), were used to automatically retrieve Medical Dictionary for Regulatory Activities Preferred Terms (MedDRA PTs) from signs and symptoms extracted from FACTA+. Results: Overall, 17% of disproportionality signals for COVID-19 vaccines were dismissed as they reported signs and symptoms of listed AEFIs. Eight of nine NLP techniques used to automatically retrieve MedDRA PTs from signs and symptoms extracted from FACTA+ showed suboptimal performance. GPT-3.5 achieved an accuracy of 78% in correctly assigning MedDRA PTs. Conclusion: Our approach reduced the need for manual exclusion of disproportionality signals related to listed AEFIs and may lead to better optimization of time and resources in signal management. © 2023, The Author(s).

  • 9.
    Dürlich, Luise
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    Gogoulou, Evangelia
    RISE Research Institutes of Sweden, Digital Systems, Data Science. KTH Royal Institute of Technology, Sweden.
    Nivre, Joakim
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    On the Concept of Resource-Efficiency in NLP (2023). In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 2023, p. 135-145. Conference paper (Refereed)
    Abstract [en]

    Resource-efficiency is a growing concern in the NLP community. But what are the resources we care about and why? How do we measure efficiency in a way that is reliable and relevant? And how do we balance efficiency and other important concerns? Based on a review of the emerging literature on the subject, we discuss different ways of conceptualizing efficiency in terms of product and cost, using a simple case study on fine-tuning and knowledge distillation for illustration. We propose a novel metric of amortized efficiency that is better suited for life-cycle analysis than existing metrics.
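
    One plausible reading of amortized efficiency, sketched as code: the one-off training cost is spread over the model's expected lifetime inferences. The formula is our illustrative reconstruction of the idea, not necessarily the paper's exact metric:

    ```python
    # Hedged sketch: efficiency as product (e.g. accuracy) per unit of
    # amortized cost, where training cost is spread over lifetime use.
    def amortized_efficiency(product: float,
                             training_cost: float,
                             inference_cost: float,
                             n_inferences: int) -> float:
        cost_per_use = training_cost / n_inferences + inference_cost
        return product / cost_per_use

    # A distilled model: extra one-off cost, cheaper per query.
    print(amortized_efficiency(product=0.90, training_cost=1000.0,
                               inference_cost=0.010, n_inferences=1_000_000))
    print(amortized_efficiency(product=0.88, training_cost=1200.0,
                               inference_cost=0.002, n_inferences=1_000_000))
    ```

    Under this reading, a model with a higher one-off training (or distillation) cost can still come out ahead once the number of lifetime inferences grows large, which is what makes the metric suited to life-cycle analysis.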

  • 10.
    Dürlich, Luise
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    Reimann, Sebastian
    Uppsala University, Sweden; Ruhr-Universität Bochum, Germany.
    Finnveden, Gustav
    Uppsala University, Sweden.
    Nivre, Joakim
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    Stymne, Sara
    Uppsala University, Sweden.
    Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish (2022). In: Proceedings of the First Workshop on Natural Language Processing for Political Sciences (PoliticalNLP), Marseille, France, 24 June 2022, 2022, p. 46-55. Conference paper (Refereed)
    Abstract [en]

    Causality detection is the task of extracting information about causal relations from text. It is an important task for different types of document analysis, including political impact assessment. We present two new data sets for causality detection in Swedish. The first data set is annotated with binary relevance judgments, indicating whether a sentence contains causality information or not. In the second data set, sentence pairs are ranked for relevance with respect to a causality query, containing a specific hypothesized cause and/or effect. Both data sets are carefully curated and mainly intended for use as test data. We describe the data sets and their annotation, including detailed annotation guidelines. In addition, we present pilot experiments on cross-lingual zero-shot and few-shot causality detection, using training data from English and German.

  • 11.
    Ekgren, A.
    et al.
    AI Sweden, Sweden.
    Gyllensten, AC
    RISE Research Institutes of Sweden.
    Gogoulou, Evangelia
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Heiman, A.
    AI Sweden, Sweden.
    Verlinden, S.
    AI Sweden, Sweden.
    Öhman, J.
    AI Sweden, Sweden.
    Carlsson, Fredrik
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Sahlgren, M.
    AI Sweden, Sweden.
    Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish (2022). In: 2022 Language Resources and Evaluation Conference, LREC 2022, European Language Resources Association (ELRA), 2022, p. 3509-3518. Conference paper (Refereed)
    Abstract [en]

    We present GPT-SW3, a 3.5 billion parameter autoregressive language model trained on a newly created 100 GB Swedish corpus. This paper provides insights into the data collection and training process, and discusses the challenges of proper evaluation. The results of quantitative evaluation using perplexity indicate that GPT-SW3 is a competent model in comparison with existing autoregressive models of similar size. Additionally, we perform an extensive prompting study which reveals the good text generation capabilities of GPT-SW3. © licensed under CC-BY-NC-4.0.
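
    A minimal sketch of the perplexity evaluation mentioned above, using Hugging Face transformers; "gpt2" is a placeholder checkpoint standing in for GPT-SW3:

    ```python
    # Perplexity of an autoregressive LM on one text; model is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # substitute a Swedish causal LM checkpoint
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    text = "Stockholm är Sveriges huvudstad."
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        # labels=input_ids makes the model return average cross-entropy loss
        loss = model(**enc, labels=enc["input_ids"]).loss
    print("perplexity:", torch.exp(loss).item())
    ```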

  • 12.
    Gyllensten, Amaru Cuba
    et al.
    RISE Research Institutes of Sweden.
    Gogoulou, Evangelia
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Ekgren, Ariel
    RISE Research Institutes of Sweden.
    Sahlgren, Magnus
    RISE Research Institutes of Sweden.
    SenseCluster at SemEval-2020 Task 1: Unsupervised lexical semantic change detection (2020). In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, p. 112-118. Conference paper (Refereed)
  • 13.
    Görnerup, Olof
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Gillblad, Daniel
    RISE - Research Institutes of Sweden, ICT, SICS.
    Streaming word similarity mining on the cheap (2018). Conference paper (Other academic)
    Abstract [en]

    Accurately and efficiently estimating word similarities from text is fundamental in natural language processing. In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. The method rests on the observation that words that are highly correlated with respect to such counts are also highly similar with respect to first-order co-occurrences. Using buffers of co-occurred words per word to count second-order co-occurrences, we can then estimate similarities in a single pass over data without having to do prohibitively expensive similarity calculations. We demonstrate that this approach is scalable, converges rapidly, behaves robustly under parameter changes, and that it captures word similarities on par with those given by state-of-the-art word embeddings.
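
    A toy single-pass version of the buffering idea: each word keeps a bounded buffer of its co-occurring words, and two words are compared via the cosine of their buffered co-occurrence counts. Buffer size, window, and the exact comparison are illustrative choices, not the paper's algorithm:

    ```python
    # Single pass over a token stream; bounded per-word co-occurrence buffers.
    from collections import Counter, defaultdict, deque

    BUFFER_SIZE = 50
    WINDOW = 2
    buffers = defaultdict(lambda: deque(maxlen=BUFFER_SIZE))

    def consume(tokens):
        """Update each word's buffer with its windowed co-occurrences."""
        for i, w in enumerate(tokens):
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j != i:
                    buffers[w].append(tokens[j])

    def similarity(w1, w2):
        """Cosine similarity between buffered co-occurrence count vectors."""
        c1, c2 = Counter(buffers[w1]), Counter(buffers[w2])
        dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
        norm = (sum(v * v for v in c1.values()) ** 0.5) * \
               (sum(v * v for v in c2.values()) ** 0.5)
        return dot / norm if norm else 0.0

    consume("the cat sat on the mat the dog sat on the rug".split())
    print(similarity("cat", "dog"))
    ```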

  • 14.
    Jerdhaf, Oskar
    et al.
    Linköping University, Sweden.
    Santini, Marina
    RISE Research Institutes of Sweden, Digital Systems, Prototyping Society.
    Lundberg, Peter
    Linköping University, Sweden.
    Karlsson, Anette
    Linköping University, Sweden.
    Jönsson, Arne
    Linköping University, Sweden.
    Implant Terms: Focused Terminology Extraction with Swedish BERT - Preliminary Results (2020). Conference paper (Refereed)
    Abstract [en]

    Certain implants are imperative to detect before MRI scans. However, implant terms, like 'pacemaker' or 'stent', are sparse and difficult to identify in noisy and hastily written electronic medical records (EMRs). In this paper, we explore how to discover implant terms in Swedish EMRs with an unsupervised approach. To this purpose, we use BERT, a state-of-the-art deep learning algorithm, and fine-tune a model built on pre-trained Swedish BERT. We observe that BERT discovers a solid proportion of indicative implant terms.
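
    One simple way to probe a (fine-tuned) BERT model for candidate terms is masked-token prediction over a template; the English model and template below are placeholders for the Swedish clinical setup used in the paper:

    ```python
    # Masked-token probing for candidate implant terms; model and template
    # are illustrative placeholders, not the paper's Swedish clinical setup.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for cand in fill("The patient has a [MASK] implanted in the chest.", top_k=5):
        print(cand["token_str"], round(cand["score"], 3))
    ```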

  • 15.
    Kulmizev, Artur
    et al.
    Uppsala University, Sweden.
    Nivre, Joakim
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    Schrödinger's tree—On syntax and neural language models (2022). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 5, article id 796788. Article in journal (Refereed)
    Abstract [en]

    In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax—the study of the hierarchical structure of language—have factored heavily into such work, shedding invaluable insights about models' inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers to make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context. 

  • 16.
    Lenci, Alessandro
    et al.
    Università di Pisa, Italy.
    Sahlgren, Magnus
    AI Sweden, Sweden.
    Jeuniaux, Patrick
    Institut National de Criminalistique et de Criminologie, Belgium.
    Cuba Gyllensten, Amaru
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Miliani, Martina
    Università per Stranieri di Siena, Italy; Università di Pisa, Italy.
    A comparative evaluation and analysis of three generations of Distributional Semantic Models (2022). In: Language resources and evaluation, ISSN 1574-020X, E-ISSN 1574-0218, Vol. 56, p. 1219-. Article in journal (Refereed)
    Abstract [en]

    Distributional semantics has deeply changed in the last decades. First, predict models stole the thunder from traditional count ones, and more recently both of them were replaced in many NLP applications by contextualized vectors produced by neural language models. Although an extensive body of research has been devoted to Distributional Semantic Model (DSM) evaluation, we still lack a thorough comparison with respect to tested models, semantic tasks, and benchmark datasets. Moreover, previous work has mostly focused on task-driven evaluation, instead of exploring the differences between the way models represent the lexical semantic space. In this paper, we perform a large-scale evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT. First of all, we investigate the performance of embeddings in several semantic tasks, carrying out an in-depth statistical analysis to identify the major factors influencing the behavior of DSMs. The results show that (i) the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous and (ii) static DSMs surpass BERT representations in most out-of-context semantic tasks and datasets. Furthermore, we borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models. RSA reveals important differences related to the frequency and part-of-speech of lexical items. © 2022, The Author(s).
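
    A sketch of how a type-level vector can be obtained from BERT by averaging its contextualized vectors over occurrences, as done for the BERT-based DSM in the paper; the sentences are illustrative and the target word is assumed to be a single WordPiece token:

    ```python
    # Average a word's contextualized BERT vectors into one "type" vector.
    # Assumes the target word is a single WordPiece token (e.g. "bank").
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def type_vector(word, sentences):
        vecs = []
        for s in sentences:
            enc = tok(s, return_tensors="pt")
            with torch.no_grad():
                hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
            tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
            vecs += [hidden[i] for i, t in enumerate(tokens) if t == word]
        return torch.stack(vecs).mean(dim=0)

    v = type_vector("bank", ["He sat on the river bank.",
                             "The bank approved the loan."])
    print(v.shape)  # torch.Size([768])
    ```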

  • 17.
    Lindqvist, Ellinor
    et al.
    Uppsala University, Sweden.
    Pettersson, Eva
    Uppsala University, Sweden.
    Nivre, Joakim
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Uppsala University, Sweden.
    To the Most Gracious Highness, from Your Humble Servant: Analysing Swedish 18th Century Petitions Using Text Classification (2022). In: Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2022, p. 53-64. Conference paper (Refereed)
    Abstract [en]

    Petitions are a rich historical source, yet they have been relatively little used in historical research. In this paper, we aim to analyse Swedish texts from around the 18th century, and petitions in particular, using automatic means of text classification. We also test how text pre-processing and different feature representations affect the result, and we examine feature importance for our main class of interest – petitions. Our experiments show that the statistical algorithms NB, RF, SVM, and kNN are indeed well able to classify different genres of historical text. Further, we find that normalisation has a positive impact on classification, and that content words are particularly informative for the traditional models. A fine-tuned BERT model, fed with normalised data, outperforms all other classification experiments with a macro average F1 score of 98.8. However, less computationally expensive methods, combining word2vec or fastText embeddings, or even TF-IDF values, with an SVM classifier, also show good results for both unnormalised and normalised data. In the feature importance analysis, where we obtain the features most decisive for the classification models, we find highly relevant characteristics of the petitions, namely words expressing signs of someone inferior addressing someone superior.
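
    A sketch of the kind of feature-importance reading described above: with a linear model over TF-IDF features, the largest positive coefficients point to the words that pull a text toward the petition class. The miniature data set is invented:

    ```python
    # Linear-SVM coefficients as feature importance; data is illustrative.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    texts = ["most humble servant begs your gracious highness",
             "the court finds the defendant guilty of theft",
             "your most gracious majesty i humbly beseech",
             "inventory of the estate of the deceased farmer"]
    labels = [1, 0, 1, 0]  # 1 = petition, 0 = other genre

    vec = TfidfVectorizer()
    X = vec.fit_transform(texts)
    clf = LinearSVC().fit(X, labels)

    top = np.argsort(clf.coef_[0])[::-1][:5]  # strongest petition indicators
    print(vec.get_feature_names_out()[top])
    ```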

  • 18.
    Löwenmark, Karl
    et al.
    Luleå University of Technology, Sweden.
    Taal, Cees
    SKF Research & Technology Development, Sweden.
    Nivre, Joakim
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Liwicki, Marcus
    Luleå University of Technology, Sweden.
    Sandin, Fredrik
    Luleå University of Technology, Sweden.
    Processing of Condition Monitoring Annotations with BERT and Technical Language Processing (2022). Conference paper (Refereed)
    Abstract [en]

    Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic due to out-of-vocabulary (OOV) technical terms, resulting in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on word embeddings with the objective of recreating the output of a keyword-based annotation classifier. The K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters through technical language substitution, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that substituting OOV technical terms can improve the representation accuracy of the embeddings of the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
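
    A sketch of the substitution experiment, with silhouette score standing in for the clustering score; the term dictionary, annotations, and encoder checkpoint are illustrative:

    ```python
    # Substitute OOV technical terms before embedding, then compare clustering.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    substitutions = {"BPFO": "outer race bearing fault frequency",
                     "DE": "drive end"}

    annotations = ["BPFO peak at DE sensor",
                   "high vibration on drive end",
                   "replaced outer race bearing",
                   "BPFO amplitude increasing",
                   "motor running normally",
                   "no anomaly found at inspection"]

    def substitute(text):
        for term, desc in substitutions.items():
            text = text.replace(term, desc)
        return text

    model = SentenceTransformer("all-MiniLM-L6-v2")
    for name, batch in [("raw", annotations),
                        ("substituted", [substitute(a) for a in annotations])]:
        emb = model.encode(batch)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)
        print(name, silhouette_score(emb, km.labels_))
    ```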

  • 19.
    Martinsson, John
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Data Science. Lund University, Sweden.
    Willbo, Martin
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Pirinen, Aleksis
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Mogren, Olof
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    Sandsten, Maria
    Lund University, Sweden.
    Few-shot bioacoustic event detection using a prototypical network ensemble with adaptive embedding functions (2022). Conference paper (Refereed)
    Abstract [en]

    In this report we present our method for the DCASE 2022 challenge on few-shot bioacoustic event detection. We use an ensemble of prototypical neural networks with adaptive embedding functions and show that both ensemble and adaptive embedding functions can be used to improve results from an average F-score of 41.3% to an average F-score of 60.0% on the validation dataset.
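
    The core prototypical-network step in a few lines: class prototypes are the mean embeddings of the support examples, and a query is labelled by its nearest prototype. Random vectors stand in for the adaptive embedding functions:

    ```python
    # Prototypical classification; embeddings here are random stand-ins.
    import torch

    def classify(query_emb, support_embs, support_labels, n_classes):
        """Label a query by distance to class prototypes (mean support embeddings)."""
        prototypes = torch.stack([
            support_embs[support_labels == c].mean(dim=0) for c in range(n_classes)
        ])
        dists = ((prototypes - query_emb) ** 2).sum(dim=1)  # squared Euclidean
        return int(dists.argmin())

    support = torch.randn(10, 16)              # 10 support clips, 16-d embeddings
    labels = torch.tensor([0] * 5 + [1] * 5)   # 0 = target event, 1 = background
    query = torch.randn(16)
    print(classify(query, support, labels, n_classes=2))
    ```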

  • 20.
    Muram, Faiz Ul
    et al.
    Linnaeus University, Sweden.
    Javed, Muhammad Atif
    RISE Research Institutes of Sweden, Digital Systems, Data Science.
    ATTEST: Automating the review and update of assurance case arguments (2023). In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 134, article id 102781. Article in journal (Refereed)
    Abstract [en]

    The assurance case arguments are created to demonstrate acceptable system safety and/or security. In this regard, a series of propositions expressed by natural language statements (claims) are broken down into sub-claims representing a logical chain of reasoning until the corresponding evidence is obtained. The review and update of assurance arguments for aligning with the process and product counterparts used for their construction are essential tasks. These tasks are perceived as challenging but can be efficiently supported by using Natural Language Processing (NLP). To date, however, the published studies on assurance cases have not leveraged the NLP. Accordingly, this paper presents our NLP-based assurance framework called ATTEST. At first, the text preprocessing is carried out by using NLP tasks. The rules are created, in which both syntactic and semantic features are captured. The former is captured by using NLP tasks, while the latter is captured by the internal structure of models as well as the mappings across them. The created rules are triggered for argument comprehension, well-formedness, sufficiency checks, and identifying defeaters and counter-evidence selection. Besides the process, product, and assurance case models produced during the design and development phase, the operational data is gathered from the configured simulation environments and used for identifying problems as well as the measures for resolving them. Finally, the affected parts of assurance case models are highlighted and the underlying reasoning for their adaptation is presented. The applicability of the proposed framework is demonstrated by reviewing and updating assurance cases constructed for vehicular Accelerator Control System (ACS) with Electronic Throttle Control (ETC). © 2022 The Author(s)

  • 21.
    Rennes, Evelina
    et al.
    Linköping University, Sweden.
    Santini, Marina
    RISE Research Institutes of Sweden, Digital Systems, Prototyping Society.
    Jönsson, Arne
    Linköping University, Sweden.
    The Swedish Simplification Toolkit: Designed with Target Audiences in Mind (2022). In: 2nd Workshop on Tools and Resources for REAding DIfficulties, READI 2022 - collocated with the International Conference on Language Resources and Evaluation Conference, LREC 2022, European Language Resources Association (ELRA), 2022, p. 31-38. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present the current version of The Swedish Simplification Toolkit. The toolkit includes computational and empirical tools that have been developed over the years to explore a still neglected area of NLP, namely the simplification of "standard" texts to meet the needs of target audiences. Target audiences, such as people affected by dyslexia, aphasia, autism, but also children and second language learners, require different types of text simplification and adaptation. For example, while individuals with aphasia have difficulties in reading compounds (such as arbetsmarknadsdepartement, eng. ministry of employment), second language learners struggle with cultural-specific vocabulary (e.g. konflikträdd, eng. afraid of conflicts). The toolkit allows users to select the types of simplification that meet the specific needs of the target audience they belong to. The Swedish Simplification Toolkit is one of the first attempts to overcome the one-fits-all approach that is still dominant in Automatic Text Simplification, and proposes a set of computational methods that, used individually or in combination, may help individuals reduce reading (and writing) difficulties.

  • 22.
    Santini, Marina
    et al.
    RISE Research Institutes of Sweden, Digital Systems, Prototyping Society.
    Rennes, Evelina
    Linköping University, Sweden.
    Holmer, Daniel
    Linköping University, Sweden.
    Jönsson, Arne
    Linköping University, Sweden.
    Human-in-the-Loop: Where Does Text Complexity Lie? (2021). Conference paper (Other academic)
    Abstract [en]

    In this position statement, we would like to contribute to the discussion about how to assess quality and coverage of a model. In this context, we verbalize the need of linguistic features’ interpretability and the need of profiling textual variations. These needs are triggered by the necessity to gain insights into intricate patterns of human communication. Arguably, the functional and linguistic interpretation of these communication patterns contribute to keep humans’ needs in the loop, thus demoting the myth of powerful but dehumanized Artificial Intelligence. The desideratum to open up the “black boxes” of AI-based machines has become compelling. Recent research has focussed on how to make sense and popularize deep learning models and has explored how to “probe” these models to understand how they learn. The BERTology science is actively and diligently digging into BERT’s complex clockwork. However, much remains to be unearthed: “BERTology has clearly come a long way, but it is fair to say we still have more questions than answers about how BERT works”. It is therefore not surprising that add-on tools are being created to inspect pre-trained language models with the aim to cast some light on the “interpretation of pre-trained models in the context of downstream tasks and domain-specific data”. Here we do not propose any new tool, but we try to formulate and exemplify the problem by taking the case of text simplification/text complexity. When we compare a standard text and an easy-to-read text (e.g. lättsvenska or simple English) we wonder: where does text complexity lie? Can we pin it down? According to Simple English Wikipedia, “(s)imple English is similar to English, but it only uses basic words. We suggest that articles should use only the 1,000 most common and basic words in English. They should also use only simple grammar and shorter sentences.” This characterization of a simplified text does not provide much linguistic insight: what is meant by simple grammar? Linguistic insights are also missing from state-of-the-art NLP models for text simplification, since these models are basically monolingual neural machine translation systems that take a standard text and “translate” it into a simplified type of (sub)language. We do not gain any linguistic understanding, of what is being simplified and why. We just get the task done (which is of course good). We know for sure that standard and easy-to-read texts differ in a number of ways and we are able to use BERT to create classifiers that discriminate the two varieties. But how are linguistic features re-shuffled to generate a simplified text from a standard one? With traditional statistical approaches, such as Biber’s MDA (based on factor analysis) we get an idea of how linguistic features co-occur and interact in different text types and why. Since pre-trained language models are more powerful than traditional statistical models, like factor analysis, we would like to see more research on "disclosing the layers" so that we can understand how different co-occurrence of linguistic features contribute to the make up of specific varieties of texts, like simplified vs standard texts. Would it be possible to update the iconic example

  • 23.
    Santini, Marina
    et al.
    RISE - Research Institutes of Sweden (2017-2019), ICT, SICS.
    Strandqvist, Wiktor
    Jönsson, Arne
    RISE - Research Institutes of Sweden (2017-2019), ICT, SICS.
    Profiling specialized web corpus qualities: A progress report on "Domainhood" (2019). In: Argentinian Journal of Applied Linguistics, ISSN 2314-3576, Vol. 7, no 7. Article in journal (Refereed)