Publications (10 of 20)
Danielsson, B., Santini, M., Lundberg, P., Al-Abasse, Y., Jönsson, A., Eneling, E. & Stridsman, M. (2022). Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GAN-BERT. In: 2022 Language Resources and Evaluation Conference, LREC 2022. Paper presented at the 13th International Conference on Language Resources and Evaluation, LREC 2022, 20 June 2022 through 25 June 2022 (pp. 5428-5435). European Language Resources Association (ELRA)
Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GAN-BERT
2022 (English). In: 2022 Language Resources and Evaluation Conference, LREC 2022, European Language Resources Association (ELRA), 2022, p. 5428-5435. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we compare the performance of two BERT-based text classifiers whose task is to classify patients (more precisely, their medical histories) as having or not having implant(s) in their body. One classifier is a fully-supervised BERT classifier. The other one is a semi-supervised GAN-BERT classifier. Both models are compared against a fully-supervised SVM classifier. Since fully-supervised classification is expensive in terms of data annotation, with the experiments presented in this paper we investigate whether we can achieve a competitive performance with a semi-supervised classifier based only on a small amount of annotated data. Results are promising and show that the semi-supervised classifier performs competitively with the fully-supervised one.
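
As an illustration of the semi-supervised idea described in this abstract, the following is a minimal PyTorch sketch of a GAN-BERT-style setup: a generator fabricates BERT-like representations, and a discriminator classifies over the real classes plus one extra "fake" class. The model checkpoint (KB/bert-base-swedish-cased), dimensions, and toy sentences are assumptions for illustration, not the authors' actual configuration.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Generator(nn.Module):
    """Maps random noise to fake 'BERT-like' sentence representations."""
    def __init__(self, noise_dim: int = 100, hidden: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, hidden),
                                 nn.LeakyReLU(0.2),
                                 nn.Linear(hidden, hidden))

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores representations over k real classes plus one 'fake' class."""
    def __init__(self, hidden: int = 768, num_labels: int = 2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(hidden, hidden), nn.LeakyReLU(0.2))
        self.head = nn.Linear(hidden, num_labels + 1)  # extra logit = fake

    def forward(self, h):
        return self.head(self.body(h))

tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
bert = AutoModel.from_pretrained("KB/bert-base-swedish-cased")

texts = ["Patienten har pacemaker.", "Inga anmärkningar."]  # toy EMR snippets
batch = tokenizer(texts, padding=True, return_tensors="pt")
cls = bert(**batch).last_hidden_state[:, 0]  # [CLS] vectors

G, D = Generator(), Discriminator()
real_logits = D(cls)
fake_logits = D(G(torch.randn(len(texts), 100)))
# Training loop (omitted): labeled examples get a supervised loss over the k
# real classes; unlabeled examples are only pushed away from the 'fake' class,
# which is how a GAN-BERT setup exploits unannotated medical histories.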

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2022
Keywords
BERT, clinical text mining, electronic medical records, EMR, GAN-BERT, text classification, Classification (of information), Data mining, Medical computing, Text processing, Electronic medical record, Medical record, Semi-supervised, Supervised classifiers, Text-mining, Support vector machines
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:ri:diva-62365 (URN); 2-s2.0-85144479096 (Scopus ID); 9791095546726 (ISBN)
Conference
13th International Conference on Language Resources and Evaluation, LREC 2022, 20 June 2022 through 25 June 2022
Note

Funding details: VINNOVA, 2021-01699. Funding text: This research was funded by Vinnova (Sweden's innovation agency), https://www.vinnova.se/. Project title: Patient-Safe Magnetic Resonance Imaging Examination by AI-based Medical Screening. Grant number: 2021-01699 to Peter Lundberg.

Available from: 2023-01-24. Created: 2023-01-24. Last updated: 2023-01-24. Bibliographically approved
Brännvall, R., Forsgren, H., Linge, H., Santini, M., Salehi, A. & Rahimian, F. (2022). Homomorphic encryption enables private data sharing for digital health: Winning entry to the Vinnova innovation competition Vinter 2021-22. In: 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022. Paper presented at the 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, 13 June 2022 through 14 June 2022. Institute of Electrical and Electronics Engineers Inc.
Homomorphic encryption enables private data sharing for digital health: Winning entry to the Vinnova innovation competition Vinter 2021-22
2022 (English). In: 34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, Institute of Electrical and Electronics Engineers Inc., 2022. Conference paper, Published paper (Refereed)
Abstract [en]

People living with type 1 diabetes often use several apps and devices that help them collect and analyse data for better monitoring and management of their disease. When such health-related data are analysed in the cloud, one must always carefully consider privacy protection and adhere to laws regulating the use of personal data. In this paper we present our experience at the pilot Vinter competition 2021-22 organised by Vinnova. The competition focused on digital services that handle sensitive diabetes-related data. The architecture that we proposed for the competition is discussed in the context of a hypothetical cloud-based service that calculates diabetes self-care metrics under strong privacy preservation. It is based on Fully Homomorphic Encryption (FHE), a technology that makes computation on encrypted data possible. Our solution promotes safe key management and data life-cycle control. Our benchmarking experiment demonstrates execution times that scale well for the implementation of personalised health services. We argue that this technology has great potential for AI-based health applications, opens up new markets for third-party providers of such services, and will ultimately promote patient health and a trustworthy digital society.
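
The competition entry itself is not reproduced here; as a hedged sketch of what "computation on encrypted data" means in practice, the snippet below uses the open-source TenSEAL library (an assumption; the authors may have used a different FHE stack) to compute a mean glucose value over CKKS-encrypted readings.

import tenseal as ts

# CKKS context; the parameters are common tutorial defaults, not the paper's.
context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

glucose = [5.8, 7.2, 11.4, 4.1, 6.9]       # toy mmol/L readings
enc = ts.ckks_vector(context, glucose)     # encrypted on the client

# Server side: the mean is computed without ever seeing plaintext values.
enc_mean = enc.sum() * (1.0 / len(glucose))

print(enc_mean.decrypt())                  # decrypted by the client; ~[6.28]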

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022
Keywords
Cryptography, Information services, Life cycle, Sensitive data, Cloud-based, Digital services, Homomorphic encryption, Monitoring and management, Privacy preservation, Privacy protection, Private data sharing, Self-care, Type 1 diabetes, Health
National Category
Political Science
Identifiers
urn:nbn:se:ri:diva-60198 (URN); 10.1109/SAIS55783.2022.9833062 (DOI); 2-s2.0-85136149174 (Scopus ID); 9781665471268 (ISBN)
Conference
34th Workshop of the Swedish Artificial Intelligence Society, SAIS 2022, 13 June 2022 through 14 June 2022
Available from: 2022-10-07. Created: 2022-10-07. Last updated: 2023-10-31. Bibliographically approved
Rennes, E., Santini, M. & Jönsson, A. (2022). The Swedish Simplification Toolkit: Designed with Target Audiences in Mind. In: 2nd Workshop on Tools and Resources for REAding DIfficulties, READI 2022 - collocated with the International Conference on Language Resources and Evaluation, LREC 2022. Paper presented at the 2nd Workshop on Tools and Resources for REAding DIfficulties, READI 2022, 20 June 2022 through 25 June 2022 (pp. 31-38). European Language Resources Association (ELRA)
The Swedish Simplification Toolkit: Designed with Target Audiences in Mind
2022 (English). In: 2nd Workshop on Tools and Resources for REAding DIfficulties, READI 2022 - collocated with the International Conference on Language Resources and Evaluation, LREC 2022, European Language Resources Association (ELRA), 2022, p. 31-38. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present the current version of The Swedish Simplification Toolkit. The toolkit includes computational and empirical tools that have been developed over the years to explore a still neglected area of NLP, namely the simplification of “standard” texts to meet the needs of target audiences. Target audiences, such as people affected by dyslexia, aphasia or autism, but also children and second language learners, require different types of text simplification and adaptation. For example, while individuals with aphasia have difficulties in reading compounds (such as arbetsmarknadsdepartement, eng. ministry of employment), second language learners struggle with culture-specific vocabulary (e.g. konflikträdd, eng. afraid of conflicts). The toolkit allows users to select the types of simplification that meet the specific needs of the target audience they belong to. The Swedish Simplification Toolkit is one of the first attempts to overcome the one-size-fits-all approach that is still dominant in Automatic Text Simplification, and proposes a set of computational methods that, used individually or in combination, may help individuals reduce reading (and writing) difficulties.
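
To make the audience-aware design concrete, here is a hypothetical Python sketch of such a dispatcher; the profile names, the operations, and the simplify() function are invented for illustration and are not the toolkit's actual API.

import re
from typing import Callable

def split_compounds(text: str) -> str:
    # Placeholder: a real system would call a Swedish compound splitter here.
    return text.replace("arbetsmarknadsdepartement",
                        "arbetsmarknads-departement")

def shorten_sentences(text: str) -> str:
    # Naive clause split on semicolons, for illustration only.
    return re.sub(r";\s*", ". ", text)

def explain_cultural_vocabulary(text: str) -> str:
    glossary = {"konflikträdd": "konflikträdd (rädd för konflikter)"}
    for term, gloss in glossary.items():
        text = text.replace(term, gloss)
    return text

# Each target audience gets its own chain of simplification operations.
PROFILES: dict[str, list[Callable[[str], str]]] = {
    "aphasia": [split_compounds, shorten_sentences],
    "l2_learner": [explain_cultural_vocabulary, shorten_sentences],
}

def simplify(text: str, audience: str) -> str:
    for operation in PROFILES[audience]:
        text = operation(text)
    return text

print(simplify("Hen är konflikträdd; hen undviker bråk.", "l2_learner"))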

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2022
Keywords
automatic text adaptation, automatic text simplification, easy-to-read, Second language learners, Swedish, Target audience
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:ri:diva-62645 (URN); 2-s2.0-85145884191 (Scopus ID); 9791095546849 (ISBN)
Conference
2nd Workshop on Tools and Resources for REAding DIfficulties, READI 2022, 20 June 2022 through 25 June 2022
Note

Funding details: VINNOVA; Vetenskapsrådet, VR. Funding text: This work has been funded by The Swedish Research Council (VR) and Sweden's innovation agency (VINNOVA).

Available from: 2023-01-24. Created: 2023-01-24. Last updated: 2023-01-24. Bibliographically approved
Capshaw, R., Blomqvist, E., Santini, M. & Alirezaie, M. (2021). BERT is as Gentle as a Sledgehammer: Too Powerful or Too Blunt? It Depends on the Benchmark. Paper presented at Sustainable language representations for a changing world, NoDaLiDa workshop, 31 May 2021.
BERT is as Gentle as a Sledgehammer: Too Powerful or Too Blunt? It Depends on the Benchmark
2021 (English). Conference paper, Oral presentation with published abstract (Other academic)
Abstract [en]

In this position statement, we wish to contribute to the discussion about how to assess quality and coverage of a model.

We believe that BERT's prominence as a single-step pipeline for contextualization and classification highlights the need for benchmarks to evolve concurrently with models. Much recent work has touted BERT's raw power for solving natural language tasks, so we used a 12-layer uncased BERT pipeline with a linear classifier as a quick-and-dirty model to score well on the SemEval 2010 Task 8 dataset for relation classification between nominals.

We initially expected there to be significant enough bias from BERT's training to influence downstream tasks, since it is well-known that biased training corpora can lead to biased language models (LMs). Gender bias is the most common example, where gender roles are codified within language models. To handle such training data bias, we took inspiration from work in the field of computer vision. Tang et al. (2020) mitigate human reporting bias over the labels of a scene graph generation task using a form of causal reasoning based on counterfactual analysis. They extract the total direct effect of the context image on the prediction task by "blanking out" detected objects, intuitively asking "What if these objects were not here?" If the system still predicts the same label, then the original prediction is likely caused by bias in some form. Our goal was to remove any effects from biases learned during BERT's pre-training, so we analyzed total effect (TE) instead.

However, across several experimental configurations we found no noticeable effects from using TE analysis. One disappointing possibility was that BERT might be resistant to causal analysis due to its complexity. Another was that BERT is so powerful (or blunt?) that it can find unanticipated trends in its input, rendering any human-generated causal analysis of its predictions useless. We nearly concluded that what we expected to be delicate experimentation was more akin to trying to carve a masterpiece sculpture with a self-driven sledgehammer. We then found related work where BERT fooled humans by exploiting unexpected characteristics of a benchmark. When we used BERT to predict a relation for random words in the benchmark sentences, it guessed the same label as it would have for the corresponding marked entities roughly half of the time. Since the task had nineteen roughly-balanced labels, we expected much less consistency. This finding repeated across all pipeline configurations; BERT was treating the benchmark as a sequence classification task!

Our final conclusion was that the benchmark is inadequate: all sentences appeared exactly once with exactly one pair of entities, so the task was equivalent to simply labeling each sentence. We passionately claim from our experience that the current trend of using larger and more complex LMs must include concurrent evolution of benchmarks. We as researchers need to be diligent in keeping our tools for measuring as sophisticated as the models being measured, as any scientific domain does.
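
For concreteness, the random-entity probe described in this abstract could look roughly like the sketch below; the checkpoint path and the <e1>/<e2> marker format are placeholders, since the authors' fine-tuned model is not published with this abstract.

import random
from transformers import pipeline

# Placeholder path to a BERT model fine-tuned on SemEval 2010 Task 8.
clf = pipeline("text-classification", model="path/to/finetuned-semeval-bert")

def mark(tokens: list[str], i: int, j: int) -> str:
    t = tokens[:]
    t[i] = f"<e1>{t[i]}</e1>"
    t[j] = f"<e2>{t[j]}</e2>"
    return " ".join(t)

sent = "The company fabricates plastic chairs".split()
true_label = clf(mark(sent, 1, 4))[0]["label"]   # the real entity pair
i, j = sorted(random.sample(range(len(sent)), 2))
rand_label = clf(mark(sent, i, j))[0]["label"]   # random 'entities'

# With nineteen roughly-balanced labels, agreement should be rare; the
# abstract reports agreement about half the time.
print(true_label == rand_label)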

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:ri:diva-58557 (URN)
Conference
Sustainable language representations for a changing world, 31 May 2021. NoDaLiDa workshop
Available from: 2022-02-15. Created: 2022-02-15. Last updated: 2022-02-15. Bibliographically approved
Jerdhaf, O., Santini, M., Lundberg, P., Karlsson, A. & Jönsson, A. (2021). Focused Terminology Extraction for CPSs: The Case of "Implant Terms" in Electronic Medical Records. In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops). Paper presented at the 2021 IEEE International Conference on Communications Workshops (ICC Workshops).
Focused Terminology Extraction for CPSs: The Case of "Implant Terms" in Electronic Medical Records
2021 (English). In: 2021 IEEE International Conference on Communications Workshops (ICC Workshops), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Language Technology is an essential component of many Cyber-Physical Systems (CPSs) because specialized linguistic knowledge is indispensable to prevent fatal errors. We present the case of automatic identification of implant terms. The need for automatic identification of implant terms stems from safety concerns, because patients who have an implant may or may not be eligible for Magnetic Resonance Imaging (MRI). Normally, MRI scans are safe. However, in some cases an MRI scan may not be recommended: MRI scanning is incompatible with some implants, so it is important to know whether a patient has one. At present, the process of ascertaining whether a patient could be at risk is lengthy, manual, and based on the specialized knowledge of medical staff. We argue that this process can be sped up, streamlined, and made safer by sieving through patients’ medical records. In this paper, we explore how to discover implant terms in electronic medical records (EMRs) written in Swedish with an unsupervised approach. To this aim we use BERT, a state-of-the-art deep learning algorithm based on pre-trained word embeddings. We observe that BERT discovers a solid proportion of terms that are indicative of implants.
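
One way to operationalize the unsupervised discovery step is to rank candidate tokens by embedding similarity to a handful of seed implant terms. The sketch below does this with the KB/bert-base-swedish-cased checkpoint; the seed list, the candidates, and the averaging scheme are assumptions, not the paper's exact procedure.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
bert = AutoModel.from_pretrained("KB/bert-base-swedish-cased")

def embed(word: str) -> torch.Tensor:
    # Mean-pool the last hidden layer over the word's subword tokens.
    enc = tok(word, return_tensors="pt")
    return bert(**enc).last_hidden_state.mean(dim=1).squeeze(0)

seeds = ["pacemaker", "stent"]                   # known implant terms
seed_vec = torch.stack([embed(w) for w in seeds]).mean(0)

candidates = ["protes", "huvudvärk", "kateter"]  # tokens mined from EMRs
scores = {w: torch.cosine_similarity(embed(w), seed_vec, dim=0).item()
          for w in candidates}
print(sorted(scores, key=scores.get, reverse=True))  # implant-like terms first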

Keywords
Annotations, Terminology, Magnetic resonance imaging, Conferences, Bit error rate, Implants, Manuals
National Category
Dentistry
Identifiers
urn:nbn:se:ri:diva-55994 (URN); 10.1109/ICCWorkshops50388.2021.9473700 (DOI)
Conference
2021 IEEE International Conference on Communications Workshops (ICC Workshops)
Available from: 2021-08-26. Created: 2021-08-26. Last updated: 2021-08-26. Bibliographically approved
Santini, M., Rennes, E., Holmer, D. & Jönsson, A. (2021). Human-in-the-Loop: Where Does Text Complexity Lie? Paper presented at Sustainable language representations for a changing world, NoDaLiDa workshop, 31 May 2021.
Human-in-the-Loop: Where Does Text Complexity Lie?
2021 (English). Conference paper, Oral presentation with published abstract (Other academic)
Abstract [en]

In this position statement, we would like to contribute to the discussion about how to assess quality and coverage of a model. In this context, we verbalize the need for interpretability of linguistic features and the need for profiling textual variations. These needs are triggered by the necessity to gain insights into intricate patterns of human communication. Arguably, the functional and linguistic interpretation of these communication patterns contributes to keeping humans’ needs in the loop, thus demoting the myth of powerful but dehumanized Artificial Intelligence. The desideratum to open up the “black boxes” of AI-based machines has become compelling.

Recent research has focussed on how to make sense of and popularize deep learning models, and has explored how to “probe” these models to understand how they learn. The BERTology science is actively and diligently digging into BERT’s complex clockwork. However, much remains to be unearthed: “BERTology has clearly come a long way, but it is fair to say we still have more questions than answers about how BERT works”. It is therefore not surprising that add-on tools are being created to inspect pre-trained language models with the aim to cast some light on the “interpretation of pre-trained models in the context of downstream tasks and domain-specific data”.

Here we do not propose any new tool, but we try to formulate and exemplify the problem by taking the case of text simplification/text complexity. When we compare a standard text and an easy-to-read text (e.g. lättsvenska or simple English) we wonder: where does text complexity lie? Can we pin it down? According to Simple English Wikipedia, “(s)imple English is similar to English, but it only uses basic words. We suggest that articles should use only the 1,000 most common and basic words in English. They should also use only simple grammar and shorter sentences.” This characterization of a simplified text does not provide much linguistic insight: what is meant by simple grammar? Linguistic insights are also missing from state-of-the-art NLP models for text simplification, since these models are basically monolingual neural machine translation systems that take a standard text and “translate” it into a simplified type of (sub)language. We do not gain any linguistic understanding of what is being simplified and why. We just get the task done (which is of course good).

We know for sure that standard and easy-to-read texts differ in a number of ways, and we are able to use BERT to create classifiers that discriminate the two varieties. But how are linguistic features re-shuffled to generate a simplified text from a standard one? With traditional statistical approaches, such as Biber’s MDA (based on factor analysis), we get an idea of how linguistic features co-occur and interact in different text types and why. Since pre-trained language models are more powerful than traditional statistical models, like factor analysis, we would like to see more research on “disclosing the layers”, so that we can understand how different co-occurrences of linguistic features contribute to the make-up of specific varieties of texts, like simplified vs standard texts. Would it be possible to update the iconic example
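
As a small example of the interpretable, feature-level profiling argued for here, the classic Swedish readability index LIX can be computed directly from surface features that anyone can inspect, unlike the hidden layers of a BERT classifier. The two example sentences are invented.

import re

def lix(text: str) -> float:
    # LIX = words per sentence + percentage of words longer than 6 letters.
    words = re.findall(r"\w+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100 * long_words / len(words)

standard = "Arbetsmarknadsdepartementet presenterade omfattande reformer."
easy = "De som arbetar med jobb i Sverige visade nya regler."
print(lix(standard), lix(easy))  # higher LIX = harder text (104.0 vs 30.0)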

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:ri:diva-58558 (URN)
Conference
Sustainable language representations for a changing world, 31 May 2021. NoDaLiDa workshop
Available from: 2022-02-15. Created: 2022-02-15. Last updated: 2022-02-15
Jerdhaf, O., Santini, M., Lundberg, P., Karlsson, A. & Jönsson, A. (2021). Implant Term Extraction from Swedish Medical Records – Phase 1: Lessons Learned. Paper presented at the Eighth Swedish Language Technology Conference (SLTC-2020), 25-27 November 2020.
Implant Term Extraction from Swedish Medical Records – Phase 1: Lessons Learned.
2021 (English). Conference paper, Published paper (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:ri:diva-58560 (URN)
Conference
Eighth Swedish Language Technology Conference (SLTC-2020), 25-27 November 2020
Available from: 2022-02-15. Created: 2022-02-15. Last updated: 2022-02-15. Bibliographically approved
Santini, M., Jerdhaf, O., Karlsson, A., Eneling, E., Stridsman, M., Jönsson, A. & Lundberg, P. (2021). The Potential of AI-Based Clinical Text Mining to Improve Patient Safety: the Case of Implant Terms and Patient Journals. Paper presented at the Joint Annual Meeting ISMRM-ESMRMB & ISMRT 31st Annual Meeting, 07-12 May 2022, London, England, UK. Article ID 692.
The Potential of AI-Based Clinical Text Mining to Improve Patient Safety: the Case of Implant Terms and Patient Journals.
2021 (English). Conference paper, Oral presentation with published abstract (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:ri:diva-58559 (URN)
Conference
Joint Annual Meeting ISMRM-ESMRMB & ISMRT 31st Annual Meeting, 07-12 May 2022, London, England, UK
Available from: 2022-02-15. Created: 2022-02-15. Last updated: 2022-02-15
Jerdhaf, O., Santini, M., Lundberg, P., Karlsson, A. & Jönsson, A. (2020). Implant Terms: Focused Terminology Extraction with Swedish BERT - Preliminary Results. Paper presented at the Eighth Swedish Language Technology Conference (SLTC2020), 25–27 November 2020.
Implant Terms: Focused Terminology Extraction with Swedish BERT - Preliminary Results
2020 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Certain implants are imperative to detect before MRI scans. However, implant terms, like ‘pacemaker’ or ‘stent’, are sparse and difficult to identify in noisy and hastily written electronic medical records (EMRs). In this paper, we explore how to discover implant terms in Swedish EMRs with an unsupervised approach. To this purpose, we use BERT, a state-of-the-art deep learning algorithm, and fine-tune a model built on pre-trained Swedish BERT. We observe that BERT discovers a solid proportion of indicative implant terms.
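
The fine-tuning step mentioned in this abstract is, in essence, continued masked-language-model training on domain text. A hedged sketch with the Hugging Face Trainer follows; the corpus path, hyperparameters, and checkpoint are placeholders, not the paper's settings.

from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForMaskedLM.from_pretrained("KB/bert-base-swedish-cased")

# Placeholder corpus of (pseudonymised) EMR text, one record per line.
ds = load_dataset("text", data_files={"train": "emr_corpus.txt"})["train"]
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=128),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="swedish-bert-emr", num_train_epochs=1),
    train_dataset=ds,
    # Randomly masks 15% of tokens so the model learns domain terms in context.
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()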

Keywords
Automatic Terminology Extraction, BERT, medical records, deep learning, pre-trained model in Swedish
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:ri:diva-52378 (URN)
Conference
Eighth Swedish Language Technology Conference (SLTC2020), 25–27 November 2020
Note

This research was funded by Vinnova. Project title: Patient-Safe Magnetic Resonance Imaging Examination by AI-based Medical Screening. Grant number: 2020-00228.

Available from: 2021-02-07. Created: 2021-02-07. Last updated: 2021-05-05. Bibliographically approved
Blomqvist, E., Alirezaie, M. & Santini, M. (2020). Towards causal knowledge graphs - position paper. In: CEUR Workshop Proceedings. Paper presented at the 5th International Workshop on Knowledge Discovery in Healthcare Data, KDH 2020, 29 August 2020 through 30 August 2020 (pp. 58-62). CEUR-WS
Towards causal knowledge graphs - position paper
2020 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2020, p. 58-62. Conference paper, Published paper (Refereed)
Abstract [en]

In this position paper, we highlight that being able to analyse cause-effect relationships for determining the causal status among a set of events is an essential requirement in many contexts, and argue that it cannot be overlooked when building systems targeting real-world use cases. This is especially true for medical contexts, where understanding the cause(s) of a symptom, or observation, is of vital importance. However, most approaches purely based on Machine Learning (ML) do not explicitly represent and reason with causal relations, and may therefore mistake correlation for causation. In the paper, we therefore argue for an approach to extract causal relations from text, and represent them in the form of Knowledge Graphs (KG), to empower downstream ML applications, or AI systems in general, with the ability to distinguish correlation from causation and reason with causality in an explicit manner. So far, the bottlenecks in KG creation have been the scalability and accuracy of automated methods; hence, we argue that two novel features are required from methods addressing these challenges: (i) the use of Knowledge Patterns to guide the KG generation process towards a certain resulting knowledge structure, and (ii) the use of a semantic referee to automatically curate the extracted knowledge. We claim that this will be an important step forward for supporting interpretable AI systems and for integrating ML and knowledge representation approaches, such as KGs, and that it should also generalise well to other types of relations, apart from causality.
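
As a toy illustration of the envisioned pipeline, the sketch below extracts causal relations with a single lexical pattern (a crude stand-in for the knowledge-pattern-guided extraction the paper argues for; no semantic referee is implemented) and stores them as explicit edges in a graph.

import re
import networkx as nx

CAUSAL = re.compile(
    r"(?P<cause>[\w\s]+?)\s+(?:causes|leads to)\s+(?P<effect>[\w\s]+)", re.I)

kg = nx.DiGraph()
for sentence in ["Smoking causes lung damage.",
                 "High blood pressure leads to kidney strain."]:
    m = CAUSAL.search(sentence)
    if m:
        kg.add_edge(m["cause"].strip(), m["effect"].strip(), relation="causes")

# Downstream systems can now reason over explicit causal edges instead of
# mistaking co-occurrence for causation.
print(list(kg.edges(data=True)))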

Place, publisher, year, edition, pages
CEUR-WS, 2020
Keywords
Data mining, Health care, Semantics, Automated methods, Causal relations, Cause-effect relationships, Generation process, Knowledge graphs, Knowledge patterns, Knowledge structures, Types of relations, Knowledge representation
National Category
Natural Sciences
Identifiers
urn:nbn:se:ri:diva-50447 (URN); 2-s2.0-85093865186 (Scopus ID)
Conference
5th International Workshop on Knowledge Discovery in Healthcare Data, KDH 2020, 29 August 2020 through 30 August 2020
Available from: 2020-11-06. Created: 2020-11-06. Last updated: 2021-05-05. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-5737-8149
