Processing of Condition Monitoring Annotations with BERT and Technical Language Processing
Luleå University of Technology, Sweden.
SKF Research & Technology Development, Sweden.
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-7873-3971
Luleå University of Technology, Sweden.
2022 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic due to out-of-vocabulary (OOV) technical terms, which result in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on the sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on the word embeddings with the objective of recreating the output of a keyword-based annotation classifier. The K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters with technical language substitution, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that substituting OOV technical terms can improve the representation accuracy of embeddings from the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
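The core experiment described above (embed each annotation with and without OOV-term substitution, then compare a clustering score) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the substitution table, the example annotations, the all-MiniLM-L6-v2 checkpoint, the two-cluster setting, and the use of silhouette score as the "K-Means clustering score" are all illustrative assumptions.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical mapping from OOV technical terms to natural language
# descriptions; the paper's actual vocabulary is not given in the abstract.
SUBSTITUTIONS = {
    "BPFO": "ball pass frequency of the outer race",
    "env": "envelope spectrum",
    "DE": "drive end bearing",
}

def substitute(text: str) -> str:
    """Replace each technical term with its natural language description."""
    for term, description in SUBSTITUTIONS.items():
        text = text.replace(term, description)
    return text

# Invented example annotations, standing in for the pre-processed corpus.
annotations = [
    "BPFO peak in env of DE sensor",
    "clear BPFO harmonics, replace DE bearing",
    "no fault frequencies visible in env",
    "vibration levels normal, no action taken",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

for use_substitution in (False, True):
    texts = [substitute(a) for a in annotations] if use_substitution else annotations
    embeddings = model.encode(texts)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
    # The abstract reports a K-Means clustering score at seven clusters;
    # silhouette score is one common choice for this kind of comparison.
    print(f"substitution={use_substitution}: "
          f"silhouette={silhouette_score(embeddings, labels):.3f}")
```

The paper's second experiment, training an LSTM on BERT word embeddings to reproduce the output of a keyword-based annotation classifier, would build on the same substitution step.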

Place, publisher, year, edition, pages
2022. Vol. 7, no 1, p. 306-314
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:ri:diva-62056
DOI: 10.36001/phme.2022.v7i1.3356
OAI: oai:DiVA.org:ri-62056
DiVA, id: diva2:1722929
Conference
PHM Society European Conference, 2022
Available from: 2023-01-01 Created: 2023-01-01 Last updated: 2023-01-09 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Nivre, Joakim
