Operational message
There are currently operational disruptions. Troubleshooting is in progress.
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Structuring Semi-structured Data from Building Inspection Reports Using a Large Language Model
RISE Research Institutes of Sweden, Built Environment, Building and Real Estate.ORCID iD: 0000-0001-9666-2196
RISE Research Institutes of Sweden, Digital Systems, Data Science.ORCID iD: 0000-0002-7181-8411
2025 (English)In: Lecture Notes in Civil Engineering, ISSN 2366-2557, Vol. 554 LNCE, p. 508-513Article in journal (Refereed) Published
Abstract [en]

Assessing the status of buildings is the basis for risk evaluation and prediction of maintenance need in the building stock as well as in individual buildings. It can be a both time-consuming and costly task that would benefit from computerized procedures based on machine learning methods. One main obstacle is to find structured data to use for the machine learning. Building inspection report are a goldmine for assessing the status of buildings, both on a building stock level and for individual buildings. As they contain both an overview of the buildings base data; age, type, construction and material choices, installations, etc. and notes on damages, deficiencies and risks and recommended measures often described in free-text sections. The problem is that the structure of the documents is so fluid and varies in structure and format between different inspectors. The free-text sections would, just a couple of years ago, have been very resource-intensive to analyze in any systematic way, but with the emergence of publicly available large language models such as ChatGPT, this is now completely realistic. In this project, we use the large language model in ChatGPT to extract structured data from text in pdf-format, through HTTP requests to ChatGPT in json format. The result is structured data in pre-determined categories that can be used for status prediction both on a building stock level and on individual buildings and as input data to more advanced machine learning procedures. The paper exemplifies the use of the structured data with a focus on prediction of maintenance need for single family houses in western Sweden. 

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH , 2025. Vol. 554 LNCE, p. 508-513
Keywords [en]
Adversarial machine learning; Model buildings; Building inspection report; Building inspections; Building status prediction; Building stocks; Language model; Large language model; Machine-learning; Semistructured data; Stock level; Structured data; Risk assessment
National Category
Civil Engineering
Identifiers
URN: urn:nbn:se:ri:diva-77993DOI: 10.1007/978-981-97-8313-7_70Scopus ID: 2-s2.0-85213018141OAI: oai:DiVA.org:ri-77993DiVA, id: diva2:1941062
Conference
9th International Building Physics Conference, IBPC 2024.Toronto, Canada. 25 July 2024through 27 July 2024
Note

This study was funded by L\u00E4nsf\u00F6rs\u00E4kringars Research Foundation www.lansforsakringar.se/forskning (grant number P5.22) and in collaboration with L\u00E4nsf\u00F6rs\u00E4kringar \u00C4lvsborg AB. The research team acknowledges the valuable work of Per Kreuger, retired from RISE Research Institutes of Sweden AB, Sweden, part of the research team until the end of 2023.

Available from: 2025-02-27 Created: 2025-02-27 Last updated: 2025-09-23Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Svennberg, KaisaEkman, Jan

Search in DiVA

By author/editor
Svennberg, KaisaEkman, Jan
By organisation
Building and Real EstateData Science
Civil Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 63 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf