Nearly all meteorological agencies in the world, including SMHI, possess troves of archived observations spanning decades in paper format. Dawsonia is a proof-of-concept application that combines accurate computer vision algorithms and machine learning models to handle different forms of tabular data, recognize handwritten text, and produce machine-readable files. This would aid and accelerate the digitization of the paper archives, which is currently done manually. Through this project, SMHI aims to digitize numerous historical weather observations, which will support a better understanding of the climate, especially the occurrence of extreme weather events.
The method implemented in Dawsonia is presented along with the development process. We also describe how the machine learning models were trained on LUMI, a EuroHPC supercomputer, with technical support from ENCCS.