Machine interpretation of handwritten source material

Coordinator Riksarkivet
Funding from Vinnova SEK 400 000
Project duration April 2020 - May 2021
Status Completed
Venture AI - Competence, ability and application
Call Start your AI-journey! For public organizations

Important results from the project

This project examined how Handwritten Text Recognition (HTR) techniques can be applied to handwritten archive material held by the Swedish National Archives. An HTR model that automatically transcribes 22,500 pages of text from the second half of the 19th century has been created. The model was trained on 940 manually transcribed pages ("ground truth") created by volunteers and achieves a character error rate of 2.7%. The HTR model is available via the Transkribus platform, and the texts are searchable on the National Archives' website.
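As an illustration of the metric quoted above, character error rate is commonly defined as the edit (Levenshtein) distance between the model output and the ground truth, divided by the length of the ground truth. The sketch below follows that common definition; it is an assumption that this matches how the project's figure was computed, and the function names are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Invented example strings, not taken from the archive material.
    print(character_error_rate("abcd", "abxd"))  # one substitution in four characters
```

Under this definition, a character error rate of 2.7% corresponds to roughly 27 character edits per 1,000 characters of ground truth.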

Expected long term effects

The HTR model transcribes the historical archive material with about 97% of characters correct, which is better than most people can manage and considerably faster. About 6 months of manual work, mainly performed by volunteers, went into creating the training data for this project. Transcribing the entire material manually would probably have taken at least 6 years. The potential of HTR is thus great. An HTR model can also form the basis for new models adapted to other materials. HTR will be a powerful tool for genealogists, local history researchers, and data-driven research.
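The time comparison can be checked with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that manual transcription effort scales linearly with the number of pages and that the 6 months of work covered only the 940 training pages; neither assumption is stated in the project description, and the function name is hypothetical.

```python
TRAINING_PAGES = 940      # manually transcribed pages (from the project text)
TOTAL_PAGES = 22_500      # pages in the full material (from the project text)
TRAINING_MONTHS = 6       # approximate manual effort for the training data


def estimate_manual_years(total_pages: int, sample_pages: int, sample_months: float) -> float:
    """Linear extrapolation of manual effort from a sample to the full material."""
    months = total_pages / sample_pages * sample_months
    return months / 12


if __name__ == "__main__":
    years = estimate_manual_years(TOTAL_PAGES, TRAINING_PAGES, TRAINING_MONTHS)
    print(f"Estimated manual effort: {years:.1f} years")
```

Under these assumptions the extrapolation lands at roughly 12 years, which is consistent with the stated estimate of at least 6 years.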

Approach and implementation

The HTR models and manual transcripts were created on the Transkribus platform, combining AI tools with citizen science. Before the images can be transcribed, the text lines need to be identified. This is done automatically but requires manual correction, which took more time than expected. The HTR texts were then exported to standard XML formats (ALTO, PAGE, and TEI). Close collaboration with external actors, researchers, and volunteers is an important part of the continued work with HTR at the National Archives.
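To give an idea of what working with such an export can look like, the sketch below reads the transcribed text lines out of a PAGE XML document using Python's standard library. The embedded sample document, its identifiers and its text are invented for illustration and are not taken from the project's data; the helper names are hypothetical. Tags are matched by local name so that the sketch does not depend on one particular version of the PAGE namespace.

```python
import xml.etree.ElementTree as ET

# Invented minimal PAGE XML sample, not an actual export from the project.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <TextEquiv><Unicode>First line of text</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>Second line of text</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>First line of text
Second line of text</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text: str) -> list[tuple[str, str]]:
    """Return (line id, transcribed text) for every TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only look at the line's own TextEquiv, not those of nested words
        # or of the enclosing region.
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode":
                    lines.append((element.get("id"), grandchild.text or ""))
    return lines


if __name__ == "__main__":
    for line_id, text in extract_lines(SAMPLE_PAGE_XML):
        print(line_id, text)
```

ALTO and TEI exports use different element names, so a similar reader for those formats would need its own tag handling.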

The project description has been provided by the project members themselves, and the text has not been reviewed by our editors.

Last updated 21 July 2021

Reference number 2020-00248