
Deep learning-based speech synthesis for reading aloud of lengthy and information-rich texts in Swedish

Reference number 2019-02994
Coordinator Kungliga Tekniska Högskolan - Språkbanken Tal
Funding from Vinnova SEK 6 688 000
Project duration October 2019 - October 2024
Status Ongoing
Venture AI - Leading and innovation
Call From AI-research to innovation

Purpose and goal

We develop deep learning-based speech synthesis (text-to-speech) for reading aloud long and information-rich text in Swedish. The project runs from 2019 to 2022 and is led by the national research infrastructure Språkbanken Tal and the Division of Speech, Music and Hearing at KTH Royal Institute of Technology. The remaining project partners are the Swedish Agency for Accessible Media (MTM), Bonnierförlagen AB, Wikimedia Sverige and Södermalms Talteknologiservice AB.

Expected results and effects

Freely available frameworks for training speech synthesis exist, but the language-specific (Swedish) resources needed for the development and refinement of deep learning-based speech synthesis are lacking or of poor quality. The cost of developing them is prohibitively high, which excludes Swedish companies from this development. The project will make the required basic resources freely available, and lay the foundation for further development of Swedish speech technology and Swedish speech synthesis, in particular with respect to reading lengthy texts aloud.

Planned approach and implementation

The project comprises five work packages:

Grapheme-to-phoneme conversion (G2P): automatic conversion from text to pronunciation (a minimal sketch of the idea follows below).
Text profiling: decides how a text is to be read, distinguishing e.g. running text, dialogue, formulas and tables.
Recording of human speakers: training data for the deep learning.
Speech synthesis voice: training of a speech synthesis voice with deep learning.
Evaluation: methods and tools for evaluation of speech synthesis.

All results are made freely available at the conclusion of the project.
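As an informal illustration of what the G2P work package does (not the project's actual method or data), the Python sketch below combines a tiny pronunciation lexicon with a naive letter-by-letter fallback for unknown words. All lexicon entries, phoneme symbols and fallback rules here are invented examples.

# Minimal sketch of dictionary-based grapheme-to-phoneme (G2P) conversion
# with a naive letter-to-sound fallback. Illustrative only; the lexicon
# and rules are hypothetical, not the project's resources.

# Tiny example pronunciation lexicon: word -> phoneme string.
LEXICON = {
    "tal": "t ɑː l",
    "syntes": "s ʏ n t eː s",
}

# Very rough single-letter fallback for out-of-vocabulary words.
LETTER_TO_PHONE = {
    "a": "a", "b": "b", "d": "d", "e": "e", "i": "ɪ", "k": "k",
    "l": "l", "m": "m", "n": "n", "o": "ʊ", "r": "r", "s": "s",
    "t": "t", "u": "ʉ", "v": "v", "å": "oː", "ä": "ɛ", "ö": "œ",
}

def g2p(word: str) -> str:
    """Return a phoneme string: lexicon lookup first, letter fallback otherwise."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Fallback maps each letter independently (ignores context, stress, compounds).
    return " ".join(LETTER_TO_PHONE.get(ch, ch) for ch in word)

if __name__ == "__main__":
    for w in ["tal", "syntes", "bok"]:
        print(w, "->", g2p(w))

A production G2P component would instead rely on a full Swedish pronunciation lexicon and a trained model that handles stress, compounding and context-dependent pronunciation, which the simple fallback above deliberately ignores.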

The project description has been provided by the project members themselves and the text has not been reviewed by our editors.

Last updated 22 April 2024

