
Deep learning-based speech synthesis for reading aloud of lengthy and information-rich texts in Swedish

Reference number 2019-02994
Coordinator Kungliga Tekniska Högskolan - Språkbanken Tal
Funding from Vinnova SEK 6 688 000
Project duration October 2019 - October 2024
Status Ongoing
Venture AI - Leading and innovation
Call From AI-research to innovation

Purpose and goal

We develop deep learning-based speech synthesis (text-to-speech) for reading aloud long and information-rich text in Swedish. The project runs from 2019 to 2022 and is led by the national research infrastructure Språkbanken Tal and the Division of Speech, Music and Hearing at KTH Royal Institute of Technology. The remaining project partners are the Swedish Agency for Accessible Media (MTM), Bonnierförlagen AB, Wikimedia Sverige and Södermalms Talteknologiservice AB.

Expected results and effects

Freely available frameworks for training speech synthesis exist, but the language-specific (Swedish) resources needed for the development and refinement of deep learning-based speech synthesis are lacking or of poor quality. The cost of developing them is prohibitively high, which excludes Swedish companies from this development. The project will make the required basic resources freely available, and lay the foundation for further development of Swedish speech technology and Swedish speech synthesis, in particular with respect to reading lengthy texts aloud.

Planned approach and implementation

The project comprises five work packages:

Grapheme-to-phoneme conversion (G2P): automatic conversion from text to pronunciation (a minimal sketch of the idea follows below).
Text profiling: decides how a text is to be read, distinguishing e.g. running text, dialogue, formulas and tables.
Recording of human speakers: training data for the deep learning.
Speech synthesis voice: training of a speech synthesis voice with deep learning.
Evaluation: methods and tools for evaluation of speech synthesis.

All results are made freely available at the conclusion of the project.
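As an informal illustration of what the G2P work package does (not the project's actual method or data), the Python sketch below combines a tiny pronunciation lexicon with a naive letter-by-letter fallback for unknown words. All lexicon entries, phoneme symbols and fallback rules here are invented examples.

# Minimal sketch of dictionary-based grapheme-to-phoneme (G2P) conversion
# with a naive letter-to-sound fallback. Illustrative only; the lexicon
# and rules are hypothetical, not the project's resources.

# Tiny example pronunciation lexicon: word -> phoneme string.
LEXICON = {
    "tal": "t ɑː l",
    "syntes": "s ʏ n t eː s",
}

# Very rough single-letter fallback for out-of-vocabulary words.
LETTER_TO_PHONE = {
    "a": "a", "b": "b", "d": "d", "e": "e", "i": "ɪ", "k": "k",
    "l": "l", "m": "m", "n": "n", "o": "ʊ", "r": "r", "s": "s",
    "t": "t", "u": "ʉ", "v": "v", "å": "oː", "ä": "ɛ", "ö": "œ",
}

def g2p(word: str) -> str:
    """Return a phoneme string: lexicon lookup first, letter fallback otherwise."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Fallback maps each letter independently (ignores context, stress, compounds).
    return " ".join(LETTER_TO_PHONE.get(ch, ch) for ch in word)

if __name__ == "__main__":
    for w in ["tal", "syntes", "bok"]:
        print(w, "->", g2p(w))

A production G2P component would instead rely on a full Swedish pronunciation lexicon and a trained model that handles stress, compounding and context-dependent pronunciation, which the simple fallback above deliberately ignores.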

The project description has been provided by the project members themselves and the text has not been reviewed by our editors.

Last updated 22 April 2024

