Deep learning based speech synthesis for reading aloud lengthy and information-rich texts in Swedish
Reference number | |
Coordinator | Kungliga Tekniska Högskolan - Språkbanken Tal |
Funding from Vinnova | SEK 6 688 000 |
Project duration | October 2019 - October 2024 |
Status | Ongoing |
Venture | AI - Leading and innovation |
Call | From AI-research to innovation |
Purpose and goal
We develop deep learning based speech synthesis (text-to-speech) for reading long, information-rich texts aloud in Swedish. The project runs from 2019 to 2022 and is led by the national research infrastructure Språkbanken Tal and the Division of Speech, Music and Hearing at KTH Royal Institute of Technology. The other project partners are the Swedish Agency for Accessible Media (MTM), Bonnierförlagen AB, Wikimedia Sverige and Södermalms Talteknologiservice AB.
Expected results and effects
Freely available frameworks for training speech synthesis exist, but the language-specific (Swedish) resources needed to develop and refine deep learning based speech synthesis are lacking or of poor quality. The cost of developing them is prohibitively high, which excludes Swedish companies from this development. The project will make the required basic resources freely available and lay the foundation for further development of Swedish speech technology and Swedish speech synthesis, particularly for reading lengthy texts aloud.
Planned approach and implementation
The project comprises five work packages: (1) Grapheme-to-phoneme conversion (G2P): automatic conversion from text to pronunciation. (2) Text profiling: decides how a text is to be read, distinguishing e.g. running text, dialogue, formulas and tables. (3) Recording of human speakers: training data for the deep learning. (4) Speech synthesis voice: training of a speech synthesis voice with deep learning. (5) Evaluation: methods and tools for evaluating speech synthesis. All results will be made freely available at the conclusion of the project.