
TYDLIGT - Techniques for Yielding Discourse Level Improvements for German Translation

Coordinator Uppsala universitet - Institutionen för lingvistik och filologi
Funding from Vinnova SEK 1 251 343
Project duration December 2015 - January 2018
Status Completed

Purpose and goal

The goal of this project was to develop innovative strategies for handling complex linguistic constructions that span word boundaries in Statistical Machine Translation (SMT) from and into German. To improve the translation of these constructions, often referred to as Multiword Expressions (MWEs), the machine translation system must be aware of their boundaries. The project focused mainly on this identification step, which is itself very complex and time-consuming.

Expected results and effects

This project has successfully drawn the attention of the research community to the problem of explicitly addressing MWEs in statistical machine translation. One of our studies showed that even state-of-the-art neural machine translation struggles to translate these constructions correctly. We contributed significantly to the creation of a large multilingual resource of annotated MWE instances for German and Swedish, which has been made freely accessible to the research community and can thus serve as a basis for MWE research for many years to come.

Planned approach and implementation

We started this project with a network of three university partners (Stuttgart, München, Uppsala), which then grew substantially at the European level through our serving on the board for the first edition of a large multilingual shared task on MWE identification (organised by the PARSEME European COST action). While these collaborations covered a wide range of MWE phenomena, the collaboration with the German community (Stuttgart, Düsseldorf, Hamburg) was later intensified to focus on German peculiarities and to create a more fine-grained annotation scheme for German MWEs.

The project description has been provided by the project members themselves and the text has not been looked at by our editors.

Last updated 25 November 2019

Reference number 2015-01554
