Background

Subtitling is the preferred method of translating multimedia content in most European countries and for most genres, making audiovisual content widely accessible across languages. The increasing use of digital multilingual multimedia on the internet, the popularity of DVDs, and current European policies to promote cultural and linguistic diversity and to make audiovisual content accessible to all have raised the demand for subtitling in recent years.

There is a clear need to optimize the productivity of current subtitle translation workflow processes, reducing costs and turnaround times while enhancing the consistency of the translation results.

 

Goals

SUMAT aims to increase the efficiency of professional subtitle translation through the introduction of statistical machine translation technology.

We are developing an online subtitle translation service addressing 9 different European languages combined into 14 different language pairs.

 

The Language Pairs

Our language pairs consist of:

[Figure: SUMAT language pairs]

Why Use MT Technology?

Machine translation uses software to translate text from one natural language to another.

Statistical Machine Translation (SMT) is a paradigm where translations are generated on the basis of statistical models derived from the analysis of bilingual and monolingual text corpora.
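
As an illustration, one classic way to write this down is the noisy-channel formulation. The source does not say which model formulation the SUMAT engines use, so this is a textbook sketch rather than a description of the project's systems. Here f is a source-language subtitle and e is a candidate translation:

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```

In this formulation the translation model P(f | e) is estimated from bilingual (parallel) corpora and the language model P(e) from monolingual corpora in the target language, which is why both kinds of data matter.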

The SMT paradigm suits the machine translation of subtitles because:

  • Subtitles are grammatically sound, short textual units whose linguistic properties fit well with the state-of-the-art SMT models that are available.
  • The approach promotes the reuse of existing and new translations as training data.

 

The Rising Use of Post-editing

The translation industry is embracing post-edited translation in domains in which there are enough parallel bilingual corpora to customize machine translation engines.

For trained human translators, post-editing machine-translated text is therefore an increasingly useful method, one that has been shown to achieve higher productivity than human translation alone.

 

The SUMAT Approach

We build customized SMT engines for subtitles, trained on large professional-quality parallel and monolingual subtitle corpora.

We evaluate the merits of this approach by:

1. Having professional subtitle translators judge the quality of machine translated subtitles through quality ranking scales.

2. Measuring the productivity gain achieved by post-editing machine translated subtitles, compared to starting the translation process from scratch.
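
The second measurement can be sketched as a simple throughput comparison. The source does not define the exact metric, so the function name `productivity_gain`, its parameters, and the choice of subtitles per minute as the unit are assumptions made for illustration only:

```python
def productivity_gain(scratch_subs, scratch_minutes, pe_subs, pe_minutes):
    """Relative throughput gain of post-editing over translating from scratch.

    Hypothetical metric: throughput is subtitles completed per minute,
    and the gain is the relative increase of the post-editing rate
    over the from-scratch rate.
    """
    if min(scratch_subs, scratch_minutes, pe_subs, pe_minutes) <= 0:
        raise ValueError("counts and durations must be positive")
    scratch_rate = scratch_subs / scratch_minutes  # subtitles per minute
    pe_rate = pe_subs / pe_minutes                 # subtitles per minute
    return (pe_rate - scratch_rate) / scratch_rate


# Illustrative numbers only, not SUMAT results.
gain = productivity_gain(
    scratch_subs=100, scratch_minutes=60,
    pe_subs=100, pe_minutes=48,
)
print(f"Relative productivity gain: {gain:.0%}")
```

A positive value means post-editing was faster than translating from scratch; a negative value means it was slower.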

 

Project Milestones

Corpora

Large professional-quality parallel subtitle corpora (ca. 1 million subtitles) have been collected for each of the language pairs addressed in the project and prepared for SMT training.

Experiments

Several technical approaches to improving SMT performance have been explored:

  • Subtitle vs. sentence alignment
  • Factored models
  • Named entity recognition (NER)
  • Augmented phrase-tables
  • Mixed models adding extra data

Online Service

A prototype online service has been developed and is currently being refined.

Evaluation

Evaluation by professional subtitle translators is underway. Two evaluation rounds are foreseen:

  • Round 1: Subtitle translators are scoring individual subtitles and categorizing the errors found with the aim of analyzing the quality of the SMT outputs. Their feedback is being used to refine the SMT engines.
  • Round 2: The productivity gain that can be achieved through the use of the SUMAT approach will be measured.

Results

Evaluation results and the Online Service will be finalized by Q1 2014.