SimulTrans Localization Blog: SimulTips

Ingredients for a successful Machine Translation project

February 2, 2016 / by Eli Karpodini

The most widely used types of Machine Translation (MT) are statistical MT and rule-based MT. The former is based on probabilistic translation models, while the latter relies on grammatical information and dictionaries. Both types are now available as commercial products and can be integrated into CAT tools, saving both time and money in your translation projects.



What to MT or, in other words, what to cook?
Before engaging in an MT project, first check whether your text domain is suitable for MT. For the most part, texts with technical, laconic language and user-generated content yield the best results.

The second thing to investigate is the ROI you can expect from MT. The answer depends on the MT training resources you have at your disposal and, of course, on the prospective volume of translations. If your domain is suitable and the expected ROI is positive, here are some useful tips on how to build a good MT engine.

 

Essential ingredients 
Some basic ingredients for the creation of a customized MT engine are:
1.    An MT system, either in the form of a cloud-based platform (like KantanMT or Microsoft Translator Hub, MSTH) or a stand-alone piece of software (like the command-line tool Moses)
2.    Bilingual resources: translation memories (TMs), dictionaries, parallel corpora, and term bases
3.    A lot of patience!

 

Tips for your perfect recipe
Aside from these necessary ingredients, some useful tips to enhance your MT engine’s performance are the following:
·    Use monolingual resources (monolingual corpora, Do-not-Translate (DNT) lists)
·    Gather as much data as possible. As a rule of thumb, the more data you feed into your engine, the better it will perform.
·    Not just any data will do, though. The data you feed into your engine needs to be representative of the domain in question.
·    Keep your data clean. Your training data should be typo-free.
·    Keep your data clear. Refrain from using colloquialisms, idioms, and slang.
·    If possible, pre-edit your data to maximize results.
·    Post-edit your MT output and re-feed it into the engine.
·    If your domain allows, consider the use of a controlled language.
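
As a rough illustration of the "keep your data clean" and "pre-edit your data" tips, the Python sketch below tidies a list of bilingual segment pairs before they are fed into an engine. It is a minimal sketch under stated assumptions, not a feature of KantanMT, MSTH, or Moses: the function name `clean_segment_pairs`, the `max_length_ratio` parameter, and its default of 3.0 are hypothetical choices made for this example. It does not check spelling, so typos still need a separate pass.

```python
import re


def clean_segment_pairs(pairs, max_length_ratio=3.0):
    """Return cleaned (source, target) pairs in first-seen order.

    Hypothetical helper for illustration only; the 3.0 ratio is an
    assumed threshold, not a recommended value.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Collapse runs of whitespace and trim both ends.
        source = re.sub(r"\s+", " ", source).strip()
        target = re.sub(r"\s+", " ", target).strip()

        # Drop pairs where either side is empty.
        if not source or not target:
            continue

        # Drop pairs whose lengths differ wildly; they are often misaligned.
        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > max_length_ratio:
            continue

        # Drop exact duplicates (after whitespace normalization).
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


if __name__ == "__main__":
    sample = [
        ("Press the  power button.", "Appuyez sur le bouton d'alimentation."),
        ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
        ("Open the file.", ""),
    ]
    for source, target in clean_segment_pairs(sample):
        print(source, "|", target)
```

A filter like this would typically run on segments exported from your TMs or parallel corpora; the right length-ratio threshold depends on the language pair, so treat the default as a starting point to tune.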


How to serve?
Last but not least, always keep in mind that some post-editing might be required. How much depends on the final aim of the translated text. Light post-editing might suffice for gisting purposes, while full post-editing would be required to attain publishable quality.


And for dessert...

To sum up, there are many variables that need to be taken into account when planning the deployment of your customized engine. The best way to know if machine translation technology is working for you is to check regularly that you are not spending more time on it than it would take to translate from scratch.
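
The time check described here comes down to simple arithmetic, sketched below in Python. The function name `hours_saved_by_mt` and its parameters are hypothetical, and the hours you plug in have to come from your own project tracking; this is one possible way to frame the comparison, not a standard metric.

```python
def hours_saved_by_mt(scratch_hours, post_editing_hours, engine_upkeep_hours=0.0):
    """Return the hours saved by the MT workflow.

    A negative result means MT is costing more time than translating
    from scratch. All inputs are estimates supplied by the caller.
    """
    return scratch_hours - (post_editing_hours + engine_upkeep_hours)
```

For instance, a job estimated at 40 hours from scratch that takes 25 hours of post-editing plus 5 hours of engine upkeep gives `hours_saved_by_mt(40, 25, 5)`, a saving of 10 hours. If the result turns negative, the engine or its training data probably needs another look.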

Written by Eli Karpodini

Elisavet Karpodini is a member of the locsolutions team at SimulTrans. Elisavet has studied Translation, International Relations and European Studies, and holds an MSc in Translation Technology.