Translation and Localization Resources | SimulTrans

Pitfalls of Post-Editing Machine Translated Content

Written by Alexandre Rallo | August 31, 2021


The newest Machine Translation (MT) engines have left their rules-based and statistical days behind in favor of neural technology. Because Machine Translation engines can be trained, they can evolve to suit particular projects. However, more and more customers are requesting post-editing to ensure near-human quality and consistency across projects. Here are some of the most common pitfalls a post-editor might encounter and how to solve them.

Balance

[The juxtaposition in writing of syntactically parallel]

The first pitfall of post-editing is finding the right balance. A post-editor should know when to edit something and when to leave it alone; they should be neither lazy nor overzealous. It all depends on the project specifics:

  • What kind of text are we working on? Is it a simple "How-To guide" about easy-to-use software?
  • Are the sentences generally short and straightforward?
  • What quality is the client expecting?

The answers determine how much editing is appropriate. Over-editing should be avoided because it slows down a project’s completion, while under-editing risks jeopardizing quality.

Accuracy

[Freedom from mistake or error]

Once the right balance has been found, linguists need to look out for other challenges, among them deceptive accuracy.
What I call deceptive accuracy occurs when the Machine Translation output seems accurate, but actually introduces an inconsistency within the project.

Here’s an example using French MT output:

EN source: Restart the computer the device is connected to before updating it.

FR MT output: Redémarrez l’ordinateur auquel l’appareil est connecté avant de le mettre à jour.

At first glance, there is no need to post-edit this sentence and the linguist should be able to confirm the segment and move on to the next one. But terminological consistency is key, and we need to make sure the client’s preferences are adhered to at all times.

It might just be the case that, for this account, the customer’s in-house French reviewer has asked for “device” to be translated as “dispositif” rather than “appareil”. “Device” is such a common word that the MT is likely to mistranslate it from time to time, even if the engine has been trained.

Post-editing a project can be a long process, and such inaccuracies are easy to miss. A glossary check during the QA phase can help catch such terminological mistakes.
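As a rough illustration, a glossary check of this kind can be sketched in a few lines of Python. This is a minimal sketch, not a description of any particular QA tool: the function name, the glossary format (source term mapped to approved target term), and the second segment pair are assumptions made up for the example.

```python
import re

def find_glossary_violations(segments, glossary):
    """Return (segment index, source term, approved target term) for every
    segment whose source contains a glossary term but whose target does not
    contain the approved translation."""
    violations = []
    for index, (source, target) in enumerate(segments):
        for source_term, target_term in glossary.items():
            # Whole-word, case-insensitive match on the source side.
            in_source = re.search(
                rf"\b{re.escape(source_term)}\b", source, re.IGNORECASE
            )
            # Plain substring match on the target side (no morphology handling).
            in_target = target_term.lower() in target.lower()
            if in_source and not in_target:
                violations.append((index, source_term, target_term))
    return violations

# Hypothetical client glossary: "device" must be rendered as "dispositif".
glossary = {"device": "dispositif"}

segments = [
    ("Restart the computer the device is connected to before updating it.",
     "Redémarrez l’ordinateur auquel l’appareil est connecté avant de le mettre à jour."),
    # Hypothetical second segment that already follows the glossary.
    ("Then, restart the device.",
     "Ensuite, redémarrez le dispositif."),
]

print(find_glossary_violations(segments, glossary))
# → [(0, 'device', 'dispositif')]
```

A simple string match like this will miss inflected or plural source forms, so it works best as a safety net alongside the post-editor’s own attention rather than as a replacement for it.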

Syntax

[The way in which linguistic elements (such as words) are put together to form constituents (such as phrases or clauses)]

Tags are very useful and commonly used placeholders that are replicated from source to target. Unfortunately, they sometimes get in the way of machine translation by breaking up the syntax of the source sentence. Things are improving, but small issues still pop up every now and then.

This is what a tag-related issue might look like:

EN source: Click on <b>Open</b> [TAG1] > [TAG2] > OK to complete the process.
FR MT output: Cliquez sur <b>Ouvrir</b> [TAG1] > [TAG2] > D’accord de finir le processus.

See how the source syntax is broken up by the tags and arrows. The MT might think that “OK to complete the process” is a separate phrase that should be treated independently, when in fact it is the end of the segment as a whole. This needs to be fixed by the post-editor.
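A QA step cannot judge the broken syntax itself, but it can confirm that the placeholders survived and flag tag-heavy segments for a closer look. The Python sketch below assumes placeholders look like the ones in the example above (`<b>`-style inline tags and `[TAG1]`-style markers); the pattern, the function name, and the shape of the report are all hypothetical.

```python
import re

# Assumed placeholder shapes: inline tags such as <b> or </b>,
# and bracketed markers such as [TAG1].
TAG_PATTERN = re.compile(r"</?\w+>|\[TAG\d+\]")

def check_tags(source, target):
    """Compare the placeholders found in a source segment and its translation."""
    source_tags = TAG_PATTERN.findall(source)
    target_tags = TAG_PATTERN.findall(target)
    return {
        # Tags present in the source but absent from the target.
        "missing": [tag for tag in source_tags if tag not in target_tags],
        # Tags present in the target but absent from the source.
        "extra": [tag for tag in target_tags if tag not in source_tags],
        # True when both sides contain the same tags in the same order.
        "same_order": source_tags == target_tags,
        # Any segment containing tags is worth a second look by the post-editor.
        "needs_review": len(source_tags) > 0,
    }

source = "Click on <b>Open</b> [TAG1] > [TAG2] > OK to complete the process."
target = "Cliquez sur <b>Ouvrir</b> [TAG1] > [TAG2] > D’accord de finir le processus."

report = check_tags(source, target)
print(report)
```

For this segment the report shows no missing or extra tags, so an automated tag check alone would let it through. Only the `needs_review` flag sends it back to the post-editor, who still has to repair the wording around the tags.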

Fluency

[The ability to speak or write a foreign language easily and accurately]

Syntactic fluency is a tricky challenge because it can be linked to the balance between under-editing and over-editing I mentioned earlier. Often, the MT output will look perfectly understandable, even though it stays too close to the source syntax to be considered fluent in the target language.

Here’s another French example:

EN source: Please wait while the files are being updated. Then, restart the device.
FR MT output: Veuillez patienter pendant que les fichiers sont mis à jour. Ensuite, redémarrez l’appareil.
Post-edit: Attendez que les fichiers soient mis à jour, puis redémarrez l’appareil.

Here, the MT output is not wrong, strictly speaking, but its style makes it clear that the sentence was translated word for word from the English source, so it does not read fluently. Unless the project requires only light post-editing, linguists might want to fix this type of issue, because it could hamper fluency and introduce stylistic inconsistencies.

This blog post covers just some of the challenges related to post-editing machine-translated business documentation. Though today’s MT engines are more and more powerful, linguists must ensure quality and consistency at all times, regardless of the quality of the raw MT output. While deceptive accuracy and tag-related issues can be avoided by implementing stringent Quality Assurance checks, syntactic fluency can only be corrected by the post-editor.
