SimulTrans Localization Blog: SimulTips

3 Factors to Consider for Translation Memory Alignment

September 22, 2016 / by Marie Harte


Translation Memory (TM) systems are used in the translation and localization industry as an invaluable means of re-using or “leveraging” existing translated text in new projects. Every translation carried out during a project is uploaded into a translation memory database.

These translations are then available to the linguist whenever needed, as either 100% matches or fuzzy matches. In this way, savings are made in both time and money, and consistency is maintained across platforms and versions.
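To make the idea of 100% and fuzzy matches concrete, here is a minimal sketch in Python. It scores a new sentence against the source side of a tiny TM using the standard library's `difflib`. This is an illustration only: commercial TM systems use their own scoring methods, and the function names `match_percent` and `best_match` are made up for this example.

```python
from difflib import SequenceMatcher


def match_percent(new_segment, tm_source):
    """Return a rough similarity score from 0 to 100 for two source segments."""
    ratio = SequenceMatcher(None, new_segment.lower(), tm_source.lower()).ratio()
    return int(ratio * 100)


def best_match(new_segment, tm):
    """Return (score, source, target) for the closest TM entry, or None if the TM is empty.

    `tm` is a list of (source, target) pairs.
    """
    if not tm:
        return None
    scored = [(match_percent(new_segment, src), src, tgt) for src, tgt in tm]
    return max(scored, key=lambda item: item[0])


tm = [
    ("Click Save.", "Klicken Sie auf Speichern."),
    ("Print the document.", "Drucken Sie das Dokument."),
]

print(best_match("Click Save.", tm))      # identical text: a 100% match
print(best_match("Click Save now.", tm))  # similar text: a fuzzy match below 100%
```

The linguist would accept the 100% match as it stands and use the fuzzy match as a starting point to edit.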

However, what do you do if your legacy material (previous translations) is not available in TM format?

There could be several reasons for this:

  1. Translations were done by in-country offices that don’t have access to TM systems.
  2. Translations were done by a linguistic vendor who has not delivered a TM as part of their handoff.
  3. Translations were done by a linguistic vendor but the quality was not good – changes were made to improve the quality but only in the translated files and not in the TM provided.

 

Does this mean that any existing work is lost and you have to start from scratch?

Not at all – in these cases we can create TMs from your legacy text using a process called “alignment”.

 

What is alignment?

Put simply, we take a source file and its corresponding translation and match the segments to each other. This builds up a repository of translation units, which are then saved as a TM for use in your future translation projects.

 

The alignment process

The initial alignment is done using one of the many automated alignment tools on the market. A set of source files and a set of target files are loaded into the tool and linked based on their filenames. An automatic alignment is then run on each file pair.
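As a rough illustration of linking files by filename, the sketch below pairs each source file with the target file that has the same name. It assumes the two sets live in separate language folders (the `en/` and `de/` paths are invented for the example); a real alignment tool may use different matching rules.

```python
from pathlib import PurePath


def pair_files(source_files, target_files):
    """Pair source and target files that share a filename stem (case-insensitive).

    Returns (pairs, unmatched_sources).
    """
    targets_by_stem = {PurePath(t).stem.lower(): t for t in target_files}
    pairs, unmatched = [], []
    for source in source_files:
        stem = PurePath(source).stem.lower()
        if stem in targets_by_stem:
            pairs.append((source, targets_by_stem[stem]))
        else:
            unmatched.append(source)
    return pairs, unmatched


pairs, unmatched = pair_files(
    ["en/guide.docx", "en/faq.docx"],
    ["de/guide.docx"],
)
print(pairs)      # files ready for automatic alignment
print(unmatched)  # source files with no translation to align against
```

Anything left in the unmatched list needs to be linked by hand or left out of the alignment.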

The alignment tools look at the structure of both the source and target files and match source text with probable translations, sentence by sentence.
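The sketch below shows one simple way sentence matching can work. It is a simplified heuristic based on sentence length, not the method of any particular tool. It assumes a translation is usually about as long as its source, and it allows two sentences on one side to be matched to a single sentence on the other. The `MERGE_PENALTY` value is an arbitrary assumption, and real tools also draw on tags, punctuation and other signals.

```python
MERGE_PENALTY = 0.2  # assumed extra cost for a 2:1 or 1:2 match


def align(src, tgt):
    """Align two lists of sentences, allowing 1:1, 2:1 and 1:2 matches."""
    inf = float("inf")
    n, m = len(src), len(tgt)
    # cost[i][j] is the best cost of aligning the first i source
    # sentences with the first j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                if i + di > n or j + dj > m:
                    continue
                s_len = sum(len(s) for s in src[i:i + di])
                t_len = sum(len(t) for t in tgt[j:j + dj])
                # Relative length difference: 0.0 means identical length.
                step = abs(s_len - t_len) / max(s_len, t_len, 1)
                if di != dj:
                    step += MERGE_PENALTY
                if cost[i][j] + step < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = cost[i][j] + step
                    back[i + di][j + dj] = (di, dj)
    if cost[n][m] == inf:
        raise ValueError("no alignment found with 1:1, 2:1 and 1:2 matches")
    # Walk back from the end to recover the matched pairs.
    units, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        units.append((" ".join(src[i - di:i]), " ".join(tgt[j - dj:j])))
        i, j = i - di, j - dj
    units.reverse()
    return units


english = [
    "Click File.",
    "Then click Save.",
    "The document is stored in the default folder.",
]
german = [
    "Klicken Sie auf Datei und dann auf Speichern.",
    "Das Dokument wird im Standardordner abgelegt.",
]
for source, target in align(english, german):
    print(source, "=>", target)
```

Here the first two English sentences are matched together against the single German sentence, so the last pair stays in sync.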

Alignment tools have become ever more sophisticated over the years and the results of the automated process are usually very good.

Some tools will also generate a report with a quality score based on internal algorithms to give an indication of how successful the alignment was.

 

Linguistic verification

It is recommended that the alignment project is then sent to a linguist for verification. The linguist will work through each segment, approving the correct matches and fixing (or deleting where necessary) incorrect matches.

 

Example:

An example of an incorrect match could be where two English source sentences were translated as a single German sentence so that the translation flows correctly. The alignment tool may not recognise this, and the matches from that point on are then out of sync.

 

However, once the linguist makes the change needed, they can re-run the automatic alignment from that point onwards, which will update any incorrect matches. Once this is complete, the approved segments from all files are exported to a TM format ready for use.
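As an illustration of the export step, the sketch below writes approved segments to TMX, a widely used TM exchange format. The choice of TMX is an assumption for this example, since the export format depends on the tool in use. The `approved` flag, the header values and the function name `build_tmx` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this attribute name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, src_lang="en", tgt_lang="de"):
    """Return a TMX document (as a string) containing only approved units."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        if not unit["approved"]:
            continue  # rejected or unchecked matches are left out of the TM
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, unit["source"]), (tgt_lang, unit["target"])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


units = [
    {"source": "Click Save.", "target": "Klicken Sie auf Speichern.", "approved": True},
    {"source": "Print the document.", "target": "Klicken Sie auf Datei.", "approved": False},
]
print(build_tmx(units))
```

Only the first unit reaches the TM, because the linguist did not approve the second one.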

 

There may be some cases where it is more expedient to just run the automatic alignment and generate the TM from that without performing the linguistic checks.

Perhaps the legacy material is very straightforward, containing short sentences which are very easily matched. In this instance, it is possible to apply an alignment penalty to any matches from that TM.

As the linguist works on a new translation using this TM, any matches have a certain percentage deducted automatically. So if the penalty is set to 10%, any 100% matches become 90% matches and they are tagged as “aligned” for the linguist’s reference. The matches can then be edited where necessary and you can be certain of good quality results.
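The arithmetic of the penalty can be sketched in a few lines. This assumes the penalty is subtracted as percentage points, as in the 100% to 90% example above, and that each match carries a label saying where it came from. The `"aligned"` and `"translated"` labels and the function name are illustrative.

```python
def apply_alignment_penalty(score, origin, penalty=10):
    """Deduct `penalty` points from matches that come from an aligned TM.

    Returns (adjusted_score, origin). Other matches are left unchanged.
    """
    if origin == "aligned":
        return max(score - penalty, 0), origin
    return score, origin


print(apply_alignment_penalty(100, "aligned"))     # (90, 'aligned')
print(apply_alignment_penalty(100, "translated"))  # (100, 'translated')
print(apply_alignment_penalty(85, "aligned"))      # (75, 'aligned')
```

Because a penalised match can never reach 100%, the linguist always looks at it before it goes into the new translation.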

 

Factors to consider

There are a number of factors which will help to improve alignment results:

 

  • Source and target files have to be in the same format

For example, an InDesign file and a Word file are processed slightly differently by a Translation Memory system. Formatting/variable information is converted to tags in the file for translation. The alignment tool can use these tags as a guide but if they are different in the source and target files, it may not match the segments as well as it could.

 

  • The source and target files are the same version

A source file may be updated to include extra information or delete redundant text after the initial translation is done. If the translated file is not updated to match, the alignment process is more complex.

 

  • The translated files are of good quality

The alignment process as a rule doesn’t include a linguistic review of the existing translations. So it is important that the client is happy with the quality of these translations. It is possible for the linguist to review the files as they go but this impacts on the time needed to complete the task.

 

How long does alignment take?

The time needed is calculated based on the number of words in the source files and the format of the files. We would estimate that, for a set of MS Word files, 40,000 words a day can be aligned, whereas this drops to 20,000 to 25,000 words a day for other formats such as InDesign or FrameMaker. This includes the automatic alignment and the linguistic check.

However, the factors mentioned above will also need to be taken into account when providing an estimate.
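As a rough guide, the daily figures above can be turned into a simple calculation. The sketch below uses only those throughput numbers and returns a best-case and worst-case number of days. It does not adjust for file quality or version mismatches, and the function name `estimate_days` is made up for this example.

```python
# (slowest, fastest) words per day, taken from the figures quoted above.
WORDS_PER_DAY = {
    "word": (40_000, 40_000),
    "indesign": (20_000, 25_000),
    "framemaker": (20_000, 25_000),
}


def estimate_days(word_count, file_format):
    """Return (best_case_days, worst_case_days) for aligning `word_count` words."""
    slowest, fastest = WORDS_PER_DAY[file_format.lower()]
    return word_count / fastest, word_count / slowest


print(estimate_days(100_000, "Word"))      # (2.5, 2.5)
print(estimate_days(100_000, "InDesign"))  # (4.0, 5.0)
```

So 100,000 words of MS Word files would take about two and a half days, while the same volume in InDesign would take four to five.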

 

How much does it cost?

That depends on the size of the project. Typically, this type of alignment work is charged by the hour.


Written by Marie Harte

Marie has worked in the localization industry for almost 20 years, specializing in translation memory tools. As part of the Localization Solutions team at SimulTrans, she is involved in file analysis and translation memory maintenance, providing support to vendors and clients alike throughout the localization cycle. Marie has an Honors Bachelor’s degree in Applied Languages (French and German) from Dublin City University (DCU).