
3 Factors to Consider for Translation Memory Alignment

Written by the SimulTrans Team | June 14, 2022

Translation Memory (TM) systems are used in the translation and localization industry as an invaluable means of re-using or "leveraging" existing translated text in new projects. In other words, a translation memory is built from previous translations so that they can be used again.

Every translation carried out during a translation project is uploaded into a translation memory database. These translations are then available to the linguist whenever needed, as either 100% matches or fuzzy matches. This saves both time and money and helps maintain consistency across platforms and versions.
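To make the difference between a 100% match and a fuzzy match concrete, here is a minimal sketch in Python. It assumes a character-based similarity score from the standard library's `difflib`; commercial TM systems use their own scoring methods, so the numbers are only illustrative. The function names and the two-entry memory are hypothetical.

```python
from difflib import SequenceMatcher


def match_percent(new_segment: str, stored_segment: str) -> int:
    """Return a rough 0-100 similarity score between two source segments."""
    ratio = SequenceMatcher(None, new_segment, stored_segment).ratio()
    # Truncate instead of rounding so that only identical text can score 100.
    return int(ratio * 100)


def best_match(new_segment: str, memory: dict[str, str]):
    """Return (score, stored_source, stored_translation) for the closest
    entry in the memory, or None if the memory is empty."""
    if not memory:
        return None
    source = max(memory, key=lambda stored: match_percent(new_segment, stored))
    return match_percent(new_segment, source), source, memory[source]


# Hypothetical two-entry memory (English source -> French translation).
memory = {
    "Save the file.": "Enregistrez le fichier.",
    "Close the window.": "Fermez la fenêtre.",
}

exact = best_match("Save the file.", memory)   # identical text: a 100% match
fuzzy = best_match("Save the files.", memory)  # one extra letter: a fuzzy match
```

In this sketch the linguist would receive the stored French translation either way, but the fuzzy match signals that the suggestion needs editing before it is used.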

What is translation memory alignment?

Simply put, translation memory alignment is when a linguist takes a source file (for example, English) plus its corresponding translation (for example, French) and matches the segments to each other using a tool such as Trados or Memsource. This task builds up a repository of translation units (TUs), which are then saved as a Translation Memory that can be used in your future translation projects.
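As an illustration of what a saved set of translation units can look like, the sketch below writes matched segment pairs to TMX, an XML-based TM exchange format. It is a minimal example built with Python's standard library, not the output of any particular alignment tool; the function name, the header values, and the sample sentences are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, source_lang="en", target_lang="fr"):
    """Build a minimal TMX document from (source_text, target_text) pairs."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "alignment-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source_text, target_text in units:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


tmx_text = build_tmx([
    ("Save the file.", "Enregistrez le fichier."),
    ("Close the window.", "Fermez la fenêtre."),
])
```

Each `tu` element holds one source segment and its translation, which is the unit a TM system looks up when it searches for matches.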

Your legacy material (previous translations) may not be available in TM format. The two most popular file types in the industry are XLIFF and TMX, both of which are XML files. There could be several reasons for this.

Reasons:

  1. Translations were done by in-country offices that don't have access to TM systems.
  2. Translations were done by a linguistic vendor who has not delivered a TM as part of their handoff.
  3. Translations were done by a translation vendor, but the quality was not good. Changes were made to improve the quality, but only in the translated files and not in the TM provided.

Does this mean that any existing work is lost, and you have to start from scratch? Not at all – in these cases, we can create TMs from your legacy text using a process called "alignment."

The initial alignment is done using one of the many automated alignment tools on the market. A set of source files and target files are loaded into the tool and linked based on the filenames. An automatic alignment is then run on each file pair. 
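The filename-based linking step can be sketched as follows. This is a simplified illustration that assumes the source and target files sit in separate folders and share identical file names; real alignment tools may apply their own naming rules. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def pair_by_filename(source_paths, target_paths):
    """Pair source and target files that share a file name.

    Returns (pairs, unmatched): pairs is a list of (source, target) paths,
    and unmatched lists the file names found on only one side.
    """
    sources = {PurePosixPath(p).name: p for p in source_paths}
    targets = {PurePosixPath(p).name: p for p in target_paths}
    shared = sorted(sources.keys() & targets.keys())
    pairs = [(sources[name], targets[name]) for name in shared]
    unmatched = sorted(sources.keys() ^ targets.keys())
    return pairs, unmatched


pairs, unmatched = pair_by_filename(
    ["en/guide.docx", "en/faq.docx", "en/release-notes.docx"],
    ["fr/guide.docx", "fr/faq.docx"],
)
# pairs holds the faq and guide file pairs; release-notes.docx has no
# translation in this example, so it is reported in unmatched.
```

Listing the unmatched files up front is useful because a source file without a translation cannot be aligned at all.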

The alignment tool looks at the structure of both the source and target files and, sentence by sentence, matches the source text with its probable translation.

Alignment tools have become ever more sophisticated over the years, and the results of the automated process are usually excellent.

Some tools will also generate a report with a quality score based on internal algorithms to indicate how successful the alignment was.

It is recommended that the automated alignment project is then sent to a linguist for verification. The linguist will work through each segment, approving the correct matches and fixing (or deleting where necessary) incorrect matches.

Example: 

An incorrect match can occur when two English source sentences were translated as a single German sentence so that the translation flows correctly. The alignment tool may not recognize this, and the matches from that point on are then out of sync.

However, once the linguist makes the change needed, they can re-run the automatic alignment from that point onwards, which will update any incorrect matches.
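The example above can be reproduced with a deliberately naive sketch that pairs segments strictly in order. Real alignment tools use more robust heuristics, so this only illustrates how a two-to-one translation shifts every later match and how a single correction restores them. The function names and sample sentences are hypothetical.

```python
def pair_in_order(source_segments, target_segments):
    """Naively pair the first source with the first target, and so on."""
    return list(zip(source_segments, target_segments))


def merge_source(source_segments, index):
    """Join source segment `index` with the one after it (a 2:1 correction)."""
    merged = source_segments[index] + " " + source_segments[index + 1]
    return source_segments[:index] + [merged] + source_segments[index + 2:]


source = [
    "Open the file.",
    "Then select the text.",
    "Click Save.",
    "Close the window.",
]
# The first two English sentences were translated as one German sentence.
target = [
    "Öffnen Sie die Datei und markieren Sie den Text.",
    "Klicken Sie auf Speichern.",
    "Schließen Sie das Fenster.",
]

naive = pair_in_order(source, target)
# naive[1] wrongly pairs "Then select the text." with "Klicken Sie auf Speichern."

fixed = pair_in_order(merge_source(source, 0), target)
# After merging the first two source sentences, the later pairs line up again.
```

This mirrors the linguist's workflow: fix the one segment where the counts diverge, then re-run the pairing from that point onwards.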

Once this is complete, the approved segments from all files are exported to a TM format of the client's choice and are ready for use. There may be cases where it is more expedient just to run the automatic alignment and generate the TM without performing the linguistic checks.

Perhaps the legacy material is very straightforward, containing short sentences which are very easily matched. In this instance, it is possible to apply an alignment penalty to any matches from that TM. 

As the linguist works on a new translation using this TM, any matches have a certain percentage deducted automatically. So if the penalty is set to 10%, any 100% matches become 90% matches, and they are tagged as "aligned" for the linguist's reference. The 90% matches can then be edited where necessary and you can be sure of good quality results.
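The penalty arithmetic described above is simple enough to sketch in a few lines. This assumes the penalty is subtracted as percentage points, as in the 100% to 90% example; the function and parameter names are hypothetical and do not correspond to any specific TM tool's settings.

```python
def apply_alignment_penalty(match_percent, penalty=10, from_aligned_tm=True):
    """Deduct the penalty (in percentage points) from matches that come
    from an aligned, unverified TM. Other matches are left unchanged."""
    if not from_aligned_tm:
        return match_percent
    return max(0, match_percent - penalty)


exact_from_aligned = apply_alignment_penalty(100)  # 100% match becomes 90%
fuzzy_from_aligned = apply_alignment_penalty(85)   # 85% match becomes 75%
exact_from_verified = apply_alignment_penalty(100, from_aligned_tm=False)
```

Because no match from the aligned TM can reach 100%, the linguist is always prompted to check those segments rather than accept them automatically.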

What factors should you consider before aligning translations?

There are several factors that will help to improve the alignment results:

  • The source and target files are in the same format. For example, an InDesign file and a Word file are processed slightly differently by a Translation Memory system. Formatting and variable information is converted to tags in the file for translation. The alignment tool can use these tags as a guide, but if they differ between the source and target files, it may not match the segments as well as it could.
  • The source and target files are the same version. A source file may be updated to include extra information, or to delete redundant text, after the initial translation is done. If the translated file is not updated to match, the alignment process is more complex.
  • The translated files are of good linguistic quality. The alignment process, as a rule, doesn't include a linguistic review of the existing translations, so it is essential that the client is happy with the quality of these translations. The linguist can review the files as they go, but this increases the time needed to complete the task.

How long does translation memory alignment take?

The time needed is calculated based on the number of words in the source files and the files' format. For a set of MS Word files, we would estimate that 40,000 words a day can be aligned, whereas this is reduced to 20,000 to 25,000 words a day for other formats such as InDesign or FrameMaker. This includes the automatic alignment and the linguistic check.
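As a rough worked example of these figures, the sketch below turns a word count into an estimated number of working days. It uses only the daily throughput numbers quoted above and ignores the other factors that affect an estimate; the function name and the format keys are hypothetical.

```python
# Words aligned per day as (low, high), taken from the estimates above.
WORDS_PER_DAY = {
    "word": (40_000, 40_000),
    "indesign": (20_000, 25_000),
    "framemaker": (20_000, 25_000),
}


def estimate_days(word_count, file_format):
    """Return (shortest, longest) estimated working days for an alignment."""
    low, high = WORDS_PER_DAY[file_format.lower()]
    return word_count / high, word_count / low


word_estimate = estimate_days(100_000, "Word")          # (2.5, 2.5)
indesign_estimate = estimate_days(100_000, "InDesign")  # (4.0, 5.0)
```

So a 100,000-word set of Word files would take about two and a half days, while the same volume in InDesign would take four to five days.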

However, the factors mentioned above will also need to be considered when providing an estimate for this.

How much does translation memory alignment cost?

That depends on the size of the project. Typically, this type of alignment work is charged by the hour, so we would need to review your files and the language combination before providing a quote.


This blog post was updated in 2022.