October 14, 2016 / by Humberto Farrera-Athié Estimated read time: 5 minutes

Save Time and Money by Sending Source Files

PDFs are one of the most popular and convenient ways to disseminate information in the increasingly digital world. They can be opened in practically any desktop or mobile environment and support almost all modern languages.

Every month, SimulTrans receives dozens of customer requests asking how to translate a PDF. Some of those PDFs are scanned documents whose original sources are not available to our clients, so they wonder how to translate from a PDF or whether we have a PDF translator.

However, in some cases, those PDFs are generated from desktop publishing applications like InDesign, Quark, or FrameMaker, or are the output of Content Management Systems.

While we can provide quotes based exclusively on PDFs, there are several advantages to providing the actual source files for analysis rather than PDFs only.

Quotes based on source files are more accurate than those based on PDFs

PDFs are an output rather than an actual source, so their quality can vary hugely. Very few Computer-Assisted Translation (CAT) tools can translate PDFs in their native format. For those that can, the resulting translated document can be hit or miss because:

any non-selectable section may not be included in the scope
graphics are usually excluded
small print cannot be rendered correctly

To provide a more precise project scope to base a quote on, it is possible to convert the PDF into a localization-friendly format like Word.

Using advanced Optical Character Recognition (OCR) converters, SimulTrans can extract the localizable content (including graphics and screenshots) and provide industry-standard logs and Desktop Publishing (DTP) times.

However, even though these log files can provide a fair idea of the scope (word count, for example), they are not entirely accurate. And the longer and more complex the PDF, the more variations we can expect.

Why does this happen?

Readable Content

Not even the most advanced OCR can convert low-resolution text correctly in a PDF created with less-than-optimal quality, so variations in the word count are inevitable when compared to the analysis of actual sources.

For instance, a simple heading like "User Manual" could be rendered as "Us er Man ua l", increasing the word count from 2 to 5. The impact could be bigger if the source language of your files is not English.
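To make the arithmetic in the heading example concrete, here is a minimal Python sketch of how a simple whitespace-based word count reacts to fragmented OCR output. The function name count_words is hypothetical, and real CAT tools apply more elaborate counting rules, so this only illustrates the effect.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a rough stand-in for a word count)."""
    return len(text.split())


clean_heading = "User Manual"
ocr_heading = "Us er Man ua l"  # fragmented rendering from the example above

print(count_words(clean_heading))  # 2
print(count_words(ocr_heading))    # 5
```

The same fragmentation repeated across a long document is what makes the totals drift away from an analysis of the actual sources.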

Graphic Content Ambiguity

It is not always possible to tell which graphics in a PDF are editable. In a low-resolution PDF, most graphic text will not be selectable and will have to be treated as non-editable. It may also be impossible to determine whether source graphics are available, which could unnecessarily increase the estimated costs.

Reusable Content

Text that repeats across several pages of a document can be defined as headers, footers, and/or cross-references in most new-generation desktop publishing applications.

This means that analyzing the sources will only count this text once; however, when analyzing a PDF, it could be counted repeatedly, inflating the word count.
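The difference can be sketched with a small Python example. The three-page document, its header and footer text, and the count_words helper are all hypothetical, chosen only to show how per-page copies of reusable text inflate a count taken from the PDF.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a rough stand-in for a word count)."""
    return len(text.split())


# Hypothetical three-page document: the body differs, header and footer repeat.
header = "Product X User Manual"
footer = "Confidential"
bodies = [
    "Install the device.",
    "Connect the power cable.",
    "Press the start button.",
]

# PDF analysis: each page carries its own copy of the header and footer.
pdf_count = sum(
    count_words(header) + count_words(body) + count_words(footer)
    for body in bodies
)

# Source analysis: the header and footer are defined once and reused.
source_count = (
    count_words(header)
    + count_words(footer)
    + sum(count_words(body) for body in bodies)
)

print(pdf_count)     # 26
print(source_count)  # 16
```

In this sketch the gap grows with every additional page, since each one adds another copy of the header and footer to the PDF count.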

Why do source files matter?

Providing the full set of source files from your publishing application allows SimulTrans to detect potential issues with them, such as translation-unfriendly formatting (for example, line breaks or indentations that result in unneeded segmentation), missing graphics or fonts, or other localization issues.

This way, we can resolve potential localization issues upfront at the analysis stage and avoid surprises towards the end of a project that could impact the timeframe and cost.

So remember, sending your full set of source files, fonts, and graphics for your document translation project is the best way to get an accurate proposal and schedule.

Topics: Documentation Translation, Localization Technology, Translation Best Practices, Article

Written by Humberto Farrera-Athié

Having worked in the Information Technology, Software and Localization industries on both sides of the Atlantic since 1995, Humberto has gained extensive expertise in internationalization, localization, languages and computer-assisted translation tools. As the Localization Solutions manager at SimulTrans Ireland, he provides scalable localization solutions for new and existing clients and continuous technical and process support to clients and managers. Humberto has a bachelor's degree in International Relations and a master's degree in Information Technology Systems from the Tecnológico de Monterrey in Mexico.
