 Localizing Multimedia

Summary

In June, the SimulTrans Localization Seminar Series featured two speakers: Glyn O'Leary, Managing Director of SimulTrans' Dublin subsidiary, and Pat Wylie, Localization Manager for Humongous Entertainment in Washington.

O'Leary covered current multimedia localization issues, including:

  • Voice and sound reproduction
  • Cultural issues in multimedia
  • Reasons for high-cost structures
  • Localization methods and tools

Wylie addressed more specific concerns for multimedia localization, such as:

  • Proprietary languages and tools
  • Multimedia localization kits
  • Integration and testing
  • Casting and character names

Glyn O'Leary

O'Leary has ten years of localization experience. Prior to joining SimulTrans, he held senior management positions at Lotus Development Ireland, IDOC Europe, and Measurex. He holds a degree in computer science and mathematics from University College Cork.

Our interest in multimedia localization primarily stems from the fact that we have so much of it to do. Because so many of our clients come with so many diverse requirements, we've had to become a sort of "jack-of-all-trades."

I've been dealing with development people for 10 years, and I find myself saying now to multimedia developers what I said to Lotus development in 1988. I'm trying to teach them the same types of things and get them to follow the same types of rules. The complexity of multimedia development from a localization point of view is far greater than regular apps.

Routinely, if you have a Windows app that needs to be localized, you'll have a set of resources and associated code with call-outs. During localization, this is the main file you'll work with. You will get all of the text, screen items, the re-sizing information. You can apply tools from various companies, work on the main file, and put it back together. Seldom will you even have to run a build.

One big, complicated puzzle

In multimedia localization, there is text, graphics, sound, video, etc. It's one big puzzle that has to be taken apart and often put together again almost from scratch.

People often think of a multimedia application as something with text and sound. But multimedia is a significant combination (three to five) of text, graphics, sounds, animation, and video.

Multimedia localization is a slow process. It's complicated and, technically, very challenging. It can also be very costly. Based upon the size of the applet, it costs five times as much as routine localization. It is also subject to mavericks, which means two companies seldom do the same thing, and two developers in the same company seldom do the same thing. That means every time you take the puzzle apart, it's a brand new task. A company that produces numerous products and has them developed by third-party developers or different teams within its own company creates a different puzzle each time for you to take apart.

There is no set way of developing a multimedia application. Most localization vendors do not treat it as a niche area, in contrast with the publishers. It's important to realize that this technology matters to publishers. If a publisher senses it doesn't matter to you, if you haven't established a multimedia section, it won't connect with you.

Management at multimedia companies is seldom interested in details. They want to know how much it's going to cost, what the complications are likely to be. Tell them it will be costly, slow, and technically challenging. Often, if you try to contact the developer, he or she has moved on. The actual engine they used to develop things can either be homemade or based loosely on some commercially available engine. The developer may also simply not want to offer specific information, for whatever reason.

Leveraging: a lost art

Few efficiencies can be gained from experience. With a routine Windows app, when different versions evolve you would expect to take advantage of some efficiencies. You already understand how the app builds, how it installs, etc. You can leverage the text. This rarely happens in multimedia because what you get back is the very latest technology that was developed with a different version of the engine, or a brand-new engine, or a brand-new developer.

English versus United States market competitiveness means further complications. U.S. marketing and developers first egg each other on with new features, resulting in new additions costing thousands of dollars. The cool, new features then don't play well in France. You go from localizing an English product for France (or Germany or Italy or Spain), to a situation where you're nearly developing a French product from scratch for reasons I'll go into later. For example, the developer may decide to use Mickey Mouse waving the U.S. flag, which means nothing in France. You will need to come up with the equivalent Mickey Mouse doing something similar in France. The localization task may have started out being a simple matter of translation, re-engineering, and putting the puzzle back together. Now it becomes a matter of developing a new product for that market--which carries associated costs.

Sound and video carry further complications. Italians would rather see Italians speaking, not a dubbed version. Furthermore, there are no neutral French or German accents; people may or may not like the accent you choose (it's too industrial, too artistic, etc.). Complications become enormous once you cross the Atlantic.

The cost for people putting their voices to products is astronomical, sometimes $3,000 to $3,500 a day. You may then only get two hours work because they'll spend the other six telling you what is wrong and how things should be rewritten.

Engines pose other problems. They're numerous and specific. If you're doing 100 multimedia products, you'll probably come across 100 variations of engines. Text is embedded everywhere.

Often an app is developed on Macintosh and then ported to Windows. It looks good in the development environment but not in the running environment. Or it doesn't look good when you go to French Windows. Often, the developer of the English product will note that his largest Help pop-up message contains four lines of text. He'll then make that one pop-up box the maximum standard. When it is translated into German, the text expands by 35% and runs out of the box. Developers should always allow for text expansion.
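
The expansion allowance described above can be checked mechanically. The following is a minimal Python sketch, not something shown in the talk: the function names and the idea of measuring a pop-up box in characters are assumptions for illustration, and the 35% default is simply the German figure quoted above.

```python
def required_capacity(source_chars: int, expansion_pct: int = 35) -> int:
    """Return the character capacity a box needs once the text is translated.

    Integer arithmetic, rounded up: a 10-character string with 35%
    expansion needs 14 characters, not 13.
    """
    return (source_chars * (100 + expansion_pct) + 99) // 100


def overflows(translated: str, box_capacity: int) -> bool:
    """True if the translated string no longer fits in the box."""
    return len(translated) > box_capacity
```

Sizing every box with required_capacity(len(longest_english_message)) instead of the bare English length leaves headroom, although a real layout would measure rendered pixel width rather than character count.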

String recovery and comparison is another problem. When you translate a string, it may look fine on the screen. However, an app may ask you a question as part of a computer-based training method. You give your answer and it is compared to a string embedded deep in the code. You wind up with functionality problems.
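
One way to avoid this kind of mismatch is to treat the expected answer as a localizable resource, kept beside the prompt it belongs to. The sketch below is hypothetical: the string table, its keys, and the sample quiz text are invented for illustration and do not come from any product mentioned in the talk.

```python
# Hypothetical string tables: the expected answer lives next to the
# on-screen prompt, so both are translated together.
STRINGS = {
    "en": {
        "quiz.q1.prompt": "Is the file saved? (yes/no)",
        "quiz.q1.answer": "yes",
    },
    "de": {
        "quiz.q1.prompt": "Ist die Datei gespeichert? (ja/nein)",
        "quiz.q1.answer": "ja",
    },
}


def check_answer(locale: str, question: str, user_input: str) -> bool:
    """Compare the user's reply with the answer for the same locale."""
    expected = STRINGS[locale][f"{question}.answer"]
    return user_input.strip().casefold() == expected.casefold()
```

With the answer in the table, a German user typing "Ja" is accepted, whereas a comparison string left in English deep in the code would reject it.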

Care and feeding of graphics

All graphics should be stored in the industry standard, Photoshop EPS. That means that a file will contain various layers. From a localization point of view, you can translate layer by layer. Also, don't use background gradation; you need to standardize and retain palettes, otherwise expect a significant increase in time and money.

As for fancy text and fonts, the general rule is: The better it looks, the more costly it will be to replicate it. A localization engineer may need to take a graphic, cut out the English text, re-build it with translated text, re-create the same fonts, and fill in the various colors. It might take 1½ hours and cost $100 or more. And you could be dealing with hundreds of those screens.

Voices and sound effects

From a sound point of view, the more voices you have in the U.S. product, the more costly it's going to be going into Germany and France. There will be problems with voice variations and talent availability. Finding five Mickey Mouses is going to be five times harder than finding one. I know. I've been part of a team that's trying to find a Romanian Mickey Mouse. Often, the people translating the product are looking for a particular type of voice. You encounter some type of legacy. In this case, Disney had a movie out and we were trying to get that same voice. The man knew we were trying to get him, so he was extremely expensive. The more specific the voice is in the U.S. product, the more specific it will need to be in your target language--and the more difficult and costly.

Sound effects should be separate from voice. When music is playing and somebody is talking, make sure they're two different tracks, two different sounds. Then, when it comes to replicating it, you need only address the talking (and don't need to find the same music and blend it in).

Sound needs to expand in another language by as much as 50%. Taking sound into another language, say German, can completely throw off synchronization. What you end up doing is compressing the sound and dealing with the pitch.

From a video point of view, avoid lip sync video when possible. Stay away from head shots. Allow for expansion. We speak more slowly in Europe, so allow for more time for the same thing to be said before you move on to the next scene.

Use animation when possible. You don't have to sync in with anybody's lips. You just speak, take your time, and lay it over.

The single biggest issue

More about synchronization, the single biggest issue in multimedia. People paying for the localization project do not want to have to re-shoot expensive video. Instead, they ask you to use the same characters, just put in a different sound. Unfortunately, if we leave the video as is, and put German in its place, the characters finish speaking five seconds after the video. You'll need to speed up the (translated) sound while the video stays the same. If a German watches this video and it's important to him or her, the drop in sound quality will be very annoying.
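
The arithmetic behind that trade-off is simple enough to sketch. The helper names below are assumptions, and the clip lengths in the usage note are illustrative rather than measurements from a real project.

```python
def overrun_seconds(translated_seconds: float, slot_seconds: float) -> float:
    """How long the translated audio runs past the end of the video slot."""
    return max(0.0, translated_seconds - slot_seconds)


def speedup_factor(translated_seconds: float, slot_seconds: float) -> float:
    """Playback-rate multiplier needed to fit the audio into the slot.

    Returns 1.0 when the audio already fits, since there is no reason
    to slow it down.
    """
    if slot_seconds <= 0:
        raise ValueError("slot_seconds must be positive")
    return max(1.0, translated_seconds / slot_seconds)
```

For a 30-second English scene whose German recording runs 35 seconds, overrun_seconds gives 5.0 and speedup_factor gives about 1.17. At the 50% expansion mentioned earlier, the same scene would need the audio played 1.5 times faster, which is where the drop in sound quality becomes hard to hide.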

A note to publishers

You should weigh a neat U.S. product against huge international tasks. If it's graphic-rich, has interesting characters and funny sounds and uses 25 different voices, it's going to cost something astronomical if you want to do it properly in France. If you don't want to do it properly, quality issues arise and you're left wondering whether or not you should spend the money in the first place.

Animate if you can. Segregate and document your process. Keep the international parts in one place. If you have your text in a single file, all of your comparison strings, everything neatly defined, all of the sounds that need to be localized in a particular sub-directory, you can immediately identify the size of the job.
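
If the international parts really do live in one place, sizing the job can be as simple as counting files. This is a rough sketch under an assumed layout (a single localization root with one folder per asset type, such as text and sound); the layout and function name are hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory(loc_root: Path) -> dict[str, int]:
    """Count localizable files under each top-level folder of loc_root.

    A file sitting directly in loc_root is counted under its own name.
    """
    counts: Counter[str] = Counter()
    for path in loc_root.rglob("*"):
        if path.is_file():
            top_level = path.relative_to(loc_root).parts[0]
            counts[top_level] += 1
    return dict(counts)
```

Calling inventory(Path("loc")) on such a tree returns a mapping from folder name to file count, which is a first approximation of how much text, sound, and art needs attention.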

Plan for expansion. A film company may never plan to translate a movie; however, that's exactly the sort of project a localization company eventually confronts.

Avoid culture messages. Try and avoid cultural things that mean a lot but in a very isolated area. I call America an isolated area.

Closing words

If you must get into multimedia localization, employ the expertise that's needed. Teach localization to multimedia engineers; don't teach multimedia to localization engineers. It is possible to teach someone with sound or video skills what needs to be done for localization. However, it's difficult to teach someone who knows localization on a regular Windows product what it means to synchronize, etc.

Invest in the technology. Obviously, to do things like sound recording, you need to invest in machinery, software, recording studios, etc.

Get good partners. You won't be able to find every voice in France and Germany yourself. You need to partner with those who have these talents readily available.

Take the business very seriously because it is a very serious business for the publisher. A development company often treats its multimedia app like a baby: it's theirs, they've developed it, everything they've done in the last six months went into it. They want to hand it over to somebody who feels the same. They need you to understand sound, video, and the technology and effort that went into their product.

Pat Wylie

Wylie has served as both a programmer and manager for out-sourced localization projects. Prior to Humongous, he helped streamline the localization efforts at Cavedog Entertainment. He holds a degree in microbiology from the University of Washington.

Humongous Entertainment makes software games or multimedia games for kids ages 3 to 10, e.g., "Freddy Fish" and "Pajama Sam." Our multimedia consists of animation, talkies, music, and drawn, high-quality graphics. We make hybrid games for Windows 3.1 and 95 and for Macintosh.

I oversee the development cycle with the international market in mind. I encourage developers to look at U.S. products with a localization mind set. The development cycle is never long enough, and once the crunch mode starts, the localization process goes out the door.

I also oversee the engineering and testing of games. I started out at Humongous as a programmer so I understand developers. This helps me reach my end goal of making a good localized product. I also oversee packaging and some marketing.

Proprietary languages and tools

Humongous Entertainment uses a language called SCUMM, developed at LucasArts. It stands for Script Creation Utility for Maniac Mansion, or Scripting Utility for Maniac Mansion. It's a higher-level language based somewhat on C and allows people with little programming experience to build a multimedia game. We use this because it's very hard to find people to program games unless you pay top dollar. We're able to get people that will accept lower salaries for an entry-level job into the world of development.

Because it is a proprietary language, our code cannot leave the premises. This makes it difficult for a company such as SimulTrans to localize our products. Items that do go out include text for the talkies, art, and packaging. This calls for extensive preparation since our localization partners receive limited items. If our information is poorly thought out, it requires more time to localize.

Building a better kit

As a result, our localization kits are extremely important to the development of our localized products. We include everything other than the proprietary language, such as maps which reveal which parts of the game need to be isolated. We have character descriptions and scripts which allow the actors to get a feeling for what's going on with the characters, how they can suit their acting style and turn it into German or French, for example. We include detailed character descriptions and scripts, art and text, Help files, read me's, resource scripts for dialogue boxes, and packaging.

Past, present, and future

Humongous has localized some 10 games per year (not much when you consider most companies like Humongous produce hundreds of titles each year). We'll probably do 40 next year. In the past, we didn't use kits and we didn't have programmers who understood our language. We didn't have partners that knew anything about our language. All this led to major problems.

Our kits usually take one to two months to produce. They begin to take shape during the U.S. production cycle and, by the end of the cycle, we can strip everything out of the game. We're able to plot all of the text, talkies, and art into special areas and special folders. We can then send out a kit on short notice. It also makes the product much easier to localize.

All of our art and packaging is currently being done out-of-house. We are planning to change this. We use an animation suite called Splat. It's a typical Director-like program where you have layered graphics and frames. Currently we need to give our localization partner some sort of reference for what they're actually translating. To do so, we make a composite frame, put the graphics together, then forward it to our localization partners who translate the materials and return them. We then strip off the back graphic.

This process is problematic for a number of reasons. It can take weeks when you're dealing with 500 frames of art. We're going to scrap this process and do the art in-house. We'll simply give our partners text documents to translate. We will then put the text into the animation files. This method will most likely take two people, full time, to complete 40 games in one year.

Our games are rather big and have somewhere around 8,000 talkies and 2,000 animations. This makes for a cumbersome localization kit. Someone needs to systematically go through and look at each frame to make sure there isn't any text in it. If there is text, you need to save it all. It's something to consider if you are a developer: Document everything.

Beware of complex art

Watch out for art that features a letter or a word in multiple parts of the game. For example, our game "Pajama Sam" features a character called Darkness. Darkness is Pajama Sam's fear, so when Pajama Sam goes into Darkness' world, there are Ds and Darknesses written everywhere. You want to avoid this kind of thing in a U.S. development process because, if you have "D" or "Darkness" residing in multiple areas of the game, each piece of art needs to be translated. We asked our localization people to please not translate the word Darkness, or please use the letter D as the beginning letter for the name Darkness. Most did that--except France. They chose "obscure." Fortunately in "obscure," the "O" is rather close to "D." Not much art manipulation on their side, but a lot of work on ours. If possible, portray a feeling with something other than text or art that has text in it. It requires far less work to localize.

The essential script

Scripts are the most important item in our kits because they allow the actors to see the references and get a feeling for what the characters meant in English. They can then convey the same feeling in their respective countries. We have two scripts in our games that feature the same lines or talkies. Our initial script is a simple Word document. We organize items by room, line number, and then character. This allows for flow of conversation in a room. Then you have the line numbers for the flow of conversation, and the actors going back and forth.

We also add contextual references for lines that are puns, jokes, or have cultural meanings. Our games are filled with puns and jokes, so basically this document is nothing but contextual references above every line of talkie. It takes a great deal of time, but if you want the game to have the same feel that you have in the U.S. product, you need to let people in other countries know what you meant in the English product. Often, we tell our localization team what we did for our reference and we allow them to come up with something that fits their market. They can actually change the talkies; they don't necessarily need to translate verbatim.

The second script is an Excel spreadsheet which can be sorted any way you want. We have items sorted by character, talkie number, and room. This allows people at the recording studio to pull out talkies that occur in two different rooms that are identical. We re-use numerous talkies, or simply portions of them, because of high recording costs. This spreadsheet allows people to cross-reference and figure out which talkie happens in multiple places. Obviously, these things add a lot of time and expense. Multimedia games cost a lot of money to produce, but they also sell well. We have about 8,000 talkies maximum in a game, which typically cost $50,000 to record.
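
The cross-referencing that the spreadsheet makes possible can be sketched in a few lines of Python. Everything here is illustrative: the Talkie record, its field names, and the sample lines are made up, and the sketch assumes that a reusable talkie means the same character speaking identical text.

```python
from collections import defaultdict
from typing import NamedTuple


class Talkie(NamedTuple):
    number: int
    room: str
    character: str
    text: str


def reused_lines(talkies: list[Talkie]) -> dict[tuple[str, str], list[Talkie]]:
    """Group talkies by (character, text); keep groups with more than one entry."""
    groups: dict[tuple[str, str], list[Talkie]] = defaultdict(list)
    for talkie in talkies:
        groups[(talkie.character, talkie.text)].append(talkie)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Made-up sample data.
script = [
    Talkie(101, "kitchen", "Sam", "Who's there?"),
    Talkie(102, "kitchen", "Carrot", "Free at last!"),
    Talkie(240, "basement", "Sam", "Who's there?"),
]

# Reading order for the actors: room first, then line number.
by_room = sorted(script, key=lambda t: (t.room, t.number))

# Lines the studio only needs to record once.
duplicates = reused_lines(script)
```

Here duplicates holds a single group, Sam's "Who's there?", which appears in both the kitchen and the basement and so needs only one recording.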

Lip synching pros and cons

Lip synching is very expensive. We tried to lip sync one game in its entirety and had to hire a staff of 10 to do just lip synching. Fortunately, we have an animation suite that allows you to lip sync. We get a talkie and put tokens on it. This allows you to place markers on parts of the talkie and allows the character to start flapping its mouth or have certain mouth openings. Because of the high cost, we try not to lip sync in the foreign versions except for extreme close-ups and for facial shots. If we were to lip sync one game, it would take one person one month to lip sync most of the facial shot talkies. If we do 40 games in a year, we'll need an entire staff who just lip syncs--an astronomical expense. We have a random mouth flapping generator that allows for a talkie to go longer, say in German. Because of this, we don't have to do any expanding or any shrinking. We don't have to ruin the sound of the talkies, which are pretty low quality to begin with by their very nature. If you start stretching and compressing low-quality sound, it will sound horrible.
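
The idea of a random mouth flapping generator can be illustrated with a short sketch. This is not Humongous' tool: the three mouth shapes, the frame rate, and the function name are all assumptions. The point is only that the animation length follows the talkie length, so a longer German recording needs no stretching or compressing.

```python
import random


def mouth_frames(duration_seconds: float, fps: int = 10, seed=None) -> list[int]:
    """Return one random mouth shape per animation frame.

    Shapes (assumed): 0 = closed, 1 = half open, 2 = open.
    A shape is never repeated twice in a row while the character is
    talking, so the mouth keeps moving. A final closed frame is added
    when the talkie ends.
    """
    rng = random.Random(seed)
    frames = []
    previous = 0
    for _ in range(round(duration_seconds * fps)):
        shape = rng.choice([s for s in (0, 1, 2) if s != previous])
        frames.append(shape)
        previous = shape
    frames.append(0)  # close the mouth at the end of the line
    return frames
```

A 2-second English talkie yields 21 frames and a 3-second German one yields 31, with no change to the audio itself.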

The music in our games is done in the same manner as the talkies. We send the music track to our foreign partners, both with and without vocals. This provides a reference for the actors to sing. They return the vocal track to us synched up to the music. This prevents the vocal line from going way off the end of the music. For example, when Pajama Sam liberates carrots from the refrigerator, the carrots launch into a carrot opera. Some 10 different actors are involved in this small piece. If any of the actors went too long or too short, it would be a disaster.

Naming and casting of characters

The naming of characters presents its own set of challenges. One of our games concerns "Putt-Putt Goes to the Zoo" (or something like that), and in Germany they used "Tuff-Tuff." Belgium decided to translate very little because English is such a widely recognized language in the Benelux countries. If you want to start localizing, that's a good country to start with; we've had nothing but success in Belgium. And France did "Puss-Puss," which is also a rendition of a children's car sound.

Our marketing people trust those who have been our partners for a long time. We've been selling our games in Germany, France, Sweden, Japan, and Belgium, and we give some freedom to our partners in these situations.

In casting, you've got one character so you get one actor for that character. You basically need one actor for every main character of the game; otherwise, it doesn't sound good. We did our first Freddy Fish game in French and used only three actors for all 30 characters in the game. We ended up with the characters sounding like a bunch of adult men trying to act like little kids.

When you get sound bites from different countries, know exactly what you're getting. Germany once gave me several character files for Spy Fox and the actors sounded great. They sounded professional, not like someone reading off of paper. However, they weren't portraying characters in our game. … I told them what I thought and they said they would give me more examples of these same actors. They sent more talkies that now sounded terrible and still didn't match our game. After about a month, they informed me that these weren't actually examples of the characters in the game; rather, they were examples from their collection of sound files. My point is, if you're going to get sound files for your multimedia projects, it's easier to cast if they give you representations of the characters in the games. They need to go into the studio, record a couple of lines with each actor, and then send those to you. It's harder to get a representation of whether these actors can do a good job if they just send you parts of their collection. It costs more money, but it's worth it in the long run.

Integration and testing

We integrate the art, talkies, resource script, and dialogue boxes. We integrate the talkie sound effects and music first because, invariably, out of 8,000 talkies, one goes unrecorded. Our compiler detects this and we're able to instantly tell our partners. They can then book studio time to record any missing lines. Humongous gives us two months to integrate and test, and if we don't have all the resources or assets for the game, it's going to take our localization team a month to secure recording time and return the files. That means we wouldn't test a final product until the final weeks of our cycle.
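
The check that catches an unrecorded line amounts to comparing the script against the files that came back. The following is a sketch only, not the Humongous compiler: it assumes each recording is named after its talkie number (for example 0005.wav), which is a hypothetical convention.

```python
def missing_talkies(script_numbers, recorded_files) -> list[int]:
    """Return talkie numbers that appear in the script but have no recording.

    Files whose names are not purely numeric (notes, readmes) are ignored.
    """
    recorded = set()
    for name in recorded_files:
        stem = name.rsplit(".", 1)[0]
        if stem.isdigit():
            recorded.add(int(stem))
    return sorted(set(script_numbers) - recorded)
```

Running this as soon as the sound files arrive gives the partner the full list of gaps at once, so studio time can be booked early in the two-month window.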

We do black box scripting, which means our testers look at the game rather than code. They play the game and try to find typical bugs that happen in a localized product. We've done so many games now that few bugs are based on our engine. As a result, testers are free to look for missing or damaged talkies, missing art or sound effects, and dialogue box problems.

As for colors, keep the palette the same from the time you send it out to when it comes back. The best way to do this is to not increase the palette size from your typical 8-bit (256 colors) to something like 24-bit true color. This destroys your palette, and when they bring it back down to 8-bit color, you have art that is useless. Some of our games require that the palette stay the same because code is based on places in the palette that reference certain actions. People manipulating art should definitely not manipulate the palette.
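
A returned piece of art can be checked against the original palette before it is integrated. The sketch below assumes each palette has already been read out of its image file as a list of (red, green, blue) tuples; how that is done depends on the art tools in use and is not shown. The function name is hypothetical.

```python
def changed_palette_slots(original, returned) -> list[int]:
    """Return the palette indices whose RGB entry differs.

    Raises ValueError if the palette size itself changed, for example
    after a round trip through 24-bit color.
    """
    if len(original) != len(returned):
        raise ValueError("palette size changed")
    return [
        index
        for index, (before, after) in enumerate(zip(original, returned))
        if before != after
    ]
```

An empty result means the palette survived intact. Any index in the result is a slot that game code relying on fixed palette positions could trip over.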

Closing words

In summary, it's important to build a complete localization kit and make sure all the text, talkies, art, etc., is included. If localizers need to ask questions regarding the kit, that means our kit wasn't complete.


SimulTrans, L.L.C.
1370 Willow Road; Menlo Park, California 94025  USA; +1-650-605-1300;
info@simultrans.com

© Copyright 1996-2001, SimulTrans, L.L.C.  All rights reserved.