Some see it as an opportunity, while others are still wary: artificial intelligence, or AI, is on everyone's lips. It is being used in numerous areas and is indispensable in global business. Over the past few months, we have focused more intensively on this topic and would like to share our learnings with you in this article.
Why AI? To save time and money!
AI can help reduce language barriers, especially in the global people business. With the right tools, global communication becomes more affordable, faster, and more effective – both for internal corporate communication with our employees from over 40 nations and for exchanges with our valued customers worldwide. After all, we offer our services in more than 75 countries.
With an increasing focus on audiovisual communication, we want to provide more personal insights into our company and, above all, show one thing: people! Subtitles hardly seemed adequate for this. We therefore used an AI tool to translate the original recordings into other languages with lip synchronization of the speakers.
Which tool is particularly suitable for Western languages? In our opinion, it's “Rask”!
AI video editing tools generally let you test their functions with your own video for free before subscribing. We recommend taking advantage of this offer to compare the results and select the tool that best meets your needs. We chose “Rask”.
How does working with the tool look in practice? First, the AI detects the number of different speakers. Then, the spoken word is transcribed and translated into the target language. Subsequently, lip synchronization happens “at the push of a button”; for this, the original voices are cloned.
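For readers who like to picture this workflow as a process, here is a minimal Python sketch of the order of these steps. It is our own illustration with placeholder functions, not Rask's actual interface.

```python
# Illustrative sketch only: placeholder functions stand in for the tool's features.
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str            # detected speaker label, e.g. "SPEAKER_1"
    start: float            # segment start in seconds
    end: float              # segment end in seconds
    source_text: str        # transcribed original speech
    target_text: str = ""   # translation, filled in later


def detect_and_transcribe(video_path: str) -> list[Segment]:
    # Steps 1 and 2: detect the speakers and transcribe what they say.
    return [Segment("SPEAKER_1", 0.0, 4.2, "Willkommen bei unserem Unternehmen.")]


def translate(text: str, target_language: str) -> str:
    # Step 3: translate each segment into the target language.
    return f"[{target_language}] {text}"


def clone_voice(video_path: str, speaker: str) -> str:
    # Step 4: clone the original voice so the dubbing keeps the speaker's sound.
    return f"cloned-voice-of-{speaker}"


def lip_sync(video_path: str, segments: list[Segment], voices: dict[str, str]) -> str:
    # Step 5: render the dubbed video with matching lip movements.
    return video_path.replace(".mp4", "_dubbed.mp4")


def dub_video(video_path: str, target_language: str) -> str:
    segments = detect_and_transcribe(video_path)
    for seg in segments:
        seg.target_text = translate(seg.source_text, target_language)
    voices = {seg.speaker: clone_voice(video_path, seg.speaker) for seg in segments}
    return lip_sync(video_path, segments, voices)


print(dub_video("company_intro.mp4", "en"))  # -> company_intro_dubbed.mp4
```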
While the complete translation takes only a few minutes depending on the length of the video, generating the lip synchronization (LipSync for short) can currently still take up to several hours. Why is that? The time to complete synchronization depends on the number of projects in the queue that the tool has to process.
In some cases, additional manual video editing may be necessary. For example, if the video contains B-roll material: when lip synchronization is activated, the AI makes the mouths in the B-roll footage move as well. Likewise, sections that are already in English are not “skipped” when synchronizing into English but are newly translated and overdubbed with the cloned voice.
What challenges can be expected? The devil is in the details!
Both translation and lip synchronization presented challenges. Although both processes are started “at the push of a button,” time and patience are required for post-processing:
Challenges with AI translation and synchronization:
Incorrect translations due to speaking too quickly and/or unclearly
Different text lengths after translation into other languages: This resulted in unnaturally fast or slow speaking speeds in the synchronized version (for example, English translations are usually shorter than the original German texts).
Challenges with AI lip synchronization:
Mouths sometimes remained closed and did not move even though the person was speaking
"Flickering" in the facial area (especially in the early stages of the tool and with men who have facial hair)
Corrections only become visible after re-synchronization. Because of this detailed back-and-forth, a significant amount of waiting time had to be bridged with other tasks, so the videos could not be completed in one go.
How can these challenges be overcome? Through creative human detours!
Automatic translation? A review by native speakers is still worthwhile:
Although the automatically generated translations are generally good, it remains essential to have them thoroughly reviewed, ideally by native speakers. The AI does not always understand the context correctly, which can lead to translation errors and real pitfalls. It also does not hesitate to translate terms that should remain in the original corporate wording.
Unhappy with the LipSync result? Here's what you can do:
With a few exceptions, irregularities in lip synchronization and differing speaking speeds can generally be corrected manually by making adjustments at the timestamp level. When we asked, Rask's customer service informed us that these irregularities (such as a closed mouth despite spoken text) depend on the mouth movements in the original video. Future updates are expected to bring quick improvements in this area.
Unnatural speaking speed? This new function in Rask can help:
Too many or too few characters in the translated transcription resulted in an unnatural speaking speed. Previously, the only option was to come up with filler words or additional information yourself to reach the required character count for the respective time frame. This was very time-consuming. Now there is a new "push-button" solution: translations with significantly fewer or more characters than the original can be reformulated by the AI to the needed length without losing information. We tested it and can confirm: it works!
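The underlying issue is easy to check for yourself: the translated text has to roughly match the original's character count for its time window, otherwise the cloned voice has to rush or drag. Here is a small Python sketch (our own illustration, not a Rask feature) that flags segments worth reformulating:

```python
# Illustrative sketch: flag segments whose translation is much shorter or longer
# than the original, since the character count has to roughly fill the original
# time window to keep the speaking speed natural.

def flag_length_mismatches(segments, lower=0.8, upper=1.2):
    # Returns (start, end, ratio) for segments deviating more than ~20 % in length.
    flagged = []
    for original, translation, start, end in segments:
        ratio = len(translation) / max(len(original), 1)
        if not lower <= ratio <= upper:
            flagged.append((start, end, round(ratio, 2)))
    return flagged


segments = [
    # (original text, translated text, start in s, end in s)
    ("Herzlich willkommen in unserem Unternehmen!", "Welcome to our company!", 0.0, 3.1),
    ("Wir freuen uns auf die Zusammenarbeit.", "We look forward to working with you.", 3.1, 6.0),
]
print(flag_length_mismatches(segments))  # only the first segment is flagged as too short
```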
Incorrect pronunciation? Get creative:
Despite consistent spelling, the AI may pronounce the same word differently. For example, our company name "ICUnet" was sometimes pronounced correctly (/aisijunet/) and sometimes incorrectly (/ikunet/). What did we do? We tested various spellings, hoping one would result in the correct pronunciation. The solution for our case was: "I see you net."
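This workaround can even be scripted if you maintain a small glossary of problem terms. A minimal Python sketch of the idea (our own illustration, not a Rask feature): swap tricky terms for the phonetic spellings that proved to work before handing the transcript to the voice synthesis.

```python
# Illustrative sketch: replace terms the AI mispronounces with phonetic spellings.
PHONETIC_SPELLINGS = {
    "ICUnet": "I see you net",   # the spelling that made the AI pronounce our name correctly
}


def apply_phonetic_spellings(transcript: str) -> str:
    for term, spelling in PHONETIC_SPELLINGS.items():
        transcript = transcript.replace(term, spelling)
    return transcript


print(apply_phonetic_spellings("Welcome to ICUnet!"))  # -> Welcome to I see you net!
```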
The most impressive insights?
AI learns incredibly fast!
Each update brought clearly noticeable improvements:
When we first started working with the tool, we were sometimes still dissatisfied with the lip synchronization results: flickering in the mouth area, unnatural smiles, fake-looking teeth. Especially since Rask's LipSync left its beta phase, the results have improved enormously. The synchronization process is now also much faster.
The pronunciation in the synchronized versions also changed over time: for example, the AI eventually started responding to punctuation marks and adjusting the intonation of the sentence accordingly.
Sometimes regional accents from the original video carried over: for instance, the cloned voice in the English synchronization occasionally kept an Austrian or Indian accent. While this is impressive, it can also be a point of criticism: does this imply that an Indian or Austrian person cannot speak accent-free English? The tool initially did not offer a way to influence this; only after several updates did the accents disappear.
There is a difference in the synchronization results between Western and Eastern languages!
We used Rask to translate and synchronize our videos into English, Spanish, French, and Chinese. What was the feedback from our native speakers? “Rask” seems to work really well for Western languages. The feedback on the Chinese result, however, was critical: tone and pauses did not match linguistic norms and sounded very mechanical and emotionless. As a consequence, the individual speaking style was lost. The emphasis was sometimes incorrect and could not be corrected manually.
This feedback highlights the importance of incorporating intercultural expertise and correction by native speakers. Since AI standards in China and Europe vary significantly, we decided to produce videos for the Chinese market with English lip synchronization and Chinese subtitles, the quality of which we could confidently ensure.
Conclusion: We say "Yes" to AI – but our native speakers have the last word!
It is impressive how Rask's AI clones voices, allowing our employees to speak in any language and accent! This is an innovative way of overcoming language barriers and enabling faster global communication. Processes are generally accelerated: translations can be produced much faster and more cheaply. Despite this advanced AI technology, inaccuracies in translation and synchronization can still occur, making reviews by native speakers necessary. We believe this is the only way to ensure quality and linguistic accuracy.
How quickly do you get familiar with the tool? Initially, it requires some time investment. However, "learning by doing" is an approach that quickly builds confidence in using this overall quite intuitive tool. Customer service typically responds quickly to any questions.
AI cannot take over 100% of the work, especially if you have high quality standards. We have seen many benefits in video editing with Rask and are very satisfied with the final result. At the same time, we look forward to further revolutionary functions, as new updates are released at short intervals.
In conclusion, transparency, honesty, and an open approach to this topic are important to us. Employees were asked for their consent before publication, and a clearly visible disclaimer was included in each video, pointing out that the edited video is not an original recording and that AI was used.
Curious? You can see the results for yourself in the video! If you want to see more, we warmly invite you to browse our websites for the USA, India, China, Mexico and France, where you will find the translated and synchronized videos in full length.