AI solution to automatically identify false positives from a specific check for untranslated target segments by an automated quality assurance tool
For: Customers, translation partners, end users of the translated content.
Goal: Improved Employee Efficiency
Problem addressed
To reduce the number of false positives produced by the check for untranslated target segments that our in-house automated quality assurance tool runs on bilingual content.
Scope of use case
The scope of this use case is limited to automated linguistic quality assurance tools, but its outcome could also be applied to other areas, such as machine translation, automated post-editing, computer-aided translation analysis, and pre-translation.
Description
Untranslated target segments are segments whose characters, symbols, and words remain the same in the source and target languages. They can contain numbers, alphanumeric content, code, e-mail addresses, prices, proper nouns, or any combination of these. Each year, this check produces over one million potential issues across more than fifty languages.
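As a rough illustration of what the check itself does, the following Python sketch flags segment pairs whose target is identical to the source. It assumes segments arrive as (source, target) string pairs and that the comparison ignores case and whitespace differences; the function names and sample segments are hypothetical, and the in-house tool's actual logic is not described in this use case.

```python
from typing import Iterable


def normalize(text: str) -> str:
    # Assumption: collapse whitespace runs and ignore case, so trivial
    # differences are not treated as translations
    return " ".join(text.split()).casefold()


def flag_untranslated(pairs: Iterable[tuple[str, str]]) -> list[int]:
    """Return the indexes of (source, target) pairs whose target equals the source."""
    flagged = []
    for i, (source, target) in enumerate(pairs):
        # Empty targets are skipped here; we assume a separate check reports them
        if target.strip() and normalize(source) == normalize(target):
            flagged.append(i)
    return flagged


# Hypothetical English > German segment pairs
segments = [
    ("Order number: 4711-A", "Order number: 4711-A"),  # real omission
    ("Save changes", "Änderungen speichern"),          # translated
    ("support@example.com", "support@example.com"),    # e-mail address
    ("Hotel", "Hotel"),                                # cognate
]
print(flag_untranslated(segments))  # [0, 2, 3]
```

In the sample, index 0 is a genuine omission (English text left in a German target), while indexes 2 and 3 are false positives; identity alone cannot tell them apart.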
Refining this check manually, based on annotated false-positive data for each customer, product, and language pair, is very costly, and the coverage is never sufficient: new content is constantly produced, so there are always new opportunities to refine the check in code. In addition, because the proportion of false positives is so high (over 95.5%), our translators tend to ignore the output of this otherwise valuable check, and we suspect that in many cases valid issues are missed where a translation has actually been omitted.
This kind of check typically produces three types of false positives.
1) Language-specific false positives, for example, where the source and target segments are expected to be identical because their words are cognates with the same meaning (see Table 37, example 1).
2) Customer-profile-specific false positives, for example, where the customer's guidelines require certain segments to be left untranslated, such as segments consisting only of company names, product names, or specific words, or segments that the customer has decided do not need translation (see Table 37, example 2).
3) Segments that remain the same in the source and target because they are special types of entities, for example alphanumeric segments such as part numbers, placeholders, and code (see Table 37, example 3).
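The three categories can be approximated with hand-written rules, which also shows why manual refinement scales poorly: every rule source must be maintained per language pair or per customer. The sketch below is a hypothetical baseline, not the in-house tool's logic; the cognate list, do-not-translate list, customer name `acme`, and regular expressions are all illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical resources: in practice one list per language pair and per customer
COGNATES = {("en", "de"): {"hotel", "computer", "name"}}
DO_NOT_TRANSLATE = {"acme": {"acme cloud", "widgetpro"}}

# Hypothetical patterns for segments that consist of a single special entity
ENTITY_PATTERNS = [
    re.compile(r"^(?=.*\d)[A-Z0-9_./-]+$", re.IGNORECASE),  # part numbers, e.g. 4711-A
    re.compile(r"^(\{\d+\}|%[sd]|<[^>]+>)$"),             # placeholders and tags
    re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),           # e-mail addresses
    re.compile(r"^[$€£]\s?\d+([.,]\d+)?$"),                # simple prices
]


def classify_false_positive(
    segment: str, lang_pair: tuple[str, str], customer: str
) -> Optional[str]:
    """Return the false-positive category of an identical segment, or None."""
    text = " ".join(segment.split())
    key = text.casefold()
    if key in COGNATES.get(lang_pair, set()):
        return "language-specific"  # type 1: cognates
    if key in DO_NOT_TRANSLATE.get(customer, set()):
        return "customer-specific"  # type 2: customer guidelines
    if any(p.match(text) for p in ENTITY_PATTERNS):
        return "entity"             # type 3: special entities
    return None                     # keep: possibly a real omission


for seg in ["Hotel", "WidgetPro", "4711-A", "{0}", "Order number: 4711-A"]:
    print(seg, "->", classify_false_positive(seg, ("en", "de"), "acme"))
```

Every new customer, product, or language pair needs its own lists and patterns, which is the maintenance cost the use case aims to remove.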
The idea is to create an AI solution that automatically identifies results from the "check for untranslated target segments" that are likely to be false positives. With this solution, we expect to reduce the number of potential issues that this check presents to our end users by 80%. Our end users can then focus their efforts on the potential issues that are more likely to require correction because a translation may actually have been omitted. In addition, we would increase the productivity of our end users when they review potential issues flagged by automated quality assurance during bilingual content evaluation, and we would save costs internally, because we would no longer need to implement code changes in this check manually based on analysis of users' annotations.
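The use case does not specify a model. One simple way to learn from users' annotations is a text classifier that scores each flagged segment by how likely it is to be a false positive and hides segments above a threshold. The sketch below swaps in a multinomial naive Bayes classifier written in plain Python; the features, the 0.9 threshold, the labels (1 = dismissed by a reviewer as a false positive, 0 = confirmed omission), and the training examples are all hypothetical.

```python
import math
import re
from collections import Counter


def features(segment: str, lang_pair: str, customer: str) -> set[str]:
    """Hypothetical features for one flagged (identical source/target) segment."""
    words = segment.split()
    feats = {f"lang={lang_pair}", f"cust={customer}", f"len={min(len(words), 5)}"}
    if re.search(r"\d", segment):
        feats.add("has_digit")
    if re.search(r"[@{}<>%$€]", segment):
        feats.add("has_symbol")
    if words and all(w[:1].isupper() for w in words):
        feats.add("all_title_case")
    # Lower-cased words let the model learn recurring cognates and product names
    feats.update(f"word={w.casefold()}" for w in words)
    return feats


class NaiveBayes:
    """Multinomial naive Bayes over binary features, with Laplace smoothing."""

    def fit(self, X: list[set[str]], y: list[int]) -> "NaiveBayes":
        self.vocab = set().union(*X)
        self.doc_count = Counter(y)
        self.feat_count = {c: Counter() for c in (0, 1)}
        for feats, label in zip(X, y):
            self.feat_count[label].update(feats)
        self.total = {c: sum(self.feat_count[c].values()) for c in (0, 1)}
        return self

    def prob_false_positive(self, feats: set[str]) -> float:
        n_docs = sum(self.doc_count.values())
        log_score = {}
        for c in (0, 1):
            score = math.log((self.doc_count[c] + 1) / (n_docs + 2))
            for f in feats & self.vocab:  # features never seen in training are ignored
                p = (self.feat_count[c][f] + 1) / (self.total[c] + len(self.vocab))
                score += math.log(p)
            log_score[c] = score
        top = max(log_score.values())  # subtract the max to avoid overflow in exp
        odds = {c: math.exp(s - top) for c, s in log_score.items()}
        return odds[1] / (odds[0] + odds[1])


# Hypothetical reviewer annotations: 1 = dismissed as false positive, 0 = real omission
train = [
    ("Hotel", "en-de", "acme", 1),
    ("WidgetPro", "en-de", "acme", 1),
    ("4711-A", "en-de", "acme", 1),
    ("{0}", "en-fr", "acme", 1),
    ("support@example.com", "en-fr", "globex", 1),
    ("Click Save to keep your changes.", "en-de", "acme", 0),
    ("The update failed to install.", "en-fr", "globex", 0),
    ("Enter your password", "en-de", "globex", 0),
]
model = NaiveBayes().fit(
    [features(s, lang, cust) for s, lang, cust, _ in train],
    [label for *_, label in train],
)


def triage(model: NaiveBayes, flagged, threshold: float = 0.9):
    """Split flagged segments into (shown to reviewers, suppressed as likely false positives)."""
    shown, suppressed = [], []
    for segment, lang, cust in flagged:
        p = model.prob_false_positive(features(segment, lang, cust))
        (suppressed if p >= threshold else shown).append(segment)
    return shown, suppressed


shown, suppressed = triage(
    model, [("WidgetPro", "en-de", "acme"), ("Enter your password to continue.", "en-de", "acme")]
)
print(shown)       # ['Enter your password to continue.']
print(suppressed)  # ['WidgetPro']
```

Raising the threshold hides fewer segments but lowers the risk of suppressing a real omission. Whether a given setting meets the 80% reduction target would have to be measured on held-out annotated data, as the share of flagged segments that end up in `suppressed`.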
Machine Learning
AI: Understand