Machine translation: For human and machine

29.09.2022

Machine translation systems are designed to simplify translation and achieve better results. Technical information is well suited to this because it contains highly repetitive and standardized texts.

Editorial guidelines are rules for writing technical information. For example, they define how instructions, headings, or lists are to be written. They may also point out that certain grammatical and stylistic formulations, such as passive constructions, conjunctions or certain abbreviations, are to be avoided. These rules are often based on guidelines and publications on rule-based writing and rules for controlled or simplified language. The rules are intended to help technical editors create a text that is easy to understand and translate. Ultimately, the goal is to ensure that a reader can use the technology in question.

When the source text is not standardized, the use of computer-aided translation (CAT) tools and translation memories can lead to errors in the translation or to high costs. To reduce these risks, it is important to update the rules regularly, because machines understand text differently than humans do. Texts must therefore be written so that the machine can interpret them correctly.

Machine translation methods: key terms

Statistical Machine Translation (SMT)

Statistical Machine Translation uses statistical analysis and prediction algorithms to determine which translation of a character, word, phrase, or sentence is most likely. An SMT system is trained on a bilingual corpus, and it delivers the best results when that corpus is adapted to the application.
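The idea of choosing the most likely translation can be illustrated with a deliberately simplified sketch. It assumes a tiny, hypothetical list of aligned phrase pairs (ALIGNED_PAIRS) and estimates probabilities as plain relative frequencies. Real SMT systems combine several models, so this is an illustration of the principle, not a description of any particular engine.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs; a real bilingual corpus would be far larger.
ALIGNED_PAIRS = [
    ("freigeschaltet", "activated"),
    ("freigeschaltet", "activated"),
    ("freigeschaltet", "activated"),
    ("freigeschaltet", "released"),
    ("installieren", "install"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as a relative frequency of observed pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {t: c / total for t, c in target_counts.items()}
    return table


def most_likely_translation(table, source):
    """Return the target phrase with the highest estimated probability."""
    candidates = table[source]
    return max(candidates, key=candidates.get)


table = build_phrase_table(ALIGNED_PAIRS)
print(most_likely_translation(table, "freigeschaltet"))  # activated
print(table["freigeschaltet"]["activated"])              # 0.75
```

The sketch also shows why an adapted corpus matters: if the training pairs come from a different domain, the most frequent translation may be the wrong one for the application.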

Rule-based Machine Translation (RBMT)

RBMT is based on a set of rules that represent the grammar of a language. Dictionaries for general vocabulary and specialist dictionaries for more specific vocabulary are also used.

Neural Machine Translation (NMT)

NMT is based on an artificial neural network that learns to recognize patterns in texts. The MT engine generates translations, compares them repeatedly with reference material and thus "learns" to translate.

Suitability of headings

In technical information, headings should be short, concise and uniform, and avoid redundancies. In addition, they should be formulated in a task-oriented manner and, if possible, dispense with nominalizations. Readers should be able to immediately see what a chapter is about without receiving unnecessary information that overwhelms or distracts them from the essentials. Headings should also be clearly distinct from the descriptive and instructional text.

Although the rules for writing a heading are logical and should also be understandable to machines, MT can struggle with them. For example, MT may translate a heading as if it were an instruction (e.g. “Install Windows”). In addition to unsightly phrasing in the target language, the consequence is an incorrect back-translation that may cause confusion. This is particularly apparent in translations between German and English, as the following comparison of DeepL (DE-EN/EN-DE) and Google (DE-EN/EN-DE) shows.

Windows installieren - (DeepL) Install Windows / Windows installieren - (Google) Install Windows / Installieren Sie Windows

Installation von Windows - (DeepL) Installing Windows / Installation von Windows - (Google) Windows installation / Windows-Installation

For machine translation, headings should therefore be written using nominalization. This makes the English translation more precise than the “subject + verb” structure does.

An issue affecting all languages is words with identical singular and plural forms, for example “information” or “data”. Such words must be defined more precisely. Depending on the language into which the headings are translated, incorrect grammatical forms or terms may occur. Finally, the rule for avoiding redundancies is also critical. MT is not alone in having problems making connections between seemingly redundancy-free headings. The same can happen to a person who does not immediately see the complete context.

Instructions

Instructions and descriptions in technical information should be written in such a way that the content is clearly recognizable.

When MT is used, style guide definitions should take into account not only how headings are formulated but also what they contain. In addition, repeating references to the actual objects is crucial for MT as well as for readers. Entries consisting of just one word leave MT plenty of room for interpretation, and words that take different forms depending on their use can lead to errors. A human translator could resolve such problems by sending queries to the author. MT cannot do this, so the result is an incorrect translation or an uncertain statement.

A comparison with generic MT systems also shows that MT delivers increasingly incorrect results for passive sentences when it lacks domain-specific datasets.

Abbreviations

Many technical information texts use abbreviations and acronyms to shorten words that are repeated often. However, they can be a major problem for machine translation, particularly when customer-specific abbreviations or acronyms are involved. Different interpretations can occur depending on the type of dataset used with the MT. Create rules that recommend using only known abbreviations or shortening words sensibly.

Achten Sie darauf, … - (DeepL) Make sure … Ensure that … Pay attention … - (Google) Be sure to … Take care … Be sure …

Stellen Sie sicher, … - (DeepL) Make sure … Ensure that … Be sure … - (Google) Make sure

In the case of acronyms based on language-specific words, such as “HW” for “hot water” or “hardware”, or “CW” for “cold water” or “calendar week”, carrying them over unaltered into the target language would confuse the reader. After all, readers cannot understand abbreviations that do not exist in their language, or they may incorrectly connect the abbreviation to hardware instead of hot water. A reader unfamiliar with the subject can quickly lose track of the abbreviation and either fail to assign a meaning to it or assign an incorrect one.
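A rule such as “use only known abbreviations” can be checked automatically before a text is sent to MT. The following is a minimal sketch under stated assumptions: the approved list (APPROVED_ABBREVIATIONS) is hypothetical, and an abbreviation is assumed to be any whole word of two or more ASCII capital letters. Abbreviations with periods, digits or umlauts would need a broader pattern.

```python
import re

# Hypothetical approved list; in practice it would come from the terminology database.
APPROVED_ABBREVIATIONS = {"MT", "CAT", "PDF"}

# Assumption: an abbreviation is a whole word of two or more ASCII capital letters.
ABBREVIATION_PATTERN = re.compile(r"\b[A-Z]{2,}\b")


def find_unapproved_abbreviations(text, approved=APPROVED_ABBREVIATIONS):
    """Return unapproved abbreviations in order of first appearance, without repeats."""
    found = []
    for match in ABBREVIATION_PATTERN.finditer(text):
        token = match.group()
        if token not in approved and token not in found:
            found.append(token)
    return found


print(find_unapproved_abbreviations("Set the HW valve in CW 12 and export the PDF."))
# ['HW', 'CW']
```

Flagged abbreviations can then be written out in full or added to the approved list together with a fixed translation.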

Number of meanings

In many languages, it is important to use the correct naming convention so that the reader immediately understands what is meant. However, a word in one language can have several meanings. This variation is often prevented by prescribing terminology within the technical editorial team, where care is taken that a word can or should have only one meaning. The German word “freigeschaltet” is one example: depending on how it is used in technical information, it can mean “de-energized”, “released” or “activated”. The specialist user understands its meaning from the context. However, such words can lead machine translation, and even human translators, to make mistakes when the context is unclear. Designations that are defined at either customer or project level and that allow multiple interpretations should be controlled more precisely. In addition, depending on the type of MT (generic or domain-specific), looser or stricter specifications may be necessary.
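One way to control such designations is to flag every source term for which the terminology list holds more than one target meaning. The sketch below assumes a small, hypothetical termbase (TERMBASE) that maps German terms to their possible English translations. It matches whole words only and ignores inflected forms, which a real check would have to handle.

```python
# Hypothetical termbase: source term -> possible target-language meanings.
TERMBASE = {
    "freigeschaltet": ["de-energized", "released", "activated"],
    "installieren": ["install"],
}


def find_ambiguous_terms(text, termbase=TERMBASE):
    """Return the terms in the text that have more than one meaning in the termbase."""
    # Strip surrounding punctuation and lowercase each word for a simple lookup.
    words = {word.strip(".,;:!?()\"'").lower() for word in text.split()}
    return {
        term: meanings
        for term, meanings in termbase.items()
        if term in words and len(meanings) > 1
    }


print(find_ambiguous_terms("Die Anlage ist freigeschaltet."))
# {'freigeschaltet': ['de-energized', 'released', 'activated']}
```

Each flagged term is a candidate for a stricter rule, for example one permitted meaning per project.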

Texts without errors

Correct spelling is very important for machine translation, because words are translated as they are found in the database and interpreted accordingly. Many editorial guidelines already deal sufficiently with spelling and punctuation. Incorrect punctuation can result in a sentence not being read correctly by the MT. One example is the absence of commas between the individual parts of a sentence, which leads to subordinate clauses not being correctly recognized and parts of sentences being contracted. As a result, the MT incorrectly correlates the different parts of the sentence. The same applies to missing hyphens.

It is also possible that sentences are not translated as such when the final full stop is missing. Normally, such errors are detected and corrected during translation at the latest. When the translation is carried out by a human, the risk of errors is therefore lower.
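Because MT does not catch a missing full stop the way a human translator would, the check can be moved ahead of translation. The following sketch assumes that the body text is available as one segment per line and that headings and list items have been filtered out beforehand. The set of accepted end characters (TERMINAL_PUNCTUATION) is an assumption and would have to match the editorial guideline.

```python
# Assumption: these characters count as a valid end of a body-text segment.
TERMINAL_PUNCTUATION = (".", "!", "?", ":")


def find_unterminated_segments(lines):
    """Return (line number, text) for each non-empty line without end punctuation."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped and not stripped.endswith(TERMINAL_PUNCTUATION):
            problems.append((number, stripped))
    return problems


segments = [
    "Switch off the device.",
    "Remove the cover",
    "",
    "Is the LED off?",
]
print(find_unterminated_segments(segments))
# [(2, 'Remove the cover')]
```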

The clearer the references, the higher the probability that MT will recognize the information correctly.

Coherence of markups

Regardless of whether a TMX editor, Adobe FrameMaker, MadCap Flare or another authoring system is used, texts are marked up to highlight sentences or words. Editorial guidelines provide rules for marking up texts, and some of these rules should be extended for machine translation. When reading a text with its markups or tags, MT can only recognize markups that it is already familiar with. Unknown markups lead to an incorrect interpretation of correlations: either sentences are read separately, or the markups are ignored and the corresponding positions are translated as if they were “normal” words. These issues can be solved either manually or by adding the markups to the MT software.
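Where the markups cannot be added to the MT software, a common workaround is to replace inline tags with neutral placeholders before translation and restore them afterwards. The sketch below illustrates that pre-processing idea and is not a feature of any specific MT product. The placeholder format ([[0]], [[1]], ...) is an assumption, and whether a given engine leaves such placeholders untouched has to be tested.

```python
import re

# Assumption: inline markup looks like XML/HTML tags, e.g. <b> or </b>.
TAG_PATTERN = re.compile(r"<[^<>]+>")
PLACEHOLDER_PATTERN = re.compile(r"\[\[(\d+)\]\]")


def mask_tags(segment):
    """Replace each tag with a numbered placeholder and return the tags separately."""
    tags = []

    def replace(match):
        tags.append(match.group())
        return f"[[{len(tags) - 1}]]"

    return TAG_PATTERN.sub(replace, segment), tags


def unmask_tags(segment, tags):
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1))], segment)


masked, tags = mask_tags("Press <b>Start</b> now.")
print(masked)  # Press [[0]]Start[[1]] now.

# The translated string stands in for the MT output, which is not produced here.
print(unmask_tags("Drücken Sie jetzt [[0]]Start[[1]].", tags))
# Drücken Sie jetzt <b>Start</b>.
```

Numbering the placeholders lets the tags be restored correctly even when the target language changes the word order.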

Course of action

The list of possible problems that may arise with technical information and machine translation could be extended further. After all, there will always be a subject that poses difficulties and that could be coordinated better. This also demonstrates that rules for machine translation cannot be created overnight, let alone for universal application. Many rules that have been defined for comprehensible writing, and which primarily address the reader, can also apply to machine translation. However, cases that only a human understands without difficulty represent unforeseen obstacles to machine translation.

Consider how the translation system can handle existing and future data efficiently.

With advances in neural machine translation and artificial intelligence, one might think that the adaptation of rules is not necessary. After all, such systems are trained using larger generic and adapted datasets, which could result in understandable and correct translations. But this impression is deceptive. Translation errors in the target language are usually only detected by native speakers and only when the action to be taken does not make sense.

Reduce inconsistent or ambiguous expressions.

If no effort to reduce inconsistent or ambiguous expressions in technical information is made, the quality of the translations will deteriorate. This can also happen when translating with a translation memory system.
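Inconsistent expressions can be searched for before translation by comparing source segments with each other. The sketch below uses the standard-library difflib.SequenceMatcher to report segment pairs that are very similar but not identical, such as two phrasings of the same instruction. The threshold of 0.8 is an assumed starting value, and character-level similarity is only a rough stand-in for the fuzzy matching of a real translation memory system.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(segments, threshold=0.8):
    """Return (first, second, similarity) for similar but non-identical segments."""
    pairs = []
    for first, second in combinations(segments, 2):
        if first == second:
            continue  # exact repetitions are consistent and therefore not reported
        ratio = SequenceMatcher(None, first, second).ratio()
        if ratio >= threshold:
            pairs.append((first, second, round(ratio, 2)))
    return pairs


segments = [
    "Make sure the device is switched off.",
    "Ensure the device is switched off.",
    "Remove the cover.",
]
for first, second, ratio in find_near_duplicates(segments):
    print(f"{ratio}: {first!r} / {second!r}")
```

Each reported pair is a candidate for standardization: one of the two phrasings is chosen and then used throughout.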

Check rule adjustment for each text type.

Rules are then applicable in different areas, regardless of the subject area, text type or addressee, and will contribute to translation quality. Reviewing and validating existing and new rules is what makes working with developments in machine translation interesting. It is important to analyze how individual systems behave with different text types. The goal is to one day be able to produce texts that are written for both humans and machines.