Book Club: Some notes on the limits and pleasures of machine translation

An extract from Keith Kahn-Harris’s The Babel Message: A Love Letter to Language.

Nov 04, 2023

Morning all. This month’s extract comes from Keith Kahn-Harris’s The Babel Message: A Love Letter to Language, a book born out of his obsession with a tiny piece of paper: the multilingual warning messages found inside Kinder Surprise Eggs, which remind parents about potential choking hazards. “In the book I recount my adventures commissioning new translations of what I call ‘The Message’ into dozens of other languages,” he says. “The following edited extract explains how I explored the limitations and pleasures of machine translation.”

I defy you to read this extract and not want to get hold of the rest of the book. Enjoy. Jonn

I’m not ashamed to admit that I love Google Translate. There are few things more addictive than idly wondering how you say “squirrel” in Mongolian and satisfying your curiosity with a click (it’s Хэрэм, by the way).

I used Google Translate extensively in researching the book. While I made an iron rule not to use the app as a substitute for commissioning translations of the Message from actual human beings, it did come in handy for other things. When I transcribed Messages in non-Latin scripts, I pasted them into the app in order to check I hadn’t made a mistake. Google Translate helped me to confirm suspicions that some versions of the official Message were not identical in their wording.

Over time, though, I started to use Google Translate the other way round, using the Message as a way of exploring the app’s limitations and capabilities. I conducted an experiment to compare how the official Messages in three different languages – Spanish, Swedish and English – translate into each other on Google Translate. Here is a reminder of the versions of the Message as they appear on the warning message sheet:

WARNING, read and keep: Toy not suitable for children under 3 years. Small parts might be swallowed or inhaled
ATENCIÓN, lea y guarde: Juguete no apto para menores de 3 años. Las partes pequeñas podrían ser ingeridas o inhaladas.
VARNING, läs och behåll: Leksak ej lämplig för barn under 3 år. Små delar kan fastna i halsen eller näsan.

Here is the English Message translated into Spanish and Swedish:

ADVERTENCIA, lea y guarde: Juguete no apto para menores de 3 años. Las piezas pequeñas pueden tragarse o inhalarse.
VARNING, läs och förvara: Leksaken är inte lämplig för barn under 3 år. Små delar kan sväljas eller inandas.

Here is the Spanish Message translated into Swedish and English:

OBSERVERA, läs och spara: Leksak är inte lämplig för barn under 3 år. Små delar kan sväljas eller inandas.
ATTENTION, read and save: Toy not suitable for children under 3 years. Small parts could be swallowed or inhaled.

Finally, here is the Swedish Message translated into English and Spanish:

WARNING, read and keep: Toy not suitable for children under 3 years. Small parts can get stuck in the throat or nose.
ADVERTENCIA, lea y guarde: Juguete no apto para menores de 3 años. Las partes pequeñas pueden atascarse en la garganta o la nariz.

Not one of these Messages is identical. The differences seem fairly minor and, at least in the Spanish and English versions, the Messages remain coherent and grammatical. As I already knew, there is never only one way of saying something in any particular language. Viewed more broadly, the implications of this experiment are far from banal. Language and meaning involve subtle distinctions and nuances; if we trust Google Translate to navigate these complexities, we cannot know the consequences.

In my project to translate the Message into dozens of new languages, I trusted strangers and their knowledge of particular languages, but at least I could ask them questions and they could gain some background information from me that explained what the Message was. With Google Translate, I had to trust a “black box” that neither knew nor cared about the Message.

As with many other online apps, the exact nature of the system that powers Google Translate is unknown to us. Since 2016 it has used a “neural” form of machine translation that learns and improves as it goes, rather than applying fixed grammatical rules and dictionary-based translations. At the core of the system are extensive text corpuses in multiple languages, with existing translations between these corpuses used to model new translations. At the time of writing there are 109 languages available, with many more in development. These include all the official Message languages plus lesser-known tongues such as West Frisian and Galician.

Every Google Translate language can be translated into every other language, which is both the wondrous marvel of the system and its greatest limitation. As one might imagine, not every language pair has an extensive range of translations. I checked UNESCO’s Index Translationum, which catalogues translated books, to see if anything had been translated from Icelandic to Tajik, or vice versa. The answer was no. Even if there were examples of translations between this language pair, it could well have been done via a third language, most likely English, which is a common practice when translating between lesser-spoken languages. This is, in fact, how Google Translate manages it. It is never clear when the app is translating directly and when it is translating through English or another widely-spoken language.

Even given this limitation, Google Translate is impressive. I put the system to the test by translating the Message successively through all the languages found on the warning message sheet. I started with English, then went through Azeri, Bulgarian and the rest until I reached Chinese, before translating back into English again. This is the outcome:

Use later, read and write. This game is not suitable for children under 3 years old. Small parts can be used or absorbed.

Okay, this isn’t entirely accurate – “use later” and “absorbed” are clearly errors – but it remains intelligible and more or less conveys the correct message. It may be that starting with English helps, since the Message is pinging back and forth between English and other languages. So I tried the experiment again, starting and finishing with a lesser-spoken language, Macedonian. Here is the Macedonian original:

ВНИМАНИЕ, ЧИТАЈ И ЗАЧУВАЈ: Ситните делови можат да бидат проголтани или вдишани.

And here it is after the translation sequence:

погледни изглед изгледа. Мали парчиња може да се проголтаат или вдишат.

In English this translates as:

look look looks. Small pieces can be swallowed or inhaled.

It is astonishing to see that the second sentence is pretty much as it should be. The first sentence, though, is much further away from the original than the English sample, both in its repetitive syntax and its loss of capitalisation.

When we use Google Translate, or another form of machine translation, we are looking for sense and intelligibility. I received a translation of the Message into a language called Karamanli Turkish. This is a dialect of Turkish spoken by Orthodox Christians, whose descendants now live in Greece following the post-First World War expulsions. It is written in the Greek alphabet:

ΟΥΓΙΑΡΕ, οκού βε τουτ σουνού: Μπουό ουντζάκ ούτς γιασιντάν κουτσούκ τζοτσουκλάς ίτσιν ιουγκούν ντείλντις. Κουτσούκ πααρτζαλεριν γιουλτούμα για ντα νεφές μπορουσουνά κάτσμα ρίσκι βάρντις.

After I posted the Karamanli Turkish Message on Facebook, a friend alerted me to the automatic translation that appeared on his feed:

Good morning, let’s go to the world: we are going to have a good time for you. We are looking forward to the future of the world.

I presume that Facebook’s translation algorithm not only mistook this for Greek, but in its drive for intelligibility “corrected” it into something that made sense but was completely unrelated to the original. Google Translate fared better. Like Facebook, it judged the Message to be Greek, but it contented itself with a transliteration into Latin script. When I told the app that the transliteration was Turkish, it had a go at translating it, coming up with:

UGYARE, okou tout presentation: Buo onjak uts yasidan koutsuk jotsuklas itchin yugun evidenced. There was a risk that the holy paratzalers could cause breath injury in the form of swallowing.

The fact that most of this is unreadable is a good thing: it tells the reader that this is unlikely to be any kind of translation. In other cases, Google Translate’s incorrect language recognition can be dangerous precisely because the result is almost correct. The app identifies Faroese as Icelandic, for example, and translates it fairly coherently, leading to who knows what misapprehensions. Humans are pattern-recognising creatures, and machine translation can satiate our lust to recognise by producing translations that are linguistically “correct” but not translations at all.

Still, at least when you use Google Translate to translate something into your own language, you can tell sense from nonsense. Using it to translate into a language you do not know is a much riskier proposition. An image from 2008 of a bilingual English-Welsh road sign still circulates virally. The English original reads “No entry for heavy goods vehicles. Residential site only”; the Welsh version translates as “I am not in the office at the moment. Send any work to be translated”. You can understand how it happened: a local council worker emailed the text to a translator and mistook the translator’s automatic out-of-office reply for the translation.

With Google Translate, all we have are auto-responses. In 2020 an organisation was formed in Japan to campaign against over-reliance on machine translation into English for official signs. Sometimes the resulting errors are subtle but devastating, such as a Kyoto department store’s slogan “Rising Again, Save the World from Kyoto JAPAN”; at other times they cause hilarity, such as the sign “Please do not move while driving”. Machine translation errors that find their way onto tattoos are often funny too, though I doubt their owners are laughing: consider the Hebrew tattoo that reads “Babylon is the world’s leading dictionary and translation software”.

Maybe such misunderstandings will be ironed out in the future as Google Translate continues its remorseless development and people become more aware of how (not) to use it. And machine translation helps to counteract some of the global drive towards linguistic uniformity. A world where machine translation is ubiquitous and near-perfect would be – theoretically at least – a world in which it would be possible to be a monoglot speaker of Munegàscu (the language of Monaco, spoken by a few dozen people) and still take a full part in a globalised world.

However well machine translation might end up working, and whatever its considerable advantages, the technology contains much deeper dangers. When we input nonsense and instantly receive sense, we risk falling into some dangerous delusions about what language is. The magnificent diversity of languages is erased and the beautiful differences between them are flattened. The casual use of machine translation risks cutting us off from what makes language so delightful: its messiness, its confusion, its liberating possibilities.

Keith Kahn-Harris is a sociologist and writer based in London, and the author or co-author of eight books. The Babel Message: A Love Letter to Language was published by Icon Books in 2021. He can be found on his website, kahn-harris.org.

The Newsletter of (Not Quite) Everything
