The Translator and the Machine, by Dorothy Kenny

The Linguist, published on Thursday, 15 December 2016

In a shortened version of her Threlford Memorial Lecture, Dorothy Kenny asks what implications new technology has for the translation profession

First published in The Linguist, 55,6

Translation without technology is now inconceivable, but the relationship between the two has become somewhat fraught of late. Even as translation activity continues to grow at a dizzying pace worldwide, translators worry about competition from computers, or having to work with poor quality machine output. At the same time, translation teachers are asking themselves what students should be learning now to see them safely through the ‘revolutionary upheaval’ currently under way in translation. After all, received wisdom is that education is the means by which human labour wins the race against technology.

In coming to terms with current upheavals, there is no doubt that what we need is careful, critical examination of what is actually happening in the contemporary world of translation, one that avoids what Michael Cronin dubs “the dual dangers of terminal pessimism and besotted optimism”.1 These positions are all too present in current reflections on translation technology. Cyber-utopian visions of a world without language barriers abound, and even within translation studies, some commentators predict that machine translation will turn most translators into post-editors sometime soon.2

Nor are predictions of wholesale automation limited to translation of the written word, where tools like Google Translate have already made their popular mark. If anything, technology pundits get even more excited about automatic translation of the spoken word – the very stuff of sci-fi fantasy. There are already several systems that use speech recognition to convert speech to written text in the same language, and then use conventional machine translation to translate that written text into another language.3 All that’s then needed is a speech synthesis module to speak the target-language text and we have speech-to-speech translation. The first two steps are error-prone and synthesising natural speech is challenging, but developers at one New York start-up are so confident that they can make the technology work that their Babel fish-style earpieces can already be pre-ordered.4

Predicting the future

Predictions about translation technology need careful scrutiny, because what we believe about the future has profound consequences for the decisions we make today. If it is only a matter of time before technology makes human translators and interpreters obsolete, or before post-editing displaces translation, should we still put effort into training translators and interpreters? And what might a career in post-editing look like anyway? What kind of conditions would post-editors work under? And would they like their jobs?

Before pursuing these questions, I would like to stress that, while I support a critical approach to translation technology, I am not advocating an antagonistic approach.

Despite frequent allusions to translators’ supposed hostility to ‘technology’, there is little to suggest that they harbour negative sentiments towards technology per se. In one recent study by the Finnish researchers Kaisa Koskinen and Minna Ruokonen, for example, some 100 participants were invited to write a short love letter or break-up letter to a technological tool or some other aspect of their work. Most chose to write a love letter.5

Koskinen and Ruokonen’s study covers all sorts of technologies, from search engines to ergonomic mice, but the technologies that are most associated with translation are undoubtedly translation memory (TM) and machine translation (MT), and in particular, statistical machine translation (SMT).

TM tools have been around since the 1990s. Put very simply, they store sentences from previously translated source texts alongside their human translations. If a source sentence (or something like it) is repeated in a subsequent translation job, the tool simply retrieves the existing translation for that sentence from memory and presents it to the human user, who can choose to accept, reject or edit it for current purposes. The human translator remains in control.
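
To make the idea concrete, the sketch below shows in schematic Python how a store of segment pairs might be queried for an exact or fuzzy match. The memory contents, the similarity measure and the match threshold are all invented for the purposes of illustration; no commercial TM tool works exactly this way.

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated source sentences
# stored alongside their human translations (invented examples).
memory = {
    "The printer is out of paper.": "L'imprimante n'a plus de papier.",
    "Press the green button to start.": "Appuyez sur le bouton vert pour démarrer.",
}

def lookup(source_sentence, threshold=0.75):
    """Return the best stored translation whose source segment is
    sufficiently similar to the new sentence, or None if nothing matches."""
    best_score, best_pair = 0.0, None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source_sentence, stored_source).ratio()
        if score > best_score:
            best_score, best_pair = score, (stored_source, stored_target)
    if best_score >= threshold:
        # The human translator accepts, rejects or edits the suggestion.
        return best_score, best_pair
    return None

print(lookup("The printer is out of paper!"))
```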

Contemporary SMT, on the other hand, is fully automatic translation in which a computer program decides on the most probable translation for a given source sentence, based on a probabilistic model of translation that it has learned from pre-existing source texts and their human translations, and on a probabilistic model of the target language, learned from a large monolingual corpus of texts. Such ‘learning’ is done in a so-called ‘training’ phase. In a second ‘tuning’ phase, system developers work out the optimal weight that should be assigned to each model to get the best outcome.

When the system is called upon to translate new text (in a third phase called ‘decoding’), it searches for the most probable target-language sentence given a particular source sentence, the models it has learned and the weights assigned to them. SMT systems thus have a tri-partite architecture and involve a lot of tuning to find the optimal weights for different models.
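
A toy illustration of this scoring principle might look something like the following, in which invented probabilities from a translation model and a target-language model are combined using weights of the kind fixed in the ‘tuning’ phase. Real decoders search an enormous space of partial translations and reorderings rather than scoring a fixed list of candidates, so this is a sketch of the principle only, with all numbers made up.

```python
import math

# Invented candidate translations for one source sentence, with toy
# probabilities from a translation model (tm) and a language model (lm).
candidates = {
    "She threw out all her old clothes.": {"tm": 0.020, "lm": 0.0012},
    "She threw all her old clothes out.": {"tm": 0.022, "lm": 0.0009},
    "Threw she all her old clothes out.": {"tm": 0.025, "lm": 0.0001},
}

# Weights of the kind found in the 'tuning' phase (values invented).
w_tm, w_lm = 0.6, 0.4

def score(probs):
    # Log-linear combination: a weighted sum of log probabilities.
    return w_tm * math.log(probs["tm"]) + w_lm * math.log(probs["lm"])

# 'Decoding' here reduces to picking the highest-scoring candidate.
# The third candidate scores well on the translation model but is
# heavily penalised by the language model for its odd word order.
best = max(candidates, key=lambda sentence: score(candidates[sentence]))
print(best)
```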

The models used are based on n-grams – i.e. strings of one, two, three or n words that appear contiguously in the training data used. SMTs can have difficulty handling discontinuous dependencies, such as that between ‘threw’ and ‘out’ in the sentence ‘She threw all her old clothes out’. This is due to the relatively limited amount of context used to build models, and the fact that the n-grams are translated largely independently of each other and don’t necessarily correspond to any kind of structural unit.
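
Extracting n-grams is itself a trivial operation, as the short Python sketch below shows, using the article’s own example sentence; the point is that no short contiguous window captures the link between ‘threw’ and ‘out’.

```python
def ngrams(words, n):
    """Return all contiguous n-word sequences from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = "She threw all her old clothes out".split()
print(ngrams(tokens, 2))
# [('She', 'threw'), ('threw', 'all'), ('all', 'her'), ...]
# 'threw' and 'out' never appear together in any short n-gram, which is
# why such discontinuous dependencies are hard for n-gram-based models.
```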

SMT systems are also known to perform poorly for agglutinative and highly inflected languages, as they have no principled way of handling grammatical agreement. Other problems include word drop, where a system fails to translate a given source word, and inconsistency, where the same source-language word is translated two different ways, sometimes in the same sentence. These are precisely the kind of errors that human post-editors are employed to fix.

The editing environments used by post-editors are often the same as those used by translators, namely the interfaces provided by TM tools. Although they are distinct technologies, the lines between TM and SMT are blurring somewhat, as it is now common for translators to be fed automatic translations directly from an SMT system when their translation memory does not contain a match for the source sentence. TM and SMT are also intimately connected by the fact that the translation memories that translators build up over time can become training data for their very own (or someone else’s) SMT engine.
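
Schematically, this combined workflow amounts to a simple fallback, as in the sketch below, where lookup is the toy TM query from the earlier sketch and machine_translate is a hypothetical stand-in for a call to an SMT service; real editing environments implement this far more elaborately.

```python
def suggest(source_sentence, machine_translate):
    """Offer the translator a TM match if one exists; otherwise fall back
    to raw machine translation for post-editing (schematic only)."""
    match = lookup(source_sentence)  # toy TM lookup from the earlier sketch
    if match is not None:
        score, (stored_source, stored_target) = match
        return {"origin": "TM", "score": score, "text": stored_target}
    # machine_translate is a hypothetical stand-in for an SMT call.
    return {"origin": "MT", "score": None, "text": machine_translate(source_sentence)}
```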

Dominating the field

Despite known problems, SMT systems have come to dominate the field of machine translation, outperforming previously leading systems. In the last two years, however, there has been a new kid on the block: neural machine translation (NMT). Like SMTs, NMT systems learn how to translate from pre-existing source texts and their translations. They have a simpler architecture than SMTs, however, and don’t use models based on n-grams. Instead, they use artificial neural networks in which individual nodes that can hold single words, phrases or whole sentences are connected with other nodes in the network. The connections between nodes are strengthened via bilingual training data.

When it comes to translating new inputs, the system reads through the source-language sentence one word at a time. Then it starts outputting one target word at a time until it reaches the end of the sentence. NMT systems thus process full sentences (rather than n-grams). They handle morphology, lexical selection and word order phenomena (including discontinuous dependencies) better than SMTs, but they take much longer to train and require much more computing power.
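
The encode-then-decode loop can be sketched structurally as follows, with the neural network itself replaced by trivial placeholder functions; this illustrates only the shape of the process, not how any real system computes its outputs.

```python
# A structural sketch only: the encode-then-decode loop described above,
# with the neural network replaced by trivial placeholders.

def encode_word(state, word):
    """Placeholder for the network update that reads one source word."""
    return state + [word]

def decode_word(state, produced_so_far):
    """Placeholder for the network step that emits one target word."""
    # Here we simply echo the source in reverse, purely for illustration.
    remaining = state[::-1][len(produced_so_far):]
    return remaining[0] if remaining else "<end>"

def translate(source_sentence):
    state = []
    for word in source_sentence.split():   # read the source one word at a time
        state = encode_word(state, word)
    output = []
    while True:                             # emit one target word at a time
        word = decode_word(state, output)
        if word == "<end>" or len(output) > 50:
            break
        output.append(word)
    return " ".join(output)

print(translate("she threw all her old clothes out"))
```

In practice, of course, it is learning the real versions of these encode and decode functions from huge bilingual corpora that demands the training time and computing power just mentioned.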

These are problems that large corporations can overcome and, in late September 2016, Google announced that all Chinese-to-English translation delivered by Google Translate for mobile and web apps would henceforth be powered by Google Neural Machine Translation (GNMT).6 However, problems like word drop, mistranslations (especially of rare words) and contextually inappropriate translations can still occur. There may still be work, in other words, for post-editors. To date, however, we have little or no knowledge of what it is like to work as a post-editor of NMT output.

Training implications

But back to our questions: what does all this mean for the training of future translators and interpreters? And what might a career in post-editing look like? To answer the first question it is worth looking to the field of labour economics. It used to be the case that routine work was considered particularly susceptible to computerisation, but with the advent of big data (such as that provided by translation memories), concomitant progress in machine learning and advances in mobile robotics, non-routine cognitive and manual work have both become more amenable to automation.

In one widely cited recent study, Carl Frey and Michael Osborne estimate that, over the next two decades, 47% of American jobs are at high risk of computerisation.7 Interestingly, the work of interpreters and translators does not fall into this category, but appears in the lower end of the medium-risk group. Their work is characterised as requiring high levels of social and creative intelligence – requirements that are seen as creating engineering bottlenecks when it comes to computerisation, even in this era of big data.

It is also worth noting that translation and interpreting are labelled as ‘bright outlook’ occupations by the US Bureau of Labor Statistics, based on the expectation that employment in the sector will grow at a much faster rate than average over the period 2014-2024. According to this analysis, it doesn’t look like translators and interpreters should be throwing in the towel any time soon.

Further insights come from analyses of the market for post-editing. While various surveys show a growing number of language service providers offering MT with post-editing as a service, it is not always clear how much money they make from this activity. The sector-wide survey conducted by Common Sense Advisory in 2014 suggested that MT post-editing accounted for around US$1.1 billion, which is a considerable amount but represents only 3% of the language services market that year, with the vast bulk of revenues coming from traditional translation.

A wholesale shift from translation to post-editing does not seem to make sense, and universities need not abandon the training of translators any time soon on these grounds. On an individual level, translators might see value in integrating post-editing into their profile, while others may balk at the low rates that are sometimes offered.

Comparative pay

The level of remuneration appropriate for post-editing services is a pressing issue. In 2012, the translation agency Translated.net conducted an experiment to work out what would be fair compensation for post-editors. It sent out purchase orders offering different rates for two related jobs: one for translation from scratch, the other for post-editing MT output. The aim was to find the point at which at least 75% of translators would opt for the post-editing job over the translation job.

For English to French/Italian, 75% opted for post-editing once it was paid at 73% of the word rate for translation. In other words, these translators were willing to give a 27% discount for post-editing MT output. The tipping point for English to German, on the other hand, came at 110%. That is, translators wanted a 10% premium for post-editing MT output. The pricing model was ultimately considered unworkable, with the conclusion that post-editor productivity could best be expressed in terms of two key performance indicators: edit time (i.e. the average number of words processed by the post-editor per unit of time); and effort (the average proportion of words changed by the post-editor in the MT output).
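
How such indicators might be computed from logged data is sketched below; the function names and the word-level diff used to estimate the proportion of words changed are choices made purely for illustration, not Translated.net’s own definitions.

```python
from difflib import SequenceMatcher

def throughput(words_processed, minutes_spent):
    """Words handled per minute of post-editing: a simple proxy for the
    time-based indicator described above."""
    return words_processed / minutes_spent

def effort(mt_output, post_edited):
    """Proportion of MT words changed by the post-editor, estimated with
    a word-level diff (one possible way of operationalising 'effort')."""
    mt_words, pe_words = mt_output.split(), post_edited.split()
    matcher = SequenceMatcher(None, mt_words, pe_words)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1 - unchanged / len(mt_words)

print(throughput(600, 60))   # e.g. 10 words per minute (invented figures)
print(effort("the cat sat on the matt", "the cat sat on the mat"))
```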

These are metrics that can now be fairly easily captured by additions to the very software that post-editors use to do their work. They represent part of the translator/post-editor’s ‘data exhaust’ – by-products of her digital activity as she works her way through a translation using a TM tool enhanced with SMT functions, and, crucially, a keyboard logging tool. Keyboard and mouse activity logging, as well as eye-tracking capabilities, have already been built into experimental translation environments, such as that developed for the Casmacat project.8 Even if this project aimed to develop translation and post-editing interfaces to provide better support for human users, it may one day be seen as the vanguard of tools that bring surveillance of these users to a new level.

Dumbing down

But what of enjoyment in post-editing? The verdict of many practitioners is not very flattering, with Joss Moorkens and Sharon O’Brien reporting it to be “an edit-intensive, mechanical task that requires correction of basic linguistic errors over and over again”.9 As one of their informants puts it: “it’s mechanics, and if it’s mechanic, there must be a way it could be done by a machine.” This is one of the supreme ironies of contemporary machine translation. In some cases, at least, it has resulted in a division of labour between human and machine that assigns the most mechanical of tasks to the human. It is a classic case of deskilling, in which a complex activity previously accomplished, start to finish, by one person is broken up into a series of simplified tasks, requiring less skill than before from the humans involved. No wonder many translators are less than enthusiastic.

A further cause of unease must surely lie in the instruction frequently given to post-editors that they should make the MT output just ‘good enough’. Understandably, many translators/post-editors struggle with a brief that explicitly requires them not to do the job to the best of their ability.

But that, of course, is not the full story. Let us return to the prediction that machine translation will soon turn most translators into post-editors. We have already seen economic reasons why things might not work out this way, but there are other reasons too. One is that it is perfectly possible for translators to take ownership of the entire translation workflow, rather than being restricted to a single task at the end, even if that workflow has been largely automated. Recall that SMT and NMT both rely on bilingual training data; translators who have been using TM tools for years are likely to have at their disposal training data that are particularly appropriate for the kind of work they take on.

For the last decade, open-source tools have existed that allow such translators to use these data to build their own SMT engines. Toolkits such as Moses were initially difficult to use, but over the last five or so years, cloud-based and other SMT services have emerged that make self-service SMT much more accessible. Such services are now covered in at least some university translator training programmes.10 We have yet to see whether do-it-yourself neural machine translation will become a reality, but given the right hardware, a suitable response from software engineers and appropriate education for translators, there is no reason to discount this possibility.
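
The kind of preparatory step involved in self-service SMT can be illustrated as follows: extracting aligned segments from an exported translation memory (TMX being the usual exchange format) into the plain, sentence-aligned text files that toolkits such as Moses typically take as training input. The file names and language codes in this sketch are purely illustrative.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_parallel(tmx_path, src_lang, tgt_lang, src_out, tgt_out):
    """Write aligned segments from a TMX translation memory into two
    plain-text files, one sentence per line (illustrative sketch only)."""
    tree = ET.parse(tmx_path)
    with open(src_out, "w", encoding="utf-8") as src_f, \
         open(tgt_out, "w", encoding="utf-8") as tgt_f:
        for tu in tree.iter("tu"):
            segments = {}
            for tuv in tu.iter("tuv"):
                lang = tuv.get(XML_LANG) or tuv.get("lang")
                seg = tuv.find("seg")
                if lang and seg is not None and seg.text:
                    segments[lang.lower()] = seg.text.strip()
            if src_lang in segments and tgt_lang in segments:
                src_f.write(segments[src_lang] + "\n")
                tgt_f.write(segments[tgt_lang] + "\n")

# Hypothetical usage, with invented file names and language codes:
# tmx_to_parallel("my_memory.tmx", "en-gb", "fr-fr", "corpus.en", "corpus.fr")
```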

No matter how NMT pans out, at the very least translators and interpreters should attend closely to issues of data ownership and how they wish their digital activity to be logged. Even if the actions of technological elites over the last decade have served to normalise the large-scale online harvesting of all kinds of data, there are legal, economic and political reasons why such harvesting is no longer going unquestioned. Technology watchers are questioning the sustainability of our current ‘winner-takes-all’ digital economy, and national governments and the European Union, in particular, are beginning to react to some of the more pernicious effects of big-data-enabled digital disruption.

Translators and interpreters would do well to heed such developments, which may yet prove just as important to their careers as any technical advances we see in translation technology over the coming years.

The Threlford Memorial Lecture 2016 was given on 17 September at Stationers’ Hall.

Dr Dorothy Kenny is Associate Professor in the School of Applied Language and Intercultural Studies at Dublin City University, Ireland, where she lectures in translation technology, terminology and corpus linguistics. She is Editor of the forthcoming volume Human Issues in Translation Technology (Routledge, 2017).

Notes

1 Cronin, M (2013) Translation in the Digital Age, London: Routledge, 3

2 See Pym, A (2013) ‘Translation Skill-sets in a Machine-Translation Age’. In Meta 58(3): 487-503

3 See, e.g., www.eu-bridge.eu/lecture.html

4 See www.waverlylabs.com

5 Koskinen, K and Ruokonen, M (2017) ‘Love Letters or Hate Mail? Translators’ technology acceptance in the light of their emotional narratives’. In Kenny, D (ed) Human Issues in Translation Technology, London: Routledge

6 See https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

7 See Frey, C B and Osborne, M A (2013) ‘The Future of Employment: How susceptible are jobs to computerisation?’; www.oxfordmartin.ox.ac.uk/downloads/academic/The_Future_of_Employment.pdf

8 Koehn, P, et al (2013) Casmacat Final Public Report, Edinburgh: University of Edinburgh; www.casmacat.eu

9 Moorkens, J and O’Brien, S (2017) ‘Assessing User Interface Needs of Post-Editors of Machine Translation’. In Kenny, D (ed) op. cit.

10 See, e.g., Doherty, S and Kenny, D (2014) ‘The Design and Evaluation of a Statistical Machine Translation Syllabus for Translation Students’. In The Interpreter and Translator Trainer, 8(2): 295-315

URLs accessed 20/10/16
