How good is automated translation in 2017? There is a new company called DeepL that seems to have “cracked” the translation problem in the sense that it makes fewer errors than non-professional human translators. That’s my claim, not theirs, but since their system is online for anybody to try, you can make your own experiments.

I decided to take an extract from one of Michel Houellebecq’s novels to illustrate the point objectively. It is in French, but I do not assume that you can read it:

À Charny il ne reste qu’une épicerie ; mais la camionnette du boucher passe le mercredi, celle du poissonnier le vendredi ; souvent, le samedi midi, sa grand-mère fait de la morue à la crème. Michel est en train de vivre son dernier été à Charny, mais il ne le sait pas encore. En début d’année, sa grand-mère a eu une attaque. Ses deux filles, qui vivent en banlieue parisienne sont en train de lui chercher une maison pas trop loin de chez elles. Elle n’est plus en état de vivre seule toute l’année, de s’occuper de son jardin.

Let us first try Bing Translator by Microsoft:

In Charny there is only one grocery store; But the butcher’s van spends Wednesday, the fishmonger’s truck on Fridays; Often on Saturday lunchtime, his grandmother makes cod with cream. Michel is living his last summer in Charny, but he doesn’t know it yet. At the beginning of the year, his grandmother had a seizure. His two daughters, who live in the suburbs of Paris, are now looking for a house not too far from home. She is no longer in a position to live alone all year, to look after her garden.

It is good, but Bing gets the capitalization wrong (“But” is capitalized after a semicolon), and “the butcher’s van spends Wednesday” is a bad translation. I think it is slightly worse than what a non-professional human translator would do.

How well does Google Translate do on this passage?

At Charny there is only a grocery store; but the butcher’s van passes on Wednesday, that of the fishmonger on Friday; often, on Saturday noon, her grandmother made cod in the cream. Michel is living his last summer at Charny, but he does not know it yet. At the beginning of the year, his grandmother had an attack. His two daughters, who live in the Paris suburbs, are looking for a house not too far from their home. She is no longer in a condition to live alone all year round, to take care of her garden.

Unlike Bing, Google does not capitalize words at random, but there are awkward bits. The “that of the fishmonger on Friday” passage is hard to parse. Google also gets the verb tense wrong in “her grandmother made cod in the cream”: the French is in the present tense. I don’t like “his grandmother had an attack” either. Like Bing, Google does not reach the “non-professional human level” threshold.

What about DeepL?

In Charny, there is only one grocery store left; but the butcher’s truck passes on Wednesdays, the fishmonger’s truck passes on Fridays; often, on Saturdays at noon, his grandmother makes codfish with cream. Michel is living his last summer in Charny, but he doesn’t know it yet. Earlier this year, his grandmother had a stroke. His two daughters, who live in the suburbs of Paris, are looking for a house not too far from their homes. She is no longer able to live alone all year round, to look after her garden.

DeepL is the only one to get “his grandmother had a stroke” right. Its translation is as good as what most human beings could produce.

All three translation engines fail to attribute the daughters to the grandmother. No professional translator would make such a mistake, but I think many of us would.

Yes, I know that judging a system based on a single passage is methodologically problematic, but I ran many more tests that support my claim that DeepL is far above Google and Bing. I could elaborate further, but I’d encourage you instead to try it out.

Credit: I found out about DeepL via Peter Turney.

Note: My wife is a professional translator. I’m not claiming that she is about to become obsolete. Not by a long shot. But she is a professional translator: she is much better at translation than 99% of us.

10 thoughts on “DeepL is as good as human translators?”

  1. I had to re-read the French text twice myself, but my understanding is that the daughters are the grandmother’s, not Michel’s. Like Bing and Google, DeepL translated it as “His two daughters” instead of “Her two daughters”. It should be possible to infer this from the following sentence, “Elle n’est plus en état de vivre seule” (“She is no longer able to live alone”), but I’m not convinced that every such case can be resolved without semantic analysis to infer the family structure, which I myself needed in order to confirm the meaning.

    DeepL sure looks like a step forward, but it’s probably still just a well-trained text robot.

    1. “DeepL sure looks like a step forward, but it’s probably still just a well-trained text robot.”

      We certainly have not passed the Turing test yet.

      However, please consider that most human beings are terrible at translation. For example, I only saw the mistake once you pointed it out.

      Most of us make plenty of mistakes while translating. You have to set the bar at a reasonable level.

      I’m also explicit in my post that I do not think we are close to the “professional level” of translation. That’s going to be much harder.

  2. You shouldn’t test on things that have already been translated by humans and potentially used as training data. Ideally, write something completely new.

  3. I tried the sentence “Toujours tiré à quatre épingles, il connaît bien les impératifs de sa profession.” (roughly, “Always impeccably dressed, he knows the demands of his profession well.”) None of the automated translators, including DeepL, recognized the idiom “tirer à quatre épingles” (to be dressed to the nines); they all gave literal translations.

    None of the idioms I tried were translated correctly. These engines still have a ways to go compared even to a non-professional translator.

  4. I agree wholeheartedly with Daniel. I have tested DeepL on a difficult philosophical article and on the text of a children’s book (English to French in both cases). I find DeepL absolutely tremendous. It doesn’t think, of course, but that’s not really the point. Its output is far superior to Google’s or Bing’s, and even to the published translation of the children’s book.
    I am a professional translator (retired from the UN after a little over a quarter-century in the salt mines :).

  5. In some ways, DeepL is even worse than Google Translate. It often attempts daring translations that, even if they sound more or less acceptable, completely change the meaning. At other times, DeepL leaves out entire sentences.

  6. I am not a professional translator, but I am a business analyst for an IT consulting firm in Montréal, QC.

    Since I am bilingual in French and English, I often get asked to translate technical documents to and from both languages. I admit that I would often rely on Google Translate for large swathes of text. I would then “correct” the translation, and this would take me less time than translating from scratch. As I mentioned, I’m not a trained translator.

    Since my wife introduced me to DeepL, I have completely stopped using Google Translate. DeepL is not perfect, but it seems to have a much easier time with very technical terms. It’s an incredibly useful tool for me, since it gives amateurs like me a way to “punch above their weight.”

  7. I recently stumbled across DeepL. I have decades-old letters written in Lithuanian to my grandfather, and I wanted to translate them. For the most part it does OK; my main issue is that I’m having a difficult time telling cursive a’s and o’s apart. Sometimes I have to plug in both readings to see which sentence makes more sense. It certainly does a better job than Google Translate, perhaps partly because Google gives minor languages short shrift. What I don’t like is that DeepL doesn’t seem to translate single words on their own. I’d like to be able to use it as a dictionary sometimes.

