
Decouple the actions


Posted Aug 20, 2024 14:54 UTC (Tue) by khim (subscriber, #9252)
In reply to: Decouple the actions by yeltsin
Parent article: FreeBSD considers Rust in the base system

> It's still a helpful tool (or they wouldn't be using it) in the sense that it's better than nothing, but it has a looooooong way to go to get to the level of an *average* translator who can at least produce something coherent that gets the point across, even if it's not written in Shakespearean style.

Are you sure? I remember what documentation received from Chinese companies looked like before the advent of LLMs, and that description was true for the majority of it:

> the original "translation" is often impossible to understand for someone not trained in deciphering it.

I just had to accept the fact that this is what the average human translator produces.

P.S. I have no idea what causes that effect, BTW. I know that documentation translated from Russian or German was never that bad (in both directions), with both humans and LLMs, but something in Chinese just makes it impossible to translate adequately without someone who fully understands what the sentences actually mean. Regular translators without knowledge of the specific technology always produced garbage, and LLMs still do.



Decouple the actions

Posted Aug 20, 2024 20:33 UTC (Tue) by Wol (subscriber, #4433) [Link] (9 responses)

> P.S. I have no idea what causes that effect, BTW. I know that documentation translated from Russian or German was never that bad (in both directions), with both humans and LLMs, but something in Chinese just makes it impossible to translate adequately without someone who fully understands what the sentences actually mean. Regular translators without knowledge of the specific technology always produced garbage, and LLMs still do.

It's presumably because English, German and Russian are all Indo-European languages, and as such are all descended from a fairly recent ancestor. I remember reading something about "four waves of languages" and Indo-European belongs to wave 4. I believe Hungarian and Finnish are wave 2 languages, and while their vocabulary is completely different, they share a similar grammatical structure.

I think Gaelic, Basque, Catalan might be wave 3. Where Chinese, Japanese etc fit I don't have a clue.

But the point is that the structure of European languages is similar, so it's mostly a case of translating the individual words and being aware of idiom, and converting a crude translation to a good one isn't that much work. Start translating into a language from a completely different group, and many concepts may become completely untranslatable. Language shapes your view of the world just as much as your view of the world shapes language. A good example is "Bourgeois, Bürgerlich, Middle-class". Three words, three languages, the same basic concept, but each word is unique to its language, and while a naive translator might think they are the same, all three mean something rather different from one another. Indeed, I don't even know that any of those have an exact translation into any of the other languages.

Imagine that in three closely related languages. Now extend that to massively more different languages ...

(I see that - slightly differently - all the time at work. I have a Polish, a Chinese, and an Indian colleague. Because these languages all differ in the sounds they use, I have difficulty hearing them clearly, and they have difficulty hearing me clearly.)

Cheers,
Wol

Decouple the actions

Posted Aug 20, 2024 22:26 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

It's not just individual vocabulary words, either. Chinese grammar does many things that are utterly alien to Indo-European languages, like verb stacking, aspect markers in lieu of tenses, the use of classifiers and absence of determiners in noun phrases, and a near-total lack of inflection (the last of these is, oddly enough, almost a feature of English, but English does a little inflection here and there, mostly for verb conjugation and plurals). That's not even getting into the fact that "Chinese" is not one language; it is a whole family of (closely related) languages, which all do things slightly differently from each other.

(For the curious: I found https://en.wikipedia.org/wiki/Chinese_grammar a helpful starting point, but I must admit that I don't know Chinese myself, so I have no idea how accurate it is.)

Decouple the actions

Posted Aug 21, 2024 0:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Yup. This is correct. One thing that tripped up many translation systems was the lack of grammatical tenses in Chinese. In English, every verb has to have a tense; it's unavoidable. Not so in Chinese: you have to get the tense (past, present, future) from the surrounding context.

Another thing that especially trips up documentation writers is the passive voice. It's rarely used in Chinese, unless talking about something serious ("he was hit by a car" type serious). A sentence like "once a job is processed" is difficult to translate word-for-word.

Decouple the actions

Posted Aug 21, 2024 6:46 UTC (Wed) by viro (subscriber, #7872) [Link] (3 responses)

Excuse me, but... what the hell have you been smoking? Gaelic is a Celtic language, which is a branch of Indo-European. Catalan is a direct descendant of Vulgar Latin. Which is to say, also I-E, Italic branch. It's about as far from Spanish as the language of Robbie Burns is from BBC English. Definitely _way_ closer than English and Dutch are to each other.

Decouple the actions

Posted Aug 21, 2024 10:46 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)

Fair enough - it's clear from what I wrote that I didn't actually know. Which is why I was deliberately - and clearly - vague.

And actually - from what you say - I suspect the language of Rabbie Burns may be *further* from BBC English than Catalan from Spanish (I was under the impression that Catalan might have been a pre-existing language swallowed up into Spain. Bit like Welsh and English). Rabbie spoke Scots (aka "the language of the Angles"), while BBC is English (aka "the language of the Saxons").

And where does Basque fit into all this?

Cheers,
Wol

Decouple the actions

Posted Aug 21, 2024 15:24 UTC (Wed) by anselm (subscriber, #2796) [Link]

> Rabbie spoke Scots (aka "the language of the Angles"), while BBC is English (aka "the language of the Saxons").

Robert Burns is actually not a great example of Scots because in many cases he would tone down his Scots to be rather more like English (he had an audience to consider, after all). The Scots language, as distinct from other varieties of English in Britain, became its own thing only by the 15th century, when “Angles vs. Saxons” hadn't been an issue for close to a millennium or so.

> And where does Basque fit into all this?

Basque is really an outlier because it is the only surviving language in Europe that is not somehow related to some other language. The general thinking is that early Basque developed before Indo-European languages (such as Celtic or Romance languages) reached the area. Basque has now assimilated various words from its neighbours but the grammar is still considerably different from Indo-European languages.

Decouple the actions

Posted Aug 21, 2024 16:38 UTC (Wed) by viro (subscriber, #7872) [Link]

Catalan is a language being swallowed up - it's just that it is closely related to the language that swallows it. So is Scots to English (and divergence times are similar - 15th century or so). Basque is a very different beast - it's not I-E at all, but more to the point, the grammar is different enough to make things interesting. It's not about the common ancestry - we *know* that there have been many deep reworks of grammar among the I-E languages, to the point that reconstructing the grammar of their common ancestor is pretty much hopeless; Basque grammar has features outside of the observed range for attested I-E languages. No idea how much of a headache that causes for automatic translation, though...

Decouple the actions - languages

Posted Aug 22, 2024 3:43 UTC (Thu) by kenmoffat (guest, #4807) [Link] (2 responses)

I had not come across this wave theory. But a quick look at Wikipedia suggests that if Indo-European languages are wave 4 then Catalan is definitely wave 4 (a derivative of Vernacular Latin, like many other languages), and while Gaelic is derived from insular (Ireland, Britain) languages it is still Indo-European, but possibly related to Iberian or Gallic tongues. But certainly Finnish is not Indo-European, so probably wave 2, and Basque (currently regarded as an isolate) maybe wave 1 (as in "before the known movements of languages" - or should that be "wave zero"?).

But your point that all of these have similar sentence structures, whereas Chinese dialects/languages (the choice of which they are is a political choice) are completely different, is very true. As is the difficulty of hearing correctly - e.g. in Han (or ex-Han languages such as Korean), sounds which we English speakers can distinguish, such as l, n, r, are hard for speakers of those languages to differentiate, and I'm certain the converse is true for certain of their sounds which we did not learn as children.

Decouple the actions - languages

Posted Aug 22, 2024 3:48 UTC (Thu) by kenmoffat (guest, #4807) [Link] (1 responses)

And I now see that other people have expressed this much better than I could. <sigh/>

Decouple the actions - languages

Posted Aug 23, 2024 10:05 UTC (Fri) by Wol (subscriber, #4433) [Link]

Well, you picked up on my sounds comment, which I don't think anyone else did ...

Cheers,
Wol


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds