Perl 5.28.0 released

Posted Jun 25, 2018 17:07 UTC (Mon) by JoeBuck (subscriber, #2330)
In reply to: Perl 5.28.0 released by epa
Parent article: Perl 5.28.0 released

A number of languages have their own characters for writing numbers. For example, Kannada, the local language in the Indian state that includes Bangalore, has Unicode code points U+0CE6 through U+0CEF for its ten digits. (I've been there several times and love the curlicue Kannada characters, though I can't read a bit of it; fortunately for me, English is widely spoken and used on signs.)

Perl 5.28.0 released

Posted Jun 26, 2018 15:16 UTC (Tue) by epa (subscriber, #39769)

Right, and in English too we have "twenty-three" as an alternative to "23". Do people in Bangalore, if their mobile phone is localized to the Kannada language, enter phone numbers using the Kannada digits? Do spreadsheet applications display them instead of 0-9? Does your computer keyboard have them on the numeric keypad? My point is not that other ways of writing numbers don't exist, but that the digits 0-9 are used almost everywhere for the kind of technical or scientific number representation that you'd typically want to match with a regular expression \d+ and then process further.
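To make the `\d+` question concrete, here is a minimal sketch in Python rather than Perl (the helper names are made up for illustration). In Python 3, `\d` in a `str` pattern matches any Unicode decimal digit by default, and the `re.ASCII` flag restricts it to 0-9; Perl has a comparable `/a` modifier.

```python
import re

# KANNADA DIGIT TWO (U+0CE8) followed by KANNADA DIGIT THREE (U+0CE9)
KANNADA_23 = "\u0ce8\u0ce9"


def matches_unicode_digits(text: str) -> bool:
    """True if text consists only of decimal digits from any script."""
    return re.fullmatch(r"\d+", text) is not None


def matches_ascii_digits(text: str) -> bool:
    """True if text consists only of the ASCII digits 0-9."""
    return re.fullmatch(r"\d+", text, flags=re.ASCII) is not None


if __name__ == "__main__":
    for sample in ("23", KANNADA_23):
        print(repr(sample), matches_unicode_digits(sample), matches_ascii_digits(sample))
```

Which of the two behaviours is the better default is exactly what this thread is arguing about; the sketch only shows that both are available.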

Perl 5.28.0 released

Posted Jun 26, 2018 20:43 UTC (Tue) by karkhaz (subscriber, #99844)

Not sure about Bangalore and Kannada, but in most Arabic-speaking countries the answer to all three of your questions is "yes".

This isn't such a challenge to deal with, because the numeral system is exactly the same as in the West (i.e., the position of a digit within the number gives its magnitude); the only difference is that the numerals are the characters ٠١٢٣٤٥٦٧٨٩ rather than 0-9. They're even written with the highest-magnitude digits on the left, just like Western numbers, even though Arabic text is written right-to-left.

So to convert an East Arabic number string into an int, it suffices to subtract a constant from each character in the string (0x630, the distance from U+0660 down to U+0030, which turns it into an ASCII number string) and then do the type conversion.
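A minimal sketch of that conversion, in Python rather than Perl. The function name and the range check are assumptions added for illustration; the only facts relied on are that the digits occupy U+0660 through U+0669 and are stored most-significant digit first.

```python
ARABIC_INDIC_ZERO = 0x0660  # U+0660 ARABIC-INDIC DIGIT ZERO
OFFSET = ARABIC_INDIC_ZERO - ord("0")  # 0x630: the constant to subtract


def arabic_indic_to_int(text: str) -> int:
    """Convert a string of U+0660..U+0669 digits to an int (hypothetical helper)."""
    if not text:
        raise ValueError("empty string")
    ascii_digits = []
    for ch in text:
        code = ord(ch)
        if not ARABIC_INDIC_ZERO <= code <= ARABIC_INDIC_ZERO + 9:
            raise ValueError(f"not an Arabic-Indic digit: {ch!r}")
        # Shift the code point down into the ASCII range '0'..'9'.
        ascii_digits.append(chr(code - OFFSET))
    # No reversal: the string is already in most-significant-first order.
    return int("".join(ascii_digits))


if __name__ == "__main__":
    print(arabic_indic_to_int("\u0664\u0662\u0661"))  # the digits four, two, one
```

In Python specifically the subtraction is not even needed, since the built-in `int()` already accepts decimal digits from any script; the explicit loop is only there to show the constant-offset idea.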

Perl 5.28.0 released

Posted Jul 16, 2018 7:43 UTC (Mon) by epa (subscriber, #39769)

Don't you also have to reverse the string?

Perl 5.28.0 released

Posted Jul 16, 2018 18:58 UTC (Mon) by dtlin (subscriber, #36537)

Nope. http://unicode.org/reports/tr9/ opens with exactly this case.

> However, there are several scripts (such as Arabic or Hebrew) where the natural ordering of horizontal text in display is from right to left. If all of the text has a uniform horizontal direction, then the ordering of the display text is unambiguous.
> However, because these right-to-left scripts use digits that are written from left to right, the text is actually bidirectional: a mixture of right-to-left and left-to-right text.

Arabic letters have Bidi_Class=AL (Arabic letter, strongly RTL), while Arabic digits have Bidi_Class=AN (Arabic number, weakly LTR).
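These classes can be checked against the Unicode database. Here is a small Python sketch using the standard `unicodedata` module; the helper names are hypothetical, and the sample characters are U+0628 ARABIC LETTER BEH and U+0663 ARABIC-INDIC DIGIT THREE.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class short name of a single character."""
    return unicodedata.bidirectional(ch)


def leading_digit_value(number: str) -> int:
    """Numeric value of the first character in logical (storage) order."""
    return unicodedata.decimal(number[0])


if __name__ == "__main__":
    print(bidi_class("\u0628"))  # an Arabic letter
    print(bidi_class("\u0663"))  # an Arabic-Indic digit
    print(bidi_class("3"))       # an ASCII digit, for comparison
    # For the digits four, two, one, the first stored character is the four,
    # which is why no reversal is needed before converting.
    print(leading_digit_value("\u0664\u0662\u0661"))
```

ASCII digits come back as EN (European number) rather than AN, but in both cases the most significant digit is stored first.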

Perl 5.28.0 released

Posted Jul 16, 2018 20:31 UTC (Mon) by zlynx (guest, #2285)

Claiming that Arabic numerals are RTL is funny, because Western Latin languages copied Arabic numerals from Arabic, including the direction, which is LTR.

Arabic numerals are /supposed to be/ read right to left in little-endian order. Notice that when reading a number, we first have to count all of the digits to determine hundreds, thousands, millions, etc., before we start talking. Instead, all these years we could have been reading them as "1 and 20 and 400" if only we'd written them in the other direction.

We also have all of the strange formatting exceptions for numbers so that they align to the right. Note that in English that's the only thing we right-align. A big hint that we write them in the wrong order.

Perl 5.28.0 released

Posted Jul 16, 2018 20:46 UTC (Mon) by karkhaz (subscriber, #99844)

> Claiming that Arabic numerals are RTL is funny because Western Latin languages copied Arabic numerals

I don't think dtlin claimed that at all; their comment was that Arabic digits have an LTR class. However, there's a subtle point here: what we call "Arabic numerals" (0123456789) were indeed copied from Arabic, but I was talking about the numerals that are currently used in most Arabic-speaking countries (٠١٢٣٤٥٦٧٨٩, which I referred to as East Arabic numerals to disambiguate).

> Arabic numerals are /supposed to be/ read right to left in little-endian order

I'm not sure what your source for this is. I suppose it makes sense when you have a number embedded in some RTL text. However, I speak Arabic (though I can neither read nor write it), and numbers are not pronounced as "1 and 20 and 400". The order is actually a bit jumbled: that particular number is pronounced "four hundred and one and twenty".

In general, higher-magnitude digits are uttered before lower-magnitude ones in spoken Arabic, just as in English. The exceptions are that units are uttered before tens ("one and twenty"), and that the numbers from eleven to twenty have special names (as they do in English, i.e. we say "eleven" as opposed to "one and ten").

Perl 5.28.0 released

Posted Jul 16, 2018 21:02 UTC (Mon) by zlynx (guest, #2285)

Ah. I must have been confused by the "1 and 20" bit. I'd been told that somewhere and I thought it generalized to higher multiples of ten.