Software and hardware obsolescence in the kernel [LWN.net]

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 8:16 UTC (Sun) by amarao (guest, #87073) [Link]

Hah! I've just realized what those crusades were really about. Hm... Wrong numerical (byte) order is a decent reason to declare a war or two.

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 10:19 UTC (Sun) by mpr22 (subscriber, #60784) [Link]

Ah, yes, I misread the article. My bad; thank you for the correction.

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 14:49 UTC (Sun) by pizza (subscriber, #46) [Link] (13 responses)

> In Arabic, for example, written numerals are little-endian.

Um, no. Arabic is written right-to-left, but numbers are written left-to-right, ie big-endian.

Now the numbers may be *read* right-to-left, or a mixture of the two ("125" is read as "one hundred five-and-twenty"), but in written form, they're left-to-right.

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 18:05 UTC (Sun) by epa (subscriber, #39769) [Link] (12 responses)

Really? Arabic writers put down numbers in left-to-right form? So if you are writing some text (in the normal right-to-left order) and you want to put down a numeral, you stop for a moment and work out how much space you need, then write in the number from left to right, starting with the most significant digit?

I always assumed that Arabic numbers were written and read in the same order as the rest of the text, in other words with the least significant digit coming first (little-endian) but I freely admit I have no knowledge of the Arabic language so it could be more complex than I thought.

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 19:57 UTC (Sun) by pizza (subscriber, #46) [Link] (11 responses)

> Really? Arabic writers put down numbers in left-to-right form? So if you are writing some text (in the normal right-to-left order) and you want to put down a numeral, you stop for a moment and work out how much space you need, then write in the number from left to right, starting with the most significant digit?

Yep!

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 6:34 UTC (Mon) by epa (subscriber, #39769) [Link] (10 responses)

Yikes! I guess that’s one concrete reason why left-to-right scripts are superior (apart from ink smudging). So in conclusion, numbers are *written* in big-endian direction in all common scripts, and probably read in that direction too, but this may be the opposite direction to the normal one.

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 17:11 UTC (Mon) by marcH (subscriber, #57642) [Link]

> Yikes!

This zig-zag doesn't feel like a very hard handwriting challenge, I mean not unless you have to deal with crazy long numbers. For computers and terminals it's apparently a bit harder :-)

> I guess that’s one concrete reason why left-to-right scripts are superior (apart from ink smudging).

Ink smudging _and_ hiding what you just wrote. Look at how left-handed people tend to bend their wrist, even with a pencil.

My urban legend is that right-to-left languages were superior for... carving. Ten commandments and all that :-)

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 20:53 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (7 responses)

> So in conclusion, numbers are *written* in big-endian direction in all common scripts, and probably read in that direction too, but this may be the opposite direction to the normal one.

They're read big-endian but written little-endian. Endianness is determined by the position (address) of each digit, not temporal order in which they're written. The least-significant digit is located at the lowest address, closest to the beginning of the text. When "serialized" (read aloud or subvocalized) the numbers are converted into big-endian format, with the most significant digit spoken first.

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 21:22 UTC (Mon) by marcH (subscriber, #57642) [Link] (6 responses)

> The least-significant digit is located at the lowest address, closest to the beginning of the text.

No because the numbers are not part of the text, they're a left-to-right insert in a right-to-left text. There are effectively two "address spaces" embedded in one another (a.k.a. "zig-zag").

As explained here, Arabic speakers start with the most significant digit when they read and write, just like everyone else, and that is what should define what the "lowest address" is. Otherwise non-Arabic speakers are misled into thinking Arabic speakers do something different, which is exactly what happened in this thread. Speech readers would be confused too.

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 22:22 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (4 responses)

> No because the numbers are not part of the text, they're a left-to-right insert in a right-to-left text.

> As explained here, Arabic speakers start with the most significant digit when they read and write, just like everyone else, and that is what should define what the "lowest address" is…

It doesn't make sense to talk about big-endian or little-endian without a single, consistent frame of reference for the addressing which is independent of the content. In a context where you would write the elements of a list right-to-left, that means starting with the lowest address on the right and monotonically increasing toward the left. Only after having defined this addressing scheme can we venture to answer whether the components of the list are written big-endian or little-endian with respect to that surrounding context.

The digit you read or write first (temporally, not spatially) has nothing to do with endianness. The order in which you wrote the digits is not part of the written record. Someone coming along later can't even tell what order the digits were recorded in; it makes no difference to them whether you wrote the least- or most-significant digit first. All they can see is the order of the digits as they are laid out visually on the page.

In serial communication the standard is different. There it matters which digit is pronounced first, because the temporal order of the symbols is *all* you can observe.

Software and hardware obsolescence in the kernel

Posted Sep 1, 2020 1:23 UTC (Tue) by marcH (subscriber, #57642) [Link] (3 responses)

> The order in which you wrote the digits is not part of the written record. Someone coming along later can't even tell what order the digits were recorded in

Of course they can, that's called "reading". I can hardly believe you wrote this...

Computers are not as smart though, so they may need some additional clues: https://www.w3.org/International/articles/inline-bidi-mar...

Software and hardware obsolescence in the kernel

Posted Sep 1, 2020 15:01 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (2 responses)

> Of course they can, that's called "reading". I can hardly believe you wrote this...

Are you being deliberately obtuse? If I sent you a picture of some digits I wrote left-to-right and some digits I wrote right-to-left, "reading" is not going to be enough to tell them apart. Here, I'll demonstrate:

1234
1234

To simulate physical writing I filled both lines with spaces and then overwrote the spaces with digits. One line was filled in left-to-right, and the other line right-to-left. Please tell me, which one was written left-to-right?

Software and hardware obsolescence in the kernel

Posted Sep 1, 2020 16:28 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

> Are you being deliberately obtuse?

I thought you were.

Natural languages are all about context, that's why computers need Unicode bidi = a bit more help. This has been well discussed and explained in several other places in this thread (thanks to all those who did) but if not obtuse you are definitely not receptive. Never mind.

Software and hardware obsolescence in the kernel

Posted Sep 2, 2020 3:51 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> Natural languages are all about context, that's why computers need Unicode bidi = a bit more help.

Indeed, natural language is all about context. I get the feeling that we are talking about two completely different things and getting frustrated because the other person's answers make no sense in the context of what we each thought the conversation was about. I have been trying to describe how the terms "big-endian" or "little-endian" would apply to the *visual* layout of Arabic numerals *at rest*, for example as symbols written on paper—akin to the individual bytes of an integer field which is part of a larger structure stored in RAM or a file on disk. You seem to be interpreting my statements in the context of data which is *being* written, or read, or typed into a computer—a *serialization* of the data. Or perhaps you are referring to the particular way that the digits would be serialized as Unicode code points in a text file. Naturally my statements would seem like nonsense when taken that way; they were not intended for that context.

For data at rest there is no "time" component; all that matters is the relationships between the addresses or coordinates where each of the digits is stored. For digits written in a single line on paper this corresponds to linear physical coordinates; a digit may appear either to the left or the right of another symbol. In terms of the analogy to the storage of an array of multi-byte integers in computer memory, a system in which the most-significant digit of each number in a list of numbers is physically located on the same side as the first element of the list is "big-endian" and a system in which the least-significant digit is physically closest to the first element of the list is "little-endian". Any given serialization of the data (the process of reading or writing, for example) may employ a different "endianness" independent of the visual layout, and indeed that is the case for Arabic numerals: they are stored or rendered (on paper or other visual medium) as little-endian, but read, written, typed, or spoken aloud with the most significant digit first, in big-endian format.

Anyway, this debate is almost as pointless as the fictional conflict from which we get the terms "big-endian" and "little-endian".[1] I only replied in hopes of conveying that we are arguing *past* each other more than we are actually disagreeing about anything of substance.

[1] https://www.ling.upenn.edu/courses/Spring_2003/ling538/Le...
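
The "data at rest" definition in this comment can be written down as a small Python sketch. This is only an illustration of that one framing, which other commenters in the thread dispute; the function name `storage_endianness` and its parameters are hypothetical, and the only assumption is that address 0 sits where the surrounding text begins (left edge for left-to-right text, right edge for right-to-left text).

```python
def storage_endianness(visual_digits: str, text_direction: str) -> str:
    """Classify a numeral by where its least-significant digit sits on the page.

    visual_digits: the digits as they appear from left to right on the page.
    text_direction: "ltr" or "rtl", the direction of the surrounding text,
    which decides which edge of the line counts as address 0.
    """
    if len(visual_digits) < 2:
        raise ValueError("need at least two digits to tell the orders apart")

    last_column = len(visual_digits) - 1
    # In both scripts the least-significant digit is the rightmost one.
    lsd_column = last_column

    if text_direction == "ltr":
        lsd_address = lsd_column                # address 0 is the leftmost column
    elif text_direction == "rtl":
        lsd_address = last_column - lsd_column  # address 0 is the rightmost column
    else:
        raise ValueError("text_direction must be 'ltr' or 'rtl'")

    return "little-endian" if lsd_address == 0 else "big-endian"


print(storage_endianness("1234", "ltr"))
print(storage_endianness("1234", "rtl"))
```

The same visual string gets a different label depending only on which edge is treated as the start of the line. The order in which the digits are read or spoken never enters the function, which is exactly the distinction being argued over.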

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 23:23 UTC (Mon) by kjpye (subscriber, #81527) [Link]

Actually, everybody reads numbers in a zig-zag fashion.

If you are reading a number like 8034175, you start "eight million", but you can't get past the "eight" until you have scanned from the right of the number to the left to determine the magnitude.

So a non-Arabic speaker will read left to right, encounter the number and skip to the end of the number and scan back to determine the magnitude and then read the number left to right and continue reading towards the right.

An Arabic speaker will encounter the right-hand end of the number first, scan across it to determine the magnitude and then read the number left to right. Then they will jump back to the left of the number and continue reading towards the left.

The only real difference is in whether the jump occurs before reading the number (non-Arabic) or after (Arabic).

Software and hardware obsolescence in the kernel

Posted Sep 1, 2020 0:43 UTC (Tue) by notriddle (subscriber, #130608) [Link]

August 31, 2020 at 12:00 PM

Figure out the endianness of THAT notation!

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 15:14 UTC (Sun) by jem (subscriber, #24231) [Link] (9 responses)

> In Arabic, for example, written numerals are little-endian.

Or are they? As is common knowledge, Arabic is written right-to-left, but it is my understanding that numbers are read left-to-right. Digits are shown in the same order as in Western scripts, for example two hundred and thirty-five is "235" or "٢٣٥", depending on whether Western Arabic or Eastern Arabic (Hindi) numerals are used. When this number is read, the reader first looks at the hundreds. Likewise, when numbers are entered on a device, the input system temporarily changes direction to left-to-right, so the digits are entered in the order 2, 3, 5 and the result is displayed as "235".

Reference: Numbers in Arabic.
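
A rough way to check the input-order claim from a program: Unicode text is stored in logical (typing) order, and each character carries a bidirectional class that tells the renderer how to lay it out. The Python sketch below uses only the standard `unicodedata` module; the sample characters are chosen for illustration and are not from the thread.

```python
import unicodedata

samples = {
    "2": "ASCII digit two",
    "\u0662": "Arabic-Indic digit two",
    "\u0627": "Arabic letter alef",
    "a": "Latin small letter a",
}

# Bidi class per character: EN = European number, AN = Arabic number,
# AL = Arabic letter (right-to-left), L = left-to-right letter.
classes = {ch: unicodedata.bidirectional(ch) for ch in samples}
for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {classes[ch]}")

# The string holds the digits 2, 3, 5 in that logical order, most significant
# first, and int() accepts Unicode decimal digits directly.
number = int("\u0662\u0663\u0665")
print(number)
```

The digits get their own classes (EN and AN) rather than the right-to-left letter class, which is how a run of digits ends up displayed most-significant-digit-leftmost inside right-to-left text.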

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 17:31 UTC (Sun) by karkhaz (subscriber, #99844) [Link] (8 responses)

> Arabic is written right-to-left, but it is my understanding that numbers are read left-to-right

This is precisely what little-endian means, right? Consider the following sequence of four numbers, written in an LTR language like English. You start reading at X and finish reading at Y.

X 123 555 1234 Y

In Arabic, this sequence is written as

Y ۱۲٣٤ ۵۵۵ ۱۲٣ X

In English, your eyes continuously move from left-to-right. In Arabic, your eyes zig-zag across the page: from right-to-left to read the sentence, but from left-to-right when reading each of the three numbers. This is analogous to little-endian, where the bytes within a structure are laid out in the opposite direction to the addresses of the structures.

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 18:41 UTC (Sun) by jem (subscriber, #24231) [Link] (2 responses)

No, little-endian means the least significant byte comes first, at the lowest address. The analogy to reading a number is that the most significant digit is read first, thus the number is "big-endian". (Whether the first digit is to the left or right does not matter, and both Arabic and Western numbers are written with the most significant digit to the left.)
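
For reference, the storage definition used here (least significant byte at the lowest address) can be seen with Python's built-in `int.to_bytes`. This is a minimal sketch with an arbitrary example value; index 0 of each `bytes` object stands in for the lowest address.

```python
value = 0x0A0B0C0D  # arbitrary example value

little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

# Index 0 plays the role of the lowest address.
print(little.hex(" "))  # least significant byte (0d) comes first
print(big.hex(" "))     # most significant byte (0a) comes first
```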

Software and hardware obsolescence in the kernel

Posted Oct 14, 2020 10:50 UTC (Wed) by immibis (subscriber, #105511) [Link] (1 responses)

So an 8-bit x86 processor could be called big-endian if it reads the MSB first?

I don't think so.

Software and hardware obsolescence in the kernel

Posted Oct 14, 2020 18:31 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> So an 8-bit x86 processor could be called big-endian if it reads the MSB first?

Yes, the serialization of the word on the 8-bit bus would be accurately labeled big-endian if the MSB is transferred first—not that this would be observable to software. The storage would still be little-endian since the LSB is stored at the lowest-numbered address. This can be confirmed by accessing the same memory address with byte- and word-oriented instructions.
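
The byte-versus-word check described above can be imitated in Python. This is a sketch under stated assumptions: a `bytearray` stands in for RAM, and `struct` with an explicit `"<"` format stands in for a little-endian word store. The bus transfer order is not modelled, since, as the comment says, software cannot observe it.

```python
import struct

memory = bytearray(4)  # hypothetical 4-byte RAM; index = address

# "Word-oriented" store of a 16-bit value at address 0, little-endian layout.
struct.pack_into("<H", memory, 0, 0x1234)

# "Byte-oriented" loads from the same addresses.
low_address_byte = memory[0]
next_byte = memory[1]

# "Word-oriented" load from address 0 recovers the original value.
(word,) = struct.unpack_from("<H", memory, 0)

print(hex(low_address_byte), hex(next_byte), hex(word))
```

The low byte of the word turns up at the lowest address, which is what makes the storage little-endian no matter how the bytes travelled to get there.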

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 18:51 UTC (Sun) by iabervon (subscriber, #722) [Link] (4 responses)

Little endian would presumably mean that people using the system were comfortable getting the ones digit first, then the tens, then the hundreds. After reading a number, the information you got most recently would be the order of magnitude, and the digits would appear in the order you use them while adding numbers or multiplying by the single digit. Furthermore, scanning through a number once forward, you'd know the place-value of each digit when you encountered it, without needing to check the length of the number first like big-endian readers do.

I expect that, if you grew up using big-endian numbers and then learn Arabic and see a number in it, you'll zig-zag. But if you grew up with Arabic, you'll read "1234 555 123" as "three, twenty, one hundred; five, fifty, five hundred; four, thirty, two hundred, one thousand", going right-to-left through the number.

I suspect that big-endian practice comes from a culture that used Roman numerals, which start with the highest-value information and don't require knowing how many more digits are coming to assign a value to the first digit. That culture got Arabic texts on arithmetic and kept the computation the same while translating the explanation. The Arabic texts had the ones digit on the right because that's the first digit you produce in addition, multiplication, or subtraction, and they put the first digit of a number where they put the first letter of a word.
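
The point about knowing each digit's place value as you go can be sketched in Python. The function names are hypothetical and this only illustrates the argument; it makes no claim about how anyone actually reads.

```python
from typing import Iterable, Iterator, Tuple


def read_least_significant_first(digits: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (digit, place_value) pairs; each place value is known on arrival."""
    place = 1
    for digit in digits:
        yield digit, place
        place *= 10


def read_most_significant_first(digits: Iterable[int]) -> int:
    """Accumulate the value with Horner's rule.

    The running total can be kept up to date, but the place value of the
    first digit is only known once the last digit has been seen.
    """
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return value


pairs = list(read_least_significant_first([4, 3, 2, 1]))  # ones digit first
total_lsd_first = sum(digit * place for digit, place in pairs)
total_msd_first = read_most_significant_first([1, 2, 3, 4])
print(pairs, total_lsd_first, total_msd_first)
```

Both orders arrive at the same number. The difference is only in when the magnitude of each digit becomes known.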

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 19:26 UTC (Sun) by karkhaz (subscriber, #99844) [Link] (3 responses)

> But if you grew up with Arabic, you'll read "1234 555 123" as "three, twenty, one hundred; five, fifty, five hundred; four, thirty, two hundred, one thousand", going right-to-left through the number.

No, this really isn't the case. Native Arabic readers don't start reading the lowest-magnitude digit first, they skip to the largest digit and read left-to-right. Both when reading, and when uttering the number (with the exception that units are uttered before tens).

As another example, consider the date range 1979-2020. In Arabic this is written ١٩٧٩-٢٠٢٠ and pronounced "one thousand and nine hundred and nine and seventy to two thousand and twenty".

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 21:23 UTC (Sun) by karkhaz (subscriber, #99844) [Link]

My explanations may render incorrectly if you read them in a terminal emulator, since most of them [1] don't support Unicode's algorithm for laying out mixed LTR-RTL text [2]. I thought that I had made a fool of myself on the Internet when I read my comment notification in mutt, but it turned out that I hadn't. (Or at least not this time.)

[1] https://lwn.net/Articles/749992/
[2] https://en.wikipedia.org/wiki/Bidirectional_text#Unicode_...

Software and hardware obsolescence in the kernel

Posted Aug 30, 2020 22:47 UTC (Sun) by marcH (subscriber, #57642) [Link]

> No, this really isn't the case. Native Arabic readers don't start reading the lowest-magnitude digit first, they skip to the largest digit and read left-to-right. Both when reading, and when uttering the number (with the exception that units are uttered before tens).

Very useful thanks, I suspected such a "full zig-zag" but wasn't sure.

I admit I didn't consider right-to-left languages at the time I wrote "human numbers are big endian" above. Thank you right-to-left languages for making this zig-zag exception and keeping my statement correct, much appreciated :-)

More seriously, it's easy to imagine the rationale for this zig-zag:
- Numbering "compatibility" with other languages of course, and
- All humans of all languages seem interested in the Most Significant digits first; e.g. "rounding".

Software and hardware obsolescence in the kernel

Posted Aug 31, 2020 20:46 UTC (Mon) by nybble41 (subscriber, #55106) [Link]

> Native Arabic readers don't start reading the lowest-magnitude digit first, they skip to the largest digit and read left-to-right. Both when reading, and when uttering the number….

So numbers are written in little-endian notation (given right-to-left addressing for "unstructured" data, i.e. plain text), and converted to big-endian when "serialized" (spoken aloud) without rearranging the rest of the text. That sounds exactly like a traditional little-endian network stack to me.
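
The network-stack analogy, as a rough Python sketch: the value is laid out little-endian "at rest" and converted to big-endian (network byte order) only when it is serialized. The `struct` format characters `"<"` and `"!"` are standard; the variable names are made up for the example.

```python
import struct

value = 1234

stored = struct.pack("<I", value)   # little-endian layout "at rest"
on_wire = struct.pack("!I", value)  # network byte order (big-endian)

print(stored.hex(" "))
print(on_wire.hex(" "))

# The receiver decodes the wire format back to the same value.
(received,) = struct.unpack("!I", on_wire)
print(received)
```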