
Transparent versus opaque formats


Posted Feb 26, 2017 11:35 UTC (Sun) by excors (subscriber, #95769)
In reply to: Transparent versus opaque formats by epa
Parent article: Linus on Git and SHA-1

> Linus makes a useful distinction between transparent formats (essentially, anything that is read and edited as plain text) and opaque ones. An opaque file format such as PDF can contain lots of hidden code which the end user doesn't see

A plain text file can contain lots of zero-width Unicode spaces, soft hyphens, RTL marks, etc., none of which the end user will see, because they render as nothing.
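
As a rough illustration of the point, here is a minimal Python sketch that lists such characters in a string. It assumes that Unicode general category "Cf" (format) is a good enough proxy for "invisible"; the function name `find_invisible` and the sample string are made up for this example, and other invisible code points (for instance some space characters) fall outside that category.

```python
import unicodedata

def find_invisible(text):
    """Return (index, code point, name) for each format-category character."""
    hits = []
    for i, ch in enumerate(text):
        # Category "Cf" covers zero-width spaces, soft hyphens, and
        # bidirectional marks, which normally render as nothing.
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical sample: looks like "pay 100 USD" in most viewers.
sample = "pay\u200b 100\u00ad USD\u200f"
for hit in find_invisible(sample):
    print(hit)
```

A pure-ASCII string yields an empty list, so a check like this only says something about files that already stray outside ASCII.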

If you constrain it to ASCII text files in a viewer that makes all control characters visible and that highlights trailing whitespace, then maybe that could count as transparent. But that sounds too constrained to be a useful distinction in practice.
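
To make "a viewer that makes all control characters visible" concrete, here is a small Python sketch of what such a rendering might look like for a single line. The marker strings (`<SP>`, `<TAB>`, `<U+....>`) and the function name `reveal` are arbitrary choices for this example, not taken from any real viewer, and the line is assumed to arrive without its newline.

```python
def reveal(line):
    """Render one line so that nothing in it can hide from the reader."""
    body = line.rstrip(" \t")
    trailing = line[len(body):]
    out = []
    for ch in body:
        code = ord(ch)
        if ch == "\t":
            out.append("\\t")                 # tab inside the line
        elif code < 0x20 or code == 0x7F:
            out.append(f"\\x{code:02x}")      # other ASCII control characters
        elif code > 0x7F:
            out.append(f"<U+{code:04X}>")     # anything outside 7-bit ASCII
        else:
            out.append(ch)
    # Trailing spaces and tabs get explicit markers instead of vanishing.
    for ch in trailing:
        out.append("<SP>" if ch == " " else "<TAB>")
    return "".join(out)

print(reveal("a\x07b \t"))
print(reveal("x\u200by"))
```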



Transparent versus opaque formats

Posted Feb 27, 2017 12:07 UTC (Mon) by epa (subscriber, #39769)

Yes, I did touch on the idea of requiring normalized whitespace and indentation for source code.

For non-code text documents, such as a legal agreement, tightening up the allowed whitespace might be a good idea. You'll never eliminate all possible avenues for fiddling the content, but if you can get the number of variations of a document down from trillions to under a million or so, you've greatly reduced the scope for finding a collision.
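
A minimal Python sketch of that kind of tightening, for prose rather than code: line endings are unified, runs of spaces and tabs collapse to one space, and leading and trailing whitespace is dropped before hashing. The exact rules, the function names, and the use of SHA-256 are assumptions made for this example. It does nothing about invisible non-whitespace characters.

```python
import hashlib
import re

def normalize(text):
    """Canonicalize whitespace so equivalent-looking documents match."""
    # Treat CRLF and bare CR as plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Drop blank lines at either end and finish with exactly one newline.
    return "\n".join(lines).strip("\n") + "\n"

def digest(text):
    """Hash the normalized form rather than the raw bytes."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

a = "The buyer pays  100 USD. \r\n"
b = "The buyer pays 100 USD.\n"
print(digest(a) == digest(b))
```

Hashing the normalized form means an attacker can no longer vary the whitespace to search for a collision, which is the reduction in variations described above.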

Transparent versus opaque formats

Posted Feb 27, 2017 12:18 UTC (Mon) by johill (subscriber, #25196)

Depending on the input format, you could round-trip through something like pandoc, I guess.

Transparent versus opaque formats

Posted Mar 4, 2017 1:57 UTC (Sat) by zslade (subscriber, #72097)

There is no such thing as plain text... What encoding? UTF-8, some other Unicode encoding, 7-bit clean, or maybe it's in Windows-1251 or some Latin encoding... There is no lowest common denominator other than 7-bit clean ASCII. Good luck running into only that particular encoding in the wild.
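
The ambiguity is easy to demonstrate. In the Python sketch below, a byte string is classified as 7-bit ASCII, valid UTF-8, or neither, and then the same bytes are decoded under two legacy codecs. The function name `classify` and the sample bytes are invented for this example, and Windows-1251 (Python codec `cp1251`) is used only as one representative 8-bit encoding.

```python
def classify(data: bytes) -> str:
    """Best-effort guess: only ASCII and UTF-8 can be recognized reliably."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Any single-byte legacy codec will accept these bytes, so the
        # bytes alone cannot say which one was intended.
        return "unknown 8-bit"

raw = b"\xcf\xf0\xe8\xe2\xe5\xf2"
print(classify(b"hello"))
print(classify(raw))
print(raw.decode("cp1251"))   # reads as a Russian word
print(raw.decode("latin-1"))  # same bytes, accented Latin letters
```

Both decodes succeed without error, which is the problem: nothing in the file says which reading was meant.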

Transparent versus opaque formats

Posted Mar 6, 2017 14:28 UTC (Mon) by epa (subscriber, #39769)

The Linux kernel sources (with a few exceptions for people's names in comments) are indeed 7-bit ASCII, as is the majority of source code.

Transparent versus opaque formats

Posted Mar 6, 2017 15:29 UTC (Mon) by excors (subscriber, #95769)

That's a fine policy until you get a patch submitted by Mr John FÜ�¦¶~;�ª²VEÊgÖ�ÇøK�Lyà+=öøm±iÅkEÁSþß·`8érr/ç­r�IàF who insists it would be culturally insensitive to transliterate his name.

(Also it's not true that names are the only exceptions - in some older versions of the kernel I see staging drivers with Big5-encoded Chinese comments.)

