The Grumpy Editor's Python 3 experience

By Jonathan Corbet
July 31, 2018
LWN has been running articles for years to the effect that the end of Python 2 is nigh and that code should be ported to Python 3 immediately. So, naturally, one might expect that our own site code, written in Python, had been forward-ported long ago. Strangely enough, that didn't actually happen. It has mostly happened now, though. In the process of doing this work, your editor has noticed a few things that don't necessarily appear in the numerous porting guides circulating on the net.

One often-heard excuse for delaying this work is that one or more dependencies have not yet been ported to Python 3. For almost everybody, that excuse ran out of steam some time ago; if a module has not been forward-ported by now, it probably never will be and other plans need to be made. In our case, the final dependency was the venerable Quixote web framework which, due to the much appreciated work of Neil Schemenauer, was forward-ported at the end of 2017. Quixote never really took the world by storm, but it makes the task of creating a code-backed site easy; we would have been sad to have to leave it behind.

Much of the anxiety around moving to Python 3 is focused on how that language handles strings. The ability to work with Unicode was kind of bolted onto Python 2, but it was designed into Python 3 from the beginning. The result is a strict separation between the string type (str), which holds text as Unicode code points, and bytes, which contains arbitrary data — including text in a specific encoding. Python 2 made it easy to be lazy and ignore that distinction much of the time; Python 3 requires a constant awareness of which kind of data is being dealt with.

In practice, for LWN at least, Unicode is not where the problems arose. The standard advice is to use bytes for encoded strings originating from (or exiting to) the world outside a program, while converting to (or from) str at the boundary, thus using only str internally. That forces a focus on how one is communicating with the environment — a focus that really needs to be there anyway. It is not a hard discipline to acquire, and it leads to more robust code overall.
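As a minimal sketch of that discipline (the helper names read_text and write_text are hypothetical, and UTF-8 is an assumed encoding rather than anything the LWN code is known to use), the conversion can be confined to two small functions at the edges of the program:

```python
import io


def read_text(stream, encoding="utf-8"):
    """Inbound boundary: bytes from outside become str."""
    raw = stream.read()            # bytes from a file, socket, or pipe
    return raw.decode(encoding)    # str from here on


def write_text(stream, text, encoding="utf-8"):
    """Outbound boundary: str becomes bytes only on the way out."""
    stream.write(text.encode(encoding))


# In-memory byte streams stand in for the outside world.
incoming = io.BytesIO("na\u00efve caf\u00e9".encode("utf-8"))
outgoing = io.BytesIO()

title = read_text(incoming)            # str: safe to slice and compare
write_text(outgoing, title.upper())    # bytes again only at the exit
```

Everything between the two calls works on str alone, which is the point of the exercise.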

So text encodings aren't a big challenge except — in your editor's experience — for a couple of places, one of which is the email module, which has proved to be the reason for the most version-dependent code in this particular project. Much of that is due to API changes in that module, most of which are probably justified for proper email handling even if they are annoying in the short term. But there is also the simple problem that one cannot hide the text-encoding issue when dealing with email. It's not just that a message can arrive in an arbitrary encoding: a single message can contain text in multiple encodings — in a single header line. Properly processing such email is arguably easier and more correct in Python 3, but it's different from Python 2 in subtle ways that took a while to figure out.
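To illustrate the multiple-encodings-in-one-header case, here is a small sketch using the standard library's email.header module under Python 3. The subject line is invented for the example, and this is only one way to do the job, not necessarily the approach taken in the LWN code:

```python
from email.header import decode_header, make_header

# One header line, two encoded words in two different charsets,
# followed by plain ASCII text. The content is made up.
raw_subject = "=?iso-8859-1?q?caf=E9?= =?utf-8?b?4piV?= meeting"

# decode_header() splits the line into (data, charset) pairs;
# make_header() reassembles them, and str() yields ordinary text.
parts = decode_header(raw_subject)
subject = str(make_header(parts))

print(parts)
print(subject)
```

The exact whitespace handling between parts is up to the email package, so code that compares decoded headers is best tested against real messages.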

Another problem has put your editor in a pickle — literally. The Python pickle module is a convenient way to serialize objects, but it has always been loaded with traps for the unwary. Pickle in Python 2 could be relied upon to generate pickles that could be treated as strings, especially if the oldest "protocol" was used. In Python 3, pickles are bytes, and they are not friendly toward any attempt to treat them as strings. Even the "human readable" protocol=0 mode will produce distinctly non-readable output for some types; these include things like NUL bytes that trip up even the relatively oblivious Latin-1 decoder. The datetime type is prone to this kind of problem, for example.

One solution is to paint "PICKLES ARE NOT STRINGS" on one's monitor and to resolve never to be so sloppy again. But pickles have other problems, including sometimes surprising behavior when one pickles an object under Python 2, then tries to unpickle it under Python 3, where the definition of the object's class may have changed considerably. Your editor has concluded that pickles are an attractive way to avoid defining a proper persistence mechanism for Python objects, but that taking that shortcut leads to problems in the long run.
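A short sketch of the two ways out, under the assumption that the value being stored is a simple timestamp: wrap the pickle explicitly when it must live somewhere that expects text, or skip pickle and write an explicit format. The variable names and the choice of base64 and JSON are illustrative only:

```python
import base64
import datetime
import json
import pickle

stamp = datetime.datetime(2018, 7, 31, 12, 0, 0)

# In Python 3 a pickle is bytes, whatever protocol is requested.
blob = pickle.dumps(stamp, protocol=0)

# If it must be stored as text, encode it explicitly instead of
# decoding it as though it were a string.
as_text = base64.b64encode(blob).decode("ascii")
restored = pickle.loads(base64.b64decode(as_text))

# An explicit format sidesteps pickle for simple values.
record = json.dumps({"expires": stamp.isoformat()})
expires = datetime.datetime.strptime(
    json.loads(record)["expires"], "%Y-%m-%dT%H:%M:%S")
```

The JSON form has the added benefit of not depending on the class definitions present when the data is read back.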

Yet another inspiration for high levels of grumpiness is the change in how module importing works. In Python 2, a line like:

    import mydamnmodule

would find mydamnmodule.py in the same directory as the module doing the import. That behavior was evidently too convenient to survive into Python 3, so it was taken out. The documentation gives some lame excuse about confusion between modules located this way and standard-library modules, but your editor knows that a more mean-spirited motive must have driven such a change.

Now, one can try to fix such code with an explicit relative import:

    from . import mydamnmodule

In many situations, though, that will lead to the dreaded "attempted relative import in non-package" exception that has been the cause of a seemingly infinite series of Stack Overflow postings. Once again, the rules must make sense to somebody, but they make this kind of relative import nearly impossible to use.

So there was nothing for it but to actually get a handle on the namespaces in use and change all the import statements into proper absolute form. Doing so revealed some interesting things. The lazy way in which we had set up our hierarchy was silently causing modules to be imported multiple times — as foo, lwn.foo, and even lwn.lwn.foo, for example — unnecessarily bloating the size of the running program. Such imports can also create difficult-to-debug havoc if any modules maintain module-level state that will also be duplicated and, naturally, become inconsistent.
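The duplicate-import effect is easy to reproduce. The sketch below builds a throwaway package in a temporary directory (demo_pkg and demo_state are hypothetical names) and then puts both the package's parent and the package directory itself on the search path, which is roughly the sort of lazy setup described above:

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk; the names are invented for the demo.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "demo_state.py"), "w") as f:
    f.write("cache = []\n")

# The sloppy part: both directories are importable roots.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

short = importlib.import_module("demo_state")
full = importlib.import_module("demo_pkg.demo_state")

short.cache.append("visible through one name only")
print(short is full)             # False: two separate module objects
print(short.cache, full.cache)   # the module-level state has diverged
```

The same file has been executed twice, and each copy carries its own module-level state.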

Moving to well-defined absolute imports fixed those issues, but revealed another that had been hidden: the presence of a number of import loops in the code. These loops, where module A imports B which, in turn (and possibly through several layers of indirection) tries to import A, lead to a "can't import" exception. They are almost always an indication of code structure that, to put it charitably, could use a little more thought. Fixing those required a fair amount of refactoring, profanity, and slanderous thoughts about the Python developers.
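For the curious, a loop of this kind can be reproduced in a few lines. The modules loop_a and loop_b are hypothetical and written to a temporary directory; each tries to pull a name out of the other before that name has been defined:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
sources = {
    "loop_a.py": "from loop_b import B_VALUE\nA_VALUE = 1\n",
    "loop_b.py": "from loop_a import A_VALUE\nB_VALUE = 2\n",
}
for name, text in sources.items():
    with open(os.path.join(root, name), "w") as f:
        f.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()

# loop_a starts running, imports loop_b, which asks the half-finished
# loop_a for a name that does not exist yet.
try:
    importlib.import_module("loop_a")
    loop_error = None
except ImportError as exc:
    loop_error = exc

print(type(loop_error).__name__, loop_error)
```

A common way out, and only one of several, is to move the shared definitions into a third module that both sides import.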

The truth, though, is that these issues should have been fixed long ago; the end result of the import change is a much improved code structure here.

Some of the more annoying language changes really do seem like gratuitous attacks on people who have to maintain code over the long term, though. Python 2 did the Right Thing with source files containing both spaces and tabs, for example, while Python 3 throws a fit. The problem is easily fixed, but it seems like it didn't need to be a problem in the first place. Since time immemorial, octal constants have been written with a preceding zero — 0777, for example. Python 3 requires one to write 0o777 instead, for reasons that are not particularly clear. But JavaScript made that change too, so it must be the right thing to do.
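A quick illustration of the octal change under Python 3 follows. The 0o prefix is also accepted by Python 2.6 and later, so it can be adopted before the switch; the permission value is just an example:

```python
import stat

mode = 0o755                   # new-style octal literal
print(mode == 493)             # True: 7*64 + 5*8 + 5
print(oct(mode))               # '0o755'
print(int("755", 8) == mode)   # True: parsing octal text still works

# The same permissions spelled with the symbolic constants.
symbolic = (stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
            | stat.S_IROTH | stat.S_IXOTH)
print(symbolic == mode)        # True

# The old spelling is rejected at compile time rather than being
# quietly read as decimal.
try:
    compile("mode = 0755", "<example>", "exec")
    old_style_ok = True
except SyntaxError:
    old_style_ok = False
print(old_style_ok)            # False
```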

At least old-style octal constants will generate a syntax error in Python 3, so there is no chance of subtle problems resulting from those constants being interpreted as decimal. The same is not true of integer division. Python 2 defined integer division as originally intended by $DEITY and implemented by almost every processor: the result is a rounded-downward integer value. So 3/2 == 1. In Python 3, instead, dividing integers yields a floating-point result: 3/2 == 1.5. That is a change that could silently create subtle problems. In the LWN code, integer division is used for tasks like subscription management and money calculations; these are not places where mistakes can be afforded.

The fix is easy enough on its face: use // for integer (floor) division. But that requires finding every place that needs to be fixed. Grepping for "/" in a large code base is not particularly fun, especially if said code base also includes a lot of HTML. This work has been done, but it is going to take a lot of testing before your editor is confident in the results.
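The difference is small on the page and large in the ledger. The following sketch uses invented numbers (a price in cents split over three months) and runs under Python 3; the Decimal variant is one common choice for money, not a description of what the LWN code does:

```python
from decimal import Decimal, ROUND_HALF_UP

months, price_cents = 3, 1000      # hypothetical values

print(price_cents / months)        # true division: a float
print(price_cents // months)       # floor division: 333
print(-7 // 2)                     # floors toward negative infinity: -4

# divmod() keeps the remainder so that no cent goes missing.
per_month, remainder = divmod(price_cents, months)

# Decimal makes the rounding rule explicit instead of implicit.
share = (Decimal("10.00") / 3).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)
print(per_month, remainder, share)
```

Under Python 2, adding "from __future__ import division" at the top of a module gives the Python 3 behavior, which is one way to flush out the problem spots before switching interpreters.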

There are numerous other little incompatibilities that one stumbles across, naturally. Some library modules have changed or are no longer present. The syntax of the except statement is different. Dictionaries no longer have has_key(). And so on. Most of these are relatively easy to catch and fix, though — just part of a day's work.
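Two of those small changes, sketched with made-up data; both of the newer spellings also work under Python 2.6 and later, so they can be fixed ahead of time:

```python
prefs = {"digest": True}           # hypothetical settings dictionary

# Python 2 only:   if prefs.has_key("digest"):
if "digest" in prefs:
    wants_digest = prefs["digest"]

# Python 2 only:   except ValueError, exc:
try:
    int("not a number")
except ValueError as exc:
    message = str(exc)

print(wants_digest, message)
```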

One might wonder about the various tools that are available to help with this transition. The 2to3 tool can be useful for finding some issues, but it wants to translate the code outright, generating a result that no longer runs under Python 2. That is a bigger jump than your editor would like to take; the strategy has very much been to get the code working under both versions of the language before making the big switch. 2to3 also chokes on the Quixote template syntax that is used by much of LWN's Python code. So it was of limited use overall.

An alternative is the six compatibility library, which can be useful for writing code that works under both Python versions. Your editor steered away from six instinctively, though, due to a kernel programmer's inherent dislike for low-level, behind-the-scenes magic. It reworks the module namespace, overrides functionality in surprising places, and requires coding in a version of the language that is neither 2 nor 3. Various versions of six bundled with dependencies have already led to problems even in the Python 2 version of the code. It is better, in your editor's opinion, to have the transitional compatibility code be in one's face, where it can be left behind once the changeover is complete. The increasing number of Python 3 features added to 2.7 makes it easier to write portable code, in any case.
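What "in one's face" compatibility code might look like is sketched below. The names PY3, text_type, and to_text are hypothetical and not taken from the LWN source; the idea is simply that the version check is visible and easy to delete later:

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name exists on Python 2 only)


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"abc"), to_text("abc"), to_text(5))
```

Once Python 2 is gone, the whole block collapses to plain str and can be removed with a single search.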

All told, the Python 3 transition has been an adventure — one that is not yet complete. It has taken a lot of time that was already in short supply. The end result, though, is cleaner code written in a better version of the language, or so your editor believes, anyway. The Python 2 code base put in over 16 years of service; hopefully the next version will be good for at least that long.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 20:18 UTC (Tue) by mrshiny (guest, #4266) [Link] (49 responses)

I've always hated the octal notation of leading zeroes. That isn't how zeroes work in our numbering system and it's just a trap for the unwary. Most people never use octal anyway.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 20:23 UTC (Tue) by lsl (subscriber, #86508) [Link] (45 responses)

Most people never create files?

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 20:32 UTC (Tue) by mrshiny (guest, #4266) [Link] (23 responses)

I've been programming for 25 years and have almost never specified file permissions using octal. If I guessed, I'd say that most programmers don't create files very often, and that when they do, the default file permissions are fine, or the permissions are specified symbolically instead of numerically.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 23:38 UTC (Tue) by gerdesj (subscriber, #5446) [Link] (1 responses)

"almost never"

... except when you did ...

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 14:44 UTC (Wed) by mrshiny (guest, #4266) [Link]

And that time that I did I would rather have used an Octal notation that wasn't a dangerous trap for people who understand how numbers work but don't know or forgot how various computer languages work. I'm not opposed to there being an octal literal notation, I just don't like the C-style notation. I think 0o123 is fine though I can appreciate why some people might prefer a letter other than o there. Surely Octal is used way less than Hex, yet nobody has an issue with writing 0x1A2B.

I'm firmly in the camp of "computer languages shouldn't surprise people". 011 being unequal to 11 is just bizarre. If Octal were used constantly, all the time, every day, by many many programmers, it would be one of those weird things like x = x + 1. How can x be equal to itself plus a number? Oh, the equals sign means assignment. Fair, we do lots of assignment so we get used to it, although people make mistakes, like if (x = 1) when they meant if (x == 1). (Aside: that's another surprise: the language allows assignment in a test expression, and many languages don't require a boolean expression).

If we're improving languages, we can prevent problems by removing features that aren't necessary for clean, readable, easily-typed code, and still having a language that makes sense and is understandable.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 8:43 UTC (Wed) by marcH (subscriber, #57642) [Link] (10 responses)

I never use octal permissions because they're not capable of performing the most common + and - operations like: chmod u+r, o-w, a+x etc.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 20:06 UTC (Wed) by cortana (subscriber, #24596) [Link] (9 responses)

Hmm...

>>> 0o400 + 0o200 == 0o600
True

Or am I missing something?

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 20:31 UTC (Wed) by farnz (subscriber, #17727) [Link] (5 responses)

Given a file with an unknown mode, how do you set user readable, remove other writeable and add execute to all without changing the rest of the permission bits?

chmod can't apply logical operations to the disk mode expressed in octal form, but chmod u+r,o-w,a+x will do that operation.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 12:33 UTC (Thu) by cortana (subscriber, #24596) [Link]

Oh I see, we're talking about the chmod command. I was thinking more generally, in terms of the arithmetic that you'd perform if you were implementing the chmod command. :)

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 15:07 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (3 responses)

perl -e '$f=$ARGV[0]; $m=(stat $f)[2]; chmod((($m | 0511) & ~02), $f)'

There are two much more interesting questions here, though.

1) Why specify a translation depending on an existing value in order to change that to one you want instead of just using 'the one you want'?

2) 0206? Seriously? Maybe better to fix the program ...

Octal mode translations

Posted Aug 2, 2018 20:40 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

BTW,
#!/usr/bin/perl
#

sub usage
{
    print STDERR ("Usage: pchmod <mode arg> <path>+\n");
    exit(1);
}

sub add { $_[0] | $_[1] }
sub clear { $_[0] & ~$_[1] } 
sub set { $_[1] }

sub valid_mode
{
    $_[0] =~ /^([+-])?0?[0-7]{1,3}$/;
}

sub parse_mode
{
    my (@ops, $op, $v);

    for (split(/,/, $_[0])) {
        die("invalid mode $_") unless valid_mode($_);
        
        if (/^([+-])(.*)/) {
            $op = $1 eq '+' ? \&add : \&clear;
            $v = $2;
        } else {
            $op = \&set;
            $v = $_;
        }

        push(@ops, [$op, oct($v)]);
    }

    return @ops;
}

my (@ops, @stat, $m, $rc);

@ARGV > 1 || usage();
@ops = parse_mode(shift);

for (@ARGV) {
    @stat = stat;
    @stat or warn("stat '$_': $!"), next;

    $m = $stat[2];
    $m = $_->[0]($m, $_->[1]) for @ops;
    $rc = chmod($m, $_);
    $rc or warn("chmod '$_': $!");
}

Assuming that's called pchmod, it becomes something like pchmod +511,-2.

Octal mode translations

Posted Aug 2, 2018 21:57 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

Wow, I didn't think I would ever see that much (coding!!) effort put into trolling...

Octal mode translations

Posted Aug 2, 2018 22:10 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link]

You shouldn't post pejorative assumptions about other people just because they happen to disagree with your opinion and argue about that.

The features (or lack of features) of chmod are not relevant to the discussion. Apparently, nobody ever needed relative octal mode specifications so badly that this got implemented. As demonstrated above, this is trivial (I wrote this while waiting for a 'git gc' to finish).

Octal mode specifications are convenient in code, especially C code, because the replacement macro names are lengthy sequences of unpronounceable gibberish. They're easy enough to remember that they're also convenient for specifying absolute modes for the chmod command. I found it useful to overcome my original "numbers ... "-prejudice and would thus encourage others to try the same.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 20:38 UTC (Wed) by marcH (subscriber, #57642) [Link]

> Or am I missing something?

Yes (and you're in incredibly large company, never understood why)

Let me give you a more real world example:

chmod -R g+rX friends_can_look/

Good luck octal.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 13:27 UTC (Thu) by virtex (subscriber, #3019) [Link] (1 responses)

When setting bits you usually don't want to just add the numbers together like you're doing because if the bit is already set the answer won't be what you want:
>>> oct(0o400 + 0o200)
'0o600'  (Looks good)
>>> oct(0o600 + 0o200)
'0o1000'  (Likely not what you wanted)
It's better to use the bitwise OR operation which will set the bit if it's unset, or leave it alone if it's already set:
>>> oct(0o400 | 0o200)
'0o600'  (Looks good)
>>> oct(0o600 | 0o200)
'0o600'  (The bit is already set, so nothing changes)

The Grumpy Editor's Python 3 experience

Posted Aug 6, 2018 12:59 UTC (Mon) by cortana (subscriber, #24596) [Link]

You're quite right, whoops! :)

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 9:14 UTC (Thu) by madhatter (subscriber, #4665) [Link] (9 responses)

> I've been programming for 25 years and have almost never specified file permissions using octal.

I've been sysadminning for 25 years and have almost never specified them any other way. I say this not to show you're wrong, but to show that we're different.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 11:58 UTC (Thu) by mrshiny (guest, #4266) [Link] (8 responses)

As a sysadmin are you often writing programs to set file permissions? Or are you using shell scripts and command line arguments? Because there's no problem with chmod interpreting its command line arguments in octal. It already does that and doesn't require the leading zero. Chmod can do its own thing without forcing every other programming language to follow suit, even languages which are rarely used for specifying Unix file permissions.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 13:26 UTC (Thu) by madhatter (subscriber, #4665) [Link] (7 responses)

You can use octal to specify a file permission in any given programming language or tool without forcing any other language to follow suit, so that doesn't seem to me to signify.

To follow your argument, as I understood it, from the top: you said that octal-with-a-leading-zero-in-python shouldn't be needed because no-one uses octal any more; lsl pointed out that file creation is a time when it is used; you said that you don't generally specify file permissions at all, and when you do you don't do them that way, and you were assuming that was more generally true.

I was merely pointing out that last bit of the argument is fallacious, and that some people do specify absolute file permissions. Do by all means argue that my case is a corner case and doesn't justify inclusion of octal in python. But please don't argue that my case doesn't exist.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 14:53 UTC (Thu) by mrshiny (guest, #4266) [Link] (6 responses)

You misunderstand my argument.

I'm not saying octal literals should be impossible because literally nobody uses them. I'm saying octal literals with a leading zero are a trap, a misfeature, a bug waiting to happen in every general purpose programming language. Furthermore, I'm saying that the use of octal literals is relatively tiny compared to the use of programming languages. Probably the only time I've ever used octal at all was on a command line, where I didn't even need to use the leading-zero syntax because the command-line tool already interpreted numeric values as octal. chmod only accepts octal permissions and thus lets you use an unprefixed octal number for specifying permissions, so a large swath of octal use-cases are unaffected by this conversation.

The point is that yes, some people do need or prefer to use octal. But those use-cases are so insignificant compared to the general use of languages that I support making breaking changes to a language to prevent the use of leading-zero octal and instead require 0o123 notation or some other notation that is not so badly designed. Programming languages should not surprise people, and octal with a leading zero is one of those things most people never need, never use, and probably forget about. At least with the new notation there's basically no chance they'll accidentally use an octal number.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 15:05 UTC (Thu) by madhatter (subscriber, #4665) [Link] (3 responses)

> You misunderstand my argument.

I don't think so. I wrote:

> Do by all means argue that my case is a corner case and doesn't justify inclusion of octal in python.

You write:

> But those use-cases are so insignificant compared to the general use of languages that I support making breaking changes to a language to prevent the use of leading-zero octal

It seems to me that what I wrote is exactly what you're doing, which is fine. I think you're wrong in your count of the valid use-cases, but we'd both have to actually count them to know.

I agree with you that leading-zero-to-denote-octal is pretty odd, and I'm personally fine for you to get rid of it in any given language *as long as you replace it with another way to specify octal* - 0o777 would be fine - because some people, including me, need it.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 17:10 UTC (Thu) by mrshiny (guest, #4266) [Link] (2 responses)

I never advocated removing octal support from a language in this thread or anywhere.

The Grumpy Editor's Python 3 experience

Posted Aug 3, 2018 5:46 UTC (Fri) by madhatter (subscriber, #4665) [Link] (1 responses)

You're right, you didn't, and I apologise for the imputation. But if all you wanted to do was celebrate the shift to 0o for octal, in which celebration I'd have joined you, why make a sweeping and unsupported statement about the general lack of usage of octal?

The Grumpy Editor's Python 3 experience

Posted Aug 3, 2018 17:14 UTC (Fri) by mrshiny (guest, #4266) [Link]

The general lack of usage of octal, especially in non command-line use, is what makes this change even possible in the first place. Granted, Python 3 broke lots of old programs on purpose in order to make improvements, but this change is happening in other languages too, and this sort of widespread change would certainly be much more difficult if octal were widely used.

Not to mention that the danger of falling into the "Accidentally used an octal literal when I meant a decimal" trap is much reduced if octal is so commonly-used a feature that everyone is aware of it.

Most programmers simply aren't specifying unix file permissions on a constant basis. Just think of every Android or MacOS or iOS or Windows or Javascript programmer, or even web programmers on Unix. File creation is rare, and file creation with specific permissions is rarer, and file creation with specific permissions on a Unix-like filesystem is rarest. I don't need to take a census to know this is true.

The Grumpy Editor's Python 3 experience

Posted Aug 9, 2018 9:47 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> The point is that yes, some people do need or prefer to use octal.

Or learnt on a language/system that encouraged people to use it (like C encourages hex).

I really can't remember that far back but I've never had a problem with octal because it was used quite extensively on Pr1me for PL/1 and FORTRAN when I first started programming what, 35 years ago now?

And seeing as Pr1mos is a Multics derivative, I guess that's where Unix got it from, too :-)

Cheers,
Wol

The Grumpy Editor's Python 3 experience

Posted Aug 9, 2018 11:52 UTC (Thu) by anselm (subscriber, #2796) [Link]

There are few visible places where Unix uses octal, and the most prominent of those is probably file access permissions. Since these come in convenient packages of three bits, octal makes a lot more sense for them than, say, hexadecimal. Incidentally, file permissions are one aspect of Unix that doesn't seem to be influenced by Multics to any great extent.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 20:32 UTC (Tue) by roc (subscriber, #30627) [Link] (13 responses)

Most people don't write C code and those that do can use S_IRWXU etc.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 20:54 UTC (Tue) by lsl (subscriber, #86508) [Link] (12 responses)

Do you actually prefer the symbolic constants for file permissions? Personally, I strongly dislike them and consider the octal numbers to be *much* easier to read. Having to decipher something like "S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH" feels wrong when you can just write 0755.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 22:03 UTC (Tue) by k8to (guest, #15413) [Link] (9 responses)

The command line symbols are much clearer. I'm not bothering to decode the C you provided (underlining your point), but this is super clear to me:

chmod u=rwx,go=rx

At least it's super clear compared to the C version.

But I still tend to use the C constants in code as opposed to octal numbers. I think it's a mix of expecting other programmers to come across it who are not UNIX nerds, and the fear of mangling the octal, and a bit of dogmatic fear over magic inline numbers.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 0:19 UTC (Wed) by madscientist (subscriber, #16861) [Link] (4 responses)

It's always really bothered me that the designers of the command line syntax chose "o" and "u". These are terrible, confusing letters. "u" (user) gives no information at all (I'm a user, you're a user, everyone attempting to type this command is a user) and "o" is completely ambiguous because it could stand for "other" but also "owner".

So, it could be "u" means the owner of the file and "o" means other users, or it could be that "o" means the owner of the file and "u" means other users.

Really, it's hard to imagine a worse pair of letters for sowing confusion. In fact the way it makes the most sense to me is exactly the opposite of reality: "o" should be "owner" and "u" should be general users.

That's why I prefer the numerical codes and consider them simpler to get right. Every time I need the text syntax (if I need to do something more sophisticated such as remove the w bit without touching other values) I have to go look up the man page to make sure I have it right. You definitely don't want to mess it up!!

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 3:08 UTC (Wed) by k8to (guest, #15413) [Link]

The problem of 'o' and 'u' is not the command line, but really how the flags were named originally. Unless you're objecting to the short form, in which case sure it doesn't help, but command line flags aren't designed for clarity very often.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 11:57 UTC (Wed) by tao (subscriber, #17563) [Link] (2 responses)

I have a hard time seeing how "u=user", "g=group", "o=other", "r=read", "w=write", "x=execute" would be harder to memorise than "second position user", "third position group", "fourth position other", "1=execute", "2=write", "4=read", except if you have a 4th, leading, digit, in which case they are setuid, setgid, sticky bit...

But I guess we all have different ways of remembering things.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 13:03 UTC (Wed) by madscientist (subscriber, #16861) [Link] (1 responses)

Position is trivial to remember since it's the same way file permissions are shown by ls etc. Most of the detail you suggest isn't used 99% of the time: people really only care about a few possibilities: 4 for read-only and 6 for read-write, then add one if you also want execute.

I already clearly (I think) explained the specific issue I had. I don't object to the text form, and as mentioned I do use it when I need to use the "+" or "-" forms of chmod for example. However I think poor design choices make it harder to use correctly and so I prefer the numeric system on the command line when possible. There's little possibility of mixing up the order of three numbers.

Please note I'm speaking here specifically of the "chmod" command line syntax.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 12:10 UTC (Thu) by tao (subscriber, #17563) [Link]

ls shows rwx though, not the octal equivalent.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 15:50 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (3 responses)

That's exactly as 'clear' or 'unclear' as the equivalent

chmod 0755

u=rwx,go=rx has no inherent meaning. It's an arbitrary encoding of an integer using 'letters' and 'funny symbols' instead of numbers.
If you're familiar with the encoding, it's "clear" what it means, if you're not, you won't understand it. One could even call it misleading as go is an English verb which doesn't mean "group and other" (a phrase which doesn't mean anything in itself, either).

I stopped using the 'letters and funny symbols' encoding once I got over my prejudice that 'letters and funny symbols' are somehow 'inherently better' than numbers. For the octal encoding, 1 means execute, 2 means write and 4 means read (also 1 means sticky, 2 setgid and 4 setuid for the fourth set).

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 20:47 UTC (Wed) by marcH (subscriber, #57642) [Link] (2 responses)

> It's an arbitrary encoding of an integer using 'letters' and 'funny symbols' instead of numbers.

Those damn, so-called "letters"... I wish email addresses had allowed digits only, they would have been as easy to remember as phone numbers.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 21:42 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (1 responses)

UNIX file system permissions aren't phone numbers. This is an integer using 4 adjacent 3-bit groups to encode the corresponding information with 3 of these 4 groups using the same encoding. For 'everyday use' all one needs to remember is that the order is owner - group - world/ other and that 1 means execute, 2 write and 4 read. That's easy enough to remember and it's also not complicated to do additions involving only 1, 2 and 4.

It's certainly not more complicated than remembering PINs or passcodes, something many people apparently do without problems.

It does take a conscious decision to do so, though.

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 4:22 UTC (Thu) by marcH (subscriber, #57642) [Link]

> For 'everyday use' all one needs to remember is that the order is User - Group - Other and that 1 means eXecute, 2 Write and 4 Read.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 1:43 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

I certainly prefer names. I have never actually used octal in all my life. I know that it exists but meh...

The Grumpy Editor's Python 3 experience

Posted Aug 3, 2018 16:44 UTC (Fri) by mm7323 (subscriber, #87386) [Link]

Most places I've worked at have been against 'magic numbers' in code, both in Coding Standards documents and at review stage.

The reasons are simple and many.

1) giving things a name often helps understanding
2) macros or constants provide a convenient place to hang documentation
3) you can easily grep for them to audit use
4) they provide a sort of type hint or type safety depending on language
5) you can Google them much more easily
6) macros can hide expressions or reliance on other macros which helps explain their derivation
7) you can change their definition at a later date without having to search or change lots of code sites
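Applied to the permissions debate in this thread, the named-constant approach might look like the sketch below. It uses the constants from Python's standard `stat` module; `CONFIG_FILE_MODE` is a hypothetical name chosen for the example.

```python
import stat

# rw-r--r-- spelled with named constants: easy to grep for and to document
CONFIG_FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH

# The same value written as a bare octal literal
CONFIG_FILE_MODE_OCTAL = 0o644
```

Both spellings produce the same integer; the difference is only in what a reviewer or a grep can find later.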

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 4:29 UTC (Wed) by warrax (subscriber, #103205) [Link]

Not with non-default permissions they don't.

(Even if it were the case, octal by prepending just a 0 is utterly stupid because it goes against conventional decimal notation for no good reason. While '0o' as a prefix is heaps better, it's still unfortunate that the characters look so similar in many fonts. Hopefully programmers would use fonts that clearly distinguish between 0 and o and O, though.)

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 6:20 UTC (Wed) by jubal (subscriber, #67202) [Link] (1 responses)

Oh, but “0” in 0755 is not denoting octal-ness, it's literally 0. Cf. the difference between 1755 and 0755.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 11:36 UTC (Wed) by dskoll (subscriber, #1630) [Link]

The constant 1755 in C is 03333, so I hope you don't use it in a C program to specify file permissions....
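The arithmetic is easy to check from a Python 3 prompt; a quick sketch (the variable names are only for illustration):

```python
decimal_literal = 1755   # how a compiler reads the literal 1755: base ten
octal_literal = 0o1755   # what was meant: sticky bit plus rwxr-xr-x

# Render the decimal value in octal to see which permission bits it would set.
as_octal = oct(decimal_literal)
```

The two literals are different integers, which is why dropping the octal prefix silently changes the permissions.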

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 6:22 UTC (Wed) by rsidd (subscriber, #2582) [Link]

You could view the 7 in 7 == binary 111 as either octal or hex or even decimal. "Octal" really means, reading octal 71 as 7*8+1=57. Do you ever require that?

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 16:22 UTC (Thu) by MatyasSelmeci (guest, #86151) [Link] (1 responses)

This makes me wonder, is there _any_ use these days for octal at all, other than Unix file permissions? (And the COBOL software on the 3270 in the basement that everybody's too afraid to touch because it runs payroll (just kidding... I hope))

When I wrote code that I wanted to make compatible with both Python 3 and Python 2.4 (which did not support the 0o syntax), I had to write "int('0755', 8)".
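A minimal sketch of that workaround, wrapped in a helper so the intent is visible at the call site. The names `octal` and `DIR_MODE` are hypothetical; the only real API used is the built-in `int()` with an explicit base.

```python
def octal(text):
    """Parse a string of octal digits without relying on literal syntax."""
    return int(text, 8)


DIR_MODE = octal("0755")
```

Since the value is parsed at run time, no version of the interpreter ever has to accept either the old `0755` or the new `0o755` literal.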

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 16:46 UTC (Thu) by jwilk (subscriber, #63328) [Link]

POSIX printf(1) supports octal escapes (\377), but not hex escapes (\xFF).

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 17:01 UTC (Thu) by farnz (subscriber, #17727) [Link]

Most of the time, when I create a file, I want the permission bits set to 0o0666 & ~umask, and directories set to 0o0777 & ~umask. Coincidentally, these are the default permissions I will get with a plain syscall, with no effort to set permission bits.
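The masking itself is a one-liner. As a sketch (the function name `effective_mode` is made up, and the umask is passed in rather than read from the process so the example has no side effects):

```python
def effective_mode(requested, umask):
    """Clear every permission bit that is set in the umask."""
    return requested & ~umask


file_mode = effective_mode(0o666, 0o022)
dir_mode = effective_mode(0o777, 0o022)
```

With the common umask of 022 this gives 644 for files and 755 for directories.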

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 21:33 UTC (Tue) by ejr (subscriber, #51652) [Link] (2 responses)

Once upon a time, back when dinosaurs ruled, TCL would sort 010 after 009. Then in a minor point revision it started auto-converting in octal. (IIRC, something like 4.7.2 to 4.7.3, not that I'm bitter or anything.) Debugging that one was a joy.

This ish needs to go away.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 22:04 UTC (Tue) by k8to (guest, #15413) [Link] (1 responses)

What's "ish" ? Issue?

Somewhat lost here.

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 23:09 UTC (Tue) by marduk (subscriber, #3831) [Link]

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 21:34 UTC (Tue) by JFlorian (guest, #49650) [Link] (1 responses)

Phew! Your Python migration sounds more painful than mine. I hit bumps for sure, but overall it went predictably.

The task of grep'ing for division sounds ... awful. That reminds me of when I was trying to decipher some Ruby that did something like bar=%x'foo' and it was entirely non-obvious that foo was an executable. Googling for the answer seemed impossible and I kept mumbling "code is read more than written." I'm sure glad the author didn't have to type "exec" all the way out.

It seems all migrations (languages, apps, whatever) have one thing in common, however. Things wind up healthier on the flip side. Maybe buggier in the immediate short term, but better overall. It's like moving, a chance to revisit all the clutter that we surround ourselves with and become accustomed to. It doesn't mean it's fun though.

As always, your grumpiness makes *me* feel better, so thanks! :-)

The Grumpy Editor's Python 3 experience

Posted Aug 15, 2018 20:04 UTC (Wed) by Wol (subscriber, #4433) [Link]

> It seems all migrations (languages, apps, whatever) have one thing in common, however. Things wind up healthier on the flip side. Maybe buggier in the immediate short term, but better overall.

Depends whether it's driven by tech, or by PHBs fed rubbish by sales people.

I can think of several disaster-story ports - Oxford Health Care is a well-known case story in my industry where the migration basically sent the company bankrupt, because the new system was far more resource hungry and less capable than its predecessor.

And my favourite story - consultants announcing to management (after SIX MONTHS hard work) that their new system was 10% faster than the old one. Only for the dinosaur in charge of the old system to overhear, and say "10%!, 10%!!!, you're PROUD that your twin Xeon 800 is ONLY ten percent faster than a PENTIUM NINETY!!!".

Cheers,
Wol

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 22:07 UTC (Tue) by RooTer (guest, #91640) [Link]

And here I was expecting an article on Google Grumpy https://github.com/google/grumpy

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 22:28 UTC (Tue) by k8to (guest, #15413) [Link] (4 responses)

Regarding the import scenario:

* For the base problem, I've always favored shipping a lib dir with my project that the bootstrap explicitly adds to sys.path before (most) imports begin. Some people think it's hacky, but it works with any deployment model, and is simple. The only downside is if someone hacking on it doesn't realize this, so I usually put it in a README.

* import loops are nasty, and it unfortunately takes a long time in python before your project is big enough or you do something unusual enough that they bite you. The net result is that you can end up with a lot of them before you realize you don't want them. Maybe pyflakes or whatever will auto-find them. Python is kind of best run with a lot of checkers, but sadly a lot of the checkers come with too many opinions and too much work to turn off the dumb ones.
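For the first point, a bootstrap along these lines is roughly what I mean. This is only a sketch: `add_bundled_lib` and the `lib` directory name are assumptions for the example, not anything standard.

```python
import sys
from pathlib import Path


def add_bundled_lib(project_root, dirname="lib"):
    """Put <project_root>/<dirname> at the front of sys.path, once."""
    lib_dir = str(Path(project_root).resolve() / dirname)
    if lib_dir not in sys.path:
        # Front of the list, so bundled modules win over system-wide copies.
        sys.path.insert(0, lib_dir)
    return lib_dir
```

The entry-point script would call something like `add_bundled_lib(Path(__file__).parent)` before importing anything that lives in the bundled directory.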

-----

* For the division, it's been possible to get "new-style" division in python2 for many years now, so I've been able to make the switch gradually module by module. That doesn't help you of course. I guess this change makes python more similar to other languages, but I don't think it makes it more internally consistent, and I don't really think it was worth it. It's one of the things that was a pain in the butt when writing polyglot code to run on both.

* I hope you don't run into situations where you really want to use a utility thing that wants str when you have bytes. Often it's wrong to go bytes->str->util->str->bytes. Sometimes I have to just re-implement it, often copy-pasting from the python code. Probably I should write patches, but sometimes a whole module has the idea it only wants bytes, so it would be a large (and maybe controversial) patch.

* six is a little magical, agreed. I think it was a godsend when maintaining code with significant numbers of developers in a time when you had to target both pythons. If you don't though, it isn't worth it, and I think the window for that time is passing.

* The octal thing seems like the right idea, though I would have vastly preferred the error be opt-in, or opt-out. I find 0o777 truly bizarre, though I guess it's worth making octal numbers hard to do by accident.

* The most worrying part of this is the discussion of pickles. It sounds like you were transmitting pickles over the network, potentially in an insecure fashion. I've always felt pickling was acceptable for persisting to disk in scenarios where accessing the disk was already game over. However, putting them in emails smells like a remote code execution open door, even if you think you control the email store.

Obviously, if you load an object and run it, it's a remote code execution, but you may not be aware that the load action can take ownership of the process immediately without ever running your code from that point on.
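To make that concrete, here is a deliberately harmless sketch. The class name `Sneaky` is made up; `__reduce__` is the documented hook that tells pickle which callable to invoke when rebuilding an object, and a hostile pickle can name any importable callable there.

```python
import pickle


class Sneaky:
    """Harmless stand-in for a hostile payload."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        return (eval, ("6 * 7",))


blob = pickle.dumps(Sneaky())
result = pickle.loads(blob)  # eval runs here, during the load itself
```

Note that `result` is whatever the callable returned, not a `Sneaky` instance, and that none of the receiver's own code had to touch the object for the call to happen.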

I view pickling of python objects as truly magical. You can stash executable logic in a set of bytes and run it again later, which can be extremely powerful. You can have ephemeral plugins over the network and other crazy ideas. It's based on the python bytecode loader (that's essentially all it is), though, so it can't work across versions.

If you just want to store *data*, then something like json.dumps is probably better, though it's not necessarily safe by default (depending on python version, it is willing to deserialize executable junk to objects by default, which is truly unfortunate).

Even if you're just sending the data out to a system that is differently controlled, and not an executable object, I recommend being paranoid: https://pythonhosted.org/itsdangerous/
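The general idea of signing data before it leaves your control can be sketched with nothing but the standard library. To be clear, this is not the itsdangerous API; `sign` and `verify` are hypothetical helpers built on `hmac`, `hashlib` and `json`, and key management is left out entirely.

```python
import hashlib
import hmac
import json


def sign(data, key):
    """Serialize data as JSON and compute an HMAC-SHA256 tag over it."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify(payload, tag, key):
    """Return the decoded data only if the tag matches the payload."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing
    if not hmac.compare_digest(expected, tag):
        raise ValueError("signature mismatch")
    return json.loads(payload.decode("utf-8"))
```

The point of checking the tag first is that nothing gets deserialized until the bytes are known to be the ones you wrote.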

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 22:39 UTC (Tue) by jake (editor, #205) [Link] (2 responses)

> It sounds like you were transmitting pickles over the network, potentially in an insecure fashion.

No, our pickles are stored in the database, not taken from (or sent to) the network.

The email module woes were unrelated, mostly concerning ingesting emails to turn them into "articles".

jake

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 3:10 UTC (Wed) by k8to (guest, #15413) [Link]

Thumbs up!

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 6:50 UTC (Wed) by Darkmere (subscriber, #53695) [Link]

Databases are to be considered untrusted sources of data; one should always perform validation on what comes out of them.

The data in a database was put there by someone who doesn't have the same bugs and validation patterns as you do today, thus you know from the beginning that it's not validated properly.

The dev in the past is always to be considered both untrustworthy and malicious on the level of incompetent. Just look at how much extra work they've caused you by not doing things that you now know are right and good. Clearly you can't trust that dev.

This is something that's likely to continue. No one has caused me so much work as past me.

The Grumpy Editor's Python 3 experience

Posted Aug 13, 2018 7:58 UTC (Mon) by ber (subscriber, #2142) [Link]

> json.dumps [..] it's not necessarily safe by default (depending on python version, it is willing to deserialize executable junk to objects by default, which is truly unfortunate).

Can you elaborate on this? (A quick search did not turn out anything discussing this problem.)

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 23:35 UTC (Tue) by luto (guest, #39314) [Link] (3 responses)

> Python 2 defined integer division as originally intended by $DEITY and implemented by almost every processor: the result is a rounded-downward integer value.

$DEITY indeed intended for division to round down. Alas, almost every CPU rounds toward zero instead, which is an abomination unto mathematics.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 6:40 UTC (Wed) by Homer512 (subscriber, #85295) [Link]

Oops, I was never aware that Python rounded down instead of toward zero!

I've gotta say: All concerns about mathematical purity aside, I really hate it when Python needlessly deviates from well-established C semantics (even if they are just pseudo-standards because they work on x86). It just creates tons of new gotchas and makes code conversion harder. Just like the bitshift. Why is 16 >> -2 an error? The common argument I've heard is that it's not well defined in C, either. Okay, but then why is it okay to change the semantics of division? And how does forbidding negative bitshifts result in better code overall when we now have to sprinkle our code with if's just to get the bitshift working reliably?
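The kind of wrapper this ends up requiring looks something like the sketch below. The name `shift_right` is made up, and treating a negative count as a left shift is my assumption about the desired semantics, not something Python defines.

```python
def shift_right(value, count):
    """Shift right by count; a negative count shifts left instead."""
    if count >= 0:
        return value >> count
    return value << -count


quarter = shift_right(16, 2)
quadruple = shift_right(16, -2)
```

Without the wrapper, `16 >> -2` raises `ValueError` ("negative shift count") in Python.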

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 13:58 UTC (Wed) by k3ninho (subscriber, #50375) [Link] (1 responses)

>Alas, almost every CPU rounds toward zero instead, which is an abomination unto mathematics.

Forgive them, for their direction sign is divorced from their magnitude symbols.

(I'm pretty sure I don't like -4/3 = -2. At least you can reason about the magnitude of 4/3 and -4/3 being the same thing and so having the same ratio of threes if our calculation rounds both toward zero.)

K3n.

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 15:34 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> (I'm pretty sure I don't like -4/3 = -2. At least you can reason about the magnitude of 4/3 and -4/3 being the same thing and so having the same ratio of threes if our calculation rounds both toward zero.)

If -4//3 = -(4//3) = -1 then the remainder -4%3 must be negative (-1) to preserve the relation that quotient * divisor + remainder = dividend. However, the standard representation for numbers modulo N is in the range 0 to N-1. To get a standard non-negative remainder (for positive divisors), the division operation must round toward negative infinity rather than zero.

Note that Python does produce negative remainders in the case where the divisor is negative (4 % -3). Correcting this would require the rounding direction to depend on the sign of the divisor. I assume that negative divisors were deemed too rare to justify the extra complexity.
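A short Python 3 sketch of the two conventions side by side. The truncating quotient is emulated here with `int()` and `math.fmod()`; the variable names are only for illustration.

```python
import math

n, d = -4, 3

floor_q, floor_r = n // d, n % d   # Python: round toward negative infinity
trunc_q = int(n / d)               # C-style: round toward zero
trunc_r = int(math.fmod(n, d))     # remainder takes the sign of the dividend

# Both conventions keep quotient * divisor + remainder == dividend.
assert floor_q * d + floor_r == n
assert trunc_q * d + trunc_r == n

# With a negative divisor, Python's remainder follows the divisor's sign.
negative_divisor = (4 // -3, 4 % -3)
```

Flooring gives the pair (-2, 2) and truncation gives (-1, -1); only the first keeps the remainder in the range 0 to N-1 for a positive divisor.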

The Grumpy Editor's Python 3 experience

Posted Jul 31, 2018 23:59 UTC (Tue) by ewen (subscriber, #4772) [Link] (1 responses)

To this useful "war story" of problems encountered, I'd also add "target Python 3.6+", and probably ideally Python 3.7 at this point. Python 2.7 and Python 3.7 are more compatible with each other than earlier Python 3 was with Python 2.6/2.7. In particular Python 3.5 (included with, eg, Debian 9/Stretch, and Ubuntu 16.04/Xenial) has a bunch of subtle incompatibilities with Python 2.6/2.7 assumptions/code, that got fixed, or better defaults, in Python 3.6. (For one recent case around JSON parsing of the output of a subshell, it seems like Python 3.5 was the last actively used version that needed special workarounds -- Python 2.6/2.7/3.6/3.7 all just worked.)

The other useful trick I found was using a Python 3.6+ venv as a way of testing compatibility, without having to change the bang path (#!) explicitly to Python 3:

python3 -m venv SOMEDIR

then activate that venv, and within it "/usr/bin/env python" will run python3, but outside the venv "/usr/bin/env python" can still run python2. That makes it easier to test both Python 2.7 and Python 3.7 side by side on the same machine, without having to edit files or manually run python FILE. (FTR, that venv creation syntax also needs Python 3.6+.)

Ewen

The Grumpy Editor's Python 3 experience

Posted Aug 1, 2018 9:56 UTC (Wed) by Kamilion (guest, #42576) [Link]

Good to know -- I'll move from 3.5 ASAP.

Thanks for the venv tip; has already come in handy.

Automated tests?

Posted Aug 1, 2018 10:02 UTC (Wed) by tekNico (subscriber, #22) [Link] (1 responses)

"it is going to take a lot of testing before your editor is confident with the results."

This seems to refer to the manual kind of testing. Are there automated tests in the code base?

Automated tests?

Posted Aug 1, 2018 19:36 UTC (Wed) by ceplm (subscriber, #41334) [Link]

One of the main lessons from py2k->py3k conversions should truly be that unit tests are not something to ignore, but just the opposite: there is no bug if it doesn't trigger some unit test. Not only does that help the py2k->py3k transition, it also helps with maintaining Python code in the long term (as Python standards evolve).

Supporting Python 2 and 3

Posted Aug 1, 2018 15:07 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link] (1 responses)

Thanks for the article!

I find it very helpful to incrementally improve code over time to run on BOTH 2 and 3. "Futurize" can help automatically do some of this work:
http://python-future.org/automatic_conversion.html

That way, instead of trying to convert everything, you can do things a piece at a time. Tweaking code to use print functions and python3 division, while still running under Python2, is easier to handle if you do it gradually.
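As a sketch of what one such incremental step looks like, the module below behaves the same under Python 2.7 and Python 3. The `__future__` imports are real; the variable names are just for the example.

```python
from __future__ import division, print_function

halves = 7 / 2   # true division on both Python 2 and 3
whole = 7 // 2   # floor division, now spelled explicitly

print("7 / 2 =", halves, "and 7 // 2 =", whole)
```

Since the import only affects the module it appears in, each file can be converted and tested separately while the rest of the code base keeps the old behavior.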

Here's what the Python developers suggest, though as noted it omits much:
https://docs.python.org/3/howto/pyporting.html

Supporting Python 2 and 3

Posted Aug 6, 2018 6:30 UTC (Mon) by salimma (subscriber, #34460) [Link]

Came here to suggest this myself, here's an upvote

Imports and circular dependencies

Posted Aug 1, 2018 18:00 UTC (Wed) by filbranden (guest, #87848) [Link]

This article got me curious about circular dependencies, because I was under the impression that they were mostly fine in Python.

Got me to write an experiment and see that I actually don't understand that as well as I thought I did...

Description of the experiment and questions about it posted here:
https://stackoverflow.com/questions/51639547/python-circu...

Python experts, your answer there would be appreciated :-)

The Grumpy Editor's Python 3 experience

Posted Aug 2, 2018 17:58 UTC (Thu) by kpfleming (subscriber, #23250) [Link] (1 responses)

Does it concern anyone else that 'subscription management' and 'money calculations' were being done using integer division which threw away remainders? Where are all those extra fractional pennies going, Grumpy Editor? :-)

Fractional pennies

Posted Aug 2, 2018 18:00 UTC (Thu) by corbet (editor, #1) [Link]

How do you think we acquired that massive LWN yacht?

Now if I'd said we were doing those calculations in floating point, then you would have reason to be concerned...
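For the curious, a small sketch of the difference. The amounts and variable names are invented for the example; `divmod` and `decimal.Decimal` are standard Python.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2            # binary floating point cannot hold these exactly

price_cents = 1000               # 10.00 held as an integer number of cents
share, leftover = divmod(price_cents, 3)   # the fractional pennies are explicit

decimal_sum = Decimal("0.10") + Decimal("0.20")
```

With integers, the leftover cent is a value the code has to do something with; with floats, the error just quietly accumulates.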

The Grumpy Editor's Python 3 experience

Posted Aug 9, 2018 17:50 UTC (Thu) by gartim (guest, #10123) [Link]

Ditto on your pain; I suffered myself with the String vs Byte difference and with how to handle "text data" that contains illegal data, often from sources like mp3 metadata.
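For what it's worth, the two error handlers that helped most can be sketched like this. The sample bytes are invented; `errors="replace"` and `errors="surrogateescape"` are standard options of `bytes.decode()` in Python 3.

```python
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lossy: the bad byte becomes U+FFFD, the replacement character.
lossy = raw.decode("utf-8", errors="replace")

# Lossless: the bad byte is smuggled through as a lone surrogate
# and restored when encoding with the same handler.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```

`replace` is fine for display; `surrogateescape` is the one to reach for when the original bytes have to be written back unchanged.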

Thanks

Posted Aug 10, 2018 15:39 UTC (Fri) by sumanah (guest, #59891) [Link]

Thank you for sharing your experience! Informative.

The Grumpy Editor's Python 3 experience

Posted Aug 24, 2018 9:20 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

Multiple import of the same module (or import loops, or multiple import of different versions of the same module) is indeed nasty. Things seem to work at first, the mistakes get entrenched before side effects are felt, and the side effects are hell to debug.

It is quite sad that most dev environments do not detect and warn about them by default (expecting humans to be disciplined without tooling is quite hopeless). That should be much easier than some of the refactoring help that's expected nowadays.
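The multiple-import case is easy to reproduce. The sketch below builds a throwaway package in a temporary directory and makes the same file reachable under two names; `demo_pkg` and `demo_state` are hypothetical names used only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "demo_state.py").write_text("counter = []\n")

    # Both the parent directory and the package directory are on sys.path,
    # so the same file is importable under two different names.
    sys.path[:0] = [tmp, str(pkg)]
    try:
        importlib.invalidate_caches()
        by_package = importlib.import_module("demo_pkg.demo_state")
        by_top_level = importlib.import_module("demo_state")
    finally:
        del sys.path[:2]

by_package.counter.append("x")
same_object = by_package is by_top_level
```

The interpreter ends up with two independent module objects, so state changed through one name is invisible through the other, and nothing warns about it.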


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds