Distribution quote of the week

In a paper published October 8, researchers at the University of Hawaii found that a programming error in a set of Python scripts commonly used for computational analysis of chemistry data returned varying results based on which operating system they were run on—throwing doubt on the results of more than 150 published chemistry studies. [...]

The scripts, called the "Willoughby-Hoye" scripts after their authors—Patrick Willoughby and Thomas Hoye of the University of Minnesota—were found to return correct results on macOS Mavericks and Windows 10. But on macOS Mojave and Ubuntu, the results were off by nearly a full percent.

Sean Gallagher (at Ars Technica)

Posted Oct 24, 2019 11:04 UTC (Thu) by HelloWorld (guest, #56129) (9 responses)

Once more it turns out that there are only two kinds of programming: functional programming and dysfunctional programming. Python clearly seems to be in the latter category.

Posted Oct 24, 2019 13:21 UTC (Thu) by excors (subscriber, #95769) (8 responses)

It doesn't sound like Python's fault. According to the comments on that article, the issue was that they had one set of data files containing frequency information and a corresponding set of files containing NMR information. When each set was sorted by filename, the first frequency file would correctly match the first NMR file, and so on. But when they weren't sorted, the files were paired up in whatever order the OS happened to return them, which gave incorrect results. Python doesn't guarantee any particular order; it just returns what the OS provides, and that order differs between OSes.
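To make the failure mode concrete, here is a minimal sketch. The file names (`conf01.freq`, `conf01.nmr`, ...) and both helper functions are hypothetical, not taken from the Willoughby-Hoye scripts. Pairing two listings by position is only correct when both happen to be in the same order; pairing by a shared stem does not depend on listing order at all.

```python
import os
import random


def pair_by_position(freq_files, nmr_files):
    """Fragile: assumes both lists are already in matching order."""
    return list(zip(freq_files, nmr_files))


def pair_by_stem(freq_files, nmr_files):
    """Robust: match files on their name without extension (hypothetical naming scheme)."""
    nmr_by_stem = {os.path.splitext(name)[0]: name for name in nmr_files}
    pairs = []
    for freq in sorted(freq_files):
        stem = os.path.splitext(freq)[0]
        if stem not in nmr_by_stem:
            # Fail loudly instead of silently pairing the wrong files.
            raise ValueError(f"no NMR file for {freq}")
        pairs.append((freq, nmr_by_stem[stem]))
    return pairs


freq = ["conf01.freq", "conf02.freq", "conf03.freq"]
nmr = ["conf01.nmr", "conf02.nmr", "conf03.nmr"]
random.shuffle(nmr)  # simulate an OS returning entries in a different order

print(pair_by_position(freq, nmr))  # may mismatch, depending on the shuffle
print(pair_by_stem(freq, nmr))      # always conf01 with conf01, and so on
```

Sorting both lists before a positional `zip` would also fix the symptom, but only as long as every frequency file has exactly one NMR counterpart; matching on the stem fails loudly when one is missing.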

That's obvious in hindsight but it sounds like an easy bug to introduce and to miss during development. Perhaps the lesson is that the script should have come with tests that users would run to verify they get correct results on known data, before they run it with their new data.
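A known-data check of that kind could be as small as the following sketch. Everything in it is hypothetical: `KNOWN_DATA` stands in for parsed input files, `analyze` for the real computation, and `EXPECTED` is simply the hand-computed answer for those toy values. The check replays the same inputs in many shuffled listing orders and demands the same result every time.

```python
import random

# Hypothetical stand-in for parsed input files: file name -> value.
KNOWN_DATA = {
    "a.freq": 1.0, "a.nmr": 10.0,
    "b.freq": 2.0, "b.nmr": 20.0,
    "c.freq": 3.0, "c.nmr": 30.0,
}

# Hand-computed result for KNOWN_DATA: 1*10, 2*20, 3*30.
EXPECTED = [10.0, 40.0, 90.0]


def analyze(filenames):
    """Hypothetical analysis step that combines each freq file with its NMR file."""
    freq = sorted(n for n in filenames if n.endswith(".freq"))
    nmr = sorted(n for n in filenames if n.endswith(".nmr"))
    # Sorting both lists makes the positional pairing independent of listing order.
    return [KNOWN_DATA[f] * KNOWN_DATA[n] for f, n in zip(freq, nmr)]


def check_against_known_data(trials=20, seed=0):
    """Run analyze() on shuffled listings and require the known answer each time."""
    rng = random.Random(seed)
    names = list(KNOWN_DATA)
    for _ in range(trials):
        rng.shuffle(names)  # mimic different OS/filesystem listing orders
        result = analyze(names)
        if result != EXPECTED:
            raise AssertionError(f"got {result} for listing order {names}")
    return True


check_against_known_data()
print("known-data check passed")
```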

Posted Oct 24, 2019 16:00 UTC (Thu) by HelloWorld (guest, #56129) (7 responses)

If Python doesn't guarantee any particular order, then os.listdir should return an unordered container rather than a list.

Posted Oct 24, 2019 16:57 UTC (Thu) by mathstuf (subscriber, #69389) (6 responses)

A list is an unordered container, so what do you mean here? Do you mean one without an indexing operator (like set)? In any case, the *second sentence* of the function's documentation says "The list is in arbitrary order". This is one place where I can't blame the Python docs for being anemic (details like this are usually included; how best to use features, or how to use features in concert with each other, is usually the problem).
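Because the order is documented as arbitrary, a caller who needs a reproducible order has to impose one explicitly. A minimal sketch (the temporary directory and file names are illustrative only):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.txt", "c.txt", "a.txt"]:
        open(os.path.join(tmp, name), "w").close()

    raw = os.listdir(tmp)               # arbitrary, OS/filesystem-dependent order
    ordered = sorted(os.listdir(tmp))   # deterministic lexicographic order

print(ordered)  # ['a.txt', 'b.txt', 'c.txt']
```

`sorted()` compares names by code point, so `file10` sorts before `file2`. That is still reproducible across OSes, which is what matters for pairing files, even if it is not the "natural" order a human might expect.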

Modules like `os` have an interesting role to play in higher-level languages like Python. How many of the semantics of the lower level should be ignored for convenience? `os.listdir` sounds like a perfect candidate for a generator rather than returning a list. Any sorting guarantee would require reading all of the entries before *any* can be returned. A set would have to hash all of them first. And filesystem operations are an area where even Python scripts can feel performance problems, so adding too much forced overhead is not likely to end in happiness. Maybe Python should offer the raw `os` functions, but wrap them in a more Pythonic interface in another module? Maybe such a package already exists?
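On the last question: the standard library's `pathlib` module already wraps directory listing in an iterator, `Path.iterdir()`, whose entries also come in arbitrary order. The sketch below (directory contents are illustrative) shows that a caller can stop after a few entries, whereas sorting necessarily consumes the whole listing before producing its first element. Whether the underlying system calls are themselves lazy is an implementation detail and should not be relied on.

```python
import itertools
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for i in range(5):
        (root / f"entry{i}.dat").touch()

    # iterdir() returns an iterator, so the caller can stop after two entries.
    first_two = list(itertools.islice(root.iterdir(), 2))

    # Sorting must see every entry before it can return the first one.
    all_sorted = sorted(p.name for p in root.iterdir())

print(len(first_two))  # 2
print(all_sorted)
```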

Posted Oct 25, 2019 14:52 UTC (Fri) by HelloWorld (guest, #56129) (5 responses)

If `os.listdir` were to return a generator, how would you make sure that the underlying file descriptor is closed? Users won't necessarily exhaust the generator, so that seems like a bad idea.
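One partial answer is a `try`/`finally` (or `with`) inside the generator: it runs when the generator is exhausted or explicitly closed, and otherwise only when the generator is garbage-collected, which CPython's reference counting usually does promptly but other implementations need not. The sketch uses a hypothetical `FakeHandle` class instead of a real file descriptor, and `contextlib.closing` to make the close deterministic; note that this again requires the caller to use `with`.

```python
import contextlib

events = []


class FakeHandle:
    """Hypothetical stand-in for a directory handle; records when it is closed."""

    def __init__(self, names):
        self.names = names

    def close(self):
        events.append("closed")


def iter_entries(names):
    handle = FakeHandle(names)
    try:
        for name in handle.names:
            yield name
    finally:
        # Runs when the generator is exhausted, closed, or garbage-collected.
        handle.close()


# Stopping early leaves the generator suspended; closing() calls its close()
# method, which raises GeneratorExit inside it and triggers the finally block.
with contextlib.closing(iter_entries(["a", "b", "c"])) as entries:
    first = next(entries)

print(first)   # a
print(events)  # ['closed']
```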

I also don't know what you mean when you say that a list is not an unordered container. The elements clearly are arranged in a specific order in the list.

Posted Oct 25, 2019 15:34 UTC (Fri) by mathstuf (subscriber, #69389) (3 responses)

> If `os.listdir` were to return a generator, how would you make sure that the underlying file descriptor is closed?

There's `with`, but that is a syntactic thing rather than anything else. Without that, how do you know any `open()` call is closed?

> The elements clearly are arranged in a specific order in the list.

There's a specific order for any given set() instance as well; you just have to access it via iteration rather than the `[]` operator. I was interpreting ordered as in "total ordering" or "weak ordering".
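A quick illustration of that distinction in Python. This is a sketch: the same-order-twice behavior shown is what CPython does for an unmodified set, and for strings the order itself can change between interpreter runs because of hash randomization.

```python
s = {"x", "y", "z"}

# Iterating the same unmodified set twice gives the same order within one
# CPython process; which order that is can vary between runs for strings.
first_pass = list(s)
second_pass = list(s)
print(first_pass == second_pass)  # True

# Sets do not support the [] operator at all.
try:
    s[0]
    subscriptable = True
except TypeError:
    subscriptable = False
print(subscriptable)  # False
```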

Posted Oct 25, 2019 19:06 UTC (Fri) by gdiscry (subscriber, #91125)

> > If `os.listdir` were to return a generator, how would you make sure that the underlying file descriptor is closed?
>
> There's `with`, but that is a syntactic thing rather than anything else. Without that, how do you know any `open()` call is closed?

`os.scandir`[1] is the improved `os.listdir` you are looking for 😉.

[1] https://docs.python.org/3.7/library/os.html#os.scandir
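A minimal sketch of `os.scandir` used as a context manager (supported since Python 3.6), which closes the underlying directory handle when the block exits, even if iteration stops early. The directory and file names are illustrative only. The entries still arrive in arbitrary order, so the sketch sorts them explicitly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    for name in ["b.log", "a.log"]:
        open(os.path.join(tmp, name), "w").close()

    # The with-block closes the scandir iterator (and its directory handle)
    # on exit, whether or not every entry was consumed.
    with os.scandir(tmp) as it:
        # DirEntry.is_file() can often answer from data the OS already
        # returned, avoiding an extra stat() call per entry.
        files = sorted(entry.name for entry in it if entry.is_file())

print(files)  # ['a.log', 'b.log']
```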

Posted Oct 28, 2019 15:03 UTC (Mon) by HelloWorld (guest, #56129) (1 response)

> There's `with`, but that is a syntactic thing rather than anything else. Without that, how do you know any `open()` call is closed?

Sure, you can use `with`, but then it wouldn't be a drop-in replacement for `os.listdir`.

> There's a specific order for any given set() instance as well; you just have to access it via iteration rather than the `[]` operator.

A container that is truly unordered would not give you any way to sequentially iterate over its elements. This sounds completely useless, because how are you supposed to do anything useful with a container you can't iterate over?

The answer is that such containers should offer methods to process the container's elements only in ways where the order doesn't matter. An example would be to calculate the sum of the lengths of the strings in a container. It's always going to be the same, regardless of the order in which the set is traversed, because addition is commutative. Another example would be to apply a pure function to each element of a container and store the results in another unordered container.

The cats library for Scala implements these abstractions for that purpose:

https://github.com/typelevel/cats/blob/master/core/src/ma...
https://github.com/typelevel/cats/blob/master/core/src/ma...

Note the use of CommutativeApplicative and CommutativeMonoid.
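In Python terms, the two examples from the comment can be sketched with plain sets (an illustration of the idea only, not a Python counterpart of the cats API): the total string length and an element-wise mapped `frozenset` come out the same however the set is traversed, while string concatenation, which is not commutative, exposes the traversal order.

```python
words = {"alpha", "beta", "gamma"}

# Integer addition is commutative and associative, so the traversal order of
# the set cannot affect this total.
total_length = sum(len(w) for w in words)
print(total_length)  # 14

# Applying a pure function element-wise and collecting into another unordered
# container (a frozenset) also yields an order-independent result.
upper = frozenset(w.upper() for w in words)
print(upper == frozenset({"ALPHA", "BETA", "GAMMA"}))  # True

# Concatenation is not commutative: this string depends on traversal order
# and may differ between interpreter runs.
concatenated = "".join(words)
```

The commutativity argument holds exactly for integers such as string lengths; floating-point addition is not associative, so a float sum over an unordered container can differ in its last bits depending on traversal order.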

Posted Oct 28, 2019 15:26 UTC (Mon) by mathstuf (subscriber, #69389)

> Sure, you can use `with`, but then it wouldn't be a drop-in replacement for `os.listdir`.

Without destructors, no replacement could ever ensure it gets closed. Since the only way to use an analogous mechanism in Python is by using `with`, there's nothing to be done.

> <higher-order type safety stuff>

Hmm. Python only recently got even rudimentary optional static type checking support. Having these kinds of higher-order things in the core sounds very un-Pythonic. For Coq or Idris, sure. Haskell probably as well, but even hash maps implement Traversable there.

Posted Oct 31, 2019 7:45 UTC (Thu) by Wol (subscriber, #4433)

> I also don't know what you mean when you say that a list is not an unordered container. The elements clearly are arranged in a specific order in the list.

As somebody who has had that fight with SQL, I'm pretty certain he means a list is a container with elements arranged in a *random* order. In other words, you will get the same result if you scan the list twice, but there are no guarantees beyond that. If you then *sort* the list in some order, it becomes a *different* list. Whereas if you sort a set, it's still the same set, because sets explicitly have no order.
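Python's built-in types follow the distinction Wol describes: list equality compares elements position by position, so a sorted copy is generally a different list, whereas set equality ignores order entirely. A minimal sketch with illustrative file names:

```python
listing = ["c.nmr", "a.nmr", "b.nmr"]
as_set = set(listing)

sorted_listing = sorted(listing)

# Lists compare element by element, so reordering makes a different list.
print(sorted_listing == listing)  # False

# Sets compare by membership only; there is no order to change.
print(set(sorted_listing) == as_set)  # True
```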

Cheers,
Wol


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds