Development
Python and to_file()
Python is used in a wide variety of circumstances, by people coming from different backgrounds and with various needs. A recent thread on the python-ideas mailing list started with a thought about a quick and easy way to write a string to a file, much as is done in some other, specialized languages (e.g. R, MATLAB). It soon expanded in several directions, partly into a philosophical consideration of the role of the language and of how best to accommodate those coming to Python from those other languages.
Nick Eubank kicked off the discussion by noting that he is a social scientist trying to help his colleagues move away from the specialized languages to Python. "One of the behaviors I've found unnecessarily difficult to explain is the 'file.open()/file.close()' idiom (or, alternatively, context managers)." In other settings, saving to a file is a one-step operation, he said, so there is a conceptual hurdle for those new to Python.
While there were objections that a single line "write string to named file" operation was an unneeded niche function that would lead to bad habits and slower code, there was also some support for the idea. Andrew Barnert said that he often uses the feature in other languages but, given Python's feature set, he would rarely need it in Python: "But for users migrating to Python from another language, or using Python occasionally while primarily using another language, I can see it being a lot more attractive."
On the other hand, Chris Barker was confused by the need for it at all:
open(a_path, 'w').write(the_string)
short, simple one-liner.
OK, non cPython garbage collection may leave that file open and dangling, but we're talking the quicky scripting data analysis type user -- the script will terminate soon enough.
Barker went on to note that NumPy offers ways to directly read and write arrays to and from files. There is also a one-liner for reading a string back in:
string = open(path).read()
It does suffer from the same problem Barker mentioned for Python implementations that do not use reference-counted garbage collection, however.
Nick Coghlan chimed in with the idea that Python is used for both scripting and application development, but its tutorials and such typically focus on the application side, where "relying on the GC for external resource cleanup isn't a great habit to get into". Thus that introductory material will show the "deterministic cleanup form" for writing to a file. That means that the user has to expand their mental model:
By contrast, unpacking the steps in the one-liner:
- open the nominated location for writing (with implied text encoding & error handling)
- write the data to that location
It's that switch from a 1-step process to a 2-step process that breaks flow, rather than the specifics of the wording in the code (Python 3 at least improves the hidden third step in the process by having the implied text encoding typically be UTF-8 rather than ASCII).
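The "deterministic cleanup form" discussed above can be sketched as follows. This is only an illustration: the helper name write_string and the decision to pass encoding="utf-8" explicitly are assumptions made here, not something proposed in the thread.

```python
def write_string(path, text):
    """Hypothetical helper (not part of Python): the 'with' form of the one-liner."""
    # Open the location for writing; the encoding is spelled out here
    # rather than left to the platform default.
    with open(path, "w", encoding="utf-8") as f:
        # Write the data to that location.
        f.write(text)
    # Leaving the 'with' block closes the file deterministically,
    # whatever garbage-collection strategy the interpreter uses.
    return f.closed
```

The returned flag is there only to make the cleanup visible; a real helper would probably return nothing.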
Eubank strongly agreed with Coghlan's formulation.
In his post, Coghlan also suggested a "radical notion" that would create some kind of save/load function that would automatically use UTF-8 and JSON. That would mean it could be used for objects more complicated than strings and would create files in a well-defined format. He noted that there would be some benefits to that approach.
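A save/load pair along the lines of Coghlan's suggestion might look like the sketch below. The function names save and load, their signatures, and the use of ensure_ascii=False are assumptions for illustration; the thread did not settle on an API.

```python
import json


def save(path, obj):
    """Hypothetical: store any JSON-serializable object as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII characters directly as UTF-8
        # instead of as \uXXXX escapes.
        json.dump(obj, f, ensure_ascii=False)


def load(path):
    """Hypothetical: read back an object previously written by save()."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the on-disk format is plain JSON, a file written this way can hold lists and dictionaries as well as strings, and it can be read by other tools.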
Barker suggested that perhaps the language definition could expand to require immediate garbage collection when it is known that the created object is no longer used (as in his one-liner save above). Since the result is never assigned to a variable, the file object created goes immediately out of scope, thus it could be reclaimed. The CPython reference implementation already does this, but as Brett Cannon pointed out, mandating that is not something the core developers are likely to want to do.
There was also discussion of where such a convenience function should live. Eubank's original idea of adding a to_file() method to the string type was not particularly popular. Eubank had noted that the pathlib module has functionality that is similar to what he was asking for, but he "can't imagine anyone looking in the Path library" if they just want to write a string to a file. Koos Zevenhoven pointed out that the needed functions (write_text() and read_text()) had only been added in Python 3.5, which was only released back in September 2015, "so there has not really been time for their adoption". Others noted that there are issues with converting strings to Paths and vice versa, which would open up another can of worms for users who were just looking for a simple way to write their string.
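For readers who have not come across them, the pathlib methods mentioned above can be used roughly as in this sketch. The file name and contents are made up for the example, and the explicit encoding argument is a choice made here rather than a requirement.

```python
import tempfile
from pathlib import Path

# A throwaway directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results.txt"    # build a Path from a plain string

    # One call opens the file, writes the text, and closes the file.
    target.write_text("mean = 4.2\n", encoding="utf-8")

    # The matching one-call read.
    text_back = target.read_text(encoding="utf-8")

    # Converting between str and Path has to be done explicitly.
    path_as_str = str(target)
    same_path = Path(path_as_str) == target
```

The explicit str() and Path() conversions at the end hint at the friction users face when their paths start out as plain strings.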
Coghlan suggested that the io module might be the right place, but there are still some fundamental issues. For one thing, as Victor Stinner noted, the simple one-liner "solution" does raise a warning (ResourceWarning when the -Wd flag is used) because the state of the file on disk is unknown; it may or may not have been flushed. There is a longstanding bug regarding a portable way to ensure an atomic write to a file, which Stinner also referred to. While a simple wrapper, using a with context manager, could ensure the file gets closed, it cannot ensure the data reaches the disk, which makes it something of an unsafe operation.
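To illustrate why closing the file is not the whole story, here is a minimal sketch of one common pattern for a more careful write: write to a temporary file, flush and fsync it, then rename it over the target. The name atomic_write_text is hypothetical, the pattern is not taken from the thread, and it is not a portable guarantee; behavior depends on the operating system and filesystem.

```python
import os
import tempfile


def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: replace 'path' with 'text' via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target's directory so that the final
    # rename stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push its cache to the device
        # Swap the new file into place (atomic on POSIX systems).
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Even this version leaves questions open (for example, whether the directory entry itself has reached the disk), which gives a sense of why a convenience function is harder to specify than it first appears.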
So, what seemed a simple idea at the start ballooned into a much more complex question. Part of the problem is that specialized languages can make lots of assumptions about how they will be used, which allows them to "hide more of the complexities from [their] users", as Coghlan put it. But a language like Python can't do that, except in domain-specific modules such as NumPy or Pandas. There is a question of how to choose the right defaults.
While there seemed to be general agreement that the io module would be the right place to put a convenience function of this sort, it is not clear if there is enough support to do so. The alternatives are reasonably readable and understandable; even the proper with form is not that hard to grasp. There will be hurdles when moving from a specialized to a general-purpose language—the advantages of the latter should make clearing the hurdles worth it, at least for some. The conversation did provide an interesting look into the thinking process that goes on in Python circles, though.
Brief items
Quotes of the week
Mono Relicensed MIT
At the Mono Project blog, Miguel de Icaza announced that the Mono runtime has been relicensed, moving from a dual-license slate (with LGPLv2 and proprietary options) to the MIT license. The Mono compiler and class libraries were already under the MIT license and will remain so. "Moving the Mono runtime to the MIT license removes barriers to the adoption of C# and .NET in a large number of scenarios, embedded applications, including embedding Mono as a scripting engine in game engines or other applications." De Icaza notes that Xamarin (which was recently acquired by Microsoft) had developed several proprietary Mono modules in recent years; these will also now be released under the MIT license.
Discourse 1.5 released
Version 1.5 of the Discourse open-source discussion-and-commenting system has been released. Significant work went into rewriting the top-level "topics" page, resulting in a five-fold speed increase. Administrators can now change and customize every object label used in the interface. "Want topics to be 'threads'? Users to be 'funkatrons'? Like to be 'brofist'? Well, Discourse is your huckleberry." Support for email comments has also been improved, and user groups can now exchange private messages. The badge system, which is used to denote user roles and to mark popular posts, received a visual refresh and new documentation; user summary pages were also refreshed.
Exim 4.87 Released
Version 4.87 of the Exim mail transfer agent has been released. Several formerly experimental features are now marked as fully supported, including internationalized mail addressing, SOCKS support, REDIS support, and events. There are also many new expansion variables available, and improvements to the regular-expression support in ACLs.
LXC 2.0 released
Version 2.0 of the LXC containerization system has been released. Among the changes are more reliable checkpoint and restore, improved control-group handling, and many bug fixes. Also of note is that LXC 2.0 is designated a long-term support release; backported security updates and bugfixes will be provided for the next five years.
Rkt 1.3.0 released
Version 1.3.0 of the rkt container system has been released. "rkt version 1.3.0 improves handling of errors within app containers, tightens security for rkt’s modular stage1 images, and provides a more compatible handling of volumes when executing Docker container images rather than rkt’s native ACI image format. This release further develops the essential support for rkt as a component of the Kubernetes cluster orchestrator."
Newsletters and articles
Development newsletters from the past week
- What's cooking in git.git (April 1)
- What's cooking in git.git (April 4)
- OCaml Weekly News (April 5)
- OpenStack Developer Digest (April 1)
- Perl Weekly (April 4)
- PostgreSQL Weekly News (April 3)
- Python Weekly (March 31)
- Ruby Weekly (March 31)
- This Week in Rust (April 4)
- Tahoe-LAFS Weekly News (April 5)
- Wikimedia Tech News (April 4)
KDE Presents its Vision for the Future
The KDE project has released a vision statement, a single sentence that sums up what the project would like to achieve: A world in which everyone has control over their digital life and enjoys freedom and privacy. "Our vision unites KDE in common purpose. It sets out where we want to get to, but it provides no guidance on how we should get there. After finalizing our vision (the "what"), we have immediately started the process of defining KDE's Mission Statement (the "how"). As with all things KDE, you are invited to contribute. You can easily add your thoughts on our mission brainstorming wiki page." (Thanks to Paul Wise)
Page editor: Nathan Willis