
Leading items

Python's GitHub migration and workflow changes

By Jake Edge
June 8, 2016

Python Language Summit

Brett Cannon gave an update on the migration of Python's repositories to GitHub and the associated workflow changes at the 2016 Python Language Summit. The goal is to modernize the development process; right now that process is "old school", which is "good or bad depending on who you ask". After looking at the options, GitHub seemed to be the best choice for housing the repositories; PEP 512 lays out the options and rationale for those interested. LWN looked at some of the discussion surrounding the move back in December 2014.

The starting point is to move some simple repositories first: devinabox (a tool to get CPython developers everything they need easily), benchmarks (which may be getting a fresh start, so it might not need to migrate), PEPs (there is already an unofficial GitHub repository), and the Developer's Guide (which just needs changes so it can be built from a Git clone).

[Brett Cannon]

The main reason to start with simpler repositories is to ease the transition. People who are already familiar with Git and GitHub will find it easier to contribute, while the maintainers of those projects will get push-button merging of pull requests. It will move the bugs out of the main bugs.python.org tracker into project-specific bug trackers, as well. There will also be a "CLA bot" to track whether contributors have signed the Python Contributor Agreement by matching GitHub usernames to signers of that document.
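The matching step such a bot performs can be sketched in a few lines. This is a hypothetical illustration, not the bot's actual code; the function name and the origin of the signer list are assumptions:

    # Hypothetical sketch of the CLA bot's check; the function name and
    # the signer list's data source are assumptions for illustration.
    def has_signed_cla(github_username, signers):
        # GitHub usernames are case-insensitive, so normalize both sides
        # before matching against the usernames recorded with signed
        # agreements.
        return github_username.lower() in {s.lower() for s in signers}

A pull request from a username with no match would then get a comment asking the contributor to sign.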

So far, the CLA bot is running and the devinabox repository has been moved. The PEPs repository is next. Cannon had hoped that the unofficial repository was up to date, but that turned out not to be true. So there will need to be a migration, but "it shouldn't be too big of a deal".

But then there is the "363kg repo", which is the one for CPython. He wants to try to determine what must be in place before that repository can be migrated. The goal is to be able to handle contributions more quickly, with the hope that leads to more core developers—and "rainbows everywhere". The longer it takes to make the switch, while staying with the current workflow, the harder it will be, he said.

The question is: "what do you have to have before moving CPython to GitHub?" He has identified some definite requirements, including decisions that need to be made on the mechanics of using Git and GitHub. Whether pull requests should be merged using squash commits and how to handle merging across branches are two of those decisions. The sys._mercurial attribute (which provides information on the Mercurial commit ID that the CPython interpreter was built from) will need to be replaced. The Developer's Guide will need to be updated, but it will be on GitHub at that point, so a branch can be started to make those changes.
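Code that inspects that build metadata today can look it up defensively, so it keeps working once the attribute is replaced; the fallback value below is an assumption for illustration:

    import sys

    # sys._mercurial is a (name, branch, revision) tuple on CPython
    # builds made from a Mercurial checkout; use getattr() since the
    # attribute is slated for replacement and may be absent.
    hg_info = getattr(sys, "_mercurial", ("CPython", "", ""))
    name, branch, revision = hg_info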

There is also a set of possible requirements that he has assembled. Whether or not they are truly needed before the CPython switch to GitHub can happen is up to the core developers, Cannon said. How to handle the Misc/NEWS file, which carries little blurbs about features and changes in the release, needs to be determined. If each contributor changes it on their branch, that will just lead to merge conflicts. It could be handled automatically by deriving the entry from the commit message or that file could be split up into individual-entry files and reassembled at release time. But, he said, there may be no real need to block moving to GitHub until a solution for that is found.
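The split-and-reassemble idea can be sketched as follows; the directory layout and function name are assumptions, not a settled design:

    # Hypothetical sketch of the "individual-entry files" idea: each
    # change adds one file, and the release process concatenates them,
    # avoiding merge conflicts on a single shared file.
    import os

    def build_news(entries_dir, output_path):
        # Sort by file name so the assembled file has a stable order.
        entries = sorted(os.listdir(entries_dir))
        with open(output_path, "w") as out:
            for name in entries:
                with open(os.path.join(entries_dir, name)) as entry:
                    out.write(entry.read().rstrip() + "\n\n")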

Another possible requirement is linking between GitHub and bugs.python.org. That is important to have, but may not be needed before the switch. It comes down to a question of whether he needs to solve these problems before switching or whether the developers can live with some manual parts of the process until these kinds of things are resolved. He encouraged the developers to think about and discuss the real requirements over the coming months.

The "moonshot goal" of this effort is to not have any pull requests that languish for long periods of time. He would like to have things automated to the extent that developers aren't burdened. That way, they could review patches monthly—or even weekly—which would go a long way toward reducing the patch-review queue.

Comments (1 posted)

The state of mypy

By Jake Edge
June 8, 2016

Python Language Summit

At last year's Python Language Summit, Guido van Rossum gave an introduction to "type hints", which are an optional feature to allow static checkers to spot type errors in Python programs. At this year's summit, he discussed mypy, which is one of several static type checkers for Python. It is being used by Dropbox, Van Rossum's employer, on its large Python codebase—with good results.

Van Rossum began by noting that he has been working on mypy, along with quite a few others. It uses the type hints that were standardized in PEP 484 and the function annotation syntax that came with PEP 3107. Because of that, multiple projects can use the annotations. Other users of the syntax include the PyCharm IDE and Google's pytype tool. It is a "big tent", he said, which is exactly what was intended.

Type hints are optional and will always remain that way. They are "not everyone's cup of tea" and can sometimes just get in the way. There are "lots of reasons you don't want to use type annotation", he said.

But he and Dropbox have found it useful. Annotations are being added to a multi-million line application. Part of the reason is to help new employees come up to speed on that codebase. Rather than have to look through a million lines of code to figure out if a function returns a list or a dict, they can just consult the stub file—the .pyi file where type annotations are often placed—or get an error from mypy.
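As an illustration, a stub pairs a module's public names with their signatures while the implementation stays unannotated; the module and function names here are invented:

    # Hypothetical contents of a stub file (util.pyi) for an unannotated
    # util.py; checkers such as mypy consult this file instead of
    # reading the implementation. Each body is just "...".
    from typing import Dict, List

    def load_users(path: str) -> List[str]: ...
    def count_by_name(users: List[str]) -> Dict[str, int]: ...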

The larger a codebase is, the more benefit it will get from adding annotations and actually checking them with a tool, Van Rossum said. But everyone starts with a small codebase and it can be hard to recognize when it transitions into a large one. He recommends that projects start looking into annotations "sooner than you would like to".

Mypy was started by Jukka Lehtosalo as a Python variant with type checking. Van Rossum and Lehtosalo met at PyCon 2013, where they discussed the project and eventually agreed that it would be more successful as an add-on to standard Python. That required a change in the syntax for mypy type annotations so that the Python parser did not need to change.

Over the years, there was a lot of discussion about type hints and how they should work. He gave a keynote about type hints at PyCon 2015 where he "overwhelmed most of the audience with too much detail way too fast". PEP 484 was eventually accepted. Since then, the typeshed repository has opened up to collect stub files with annotations for the standard library and other modules. Those stubs can be used both by mypy and by any other tools that are consuming the type hints.

Mypy has been used to find missing stub files. Those get turned into bug reports to add the stubs, so the typeshed repository is growing effectively. Dropbox has a team working on mypy and has adopted it internally. The results so far have validated the idea that type hints are useful, he said.

But, for Dropbox, type hints needed to work for its Python 2.7 codebase. Various things were tried before the Dropbox developers settled on type comments. There are lots of tools out there that think they know how to parse Python or projects that are using a different implementation of Python, which makes it difficult to simply adopt the Python-3-based type annotations in 2.7. He did not want to make changes to the upstream Python 2.7 code to support the annotations.
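A type comment puts the PEP 484 signature in a comment on the first line of the function body, so the file remains plain Python 2.7 to every tool that does not understand hints:

    # A Python 2-compatible signature via a PEP 484 type comment; the
    # comment is inert at runtime, so the source stays "unadulterated"
    # 2.7 code while mypy still checks it.
    def scale(values, factor):
        # type: (list, float) -> list
        return [v * factor for v in values]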

Someone from Google spoke up to say that the company does use the type annotations in 2.7 code, but has a "stripper" that removes them for tools that don't understand them. But Van Rossum said he wanted "unadulterated 2.7 code". He also looked at using Python docstrings, but there are already some "pseudo type annotations" in the docstrings that have not kept up with the code changes.

PEP 484 has "provisional" status, which has allowed more features to "sneak in" after its release in Python 3.5.0. All of these new features will show up by 3.5.2.

Van Rossum then went through some of the changes to PEP 484 since it was accepted. The @overload decorator for overloaded functions was originally only allowed in stub files, but that has been extended to allow it in Python source files as well. Mypy has not yet added support for that, however. There is a new Text type that can be used for code that straddles the 2/3 divide. It is defined as a Unicode string for Python 2 and as the str type for Python 3.
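A minimal sketch of both additions: @overload declarations in a source file (each overload body is just "...", followed by the real implementation), and Text, which is simply an alias for str on Python 3. The function here is invented for illustration:

    from typing import Text, overload

    # Text aliases str on Python 3 (and unicode on Python 2), for code
    # that straddles the 2/3 divide.

    @overload
    def double(x: int) -> int: ...
    @overload
    def double(x: Text) -> Text: ...
    def double(x):
        # The real implementation; the @overload stubs above exist only
        # for the type checker and are shadowed by this definition.
        return x * 2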

There is also a new Type[C] type for use with classes. The C argument is a class and any arguments that use that annotation must be a subclass of C. That will allow factories to specify their class and return type. There are types to explicitly support the new coroutine syntax using async and await, though mypy does not yet support it.
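A factory sketch using Type[C]; the class names are invented for illustration:

    from typing import Type, TypeVar

    class Connection:
        pass

    class SecureConnection(Connection):
        pass

    C = TypeVar("C", bound=Connection)

    def connect(cls: Type[C]) -> C:
        # Type[C] tells the checker that connect(SecureConnection)
        # returns a SecureConnection, not merely a Connection.
        return cls()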

Another addition that is coming to PEP 484 is the NewType() feature that will allow creating new type aliases. There has been some difficulty in naming the feature, which caused Van Rossum to jokingly refer to it as "BoatyMcBoatType". He is hopeful that a way to declare variables that doesn't require a type comment can come in some version of Python after 3.6. It would be nice, but is not urgent, he said.
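NewType behaves like an identity function at runtime while giving the checker a distinct type; a small sketch with invented names:

    from typing import NewType

    # At runtime, NewType returns an identity function; the distinct
    # UserId type exists only for the checker, which will reject a plain
    # int where a UserId is expected.
    UserId = NewType("UserId", int)

    def get_name(uid: UserId) -> str:
        return "user-%d" % uid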

Alex Gaynor asked if it was time to consider adding type annotations directly into the standard library. When the feature was first added, Van Rossum said that he did not want annotations in the library, and to put them in stub files, but is it time to readdress that?

Van Rossum said that the standard library is "very crusty code" and he worries that someone adding type annotations for everything in it will either make mistakes in 1% of the code or create merge conflicts all over the place. Either would be painful. For new modules, though, he would accept type annotations as part of the submission. Right now, producing stubs for the standard library is working well, it is parallelizable and creates no merge conflicts. Eventually, he said, his answer on this will change.

Comments (none posted)

An introduction to pytype

By Jake Edge
June 8, 2016

Python Language Summit

Google's pytype tool, which uses the PEP 484 type hints for static analysis, was the subject of a presentation by one of its developers, Matthias Kramm, at the 2016 Python Language Summit. In it, he compared several different tools and their reaction to various kinds of type errors in the code. He also described pytype's static type-inference capabilities.

There are several different tools using type hints at this point (mypy, PyCharm, and, soon, Pylint). Kramm showed a short example program with a type error:

    def f(x: int):
        return x
    f("foo")

He then showed the results of running mypy, pytype, and PyCharm on the program. As expected, each complained that the type of the argument in the call to f() was wrong, though they each had their own way of indicating that.

He then moved on to some examples where mypy and pytype differ on whether there is a type error or not. For example, an argument annotated as an Iterable[float] (i.e. an iterable object, like a list, of floats) that was passed as a list of strings ([ "1", "2" ]) would cause pytype to emit an error, but not mypy. On the other hand, mypy looks at what is done with the argument inside the function:

    def f(x: List[str]):
        x.append(42)
That will cause a complaint from mypy, but not pytype, because pytype interprets the annotation as only applying to what is passed in. There were a few other examples where the two tools "disagree on what PEP 484 means" and how the type annotations should be interpreted. For the most part, though, the two tools are in sync, Kramm said.
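The two disagreements can be reconstructed roughly as follows; the function bodies are invented so the code also runs, since only the checkers' static behavior is the point:

    from typing import Iterable, List

    def head(xs: Iterable[float]) -> float:
        # pytype rejects head(["1", "2"]): a list of str passed where an
        # Iterable[float] is declared. mypy, at the time, accepted it.
        for x in xs:
            return x
        return 0.0

    def tag(x: List[str]) -> None:
        # mypy flags appending an int to a List[str]; pytype reads the
        # annotation as constraining only what callers pass in, so it
        # does not complain here.
        x.append(42)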

After that, he turned to a pytype feature that is not present in the other tools: static type inference. Pytype can analyze Python source files to infer the types of arguments and function return values. It outputs the type annotations into a stub (.pyi) file. There is a merge_pyi tool that can then be used to put the annotations from the stub back into the Python source file.

He gave a number of examples of the type inferences that pytype can make. For example:

    def get_label(self):
        return self.name.capitalize()
That method would be annotated with a str return type based on the type returned by capitalize(). A more complicated example:
    def dict_subset(d):
        return {key: d[key] for key in d.keys()
                if key.startswith(PREFIX)}
In this case, pytype would infer that both the argument d and the return type are of type Dict[str, Any] (i.e. a dictionary with a string key and any type for a value).

The tool will typically infer more types than might be expected. Types such as slice objects or complex objects may be overlooked by the function's author, but pytype will still take them into account.

For the future, there are plans to add support for duck typing, which might require changes to the existing type hints. There are also ideas about handling dependency graphs, so that large, existing projects can be processed. Right now, any imported modules need to have their types annotated before type inference can be done on a given file. Even for straightforward dependency graphs, that can be somewhat painful. For large projects (he showed rather complicated dependency graphs for several), it will require a tool to sort things out.

At the end of the talk, there was a short, fast-paced discussion among attendees about what kinds of annotations might be needed in the future and whether PEP 484 needs some additions.

Comments (none posted)

PyCharm and type hints

By Jake Edge
June 8, 2016

Python Language Summit

A mini-theme at this year's Python Language Summit was tools that are using the PEP 484 type hints. In the final session on that theme, Andrey Vlasovskikh, the community lead for the PyCharm IDE, described that tool's support for type hints.

Vlasovskikh started by showing a graph of how many PyCharm users were developing in Python 2 versus Python 3 (presumably gathered using the usage statistics feature). As might be expected, Python 2 use has been steadily declining since 2013, while Python 3 use is rising. Currently, it is roughly 50% Python 3 and 70% Python 2, with 20% using both. What might not be expected, though, is that extending the trend lines shows them crossing in December 2017.

After that interesting tidbit, he switched gears. PyCharm already had its own type system that was similar to the one used in PEP 484, so it was not that difficult for PyCharm to switch to supporting the new type hints. But there are still some parts of the PEP that are not yet supported in PyCharm.

[Andrey Vlasovskikh]

Type-hints support was added to PyCharm in November 2015, but users only started to try the feature in March of this year, so there are no real statistics available yet. There are some problems that the PyCharm team has seen. Some users are confused by the Optional type (which effectively allows None as a valid value in addition to any other types specified) and how it relates to arguments with a default value of None. There is also interest in having type hints available for Python 2.
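The source of the confusion is that Optional describes the type, not whether the argument may be omitted; a default of None must still be spelled out separately. A small sketch, with an invented function:

    from typing import Optional

    def greet(name: Optional[str] = None) -> str:
        # Optional[str] means "str or None"; it does not by itself make
        # the argument optional. That still requires the "= None"
        # default.
        if name is None:
            return "hello, world"
        return "hello, " + name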

PyCharm does not support strict Union checks or the Type[C] annotation that will allow a class (or its subclasses) to be specified. It also does not use the annotations from the typeshed repository, yet, due to concerns about incomplete annotations for some modules.

One area needing more work is handling text and binary string types for programs that run on both Python 2 and 3. It is important, Vlasovskikh said, and roughly 20% of PyCharm users are using both versions in their projects. The addition of the Text type, which is an alias for str in Python 3 and unicode in Python 2, is a step in the right direction, but more is needed.

The PyCharm developers have sent proposals to the python-ideas mailing list, but they have not been adopted. One would track ASCII values in the code to check for implicit conversions (which is also discussed in this mypy issue). Another was a proposal to give up on strict text versus binary checking in Python 2 that was rejected. Vlasovskikh planned to work on the problem during the sprints that occur right after PyCon and invited others to join in.

Comments (none posted)

Python 3.6 and 3.7 release cycles

By Jake Edge
June 8, 2016

Python Language Summit

Ned Deily, who is the release manager for the upcoming Python 3.6 release and will "probably be the 3.7 release manager", led a session at the 2016 Python Language Summit to review and discuss the release cycle for the language. There have been some changes for 3.6 compared to the 3.5 cycle and there may be opportunities to make some additional changes for 3.7 and beyond.

The 3.6 cycle began with the feature development phase, which runs from May 2015 until September 2016. In 3.6, this phase started sooner than in earlier cycles; it started when the first 3.5 beta was released, which corresponded to the 3.5 feature freeze. Previously, the new feature development phase did not begin until the final release, so overlapping with the beta and release-candidate phases shaves roughly eleven weeks off the full cycle.

[Ned Deily]

The 3.6 alpha phase is currently ongoing. The first of four alphas was released in May and the phase will continue until the first beta is released in September. Feature development can continue unabated up until that time.

Once the first beta is released, feature development for 3.6 is done, but new features for 3.7 can start to be added to that branch. Meanwhile, four betas will be released before the beta phase ends in December with the first release candidate. At that point, the code is frozen except for any emergency fixes during the release-candidate phase.

Deily is hoping to keep the release-candidate phase quite short. The goal is to have a single release candidate that is the same as the code for the final release, but emergency fixes may lead to a second release candidate. He is planning to make the final release on December 16, which, coincidentally, is his birthday.

At that point, 3.6 goes into bug-fix mode, which will last until early 2018 (or a few months after 3.7 is released). There will be periodic maintenance releases for bugs, regressions, and documentation fixes. No new features will be added to 3.6 unless there are security or platform problems that require them.

The next phase is for security fixes only. That starts once the bug-fix phase ends and will last until December 2021 (five years after the original release). These are source releases only and are done on an as-needed basis to address security problems. Once 3.6 hits its end of life in 2021, the source branch is retired (as is the release manager, he said with a grin).

The 3.7 development phase will begin in September, overlapping the beta and release-candidate phases for 3.6. If the same cycle is adopted for the rest of the 3.7 phases, there will be a year for feature development. The final 3.7 release would be in April 2018, which is roughly a nineteen-month release cycle. But, because of the overlap, the actual releases will be around sixteen months apart, which is less than it has been in the past.

Deily then brought up some topics to think about, though he was not expecting decisions to be made on the spot. Should Python be released more often? Or, possibly, less often? The latter is easy, but he said that doing releases more often is easier than it might have been in the past. Because of automation, it is easier for the release team to make the releases. In addition, releases are less risky since there are more buildbots and fewer unusual platforms to handle.

Someone asked if it made sense for Python to consider aligning its release cycle with Ubuntu long-term support (LTS) schedules or those for other distributions. Deily said that there could be benefits to doing so. Nick Coghlan said that what was mainly needed was a cycle that was a multiple of six months. Distributions can pretty easily work with those, he said, but it gets much trickier when the cycle is variable.

Barry Warsaw said that it is fairly straightforward for distributions to start using the beta releases if the final release will land in the right time frame. Unfortunately that doesn't always work out, Coghlan said. The Python 2.7 release turned out to be coming a little late for the RHEL 6 development cycle, so RHEL 6 shipped with 2.6. But 2.7 was actually released by the time RHEL 6 appeared, which has been painful. He did admit that the secrecy surrounding RHEL release schedules played a role in all of that, however.

Deily wondered if it might make sense to recognize that the alpha phase is really a sub-phase of the development phase. By starting the alphas earlier in the development phase or reducing from four alphas to three, a yearly release cycle might be possible.

But Warsaw cautioned that "new versions are not cost-free". There is a lot of work that goes on downstream after a release. In addition, Coghlan said, the six months between the 3.y.0 and 3.y.1 releases is important to allow the ecosystem to catch up with the new release. During that time, binaries and installers are built, frameworks are updated to the new version, and so on. He is "not sure that speeding up our cycles would actually speed up feature adoption".

There are lots of libraries and packages that don't even start working on adapting to new versions of Python until they show up in a distribution, Warsaw said. For Python release cycles, "predictability is more important than length", he added.

Deily said that he just wanted to start the discussion. There is plenty of other work to do right now (3.6 features, migrating to Git, etc.). Once the first 3.6 beta is out, there will be time to discuss more concrete proposals for 3.7 and beyond.

Comments (none posted)

Page editor: Jonathan Corbet


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds