
Leading items

Welcome to the LWN.net Weekly Edition for February 7, 2019

This edition contains the following feature content:

  • Saving birds with technology: the Cacophony Project's use of thermal imaging and machine learning against introduced predators in New Zealand.
  • Python elects a steering council: the results of the election for Python's new governing body.
  • Rusty's reminiscences: Rusty Russell's LCA closing keynote on his path into free software and the origins of the conference.
  • Fixing page-cache side channels, second attempt: a new approach to closing the mincore() and preadv2() information leaks.
  • Mozilla's initiatives for non-creepy deep learning: Mozilla projects that aim to provide deep learning without the privacy problems.
  • Lisp and the foundations of computing: a look back at John McCarthy's Lisp paper and the history around it.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Saving birds with technology

By Jake Edge
February 6, 2019

LCA

Two members of the Cacophony Project came to linux.conf.au 2019 to give an overview of what the project is doing to increase the amount of bird life in New Zealand. The idea is to use computer vision and machine learning to identify and eventually eliminate predators in order to help bird populations; one measure of success will be the volume and variety of bird song throughout the islands. The endemic avian species in New Zealand evolved without the presence of predatory mammals, so many of them have been decimated by introduced mammals preying on the birds and their eggs. The Cacophony Project is looking at ways to reverse that.

Menno Finlay-Smits and Clare McLennan started their presentation with a recording of what parts of New Zealand might have sounded like before the arrival of humans and other mammals. Unfortunately, most of New Zealand does not sound like that any more, Finlay-Smits said. The Cacophony Project is a non-profit using open-source hardware and software to help restore the levels of bird song in the country. It is "very much a startup", he said. The project has a vision for where it wants to go, but does not have the solutions yet. Plans change regularly. The project works with other organizations on its aims and, of course, encourages volunteers.

New Zealand's native birds are "our national treasure", Finlay-Smits said. They are different from birds elsewhere in the world, so preserving them is important. They evolved without mammals, which is part of what led to multiple species of flightless birds in New Zealand. "They really need our help." The government is spending around NZ$70 million per year to control pests, but that only serves to suppress the numbers; it will never get them down to zero. There are also benefits to agriculture that come from eliminating these pests.

The current technology for identifying and capturing predatory mammals is fairly primitive. There are chew cards, which are plastic cards to which bait (e.g. peanut butter) is applied. Based on the different kinds of bite marks, the types of pests (e.g. rats, possums, stoats) can be determined. The chew cards work, but are not a great way of knowing what's in the area, he said. There are also wooden box traps that are somewhat effective at catching and eliminating the pests, though less than 1% of the animals that walk near them ever interact with them. Part of the reason is that the traps are "handicapped intentionally in their design" so that they do not capture animals other than the target type.

The project is looking for "something that is radically better": a device that can cover 100 times the area, catch four kinds of pests (the current traps target a single type), catch at least ten times as often, and that auto-resets so that it can do multiple catches without intervention. That would mean that each device was 4000 times as effective as the current technology.

Technology ecosystem

The project has a whole "ecosystem of technologies" that it is building. There is an audio recorder project that will be used to determine how well the project is doing on its goal. There is also a thermal-video platform to record and analyze data. A "sidekick" phone app is used to manage the platform; there are various analysis and visualization tools and a "bunch of stuff in the cloud", he said.

The audio recorder, which is called the "cacophonometer", is based around an Android phone. The idea is to have a cheap and easy way to measure bird song throughout the day. It wakes up once per hour, records the bird song in its location, and sends it off to be analyzed. The project is currently in the phase of getting the devices out there to start gathering the baseline data.

[Demo]

The thermal-camera platform is in the prototype phase; it uses a Raspberry Pi as the onboard computer. The pests that the project is interested in are mainly active at night, so they show up nicely against the cool night air in thermal video. He demonstrated the device by pointing its camera at the audience (seen in a picture on the left). There is a web server in the device that displayed the camera feed for the demo; it is used in the field to ensure that the camera is pointed in the right direction before walking away from the site.

The platform has the Raspberry Pi, a 3G or 4G modem, and a camera, all housed in a waterproof enclosure. The Raspberry Pi has a "hat" (daughterboard) with a real-time clock and some other circuitry, such as a microcontroller to turn the camera device off for 18 hours so that it only runs at night and uses less power. Running on this hardware is challenging, Finlay-Smits said. Putting electronic devices in the wild is "fraught with problems". The project struggled with waterproofing early on, but the enclosure now has good seals and waterproof connectors. In addition, it has a "Gore valve" that allows the pressure to change within the enclosure without allowing water to get in; early experiments did not have that and the pressure changes due to heat wore out the seals fairly quickly.

The camera being used is the Lepton 3, which is "quite reasonable for its price". It is not super-high resolution, but is good enough for the project's needs. It is somewhat difficult to use reliably, though; reading from the camera fast enough to keep up with the data stream without losing sync proved to be a challenge. The project switched from Python to Go for reading from the camera and gave the capture process realtime priority in order to keep up with the data.
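
As a rough illustration of that last point, the snippet below is a sketch in Python (the project's actual capture code is written in Go) of giving a capture process realtime scheduling on Linux so that it is not preempted while draining the camera's frame stream; the priority choice is arbitrary and the call requires root or CAP_SYS_NICE.

    import os

    # Ask for SCHED_FIFO realtime scheduling for this process (illustrative
    # only; the Cacophony capture code itself is written in Go).
    priority = os.sched_get_priority_max(os.SCHED_FIFO)
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))

    # ... the loop that reads and re-synchronizes Lepton frames would run here ...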

The project has struggled with battery power a bit, he said. It started by using devices on mains power, sometimes with really long extension cords, but there is a limit to that. So it switched to off-the-shelf USB batteries that the developers added weatherproofing to, but the batteries turned out to be "too smart"; when the system shut down to save power, the batteries would follow suit and require human intervention to turn back on. Now there are some custom-made battery packs that are not as smart, thus work better for the device's needs.

Processing the video

At that point, Finlay-Smits handed off to McLennan to talk about how the video imagery is processed. She said that the first step is to do motion detection, but Cacophony does it differently than the existing crop of trail cameras that are used by game hunters and the like. Those devices have an infrared motion detector that turns on the camera, but it can take half a second for the camera to turn on. Stoats and rats are small and fast, so the project always has the camera on in order to detect motion.

[Clare McLennan]

She showed some footage of a stoat crossing the full frame in less than a second, which a trail camera would largely miss. So the camera stays on all of the time, which is working well but uses more power than the developers would like, especially now that the device is running on batteries. Some users have traps that they have trail cameras pointed at; they note that the bait is missing but the camera never got any footage—or even a still. The project's device can show them footage that gives them confirmation that there are animals in the area. For remote islands where these pests have been eliminated, it is critical to detect their return as early as possible, she said.

One problem is that distinguishing animal motion from wind is quite difficult. She showed footage of grass moving in the wind that repeatedly confused the motion-detection software. Finlay-Smits noted that, at the end of the day, trees may be warm from the day's sun and cooler leaves moving across them can confuse the detector. In addition, McLennan said, some animals don't appear all that warm; hedgehogs are warm on their bellies, but not on their "prickles", for example.

She showed more footage of creatures interacting with, and often outwitting, the traps. But if the motion detection mostly works well and you give people a way to look at the results from traps they have set, they will do so. They will get up in the morning to visit the web site to see what happened overnight; they will provide lots of feedback on what happened, as well as relating problems with the software or device design. In addition, "they will send you gruesome pictures of rats", she said with a chuckle.

The next stage is to identify animals in the footage without a human having to look at it. That is done using machine learning. In order to focus the training of the machine-learning model, they have identified and pulled out blocks with animals in them. Those blocks are linked frame to frame so that the model can also learn how the animals move. A bird and a rat have a similar size and shape at low resolution, but they move very differently.

For the animal footage, the background is eliminated and then a mask of the pixels that are part of the animal is created. The footage she showed was fairly obviously a possum in context, but when she showed a zoomed-in, pixelated image of just the animal in the thermal false colors, it really showed the scope of the problem the project is trying to solve. In addition, it is not always as simple as following a single animal as there can be more than one at once; occlusion of parts of an animal as it moves can also complicate things substantially.
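
A toy sketch of that first step, using the NumPy and OpenCV libraries mentioned below, might look something like the following; the thresholding scheme, frame format, and function name are invented for illustration and are much simpler than the project's actual pipeline.

    import numpy as np
    import cv2

    def animal_mask(recent_frames, frame, factor=2.0):
        # Estimate the static background as the per-pixel median of recent
        # thermal frames, then treat anything sufficiently warmer than that
        # estimate as part of a moving animal.
        background = np.median(np.stack(recent_frames), axis=0)
        diff = frame.astype(np.float32) - background
        mask = (diff > factor * diff.std()).astype(np.uint8)

        # Keep only the largest connected warm blob as the animal mask.
        count, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        if count < 2:
            return None          # nothing warm enough was found
        biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
        return (labels == biggest).astype(np.uint8)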

Once the tracks are established in a video stream, the last stage is to classify the animal. In the end, the developers really want to be able to put them into one of two boxes: predators or non-predators. Humans, kiwis, and other birds are non-predators, while rats and mice, hedgehogs, possums, and stoats (a category that also covers ferrets and weasels) are all predators. Cats are difficult, since they are predators, but, depending on context, the project may not want to treat them that way. "People are passionate about their cats", she said.

Currently, videos are labeled by animal type for training purposes; eventually, the individual tracks should be labeled. Each track is reduced to a 48x48 pixel, three-second clip. The model is trained with this data, tested with some other test data, and when it is deemed ready, it is evaluated on a different set of videos. Early on, there was a limited set of data and the only possum footage had them climbing up trees; that led the model to conclude that anything that had a tree in it was a possum. Training takes around six hours on a computer with a GPU; the project is looking at using Google Compute Engine to speed things up.

The model is built using NumPy, OpenCV, and TensorFlow. It creates a recurrent convolutional neural network; the "recurrent" part means that it has memory, she said. The memory allows the model to take into account previous frames so that the movement of the animal is part of the decision-making process.
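
A minimal sketch of such a network, using the Keras API in TensorFlow, is shown below; the frame count, layer sizes, and number of classes are invented for illustration and this is not the project's actual model.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    FRAMES, SIZE, CLASSES = 27, 48, 6   # e.g. 27 frames of 48x48 pixels, six animal types

    model = models.Sequential([
        layers.InputLayer(input_shape=(FRAMES, SIZE, SIZE, 1)),
        # The convolutional layers look at each frame independently...
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D()),
        layers.TimeDistributed(layers.Flatten()),
        # ...while the recurrent layer carries memory from frame to frame,
        # so how the animal moves feeds into the classification.
        layers.LSTM(64),
        layers.Dense(CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")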

The main problem the project runs into is garbage in, garbage out, McLennan said. A single mislabeled track can confuse things quite a bit. In addition, the project is always looking for more video to use because more diversity of scenes and animal activity helps train better models. But 80% of the work actually goes into the processing and infrastructure to store, tag, and organize the footage.

The future

The next steps are for the project to get better at what it has been doing, she said. When there are multiple animal tracks, it would be helpful to be able to prioritize them based on the animal type, for example. In addition, there may need to be some training with dogs so that they can be recognized and avoided. The main question, eventually, is whether or not a trap should be opened based on what type of animal is present.

As time for their talk was running low, she handed off to Finlay-Smits so that he could talk about some future plans for the overall project. In the near future, the project needs to get the machine-learning model running on the camera devices; right now, everything runs in the cloud, which is not practical for remote sites. Once that is available, the project will have a device that can report on the numbers of pests it observes, which is an important starting point for many organizations, he said.

There is also an effort to create a "cacophony index" from the audio data that is being gathered by the cacophonometers. That will allow looking at the changes in bird song over time, for different seasons, and so on. Using audio to lure in these pests is another experimental technique the project is trying. By playing various types of sounds on a schedule or based on the presence of certain animals, researchers should be able to determine the effectiveness of the technique and which sounds work better or worse. If it is successful, it would mean that the range of the traps is increased, so fewer would be needed in a given area.

In five years, the project is looking at pairing the camera with a gun turret that fires paper "bullets" with toxin into the fur of these pests. The idea is that the pest will then groom itself, ingest the toxin, and go off somewhere to die. This would allow a ten-meter radius to be covered, for example, and, unlike the traps, would not require a human to clear it. It obviously requires a lot of safety and legal review, but it has a lot of advantages, Finlay-Smits said.

Another possibility is drones, he said, not for shooting, but for scouting. They could be deployed in places that people did not need to walk to, gather some data over some number of days or weeks, then return. He has been told that the obstacle-avoidance software in drones makes it entirely possible to deploy in hard-to-reach places such as forest canopies—at least in a few years.

The project is all open source, he said. Like all such projects, Cacophony is always looking for help. As time expired, he said that there are lots of small projects that need doing using various languages and tools.

Video of the talk is available in WebM format or on YouTube.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Christchurch for linux.conf.au.]

Comments (4 posted)

Python elects a steering council

By Jake Edge
February 4, 2019

After a two-week voting period, which followed a two-week nomination window, Python now has its governance back in place—with a familiar name in the mix. As specified in PEP 13 ("Python Language Governance"), five nominees were elected to the steering council, which will govern the language moving forward. It may come as a surprise to some that Guido van Rossum, whose resignation as benevolent dictator for life (BDFL) led to the need for a new governance model and, ultimately, to the vote for a council, was one of the 17 candidates. It is perhaps much less surprising that he was elected to share the duties he once wielded solo.

The other members of the steering council are Barry Warsaw, Brett Cannon, Carol Willing, and Nick Coghlan. Other candidates and their nomination statements are available as part of PEP 8100 ("January 2019 steering council election"). Warsaw, Cannon, and Coghlan are likely recognizable names to those who follow Python development (as is, of course, Van Rossum). Willing is perhaps less well-known in the Python world, even though she is a Python core developer, has been a member of the Python Software Foundation (PSF) board of directors, and is a core developer and steering council member for the Jupyter project.

The number of candidates for the Python steering council was rather large, especially when compared with either the eligible voter pool (96) or the number who actually cast ballots (69). Voting was restricted to active core developers, though nominees could come from outside of that set. Some concerns were expressed about allowing external nominees, but the PEP did explicitly allow core developers to nominate "outsiders". Three of the nominees were not on the list of eligible voters: David Mertz, Peter Wang, and Travis Oliphant. However, Oliphant is a former core developer as can be seen in his nomination thread. The rest of the candidates are a mix of both older and newer core developers with interests ranging throughout the Python ecosystem.

Ultimately, voters decided not to elect any of the outsiders. Part of the reason for that might have been the voting method itself, which allowed each core developer to vote for zero to five of the candidates. That is rather different than true "approval voting" where voters can choose any number of the candidates. As a thread in the Committers Discourse discussion group showed, though, that was something of an oversight. The Django election process document that was used as a starting point for the PEP that was eventually adopted (PEP 8016, "The Steering Council Model") used that scheme and no one brought up the "vote for zero to five" mechanism until after the PEP became "law". As Nathaniel J. Smith, who was a co-author of the PEP, put it:

Anyway, I'm not particularly defending the at-most-N variant, just explaining how we ended up with it. It was in the Django text we started with, it didn't jump out at us as something that needed changing, it didn't jump out at anyone who read the proposal as something that needed changing, and here we are. It's too bad no-one noticed this back during the review period when it was easy to change, so I guess we're stuck with it for this election, but if people want to revise it for the next one then I won't stand in your way :-).

It might make sense to collect up several small "amendments" like this to handle all together, a few months from now once things have settled down. I know Łukasz [Langa] would like to tweak the council term to decouple it from the release cycle, and probably we'll run into a few other nits as we start using the new system for real.

One of the concerns that came up in the thread was that a restricted number of votes could lead to a landslide victory for the best-known candidates. In the Discourse thread, Tim Peters linked to a description of "bloc voting"—what is being used for the council election—which described the problem this way:

The bloc voting system has a number of features which can make it unrepresentative of the voters' intentions. It regularly produces complete landslide majorities for the group of candidates with the highest level of support, though this does tend to lead to greater agreement among those elected.

Coghlan thought that pure approval voting might lead to the "old guard" getting elected, though Peters's intuition led him to the opposite conclusion. So far, the full vote totals for each candidate have not been released (though that is being worked on), but voters can see those totals. In a post to the python-committers mailing list, Peters used that information to show that the landslide did seem to happen:

[...] As predicted by a brief article I linked to on Discourse, limiting the number of approvals to 5 favored a landslide victory of the best-known candidates. Except for Nick, the weakest "winner" got 50% more approvals than the strongest "loser". So "landslide" for 4.

In pure Approval voting (which we've used for PSF Board elections), there is no limit, and then you get a clear picture of approval levels. The "losers" here should realize their relatively low approval levels _may_ be an artifact of the voting process. Like in "first past the post" plurality elections, with a limit there's pressure for voters to betray their actual favorite(s) if they _think_ they can't win, to avoid "wasting their vote". Without a limit, there's never a reason (regardless of whether a voter is 100% honest or 100% tactical) not to approve of your true favorites.

While it is possible that the outcome did not exactly reflect the "will of the voters", it seems likely that it will serve the project just fine for the first incarnation of the council. Tweaks to the voting method and possibly other issues with the governance model may come about over the next year or so. The current council will serve until the release of Python 3.8, which is currently scheduled for October. After that, the electorate will get another chance to choose council members.

In the meantime, though, the PEP-decision process is something that the council will need to work on. As it currently stands, there is no real mechanism to approve PEPs, which is obviously sub-optimal. Ultimately, the power rests with the steering council, but the council deciding every PEP itself may or may not be how things work in practice. The various governance proposals, as well as the discussion around them, seemed to indicate that delegating PEP decisions to an appropriate person might be the preferred path forward. That is not anything completely new, of course, as Van Rossum would sometimes pass his PEP-pronouncement power to a BDFL delegate.

It has been just over six months since Van Rossum stepped down. Over that time, the core developers have figured out how to choose a governance system, proposed a half-dozen systems, and voted to pick one of those. In the past month, they have nominated 17 candidates and chosen five of them to serve on the steering council. At this point, we should start to see what changes come about and perhaps get resolution on some dangling PEPs. It is not common to see a community completely change its governance in this way, so it will be interesting to watch it all play out.

To a large extent, the voters chose to stick with the status quo. Obviously, Van Rossum represents continuity with the previous regime, but the others elected were also high-profile decision makers for the language. That would seem to indicate fairly small changes ahead. What exists today, both in the language itself and in its governance, is not likely to see any radical changes, at least in the short term—which is probably for the best for a mature, nearly thirty-year-old language.

Comments (1 posted)

Rusty's reminiscences

By Jonathan Corbet
February 1, 2019

LCA
Rusty Russell was one of the first developers paid to work on the Linux kernel and the founder of the conference now known as linux.conf.au (LCA); he is one of the most highly respected figures in the Australian free-software community. The 2019 LCA was the 20th edition of this long-lived event; the organizers felt that it was an appropriate time to invite Russell to deliver the closing keynote talk. He used the opportunity to review his path into free software and the creation of LCA, but first a change of clothing was required.

Russell formally left the kernel community in 2017 to pursue the blockchain dream; his entrance at LCA 2019 reflected the wealth that resulted. Or that was maybe supposed to result; the chart of the value of Bitcoin he put up reflected the real-world experience. After a show of dismay, Russell stripped down to the obligatory LCA uniform (shorts and an LCA T-shirt) to get into his talk where he would, he said, "misremember history into a narrative that reinforces my personal biases".

He found his way into the Unix world in 1992, working on an X terminal connected to a SunOS server. SunOS was becoming the dominant Unix variant at that time, and there were a number of "legendary hackers" working at Sun to make that happen. But then Russell discovered another, different operating system: Emacs. This system was unique in that it was packaged with a manifesto describing a different way to create software. The idea of writing an entire operating system and giving it away for free seemed fantastical at the time, but the existence of Emacs meant that it couldn't be dismissed.

Even so, he took the normal path for a few more years, working on other, proprietary Unix systems; toward the end of that period he was leading a research project developed in C++. The proprietary compilers were too expensive, so he was naturally using GCC instead. He did some digging in preparation for this talk and found his first free-software contribution, which was a patch to GCC in 1995. The experience of collaborating to build better software for everybody was exhilarating, but even with all the fun he was having, there was another level to aim for.

Encountering Linux

In 1997, he went to a USENIX conference which featured a special "USELINUX" track. Developers like Ted Ts'o, Alan Cox, Dave Miller, and Stephen Tweedie were all there; Russell found himself standing nervously among them as they were talking about implementing SMP. Miller grabbed a napkin and jotted down some x86 assembly code on it; Cox then optimized it on the spot. The highlight of the event, Russell said, was a talk by Miller and Miguel de Icaza on the SPARC port of the Linux kernel; the talk was not recorded but Russell is confident that it was the greatest technical talk ever given. It started with some lmbench benchmark results showing Linux performing rather poorly compared to Solaris. Miller went into a great many details regarding the SPARC architecture and how Linux could be made to perform better on it; Russell "understood some of the words". By the end of the talk, Miller showed that Linux now outperformed Solaris on every benchmark.

At one point, Russell talked with Jon "maddog" Hall, who said that it was hard to describe the pre-Unix experience to those who had not been there. Something similar holds with regard to the world before and after this USENIX event. Russell had walked into the room not thinking that a group of students could cobble together a system, a couple more could then hack on it, and the result would beat what the professionals — those legendary hackers — were producing. Afterward, it was obvious that things could work that way.

Russell's immediate conclusion was that he wanted to work with these people. Later in 1997, he recruited Michael Neuling to work on a firewall implementation, which he had anticipated would be merged alongside the existing code in the kernel. Miller looked at the code and not only applied it, but replaced the old firewall code with it entirely. Russell woke up one day and found that he had become the kernel's firewall maintainer.

In 1998, he returned to USENIX and ran into somebody who was looking to hire an ipchains expert. Russell explained his plans for an ipchains successor, saying "give me the money and I can do it in six months". Amazingly, the pitch worked. The implementation, this being software, still took twelve months to come together.

The genesis of LCA

These experiences convinced him of the need for an Australian Linux conference, so he decided to organize one. Part of the process involved visiting a number of Australian user groups to recruit attendees; the most common question he got was something like "why would I go to a Linux conference?" At the time, most people working with Linux were students and hobbyists who were developing free software in their spare time. It was hard to explain to them why a Linux-specific gathering would be cool; he could mention speakers and talks and such, but that is missing the forest for the trees. The job is even harder, he said, when the conference in question doesn't actually exist yet.

Beyond attendees, he also needed speakers. That required paying for long-distance travel and more — a challenge. The first speaker he invited was, of course, Miller. In the end, Hall, Federico Mena Quintero, and Carsten "Rasterman" Haitzler also agreed to come. The old story that he [Rusty Russell] funded the whole thing on his credit card turns out to be true; at one point he got a call from American Express and had to fax them a copy of his bank statement to show that he could pay the bill.

The actual conference had no WiFi network (this was 1999, after all), so somebody printed the Slashdot front page every day and posted it at the venue. The last time slot was kept empty so that the best three talks could be chosen to be presented again — a tradition that LCA kept for some years. There was, naturally, no video. The conference did include some tutorial sessions, for which the materials had been requested from the speakers two weeks before the event. That didn't happen, so a lot of last-minute copying had to be done; that was a problem, since the school term was starting and all of the copy shops were fully booked. So 400 book copies were made by a group of volunteers feeding money into a set of coin-operated copy machines.

Andrew Tridgell brought a big machine with three CD burners, which were used to crank out copies of the conference proceedings — a process that took about two days. As the event began, somebody complained about the lack of a conference T-shirt, to which Russell answered something like "I am so tired right now" and the person went away. But then somebody bought a bunch of white shirts and found a silkscreen printer at the university; they then proceeded to crank out a set of shirts featuring the front page of the conference web site.

Tridgell collected donations from the speakers to buy a gift for Russell, which he found touching. That tradition remains with LCA to this day. Russell said that he thoroughly learned the most important lesson from having organized a conference: "never again". Others have stepped up since, though, with the result that LCA has become one of the premier Linux events; he has given 19 LCA talks since then.

After 1999

LCA, as his Conference of Australian Linux Users came to be called, has shaped his life. He met Tridgell, which led to his moving to Canberra and working at IBM's OzLabs, which he said was the greatest concentration of Linux kernel hackers to ever work in a single location. He had found a tribe of hackers who made him a happier and better person. All great projects, he said, come down to a small group of people — people who are smart enough to complete the job, and who are dumb enough to try. He has been lucky enough to be a part of this kind of project.

Russell's takeaway from this experience is that, when somebody has something that they want to try, and it's not actively harmful, the best responses are "sounds great, tell me more", and "what can I do to help?" You can make people far more likely to succeed by actively collaborating with them. Collaboration is "a superpower", but it is not always easy. Something he learned far too late (and is still working on) is that getting along with people is a skill in its own right. But it's a skill that is well worth picking up.

There are, he said, "headwinds to collaboration" all over our community. One of those is impostor syndrome, which causes people to shy away from roles in our community. As he was working on his exit from the kernel community, he found a developer who, he thought, would be a fine person to take over the maintainership of the kernel's module loader, but she did not feel up to the job. Six months later, though, he tried again and she accepted. With enough encouragement, people can be convinced to step up and take on responsibilities within our community.

Another headwind is a general attitude that tells people that "you don't belong here". People, he said, are well tuned to "go away vibes" that are aimed at them, but they are also good at not even seeing them otherwise. The recipients of such messages don't get to collaborate in our community; there are no superpowers for them. That is hurtful for the people involved, but also for our community as a whole.

Russell concluded by saying that people with the skills needed to work within our community are rare, but they are wonderful, and he wants to spend as much time around them as possible. If you want to work on the projects that he is involved with, and you have the skills to do so, he wants to work with you.

A video of this talk is available (also on YouTube).

[Thanks to linux.conf.au and the Linux Foundation for supporting my travel to the event.]

Comments (5 posted)

Fixing page-cache side channels, second attempt

By Jonathan Corbet
February 5, 2019
The kernel's page cache, which holds copies of data stored in filesystems, is crucial to the performance of the system as a whole. But, as has recently been demonstrated, it can also be exploited to learn about what other users in the system are doing and extract information that should be kept secret. In January, the behavior of the mincore() system call was changed in an attempt to close this vulnerability, but that solution was shown to break existing applications while not fully solving the problem. A better solution will have to wait for the 5.1 development cycle, but the shape of the proposed changes has started to come into focus.

The mincore() change for 5.0 caused this system call to report only the pages that are mapped into the calling process's address space rather than all pages currently resident in the page cache. That change does indeed take away the ability for an attacker to nondestructively test whether specific pages are present in the cache (using mincore() at least), but it also turned out to break some user-space applications that legitimately needed to know about all of the resident pages. The kernel community is unwilling to accept such regressions unless there is absolutely no other solution, so this change could not remain; it was thus duly reverted for 5.0-rc4.

Regressions are against the community's policy, but so is allowing known security holes to remain open. A replacement for the mincore() change is thus needed; it can probably be found in this patch set posted by Vlastimil Babka at the end of January. It applies a new test to determine whether mincore() will report on the presence of pages in the page cache; in particular, it will only provide that information for memory regions that (1) are anonymous memory, or (2) are backed by a file that the calling process would be allowed to open for write access. In the first case, anonymous mappings should not be shared across security boundaries, so there should be no need to protect information about page-cache residency. For the second case, the ability to write a given file would give an attacker the ability to create all kinds of mischief, of which learning about which pages are cached is relatively minor.

Interestingly, in the cases where mincore() does not return actual page-cache residency information, it reports all pages as being present. This was done out of worries that applications might exist that will make repeated attempts to fault in pages until mincore() confirms that they are present in the cache; reporting a "present" state will prevent such applications from looping forever. But it might also prevent them from bringing in the pages they need, harming performance later. In an attempt to avoid the second problem, Babka has added another patch partially restoring the behavior that was removed from 5.0: if information about page-cache residency for a given region is restricted by the criteria described above, pages will be marked as present only if they are mapped in the calling process's page tables. That will allow a process to observe the effect of explicitly faulting a page in while hiding information about pages that the process has not touched.

It appears that these changes should suffice to close off the use of mincore() to watch the page-cache behavior of other processes without breaking any legitimate use cases. The real world is always capable of providing surprises, though, so these changes will have to be tested for a while before they can be trusted not to break anything. For this reason, they are unlikely to be merged for the 5.0 release. They are likely to be backported to the stable updates, though, if and when they get into the mainline and nobody complains.

In the earlier discussions, though, Dave Chinner pointed out that there are other ways of obtaining the same information. In particular, the preadv2() system call, when used with the RWF_NOWAIT flag, will return immediately (without performing I/O) if the requested data is not in the page cache. It, too, can thus be used to query the presence of pages in the cache without changing that state — just the sort of tool an attacker would like to have. The proposed solution here can also be found in the patch set from Babka; it works by always initiating readahead on the pages read with RWF_NOWAIT. That will bring the queried page(s) into the cache, turning the test into a destructive one. That does not entirely foil the ability to determine whether a given page is in the cache, but it does eliminate the ability to repeatedly query to observe when a target process faults a page into the page cache. That should block most of the attacks of interest.
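
For the curious, the probe is easy to express; the sketch below uses Python's os.preadv() wrapper for preadv2() (Linux 4.14 or later, Python 3.7 or later) and treats an EAGAIN result as "not cached". Before Babka's readahead change, the call could be repeated without disturbing the cache; afterward, each call also queues readahead, so repeating it tells an attacker nothing new.

    import os

    def resident(path, offset=0, length=4096):
        # Returns True if the requested range appears to be in the page
        # cache, False if reading it would have required I/O.
        fd = os.open(path, os.O_RDONLY)
        try:
            buf = bytearray(length)
            try:
                os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
                return True
            except BlockingIOError:        # EAGAIN: data was not resident
                return False
        finally:
            os.close(fd)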

In theory, this change does not affect the semantics of preadv2() as seen by applications. In practice, it could still prove problematic. The existing preadv2() implementation takes pains to avoid performing I/O or blocking for any reason; the changed version could well block in the process of initiating readahead. It is hard to tell whether that change will create performance problems for specific applications, and it may take a long time before any such problems are actually observed and reported. Nobody has suggested a better solution thus far, though.

Assuming that these patches find their way into the mainline, the known mechanisms for nondestructively testing the state of the page cache will have been closed off. It will, of course, remain possible to do destructive testing by simply measuring how long it takes to access a given page; if the access happens quickly, the page is resident. But destructive attacks are much harder to block; they are also harder to exploit. A much bigger problem is likely to be nondestructive attacks that have not yet been discovered; like Spectre, such problems have the potential to haunt us for some time.
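
A destructive timing test is equally simple to sketch: time a single read and guess, from the latency, whether the page was already resident. The one-millisecond threshold below is an arbitrary illustration, and the read itself faults the page in, so the test cannot be usefully repeated.

    import os, time

    def timed_probe(path, offset=0):
        fd = os.open(path, os.O_RDONLY)
        try:
            start = time.perf_counter_ns()
            os.pread(fd, 4096, offset)       # pulls the page in if it was not cached
            elapsed = time.perf_counter_ns() - start
        finally:
            os.close(fd)
        return elapsed < 1_000_000           # guess: under 1 ms means it was resident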

Comments (1 posted)

Mozilla's initiatives for non-creepy deep learning

By Jonathan Corbet
February 6, 2019

LCA
Jack Moffitt started off his 2019 linux.conf.au talk by calling attention to Facebook's "Portal" device. It is, he said, a cool product, but raises an important question: why would anybody in their right mind put a surveillance device made by Facebook in their kitchen? There are a lot of devices out there — including the Portal — using deep-learning techniques; they offer useful functionality, but also bring a lot of problems. We as a community need to figure out a way to solve those problems; he was there to highlight a set of Mozilla projects working toward that goal.

He defined machine learning as the process of making decisions and/or predictions by modeling from input data. Systems using these techniques can perform all kinds of tasks, including language detection and (bad) poetry generation. The classic machine-learning task is spam filtering, based on the idea that certain words tend to appear more often in spam and can be used to detect unwanted email. With more modern neural networks, though, there is no need to do that sort of feature engineering; the net itself can figure out what the interesting features are. It is, he said, "pretty magical".

Moffitt gave a quick overview of some of the structures used for contemporary deep learning, including neural networks, convolutional networks, and recurrent networks. The last of those are useful for speech recognition and synthesis tasks; they are used a lot at Mozilla. See the video (linked at the bottom) for more details about how these different types of networks perform their magic. Regardless of the architecture, the overall technique used to train these networks is the same: present them with input data, then tweak the network's parameters to bring the output closer to what is desired. Do that enough times with enough data, and the network should get good at performing the intended task.

One nice feature of these networks is that it is possible to take a trained model and use it for purposes other than the intended one. A network that has been trained to recognize objects in general, for example, can be pressed into service as the starting point for a face detector. This approach is especially useful in settings where there aren't vast amounts of data available to train the network with. Another useful technique is "generative adversarial networks", where two independent networks are trained against each other. If one network generates fake images and another one detects fakes, both can be improved by pitting one against the other.
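
The first of those techniques is usually called transfer learning; a minimal Keras sketch of the idea, with an invented two-class task and arbitrary layer sizes, looks like this:

    import tensorflow as tf

    # Start from a network pretrained on general object recognition...
    base = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), include_top=False, weights="imagenet")
    base.trainable = False            # keep the general-purpose features frozen

    # ...and bolt a small task-specific head (here: face / not-face) on top.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")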

The dark side

There are many interesting applications of deep learning, he said, but also a dark side. Open-source software can, in general, be used for any purpose regardless of whether the author approves; it can be used to create weapons, for example. Deep-learning applications have their own set of uses that we should all be concerned about, he said.

For example, neural networks have an infinite appetite for data; the more data you can train a system with, the better it will learn its task. That gives huge companies an incentive to acquire as much data as they possibly can. Ostensibly this is done to create better products, but we have to trust these companies that they are not using this data for other purposes. As an example, smart assistants like Alexa will get better at speech recognition as they are trained with more data, so they save a copy of everything that is ever said to them (and sometimes things that are not). That is, he said, "scary".

Deep-learning systems are computationally expensive; it typically takes a huge farm of GPUs to perform the training. Running them is cheaper, but they still don't really fit onto edge devices, with the result that processing moves to the cloud — and all of that input data moves with it. Efficiency does not really appear to be a concern for the people who are designing and building these systems.

There are introspection issues; how does one diagnose problems with a deep-learning system when one doesn't really understand how it comes to its conclusions in the first place? Mistakes are bound to happen, and some of them may have severe consequences. Many of these issues can be solved with more input data, of course, but training data can have unknown biases in it. It will always be possible to get "weird results" from deep-learning systems, and there is no easy way to figure out why when that happens.

Then there is the issue of bias in general. He called out the famous case of Google Photos labeling black faces as belonging to gorillas. Such errors are the result of poor training data and a lack of comprehensive testing; he suggested that perhaps this case shows that Google does not have enough black employees. Word embedding is a useful technique for language processing that tracks the "distance" between related words. A word-embedding system trained on web text is much more likely to associate the word "doctor" with "man" than "woman". Some biases, such as gender-related problems, can be corrected with a technique called "reprojection", but others, such as race, are harder to deal with.

Deep learning at Mozilla

Mozilla has the desire to use these technologies and to make them available to others. But, at the same time, there is a strong desire to avoid the above problems. Moffitt listed a number of projects that, Mozilla hopes, will meet those goals.

The DeepSpeech project is building a speech-to-text system, focused on both recognition and data collection. Existing applications in this space are all owned by big companies; using them involves paying money and sending data to the cloud. DeepSpeech is meant to allow more people to play around in this space. To that end, DeepSpeech has been implemented using TensorFlow. It is able to run in real time on mobile devices, so there is no need to send data to some cloud server. With an error rate of 6.48%, it is the highest-quality open engine available and is close to the natural human error rate of 5.83%.

DeepSpeech currently has models for the English language, mostly because there is a great wealth of suitable data available (free audio books, for example, which allow the speech-to-text output to be compared against the original). Other languages are harder to support, but Mozilla wants to try. The Common Voice project is working to get sample text in other languages, with 20 languages targeted at the outset. It has collected about 1,800 hours of data so far. (See also: LWN's coverage of DeepSpeech and Common Voice from late 2017.)

Another experimental system is called "deepproof", which is a spelling and grammar checker for Firefox. The Grammarly extension for Firefox will do that now, but there is a little problem: it is essentially a key logger, sending everything the user types into the browser to a central server. That's not the kind of extension one might want to install, but Grammarly has a huge number of users, which is scary, he said.

Mozilla has set out to create a replacement that can run entirely within the browser on the user's device. It learns its corrections by example rather than through lots of rules, which is more scalable and requires less language-specific tweaking. The core technique used is to take text from Wikipedia, mutate it in some fashion, then set the system to correcting it; that allows it to learn without the need for language-specific experts. The result "seems to work" but needs more time before it will be production-ready. There are plans for a federated learning system that allows learning from everybody's mistakes but which doesn't require actually sharing everybody's text.

Finally, there is LPCNet, which is a text-to-speech system. These systems tend to be written as end-to-end applications, converting characters to audio spectrograms which are then converted to audio. A lot of systems use an algorithm called Griffin-Lim, but the results don't sound all that great. The WaveNet neural network produces better output, but requires "tens of gigaflops" of computing power to run; WaveRNN is faster than WaveNet, but it is still too expensive to run on a mobile device. Something much more efficient is needed if the objective is to run on end-user systems.

LPCNet works by performing a digital signal-processing pass over the data before feeding it to the neural net; this pass can predict a lot of the resulting output. That allows the network itself to be much smaller, to the point that it can run on a mobile device. Large-network systems like WaveRNN are probably performing a similar sort of filtering, he said, but nobody can know for sure since it's all coded into the network itself. The result "works really well" on mobile hardware and turns out to be useful for a number of other tasks, including speech compression, noise suppression, time stretching, and packet-loss concealment.

At that point Moffitt concluded his talk. For those wanting all of the details, a video of the talk is available; it can be seen on YouTube as well.

[Thanks to linux.conf.au and the Linux Foundation for supporting my travel to the event.]

Comments (14 posted)

Lisp and the foundations of computing

By Jake Edge
February 7, 2019

LCA

At the start of his linux.conf.au 2019 talk, Kristoffer Grönlund said that he would be taking attendees back 60 years or more. That is not quite to the dawn of computing history, but it is close—farther back than most of us were alive to remember. He encountered John McCarthy's famous Lisp paper [PDF] via Papers We Love and it led him to dig deeply into the Lisp world; he brought back a report for the LCA crowd.

Grönlund noted that this was his third LCA visit over the years. He was pleased that his 2017 LCA talk "Package managers all the way down" was written up in LWN. He also gave his "Everyone gets a pony!" talk at LCA 2018. He works for SUSE, which he thanked for sending him to the conference, but the company is not responsible for anything in the talk, he said with a grin.

More history than parentheses

His talk was based around the paper, but not restricted to it. Lisp itself was not really the focus either, so if attendees "were hoping to see tons of parentheses", they may be somewhat disappointed. After he read the paper, it led him to write a Lisp interpreter, which is a fairly common reaction for those who look at the language. In fact, he wrote four or five Lisp interpreters along the way.

[Kristoffer Grönlund]

He started with the period of 1955 to 1958, when two MIT professors, McCarthy and Marvin Minsky, decided to start a new lab at the university. That was the genesis of the MIT Artificial Intelligence (AI) Lab. McCarthy coined the term "artificial intelligence" in that time frame; he was interested in teaching computers to think like humans.

Both of McCarthy's parents were communists and he grew up speaking Russian, Grönlund said. Much of McCarthy's early knowledge of math came from books in Russian that his parents had given him. That is interesting because much of the AI work that McCarthy participated in was done during the cold war and was often funded by various military-oriented organizations.

In the late 1950s, many believed that they were just on the cusp of having computers that could think like humans. The only obstacles foreseen were things like how to represent knowledge in a computer and how to get the computer to use reason and logic. Because of their mathematical backgrounds, the researchers believed that humans use logic, Grönlund said to scattered laughter. But in 2006, McCarthy gave a presentation entitled: "HUMAN-LEVEL AI IS HARDER THAN IT SEEMED IN 1955" (slides). After widespread laughter, Grönlund said that, unlike the beliefs in the 1950s, AI turned out to be really difficult.

It is interesting to contrast the attitudes toward AI in those early papers with what we are seeing today, he said. The advent of things like AlphaGo, self-driving cars, and other deep-learning applications has given rise to lots of optimism that "real AI" is just around the corner. But it may well turn out to still be really difficult.

Prior to 1956, all programming was done using assembly language. That changed with IPL, which was still assembly-based, but added features like list processing; IPL-II was cited by McCarthy as a big influence on Lisp. FORTRAN came about as the first high-level language in 1957. In 1958, McCarthy started to work on Lisp. Those advances came about in just a few years, which is amazing, Grönlund said.

In 1959, the AI lab got a computer, which was rather hard to do in those days. It was an IBM 704 and that specific model had a huge impact on the development of Lisp—both good and bad. These systems were multiple hulking gray boxes that Grönlund likened to refrigerators, with keypunch machines for creating punched cards that were fed into card readers and read into main memory. To get an idea of what that was like, he recommended a recent YouTube video that shows an IBM 1401 compiling and running a FORTRAN II program.

"Computers"

Investigating these old computers led him to the ENIAC Programmers Project. The ENIAC was one of the first computers; it was used to calculate trajectories for the military. Prior to that, during World War II, "computers" were also used for that purpose. Rooms full of women known as "computers" did those calculations by hand. When ENIAC was built, programmers were needed to configure and run it; six of the human computers from wartime were recruited to handle that task.

The task of programming was not considered to be difficult, since it was done by women, and these first programmers were not recognized as such. In the 1990s, the ENIAC Programmers Project was started and some of the women were tracked down and interviewed. There was a similar occurrence in Grönlund's native Sweden: the first computer was programmed by a woman, Elsa-Karin Boestad-Nilsson, who is largely unknown, even in Sweden, he said.

As he was researching and encountering all of these interesting computing pioneers, he ran into another that took him back to McCarthy. Vera Watson was of Chinese-Russian descent and was hired by IBM for a machine-translation project because she spoke Russian, "but she turned out to be a really good programmer". She eventually married McCarthy. In her spare time, Watson was an accomplished mountaineer and was part of an all-woman expedition to climb Annapurna I in 1978, where she unfortunately lost her life.

When Grönlund looked at the first Lisp programming manual [PDF], which was published March 1, 1960, he saw the name of Phyllis Fox listed as one of the authors. It turns out that Fox was a human computer during the war and went on to create the first simulation language, DYNAMO, which was used to simulate various societal growth scenarios in a study called "The Limits to Growth". The simulation found three possible outcomes, two of which showed a societal collapse in the mid-late 21st century, while the other resulted in a stable world. "So we're not doomed, we have a one-in-three chance that things are going to work out", he said with a laugh.

He noted that the authors of Lisp that were listed in the manual included McCarthy, Fox, and a number of students. The only one of those who had any experience in writing a computer language was Fox, but she is only credited with writing the manual itself in the acknowledgments section. Grönlund said that he didn't have any proof, but that he thought that maybe there was "something fishy going on there". Fox went on to work at Bell Labs on various projects, including two different numerical libraries.

Back to Lisp

McCarthy thought that the classic Turing machine was far too complicated to be used in papers on computability and the like. So he wanted to come up with a way to represent computation in a way that computers could use directly. He felt that the Turing machine, with its infinite tape, read/write head, and so on, was more like a physical device and not particularly mathematical. He wanted a mathematical notation for working with programs, which is where Lisp came from.

McCarthy wanted to show that Lisp was a superior way to describe computation; he thought that the best way to do that was to create the "universal Lisp function". That function would take a Lisp program as its argument and execute the program. He came up with the eval function, which required a notation for representing Lisp functions as data. He never really intended for Lisp to be a programming language, it was simply superior notation for the paper he was working on.

One of McCarthy's graduate students, Steve Russell, who had been hand-compiling code into machine code all day, recognized that implementing the universal function would make things a lot easier. He suggested to McCarthy that he write eval; McCarthy said: "ho, ho, you're confusing theory with practice, this eval is intended for reading, not for computing". But, then, "he went ahead and did it", McCarthy said (as quoted by Grönlund).

The syntax of Lisp is inspired by the "Lambda calculus" notation that was developed by Alonzo Church in the 1930s. Both are based on the idea that any kind of computation can be expressed as function applications. The result is a "syntax that has a lot of parentheses". He quoted from the Lisp 1.5 Programmer's Manual, which recommended ending Lisp card decks with "STOP followed by a large number of right parentheses" so that a programming error would not cause the interpreter to continue reading indefinitely. It is clear from this that the parenthesis problem was with Lisp from the early days.

At its most basic level, Lisp programs contain two things: atoms and lists. Atoms are symbols, while lists contain atoms or other lists. So, the following contains an atom, a list of four atoms, and an empty list:

    foo
    (a b c d)
    ()

A more complicated list is below; it has two elements, each of which is a list of three atoms:

    ( (a b c) (d e f) )

Functions are invoked via lists, with the function name as the first atom and the remainder of the list as arguments, so f(x) would be:

    (f x)

That leads to a problem when you want to refer to a list as simply data, rather than as a function invocation. In Lisp, there is the idea of a quote function (though ' is often used as a shortcut) that can be used as follows ("=>" will be used to show the result):

    (quote a) => a
    (quote (a b c)) => (a b c)
    '(a b c) => (a b c)
There are various dialects of Lisp that have been used over the years, including Scheme, which is what Grönlund used for his examples.

The influence of the IBM 704 can be seen in the next example. Lists have traditionally been represented as linked lists in the interpreter. Two of the primitive operations, car and cdr, take their names from operations on the IBM 704. car is the "contents of the address part of the register", while cdr is the "contents of the decrement part of the register", he said. The upshot is that car results in the first element of its list argument, while cdr results in the rest of the list:

    (car '(a b c)) => a
    (cdr '(a b c)) => (b c)
Another primitive operation is cons, which constructs a list from its two arguments:
    (cons 'a '(b c)) => (a b c)

Building new functions in Lisp is done with the lambda function. It takes a list of arguments and the computation to be done:

    (lambda (x) (* x 2))
    ((lambda (x) (* x 2)) 4) => 8
The first line simply defines an anonymous function that multiplies its argument by 2. He pointed out that even arithmetic operations are done using function notation, rather than infix expressions as in other languages. The second line applies that function to the argument 4. He also introduced the cond function, which acts as a conditional branch. It evaluates its arguments (each of which consists of a test and an action), finding the first test that evaluates to true and executing the associated action.
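
As a simple illustration (a Scheme sketch, not one of the examples from the talk), cond can be combined with lambda like so; the final #t test is always true, so it serves as a fallback:

    ((lambda (x)
       (cond ((> x 0) 'positive)
             ((< x 0) 'negative)
             (#t 'zero)))
     -5) => negative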

Even though FORTRAN came out the previous year, Grönlund said, Lisp had the first implementation of conditionals. The first version of FORTRAN could do a GOTO based on whether a value was zero or non-zero, but that was because the inventors of the language were still thinking in terms of machine language, he said. One of the big innovations of Lisp was that it allowed arbitrary tests in an if-then-else kind of construct.

His description of the language, which is abbreviated somewhat here, is enough to create a fully functioning version of Lisp. There are lots of pieces that can be added for convenience, such as mathematical operations, but they are not truly needed in order to build a Lisp that is Turing complete. On page 13 of the Lisp 1.5 manual that was linked above, you can find the eval function that McCarthy wrote (though it is written in a notation different from Lisp itself). Grönlund quoted Smalltalk inventor Alan Kay on the significance of that:

Yes, that was the big revelation to me when I was in graduate school—when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were "Maxwell's Equations of Software"! This is the whole world of programming in a few lines that I can put my hand over.
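
To get a rough feel for what Kay is describing, here is a deliberately minimal evaluator sketched in Scheme. It is not the code from the manual or from the talk: it handles only quote, cond, lambda, and a few list primitives, and it omits all error handling. The environment is simply a list of (name . value) pairs:

    ; Evaluate a Lisp expression, represented as Lisp data, in env.
    (define (tiny-eval expr env)
      (cond ((symbol? expr) (cdr (assq expr env)))       ; variable lookup
            ((not (pair? expr)) expr)                     ; self-evaluating
            ((eq? (car expr) 'quote) (cadr expr))         ; (quote x) => x
            ((eq? (car expr) 'cond) (eval-cond (cdr expr) env))
            ((eq? (car expr) 'lambda)                     ; build a closure
             (list 'closure expr env))
            (else (tiny-apply (tiny-eval (car expr) env)
                              (map (lambda (a) (tiny-eval a env))
                                   (cdr expr))))))

    ; Find the first clause whose test is true and evaluate its action.
    (define (eval-cond clauses env)
      (cond ((null? clauses) '())
            ((tiny-eval (caar clauses) env)
             (tiny-eval (cadar clauses) env))
            (else (eval-cond (cdr clauses) env))))

    ; Apply a primitive or a closure to already-evaluated arguments.
    (define (tiny-apply fn args)
      (cond ((eq? fn 'car)  (car (car args)))
            ((eq? fn 'cdr)  (cdr (car args)))
            ((eq? fn 'cons) (cons (car args) (cadr args)))
            ((and (pair? fn) (eq? (car fn) 'closure))
             (let ((params (cadr (cadr fn)))
                   (body   (caddr (cadr fn)))
                   (cenv   (caddr fn)))
               (tiny-eval body (append (map cons params args) cenv))))))

    (tiny-eval '((lambda (x) (cons x (quote (b c)))) (quote a))
               '((car . car) (cdr . cdr) (cons . cons))) => (a b c)

The point is not the details but the size: a handful of cond clauses is enough to evaluate Lisp programs that are themselves represented as Lisp data, which is what Kay's "Maxwell's Equations of Software" remark is getting at.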

Code and data

One of the other major innovations of Lisp was showing that there is no real distinction between code and data; there is no sharp boundary between data formats and code formats. Even though this was recognized in 1960, we still fail to completely understand it today, Grönlund said. Some security vulnerabilities come about because programmers do not recognize that passing data through a program is often also passing code through it. Handling code as data is something that Lisp got right.
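
A tiny illustration of the point in Scheme (again, not from the talk): a program can build a piece of code as an ordinary list and then hand it to eval. The interaction-environment procedure used here is available in most Scheme implementations, including Guile:

    (define expr (list '* 3 4))            ; builds the list (* 3 4) as data
    (eval expr (interaction-environment))  => 12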

Grönlund is not fond of XML, but it is fine for data. The problem is that, when you have some kind of large configuration file in XML, parts of it will be more "code-y"; those parts will look awful in XML, he said. He works a lot with the Pacemaker project, which has the concept of "rule expressions"; those are "horrible", he said, because they are code written in XML.

To be honest, he doesn't think that Lisp as a language is all that interesting today, though various dialects and descendants are still in use. There are also some applications of Lisp that are interesting today, he said, including Guix, a transactional package manager built on Guile, the GNU project's implementation of Scheme.

Most people don't use Lisp today, but many of its ideas survive. The interpreter was something new that Lisp brought with it; today interpreters are commonplace. Similarly, garbage collection was a Lisp innovation. Many of the people who worked on Lisp, especially Scheme, went on to work on Java, so elements of Lisp pop up there. JavaScript, Perl, Ruby, and Python are all Lisps without "the parenthesized syntax", he said.

Lisp advocates will claim that the parenthesis problem is something that people can get used to, but Grönlund thinks it is probably just too much of a hassle for most. Given what he has said, he wondered "Why Lisp?" That elicited some laughter from attendees (and Grönlund himself), but Lisp is something he's been attracted to recently and he wanted to try to understand why that was.

He started thinking about it in terms of his attraction to Linux and open source because he believes the two are related. The open-source community values its connection to the past. A community can choose to value innovation, intelligence, and entrepreneurial spirit as the highest ideals, or it can value wisdom, craft, and perfecting something by building on the efforts of others. When you consider wisdom and craft, he said, sharing obviously falls out of that; you want to learn from those who came before and to teach those who come after in order to help hone the craft. That's the connection that he sees; in free software, we are building on this legacy that goes all the way back to the first computers and first programmers.

If you look at proprietary software, he said, it is breaking that history. It is taking the chain of legacy, sharing, and history and breaking it off for selfish purposes; it is anti-social. "I don't like that at all", he said to applause.

He referred to a talk earlier in the week that advocated teaching assembly language to students so that they can understand what the computer is really doing. He thinks teaching Lisp is at least as important, even though he didn't like it or really understand its significance when he had to learn it at his university. Lisp teaches the fundamentals of computing and of computability. You can look at the eval function and have the whole concept of computing in a single screen of code.

While the idea of free software was new in some ways when Richard Stallman came up with it, it really was defending an old concept of sharing and building on the knowledge of others instead of taking ideas and not sharing anymore. Free software is based on a culture building on itself; it is proprietary software that is breaking this chain. His overarching message here was that maintaining the connection to our shared past is an important and worthy goal.

And, of course, he ended his talk with a slide reading:

    STOP )))))))))))
That led to much applause and laughter.

A video in WebM format of the talk is available, as is a YouTube version.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Christchurch for linux.conf.au.]

Comments (48 posted)

Page editor: Jonathan Corbet


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds