LWN.net Weekly Edition for January 19, 2017

Designing for failure

By Jonathan Corbet
January 18, 2017

linux.conf.au 2017

Nobody starts a free-software project hoping that it will fail, so it is a rare project indeed that plans for its eventual demise. But not all projects succeed, and a project that doesn't plan for failure risks doing its users harm. Dan Callahan joined Mozilla to work on the Persona authentication project, and he was there for its recent shutdown. At the 2017 linux.conf.au, he used his keynote slot to talk about the lessons that have been learned about designing a project for failure.

Mozilla is a non-profit organization dedicated to the open Internet. It "does lots of stuff", including the Firefox browser. Firefox helps to protect the net as an open resource in a number of ways, including giving Mozilla a place at the table in settings where the design of the web is under discussion. The web, he said, is too great to leave in the hands of corporations.

Callahan joined Mozilla to work on the Persona project, which sought to simplify and decentralize the process by which people log into web sites. Using Persona, users would go to a site and enter their email address there; they would then be sent to an authentication page under the email-address domain. If they authenticated successfully, they would get a certificate attesting to their identity, which could be used to log into multiple sites. The design was meant to be fully decentralized, with no big sites, not even Persona, in charge of authentication.

Authentication matters, he said, and improving it was a worthy goal. It is worth thinking back to where we were five years ago; the news was full of web-site break-ins and loss of passwords. There was little that users could do to ensure their safety beyond following good password hygiene, and few of them do that. Securing password-based authentication is not a solvable problem.

In response to this problem, sites were replacing passwords with "social login" options whereby users would log in via another provider. This mechanism deprives users of the ability to choose their identity; it "diminished the soul of the net." Social login imposes a third party between a site and its users, and subjects those users to that party's terms of service. For example, Facebook's "real name" policy has tripped up many users. In such a world, there can be no anonymous whistleblowers, no pseudonyms. It represents the loss of a fundamental human right. We cannot, he said, build a free platform without giving people the ability to choose how they identify themselves. Persona allowed users to use any identification they wanted, but it failed. It showed that decentralized authentication is possible, but it failed to change the web.

Callahan is a cave diver, meaning he finds underwater holes and swims as far into them as he can. It is a dangerous endeavor, requiring a lot of equipment and training. Cave divers have developed a number of techniques for dealing with failures, and every dive explicitly tests failure recovery in some way. Years ago, Sheck Exley looked at all known deaths from cave diving in an effort to find the general causes of failure; he came out with five rules:

Do not exceed your training.
Maintain a guide line to open water at all times.
Reserve 2/3 of your gas for the exit.
Do not go beyond the maximum depth of your gas mixture.
Carry three lights.

At the time this was written, following those rules would have prevented all known cave-diving deaths. The free-software community, he said, can learn from what cave divers do, and should come up with its own rules.

Three weeks ago, the Persona servers went read-only, with no further changes allowed; eventually they will fall off the net entirely. We need ways to examine failures like this. If you have a failing project, he said, you should share what is going on so that the community can avoid repeating mistakes.

Lessons learned

The first lesson to be learned in this case is that a free license is not enough to ensure a project's success.

There was a design failure in that the protocol still had a point of centralization. The email provider site doing authentication could not talk directly with the web site; instead it had to go through a relay. The goal was to eventually build the relay into the browser itself, but the Persona project did not plan for a loss of development resources before native browser support was implemented. That meant that anybody wanting to fork the project would have to fork the relay as well — a relay whose location was wired into the sites using Persona. This is a problem that could have been solved, but they were blinded by the context in which they were working and didn't see it.
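The cost of a hard-wired relay location can be sketched in a few lines of Python. This is only an illustration, not Persona's actual code or API: the names `HARDWIRED_RELAY`, `SiteConfig`, and `choose_relay`, as well as both URLs, are hypothetical. The sketch assumes a site that reads an ordered list of relay locations from its configuration rather than having a single address built in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical address standing in for a relay location wired into a site.
HARDWIRED_RELAY = "https://relay.example.org"


@dataclass
class SiteConfig:
    """Relay locations a site is willing to use, in order of preference."""
    relays: List[str] = field(default_factory=lambda: [HARDWIRED_RELAY])


def choose_relay(config: SiteConfig,
                 is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first configured relay that responds, or None if all are gone."""
    for url in config.relays:
        if is_alive(url):
            return url
    return None


# Simulate the original relay disappearing while a community fork stays up.
def fake_probe(url: str) -> bool:
    return url == "https://relay.fork.example.net"


hardwired = SiteConfig()
configurable = SiteConfig(
    relays=[HARDWIRED_RELAY, "https://relay.fork.example.net"]
)

print(choose_relay(hardwired, fake_probe))     # no usable relay is found
print(choose_relay(configurable, fake_probe))  # falls through to the fork
```

A site in the first situation stops working when the original relay goes away; a site in the second can follow a fork with a configuration change rather than a code change.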

Bits rot more quickly online, he said. If the LibreOffice project were to go away, we would still have working applications on our systems and could still access our documents. But what happens if WordPress suffers some unfortunate fate? All of those WordPress-based sites would not last long. We need to do better at writing software that can run in a stable mode without requiring people with high skills. He doesn't know how to do that; that's why he was giving a keynote, he said: he gets to present problems for others to solve.

"Complexity limits agency" was another one of the lessons. A project with a lot of moving parts requires a lot of skills just to set it up. People with such skills tend to be in high demand and not generally available; that is not a situation that empowers people. A free license, he said, does not further freedom for people who cannot run the software.

There were a number of little mistakes. The Persona user interface would put up a popup window for the authentication, with the idea that the context of the underlying page would be preserved. But a lot of users reflexively close popups without even looking at them; then they wonder why Persona isn't working. The project built a system that didn't mesh with user heuristics.

Mistakes in the API design led to lots of bugs; that didn't help either.

The project was not measuring the right things, he said; "we did not know who we really were". Was Persona a development project, or was it network infrastructure? It was staffed and developed like a project, and measured its success by the number of users it had. If, instead, Persona had seen itself as infrastructure, its developers would have asked different questions: was that infrastructure solving a real problem? This disconnect led to the wrong design decisions and a certain amount of "we will solve the web" hubris.

A project should explicitly define and communicate its scope, drawing clear boundaries between what the project is and what it is not. Did Persona verify email addresses, or did it solve the identification problem? The way the project's scope was defined, web sites almost had to be subservient to Persona.

While the Persona project was going on, Mozilla was also trying to start a new mobile phone. Phones need authentication too, so it was deemed that Persona could fit into that role. It is true that it could fit, he said, if one applied a great deal of force. But, in truth, it was the wrong tool for the job and did not fit well.

Projects should ruthlessly oppose complexity. Persona suffered from an explosion of options and dependencies, resulting in complex code that made everything harder. Among other things, that makes it harder for new contributors to join the project. In this case, there were only one or two outside contributors who did any significant work; when Mozilla stepped away from the project, nobody else was there to pick it up. Developers on a project should be able to say immediately if their system behaves as they think it should. "Focus and simplify."

Planning for failure

Persona made its share of mistakes. But, even when everything has been done right, projects can fail for any of a number of reasons. Thus, developers should be planning for failure from the beginning.

If you know your project is dead, Callahan said, you should say so. Persona took three years to go from the removal of staff to the unplugging of the servers. Mozilla tried to maintain the system without development resources, taking some 20 months to say that things were not working. There is a natural fear that admitting death is a self-fulfilling action; one always hopes that the project will come back to life. But admitting that this will not happen lets people prepare a replacement.

A project should ensure that users can recover without its involvement. Failure quickly leads to a demoralized, burned-out state; it is really hard for developers to do recovery work at that point. One of the things Persona did right was to use email addresses for identification; that allowed sites using Persona to send a password-reset email to affected users. The data needed to recover from Persona's failure was available outside of Persona itself. In general, projects should use standard data formats, and have users store their own data.
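The recovery path described above, where a site falls back to password-reset emails because it already holds each user's address, might look roughly like the following. It is a minimal sketch under stated assumptions, not what any Persona-using site actually ran: the function names `start_password_resets` and `redeem` are hypothetical, the account list is a plain list of email addresses, and sending the mail itself is left out.

```python
import hashlib
import secrets
from typing import Dict, List, Tuple


def start_password_resets(emails: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Create a one-time reset token for each account.

    Returns (to_send, to_store): the raw token to email to each user, and
    the SHA-256 digest the site keeps so that raw tokens are never stored.
    """
    to_send: Dict[str, str] = {}
    to_store: Dict[str, str] = {}
    for email in emails:
        token = secrets.token_urlsafe(32)
        to_send[email] = token
        to_store[email] = hashlib.sha256(token.encode()).hexdigest()
    return to_send, to_store


def redeem(email: str, token: str, stored: Dict[str, str]) -> bool:
    """Check a presented token against the stored digest; each token works once."""
    expected = stored.get(email)
    if expected is None:
        return False
    presented = hashlib.sha256(token.encode()).hexdigest()
    if secrets.compare_digest(presented, expected):
        del stored[email]  # consume the token
        return True
    return False


# Example: every account the site knows about gets a reset token.
to_send, to_store = start_password_resets(["a@example.org", "b@example.org"])
```

Nothing in this sketch needs the failed service: the email addresses are data the site already holds, which is the property the talk singled out.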

To conclude, projects should seek to minimize the harm that results if and when they go away. Like a diver with three flashlights, a user of a failing but well-planned project can switch to another. We have to talk about our failures, he said, because the alternative is to continue repeating the same mistakes.

[Your editor would like to thank linux.conf.au and the Linux Foundation for assisting with his travel to the event.]


Building the world we want to have

By Jonathan Corbet
January 18, 2017

linux.conf.au 2017

Pia Waugh has been a mainstay of the Australian free-software community for many years; among other things, she was one of the organizers of the 2007 linux.conf.au event. She is also known for her open government work. Ten years after running LCA, she returned to the conference as the opening keynote speaker. Nobody could possibly accuse her of thinking small as she outlined a somewhat utopian view of where the world is going and how the free-software community can help it to get there.

We are, she began, at a tipping point where we can reinvent our world. But we have to do it carefully, or we risk reinventing the past with a few shiny new things added. We need to make active choices about the future that we want to have.

Human society has evolved over hundreds of thousands of years, often helped by the "cooperative competitiveness" that causes us to try to outdo each other while working together. Early humans figured out their world and shared information through trade and travel; the latency tended to be high, but we collected a lot of information over time. Through continuous improvement, humanity was able to move far from its origins and occupy every continent on the planet.

More recently, we started building cities and, in the process, created differentiated social schemes. People had increasingly specialized roles, which gave them more time to do interesting things; the pace of advancement picked up as a result. We got to a point where we had a great surplus, leading to great power and, sometimes, great rulers.

Great power can also lead to great inequality; occasionally, people get sick of that inequality and revolt, replacing their rulers. The model shifted somewhat with the independence movements seen around the world some 250 years ago. These movements, for the first time, codified the idea of inalienable individual rights; in a sense, she said, we all became kings. Power is now massively distributed, and we should not forget it. Part of power is in the wielding, but power is also dependent on belief that one actually has it.

People feel disempowered in our world but, in truth, we have the most individual power that people have ever had in our history. This has led people to believe that they can play an active role in defining their own future. Free software and open knowledge are products of that shift but, importantly, they are also amplifiers of it. We are on an exponential curve. That's not to say that there isn't still a great deal of inequality and related problems to deal with, of course.

There are a number of ways to look at human history to this point; Waugh suggested the Big History Project as one possibility. Another is to read the changelog, as seen on her slides; an excerpt appears below:

## [1.2.0] - 1760 CE "industrial revolution"
### Changed
- Agricultural libraries replaced by industrial libraries, still single core but heaps faster

The end result is that citizens have the power now. In a time of surplus, we can have distributed control of our society. All humans have the ability to publish, to communicate, and to monitor. Importantly, we also have the ability to enforce; anybody with a computer is now able to disrupt a company (or an economy) if they set their mind to it. We are more powerful than ever before, the rate of change is increasing and, she said, we made all this up and can do it again if we so choose.

Rethinking work and more

Everybody, she said, is creative in some way; every person she has ever met has something interesting inside them somewhere. Sometimes one has to dig a bit to find it, but it's there. Almost all of us switch it off when we go to work, though. People are scared in general about technology and jobs; that has been the case for a long time but, still, we have managed to thrive. But why are we still stuck on the 40-hour work week? Why do we see a person's value as being tied to the amount of money or stuff they have? Now that we are in a position to automate so much, it is time to question our approach to work.

As the curve of progress goes ever more steeply upward, there are a number of things we can look forward to. Three-dimensional printing is finally getting serious; we have moved beyond printing silly plastic toys to the prospect of printing organs for transplants. What if we could put raw materials into a device and get food out? Hunger would be solved. The possibilities are amazing, but the conversation around this technology is all about copyright. Why should we be protecting antiquated business models rather than caring about people? We have the potential to tackle scarcity, and should be doing so.

Then there is personal augmentation. People have always done it, using feathers, body paint, piercing, or tattoos. There is an obsession with "the norm" and discomfort with parting from it, but that is silly. There is no such thing as "normal"; the Internet has made that clear. Brains are plastic and can adapt to new things or foreign input. If you lose a leg, why be content with replacing it; why not have seven legs? Or wheels? Waugh suggested that the Paralympics should remove all limits and "let them go nuts" exploring what humans can do.

Closer to home, she recently had a baby; pregnancy was, for her, an awful experience. It would have been so much nicer to just outsource the whole thing, to grow the baby in a vat. This kind of conversation makes people nervous, but we are going to have to have those hard conversations. We need to embrace the changes that are coming.

Toward the world we want

We should be aiming for global citizenship. She asked the audience whether they felt that their ideas were reflected by their government; few hands were raised. A while back, French President Nicolas Sarkozy gave a speech about how amazing the Internet is. He then spent a lot of time talking about how the government, as the sole representative of the French people, must regulate the Internet. But governments are not our only representative; we are good at representing our own rights.

In the free-software and open-knowledge communities, we are the pioneers, building the operating system for our fellow humans. As we do so, we are obligated to work to make the world a better place. As geeks, we are good at routing around damage, but others are not so fortunate. We are going to have to be asking the hard questions; the coming artificial-intelligence wave is going to raise a lot of ethical issues, for example.

Our community is famous for scratching its own itches, but it is time to start thinking more about systemic change, she said. There is a place for symptomatic relief, but in the end, we want to get rid of the source of the itch. In a similar vein, initiatives like outreach programs are great, but they still only address the symptoms. Few in the audience were willing to raise their hands and say that they were part of a truly diverse social group; that is a systemic problem.

There are a number of questions we should all be asking ourselves. Who are we building for, and who are we explicitly not building for? What is the default position in our society? If it is difficult for some people to participate fully, how can we blame them for staying apart? What does it mean to be human, and what do we value in our human society? What unconscious biases and assumptions are we carrying around? How are we helping non-geeks help themselves? And, most importantly, what kind of future do we want to see?

She repeated that we built "all this" and can build it again if need be; we are not limited to what we have now. Our fellow humans are only as free as the tools that they use. We have to expand that freedom, or our ideas will remain on the fringe. A lot of things have fundamentally changed, but a lot of our assumptions, about a closed society dominated by centralized power and scarcity, have not changed accordingly.

Waugh concluded by saying that she believes that our community is the best possible example of the cooperative competition that has brought humanity this far, and of the technical shift that is driving the next stage. We will continue to thrive and do amazing things, but we have to be careful not to bring old baggage along when we are reinventing the world. It is not enough to ensure our own freedom; we have to liberate others. We must, she said, work to build the change that we want to see.

[Your editor would like to thank linux.conf.au and the Linux Foundation for assisting with his travel to the event.]


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

Security: Ansible and CVE-2016-9587; New vulnerabilities in bind, docker, qemu, webkit2gtk, ...
Kernel: kvmalloc(); Controlling storage with a filesystem.
Distributions: Tracking package updates with release-monitoring.org; Debian, Fedora, openSUSE, ...
Development: A unified TLS API for Python; Calligra, GNU ed, LTP, Plasma, ...
Announcements: An updated FSF high-priority project list, Techdirt's First Amendment Fight For Its Life, ...
