Designing for failure

By Jonathan Corbet
January 18, 2017

linux.conf.au 2017

Nobody starts a free-software project hoping that it will fail, so it is a rare project indeed that plans for its eventual demise. But not all projects succeed, and a project that doesn't plan for failure risks doing its users harm. Dan Callahan joined Mozilla to work on the Persona authentication project, and he was there for its recent shutdown. At the 2017 linux.conf.au, he used his keynote slot to talk about the lessons that have been learned about designing a project for failure.

Mozilla is a non-profit organization dedicated to the open Internet. It "does lots of stuff", including the Firefox browser. Firefox helps to protect the net as an open resource in a number of ways, including giving Mozilla a place at the table in settings where the design of the web is under discussion. The web, he said, is too great to leave in the hands of corporations.

Callahan joined Mozilla to work on the Persona project, which sought to simplify and decentralize the process by which people log into web sites. Using Persona, users would go to a site and enter their email address there; they would then be sent to an authentication page under the email-address domain. If they authenticated successfully, they would get a certificate attesting to their identity, which could be used to log into multiple sites. The design was meant to be fully decentralized, with no big sites, not even Persona, in charge of authentication.
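The first step of that flow, working out which domain should authenticate a given address, can be sketched roughly as follows. This is only an illustration: the `auth_url_for()` helper and the `/auth` path are hypothetical, since the talk did not go into where a domain's authentication page would live.

```python
from urllib.parse import urlencode

# Hypothetical path: an assumption for illustration, not Persona's
# actual endpoint.
AUTH_PATH = "/auth"


def auth_url_for(email: str) -> str:
    """Return the (hypothetical) authentication URL for an email address.

    The domain after the last "@" is the party that vouches for the user.
    """
    address = email.strip()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    query = urlencode({"email": address})
    return f"https://{domain.lower()}{AUTH_PATH}?{query}"
```

The point of the sketch is that the site never chooses the authenticator; the address the user typed does.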

Authentication matters, he said, and improving it was a worthy goal. It is worth thinking back to where we were five years ago; the news was full of web-site break-ins and loss of passwords. There was little that users could do to ensure their safety beyond following good password hygiene, and few of them do that. Securing password-based authentication is not a solvable problem.

In response to this problem, sites were replacing passwords with "social login" options whereby users would log in via another provider. This mechanism deprives users of the ability to choose their identity; it "diminished the soul of the net." Social login imposes a third party between a site and its users, and subjects those users to that party's terms of service. For example, Facebook's "real name" policy has tripped up many users. In such a world, there can be no anonymous whistleblowers, no pseudonyms. It represents the loss of a fundamental human right. We cannot, he said, build a free platform without giving people the ability to choose how they identify themselves. Persona allowed users to use any identification they wanted, but it failed. It showed that decentralized authentication is possible, but it failed to change the web.

Callahan is a cave diver, meaning he finds underwater holes and swims as far into them as he can. It is a dangerous endeavor, requiring a lot of equipment and training. Cave divers have developed a number of techniques for dealing with failures, and every dive explicitly tests failure recovery in some way. Years ago, Sheck Exley looked at all known deaths from cave diving in an effort to find the general causes of failure; he came out with five rules:

Do not exceed your training.
Maintain a guide line to open water at all times.
Reserve 2/3 of your gas for the exit.
Do not go beyond the maximum depth of your gas mixture.
Carry three lights.

At the time this was written, following those rules would have prevented all known cave-diving deaths. The free-software community, he said, can learn from what cave divers do, and should come up with its own rules.

Three weeks ago, the Persona servers went read-only, with no further changes allowed; eventually they will fall off the net entirely. We need ways to examine failures like this. If you have a failing project, he said, you should share what is going on so that the community can avoid repeating mistakes.

Lessons learned

The first lesson to be learned in this case is that a free license is not enough to ensure a project's success.

There was a design failure in that the protocol still had a point of centralization. The email provider site doing authentication could not talk directly with the web site; instead it had to go through a relay. The goal was to eventually build the relay into the browser itself, but the Persona project did not plan for a loss of development resources before native browser support was implemented. That meant that anybody wanting to fork the project would have to fork the relay as well — a relay whose location was wired into the sites using Persona. This is a problem that could have been solved, but they were blinded by the context in which they were working and didn't see it.
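One way a site might avoid wiring in a single relay location is to treat the relay as configuration, so that a forked relay can be adopted without changing the site's code. The following is a minimal sketch under that assumption; the `AUTH_RELAYS` variable, the default URL, and the `relay_candidates()` helper are all hypothetical, and nothing here describes how Persona itself was built.

```python
import os

# Hypothetical default; the article does not name the relay's address.
DEFAULT_RELAY = "https://relay.example.org"


def relay_candidates(env=None):
    """Return relay URLs to try, in order of preference.

    Operator-configured relays (comma-separated in AUTH_RELAYS) come
    first; the built-in default is kept as a last resort.
    """
    env = os.environ if env is None else env
    configured = env.get("AUTH_RELAYS", "")
    relays = [r.strip().rstrip("/") for r in configured.split(",") if r.strip()]
    if DEFAULT_RELAY not in relays:
        relays.append(DEFAULT_RELAY)
    return relays
```

With something like this in place, the disappearance of the original relay would become an operational change for each site rather than a code change.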

Bits rot more quickly online, he said. If the LibreOffice project were to go away, we would still have working applications on our systems and could still access our documents. But what happens if WordPress suffers some unfortunate fate? All of those WordPress-based sites would not last long. We need to do better at writing software that can run in a stable mode without requiring people with high skills. He doesn't know how to do that; that's why he was giving a keynote, he said: he gets to present problems for others to solve.

"Complexity limits agency" was another one of the lessons. A project with a lot of moving parts requires a lot of skills just to set it up. People with such skills tend to be in high demand and not generally available; that is not a situation that empowers people. A free license, he said, does not further freedom for people who cannot run the software.

There were a number of little mistakes. The Persona user interface would put up a popup window for the authentication, with the idea that the context of the underlying page would be preserved. But a lot of users reflexively close popups without even looking at them; then they wonder why Persona isn't working. The project built a system that didn't mesh with user heuristics.

Mistakes in the API design led to lots of bugs; that didn't help either.

The project was not measuring the right things, he said; "we did not know who we really were". Was Persona a development project, or was it network infrastructure? It was staffed and developed like a project, and measured its success by the number of users it had. If, instead, Persona had seen itself as infrastructure, its developers would have asked different questions: was that infrastructure solving a real problem? This disconnect led to the wrong design decisions and a certain amount of "we will solve the web" hubris.

A project should explicitly define and communicate its scope, drawing clear boundaries between what the project is and what it is not. Did Persona verify email addresses, or did it solve the identification problem? The way the project's scope was defined, web sites almost had to be subservient to Persona.

While the Persona project was going on, Mozilla was also trying to start a new mobile phone. Phones need authentication too, so it was deemed that Persona could fit into that role. It is true that it could fit, he said, if one applied a great deal of force. But, in truth, it was the wrong tool for the job and did not fit well.

Projects should ruthlessly oppose complexity. Persona suffered from an explosion of options and dependencies, resulting in complex code that made everything harder. Among other things, that makes it harder for new contributors to join the project. In this case, there were only one or two outside contributors who did any significant work; when Mozilla stepped away from the project, nobody else was there to pick it up. Developers on a project should be able to say immediately if their system behaves as they think it should. "Focus and simplify."

Planning for failure

Persona made its share of mistakes. But, even when everything has been done right, projects can fail for any of a number of reasons. Thus, developers should be planning for failure from the beginning.

If you know your project is dead, Callahan said, you should say so. Persona took three years to go from the removal of staff to the unplugging of the servers. Mozilla tried to maintain the system without development resources, taking some 20 months to say that things were not working. There is a natural fear that admitting death is a self-fulfilling action; one always hopes that the project will come back to life. But admitting that this will not happen lets people prepare a replacement.

A project should ensure that users can recover without its involvement. Failure quickly leads to a demoralized, burned-out state; it is really hard for developers to do recovery work at that point. One of the things Persona did right was to use email addresses for identification; that allowed sites using Persona to send a password-reset email to affected users. The data needed to recover from Persona's failure was available outside of Persona itself. In general, projects should use standard data formats, and have users store their own data.
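As a rough illustration of why holding email addresses made recovery possible, a site that already knows each user's address can issue password-reset tokens entirely on its own. The sketch below assumes a hypothetical site with a plain list of addresses; the function names are invented for the example, and actually sending the mail is left out.

```python
import hashlib
import secrets


def issue_reset_tokens(emails):
    """Map each email address to (token to mail out, digest to store).

    Only the digest needs to be kept by the site; the token itself goes
    to the user in the password-reset email.
    """
    resets = {}
    for email in emails:
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        resets[email] = (token, digest)
    return resets


def token_matches(token, stored_digest):
    """Check a token presented by a user against the stored digest."""
    candidate = hashlib.sha256(token.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)
```

Nothing in this path depends on the failed service still being reachable, which is the property Callahan was describing.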

To conclude, projects should seek to minimize the harm that results if and when they go away. Like a diver with three flashlights, a user of a failing but well-planned project can switch to another. We have to talk about our failures, he said, because the alternative is to continue repeating the same mistakes.

[Your editor would like to thank linux.conf.au and the Linux Foundation for assisting with his travel to the event.]


Posted Jan 19, 2017 0:26 UTC (Thu) by josh (subscriber, #17465)

Persona had a particularly compelling use case to offer: eliminating authentication entirely, in favor of integration with browsers and Sync. The browser could do in-browser authentication locally, authenticate to sites using the Persona protocol, and sync access keys between devices. That would have entirely eliminated passwords, including for the Persona service itself.

However, that use case never quite materialized; the promised browser extensions and local integration didn't really happen. Which left Persona as a somewhat less compelling single-sign-on solution, using an account people didn't already have.

I'd have loved to see browser-integrated authentication that made passwords obsolete. I still would.

Posted Jan 19, 2017 0:33 UTC (Thu) by pabs (subscriber, #43278)

One already exists: client side TLS certs. Debian uses it:

https://wiki.debian.org/DebianSingleSignOn

Posted Jan 19, 2017 0:53 UTC (Thu) by josh (subscriber, #17465)

Client certificates have far less usability, especially across multiple sites with multiple identities.

Posted Jan 19, 2017 5:14 UTC (Thu) by daurnimator (guest, #92358)

Well, can we fix that?

Posted Jan 19, 2017 7:57 UTC (Thu) by pabs (subscriber, #43278)

It is more likely they will go away altogether. IIRC with Chrome you currently have to whitelist sites and they plan to reduce or delete the functionality eventually.

Posted Jan 19, 2017 20:01 UTC (Thu) by josh (subscriber, #17465)

Not with client certificates, no. But I've seen a few web standards proposals floating around for browser-based challenge-response authentication.

Posted Jan 19, 2017 0:53 UTC (Thu) by SEJeff (guest, #51588)

I'm curious if that predates the Fedora Account System, which is 100% based on TLS client certificates as well for all Fedora webservices:

https://fedorahosted.org/fas/

Posted Jan 19, 2017 0:58 UTC (Thu) by pabs (subscriber, #43278)

The Debian one used to be based on DACS, the client certs part is relatively recent.

Posted Jan 19, 2017 8:06 UTC (Thu) by zdzichu (subscriber, #17118)

Actually, Fedora migrated to Kerberos recently.

Posted Jan 19, 2017 15:20 UTC (Thu) by smoogen (subscriber, #97)

The Fedora Account System (FAS) is a lot more than just client-side certs. That part was mainly for the build system, and I think it predates Persona, though Fedora wasn't the inventor; it was trying to implement Kerberos in SSL (i.e. the client-side certificate was analogous to the keytab; and yes, this is an unfair characterization on my part). I think Persona's situation, with adoption too low to take off but just enough to keep it going, was similar to what we saw with Fedora: for every group that picked it up, it wasn't enough to push for more development to fix things.

As pointed out by others, the Fedora Account System that Fedora proper uses has tied in various Kerberos bits, so packagers no longer need to download certificates (though dozens of wiki pages still say they need to) but instead do a kinit to get a Kerberos ticket for the background system. This isn't the end of the old system, though: a couple of developers led by Xavier Lamien are working on FAS3, which has other updates to, I think, client-side certs and federation.

Posted Jan 19, 2017 18:35 UTC (Thu) by flussence (guest, #85566)

That's some very good advice. Mozilla would do well to apply it to Weave, which has many of the same problems (de-facto centralised, hard to run).

Posted Jan 20, 2017 2:33 UTC (Fri) by mathstuf (subscriber, #69389)

Just to note, Weave is basically dead. It's now called Firefox Accounts (FxA).

Posted Jan 26, 2017 10:48 UTC (Thu) by ssokolow (guest, #94568)

I actively avoided Persona because the UX was accidentally hostile to users who give a different e-mail address to each site when they create an account.

(I don't like having to check a spam bin for false positives, so I've set up a system where e-mail aliases are treated as revokable API keys and, if I can find the time, I want to write a custom milter and browser extension to streamline and automate things even further.)

I was also holding a grudge that they'd "morphed" it from their 99% in-browser "Identity 2.0" plans which were slated to have the following benefits:

Unify the UX for HTTP basic/digest auth, "Remember Password", OpenID, etc.
Provide a panel-based login/logout UI that could be leveraged by HTTP basic/digest auth
Collaborate with the Google Chrome team to define a "this site's login API is..." HTML microformat that allows an in-browser login/logout UI to transparently learn new sites in a reliable fashion, similar to how OpenSearch allows the address bar to transparently learn new site-specific search APIs.
Present an extension API that allows new authentication providers to be added to the browser's unified identity manager.

To this day, I'm still sore that such a thing never materialized when they seemed to have interest from the Chrome team at the time. The only thing that's really wrong with HTTP auth is that the client-end UX is terrible.

...and it also took far too long for them to explain how it was superior to OpenID+WebFinger. (Answer: The OpenID authenticator knows every site you logged into. A Persona identity provider doesn't.)

Posted Jan 28, 2017 20:46 UTC (Sat) by mmoya (subscriber, #84295)

What about SPRESSO? There was a talk at 33C3 covering some history of web authn solutions.