
LWN.net Weekly Edition for August 4, 2016

The Internet of Onions

By Nathan Willis
August 3, 2016

The tension between convenience and security is age-old, but it has, perhaps, never been as acutely felt as it is in debates about the Internet of Things (IoT). Smart devices for monitoring and controlling household and industrial appliances are cheap and near-ubiquitous. But the cheapness and the near-ubiquity of the commercial IoT products on the market both come at the cost of exposing one's devices and home network to eavesdroppers and remote servers. A Tor developer recently undertook an effort to increase IoT security by routing device traffic through the Tor network. While the work is, so far, available only in a single free-software IoT platform, the developers are hoping it will spread.

Tor developer Nathan Freitas, who is also Executive Director of the Guardian Project, undertook the development and testing of Tor-transport support for the open-source home-automation program Home Assistant. On July 20, Freitas explained the project in a blog post.

Home invasions

In brief, Home Assistant is a Python-based tool that provides a single web interface for a wide range of individual IoT devices. Freitas's contribution is a network configuration that routes Home Assistant's web interface over a Tor Hidden Service (so that it can only be accessed by a Tor-enabled browser via a special .onion domain name).

In a demonstration video [YouTube], Freitas calls attention to two separate concerns about commercial IoT devices. The first is that many expect to have direct access to the Internet, often for an HTTP connection to a remote service, and the installation instructions advise users to enable access by simply opening up ports on their firewall (if, indeed, they have one to begin with).

The risk is that attackers can easily scan for the signatures of known-to-be-insecure IoT products, log them, and access the devices at will. That is rather dangerous when one considers that IP-based video cameras, climate controls, door locks, and other sensitive devices are exposed in this manner.

The second concern is that many products enable device operation solely through a remote web server controlled by the manufacturer; the addresses of these servers may even be hard-coded into the device's software. For the curious, Matthew Garrett has recently taken to writing in-depth reviews of such products on Amazon.com, with both informative and frightening results.

Some privacy-conscious users may understandably regard the second concern as the more serious one, since pervasive data logging and user tracking could be taking place at the remote server (not to mention the fact that the device could fail to function without an Internet connection). For many others, however, the first risk is greater. Stories of webcam and baby-monitor exploits are already commonplace; as more device classes go online, one can only expect further attacks. As a result, securing access to the web interfaces of IoT devices has been the initial focus of Freitas's effort.

Hiding the route

Merely routing the web interface over Tor goes a long way toward protecting users from the eavesdropping concern, but if the device's front end is reachable through its .onion address, it can still be detected and scanned on the Tor network, as the 2013 paper Trawling for Tor Hidden Services [PDF] explains. The technique described in the paper involves running a large set of Tor relay nodes that secretly log the appearance of hidden service nodes. Although the distributed-consensus model Tor employs makes it difficult for any attacker to shut out the legitimate relays (another possible attack), a well-funded attacker could certainly monitor the availability of hidden services over time and home in on their locations using traffic correlation.

The solution is to enable a little-known "stealth" authorization option in the service's configuration. Setting the HiddenServiceAuthorizeClient option to stealth on the server causes Tor to generate a random authorization key pair, so that the public key can be copied to each device that should be allowed to access the service.
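In concrete terms, this is a few lines of torrc configuration on each side. The directory path, the client name, and the port (8123 is Home Assistant's usual default) are illustrative here, and the client-side values come from the hostname file that Tor generates on the server:

```
# Server side (torrc): publish the local Home Assistant web interface
# as a stealth-authorized hidden service.
HiddenServiceDir /var/lib/tor/homeassistant/
HiddenServicePort 80 127.0.0.1:8123
HiddenServiceAuthorizeClient stealth homeassistant

# Client side (torrc): the .onion address and authorization cookie
# copied from the server's generated hostname file.
HidServAuth youronionaddress.onion authcookievalue
```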

From that point on, the nodes in the Tor network provide additional security beyond what would be achievable with (say) a standard HTTP cookie, blocking connection requests from clients without the proper credentials by reporting that the requested server is unroutable.

This is possible because Tor hidden services announce their presence to the network by publishing a service descriptor that normally includes the address of an "introduction point" node. The service descriptors are tracked in Hidden Service Directory nodes within the Tor network. Other Tor relay nodes then know how to establish a circuit to the service by looking up the introduction point, even though those relays cannot peek beyond the introduction point to find where the service originates.

In stealth mode, however, the introduction-point portion of the service descriptor is encrypted with the private key from the authorization key pair; authorized clients will be able to decrypt the descriptor field because they possess the corresponding key, but other clients will not. Attempts by unauthorized clients to connect to the introduction-point node listed in the service descriptor will fail as being unreachable, making the result indistinguishable from a hidden server that has simply gone offline.

There is no limit to the number of authorization keys a server can use, so each client can use a unique key and the keys can be replaced as frequently as desired. Services are also allowed to publish fake service descriptors to further obfuscate connection points from eavesdroppers and trawlers.

Secure all the things

In an email, Freitas noted that there are two challenges to adding Tor support to a project like Home Assistant. The first is that the program's network layer has to be able to work using an .onion address—which, evidently, breaks many assumptions about host-name formatting and lookup.
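The hostname problem is easy to see: .onion names look like DNS names but must never be handed to the system resolver, which cannot look them up (and would leak the name in the attempt). A minimal sketch of the kind of check a Tor-aware network layer needs — the helper name is mine, not Home Assistant's:

```python
def needs_tor(host: str) -> bool:
    # .onion names exist only inside the Tor network; they must be
    # resolved by a Tor client (e.g. via a SOCKS proxy), never by
    # ordinary DNS lookup.
    return host.lower().rstrip(".").endswith(".onion")

print(needs_tor("example1234567890.onion"))  # True
print(needs_tor("home-assistant.local"))     # False
```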

The second is that the code dealing with the IoT devices themselves is prone to unintended connection leakage. For example, he said, "with Home Assistant there are a lot of outbound connections that it relies on for access to public data that we also want to tunnel through Tor." As is the case with HTTPS support, he added, fixing this is typically a matter of finding developers willing to make sure the Tor routing works. "If we can build a 'LetsOnion' type script to make that easier," he said, "we will."

Moving forward, Freitas said he hopes to work on adding Tor support to openHAB and similar open-source home-automation systems. In the meantime, however, he said that users can route individual apps over Tor using the Guardian Project's Orbot proxy for Android. By default, most consumer IoT devices are paired with a one-shot app (one for each brand of light bulb, one for the thermostat, one for the security cameras, etc.). Orbot can tunnel these apps to a Tor endpoint running in the user's home; the result is not as good as complete anonymity, perhaps, but it is certainly superior to trusting the Internet unconditionally. Developers, of course, can also build Tor-routing functionality into the app directly—the Guardian Project's Netcipher library is designed for that purpose.

The one remaining piece of the IoT puzzle is what to do about devices that rely on maintaining a connection to a remote web server in order to function. Some such devices can still function with only a LAN connection, which can in turn be routed over Tor. Others, however, fail to function if they cannot "phone home." At that point, the only solution may be to avoid the device in question. Freitas said he was investigating which IoT devices work in LAN-only mode and will try to ensure that they can function when routed over Tor.

But which devices go into the home will ultimately remain the user's choice. The security-versus-convenience trade-off has attracted a fair amount of attention from highly technical users, but it remains to be seen whether or not the industry as a whole will ever care. Freitas said he hopes that the Guardian Project's IoT work can demonstrate that the steps required to secure this class of device are small and easy to take, if one is interested in the goal.


Why Uber dropped PostgreSQL

By Jake Edge
August 3, 2016

The rivalry between database management systems is often somewhat heated, with fans of one system often loudly proclaiming that a competitor "sucks" or similar. And the competition between MySQL and PostgreSQL in the open-source world has certainly been heated at times, which makes a recent discussion of the pros and cons of the two databases rather enlightening. While it involved technical criticism of the design decisions made by both, it lacked heat and instead focused on sober analysis of the differences and their implications.

The transportation company Uber had long used PostgreSQL as the storage back-end for a monolithic Python application, but that changed over the last year or two. Uber switched to using its own Schemaless sharding layer atop MySQL and on July 26 published a blog post by Evan Klitzke that set out to explain why the switch was made. There were a number of reasons behind it, but the main problem the company encountered involved rapid updates to a table with a large number of indexes. PostgreSQL did not handle that workload particularly well.

There were effectively two facets to the problem with the table: changing any indexed field resulted in write amplification, because every index also needed to be updated, and all of those writes then had to be replicated across multiple servers. One advantage that MySQL has, Klitzke said, is that it uses a level of indirection in the InnoDB storage engine, so that an update to an indexed field does not require updating the unrelated indexes.
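The difference can be sketched with a toy write-counting model (illustrative arithmetic only, not real storage-engine code): when a PostgreSQL update touches an indexed column, the new row version gets a new physical location, so every index needs a new entry; InnoDB secondary indexes point at the primary key instead, so only indexes over the changed columns are touched.

```python
def postgres_writes(num_indexes: int, changed_indexed_cols: int) -> int:
    # A non-HOT update writes a new heap tuple at a new location (ctid),
    # so every index must gain an entry pointing at it.
    return 1 + num_indexes

def innodb_writes(num_indexes: int, changed_indexed_cols: int) -> int:
    # Secondary indexes reference the primary key, which did not move,
    # so only the indexes over the changed columns need updating.
    return 1 + changed_indexed_cols

# Ten indexes, one indexed column changed:
print(postgres_writes(10, 1))  # 11
print(innodb_writes(10, 1))    # 2
```

Each logical write is then multiplied again by replication to every standby, which is why the amplification mattered so much at Uber's update rates.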

There were some other problems as well. Uber encountered a bug in PostgreSQL 9.2 that led to data corruption, which rightly caused a lot of consternation. The bug was fixed quickly, but some corrupted data did end up being replicated, which made things worse. However, the blog post seems to imply that this kind of problem is somehow PostgreSQL-specific and does not really acknowledge that bugs will occur in all database systems (really, all software, of course), including MySQL:

The bug we ran into only affected certain releases of Postgres 9.2 and has been fixed for a long time now. However, we still find it worrisome that this class of bug can happen at all. A new version of Postgres could be released at any time that has a bug of this nature, and because of the way replication works, this issue has the potential to spread into all of the databases in a replication hierarchy.

Another problem that Uber encountered was in upgrading to new PostgreSQL releases. The process it used was time-consuming and required quite a bit of downtime, which it could not afford. MySQL supports both binary replication (which is what PostgreSQL uses) and statement-level replication. The latter allows MySQL to be more easily upgraded in place, without significant downtime, Klitzke said.
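For reference, the replication format in MySQL is an ordinary server setting; a minimal my.cnf fragment (the section header is standard, the choice of value is the point):

```
[mysqld]
# STATEMENT replicates the SQL text itself; ROW replicates binary
# row images (closer to what PostgreSQL's physical replication does).
binlog_format = STATEMENT
```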

Joshua D. Drake posted a pointer to the blog post to the pgsql-hackers mailing list: "It is a very good read and I encourage our hackers to do so with an open mind." And it seems that's just what they did. While there was disagreement with some of the statements and implications in the blog post, there was also acknowledgment that some of the problems are real.

Josh Berkus restated the main write-amplification problem in a more concrete way. If you have a large enough table for indexes to make sense and use that table in JOIN statements throughout the application, indexing most of the columns may make sense from a performance standpoint. But if that table is updated frequently ("500 times per second"), there is a problem:

That's a recipe for runaway table bloat; VACUUM can't do much because there's always some minutes-old transaction hanging around (and SNAPSHOT TOO OLD doesn't really help, we're talking about minutes here), and because of all of the indexes HOT isn't effective. Removing the indexes is equally painful because it means less efficient JOINs.

The Uber guy is right that InnoDB handles this better as long as you don't touch the primary key (primary key updates in InnoDB are really bad).

This is a common problem case we don't have an answer for yet.

Bruce Momjian amended that slightly: "Or, basically, we don't have an answer to without making something else worse." Tom Lane, though, saw it as more of an annoyance, rather than "a time-for-a-new-database kind of problem". But both Berkus and Robert Haas disagreed; Haas said that he has "seen multiple cases where this kind of thing causes a sufficiently large performance regression that the system just can't keep up". Berkus called it "considerably more than an annoyance for the people who suffer from it", but agreed that it is not something that should, by itself, cause a database switch.

Stephen Frost took issue with some of the high-level criticisms in the blog post that were quoted by Drake. In particular, the table-corruption problem is hardly PostgreSQL-specific: "The implication that MySQL doesn't have similar bugs is entirely incorrect, as is the idea that logical replication would avoid data corruption issues (in practice, it actually tends to be quite a bit worse)." In addition, there are ways to make upgrading to newer versions much less painful:

Their specific issue with these upgrades was solved, years ago, by me (and it wasn't particularly difficult to do...) through the use of pg_upgrade's --link option and rsync's ability to construct hard link trees. Making major release upgrades easier with less downtime is certainly a good goal, but there's been a solution to the specific issue they had here for quite a while.
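The approach Frost describes can be sketched as follows; the version numbers and paths are illustrative, not Uber's. The --link option hard-links data files into the new cluster instead of copying them, and rsync's --hard-links option lets the resulting tree be propagated to standbys efficiently:

```
# Upgrade in place, hard-linking unchanged data files:
pg_upgrade --link \
    --old-bindir=/usr/lib/postgresql/9.2/bin \
    --new-bindir=/usr/lib/postgresql/9.5/bin \
    --old-datadir=/var/lib/postgresql/9.2/main \
    --new-datadir=/var/lib/postgresql/9.5/main

# Mirror the hard-link tree to a standby in one pass:
rsync --archive --hard-links --delete \
    /var/lib/postgresql/ standby:/var/lib/postgresql/
```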

He also wondered if Uber truly understood the write amplification problem and its implications. That is a theme that was taken up by Markus Winand in a blog post on the switch. In it, he agreed that MySQL might be the best choice for Uber's use case, but felt like there were some missing pieces in the explanation of the company's problems. The source of the problem is that Uber has an update-heavy workload that is evidently updating one or more of the indexed columns; otherwise PostgreSQL already has a solution:

However, there is a little bit more speculation possible based upon something that is not written in Uber's article: The article doesn't mention PostgreSQL Heap-Only-Tuples (HOT). From the PostgreSQL source, HOT is useful for the special case "where a tuple is repeatedly updated in ways that do not change its indexed columns." In that case, PostgreSQL is able to do the update without touching any index if the new row-version can be stored in the same page as the previous version. The latter condition can be tuned using the fillfactor setting. Assuming Uber's Engineering is aware of this means that HOT is no solution to their problem because the updates they run at high frequency affect at least one indexed column.
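The fillfactor tuning Winand mentions is plain PostgreSQL DDL. For a hypothetical table (the name and percentage are illustrative):

```
-- Reserve 30% of each heap page for new row versions, so that updates
-- which leave indexed columns untouched can stay HOT (no index writes).
-- Existing pages are only repacked on a table rewrite.
ALTER TABLE trips SET (fillfactor = 70);
```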

He wondered if all of the indexes were actually needed, but that cannot be determined from the information in Uber's blog post. That post mentions InnoDB's indirection as an advantage, though, and downplays the penalty that comes from that indirection. Winand calls it the "clustered index penalty" and suggested that it can be substantial if queries are made using the secondary keys. The fact that Uber downplayed that penalty makes it appear that most of the queries use the primary index, which doesn't suffer from the penalty.

In the end, Winand concluded—in a somewhat snarky way—that what Uber is looking for is a key/value store with a SQL front-end. Since InnoDB is a "pretty solid and popular key/value store" and MySQL (and MariaDB) provides SQL on top of it, it makes sense that it works well for the company.

Simon Riggs also addressed Klitzke's post with a blog post of his own. He welcomed Uber raising the points that it did, but was concerned that "a number of important technical points are either not correct or not wholly correct because they overlook many optimizations in PostgreSQL that were added specifically to address the cases discussed". He noted the penalty for using indirect indexing on secondary keys, as MySQL does, but also pointed out that PostgreSQL could use indirect indexes some day as well:

Thus, it is possible to construct cases in which PostgreSQL consistently beats InnoDB, or vice versa. In the “common case” PostgreSQL beats InnoDB on reads and is roughly equal on writes for btree access. What we should note is that PostgreSQL has the widest selection of index types of any database system and this is an area of strength, not weakness.

The current architecture of PostgreSQL is that all index types are “direct”, whereas in InnoDB primary indexes are “direct” and secondary indexes “indirect”. There is no inherent architectural limitation that prevents PostgreSQL from also using indirect indexes, though it is true that has not been added yet.

Riggs also said that statement-level replication has performance and corner-case problems that make it unsuitable for PostgreSQL. It does save bandwidth, as Klitzke pointed out, but can lead to hard-to-diagnose replication problems.

The discussion on the mailing list was largely even-tempered and focused on the problems at hand, much like all three of the blog posts mentioned above. Some solutions were considered and might become PostgreSQL features down the road. In the end, publicly losing a big user like Uber is perhaps a little unfortunate—and it does seem like there may have been other factors in play, such as a new pro-MySQL CTO—but it is in no way a condemnation of PostgreSQL as a whole. In fact, as Berkus put it: "Even if they switched off, it's still a nice testimonial that they once ran their entire worldwide fleet off a single Postgres cluster."

Overall, the "incident" demonstrates a sensible approach to criticism of a project: find the pieces that are truly problems and look at how to solve them. In the case of Uber, it may well be that it is best served by MySQL, but it is also likely that others with different needs will see things differently. Having several open-source software choices means that everyone can choose the right tool for their job.


Some news from LWN

By Jonathan Corbet
August 3, 2016

It has been some time since our last update on the state of LWN itself. That's somewhat by design, as we'd rather be writing about the community and the code than ourselves. Occasionally, though, we do like to update our readers and subscribers on the state of the operation, especially when there is some news to report, as is the case now.

We'll start with the sad (for LWN and its readers) news: Nathan Willis, who has been an LWN contributor for many years and an employee since 2012, will be stepping down at the end of September to pursue an unmissable opportunity to study one of his non-journalistic passions: fonts and type design. We will miss him, but we believe strongly in following our own paths in life and wish him well.

Nate will continue to contribute articles to LWN. But we suspect that the intricacies of Béziers, brush strokes, and kerning are going to take a lot of time and attention, meaning that we will be needing somebody to help fill his shoes. Thus, LWN is hiring. If you would like to write full-time for one of the most discriminating readerships in the world — but also one of the most interesting, engaged, and supportive readerships — we would like to hear from you. This is your chance to make your mark on one of the community's oldest publications.

Speaking of "oldest," the basic format of LWN's Weekly Edition has changed little over the last 18 years. Some pages have come and gone (long-time readers will remember the desktop page, or the once-interesting "Linux in the News" page), but substantive changes have been few indeed. That format has served us well over the years; among other things, it helps us to ensure that each edition covers a wide range of topics. But it can also be somewhat limiting; it is a sort of treadmill of slots to be filled each week that makes it hard to focus on specific areas in response to what is happening in the community.

In an attempt to address those issues, and also partially driven by the prospect of being editorially understaffed for a while, we may start to experiment a bit with the format of the edition. There will be no radical or abrupt changes, but you may see us trying out some ideas from one week to the next. As always, we will welcome feedback or suggestions for changes that readers think should be made.

LWN is, of course, a subscription-supported operation. Growth in the number of subscribers is thus critical to the growth of LWN as a whole. Unfortunately, that growth has not been happening for a few years; in the last year we have, in fact, seen a slight decline. Our financial situation is secure for now, but we would like to see subscriptions grow, which would help provide even more security as well as more resources to expand what we do. So we would like to ask our readers: if you are reading this without a subscription, please consider how LWN is created and whether it is worth supporting. If you routinely provide subscriber links to friends, please consider encouraging them to subscribe. If you work in a company with an interest in Linux, consider asking your employer to get a group subscription for everybody there.

Along the same lines, advertising revenue, which was never a huge part of LWN's income, has shrunk in recent years; this is not unique to LWN, as the whole industry is complaining about the problem. We have never felt particularly good about advertising in the first place; it is an industry with more than its share of privacy problems, and the ads we get are often not appropriate to LWN's readers. We would like to drop ads altogether, but can't quite afford to do that. If, however, subscriptions were to return to a growth path sufficient to replace the revenue we would lose, we would happily consider leaving advertisements behind. There is no doubt that LWN would be better without them.

In summary, LWN looks to be heading into a period of moderate change. One thing that will not change, though, is our commitment to producing the highest-quality coverage of the Linux and free software community available anywhere. With your support, we'll be at this for a long time yet.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: Felony PGP; New vulnerabilities in dropbear, mozilla, tiff, wireshark, ...
  • Kernel: 4.8 Merge window part 2; Hardened usercopy; 4.7 Development statistics.
  • Distributions: Disallowing perf_event_open(); TP-Link agrees to allow third-party firmware, Debian and Tor Services available as Onion Services, ...
  • Development: Free software and smartcards; Firefox 48; Django 1.10; LibreOffice 5.2; ...
  • Announcements: SPI board election, The End of Gmane?, ...

Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds