event-stream, npm, and trust
Malware inserted into a popular npm package has put some users at risk of losing Bitcoin, which is certainly worrisome. More concerning, though, are the implications of how the malware got into the package—and how the package got distributed. This is not the first time we have seen package-distribution channels exploited, nor will it be the last, but the underlying problem requires more than a technical solution. It is, fundamentally, a social problem: trust.
Npm is a registry of JavaScript packages, most of which target the Node.js event-driven JavaScript framework. As with many package repositories, npm helps manage dependencies so that picking up a new version of a package will also pick up new versions of its dependencies. Unlike, say, distribution package repositories, however, npm is not curated—anyone can put a module into npm. Normally, a module that wasn't useful would not become popular and would not get included as a dependency of other npm modules. But once a module is popular, it provides a ready path to deliver malware if the maintainer, or someone they delegate to, wants to go that route.
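To make the automatic-propagation aspect concrete, here is a small sketch using the third-party semver package (which implements the same range rules npm applies); the version numbers are hypothetical:

```js
// Why a malicious patch release propagates automatically: the default
// npm-style range ("^") accepts any newer compatible version.
// Hypothetical versions; uses the third-party "semver" package.
const semver = require('semver');

// A dependency declared as "some-module": "^3.3.4" means "3.3.4 or any
// later 3.x release", so a freshly published 3.3.6 is picked up on the
// next install without anyone changing a line of their own code.
console.log(semver.satisfies('3.3.6', '^3.3.4'));   // true:  pulled in automatically
console.log(semver.satisfies('4.0.0', '^3.3.4'));   // false: major bumps are excluded
```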
That is just what happened with the event-stream package, as was recently discovered. The package allows creating streams that can be used both for I/O and for event handling. Its maintainer, Dominic Tarr, had stopped using the package some time ago, so his interest in maintaining it was low. As he noted in a comment on the bug report filed in the event-stream GitHub repository, someone volunteered to take it over:
As detailed in a blog post by Zach Schneider, who plucked various pieces out of the voluminous GitHub bug report thread, the attack that was inserted by the new maintainer, "right9ctrl", was clever. The commit log of changes right9ctrl made to event-stream was fairly innocuous; even the commit that added the malware was simply adding a new dependency on another npm module: flatmap-stream.
Had anyone looked, flatmap-stream might have seemed a bit of an odd dependency: it had one contributor and no downloads prior to its inclusion. Its contents might seem reasonable at first glance, but a tangled chain of malware is hidden inside.
The flatmap-stream npm package had an extra file added into it that was not in the GitHub repository. It also had "minified" code that read the AES256-encrypted data stored in that file, using the parent package's npm_package_description as the key. For all except one npm package, that decryption would fail (and be ignored), but for the victim package it resulted in JavaScript code that would be executed. That code decrypts a different chunk of the "extra" file, yielding the payload code, which, naturally, also gets executed.
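To make that concrete, here is a minimal sketch of the technique (not the actual flatmap-stream code; the file name, key derivation, and layout are simplified stand-ins):

```js
// Minimal sketch of the loader technique described above. NOT the actual
// flatmap-stream code; file name, key derivation, and layout are hypothetical.
const crypto = require('crypto');
const fs = require('fs');

function tryDecrypt(buf, password) {
  try {
    // Derive an AES-256 key from the password; the first 16 bytes of the
    // blob are treated as the IV in this sketch.
    const key = crypto.createHash('sha256').update(password).digest();
    const decipher = crypto.createDecipheriv('aes-256-cbc', key, buf.slice(0, 16));
    return Buffer.concat([decipher.update(buf.slice(16)), decipher.final()]).toString('utf8');
  } catch (e) {
    return null;   // wrong key: padding error or garbage, so silently give up
  }
}

// npm exposes the *parent* package's description to scripts through the
// environment, so only one package in the world yields a working key.
const password = process.env.npm_package_description || '';
const blob = fs.readFileSync('./test/data.bin');       // the "extra" file
const stage2 = tryDecrypt(blob, password);
if (stage2 !== null) {
  // For the victim, the plaintext is JavaScript that decrypts and runs the
  // real payload from another chunk of the same file.
  eval(stage2);
}
```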
As determined by brute-forcing the key from a list of all the npm package descriptions, the victim package was copay-dash, which is a "secure bitcoin wallet platform" from a company called Copay. Given the presence of the word "bitcoin", one can probably guess what the malware ultimately targeted. It would send account information to the attacker, who would, presumably, use it to abscond with the Bitcoin.
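That brute-force step is simple to sketch, reusing the simplified key derivation from the sketch above and hypothetical helper names (the real analysis scripts differ): try every known description as the key and keep whichever one decrypts to something that parses as JavaScript.

```js
// Sketch of the key-recovery step; hypothetical helper names and the same
// simplified key derivation as in the loader sketch above.
const crypto = require('crypto');

function decryptsToJavaScript(blob, password) {
  try {
    const key = crypto.createHash('sha256').update(password).digest();
    const decipher = crypto.createDecipheriv('aes-256-cbc', key, blob.slice(0, 16));
    const plain = Buffer.concat([decipher.update(blob.slice(16)), decipher.final()]).toString('utf8');
    new Function(plain);          // does the plaintext at least parse as JavaScript?
    return true;
  } catch (e) {
    return false;                 // wrong key, or the result is not valid JS
  }
}

// `descriptions` would be every package description from the npm registry.
function findVictim(descriptions, blob) {
  return descriptions.find((desc) => decryptsToJavaScript(blob, desc));
}
```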
The dependency on flatmap-stream only lasted a little over ten days before it was replaced with a non-malware implementation of a "flat map" in event-stream itself. The npm blog post about the incident says that it was the Copay build process that was being subverted:
Copay's initial response was that no builds containing this malicious code were released to the public, but we now have confirmation from Copay that "the malicious code was deployed on versions 5.0.2 through 5.1.0."
As Schneider noted, the JavaScript-development community is particularly vulnerable to this kind of problem:
He goes on to note that JavaScript applications tend to be fast moving: "its users install a lot of packages and updates, and are thus vulnerable to malicious updates". On the other hand, problems can also occur from not updating frequently enough, he said, pointing to the Equifax breach. He suggested two ways to avoid this kind of thing in the future: locking the version numbers of dependencies to "known good" versions and paying attention to the dependencies a project is adding.
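In npm terms, the first suggestion amounts to declaring exact versions (and committing a lockfile) rather than floating semver ranges; a hypothetical helper script along these lines could flag anything that still floats:

```js
// Hypothetical helper: fail a CI step if any dependency in package.json
// still uses a floating semver range instead of an exact version.
const fs = require('fs');

const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf8'));
const deps = Object.entries({ ...pkg.dependencies, ...pkg.devDependencies });

// "^1.2.3", "~1.2.3", ">=1.0.0", "*", and "latest" all float; "1.2.3" does not.
const floating = deps.filter(([, spec]) => !/^\d+\.\d+\.\d+$/.test(spec));

for (const [name, spec] of floating) {
  console.log(`not pinned to an exact version: ${name}@${spec}`);
}
process.exitCode = floating.length ? 1 : 0;
```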
We have seen other related mayhem in the npm world before. Back in 2016, a developer deleting a simple left-pad npm module "broke the internet" because so much of the rest of the npm ecosystem relied on it to pad strings.
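For a sense of scale, the module's entire job is padding a string on the left; a functionally similar sketch (not the original left-pad source) fits in a few lines:

```js
// Functionally similar sketch of left-padding a string; not the original
// left-pad source.
function leftPad(str, len, ch = ' ') {
  str = String(str);
  while (str.length < len) {
    str = ch + str;              // prepend the pad character until long enough
  }
  return str;
}

console.log(leftPad(42, 5, '0'));  // "00042"
```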
But the problem is not at all restricted to npm or JavaScript. Other languages have similar problems with their non-curated package repositories. Typosquatting is a related problem that has occurred with some frequency as well. Beyond that, it is not even just a problem for languages; as Dirk Hohndel pointed out in a talk back in May, today's containers are built up from many constituent parts gathered from all over the internet. Most of the container creators have no idea what is actually in them, what versions of code are being used, and so on. Docker and similar technologies are also part of the "move fast" school of development.
Certainly there have been some failures even in curated repositories—humans are not infallible. But curation and "move fast" tend not to play all that well together, which is why there is always such tension between the language-specific installation methods (e.g. npm, pip) and a distribution's package-management system. Users often just want the latest and greatest; they are not willing to wait for a distribution to get around to packaging it. That may be reasonable for a personal desktop or laptop—there are obvious risks (e.g. Bitcoin wallets) but they may be considered manageable—but the public release or deployment of a web application or component seems like it warrants a higher level of scrutiny.
Beyond more scrutiny, which development teams should surely be applying whether or not it slows things down, package maintenance is an area that clearly needs to be addressed. Tarr created a package that was useful to some, but apparently got no help in maintaining it. Once he had shared it, there was no real way to "unshare" it, as the left-pad fiasco shows, but he had lost interest in maintaining it. In his statement about the event-stream malware, Tarr noted that the problem is widespread:
He continued by noting that sharing commit and publish rights was a longstanding npm-community practice. "Open source is driven by sharing! It's great! it worked really well before bitcoin got popular." He suggested that people should either pay the maintainers of the packages they use or step up to help maintain the packages they depend on.
Once again, this is not in any way an "npm problem". The explosion of availability of open-source software has not really been met with a concomitant increase in the number of maintainers. There are, it seems, a lot of companies and others that are using open source without truly considering what that means. Even large projects like the Linux kernel suffer from a dearth of maintainers in some areas and events like Heartbleed exposed the maintenance problem for critical internet infrastructure like OpenSSL. Heartbleed led to the founding of the Core Infrastructure Initiative, but it is hard to see that kind of effort being extended down to the "leaves"—fixing it really requires users to step up.
| Index entries for this article | |
| --- | --- |
| Security | Backdoors |
| Security | Package repositories |
Posted Nov 29, 2018 1:50 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Nov 29, 2018 9:57 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (12 responses)
Posted Nov 29, 2018 11:47 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (1 responses)
Posted Nov 30, 2018 10:35 UTC (Fri)
by Lennie (subscriber, #49641)
[Link]
My guess is they would also use penetration testers, so why not someone who checks dependent packages.
Now I do think it would be better to pool the effort, by having big companies pay for a package version to be checked and then registering somewhere on the NPM website which version was checked by whom.
I also think maybe minified code should somehow be blocked in NPM packages? Or at least discouraged, possibly flagged.
Or maybe packages should get a grade after being checked with code-analysis tools. Maybe that could be a start.
So there are at least some things that can be done.
Posted Nov 29, 2018 12:39 UTC (Thu)
by excors (subscriber, #95769)
[Link] (9 responses)
Unless you only make tiny projects where you can read and understand every line of code that runs on the CPU, you inevitably have to put trust in other groups to provide safe software that you can build on.
I suspect the cost of trusting a group is fairly constant, regardless of the size of the group - it's a similar risk whether it's one guy writing a 350-line event-stream module or a large team of dozens of developers from Google or Apache or wherever providing a million-line framework. A large team has more people who could turn out to be malicious; but it also has internal code review and maybe external security audits, and its members may be paid a salary so they don't need to resort to crime, and the members are probably not anonymous so they can't easily escape punishment for their actions. The risk is not zero, but it's not hundreds of times higher than for the small single-developer module.
If the cost per dependency is roughly constant, you ought to prefer a small number of large dependencies over a large number of small dependencies. But the JS community (and some others) seem to take completely the opposite approach. If a module developer wants a basic feature, they could write it themselves or at least copy a code fragment from Stack Overflow into their module - but the culture is that they should depend on a tiny third-party module that provides that feature. And that tiny module probably depends on another tiny module from another developer. They see the value of code reuse, but appear to completely ignore that it comes with the cost of trusting more groups of developers, and small modules don't provide enough value to justify that cost.
Posted Nov 29, 2018 13:57 UTC (Thu)
by chris.sykes (subscriber, #54374)
[Link] (1 responses)
This hits the nail squarely on the head IMO. A hardware analogy would be the selection of components and vetting of suppliers/manufacturers for a PCB design. Every unique component and supplier has both an up-front development, and on-going maintenance cost.
I was recently shocked to find over 450 dependencies in 'node_modules' after running 'npm install' while following a tutorial for a popular web framework!
Posted Nov 30, 2018 1:23 UTC (Fri)
by excors (subscriber, #95769)
[Link]
450 sounded impressive until I found this tool which says copay-dash has 1277 dependencies from 378 maintainers. And I think that's only the runtime dependencies, not the 'dev dependencies' that are needed for building and testing.
Posted Nov 29, 2018 14:51 UTC (Thu)
by martin.langhoff (subscriber, #61417)
[Link] (5 responses)
Yes, a thousand times. A large standard library that is consistent in its API and is well-maintained is the established pattern for most languages. Trusting hundreds of mini-libraries and their maintainers is fraught with risk.
I can't wait for the NPM world to move towards a small number of "batteries included" libraries.
Posted Nov 30, 2018 3:26 UTC (Fri)
by roc (subscriber, #30627)
[Link] (4 responses)
We can have small standard libraries and tackle the package management, trust and consistency issues directly. It would be no more difficult to attach machine readable trust labels to packages than to combine them into a standard library.
Posted Nov 30, 2018 10:49 UTC (Fri)
by smcv (subscriber, #53363)
[Link]
Perl?
Posted Nov 30, 2018 11:31 UTC (Fri)
by niner (subscriber, #26151)
[Link]
It has learned from Perl 5 where all of the functionality mentioned is available in some module on CPAN but due to the lack of standard types, there are incompatibilities because module Foo handles DateTime objects while module Bar deals with Date::Manip::Date.
Posted Nov 30, 2018 12:31 UTC (Fri)
by excors (subscriber, #95769)
[Link]
How would trust labels help? It seems to me like the important issue is a social problem, not a technical problem. And arguably the technical problems actually help solve the social problem.
With C++, there are groups who take responsibility for large swathes of code - I could write a reasonable-sized application using libc, libstdc++, Qt, Boost, and not much else. For each of those groups, I can look at what processes they have for ensuring code quality, I can look at their track record and reputation, I know where to report serious problems and can expect a timely response, I can read the news to find out if they become dysfunctional or change ownership, etc. That's feasible since there's only a few. I can't do that if I've got dependencies from hundreds of independent sources.
I agree those large projects exist partly because of C++'s technical deficiencies - I've used libraries from Boost not because they're the best but just because I don't want the hassle of adding a new dependency into the build system and packaging scripts. Library authors try to get into Boost because they know lazy people like me will be more likely to use their library - the incentive is exposure and popularity. But the side effects are that they go through Boost's design review process with a bunch of smart people, they are held to Boost's standards for documentation and testing, other Boost members will take over maintenance if they're abandoned, etc. Similar incentives and side effects apply to the C++ standard library.
Without C++'s technical limitations forcing that kind of conglomeration, how else can we get those positive side effects? If you solve the packaging problem so that it's just as easy to use a random untrustworthy GitHub user's smart pointer library as it is to use Boost.SmartPtr, what incentive is there for anyone to use or contribute to a project like Boost, and how could such a project ever get off the ground? And without projects like Boost that maintain certain standards across a large amount of code, how we can trust the code we rely on?
Posted Nov 30, 2018 17:26 UTC (Fri)
by jezuch (subscriber, #52988)
[Link]
Case in point: it took I think close to a decade to untangle the large monolithic standard library of Java (which was ostensibly already split into different packages) into separate modules which can be independently included or excluded.
Posted Nov 29, 2018 16:18 UTC (Thu)
by hkario (subscriber, #94864)
[Link]
you don't have to verify every piece of the puzzle; you can delegate that to other people (distribution providers), but they need to be different people from the ones who wrote and released the software in the first place
Posted Nov 29, 2018 10:53 UTC (Thu)
by federico3 (guest, #101963)
[Link] (2 responses)
This is clearly a problem in the JavaScript community. It's well known for encouraging cowboy deployments into production, using npm.
Many other languages, including Python, can be deployed from Linux distributions. Distributions review, rebuild, test, bake in, and vet libraries and applications.
JavaScript, and also languages that encourage static linking and dependency vendoring, are hostile to packaging.
Posted Nov 29, 2018 12:49 UTC (Thu)
by hkario (subscriber, #94864)
[Link] (1 responses)
but then it's necessary because the concept of backwards compatibility is foreign to many of the same people
they've ignored lessons learned over half a century of software development so they are still having the same problems all over again
Posted Nov 29, 2018 14:14 UTC (Thu)
by Herve5 (subscriber, #115399)
[Link]
Go...
I have been following the development of an outbound-request-filtering utility named OpenSnitch (https://www.opensnitch.io/) which indeed imposes so much tree cloning that even me, the ignorant, was worried.
Knowing that Opensnitch is something about safety turned things even worse. Now I really don't know if this really is the thing to do ;-)
Posted Nov 30, 2018 20:39 UTC (Fri)
by gnu_lorien (subscriber, #44036)
[Link] (3 responses)
Posted Dec 2, 2018 10:26 UTC (Sun)
by paulj (subscriber, #341)
[Link] (2 responses)
"I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode".
To conclude:
"You can't trust code that you did not totally create yourself. … No amount of source-level verification or scrutiny will protect you from using untrusted code."
That point seems fundamental, and no one has managed to disprove it.
Posted Dec 5, 2018 0:49 UTC (Wed)
by david.a.wheeler (subscriber, #72896)
[Link] (1 responses)
https://dwheeler.com/trusting-trust/
This means that you really can review the source code.
The problem, in this case, is that control of the source code was handed to someone who was not trustworthy, and there is no meaningful review of its source code before it is included in other systems. That is an important but different problem.
Posted Dec 5, 2018 10:00 UTC (Wed)
by paulj (subscriber, #341)
[Link]
The DDC technique raises the bar for another Thompson to carry out Thompson's specific attack, but Thompson was making a more general and fundamental point: Trust is unavoidable, even with DDC.
Posted Dec 3, 2018 8:28 UTC (Mon)
by Yui (guest, #118557)
[Link]
Posted Dec 3, 2018 11:00 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
However, when the "wheel" is a one-liner, and when you first need to find said library, figure out its API, make sure that it actually works, check that the license is compatible with your project, and so on… isn't it quicker to just write the one line you need?
Moreover, I've seen npm modules ship example websites with huge images, include OpenSSL header files, binary files, and various generic crap that should not be there.