SCALE 8x: Ten million and one penguins
At SCALE 8x, Ronald Minnich gave a presentation about the
difficulties in trying to run millions of Linux kernels for simulating
botnets. The idea is to be able to run a botnet "at scale" to
try to determine how it behaves. But, even with all of the compute power
available to researchers at the US Department of Energy's Sandia National
Laboratories—where Minnich works—there are still various
stumbling blocks to be overcome.
While the number of systems participating in botnets is open to argument,
he said, current estimates are that there are ten million systems
compromised in the US alone. He listed the current sizes of various
botnets, based on a Network
World article, noting that "depending on who you talk to, these numbers are
either low by an order of magnitude or high by an order of magnitude". He also said that it
is no longer reported when thousands of systems are added into a botnet,
instead the reports are of thousands of organizations whose systems have
been compromised. "Life on the Internet has started to really suck."
Botnet implementations
Botnets are built on peer-to-peer (P2P) technology that largely came from
file-sharing applications—often for music and movies—which were
shut down by the RIAA. This made the Overnet, which was an ostensibly
legal P2P network, into an illegal network, but, as he pointed out, that
didn't make it disappear. In fact, those protocols and algorithms are
still being used: "being illegal didn't stop a damn thing".
For details, Minnich recommended the Wikipedia articles on
subjects like the Overnet, eDonkey2000, and Kademlia distributed hash
table.
P2P applications implemented Kademlia to identify other nodes in a network
overlaid on the Internet, i.e. an overnet. Information could be stored and
retrieved from the nodes participating in the P2P network. That
information could be movies or songs, but it could also be executable
programs or scripts. It's a "resilient distributed store".
He also pointed out that computer scientists have been trying to build
large, resilient distributed systems for decades, yet they had little or
nothing to do with the one working example we have today; in fact, it is
apparently being maintained with money from organized crime syndicates.
Because the RIAA has shut down any legal uses of these protocols, they are
difficult to study: "The good guys can't use it, but it's all there for the
bad guys". And the bad guys are using it, though it is difficult to get
accurate numbers as he mentioned earlier. The software itself is written
to try to hide its presence, so that it only replies to some probes.
Studying botnets with supercomputers
In the summer of 2008, when Estonia "went down, more or less"
and had to shut down its Internet because of an attack, Minnich and his
colleagues started thinking about how to model these kinds of attacks. He
likened the view of an attack to the view a homeowner might get of a forest
fire: "my house is on fire, but what about the other side of
town?
". Basically, there is always a limited view of what is being
affected by a botnet—you may be able to see local effects, but the
effects on other people or organizations aren't really known: "we
really can't get a picture of what's going on".
So, they started thinking about various supercomputer systems they
have access to: "Jaguar" at Oak Ridge which has 180,000 cores in 30,000
nodes, "Thunderbird" at Sandia with 20,000 cores and 5,000 nodes, and
"a lot of little 10,000 core systems out there
". All of them
run Linux, so they started to think about running "the real
thing
"—a botnet with ten million systems. By using these
supercomputers and virtualization, they believe they could actually run a
botnet.
Objections
Minnich noted that there have been two main objections to this idea. The
first is that the original botnet authors didn't need a supercomputer, so
why should one be needed to study them? He said that much of the research
for the Storm botnet was done by academics (Kademlia) and by the companies
that built the Overnet. "When they went to scale up, they just went to the
Internet". Before the RIAA takedown, the network was run legally on
the Internet, and after that "it was done by deception".
The Internet is known to have "at least dozens of nodes", really,
"dozens of millions of nodes", and the Internet was the
supercomputer that was used to develop these botnets, he said. Sandia
can't really
use the Internet that way for its research, so they will use their in-house
supercomputers instead.
The second objection is that "you just can't simulate it".
But Minnich pointed out that every system suffers from the same
problem—people don't believe it can be simulated—yet simulation
is used very successfully. They believe that they can simulate a botnet
this way, and "until we try, we really won't know". In
addition, researchers of the Storm botnet called virtualization the "holy
grail" that allowed them to learn a lot about the botnet.
Why ten million?
There are multiple attacks that we cannot visualize on a large scale,
including denial of service, exfiltration of data, botnets, and virus
transmission, because we are "looking at one tiny corner of the
elephant and trying to figure out what the elephant looks like", he
said. Predicting this kind of behavior can't be done by running 1000 or so
nodes, so a more detailed simulation is required. Botnets exhibit
"emergent behavior", and pulling them apart or running them at smaller
scales does not work.
For example, the topology of the Kademlia
distributed hash network falls apart if there aren't enough (roughly
50,000) nodes in the network. The botnet nodes are designed to stop
communicating if they are disconnected too long. One researcher would hook
up a PC at home to capture the Storm botnet client, then bring it into work
and hook it up to the research botnet immediately
because if it doesn't get connected to something quickly, "it just dies".
And if you don't have enough connections, the botnet dies: "It's kind of
like a living organism".
So, they want to run ten million nodes, including routers, in a
"nation-scale" network. Since they can't afford to buy that many machines,
they will use virtualization on the supercomputer nodes to scale up to that
size. They can "multiply the size of those machines by a thousand" by
running that many virtual machines on each node.
Using virtualization and clustering
Virtualization is a nearly 50-year-old technique to run multiple kernels in
virtual machines (VMs) on
a single machine. It was pioneered by IBM, but has come to Linux
in the last five years or so. Linux still doesn't have all of the
capabilities that IBM machines have, in particular, arbitrarily deep
nesting of VMs:
"IBM has forgotten more about VMs than we know
". But, Linux
virtualization will allow them to run ten million nodes on a cluster of
several thousand nodes, he said.
The project is tentatively called "V-matic" and they hope to release the code at the SC10 conference in November. It consists of the OneSIS cluster management software that has been extended based on what Minnich learned from the Los Alamos Clustermatic system. OneSIS is based on having NFS-mounted root filesystems, but V-matic uses lightweight RAMdisk-based nodes.
When you want to run programs on each node, you collect the binaries and
libraries and send them to each node. Instead of doing that iteratively,
something called "treespawn" was used, which would send the binary bundle
to 32 nodes at once, and each of those would send to 32 nodes. In that
way, they could bring up a 16M image on 1000 nodes in 3 seconds. The NFS
root "couldn't come close
" to that performance.
Each node requires a 20M footprint, which means "50 nodes per
gigabyte". So, a laptop is just fine for a 100-node cluster, which
is something that Minnich routinely runs for development. "This VM
stuff for Linux is just fantastic", he said. Other cluster
solutions just can't compete because of their size.
For running on the Thunderbird cluster, which consists of nodes that are
roughly five years old, they were easily able to get 250 VMs per node.
They used lguest virtualization because the Thunderbird nodes were
"so old they didn't have hardware virtualization". For more
modern clusters, they can easily get 1000 VMs per node using KVM. Since they have
10,000 node Cray XT4 clusters at Sandia, they are confident they can get to
ten million nodes.
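The arithmetic behind that confidence is straightforward; here is a quick back-of-the-envelope check using only the node counts, VM densities, and per-VM footprint quoted in the talk:

```python
# Back-of-the-envelope check using the figures quoted in the talk.
thunderbird_nodes, vms_per_old_node = 5_000, 250      # lguest on old hardware
cray_xt4_nodes, vms_per_new_node = 10_000, 1_000      # KVM on newer hardware

print(f"Thunderbird: ~{thunderbird_nodes * vms_per_old_node:,} virtual nodes")
print(f"Cray XT4:    ~{cray_xt4_nodes * vms_per_new_node:,} virtual nodes")

# Memory side: a 20MB footprint per VM is about 50 VMs per GB of RAM,
# so a 100-node development cluster fits comfortably on a laptop.
print(1024 // 20, "VMs per GB of RAM")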
Results so far
So far, they have gotten to 1 million node systems on Thunderbird. They
had one good success and some failures in those tests. The failures were
caused by two things: InfiniBand not being very happy being rebooted all the
time, and the BIOS on the Dell boxes using Intelligent Platform Management
Interface (IPMI), which Minnich did not think very highly of. In fact,
Minnich has a joke about how to tell when a standard "sucks": if
it starts with an "I" (I2O), ends with an "I" (ACPI, EFI), or has the word
"intelligent" in it somewhere; IPMI goes three-for-three on that scale.
So "we know we can do it
", but it's hard, and not for very
good reasons, but for "a lot of silly reasons
".
Scaling issues
Some of the big problems that you run into when trying to run a
nation-scale network are the scaling issues themselves. How do you
efficiently start programs on hundreds of thousands of nodes? How do you
monitor millions of VMs? There are tools to do all of that "but all
of the tools we have will break—actually we've already broken them
all". Even the monitoring rate needs to be adjusted for the size of
the network. Minnich is used to monitoring cluster nodes at 6Hz, but most
big cluster nodes are monitored every ten minutes or
1/600Hz—otherwise the amount of data is just too overwhelming.
Once the system is up, and is being monitored, then they want to attack
it. It's pretty easy to get malware, he said, as "you are probably
already running it". If not, it is almost certainly all over your
corporate network, so "just connect to the network and you've
probably got it".
Trying to monitor the network for "bad" behavior is also somewhat
difficult. Statistically separating bad behavior from normal behavior is a
non-trivial problem. Probing the networking stack may be required, but
must be done carefully to avoid "the firehose of data".
In a ten million node network, a DHCP file is at least 350MB, even after you
get rid of the colons "because they take up space", and parsing the
/etc/hosts file can dominate startup time. If all the nodes can
talk to all other nodes, the kernel tables eat all of memory; "that's
bad". Unlike many of the other tools, DNS is designed for this
"large world", and they will need to set that up, along with the BGP
routing protocol so that the network will scale.
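To get a feel for why flat configuration files stop working at this scale, here is a rough size estimate for a ten-million-entry /etc/hosts; the address and hostname formats are invented purely for illustration:

```python
# Rough size of a flat /etc/hosts covering ten million virtual nodes.
# The "10.x.y.z  vnodeNNNNNNN" entry format here is made up for illustration.
NODES = 10_000_000
sample_entry = "10.123.45.67  vnode1234567\n"
size_mb = NODES * len(sample_entry) / 1e6
print(f"/etc/hosts for {NODES:,} nodes: ~{size_mb:.0f} MB")

# Every process that resolves names by scanning a file like this pays for all
# of it, which is why the project leans on DNS and BGP rather than flat files.
```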
Earlier experiments
In an earlier experiment, on a 50,000 node network, Minnich modeled the Morris worm and learned some interesting things. Global knowledge doesn't really scale, so thinking in terms of things like /etc/hosts and DHCP configuration is not going to work; self-configuration is required. Unlike the supercomputer world, you can't expect all of the nodes to always be up, nor can you really even know if they are. Monitoring data can easily get too large. For example, 1Hz monitoring of 10 million nodes results in 1.2MB per second of data if each node only reports a single bit—and more than one bit is usually desired.
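That monitoring figure is easy to check; the calculation below simply restates the numbers from the paragraph above:

```python
# Ten million nodes, each reporting a single bit once per second (1Hz).
nodes = 10_000_000
bits_per_sample = 1
bytes_per_second = nodes * bits_per_sample / 8
print(f"{bytes_per_second / 1e6:.2f} MB/s")   # 1.25 MB/s, roughly the 1.2MB/s cited above
```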
There is so much we don't know about a ten million node network, Minnich said. He would like to try to do a TCP-based denial of service from 10,000 nodes against the other 9,990,000. He has no idea whether it would work, but it is just the kind of experiment that this system will be able to run.
For a demonstration at SC09, they created a prototype botnet ("sandbot") using 8000 nodes and some very simple rules, somewhat reminiscent of Conway's Game of Life. Based on the rules, the nodes would communicate with their neighbors under certain circumstances and, once they had heard from their neighbors enough times, would "tumble", resetting their state to zero. The nodes were laid out on a grid whose cells were colored based on the state of each node, so that pictures and animations could be made. Each node that tumbled would be colored red.
Once the size of the botnet got over a threshold somewhere between 1,000 and 10,000 nodes, the behavior became completely unpredictable. Cascades of tumbles, called "avalanches", would occur with some frequency, and occasionally the entire grid turned red. Looking at the statistical features of how the avalanches occur may be useful in detecting malware in the wild.
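The talk did not spell out the exact rules, but the description resembles a sandpile-style cellular automaton. The sketch below is only a guess at sandbot's flavor: the grid size, threshold, and random driving loop are all invented here, and the real demonstration ran as thousands of communicating VMs rather than one array in one process.

```python
import random

SIZE, THRESHOLD = 32, 4   # invented numbers, purely for illustration

def poke(grid, y, x):
    """Deliver one message to node (y, x) and let tumbles cascade. A node
    that has heard from its neighbors THRESHOLD times "tumbles": it resets
    its state to zero and notifies its four grid neighbors. Returns the
    number of tumbles, i.e. the size of the resulting avalanche."""
    grid[y][x] += 1
    work, tumbles = [(y, x)], 0
    while work:
        cy, cx = work.pop()
        if grid[cy][cx] < THRESHOLD:
            continue
        grid[cy][cx] = 0
        tumbles += 1
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = cy + dy, cx + dx
            if 0 <= ny < SIZE and 0 <= nx < SIZE:
                grid[ny][nx] += 1
                work.append((ny, nx))
    return tumbles

grid = [[0] * SIZE for _ in range(SIZE)]
sizes = [poke(grid, random.randrange(SIZE), random.randrange(SIZE))
         for _ in range(20_000)]
print("largest avalanches seen:", sorted(sizes)[-5:])
```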
Conclusion
There is still lots of work to be done, he said, but they are making progress.
It will be interesting to see what kind of practical results come from this
research. Minnich and his colleagues have already learned a great deal
about trying to run a nation-scale network, but there are undoubtedly many
lessons on botnets and malware waiting to be found. We can look forward
to hearing about them over the next few years.
Index entries for this article
Security: Botnets
Conference: Southern California Linux Expo/2010
Posted Mar 11, 2010 7:02 UTC (Thu)
by speedster1 (guest, #8143)
[Link] (1 responses)
On a somewhat related note, I don't suppose there might be an article based on "Getting the Word Out"? I wanted to go, but the assigned last-session-of-last-day time-slot was a bad one -- packing up booths was still finishing up, and right afterward a fellow booth staffer needed a ride to LAX. Bet I wasn't the only one unable to stick around for it.
Posted Mar 11, 2010 22:00 UTC (Thu)
by jake (editor, #205)
[Link]
Well, it would be difficult to report on my talk directly -- I don't take notes while talking very well :)
But you make a good point, I should probably pull together an article on that topic.
As you pointed out, the slot was not the best, but I did end up with around ten folks and we had some good discussions. I may try to reprise it at other conferences as well.
thanks!
jake
Posted Mar 11, 2010 13:20 UTC (Thu)
by ortalo (guest, #4654)
[Link]
Posted Mar 11, 2010 16:09 UTC (Thu)
by Tara_Li (guest, #26706)
[Link] (2 responses)
My very own conspiracy theory
Posted Mar 12, 2010 5:42 UTC (Fri)
by felixfix (subscriber, #242)
[Link]
Why, then, don't they shut them down and rid the net of all the spam? (My own domain varies from 500:1 to 1000:1 spam:real email, where spam means nonexistent accounts, not counting the various blue pill adverts.) I can only conclude that they want to leave them in place; in the event of an attack on the US portion of the tubes, or whatever is the trigger, they can instantly recruit the botnets for their own NSA purposes.
I call this my Giles theory in honor of a friend who was really good at plausible conspiracy theories that could not be dismissed out of hand like the knee slapper that Obama is not a US citizen. My favorite of his was that Microsoft was deliberately faking evidence, lying, etc, at their anti-trust trial because Bill Gates wanted to lose and have the feds take over control of Windows so that when the Y2K feces hit the fan, he could wash his hands of it and point to the feds as responsible. "My hands are tied" was what he expected Bill Gates to be ready to say.
Posted Mar 12, 2010 16:21 UTC (Fri)
by obrakmann (subscriber, #38108)
[Link]
Posted Mar 11, 2010 21:55 UTC (Thu)
by jonabbey (guest, #2736)
[Link] (2 responses)
Posted Mar 11, 2010 22:02 UTC (Thu)
by jake (editor, #205)
[Link]
Well, I am just reporting on what Ron said, but I presume he is referring to the RIAA takedown of eDonkey.
jake
Posted Mar 20, 2010 3:10 UTC (Sat)
by rminnich (guest, #64556)
[Link]
eDonkey was the file sharing network that used these protocols, nice writeup here:
http://en.wikipedia.org/wiki/EDonkey2000
I just find it very amusing that lawyers feel that shutting these guys did
anything but push it all underground. From what I am told, the content they
wanted to "protect" is all out there and all available. So what, precisely,
did they accomplish?
Hence my joke that if p2p is outlawed, only outlaws will have p2p. And they
do ;-)
Posted Mar 13, 2010 13:12 UTC (Sat)
by asherringham (guest, #33251)
[Link]
Might be good to see if Ronald Minnich would write something about this for a future LWN article?
Alastair
Posted Mar 18, 2010 13:42 UTC (Thu)
by inouk (guest, #64516)
[Link] (2 responses)
Minnich said:
"Minnich is used to monitoring cluster nodes at 6Hz, but most big cluster nodes are monitored every ten minutes or 1/600Hz—otherwise the amount of data is just too overwhelming."
We usually say "we monitor something each minute", but 6Hz and 1/600Hz ? I'm not sure I understand about Hz (cycle).
Thanks!
Posted Mar 18, 2010 14:07 UTC (Thu)
by jake (editor, #205)
[Link] (1 responses)
I thought it was a bit odd that he put it that way, but that's what I have in my notes. 1Hz = 1 cycle/sec, so 6Hz is sampling 6 times/sec and 1/600Hz is sampling every 600 seconds (i.e. 10 minutes).
jake
Posted Mar 20, 2010 3:05 UTC (Sat)
by rminnich (guest, #64556)
[Link]
Thanks
ron
Posted Mar 20, 2010 3:14 UTC (Sat)
by rminnich (guest, #64556)
[Link]
Thanks again to SCALE8x for an excellent conference. I've only been twice
and it's been great each time.
ron