From: James Troup <james-AT-nocrew.org>
Subject: more details on the recent compromise of debian.org machines
Date: Fri, 28 Nov 2003 01:04:00 +0000
*NB* bear in mind that:
a) the information on the break-in comes from compromised machines
and thus has to be taken with appropriate skepticism.
b) the investigation is still ongoing - as I was writing this draft
further information came to light which may invalidate a lot of
it. [Or not - as it turns out].
On November 20 it was noticed that master was kernel oops-ing
lots. While investigating this it was discovered that murphy was
showing the exact same oops, which was an overly suspicious
coincidence. Also klecker, murphy and gluck have aide installed to
monitor filesystem changes and at around the same time it started
warning that /sbin/init had been replaced and that the mtime and ctime
timestamps for /usr/lib/locale/en_US had changed.
Investigation revealed the cause for both these things to be the
suckit root kit (see the "Suckit" appendix for more info).
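As an illustration of the kind of check that raised the alarm, here is
a minimal sketch of a filesystem integrity snapshot. It is not aide
itself and does not use aide's database format; the function names
snapshot() and changed() are made up for this example, and SHA-256 is
simply a convenient choice of digest.

```python
import hashlib
import os
import stat


def snapshot(paths):
    """Record size, mtime, ctime and (for regular files) a SHA-256 digest."""
    result = {}
    for path in paths:
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except FileNotFoundError:
            continue  # a missing path simply does not appear in the snapshot
        entry = {
            "size": st.st_size,
            "mtime": st.st_mtime_ns,
            "ctime": st.st_ctime_ns,
        }
        if stat.S_ISREG(st.st_mode):
            with open(path, "rb") as fh:
                entry["sha256"] = hashlib.sha256(fh.read()).hexdigest()
        result[path] = entry
    return result


def changed(old, new):
    """Return {path: [names of attributes that differ]} between two snapshots."""
    report = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            report[path] = ["added"]
        elif path not in new:
            report[path] = ["removed"]
        else:
            keys = sorted(set(old[path]) | set(new[path]))
            diffs = [k for k in keys if old[path].get(k) != new[path].get(k)]
            if diffs:
                report[path] = diffs
    return report
```

A baseline taken with snapshot(["/sbin/init", "/usr/lib/locale/en_US"])
and stored somewhere the attacker cannot write to could later be
compared against a fresh snapshot with changed(). On a rootkitted host
the result deserves the same skepticism as any other data from that
machine, since the kernel itself may be lying about file contents.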
On Wednesday 19th November (2003), at approximately 5pm GMT, a sniffed
password was used to access an (unprivileged) account on
klecker.debian.org. Somehow they got root on klecker and installed
suckit. The same account was then used to log into master and gain
root (and install suckit) there too. They then tried to get to murphy
with the same account. This failed because murphy is a restricted box
that only a small subset of developers can log into. They then used
their root access on master to access an administrative account
used for backup purposes and used that to gain access to murphy. They
got root on murphy and installed suckit there too. The next day they
used a password sniffed on master to log into gluck, got root there
and installed suckit.
See the "Time-line" appendix for more details on times.
Gluck was powered down and an image has been made of its disks for
Since we didn't have direct physical access to klecker, its Internet
connection was shut down and disk images were made via serial console
to a local machine on a firewalled net connection.
master and murphy were kept running for a short while in order to make
an announcement of the compromise, after which they were also taken
off-line and imaged.
After a thorough cleanup and reinstall of modified files the non-US and
security archives were verified by looking at mirror logs for changes and
comparing MD5 checksums of the files on Klecker and those on three
different trusted mirrors.
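The checksum comparison can be sketched roughly as follows. This is
not the script that was actually used; md5_of() and suspicious_files()
are hypothetical names, and the per-mirror manifests (a mapping from
relative path to MD5 digest) are assumed to have been collected
separately from each trusted mirror.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def suspicious_files(local, mirrors):
    """List files whose local digest is not confirmed by every mirror.

    local   -- {relative_path: md5_hex} for the machine being verified
    mirrors -- list of {relative_path: md5_hex}, one per trusted mirror

    A file is reported if any mirror has a different digest for it or
    does not carry it at all.
    """
    bad = []
    for name, digest in local.items():
        if any(mirror.get(name) != digest for mirror in mirrors):
            bad.append(name)
    return sorted(bad)
```

Requiring agreement from every mirror is deliberately strict: a file
that is missing from one mirror gets flagged for a human to look at
rather than silently accepted.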
Gluck, Master and Murphy were wiped and reinstalled from CD. Data and
services are in the process of being restored.
All machines and data were checked for devices outside of /dev, suid
executables, writable files, etc. and all suspicious files were
removed. Services (and their scripts/programs) are being compared to
known-good sources and sanity checked before being re-enabled.
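For those wanting to run similar checks, here is a minimal sketch of
such a scan. It is an illustration only, not the tooling used on the
debian.org machines; audit_tree() and its result keys are names
invented for this example, and "writable" is interpreted here as
world-writable regular files.

```python
import os
import stat


def audit_tree(root, dev_dir="/dev"):
    """Walk `root` and collect files worth a closer look.

    Returns a dict with three lists of paths:
      device         -- block/character devices outside dev_dir
      setid          -- regular files with the setuid or setgid bit
      world_writable -- regular files writable by "other"
    """
    findings = {"device": [], "setid": [], "world_writable": []}
    # Trailing separator so that e.g. /devices is not mistaken for /dev.
    dev_prefix = os.path.join(os.path.abspath(dev_dir), "")
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(root)):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry
            mode = st.st_mode
            is_device = stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
            if is_device and not path.startswith(dev_prefix):
                findings["device"].append(path)
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                findings["setid"].append(path)
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                findings["world_writable"].append(path)
    return findings
```

Something like this is best run against a disk image or from trusted
boot media, since a rootkit with a file hider can conceal entries
from any scan done on the live system.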
Since we now knew we had compromised accounts and sniffers on our
hands we had to assume that an unknown number of accounts were
now compromised, so all accounts were locked, passwords invalidated
and ssh authorised keys removed.
How could this happen?
All the compromised machines were running recent kernels and were
up-to-date with almost all security updates.
However there were two problems.
(1) The kernels running on the machines in question didn't all get a
ptrace-fixed kernel as fast as one might have liked. Master, Klecker
and Murphy got new kernels in May but Gluck for various reasons
didn't get upgraded till August (although I believe it had
/proc/sys/kernel/modprobe fixed to at least block the most common
exploit before that).
(2) Master had a copy of its old hard drive still lying around by
accident. Unfortunately it had a lot of old, unpatched suid
binaries on it.
Although these could have been the attack vector, I don't believe they
were. (2) seems unlikely simply because master wasn't, AFAWK, the
first host compromised. Although it's possible an attacker with local
access to gluck got root through (1), it seems unlikely they'd sit on
that for <n> months and then use it on several machines only to
come back and rootkit several debian.org machines and at least one
(that we know of) other unrelated system at the same time (and which
didn't have an extended ptrace vulnerability exposure.)
Based on that and the forensics on the unrelated system mentioned
above, I believe that there was an as of yet unknown local root
exploit used to go from having local unprivileged access to having
root.
Where do we go from here?
Unfortunately due to the fact there is (I believe) an unknown local
root exploit in the wild, we can't yet unlock the Debian accounts.
Obviously we can't continue without LDAP accounts for very long
either. At the moment I'd ask for a little more patience both a)
while the painful and painstaking task of restoring machines one by
one is completed and b) while we try and exhaust all reasonable
avenues of investigation to determine how the attacker went from
unprivileged to root.
Obviously we're looking at hardening our boxes and tightening up our
procedures to try and stop this from happening again. I'll send more
details on that later.
Developers worried about their own machines might like to have a look
o Adam Heath and Brian Wolfe for their work on master & murphy.
o Wichert Akkerman for his work on klecker.
o Dann Frazier and Matt Taggart for their work on gluck.
o Michael Stone and Robert van der Meulen for their forensics work.
o Jaakko Niemi for his work on checking and re-enabling lists.debian.org.
o Colin Watson for his work on checking and re-enabling bugs.debian.org.
o Josip Rodin for his work on checking and re-enabling the lists web archives.
[This text is based on a draft by Wichert Akkerman.]
All times in GMT.
o Klecker init timestamp: Nov 19 17:08
o Master sk timestamp: Nov 19 17:47
o Murphy sk timestamp: Nov 19 18:35
o Oopses on Murphy start: Nov 19 19:25
o Oopses on Master start: Nov 20 05:38
o Gluck init timestamp: Nov 20 20:54
Suckit is a rootkit which installs a sniffer, a process hider, a file
hider and a backdoor login in a running kernel. Apparently there was a
flaw in its kernel code which caused the kernel to oops on master and
murphy. This also explained why /sbin/init was replaced: the new init
loads suckit into the kernel and then proceeds to start the real init,
making sure that it is still active after a reboot.
 Klecker: 2.4.22, Master & Murphy: 2.4.21-rc2, Gluck: 2.4.22rc2
 klecker was missing the latest postgresql update. ssh on all
machines was a DSA-customized version which was missing only the
3rd and final round (i.e. Solar Designer's patches) of ssh
P.S. As always, I speak only for myself.