
Capturing web attacks with open proxy honeypots

Honeypots, hosts specifically set up to attract abuse, have been around since at least 1990. Typically, they have been used to detect attacks against various network services, such as SMTP or SSH, but have not been very successful at detecting a wide range of web application attacks. Open proxy honeypots provide a more attractive target for malicious web traffic. Combining several open proxies leads to the Distributed Open Proxy Honeypots (DOPH) project, which centralizes the monitoring of open proxies installed all over the globe.

Standard honeypot techniques do not provide much of interest to a web attacker: there is no high-profile website to deface and no high-value information stored there. The honeypot is also unlikely to be able to respond correctly to attempts to probe for vulnerable web applications. This makes it difficult to gather information on the variety of web attacks that are being used "in the wild". What is needed is a way to listen in on malicious traffic, which is exactly what a proxy can do.

A proxy is simply a program that forwards traffic for a client. It sits in the middle of the conversation, sending the client's requests to the server and forwarding the server's replies back to the client. As far as the server can see, it is only talking to the proxy system; it cannot tell that there is a client elsewhere actually making the requests. Proxies exist for a number of reasons: SOCKS is used to traverse firewalls, whereas anonymizers are used to obscure the origin of web traffic. There are also less visible proxies for load balancing or to get around the "same origin" policy of the XMLHttpRequest JavaScript call. Most proxies have rules that govern who can use them and what destinations are legitimate; a proxy without those rules is an open proxy.
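The difference between a restricted proxy and an open one comes down to that rule check. Here is a minimal sketch in Python, assuming a hypothetical rule format: a list of permitted client networks and a set of permitted destination hosts. The function and parameter names are illustrative and do not come from any particular proxy implementation.

```python
import ipaddress


def is_request_allowed(client_ip, dest_host, allowed_clients, allowed_dests):
    """Decide whether a proxy should forward a request.

    allowed_clients: list of CIDR strings, or None meaning "any client".
    allowed_dests:   set of lowercase host names, or None meaning "any destination".
    Both are hypothetical; real proxies have their own rule syntax.
    """
    if allowed_clients is not None:
        addr = ipaddress.ip_address(client_ip)
        if not any(addr in ipaddress.ip_network(net) for net in allowed_clients):
            return False
    if allowed_dests is not None and dest_host.lower() not in allowed_dests:
        return False
    return True


# A restricted proxy: only the local network may use it, and only for one site.
print(is_request_allowed("192.168.1.20", "example.org",
                         ["192.168.1.0/24"], {"example.org"}))

# An open proxy: no rules at all, so any client can reach any destination.
print(is_request_allowed("203.0.113.9", "victim.example", None, None))
```

With both rule sets absent, every request passes, which is what makes an open proxy attractive to someone who wants to hide where traffic comes from.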

Probably the most famous open proxy was the default configuration of sendmail (before version 8.9.0 in 1998), which would forward email to and from any destination. Before the explosion of spam, it was considered neighborly to relay mail for anyone who asked.

A system configured as an open proxy for web traffic can record information about what it sees; with luck, some portion of it will be malicious. But there is a subtle problem with this approach: the proxy host may be facilitating attacks on vulnerable web servers, attacks which appear to originate with the proxy. There is also concern that recording the "conversation" could run afoul of wiretapping laws. An open proxy honeypot, at least one that wants to avoid legal trouble, must take some steps to minimize these problems.

Informing someone that you are recording is typically enough to avoid wiretapping violations, so the DOPH project uses two separate warnings. The first is on the proxy host's webpage, but since most malicious users will never see that page, an additional warning was added to the HTTP headers returned by the host. Typically only programs see those headers, but it is, at least, an attempt to inform the recorded party.
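The header-based warning can be pictured as one extra header added to every response the proxy relays. The sketch below is only an illustration: the header name and its wording are hypothetical, since the text above does not say what the DOPH hosts actually send.

```python
# Hypothetical header name and text; not taken from the DOPH project.
NOTICE_HEADER = "X-Monitoring-Notice"
NOTICE_TEXT = "Traffic through this proxy is recorded for security research"


def add_recording_notice(response_headers):
    """Return a copy of the response headers with the recording notice added."""
    headers = dict(response_headers)  # copy, so the caller's dict is not modified
    headers[NOTICE_HEADER] = NOTICE_TEXT
    return headers


# Headers as they might arrive from the upstream server.
upstream = {"Content-Type": "text/html", "Content-Length": "512"}
for name, value in add_recording_notice(upstream).items():
    print(f"{name}: {value}")
```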

A much more difficult problem is to stop "bad" traffic while proxying "good" traffic. The proxy must seem to function correctly or it will never be used, but honeypot operators are interested in stopping web abuse, so they want to minimize the chances of being used in a real attack. It is a very fine line: they want the bad traffic to study, but not to pass on.

The DOPH project uses the ModSecurity module for the Apache webserver to filter content based on a set of rules maintained by Got Root. The rules specify the signatures of various attacks, which ModSecurity uses to flag matching requests as it inspects the web traffic. To try to fool attackers and/or their programs, an HTTP 200 (OK) status is returned when an attack is detected. The ModEvasive Apache module is also used to detect and stop attempts to use the proxy in a denial of service attack.
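The idea of matching signatures and answering with a decoy 200 can be sketched in a few lines of Python. This is not ModSecurity and these are not the Got Root rules: the three regular expressions, the function names, and the `forward` callback are all assumptions made for illustration.

```python
import re
from urllib.parse import unquote_plus

# Illustrative signatures only; real rule sets are far larger and more careful.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select|'\s*or\s+1\s*=\s*1", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
    "xss": re.compile(r"<script\b", re.IGNORECASE),
}


def match_signature(request_line):
    """Return the name of the first matching signature, or None."""
    decoded = unquote_plus(request_line)  # undo URL encoding before matching
    for name, pattern in SIGNATURES.items():
        if pattern.search(decoded):
            return name
    return None


def handle_request(request_line, forward, audit_log):
    """Forward clean requests; log suspicious ones and answer with a decoy.

    forward:   hypothetical callable that sends the request upstream and
               returns a (status, body) tuple.
    audit_log: list that collects (signature_name, request_line) records.
    """
    attack = match_signature(request_line)
    if attack is None:
        return forward(request_line)
    audit_log.append((attack, request_line))
    return 200, ""  # pretend success without contacting the target


log = []
upstream = lambda req: (200, "<html>real page</html>")
print(handle_request("GET /index.html HTTP/1.1", upstream, log))
print(handle_request("GET /item?id=1+UNION+SELECT+password+FROM+users HTTP/1.1",
                     upstream, log))
print(log)
```

The key design point is that a flagged request never reaches `forward`, so the honeypot keeps a record of the attempt while the intended target never sees it.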

Fully configured versions of the proxy are available from the project as VMware images that can be run using the "free as in beer" VMware server software. The DOPH proxy communicates back to a central data collection server, sending the ModSecurity audit log information. This allows the project to aggregate the information to determine what kinds of attacks are currently ongoing. A Web Security Threat Report (PDF), covering the first few months of the project, was released in April. Seven geographically diverse hosts participated during the first reporting period, and the project is always looking for more people willing to run proxy hosts to increase its data gathering abilities.
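On the collection side, aggregation amounts to counting what each sensor reports. The sketch below assumes the reports have already been reduced to hypothetical (sensor, attack category) pairs; real ModSecurity audit logs are multi-part records that would need parsing first, and the sensor names and categories here are invented sample data.

```python
from collections import Counter


def summarize(reports):
    """Count attacks per category and note which sensors saw each one.

    reports: iterable of (sensor_id, attack_category) tuples, a simplified
             stand-in for parsed audit log entries.
    """
    counts = Counter()
    sensors = {}
    for sensor_id, category in reports:
        counts[category] += 1
        sensors.setdefault(category, set()).add(sensor_id)
    return counts, sensors


# Invented sample data for illustration.
sample = [
    ("sensor-eu", "sql_injection"),
    ("sensor-us", "sql_injection"),
    ("sensor-us", "path_traversal"),
    ("sensor-asia", "sql_injection"),
]
counts, sensors = summarize(sample)
for category, total in counts.most_common():
    print(category, total, sorted(sensors[category]))
```

Seeing the same category arrive from several sensors at once is the kind of signal a single honeypot could not provide on its own.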

Open proxies are used by attackers to mask their true location. It is not uncommon for a chain of proxies to be used, as it makes it more difficult to track back to the originator. If the chain crosses borders, using proxy servers in different countries, each with its own set of laws and procedures to access the server log files, tracking becomes that much harder. The DOPH project does not specify how it publicizes its proxies, as that might give too much information to attackers, but during the first four months of 2007, its servers handled around a million web requests, of which roughly 20% were malicious or suspicious.

Attackers are likely to get more sophisticated over time and their tools will get better at recognizing these kinds of techniques, but there is still value in gathering the data. The proxy techniques will evolve as well, which will allow statistics to be gathered and new attacks to be spotted. As the attackers recognize the threat, they will be more inclined to use proxies in an attempt to mask their location, which provides a kind of feedback loop driving more traffic to the honeypots. Open proxy honeypots cannot and will not fool all of the attackers, but they provide a way to study some of them.



Capturing web attacks with open proxy honeypots

Posted Jul 7, 2007 0:27 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

An SMTP relay is not a proxy. A proxy stands in for the client, making it appear to the server that the proxy is the client. An SMTP relay uses a protocol that makes it clear that the relay did not generate the mail; in fact, a reply to or error message for the mail does not normally go back to the relay.

In fact, if spam filtering had developed a little differently, there would not have been any need to stop open mail relaying: the goal of routing spam through an open relay was to circumvent spam filters that rejected mail that came directly from a known spammer mail server. If those filters were a little smarter and rejected mail that had a known spam originator anywhere in its Received: chain, the open relay trick would not have worked.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds