
The TALPA molehill

By Jonathan Corbet
August 6, 2008
The TALPA malware scanning API was covered here in December, 2007. Several months later, TALPA is back - in the form of a patch set posted by a Red Hat employee. The resulting discussion has certainly not been what the TALPA developers would have hoped for; it is, instead, a good example of how a potentially useful idea can be set back by poor execution and presentation to the kernel community.

The idea behind TALPA is simple: various companies in the virus-scanning business would like a hook into the kernel which allows them to check for malware and prevent its spread. So the patch adds a hook into the VFS code which intercepts every file open operation. A series of filters can be attached to this intercept, with the most important one being a mechanism which makes the file being opened available to a user-space process as a read-only file descriptor. That process can scan the file and tell the kernel whether the open operation should be allowed to proceed or not. In this way, the scanning process can prevent any sort of access to files which are deemed to contain bits with evil intentions.

There are a few other details, of course. A caching mechanism prevents rescanning of unchanged files, increasing performance considerably. There is also a hook on close() calls which can trigger the rescanning of a file. Processes can exempt themselves from scanning if it might get in their way; scanning can also be turned off for specific files, such as those used for relational database storage. But the patch set is relatively small, as it really does not have that much to do.

This capability could well prove to be useful. Even if one is not concerned about malware infections on Linux systems, a lot of files destined for more vulnerable platforms can pass through Linux servers. There is also the potential for the detection of attempted exploits of the Linux host. Normally, in the Linux world, the way we respond to knowledge of a specific vulnerability is to patch the problem rather than scan for exploits, but there may be systems which cannot be restarted on short notice, and which could benefit from an updated scanning database while running code with known vulnerabilities. Also, as Alan Cox pointed out, this feature could be useful for entirely different objectives, such as efficient indexing of files as they change.

What might be best of all, though, is that this hook could replace a number of rather less pleasant things being done by anti-malware vendors now. Some of these products use binary-only modules, plant hooks into the system call table, and generally behave in unwelcome ways. Moving all of that to a user-space process behind a well-defined API could be beneficial for everybody involved.

The patches have gotten a generally hostile reception on the kernel mailing lists, though. Some developers are uninspired by the ultimate objective:

So you are going to try to force us to take something into the Linux kernel due to the security inadequacies of a totally different operating system? You might want to rethink that argument.

That's an objection which can be worked around; the kernel developers do not normally want to determine which applications will or will not be supported by the system as a whole.

Another objection, though, might be harder: this hook is said not to be the best solution to the problem. Instead of putting a hook deep within the VFS layer, the anti-malware people could simply hook into the C library (perhaps with LD_PRELOAD), put the malware scanning directly into the processes (mail clients or web servers, say) which are passing files through the system, or embed the scanning into a stackable filesystem implemented with FUSE (or a similar mechanism). That has led to counterarguments that scanning implemented in this manner could be evaded by a hostile application - by performing system calls directly, for example, instead of going through the C library. Certain kinds of attacks, it is said, could get around a purely user-space solution.

That argument, however, highlights the real problem with this posting. The patch includes a set of 13 "requirements," including intercepting file opens, caching results, exempting processes, and so on. But none of these requirements describe the problem which is really being solved. In particular, as noted by Al Viro and others, there is no description of the threat which this patch is intended to mitigate:

Various people had been asking for _years_ to define what the hell are you trying to prevent. Not only there'd been no coherent answer (and no, this list of requirements is _not_ that - it's "what kind of hooks do we want"), you guys seem to be unable to decide whether you expect the malware in question to be passive or to be actively evading detection with infected processes running on the host that does scanning.

If the scanning host could be infected, then a scanning mechanism which could be circumvented by a rogue program is indeed a problem. But that is a very different threat than simply trying to prevent evil attachments from creating mayhem on Windows boxes; it does not appear to be a threat which these patches are trying to address.

The lack of a clearly described problem has caused the discussion of these patches to go around in circles; it is not possible to evaluate (1) whether the goals of these patches are worth supporting, or (2) whether the patches can actually be successful in achieving those goals. The code, in other words, cannot be reviewed. Until the TALPA developers can clarify that situation, their work will look like an example of "shoot first, then aim." That kind of code tends not to make it into the mainline, even if it could be useful in the end.



The TALPA molehill

Posted Aug 6, 2008 17:35 UTC (Wed) by dcoutts (guest, #5387) [Link]

The rest of Al Viro's comment is even better:

Moreover, the answer seems to be changing back and forth to suit the needs of the moment in the argument. Slightly exaggregated it goes like this:

-- Why don't you do $FOO?
-- Running virus would be able to evade $FOO, of course!
-- No shit, Sherlock; it would also be able to evade much more intrusive $BAR you are proposing; here's how <obvious evasion method>
-- Oh, but that's not a problem; think of Linux server with Windows clients and Windows viruses...

Requirements

Posted Aug 6, 2008 20:19 UTC (Wed) by NAR (subscriber, #1313) [Link]

But none of these requirements describe the problem which is really being solved.

It's so typical, when the customer thinks he knows what he wants and specifies how things should be implemented, not what should be implemented...

Requirements

Posted Aug 6, 2008 22:41 UTC (Wed) by smitty_one_each (subscriber, #28989) [Link]

Even more importantly, why.
Oh, the joy of the solution-in-search-of-a-problem.

Requirements

Posted Aug 6, 2008 22:57 UTC (Wed) by nix (subscriber, #2304) [Link]

The problem is `AV vendors want to keep using DOS-style antivirus scanners 
with their guaranteed revenue stream'...

(note the absence of actual *security* anywhere in there.)

Requirements

Posted Aug 7, 2008 13:50 UTC (Thu) by kirkengaard (subscriber, #15022) [Link]

More to the point, they have a threat model formed by the vulnerabilities of said OS
ecosystem, and have successfully used it for so long that it's corrupted their thinking.  So,
"AV vendors want to keep operating under DOS-style threat-mitigation heuristics and
methodologies, which for so long have proven functional."  Their business (and profit) is
based on being functional, and not being allowed to simply apply their business methods in a
given case threatens that.  Cf. the Vista flack over kernel hooks.

The sin of not realizing that the game changes based on the terrain on which it is played.
Not being able to properly conceive of the game with fundamentally different conditions.  Like
using massed-troop methods against guerrillas, short-supply-chain logistics to invade Russia,
or cold-war anti-state information methods against terrorist groups.  Determine the problem as
a member of an already-acknowledged class into which it doesn't properly fit, and the
solutions do not properly fit.  It's an abstraction error.

Or, more simply, the old saw: "don't attribute to malice what is adequately explained by
stupidity."

Enumerating badness

Posted Aug 6, 2008 20:47 UTC (Wed) by boog (subscriber, #30882) [Link]

"Normally, in the Linux world, the way we respond to knowledge of a 
specific vulnerability is to patch the problem rather than scan for 
exploits"

Our editor's point here is key. It is hopeless to "enumerate badness" (e.g. http://www.ranum.com/security/computer_security/editorial... ). Scanning for exploits is always going to be a lost cause - viz. Windows security and the ineffectiveness of the whole anti-malware industry.

However, as suggested, there are a few situations where the mechanism 
might be temporarily useful.

Enumerating badness

Posted Aug 6, 2008 22:53 UTC (Wed) by ctpm (subscriber, #35884) [Link]

   "Scanning for exploits is always going to be a lost cause - viz 
windows security and the ineffectiveness of the whole anti-malware 
industry."

 Well, its effectiveness doesn't matter really. What matters is that it 
makes money. Lots of it. And the truth is that 99% of the world is just 
gullible and insists on thinking that security holes are handled by 
scanning for viruses/malware and not by patching holes.

 You may even call it a conspiracy theory, but the fact is that there is 
a many-billion dollar industry behind AV/Malware scanning that probably 
feels that its core business is threatened by the emergence of 
alternative free and open systems. These people have no interest in 
secure operating systems, since those represent a major loss of revenue.

 The way I see it, those patches the article talks about (which seem 
rather more like a solution in search of a problem), may be just the 
effects of the AV industry lobbying some Linux vendors just to try to 
convince end users that they need to pay for AV/Malware scanning, just 
like Windows users, so that money keeps flowing...


Enumerating badness

Posted Aug 6, 2008 23:14 UTC (Wed) by rahvin (subscriber, #16953) [Link]

Not as concise as I would put it, so I will summarize. 

They want to sell Anti-virus to Linux Users AND (more importantly) to Linux Servers that
handle Windows traffic. 

It's easy with windows, they can sell their AV solution for the server, and a separate more
expensive server package that scans the hosts and traffic across the file server. Right now on
Linux all of this is being handled in user space with free open source programs that scan
specific server traffic like email or SMB traffic. 

With the right kernel hooks they would have something they could create a product on and throw
their marketing weight behind. Without the hooks they are up against the question of "how is
this better than clamd?" With the hooks they can create a very invasive AV package, much like
the Windows versions that hook themselves deep into the kernel, hurting performance with
negligible benefit but with the ability to claim that their package scans at the Kernel level
every file that passes through the Linux system. This would make it possible to sell Norton /
McAfee AV for Linux, and Norton / McAfee AV for Linux SMB. Without the hooks everyone can see
the negligible value, with the hooks it becomes much harder to compare because I think
everyone can admit there might be some situation where the Kernel Level hooks could grab
something the user space tool wouldn't be able to.

Enumerating badness

Posted Aug 7, 2008 7:30 UTC (Thu) by wblew (subscriber, #39088) [Link]

Very insightful.... here is one additional thought: once those hooks exist, what happens when
they are used by the next release of clamd? Those vendors are screwed again....

Here is those vendors' real problem: with open source operating systems, the vulnerabilities
get patched because any user *CAN DO THAT*.

Enumerating badness

Posted Aug 7, 2008 8:45 UTC (Thu) by dan_a (subscriber, #5325) [Link]

Users can do that, but often don't - or don't until it's too late.  It would be good to have
an extra layer of protection against problems.  In my experience, though, attacks on Linux are
far more likely to come through vulnerable web scripts than Windows-style viruses, so TALPA
and a virus scanner may not be the solution.

Viruses do not depend on vulnerabilities

Posted Aug 7, 2008 9:26 UTC (Thu) by epa (subscriber, #39769) [Link]

The traditional 'computer virus' does not depend on exploiting kernel or userspace
vulnerabilities to get more privileges.  It just attaches itself to every executable it can
write (and on Unix, I suppose, it might add itself to shell scripts).  So patching is not a
way to avoid viruses.  Not running untrusted code is a way to avoid them, but can any of us
here honestly claim that we audit all source code before typing 'make install'?  Or verify PGP
signatures on the tarball?  Wouldn't non-technical users download and install the Flash plugin
or Nvidia drivers without a second thought?

Enumerating badness

Posted Aug 7, 2008 16:14 UTC (Thu) by iabervon (subscriber, #722) [Link]

Of course, Linux servers that handle Windows traffic handle it in userspace as bulk data. They
need hooks into Samba, not the kernel. I'm not even completely certain that you can't do a
client-to-client transfer with Samba without Samba ever calling open(), if one client is
reading while another client writes. And there's no particular reason to think that a server
on Linux would store the content as recognizable files in its filesystem which it opens again
before serving them.

This sort of hook only makes any sense at all for protecting the local system, where the
kernel-provided filesystem is what programs use directly, and it seems unlikely, to me at
least, that bulk filesystem scanning will find a non-trivial portion of threats to a Linux
system.

Enumerating badness

Posted Aug 7, 2008 10:23 UTC (Thu) by nix (subscriber, #2304) [Link]

As I understand it, these are more the effect of RH asking the vendors to please define a
consistent interface so they can use an interface that's already there, instead of using
appalling hooks deep into the kernel from binary-only modules (overwriting things in the
sys_call_table, finding file_operations structures and overwriting their pointers, that sort
of thing). Even a nasty kernel interface is better than *that*.

Enumerating badness

Posted Aug 7, 2008 15:12 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

Not all malware relies on patchable vulnerabilities.  Even being alerted to a known exploit
arriving from some vector allows a user to take action, even if they're not vulnerable.  I'd
certainly stop browsing a site if I discovered they were trying to infect my computer with
something, even if they can't infect it.  

Though I generally wouldn't run an anti-virus scanner because of the performance implications
- they tend to suck up tons of CPU resources and are awful for developers or anyone who
creates lots of files.  But that doesn't mean there is no use-case for anti-malware scanning.

And a resolution...

Posted Aug 7, 2008 20:41 UTC (Thu) by davecb (subscriber, #1574) [Link]

Amusingly, the credential work noted in 
http://lwn.net/Articles/292986/
can provide an elegant resolution to
the AV folk's perceived needs.

I suspect it was buried in the noise,
however...

--dave

The TALPA molehill

Posted Aug 8, 2008 9:38 UTC (Fri) by jcm (subscriber, #18262) [Link]

So I'm running malware-list for these guys and I'll be sorting out the lack of public indexing
- it's not intentional, it's just a fact that I'm travelling this week and can't fix the
mailman setup until next week.

When I was looking at this problem (before cunningly handing it off to Eric :P) my main
concern was trying to do away with the hacks - especially syscall table hacks (which these
days not only have to unprotect the table, but deal with relocatable kernel issues) - and have
something more pragmatic. No "solution" can ever guarantee that bad bits aren't getting into
the system - you can mmap a file and feed "bad" bits into it that other applications will see
but cunningly arrange for the file to seem ok on open/close, and other things. But a small
hook is hardly a big deal for the kernel especially if there's no overhead for those who don't
use it.

The alternative would seem to be that vendors end up being pressured into taking patches into
Enterprise kernels that are disjoint from upstream.

The TALPA molehill

Posted Aug 8, 2008 9:45 UTC (Fri) by jcm (subscriber, #18262) [Link]

The public indexing should be working properly now.

The TALPA molehill

Posted Aug 10, 2008 11:58 UTC (Sun) by mattmelton (guest, #34842) [Link]

This does seem to be a case of the old developers-don't-run-web-servers syndrome. 

While I know many people do run various servers for various reasons, I don't believe the
people commenting negatively on the principle have clearly thought out the application of
Linux in the business environment.

Linux is massively popular with dedicated server providers and virtual server providers. These
providers load their hosts with a multitude of software - software that is invariably untested
and often the very newest revision. Take the virtual server package Virtuozzo, which ships the
admin panel Plesk, which provides an HTTPd, FTPd, mail server, and so on. If you look at ISPs
like The Planet (aka EV1.net), they ship not just Plesk, but Ensim and CPanel too - you name
it. Linux is used to host a wide spectrum of applications at different levels - be it the ISP
using it to host virtual servers, the reseller providing an API, or the end user installing
applications and using the provided ones.

A long time ago, a server I was using was unfortunately compromised. It transpired that the
host we used had unmanaged hubs and that one of the unpatched and adjacent Plesk boxes was
used to ARPjack our box. Ultimately our passwords were captured and we had malware installed.
This was not my fault, but it does highlight the two problems where a kernel-based malware
scanner would help.

1) Ignoring the fact that the box was unpatched: if the adjacent Plesk machine had had a
kernel-based malware scanner that prevented the hostile user from storing, opening, or
downloading his ARPjacking toolset (a set of shell servers which merely intercept passwords),
we would not have been compromised.

2) If our box was configured with some kind of malware protection, it may have at the very
least sent out a warning message once the root user was found to be downloading malicious
software.

In my situation, I could not afford to harden my box to allow only a few services - it
wouldn't have mattered, since my root password was compromised anyway. I was unable to prevent
my box from being hijacked, and I believe it is impossible to properly harden a general-purpose
hosting package against attack when it exports so many desired services.

I understand the floodgate theory everyone is scared of. People are worried that if end users
begin relying on anti-malware products for a sense of security, then they will neglect proper
security practice, leaving their system with the pretence of security rather than actually
being hardened. But I ask, which is the better option? A security professional unable to
thwart the kind of attack I suffered, or a security professional who receives an email saying
something was a little fishy about the last thing the root user downloaded?

I would also point out how much more useful such a kernel mechanism is for admins. It is clear
to me that an unobtrusive mechanism that updates malware definitions is far more likely to be
allowed to be automated and turned on by default than a patching mechanism like up2date or a
cron "apt-get update". When ISPs have to scrutinise every single patch they apply, there is a
window of vulnerability between the release of a patch and its application, during which
exploitation remains possible.

Not forgetting the end user: if we take a side step and look at the news recently, we have
seen that it is easy enough for any individual to set up a malicious third-party package
repository. As more people turn to Linux, unfamiliar with autoconf or just unable to compile
software themselves, we see them using more and more third-party repositories. When it
requires nothing but a Google search to find a repo with a package, or indeed a malicious
package itself, we should be more careful about what we are downloading - or we need something
to be more careful about what we are doing.

(yes, there is an argument about hashing to be had - but then that argument really does seem
to fail when you download customised builds, platform specific builds or nightly builds)

In a day when we have clever attacks and automated updates, we should look to prevent them as
best we can. Security vulnerabilities are not illusory just because you're using a Linux
kernel. They exist in the way we use every piece of software - and the Linux kernel is the
best place to implement a warden.

Matt

The TALPA molehill

Posted Aug 12, 2008 18:25 UTC (Tue) by jhohm (guest, #7225) [Link]

Not sending the root password to the box in cleartext would have prevented your box being
hijacked.

The TALPA molehill

Posted Aug 13, 2008 16:14 UTC (Wed) by mattmelton (guest, #34842) [Link]

Not at all - we used SSH and SCP/SFTP. An ARPjacking isn't merely a NIC in promiscuous mode.
The tools used against us were replacement daemons running on another host that was
periodically emitting our MAC/IP association.

What would have helped would have been a certificate policy - "has someone changed/updated the
SSH certificate/server/encryption?". When faced with that question, we should have stopped and
phoned one another. Regrettably, one of us chose to accept the new certificate, thus sending
our password to the fake daemon.

In terms of how unavoidable these novel and targeted attacks on general purpose hardware are,
I think I have shown a fair example. Whether or not it mandates a kernel level mechanism that
doesn't already exist is the topic for discussion.

As food for thought, only a few weeks ago Metasploit was compromised in the same way -
checkout Moore's statement: http://www.haloscan.com/comments/alexeck/964311044981251862

 

The TALPA molehill

Posted Aug 19, 2008 19:38 UTC (Tue) by job (guest, #670) [Link]

If you accept changed host keys you might as well run unencrypted traffic. No malware scanner
can help you there. You can be victim to any number of other tricks, including DNS spoofing.

The TALPA molehill

Posted Aug 24, 2008 16:35 UTC (Sun) by mattmelton (guest, #34842) [Link]

Of course. And I know the painful side of this. The problem here is that, rather than merely having an illusion of security (as is often the case with poor, inadequate software), the lack of the proposed file-access mechanism leaves me with no information at all. For me, no information is probably as bad as having the wrong information - when configured properly, virus scanners do well at informing network admins.

The TALPA molehill

Posted Aug 19, 2008 19:41 UTC (Tue) by job (guest, #670) [Link]

The article implies that the use case for this is Linux file servers with Windows clients.
Then why don't they use inotify and do it all in user space? Far simpler, far less intrusive.

The TALPA molehill

Posted Aug 23, 2008 21:13 UTC (Sat) by jhansonxi (guest, #51242) [Link]

How does TALPA compare to Dazuko?
http://www.dazuko.org/faq.shtml

The TALPA molehill

Posted Mar 10, 2009 8:22 UTC (Tue) by florin-lwn (guest, #57068) [Link]

A use case scenario would be HSM: http://en.wikipedia.org/wiki/Hierarchical_storage_management

HSM has two parts:

- moving least recently used files to slower/cheaper media;
- bringing them back when they are needed.

Problems:

a. Creating a list/database of candidate files, which can be done when the high-water mark is
reached:
1. by using find -atime (not optimal, but it can have a rationale)
2. by inotify/FAM
3. by TALPA/Dazuko

b. Bringing files back: block the requesting process until the file is recovered,
automatically or manually, from storage (robotic library, tapes, opticals, slow disk, or the
inner zone of a disk).

The TALPA molehill

Posted Mar 10, 2009 8:26 UTC (Tue) by florin-lwn (guest, #57068) [Link]

... TALPA / Dazuko can be used for this task.

Sorry, I forgot the last line (:

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds