Domesticating applications, OpenBSD style

By Jonathan Corbet
July 21, 2015

One of the many approaches to improving system security consists of reducing the attack surface of a given program by restricting the range of system calls available to it. If an application has no need for access to the network, say, then removing its ability to use the socket() system call should cause no loss in functionality while reducing the scope of the mischief that can be made should that application be compromised. In the Linux world, this kind of sandboxing can be done using a security module or the seccomp() system call. OpenBSD has lacked this capability so far, but it may soon gain it via a somewhat different approach than has been seen in Linux.

It is fair to characterize the sandboxing features in Linux as being relatively complex. The complexity of the security module options, and SELinux in particular, is legendary. The seccomp() system call has two modes: very simple (in which case almost nothing but read() and write() is allowed), or rather complex (a program written in the Berkeley packet filter (BPF) language makes decisions on system call availability). There is a great deal of flexibility available with both security modules and seccomp(), but it comes at a cost.

OpenBSD leader Theo de Raadt is particularly scornful of the BPF-based approach:

Some BPF-style approaches have showed up. So you need to write a program to observe your program, to keep things secure? That is insane.

His posting contains a work-in-progress implementation of a simpler approach to sandboxing (mostly written by Nicholas Marriott, it seems) in the form of a system call named tame().

The core idea behind tame() is that most applications run in two phases: initialization and steady-state execution. The initialization phase typically involves opening files, establishing network connections, and more; after initialization is complete, the program may not need to do any of those things. So there is often an opportunity to reduce an application's privilege level as it moves out of the initialization phase. tame() performs that privilege reduction; it is thus meant to be placed within an application, rather than (as with SELinux) imposed on it from the outside.

The system call itself is simple enough:

    int tame(int flags);

If flags is passed as zero, the only system call available to the process thereafter will be _exit(). This mode is thus suitable for a process cranking on data stored in shared memory, but not much else. For most real-world applications, the reduction in privilege will need to be a bit less heavy-handed. That is what the flags are for. If any flags at all are present, a base set of system calls, with read-only functionality like getpid(), is available. For additional privilege, specific flags must be used:

TAME_MALLOC provides access to memory-management calls like mmap(), mprotect(), and more.
TAME_RW allows I/O on existing file descriptors, enabling calls like read(), write(), poll(), fcntl(), sendmsg() and, interestingly, pipe().
TAME_RPATH enables system calls that perform pathname lookup without changing the filesystem: chdir(), openat() (read-only), fstat(), etc.
TAME_WPATH allows changes to the filesystem: chmod(), openat() for writing, chown(), etc. Note that TAME_RPATH and TAME_WPATH both implicitly set TAME_RW as well.
TAME_CPATH allows the creation and removal of files and directories via rename(), rmdir(), link(), unlink(), mkdir(), etc.
TAME_TMPPATH enables a number of filesystem-related system calls, but only when applied to files underneath /tmp.
TAME_INET allows socket() and related calls needed to function as an Internet client or server.
TAME_UNIX allows networking-related system calls restricted to Unix-domain sockets.
TAME_DNSPATH is meant to allow hostname lookups; it gives access to a few system calls like socket(), but only after the program successfully opens /etc/resolv.conf. So the kernel has to track whether a few "special" files like resolv.conf have been opened during the lifetime of the tamed process.
TAME_GETPW enables the read-only opening of a few specific files needed for getpwnam() and related functions. It will also turn on TAME_INET if the program succeeds in opening /var/run/ypbind.lock.
TAME_CMSG allows file descriptors to be passed with sendmsg() and recvmsg().
TAME_IOCTL turns on a few specific, terminal-related ioctl() commands.
TAME_PROC allows access to fork(), kill(), and related process-management system calls.
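
As a rough illustration of how the flag list above maps onto permitted system calls, here is a small user-space model in C. The numeric flag values, the tame_rule table, and the tame_permits() helper are all assumptions made up for this sketch; the article names the flags but not their values, and the real checks live in the OpenBSD kernel. Only a handful of calls are listed, and refinements such as TAME_UNIX or TAME_DNSPATH also unlocking socket() are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values: the article names the flags, not their numbers. */
enum {
    TAME_MALLOC = 1 << 0,
    TAME_RW     = 1 << 1,
    TAME_RPATH  = 1 << 2,
    TAME_WPATH  = 1 << 3,
    TAME_CPATH  = 1 << 4,
    TAME_INET   = 1 << 5,
    TAME_PROC   = 1 << 6,
};

/* One row per system call: the flag that unlocks it (small sample only). */
struct tame_rule {
    const char *syscall;
    int         needed;
};

static const struct tame_rule tame_rules[] = {
    { "mmap",   TAME_MALLOC },
    { "read",   TAME_RW },
    { "write",  TAME_RW },
    { "pipe",   TAME_RW },
    { "chdir",  TAME_RPATH },
    { "chmod",  TAME_WPATH },
    { "unlink", TAME_CPATH },
    { "socket", TAME_INET },
    { "fork",   TAME_PROC },
};

/* TAME_RPATH and TAME_WPATH implicitly set TAME_RW, as the list states. */
static int tame_expand(int flags)
{
    if (flags & (TAME_RPATH | TAME_WPATH))
        flags |= TAME_RW;
    return flags;
}

/* Would a process tamed with `flags` be allowed to make `syscall`? */
static bool tame_permits(int flags, const char *syscall)
{
    if (strcmp(syscall, "_exit") == 0)
        return true;            /* always available, even with flags == 0 */
    if (flags == 0)
        return false;           /* nothing but _exit() */
    if (strcmp(syscall, "getpid") == 0)
        return true;            /* base set, granted by any nonzero flags */

    flags = tame_expand(flags);
    for (size_t i = 0; i < sizeof tame_rules / sizeof tame_rules[0]; i++)
        if (strcmp(tame_rules[i].syscall, syscall) == 0)
            return (flags & tame_rules[i].needed) != 0;
    return false;               /* calls not in the table are denied here */
}
```

In this model, tame_permits(TAME_MALLOC | TAME_RPATH, "socket") is false while tame_permits(TAME_RPATH, "read") is true, because TAME_RPATH pulls in TAME_RW.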

A process may make multiple calls to tame(), but it can only restrict its current capabilities. Once a particular flag has been cleared, it cannot be set again.
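
The one-way nature of repeated calls can be modeled in a few lines of C. This is only a user-space stand-in: tame_model(), tame_state, and the flag values are hypothetical, and failing with EPERM when a caller asks for a flag it has already given up is an assumption of the sketch; the article does not say how the kernel reports that case.

```c
#include <errno.h>

/* Hypothetical bit values; the article does not give the real ones. */
#define TAME_MALLOC 0x01u
#define TAME_RW     0x02u
#define TAME_RPATH  0x04u
#define TAME_INET   0x08u

/* Modeled per-process state: all bits set means "not tamed yet". */
static unsigned int tame_state = ~0u;

/*
 * Stand-in for tame(): accept a request only if it is a subset of the
 * flags the process still holds, so privileges can shrink but never grow.
 */
static int tame_model(unsigned int flags)
{
    if ((flags & ~tame_state) != 0) {
        errno = EPERM;          /* assumed error code for this sketch */
        return -1;
    }
    tame_state = flags;
    return 0;
}
```

A cat-like utility would call tame_model(TAME_MALLOC | TAME_RPATH) once its setup is done; a later tame_model(TAME_MALLOC) narrows the set further, and any attempt to get TAME_RPATH back after that fails.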

The patch includes changes to a number of OpenBSD utilities. The cat command is restricted to TAME_MALLOC and TAME_RPATH, for example; never again will cat be able to run amok on the net. The ping command gets access to the net, instead, but loses the ability to access the filesystem. And so on.

This system call has a number of features that may look a bit strange to developers used to Linux. It encodes quite a bit of policy in the kernel, including where the password database is stored and the use of Yellow Pages/NIS; one would grep in vain for ypbind.lock in the Linux kernel source. tame() may seem limited in the range of restrictions that it can apply to a process; it will almost certainly allow more than what is strictly needed in most cases. It thus lacks the flexibility that Linux developers typically like to see.

On the other hand, using tame(), it was evidently possible to add restrictions to a fair number of system commands with a relatively small amount of work and little code. Writing ad hoc BPF programs or SELinux policies to accomplish the same thing would have taken quite a bit longer and would have been more error-prone. tame(), thus, looks like a way to add another layer of defense to a program in a quick and standardized way; as such, it may, in the end, be used more than something like seccomp().

If the tame() interface proves to be successful in the BSD world, there is an interesting possibility on the Linux side: it should be possible to completely implement that functionality in user space using the seccomp() feature (though it would probably be necessary to merge one of the patches adding extended BPF functionality to seccomp()). We would then have the simple interface for situations where it is adequate while still being able to write more flexible filter policies where they are indicated. It could be the best of both worlds.
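
To give a feel for what the user-space route might involve, the sketch below turns a tame()-style flag word into a classic-BPF allow-list that seccomp() could load. It is an illustration under several assumptions: the flag values and the tame_to_filter() helper are invented here, the syscall sets are a small sample, an x86-64 or arm64 Linux system is assumed (other architectures name some of these calls differently), and the architecture check that a production filter should perform first is omitted. It covers only the flags that boil down to "allow these syscall numbers"; path-based flags such as TAME_TMPPATH are out of reach for this kind of filter, since seccomp filters cannot dereference pointer arguments.

```c
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/syscall.h>    /* SYS_* syscall numbers */

/* Hypothetical flag values for this sketch only. */
#define TAME_MALLOC 0x1u
#define TAME_RW     0x2u

/*
 * Write a filter into out[] that allows the syscalls implied by `flags`
 * and kills on anything else. Returns the number of instructions, or 0
 * if `cap` is too small.
 */
static size_t tame_to_filter(unsigned int flags, struct sock_filter *out,
                             size_t cap)
{
    long allowed[8];
    size_t n = 0, len = 0;

    allowed[n++] = SYS_exit;        /* exiting is always permitted */
    allowed[n++] = SYS_exit_group;
    if (flags & TAME_MALLOC) {
        allowed[n++] = SYS_mmap;
        allowed[n++] = SYS_mprotect;
        allowed[n++] = SYS_munmap;
    }
    if (flags & TAME_RW) {
        allowed[n++] = SYS_read;
        allowed[n++] = SYS_write;
    }
    if (cap < 2 * n + 2)
        return 0;

    /* Load the syscall number into the accumulator. */
    out[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                        offsetof(struct seccomp_data, nr));
    for (size_t i = 0; i < n; i++) {
        /* On a match fall through to ALLOW, otherwise skip over it. */
        out[len++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                            (unsigned int)allowed[i], 0, 1);
        out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                            SECCOMP_RET_ALLOW);
    }
    /* No entry matched: deny. */
    out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                        SECCOMP_RET_KILL);
    return len;
}
```

The resulting array would be wrapped in a struct sock_fprog and installed with prctl(PR_SET_NO_NEW_PRIVS, ...) followed by prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...). Because seccomp filters can be stacked but never removed, the "restrict only" rule of repeated tame() calls would come along for free.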

The first step, though, would probably be to let the OpenBSD project explore this space and see what kind of results it gets. The ability to try out different models is one of the strengths that comes from having competing kernels out there. The ability to quickly copy that work is, instead, an advantage that comes from free software. If this approach to attack-surface reduction works out, we in the Linux world may, too, be able to tame() our cat in the future.


Domesticating applications, OpenBSD style

Posted Jul 21, 2015 21:17 UTC (Tue) by gerdesj (subscriber, #5446)

"it is thus meant to be placed within an application, rather than (as with SELinux) imposed on it from the outside."

That there is a bit of a major difference.

Could someone please tell me how this differs from "capabilities" in Linux, apart from the lack of a direct equivalent to CAP_SYS_ADMIN?

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 23:37 UTC (Wed) by liam (guest, #84133)

AIUI, the main difference is that true capabilities involve token-based access (this requires changes to the application in order to acquire the appropriate permissions) while selinux involves an acl (policy-based, and little/no app changes).

Domesticating applications, OpenBSD style

Posted Aug 20, 2015 4:50 UTC (Thu) by vapier (guest, #15768)

not sure what you mean by tokens, but i think the question was "how is tame() different from capabilities", not "how is selinux different from capabilities". there isn't really anything different conceptually between tame and caps afaict.

at least with filecaps, capabilities can be applied externally w/out needing to modify code. although programs can update themselves to use caps syscalls to drop perms on the fly.

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 21:25 UTC (Tue) by dlang (guest, #313)

> The initialization phase typically involves opening files, establishing network connections, and more; after initialization is complete, the program may not need to do any of those things.

until the program is sent a HUP and needs to close/reopen files (log rotation, etc)

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 23:23 UTC (Tue) by felixfix (subscriber, #242)

The vast majority of programs are not that kind of long-lived program.

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 23:25 UTC (Tue) by dlang (guest, #313)

but the ones that are "that kind of long-lived program" are the ones that you need to worry about the most because they are the ones that provide services to other programs (or other systems).

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 1:10 UTC (Wed) by wahern (subscriber, #37304)

Until Shellshock happened, and people realized that a huge pool of potential exploits was lying under their noses the entire time in the form of the various command-line utilities. Just because defenders are focused on one area doesn't mean attackers aren't exploiting other areas. Are you going to audit utilities like awk, sed, and grep for exploits related to environment variables, data structures, buffer bugs, etc.? Executing utilities this way is poor form these days, but unfortunately still quite common. Occasionally it still makes sense, such as when executing a configurable command to perform some check or make some policy decision; and in that case you never know how it's implemented.

Regarding the usefulness in complex programs, note that OpenBSD developers tend to implement their services using a rigorous privilege separation and message passing pattern. In particular, they often use subprocesses to implement very specific operations which require no or minimal access to the environment. For example, the TLS private key in OpenSMTPd is only kept in memory in a subprocess, so that bugs in the protocol stack cannot expose the private key. (Yes, they hacked and extended the signature routines in OpenSSL to do this.) After startup, the subprocess doesn't need any privileges except the ability to read and write to the IPC messaging socket and perhaps the /dev/crypto driver. The point of tame(2) is to make it a one-liner to drop privileges after an application-specific initialization procedure.

The utility is limited from the perspective of trying to lock down every conceivable application out there. But that's not the goal. The goal is much more specific, oriented toward the needs of OpenBSD services and OpenBSD developers.

I'm not sold on the concept. But it's hardly useless.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 22:02 UTC (Wed) by dvdeug (guest, #10998)

This can't do much for awk and can't do anything for GNU awk. Awk is an interpreter, and doesn't know what the program is going to do, and GNU awk is an interpreter with the ability to load arbitrary shared libraries at runtime.

I'm less than impressed with cat. I acknowledge it's an easy example, but it implies that cat isn't trustable. If in 2015 C programmers can't write a cat without buffer overflows, maybe they should stop writing cat in C. A correct cat is better than one that is limited in what it can do when it fails. Other programs are more complex, but a program that fails with an exception on a buffer overflow is better than one that can corrupt local files but not reach the network on a buffer overflow.

Domesticating applications, OpenBSD style

Posted Jul 26, 2015 8:53 UTC (Sun) by mti (subscriber, #5390)

I see at least two ways of making this useful for awk. One is to expose the tame() call to awk scripts to be able to write safer awk scripts. The other is to add a command line option to awk to set the flags awk should use in its call to tame(). Most awk scripts I have seen don't need more than simple file I/O.

As for cat it is not quite as simple as you think it is. For one thing it includes calls to several multi-byte character string functions.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 6:55 UTC (Wed) by epa (subscriber, #39769)

I guess that's another argument for having log rotation handled externally to the process, which can just write its log messages to a pipe.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 11:26 UTC (Wed) by pbonzini (subscriber, #60935)

Imagine putting all the logs in a single file, with metadata and the ability to do fast queries for all the messages produced by a particular daemon... kind of a replacement for /var/log/messages... you could call that the system journal :)

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 18:24 UTC (Wed) by dlang (guest, #313)

putting all logs in a single file isn't going to let you rotate them, and having everything in a single file is a horrible thing to do for scalability and performance.

So please, stop dictating the policy of how logs should be handled and instead just provide mechanisms and let me decide how logs should be handled.

traditional /dev/log works. The metadata is available (if you want it), but I'm not locked into the "all logs in one file on the local system" mentality of journald and don't have to deal with the failure modes of binary logs.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 19:35 UTC (Wed) by plundra (guest, #51099)

On a related note, since OpenBSD 5.6 syslog(3) uses the new syscall sendsyslog(2), no longer requiring opening /dev/log which helps when you're in a chroot or out of fds.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 20:01 UTC (Wed) by dlang (guest, #313)

how does it deliver the message? (the man page just says it delivers it directly to syslogd) does it use /dev/log but just not count it as an open file? send it over localhost? something else?

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 21:38 UTC (Wed) by plundra (guest, #51099)

Never looked into the details before, but as I understand it, when syslogd is started a fd is set in the kernel via an ioctl (LIOCSFD), that sendsyslog then uses, if set.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 22:21 UTC (Wed) by dlang (guest, #313)

so it's effectively a backdoor around fd limits.

writing to /dev/log with chroot is actually better because the syslog daemon can create a /dev/log in each sandbox and tell which one was written to (as well as gathering metadata across the unix socket, something that I assume is lost when you are just writing to a magic fd)

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 19:39 UTC (Wed) by anselm (subscriber, #2796)

… but I'm not locked into the "all logs in one file on the local system" mentality of journald …

I'm pretty sure we have been over this before but that is not actually how journald works. Read journald.conf(5).

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 20:04 UTC (Wed) by dlang (guest, #313)

we had not covered this particular aspect before, but the options listed (SplitMode= One of "uid", "login" and "none") are still pretty limited.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 23:31 UTC (Wed) by anselm (subscriber, #2796)

Journald will rotate log files as required to limit their size, or at fixed intervals. Very old journal files can be automatically discarded. Querying still works across all the reachable files. Also, journald does support remote logging – the journal files do not need to remain on the local machine.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 21:26 UTC (Wed) by flussence (guest, #85566)

> with metadata and the ability to do fast queries for all the messages produced by a particular daemon... kind of a replacement for /var/log/messages... you could call that the system journal

I call that a "filesystem". You can see it implemented properly in the likes of daemontools, runit, s6 and so on.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 22:18 UTC (Wed) by dlang (guest, #313)

for fast lookups of log data what you want is a database, not a filesystem or a file.

And if you are going to make it 'the' way of dealing with logs, you need to make it scale well to handle people's logging needs.

if you are just offering it as an optional thing that can be turned off, the requirements are not as high.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 11:35 UTC (Thu) by dgm (subscriber, #49227)

It all depends on what you mean by "lookup". Logs are essentially sequential, both in how you write and how you use them. I have dealt with such problems and often found that simply sequentially scanning the data is faster than jumping from here to there following pointers through the index and then the data itself, especially when you keep related data together. Additionally, sequential data structures are more compact and less prone to corruption if you take the precaution of adding synchronization points.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 12:42 UTC (Thu) by dlang (guest, #313)

This thread on logging is off-topic for the article, but I'm willing to continue discussing things

When I say that logs need to be in a database for searchability, I don't mean a traditional SQL database, but something optimized for time-series data like Elasticsearch/Splunk. Something that can deal not just with the raw text of the logs, but also with parsing the logs and creating indexes to make searching them more efficient than grep.

Grep is a great tool if you have small logs (or know where to search in your large logs and have them split into small enough chunks), but when you start combining the logs from even dozens, let alone hundreds or thousands of servers together, the sort of split options that the journal provides don't help you.

You don't want to do reports from the database as that's inefficient and doesn't scale well. this doesn't matter for a single laptop or 5-server company, but as you get up into tens or hundreds of gigs/day of logs it matters more. I've been in environments where we've seriously had to deal with Gig-E not being fast enough to handle all the logging traffic if it was done the 'obvious' way.

and splitting the logs, but then searching across all of them is counterproductive

> I have dealt with such problems and often found that simply sequentially scanning the data is faster than jumping from here to there following pointers through the index and then the data itself, especially when you keep related data together. Additionally, sequential data structures are more compact and less prone to corruption if you take the precaution of adding synchronization points.

I strongly agree, and this is part of why I dislike the journald implementation. Reading the logs out of journald isn't a sequential read through the data, it's following pointers from one message to the next (and these pointers can get corrupted, which has led to cases where following the pointers gives you a loop)

I like to keep my logs organized in multiple ways

1. I keep an authoritative copy for audit/legal reasons that's a simple sequential text file, gzipped and chunked into 'reasonable' sizes (typically per-minute files that then get signed/archived at larger intervals)

2. I parse the messages extensively and store them in something that makes it possible to do fast ad-hoc searches of data (Splunk or Elasticsearch). Being able to do a query like 'show me every log containing this IP address' across hundreds of TB of data in just a couple minutes is a great security tool.

3. I categorize the logs and write them out in per-category files, sometimes in different formats (and one log can be written multiple places) so that reporting tools for each category can efficiently process what they need to get at.

4. some of the destinations that are written to are event correlation engines that do different things with the logs. Some generate summary data (that then feeds back into the logging system so that report generators and dashboards can use it, usually from ES/Splunk). Some generate alerts based on the absence of logs. And some generate alerts based on spotting logs or combinations of log messages.

I've done a bit more writing on the topic:
https://www.usenix.org/publications/login/david-lang-series
https://www.usenix.org/publications/login/feb14/logging-r...
https://www.usenix.org/conference/lisa12/technical-sessio...

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 15:16 UTC (Thu) by dgm (subscriber, #49227)

As you pointed out this is clearly off topic, but I want to thank you for sharing your articles. Very interesting reading.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 15:59 UTC (Thu) by anselm (subscriber, #2796)

This is great but at the same time we should keep in mind that, as far as the wide world of logging applications is concerned, dlang is something of an outlier.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 19:15 UTC (Thu) by dlang (guest, #313)

> This is great but at the same time we should keep in mind that, as far as the wide world of logging applications is concerned, dlang is something of an outlier.

while I'm an outlier in the total volume of logs I've had to deal with, it's not by as much as you think.

the 100K logs/sec traffic was the 3-year projection at an 800-person SaaS company in 2006. When you take 100K logs/sec @ ~250 bytes/log (our measured average), delivering the logs to 4 destinations exceeds 1Gb/s. the company did not continue expanding at the predicted rate after it was purchased by a much larger company in 2007, but the log volume did continue to grow.

I spent some time at Google, and while I wasn't in their logging division, there's a lot in common between their logging architecture and what I advocate (although they do everything through their own APIs, a lot of NIH and some 'they were at a large scale before the tools got good enough for that scale')

At my new job, we 'only' have ~500 systems right now, and I find that these approaches work much better than many others that are talked about and tried. There are a LOT of companies that are at this scale and larger.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 21:16 UTC (Thu) by mathstuf (subscriber, #69389)

Maybe I'm just unfamiliar with this level of sysadmining, but what do you *do* with all these logs? Dump them to disk and rotate out disks for archaeological use later (breach, debugging, etc.)? Scan them for "interesting" bits and toss out anything outside the context of those bits? It seems that, to me, these log databases are larger than the actual meat of the data being manipulated in many cases (LHC and scientific simulations being the ones that come to mind where the data would still outsize a log flow like that).

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 21:59 UTC (Thu) by dlang (guest, #313)

> Maybe I'm just unfamiliar with this level of sysadmining, but what do you *do* with all these logs?

Fair question

different things have different uses.

The archive is to recreate anything else as needed and to provide an "authoritative source" in case of lawsuits. How long you keep the logs depends on your company policies, but 3-7 years are common numbers (contracts with your customers when doing SaaS may drive this)

being able to investigate breaches, or even just fraud, are reasons for the security folks to care.

for outage investigations (root cause analysis), you want to have the logs from the systems for the timeframe of the outage (and this is not just the logs from the systems that were down, you want the logs from all other systems in case there are dependencies you need to track down). For this you don't need a huge timeframe, but being able to look at the logs during a time of similar load (which may be a week/month/year ago depending on your business) to see what's different may help.

by generating rates of logs of different categories you can spot trends in usage/load/etc

By categorizing the logs and storing them by category you can notice "hey, normally these logs are this size, but they were much larger during the time we had problems" and by doing it per type in addition to per server you can easily see if different servers are logging significantly differently when one is having problems.

Part of categorizing the logs can be normalizing them. If you parse the logs you can identify all 'login' messages from your different apps and extract the useful info from them and output a message that's the same format for all logins, no matter what the source. This makes it much easier to spot issues and alert on problems.

A good approach is what Marcus Ranum coined "Artificial Ignorance"

start with your full feed of logs, sort it to find the most common log messages. If they are significant, categorize those logs and push them off for something that knows that category to report on.

Remember that the number of times that an insignificant thing happens can be significant, so generate a rate of insignificant events and push that off to be monitored.

repeat for the next most common log messages.

As you progress through this, you will very quickly get to the point where you start spotting log messages that indicate problems. Pass those logs to an Event Correlation engine to alert on them (and rate limit your alerts so you don't get 5000 pages)

Much faster than you imagine, you will get to the point that the remaining uncategorized logs are not that significant, but also that there aren't very many of them and you can do something like generate a daily/weekly report of the uncategorized messages and have someone eyeball them for oddities (and keep an eye out for new message types you should categorize)
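
The first pass of that loop (find the most common messages, most frequent first) can be sketched in C. The details are assumptions made for illustration, not part of Ranum's description: normalize() collapses every run of digits to '#' so that messages differing only in a number count as one pattern, and the struct pattern table and rank_patterns() helper are hypothetical names. A real feed would need a hash table and a smarter normalizer.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 128

struct pattern {
    char   text[MAX_LEN];   /* normalized message */
    size_t count;           /* how many raw messages matched it */
};

/* Collapse each run of digits to '#', so "uid 1001" and "uid 42" match. */
static void normalize(const char *msg, char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; msg[i] != '\0' && o + 1 < cap; i++) {
        if (isdigit((unsigned char)msg[i])) {
            if (o == 0 || out[o - 1] != '#')
                out[o++] = '#';
        } else {
            out[o++] = msg[i];
        }
    }
    out[o] = '\0';
}

/* Sort by descending count; break ties by text for a repeatable order. */
static int by_count_desc(const void *a, const void *b)
{
    const struct pattern *pa = a, *pb = b;
    if (pa->count != pb->count)
        return pa->count < pb->count ? 1 : -1;
    return strcmp(pa->text, pb->text);
}

/* Tally normalized messages into out[] and return the number of patterns. */
static size_t rank_patterns(const char *const *msgs, size_t n,
                            struct pattern *out, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        char key[MAX_LEN];
        size_t j;

        normalize(msgs[i], key, sizeof key);
        for (j = 0; j < used; j++)
            if (strcmp(out[j].text, key) == 0)
                break;
        if (j == used) {
            if (used == cap)
                continue;       /* table full: skip new patterns here */
            strcpy(out[used].text, key);
            out[used].count = 0;
            used++;
        }
        out[j].count++;
    }
    qsort(out, used, sizeof out[0], by_count_desc);
    return used;
}
```

Whatever lands at the top of the ranking is the next candidate to categorize or to reduce to a rate; the tail that remains after a few rounds is what ends up in the daily or weekly report.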

This seems like a gigantic amount of work, but it actually scales well. The bigger your organization the more logs you have, but the number of different _types_ of logs that you have grows much slower than the total log volume.

> It seems that, to me, these log databases are larger than the actual meat of the data being manipulated in many cases.

That's very common, but it doesn't mean the log data isn't valuable. Remember that I'm talking about a SaaS type environment, not HPC. Even if the service is only being provided to your employees. HPC and scientific simulations use a lot of cpu and run through a lot of data, but they don't generate much in the way of log info.

For example, your bank records are actually very small (what's your balance, what transactions took place), but the log records of your bank's systems are much larger because they need to record every time that you accessed the system and what you did (or what someone did with your userid). When you then add the need to keep track of what your admins are doing (to be able to show that they are NOT accessing your accounts and catch any who try), you end up with a large number of log messages for just routine housekeeping.

But text logs are small, and they compress well (xz compression is running ~100:1 for my logfiles), so it ends up being a lot easier to store the data than you initially think. If you are working to do this efficiently, you can also use cheap storage and end up finding that the amount of money you are spending on the logs is a trivial amount of your budget.

It doesn't take many problems solved, or frauds tracked down to pay for it (completely ignoring the value of logs in the case of lawsuits)

Domesticating applications, OpenBSD style

Posted Jul 24, 2015 1:12 UTC (Fri) by pizza (subscriber, #46)

> The archive is to recreate anything else as needed and to provide an "authoritative source" in case of lawsuits. How long you keep the logs depends on your company policies, but 3-7 years are common numbers (contracts with your customers when doing SaaS may drive this)

You aren't using "logs" in the same sense that most sysadmins mean "logs" -- your definition is more akin to what journalling filesystems (or databases) refer to as logs -- ie a serial sequence of all transactions or application state changes.

I think that's why so many folks (myself included) express incredulity at your "logging" volume.

Domesticating applications, OpenBSD style

Posted Jul 24, 2015 1:49 UTC (Fri) by dlang (guest, #313)

When I say logs, I'm talking about the stuff generated by operating systems and appliances into syslog + the logs that the applications write (sometimes to syslog, more frequently to local log files that then have to be scraped to be gathered). This includes things like webserver logs (which I find to be a significant percentage of the overall logs, but only ~1/3)

I do add some additional data to the log stream, but it's low volume compared to the basic logs I refer to above (A few log messages/min per server)

Also, keep in mind that when I talk about sizing a logging system, most of the time I'm talking about the peak data rate. What it takes to keep up with the logs at the busiest part of the busiest day.

I want to have a logging system that can process all logs within about 2 min of when they are generated. This is about the limit as far as I've found for having the system react to log entries or having changes start showing up in graphs.

There is also the average volume of logs per day. This comes into play when you are sizing your storage.

so when I talk about 100K logs/sec or 1Gb of logs being delivered, this is the peak time.

100K logs/sec @256 bytes/log = 25MB/sec (1.5GB/min, 90GB/hour) If you send this logging traffic to four destinations (archive, search, alerting, reporting), you are at ~100MB/sec of the theoretical 125MB/sec signalling rate that gig-E gives you. In practice this is right about at or just above the limit of what you can do with default network settings (without playing games like jumbo frames, compressing the network stream, etc). The talk I posted the link to earlier goes into the tricks for supporting this sort of thing.
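
The arithmetic above is easy to redo for other environments. The helper below is a minimal sketch: the struct log_load type and the peak_load() function are hypothetical names, and it compares against the raw link rate only, ignoring framing and protocol overhead.

```c
/* Result of a peak-rate estimate; all rates are in bytes per second. */
struct log_load {
    double bytes_per_sec;     /* one copy of the log stream */
    double fanout_bytes_sec;  /* all destinations combined */
    double link_fraction;     /* share of the link's raw capacity */
};

/* Estimate the peak network load of fanning a log stream out to N places. */
static struct log_load peak_load(double logs_per_sec, double bytes_per_log,
                                 int destinations, double link_bits_per_sec)
{
    struct log_load l;

    l.bytes_per_sec    = logs_per_sec * bytes_per_log;
    l.fanout_bytes_sec = l.bytes_per_sec * destinations;
    l.link_fraction    = l.fanout_bytes_sec / (link_bits_per_sec / 8.0);
    return l;
}
```

With the figures quoted here, peak_load(100000, 256, 4, 1e9) gives 25.6MB/sec for one copy, 102.4MB/sec for four destinations, and roughly 82% of the 125MB/sec raw rate of gig-E.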

But it's important to realize that this data rate may only be sustained for a few min per day on a peak day, so the daily volume of logs can be <1TB/day on a peak day (which compresses to ~10GB), and considerably less on off-peak days. Over a year, this may average out to 500GB/day (since I'm no longer there I can't look up numbers, but these are in the ballpark)

This was at a company providing banking services for >10M users.

now, the company that I recently started at is generating 50GB of windows eventlog data per day most weekdays, (not counting application logs, firewall logs, IDS logs, etc) from a couple hundred production systems. I don't yet have a feed of those logs, so I can't break it down any more than that yet, but the type of business that we do is very cyclical, so I would expect that peak days of the month/year the windows eventlog log volume will easily be double/triple that.

If you parse the log files early and send both the parsed data and the raw data, the combined parsed data can easily be 2x-3x the size of the original raw data (metadata that you gather just adds to this)

As a result, ElasticSearch needs to be sized to handle somewhere around 500GB/day (3*3*50GB/day) for its retention period to handle the peak period with a little headroom.
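
A small sketch of that sizing estimate, using the multipliers from the paragraphs above (peak days at 2x-3x a normal weekday, parsed plus raw data at 2x-3x the raw size). The function and parameter names are invented for the example.

```python
def indexed_gb_per_day(raw_gb_per_day, peak_factor, parse_expansion):
    """Daily volume the search cluster must absorb on a peak day."""
    return raw_gb_per_day * peak_factor * parse_expansion

low = indexed_gb_per_day(50, peak_factor=2, parse_expansion=2)
high = indexed_gb_per_day(50, peak_factor=3, parse_expansion=3)
print(f"estimated peak-day volume: {low}-{high} GB/day")
```

The high end works out to 450GB/day, which is where the "around 500GB/day with a little headroom" figure comes from.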

Domesticating applications, OpenBSD style

Posted Jul 24, 2015 1:55 UTC (Fri) by dlang (guest, #313) [Link]

as always, your log sizes and patterns may vary, measure them yourself. but I'll bet that you'll be surprised at how much log data you actually have around.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 22:30 UTC (Thu) by job (guest, #670) [Link]

Anyone who works under regulations, such as SOX, needs to keep an archive of logs in their sequential form. By far the easiest way to do that is to chunk them up, gzip them, sign if necessary, and file away. That should be quite a common use case in larger organizations, if perhaps not at the same volume. It's also a good idea to keep a (pruned) copy in elasticsearch for your daily bug hunting activities...
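
The chunk/gzip/sign step could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and an HMAC-SHA256 tag stands in for whatever signature scheme the regulations actually require (a real archive would more likely use an asymmetric signature).

```python
import gzip
import hashlib
import hmac

def archive_chunk(lines, key):
    """Compress one chunk of log lines in their original order; return (blob, tag)."""
    raw = "".join(l if l.endswith("\n") else l + "\n" for l in lines).encode("utf-8")
    blob = gzip.compress(raw)
    # Assumption: a keyed hash is used as the "signature" for this example.
    tag = hmac.new(key, blob, hashlib.sha256).hexdigest()
    return blob, tag

def verify_chunk(blob, tag, key):
    """Check the tag, then return the original lines in sequence."""
    expected = hmac.new(key, blob, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("archive chunk failed its integrity check")
    return gzip.decompress(blob).decode("utf-8").splitlines()
```

The blob and its tag are then filed away together; the pruned copy for searching is a separate feed.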

Domesticating applications, OpenBSD style

Posted Jul 25, 2015 19:20 UTC (Sat) by jhoblitt (subscriber, #77733) [Link]

It's nice to know that even an article on openbsd can trigger a systemd flamewar...

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 21:29 UTC (Tue) by dlang (guest, #313) [Link] (4 responses)

With this sort of thing the devil is in the details. It's really easy to get something that looks good and works some of the time. It's when you start getting to the edge cases or the ground shifts under you (different ways of looking up hostnames for example) that things start getting ugly

but since there is no clearly good solution to this problem, it will be good to get another group experimenting with it.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 7:05 UTC (Wed) by epa (subscriber, #39769) [Link] (2 responses)

Well yeah. Like most security measures it's not a 100% solution and not intended to be. If a few one-line tame() calls scattered among the system utilities' code can prevent just one or two bugs from being exploitable in the future then it will be worthwhile.

I think it would be cool to declare sections of code which can be executed once and then forgotten. For example the initialization code in init() would open files, then the program calls something like scrub_code(&init). This C library routine, with support from the compiler and the kernel, overwrites the init() in the process's text segment so it can never again be executed during the process's lifetime (whether deliberately or through some stack-smashing attack). As long as self-modifying code is prohibited the rest of the time, you can be certain that no further calls to open() can happen simply because they aren't physically present in the program code.

Hmm, thinking about it this would only work for toy programs that don't have any shared libraries. It doesn't solve the problem of a stack smashing attack jumping to some place in a shared library. So masking out allowed system calls is probably a better approach, combined with some general countermeasures against memory trampling and stack smashing.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 15:38 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

> I think it would be cool to declare sections of code which can be executed once and then forgotten.

This is pretty useless defense as exploits can just use return-oriented programming or be data-only, http://www.securitee.org/files/valueguard_iciss2010.pdf

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 22:21 UTC (Wed) by dvdeug (guest, #10998) [Link]

Worthwhile is a much more complex tradeoff than that. It can and probably will cause bugs; a misplaced or miswritten tame() may cause a program to be unable to read or write files it needs to, or make a net communication it needs to. The tame() code itself could be buggy, ranging from providing false security to actually opening up features the program shouldn't have access to. At the least it wasn't worthwhile if the same amount of time spent on another feature would have prevented the bugs instead of stopping them from being exploitable, or made more bugs unexploitable.

Domesticating applications, OpenBSD style

Posted Jul 27, 2015 16:17 UTC (Mon) by ortalo (guest, #4654) [Link]

That's the most interesting aspect IMHO. We can wait 2-3 years and see how many programs will have been tame()d and how.
At least, it will offer some very interesting experience on the usefulness of such pragmatic ideas, especially in opposition to mandatory policy implementations.

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 22:04 UTC (Tue) by spender (guest, #23067) [Link] (6 responses)

TAME_WPATH allows the whole feature to be defeated -- httpd under their policy (or any other binary with this "capability", and it will be a majority of them) will be able to both write to a file and chmod it as setuid. This then allows a local attacker with access to that path to gain the full privilege of the user running httpd without any restrictions, including ability to ptrace existing processes to steal private keys, etc.

Naive minimalistic syscall-based approaches will always run into this problem: any sufficiently complex program will need enough of these seemingly minor privilege buckets to shoot itself in the foot. The above is just one of several problems with this DOA idea. Maybe it'll work great for the non-interactive 386s running only a firewall typical of OpenBSD installations, but it simply doesn't translate to the real world with real applications and actual attack surface. If it can't handle even simple exploit scenarios crafted in five minutes, what's the point of it other than to further the illusion of OpenBSD's security relevance?

This is nothing other than another entry in OpenBSD's storied history of arriving late to the game with weak re-inventions and acting like they've created something new and amazing, and lazy articles like this drink the kool-aid, ignore critical thought, and perpetuate the mythology of OpenBSD being the pinnacle of security innovation.

-Brad

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 22:38 UTC (Tue) by nix (subscriber, #2304) [Link]

There is one situation in which TAME_WPATH can be used safely: if the only filesystems the user running a program using it can write to have been mounted nosuid.
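
Whether a given writable location sits on a nosuid mount can be checked from userspace. A minimal sketch, assuming a Unix system where Python exposes `os.ST_NOSUID`; the helper name and the list of paths are just examples.

```python
import os

def mounted_nosuid(path):
    """Return True if the filesystem holding `path` ignores setuid/setgid bits."""
    return bool(os.statvfs(path).f_flag & os.ST_NOSUID)

# Example paths only; substitute whatever the service can actually write to.
for candidate in ("/var", "/home", "/tmp"):
    if os.path.exists(candidate):
        state = "nosuid" if mounted_nosuid(candidate) else "setuid honoured"
        print(f"{candidate}: {state}")
```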

How common this is in a typical OpenBSD installation I don't know. It would certainly suffice for me (with /var and /home separate filesystems, httpd only able to write to at most one of those, and both mounted nosuid).

Of course, this is a local administration policy decision, which fits rather badly with the 'wire it into the application' approach that is tame()... but of course the sysadmin has always been able to degrade the security of applications by making bad decisions, and mounting filesystems that anything but root can write to without nosuid has long been regarded as a bad administrative decision. So maybe this falls under "works if sysadmin isn't stupid".

(I don't think the article is drinking the kool-aid or ignoring critical thought, but I will write this off as you being simply incapable of writing anything that isn't an attack on your interlocutor. Attacking Jon like that? You should be ashamed of yourself.)

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 9:44 UTC (Wed) by djm (subscriber, #11651) [Link] (4 responses)

If you can execute arbitrary code in httpd then being able to write a setuid file that gives you execute permissions as httpd doesn't get you anything you didn't already have.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 10:16 UTC (Wed) by PaXTeam (guest, #24616) [Link] (3 responses)

you don't need to be able to execute arbitrary code for this (https://www.usenix.org/conference/14th-usenix-security-sy...).

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 10:41 UTC (Wed) by djm (subscriber, #11651) [Link] (2 responses)

You're trying to fit a camel through a pinhole now.

You're saying that TAME_WPATH can be exploited for actual gain in cases where the attacker

1) has an attack which can cause an application to misbehave enough to write arbitrary contents to a file and then chmod it with arbitrary permissions, but is somehow less than full codeexec; and

2) is local to the host

That's a bit of a step back from "omg broken"

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 12:44 UTC (Wed) by spender (guest, #23067) [Link] (1 responses)

So what's the point of a system to deny obscure syscalls where under trivial exploit scenarios an attacker can gain the ability to call those syscalls under the privileged account that was supposed to be protected? With your pooh-poohing of the scenario you've dismissed the entire multi-billion dollar webhosting industry. In what scenario other than an exploit condition can an application just start calling random unrelated system calls? You want to give the appearance that this confines exploits, but when they're actually presented want to pretend it's got nothing to do with exploits. You can't have your cake and eat it too.

-Brad

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 16:20 UTC (Wed) by comex (subscriber, #71521) [Link]

The point applies to the vast majority of systems where the attacker does not have (legitimate) control of another local user; though being able to create SUID files while tamed is arguably a bug, considering the general willingness to disable broad swaths of kernel functionality for tamed processes - a bug which, especially since the feature is a work in progress, you would do well to attempt to report.

That said, as far as I can tell, the idea of using TAME_WPATH or TAME_CPATH without chroot is just fundamentally broken, considering home directory dotfiles etc.

Domesticating applications, OpenBSD style

Posted Jul 21, 2015 23:17 UTC (Tue) by roc (subscriber, #30627) [Link] (3 responses)

It sounds like these options have been tailored to specific applications and scenarios that Theo de Raadt has thought about and the kernel will have to be relentlessly extended to handle more applications and scenarios. That may be fine for OpenBSD but not for a general-purpose OS that people want to innovate on. For example, rr uses seccomp in interesting ways that are totally unrelated to what tame() provides. Still, this experiment will produce useful data.

And yes, if this approach is useful at all, creating a user-space library that provides this functionality on top of seccomp seems ideal.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 3:01 UTC (Thu) by thestinger (guest, #91827) [Link] (2 responses)

It's already quite painless to use seccomp-bpf via the libseccomp library. It has a very nice API and abstracts over most of the architecture portability issues.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 5:53 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

The issues that tame() abstracts over are not so much architecture portability issues, but having to understand the whole system-call interface to figure out which syscalls to block.

Domesticating applications, OpenBSD style

Posted Jul 27, 2015 1:41 UTC (Mon) by thestinger (guest, #91827) [Link]

You also have to understand subtleties about the system calls whitelisted by tame to use it safely, such as the gotchas pointed out by spender. It only provides convenient (or inconvenient, if your goal is minimal attack surface) groupings of related system calls. It doesn't prevent you from shooting yourself in the foot at all.

Anyway, in most sandboxes, seccomp-bpf (or tame) is for reducing the kernel attack surface. Other mechanisms are used to implement the sandboxing semantics (uid/gid separation, chroot, FreeBSD jail, Linux namespaces).

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 22, 2015 6:28 UTC (Wed) by alison (subscriber, #63752) [Link] (8 responses)

See http://0pointer.net/public/systemd-nluug-2014.pdf

ftp.nluug.nl/video/nluug/2014-11-20_nj14/zaal-2/5_Lennart_Poettering_-_Systemd.webm

Essentially inserting some statements in services' unit files about CapabilityBoundingSet, NoNewPrivileges and RestrictAddressFamilies and the like are easy ways to create restrictions similar to those implemented by tame(). I'm too lazy to learn about SELinux, but have found the systemd capabilities control facilities quite manageable both to configure and to test.

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 22, 2015 10:16 UTC (Wed) by djm (subscriber, #11651) [Link] (5 responses)

That's quite different. The point of tame()/seccomp-bpf/seatbelt, etc is that they can be applied *part-way* through an application's life, after most of the privilege-needing operations have been completed.

Restricting a program "from the outside" would need to include all the privileges that are only needed at initialisation time and would yield a much less restrictive policy.
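
The difference can be shown with a toy model. Nothing here is the real tame() or seccomp interface: the privilege groups and class names are hypothetical, and the "process" is just a Python object whose allowed set can only shrink.

```python
import enum

class Priv(enum.Flag):
    # Hypothetical privilege groups, for illustration only.
    FILES = enum.auto()
    NETWORK = enum.auto()
    STDIO = enum.auto()

class ToyProcess:
    def __init__(self, allowed):
        self.allowed = allowed

    def restrict(self, keep):
        # Intersection: a later call can remove privileges but never add them back.
        self.allowed &= keep

    def require(self, priv):
        if priv not in self.allowed:
            raise PermissionError(f"{priv.name} has been dropped")

INIT_NEEDS = Priv.FILES | Priv.NETWORK   # open config files, bind sockets
STEADY_NEEDS = Priv.STDIO                # read/write already-open descriptors

# Imposed from outside: one policy has to cover the whole lifetime.
outside_policy = INIT_NEEDS | STEADY_NEEDS

# Applied from inside: start wide, narrow once initialisation is done.
proc = ToyProcess(INIT_NEEDS | STEADY_NEEDS)
proc.require(Priv.FILES)      # fine during initialisation
proc.restrict(STEADY_NEEDS)   # the tame()-style call, made part-way through
```

After the `restrict()` call, `proc.require(Priv.FILES)` raises `PermissionError`, while a process confined only by `outside_policy` would still be allowed to open files for its whole lifetime.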

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 22, 2015 10:48 UTC (Wed) by fishface60 (subscriber, #88700) [Link] (4 responses)

If your process set-up is common stuff like setting up sockets, then you could handle that in systemd by declaratively describing what you want in a .socket unit, and using socket activation of your service, so your process doesn't need to do setup before dropping capabilities.
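
On the service side, picking up sockets that systemd has already opened is a small amount of code. The sketch below follows the environment-variable protocol used for socket activation (LISTEN_PID, LISTEN_FDS, descriptors starting at 3); the function names are my own, and a real service would probably use sd_listen_fds() from libsystemd instead.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the activation protocol

def inherited_fds(environ=None, pid=None):
    """Return the descriptor numbers passed in by the service manager, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []          # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []              # not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))

def listening_sockets():
    """Wrap each inherited descriptor in a socket object."""
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

With this in place the service never calls bind() itself, so it can start with fewer privileges than it would otherwise need.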

I don't think this has a way to set up an outgoing connection, but that's more common in outgoing client applications, which you wouldn't necessarily be able to use systemd for anyway.

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 22, 2015 20:21 UTC (Wed) by wahern (subscriber, #37304) [Link] (3 responses)

Private keys. Sensitive configuration files. Filesystem initializations. It's not only about having read access to specific resources, but often times initializing those resources, or even re-initializing after failure. Any complex program will either need to manage dropping capabilities on its own, or it will have to be architected as a collection of multiple independent processes invoked separately. In such cases, systemd doesn't simplify things; it complicates them.

Socket activation existed for decades before systemd in the form of inetd, and almost nobody bothered to use it. Why? Because opening a socket is but one of many ad hoc initialization tasks complex services implement. Instead of trying to abstract away these patterns and trying to provide a One True Interface only suitable for imaginary or proof-of-concept programs, it makes more sense to provide a simple, powerful primitive which takes care of 80% of the work, leaving the finer ad hoc details to the application. fork is like this. tame tries to be like this, but only time will tell how useful it really is.

The problem with systemd is that it makes easy things easier, but complex things more convoluted, assuming it's useful at all. systemd isn't _composable_ in the sense of being able to build another layer atop it. It's a tool for system administrators wrangling poorly written software. It's not a solution for implementing correctly written software in the first instance. Likewise for something like journald. It's a nice piece of engineering. But it's not composable. A composable logging approach would be logging to stderr on a file descriptor specifiable via the invocation options, so that users can direct log messages however and wherever they see fit, including but not limited to journald. Composability allows you to programmatically extend and repurpose the functionality, without having to know about the internal details of the implementation, and without needing the cooperation of the implementation beyond the simple interface it provides--logging to a specified file descriptor. journald and how-to-properly-implement-logging-in-your-application are unrelated from a programmer's perspective, yet sadly conflated by system administrators and developers alike.

I personally favor moving developers toward something like Capsicum. Tame and seccomp seem more like interim measures, and I would hate to see the slow shift to Capsicum (or the Capsicum-like model) stall out. Systemd is irrelevant in the context of such approaches. It's more relevant for approaches like VMs or containers, which not coincidentally are approaches especially preferred by system administrators as it allows them to wrangle poorly written and misbehaved software.

Capsicum

Posted Jul 23, 2015 8:46 UTC (Thu) by gasche (subscriber, #74946) [Link]

Indeed, I was disappointed that Capsicum was not mentioned in the original article, as it is already available in FreeBSD:
https://www.cl.cam.ac.uk/research/security/capsicum/freeb...

It seems that the development of a Linux port is still ongoing:
https://github.com/google/capsicum-linux
but I'm worried that people would reject it as overlapping existing mechanisms (while it seems rather hard to combine the same expressivity and simplicity with existing mechanisms, bpf-seccomp included).

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 23, 2015 15:39 UTC (Thu) by fishface60 (subscriber, #88700) [Link]

Whoa there. I'm not claiming that systemd replaces the need for applications to do their own setup, just that in some cases you can move the setup out.

> Private keys. Sensitive configuration files. Filesystem initializations.

For the filesystem initialization bit you could potentially use a different process. Systemd has a system-wide filesystem initialization service that runs on boot, which it uses to initialize some of its services' filesystem paths, and other services can drop in a config file if they choose to do setup this way.

> It's not only about having read access to specific resources, but often times initializing those resources, or even re-initializing after failure. Any complex program will either need to manage dropping capabilities on its own, or it will have to be architected as a collection of multiple independent processes invoked separately. In such cases, systemd doesn't simplify things; it complicates them.

You can still have your service do its own capability dropping, systemd isn't forcing you to use its capability dropping mechanism, so it can't be complicating matters here.

If you *do* choose to split your service up, then I think systemd unit files are simpler, though if you disagree nobody is stopping you using an old sysv init script.

> Socket activation existed for decades before systemd in the form of inetd, and almost nobody bothered to use it. Why? Because opening a socket is but one of many ad hoc initialization tasks complex services implement. Instead of trying to abstract away these patterns and trying to provide a One True Interface only suitable for imaginary or proof-of-concept programs, it makes more sense to provide a simple, powerful primitive which takes care of 80% of the work, leaving the finer ad hoc details to the application. fork is like this. tame tries to be like this, but only time will tell how useful it really is.

As a counter-point, I've repeatedly been frustrated by services that insist on doing their own socket initialization, and are insufficiently flexible about what I need to do with the sockets.
Usually binding it to an ephemeral port, so I can start up the service multiple times, to run isolated tests against the service in parallel, or binding the service to a specific interface, so I'm not relying on binding to the IP address that the interface I want to bind it to happens to currently have.

> The problem with systemd is that it makes easy things easier, but complex things more convoluted, assuming it's useful at all.

I don't understand your point, since as far as I'm aware it's not preventing you doing it the old way.

> systemd isn't _composable_ in the sense of being able to build another layer atop it.

I don't follow.

The rkt guys are building container management on top of systemd-nspawn and service management.
As I understand it, desktop environment developers are building on top of systemd-logind's session management.
The Cockpit guys are building server administration interfaces on top of systemd's APIs.

> It's a tool for system administrators wrangling poorly written software.

You've described most software there, aren't tools for dealing with it valuable?

> It's not a solution for implementing correctly written software in the first instance. Likewise for something like journald. It's a nice piece of engineering. But it's not composable.

Given your previous definition of composable software being something you can build on, then I think it is, as rsyslog builds on journald to provide the traditional interface on top of journald by reading messages out of the journal.

> A composable logging approach would be logging to stderr on a file descriptor specifiable via the invocation options, so that users can direct log messages however and wherever they see fit,

How is this standard interface for providing logging descriptors to processes composable, but setting up other resources like sockets not?

> including but not limited to journald.

Indeed. journald gives you the option of setting up an extra file descriptor, which messages written to get appropriately labelled.

The application or an ancestor program can set this up with sd_journal_stream_fd(), or from a shell you can do:

my_command --log-fd=100 100> >(systemd-cat --identifier=my_command_log)
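
For completeness, the application side of that command line might look like the sketch below. `my_command` and its `--log-fd` option come from the shell example above; this Python implementation of it is hypothetical.

```python
import argparse
import os

def open_log(argv):
    """Return a line-buffered text stream on the descriptor named by --log-fd."""
    parser = argparse.ArgumentParser(prog="my_command")
    parser.add_argument("--log-fd", type=int, default=2,
                        help="descriptor to write log messages to (default: stderr)")
    args = parser.parse_args(argv)
    # closefd=False: the descriptor belongs to whoever set it up for us.
    return os.fdopen(args.log_fd, "w", buffering=1, closefd=False)
```

A caller would use it as `log = open_log(sys.argv[1:])` and then `log.write("started\n")`; the program neither knows nor cares whether the other end is journald, a file, or a pipe.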

> Composability allows you to programmatically extend and repurpose the functionality, without having to know about the internal details of the implementation, and without needing the cooperation of the implementation beyond the simple interface it provides--logging to a specified file descriptor.

Systemd isn't stopping you doing that.

> journald and how-to-properly-implement-logging-in-your-application are unrelated from a programmer's perspective, yet sadly conflated by system administrators and developers alike.

Are you saying neither system administrators nor developers are programmers?

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 23, 2015 20:14 UTC (Thu) by ibukanov (subscriber, #3942) [Link]

> Private keys.

The right way to handle private keys is to have a separate process that handles cryptography for the main application over a pipe, with private keys loaded via separate programs over another pipe. This way the crypto process would not need any file system access and the main application would not need to access private keys. It is interesting that recent OpenSSH comes rather close to this model, with sshd gaining the ability to use host keys stored in ssh-agent. I say "rather close" because ssh-agent has to create its unix socket, which requires minimal filesystem access, rather than accepting it on stdin.

> Sensitive configuration files.

It is better to have a unix socket in the application where the administrator or scripts can send those config files. This also eliminates the need for any signals and races when the application has to be notified about changes in the files.

> Filesystem initializations.

Why not have a separate process that does this?

In any case, I think the main purpose of tame is to *quickly* add another layer of security to an existing codebase. But I am not convinced that in the long term this is the right approach.

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 22, 2015 21:16 UTC (Wed) by mezcalero (subscriber, #45103) [Link] (1 responses)

Caps actually control access to facilities "above" what a normal user has, i.e. stuff that normally only root has. OpenBSD's tame() stuff otoh limits access to facilities that even normal users have, hence systemd's CapabilityBoundingSet= cannot cover what tame() covers.

That said, I am pretty sure the tame() API is frickin' crazy, and seccomp() actually a ton more useful, especially if you use it in conjunction with some namespacing tricks like they are exposed with systemd's PrivateTmp=, ProtectSystem= or PrivateNetwork=.

I find Theo's comment on seccomp controlling programs with other programs particularly weird, given that the seccomp filters are not Turing complete, and hence hardly more than a fancy parameter check, and hardly something I would really call a "program".

Aren't systemd's security capabilities in userspace simpler to use?

Posted Jul 23, 2015 21:10 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

To be honest, I think it's mostly Theo's noted disdain for Linux, combined with a heaping helping of NIH syndrome.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 10:12 UTC (Wed) by Lionel_Debroux (subscriber, #30014) [Link] (9 responses)

On the one hand, it's a good thing that there's something simpler than SELinux and seccomp.
On the other hand, it's fairly coarse-grained, and would give some false sense of security... but it's not the first time OpenBSD implements a half-measure, as outlined by spender above.

Technically, why a single parameter of int type? Two int parameters, or a 64-bit parameter, would be more future-proof.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 12:36 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (6 responses)

s/as outlined by spender above/as asserted (without proof) by spender above

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 13:05 UTC (Wed) by spender (guest, #23067) [Link] (5 responses)

Oh sorry, I forgot LWN readers require me to do all their homework for them:

Everything you want to know is here: https://grsecurity.net/~spender/exploits/exp_moosecox.c

To continue the "innovation" not mentioned there, they've recently also ripped off PAX_MPROTECT from 2001 (14 years late), labeled it "now or never exec":
http://www.tedunangst.com/flak/post/now-or-never-exec
with no mention anywhere of PAX_MPROTECT, despite being keenly aware of it since this famous thread: http://www.monkey.org/openbsd/archive/misc/0304/msg01146....
I guess they had to wait 12 years since that point to let their obvious hypocrisy be less visible about having to "break POSIX".

With the exception of perhaps the extension of privilege separation (already demonstrated in Postfix prior to Niels Provos' paper), not one original useful idea has come out of OpenBSD in 15 years, and it's time to stop feeding the delusions of these plagiarists.

-Brad

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 17:17 UTC (Wed) by SEJeff (guest, #51588) [Link] (4 responses)

So I've got to ask Brad, you hate OBSD security, you hate Linux security (which is fair), what system do you use? A heavily locked down PAX / grsecurity enabled Linux distro? As much as I see you pull the rug out from under so many of these security features, generally in Linux, I'm curious what you would consider to be "secure".

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 17:39 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (3 responses)

Don't know what he's using now but at the time of this LinuxFR interview (http://linuxfr.org/nodes/24807/comments/1052695) he said :

I know this will probably upset some of your readers, but I actually
am running Windows 7 RC right now. Prior to that I had been running
Windows Vista. I haven't used Linux as a primary desktop since college
or so.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 18:05 UTC (Wed) by PaXTeam (guest, #24616) [Link] (2 responses)

i'll see that interview and raise you https://microsoft.com/emet ;).

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 18:26 UTC (Wed) by patrick_g (subscriber, #44470) [Link]

OK I fold :-)

Domesticating applications, OpenBSD style

Posted Jul 26, 2015 19:47 UTC (Sun) by ploxiln (subscriber, #58395) [Link]

Wow... so Brad cares 100% about security features and 0% about software quality... he cares so much about security (features, apparently) that he uses WINDOWS!

Just to state the obvious, MS / Windows had most "mitigation" features first, like ASLR and sandboxing, but it was just checkbox features to use for sales purposes, and didn't fix their security problems. There's always the most widely used software on the platform not opting into the security feature or opting out of it, like flash plugin having a root-level helper service to get it out of the browser sandbox, or acrobat reader not opting into ASLR (and running javascript and such), or Office's VB macros and OLE hilariousness, or font kerning scripts running in the kernel. And to top it all off it's all closed source so there's no telling how much ridiculous crap is in there, and no one but Microsoft can do anything about it. Exploits for Windows continue to appear regularly in the wild, despite the industry-leading mitigation features.

Brad has good ideas, and does a lot of work to create working exploits, but has always come off as rather unbalanced in how he values different qualities of software, and wow does this confirm it. Wow.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 2:55 UTC (Thu) by busterb (subscriber, #560) [Link]

Why a single int in the first iteration? I asked the same thing directly: It was a challenge in simplicity, to find the minimum that could be useful, with the implicit warning that a call with dozens of flags is going to devolve from its goal of simplicity.

That said, tame is actively evolving beyond the description in this article, based on experience converting more programs to use it. It will likely take months to convert enough things in order to fully understand and refine all of the macro-level use cases.

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 3:07 UTC (Thu) by thestinger (guest, #91827) [Link]

seccomp is very simple from an application's perspective. The libseccomp API is quite nice. Some projects roll their own BPF handling code (Firefox, Chromium, rr), but that's entirely their fault.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 12:44 UTC (Wed) by richmoore (guest, #53133) [Link]

It might be interesting to try implementing this in userspace using the linux sandbox facility.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 18:36 UTC (Wed) by josh (subscriber, #17465) [Link] (3 responses)

Experimenting with other approaches makes sense, and it's *definitely* a good idea to have an easy way to declare high-level policies, but I don't see any obvious reason why tame() is written in the kernel, rather than providing a general-purpose mechanism in the kernel (whether BPF or some other way of checking syscalls and parameters), and then writing a policy in userspace. The entirety of tame() could be written using seccomp BPF rules assembled based on the flags provided, and then submitted to the kernel. And the next time a new flag needs adding, or someone wants to build a locked-down sandbox that doesn't exactly match what the OpenBSD kernel assumed people would want, the kernel doesn't need to change.
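
To make that concrete, here is a sketch of the userspace half: turning a set of coarse flags into a syscall allowlist that a library could then hand to a general-purpose kernel filter. The flag names and the syscall groupings are invented for illustration and are not the actual tame() policy; the step of compiling the list into a BPF program (for instance with libseccomp) is left out.

```python
import enum

class Tame(enum.IntFlag):
    # Hypothetical flag values and groupings, not OpenBSD's real ones.
    STDIO = enum.auto()
    RPATH = enum.auto()
    INET = enum.auto()

SYSCALL_GROUPS = {
    Tame.STDIO: {"read", "write", "close", "exit"},
    Tame.RPATH: {"open", "stat"},
    Tame.INET: {"socket", "connect", "bind", "listen", "accept"},
}

def allowlist(flags):
    """Union of the syscall groups selected by `flags`."""
    allowed = set()
    for flag, syscalls in SYSCALL_GROUPS.items():
        if flags & flag:
            allowed |= syscalls
    return allowed

print(sorted(allowlist(Tame.STDIO | Tame.RPATH)))
```

Adding a new flag then means adding a dictionary entry in the library rather than changing the kernel.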

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 20:42 UTC (Wed) by wahern (subscriber, #37304) [Link] (2 responses)

It seems like Theo is too suspicious of loading user-defined code into the kernel, no matter that the BPF byte code and VM is designed to be verifiable. I'm not so averse to BPF, even when considering the possibility of bugs. It's so incredibly useful and flexible, and yet so well contained and discrete, that the complexity and risk is more than tolerable IMO.

Plus, you have to be root to load the programs[1]. And OpenBSD already supports traditional BPF, anyhow. On further consideration, I'm not sure if Theo's apparent position is even defensible.

[1] Or have set PR_SET_NO_NEW_PRIVS.

Domesticating applications, OpenBSD style

Posted Jul 22, 2015 21:21 UTC (Wed) by spender (guest, #23067) [Link]

> Plus, you have to be root to load the programs[1]

This isn't true. It's only correct for the new bpf syscall (which currently requires CAP_SYS_ADMIN) because it's a direct interface to the new eBPF code (as well as an extra "map" interface). Unprivileged users can create sockets and attach BPF code to them, which resulted in this (once JIT was implemented): http://mainisusuallyafunction.blogspot.com/2012/11/attack... and then the creation of a grsecurity feature to defeat the technique, and then additional grsecurity code to harden the (at that time) writable BPF interpreter buffers, as it was previously possible to turn a linear buffer overflow into a BPF interpreter buffer into arbitrary memory read/write (and after BPF was extended with eBPF and before I hassled them enough to make the interpreter buffer read-only, easy arbitrary code execution as well). Nowadays BPF programs written by unprivileged users get translated behind the scenes into eBPF programs, but retain the limitations imposed by the BPF language. While "verifiable", it shouldn't be confused with "verified" -- if you've been following eBPF development you would have noticed several vulnerabilities over several architectures where the "verifiable" code wasn't acting as intended.

-Brad

Domesticating applications, OpenBSD style

Posted Jul 23, 2015 3:06 UTC (Thu) by thestinger (guest, #91827) [Link]

There's no use case for it without PR_SET_NO_NEW_PRIVS, so I don't see how that's a problem. It requires it because otherwise it could be abused as a way to attack setuid/setgid/setcap binaries. For example, consider a buggy program not checking the return value of setuid(...) being exploited by disallowing that system call. Note that the PR_SET_NO_NEW_PRIVS feature is only relevant at exec boundaries.