
Ktap or BPF?

By Jonathan Corbet
April 23, 2014
While the kernel's built-in tracing mechanisms have advanced considerably over the last few years, the kernel still lacks a DTrace-style dynamic tracing facility. In the last year we have seen the posting of two different approaches toward scriptable dynamic tracing: ktap and BPF tracing filters. Both work by embedding a virtual machine in the kernel to execute scripts, but the similarity ends there. Putting one virtual machine into the kernel for tracing is a hard sell; adding two of them is not really seen as an option by anybody involved. So, at some point, a decision will have to be made. A recent discussion on that topic gives some hints about the direction that decision could go.

The trigger for the discussion was the posting of a new version of the ktap patch set after a period of silence. While quite a bit of work has been done on ktap, little was done to address the concerns that kept ktap out of the 3.13 kernel. Ingo Molnar, who blocked the merging of ktap the last time around, was not pleased that progress had not been made on that front.

Virtual machines

There appear to be two specific points of argument that come up when the merits of ktap and BPF tracing filters are discussed. The first of those is, naturally, the question of introducing another virtual machine into the kernel. On this point, the discussion has shifted a bit, though, for a simple reason: while ktap needs its own virtual machine, the BPF engine is already in the mainline kernel, and it has been getting better.

BPF originally stood for "Berkeley packet filter"; it was used as a way to tell the kernel how to narrow down a stream of packets from a network interface when tools like tcpdump are in use. Over time, though, BPF has been used in other contexts, such as filtering access to system calls as part of the seccomp mechanism and in a number of packet classification subsystems. Alexei Starovoitov's tracing filters patch set simply allows this virtual machine to be used to select and process system events as well.
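For readers who have not seen one, here is what a classic BPF program looks like in C. It is a minimal sketch that assumes an Ethernet interface. The program is the equivalent of the tcpdump expression "ip": it accepts IPv4 frames and drops everything else, and it is written with the BPF_STMT() and BPF_JUMP() macros from <linux/filter.h>. The run_filter() function is hypothetical. It is a heavily simplified user-space model of the in-kernel interpreter, not the kernel's code, and it handles only the three opcodes used here. What it does show is the accumulator-based design: a single 32-bit register, A, is driven by the instruction stream. The index register X is not needed for this filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT(), BPF_JUMP() */

/* Accept IPv4 frames, drop everything else (the equivalent of "ip"). */
static struct sock_filter ipv4_only[] = {
    BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),         /* A = EtherType at offset 12 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1), /* IPv4? fall through : skip 1 */
    BPF_STMT(BPF_RET | BPF_K, 0x40000),                /* keep up to 256 KiB */
    BPF_STMT(BPF_RET | BPF_K, 0),                      /* drop */
};

/* Hypothetical, simplified model of the interpreter; only models the
 * opcodes used above. Returns the number of bytes to keep (0 = drop). */
static uint32_t run_filter(const struct sock_filter *prog, size_t len,
                           const uint8_t *pkt, size_t pkt_len)
{
    uint32_t A = 0;                       /* the accumulator */

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &prog[pc];

        switch (ins->code) {
        case BPF_LD | BPF_H | BPF_ABS:    /* load 16 bits, network byte order */
            if ((size_t)ins->k + 2 > pkt_len)
                return 0;                 /* out-of-bounds load drops the packet */
            A = (uint32_t)pkt[ins->k] << 8 | pkt[ins->k + 1];
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:   /* offsets are relative to the next insn */
            pc += (A == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET | BPF_K:
            return ins->k;
        default:
            return 0;                     /* opcode not modeled in this sketch */
        }
    }
    return 0;
}
```

In a real program, the array would be wrapped in a struct sock_fprog and attached to a socket with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &fprog, sizeof(fprog)). Running "tcpdump -dd ip" prints a program of the same shape as a C fragment.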

In 2011, BPF gained a just-in-time compiler that sped it up significantly. The 3.15 kernel takes this work further; it will feature a BPF engine, radically reworked by Alexei, with considerably expanded functionality. The new BPF offers the same virtual instruction set to user space, but those instructions are translated within the kernel into a format that is closer to what the hardware provides. The new format offers a number of advantages over the old, including ten registers instead of two, 64-bit registers, more efficient jump instructions, and a mechanism to allow kernel functions to be called from BPF programs. Needless to say, the additional capabilities have further reinforced BPF's position as the virtual machine of choice for an in-kernel dynamic tracing facility.
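The register changes are easier to picture with a concrete example. What follows is a toy interpreter for a hypothetical three-operation subset of a register-based, eBPF-style machine. Its instruction layout mirrors the 8-byte format of the new engine, which was later exported to user space as struct bpf_insn: 4-bit register numbers, a 16-bit offset, and a 32-bit immediate. The toy_* names and opcode values are invented for illustration and do not match the kernel's encodings. The point is the contrast with the classic model. Here there are eleven 64-bit register slots (R0 through R9, plus R10, which the real engine uses as a read-only frame pointer). The classic model has only a 32-bit accumulator and an index register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for this toy only; NOT the kernel's encodings. */
enum toy_op { TOY_MOV64_IMM, TOY_ADD64_REG, TOY_LSH64_IMM, TOY_EXIT };

/* 8-byte instruction mirroring the layout of the eBPF format. */
struct toy_insn {
    uint8_t code;         /* one of enum toy_op */
    uint8_t dst_reg : 4;  /* destination register number */
    uint8_t src_reg : 4;  /* source register number */
    int16_t off;          /* branch offset (unused by this subset) */
    int32_t imm;          /* signed immediate operand */
};

/* R0-R9 plus R10; the real engine makes R10 a read-only frame
 * pointer, which this toy does not model. */
#define TOY_NREGS 11

/* Runs a straight-line program; R0 carries the return value. */
static uint64_t toy_run(const struct toy_insn *prog, size_t len)
{
    uint64_t r[TOY_NREGS] = {0};   /* 64-bit registers */

    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];

        if (in->dst_reg >= TOY_NREGS || in->src_reg >= TOY_NREGS)
            return 0;              /* reject register numbers 11-15 */
        switch (in->code) {
        case TOY_MOV64_IMM:        /* dst = sign-extended imm */
            r[in->dst_reg] = (uint64_t)(int64_t)in->imm;
            break;
        case TOY_ADD64_REG:        /* dst += src, full 64-bit add */
            r[in->dst_reg] += r[in->src_reg];
            break;
        case TOY_LSH64_IMM:        /* dst <<= imm (masked to 0-63) */
            r[in->dst_reg] <<= (in->imm & 63);
            break;
        case TOY_EXIT:
            return r[0];
        default:
            return 0;              /* unknown opcode */
        }
    }
    return r[0];
}
```

For example, the sequence R0 = 1; R0 <<= 32; R9 = 5; R0 += R9; exit returns 4294967301. That value would not fit in a classic 32-bit accumulator.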

Thus, if ktap is to be accepted into the kernel, it almost certainly needs to be retargeted to the BPF virtual machine. Ktap author Jovi Zhangwei has expressed a willingness to consider making such a change, but he sees a number of shortcomings in BPF that would need to be resolved first. BPF as it currently exists does not support features needed by ktap, including access to global variables, timer-limited looping (or loops in general, since BPF disallows them by design), and more. Jovi also repeatedly complained about the BPF tracing filter design, which is oriented around attaching scripts to specific tracepoints; Jovi wants a more flexible mechanism that would allow attaching a single script to a range of tracepoints.
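In classic BPF, the "no loops" property is enforced structurally rather than by a runtime limit. Jump offsets are unsigned and relative to the next instruction, so a jump can only move forward. If every jump target stays within the program and the last instruction is a return, the program must finish in at most as many steps as it has instructions. The function below sketches that check. It is in the spirit of what the kernel verifies when a filter is attached, but it is not the kernel's code, and filter_is_loop_free() is a hypothetical name. The BPF_CLASS() and BPF_OP() macros are the real ones available through <linux/filter.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_CLASS(), BPF_OP(), ... */

/* Hypothetical checker: true if every jump lands inside the program and
 * the program ends with a return. Because classic BPF jumps only move
 * forward, that is enough to guarantee termination within len steps. */
static bool filter_is_loop_free(const struct sock_filter *prog, size_t len)
{
    if (len == 0 || BPF_CLASS(prog[len - 1].code) != BPF_RET)
        return false;                       /* must end with a return */

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &prog[pc];
        size_t next = pc + 1;               /* offsets count from here */

        if (BPF_CLASS(ins->code) != BPF_JMP)
            continue;
        if (BPF_OP(ins->code) == BPF_JA) {
            if (ins->k >= len - next)       /* target = next + k */
                return false;
        } else if (next + ins->jt >= len || next + ins->jf >= len) {
            return false;                   /* target = next + jt or jf */
        }
    }
    return true;
}
```

This also suggests why timer-limited looping, one of Jovi's requests, is not a small change. Any backward jump would break the termination argument above, so the checker would need some other way to bound execution.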

That last functionality should not be too hard to add. Most of the rest of Jovi's requests could probably be worked into BPF as well, especially if Jovi were to help do the work. Alexei seems to be amenable to evolving BPF in ways that would enable it to better support ktap. The communication between the two developers appears to be difficult, though, with frequent misunderstandings. At one point, Jovi concluded that Alexei was not interested in making the necessary changes to BPF; he responded by saying:

Anyway, I think there will don't have any necessary to upstream ktap any more, I still enjoy the simplicity and flexibility given by ktap, and hope there will have a kernel built-in alternative solution in future.

In truth, the situation need not be so grim, but there may be a need for an outside developer to come in and actually do the work to integrate ktap and BPF to show that it is possible. Thus far, volunteers to do this work have not made themselves known. And, in any case, there is another issue.

Scripting languages

Ktap is built on the Lua language, which offers a number of features (associative arrays, for example) that can be useful in dynamic tracing settings. Ingo, along with a few others, would rather see a language that looks more like C:

I'd suggest using C syntax instead initially, because that's what the kernel is using. The overwhelming majority of people probing the kernel are programmers, so there's no point in inventing new syntax, we should reuse existing syntax!

The BPF tracing filters patch uses a restricted version of the C language; Alexei has also provided backends for both GCC and LLVM to translate that language into something the BPF virtual machine can run. So, once again, the BPF approach appears to have a bit of an advantage here at the moment.

Unsurprisingly, Jovi feels differently about this issue; he sees the ktap language as being far simpler to work with. To support this claim, he provided this code from a BPF tracing filter example:

    void dropmon(struct bpf_context *ctx) {
        void *loc;
        uint64_t *drop_cnt;

        loc = (void *)ctx->arg2;

        drop_cnt = bpf_table_lookup(ctx, 0, &loc);
        if (drop_cnt) {
            __sync_fetch_and_add(drop_cnt, 1);
        } else {
            uint64_t init = 0;
            bpf_table_update(ctx, 0, &loc, &init);
        }
    }

This filter, he says, can be expressed this way in ktap:

    var s ={}

    trace skb:kfree_skb {
        s[arg2] += 1
    }

Alexei concedes that ktap has a far less verbose source language, though he has reservations about the conciseness of the underlying bytecode. In any case, he (along with others) has suggested that, once there is agreement on which virtual machine is to be used, there could be any number of scripting languages supported in user space.
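Both programs quoted above do the same job: they count kfree_skb events, keyed by the drop location in arg2. As written, the BPF version stores 0 the first time it sees a location, so the first drop at each location is not counted. The sketch below models the aggregation in ordinary user-space C and starts each new entry at 1. The names struct drop_table, record_drop(), and drops_at() are hypothetical. The small fixed-size open-addressing hash table merely stands in for the BPF table, or for ktap's associative array.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 64   /* power of two; assumed large enough for a demo */

/* Hypothetical stand-in for the BPF table / ktap associative array. */
struct drop_table {
    uintptr_t loc[TABLE_SLOTS];    /* drop location; 0 marks an empty slot */
    uint64_t  count[TABLE_SLOTS];  /* drops seen at that location */
};

static size_t slot_for(uintptr_t loc)
{
    return (size_t)(loc >> 4) & (TABLE_SLOTS - 1);  /* cheap hash */
}

/* The equivalent of "s[arg2] += 1". Returns 0 on success, -1 on failure. */
static int record_drop(struct drop_table *t, uintptr_t loc)
{
    if (loc == 0)
        return -1;                         /* 0 is reserved as "empty" */
    for (size_t probe = 0; probe < TABLE_SLOTS; probe++) {
        size_t s = (slot_for(loc) + probe) & (TABLE_SLOTS - 1);
        if (t->loc[s] == loc) {            /* known location: increment */
            t->count[s]++;
            return 0;
        }
        if (t->loc[s] == 0) {              /* new location: first drop counts */
            t->loc[s] = loc;
            t->count[s] = 1;
            return 0;
        }
    }
    return -1;                             /* table full */
}

/* Number of drops recorded at loc (0 if never seen). */
static uint64_t drops_at(const struct drop_table *t, uintptr_t loc)
{
    if (loc == 0)
        return 0;
    for (size_t probe = 0; probe < TABLE_SLOTS; probe++) {
        size_t s = (slot_for(loc) + probe) & (TABLE_SLOTS - 1);
        if (t->loc[s] == loc)
            return t->count[s];
        if (t->loc[s] == 0)
            break;                         /* empty slot ends the probe chain */
    }
    return 0;
}
```

The ktap version hides all of this behind the s[arg2] += 1 syntax. That difference is what Jovi's comparison was getting at.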

And that is roughly where the discussion wound down. There is a lot of interesting functionality to be found in ktap, but, the way things stand currently, it may well be that this code gets passed over in favor of an offering from a developer who is more willing to do what is needed to get the code upstream. That said, this discussion is far from resolved, and Jovi is not the only developer who is working on ktap. With the application of a bit of energy, it may yet be possible to get ktap's higher-level functionality into a condition where it could someday be merged.



Ktap or BPF?

Posted Apr 24, 2014 1:27 UTC (Thu) by karim (subscriber, #114)

Loops and concision are gold. But if I had to choose, I'd say that loops are absolutely required. There might be an argument to be made that loops shouldn't be allowed in the C-like BPF syntax but should be allowed in the Lua-based ktap syntax.

Ktap or BPF?

Posted Apr 24, 2014 3:29 UTC (Thu) by jtc (guest, #6246)

"Anyway, I think there will don't have any necessary to upstream ktap any more, I still enjoy the simplicity and flexibility given by ktap, and hope there will have a kernel built-in alternative solution in future. "

Anyone have a translation of that sentence into English?

Ktap or BPF?

Posted Apr 24, 2014 3:34 UTC (Thu) by rahulsundaram (subscriber, #21946)

It's obviously written by a non-native speaker but I had no problem in understanding it.

"Anyway, I don't think I have the will to upstream ktap anymore. I still enjoy the simplicity and flexibility given by ktap, and hope there will have a kernel built-in alternative solution in the future"

Ktap or BPF?

Posted Apr 24, 2014 21:00 UTC (Thu) by jtc (guest, #6246)

"I think there will don't have any necessary to upstream ktap any more," ->
"I don't think I have the will to upstream ktap anymore."

That sounds like a feasible interpretation.

Thanks.

Usually, I don't comment in such situations because it's unreasonable to expect all non-native-English speakers to speak well; but that sentence was so weird I couldn't resist.

Ktap or BPF?

Posted Apr 25, 2014 20:42 UTC (Fri) by giraffedata (subscriber, #1954)

I think "I think there will don't have any necessary to upstream ktap any more" is more likely to mean, "I don't think it is necessary to upstream ktap any more" than to mean, "I don't think I have the will to upstream ktap anymore."

I.e. "will" is just a verb particle indicating the upstreaming is in the future and "necessary" indicates necessity.

Ktap or BPF?

Posted Apr 27, 2014 22:51 UTC (Sun) by HelloWorld (guest, #56129)

It's not just that it's weird; it also makes things even harder to understand for readers whose native language isn't English. I'm glad you asked; I also didn't get what he meant. And actually I think it would have been a good idea for Jon to do this.

Ktap or BPF?

Posted Apr 24, 2014 9:25 UTC (Thu) by sorokin (subscriber, #88478)

I don't quite understand the idea of implementing a Lua virtual machine in the kernel.

Why not create a special interface to userspace for tracing/packet filtering and then implement the virtual machine completely in userspace? In that case userspace could use a full-featured optimizer. I assume that, from a security point of view, it is not a good idea to implement a full-featured optimizer in kernel space.

A quick googling shows that the cost of a context switch is less than 2k cycles. So performing any non-trivial action in Lua in the kernel is probably more expensive than switching to userspace and doing the same operation there.

Ktap or BPF?

Posted Apr 24, 2014 11:33 UTC (Thu) by fuhchee (guest, #40059)

Beyond simple context switching costs, there is also the need to pass contextual data from the probe site to the interpreter. Some of that data may not be known declaratively (ahead of time), so the interpreter would have to run just to know what data will be needed, then the kernel would have to be consulted, then back to the interpreter, etc. etc.

Ktap or BPF?

Posted Apr 24, 2014 17:24 UTC (Thu) by marcH (subscriber, #57642)

> I'd suggest using C syntax instead initially, because that's what the kernel is using. The overwhelming majority of people probing the kernel are programmers, so there's no point in inventing new syntax, we should reuse existing syntax!

Syntax is one thing, features are another.

Is there an existing high-level language with C-like syntax? (Please don't answer Java)

Ktap or BPF?

Posted Apr 24, 2014 19:18 UTC (Thu) by zlynx (subscriber, #2285)

> Is there an existing high-level language with C-like syntax? (Please don't answer Java)

Of course. C++.

Ktap or BPF?

Posted Apr 24, 2014 21:33 UTC (Thu) by marcH (subscriber, #57642)

> Of course. C++.

Well... I meant: *significantly* higher level than C. I don't see C++ features making much difference in this specific context. (counter-)examples?

Ktap or BPF?

Posted Apr 25, 2014 8:13 UTC (Fri) by lkundrak (subscriber, #43452)

awk, I'd say?

Ktap or BPF?

Posted Apr 26, 2014 13:26 UTC (Sat) by nix (subscriber, #2304)

awk... which the D language used by DTrace was explicitly modelled on. (Though it has no loops at all.)

Ktap or BPF?

Posted Apr 25, 2014 8:25 UTC (Fri) by nhippi (subscriber, #34640)

pike, golang?

Ktap or BPF?

Posted Apr 27, 2014 23:29 UTC (Sun) by HelloWorld (guest, #56129)

> Is there an existing high-level language with C-like syntax?
What does high-level mean? What does C-like mean?

C doesn't have most of the features you want in a high-level language, such as lambda expressions, list literals, pattern matching, maybe regexes etc., so a high-level language can't possibly reuse C syntax for those (actually C99 does have array literals, but the syntax is hideous). OTOH, C has features that you don't want in a high-level language, notably pointers and the silly distinction between statements and expressions, the horrid preprocessor and some operators whose usefulness is questionable at best, such as {pre,post}{in,de}crement.

“High-level language with C-like syntax” is thus pretty close to an oxymoron if you ask me.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds