
Ask a kernel developer

July 7, 2010

This article was contributed by Greg Kroah-Hartman.

One in a series of columns in which questions are asked of a kernel developer and he tries to answer them. If you have unanswered questions about technical or procedural aspects of Linux kernel development, ask them in the comment section, or email them directly to the author.

How do I figure out whom to email about a problem I am having with the kernel? I see a range of messages in the kernel log; how do I get from those to the developers who can help me out?

For example, right now I am seeing the following error:

    [  658.831697] [drm:edid_is_valid] *ERROR* Raw EDID:
    [  658.831702] 48 48 48 48 50 50 50 50 20 20 20 20 4c 4c 4c 4c HHHHPPPP    LLLL
    [  658.831705] 50 50 50 50 33 33 33 33 30 30 30 30 36 36 36 36 PPPP333300006666
    [  658.831709] 35 35 35 35 0a 0a 0a 0a 20 20 20 20 20 20 20 20 5555....

Where do I start with tracking this down?

The kernel log tells you where the problem is, so the real trick is tracking the message down to whoever is responsible for the code that prints it. There are different ways to go about this. First, you could simply grep the kernel source tree for the error string:

    $ cd linux-2.6
    $ git grep "\*ERROR\* Raw EDID"
	
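OK, that found nothing: the "[drm:edid_is_valid] *ERROR*" prefix is not part of the string in the source, but is added by the DRM_ERROR() macro when the message is printed. Narrow the string down to just the message text:
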
    $ git grep "Raw EDID"
    drivers/gpu/drm/drm_edid.c:		DRM_ERROR("Raw EDID:\n");
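
The trimming done by hand above can also be scripted. The following is a minimal POSIX shell sketch, not an existing kernel tool: log_msg_text is a hypothetical helper, and it assumes the "[timestamp] [subsystem:function] *ERROR* message" layout seen in this log. It strips the parts added at print time and leaves the text worth searching for:

```shell
# Hypothetical helper: reduce a kernel log line to the literal message text
# that is likely to appear in the source. Assumes the layout
# "[timestamp] [subsystem:function] *ERROR* message" seen in DRM errors;
# each expression is a no-op if its part is missing.
log_msg_text() {
    sed -e 's/^\[ *[0-9.]*] //' \
        -e 's/^\[[^]]*] //' \
        -e 's/^\*ERROR\* //'
}

# Example: prints "Raw EDID:"
echo '[  658.831697] [drm:edid_is_valid] *ERROR* Raw EDID:' | log_msg_text
```

The result can then be handed to git grep -F, which treats the pattern as a fixed string rather than a regular expression, e.g. git grep -F "Raw EDID:". This only helps for messages without printf-style arguments; for a message like "EDID checksum is invalid, remainder is 3", search for the constant part of the text instead.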
	
OK, now you have a file to look at. But who is responsible for it? As mentioned previously, you can use the get_maintainer.pl script, passing it the name of the file you are curious about:

    $ ./scripts/get_maintainer.pl -f drivers/gpu/drm/drm_edid.c
    David Airlie <airlied@linux.ie>
    Dave Airlie <airlied@redhat.com>
    Adam Jackson <ajax@redhat.com>
    Zhao Yakui <yakui.zhao@intel.com>
    dri-devel@lists.freedesktop.org
    linux-kernel@vger.kernel.org
	
This shows that the main DRM developer, David Airlie (listed under two addresses), is the best person to ask, but the other developers listed may also be able to help. Sending mail to David and CCing the others listed, including the mailing lists, will get it in front of the people most likely to be able to assist.
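
If you want the headers ready to paste into a mail client, a small filter can do it. This is a minimal sketch under stated assumptions: maint_headers is a hypothetical helper (not part of the kernel's scripts), and it assumes get_maintainer.pl prints one address per line with the person to address first, as in the output above. The script's output format has varied between kernel versions, so check it before relying on this.

```shell
# Hypothetical helper: read get_maintainer.pl-style output (one address per
# line, primary contact first) and print a To: line plus a single Cc: line.
maint_headers() {
    awk 'NF == 0 { next }                       # skip blank lines
         to == "" { to = $0; next }             # first entry goes in To:
         { cc = (cc == "" ? $0 : cc ", " $0) }  # the rest are joined for Cc:
         END {
             print "To: " to
             if (cc != "") print "Cc: " cc
         }'
}
```

Run as ./scripts/get_maintainer.pl -f drivers/gpu/drm/drm_edid.c | maint_headers, it would put David Airlie's first address in To: and everything else, including his second address and the mailing lists, in one comma-separated Cc: line.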

Another way to find out what code is responsible for the problem is to look at the name of the function that was writing out the error:

    [  658.831697] [drm:edid_is_valid] *ERROR* Raw EDID:

The function name is inside the [] characters, after the "drm:" prefix, and you can grep for it to find where it is defined and which code calls it:

    $ git grep edid_is_valid
    drivers/gpu/drm/drm_edid.c: * drm_edid_is_valid - sanity check EDID data
    drivers/gpu/drm/drm_edid.c:bool drm_edid_is_valid(struct edid *edid)
    drivers/gpu/drm/drm_edid.c:EXPORT_SYMBOL(drm_edid_is_valid);
    drivers/gpu/drm/drm_edid.c:	if (!drm_edid_is_valid(edid)) {
    drivers/gpu/drm/radeon/radeon_combios.c:	if (!drm_edid_is_valid(edid)) {
    include/drm/drm_crtc.h:extern bool drm_edid_is_valid(struct edid *edid);
	
This points again at the drivers/gpu/drm/drm_edid.c file as being responsible for the error message.
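
Extracting the name from the tag is easy to automate as well. A minimal POSIX shell sketch, assuming the "[subsystem:function]" tag format shown in this log, with or without a leading timestamp (log_func_name is a hypothetical helper name):

```shell
# Hypothetical helper: print the function name from a log line tagged
# "[subsystem:function]", e.g. "[drm:edid_is_valid]". An optional leading
# "[ timestamp] " is skipped; lines without such a tag produce no output.
log_func_name() {
    sed -n 's/^\(\[ *[0-9.]*] \)\{0,1\}\[[^]:]*:\([^]]*\)].*/\2/p'
}
```

Piping the error line above through it prints edid_is_valid, which can be passed straight to git grep. Because git grep matches substrings, that search also finds drm_edid_is_valid; adding -w would restrict it to whole words and miss that match.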

Looking at drm_edid_is_valid, you can see a number of other messages that could have been printed to the kernel log just before this one:

    if (csum) {
	    DRM_ERROR("EDID checksum is invalid, remainder is %d\n", csum);
	    goto bad;
    }

    if (edid->version != 1) {
	    DRM_ERROR("EDID has major version %d, instead of 1\n", edid->version);
	    goto bad;
    }
	
So when you email the developers and mailing list found by the get_maintainer.pl script, it is always important to provide all of the kernel log, not just the few single last lines of the error, because there might be more information a bit higher up that shows more information that the developers can use to help debug the problem.
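
As for the first of those checks: the last byte of a 128-byte EDID block is chosen so that all 128 bytes add up to 0 modulo 256, and drm_edid_is_valid reports a nonzero sum as the "remainder". The POSIX shell sketch below (edid_csum is a hypothetical name, not a kernel or distribution tool) computes the same value from hex bytes copied out of a "Raw EDID" dump:

```shell
# Hypothetical helper: sum hex bytes given as arguments (e.g. 48 50 20 ...)
# modulo 256. For a complete, intact 128-byte EDID block the result is 0;
# a nonzero value corresponds to the "remainder" in the kernel's message.
# The function body runs in a subshell so "sum" does not leak out.
edid_csum() (
    sum=0
    for byte in "$@"; do
        sum=$(( ($sum + 0x$byte) % 256 ))
    done
    echo "$sum"
)
```

Pass it only the hex columns of the dump, without the timestamps or the ASCII text on the right.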

[ Thanks to Peter Favrholdt for sending in this question. ]



Um, what's the problem?

Posted Jul 8, 2010 7:30 UTC (Thu) by dwmw2 (subscriber, #2063)

There's one really important thing missing from this — WHAT IS THE ACTUAL PROBLEM?

Is the kernel misbehaving? Do you get a correct picture? Does your screen run at the wrong resolution? Or are you just trawling through the logs looking for something that offends your sensibilities?

I've often had this kind of report from users, and the 'solution' has been to remove the nasty scary printk. There's not really been anything wrong at all.

Um, what's the problem?

Posted Jul 8, 2010 8:15 UTC (Thu) by Los__D (guest, #15263)

If the kernel bitches for no reason, the kernel has a bug.

Um, what's the problem?

Posted Jul 8, 2010 9:12 UTC (Thu) by pfavr (subscriber, #38205)

The problem that led me to ask this question is that I'm using a Lenovo X201 laptop (with an UltraBase X200 docking station) to drive my HP LP3065 30" monitor (2560x1600). It sort of works, but occasionally (approx. 20 times a day) the screen goes black for half a second and then comes back on again.

I've tried replacing the monitor (HP sent me a replacement) as well as the cables and the HP DisplayPort-to-dual-link-DVI adapter used in my setup.

The X201 has a Core i5 CPU with integrated Intel HD graphics (which also drives the DisplayPort output on the docking station).

I use Debian GNU/Linux "sid" 64-bit and have had the best results using the stock Debian kernel.

As can be seen from the dmesg output there is a problem with parsing the EDID info coming from the monitor. Looking a bit closer it seems all bytes are repeated 4 times: HHHHPPPP and so on.

So I thought I might be able to help solve this problem, but where to start? I did know that grepping the kernel sources would be a start, but I think the article is really good because now I feel more "safe" posting an email to the relevant people and mailing list :-)
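
The repetition described above is easy to check by keeping only every fourth byte of the dump. A minimal POSIX shell sketch (every_nth is a hypothetical helper name):

```shell
# Hypothetical helper: print every Nth argument, starting with the first.
# Usage: every_nth N byte1 byte2 ...   (N must be a positive integer)
# The function body runs in a subshell so its variables do not leak out.
every_nth() (
    n=$1
    shift
    i=0
    out=
    for byte in "$@"; do
        if [ $((i % n)) -eq 0 ]; then
            out="${out:+$out }$byte"   # append with a single separating space
        fi
        i=$((i + 1))
    done
    echo "$out"
)
```

Applied with N=4 to the 48 hex bytes from the three dump lines in the article, it returns 48 50 20 4c 50 33 30 36 35 0a 20 20, which in ASCII reads "HP LP3065" followed by a newline and two spaces: the monitor's model name, with every byte quadrupled.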

Um, what's the problem?

Posted Jul 10, 2010 12:58 UTC (Sat) by alankila (subscriber, #47141)

This is interesting. I have a 30" monitor at 2560x1600, and it also goes black for about half a second once after a cold boot. It takes a variable amount of time to happen, usually on the order of 10 minutes, so it interrupts my work, but after it happens once, it works for the rest of the session.

I've never seen any clues to this strange behavior. I've not considered replacing the monitor, because this one works otherwise and I already played the warranty dance for a few months thanks to severe quality problems with a Samsung SyncMaster 305T+.

Cut here.

Posted Jul 8, 2010 8:28 UTC (Thu) by dwmw2 (subscriber, #2063)

"...it is always important to provide all of the kernel log, not just the few single last lines of the error, because there might be more information a bit higher up that shows more information that the developers can use..."
Note that this also applies to text which may appear above the idiotic -----[ cut here ]----- line you sometimes see in kernel logs. There is often useful information above that line, so don't cut there.

Cut here.

Posted Jul 9, 2010 11:00 UTC (Fri) by mb (subscriber, #50428)

So I'm wondering: why is it not being removed?
Lots of people don't like that idiotic line. Its only purpose is to scroll other important lines off-screen.

Finding who to send it to

Posted Jul 8, 2010 8:50 UTC (Thu) by epa (subscriber, #39769)

So you can grep through the kernel for the error string and find who to contact - but you'll end up cc'ing the main linux-kernel list anyway. Couldn't some of this be automated - have a single contact address from where reports will be forwarded to the right person, by looking at function names mentioned in the report?

Finding who to send it to

Posted Jul 8, 2010 9:22 UTC (Thu) by avik (guest, #704)

It is automated, it's called akpm (for automated kernel patch/problem manager).

Ask a kernel developer

Posted Jul 8, 2010 15:56 UTC (Thu) by iabervon (subscriber, #722)

In cases where it's unclear what the message is trying to say, it's often worth finding the line that produced the message and then using "git blame" to find the commit that introduced it. Look at the commit to make sure it isn't making an irrelevant change to the line (if so, find the commit that introduced the version that got tweaked). In this case, the commit is by Dave Airlie, but also mentions two co-authors and two contributors. It's probably best to start with the maintainer, but the author of the code in question may be more able to explain it. (For that matter, the commit message may explain it well enough, so it's worth reading that.)

Ask a kernel developer

Posted Jul 8, 2010 16:10 UTC (Thu) by nye (guest, #51576)

>Look at the commit to make sure it isn't making an irrelevant change to the line

And in that vein, I'm surprised by how many people aren't aware of 'git blame -w':
"Ignore whitespace when comparing the parent's version and the child's to find where the lines came from".

Ask a kernel developer

Posted Jul 17, 2010 0:04 UTC (Sat) by shapr (guest, #9077)

How do I submit questions to "Ask a Kernel Developer" ?

Ask a kernel developer

Posted Jul 17, 2010 16:49 UTC (Sat) by patrick_g (subscriber, #44470)

Send a mail to Greg KH.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds