
Re: 2.6.22 regression: thermal trip points

From:  Len Brown <lenb-AT-kernel.org>
To:  trenn-AT-suse.de
Subject:  Re: 2.6.22 regression: thermal trip points
Date:  Fri, 3 Aug 2007 14:59:06 -0400
Message-ID:  <200708031459.07108.lenb@kernel.org>
Cc:  Andi Kleen <andi-AT-firstfloor.org>, Pavel Machek <pavel-AT-ucw.cz>, Alan Cox <alan-AT-lxorguk.ukuu.org.uk>, Andrew Morton <akpm-AT-linux-foundation.org>, Knut Petersen <Knut_Petersen-AT-t-online.de>, linux-kernel-AT-vger.kernel.org, mjg59-AT-srcf.ucam.org

On Friday 03 August 2007 07:16, Thomas Renninger wrote:
> On Thu, 2007-08-02 at 20:38 +0200, Andi Kleen wrote:
> > On Thu, Aug 02, 2007 at 03:57:54PM +0000, Pavel Machek wrote:
> > > On Thu 2007-08-02 15:16:22, Andi Kleen wrote:
> > > > On Thu, Aug 02, 2007 at 02:04:42PM +0100, Alan Cox wrote:
> > > > > > > Set a taint flag, 
> > > > > > That's hardly any useful if the machine is dead afterwards.
> > > > > 
> > > > > It won't be the hardware will do a failsafe shutdown first.
> > > > 
> > > > Not necessarily. At SUSE we had at least one broken laptop
> > > > with wrong trip points. The machine ran very hot for some time
> > > > and afterwards the hard disk was dead.
> > > 
> > > Yes, but it was original BIOS trip points that were wrong. And yes,
> > > its failsafe shutdown was too late. At least lowering the trip points
> > > would allow me to run it safely.
> > 
> > I have no problem with lowering them (in fact I proposed this
> > to Thomas as a possible solution at some point). Just rising 
> > is a bad idea.
> 
> Ok.
> If nobody screams (especially Len who has to accept this in the end, I
> don't want to do work for nothing..), I'll try an implementation that:
>   - Allows lowering trip points
>   - If BIOS modifies trip points, the overridden ones might also
>     get lowered if they are even lower
>   - Allow the definition of a passive trip point (with some default
>     values for hysteresis), even if the thermal zone does not
>     provide one
> 
> If we have something like this, we could still discuss a config option,
> that also allows to increase trip points, marking it with "If you set
> this you can destroy your machine, you have been warned...". While this
> would not be an option for distributions to compile in, some people may
> come around the biggest hammer -> overriding DSDT.
> 
> I cannot promise, but I try to get this for 2.6.24.
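
As a rough illustration of the lower-only rule proposed in the list above, the effective trip point can be treated as the minimum of the BIOS value and the user's request. This is only a sketch: the function name and parameters are hypothetical and do not correspond to any existing kernel interface.

```python
def effective_trip_point(bios_trip_c, requested_c=None):
    """Return the trip point to act on, in degrees Celsius.

    Hypothetical helper: a user override may lower the BIOS trip point
    but never raise it. If the BIOS later reports an even lower value,
    that lower value wins.
    """
    if requested_c is None:
        # No override requested: use whatever the BIOS reports.
        return bios_trip_c
    return min(bios_trip_c, requested_c)
```

Calling it again whenever the BIOS changes its trip points covers the second item in the list: with an override of 80 and a BIOS value that drops from 95 to 70, the result moves from 80 to 70.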

I think if you are enamored with overriding trip points at SuSE,
that you should simply restore the original scheme as the "value add"
for SuSE kernels.  Seriously, I'm totally fine with that.

You should be aware, however, that (one of) the fundamental flaws
with that scheme, shared with what you describe above, is that the OS
cannot actually change the trip points in the thermal sensor.
The sensor is going to trip at the temperature that _it_ thinks
the trip point is at -- not the trip point that you are letting
the user think it is at.  I.e., what is advertised as a trip-point
override actually defeats the entire concept of trip-points,
and it is mandatory that you enable periodic polling of the
current temperature to compare with your new thresholds
to work around that.
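
To make the polling requirement concrete: because the sensor only fires at its own trip point, a software override has to read the temperature periodically and compare it against the overridden thresholds itself. The following is a minimal user-space sketch of that idea, not the kernel's implementation; `read_temp_c`, `on_trip`, the threshold names and the polling interval are all assumptions made for illustration.

```python
import time


def poll_overrides(read_temp_c, thresholds_c, on_trip,
                   interval_s=1.0, max_polls=None, sleep=time.sleep):
    """Poll the temperature and compare it with software thresholds.

    read_temp_c:  hypothetical callable returning the current temperature
    thresholds_c: dict mapping a threshold name to a limit in Celsius
    on_trip:      callback invoked once as on_trip(name, temp) each time
                  a threshold is newly crossed
    max_polls:    stop after this many polls (None means run forever)
    """
    tripped = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        temp = read_temp_c()
        for name, limit in thresholds_c.items():
            if temp >= limit and name not in tripped:
                # The sensor never signals this crossing; only polling sees it.
                tripped.add(name)
                on_trip(name, temp)
            elif temp < limit:
                # Re-arm the threshold once the temperature drops below it.
                tripped.discard(name)
        polls += 1
        sleep(interval_s)
    return tripped
```

The cost is visible in the loop: the check runs every `interval_s` seconds whether or not anything changed, whereas a real trip point needs no polling at all.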

This faking out the user, plus the fact that the BIOS does change
trip-points at run-time, made the original scheme fundamentally
unsound.  Further, I've not yet found a single system where use
of this scheme wasn't papering over some other problem.  For the
upstream kernel, I think it is more appropriate to expose and fix
the fundamental problems.  For distro kernels, I'm less concerned
if you hide bugs instead of fixing them.

We had quite a long discussion when I deleted the trip-point-override
scheme in -mm.  Then it rode through the entire 2.6.22 release cycle.
However, I have yet to see a single bug report filed that has shown
that Linux should be doing this, or something like it.  I'm hopeful
that Knut's or Adrian's will be the first -- but I'm still waiting.

-Len



Re: 2.6.22 regression: thermal trip points

Posted Jun 12, 2008 8:05 UTC (Thu) by Jim_99 (guest, #52487) [Link]

FWIW, those who set their trip points too high deserve to fry their hardware. When notebooks
cost as much as they do, and repairs, downtime and data recovery are what they are, I'll gladly
wait a few more moments for a job/work package to finish and be confident that the notebook I
own will run for years on the OEM build hardware.

The general rule with electronics is that the cooler they run, the longer their useful life.
65-70 °C would be unacceptable heat for a desktop; it's no different for a notebook, where it's
an even bigger atrocity.

Apple had this issue back in 2003 with PowerBooks. They sent out updates, and in OS X Panther,
Tiger and Leopard the fan in my G4 turns on at 47 °C at a lower controlled speed; if that is
inadequate to cool, the fan speed ramps up to higher RPMs as temperatures increase. Linux
should behave this way as well, in my opinion. The other options/solutions are to reset trip
points so that the fan comes on at 100%, or to keep the trip point Ubuntu installs at 65 °C. I
prefer Apple's solution, but since this is Intel/AMD, resetting trip points is my preference.
A 65 °C trip point is an option that makes me cringe to be forced to use on a Dell L400 P3
733 MHz notebook.
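
The ramping behaviour described above can be sketched as a simple linear fan curve. Only the 47 °C turn-on temperature comes from the comment; the full-speed temperature of 65 °C, the 30% starting duty and the function name are assumptions chosen for illustration.

```python
def fan_duty_percent(temp_c, on_c=47.0, full_c=65.0, min_duty=30.0):
    """Map a temperature in Celsius to a fan duty cycle in percent.

    Below on_c the fan is off. At on_c it starts at min_duty and ramps
    linearly to 100% at full_c. full_c and min_duty are assumed values.
    """
    if temp_c < on_c:
        return 0.0
    if temp_c >= full_c:
        return 100.0
    fraction = (temp_c - on_c) / (full_c - on_c)
    return min_duty + fraction * (100.0 - min_duty)
```

With these assumed defaults, the fan starts at 30% at 47 °C and reaches 65% at 56 °C, halfway up the ramp.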

BTW, about that story of the HDD dying: I had that happen when I first installed Ubuntu, around
version 6.10. I researched trip point resetting and was pleased with the results through 7.04,
so much so that I stayed at 7.04 until earlier this week, when I updated to 7.10 and now 8.04.
I was unaware this safeguard had been implemented; had I known, the decision to simply not
upgrade the OS would have been very easy for me. I may go back to 7.04, or maybe upgrade only
as far as 7.10? As I did these updates back to back, I really didn't get to know 7.10 and how
it behaved on the Dell L400.


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds