[Novalug] Nvidia shutoff

Bryan J. Smith b.j.smith at ieee.org
Sun Mar 7 15:55:55 EST 2010


No, it's very likely _related_!

nVidia uses a _unified_, binary kernel memory-GPU object plus
user-space objects for all OSes, with only select interface
layers differing from one OS to the next.  So if it affects
one OS, it's very likely to affect _all_ OSes.  It's the main
way nVidia has sidestepped IP and porting issues for the
last decade.
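
To illustrate why a shared core spreads a bug everywhere, here is a
toy Python sketch.  It is not nVidia's actual design: `SharedCore`,
`OSShim`, `LinuxShim`, `WindowsShim` and the duty-cycle mapping are
all hypothetical.  Every per-OS shim delegates to the same core, so a
defect planted in the core shows up under every OS at once.

```python
class SharedCore:
    """Hypothetical OS-independent core, shared by every OS shim."""

    def __init__(self, fan_bug: bool = False) -> None:
        self.fan_bug = fan_bug

    def fan_duty(self, temp_c: float) -> int:
        """Return a fan duty cycle (percent) for a GPU temperature."""
        if self.fan_bug:
            return 0  # a defect here is inherited by every OS shim
        # Arbitrary illustrative mapping, clamped to 30..100 percent.
        return min(100, max(30, int(temp_c)))


class OSShim:
    """Thin per-OS wrapper (hypothetical): only the OS-facing part differs."""

    os_name = "generic"

    def __init__(self, core: SharedCore) -> None:
        self.core = core

    def set_fan(self, temp_c: float) -> str:
        return f"{self.os_name}: fan at {self.core.fan_duty(temp_c)}%"


class LinuxShim(OSShim):
    os_name = "Linux"


class WindowsShim(OSShim):
    os_name = "Windows"


if __name__ == "__main__":
    buggy = SharedCore(fan_bug=True)
    for shim in (LinuxShim(buggy), WindowsShim(buggy)):
        print(shim.set_fan(85.0))
```

Both shims report the same broken fan setting, because neither one
owns the logic that went wrong.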

CPUs and GPUs from the last 5+ years automatically scale
down performance based on thermal/environmental conditions.
Since these are nominal operations -- especially in GPUs -- they
are not logged (unless debugging is turned up).  In this
case, the temperature is quickly going through the roof, so the
GPU starts shutting down units to the point of a hang.
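
The step-down behavior can be sketched as a toy model in Python.  The
thresholds in `THRESHOLDS_C` and the four "unit groups" are invented
illustrative values, not any real GPU's.  The point is only that each
step is a routine, silent decision, and that if the temperature keeps
climbing, the steps pile up until nothing is left running.

```python
# Hypothetical temperatures (deg C) at which one more group of GPU
# execution units is powered down; real parts use their own values.
THRESHOLDS_C = (85, 95, 105, 115)
TOTAL_UNITS = 4


def active_units(temp_c: float) -> int:
    """Number of unit groups still running at a given temperature."""
    disabled = sum(1 for limit in THRESHOLDS_C if temp_c >= limit)
    return TOTAL_UNITS - disabled


def is_hung(temp_c: float) -> bool:
    """With every unit group powered down, no work completes: a 'hang'."""
    return active_units(temp_c) == 0


if __name__ == "__main__":
    for t in (70, 90, 100, 110, 120):
        print(t, active_units(t), is_hung(t))
```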

Sounds like a faulty set of conditionals in the nVidia control-
system logic, one that is wrapped up in those unified objects
(and doesn't need to be).  If there were ever a reason for
nVidia to consider stripping the binary object down to only the
essential IP/ABI portions, and open-sourcing everything else,
this is one.
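
To make "a faulty set of conditionals" concrete, here is a purely
hypothetical Python illustration.  It is not code from any nVidia
driver and not a claim about what actually went wrong in 196.75.  A
fan controller with an inverted comparison is run through a crude
thermal model; `TARGET_C`, the duty limits and the heating/cooling
constants are all invented.

```python
FAN_MIN, FAN_MAX = 30, 100  # hypothetical duty-cycle limits, percent
TARGET_C = 80               # hypothetical target temperature, deg C
HEAT_PER_STEP = 3           # hypothetical heating per step under load


def fan_duty_buggy(temp_c: int) -> int:
    # Inverted comparison: the fan ramps up only when the GPU is *cool*.
    if temp_c < TARGET_C:
        return FAN_MAX
    return FAN_MIN


def fan_duty_fixed(temp_c: int) -> int:
    # Ramp the fan up once the GPU reaches or exceeds the target.
    if temp_c >= TARGET_C:
        return FAN_MAX
    return FAN_MIN


def simulate(duty_fn, start_c: int = 85, steps: int = 20) -> list[int]:
    """Toy model: fixed heating per step, cooling scales with fan duty."""
    temps = [start_c]
    for _ in range(steps):
        duty = duty_fn(temps[-1])
        temps.append(temps[-1] + HEAT_PER_STEP - duty // 20)
    return temps


if __name__ == "__main__":
    print("buggy peak:", max(simulate(fan_duty_buggy)))
    print("fixed peak:", max(simulate(fan_duty_fixed)))
```

With the inverted test the fan sits at its minimum exactly when it is
needed, so the temperature climbs every step; the fixed version holds
it in a narrow band around the target.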

Then again, I've seen open source drivers dork up things like this
before as well -- everything from DACs and scan ranges to control
of the fan and output.  Years of dealing with Intel have taught
me that it's no less likely to happen with open source -- although
it does show how the unified object in nVidia's world can
mess up _all_ OSes in short order.  ;)

-- Bryan

P.S.  At least in Linux, you can bring the OS up without the
nVidia driver or X at all.  In Windows, even safe mode may not
get Windows up without the driver.  This leaves people in a
circular state they cannot get out of.  Oh boy, this was a major
oversight!



----- Original Message ----
From: Bonnie Dalzell <bdalzell at qis.net>

On Sun, 7 Mar 2010, John Holland wrote:
> I actually found some threads online about this problem occurring in Windows as well - it seems to be related to the NVIDIA drivers. My son recommended using an old driver but that wouldn't compile. Kernel too new for it maybe. I hate to leave something unresolved but I think I will just try and forget about it for a while. If they release new drivers maybe I'll try them.

Just saw this article on ZDNet:
WARNING! NVIDIA 196.75 drivers can kill your graphics card
http://blogs.zdnet.com/hardware/?p=7551&tag=content;col1
Would this only concern Windows users, or could it hit us?
