Old 08-26-12, 05:09 AM   #73
cookiecaper
Registered User
 
Join Date: May 2007
Posts: 28
Default Re: Random crashes, NVRM Xid messages

I believe I am experiencing the same issue here with 304.37. I do not have Bumblebee, am using a desktop GTX 285 on Linux 3.4.9. I have experienced this both with WINE games (TF2, Orcs Must Die! 2, Homefront) and native Sauerbraten occasionally over the last 3-4 months. Interestingly, I seem to have no problem playing Skyrim for hours on end.

Problems typically seem to occur for me only after about 90 minutes of gameplay. There will be occasional stutters in responsiveness that span out over a relatively long period of time (20-40 minutes) before X becomes totally unresponsive and/or only responsive to mouse input. Usually external audio output from VLC or other sources continues without trouble, but the audio of the 3D process that caused the lock stutters within the same frame indefinitely.

I am usually able to SSH in to the system when this happens, but cannot kill the problematic process. I was able to use SysRq K to kill X. I typically reboot via SSH when this happens.
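A process that ignores kill while the driver is wedged is often stuck in uninterruptible sleep (state D) inside a kernel call. That is only an assumption here, since the post does not report the process state. A minimal Python sketch for checking it over SSH by reading /proc/<pid>/stat; the helper names are hypothetical:

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so cut at the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(stat_line):
    """True if the process is in state D (uninterruptible sleep)."""
    return parse_state(stat_line) == "D"


def read_stat(pid):
    """Read the raw stat line for a PID (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as fh:
        return fh.read()
```

Over SSH, is_uninterruptible(read_stat(pid)) with the PID of the hung game would show whether it is in state D. In that state SIGKILL is not acted on until the kernel call returns, which would be consistent with kill having no effect.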

I have attached a bug report log from my most recent incident, which occurred while playing Orcs Must Die! 2 in WINE 1.5.11.
Attached Files
File Type: gz nvidia-bug-report.log.gz (47.8 KB, 62 views)
Old 08-27-12, 06:49 PM   #74
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

Still the same crashes with 304.43.
Old 08-28-12, 07:02 PM   #75
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

Quote:
Originally Posted by cookiecaper
I believe I am experiencing the same issue here with 304.37. I do not have Bumblebee, am using a desktop GTX 285 on Linux 3.4.9. I have experienced this both with WINE games (TF2, Orcs Must Die! 2, Homefront) and native Sauerbraten occasionally over the last 3-4 months. Interestingly, I seem to have no problem playing Skyrim for hours on end.

Problems typically seem to occur for me only after about 90 minutes of gameplay. There will be occasional stutters in responsiveness that span out over a relatively long period of time (20-40 minutes) before X becomes totally unresponsive and/or only responsive to mouse input. Usually external audio output from VLC or other sources continues without trouble, but the audio of the 3D process that caused the lock stutters within the same frame indefinitely.

I am usually able to SSH in to the system when this happens, but cannot kill the problematic process. I was able to use SysRq K to kill X. I typically reboot via SSH when this happens.

I have attached a bug report log from my most recent incident, which occurred while playing Orcs Must Die! 2 in WINE 1.5.11.
That "Attempted to yield the CPU while in atomic or interrupt context" message sure looks familiar. Unfortunately nobody at nvidia will tell us what the Xid errors mean (just that they are for debugging) or how to provide more information to help them track it down and fix it.

I have found that some games are more likely to trigger the bug, in particular crysis2 rarely lasts more than a minute on my setup. For me, with the latest 304 series drivers (including 304.43), games that used to not experience the dreaded nvidia crash are far more likely to experience it now. But alien arena (compiled natively, so not using wine) doesn't crash at all. Lots of people report that nvidia crashes X when they aren't even playing games (luckily for me my main GPU is an Intel, which doesn't crash).

Perhaps there is a particular 3d code path in the nvidia code that is prone to crashing and not all games use it (eg a particular opengl call that alien arena doesn't use because it is only using calls compatible with an older version of opengl so as to give the same experience on intel as nvidia or ati). Or perhaps some games are more 3d-intensive and the nvidia driver simply isn't well-written enough to handle the load.
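To compare which games trigger which errors, it may help to tally the Xid codes in the kernel log. A rough Python sketch follows; it assumes the messages look like "NVRM: Xid (0000:01:00): 13, ..." (that line shape is an assumption, not copied from the attached log), and it only counts codes without interpreting them:

```python
import re
from collections import Counter

# Assumed line shape: "NVRM: Xid (0000:01:00): 13, 0001 00000000 ..."
# The optional "PCI:" prefix covers a variant seen in some driver versions.
XID_RE = re.compile(r"NVRM: Xid \((?:PCI:)?([0-9A-Fa-f:.]+)\): (\d+),")


def xid_counts(log_text):
    """Count occurrences of each (PCI address, Xid code) pair in log text."""
    counts = Counter()
    for match in XID_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the output of dmesg (or the kernel log section of nvidia-bug-report.log) after each crash gives a per-game tally that can be compared across driver versions.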
Old 08-31-12, 08:11 PM   #76
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

If you're using bumblebee, https://github.com/Bumblebee-Project...bee/issues/241 has details of a bumblebee VGL transport plugin you can use that might improve performance.

Unfortunately it makes no difference to the nvidia 304.43 driver, which still crashes and locks up X with the usual Xid errors.

And nvidia actually are making some attempt to support prime (http://www.phoronix.com/scan.php?pag...tem&px=MTE3MzY). If they can get that working, all they have to do is fix their long-standing driver crashing bug and we might have functioning nvidia cards again...
Old 09-03-12, 04:25 AM   #77
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

I've also tried:
  • forcing the nvidia interrupt SMP affinity, the wine process, and X to use the same CPU core;
  • renicing the optirun and wine processes to -20;
  • exporting __GL_NO_DSO_FINALIZER=1, __GL_YIELD="USLEEP" or __GL_YIELD="NOTHING", and __GL_SINGLE_THREADED="1";
  • turning off the GLShaderDiskCache;
  • setting DamageEvents and ConnectToAcpid to false;
  • running in a completely new user login

... and none of them make any difference. The abysmal nvidia 304.43 driver consistently crashes - usually in less than a minute of gameplay although the exact point is random - on my current setup (ubuntu 12.10 (xserver 1.12.99.905, kernel 3.5)).
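For anyone wanting to repeat these experiments consistently, the environment and affinity settings can be scripted. The Python sketch below is only an illustration: the function names are hypothetical, the variable names are the ones mentioned in this thread, and the mask helper assumes a machine with few enough CPUs that /proc/irq/<n>/smp_affinity takes a single hex word.

```python
def gl_env(base_env, yield_mode="USLEEP", shader_cache=False):
    """Return a copy of base_env with the NVIDIA OpenGL variables set."""
    env = dict(base_env)  # copy, so the caller's environment is untouched
    env["__GL_YIELD"] = yield_mode  # "USLEEP" or "NOTHING"
    env["__GL_NO_DSO_FINALIZER"] = "1"
    env["__GL_SHADER_DISK_CACHE"] = "1" if shader_cache else "0"
    return env


def irq_affinity_mask(cpus):
    """Build the hex CPU bitmask written to /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N selects CPU N
    return format(mask, "x")
```

A launcher could then pass gl_env(os.environ) as the env argument of subprocess.run and call os.sched_setaffinity(0, {0}) beforehand to pin itself to core 0. Writing the mask into /proc/irq/<n>/smp_affinity requires root and the IRQ number of the nvidia card from /proc/interrupts.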

I emailed linux-bugs@nvidia.com over a week ago but haven't even received the courtesy of an acknowledgement.
Old 09-10-12, 06:16 AM   #78
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

I tested running nvidia in its own X server directly driving the HDMI output connected to the nvidia card, in order to rule out any potential problems there might be in using virtualgl to transport the graphics output back to the intel GPU (details at https://github.com/Bumblebee-Project...bee/issues/237).

However, with this configuration the nvidia driver again crashed with an Xid 13 error and shortly afterwards the laptop hung. So it's looking very likely that it's an nvidia bug causing this and not anything to do with bumblebee.

I also tried running the test application on an nvidia 8600M card, and it didn't crash during my test (this is not conclusive, but since the 540M crashes so readily and quickly it seems a reasonable assumption that the 8600M card doesn't have the same issue with the 304.43 driver). So in case it was one of the new features in the nvidia 540 card, I modified wine to return OpenGL and GLX info matching the 8600M card on the machine with the 540M card. But even with this configuration, the nvidia driver still crashed as usual on the 540M card.
Old 09-11-12, 07:21 AM   #79
Iesos
Registered User
 
Join Date: Apr 2012
Posts: 15
Default Re: Random crashes, NVRM Xid messages

I searched around for my model, the XPS 15 v2 (L502x), and apparently a lot of these shipped with a faulty nvidia card. I contacted Dell, and after I ran temperature tests and described the problem they changed the motherboard (and hence the graphics card). I have now been playing for a day without a single crash, so it seems to work so far.

I think it is terrible that we have received no information about these Xid messages from nvidia. The only interpretation I can make of this is that the messages likely indicate hardware problems.

If the warranty still holds, contact your manufacturer.
Old 09-11-12, 09:37 AM   #80
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

Well, if mine were still in warranty I wouldn't mind trying out a new motherboard. It wouldn't surprise me if the card were faulty since my previous laptop with a GT8600M nvidia card had to have a replacement video card - nvidia infamously screwed up an entire batch of the 8400/8600 series cards (http://apcmag.com/nvidia_disaster_th...pus_faulty.htm).

However, since the 540M doesn't crash under windoze I still strongly suspect that there is an issue with the driver rather than the card.

Either way, and considering nvidia's complete lack of support for their product, it sure seems like a damned good reason to avoid them in the future...

Old 09-18-12, 12:16 AM   #81
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

Here's an odd observation: if I run the CoDMW3 'Hit and Run' mission from a btrfs partition (with kernel 3.5.3 or 3.6-rc6, Xserver 1.13), the nvidia card crashes *every* time within ten seconds, often in less than 5 seconds. (The temperature of the card is usually under 55C at this point, so there's definitely no overheating issue.) But if I run it from an ext4 or ntfs partition, it runs for longer, sometimes as long as the entire mission - although it is still quite likely to fail at some random point within a few minutes.

Since the nvidia driver apparently only uses the drive for GL shader caching, and this by default goes in ~/.nv (which is always on an ext4 drive on my PC), I wonder if the nvidia driver is not handling interrupts correctly and btrfs is more interrupt intensive than ext4, increasing the chances of nvidia crashing? I can't see that GL caching would be relevant, and indeed if I disable it by exporting __GL_SHADER_DISK_CACHE=0, nvidia still crashes.
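One way to probe the interrupt theory would be to snapshot /proc/interrupts before and after a run from each filesystem and compare how many interrupts the nvidia line accumulated. The Python sketch below assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ); the function names are hypothetical, and a difference in counts would be suggestive at best, not proof.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # some rows (e.g. ERR) have fewer count columns
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def irq_delta(before_text, after_text, needle="nvidia"):
    """Interrupts gained between two snapshots for rows mentioning needle."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: total - before.get(irq, (0, ""))[0]
        for irq, (total, desc) in after.items()
        if needle in desc
    }
```

Passing needle="ahci" (or whatever the disk controller is called on the machine in question) gives the storage side of the comparison.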
Old 09-23-12, 03:46 AM   #82
bactrimel
Registered User
 
Join Date: Apr 2003
Posts: 20
Default Re: Random crashes, NVRM Xid messages

OK, obviously such symptoms could be the result of different underlying problems, but...

in my case, what *completely* cured the problem of random freezes and Xid messages (after weeks of painful experimentation) was removing all kernel modules that have to do with thermal sensors. I have also disabled all the relevant plugins from gkrellm, in order to prevent such modules from being loaded automatically.

The system is rock solid now, running for 2+ days straight under KDE+composite without a single glitch.

I suspect that some such program or kernel module is periodically polling/generating interrupts under nvidia driver's nose, messing up the interface. It would be nice if someone could debug this to the end, though.
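To see which sensor-related modules are loaded before removing them, /proc/modules can be filtered by name. The post does not say which modules were removed, so the name hints in this Python sketch are guesses at typical hwmon/thermal module names and should be adjusted per system:

```python
# Hypothetical substrings; e.g. "temp" matches coretemp and k10temp.
DEFAULT_HINTS = ("temp", "thermal", "hwmon", "it87", "w83")


def sensor_modules(proc_modules_text, hints=DEFAULT_HINTS):
    """Return names of loaded modules whose name contains any hint."""
    found = []
    for line in proc_modules_text.splitlines():
        if not line.strip():
            continue
        name = line.split()[0]  # first column of /proc/modules is the name
        if any(hint in name for hint in hints):
            found.append(name)
    return found
```

Called with the contents of /proc/modules, this returns the candidates to unload with rmmod or modprobe -r, after first stopping anything (such as gkrellm plugins) that would load them again.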

Hope this helps,

Bactrimel
__________________
CentOS 6 + KDE 4
GeForce GTS 450
Old 09-24-12, 03:46 AM   #83
19721201
Registered User
 
Join Date: Sep 2005
Posts: 16
Default Re: Random crashes, NVRM Xid messages

Quote:
Originally Posted by bactrimel
OK, obviously such symptoms could be the result of different underlying problems, but...

in my case, what *completely* cured the problem of random freezes and Xid messages (after weeks of painful experimentation) was removing all kernel modules that have to do with thermal sensors. I have also disabled all the relevant plugins from gkrellm, in order to prevent such modules from being loaded automatically.

The system is rock solid now, running for 2+ days straight under KDE+composite without a single glitch.

I suspect that some such program or kernel module is periodically polling/generating interrupts under nvidia driver's nose, messing up the interface. It would be nice if someone could debug this to the end, though.

Hope this helps,

Bactrimel
Nice find!

On my system I have two sensor modules loaded, and then there's the nvidia built-in thermal and frequency monitor.

It looks like disabling the Thermal Monitor in the nvidia-settings configuration is enough! (GeForce GT240 with 304.43 and linux 3.2.0-31-generic)

I need to do further testing, though.
Old 09-25-12, 07:47 PM   #84
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: Random crashes, NVRM Xid messages

I tried booting with thermal.off=1, which disables ACPI thermal control (i.e., fan control is left entirely in the hands of the BIOS), but it didn't make any difference with 304.48 - nvidia still crashed in most games pretty quickly.
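When testing boot options like this, it is worth confirming that the parameter actually reached the kernel. A small Python sketch that parses the contents of /proc/cmdline; the function name is hypothetical, and quoted values containing spaces are not handled:

```python
def cmdline_params(cmdline):
    """Split a kernel command line into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as "quiet" have no "=", so store None for them.
        params[key] = value if sep else None
    return params
```

Reading /proc/cmdline and checking that cmdline_params(text).get("thermal.off") equals "1" verifies the option was in effect for that boot.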

However, 304.51 is working much better, touch wood. I haven't had a crash in the last 24 hours launching CoDMW2 and CoDMW3 from an ext4 partition.

Note though that when I launched CoDMW3 from a btrfs partition, 'Hit and Run' still crashed within 20 seconds.
All times are GMT -5.