01-14-11, 01:34 PM   #19
jpi110
Gentoo User
 
Join Date: Jan 2011
Location: Portland, Oregon
Posts: 14
Re: NVRM: os_schedule: Attempted to yield the CPU while in atomic[...] [260.19.21 x86

Quote:
Originally Posted by voegel
I think the problem came up with the update to the latest xorg 1.9. On ubuntu 10.04 everything was fine with driver 256.xx. If I revert to this version under ubuntu 10.10/xorg 1.9 that produces the same results.

I have also eliminated a HW related issue.

Sebastian
I tested 2 GTX 275s vs 2 GTX 295s. No change or improvement in stability. This rules out possibly bad video cards.

BIOS features are at identical settings for both Linux and Windows. Hardware-wise, all combinations seem to work fine in Windows. That definitely rules out hardware.

On my Gentoo system, I'm using xorg-server 1.9.2 (URL: http://packages.gentoo.org/package/x11-base/xorg-server, platform: amd64). It's under the stable amd64 keyword, meaning I'm not using what Gentoo considers unstable. Versions 1.9.2.902 and 1.9.3.901 are masked by the ~amd64 keyword; I haven't tried either of those yet.

I also don't use crazy or unacceptable optimization flags. (-O2 -march=native -mtune=native -pipe). That rules out possible unstable or aggressive optimization flags. I've tried nvidia-drivers with and without the custom-optimization keyword. No improvement there either.

However, I am inclined to think the issue is within the nvidia drivers (for Linux) with certain CPU/GPU combinations.

Now, moving to the next sequence of tests: no kernel version supported by the drivers in Linux (with the dual quad-core Opteron 2378s) seems to yield any more or less stability. That rules out a kernel issue. (Tested: 2.6.34-gentoo-r1, 2.6.34-ck-r3, 2.6.36-gentoo-r5 and 2.6.36-ck-r5. Two of these are stable Gentoo kernels {gentoo} and two are ck patchsets added to the stable Gentoo kernels {ck}.)

Looking at the patch notes for 260.19.29: Fixed a bug that caused some OpenGL applications to become unresponsive for up to a minute on some GPUs when changing the resolution or refresh rate.

In my case, I wasn't changing resolution or refresh rate when the instability manifested itself. Even so, this points toward a driver issue in the OpenGL area, tied to the GPU, the CPU, or the combination of them.

Looking at a further patch note going backward: Fixed a regression introduced after 256.35 that caused stability problems on GPUs such as GeForce GT 240.

I'm thinking that the issue here lies within the nVidia implementation of OpenGL on Linux for a specific set of GPU and/or CPU combinations. That would explain why the Windows iterations do not manifest this problem. Windows uses DirectX where possible and OpenGL where it is not. The application in question does definitely use DirectX in Windows and OpenGL in Linux.

I will have to do some more testing, but initial indications on a new CPU platform and motherboard (otherwise identical OS/kernel/drivers/video cards/hard drives in Linux) suggest that the issue may be resolved for now. Or, more accurately, the nVidia drivers don't seem to blow up with this combination.

I have since moved to an ASUS M4N98TD Evo board with 1 AMD Phenom II X6 1090T (Black edition) CPU -- previously on a Supermicro H8DA8-2 with 2 AMD Opteron 2378s (quad core). The ASUS + 1090T combination in Linux appears to be stable at the moment. I will do more testing later this weekend to see if I can get it to crash.

Matrix as follows:

Kernel/OS         CPU/Motherboard                Nvidia Drivers  Nvidia Cards   Result
----------------  -----------------------------  --------------  -------------  -----------
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 295-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 295-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 295-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 295-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 295-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 295-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 295-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 295-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 295-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 295-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 295-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 295-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 295-SLI  SUCCESS {3}
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 295-SLI  SUCCESS {3}
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 295-SLI  SUCCESS {3}
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 295-SLI  SUCCESS {3}

Windows 7 64bit   SuperMicro+2 AMD Opteron 2378  266.35          2 GTX 295-SLI  SUCCESS
Windows 7 64bit   SuperMicro+2 AMD Opteron 2378  260.99          2 GTX 295-SLI  SUCCESS {1}

2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 275-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 275-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 275-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  260.19.29       2 GTX 275-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 275-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 275-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 275-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  260.19.21       2 GTX 275-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 275-SLI  FAIL
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 275-SLI  FAIL
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 275-SLI  FAIL
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  256.53          2 GTX 275-SLI  FAIL
2.6.34-gentoo-r1  SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 275-SLI  SUCCESS {3}
2.6.36-gentoo-r5  SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 275-SLI  SUCCESS {3}
2.6.34-ck-r3      SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 275-SLI  SUCCESS {3}
2.6.36-ck-r5      SuperMicro+2 AMD Opteron 2378  195.36.31       2 GTX 275-SLI  SUCCESS {3}

2.6.36-ck-r5      ASUS M4N98TD EVO+AMD 1090T     260.19.29       2 GTX 295-SLI  SUCCESS {2}

Key:
{1} Flickering was detected on 3D textures (fixed by using beta drivers)
{2} Tentatively, this looks successful. Will know more after extensive testing.
{3} Flickering was detected on 2D and 3D textures - making view of the screen quite difficult.
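
For anyone who wants to slice the matrix differently, here is a rough Python sketch that cross-tabulates the Linux rows by one column at a time. It is only an illustration: the helper names `add` and `outcomes_by` are made up for this example, the rows are transcribed from the matrix as posted, and the {1}/{2}/{3} footnotes are ignored, so a flickering SUCCESS {3} counts the same as a clean SUCCESS.

```python
from collections import defaultdict

SM = "SuperMicro+2 AMD Opteron 2378"
ASUS = "ASUS M4N98TD EVO+AMD 1090T"
GTX295 = "2 GTX 295-SLI"
GTX275 = "2 GTX 275-SLI"
ALL = ["2.6.34-gentoo-r1", "2.6.36-gentoo-r5", "2.6.34-ck-r3", "2.6.36-ck-r5"]

# Each row is (kernel, board, driver, cards, result), Linux rows only.
ROWS = []


def add(kernels, board, driver, cards, result):
    """Hypothetical helper: append one row per kernel for a given setup."""
    for kernel in kernels:
        ROWS.append((kernel, board, driver, cards, result))


add(ALL, SM, "260.19.29", GTX295, "FAIL")
add(ALL[:3], SM, "260.19.21", GTX295, "FAIL")
# The matrix as posted lists 2.6.36-ck-r5 twice under 256.53 for the 295s.
add(["2.6.36-ck-r5"] + ALL, SM, "256.53", GTX295, "FAIL")
add(ALL, SM, "195.36.31", GTX295, "SUCCESS")
add(ALL, SM, "260.19.29", GTX275, "FAIL")
add(ALL, SM, "260.19.21", GTX275, "FAIL")
add(ALL, SM, "256.53", GTX275, "FAIL")
add(ALL, SM, "195.36.31", GTX275, "SUCCESS")
add(["2.6.36-ck-r5"], ASUS, "260.19.29", GTX295, "SUCCESS")


def outcomes_by(column, rows):
    """Map each value found in one column to the set of results seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[column]].add(row[4])
    return dict(seen)


if __name__ == "__main__":
    supermicro = [row for row in ROWS if row[1] == SM]
    for label, column in (("kernel", 0), ("driver", 2), ("cards", 3)):
        print(label)
        for value, results in sorted(outcomes_by(column, supermicro).items()):
            print("  ", value, sorted(results))
```

On the SuperMicro rows, a column whose values each map to a single result is one that separates FAIL from SUCCESS; a column whose values all map to both results does not. Running `outcomes_by(1, ROWS)` over all rows compares the two boards the same way.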

Last edited by jpi110; 01-14-11 at 05:15 PM. Reason: Added Gentoo Packages link. Also added tested configurations.