Old 05-13-10, 06:04 PM   #1
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Repeatable X Crash, dual FX580 on HP Z600

Greetings

Problem: Spontaneous, random, and fairly repeatable X crashes when users resize windows or change window attributes such as maximizing, minimizing, etc...

Running dual FX580 cards on HP Z600 workstations

OS: RedHat Enterprise Linux 5.2
KERNEL: 2.6.18-92.el5PAE
XORG-X11: 1.1.1-48.41.el5
NVRM: NVIDIA UNIX x86 Kernel Module 195.36.15
GCC: 4.1.2 20071124 (Red Hat 4.1.2-42)
GPU: NVIDIA Quadro FX580 (512)
RedHat Ticket: 861593

Attached are the bug report and Xorg logfile during crash.

Here is the backtrace from gdb attached to Xorg when it crashed. Unfortunately, some frames in there (0x00000001?) seem to be nonsense; I have seen RHEL5 gdb produce similar output when attached to a running process. Of course, there is no nvidia debuginfo, so the remaining missing symbols can be attributed to that.

(gdb) bt f
#0 0x001a2410 in __kernel_vsyscall ()
No symbol table info available.
#1 0x00394d10 in raise () from /lib/libc.so.6
No symbol table info available.
#2 0x00396621 in abort () from /lib/libc.so.6
No symbol table info available.
#3 0x080a0d55 in ddxGiveUp () at xf86Init.c:1261
i = <value optimized out>
#4 0x081ae6e3 in AbortServer () at log.c:408
No locals.
#5 0x081aec76 in FatalError (f=0x81c1c2c "Caught signal %d. Server aborting\n") at log.c:554
args = 0xbffa5d94 "\v"
beenhere = 1
#6 0x080e5a70 in xf86SigHandler (signo=11) at xf86Events.c:1484
No locals.
#7 <signal handler called>
No symbol table info available.
#8 0x014bbf88 in _nv001111X () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#9 0x0a507300 in ?? ()
No symbol table info available.
#10 0x0a507300 in ?? ()
No symbol table info available.
#11 0xbffa6158 in ?? ()
No symbol table info available.
#12 0xbffa6154 in ?? ()
No symbol table info available.
#13 0x00000001 in ?? ()
No symbol table info available.
#14 0xbffa623c in ?? ()
No symbol table info available.
#15 0x013899d0 in _nv002838X () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#16 0x0a24c458 in ?? ()
No symbol table info available.
#17 0x08156b63 in getDrawableDamageRef (pDrawable=0x0) at damage.c:77
pPixmap = (PixmapPtr) 0x2
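
A likely reason some of the frames above look like nonsense is that, without debug info for nvidia_drv.so, gdb has to guess how to unwind past frame #8 and can end up reporting stack contents as if they were return addresses. The Python sketch below flags unresolved (??) frames whose address either sits in the first page or looks like stack data. The function name suspicious_frames and the stack range 0xbf000000 to 0xc0000000 are assumptions made for illustration (a typical 32-bit x86 process layout, consistent with the 0xbffa.... values in this trace); nothing here is reported by gdb itself.

```python
import re

# Matches numbered gdb frames such as "#8 0x014bbf88 in _nv001111X () from ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-fA-F]+) in (\S+)")


def suspicious_frames(backtrace, stack_lo=0xBF000000, stack_hi=0xC0000000):
    """Return (frame_number, address, reason) for unresolved frames that look bogus.

    stack_lo and stack_hi are an ASSUMED stack range for a 32-bit x86 process.
    """
    flagged = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # locals, "No symbol table info", signal-handler marker, etc.
        num = int(m.group(1))
        addr = int(m.group(2), 16)
        sym = m.group(3)
        if sym != "??":
            continue  # gdb resolved a symbol, so leave the frame alone
        if addr < 0x1000:
            flagged.append((num, addr, "address in the first page"))
        elif stack_lo <= addr < stack_hi:
            flagged.append((num, addr, "address looks like stack data"))
    return flagged
```

Run over the trace above, this would single out frames #11 to #14 and leave the heap-like addresses (#9, #10, #16) alone, since those cannot be judged from the address alone.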

I have tried various things. It looks like most of the options I am used to (XaaNoPixmapCache, XaaNoOffScreenPixmaps, AccelMethod "xaa") do not take effect when I add them to the driver sections of xorg.conf. I assume these have been incorporated into the driver in the newer versions.

I have also tried older drivers; 185.18.29 is the driver pre-packaged on the HP resource CD as approved for this workstation.

I understand that the Nehalem chipsets aren't officially supported until RHEL5.3; however, I highly doubt that whatever hardware acceleration may or may not be occurring is directly responsible for this problem.

Any suggestions, comments, complaints, etc are more than welcome
Attached Files
File Type: gz nvidia-bug-report.log.gz (35.6 KB, 82 views)
File Type: log Xorg.0.log (33.6 KB, 80 views)
Old 05-13-10, 08:27 PM   #2
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Repeatable X Crash, dual FX580 on HP Z600

AFAIR, RHEL 5.2 uses 4k kernel stacks. The nvidia driver
worked pretty well with only 4k for a while, but maybe there
is a regression? Can you please try a kernel with 8k?

Just pure speculation, though.

Bernhard
Old 05-14-10, 05:50 AM   #3
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

Thanks for the reply ... yes, the CONFIG_4KSTACKS flag is set to 'y' in the kernel configuration. Unfortunately, we are tied by CM to this specific kernel ... one option we are pursuing is updating to RHEL5.3, which "officially" supports the Nehalem chipsets, to take that out of the list of possible factors.

We have about 40 of these workstations out there, and there are consistent repeatable steps that can make X crash -- namely, opening up an application on each of the three heads, moving one partly off-screen, then launching another application (glxgears even crashes it) and resizing the windows.
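
With about 40 machines to check, the CONFIG_4KSTACKS test mentioned above could be scripted rather than done by eye. The Python sketch below parses kernel config text and reports whether 4k stacks are enabled. The helper names kernel_config_options and uses_4k_stacks are made up for this example, and the path /boot/config-<release> is an assumption about where the distribution installs the config.

```python
import os
import platform


def kernel_config_options(config_text):
    """Parse kernel config text into {option: value}; unset options map to 'n'."""
    options = {}
    suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO disabled
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def uses_4k_stacks(config_text):
    """True only when CONFIG_4KSTACKS is explicitly set to 'y'."""
    return kernel_config_options(config_text).get("CONFIG_4KSTACKS") == "y"


if __name__ == "__main__":
    # ASSUMED location of the running kernel's config file.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as fh:
            print("4k stacks:", uses_4k_stacks(fh.read()))
    else:
        print("no kernel config found at", path)
```

The same function can be pointed at the config of a test kernel to confirm that a rebuild really switched to 8k stacks before rerunning the crash steps.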
Old 05-14-10, 06:05 AM   #4
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default XAA options with NVIDIA drivers 180.X +

Does anyone know whether the XAA options for off-screen pixmaps and the pixmap cache are still valid with the newer drivers? Whenever I put these options in xorg.conf, I get messages that they are ignored.
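
To collect every ignored option in one pass rather than reading the log by hand, something like the Python sketch below could be run over Xorg.0.log. It assumes the ignored-option messages have the shape `(WW) NVIDIA(0): Option "Name" is not used`; the exact wording in this particular log is not quoted in the thread, so the pattern may need adjusting. The function name unused_options is made up for the example.

```python
import re

# ASSUMED message shape: (WW) NVIDIA(0): Option "Name" is not used
UNUSED_RE = re.compile(r'^\(WW\)\s+(\S+?):\s+Option "([^"]+)" is not used')


def unused_options(log_text):
    """Return (source, option_name) pairs for options the server reports as unused."""
    found = []
    for line in log_text.splitlines():
        m = UNUSED_RE.match(line.strip())
        if m:
            found.append((m.group(1), m.group(2)))
    return found
```

Calling unused_options(open("/var/log/Xorg.0.log").read()) would then list which of the XAA-era options the driver skipped; the log path is the usual default and is also an assumption.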
Old 05-14-10, 07:14 AM   #5
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Repeatable X Crash, dual FX580 on HP Z600

Quote:
Originally Posted by Brad.Scalio View Post
we are tied into CM to have this specific kernel
Nevertheless, I'd recommend compiling a kernel on your own, just to see if it
makes any difference. I couldn't see any hints in your logs as to what the
problem could be, so I fear you will have to perform a couple of experiments,
such as trying another distribution.

BTW: do your LCDs really have just 1280x1024? I guess those are beamers?
In this case you may like to consider purchasing a Matrox TripleHead2Go video
splitter, but in your case that would be 40 x $300 = $12,000 for all your workstations
(I guess you could get a volume discount), so I'm not sure if this is really a good
solution for you :-) On the other hand, depending on your requirements, multi-GPU
setups can be a PITA and those TH2Gs make life *much* easier on Linux, so investing
in those TH2Gs may even pay off over time (because of the reduced maintenance
effort).

Please check the following threads for details:

http://www.nvnews.net/vbulletin/showthread.php?t=133740
http://www.nvnews.net/vbulletin/showthread.php?t=126134

regards

Bernhard
Old 05-14-10, 07:30 AM   #6
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

PITA is an understatement

I will try the 8k stack and see how that goes ... we have 40 out there right now, but we have stopped our deployment of these new workstations. In total we have 1,005 of them, so forking over half a million for the TH2Gs is not an option right now ;-) although I am sure the cost of keeping these in a warehouse will come close to that if we don't resolve it soon.
Old 05-14-10, 02:33 PM   #7
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Repeatable X Crash, dual FX580 on HP Z600

Quote:
Originally Posted by Brad.Scalio View Post
we have 1,005 of them, so forking over half a million for the TH2G is not an option right now ;-)
I see. I guess you already filed a bug report to nvidia? Maybe it is possible
to get in contact with nvidia on support contract basis? (fixing such a bug
shouldn't require more than 50 hours)? I mean they shouldn't have a problem
with you paying them to fix what should just work beforehand, eh? SCNR! :-)

Bernhard
Old 05-14-10, 03:38 PM   #8
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

I put the bug report in this thread -- is there a formal way to file such a report? I obtained a core dump from gdb as well as the memory maps for Xorg ... since gdb didn't provide much information in the backtrace, we were going to go through the core dump and see if we could find anything useful.

How would one go about contacting NVIDIA to get a bug report started ... and contact information for such "support" sales?

Old 05-14-10, 03:56 PM   #9
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

One last thing today ... I am going to try turning off DamageEvents reporting in xorg.conf to see if this "resolves" the problem or, more precisely, prevents the problem from causing a crash. At this point, prevention is as acceptable as a real fix.
Old 05-14-10, 04:25 PM   #10
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

(**) NVIDIA(0): Option "DamageEvents" "no"
(**) NVIDIA(0): Enabling RENDER acceleration
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0): enabled.

Even though I set DamageEvents to "no", I still see an informational message saying that GLX with the Damage and Composite X extensions is enabled. I assume this is OK, since for my problem I care more about XAA/EXA than GLX: it isn't connected to GLX in any way, and we aren't using any OpenGL calls in the apps that are causing the crashes.
Old 05-14-10, 06:42 PM   #11
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: Repeatable X Crash, dual FX580 on HP Z600

Quote:
Originally Posted by Brad.Scalio View Post
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
hmm ... it's known that the composite extension doesn't work well with xinerama,
but as far as I can tell, xinerama is not enabled in your setup, so it should be fine.
Maybe composite is also not stable with a multi-screen setup.

You may try disabling several extensions, just to see at which point things
start to get stable:

Code:
Section "Extensions"
    Option         "RENDER"       "False"
    Option         "DAMAGE"       "False"
    Option         "Composite"    "False"
EndSection
Also, have you tried some kernel boot options that often helped in the past,
like: noapic noacpi maxcpus=1
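
The extension-disabling experiment above amounts to trying subsets of the three options until the crash goes away. A small Python sketch can generate the "Extensions" section for each of the eight combinations, so each trial can be pasted into xorg.conf and noted down. The option names and the "False" values are taken from the snippet above; the helper names extensions_section and all_trials are made up for this example.

```python
from itertools import combinations

# Option names as written in the xorg.conf snippet above.
EXTENSIONS = ["RENDER", "DAMAGE", "Composite"]


def extensions_section(disabled):
    """Build the text of an "Extensions" section that disables the given names."""
    lines = ['Section "Extensions"']
    for name in disabled:
        lines.append('    Option         "%s" "False"' % name)
    lines.append("EndSection")
    return "\n".join(lines)


def all_trials(names=EXTENSIONS):
    """Yield (disabled_names, section_text) for every subset, smallest first."""
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield subset, extensions_section(subset)


if __name__ == "__main__":
    for disabled, text in all_trials():
        print("# trial: disable", ", ".join(disabled) or "nothing")
        print(text)
        print()
```

Starting with the single-extension trials keeps the number of X restarts low; for the empty trial the section can simply be left out of xorg.conf.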


Concerning the "official" bug report: the announcement of earlier driver
versions suggested to send a mail to "linux-bugs at nvidia.com":
http://www.nvidia.de/object/linux-di....36.24-de.html
Maybe that still works.

As for purchasing support, I guess the only "official" way is to contact
customer care (I guess you could file your bug report there, as well):
http://www.nvidia.com/object/driverq...assurance.html
(Then click "NVIDIA Customer Care")

With 2000 FX580s lying around, I would expect you to get at least a
little attention :-)

regards

Bernhard
Old 05-17-10, 04:17 PM   #12
Brad.Scalio
Registered User
 
Join Date: Mar 2007
Posts: 20
Default Re: Repeatable X Crash, dual FX580 on HP Z600

From what I have seen so far, this is a problem in the nvidia driver. Another, less likely possibility is a bug in Xorg, or in both.

The code being executed is in response to a window changing size or moving. Based on the size, it could be either a top-level window or one of the larger panes in GFE (an in-house application that seems to always trigger the crash).

I think this code runs regardless of whether the DAMAGE, RENDER, or COMPOSITE extensions are active. The nvidia driver probably assumes OpenGL will be used, so turning off GLX will probably not prevent this code from running either, but it may be worth trying anyway. This wouldn't be a solution since we do have one in-house app that requires OpenGL :-(

This is the call stack as I have been able to determine it:

_nv001111X+711280+104 (nvidia GCops PolyFillRect)
damagePolyFillRect+97 (damage.c:1238)
??? (not sure...)
??? _nv001111X+711280+?? (nvidia GCops PolyFillRect --- not actually seen in stack (because of tail call?))
miDbePositionWindow+1217 (midbe.c:713)
compPositionWindow+89
miSlideAndSizeWindow+461
compResizeWindow+172
ConfigureWindow+3120
ProcConfigureWindow+161
Dispatch+410
main+1157

I've opened a customer care support ticket FWIW, and we'll see what comes of it. Right now I have HP and RedHat involved as well, but shockingly everyone else is pointing fingers.

Last edited by Brad.Scalio; 05-17-10 at 04:18 PM. Reason: elaborated on what GFE is