nV News Forums

 
 

NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 6000 (http://www.nvnews.net/vbulletin/showthread.php?t=174848)

lmv 02-27-12 07:28 AM

NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 6000
 
1 Attachment(s)
I have a reproducible problem whereby an application, when using a specific feature, causes X and the application to enter a live-locked state.

X CPU use reaches 100% while it spins in a tight loop; the application similarly live-locks.

I can reliably reproduce this on an HP ws460c Blade workstation (with a Quadro 6000 card) running RHEL 5.5 and driver 295.20. The blade has the latest HP firmware loaded.

I can't reproduce this effect on an HP Z800 workstation with the same application, driver version and OS version with either a Quadro 5800 or 6000 graphics card installed. This suggests either a timing problem or something specific to the architecture with this NVidia feature in use.

The application seems to be using a shader to map color values to a polygon surface. Without this shader enabled everything's fine. Once this shader is enabled, the application will hang within seconds.

The X process goes to 100% and is in a loop attempting to perform an ioctl() on /dev/nvidiactl:

setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
rt_sigreturn(0) = 0
write(27, "\1\1\202\0\376\0\203\0", 8) = 8
semop(0, 0x7ffff461e2d0, 1) = 0
semop(0, 0x7ffff461e2e0, 1) = 0
select(28, [27], NULL, NULL, {0, 0}) = 1 (in [27], left {0, 0})
read(27, "\3", 1) = 1
read(27, "\0\240\6\230\2 \0\30\0", 9) = 9
ioctl(8, 0xc020462a, 0x7ffff461e070) = 0
ioctl(8, 0xc020462a, 0x7ffff461e070) = 0
ioctl(8, 0xc0384641, 0x7ffff461e150) = 0

# lsof -p 13081 | grep ' 8u'
X 13081 root 8u CHR 195,255 20153 /dev/nvidiactl
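
For anyone reading the trace above, the ioctl request numbers can be split into their generic Linux fields. This is only a minimal sketch, assuming the standard asm-generic _IOC bit layout used on x86; the helper name decode_ioctl is made up for illustration, and the decoding says nothing about what the NVIDIA driver actually does with each command.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC fields.

    Assumes the asm-generic layout (as used on x86):
    bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
    """
    nr = request & 0xFF                 # command number
    type_byte = (request >> 8) & 0xFF   # "magic" type byte
    size = (request >> 16) & 0x3FFF     # argument size in bytes
    direction = (request >> 30) & 0x3   # 1 = write, 2 = read, 3 = both
    names = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {
        "dir": names[direction],
        "size": size,
        "type": chr(type_byte),
        "nr": nr,
    }


# The two request values seen in the strace output above.
for req in (0xC020462A, 0xC0384641):
    print(hex(req), decode_ioctl(req))
```

Both requests carry the same type byte and are read/write calls; they differ in command number and argument size.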

/var/log/Xorg.0.log contains:

nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock

The application is in a loop, periodically waiting for an ioctl() on /dev/nvidiactl.

dmesg sometimes shows Xids like:

NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000100
NVRM: Xid (0000:06:00): 8, Channel 00000001
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 32, Channel ID 00000004 intr 00800000
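
To compare runs, it can help to tally which Xid codes show up and how often. A minimal sketch, assuming the "NVRM: Xid (bus address): code, ..." line shape seen above; the function name tally_xids and the regular expression are illustrative assumptions, not anything provided by the driver.

```python
import re
from collections import Counter

# Matches the PCI address in parentheses and the numeric Xid code after it.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count occurrences of each (PCI address, Xid code) pair."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# The dmesg lines quoted above, used as sample input.
sample = """\
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000100
NVRM: Xid (0000:06:00): 8, Channel 00000001
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 32, Channel ID 00000004 intr 00800000
"""

for (address, code), count in sorted(tally_xids(sample.splitlines()).items()):
    print(f"{address} Xid {code}: {count}")
```

In practice the input would come from something like the saved output of dmesg rather than a pasted string.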

During the problem Xorg still updates the framebuffer, but only about one frame every 5-10 seconds, so it is not a complete deadlock.

To get out of the live lock, the X server and/or the application have to be killed.

I've seen other variants of behaviour when I strace the X server (a tight loop on an rt_sigaction() call, constantly stat()ing /proc/{PID_OF_APPLICATION}/cmdline) but these vary, probably depending on when I get an strace attached to the server. In all cases the X server is too busy to update the framebuffer.

(The Blade is being accessed via HP RGS but this isn't a factor. The same behavior is reproducible when directly attached to the blade with a local monitor. The problem can't be reproduced using RGS to a Z800 workstation).

Bug log attached taken while the problem was happening...

lmv 02-27-12 11:46 AM

Re: NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 600
 
Watching an strace of the X process before triggering the hang, we definitely halt in an ioctl call on /dev/nvidiactl.

The application is spinning on access to the socket it uses to communicate with the X server.

I also sometimes see this in Xorg.0.log:

(WW) Feb 27 17:44:43 NVIDIA(0): WAIT (2, 7, 0x8000, 0x00003708, 0x0000373c)
(WW) Feb 27 17:44:48 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000373c, 0x0000373c)
(WW) Feb 27 17:44:51 NVIDIA(0): WAIT (2, 7, 0x8000, 0x00003ab4, 0x00003b00)
(WW) Feb 27 17:44:58 NVIDIA(0): WAIT (1, 7, 0x8000, 0x00003ab4, 0x00003b00)

AaronP 02-27-12 07:40 PM

Re: NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 600
 
Can you please send us a test case we can use to reproduce the problem?

lmv 02-28-12 03:17 AM

Re: NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 600
 
It's a complex multi-part application with a 42GB data set so that's not really a practical option. If possible, I'll PM you with the details...

jds7717 02-28-12 02:07 PM

Re: NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 600
 
We seem to be having a similar problem with one of our applications. For us, the problem seems to be isolated to a particular hardware configuration. We have four Dell M6600 laptops with Quadro 3000Ms and three of them will intermittently give us the "nvLock: client timed out, taking the lock" problem. We are running version 290.10. The other systems we have are running the same versions of the driver, OS, and kernel and do not have this problem.

I tried version 295.20 and that didn't fix the problem.


All times are GMT -5.
