02-27-12, 07:28 AM   #1
lmv
Registered User
 
Join Date: Aug 2009
Posts: 9
NVidia live-lock with Xorg and app on ws460c HP Blade Workstation with Quadro 6000

I have a reproducible problem whereby an application, when using a specific feature, causes X and the application to enter a live-locked state.

X's CPU usage reaches 100% while it spins in a tight loop, and the application similarly live-locks.

I can reliably reproduce this on an HP ws460c Blade workstation (with a Quadro 6000 card) running RHEL 5.5 and driver 295.20. The blade has the latest HP firmware loaded.

I can't reproduce this effect on an HP Z800 workstation with the same application, driver version and OS version with either a Quadro 5800 or 6000 graphics card installed. This suggests either a timing problem or something specific to the architecture with this NVidia feature in use.

The application seems to be using a shader to map color values to a polygon surface. Without this shader enabled everything's fine. Once this shader is enabled, the application will hang within seconds.

The X process goes to 100% and is in a loop attempting to perform an ioctl() on /dev/nvidiactl:

setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
rt_sigreturn(0) = 0
write(27, "\1\1\202\0\376\0\203\0", 8) = 8
semop(0, 0x7ffff461e2d0, 1) = 0
semop(0, 0x7ffff461e2e0, 1) = 0
select(28, [27], NULL, NULL, {0, 0}) = 1 (in [27], left {0, 0})
read(27, "\3", 1) = 1
read(27, "\0\240\6\230\2 \0\30\0", 9) = 9
ioctl(8, 0xc020462a, 0x7ffff461e070) = 0
ioctl(8, 0xc020462a, 0x7ffff461e070) = 0
ioctl(8, 0xc0384641, 0x7ffff461e150) = 0

# lsof -p 13081 | grep ' 8u'
X 13081 root 8u CHR 195,255 20153 /dev/nvidiactl
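
For anyone reading the strace above: the ioctl request numbers can be split into their fields with the generic Linux ioctl encoding. Below is a minimal Python sketch, assuming the asm-generic bit layout used on x86/x86_64 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The helper name decode_ioctl is only for illustration and is not part of any NVIDIA tool.

```python
# Decode a Linux ioctl request number into its generic _IOC fields.
# Assumption: asm-generic layout (x86/x86_64): nr[0:8], type[8:16],
# size[16:30], direction[30:32].

def decode_ioctl(request):
    """Split a 32-bit ioctl request into direction, size, type and number."""
    nr = request & 0xFF                    # command number
    type_ = (request >> 8) & 0xFF          # "magic" type byte
    size = (request >> 16) & 0x3FFF        # size of the argument struct
    direction = (request >> 30) & 0x3      # 0 none, 1 write, 2 read, 3 both
    names = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {"dir": names[direction], "size": size, "type": chr(type_), "nr": nr}

# The two request values seen in the X server strace.
for req in (0xC020462A, 0xC0384641):
    print(hex(req), decode_ioctl(req))
```

Both requests decode to type 'F' with read/write direction, carrying 32-byte and 56-byte argument structures (command numbers 42 and 65). The encoding says nothing about what the driver does with them, so this only helps to tell the calls apart.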

/var/log/Xorg.0.log contains:

nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock
nvLock: client timed out, taking the lock

The application is in a loop, periodically waiting for an ioctl() on /dev/nvidiactl.

dmesg sometimes shows Xids like:

NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000100
NVRM: Xid (0000:06:00): 8, Channel 00000001
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 32, Channel ID 00000004 intr 00800000
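
To compare runs, it can help to tally which Xid numbers show up and how often. Here is a rough Python sketch that counts Xid events per PCI address, assuming the line format shown above. The function name tally_xids and the regular expression are illustrative only, and the sample text is just the dmesg excerpt from this post.

```python
import re
from collections import Counter

# Matches lines such as "NVRM: Xid (0000:06:00): 31, Ch 00000004, ..."
# Assumption: the Xid number is the first integer after the PCI address.
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")

def tally_xids(lines):
    """Count Xid events per (PCI address, Xid number)."""
    counts = Counter()
    for line in lines:
        m = XID_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

sample = """\
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000000
NVRM: Xid (0000:06:00): 31, Ch 00000004, engmask 00000101, intr 10000100
NVRM: Xid (0000:06:00): 8, Channel 00000001
NVRM: Xid (0000:06:00): 8, Channel 00000004
NVRM: Xid (0000:06:00): 32, Channel ID 00000004 intr 00800000
""".splitlines()

for (pci, xid), n in sorted(tally_xids(sample).items()):
    print(f"{pci} Xid {xid}: {n}")
```

In practice the lines would come from the saved dmesg output rather than an inline string.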

During the problem Xorg still updates the framebuffer, but only about one frame every 5-10 seconds, so it is not a complete deadlock.

To get out of the live lock, the X server and/or the application have to be killed.

I've seen other variants of behaviour when stracing the X server (a tight loop on an rt_sigaction() call, or constantly stat()ing /proc/{PID_OF_APPLICATION}/cmdline), but these vary, probably depending on when I get strace attached to the server. In all cases the X server is too busy to update the framebuffer.

(The Blade is being accessed via HP RGS but this isn't a factor. The same behavior is reproducible when directly attached to the blade with a local monitor. The problem can't be reproduced using RGS to a Z800 workstation).

Bug log attached taken while the problem was happening...
Attached Files
File Type: gz nvidia-bug-report.log.gz (81.0 KB, 33 views)
02-27-12, 11:46 AM   #2
lmv
Registered User
 
Join Date: Aug 2009
Posts: 9

Watching an strace of the X process from before the hang is triggered, the server definitely stalls in an ioctl() call on /dev/nvidiactl.

The application is spinning on access to the socket it uses to communicate with the X server.

I also sometimes see this in Xorg.0.log:

(WW) Feb 27 17:44:43 NVIDIA(0): WAIT (2, 7, 0x8000, 0x00003708, 0x0000373c)
(WW) Feb 27 17:44:48 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000373c, 0x0000373c)
(WW) Feb 27 17:44:51 NVIDIA(0): WAIT (2, 7, 0x8000, 0x00003ab4, 0x00003b00)
(WW) Feb 27 17:44:58 NVIDIA(0): WAIT (1, 7, 0x8000, 0x00003ab4, 0x00003b00)
02-27-12, 07:40 PM   #3
AaronP
NVIDIA Corporation
 
Join Date: Mar 2005
Posts: 2,487

Can you please send us a test case we can use to reproduce the problem?
02-28-12, 03:17 AM   #4
lmv
Registered User
 
Join Date: Aug 2009
Posts: 9

It's a complex multi-part application with a 42GB data set so that's not really a practical option. If possible, I'll PM you with the details...
02-28-12, 02:07 PM   #5
jds7717
Registered User
 
Join Date: Nov 2003
Posts: 14

We seem to be having a similar problem with one of our applications. For us, the problem seems to be isolated to a particular hardware configuration. We have four Dell M6600 laptops with Quadro 3000Ms, and three of them intermittently give us the "nvLock: client timed out, taking the lock" problem. We are running version 290.10. The other systems we have are running the same versions of the driver, OS, and kernel and do not have this problem.

I tried version 295.20 and that didn't fix the problem.
All times are GMT -5.

