Old 04-29-03, 09:12 AM   #1
WaxyLemon
GeForce 4 Ti 4600
 
Join Date: Apr 2003
Location: Washington, D.C.
Posts: 19
4363 SMP Deadlock

Andy,

Thanks for your hard work in bringing the linux drivers to stability and for monitoring this site. I'm sure I speak for everyone when I say we appreciate having someone from Nvidia listen to us and interact with us.

After the DFP+SMP bug got fixed in the 4363 release I decided to give it a shot yesterday. The X server started up nicely, but unfortunately made it only about 30 minutes before another problem re-appeared. The X server appears to be in deadlock, failing to process any events but managing to use up 100% of all processors. This issue has been discussed for release 4191 but to my knowledge has not yet been resolved.

Before I spend too much time trying to isolate what is causing this, is anyone else still having this problem or did it clear up for other users in this release?

My system specs follow. Please see attached for X configuration file. More information is available upon request.

SuperMicro X5DAE Mobo (Intel E7505)
GeForce 4 Ti 4600 AGP [NV25]
Dual P4 Xeon 2.4GHz
Dual Sony SDM-X72 Digital Displays

Software:
RedHat 9 (with all current updates)
Kernel 2.4.20-9

Notes:
Does not seem to be affected by AGP being disabled, CRT vs DFP, kernel version, glibc version, etc.

Relevant to all releases later than 3123. (That is to say, I am using 3123 since it does not have this problem.)

Update:
I was just looking through the release notes and noticed the info about disabling APIC. I will try this when I get home tonight.
Old 04-30-03, 07:50 AM   #2
WaxyLemon
GeForce 4 Ti 4600
 
Join Date: Apr 2003
Location: Washington, D.C.
Posts: 19

I tried disabling APIC last night, to no avail. As much as I'd like to keep an open mind and accept the possibility that this is hardware related, I have difficulty doing so given that release 3123 does not have this problem, yet the 4xxx releases all do.

This is when it becomes extremely frustrating not to have source code. If I did, the first thing I'd do is a diff between 3123 and 4363 and take a look at all the mutexes and other changes. But I can't. I can only guess at what the problem is, and my current guess is that there's a mutex somewhere getting stuck in deadlock.

Andy, what information can I give Nvidia? To reiterate, as best as I can tell, this issue does not concern AGP, RenderAccel, APIC, etc.

Related threads:
http://www.nvnews.net/vbulletin/show...threadid=10397
http://www.nvnews.net/vbulletin/show...threadid=11010
http://www.nvnews.net/vbulletin/show...threadid=10637
http://www.nvnews.net/vbulletin/show...&threadid=7183
Old 04-30-03, 08:48 AM   #3
bwkaz
Registered User
 
Join Date: Sep 2002
Posts: 2,262

You can try disabling ACPI rather than APIC, too -- or disable both of them.

However, this isn't a deadlock. In a deadlock, there's 0% CPU utilization -- both threads (or all threads, or all processes, or whatever's involved in the deadlock) are waiting for each other to do something, using no CPU. This happens if thread 1 takes lock 1 and thread 2 takes lock 2, and then each tries to take the other's lock. They'll sleep forever, because neither one can wake up to release the lock it holds.
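
The two-lock scenario described above can be sketched in Python. This is only an illustration of the general pattern, not anything taken from the X server or the NVIDIA driver; the names `lock_1`, `lock_2`, and `worker` are made up for the example, and a timeout is used on the second acquire so the demonstration terminates instead of sleeping forever as a real deadlock would.

```python
import threading

lock_1 = threading.Lock()
lock_2 = threading.Lock()
both_hold_first = threading.Barrier(2)    # ensures each thread holds one lock
both_tried_second = threading.Barrier(2)  # ensures neither releases too early
results = {}


def worker(name, first, second):
    """Take `first`, then try to take `second` while still holding `first`."""
    with first:
        both_hold_first.wait()
        # A real deadlock would block here forever, using no CPU.
        # The timeout exists only so that this demonstration can finish.
        got_second = second.acquire(timeout=0.2)
        results[name] = got_second
        both_tried_second.wait()
        if got_second:
            second.release()


# The two threads take the same locks in opposite order.
t1 = threading.Thread(target=worker, args=("thread-1", lock_1, lock_2))
t2 = threading.Thread(target=worker, args=("thread-2", lock_2, lock_1))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)
```

Neither thread manages to take its second lock, because the other thread is still holding it. Without the timeout both would simply stay asleep, which is why a true deadlock shows up as an idle process rather than one pinned at 100% CPU.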

What you're seeing, if it's any kind of locking issue, is a livelock, where the processes involved are spinning in some sort of busy-wait loop. However, AFAIK, X is not multithreaded -- and there's only one process running. So it can't deadlock or livelock with other threads. I therefore doubt that it's any kind of locking -- though I don't have source either, so it could be some resource contention on the card that we can't see.
__________________
Registered Linux User #219692
Old 04-30-03, 08:57 AM   #4
m2-
Registered User
 
Join Date: Apr 2003
Location: None of your Bizland.
Posts: 11
Same thing here

SMP Pentium III Apollo PRO 133 chipset GeForce3 Ti 200, 2.4.21-rc1 kernel.

CONFIG_X86_GOOD_APIC, CONFIG_X86_IO_APIC and CONFIG_X86_LOCAL_APIC all set to "y".

2x AGP using the NVIDIA driver.

The X server spins for a while and eventually starts up (after 1-2 minutes). After that everything seems to be ok, can't say for sure, at least the undergrads haven't come to me to tell me for the nth time what I already know :-\

PS: I *hate* this discussion board system. Thanks for keeping it up and running and all of that, but I still hate it...
Old 04-30-03, 10:39 AM   #5
WaxyLemon
GeForce 4 Ti 4600
 
Join Date: Apr 2003
Location: Washington, D.C.
Posts: 19

Yeah, I guess I wasn't using correct terminology. By deadlock I meant the X server appears to be waiting for a resource and becomes dead because it never gets it.

I also assumed that X is multithreaded. It seems to me if it wants to interact with multiple clients connected on Unix domain sockets and over TCP and be interactive with all of them, while also driving a high bandwidth video display, it needs to accept() connections in one thread and process incoming events on another several threads. But I guess it could also use select() or poll() to service all events and accept new connections in a giant while loop.
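
For illustration, here is a minimal sketch of the single-threaded `select()` approach, written in Python rather than C. It is not how the X server is actually implemented; the function name `serve_once`, the message contents, and the use of a loopback TCP socket are all assumptions made for the example. One loop both accepts new connections and reads from existing ones, with no extra threads.

```python
import select
import socket


def serve_once(listener, expected_messages, timeout=1.0):
    """Accept clients and read their data in one single-threaded loop."""
    clients = []
    received = []
    while len(received) < expected_messages:
        # Sleep until the listener or any connected client has something to read.
        readable, _, _ = select.select([listener] + clients, [], [], timeout)
        if not readable:
            break  # nothing happened within the timeout, so stop waiting
        for sock in readable:
            if sock is listener:
                conn, _addr = listener.accept()  # new client connection
                clients.append(conn)
            else:
                data = sock.recv(4096)
                if data:
                    received.append(data)
                else:
                    clients.remove(sock)  # peer closed the connection
                    sock.close()
    for conn in clients:
        conn.close()
    return received


# Listen on an arbitrary free loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
port = listener.getsockname()[1]

# Two clients connect and send before the server loop runs;
# the kernel queues both the connections and the data.
peers = []
for text in (b"client-a", b"client-b"):
    peer = socket.create_connection(("127.0.0.1", port))
    peer.sendall(text)
    peers.append(peer)

messages = serve_once(listener, expected_messages=2)
for peer in peers:
    peer.close()
listener.close()
print(sorted(messages))
```

While `select()` is waiting the process is asleep, so a loop like this uses essentially no CPU when idle.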

Either way, in this case I don't believe X is livelocking itself or another process. It only spins on 1 processor, implying only 1 thread. However, it could be in a spinlock waiting for, say, an irq or some other resource to change status.
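
The difference between spinning on a status flag and sleeping until woken can be shown with a small Python sketch. It is purely illustrative and says nothing about what the driver really does; `status_changed` stands in for a hypothetical IRQ or resource status that never changes, and the helper names are invented for the example.

```python
import threading
import time


def cpu_used_while(wait_fn, seconds):
    """Return the process CPU time consumed while wait_fn(seconds) runs."""
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


def spin_wait(seconds):
    """Busy-wait: poll a flag in a tight loop until a deadline passes."""
    deadline = time.monotonic() + seconds
    status_changed = False  # hypothetical status that never flips
    while not status_changed and time.monotonic() < deadline:
        pass


def blocking_wait(seconds):
    """Blocking wait: sleep on an event that is never set, until the timeout."""
    never_set = threading.Event()
    never_set.wait(timeout=seconds)


spin_cpu = cpu_used_while(spin_wait, 0.3)
block_cpu = cpu_used_while(blocking_wait, 0.3)
print(f"spin: {spin_cpu:.3f}s CPU, blocking: {block_cpu:.3f}s CPU")
```

Both waits last about the same wall-clock time, but only the spinning one burns CPU for the whole interval. That matches a single thread pinning one processor while it waits for something that never arrives.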

So my new diagnosis is that 4363 is not playing nice with some hardware on my system, but 3123 does play nice with it. So I'll try disabling ACPI and see what happens.
Old 04-30-03, 02:09 PM   #6
Andy Mecham
l33t master
 
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 1,163

m2-: see the README for long delays on X start - check out the IgnoreDisplayDevices option.

WaxyLemon: I haven't seen this in-house, but I'll try to repro. Please send details to linux-bugs@nvidia.com.

--andy
__________________
Andy Mecham
NVIDIA Corporation
Old 04-30-03, 04:36 PM   #7
WaxyLemon
GeForce 4 Ti 4600
 
Join Date: Apr 2003
Location: Washington, D.C.
Posts: 19

Andy,

I believe the issue is related to this one:
http://www.nvnews.net/vbulletin/show...&threadid=7183

in which you wrote that you have reproduced it in-house. But I will submit a bug report in order to provide additional information.

I also haven't checked to see if ACPI is the magic bullet yet. I'll do that tonight.
Old 04-30-03, 04:41 PM   #8
Andy Mecham
l33t master
 
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 1,163

That thread involves a hang on TNT2s when viewing large images. I don't think it's related to your bug.

--andy
__________________
Andy Mecham
NVIDIA Corporation