04-04-07, 01:18 AM   #2
ooPo
Registered User

Join Date: Apr 2007
Posts: 4

Re: nforce4 + 7800gtx sli + smp = lockups

Just to brute-force past the usual 'try this' responses, I've already tried the following kernel parameters, one at a time (example boot line after the list):

pci=routeirq
pci=nommconf
pci=conf1
idle=poll
noapic
pci=noacpi
pci=biosirq
acpi=off
maxcpus=1
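
(For reference, each of these goes on the kernel command line -- e.g. appended to the kernel entry in GRUB's menu.lst. The image and root device below are just placeholders, not my actual setup:)

kernel /boot/vmlinuz root=/dev/sda1 ro pci=nommconf noapic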

Nothing changes. So I dug deeper. I've noticed that when I run glxgears with SLI enabled, my CPU usage spikes to 100% immediately. When I quit, the CPU usage takes more than a few seconds to come back down to normal. If I wait before running glxgears again, it generally (maybe 80% of the time) runs normally.

Other times it runs really slowly, as if something else is using up all the CPU time.

So I ran it with strace and noticed that when it isn't working properly, it is calling sched_yield() over and over again. I wonder if those 'fixes' where sched_yield() was replaced with 'return 0;' are the problem: the wait loop then degenerates into a busy-poll, with the added overhead of a function call making a bad situation worse. Even a 'usleep(1);' would have been better than that.
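
To illustrate what I mean, here's a minimal sketch of the three wait strategies in question. This is hypothetical -- I don't have the driver source, and hardware_ready is a made-up stand-in for whatever condition the driver actually polls:

/* wait_demo.c -- compare wait strategies; build with: gcc -O2 wait_demo.c */
#include <sched.h>    /* sched_yield() */
#include <signal.h>   /* signal(), SIGALRM */
#include <stdio.h>    /* puts() */
#include <unistd.h>   /* alarm(), usleep() */

static volatile sig_atomic_t hardware_ready;  /* stand-in for the polled condition */

static void on_alarm(int sig) { (void)sig; hardware_ready = 1; }

/* What strace shows: yield the CPU until the condition holds. */
static void wait_yield(void)
{
    while (!hardware_ready)
        sched_yield();
}

/* If sched_yield() is gutted to 'return 0;', the loop above degenerates
 * into this busy-spin, plus per-iteration call overhead on top. */
static void wait_spin(void)
{
    while (!hardware_ready)
        ;  /* pegs one core at 100% */
}

/* Even a 1us sleep gives the scheduler a real opening to run something else. */
static void wait_sleep(void)
{
    while (!hardware_ready)
        usleep(1);
}

int main(void)
{
    signal(SIGALRM, on_alarm);
    alarm(2);      /* pretend the card becomes ready after 2 seconds */
    wait_spin();   /* swap in wait_yield() or wait_sleep() and watch top */
    puts("ready");
    return 0;
}

Running the real thing under 'strace -c glxgears' gives a per-syscall count summary on exit, which is where the sched_yield() flood shows up.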

It could also mean there's a deeper problem, of which the hanging/CPU starvation is only a symptom. Perhaps a synchronization issue between the two cards? Something that happens early on, probably during initialization before rendering even occurs. Or maybe whatever shutdown code is used isn't properly releasing the second card, causing each subsequent run to misbehave.

Sometimes using SMP or Cool'n'Quiet can cause subtle timing issues to appear, but 'maxcpus=1' and 'acpi=off' should have eliminated those possibilities.
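
As a quicker variation on the 'maxcpus=1' test, pinning the app to a single core should exercise the same one-CPU path without a reboot:

taskset -c 0 glxgears

It doesn't rule out interrupt routing differences the way a real single-CPU boot does, but it's a fast way to check whether the problem needs both cores running the process.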

BTW, is this even the right place to be asking about this? I doubt I'll get a response here beyond 'wait for the next release', if any response at all. I've got a second card sitting idle while I try to fix this myself, and without the source I can really only do so much...