Re: nforce4 + 7800gtx sli + smp = lockups
Just to brute force the general 'try this' responses, I've tried the following kernel parameters:
Nothing changes. So I dug deeper. I've noticed that when I run glxgears with SLI, my CPU usage spikes to 100% immediately. When I quit, the CPU usage takes more than a few seconds to come back down to normal. If I wait before running glxgears again, it generally, about 80% of the time, runs normally.
Other times, it runs really slowly, as if something is using up all the CPU time.
So I ran it with strace and noticed that when it isn't working properly, it calls sched_yield() over and over again. I wonder if those 'fixes' where sched_yield was replaced with 'return 0;' are causing a problem: not only is it essentially polling, but the added overhead of a function call makes a bad problem worse. Even a 'usleep(1);' would have been better than that.
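To put a number on the polling-versus-sleeping point, here is a minimal sketch of how the two waiting styles could be compared. This is not the driver's code, which I can't see; the names cpu_seconds_while_waiting, wait_mode, WAIT_YIELD and WAIT_SLEEP are made up for illustration, and nanosleep() with a 1 microsecond request stands in for 'usleep(1);'. It waits for a fixed wall-clock interval and reports how much CPU time the process burned while waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names, for illustration only. */
enum wait_mode { WAIT_YIELD, WAIT_SLEEP };

/* Read the given clock and return its value in seconds. */
static double now_seconds(clockid_t clock_id)
{
    struct timespec ts;
    clock_gettime(clock_id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Wait until `wall_seconds` of wall-clock time have passed and return
 * the CPU time this process consumed while waiting.
 */
double cpu_seconds_while_waiting(double wall_seconds, enum wait_mode mode)
{
    const struct timespec nap = { 0, 1000 }; /* 1 microsecond, like usleep(1) */
    double cpu_start = now_seconds(CLOCK_PROCESS_CPUTIME_ID);
    double deadline = now_seconds(CLOCK_MONOTONIC) + wall_seconds;

    while (now_seconds(CLOCK_MONOTONIC) < deadline) {
        if (mode == WAIT_YIELD)
            sched_yield();          /* polling: returns at once if nothing else is runnable */
        else
            nanosleep(&nap, NULL);  /* blocks, so the CPU is free for other work */
    }
    return now_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu_start;
}
```

Calling it from a small main() with the same interval for both modes and printing the two results should show the difference. On an otherwise idle machine I would expect the WAIT_YIELD figure to come out close to the full interval and the WAIT_SLEEP figure to be a small fraction of it, but the exact numbers depend on the kernel and timer settings.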
It could also mean there's a deeper problem of which the hanging/CPU starvation is only a symptom. Perhaps a synchronization issue between the two cards? Something that happens early on, probably during initialization, before rendering even occurs. Or maybe whatever shutdown code is used isn't properly releasing the second card, causing each subsequent run to misbehave.
Sometimes using SMP or Cool'n'Quiet can cause subtle timing issues to appear, but 'maxcpus=1' and 'acpi=off' should have eliminated those possibilities.
BTW, is this even the right place to be asking about this? I doubt I'll get a response here beyond 'wait for the next release', if I get any response at all. I'm letting a second card sit idle while I attempt to fix it myself, and without the source I can really only do so much...