6.1-Stable Freeze with 6600GT
I just installed a new 6600GT card in a new motherboard and all seems to go well until the system suddenly freezes, requiring a reboot. The mouse pointer still moves about but the system doesn't even respond to a ping. I can't work out anything special that I'm doing to cause a freeze, as I can run 3d apps beforehand without a problem and it can freeze when I haven't run any 3d apps at all.
I'm running the latest -STABLE from 2006-07-19 and I installed it as specified in the documentation. I installed the nvidia-driver port with just the Linux option and recompiled the kernel without the agp and dri devices. uname reports: FreeBSD server.home 6.1-STABLE FreeBSD 6.1-STABLE #0: Wed Jul 19 11:19:16 CEST 2006 root@server.home:/usr/obj/usr/src/sys/SERVER i386
The motherboard is a brand new MSI K8T Neo2-f V2.0 which uses the VIA K8T800 PRO chipset and runs an Athlon 64 3500+ and 1GB of RAM.
I've had to resort to running the "nv" driver, which is less than optimal as I bought this graphics card specifically for 3d. Ah well, Google Earth doesn't work at the moment anyway.
If anyone has any ideas as to why my system is freezing, I'd be very grateful.
Just some more information about the problem:
I tried the patch from thread http://www.nvnews.net/vbulletin/showthread.php?t=72892 with no luck. The system still crashes.
I'm not sure if it's useful but I found the following in /var/log/messages:
Jul 20 15:26:39 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615c
Jul 20 15:26:44 server kernel: NVRM: Xid (0001:00): 8, Channel 00000000
Jul 20 15:26:47 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615d
Jul 20 15:26:55 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615e
Jul 20 15:27:03 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615f
Jul 20 15:27:11 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00056160
Jul 20 15:27:19 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00056161
@Nealie: judging from these error messages, the NVIDIA kernel module no longer receives interrupts from the GPU at some point; is the interrupt shared with another driver?
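For anyone comparing their own logs against the ones above, here is a minimal Python sketch (not from any of the posters) that tabulates NVRM Xid lines and shows how the `Count` field of the Xid 16 messages advances over time. The function names are invented for illustration, and the year is an assumption because syslog timestamps omit it. The script only tabulates; reading the pattern as missing interrupts comes from the reply above, not from the script.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   Jul 20 15:26:39 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615c
XID_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"NVRM: Xid \((?P<dev>[0-9a-fA-F:]+)\):\s+(?P<code>\d+),\s*(?P<rest>.*)$"
)
COUNT_RE = re.compile(r"Count\s+([0-9a-fA-F]+)")


def parse_xid_lines(lines, year=2006):
    """Return one dict per NVRM Xid line; `year` is assumed, syslog omits it."""
    events = []
    for line in lines:
        m = XID_RE.match(line.strip())
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        count = COUNT_RE.search(m["rest"])
        events.append({
            "time": when,
            "code": int(m["code"]),
            # Count is printed in hexadecimal; Xid 8 lines have no Count field.
            "count": int(count.group(1), 16) if count else None,
        })
    return events


def xid16_steps(events):
    """(seconds elapsed, counter increase) between consecutive Xid 16 events."""
    xid16 = [e for e in events if e["code"] == 16 and e["count"] is not None]
    return [
        (int((b["time"] - a["time"]).total_seconds()), b["count"] - a["count"])
        for a, b in zip(xid16, xid16[1:])
    ]


SAMPLE = """\
Jul 20 15:26:39 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615c
Jul 20 15:26:44 server kernel: NVRM: Xid (0001:00): 8, Channel 00000000
Jul 20 15:26:47 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615d
Jul 20 15:26:55 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615e
Jul 20 15:27:03 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0005615f
Jul 20 15:27:11 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00056160
Jul 20 15:27:19 server kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00056161
"""

if __name__ == "__main__":
    for seconds, delta in xid16_steps(parse_xid_lines(SAMPLE.splitlines())):
        print(f"{seconds:3d} s later, Count +{delta}")
```

On the seven lines quoted above, each Xid 16 message arrives 8 seconds after the previous one and the counter goes up by exactly one each time.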
Ah yes, it appears to be sharing IRQ 16 with re0, my RealTek network interface. I've got loads of spare IRQs so I have no idea why it decided to share these.
I've had a look at the BIOS and I can't find a way to shift any of the IRQs. Is it possible to move one or the other with a device.hint maybe?
My guess is that the two share the same physical interrupt line, so you may need to move either card; please see your mainboard's manual for details. Please note that interrupt sharing isn't a problem as such, but it may be in this case.
That could be a problem as the network interface is built in and obviously there is only one AGP slot. The BIOS unfortunately does not allow for reassigning IRQs for the built in devices.
Luckily I have a spare network card; just a fast card rather than gigabit but I'll try and install it and disable the internal device and see if it makes any difference.
Just to reply to myself again: It looks like sharing the IRQ with the network interface was the problem. I installed a separate network card and disabled the onboard device in the BIOS and all seems to be well now. So far my system has not frozen all morning.
So the moral is: don't try to use an MSI K8T Neo2-F V2.0 motherboard with the NVIDIA driver as they don't like sharing the same IRQ as the network interface.
It's good to hear that avoiding interrupt sharing with the onboard NIC helped, but it points to a larger problem. Could you post the output of `dmesg` after booting with bootverbose and the onboard NIC enabled? I'd also be interested in the output of `mptable` with that configuration. There are no known interrupt sharing problems with the NVIDIA graphics driver, but it may be worthwhile to check if there are any with this particular NIC and/or its driver.
Okeydokey. Here are the two items you requested.
Let me know if you need anything else.
I know this is an older thread, but I have been getting the same kind of lockups on an MSI K8T-neo2, with a couple different NVIDIA cards, and it's trashing my AHA-29160UW scsi card during backups.
I'm running FC5 - 2.6.17-1.2174_FC5smp
ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 18 (level, low) -> IRQ 177
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-8774 Tue Aug 1 20:54:08 PDT 2006
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 193
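As a quick way to check whether lines like these show two PCI functions routed to the same GSI, here is a minimal Python sketch. It is an illustration only: the function names are invented, and it assumes the exact `ACPI: PCI Interrupt ... -> GSI n (...) -> IRQ m` wording seen in the excerpt above. It reports only what the kernel printed and does not identify which driver owns which PCI address.

```python
import re
from collections import defaultdict

# Matches kernel messages such as:
#   ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 18 (level, low) -> IRQ 177
ACPI_RE = re.compile(
    r"ACPI: PCI Interrupt (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"\[(?P<pin>[A-D])\] -> GSI (?P<gsi>\d+) \((?P<trigger>\w+), (?P<polarity>\w+)\)"
    r" -> IRQ (?P<irq>\d+)"
)


def gsi_map(lines):
    """Map GSI number -> list of PCI addresses routed to it."""
    by_gsi = defaultdict(list)
    for line in lines:
        m = ACPI_RE.search(line)
        if m:
            by_gsi[int(m["gsi"])].append(m["pci"])
    return dict(by_gsi)


def shared_gsis(by_gsi):
    """Keep only the GSIs that more than one PCI function is routed to."""
    return {gsi: devs for gsi, devs in by_gsi.items() if len(devs) > 1}


SAMPLE = """\
ACPI: PCI Interrupt 0000:00:07.0[A] -> GSI 18 (level, low) -> IRQ 177
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
NVRM: loading NVIDIA Linux x86 Kernel Module 1.0-8774 Tue Aug 1 20:54:08 PDT 2006
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 16 (level, low) -> IRQ 193
"""

if __name__ == "__main__":
    routing = gsi_map(SAMPLE.splitlines())
    print("routing:", routing)
    print("shared: ", shared_gsis(routing))
```

In this short excerpt the two PCI functions land on different GSIs (18 and 16), so the excerpt alone shows no sharing between them; a full dmesg would be needed to rule out other devices on the same lines.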
Last log entries before the "hard lock" (snippet):
Sep 17 01:26:52 kyzyl smartd[2150]: Device: /dev/sdb, Temperature changed 2 Celsius to 37 Celsius since last report
Sep 17 01:45:11 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673ee
Sep 17 01:45:13 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:45:19 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673ef
Sep 17 01:45:21 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 00000000
Sep 17 01:45:27 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f0
Sep 17 01:45:29 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:45:35 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f1
Sep 17 01:45:37 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:45:43 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f2
Sep 17 01:45:45 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:45:51 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f3
Sep 17 01:45:53 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:45:59 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f4
Sep 17 01:46:01 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
Sep 17 01:46:07 kyzyl kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 001673f5
Sep 17 01:46:09 kyzyl kernel: NVRM: Xid (0001:00): 8, Channel 0000001e
This one happened during a microlite backup, after about 3.8GB of data was transferred.
Anyway, I get the exact same lock, except that any write to the HDs causes big problems when it happens. Mouse and various screen functions still work, but BOINC stops, and Kmail complains that it can't read the mailboxes.
I also have a 6600GT and am getting the same error. I am running FC5 with kernel 2.6.17-1.2187_FC5smp. Latest drivers from ATRPMS. The motherboard is an Asus A8v Deluxe with a K8T800Pro.
Has there been any resolution to this?
Durandal
11-03-06, 12:59 PM
Hello,
I have a 6600GT and an Asus A8V Deluxe with FreeBSD 6.2 prerelease.
I also had problems with FreeBSD 6.1-RELEASE. The system freezes and I can't even move the mouse cursor. The sound makes a repeating noise and that's it. 3D works fine; the freeze can happen after anything from 10 minutes to 1 week of uptime.
Here a part of glxinfo:
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 6600 GT/PCI/SSE2/3DNOW!
OpenGL version string: 2.0.2 NVIDIA 87.76
Driver is 87.76, built from the ports (nvidia-driver).
Can anyone help or is anyone experiencing similar problems?
I don't have dumps because none are written, although I specified the parameters (dev="/dev/ar1s1b" and dumpdir="/var/crash").
Thx,
Durandal
AlienZoo
11-27-06, 04:41 PM
Just a "me too". 6600GT with FreeBSD 5.4. Asus A8V Deluxe and Athlon x2 processor.
I have no interrupt clashes that I can see and lockups appear to happen after heavy ethernet activity on onboard sk0 interface.
The same graphics card ran in the same model of motherboard (a different physical board with an older BIOS, if that matters) with a single-core AMD 64 for over a year, but also with older drivers, with no such problems. Now with a dual-core and an SMP kernel, lockups are daily.
Lockups happened with 8776 and now 9629.
(In fact, the machine does not completely lock up, and if I am lucky I can exit X with ctrl-alt-backspace. The jobs causing the heavy interrupt load do run to completion. But I cannot connect to the machine and cannot restart X after quitting; it needs a reboot. It complains about not receiving interrupts.)
If there's any information I can provide, just let me know.
Am about to try downgrading back to 8178 to see if it helps - clutching at straws!
AlienZoo
12-05-06, 02:39 PM
It seems to have been worth clutching at straws!
I downgraded nvidia-driver to 8178 (and had to downgrade nvidia settings to 1.05) and (fingers crossed) I haven't had the xid issue at all for over a week, when I was getting it pretty much daily.
Looks like some change in the later versions has caused this problem to occur.
cybasheep
08-09-07, 09:54 PM
Hate to resurrect an old thread, but I'm seeing essentially the same thing, but on somewhat newer hardware.
The system is an Athlon64 X2 4600+ on an ASRock AliveSATA2-GLAN Motherboard (VIA KT890CF chipset), a PCIe 7600GT and two TFTs hooked up to it via DVI.
The symptom is that X locks up (not very frequently, though) and one of the TFTs loses signal. The mouse pointer still moves, but X no longer responds to keyboard input or mouse clicks; I can only press the power button to have it shut down (it does shut down properly, however, and once X is killed as part of the shutdown sequence, the system console reappears on both screens). The lockups do not seem to be correlated with any particular activity or event.
I dual-boot Windows on this machine and I haven't seen a similar thing occur there yet.
Prior to the lockups, I get errors like these - in X.0.log:
(WW) NVIDIA(0): WAIT (0, 6, 0x8000, 0x000007e4, 0x000007e4)
(WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x000007e4, 0x00001e3c)
...and dmesg:
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db3
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db2
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db4
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db3
NVRM: Xid (0006:00): 8, Channel 00000000
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db5
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db4
NVRM: Xid (0006:00): 8, Channel 0000001e
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db6
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db5
NVRM: Xid (0006:00): 8, Channel 00000020
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db7
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db6
NVRM: Xid (0006:00): 8, Channel 00000020
NVRM: Xid (0006:00): 16, Head 00000000 Count 00194db8
NVRM: Xid (0006:00): 16, Head 00000001 Count 00194db7
The FreeBSD version is:
FreeBSD kiste 6.2-STABLE FreeBSD 6.2-STABLE #7:
Thu Aug 9 22:35:48 CEST 2007 root@kiste:/usr/obj/usr/src/sys/KISTE-SMP i386
The nvidia driver version is:
nvidia-driver-1.0.9746_5
IRQs on this system:
0: *timer*
1: atkbdc0 <Keyboard controller (i8042)>
atkbd0 <AT Keyboard>
atkbdc0 <Keyboard controller (i8042)>
atkbd0 <AT Keyboard>
atkbdc0 <Keyboard controller (i8042)>
atkbd0 <AT Keyboard>
2: N.A.
3: sio1
sio1
sio1
4: sio0
sio0 <16550A-compatible COM port>
sio0
sio0 <16550A-compatible COM port>
sio0
sio0 <16550A-compatible COM port>
5: free
6: fdc0 <floppy drive controller (FDE)>
fdc0 <floppy drive controller (FDE)>
fdc0 <floppy drive controller (FDE)>
7: ppc0 <ECP parallel printer port>
ppc0 <ECP parallel printer port>
ppc0 <ECP parallel printer port>
8: *rtc*
9: free
10: free
11: free
12: free
13: *npx*
14: free
15: free
16: pcm1 <Envy24 audio (Terratec DMX 6fire)>
pcm1 <Envy24 audio (Terratec DMX 6fire)>
pcm1 <Envy24 audio (Terratec DMX 6fire)>
17: pcm0 <VIA VT8251/8237A High Definition Audio Controller>
pcm0 <VIA VT8251/8237A High Definition Audio Controller>
pcm0 <VIA VT8251/8237A High Definition Audio Controller>
18: bktr0 <BrookTree 878>
bktr0 <BrookTree 878>
bktr0 <BrookTree 878>
19: free
20: uhci0 <VIA 83C572 USB controller>
uhci0 <VIA 83C572 USB controller>
uhci0 <VIA 83C572 USB controller>
21: atapci0 <VIA 8237A SATA150 controller>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
atapci0 <VIA 8237A SATA150 controller>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
atapci0 <VIA 8237A SATA150 controller>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
22: uhci1 <VIA 83C572 USB controller>
uhci1 <VIA 83C572 USB controller>
uhci1 <VIA 83C572 USB controller>
23: uhci3 <VIA 83C572 USB controller>
uhci3 <VIA 83C572 USB controller>
uhci3 <VIA 83C572 USB controller>
24: nvidia0 <GeForce 7600 GT>
nvidia0 <GeForce 7600 GT>
nvidia0 <GeForce 7600 GT>
25: free
26: free
27: pcib2 <ACPI PCI-PCI bridge>
pcib2 <ACPI PCI-PCI bridge>
pcib2 <ACPI PCI-PCI bridge>
28: free
29: free
30: free
31: pcib3 <ACPI PCI-PCI bridge>
pcib3 <ACPI PCI-PCI bridge>
pcib3 <ACPI PCI-PCI bridge>
32: free
33: free
34: free
35: pcib4 <ACPI PCI-PCI bridge>
pcib4 <ACPI PCI-PCI bridge>
pcib4 <ACPI PCI-PCI bridge>
36: re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
37: free
38: free
39: pcib5 <ACPI PCI-PCI bridge>
pcib5 <ACPI PCI-PCI bridge>
pcib5 <ACPI PCI-PCI bridge>
40: free
41: free
42: free
43: pcib6 <ACPI PCI-PCI bridge>
pcib6 <ACPI PCI-PCI bridge>
pcib6 <ACPI PCI-PCI bridge>
44: free
45: free
46: free
47: free
48: free
49: free
50: free
51: free
52: free
53: free
54: free
55: free
56: free
57: free
58: free
59: free
60: free
61: free
62: free
63: free
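A listing in this shape can be checked for sharing mechanically. Below is a minimal Python sketch, assuming the layout shown above: a line starting with `N:` opens an IRQ, following lines without a number belong to the same IRQ, and `free` and `N.A.` mark unused entries. The function names are invented for illustration, and repeated device lines are collapsed to one name.

```python
import re

IRQ_START = re.compile(r"^(\d+):\s*(.*)$")


def parse_irq_listing(text):
    """Map IRQ number -> list of distinct device names, in listing order."""
    table = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = IRQ_START.match(line)
        if m:
            # A numbered line opens a new IRQ; the rest of it is the first entry.
            current = int(m.group(1))
            table[current] = []
            entry = m.group(2)
        else:
            # An unnumbered line continues the most recently opened IRQ.
            entry = line
        if current is None or not entry or entry in ("free", "N.A."):
            continue
        name = entry.split()[0]          # e.g. "nvidia0" from "nvidia0 <GeForce ...>"
        if name not in table[current]:   # the listing repeats each device
            table[current].append(name)
    return table


def sharing_with(table, device):
    """Return (irq, other devices on that irq), or (None, []) if not listed."""
    for irq, names in table.items():
        if device in names:
            return irq, [n for n in names if n != device]
    return None, []


SAMPLE = """\
21: atapci0 <VIA 8237A SATA150 controller>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
atapci0 <VIA 8237A SATA150 controller>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
24: nvidia0 <GeForce 7600 GT>
nvidia0 <GeForce 7600 GT>
nvidia0 <GeForce 7600 GT>
25: free
36: re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
"""

if __name__ == "__main__":
    table = parse_irq_listing(SAMPLE)
    for dev in ("nvidia0", "re0", "atapci0"):
        irq, others = sharing_with(table, dev)
        print(dev, "-> IRQ", irq, "shared with", others or "nothing")
```

On the excerpt used here, nvidia0 sits alone on IRQ 24 and re0 alone on IRQ 36, while atapci0 shares IRQ 21 with uhci2 and ehci0, which matches a manual reading of the listing.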
AlienZoo
08-10-07, 06:02 AM
Hate to resurrect an old thread, but I'm seeing essentially the same thing, but on somewhat newer hardware.
Seems fair enough since nothing ever seemed to get resolved :(
I got minor relief by using 8178 drivers but it's not as good as I initially thought. I still get lock ups under *heavy, sustained* ethernet usage. AFAIK there are no shared interrupts in play. Luckily, I only seem to trigger the fault when backing up to another machine, so I just exit X before doing that.
I had no trouble with a single core processor and non-SMP kernel on otherwise identical hardware. If you are willing to sacrifice a core, you might try taking out SMP to see if it helps.
Haven't had a chance to try the nv driver since Xorg 7.1 came out to see if it works better than it did in the past.
cybasheep
08-10-07, 09:02 AM
I experimented a bit and it seems that there really is a big problem with interrupt delivery and IRQ sharing.
Disabling the ioapic (and SMP of course) results in some massive IRQ sharing, as expected:
hint.apic.0.disabled="1"
kern.smp.disabled="1"
0: *timer*
1: atkbdc0 <Keyboard controller (i8042)>
atkbd0 <AT Keyboard>
atkbdc0 <Keyboard controller (i8042)>
atkbd0 <AT Keyboard>
2: N.A.
3: uhci3 <VIA 83C572 USB controller>
sio1
uhci3 <VIA 83C572 USB controller>
sio1
4: sio0
sio0 <16550A-compatible COM port>
sio0
sio0 <16550A-compatible COM port>
5: bktr0 <BrookTree 878>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
bktr0 <BrookTree 878>
uhci2 <VIA 83C572 USB controller>
ehci0 <VIA VT6202 USB 2.0 controller>
6: fdc0 <floppy drive controller (FDE)>
fdc0 <floppy drive controller (FDE)>
7: ppc0 <ECP parallel printer port>
ppc0 <ECP parallel printer port>
8: *rtc*
9: free
10: pcib2 <ACPI PCI-PCI bridge>
nvidia0 <GeForce 7600 GT>
pcib3 <ACPI PCI-PCI bridge>
pcib4 <ACPI PCI-PCI bridge>
pcib5 <ACPI PCI-PCI bridge>
re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
pcib6 <ACPI PCI-PCI bridge>
uhci0 <VIA 83C572 USB controller>
pcm1 <Envy24 audio (Terratec DMX 6fire)>
pcib2 <ACPI PCI-PCI bridge>
nvidia0 <GeForce 7600 GT>
pcib3 <ACPI PCI-PCI bridge>
pcib4 <ACPI PCI-PCI bridge>
pcib5 <ACPI PCI-PCI bridge>
re0 <RealTek 8168/8111B PCIe Gigabit Ethernet>
pcib6 <ACPI PCI-PCI bridge>
uhci0 <VIA 83C572 USB controller>
pcm1 <Envy24 audio (Terratec DMX 6fire)>
11: atapci0 <VIA 8237A SATA150 controller>
uhci1 <VIA 83C572 USB controller>
pcm0 <VIA VT8251/8237A High Definition Audio Controller>
atapci0 <VIA 8237A SATA150 controller>
uhci1 <VIA 83C572 USB controller>
pcm0 <VIA VT8251/8237A High Definition Audio Controller>
12: free
13: *npx*
14: free
15: free
... and with that, it takes no longer than five minutes of running X before the problem appears. Normally I'd be quite happy to just blame PC hardware as usual, but I tried stressing some other devices sharing IRQ 10 for testing: playing music on pcm1 while continuously rsyncing data over re0 (which is a PCIe device as well) from a memory card in a reader attached to a port whose parent is uhci0, and none of them went haywire. Plus it-works-in-windows, so there is some hope left that this could actually be fixed by means other than shuffling PCI cards (or even motherboards) around.
AlienZoo
08-10-07, 09:32 AM
I tried stressing some other devices sharing IRQ 10 for testing: playing music on pcm1 while continuously rsyncing data over re0 (which is a PCIe device as well) from a memory card in a reader attached to a port whose parent is uhci0 and none of them went haywire, plus it-works-in-windows
Indeed. When the NVidia driver goes haywire my huge ethernet traffic continues just fine, so whatever the interrupt issue is, it doesn't appear to affect my ethernet. This has been true for two different ethernets, one on-board and one in a PCI slot, and I'm sure I re-ordered the cards so the PCI card was in a slot not shared with the NVidia (though I think the on-board wasn't either). Been too long since I looked...
And as you say "it works in windows" so it must surely be driver/FreeBSD related.
Still happy to provide NVidia with any info that would help...