Old 08-28-08, 10:57 PM   #1
NightOwl
Registered User
 
Join Date: Sep 2006
Location: Toronto, Canada
Posts: 23
177.70 fails on XPS M1730 with 2 x 8700M GPUs in SLi

The kernel module still fails, reporting that interrupts aren't being received. This failure occurs in every release of the 177. series so far. The 173. series drivers work fine with the Dell hardware.

I've tried booting the kernel with pci=noacpi, acpi=off (which doesn't work well anyway, as the SMP configuration requires ACPI), noapic, acpi=noirq and pci=biosirq; all result in the same problem with the NVIDIA module. I haven't tried irqpoll, as that really isn't a viable option.
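
When cycling through boot parameters like these, it is easy to lose track of which ones the running kernel was actually started with. A minimal sketch, assuming a Linux system where /proc/cmdline is readable; the function name and the parameter list are only illustrative (the list is simply the options named in this post):

```python
# IRQ-related boot parameters named in this thread (illustrative list).
IRQ_PARAMS = ["pci=noacpi", "acpi=off", "noapic", "acpi=noirq", "pci=biosirq", "irqpoll"]


def active_irq_params(cmdline):
    """Return the IRQ-related parameters present in a kernel command line string."""
    tokens = cmdline.split()
    return [p for p in IRQ_PARAMS if p in tokens]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    found = active_irq_params(cmdline)
    print("Active IRQ-related parameters:", ", ".join(found) if found else "none")
```

Running it after each reboot confirms that the bootloader really passed the option being tested.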

Is the Dell 8700M SLi configuration not supported?
Attached Files
File Type: log nvidia-bug-report.log (110.7 KB, 86 views)
File Type: log Xorg.0.log (26.7 KB, 93 views)
Old 09-11-08, 04:48 AM   #2
Nappers
Registered User
 
Join Date: Aug 2008
Posts: 11

Hi NightOwl, I have the same machine and while I'm still running the older 173.14.09 drivers (from the debian packages), I have had similar problems with IRQs for any kernel in the 2.6.25/26 series. In particular, I get something like the following in my kern.log...
Code:
Sep 11 11:14:51 localhost kernel: NVRM: loading NVIDIA UNIX x86 Kernel Module  173.14.09  Wed Jun  4 23:43:17 PDT 2008
Sep 11 11:14:53 localhost kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Sep 11 11:14:53 localhost kernel: Pid: 2573, comm: g15daemon Tainted: P          2.6.26.5 #1
Sep 11 11:14:53 localhost kernel:  [<c0158db4>] __report_bad_irq+0x24/0x90
Sep 11 11:14:53 localhost kernel:  [<f97e7149>] nv_kern_isr+0x59/0xb0 [nvidia]
Sep 11 11:14:53 localhost kernel:  [<c015908f>] note_interrupt+0x26f/0x2a0
Sep 11 11:14:53 localhost kernel:  [<c01584b8>] handle_IRQ_event+0x28/0x50
Sep 11 11:14:53 localhost kernel:  [<c01597eb>] handle_fasteoi_irq+0xab/0xd0
Sep 11 11:14:53 localhost kernel:  [<c0159740>] handle_fasteoi_irq+0x0/0xd0
Sep 11 11:14:53 localhost kernel:  [<c0106d70>] do_IRQ+0x80/0xd0
Sep 11 11:14:53 localhost kernel:  [<c012b96c>] irq_exit+0x3c/0x80
Sep 11 11:14:53 localhost kernel:  [<c01046f3>] common_interrupt+0x23/0x28
Sep 11 11:14:53 localhost kernel:  [<c031f2b9>] lock_kernel+0x29/0x40
Sep 11 11:14:53 localhost kernel:  [<c018bc45>] vfs_ioctl+0x65/0x90
Sep 11 11:14:53 localhost kernel:  [<c018bcd7>] do_vfs_ioctl+0x67/0x2d0
Sep 11 11:14:53 localhost kernel:  [<c012b9e8>] irq_enter+0x38/0x70
Sep 11 11:14:53 localhost kernel:  [<c018bf7d>] sys_ioctl+0x3d/0x70
Sep 11 11:14:53 localhost kernel:  [<c0103d01>] sysenter_past_esp+0x6a/0x91
Sep 11 11:14:53 localhost kernel:  =======================
Sep 11 11:14:53 localhost kernel: handlers:
Sep 11 11:14:53 localhost kernel: [<f97e70f0>] (nv_kern_isr+0x0/0xb0 [nvidia])
Sep 11 11:14:53 localhost kernel: Disabling IRQ #16
Sometimes X starts, and sometimes it fails with the "interrupts not being received" problem. When it does start, X seems a bit slow and there are 4-5 second pauses in compiz whenever the cube is rotated or the water effect occurs.
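
For anyone comparing logs, the two telltale lines in the trace above are "irq N: nobody cared" and "Disabling IRQ #N". A rough sketch of pulling those IRQ numbers out of a kernel log; the function name is made up for illustration, and the patterns are based only on the message wording shown in this post:

```python
import re

# Patterns taken from the message wording in the log excerpt above.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def scan_irq_problems(lines):
    """Return (unhandled, disabled): sets of IRQ numbers the kernel complained about."""
    unhandled, disabled = set(), set()
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            unhandled.add(int(m.group(1)))
        m = DISABLED.search(line)
        if m:
            disabled.add(int(m.group(1)))
    return unhandled, disabled


# Two lines copied from the excerpt above.
sample = [
    'Sep 11 11:14:53 localhost kernel: irq 16: nobody cared (try booting with the "irqpoll" option)',
    "Sep 11 11:14:53 localhost kernel: Disabling IRQ #16",
]
print(scan_irq_problems(sample))  # ({16}, {16})
```

To scan a real log, pass an open file instead of the sample list, e.g. `scan_irq_problems(open("/var/log/kern.log"))`; that path is an assumption about where the distribution writes kernel messages.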

With the 2.6.24 series however, everything works nicely without any special boot parameters. I did try the 177.13 binary drivers (before I switched to using the debian packages) a while ago and the same thing happens, if I recall correctly.

So, maybe if you try a 2.6.24 kernel, the 177.70 drivers *may* work for you? (if you're lucky...) :-)
Attached Files
File Type: log kern.log (112.0 KB, 88 views)
Old 09-11-08, 07:34 AM   #3
mikedl
Registered User
 
Join Date: Oct 2004
Posts: 6

I have the same laptop with the same SLI GPUs but get different results (OpenSUSE 11.0):

- 177.13 works OK but very slowly in KDE 4
- 177.68 works well (faster in KDE 4) but takes ~20secs to start X, and hangs when X is terminated
- 177.70 never works because it does not receive interrupts from the NVIDIA device at PCI:3.0.0

This morning, a new OpenSUSE kernel came out with interrupt handling changes, but it did not change these results. For me, something changed after 177.13, and not for the better...
Old 09-11-08, 10:50 AM   #4
NightOwl
Registered User
 
Join Date: Sep 2006
Location: Toronto, Canada
Posts: 23

The IRQ 16 interrupt is not so much of a problem I have found - I think that is actually the PhysX componentry that isn't supported yet (or ever) in Linux - it gets disabled because there is nothing to service the interrupt.

The interrupt detection is certainly flakier with the NVidia driver in the later kernels. When the system seems unresponsive in X, you may find that the NVidia interrupts have gone crazy and you will need to reboot. I've only had that happen a few times. A good utility to have is powertop (an Intel power monitoring program). It tells you about interrupts and how many are being generated. That's how I could tell it was an NVidia problem - it was generating 97% of the interrupts.

I've tried other things as well. I've enabled MSI in the kernel, as the M1730 has some MSI-capable devices. It doesn't seem to have helped, but it does move some of the devices so they no longer share an interrupt line.

Code:
           CPU0       CPU1       
  0:    2215984    2284775   IO-APIC-edge      timer
  1:          5          5   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          2   IO-APIC-fasteoi   acpi
 12:         68         68   IO-APIC-edge      i8042
 14:      34780      33890   IO-APIC-edge      ata_piix
 15:          0          0   IO-APIC-edge      ata_piix
 16:      50098      49903   IO-APIC-fasteoi   nvidia
 17:       3454       3080   IO-APIC-fasteoi   nvidia
 18:          0          0   IO-APIC-fasteoi   mmc0
 19:          1          1   IO-APIC-fasteoi   ohci1394
 20:     347267     309814   IO-APIC-fasteoi   uhci_hcd:usb2, ehci_hcd:usb4, uhci_hcd:usb5
 21:      49550      44988   IO-APIC-fasteoi   uhci_hcd:usb3, HDA Intel, uhci_hcd:usb6
 22:          3          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
215:     164491     153559   PCI-MSI-edge      eth0
216:      47531      43970   PCI-MSI-edge      iwl4965
217:      66465      63122   PCI-MSI-edge      ahci
NMI:          0          0   Non-maskable interrupts
LOC:    2078656    2328864   Local timer interrupts
RES:    1004541     862813   Rescheduling interrupts
CAL:       1826       5533   function call interrupts
TLB:      42096      45506   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0
I hope NVidia deals with this problem. I haven't had any luck with any of the 177. releases, whereas 173. at least still mostly works.
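
A table like the /proc/interrupts dump above can also be summed up per device without powertop. This is only a sketch, assuming the column layout shown in this post (IRQ number, one count per CPU, controller type, then device names); other kernel versions may add columns. The function names are made up for illustration, and the sample rows are copied from the table above.

```python
def parse_interrupts(text):
    """Map each numbered IRQ to (total count across CPUs, device description)."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and the other summary rows
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        # fields[ncpus] is the controller type; everything after it names the devices
        device = " ".join(fields[ncpus + 1:])
        table[int(label)] = (sum(counts), device)
    return table


def share_by_device(table, name):
    """Fraction of all numbered-IRQ interrupts whose device description contains name."""
    total = sum(count for count, _ in table.values())
    matched = sum(count for count, dev in table.values() if name in dev)
    return matched / total if total else 0.0


SAMPLE = """
           CPU0       CPU1
 16:      50098      49903   IO-APIC-fasteoi   nvidia
 17:       3454       3080   IO-APIC-fasteoi   nvidia
 20:     347267     309814   IO-APIC-fasteoi   uhci_hcd:usb2, ehci_hcd:usb4, uhci_hcd:usb5
215:     164491     153559   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
"""

table = parse_interrupts(SAMPLE)
print(f"nvidia share of numbered IRQs: {share_by_device(table, 'nvidia'):.1%}")
```

Note that these counters are cumulative since boot, so the result is not comparable to the live percentage powertop reports. Taking two snapshots a few seconds apart and subtracting the counts would be closer to what powertop shows.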
Old 09-12-08, 04:56 AM   #5
Nappers
Registered User
 
Join Date: Aug 2008
Posts: 11

Quote:
The IRQ 16 interrupt is not so much of a problem I have found - I think that is actually the PhysX componentry that isn't supported yet (or ever) in Linux - it gets disabled because there is nothing to service the interrupt.
Really? When I'm using the 2.6.24.7 kernel, lspci -vv shows that pin A of the first GPU is routed to IRQ 16 and pin A of the second GPU is routed to IRQ 17. The PhysX componentry seems to be listed under
Code:
0f:00.0 Class ff00: AGEIA Technologies, Inc. Device 0000
with its pin A routed to IRQ 7 (though it doesn't show up in /proc/interrupts).
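
To compare routing between kernels, the pin-to-IRQ mapping can be pulled out of saved lspci -vv output. A rough sketch, assuming the usual "Interrupt: pin X routed to IRQ N" wording; the function name is made up, and the sample text is a hand-written stand-in rather than real output from this machine. Only the AGEIA line is copied from this post; the two GPU addresses and descriptions are placeholders.

```python
import re

DEVICE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
ROUTE = re.compile(r"Interrupt: pin ([A-D]) routed to IRQ (\d+)")


def irq_routing(lspci_text):
    """Return {pci_address: (pin, irq)} for devices that report a routed interrupt."""
    routes, current = {}, None
    for line in lspci_text.splitlines():
        m = DEVICE.match(line)
        if m:
            current = m.group(1)  # a new device section starts here
            continue
        m = ROUTE.search(line)
        if m and current:
            routes[current] = (m.group(1), int(m.group(2)))
    return routes


# Hand-written stand-in for `lspci -vv` output (GPU lines are placeholders).
SAMPLE = (
    "03:00.0 VGA compatible controller: (first GPU, placeholder)\n"
    "\tInterrupt: pin A routed to IRQ 16\n"
    "04:00.0 VGA compatible controller: (second GPU, placeholder)\n"
    "\tInterrupt: pin A routed to IRQ 17\n"
    "0f:00.0 Class ff00: AGEIA Technologies, Inc. Device 0000\n"
    "\tInterrupt: pin A routed to IRQ 7\n"
)

for addr, (pin, irq) in irq_routing(SAMPLE).items():
    print(f"{addr}: pin {pin} -> IRQ {irq}")
```

Saving the output under a working kernel and a failing one, then diffing the two dictionaries, would show whether the routing itself changes between them.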

Thanks for the tip about powertop. :-) I also have CONFIG_PCI_MSI=y in my kernels, but I wasn't sure if it was doing anything useful.
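
For checking whether an option such as CONFIG_PCI_MSI is set in a given kernel config, something like the following could do. It is only a sketch: the function name is made up, and where the config file lives (for example /boot/config-<version>) depends on the distribution, so the path is left to the caller.

```python
def config_enabled(config_text, option):
    """True if option is set to y (built in) or m (module) in kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments include the "# CONFIG_X is not set" form
        key, _, value = line.partition("=")
        if key == option:
            return value in ("y", "m")
    return False


example = "CONFIG_PCI=y\nCONFIG_PCI_MSI=y\n# CONFIG_PCI_DEBUG is not set\n"
print(config_enabled(example, "CONFIG_PCI_MSI"))    # True
print(config_enabled(example, "CONFIG_PCI_DEBUG"))  # False
```

Having the option enabled only means the kernel can use MSI. Whether a device actually uses it shows up in /proc/interrupts as a PCI-MSI line, as with eth0, iwl4965 and ahci in the table earlier in the thread.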