Old 03-16-10, 12:39 PM   #1
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Strange problem after hardware upgrade

I recently upgraded my hardware, and amongst other things moved from a 7600 to a 9800 nvidia graphics card. I am using a Gigabyte P55-UD5 motherboard with kernel 2.6.33.1 (custom built) and the latest 195.36.15 nvidia driver, on a mainly Fedora 12 system.

I mention these details even though I don't think they are all relevant!

If I start the X Window System (runlevel 5) with GNOME (which I always use), then after a while the system becomes unusably slow. If I manage to 'escape' X fast enough to a console screen, I cannot type anything properly, as every character I enter appears twice!

If left to its own devices, it may take up to an hour for the system to slow down. However, if firefox is used this time is reduced to about 10 minutes.

I say the change of graphics card may not be relevant because when I put the old card back, the problem was worse!

I also had the same problem after the upgrade with the 190 series driver, which was the one I had been using on the pre-upgrade system. I had been using the same kernel as well (2.6.32.7 at the time); the only change was to support the different network hardware.

As things stand, the system is unusable, unless I only use the console (so it is fine in its role as a server, but not as a desktop machine).

What has happened? If it was caused by some upgrade which I installed via yum, surely there would be others reporting terrible problems over the past few days!
Old 03-17-10, 07:03 AM   #2
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

More info.

The system does not just slow down in a 'conventional' manner, by which I mean that things just run slowly. The mouse cursor becomes very unresponsive as well. I think that somehow the graphics card itself is getting 'clogged up'. Could this be related to the other thread here about memory leaks?

http://www.nvnews.net/vbulletin/showthread.php?t=143220

The symptoms do not seem the same though.
Old 03-17-10, 01:45 PM   #3
Dizzle7677
Registered User
 
Join Date: May 2008
Location: Relativity
Posts: 194
Default Re: Strange problem after hardware upgrade

You could start by looking for where the problem obviously lies using dmesg, lsmod and top/system monitor. I'd think the problem is in whatever kernel settings you built with and has little to do with the driver, so start there first, and maybe look into motherboard (if it's new too) hardware/BIOS quirks along with it. And since you're using NVIDIA hardware you won't need to build the vgaarb or DRM (Direct Rendering Manager) kernel parts. Just guessing...
Old 03-17-10, 02:09 PM   #4
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

Thanks for the suggestions. I have already tried using top and 'ps uax', but neither shows unusually high resource usage. That is why I am guessing that the problem lies in the card's memory getting full. However, it may be an interrupt problem or something else, possibly some kernel settings left over from my old Intel board.
Old 03-17-10, 03:29 PM   #5
zeb
Registered User
 
Join Date: Apr 2003
Posts: 52
Default Re: Strange problem after hardware upgrade

Does it occur with a live CD of another distro?
Old 03-17-10, 03:37 PM   #6
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

Quote:
Originally Posted by zeb
Does it occur with a live CD of another distro?
I don't know. I don't have one either. My system has been upgraded over the internet since about Fedora 9 and has been through other hardware upgrades in the past. This is the first time I have had such a strange problem.
Old 03-17-10, 05:50 PM   #7
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

I have done some more work. Firstly, here is the /var/log/messages file segment from the point where the problem happened:

Mar 18 00:32:32 linux kernel: ------------[ cut here ]------------
Mar 18 00:32:32 linux kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1b3/0x1c0()
Mar 18 00:32:32 linux kernel: Hardware name: P55-UD5
Mar 18 00:32:32 linux kernel: NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Mar 18 00:32:32 linux kernel: ---[ end trace e8bfe80e2f2cdec8 ]---
Mar 18 00:32:32 linux kernel: r8169: eth1: link up
Mar 18 00:32:44 linux kernel: r8169: eth1: link up
Mar 18 00:32:56 linux kernel: r8169: eth1: link up
Mar 18 00:33:08 linux kernel: r8169: eth1: link up
Mar 18 00:33:20 linux kernel: r8169: eth1: link up
Mar 18 00:33:38 linux kernel: r8169: eth1: link up
Mar 18 00:33:45 linux pppd[15941]: No response to 3 echo-requests
Mar 18 00:33:45 linux pppd[15941]: Serial link appears to be disconnected.
Mar 18 00:33:45 linux pppd[15941]: Connect time 22.4 minutes.
Mar 18 00:33:45 linux pppd[15941]: Sent 1661677 bytes, received 9704151 bytes.

Next, the machine itself does not seem to actually slow down, but the keyboard and mouse seem to become very unresponsive indeed. In addition, the network seems to be affected. I am using the r8169 driver for a Realtek 8111d network controller.
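
For anyone who wants to pull the same symptoms out of a long /var/log/messages, here is a minimal Python sketch. It assumes the message wording seen in the excerpt above ("NETDEV WATCHDOG: ... transmit queue N timed out" and "<driver>: <iface>: link up"); the function name summarize and the shortened sample text are made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns follow the wording of the log lines quoted in this thread.
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit queue (\d+) timed out"
)
LINK_UP = re.compile(r"(\w+): (\w+): link up")


def summarize(log_text):
    """Return (timeouts, link_ups) found in syslog-style text.

    timeouts is a list of (interface, driver, queue) tuples;
    link_ups counts "link up" messages per interface.
    """
    timeouts = []
    link_ups = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG.search(line)
        if match:
            timeouts.append((match.group(1), match.group(2), int(match.group(3))))
            continue
        match = LINK_UP.search(line)
        if match:
            link_ups[match.group(2)] += 1
    return timeouts, link_ups


# Shortened sample based on the excerpt above.
SAMPLE_LOG = """\
Mar 18 00:32:32 linux kernel: NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
Mar 18 00:32:32 linux kernel: r8169: eth1: link up
Mar 18 00:32:44 linux kernel: r8169: eth1: link up
Mar 18 00:32:56 linux kernel: r8169: eth1: link up
"""

timeouts, link_ups = summarize(SAMPLE_LOG)
print(timeouts)
print(dict(link_ups))
```

A watchdog timeout followed by "link up" repeating every few seconds, as here, suggests the interface is being reset over and over rather than recovering.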

/proc/interrupts puts the nvidia card, eth1 and the USB port with the keyboard and mouse onto interrupt 12. Could this be the source of the problem - an interrupt clash? If so, can it be fixed up somehow?

Last edited by bengershon; 03-17-10 at 06:03 PM. Reason: added more information
Old 03-17-10, 06:52 PM   #8
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

For the time being, I have moved the keyboard and mouse to another USB port. This will eliminate the unresponsiveness of the system, but I fear that the network will still get disconnected, as I have seen reported elsewhere. I now have for /proc/interrupts:

Code:
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:        126          0          0          2          0          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          2          0          0          0          0   IO-APIC-edge      i8042
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 16:          0          0          0          0       2200          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb9, eth1, nvidia
 17:          0          0          0          0          0          7          0          0   IO-APIC-fasteoi   firewire_ohci
 18:          0          0          0          0          0        387         19          0   IO-APIC-fasteoi   ahci, ahci, ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb8
 19:          0      10869          0          0          0          0       7671          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb7
 21:          0          0       1534          0          0          0          0        155   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0          0          0          0          0          0          0        963   IO-APIC-fasteoi   hda_intel
 23:          0          0          0          0          0          0          2          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:     333666     333754     332873     329752     333355     333263     332586     332858   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:        366        709        249        353        277        743        522        393   Rescheduling interrupts
CAL:        412        354        303        424        407        319        373        418   Function call interrupts
TLB:        255        288        217        218        862        688        717        964   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3          3          3          3          3   Machine check polls
ERR:          7
MIS:          0
Is there any way to eliminate the problematic interrupt sharing?
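
To spot shared lines without reading the table by eye, a minimal Python sketch along these lines may help. It assumes the 2.6-era /proc/interrupts layout shown above (one count column per CPU, then a single controller/type token such as IO-APIC-fasteoi, then a comma-separated device list); newer kernels format these rows differently. The function name shared_irqs and the two-CPU sample are made up for illustration.

```python
def shared_irqs(text):
    """Return {irq_number: [device, ...]} for IRQ lines with more than one device."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        # ncpu counters, one type token, then the rest is the device list
        fields = rest.split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue  # no device names on this line
        devices = [d.strip() for d in fields[ncpu + 1].split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


# Two-CPU sample in the same layout as the table above.
SAMPLE = """\
           CPU0       CPU1
  0:        126          0   IO-APIC-edge      timer
 16:          0       2200   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb9, eth1, nvidia
 19:      10869       7671   IO-APIC-fasteoi   ahci, uhci_hcd:usb7
NMI:          0          0   Non-maskable interrupts
ERR:          7
"""

for irq, devices in sorted(shared_irqs(SAMPLE).items()):
    print(irq, devices)
```

On a real system the text would come from reading /proc/interrupts instead of the sample string. Sharing alone is not an error; it only becomes interesting when one of the devices on the line misbehaves.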

Old 03-17-10, 07:09 PM   #9
conky
Registered User
 
Join Date: Nov 2007
Posts: 70
Default Re: Strange problem after hardware upgrade

Try enabling MSI on the components that support it. MSI works pretty well on the Intel chipsets. Add NVreg_EnableMSI=1 to the nvidia kernel module options and enable_msi=1 to snd-hda-intel. The kernel should enable MSI by default on the ahci controllers and ethernet cards that are known to support it, and the Intel southbridge AHCI controllers should support it. You might not have MSI enabled in your kernel config.
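
As a rough way to check the last point, here is a minimal Python sketch that looks for an option in kernel .config-style text. CONFIG_PCI_MSI is the kernel's option for PCI MSI support; the function name config_enabled and the sample strings are made up for illustration, and where the config text lives (the .config in a custom kernel source tree, or /boot/config-<version> on many distros) depends on the setup.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to y or m in kernel .config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == option:
            return value in ("y", "m")
    return False


# Hypothetical config fragments for illustration.
WITH_MSI = "CONFIG_PCI=y\nCONFIG_PCI_MSI=y\n"
WITHOUT_MSI = "CONFIG_PCI=y\n# CONFIG_PCI_MSI is not set\n"

print(config_enabled(WITH_MSI, "CONFIG_PCI_MSI"))
print(config_enabled(WITHOUT_MSI, "CONFIG_PCI_MSI"))
```

If the option turns out to be unset, the module options above cannot take effect until the kernel is rebuilt with MSI support.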
Old 03-17-10, 07:44 PM   #10
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

Thanks for the MSI suggestion. So far all I have done is build a kernel with MSI enabled, and the interrupt picture is already much improved. The eth devices are now on their own lines, but nvidia still shares an interrupt with some USB devices. There seem to be many spare interrupts available, so I don't quite understand why they are not being used.
Old 03-18-10, 02:28 AM   #11
zeb
Registered User
 
Join Date: Apr 2003
Posts: 52
Default Re: Strange problem after hardware upgrade

Someone has reported system freezes after upgrading from Fedora 10 to Fedora 12: http://www.nvnews.net/vbulletin/showthread.php?t=149056

Could your problem be similar?
Old 03-18-10, 04:42 AM   #12
bengershon
Registered User
 
Join Date: Nov 2004
Posts: 15
Default Re: Strange problem after hardware upgrade

Quote:
Originally Posted by zeb
Someone has reported system freezes after upgrading from Fedora 10 to Fedora 12: http://www.nvnews.net/vbulletin/showthread.php?t=149056

Could your problem be similar?
I don't know. My problem was clearly a case of interrupts clashing. Once that was discovered all that was required was to find a solution.

Here is the new /proc/interrupts display (with MSI enabled). What a difference that made!

Code:
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:        125          0          0        116          0          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          1          0          0          0          0   IO-APIC-edge      i8042
  9:          0          0          0          2          0          0          0          0   IO-APIC-fasteoi   acpi
 16:       8312          0      29783          0          0          0          0        195   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb9, nvidia
 17:          0        286         21          0          0          0          0        236   IO-APIC-fasteoi   firewire_ohci
 18:        408         63         33          0          0          0          0          0   IO-APIC-fasteoi   ahci, ahci, ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb8
 19:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb7
 21:       1006          0          0          0          0      12687       8671       6399   IO-APIC-fasteoi   uhci_hcd:usb4
 23:          0          0          0          0          0          2          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
 24:   34694612          0          0          0          0          0          0          0  HPET_MSI-edge      hpet2
 25:          0   34710997          0          0          0          0          0          0  HPET_MSI-edge      hpet3
 26:          0          0   34707210          0          0          0          0          0  HPET_MSI-edge      hpet4
 27:          0          0          0   34679288          0          0          0          0  HPET_MSI-edge      hpet5
 28:          0          0          0          0   34687757          0          0          0  HPET_MSI-edge      hpet6
 34:          0          0      33564       7785          0      39422     184157          0   PCI-MSI-edge      ahci
 35:          0          0          0          0        431     156119          0          0   PCI-MSI-edge      eth1
 36:          0          0          0          0      20938      18402          0          0   PCI-MSI-edge    
 37:       2427         63          0          0          0          0       1103        449   PCI-MSI-edge      hda_intel
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:        545        585        495        407        317   34685840   34708754   34684188   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:      18556      21393      11385     114236       3908       3158       3820       2361   Rescheduling interrupts
CAL:        461        387        514        399        466        502        479        432   Function call interrupts
TLB:       1162       1382       1963        897       3200       3602       3851       3048   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        117        117        117        117        117        117        117        117   Machine check polls
ERR:          7
MIS:          0
I have to say that the system has now been working reliably for 9 hours and I am quite sure that the problem has been solved. Many thanks to all those who offered suggestions and advice - it is most appreciated.

Last edited by bengershon; 03-18-10 at 05:20 AM. Reason: added more information