nV News Forums

 
 

Forum: NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
Thread: The strangest problem in the world. (http://www.nvnews.net/vbulletin/showthread.php?t=48180)

Madera 03-24-05 02:02 PM

The strangest problem in the world.
 
I have a four-headed system with four Riva TNT2 64MB cards. My configuration is correct, and I know a bit about multi-head systems.

The problem is that the system sometimes stops responding (Num Lock doesn't toggle, and so on), but (brace yourselves) there is a way to bring it back and make it work normally!!

What is that magical way??

To ping the system!! :confused: :confused:

I'm really amazed by this problem, and I think it could earn a Guinness World Record for the strangest problem involving NVIDIA cards.

We suspect a driver problem, but we welcome any clues about this mysterious behavior.

Thanks to anyone who can give us clues as to what on earth ping could have to do with the system stopping completely. The system only keeps running while a remote machine is pinging it; if you stop the ping on the remote machine, the system stops too... this is just plain strange.
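Until the cause is found, the workaround described here amounts to keeping traffic flowing to the affected machine from a remote host. A minimal Python sketch of that, assuming that any frame arriving at the NIC is enough to wake the system: it sends small UDP datagrams rather than ICMP echo requests (ping needs a raw socket and usually root), so it is a swapped-in technique, not exactly what was done in the thread. The function name `keep_nic_busy`, port 9 (the discard port) and the one-second interval are illustrative assumptions.

```python
import socket
import time


def keep_nic_busy(host, port=9, interval=1.0, count=None):
    """Send a tiny UDP datagram to host:port every `interval` seconds.

    Assumption (not verified in the thread): every received frame makes the
    target's NIC raise an interrupt, which is what pinging appeared to provide.
    count=None means run until interrupted. Returns the number of datagrams sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while count is None or sent < count:
            # An unconnected UDP socket does not fail if the port is closed.
            sock.sendto(b"keepalive", (host, port))
            sent += 1
            time.sleep(interval)
    return sent
```

Run it on the remote machine with the affected box's address, e.g. `keep_nic_busy("192.168.0.10")` (a placeholder address), and stop it with Ctrl+C; a plain `ping <host>` from the shell does the same job.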

Thanks for your attention,
Rodrigo

leonardh 03-24-05 02:29 PM

Re: The strangest problem in the world.
 
Are you sure it's not NIC-related? I've had the opposite happen (the system locks up when someone pings it) when a NIC was on the verge of failing.

Madera 03-24-05 02:34 PM

Re: The strangest problem in the world.
 
Thanks for your reply.

We are sure it's not a NIC problem, because with kernel 2.4.25 and the 5xxx NVIDIA driver everything worked correctly.

We are struggling to figure this out right now; we've never seen anything so awkward.

Our NIC is an onboard VIA-Rhine, driven by the via-rhine module of kernel 2.6.10.

The key question is: how on earth does a simple ping from a remote machine get the machine out of its lockup? What could the system be doing while it answers the ping?

Thanks for your input,
Rodrigo

silentplummet 03-24-05 05:09 PM

Re: The strangest problem in the world.
 
cat /proc/interrupts

Madera 03-24-05 05:52 PM

Re: The strangest problem in the world.
 
[root@naugthysystem root]# cat /proc/interrupts
CPU0
0: 187504 XT-PIC timer
1: 481 XT-PIC i8042
2: 0 XT-PIC cascade
7: 0 XT-PIC uhci_hcd, nvidia
8: 1 XT-PIC rtc
9: 0 XT-PIC acpi
10: 0 XT-PIC ehci_hcd
11: 4141 XT-PIC uhci_hcd, uhci_hcd, eth0, nvidia
12: 2054 XT-PIC i8042
14: 8465 XT-PIC ide0
15: 23 XT-PIC ide1
NMI: 0
ERR: 0


Thanks for pointing that out. I've been working on this computer for hours now, and I have some more information that may help.

About the ping: as the interrupt list shows, the nvidia card and the NIC (eth0) are sharing IRQ11. That explains why the video driver recovers when we ping the NIC: the NIC fires IRQ11, the nvidia driver does something strange, and everything starts working again.
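For anyone checking their own machine for the same overlap, a small Python sketch can list the IRQ lines that more than one driver has registered on. It assumes the 2.6-era XT-PIC layout shown above (IRQ number, per-CPU counters, one controller name, then a comma-separated device list); newer kernels print extra controller columns, which this sketch does not handle. The function name `shared_irqs` is illustrative, and `SAMPLE` is simply the output posted above.

```python
# Minimal sketch: find IRQ lines shared by several drivers in /proc/interrupts.
# Assumes the XT-PIC-era layout: "IRQ: <per-CPU counts> <controller> <dev>, <dev>".

SAMPLE = """CPU0
0: 187504 XT-PIC timer
1: 481 XT-PIC i8042
2: 0 XT-PIC cascade
7: 0 XT-PIC uhci_hcd, nvidia
8: 1 XT-PIC rtc
9: 0 XT-PIC acpi
10: 0 XT-PIC ehci_hcd
11: 4141 XT-PIC uhci_hcd, uhci_hcd, eth0, nvidia
12: 2054 XT-PIC i8042
14: 8465 XT-PIC ide0
15: 23 XT-PIC ide1
NMI: 0
ERR: 0
"""


def shared_irqs(text):
    """Return {irq_number: [device, ...]} for every IRQ with more than one handler."""
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        # Skip the CPU header (no colon) and the NMI/ERR summary rows.
        if not sep or not label.strip().isdigit():
            continue
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        # fields[i] is the controller name (e.g. XT-PIC); the rest is the device list.
        devices = [d.strip() for d in " ".join(fields[i + 1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared
```

Fed the output above, it reports IRQ 7 (uhci_hcd, nvidia) and IRQ 11 (uhci_hcd, uhci_hcd, eth0, nvidia); on a live system you would pass in the contents of /proc/interrupts instead.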

Also, I didn't mention that the errors I'm getting are:

NVRM: Xid: 8, Channel 00000001
NVRM: Xid: 16, Head 00000000 Count 00000004

Or variations along those lines. So the problem is becoming clearer, but there's still no solution.

What could the NVIDIA driver be doing in response to this IRQ11 raised by the NIC? Maybe the interrupt makes it abort something it was stuck on, which lets the system become stable again.

Thanks for your attention, and I hope we find a solution for all of us.

Best wishes,
Rodrigo Madera

Madera 03-25-05 12:17 PM

Re: The strangest problem in the world.
 
An update on my investigation, in case anyone out there is having the same problem.

As cat /proc/interrupts showed, IRQ11 is shared by uhci_hcd, eth0 and our infamous nvidia card.

As I pointed out, when I turn on my multi-head system it locks up every few seconds until the network card receives some traffic. This shows that the system only becomes stable AFTER THE NETWORK CARD IS USED, in other words, when the network card raises IRQ11.

I don't know much about Linux USB, but since uhci_hcd is also on IRQ11, I tried to make it raise IRQ11 by inserting a USB memory key, to no avail.

Now, after supreme hours of supreme suffering, I discovered that a good ol' "service network restart" (Fedora Core 3, btw) makes the network card raise its IRQ11, and then the system works.

So what we can see from all this is:

1) The nvidia driver is waiting for some kind of event on IRQ11 and aborts some kind of loop once it receives an IRQ11, even one generated by a device that isn't its own (the NIC).

2) We can make the NIC raise IRQ11 and bring the nvidia driver back to the real world.

3) This is not such a strange problem after all... there goes my Guinness!! :D
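The mechanism behind points 1 and 2 can be modelled in a few lines. On a shared line the kernel cannot tell which device pulled it, so it calls every handler registered on that IRQ, and each handler checks its own hardware and reports whether the interrupt was its own (IRQ_HANDLED or IRQ_NONE in 2.6 kernels). The Python below is a toy simulation, not kernel or driver code: the `Device` and `SharedIrqLine` classes and the "interrupt that never arrived" scenario are hypothetical, chosen only to show how a NIC interrupt could end up running the nvidia handler and clearing work it had been stuck on.

```python
# Toy simulation of a shared IRQ line (hypothetical model, not kernel code).
# Every handler runs on every interrupt and must decide whether it was for it.


class Device:
    def __init__(self, name):
        self.name = name
        self.pending = False  # device has work waiting for its handler
        self.serviced = 0     # times the handler actually found work

    def handler(self):
        """True if this device had work (like IRQ_HANDLED), else False (like IRQ_NONE)."""
        if not self.pending:
            return False
        self.pending = False
        self.serviced += 1
        return True


class SharedIrqLine:
    def __init__(self, devices):
        self.devices = devices

    def fire(self):
        """Run every registered handler, as the kernel does for a shared IRQ."""
        handled_by = []
        for dev in self.devices:
            if dev.handler():
                handled_by.append(dev.name)
        return handled_by


# Hypothetical scenario: the GPU has work pending but its own interrupt never
# arrived, so nothing runs its handler until another device on IRQ 11 fires.
gpu = Device("nvidia")
nic = Device("eth0")
irq11 = SharedIrqLine([Device("uhci_hcd"), nic, gpu])

gpu.pending = True   # stuck: waiting for a handler call that never comes
nic.pending = True   # a ping arrives at the NIC
print(irq11.fire())  # -> ['eth0', 'nvidia']
```

If the driver really was waiting on a missed interrupt, this would match the behaviour described in the thread; it remains a hypothesis, not a diagnosis.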

Hugs,
Rodrigo

Madera 04-05-05 01:28 PM

Re: The strangest problem in the world.
 
Well, I have continued my research and found some interesting things.

By now, I can fix the problem whenever I can make an interrupt fire that the nvidia driver intercepts. That is, if I fire an interrupt on the line one of my nvidia cards listens on (IRQ11), which my Ethernet adapter also uses, the nvidia driver works.

So one solution would be to modify my NVIDIA-6111 driver to run its ISR (interrupt service routine) periodically from a timer, with the parameters set to something harmless so that the extra call is otherwise ignored.
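The timer idea can be modelled in user space before touching the driver. In a Linux kernel module the equivalent would be a `struct timer_list` whose callback calls the handler and re-arms itself with `mod_timer()`; the Python sketch below only models that polling loop with a background thread. `IsrPoller`, `fake_isr` and the polling interval are hypothetical, and nothing here calls into the nvidia driver or its source portion.

```python
import threading


class IsrPoller:
    """Call `isr` every `interval` seconds until stop() is called (user-space model)."""

    def __init__(self, isr, interval):
        self._isr = isr
        self._interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout (time to poll) and True once stop() sets the event.
        while not self._stopped.wait(self._interval):
            self._isr()  # the handler must treat a call with no pending work as a no-op

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()


# Example: a harmless handler that only does work when something is pending.
pending = {"work": True}
serviced = threading.Event()


def fake_isr():
    if pending["work"]:
        pending["work"] = False
        serviced.set()


poller = IsrPoller(fake_isr, interval=0.05)
poller.start()
serviced.wait(timeout=1.0)
poller.stop()
```

The trade-off is that polling adds a little overhead and up to one interval of latency, and it only helps if the handler is safe to call when its device has nothing to report.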

Does anyone know exactly how this could be done, and whether it can be done with the source portion of the nvidia driver??

Thanks to all, and I hope this solves other humans' problems too =o)

Rodrigo


All times are GMT -5.

Copyright 1998 - 2014, nV News.