Old 04-06-06, 04:24 PM   #49
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
@JaXXoN: i think it's sufficient to disable PAT support, i.e. to load the NVIDIA Linux kernel module with nv_disable_pat=1, the driver ought to take care of the rest.
Thanks for the hint! I think that did the trick: after running glxgears, ping -f -s <ip>
and "find /" in parallel for about an hour, cyclictest captured a maximum
latency of 93 microseconds, /proc/latency_trace was at 43 microseconds.
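For readers unfamiliar with how such a maximum latency figure is obtained, here is a minimal C sketch of the general idea: sleep until an absolute deadline, then check how late the wake-up was. This is not cyclictest's actual code. The names max_wakeup_latency_us(), timespec_diff_ns() and timespec_add_ns() are made up for illustration, and the sketch skips the real-time scheduling priority and memory locking that a serious measurement would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(a.tv_sec - b.tv_sec) * NSEC_PER_SEC
         + (int64_t)(a.tv_nsec - b.tv_nsec);
}

/* Advance *t by ns nanoseconds and keep tv_nsec below one second. */
static void timespec_add_ns(struct timespec *t, int64_t ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/*
 * Hypothetical helper: wake up 'loops' times at absolute deadlines that
 * are interval_ns apart and return the worst observed wake-up lateness
 * in microseconds. An interrupted sleep is simply ignored here.
 */
static int64_t max_wakeup_latency_us(int loops, int64_t interval_ns)
{
    struct timespec next, now;
    int64_t max_ns = 0;

    assert(loops > 0 && interval_ns > 0 && interval_ns < NSEC_PER_SEC);

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        timespec_add_ns(&next, interval_ns);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        int64_t late_ns = timespec_diff_ns(now, next);
        if (late_ns > max_ns)
            max_ns = late_ns;
    }
    return max_ns / 1000;
}
```

Calling max_wakeup_latency_us(1000, 1000000) from a small main() would take 1000 samples at a 1 ms period. Numbers from an unprivileged run like this are only a rough indication and are not comparable to the cyclictest results quoted in this thread.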

The high latencies, especially while starting/stopping glxgears, have disappeared!

With glxgears, i couldn't recognize any performance drop, so far - i will check
with ut2k4 and maybe with SPECviewperf, if i have time.

Anyway: in addition to disabling PAT support, i needed to comment out the
three occurrences of nv_flush_cache() in nv_rm_malloc_pages() and
nv_rm_free_pages(). Since change_page_attr() is no longer called
in these functions, i conclude that flushing caches is not necessary any more?!
At least, I haven't had any freezes or other negative effects so far.
There is a fourth occurrence of nv_flush_cache() in nv_vmap_vmalloc(),
but this one is never called when PAT support is disabled (so i left it in).

Please find attached a patch that disables PAT support by default and
comments out nv_flush_cache() where appropriate. Patch order:

1. NVIDIA_kernel-1.0-8178-U012206.diff.txt
2. NVIDIA_kernel-1.0-8178-1491837.diff.txt
3. patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11
4. patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11-nowbinvd-20060406.txt


So the only problem left for now is the "3-seconds glxgears sticky" effect
on single processor machines, but as far as i can tell, this is somehow caused
by hrtimers and has basically little to do with the nvidia driver.

Zander, thanks again very much for your patience tracking down the
latency problem and helping to fix it!

regards

Bernhard
Old 04-06-06, 04:27 PM   #50
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Here's the patch i talked about in my previous post.

Bernhard
Old 04-06-06, 06:03 PM   #51
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by JaXXoN
Anyway: in addition to disabling PAT support, i needed to comment out the
three occurrences of nv_flush_cache() in nv_rm_malloc_pages() and
nv_rm_free_pages(). Since change_page_attr() is no longer called
in these functions, i conclude that flushing caches is not necessary any more?!
At least, I haven't had any freezes or other negative effects so far.
There is a fourth occurrence of nv_flush_cache() in nv_vmap_vmalloc(),
but this one is never called when PAT support is disabled (so i left it in).
OK, I thought you were still using the build that had the NV_CPA_NEEDS_FLUSHING #define disabled. As to nv_vmap_vmalloc(), this function isn't built unless you use a Linux/x86 2.4 kernel with vmap() support. The function works around an integer overflow in Linux 2.4's vmap().

Quote:
Originally Posted by JaXXoN
So the only problem left for now is the "3-seconds glxgears sticky" effect
on single processor machines, but as far as i can tell, this is somehow caused
by hrtimers and has basically little to do with the nvidia driver.

Zander, thanks again very much for your patience tracking down the
latency problem and helping to fix it!
No problem, it's good to hear that things are working better for you with these changes. I'll try to look into the other problems sometime after the upcoming 1.0-87xx driver release.
Old 04-06-06, 10:35 PM   #52
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I'll also give it a try tomorrow and see how it behaves. So far, over the last 4 or 5 days, the "patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11" modification has worked quite well, with no system hardlocks or strange behavior while switching consoles or restarting the X-server. glxgears - well, it seems to be the only app suffering from the 3-second sticky problem here. The 3D modelling stuff works as well as it always did & the games are still about the same.
I'd agree with JaXXoN - there's not much to suggest that glxgears' strange behavior is anything but an hrtimers problem. After looking at the glxgears source - it seems to be the call to gettimeofday() used to calculate the framerate that's the offender.
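To make the gettimeofday() point concrete, here is a minimal C sketch of a frame rate calculation built on two gettimeofday() samples. It is not copied from the glxgears source. The helper names elapsed_seconds() and frames_per_second() are invented for illustration, and the fallback of returning 0.0 for a non-positive interval is an assumption of this sketch.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two gettimeofday() samples. */
static double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/*
 * Hypothetical helper: frames rendered per second over the interval.
 * Returns 0.0 when the measured interval is not positive, which is
 * what a stalled or backwards-stepping clock would produce.
 */
static double frames_per_second(long frames, struct timeval start,
                                struct timeval end)
{
    double secs = elapsed_seconds(start, end);

    assert(frames >= 0);
    return secs > 0.0 ? (double)frames / secs : 0.0;
}
```

A render loop would take one sample with gettimeofday(&start, NULL) at the beginning of a reporting interval, count frames, take a second sample at the end and pass both to frames_per_second(). Any code of this shape depends directly on how the time source behaves, so a timer problem in the kernel can show up in the application.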

I see also that -rt13 is out with some patches to the hrtimers system by Steve Rostedt & I'll see if the change brings any improvements. I might also see how Mesa and hrtimers get along with each other, if time allows.

Anyway - a big thanks to both of you for taking the time to get 8178 working with the RT kernels! I just wonder how long it will take before a new RT patch breaks things all over again.
Old 04-07-06, 04:01 PM   #53
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

OK guys - how would you expect 8756 to perform with an RT kernel? It looks like most of the zander/JaXXoN patches still apply cleanly to 8756, so we'll see I guess.
Old 04-07-06, 04:03 PM   #54
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

@dmetz99: unfortunately i'd expect 1.0-8756 to fare no better than 1.0-8178 without patching; Jaxxon's patch should work mostly fine with the new driver release, though.
Old 04-07-06, 07:08 PM   #55
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I suspected as much. Shouldn't be too much of a problem to adapt the 8178 patches. Got to build and test -rt13 with 8178 first - I was seeing some differences on the P3 box that were interesting.
Old 04-07-06, 09:10 PM   #56
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

A quick test with -rt13 and patched 8756 looked good. Some modest performance increase, no instability in a short test.

Glad to see the thermal bug was finally squashed!

Old 04-10-06, 12:13 PM   #57
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Hi!

Please find attached a patch for nv-1.0-8756 tested with 2.6.16-rt14
including all earlier rt specific patches discussed in this thread.

Please note that PAT support is disabled by default, which can
degrade 3D performance. So if 3D performance is more important
for you than guaranteed low latencies, please change the initial
value for "nv_disable_pat" in nv.c back to "0".

regards

Bernhard
Attached Files
File Type: txt patch-nv-1.0-8756-rt.txt (5.5 KB, 202 views)
Old 04-10-06, 01:06 PM   #58
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I had put together a virtually identical patch, based on your earlier -rt11/8178 patch (and zander's, too). Works quite well on both boxes of mine.

I've found that the "sticky glxgears" problem goes away if I boot the P4 box with the "noapic nolapic" options. The older P3 box never uses IO-APIC (even if compiled into the kernel - uses the PIC, instead) and never suffers the problem. Booting the P4 with "noapic nolapic" essentially forces it to use the PIC also. I suspect this effectively disables hrtimers. I'd conclude that this is a residual hrtimers bug and might not have anything to do with the nvidia driver, directly.
Anyway, it's not a big deal, since glxgears doesn't do anything very useful, anyway!
Old 04-10-06, 01:52 PM   #59
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Hi!

While doing some more advanced real time benchmarks, i.e. playing
UT2004 while running cyclictest, i still observed high latencies of up to
600 microseconds. In order to understand what might cause such a high
latency, i re-activated my "proprietary APIC timer" interface project.
Please check the README in the attached package for details on how
to capture the high latency path(s).

It looks like the CPU is stopped for several hundred microseconds
while accessing the DMA status byte of the nforce4 sata controller.


I experienced a similar effect while working on an embedded PowerPC
platform: there, inappropriate PCI bus arbitration settings could prevent
the CPU from accessing the PCI bus during long-running DMA transfers
initiated by a PCI card.

@Zander, is it possible to get some more details on the nforce4 chipset
on that topic?

regards

Bernhard
Attached Files
File Type: zip apictimer-checkeip.zip (7.9 KB, 184 views)
Old 04-10-06, 01:58 PM   #60
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by dmetz99
I suspect this effectively disables hrtimers.
Probably. Does "cyclictest" still report reasonable values? If not, i.e.
if the "Act" and "Max" values keep increasing, then hrtimer support is disabled.

Quote:
Originally Posted by dmetz99
I'd conclude that this is a residual hrtimers bug and might not have
anything to do with the nvidia driver, directly.
ACK, I came to the same conclusion.

Quote:
Originally Posted by dmetz99
Anyway, it's not a big deal, since glxgears doesn't do anything very useful, anyway!
Nevertheless, this issue should be fixed - in the meantime, i figured
out it might have something to do with sched_yield(). At least this
is the only system call issued while glxgears is running.

regards

Bernhard