Old 03-30-06, 06:13 PM   #1
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Just for those who are interested in using realtime preemption with nvidia
hardware:


0. Summary

This post includes a patch to get the nvidia graphics
driver version 1.0-8178 working stably with the realtime
preemption enhanced Linux kernel 2.6.16-rt11, along with a
detailed description of the patch and the known issues.


1. Installation

1.1 Kernel installation

Install Linux 2.6.16 with patch-2.6.16-rt11 applied. Please
find attached a kernel configuration file that works for my
setup.

1.2 nvidia driver installation

Make sure that the U012206 patch has been applied to your
nvidia glue source tree, then apply the attached patch.
Compile and install as usual.


2. Booting the kernel

You may need to apply the kernel boot options "idle=poll" and/or
"acpi=off" in order to get your system running stably:

If I don't apply "idle=poll", then it sometimes happens that gettimeofday()
returns bad values because the two cores on my AMD dual core system
run out of sync. This may result in jumpy/laggy gaming, e.g. while
playing UT2004. From what I understand, this is more a hardware-related
issue on AMD multicore systems than a software problem. If you
have a single core system, then you probably don't need this option.

If I don't apply "acpi=off", then my system easily freezes within
a minute while browsing the web with Mozilla - this problem
also occurs with recent vanilla kernels. Kernels 2.6.13 and earlier
are not affected. Apparently there have been changes in 2.6.14 and
newer that make my particular setup rather unstable when
ACPI is enabled.


3. Known issues

When "High Resolution Timer Support" is enabled (to be found
in kernel configuration menu "Processor type and features")
and "maxcpus=1" is applied as kernel boot option, then in my
setup, the screen easily freezes for a couple of seconds (typically
exactly three seconds - mouse pointer still moves) when dragging
a glxgears windows around. A remote "top" shows that Xorg consumes
100% cpu load. I have observered this effect since HRT support was
added to the realtime preemption patch. While experimenting with
2.6.14-rt13 on Fedora Core 4, i figured out that the X-Server
keeps looping like crazy through gettimeofday(), by attaching
gdb to the running X-Server remotly and stopping it once the
screen freezes (attach with "gdb --pid=<pid of Xorg>).

Interestingly, the freezing effect disappears when the X-Server
is attached to gdb!! Very strange indeed. When gdb attaches
itself to another process, the return path from a system call
for that process is different (ptrace path) - this might make
the difference.

If HRT support is disabled in the kernel configuration, then I also
couldn't reproduce the screen-freezing effect, but without HRT support,
a realtime preemption enhanced kernel is of limited use :-)

When running on two cores ("maxcpus=1" not applied), the
freezing effect is also gone, but for some reason, the
HRT interface is not available then.


4. Usage of semaphores in interrupt context

During my experiments, I figured out that the nvidia driver uses
semaphores in interrupt and tasklet context, although sleeping functions
are strictly forbidden in those code areas! For example, a call to
down() will sleep if the semaphore is already held
by another process.

Careless usage of semaphores in interrupt context usually
leads to heavy system instabilities, so I was wondering why the nvidia
driver typically operates way more stably than one would expect
under these circumstances. So I did some more analysis:

Whenever a not-yet-held semaphore is acquired with the nvidia
semaphore wrapper function os_sema_acquire(), all nvidia
interrupts are also disabled (and re-enabled when releasing the
semaphore). So in this case, you can safely call down() in interrupt
context, because you know for sure that the semaphore can't already be
held by another process: a prior nvidia semaphore lock
would also have disabled the nvidia interrupts, and thus the interrupt
couldn't have happened in the first place.

Although the nvidia driver works stably that way, it would be better
to run the driver interrupt service routine in thread context.
Fortunately, this is exactly what happens automatically when applying the
realtime preemption patch! However, there is still one issue:

The "real interrupt" will wakeup the nvidia "interrupt service
routine thread" (nv_kern_isr), which in turn may activate a
tasklet (nv_kern_isr_bh). The situation now is that a semphore
may be acquired in nv_kern_isr(), but released in nv_kern_isr_bh().
nv_kern_isr() is called in the "interrupt thread" while nv_kern_isr_bh()
is called in the softirq-tasklet thread. Means: the semaphore is
acquired in a different process then it will be released!

This is strictly forbidden! The simple solution to that problem
was to call nv_kern_isr_bh() directly in nv_kern_isr()
instead of scheduling it as a tasklet, so down() and up() are called
in the same process (see the sketch below). The only drawback: with a
vanilla kernel, adding the "bottom half handler" to the interrupt service
routine keeps other interrupts from operating for a longer time than
necessary, but with a realtime preemption enhanced kernel, this can be
compensated by prioritizing the interrupt threads according to the
user's application.
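
To illustrate the change: a minimal sketch only, with a simplified ISR
signature and structure (the real driver code differs), showing the
bottom half being called directly instead of being scheduled as a tasklet:

    /* Sketch only - simplified, not the actual driver code. */
    #include <linux/interrupt.h>

    static void nv_kern_isr_bh(unsigned long data);   /* bottom half */

    static irqreturn_t nv_kern_isr(int irq, void *arg)
    {
        /* ... handle the interrupt, possibly acquiring a semaphore ... */

        /*
         * Before: defer the bottom half to a tasklet, which on -rt runs
         * in the softirq/tasklet thread, i.e. in a different process:
         *
         *     tasklet_schedule(&nv_bh_tasklet);   // runs nv_kern_isr_bh()
         *
         * After: call the bottom half directly, so a semaphore acquired
         * in the top half is released in the same (interrupt thread)
         * context:
         */
        nv_kern_isr_bh((unsigned long)arg);

        return IRQ_HANDLED;
    }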


5. System setup

For the tests, I was using Fedora Core 5 with the following hardware:

1 x AMD Athlon64 X2 4400+
1 x Asus A8N-SLI Premium
2 x Gainward 3500/PCX (Geforce 7800GTX)
3 x 20" 1600x1200 LCDs
1 x Maxtor 300GB HDD on nv_sata

Two LCDs are connected to the first 7800GTX, configured
in TwinView mode, and the third LCD is connected to the
second 7800GTX, tied to the other two using Xinerama.


6. Conclusion

The patch seems to work pretty stably on a dual core, dual card
setup; however, realtime performance testing has only been
done briefly: since HRT support doesn't work in SMP mode,
cyclictest didn't work, so I needed to write a simpler testing
application (attached, compile with "gcc -o rttest rttest.c -lrt"),
which shows a maximum latency of 4.8 microseconds, even under 3D and
disk load. With "maxcpus=1" applied, the worst case latency
measured was 155 microseconds under the same load.
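
The attached rttest.c is not reproduced here; just to give an idea of
what such a test does, here is a minimal sketch of my own (not the
attached file) that measures the wakeup latency of a periodic
SCHED_FIFO task:

    /* Minimal latency test sketch (not the attached rttest.c).
     * Build: gcc -o rttest-sketch rttest-sketch.c -lrt
     * Run as root so that SCHED_FIFO can be set. */
    #include <stdio.h>
    #include <time.h>
    #include <sched.h>

    #define NSEC_PER_SEC 1000000000LL
    #define INTERVAL_NS  1000000L            /* wake up every 1 ms */

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 80 };
        struct timespec next, now;
        long long diff, max = 0;
        int i;

        sched_setscheduler(0, SCHED_FIFO, &sp);    /* realtime priority */
        clock_gettime(CLOCK_MONOTONIC, &next);

        for (i = 0; i < 10000; i++) {
            next.tv_nsec += INTERVAL_NS;
            while (next.tv_nsec >= NSEC_PER_SEC) {
                next.tv_nsec -= NSEC_PER_SEC;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);
            diff = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                 + (now.tv_nsec - next.tv_nsec);   /* wakeup latency */
            if (diff > max)
                max = diff;
        }
        printf("max latency: %lld us\n", max / 1000);
        return 0;
    }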

Feedback appreciated

regards

Bernhard
Attached Files
File Type: zip patch-nv-1.0-8178-U012206-2.6.16-rt11.zip (1.8 KB, 528 views)
File Type: zip config-2.6.16-rt11.zip (7.7 KB, 311 views)
File Type: zip rttest.c.zip (1.1 KB, 323 views)
JaXXoN is offline   Reply With Quote
Old 03-30-06, 06:54 PM   #2
zander
NVIDIA Corporation
 
zander's Avatar
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Thanks for the detailed summary, JaXXoN. I've been loosely following some of the earlier discussions about incompatibilities with the Linux 2.6 -rt* kernels, but unfortunately, we haven't had a chance to investigate the problems in detail, yet. I haven't looked at your patch, but the summary helps get an idea of what the current status is. I'm not sure off-hand what the problem with high resolution timers might be.

With respect to the semaphore usage, it might be useful to note that the current implementation of os_acquire_sema() basically (ab)uses the Linux semaphore as a wait queue, i.e. when down() is called, the semaphore will already be in a locked state, such that the process attempting to acquire it will be put to sleep. The reasons for doing this are historical. We're currently investigating the use of completions to replace the wait semaphore; I can provide a patch if you want to give it a try.
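
As a rough illustration of that direction (a sketch only, with made-up
names; not the actual patch), replacing a semaphore that is used as a
wait queue with a completion looks roughly like this:

    /* Sketch: completion instead of a semaphore used as a wait queue. */
    #include <linux/completion.h>

    static DECLARE_COMPLETION(nv_event);    /* starts out "not done" */

    /* waiter side: instead of down() on a semaphore initialized locked */
    static void wait_for_event(void)
    {
        wait_for_completion(&nv_event);     /* sleeps until complete() */
    }

    /* signaling side: instead of up() from another context */
    static void signal_event(void)
    {
        complete(&nv_event);                /* wakes exactly one waiter */
    }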
zander is offline   Reply With Quote
Old 03-30-06, 07:44 PM   #3
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
I'm not sure off-hand what the problem with high resolution timers might be.
In the meantime, I figured out that "acpi=off" is the problem: when not
applying that boot option, hrtimers are working with both cores
enabled, but even running Mozilla crashes the system (even on
recent vanilla kernels). As mentioned in another post, this is a kernel bug
introduced in 2.6.14-rc1 - 2.6.13 and earlier work fine without
"acpi=off". Since hrtimer support is working with "acpi=off maxcpus=1",
I'm pretty confident that there is some way to get hrtimers also working
with both CPUs and ACPI disabled. I have contacted Ingo Molnar and Thomas
Gleixner.

Quote:
Originally Posted by zander
With respect to the semaphore usage, it might be useful to note that the current implementation of os_acquire_sema() basically (ab)uses the Linux semaphore as a wait queue, i.e. when down() is called, the semaphore will already be in a locked state, such that the process attempting to acquire it will be put to sleep.
I changed that "misbehaviour" in the patch above: the problem is
that if the semaphore is initialized locked, then
1. insmod/modprobe will exit with the lock held -> not allowed in -rt
2. the semaphore is released (up()) by another process -> not allowed in -rt

Quote:
Originally Posted by zander
We're currently investigating the use of completions to replace the wait semaphore; I can provide a patch if you want to give it a try.
I had already figured out that semaphores are used in a very strange
way here :-)

I'm considering "wake_up_interruptible() + wait_event_interruptible()"
as an alternative. Nevertheless, I would certainly like to try out your
patch using completions.
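
For reference, the wait-queue alternative I have in mind would look
roughly like this (a sketch with made-up names, not a patch):

    /* Sketch: wait queue + condition flag instead of a semaphore. */
    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(nv_waitq);
    static int nv_event_pending;             /* the wait condition */

    static int wait_for_event(void)
    {
        int ret;

        /* sleeps until nv_event_pending becomes non-zero (or a signal
         * arrives, in which case -ERESTARTSYS is returned) */
        ret = wait_event_interruptible(nv_waitq, nv_event_pending != 0);
        if (ret == 0)
            nv_event_pending = 0;
        return ret;
    }

    static void signal_event(void)
    {
        nv_event_pending = 1;
        wake_up_interruptible(&nv_waitq);    /* wakes the sleeping waiter */
    }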

regards

Bernhard
JaXXoN is offline   Reply With Quote
Old 03-30-06, 08:20 PM   #4
zander
NVIDIA Corporation
 
zander's Avatar
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Yes, I saw the lock-held-on-process-exit error messages with early -rt* kernels. The attached patch uses completions instead. This logic hasn't seen extensive stress testing yet (i.e. use at your own risk), and I haven't tested it on a -rt* kernel, but it should work better than the current code.
Attached Files
File Type: txt NVIDIA_kernel-1.0-8178-1491837.diff.txt (1.8 KB, 328 views)
zander is offline   Reply With Quote
Old 03-30-06, 08:54 PM   #5
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

You really have been hard at it, JaXXoN! It will take me a bit to digest your summary (I'm not a very advanced programmer). I'll try both your and zander's patches and see how they work.

I've been using your older 2.6.15-rt17 patch with good success on this P4/UP box, but had rather poor performance on my P3/UP test box at work. No crashes - just sluggish performance.

You are correct that the "sticky glxgears window" phenomenon appeared with the introduction of the hrtimers subsystem. Curiously, glxgears seems to be the only app that exhibits this (on this system, anyway). It doesn't seem to show up in any of the FE modelers/meshers such as Cubit or Salome, nor in any of the older OpenGL games I've got around here (Tuxracer or the Vavoom variant of the Doom engine).

I appreciate both your efforts - especially JaXXoN for having the stones to pester the RT developers about these issues!

(They usually ignore anything having to do with non-OS drivers.)
dmetz99 is offline   Reply With Quote
Old 03-31-06, 07:11 AM   #6
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
The attached patch uses completions, instead.
I love small but effective and simple-to-understand patches like this one! :-)
It works pretty smoothly for me, thanks for sharing the patch.


I have attached an additional patch that addresses some minor issues
concerning -rt:

1.
In nv.c, __nv_setup_pat_entries() and __nv_restore_pat_entries()
do direct interrupt flag manipulation, which is no longer allowed
in -rt. I'm using raw spinlocks instead (see the sketch after this
list), but when unloading nvidia.ko, the kernel complains about
"caller is drain_array_locked". This doesn't seem to have negative
effects other than that. I currently have no idea how to solve that
issue otherwise.

2.
NV_MAY_SLEEP() needed to be modified to fix a potential bug in -rt.

3.
In -rt, NV_IRQL_IS_RAISED() is not working as expected. As a workaround,
I simply skipped the sanity check in os_delay().
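
The raw spinlock idea from item 1, as a minimal sketch (simplified, with
an assumed lock name, and using the raw spinlock spelling of current
mainline kernels - the -rt patch of that era spelled it slightly
differently; this is not the actual patch):

    /* Sketch: replace direct interrupt flag manipulation with a raw
     * spinlock. On -rt a raw spinlock still disables interrupts, while
     * a normal spinlock is converted into a sleeping lock there. */
    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(nv_pat_lock);     /* assumed name */

    static void setup_pat_entries_example(void)
    {
        unsigned long flags;

        /* Before: the interrupt flags were manipulated directly, e.g.
         *     local_irq_save(flags);
         *     ... program the PAT entries ...
         *     local_irq_restore(flags);
         */
        raw_spin_lock_irqsave(&nv_pat_lock, flags);
        /* ... program the PAT entries ... */
        raw_spin_unlock_irqrestore(&nv_pat_lock, flags);
    }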


The patch order is
1. NVIDIA_kernel-1.0-8178-U012206.diff.txt
2. NVIDIA_kernel-1.0-8178-1491837.diff.txt
3. patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11

regards

Bernhard
Attached Files
File Type: txt patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11.txt (3.7 KB, 254 views)
JaXXoN is offline   Reply With Quote
Old 03-31-06, 07:34 AM   #7
zander
NVIDIA Corporation
 
zander's Avatar
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

That's good to hear, thanks for the update. BTW, we're definitely interested in making the stock NVIDIA Linux graphics driver compatible with -rt* kernels now that they've had more time to mature; I hope to get to this after the upcoming web release.
zander is offline   Reply With Quote
Old 03-31-06, 08:19 AM   #8
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Update on hrtimer support:

hrtimers are working when applying "clocksource=tsc" as a kernel boot
option (in addition to "acpi=off"). The "three-seconds-freezing" effect
experienced with glxgears when "maxcpus=1" is applied doesn't show
up in dual core operation.

While writing this in Mozilla, I have glxgears running in parallel, and
cyclictest reports only 12 microseconds worst case latency - but only
when cyclictest is started after glxgears has been started and stopped
before glxgears is stopped. Otherwise I easily get about 200 microseconds,
because the nvidia driver issues an "expensive" wbinvd instruction that
flushes the cache. Zander, could you please do me a favour and check whether
the wbinvd is really necessary in the nvidia driver? Maybe it is possible to
replace the wbinvd calls with flush_cache_range()? Otherwise, the brute-force
solution would be to read several megabytes from main memory in a high
priority task before issuing the wbinvd instructions - meaning the actual
number of cache lines to be flushed is much smaller when most of them have
already been flushed "manually" beforehand.

regards

Bernhard

Last edited by JaXXoN; 03-31-06 at 08:39 AM.
JaXXoN is offline   Reply With Quote

Old 03-31-06, 08:26 AM   #9
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I've given both NVIDIA_kernel-1.0-8178-1491837.diff.txt and patch-nv-1.0-8178-U012206-2.6.16-rt11 brief tests this morning on the P3/UP test box. I've had good results with both patches and comparable max latencies (using rttest) under heavy 3D loads. The only difference I saw was that zander's patch hardlocked the system once upon exiting KDE. (I've heard this may not necessarily be a driver issue...) Hopefully, I'll have time to test on the P4 box at home this weekend.

I'll test JaXXoN's latest patch as time allows. Thanks again, guys.
dmetz99 is offline   Reply With Quote
Old 03-31-06, 08:33 AM   #10
zander
NVIDIA Corporation
 
zander's Avatar
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I believe the wbinvd instruction is only issued directly on Linux 2.4 and Linux/x86-64 2.6 kernels; on Linux 2.6, this is done because vanilla Linux 2.6.11 <= x < 2.6.14 and a number of distribution kernels have a bug in global_flush_tlb() that keeps it from flushing the CPU caches. You can try updating the #ifdef logic conditionalizing the NV_CPA_NEEDS_FLUSHING #define in nv-vm.c to read:
#if defined(KERNEL_2_4) || (defined(KERNEL_2_6) && defined(NVCPU_X86_64) && (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 14)))
This should keep the NVIDIA kernel module from issuing the wbinvd instruction on working kernels.
zander is offline   Reply With Quote
Old 03-31-06, 08:53 AM   #11
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
The attached patch uses completions, instead.
The only thing that concerned me is that I was not sure what happens
if two processes are waiting for a completion on a multicore system.
I just learned a minute ago that complete() will wake up
exactly *one* process. That means completions should indeed work
fine!

regards

Bernhard
JaXXoN is offline   Reply With Quote
Old 03-31-06, 08:54 AM   #12
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
You can try updating the #ifdef logic
Thanks for the feedback! I will try it out and post the results.

regards

Bernhard
JaXXoN is offline   Reply With Quote