nV News Forums > Linux Support Forums > NVIDIA Linux
Old 03-31-06, 08:59 AM   #13
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by dmetz99
I appreciate both your efforts - especially jaXXoN for having the stones to pester the RT developers about these issues!

(They usually ignore anything having to do with non-OS drivers.)
Thanks for the credits! Concerning the RT developers, at least Ingo
Molnar and especially Thomas Gleixner have typically replied to my
personal mails and have been helpful most of the time. But true, mails
to LKML concerning the nvidia driver raise a big red flag for most
readers of that mailing list :-)

regards

Bernhard
Old 03-31-06, 09:00 AM   #14
zander
NVIDIA Corporation
 
 
Join Date: Aug 2002
Posts: 3,740
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by JaXXoN
The only thing that concerned me is that I was not sure what happens
if two processes are waiting for completion on a multicore system.
I just learned a minute ago that complete() will only wake up
exactly *one* process. That means completions should indeed work
just fine!
Yes, the API also provides complete_all(), which wouldn't work in this context; functionally, the OS semaphore implementation with complete() should behave exactly like the current semaphore based logic.
Old 03-31-06, 09:12 AM   #15
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
Yes, the API also provides complete_all(), which wouldn't work in this context;
Right!

Quote:
Originally Posted by zander
functionally, the OS semaphore implementation with complete() should behave exactly like the current semaphore based logic.
Agreed!

I think you (NVIDIA) should encourage Linux nvidia users to try out your
patch, just to make sure that there are no other side effects we may have
overlooked. I can imagine that the patch even has a positive influence
on previously unstable systems, since semaphores are not intended to be
used in interrupt and tasklet context and thus may have negative side
effects on certain setups. I think your solution with completions is much
more robust and much less confusing :-)

regards

Bernhard
Old 03-31-06, 02:44 PM   #16
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by zander
You can try updating the #ifdef logic conditionalizing the NV_CPA_NEEDS_FLUSHING #define in nv-vm.c to read:
#if defined(KERNEL_2_4) && (defined(KERNEL_2_6) && defined(NVCPU_X86_64) && (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 14)))
Instead of figuring out the correct logic (i.e., IMHO, the first "&&" should
be a "||"), I simply commented out "#define NV_CPA_NEEDS_FLUSHING 1".

Anyway, it didn't do the trick. So I did some more analysis and figured out
that wbinvd is only called while enabling/disabling PAT support (which is
done when the nvidia kernel module is loaded and unloaded) and
in os_flush_cpu_cache(). It seems like the latter function is never called
from the nvidia.o object file, so wbinvd actually can't be the problem?!

I re-compiled the kernel with latency trace support and get
pretty confusing outputs:

Code:
                _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
glxgears-2414  1D.h4    0us : __trace_start_sched_wakeup <<...>-15> (62 1)
glxgears-2414  1Dnh3    0us!: try_to_wake_up <<...>-15> (62 75)
glxgears-2414  1D..2  185us : trace_array <<...>-15> (62 62)
glxgears-2414  1D..2  186us+: trace_array <glxgears-2414> (75 78)
   <...>-15    1D..2  189us : __schedule <glxgears-2414> (75 62)
   <...>-15    1D..2  190us : trace_stop_sched_switched <<...>-15> (62 1)
   <...>-15    1D..2  191us : __schedule (__schedule)
Code:
                _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
glxgears-2525  1D.h5    0us : __trace_start_sched_wakeup <<...>-2732> (13 1)
glxgears-2525  1Dnh4    0us : try_to_wake_up <<...>-2732> (13 7d)
glxgears-2525  1Dnh2    0us!: clockevents_set_next_event (1c9ad7904a1 28fa)
glxgears-2525  1D..2  354us : trace_array <<...>-2732> (13 13)
glxgears-2525  1D..2  356us+: trace_array <glxgears-2525> (7d 78)
   <...>-2732  1D..2  360us : __schedule <glxgears-2525> (7d 13)
   <...>-2732  1D..2  361us : trace_stop_sched_switched <<...>-2732> (13 1)
   <...>-2732  1D..2  362us : __schedule (__schedule)
Code:
                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
glxgears-2774  0D.h4    0us : __trace_start_sched_wakeup <<...>-10> (62 0)
glxgears-2774  0Dnh3    0us : try_to_wake_up <<...>-10> (62 74)
glxgears-2774  0Dnh2    0us+: clockevents_set_next_event (1e19ebf4cf9 22a8)
glxgears-2774  0Dnh2    9us!: do_IRQ (c010f654 0 0)
glxgears-2774  0D..2  118us : trace_array <<...>-10> (62 62)
glxgears-2774  0D..2  120us+: trace_array <glxgears-2774> (73 78)
   <...>-10    0D..2  123us : __schedule <glxgears-2774> (73 62)
   <...>-10    0D..2  124us : trace_stop_sched_switched <<...>-10> (62 0)
   <...>-10    0D..2  124us : __schedule (__schedule)
Code:
                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
  <idle>-0     1D.h3    1us : __trace_start_sched_wakeup <<...>-15> (62 1)
  <idle>-0     1Dnh2    1us+: try_to_wake_up <<...>-15> (62 8c)
  <idle>-0     1Dnh2    5us : activate_task <<...>-19> (62 1)
  <idle>-0     1Dnh2    6us+: try_to_wake_up <<...>-19> (62 8c)
  <idle>-0     1D..2   11us : trace_array <<...>-15> (62 62)
  <idle>-0     1D..2   11us+: trace_array <<...>-19> (62 62)
   <...>-15    1D..2   15us+: __schedule <<idle>-0> (8c 62)
   <...>-15    1D..2   92us : trace_stop_sched_switched <<...>-15> (62 1)
   <...>-15    1D..2   93us : __schedule (__schedule)
Code:
                 _------=> CPU#            
                / _-----=> irqs-off        
               | / _----=> need-resched    
               || / _---=> hardirq/softirq 
               ||| / _--=> preempt-depth   
               |||| /                      
               |||||     delay             
   cmd     pid ||||| time  |   caller      
      \   /    |||||   \   |   /           
  <idle>-0     1D.h3    0us : __trace_start_sched_wakeup <<...>-15> (62 1)
  <idle>-0     1Dnh2    0us+: try_to_wake_up <<...>-15> (62 8c)
  <idle>-0     1D..2    3us : trace_array <<...>-15> (62 62)
   <...>-15    1D..2    6us+: __schedule <<idle>-0> (8c 62)
   <...>-15    1D..2   83us : trace_stop_sched_switched <<...>-15> (62 1)
   <...>-15    1D..2   85us : __schedule (__schedule)
It looks like starting/stopping glxgears randomly prevents the
CPU from operating for up to several hundred microseconds
(which doesn't happen when using the vesa driver and Mesa, BTW).

regards

Bernhard
Old 03-31-06, 09:09 PM   #17
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I've had a little time to try out the various patch iterations on the P4/UP box (Asus P4C800-E/3.2 GHz P4/1 GB RAM/XFX 6800 video) with the latest RT kernel (now -rt12). All appear quite stable, at least compared to no patches at all. All of the patched nvidia drivers, however, still exhibit the "3-second sticky glxgears" phenomenon that I don't see on the P3 box at work. (FWIW, my config for the P4 box is attached.)

Intuitively, the "smoothest" operator appeared to be the

NVIDIA_kernel-1.0-8178-U012206.diff.txt
NVIDIA_kernel-1.0-8178-1491837.diff.txt
patch-nv-1.0-8178-U012206-1491837-2.6.16-rt11

patch sequence.

From a practical standpoint, audio apps are working great. No dropouts from fluidsynth, driven by kmid, playing back a large MIDI file (many instruments) while concurrently running two glxgears windows and moving all the windows around rapidly. I haven't found any problems with any of the OpenGL modelling apps (yet), and full screen OpenGL apps seem to be starting and exiting gracefully.

I tried some testing under moderate loads with cyclictest (as JaXXoN suggested) but saw no more than 31 microseconds maximum latency (I assume that's the correct unit here). This was with two glxgears windows running and browsing around with Firefox. I plan to do more testing as time allows.

Overall good results, so far!
Attached Files
File Type: txt config-2.6.16-rt12.txt (45.4 KB, 186 views)
Old 04-01-06, 12:35 PM   #18
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by dmetz99
I've had a little time to try out the various patch iterations
Thanks for testing and reporting!

Quote:
Originally Posted by dmetz99
All of the patched nvidia drivers, however, still exhibit the "3-seccond sticky glxgears" phenomenon that I don't see on the P3 box at work. (FWIW, my config for the P4 box is attached.)
1. Could you please also post the configuration file for the P3? Maybe
there is some kernel configuration option that makes the difference?!

2. Can you please try to reproduce that the effect disappears when
attaching gdb remotely to the X server? Use the following procedure:

a) log in to the P4 remotely as "root" from another machine (e.g. using ssh)
b) attach to the running X server: gdb --pid `pidof X`
Please note that the X server will be halted the moment it is
attached to gdb - you need to continue it manually:
c) in the gdb command console, type "c" and press "enter"

The 3-second freezing effect disappeared for me when using
this procedure.

Be prepared for the screen to freeze once in a while: whenever
the X server catches a signal (which may happen when exiting
some application), gdb will halt the X server. In this case,
turn to the gdb session running on the other machine and instruct
gdb to continue (type "c" + "enter").

Quote:
Originally Posted by dmetz99
I tried some testing under moderate loads with cyclictest (as JaXXoN suggested) but got no more than 31 max microseconds (I assume that's the correct unit, here) latency.
Right, these are microseconds.

BTW, if you compile in "latency tracing", then you can check high
latency paths with the following procedure:

1. echo 0 > /proc/sys/kernel/preempt_max_latency
2. do some stuff causing latencies
3. cat /proc/latency_trace

This will produce output similar to what you can read in one
of my recent posts in this thread. BTW, in this context: I guess
the latency tracer was somehow misconfigured and reported nonsense
during my first experiments. I can now reproduce that the
high latencies occurring when glxgears is started/stopped are caused
by "kernel_flush_map()", which is called from "global_flush_tlb()"
(not by "wbinvd" as suspected earlier).

I will perform further investigations. My goal is a worst-case user
space context switch time below 100 microseconds.

regards

Bernhard
Old 04-01-06, 03:08 PM   #19
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I will pull the config file for the P3 box on Monday AM and post it then. The machine is at work and inaccessible remotely because it's behind the firewall there.

One big difference between the machines is that this one is using the IO-APIC for interrupt routing and the P3 seems to be using the PIC. I have not yet gone through trying the APIC/no-APIC/ACPI/no-ACPI combinations. That hasn't yielded any progress in the past, but I'll give it a try anyway.

I used latency tracing to try to find the source of the 3-second glxgears bug when it first appeared, but did not find anything. Might be worth a second try...

I have not tried remote debugging before. This should be an educational experience for someone whose debugging knowledge is low!

I'll keep you posted as I do these tests.

Thanks
Old 04-02-06, 10:29 AM   #20
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Some further data and answers to JaXXoN's questions:

1. The "3-second sticky glxgears bug" is still present on my P4 box EVEN WITH gdb attached to X (via ssh from a second machine).

2. If I send a SIGINT (break or ^C) to X during a freeze, it invariably breaks in the nanosleep() function. This has been a known problem area in the HR timers subsystem in the past.

3. The various combinations of apic/acpi on/off have no effect on the problem.


Further info on Monday or later today...


But - I must stress that overall stability with the zander/JaXXoN patch has been great!! This is a great improvement in a small amount of time.

Old 04-02-06, 02:22 PM   #21
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

I built a 2.6.16-rt12 kernel with latency tracing enabled. The config was otherwise identical to my previous 2.6.16-rt12 build. I enabled tracing as you (JaXXoN) described with

echo 0 > /proc/sys/kernel/preempt_max_latency

and ran glxgears through several 3-second "sticky" episodes.

Collected trace data -- cat /proc/latency_trace > latency_trace.txt


As I suspected from previous tests, the glxgears delays do not show up. The worst latency recorded was 15 microseconds, not 3 seconds. (Trace file is attached).

I still suspect a problem with nanosleep behavior with hrtimers.
Attached Files
File Type: txt latency_trace.txt (4.6 KB, 161 views)
Old 04-03-06, 05:49 AM   #22
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

@JaXXoN:

Config for the P3 box is attached. It exhibits no delays with glxgears.
Attached Files
File Type: txt config-2.6.16-rt12.txt (42.2 KB, 184 views)
Old 04-03-06, 07:58 AM   #23
JaXXoN
Registered User
 
Join Date: Jul 2005
Location: Munich
Posts: 910
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

Quote:
Originally Posted by dmetz99
2. If I send a SIGINT (break or ^C) to X during a freeze, it invariably breaks in the nanosleep() function.
Interesting: with earlier -rt/hrt versions, for me, the X server looped
through gettimeofday() - it never stumbled over nanosleep().
Can you please send me a stack backtrace: after pressing ctrl+c
in the gdb window, type "bt" and press "enter".

Quote:
Originally Posted by dmetz99
This has been a known problem area in the HR timers subsystem in the past.
Do you have more details (URLs) describing the nature of this
known problem?

regards

Bernhard
Old 04-03-06, 08:20 AM   #24
dmetz99
Registered User
 
Join Date: Mar 2005
Posts: 84
Default Re: [PATCH, REALTIME] nvidia-1.0-8178 and Linux-2.6.16-rt11

OK - I'll get a backtrace from gdb tonight, when I get home.

There was some discussion about nanosleep problems with RT on the lkml, maybe a month or two ago. I don't recall the exact date. I'll do some searching through the list and see what I can find and let you know.