Old 02-18-11, 05:31 AM   #1
khautomation
Registered User
 
Join Date: Feb 2011
Posts: 4
Xid Error and system hangup at FIXED ABSOLUTE DATES

Hello,

we use Novell Desktop Linux 10SP1 and the NVS 290 on several hardware platforms (all Intel, Core 2 Duo and ASUS MB) and have a problem with Xid errors in /var/log/messages together with hangs of the X system at fixed dates. This happens on all machines with load on the X server at:

15Apr2010 07:45, 04Jun2010 00:47, 23Jul2010 17:08, and 31Oct2010 02:56,
19Dec2010 19:59 and the last: 07Feb2011 13:01.

The failure appears on many machines, even if they are not connected to a network. It is reproducible (!): set the date to 07Feb2011 13:00 and start the X server with load, and it crashes.

These dates are the moments at which the seconds part returned by gettimeofday(), multiplied by 1000, overflows.
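
To illustrate the kind of overflow meant here, the following is a minimal sketch, not the driver's actual code. It assumes a millisecond timestamp built from the gettimeofday() fields and stored in a 32-bit unsigned integer; the helper name `ms_counter_32` is made up for this example.

```python
from datetime import datetime, timezone

MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, as a 32-bit unsigned integer would


def ms_counter_32(epoch_seconds: int, microseconds: int = 0) -> int:
    """Milliseconds since the epoch, truncated to 32 bits.

    Hypothetical helper: it mimics tv_sec * 1000 + tv_usec / 1000 stored in
    a 32-bit unsigned integer. How the driver really keeps time is not known.
    """
    return (epoch_seconds * 1000 + microseconds // 1000) & MASK32


# 07 Feb 2011 12:02:03 UTC, which is 13:02:03 in a UTC+1 timezone.
t = int(datetime(2011, 2, 7, 12, 2, 3, tzinfo=timezone.utc).timestamp())

before = ms_counter_32(t)     # last whole second before the wrap
after = ms_counter_32(t + 1)  # one second later, after the wrap

print(before, after)   # 4294966904 608
print(after - before)  # negative: the counter appears to jump backwards
```

Any code that computes "now + timeout" or "now - start" from such a counter without handling the wrap would see a huge or negative interval around these dates.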

Could someone please have a look at the driver to fix this "time overflow" problem?

Thanks a lot !
Attached Files
File Type: gz nvidia-bug-report.log.gz (54.8 KB, 79 views)

Last edited by khautomation; 02-18-11 at 05:35 AM. Reason: Add attachment
Old 02-20-11, 04:49 PM   #2
yumkam
Registered User
 
Join Date: Jul 2007
Posts: 5
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Huh, nice catch with the *1000 overflow. It seems I've hit the same problem, with somewhat less drastic consequences: just a few WAIT messages in Xorg.0.log:
Code:
(WW) Dec 19 21:59:07 NVIDIA(0): WAIT (0, 6, 0x8000, 0x00006194, 0x00006194)
...[see attached log]...
(WW) Dec 19 21:59:15 NVIDIA(0): WAIT (0, 6, 0x0000, 0x0000e5dc, 0x0000e5dc)
(WW) Feb 07 15:01:53 NVIDIA(0): WAIT (0, 6, 0x0000, 0x0000c038, 0x0000c038)
...[see attached log]...
(WW) Feb 07 15:02:03 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00006948, 0x000074b8)
(different timezone, so the times are shifted a bit compared to yours).
i386/kernel 2.6.27.something/nvidia driver 260.19.29/xserver-xorg-core_1.1.1-21etch5/GF GT 430.
Just in case: there were no NVRM/Xid messages in the kernel logs at the time it happened; X was almost idle (I was AFK both times), no screensaver, only DPMS, but the monitor was on on 07Feb (not sure about 19Dec); IIRC there were no active OpenGL/XVideo/VDPAU applications.
Attached Files
File Type: gz nvidia-bug-report.log.gz (25.2 KB, 64 views)
Old 02-22-11, 12:15 PM   #3
danix
NVIDIA Corporation
 
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Thanks for reporting this.

The times you identified do all coincide with 32-bit overflows of "milliseconds since epoch":

<pre>$ for i in `seq 295 305`; do date -u -d "jan 1, 1970 0:0:0 utc + $[2**32/1000*i] sec"; done
Wed Feb 24 12:41:05 UTC 2010
Thu Apr 15 05:43:52 UTC 2010
Thu Jun 3 22:46:39 UTC 2010
Fri Jul 23 15:49:26 UTC 2010
Sat Sep 11 08:52:13 UTC 2010
Sun Oct 31 01:55:00 UTC 2010
Sun Dec 19 18:57:47 UTC 2010
Mon Feb 7 12:00:34 UTC 2011
Tue Mar 29 05:03:21 UTC 2011
Tue May 17 22:06:08 UTC 2011
Wed Jul 6 15:08:55 UTC 2011</pre>
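
For comparison with the dates reported at the top of the thread, here is a small sketch that computes the same wrap instants from the exact period of 2^32 ms (4294967.296 s) instead of a whole number of seconds. The function name `wrap_time` is made up for this example, and it assumes the counter in question is simply "milliseconds since the epoch" kept in 32 bits.

```python
from datetime import datetime, timezone


def wrap_time(n: int) -> datetime:
    """UTC time (whole seconds) of the n-th wrap of a 32-bit ms-since-epoch counter.

    Multiplying before dividing keeps the 0.296 s fraction of each period,
    so the result does not drift as n grows.
    """
    seconds = (2**32 * n) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for n in range(295, 306):
    print(n, wrap_time(n).isoformat())
```

With this calculation the wraps numbered 301 and 302 fall at 2010-12-19 18:59:16 UTC and 2011-02-07 12:02:03 UTC, which agrees with the 19:59 and 13:01 local times in the first post if that machine runs at UTC+1 (the timezone is an assumption).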

I tried setting the clock on a 32-bit system to just before one of these times, then starting X and a bunch of X clients immediately after setting the clock, but wasn't able to trigger the failure... Could you be a little bit more specific about what you mean by "start the X Server with load"? Does X not crash when the X server is started without load?
Old 02-23-11, 06:38 AM   #4
khautomation
Registered User
 
Join Date: Feb 2011
Posts: 4
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Yes, that is correct. If the X server is running "empty", no error occurs. The same is true in runlevel 3.
Our graphics application uses pixmaps in double-buffer mode: it paints into one pixmap, activates it once painting is complete, and then paints into the other. The switching rate is 2 per second.

I could not trigger the failure with X load alone, e.g. as produced by xengine or any other X "stressing" application.

We are working on isolating our code to produce a simple test program that reproduces the failure independently of our application.

Last edited by khautomation; 02-23-11 at 06:42 AM. Reason: bad formulated
Old 02-23-11, 03:11 PM   #5
danix
NVIDIA Corporation
 
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Thanks. It must be that the X clients I'm throwing at the server when I start it (and the other "stress" apps you tried) aren't exercising whatever path your application does, which ends up triggering the overflow. Let us know when you have a test case that reproduces the problem.
Old 02-25-11, 10:55 AM   #6
khautomation
Registered User
 
Join Date: Feb 2011
Posts: 4
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Further testing got the following results:

1. Running our application on a separate machine whose clock is far from the "crash time", and redirecting its display to the X display with the NVS 290 on a "crash time" machine, makes the crash appear.

2. Testing our application on other machines produces "WAIT" messages in the X server's log file, e.g. with a GeForce 7300:

(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000ae3c, 0x0000ae3c)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000ae60, 0x0000ae60)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000ae84, 0x0000ae84)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000aea8, 0x0000aea8)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000aecc, 0x0000aecc)
...
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000b07c, 0x0000b07c)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000b0a0, 0x0000b0a0)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000b0c4, 0x0000b0c4)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000b0e8, 0x0000b0e8)
(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000b10c, 0x0000b10c)

3. Trying to isolate the code has not shown the error yet. X "stressing" applications such as xengine also produce "WAIT" messages in the X server log files at the "crash time". If we reduce the load of our application by displaying simpler content, the crashes do not happen at "crash time", but a lot of wait states appear:

(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000c9cc, 0x0000ca9c)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000ca9c, 0x0000e53c)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000eedc, 0x0000f378)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000f378, 0x0000f448)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000ffe8, 0x00000f00)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00000f00, 0x00001730)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x0000, 0x00001b04, 0x00001d14)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00005df4, 0x00005ec4)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00005ec4, 0x00007964)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00007964, 0x00008194)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00008568, 0x00008778)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000c858, 0x0000c928)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000c928, 0x0000e3c8)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000efcc, 0x0000f1dc)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000f1dc, 0x0000f2ac)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000fff0, 0x00000d5c)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00000d5c, 0x0000158c)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00001960, 0x00001c30)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00001c30, 0x000036d0)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x0000, 0x000036d0, 0x00003f00)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x00004000, 0x000044e4)
(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x000044e4, 0x000045b4)
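
To relate these log timestamps to the overflow, here is a rough sketch that converts a WAIT line's timestamp into "milliseconds left until a 32-bit ms-since-epoch counter wraps". The year (2011) and the UTC+1 offset of the logging machine are assumptions, the helper `ms_until_wrap` is made up for this example, and the numeric fields of the WAIT message are not interpreted because their meaning is not documented in this thread.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (...)" and captures the timestamp.
WAIT_RE = re.compile(r"\(WW\) (\w{3} \d{2} \d{2}:\d{2}:\d{2}) NVIDIA\(\d+\): WAIT")


def ms_until_wrap(line, year=2011, utc_offset_hours=1):
    """Milliseconds from the logged second to the next 32-bit wrap, or None.

    year and utc_offset_hours are assumptions: the log line carries neither.
    """
    m = WAIT_RE.match(line)
    if m is None:
        return None
    local = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    utc = (local - timedelta(hours=utc_offset_hours)).replace(tzinfo=timezone.utc)
    ms = int(utc.timestamp()) * 1000
    return 2**32 - (ms % 2**32)


first = "(WW) Feb 07 13:01:53 NVIDIA(0): WAIT (0, 7, 0x8000, 0x0000ae3c, 0x0000ae3c)"
later = "(WW) Feb 07 13:02:01 NVIDIA(0): WAIT (1, 6, 0x8000, 0x0000c9cc, 0x0000ca9c)"
print(ms_until_wrap(first))  # 10392
print(ms_until_wrap(later))  # 2392
```

Under those assumptions, the WAIT messages quoted in this post were logged within roughly the last ten seconds before the counter would wrap.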

My guess: disturbing the driver at "crash time" so that the X server has to wait for it (this should be reproducible with e.g. xengine, as it is here at our site) leads to the crash under further load.

Finding the cause of the "WAIT"s and fixing it should fix the failure as well.

Please let me know if this helps you reproduce at least the "WAIT"s, or if you need further information.
Old 02-25-11, 04:46 PM   #7
danix
NVIDIA Corporation
 
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Thanks - we've reproduced the issue locally now. Turns out my overflow date calculations were off by a couple of minutes, and also the bug doesn't reproduce in a bare X server or with a simple window manager... only while running a GNOME or KDE session. (I had previously been doing many of my tests in a bare X server.)
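
One possible source of that offset (an assumption; the post does not say where it came from) is the shell expression `$[2**32/1000*i]` used in the earlier calculation: it divides first, so integer division drops the 0.296 s fraction of every period, and the loss adds up over roughly 300 periods. A minimal sketch of the size of that effect, with made-up function names:

```python
def truncated_estimate(n: int) -> int:
    """Seconds since the epoch as the shell computes $[2**32/1000*n]."""
    return (2**32 // 1000) * n  # 4294967 * n, fraction lost before multiplying


def exact_seconds(n: int) -> int:
    """Whole seconds since the epoch of the n-th 32-bit millisecond wrap."""
    return (2**32 * n) // 1000  # multiply first, then drop the sub-second part


def lag_seconds(n: int) -> int:
    """How far the truncated estimate falls before the actual wrap."""
    return exact_seconds(n) - truncated_estimate(n)


print(lag_seconds(302))  # 89
```

For the wrap of 7 February 2011 the truncated estimate comes out 89 seconds early, i.e. 12:00:34 UTC instead of 12:02:03 UTC.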
Old 03-27-11, 02:31 PM   #8
khautomation
Registered User
 
Join Date: Feb 2011
Posts: 4
Re: Xid Error and system hangup at FIXED ABSOLUTE DATES

Hi,

we tested the latest version (260.19.44) and found that the bug still exists. Could we please get information about a fix for this bug? Our customers are getting a little nervous because of the upcoming crash next Tuesday.

Thanks a lot !
All times are GMT -5.