nV News Forums > Linux Support Forums > NVIDIA Linux

03-21-12, 09:13 AM #1
james-p
Registered User

Join Date: Jun 2010
Posts: 13
Xorg crash on centos5 with 295.20 and Quadro 2000

I've been seeing a number of Xorg crashes on machines running CentOS 5.5. The latest was on a machine with a Quadro 2000 card that was recently upgraded to driver 295.20.

/var/log/Xorg.0.log.old ends with:

Backtrace:
0: /usr/bin/Xorg(xf86SigHandler+0x71) [0x490ac1]
1: /lib64/libc.so.6 [0x3a142302d0]
2: /usr/lib64/xorg/modules/drivers/nvidia_drv.so [0x2b5c4f876a8c]
3: /usr/lib64/xorg/modules/drivers/nvidia_drv.so [0x2b5c4f877350]
4: /usr/lib64/xorg/modules/drivers/nvidia_drv.so [0x2b5c4f881b76]
5: /usr/lib64/xorg/modules/drivers/nvidia_drv.so [0x2b5c4f877a74]
6: /usr/bin/Xorg(miPolyText16+0xac) [0x4dde1c]
7: /usr/bin/Xorg [0x51d654]
8: /usr/bin/Xorg(doPolyText+0xe3) [0x44e6e3]
9: /usr/bin/Xorg(PolyText+0x74) [0x44ebd4]
10: /usr/bin/Xorg(ProcPolyText+0xe9) [0x448ad9]
11: /usr/bin/Xorg(Dispatch+0x1ca) [0x44b45a]
12: /usr/bin/Xorg(main+0x44e) [0x43376e]
13: /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a1421d994]
14: /usr/bin/Xorg(FontFileCompleteXLFD+0x241) [0x432a49]

Fatal server error:
Caught signal 11. Server aborting

(WW) Mar 21 06:46:28 NVIDIA(0): WAIT (0, 6, 0x8000, 0x0000f87c, 0x0000f87c)

I've seen virtually identical backtraces on a few other machines running earlier driver versions (280.13, 285.05.09, etc.) and different cards (Quadro 2000 and Quadro FX 1800).

Any ideas on what might be causing this?
Attached file: nvidia-bug-report.log.gz (78.5 KB)
03-27-12, 05:09 AM #2
james-p

Xorg crashed again. This time I had gdb attached, so here is the output of 'bt full':

Program received signal SIGSEGV, Segmentation fault.
0x00002b43bd107a8c in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#0 0x00002b43bd107a8c in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#1 0x00002b43bd108350 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#2 0x00002b43bd112b76 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#3 0x00002b43bd108a74 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#4 0x00000000004dde1c in miPolyText16 (pDraw=0x15df5620, pGC=0x131e70c0,
x=466, y=8, count=<value optimized out>, chars=<value optimized out>)
at mipolytext.c:136
n = 1
i = <value optimized out>
w = 0
charinfo = {0x1728dff0, 0x13b9e740, 0x13b9e6b0, 0x13b9e6b0,
0x13b9e710, 0x13b9e6b0, 0x13b9e530, 0x13b9edb8, 0x13b9e8a8,
0x13b9e9e0, 0x13b9e9e0, 0x13b9e998, 0x13b9e9e0, 0x13b9ede8,
0x13b9e530, 0x13b9e530, 0x13b9e530, 0x13b9eb78, 0x13b9eb48,
0x13b9ec80, 0x13b9ec80, 0x13b9ec98, 0x13b9ed10, 0x13b9e530,
0x13b9ebd8, 0x13b9eba8, 0x13b9ed10, 0x13b9e530, 0x13b9ed40,
0x13b9eb48, 0x13b9ece0, 0x13b9ec08, 0x13b9eb48, 0x13b9eb60,
0x13b9ec50, 0x13b9eba8, 0x13b9e530, 0x13b9e5d8, 0x13b9ecf8,
0x13b9ebf0, 0x13b9eb48, 0x13b9eb90, 0x13b9ec98, 0x13b9ed58,
0x13b9e878, 0x13b9ec98, 0x13b9ec68, 0x13b9ecb0, 0x13b9ec98,
0x13b9ec80, 0x13b9eba8, 0x13b9ec80, 0x13b9ed10, 0x13b9ea10,
0x13b9ed88, 0x13b9ecb0, 0x13b9eba8, 0x13b9e5d8, 0x13b9e530,
0x13b9ebc0, 0x13b9ece0, 0x13b9ec98, 0x13b9ec68, 0x13b9e530,
0x13b9ecf8, 0x13b9ebf0, 0x13b9eb48, 0x13b9eb90, 0x13b9eba8,
0x13b9ece0, 0x13b9e530, 0x13b9e5d8, 0x13b9f478, 0x13b9e230,
0x13b9f478, 0x13b9e230, 0x13b9ed70, 0x13b9e530, 0x13b9ecf8,
0x13b9ed28, 0x13b9eb78, 0x13b9ebf0, 0x13b9e530, 0x13b9ecb0,
0x13b9eb48, 0x13b9ece0, 0x13b9eb48, 0x13b9ec68, 0x13b9eba8,
0x13b9ed10, 0x13b9eba8, 0x13b9ece0, 0x13b9e530, 0x13b9ec98,
0x13b9ece0, 0x13b9e530, 0x13b9ec68, 0x13b9eba8, 0x13b9ec68,
0x13b9eb60, 0x13b9eba8, 0x13b9ece0, 0x13b9e530, 0x13b9eab8,
0x13b9e7d0, 0x13b9e9f8, 0x13b9ebf0, 0x13b9eb48, 0x13b9eb90,
0x13b9ec08, 0x13b9ec80, 0x13b9ebd8, 0x13b9e800, 0x13b9e530,
0x13b9e998, 0x13b9eb60, 0x13b9ec20, 0x13b9eba8, 0x13b9eb78,
0x13b9ed10, 0x13b9e7a0, 0x13b9e530, 0x13b9eb60, 0x13b9ec98,
0x13b9eb90, 0x13b9ed88, 0x13b9eb18, 0x13b9e908, 0x13b9e890,
0x13b9ecf8, 0x13b9ec38, 0x13b9ec08, 0x13b9ec80, 0x13b9eb18,
0x13b9e8d8, 0x13b9e8a8, 0x13b9e998, 0x13b9e530, 0x13b9e9f8,
0x13b9ebf0, 0x13b9eb48, 0x13b9eb90, 0x13b9eba8, 0x13b9ece0,
0x13b9e7a0, 0x13b9e530, 0x13b9eae8, 0x13b9eae8, 0x13b9ebc0,
0x13b9eb48, 0x13b9eb78, 0x13b9eba8, 0x13b9e698, 0x13b9e878,
0x13b9ec98, 0x13b9ec50, 0x13b9ec50, 0x13b9eba8, 0x13b9eb78,
0x13b9ed10, 0x13b9ec98, 0x13b9ece0, 0x13b9e698, 0x13b9e6e0,
0x13b9e680, 0x13b9e788, 0x13b9e698, 0x13b9e878, 0x13b9ec98,
0x13b9ec50, 0x13b9ec50, 0x3a13c08c00, 0x13b9eb78, 0x12a434d0, 0x0,
0x1f, 0x81ef7f8b, 0x3a13c08f64, 0x13b9ed10, 0x7fff35aba010,
0x81ef7f8a, 0x7fff35aba1a0, 0x7fff35aba1b8, 0x406668, 0x0, 0x0,
0x2b43b9263000, 0x2b43bcecce85, 0x9cba88, 0x2b43bcecb450,
0x100000000, 0x100000859, 0x13b9ec98, 0x12a43378, 0x7fff35aba1f0,
0x7fff35aba1a0, 0x81ef7f8b, 0x7fff35aba1b8, 0x0, 0x3a13c09162...}
#5 0x000000000051d654 in damagePolyText16 (pDrawable=0x15df5620,
pGC=0x131e70c0, x=<value optimized out>, y=8, count=1, chars=0x180e0bde)
at damage.c:1408
pGCPriv = 0x131e71f0
oldFuncs = 0x2b43bd849260
#6 0x000000000044e6e3 in doPolyText (client=0x17189120, c=0x7fff35aba340)
at dixfonts.c:1382
pNextElt = 0x180e0be0 "8e\004"
pFont = 0x158bcfb0
fid = 0
oldfid = <value optimized out>
err = 0
lgerr = <value optimized out>
client_state = NEVER_SLEPT
fpe = <value optimized out>
origGC = 0x0
#7 0x000000000044ebd4 in PolyText (client=0x2b43be329488,
pDraw=<value optimized out>, pGC=0xa, pElt=0x0, endReq=0x2b43bd86eac0 "",
xorg=0, yorg=8, reqType=0, did=15733016) at dixfonts.c:1455
local_closure = {client = 0x17189120, pDraw = 0x15df5620,
pGC = 0x131e70c0, pElt = 0x180e0bdc "\001",
endReq = 0x180e0be0 "8e\004", data = 0x131e70c0 "P\357\251\022",
xorg = 466, yorg = 8, reqType = 75 'K',
polyText = 0x51d520 <damagePolyText16>, itemSize = 2,
did = 15733016, err = 0, slept = 0}
#8 0x0000000000448ad9 in ProcPolyText (client=0x17189120) at dispatch.c:2341
err = <value optimized out>
pDraw = 0x15df5620
pGC = 0x131e70c0
#9 0x000000000044b45a in Dispatch () at dispatch.c:459
clientReady = <value optimized out>
result = <value optimized out>
client = <value optimized out>
nready = 0
start_tick = 25489760
#10 0x000000000043376e in main (argc=8, argv=0x7fff35abad98,
envp=<value optimized out>) at main.c:447
i = 1
error = 0
xauthfile = <value optimized out>
alwaysCheckForInput = {0, 1}
03-28-12, 09:26 AM #3
james-p

I had another crash early today. This time I had enabled core dumps, so I may be able to obtain more information, although I'm not really sure what I'm looking for, as the crash is happening somewhere inside nvidia_drv.so.
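The post doesn't say how core dumps were enabled. One possibility, assuming you can change whatever process launches Xorg, is to raise the soft core-file size limit to the hard limit before the server starts, similar to running `ulimit -c` in a shell. The sketch below does that with getrlimit()/setrlimit(); raise_core_limit() is a hypothetical helper name. Because Xorg installs its own SIGSEGV handler (xf86SigHandler in the first backtrace), the server's -core option may also be needed for it to actually write a core file.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE to the hard limit so a
 * crashing child process (e.g. the X server) is allowed to dump core.
 * Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* an unprivileged process may raise soft up to hard */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

The limit is inherited across fork()/exec(), so it would have to be applied in the launcher before the server is started.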

The backtrace for today starts with:

Program received signal SIGSEGV, Segmentation fault.
0x00002b30f7b6ea8c in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#0 0x00002b30f7b6ea8c in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#1 0x00002b30f7b6f350 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#2 0x00002b30f7b79b76 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#3 0x00002b30f7b6fa74 in ?? ()
from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
No symbol table info available.
#4 0x00000000004dde1c in miPolyText16 (pDraw=0x1c887910, pGC=0x1b57e6b0,
x=436, y=8, count=<value optimized out>, chars=<value optimized out>)
at mipolytext.c:136
n = 4
i = <value optimized out>
w = 0
charinfo = {0x1bd00a10, 0x1bd00a10, 0x1bd00a10, 0x1bd00a10,

The source for miPolyText16() is:

118 _X_EXPORT int
119 miPolyText16(pDraw, pGC, x, y, count, chars)
120 DrawablePtr pDraw;
121 GCPtr pGC;
122 int x, y;
123 int count;
124 unsigned short *chars;
125 {
126 unsigned long n, i;
127 int w;
128 CharInfoPtr charinfo[255]; /* encoding only has 1 byte for count */
129
130 GetGlyphs(pGC->font, (unsigned long)count, (unsigned char *)chars,
131 (FONTLASTROW(pGC->font) == 0) ? Linear16Bit : TwoD16Bit,
132 &n, charinfo);
133 w = 0;
134 for (i=0; i < n; i++) w += charinfo[i]->metrics.characterWidth;
135 if (n != 0)
136 (*pGC->ops->PolyGlyphBlt)(
137 pDraw, pGC, x, y, n, charinfo, FONTGLYPHS(pGC->font));
138 return x+w;
139 }

It is crashing somewhere inside the PolyGlyphBlt routine called at line 136, which ends up in nvidia_drv.so.

gdb gives the values w=0, count=4, n=4, x=436, y=8, and the first 4 elements of charinfo are all:

{metrics = {leftSideBearing = 0, rightSideBearing = 0,
characterWidth = 0, ascent = 0, descent = 0, attributes = 0}, bits = 0x0}
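The width loop on line 134 can be modelled in isolation to see why gdb reports w=0: four all-zero glyphs sum to 0, whereas four glyphs with this font's characterWidth of 10 would give 40. The loop only reads metrics, so it survives the all-zero entries; the fault happens afterwards, in the PolyGlyphBlt call. A minimal sketch; xCharInfo and CharInfoRec below are simplified stand-ins that mirror the field names in the gdb dump (an assumption, not the X server headers), and poly_text_width() is a hypothetical name.

```c
#include <stddef.h>

/* Simplified stand-ins for the X server's glyph metric structures,
 * using the field names shown in the gdb output above. */
typedef struct {
    short leftSideBearing;
    short rightSideBearing;
    short characterWidth;
    short ascent;
    short descent;
    unsigned short attributes;
} xCharInfo;

typedef struct {
    xCharInfo metrics;
    char *bits;               /* glyph bitmap; NULL in the crashing case */
} CharInfoRec, *CharInfoPtr;

/* Same accumulation as line 134 of miPolyText16(). */
int poly_text_width(CharInfoPtr *charinfo, unsigned long n)
{
    int w = 0;
    for (unsigned long i = 0; i < n; i++)
        w += charinfo[i]->metrics.characterWidth;
    return w;
}
```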

and pGC->font contains:

{refcnt = 2, info = {firstCol = 0, lastCol = 255, firstRow = 0,
lastRow = 255, defaultCh = 0, noOverlap = 1, terminalFont = 1,
constantMetrics = 1, constantWidth = 1, inkInside = 1, inkMetrics = 0,
allExist = 0, drawDirection = 0, cachable = 1, anamorphic = 0,
maxOverlap = 0, pad = 0, maxbounds = {leftSideBearing = 0,
rightSideBearing = 10, characterWidth = 10, ascent = 8, descent = 2,
attributes = 1000}, minbounds = {leftSideBearing = 0,
rightSideBearing = 10, characterWidth = 10, ascent = 8, descent = 2,
attributes = 1000}, ink_maxbounds = {leftSideBearing = 0,
rightSideBearing = 10, characterWidth = 10, ascent = 8, descent = 2,
attributes = 1000}, ink_minbounds = {leftSideBearing = 0,
rightSideBearing = 10, characterWidth = 10, ascent = 8, descent = 2,
attributes = 1000}, fontAscent = 8, fontDescent = 2, nprops = 36,
props = 0x1b728e90,
isStringProp = 0x1b7290d0 "\001\001\001\001\001\001\001"}, bit = 0 '\000',
byte = 0 '\000', glyph = 4 '\004', scan = 1 '\001', format = 512,
get_glyphs = 0x3a14e34950 <_fs_get_glyphs>, get_metrics = 0x3a14e34650,
unload_font = 0x3a14e348e0 <_fs_unload_font>, unload_glyphs = 0,
fpe = 0x1b309110, svrPrivate = 0x0, fontPrivate = 0x1fee7f80,
fpePrivate = 0x1fee7fa0, maxPrivate = 0, devPrivates = 0x1ec62ec8}


Unfortunately, I have no idea what all of the above means, or whether it is significant.

Can anyone tell whether this is an Nvidia driver problem or an Xorg issue? And does anyone have an idea what I can do next to get to the bottom of this?

Thanks
04-03-12, 06:22 AM #4
james-p

I added some logging via syslog to the Xorg code in miPolyText16() that logs whenever a charinfo entry has all of its fields set to zero or NULL. I get one or more of these log messages just before each crash, so it appears to be related.
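The check described above could look roughly like the following sketch. It treats an entry as "all zero/NULL" when all six metrics are zero and bits is NULL, which is the pattern in the gdb dump. The structures are simplified stand-ins mirroring that dump (an assumption), and is_null_glyph()/log_null_glyphs() are hypothetical names, not functions from the Xorg source.

```c
#include <stddef.h>
#include <syslog.h>

/* Simplified stand-ins mirroring the gdb dump (assumption). */
typedef struct {
    short leftSideBearing, rightSideBearing, characterWidth, ascent, descent;
    unsigned short attributes;
} xCharInfo;

typedef struct {
    xCharInfo metrics;
    char *bits;
} CharInfoRec, *CharInfoPtr;

/* 1 if every metric is zero and the bitmap pointer is NULL. */
int is_null_glyph(const CharInfoRec *ci)
{
    const xCharInfo *m = &ci->metrics;
    return m->leftSideBearing == 0 && m->rightSideBearing == 0 &&
           m->characterWidth == 0 && m->ascent == 0 &&
           m->descent == 0 && m->attributes == 0 && ci->bits == NULL;
}

/* Log each all-zero entry returned by GetGlyphs(); returns how many. */
unsigned long log_null_glyphs(CharInfoPtr *charinfo, unsigned long n)
{
    unsigned long found = 0;
    for (unsigned long i = 0; i < n; i++) {
        if (is_null_glyph(charinfo[i])) {
            syslog(LOG_WARNING, "miPolyText16: charinfo[%lu] is all zero/NULL", i);
            found++;
        }
    }
    return found;
}
```

In miPolyText16() this would be called just after GetGlyphs() fills charinfo and n.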

I have no idea whether the fields of this structure can legitimately all be zero/NULL, but I guess something is going wrong elsewhere and causing Xorg to crash later as a result.

As a workaround, I've added a hack to miPolyText16() that removes any charinfo entries whose fields are all zero/NULL, and so far this appears to 'work' (as in, Xorg no longer crashes).
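A hack like this might be sketched as an in-place compaction of the charinfo array that returns the new count. The structures are simplified stand-ins mirroring the gdb dump (an assumption), and drop_null_glyphs() is a hypothetical name; the actual patch was not posted.

```c
#include <stddef.h>

/* Simplified stand-ins mirroring the gdb dump (assumption). */
typedef struct {
    short leftSideBearing, rightSideBearing, characterWidth, ascent, descent;
    unsigned short attributes;
} xCharInfo;

typedef struct {
    xCharInfo metrics;
    char *bits;
} CharInfoRec, *CharInfoPtr;

static int is_null_glyph(const CharInfoRec *ci)
{
    const xCharInfo *m = &ci->metrics;
    return m->leftSideBearing == 0 && m->rightSideBearing == 0 &&
           m->characterWidth == 0 && m->ascent == 0 &&
           m->descent == 0 && m->attributes == 0 && ci->bits == NULL;
}

/* Remove all-zero/NULL entries, preserving the order of the rest.
 * Returns the number of entries kept. */
unsigned long drop_null_glyphs(CharInfoPtr *charinfo, unsigned long n)
{
    unsigned long kept = 0;
    for (unsigned long i = 0; i < n; i++)
        if (!is_null_glyph(charinfo[i]))
            charinfo[kept++] = charinfo[i];
    return kept;
}
```

Used as `n = drop_null_glyphs(charinfo, n);` right after GetGlyphs(), the existing `if (n != 0)` check on line 135 already skips the PolyGlyphBlt call when every entry was dropped. Note that dropping glyphs also shifts the position of any later glyphs in the string.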
04-03-12, 08:57 AM #5
Plagman
NVIDIA Corporation

Join Date: Sep 2007
Posts: 254

I wonder if this is random memory corruption or a real bug in core text handling in that version of X. Do you have any custom fonts installed (core fonts, not client fonts used by Xft etc.)?
04-03-12, 09:55 AM #6
james-p

I'm not aware of any custom fonts being installed (or, more precisely, we haven't installed any new custom fonts for years).

We think we may have found an application that can trigger the crash. However, it is a third-party application that hundreds of our users run every day without a problem (on the same OS install, Xorg version and Nvidia cards/drivers). The application is based on Tcl/Tk.

I have seen a few very similar backtraces for different users on different machines over the last year or so. I don't know whether this same application was in use at the time.

It just so happens that one user at the moment is suffering 3 or 4 Xorg crashes like this a day, and the problem follows the user to other machines. That is actually quite handy, as it should hopefully provide more debug information.

My guess is that this probably isn't an Nvidia driver issue, but to aid my understanding: is it (legally) possible for GetGlyphs() to return charinfo entries whose fields are all zero/NULL?

Also, is it possible to find out which font is being used at the time of the crash from the core dump info?

Thanks
04-05-12, 09:53 AM #7
james-p

Poking around the pGC->font structure in miPolyText16() in the core dump, the font in question appears to be:

"-misc-baekmuk batang-medium-r-normal--10-*-*-*-*-*-iso10646-1"

On RHEL5/CentOS5 systems, this font comes from the fonts-korean RPM.

I have no idea why this font is involved - the user is not knowingly using this font.

However, I have managed to reproduce the crash with a simple piece of X11 code based on examples found on the web (attached).

This app crashes Xorg with the same backtrace every time on CentOS 5.6 with any Nvidia driver version. It doesn't crash Xorg when I use the nv or vesa Xorg driver.

Running the app on a CentOS 6.2 box doesn't crash Xorg, so I guess it is something to do with the combination of the Xorg server version that RHEL5/CentOS5 uses and the Nvidia drivers?

I'm going to install CentOS 5.8 on a machine to see if it has the same problem.
Attached file: crash.c.gz (646 bytes)
04-06-12, 08:57 AM #8
james-p

Other fonts cause the same crash too, e.g. '-b&h-luxi mono-bold-r-normal--0-0-0-0-m-0-ascii-0', which on RHEL5/CentOS5 comes from the 'xorg-x11-fonts-truetype' RPM.
04-28-12, 03:31 PM #9
james-p

I've done some more digging, and it appears the problem is caused by the combination of xfs (the X font server) and the Nvidia driver. It is not tied to any particular Nvidia driver version, card type or (possibly) distro; the subject of this thread only reflects the driver/hardware/OS where the problem was first spotted.

We use CentOS 5 with xfs. CentOS 6 (and probably most other recent distros) has deprecated xfs, but if I install, configure and use xfs on CentOS 6 with the Nvidia drivers, I can crash Xorg in exactly the same way. I cannot crash Xorg using xfs on CentOS 6 with, for example, the nouveau driver.

It appears xfs generates slightly different glyphs for the fonts/characters that cause the crash. With xfs, the bits pointer in the charinfo struct returned by GetGlyphs() is NULL; without xfs, the same bits pointer points to a valid empty string.

So it appears that the Nvidia drivers are at fault, as none of the other Xorg drivers I've tried crash in this situation.
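We can't see what nvidia_drv.so does internally, but the kind of guard a glyph blitter could apply before touching bits is easy to sketch: treat a glyph as drawable only if its ink box (rightSideBearing minus leftSideBearing wide, ascent plus descent high) is non-empty and bits is non-NULL. The structures are simplified stand-ins mirroring the gdb dump (an assumption), and glyph_has_ink()/count_drawable_glyphs() are hypothetical names, not driver code.

```c
#include <stddef.h>

/* Simplified stand-ins mirroring the gdb dump (assumption). */
typedef struct {
    short leftSideBearing, rightSideBearing, characterWidth, ascent, descent;
    unsigned short attributes;
} xCharInfo;

typedef struct {
    xCharInfo metrics;
    char *bits;
} CharInfoRec, *CharInfoPtr;

/* 1 if the glyph has a non-empty ink box and a bitmap to read from. */
int glyph_has_ink(const CharInfoRec *ci)
{
    int width  = ci->metrics.rightSideBearing - ci->metrics.leftSideBearing;
    int height = ci->metrics.ascent + ci->metrics.descent;
    return width > 0 && height > 0 && ci->bits != NULL;
}

/* How many glyphs a defensive blitter would actually draw; the rest
 * would be skipped (their characterWidth still advances the pen). */
unsigned long count_drawable_glyphs(CharInfoPtr *charinfo, unsigned long n)
{
    unsigned long drawable = 0;
    for (unsigned long i = 0; i < n; i++)
        if (glyph_has_ink(charinfo[i]))
            drawable++;
    return drawable;
}
```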

I've also found that I can crash Xorg with the same fonts/characters using XDrawString(), XDrawImageString() and XDrawImageString16(), with the crashes occurring in miPolyText8(), miImageText8() and miImageText16() respectively, as they all use the same GetGlyphs() function.

So, the simple fix is not to use xfs. Since we still do, I have patched Xorg so that, in the functions above, whenever GetGlyphs() returns a charinfo structure with all fields set to zero/NULL, its bits pointer is set to a valid empty string. We haven't had any of these crashes since applying this workaround.
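A workaround along these lines might be sketched as follows. The patch itself wasn't posted, so this is an assumption about its shape: the structures are simplified stand-ins mirroring the gdb dump, and patch_null_glyph_bits() is a hypothetical name.

```c
#include <stddef.h>

/* Simplified stand-ins mirroring the gdb dump (assumption). */
typedef struct {
    short leftSideBearing, rightSideBearing, characterWidth, ascent, descent;
    unsigned short attributes;
} xCharInfo;

typedef struct {
    xCharInfo metrics;
    char *bits;
} CharInfoRec, *CharInfoPtr;

/* Writable, zero-filled stand-in for the "valid empty string" seen on the
 * non-xfs path. A glyph with zero ascent+descent has no bitmap rows, so
 * one byte is assumed to be enough. */
static char empty_glyph_bits[1];

static int is_null_glyph(const CharInfoRec *ci)
{
    const xCharInfo *m = &ci->metrics;
    return m->leftSideBearing == 0 && m->rightSideBearing == 0 &&
           m->characterWidth == 0 && m->ascent == 0 &&
           m->descent == 0 && m->attributes == 0 && ci->bits == NULL;
}

/* Point bits of every all-zero/NULL entry at the empty buffer.
 * Returns how many entries were patched. */
unsigned long patch_null_glyph_bits(CharInfoPtr *charinfo, unsigned long n)
{
    unsigned long patched = 0;
    for (unsigned long i = 0; i < n; i++) {
        if (is_null_glyph(charinfo[i])) {
            charinfo[i]->bits = empty_glyph_bits;
            patched++;
        }
    }
    return patched;
}
```

A static array is used instead of a string literal because literals must not be written to, and the driver is free to treat bits as writable memory. The entries returned by GetGlyphs() may belong to the font, so the patch persists across calls; it is idempotent, since a patched entry no longer has a NULL bits pointer.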

I still don't know why these apparently random fonts/characters are being rendered in the first place, but whatever the reason, rendering a valid font/character should not crash Xorg.
04-29-12, 01:30 AM #10
Plagman

That's good to know, thanks for digging; I'll look into it. You're right that we probably shouldn't crash in this case.
All times are GMT -5.