4191 and 4349 2D corruption problem

grmoc
04-03-03, 08:29 PM
Ok, this one is a fun one...

(screenshot attached, with lots of ugly .jpg compression (sorry about that))

Take just about any program that updates the screen quickly.

e.g.

#include <stdio.h>
int main(int argc, char**argv){
    int i=0; // iteration counter
    while(1){
        printf("\r");
        for(int j=0;j<10;++j) printf("happy happy joy joy ");
        printf(" %d",++i);
    }
    return 1;//keeps compiler happy. Bad compiler!
}

Ok, now run it for a while. Some random amount of time later, you'll start getting corruption anywhere you move the mouse. The corruption seems to start in the rapidly updating window, and then can at times go away.

This bug is not apparent in 3123, but IS in 4191.

.. I do lots of GL programming every day with your drivers (I do real-time television special-effects)... and that means that I can be a veritable fountain of bugs. Should I just email them to linux-bugs@nvidia.com, or do you want me to post here first?

Thanks!

grmoc
04-03-03, 09:17 PM
Switching desktops seems to temporarily clear the problem (i.e. fully obscuring everything, then reshowing it).

It really is a weird bug. (And it -never- happens in 3123)

bwkaz
04-03-03, 09:33 PM
WFM...

But then again, I'm not running a heavy DE, either. Does it still happen if you turn off Gnome and run just a window manager?

grmoc
04-04-03, 03:59 AM
Yup. After some random amount of time, it still happens even with TWM and no pretty-stuff. (i.e. TWM+xterm)

It can take on the order of hours to reproduce this one at times.. Other times it takes seconds.

I haven't yet disabled a CPU (it is an SMP system) to see if that has anything to do with it.

bwkaz
04-04-03, 08:23 AM
Interesting... you might want to try disabling a CPU, yeah. Might be a workaround, anyway -- granted, a pretty crappy one, but still.

Your monitor isn't by any chance connected via the DVI port, in digital mode, is it?

grmoc
04-04-03, 01:43 PM
Hahahahah No...

When you have a DVI connection in digital mode on an SMP machine, you have problems (i.e. doesn't work at all, machine hangs).

It is in analog mode.... However that being said, with 4191 (when DVI worked) the same behaviour manifested with either the DVI output or the HD15 output.

I've tried turning off UBB, alas that did nothing.

I'll try it in non-SMP mode now.

Andy Mecham
04-04-03, 02:17 PM
Cool, thanks for the sample code. I'll look at this today.

When you have a DVI connection in digital mode on an SMP machine, you have problems (i.e. doesn't work at all, machine hangs).
Whatever works easiest for you. Sample code (and detailed descriptions) are very useful to those on the NVIDIA end of linux-bugs@nvidia.com. We're always happy to fix bugs. :D

--andy

grmoc
04-04-03, 02:57 PM
okies, we'll keep to this forum for the meantime, I'll send more easily reproducible bugs directly to linux-bugs@nvidia.com

Allrighty then, here is a screen-shot of the behaviour using TWM.

In this case, I did not have a lot of printouts in a terminal, but rather had an XForms (which is a widget set) text label updating with the frame rate (some 1000 fps for this particular application).

Look at the fields which appear black. TWM exhibits slightly different behaviour than KDE's WM- when you switch windows most of the time the problem temporarily goes away.

I'm going to try uniprocessor mode next.
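
For readers wondering how a frame-rate label like the one described above is usually fed: a minimal sketch, assuming the application passes in its own timestamps in seconds. The class name FrameRateCounter and its interface are hypothetical and are not taken from the application in this thread. Note that this version only produces a new value once per interval, so a label driven by it would redraw far less often than one updated every frame.

```cpp
#include <cassert>

// Hypothetical helper: averages the frame rate over a fixed interval.
class FrameRateCounter {
public:
    explicit FrameRateCounter(double intervalSeconds)
        : interval_(intervalSeconds), start_(0.0), frames_(0),
          fps_(0.0), started_(false)
    {
        assert(intervalSeconds > 0.0);
    }

    // Call once per rendered frame with the current time in seconds.
    // Returns true when a fresh fps() value has just been computed.
    bool frame(double nowSeconds)
    {
        if (!started_) {            // first frame only starts the clock
            started_ = true;
            start_ = nowSeconds;
            return false;
        }
        ++frames_;
        const double elapsed = nowSeconds - start_;
        if (elapsed < interval_) return false;
        fps_ = frames_ / elapsed;   // frames completed per second
        start_ = nowSeconds;        // begin the next measurement window
        frames_ = 0;
        return true;
    }

    double fps() const { return fps_; }

private:
    double interval_;
    double start_;
    int    frames_;
    double fps_;
    bool   started_;
};
```

Comparing a label updated on every frame with one updated only when frame() returns true could be one way to check whether the update rate itself matters for the corruption; that is a suggestion, not something tested in this thread.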

grmoc
04-04-03, 03:42 PM
Allrighty, this is in uniprocessor mode, with a different application triggering the bug. Screenshot attached.

Use the following code to duplicate the problem (which seems only to manifest when applications use GL... at least thus far..)

//
// Command line to compile me:
//
// g++ test.cpp -lGL -lGLU -L/usr/X11R6/lib -lX11

#include <cstdlib>   // exit
#include <iostream>

using namespace std;

#ifdef DEBUG
# define ONDEBUG(X) X
#else
# define ONDEBUG(X)
#endif


#include <X11/Xlib.h>
#include <X11/keysym.h>
#include "GL/gl.h"
#include "GL/glx.h"
#include "GL/glu.h"

int main(int argc, char** argv){
    Display *_dpy=0;
    GLXFBConfig* _configs=0;
    GLXContext *_ctx=0;

    GLXPbuffer _pbuffer;

    char *dpyName = 0;

    _dpy = XOpenDisplay(dpyName);
    if (!_dpy) {
        std::cerr<<"Error: couldn't open display ";
        if(dpyName) std::cerr<<dpyName;
        std::cerr<<"\n";
        exit(1);
    }

    int i;

    int nItems=0;
    // Ask for an RGBA config that can back a pbuffer.
    int fbAttribList[]={
        GLX_RED_SIZE,8,
        GLX_GREEN_SIZE,8,
        GLX_BLUE_SIZE,8,
        GLX_ALPHA_SIZE,8,
        GLX_STENCIL_SIZE,8,
        GLX_DEPTH_SIZE,24,
        GLX_ACCUM_RED_SIZE,16,
        GLX_ACCUM_GREEN_SIZE,16,
        GLX_ACCUM_BLUE_SIZE,16,
        GLX_ACCUM_ALPHA_SIZE,16,
        GLX_RENDER_TYPE,GLX_RGBA_BIT,
        GLX_DRAWABLE_TYPE,GLX_PBUFFER_BIT, // a bitmask; GLX_PBUFFER is a different enum
        GLX_CONFIG_CAVEAT,GLX_NONE,
        None
    };


    _configs=glXChooseFBConfig(_dpy,DefaultScreen(_dpy),
                               fbAttribList,&nItems);

    //GLXFBConfig* configs=glXGetFBConfigs(dpy,DefaultScreen(dpy),&nItems);
    ONDEBUG(std::cerr<<nItems<<" GLXFBConfigs returned.\n";)

    if(_configs==0){
        std::cerr<<"error: "<< gluErrorString(glGetError()) <<"\n";
        std::cerr<<"dpy==" <<(void*)_dpy<<" screen="<<DefaultScreen(_dpy)<<"\n";
        std::cerr<<"configs="<< (void*)_configs<<"\n";
        std::cerr<<"Wasn't able to choose a valid FBConfig. Exiting.\n";
        exit(1);
    }

    // Offscreen render target, 720x486.
    int pbAttribList[] = {
        GLX_PBUFFER_WIDTH,720,
        GLX_PBUFFER_HEIGHT,486,
        None
    };

    _pbuffer=glXCreatePbuffer(_dpy,_configs[0],pbAttribList);
    ONDEBUG(std::cerr<<"Here is the pbuffer!! ("<<_pbuffer <<")\n";)


    _ctx = new GLXContext;
    *_ctx=glXCreateNewContext(_dpy,_configs[0],GLX_RGBA_TYPE,0,True);

    ONDEBUG(std::cerr<<"Here is the ctx!" <<_ctx<< "\n";)
    if(glXMakeCurrent(_dpy,_pbuffer,*_ctx)){
        ONDEBUG(std::cerr<<"Binding of new glx context (to pbuffer) is successful\n";)
    }else{
        std::cerr<<"Doh! Binding of new glx context (to pbuffer) is NOT successful\n";
        std::cerr<<"Gl says: \"" << gluErrorString(glGetError()) <<"\"\n";
        exit(1); // without a current context the glGetString calls below return NULL
    }

    std::cerr<<"GL_RENDERER = "<< glGetString(GL_RENDERER) <<"\n";
    std::cerr<<"GL_VERSION = "<< glGetString(GL_VERSION) <<"\n";
    std::cerr<<"GL_VENDOR = "<< glGetString(GL_VENDOR) <<"\n";
    std::cerr<<"GL_EXTENSIONS = "<< glGetString(GL_EXTENSIONS) <<"\n";

    glViewport(0,0,720,486);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0,720,0,486,-10000,10000);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();


    // static keeps this ~1.4 MB buffer off the stack; 486 rows of 720 RGBA pixels
    static unsigned char frameBuf[486][720][4];
    i=0;
    // Read the pbuffer back as fast as possible while updating the terminal.
    while(1){
        glReadPixels(0,0,720,486,GL_RGBA,GL_UNSIGNED_BYTE, (unsigned char*) frameBuf);
        cerr<<"\rThis is a string of indeterminate length. I love big words "<<++i<<" iterations";
    }
    return 1;
}

grmoc
04-04-03, 03:44 PM
Sorry about this, but there is one more thing-
Remember- This bug manifests with 4191 and 4349, but not 3123.

Andy Mecham
04-04-03, 03:54 PM
grmoc: it's probably better to attach sample code as a file rather than cut-n-paste - formatting can be broken in forums.

--andy

grmoc
04-04-03, 04:09 PM
OKdokie, noted :)

I would have attached it (It was less work that way), but already had the attachment. Here it is again in non-reformatted non-smiley plaintext..

Andy Mecham
04-04-03, 04:35 PM
So far, no repro. Can you give me your distro/machine/card/XF86Config?

--andy

grmoc
04-04-03, 04:52 PM
Distro:
any of: RH 7.3, 8, 9

Kernel:
all stock RH kernels (both SMP and UNI)
linus's 2.4.20 (other vanillas not tried)
-Do you need the output of lsmod?

Machine:
Supermicro X5DA8 MB (Intel E7505 chipset)
dual 2.8 GHz CPU
1 Gig registered ecc RAM
U320 scsi (aic79xx based) (onboard the MB)
intel e1000 ethernet (onboard the MB)
DVS SDI I/O card (www.dvs.de)
-If you need more detail, I can post a lspci or other infos.


Card:
PNY Quadro4 980XGL

XF86Config:
see attached.

It can take some time to reproduce. (usually less than 20 minutes for the full-blown 'eww' black corruption, but you can see hints of things (a little more flicker than usual in a rapidly updating dialogue, for example) before the full-blown eww)

grmoc
04-04-03, 05:08 PM
The screenshots were taken on RH8 with the stock (updated) 2.4.18-24.8.0 kernel, both SMP and UNI

grmoc
04-07-03, 12:53 PM
So, were you able to reproduce the problem yet? Waiting for new hardware? =)

Andy Mecham
04-07-03, 01:24 PM
Actually, no - both test cases ran for several hours without causing any corruption on the systems I checked. I'll push it out to some other systems today - i've got some ideas about it now.

I didn't get to it this weekend, sorry...

--andy

grmoc
04-07-03, 03:04 PM
Well, here is more corruption.
XF86Config has not changed, nothing has changed, however, this is the corruption I get.

To be clear- I was not running any program in particular, not even something that updates the screen often.

grmoc
04-07-03, 03:16 PM
Here is the script used to harvest this information

Assuming you guys are in the SF Bay area, I'm happy to take this computer to you to play with for a few days.


echo "-------- lspci output follows --------" > machinespecs
lspci -vv >> machinespecs
echo "-------- lsmod output follows --------" >> machinespecs
lsmod >> machinespecs
echo "-------- uname -a output follows --------" >> machinespecs
uname -a >> machinespecs
echo "-------- output from 'cat /lib/modules/`uname -r`/build/.config' follows --------" >> machinespecs
cat /lib/modules/`uname -r`/build/.config >> machinespecs

grmoc
04-07-03, 03:36 PM
[fenix@gandalf agp]$ cat /proc/driver/nvidia/cards/0

Model: Quadro4 980 XGL
IRQ: 21
Video BIOS: 04.28.20.05.03
Card Type: AGP

(cat agp/card gives no additional output)

[fenix@gandalf agp]$ cat agp/host-bridge
Host Bridge: PCI device 8086:2550 (Intel Corp.)
Fast Writes: Supported
SBA: Supported
AGP Rates: 8x 4x
Registers: 0x1f00421b:0x00000102

[fenix@gandalf agp]$ cat agp/status
Status: Enabled
Driver: NVIDIA
AGP Rate: 8x
Fast Writes: Disabled
SBA: Disabled
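
When comparing this kind of /proc output between a machine that shows the corruption and one that does not, it can help to load the "Key: Value" lines into a lookup table first. A minimal sketch, assuming the text has already been read into a string; the names trimWhitespace and parseKeyValueLines are hypothetical, and the split on the first colon is an assumption based on the output shown above (it keeps values such as the Registers line intact).

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string trimWhitespace(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: turns lines of the form "Key: Value" into a map.
// Lines without a colon (blank lines, shell prompts) are skipped.
std::map<std::string, std::string> parseKeyValueLines(const std::string& text)
{
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        // Split on the FIRST colon only, so "a:b" style values survive.
        const std::string key = trimWhitespace(line.substr(0, colon));
        const std::string value = trimWhitespace(line.substr(colon + 1));
        if (!key.empty()) fields[key] = value;
    }
    return fields;
}
```

Two such maps, one per machine, can then be compared key by key to spot differences such as the AGP rate or the Fast Writes setting.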

Andy Mecham
04-07-03, 03:37 PM
Thanks for all the info!

--andy

grmoc
04-09-03, 01:16 PM
Have you guys been able to reproduce the problem yet?

I'm willing to bring the machine over to you guys to play with for a few days to help solve the problem, if that would help (you'd be free to wipe the harddrive, etc)

More information:

Offscreen rendering is unaffected by the corruption.

When you get it into this state, you can open up a new window, and certain parts of the new window (most/all of the background part of the window) will be transparent. i.e. black text will appear as black, but as if the window's canvas was translucent.

The screenshot was taken on RH9.0 running kernel 2.4.20-9smp, using 4349
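
One way to back up a statement like "offscreen rendering is unaffected" with numbers is to compare the pixels read back from the pbuffer against a known value. A minimal sketch, assuming the pbuffer was cleared to a single RGBA colour before the read-back and that the data is tightly packed 8-bit RGBA (as glReadPixels returns for GL_RGBA / GL_UNSIGNED_BYTE with default pack settings). The function name countMismatchedPixels is hypothetical and is not part of the test program posted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the pixels in a tightly packed RGBA8 buffer
// that differ from the colour the buffer is expected to contain.
std::size_t countMismatchedPixels(const std::vector<unsigned char>& rgba,
                                  std::size_t width, std::size_t height,
                                  const unsigned char expected[4])
{
    assert(rgba.size() == width * height * 4);
    std::size_t mismatches = 0;
    for (std::size_t p = 0; p < width * height; ++p) {
        const unsigned char* px = &rgba[p * 4];
        if (px[0] != expected[0] || px[1] != expected[1] ||
            px[2] != expected[2] || px[3] != expected[3]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For the 720x486 pbuffer used in this thread the buffer would hold 720*486*4 bytes; a non-zero count after a clear and read-back would suggest the offscreen surface is affected too.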

grmoc
04-09-03, 01:17 PM
Here it is

grmoc
04-21-03, 03:22 PM
Sorry to keep bringing this up, but is there any word on reproducing this problem?

casey
04-21-03, 03:42 PM
Can't reproduce the problem, though I'm using Gentoo and have removed many services from my OS that caused me to drop frames, as I too use Linux to try to run real-time graphics on the display. If crond or some other silly service pops up and takes CPU focus away from your program, it could suffer visually.

hope this helps.

here is my XF86 file if it helps: