condass 02-19-07 01:04 PM

Excessive RSS for Xorg
I am trying to debug a problem in which the RSS of the Xorg process holds constant for a long period of time, then starts growing until we are forced to restart the X server (the machine has 2GB and the X server will consume nearly all of it). We have not been able to reproduce the problem on our development or test systems, but have seen it in production (where the users are getting annoyed!).

When this last happened we ran xrestop and it did not report any significant memory usage (everything was < 50MB). We tried closing all of the running applications and the RSS for X was still 750MB!

The machine configurations are RHEL4U4 with the Nvidia driver 1.0-9746, and each has 6 heads supported on an NVS440 and NVS285.

Does anyone have any ideas for how I can debug this problem? I tried a stress test by starting "x11perf -all" on each of the six heads, and although there is some other kind of bug that causes Xorg to eventually become unresponsive (problem in X or in the Nvidia drivers??), I cannot reproduce the memory utilization problem. I looked at the output of Xorg.log (debuglevel 6) and the only notable entry was "not allocating video overlay" for the second, fourth, and sixth video devices. What does this mean?

Thanks in advance for any advice!

netllama 02-19-07 04:17 PM

Re: Excessive RSS for Xorg
I don't think that your interpretation of X's memory usage has any relationship to the X instability. I'd suggest reading the driver README's section titled "Why does X use so much memory?" and also:


condass 02-19-07 05:11 PM

Re: Excessive RSS for Xorg
I have read that portion of the README and I still believe that the X memory usage is growing. One of our systems is having the issue right now and the X11 RES size is 373MB, SHR size is 3Mbytes. I did a pmap on the "X" process and found a memory section that is 364056 Kbytes and is not marked shared (RHEL4 does not provide a "private" summary).

I looked at the same section in /proc/????/maps and found it marked as "p" for private.
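
For anyone who wants to repeat this check, here is a minimal Python sketch of the same lookup: it reads the text of /proc/<pid>/maps and lists the largest mappings whose permission field ends in "p" (private). The function name and the "[anonymous]" label are made up for illustration, and it assumes the usual maps layout of address range, permissions, offset, device, inode, and optional path.

```python
def largest_private_mappings(maps_text, top=5):
    """Return the `top` largest private mappings found in /proc/<pid>/maps text.

    Each result is (size_kb, perms, address_range, path). The helper name and
    the "[anonymous]" placeholder are illustrative, not part of any tool.
    """
    rows = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        addr_range, perms = fields[0], fields[1]
        # The last permission character is "p" (private) or "s" (shared)
        if not perms.endswith("p"):
            continue
        start, end = (int(x, 16) for x in addr_range.split("-"))
        path = fields[5].strip() if len(fields) > 5 else "[anonymous]"
        rows.append(((end - start) // 1024, perms, addr_range, path))
    rows.sort(reverse=True)  # largest size first
    return rows[:top]
```

To use it, read the file as root (for example `open("/proc/1234/maps").read()`, where 1234 stands in for the X server's PID) and print the returned rows. The sizes are address-space sizes, not resident sizes, so they are an upper bound on what each mapping contributes to RSS.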

condass 02-21-07 05:03 PM

Re: Excessive RSS for Xorg
Today I was able to attach gdb to the X process on a workstation experiencing the problem (this is somewhat difficult to arrange since these workstations are used 24x7). The memory section in question was over 300MB in size, marked private, and when I dumped out random blocks of the contents I could see no clear pattern. In the first 4MB of the section I did find the following strings:


and others, which are also in the nvidia driver. I am now trying to figure out my next step in diagnosing the problem.

AaronP 02-21-07 08:28 PM

Re: Excessive RSS for Xorg
Are your users running Firefox for long periods of time or with a lot of open tabs? That's the #1 reason for my home machine to cause X to suck up all my memory. Your best bet might be to start killing X clients one by one, starting with Firefox, until the memory usage goes down to see what's using so many resources.

condass 02-21-07 09:09 PM

Re: Excessive RSS for Xorg
My users are using firefox, but closing applications does not significantly reduce the amount of memory used. In fact, when I did my gdb examination we had closed every user application accessing the X server! We also verified via xrestop that the X server did not show any user applications running.

I am now thinking through the hypothesis that one of our applications is issuing some X calls with parameters that are causing the nvidia driver, or X itself, to leak memory. If this is the case, then we do not have sufficient test cases in our test environment to reproduce the exact conditions that expose this problem.
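
One way to narrow this down on a production machine would be to log the X server's RSS with timestamps, so the moment growth starts can be matched against what the applications were doing. The following is only a rough Python sketch under that assumption: the function names and the log file name are hypothetical, and it relies on the VmRSS line in /proc/<pid>/status.

```python
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:    381952 kB"
            return int(line.split()[1])
    return None


def log_rss(pid, interval_s=60, samples=None, out_path="xorg_rss.log"):
    """Append "timestamp rss_kb" lines for `pid` every `interval_s` seconds.

    Runs until interrupted when `samples` is None. The output path is an
    arbitrary choice for this sketch.
    """
    count = 0
    with open(out_path, "a") as out:
        while samples is None or count < samples:
            with open("/proc/%d/status" % pid) as f:
                rss = parse_vm_rss_kb(f.read())
            out.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), rss))
            out.flush()  # keep the log usable even if X has to be restarted
            count += 1
            time.sleep(interval_s)
```

Left running in the background, the log would show the flat period and the start of the growth, which can then be compared against application logs from the same time window.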

condass 02-24-07 08:33 AM

Re: Excessive RSS for Xorg
Today I ran Xorg under strace and cannot see the mmap operations for the memory section that is growing (I can see the mmaps for the section right before it, though). From this I conclude that it is the nvidia device driver that is allocating this memory. Since I think I have exhausted what I can run down, I think we will get an ATI FireMV board and see if the problem still happens.
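
For reference, this kind of check against a saved strace log can be sketched in Python as below. It pulls out the memory-related system calls and keeps the ones whose returned address falls inside a given range taken from /proc/<pid>/maps. The helper names are made up, the regular expression assumes the usual "call(args) = result" layout of strace output, and brk and mremap are included only so that growth of an existing region would not be missed; whether either applies here is an open question.

```python
import re

# Matches lines such as:
#   mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c00000
#   brk(0x9a3c000) = 0x9a3c000
CALL_RE = re.compile(r"\b(mmap2?|old_mmap|munmap|mremap|brk)\((.*)\)\s+=\s+(\S+)")


def memory_calls(strace_text):
    """Return (call, args, result) tuples for memory-related syscalls."""
    calls = []
    for line in strace_text.splitlines():
        m = CALL_RE.search(line)
        if m:
            calls.append((m.group(1), m.group(2), m.group(3)))
    return calls


def calls_returning_in_range(calls, start, end):
    """Keep calls whose hexadecimal return value lies in [start, end)."""
    hits = []
    for call, args, result in calls:
        if not result.startswith("0x"):
            continue  # e.g. munmap returning 0, or a failed call returning -1
        if start <= int(result, 16) < end:
            hits.append((call, args, result))
    return hits
```

If `calls_returning_in_range` comes back empty for the growing section's address range, that matches the observation above that no corresponding call shows up in the trace.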

