nV News Forums

 
 

nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
-   -   185.18.10 - Xid errors and hang during VDPAU video playback (http://www.nvnews.net/vbulletin/showthread.php?t=134058)

marcan 06-04-09 07:26 PM

185.18.10 - Xid errors and hang during VDPAU video playback
 
Basics:
- nVidia beta drivers 185.18.10
- GeForce 9700M GT
- Linux 2.6.29

After a while (30 minutes or so) playing smallish (576x320) xvid videos using the VDPAU output driver in mplayer, I'm getting stuttering followed by a hang. I know VDPAU doesn't accelerate xvid and mplayer is using a standard software codec - I still prefer the VDPAU output driver to Xv because it draws subtitles at full screen resolution.

I can kill mplayer remotely and X might or might not recover - if it does, I usually get corruption of a few desktop windows which resolves itself after moving them around to force a redraw. If it doesn't, I can send Xorg a SIGKILL and things go back to normal after it is automatically restarted by the login manager.

GPU core temperature was around 69C when I checked using nvidia-settings after restarting X following a crash. The fan was running. This looks like the kind of crash that comes from hardware/overheating issues, but I can't see presentation-only VDPAU being very GPU-heavy, and this is a stock laptop (Acer Aspire 8930G) which presumably shouldn't have heat issues using stock clocks and settings.

Here's nvidia-bug-report.log.gz and Xorg.0.log.old (which shows the Xorg log during the crashed session - the one in nvidia-bug-report is clean):
http://marcansoft.com/transf/nvidia-...u_crash.log.gz
http://marcansoft.com/transf/Xorg.0....u_crash.log.gz

The dmesg output in nvidia-bug-report has some superfluous stuff that you should ignore: the PM debug stuff (I have PM debug messages enabled because I plan on debugging a broken suspend/resume issue that isn't related to nvidia), the CAP_* and "Private value" stuff (Some time ago I messed with the ALSA driver to get the speaker routing corrected for my laptop and added some printk's which I've been too lazy to remove; also unrelated), and the USB connect/disconnect messages as I grabbed my iPhone which I was using as the remote terminal to kill -9 Xorg and/or mplayer.

Particularly interesting, though, is that pciehp seems to be trying to say that the nVidia card is being removed and reinserted from the (internal) PCIe bus during/after the errors (!).

If there is anything else I can do to help debug the issue please just ask.

Stephen Warren 06-04-09 09:22 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Does the issue still repro without the Option RegistryDwords entry in xorg.conf?

Stephen Warren 06-04-09 09:31 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Also, could you test without the X Composite extension enabled in xorg.conf? I suspect the issue won't repro then. If it does, the Xid messages should at least be different; could you paste them here?

Thanks.

marcan 06-04-09 11:37 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Reproduced without RegistryDwords. I'll try without Composite next.

marcan 06-05-09 09:50 AM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Can't seem to repro without Composite. I'll leave it running all night to confirm.

Stephen Warren 06-05-09 01:25 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
One more question: Are the pciehp messages you see correlated with when you see XID messages and VDPAU problems?

In other words, do you ever:
a) See pciehp messages indicating unplug/replug, while not using VDPAU
b) See pciehp messages indicating unplug/replug, using VDPAU (with composite enabled), but without problem
c) See VDPAU problems and/or XID messages, but no pciehp messages at the same time?

Is your GPU on a plugin MXM card? If so, is it fully seated in the slot? Are there any other indications of HW problems in your laptop?

Thanks.

marcan 06-05-09 09:16 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Grepping through old logs (including while I was running 180.xx and other driver versions), I see:
- a few rare scattered "Card present" messages with no matching "not present" messages. I was able to cause one of these by switching to a text console.
- some not present/present cycles when Xorg is manually killed/restarted
- this one time I had a hang with 180.44, although I don't remember what caused it (might have been the same vdpau thing or not):
Code:

May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card not present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card not present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:27:04 raider NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000
May 18 17:27:05 raider NVRM: Xid (0001:00): 16, Head 00000000 Count 0014a8ac
May 18 17:27:06 raider SysRq : SAK
May 18 17:27:06 raider SAK: killed process 21146 (X): task_session_nr(p)==tty->session
May 18 17:27:06 raider SAK: killed process 21146 (X): task_session_nr(p)==tty->session
May 18 17:27:07 raider /usr/sbin/gpm[7139]: *** info [mice.c(1988)]:
May 18 17:27:07 raider /usr/sbin/gpm[7139]: imps2: Auto-detected intellimouse PS/2
May 18 17:27:07 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:27:08 raider 1.3.1: FATAL: CXWindowsScreen.cpp,1590: X display has unexpectedly disconnected
May 18 17:27:08 raider kdm[7521]: X server for display :0 terminated unexpectedly
May 18 17:27:10 raider kdm: :0[21151]: pam_unix(kde:session): session closed for user marcansoft

Looking at yesterday's logs, I don't see a strong pattern. There are pciehp replugs interspersed with Xid errors. Sometimes there's a Xid, I restart, a pciehp cycle, then 20 minutes later another Xid and hang. Sometimes there's a pciehp and then an immediate Xid. Sometimes there's a Xid and then an immediate pciehp. So it looks like it's correlated, but not entirely clear.

I guess that means:
a) Yes, rarely alone, often when killing/restarting Xorg
b) Yes, but usually a problem happens soon thereafter
c) I can't find any of the Xid messages from yesterday that didn't have some pciehp message relatively nearby, but it's not entirely clear.

It's too erratic to make any solid conclusions as far as I can tell :(
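For anyone who wants to repeat this kind of check on their own logs, here is a rough Python sketch of the correlation described above: it pulls the pciehp and Xid lines out of a syslog-style file and reports, for each Xid, whether a pciehp message occurred within a chosen time window. The function names, the 60-second window and the hard-coded year are assumptions made for illustration (syslog timestamps carry no year), and the line matching is based only on the log excerpt quoted above, so other kernel or driver versions may format these messages differently.

```python
import re
from datetime import datetime, timedelta

# Matches "May 18 17:26:55 raider <message>" as seen in the excerpt above.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")


def parse_events(lines, year=2009):
    """Return (timestamp, kind) tuples for pciehp slot and NVRM Xid lines.

    The year is an assumption: classic syslog timestamps omit it.
    Month names are parsed with %b, which assumes an English/C locale.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, msg = m.groups()
        if "NVRM: Xid" in msg:
            kind = "xid"
        elif msg.startswith("pciehp") and "Slot(" in msg:
            kind = "pciehp"
        else:
            continue  # unrelated message (gpm, kdm, SysRq, ...)
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, kind))
    return events


def xids_near_pciehp(events, window_seconds=60):
    """For each Xid event, report whether any pciehp event is within the window."""
    window = timedelta(seconds=window_seconds)
    hotplug_times = [t for t, kind in events if kind == "pciehp"]
    result = []
    for t, kind in events:
        if kind != "xid":
            continue
        near = any(abs(t - h) <= window for h in hotplug_times)
        result.append((t, near))
    return result
```

Typical use would be something like `xids_near_pciehp(parse_events(open("/var/log/messages")))`, where the log path is again an assumption that depends on the distribution. A result full of `True` entries would only show that the two kinds of message tend to appear together, not which one causes the other.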

Yes, the GPU is on an MXM card. This is a near stock laptop and quite new (bought it late 2008) - the only thing I've done to it is add a second hard drive (there's a vacant spot on this configuration; other configurations have it stock). I can see the MXM card when I take off the single "user-serviceable" cover (which uncovers a good part of the bottom of the laptop, including HDDs and RAM) but I can't remove the MXM card this way, as I'd have to take off the rest of the laptop's bottom case to be able to remove the thermal system on top of it. I haven't done or attempted to do anything to the card.

The laptop is stable as far as hardware is concerned, so far. I've had a few Xorg crashes over the past few months, but nothing worrisome. This is the first time I've had multiple repeatable crashes traceable to something in particular. Of note: I upgraded to 185.18.10 because it fixed the PowerMizer issues. Prior drivers never went beyond levels 0-1, while this one is able to switch all the way up to 3, and sometimes does while using VDPAU in this manner. So I guess it is possible that the increased GPU clocking at the higher PowerMizer levels is uncovering some stability issues. Or maybe the issue occurs when switching PowerMizer levels. On the other hand, unless Acer screwed up or this laptop is defective, there's no reason why there would be a hardware-caused stability issue.

I'm leaving the video looping all night now. No RegistryDwords, no Composite.

marcan 06-06-09 08:48 AM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Confirmed stable with no Composite.

Stephen Warren 06-08-09 12:08 PM

Re: 185.18.10 - Xid errors and hang during VDPAU video playback
 
Interesting. Could I ask you to perform another test to determine whether this is a regression? Please re-enable Composite so that the bug shows up, then test the following two drivers:

185.13 (doesn't contain sync-to-VBLANK for blit-based presentation queue)
185.19 (does contain sync-to-VBLANK for blit-based presentation queue) (older than 185.18.*)

and see whether those versions have the issue.

Thanks very much.



All times are GMT -5.
