Old 06-04-09, 07:26 PM   #1
marcan
Registered User
 
Join Date: Apr 2009
Posts: 10
Default 185.18.10 - Xid errors and hang during VDPAU video playback

Basics:
- nVidia beta drivers 185.18.10
- GeForce 9700M GT
- Linux 2.6.29

After a while (30 minutes or so) playing smallish (576x320) xvid videos using the VDPAU output driver in mplayer, I'm getting stuttering followed by a hang. I know VDPAU doesn't accelerate xvid and mplayer is using a standard software codec - I still prefer the VDPAU output driver to Xv because it draws subtitles at full screen resolution.

I can kill mplayer remotely and X might or might not recover - if it does, I usually get corruption of a few desktop windows which resolves itself after moving them around to force a redraw. If it doesn't, I can send Xorg a SIGKILL and things go back to normal after it is automatically restarted by the login manager.

GPU core temperature was around 69C when I checked using nvidia-settings after restarting X following a crash. The fan was running. This looks like the kind of crash that comes from hardware/overheating issues, but I can't see presentation-only VDPAU being very GPU-heavy, and this is a stock laptop (Acer Aspire 8930G) which presumably shouldn't have heat issues using stock clocks and settings.

Here's nvidia-bug-report.log.gz and Xorg.0.log.old (which shows the Xorg log during the crashed session - the one in nvidia-bug-report is clean):
http://marcansoft.com/transf/nvidia-...u_crash.log.gz
http://marcansoft.com/transf/Xorg.0....u_crash.log.gz

The dmesg output in nvidia-bug-report has some superfluous stuff that you should ignore: the PM debug stuff (I have PM debug messages enabled because I plan on debugging a broken suspend/resume issue that isn't related to nvidia), the CAP_* and "Private value" stuff (Some time ago I messed with the ALSA driver to get the speaker routing corrected for my laptop and added some printk's which I've been too lazy to remove; also unrelated), and the USB connect/disconnect messages as I grabbed my iPhone which I was using as the remote terminal to kill -9 Xorg and/or mplayer.

Particularly interesting, though, is that pciehp seems to be trying to say that the nVidia card is being removed and reinserted from the (internal) PCIe bus during/after the errors (!).

If there is anything else I can do to help debug the issue please just ask.
Old 06-04-09, 09:22 PM   #2
Stephen Warren
Moderator
 
Join Date: Aug 2005
Posts: 1,327
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Does the issue still repro without the Option RegistryDwords entry in xorg.conf?
Old 06-04-09, 09:31 PM   #3
Stephen Warren
Moderator
 
Join Date: Aug 2005
Posts: 1,327
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Also, could you test without the X Composite extension enabled in xorg.conf? I suspect the issue won't repro then. If it does, the Xid messages should at least be different; could you paste them here?
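
For reference, Composite is usually turned off with an Extensions section in xorg.conf. This is a minimal sketch, assuming the file has no Extensions section yet (if it does, the Option line goes inside the existing one):

```
Section "Extensions"
    Option "Composite" "Disable"
EndSection
```

X has to be restarted for the change to take effect.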

Thanks.
Old 06-04-09, 11:37 PM   #4
marcan
Registered User
 
Join Date: Apr 2009
Posts: 10
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Reproduced without RegistryDwords. I'll try without Composite next.
Old 06-05-09, 09:50 AM   #5
marcan
Registered User
 
Join Date: Apr 2009
Posts: 10
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Can't seem to repro without Composite. I'll leave it running all night to confirm.
Old 06-05-09, 01:25 PM   #6
Stephen Warren
Moderator
 
Join Date: Aug 2005
Posts: 1,327
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

One more question: Are the pciehp messages you see correlated with when you see XID messages and VDPAU problems?

In other words, do you ever:
a) See pciehp messages indicating unplug/replug, while not using VDPAU
b) See pciehp messages indicating unplug/replug, using VDPAU (with composite enabled), but without problem
c) See VDPAU problems and/or XID messages, but no pciehp messages at the same time?

Is your GPU on a plugin MXM card? If so, is it fully seated in the slot? Are there any other indications of HW problems in your laptop?

Thanks.
Old 06-05-09, 09:16 PM   #7
marcan
Registered User
 
Join Date: Apr 2009
Posts: 10
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Grepping through old logs (including while I was running 180.xx and other driver versions), I see:
- a few rare scattered "Card present" messages with no matching "not present" messages. I was able to cause one of these by switching to a text console.
- some not present/present cycles when Xorg is manually killed/restarted
- this one time I had a hang with 180.44, although I don't remember what caused it (might have been the same vdpau thing or not):
Code:
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card not present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card not present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:27:04 raider NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000
May 18 17:27:05 raider NVRM: Xid (0001:00): 16, Head 00000000 Count 0014a8ac
May 18 17:27:06 raider SysRq : SAK
May 18 17:27:06 raider SAK: killed process 21146 (X): task_session_nr(p)==tty->session
May 18 17:27:06 raider SAK: killed process 21146 (X): task_session_nr(p)==tty->session
May 18 17:27:07 raider /usr/sbin/gpm[7139]: *** info [mice.c(1988)]:
May 18 17:27:07 raider /usr/sbin/gpm[7139]: imps2: Auto-detected intellimouse PS/2
May 18 17:27:07 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:27:08 raider 1.3.1: FATAL: CXWindowsScreen.cpp,1590: X display has unexpectedly disconnected
May 18 17:27:08 raider kdm[7521]: X server for display :0 terminated unexpectedly
May 18 17:27:10 raider kdm: :0[21151]: pam_unix(kde:session): session closed for user marcansoft

Looking at yesterday's logs, I don't see a strong pattern. There are pciehp replugs interspersed with Xid errors. Sometimes there's an Xid, I restart, there's a pciehp cycle, then 20 minutes later another Xid and hang. Sometimes there's a pciehp and then an immediate Xid. Sometimes there's an Xid and then an immediate pciehp. So it looks like they're correlated, but it's not entirely clear.

I guess that means:
a) Yes, rarely alone, often when killing/restarting Xorg
b) Yes, but usually a problem happens soon thereafter
c) I can't find any of the Xid messages from yesterday that didn't have some pciehp message relatively nearby, but it's not entirely clear.

It's too erratic to draw any solid conclusions, as far as I can tell.
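
One way to make the "is it correlated?" question less anecdotal would be to script the log check. The following is only a rough sketch, not something that was run for this thread: it assumes classic syslog lines in the format shown in the excerpt above, and the function names, the 60-second window, and the `year` argument (syslog timestamps carry no year) are all illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Matches the classic syslog prefix "May 18 17:26:55 host rest-of-message".
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse_events(lines, year=2009):
    """Return (timestamp, kind) pairs, where kind is 'pciehp' or 'xid'."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, _host, msg = m.groups()
        if msg.startswith("pciehp ") and "present on Slot" in msg:
            kind = "pciehp"  # covers both "Card present" and "Card not present"
        elif msg.startswith("NVRM: Xid"):
            kind = "xid"
        else:
            continue
        # Syslog omits the year, so the caller supplies it (assumption).
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, kind))
    return events


def xids_near_pciehp(events, window_seconds=60):
    """Split Xid timestamps into those with/without a pciehp event in the window."""
    window = timedelta(seconds=window_seconds)
    hotplug = [t for t, k in events if k == "pciehp"]
    near, isolated = [], []
    for t, k in events:
        if k != "xid":
            continue
        if any(abs(t - h) <= window for h in hotplug):
            near.append(t)
        else:
            isolated.append(t)
    return near, isolated


sample = """\
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card not present on Slot(1)
May 18 17:26:55 raider pciehp 0000:00:01.0:pcie02: Card present on Slot(1)
May 18 17:27:04 raider NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000
May 18 17:27:05 raider NVRM: Xid (0001:00): 16, Head 00000000 Count 0014a8ac
""".splitlines()

near, isolated = xids_near_pciehp(parse_events(sample))
print(len(near), len(isolated))  # prints: 2 0
```

Any Xid that lands in `isolated` would be a case (c) candidate, i.e. a VDPAU/Xid problem with no pciehp message nearby. The window size is a judgment call; a few different values are worth trying before reading much into the split.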

Yes, the GPU is on an MXM card. This is a near stock laptop and quite new (bought it late 2008) - the only thing I've done to it is add a second hard drive (there's a vacant spot on this configuration; other configurations have it stock). I can see the MXM card when I take off the single "user-serviceable" cover (which uncovers a good part of the bottom of the laptop, including HDDs and RAM) but I can't remove the MXM card this way, as I'd have to take off the rest of the laptop's bottom case to be able to remove the thermal system on top of it. I haven't done or attempted to do anything to the card.

The laptop is stable as far as hardware is concerned, so far. I've had a few Xorg crashes over the past few months, but nothing worrisome. This is the first time I've had multiple repeatable crashes traceable to something in particular. Of note: I upgraded to 185.18.10 because it fixed the PowerMizer issues. Prior drivers never went beyond levels 0-1, while this one is able to switch all the way up to 3, and sometimes does while using VDPAU in this manner. So I guess it is possible that the increased GPU clocking at the higher PowerMizer levels is uncovering some stability issues. Or maybe the issue occurs when switching PowerMizer levels. On the other hand, unless Acer screwed up or this laptop is defective, there's no reason why there would be a hardware-caused stability issue.

I'm leaving the video looping all night now. No RegistryDwords, no Composite.
Old 06-06-09, 08:48 AM   #8
marcan
Registered User
 
Join Date: Apr 2009
Posts: 10
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Confirmed stable with no Composite.
Old 06-08-09, 12:08 PM   #9
Stephen Warren
Moderator
 
Join Date: Aug 2005
Posts: 1,327
Default Re: 185.18.10 - Xid errors and hang during VDPAU video playback

Interesting. Could I ask you to perform another test to determine whether this is a regression? Please re-enable Composite so that the bug shows up, then test the following two drivers:

185.13 (doesn't contain sync-to-VBLANK for blit-based presentation queue)
185.19 (does contain sync-to-VBLANK for blit-based presentation queue) (older than 185.18.*)

and see whether those versions have the issue.

Thanks very much.

Thanks.

All times are GMT -5.