Old 12-26-08, 10:05 AM   #73
kernelOfTruth
Gentoo Linux addict
 
Join Date: Nov 2007
Location: Vienna, Austria; Germany; hello world :)
Posts: 202
Default Re: 180.* graphical corruption and freezes system

Quote:
Originally Posted by rockman1981
Could you check whether the option "AddARGBGLXVisuals" is enabled in your xorg.conf?
And if so, try disabling it?
Maybe I'm wrong again, but it seems a bit more stable now.

Yes, it was enabled.

Thanks, rockman. I'm currently trying it with that setting, backing store and some other stuff disabled.
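
For anyone who wants to check the setting before editing xorg.conf by hand, here is a minimal Python sketch. The function name `read_option` and the sample text are made up for illustration. It only handles the simple `Option "Name" "Value"` form, assumes that an option written without a value counts as enabled, and does not handle `No`-prefixed option names.

```python
import re

# Matches lines such as:  Option "AddARGBGLXVisuals" "True"
# The quoted value is optional.
OPTION_RE = re.compile(r'^\s*Option\s+"([^"]+)"(?:\s+"([^"]*)")?', re.IGNORECASE)

# Assumption: these are the spellings treated as "enabled".
TRUE_WORDS = {"1", "on", "true", "yes"}


def read_option(conf_text, name):
    """Return True/False for the last uncommented occurrence of `name`, or None if absent."""
    state = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        m = OPTION_RE.match(line)
        if m and m.group(1).lower() == name.lower():
            value = m.group(2)
            state = True if value is None else value.strip().lower() in TRUE_WORDS
    return state


# Hypothetical xorg.conf fragment, not taken from anyone's real config.
SAMPLE = '''
Section "Screen"
    Identifier "Screen0"
    Option "AddARGBGLXVisuals" "True"
EndSection
'''

print(read_option(SAMPLE, "AddARGBGLXVisuals"))
```

To test against a real file, read it with `open("/etc/X11/xorg.conf").read()` and pass the text in; that path is the usual location but may differ on your distribution.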
__________________
2.6.30-rc3-zen0+ w. compcache, reiser4 + ccreg40 (lzo-compression + checksumming)
gcc version 4.4.0-pre9999 built 20090425 (prerelease) rev. 146764 (Gentoo SVN)
gcc version 4.4.0 (Gentoo Hardened 4.4.0-r1, espf-0.2.1)
Ubuntu Jaunty/9.04 x86_64 Dell XPS M1330
Old 12-26-08, 11:30 AM   #74
DeepThought
Registered User
 
Join Date: Sep 2007
Posts: 14
Default Re: 180.* graphical corruption and freezes system

I also have AddARGBGLXVisuals enabled. I will disable it and see if it happens again.

I updated to 180.16, and it happened again (no surprise, since the changelog made no mention of a fix for this problem). This time it was a bit different, though. I could still move the pointer with my mouse, and I could see the (corrupted) pointer move on screen. Nothing happened when I tried to do anything else (e.g. right-clicking on the desktop). Then it froze completely, so I pressed the power button, and my Caps Lock and Scroll Lock (?) LEDs started flashing, which means a kernel panic, right?

I also found the following in my logs:

/var/log/syslog
Code:
Dec 26 17:56:17 TheGuide kernel: [16689.189129] NVRM: Xid (0001:00): 6, PE007e 
Dec 26 17:56:24 TheGuide kernel: [16696.184347] NVRM: Xid (0001:00): 8, Channel 00000001
Dec 26 17:56:57 TheGuide kernel: [16728.757547] NVRM: Xid (0001:00): 6, PE0001 
Dec 26 17:56:57 TheGuide kernel: [16728.847453] NVRM: Xid (0001:00): 6, PE0001 
Dec 26 17:56:57 TheGuide kernel: [16728.928183] NVRM: Xid (0001:00): 6, PE0001 
Dec 26 17:56:57 TheGuide kernel: [16729.011586] NVRM: Xid (0001:00): 6, PE0001 
Dec 26 17:56:57 TheGuide kernel: [16729.095120] NVRM: Xid (0001:00): 6, PE0001 
Dec 26 17:56:57 TheGuide kernel: [16729.177785] NVRM: Xid (0001:00): 6, PE0001
/var/log/Xorg.0.log.old
Code:
(II) NVIDIA(0): Initialized GPU GART.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.

[...]

tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0):  *** Aborting ***
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0):  *** Aborting ***

[...]

(II) NVIDIA(0): Initialized GPU GART.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.
tossed event which came in late
mieqEnequeue: out-of-order valuator event; dropping.

[...]
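
To compare excerpts like the syslog one above between posters, it can help to count the NVRM lines per Xid number. The following is only a sketch: `tally_xids` is a made-up name, the regular expression is inferred from the lines quoted in this thread rather than from any documented format, and what the value in parentheses refers to is an assumption (it looks like a bus:slot location). It says nothing about what each Xid number means.

```python
import re
from collections import Counter

# Matches e.g. "NVRM: Xid (0001:00): 6, PE0001"
XID_RE = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(log_text):
    """Count how often each (location, Xid number) pair appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = XID_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


# Three lines copied from the syslog excerpt in this post.
SAMPLE = """\
Dec 26 17:56:17 TheGuide kernel: [16689.189129] NVRM: Xid (0001:00): 6, PE007e
Dec 26 17:56:24 TheGuide kernel: [16696.184347] NVRM: Xid (0001:00): 8, Channel 00000001
Dec 26 17:56:57 TheGuide kernel: [16728.757547] NVRM: Xid (0001:00): 6, PE0001
"""

for (location, xid), n in sorted(tally_xids(SAMPLE).items()):
    print(f"location {location}  Xid {xid}: {n} line(s)")
```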
Old 12-26-08, 01:29 PM   #75
pacnow
Registered User
 
Join Date: Dec 2008
Posts: 3
Default Re: 180.* graphical corruption and freezes system

I'm experiencing the problems mentioned in this thread. I was able to work around them by switching to single-channel memory mode. This is a workaround, not a solution, and the problem shouldn't be considered fixed.

After I log in, the desktop appears normal. As I change what's displayed on the screen (opening menus), different kinds of distortion appear: grayscale shadows behind objects, traces of windows left behind and not refreshed, and complete lockups. I opened the NVIDIA settings manager, and the GPU error count went up with each flicker of the screen. I started glxgears and the system froze. Alt + SysRq + REISUB worked to reboot the system. I started a Memtest and it hung, but the little plus sign continued to blink.

When I changed my system to 1 GB in single-channel mode, everything worked without a hitch.

Hardware:

CPU : 2.66 GHz Intel Quad Core QX6700
MOTHERBOARD : MSI P6N Diamond SLI
RAM : 2x 1 GB Corsair 1066 MHz, dual channel
HARD DRIVES : 2x SATA
POWER SUPPLY : BFG 1 kW
VIDEO CARD : 1x EVGA NVIDIA 8800 GTX

Software:

Ubuntu 8.10 Intrepid
Linux silvercross 2.6.27-9-generic #1 SMP Thu Nov 20 21:57:00 UTC 2008 i686 GNU/Linux
Tried NVIDIA drivers 177 and 180; 180 seemed to be a bit more stable.


Kernel Log
Code:
Dec 25 21:52:25 silvercross kernel: [  142.298197] NVRM: Xid (000a:00): 13, 0001 00000000 0000502d 00000800 0004f21c 0000000c
Dec 25 21:52:25 silvercross kernel: [  142.298855] NVRM: Xid (000a:00): 13, 0001 00000000 0000502d 00000800 0004f21c 0000000c
Dec 25 21:52:29 silvercross kernel: [  146.509584] NVRM: Xid (000a:00): 13, 0001 00000000 00005097 000015e0 00000000 00000040
Dec 25 21:52:52 silvercross kernel: [  169.339150] NVRM: Xid (000a:00): 6, PE0001 
Dec 25 21:52:52 silvercross kernel: [  169.382107] NVRM: Xid (000a:00): 3, C 00000001 SC 00000001 M 00001e00 Data 00000000
Dec 25 21:52:52 silvercross kernel: [  169.467893] NVRM: Xid (000a:00): 6, PE0001 
Dec 25 21:52:52 silvercross kernel: [  169.574361] NVRM: Xid (000a:00): 6, PE0001 
Dec 25 21:52:52 silvercross kernel: [  169.603103] NVRM: Xid (000a:00): 3, C 00000001 SC 00000002 M 00000560 Data ff0b30d9
Dec 25 21:53:42 silvercross kernel: [  219.752050] rpcbind: server hermes.pacnow not responding, timed out
Dec 25 21:57:56 silvercross kernel: [  473.613605] NVRM: Xid (000a:00): 3, C 00000001 SC 00000001 M 00000a60 Data ff433b30
Dec 25 21:58:24 silvercross kernel: [  501.638264] NVRM: Xid (000a:00): 13, 0001 00000000 0000502d 00000104 0000ff00 00000005
Dec 25 21:58:24 silvercross kernel: [  501.638914] NVRM: Xid (000a:00): 13, 0001 00000000 0000502d 00000104 0000ff00 00000005
Dec 25 21:58:25 silvercross kernel: [  502.054081] NVRM: Xid (000a:00): 6, PE0003 
Dec 25 21:58:55 silvercross kernel: [  532.298144] NVRM: Xid (000a:00): 6, PE0001 
Dec 25 21:58:55 silvercross kernel: [  532.323545] NVRM: Xid (000a:00): 3, C 00000001 SC 00000002 M 00001a60 Data ff593630
Dec 25 21:58:55 silvercross kernel: [  532.427775] NVRM: Xid (000a:00): 3, C 00000001 SC 00000001 M 00001960 Data ffa06c57
Dec 25 21:58:56 silvercross kernel: [  533.118786] NVRM: Xid (000a:00): 6, PE0001
Xorg.0.log
Code:
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
Old 12-26-08, 05:12 PM   #76
pacnow
Registered User
 
Join Date: Dec 2008
Posts: 3
Default Re: 180.* graphical corruption and freezes system

I got dual-channel memory working properly. I had been overclocking the RAM because it was supposed to run at 1066 MHz. Then, when I pulled out a stick, the system wouldn't boot at all and gave memory error codes. I understand this is because one stick can't function at 1066 MHz, and after a few unsuccessful reboots the BIOS reset to the default 800 MHz. I put the RAM back in dual channel at 800 MHz and my system is functional again.
Old 12-29-08, 04:47 PM   #77
Rudd-O
Registered User
 
Join Date: Dec 2008
Posts: 21
Default Re: 180.* graphical corruption and freezes system

I disabled the cgroup scheduler when I recompiled the kernel. I'm now running vanilla 2.6.28. Even with the cgroup scheduler disabled, running N glxgears instances + compiz now causes X to segfault:

----------------

NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
NVRM: Xid (0003:00): 12, COCOD 00000006 beef5097 00008297 00001aac 00000000
X[9473]: segfault at 110 ip 00000000005491f4 sp 00007fff7b0e9970 error 4 in Xorg[400000+1aa000]

-----------------------

I have also lost the ability to switch to a VT now, which means I have to reboot!
Old 12-30-08, 06:33 AM   #78
olivn
Registered User
 
Join Date: Dec 2005
Posts: 11
Default Re: 180.* graphical corruption and freezes system

Freeze (180.18) while running google-earth (Qt4 app) under GNOME/Compiz.

The X process was running at 100% CPU and most of the tasks were stuck in D state (syslog, cat /proc/driver/nvidia/cards/0, ...). I was able to ssh into the system (Linux am2 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux) and capture some info.

It seems that the Xid output is truncated:
Code:
NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 0000060c 000001b3 00000100
NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 000008dc 00000000 00000100
NVRM: Xid (0001:00): 6, PE0000
SysRq : Show Blocked State
Code:
compiz        D 0000000000000001     0  5808   5775
 ffff8800d5441cb8 0000000000000086 ffff88000001a808 ffffffff80a59980
 ffffffff80a59980 ffffffff80a567f0 ffffffff80a59980 ffffffff80a59980
 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980
Call Trace:
 [<ffffffff804a2cfb>] schedule_timeout+0x1e/0xad
 [<ffffffff804a230f>] wait_for_common+0xe9/0x16f
 [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]
 [<ffffffffa057e671>] _nv004414rm+0x9/0xe [nvidia]
DWARF2 unwinder stuck at _nv004414rm+0x9/0xe [nvidia]

Leftover inexact backtrace:

 [<ffffffffa0507cd3>] _nv005988rm+0x74/0x10d [nvidia]
 [<ffffffffa03883a5>] _nv003189rm+0x225/0x6fd [nvidia]
 [<ffffffffa0586923>] rm_ioctl+0x2f/0x67 [nvidia]
 [<ffffffffa0644c08>] nv_kern_ioctl+0x307/0x369 [nvidia]
 [<ffffffff804a2af6>] thread_return+0x3a/0xd5
 [<ffffffffa0644ca7>] nv_kern_unlocked_ioctl+0x1c/0x21 [nvidia]
 [<ffffffff802c70b9>] vfs_ioctl+0x21/0x6c
 [<ffffffff802c7343>] do_vfs_ioctl+0x23f/0x255
 [<ffffffff802c73aa>] sys_ioctl+0x51/0x73
 [<ffffffff8020c37a>] system_call_fastpath+0x16/0x1b

googleearth-b D 0000000000000001     0  7971   5747
 ffff8800a995db98 0000000000000046 00000001009bd036 ffffffff80a59980
 ffffffff80a59980 ffffffff80a567f0 ffffffff80a59980 ffffffff80a59980
 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980
Call Trace:
 [<ffffffff804a2cfb>] schedule_timeout+0x1e/0xad
 [<ffffffff804a230f>] wait_for_common+0xe9/0x16f
 [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]
 [<ffffffffa057e671>] _nv004414rm+0x9/0xe [nvidia]
DWARF2 unwinder stuck at _nv004414rm+0x9/0xe [nvidia]

Leftover inexact backtrace:

 [<ffffffffa0586c01>] rm_free_unused_clients+0x7f/0xe1 [nvidia]
 [<ffffffff804a3eb1>] _spin_lock_irqsave+0x2e/0x35
 [<ffffffffa06446f3>] nv_kern_ctl_close+0x92/0xc7 [nvidia]
 [<ffffffff802bbe00>] __fput+0xa1/0x165
 [<ffffffff802b949b>] filp_close+0x5b/0x62
 [<ffffffff80242f8d>] put_files_struct+0x65/0xc4
 [<ffffffff80244bac>] do_exit+0x235/0x334
 [<ffffffff80244d2c>] do_group_exit+0x81/0xad
 [<ffffffff8024df65>] get_signal_to_deliver+0x33e/0x375
 [<ffffffff8020bc7b>] do_signal+0x66/0x190
 [<ffffffff80239719>] default_wake_function+0x0/0xe
 [<ffffffff80260c7a>] do_futex+0x51/0xcd
 [<ffffffff80260fe4>] compat_sys_futex+0xf5/0x113
 [<ffffffff8020bdc7>] do_notify_resume+0x22/0x43
 [<ffffffff8020c6f7>] int_signal+0x12/0x17
cat           D 0000000000000001     0 10094  10082
 ffff88005a097c58 0000000000000082 0000000000000001 ffffffff80a59980
 ffffffff80a59980 ffffffff80a567f0 ffffffff80a59980 ffffffff80a59980
 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980 ffffffff80a59980
Call Trace:
 [<ffffffff804a2cfb>] schedule_timeout+0x1e/0xad
 [<ffffffff804a230f>] wait_for_common+0xe9/0x16f
 [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]
 [<ffffffffa057e671>] _nv004414rm+0x9/0xe [nvidia]
DWARF2 unwinder stuck at _nv004414rm+0x9/0xe [nvidia]

Leftover inexact backtrace:

 [<ffffffffa0587e53>] rm_get_device_name+0x11f/0x1ad [nvidia]
 [<ffffffffa0646707>] nv_kern_read_cardinfo+0x77/0x2c0 [nvidia]
 [<ffffffff803008c8>] proc_file_read+0x0/0x20a
 [<ffffffff8030098e>] proc_file_read+0xc6/0x20a
 [<ffffffff803008c8>] proc_file_read+0x0/0x20a
 [<ffffffff802fc389>] proc_reg_read+0x9b/0xb5
 [<ffffffff802bb7d2>] vfs_read+0xaa/0x153
 [<ffffffff802bb937>] sys_read+0x45/0x6e
 [<ffffffff8020c37a>] system_call_fastpath+0x16/0x1b
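
The three blocked tasks above all appear to be waiting in the same place inside the nvidia module. A rough way to pull that out of a longer "Show Blocked State" dump is sketched below. The function name `first_module_frame` is invented, and both regular expressions are guesses based only on the layout of the trace in this post, so they may need adjusting for other kernels.

```python
import re

# Task header, e.g. "compiz        D 0000000000000001     0  5808   5775"
TASK_RE = re.compile(r"^(\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+\s*$")
# Stack frame, e.g. " [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def first_module_frame(trace_text, module="nvidia"):
    """Map 'task (pid)' to the first listed frame of that task that belongs to `module`."""
    result = {}
    current = None
    for line in trace_text.splitlines():
        task = TASK_RE.match(line)
        if task:
            current = f"{task.group(1)} ({task.group(2)})"
            continue
        frame = FRAME_RE.search(line)
        if frame and current and frame.group(2) == module and current not in result:
            result[current] = frame.group(1)
    return result


# Shortened copy of two of the tasks from the dump in this post.
SAMPLE = """\
compiz        D 0000000000000001     0  5808   5775
Call Trace:
 [<ffffffff804a2cfb>] schedule_timeout+0x1e/0xad
 [<ffffffff804a230f>] wait_for_common+0xe9/0x16f
 [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]
 [<ffffffffa057e671>] _nv004414rm+0x9/0xe [nvidia]
cat           D 0000000000000001     0 10094  10082
Call Trace:
 [<ffffffff804a2cfb>] schedule_timeout+0x1e/0xad
 [<ffffffff804a230f>] wait_for_common+0xe9/0x16f
 [<ffffffffa0647f6a>] os_acquire_sema+0x36/0x59 [nvidia]
"""

for task, symbol in first_module_frame(SAMPLE).items():
    print(f"{task}: {symbol}")
```

On the sample, both tasks report `os_acquire_sema` as the first frame inside the module, which matches what can be read off the full trace by eye.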
Old 01-02-09, 08:12 AM   #79
Rudd-O
Registered User
 
Join Date: Dec 2008
Posts: 21
Default Re: 180.* graphical corruption and freezes system

I cannot continue with this driver issue, guys. I made a further discovery. After the first 3D failure and those nasty NVRM messages, even once I've reverted to a non-compositing desktop and stopped using 3D apps, heavy disk activity seems to cause network and disk data corruption.

The network corruption makes it impossible for a download to complete without the file being corrupted. Some web pages come up with content-encoding errors, SSL sites do not load, SSH connections like ssh rudd-o.com cat /dev/zero disconnect after a while with a Corrupted MAC error, and rsync over SSH fails with the same error.

The disk corruption causes extraneous files named ??8888?8??? (randomly varied) to appear in recently modified directories. If prelinking is on, the prelinked files are likely to be corrupted and make the system fail.

I discovered this by running the memtest script along with glxgears and, simultaneously, ssh iphone cat /dev/zero. After a while, just before X froze, the Corrupted MAC error came up and I could no longer sustain SSH connections to anywhere. Removing the r8169 network driver and reloading the module caused a kernel oops.

The problem is simply NOT reproducible after HOURS of trying with the standard nv video driver. If you value your data, avoid the NVIDIA accelerated driver like the plague, at least until I can re-test with the new version and see whether these problems still happen.

Script: http://people.redhat.com/dledford/memtest
Old 01-02-09, 06:50 PM   #80
psychok9
Registered User
 
Join Date: Dec 2008
Posts: 55
Default Re: 180.* graphical corruption and freezes system

Strange: today I also got corruption in Wine + WoW (OpenGL) with 177.80...
Same as with 180.xx.

Old 01-02-09, 09:57 PM   #81
Rudd-O
Registered User
 
Join Date: Dec 2008
Posts: 21
Exclamation Re: 180.* graphical corruption and freezes system

Devs, we're dying here. Given the trouble I'm going to, I might as well not have bought the video card and just used the built-in one. I am willing to pursue this matter to the END, and give you ABSOLUTELY ANYTHING that you request in terms of debugging info, system specs, and more. But please, please, help us get a stable driver that does not corrupt filesystems or network I/O!
Old 01-03-09, 08:25 AM   #82
Rudd-O
Registered User
 
Join Date: Dec 2008
Posts: 21
Default Re: 180.* graphical corruption and freezes system

Latest test: self-compiled 2.6.28, no cgroups scheduler, voluntary preemption instead of a fully preemptible kernel, running the whole lot: compiz + glxgears + some other apps.

Everything seems to be running fine so far. Ten minutes in. No NVRM messages.

The network data corruption turned out to be a problem with receive checksumming in the network card driver: it was offloading the checksumming to the r8169 network card, and the card was SUCKING at it, but for some reason the problems with the video card exacerbated the issue. Solved by using ethtool to turn off all offload processing on the card.
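
The post does not say which offload features were switched off, so the following is only a sketch of what such an ethtool call could look like. `ethtool -K <iface> <feature> off` is the standard way to change offload settings; the feature list (`rx`, `tx`, `sg`, `tso`, `gso`), the interface name `eth0`, and the helper name `offload_off_command` are all assumptions chosen for illustration.

```python
import shlex


def offload_off_command(iface, features=("rx", "tx", "sg", "tso", "gso")):
    """Build an `ethtool -K` argument list that switches the given offloads off."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv


# "eth0" is a placeholder interface name; substitute your own.
cmd = offload_off_command("eth0")

# Only print the command here; running it needs root and changes NIC settings.
print(shlex.join(cmd))
```

To apply it, the list can be handed to `subprocess.run(cmd, check=True)` as root. `ethtool -k eth0` (lower-case k) shows the current offload state, which is worth checking before and after.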

The disk corruption problem is still a mystery. The memtest script reports no problems.

Update: 20 minutes in. No problems.

Apparently the driver does not get along with full kernel preemption. If anyone wants my .config, ping me.
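
For comparing kernel configs between posters, a small sketch like the one below can report which preemption model a .config selects. The symbol names (`CONFIG_PREEMPT_NONE`, `CONFIG_PREEMPT_VOLUNTARY`, `CONFIG_PREEMPT`, `CONFIG_GROUP_SCHED`, `CONFIG_CGROUP_SCHED`) are the Kconfig names as far as I know them for kernels of this era and should be treated as an assumption; the sample text is invented and is not the .config offered above.

```python
# Assumed Kconfig symbols for the three preemption models.
PREEMPT_MODELS = {
    "CONFIG_PREEMPT_NONE": "no forced preemption",
    "CONFIG_PREEMPT_VOLUNTARY": "voluntary preemption",
    "CONFIG_PREEMPT": "fully preemptible",
}
# Also watch the (assumed) group/cgroup scheduler switches.
WATCHED = tuple(PREEMPT_MODELS) + ("CONFIG_GROUP_SCHED", "CONFIG_CGROUP_SCHED")


def enabled_symbols(config_text):
    """Return the set of watched symbols that are set to 'y' in the .config text."""
    found = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # "# CONFIG_FOO is not set" lines count as disabled
        key, value = line.split("=", 1)
        if key in WATCHED and value == "y":
            found.add(key)
    return found


def preempt_model(config_text):
    """Describe the preemption model, or 'unknown' if none of the three is set."""
    enabled = enabled_symbols(config_text)
    for symbol, label in PREEMPT_MODELS.items():
        if symbol in enabled:
            return label
    return "unknown"


# Hypothetical .config fragment for illustration only.
SAMPLE = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_GROUP_SCHED is not set
"""

print(preempt_model(SAMPLE))
print(sorted(enabled_symbols(SAMPLE)))
```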
Old 01-03-09, 08:47 AM   #83
rockman1981
Registered User
 
Join Date: Dec 2008
Posts: 34
Default Re: 180.* graphical corruption and freezes system

IIRC, the first time I spotted this bug I had exactly that config (voluntary preemption), and I added RCU preempt later, as a "new test".
I will try again now, anyway.
This morning I also tried:
  • disabling 64bit resources
  • changing the "high memory" option (it was 64GB, I moved it to 4GB)
  • changing from "sparse memory" to "flat memory"
again, with no results at all.
As I said before, disabling composite in kwin made no difference: even in non-composite mode, I had the screen corruption.
But strangely, ALSO disabling composite in xorg.conf changed this behaviour: I had no crash and no corruption at all.

Another note: psychok9 posted that he had this problem on 177.xxx too.
I can't exactly confirm this: on 177.xxx I have no corruption at all, but I can confirm that there's something similar: I get the same "screen lockup" that I get in 180.xxx when it starts corrupting. The only differences are that the screen is just clear, and that in 180.xxx, when it locks up, sometimes it doesn't resume at all.

Update: I tried configuring the kernel with voluntary preemption.
First boot: no corruption at all, for about 20-30 minutes.
Reboot (with no configuration changes): total screen corruption, and the screen locked up after some minutes.
One difference between the two boots may be the applications started: the first time I disabled "auto-start" and just started a few applications manually; the second time I re-enabled all my usual applications... I'm starting to suspect Skype. Do you have it running when the screen gets corrupted?
Old 01-03-09, 04:17 PM   #84
olivn
Registered User
 
Join Date: Dec 2005
Posts: 11
Default Re: 180.* graphical corruption and freezes system

Quote:
Originally Posted by Rudd-O
Latest test: self-compiled 2.6.28, no cgroups scheduler, voluntary preemption instead of a fully preemptible kernel, running the whole lot: compiz + glxgears + some other apps.

I'm now running 2.6.28 with voluntary preemption and without cgroups.

After using google-earth for a few minutes, I get some Xid messages and a garbled display, but at least I can recover by restarting the X server.
Code:
NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 0000060c 000001b4 00000100
NVRM: Xid (0001:00): 13, 0005 00000000 00008297 00001310 00000000 00000040
NVRM: Xid (0001:00): 13, 0004 00000000 0000502d 00000104 00000001 00000100
NVRM: Xid (0001:00): 13, 0004 00000000 0000502d 00000104 00000001 00000100
NVRM: Xid (0001:00): 13, 0004 00000000 0000502d 00000104 00000001 00000100
NVRM: Xid (0001:00): 13, 0005 00000000 00008297 00001b0c 1000f010 00000040
NVRM: Xid (0001:00): 13, 0005 00000000 00008297 00001310 00000000 00000040
NVRM: Xid (0001:00): 13, 0005 00000000 00008297 00001310 00000000 00000040