nV News Forums > Linux Support Forums > NVIDIA Linux

03-29-05, 03:15 PM #1
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8
Linux 2.6 on x86_64 + nVIDIA Card Hard Lock Problem and Workaround

So I have been wrestling with a problem for quite some time now where my AMD64 FC3 box hard locks. The pattern of when this happens was very difficult to isolate. In my experience only one thing causes an immediate hard lock: hardware issues. After buying almost an entire new machine's worth of parts with no luck, I had given up. The only clues I had were that the freeze tended to happen when I was doing something more graphically active (2D or 3D) and that the fan speeds would ramp (up or down). This led me to believe it was a house power problem. I bought a new house and still no luck (actually I was going to buy the house anyway, but it sounds funnier to think that I would buy a house to solve a bug). At work I ordered a new machine with a similar config. It ran fine under RHEL3, but when I upgraded to RHEL4 the problem occurred on that machine too. Finally, a solid clue.

After a lot of testing and searching I discovered the pattern: if the CPU speed is scaled either up or down during high graphics activity, the box freezes. I found that if I disable cpuspeed (the CPU frequency scaling daemon), the problem goes away. I suspect that the nVIDIA driver has some hard-coded (or measured at init time) timing parameters that make it choke when the CPU changes clock rate. Also note that Xv sync to vblank is on; I do not have time to mess with that.

Aron
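To check the pattern described above (freezes that coincide with CPU clock changes), a small shell sketch like the following can log every frequency transition with a timestamp. It assumes the 2.6 cpufreq sysfs layout, where each CPU exposes scaling_cur_freq (in kHz) under /sys/devices/system/cpu/cpuN/cpufreq/. SYSFS_ROOT and the sample-count argument are hypothetical knobs added only for this example; this is a diagnostic aid, not a fix.

```shell
#!/bin/sh
# Sketch: log CPU frequency changes so freezes can be matched to clock
# transitions. Assumes the 2.6 cpufreq sysfs layout; SYSFS_ROOT is a
# hypothetical override (default is the real sysfs path).
SYSFS_ROOT=${SYSFS_ROOT:-/sys/devices/system/cpu}

# Print "cpuN=FREQ" for every CPU that exposes cpufreq, on one line.
read_freqs() {
    out=""
    for f in "$SYSFS_ROOT"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        cpu=$(basename "$(dirname "$(dirname "$f")")")
        out="$out $cpu=$(cat "$f")"
    done
    echo "${out# }"                      # drop the leading space
}

# watch_freqs INTERVAL MAX_SAMPLES: poll every INTERVAL seconds and print a
# timestamped line only when the frequencies change. MAX_SAMPLES=0 means
# run forever.
watch_freqs() {
    interval=${1:-1}
    max=${2:-0}
    prev=""
    n=0
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        cur=$(read_freqs)
        if [ "$cur" != "$prev" ]; then
            echo "$(date '+%H:%M:%S') $cur"
            prev=$cur
        fi
        n=$((n + 1))
        # Do not sleep after the final sample.
        if [ "$max" -ne 0 ] && [ "$n" -ge "$max" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, after sourcing the script, run watch_freqs 1 from an ssh session on a second machine: the lines already sent to the remote terminal survive a hard lock, whereas a log file on the frozen box may never get flushed to disk.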
03-29-05, 07:58 PM #2
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8

Oh, I forgot to mention that one of the reasons that I think it may have something to do with the video system is that the frame buffer would often get trashed the moment prior to a freeze.
03-29-05, 08:21 PM #3
chunkey
#!/?*
 
Join Date: Oct 2004
Posts: 662

Quote:
Originally Posted by AronRubin
Oh, I forgot to mention that one of the reasons that I think it may have something to do with the video system is that the frame buffer would often get trashed the moment prior to a freeze.
hmm, can you please run nvidia-bug-report.sh and attach the created log here?
03-29-05, 09:42 PM #4
atrlinux
Registered User
 
Join Date: Mar 2005
Posts: 9

NVIDIA driver freeze

Fedora Core 3 with all updates
AMD K8 3200+
Asus K8V-X
NVIDIA GeForce4 MX

A few minutes after boot, the system freezes (only the mouse moves).

Below is the log:

Mar 29 16:07:02 reis kernel: arch/x86_64/kernel/semaphore.c:65: spin_is_locked on uninitialized spinlock ffffffff8866c4b8.
Mar 29 16:07:02 reis kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Mar 29 16:07:02 reis kernel: <ffffffff803bab2d>{__down+237}
Mar 29 16:07:02 reis kernel: PGD 27936067 PUD 27908067 PMD 0
Mar 29 16:07:02 reis kernel: Oops: 0002 [1] PREEMPT
Mar 29 16:07:02 reis kernel: CPU 0
Mar 29 16:07:02 reis kernel: Modules linked in: parport_pc lp parport ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac nvidia md5 ipv6 uhci_hcd ehci_hcd emu10k1_gp gameport i2c_viapro i2c_core snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore sk98lin 8139too mii floppy ext3 jbd sata_via libata sd_mod scsi_mod
Mar 29 16:07:02 reis kernel: Pid: 5270, comm: grep Tainted: P 2.6.11.6
Mar 29 16:07:02 reis kernel: RIP: 0010:[<ffffffff803bab2d>] <ffffffff803bab2d>{__down+237}
Mar 29 16:07:02 reis kernel: RSP: 0018:ffff810027935cf8 EFLAGS: 00010002
Mar 29 16:07:02 reis kernel: RAX: ffff810027935d30 RBX: ffffffff8866c4b0 RCX: 0000000000000000
Mar 29 16:07:02 reis kernel: RDX: ffffffff8866c4e8 RSI: 0000000000000001 RDI: 0000000000000001
Mar 29 16:07:02 reis kernel: RBP: ffffffff8866c4b8 R08: ffff81003f4466c0 R09: 000000000000000f
Mar 29 16:07:02 reis kernel: R10: 00000000ffffffff R11: 0000000000000000 R12: ffff810034675810
Mar 29 16:07:02 reis kernel: R13: ffff810027935d18 R14: ffff81003710a6c0 R15: ffff81003710a6c0
Mar 29 16:07:02 reis kernel: FS: 00002aaaaaac6de0(0000) GS:ffffffff80556840(0000) knlGS:00000000557b7ee0
Mar 29 16:07:02 reis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 29 16:07:02 reis kernel: CR2: 0000000000000000 CR3: 000000002790a000 CR4: 00000000000006e0
Mar 29 16:07:02 reis kernel: Process grep (pid: 5270, threadinfo ffff810027934000, task ffff810034675810)
Mar 29 16:07:02 reis kernel: Stack: ffff81003e9cc8c8 ffffffff00000246 0000000000000005 0000000000000246
Mar 29 16:07:02 reis kernel: 0000000000000001 ffff810034675810 ffffffff80134ec0 ffffffff8866c4e8
Mar 29 16:07:02 reis kernel: 0000000000000000 0000000000000000
Mar 29 16:07:02 reis kernel: Call Trace:<ffffffff80134ec0>{default_wake_function+0} <ffffffff803bdc4c>{__down_failed+53}
Mar 29 16:07:02 reis kernel: <ffffffff802255f0>{selinux_inode_permission+0} <ffffffff8843e84b>{:nvidia:nv_printf+92}
Mar 29 16:07:02 reis kernel: <ffffffff8843c7e7>{:nvidia:.text.lock.nv+45} <ffffffff801aac19>{chrdev_open+1097}
Mar 29 16:07:02 reis kernel: <ffffffff8019bdfd>{dentry_open+301} <ffffffff8019bf7e>{filp_open+62}
Mar 29 16:07:02 reis kernel: <ffffffff8019c157>{get_unused_fd+455} <ffffffff801b139a>{getname+138}
Mar 29 16:07:02 reis kernel: <ffffffff8019c4ac>{sys_open+76} <ffffffff8010ed0a>{system_call+126}
Mar 29 16:07:02 reis kernel:
Mar 29 16:07:02 reis kernel:
Mar 29 16:07:02 reis kernel: Code: 48 89 01 48 89 48 08 ff 43 04 8b 43 04 ff c8 01 03 0f 98 c0
Mar 29 16:07:02 reis kernel: RIP <ffffffff803bab2d>{__down+237} RSP <ffff810027935cf8>
Mar 29 16:07:02 reis kernel: CR2: 0000000000000000
Mar 29 16:07:02 reis kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Mar 29 16:07:02 reis kernel: in_atomic():1, irqs_disabled():0
Mar 29 16:07:02 reis kernel:
Mar 29 16:07:02 reis kernel: Call Trace:<ffffffff80133dd5>{__might_sleep+197} <ffffffff8013c379>{profile_task_exit+41}
Mar 29 16:07:02 reis kernel: <ffffffff8013e522>{do_exit+34} <ffffffff80110647>{oops_end+199}
Mar 29 16:07:02 reis kernel: <ffffffff801242cf>{do_page_fault+1871} <ffffffff80135086>{__wake_up+326}
Mar 29 16:07:02 reis kernel: <ffffffff802331f5>{cond_compute_av+37} <ffffffff80230a8c>{context_struct_compute_av+684 }
Mar 29 16:07:02 reis kernel: <ffffffff8010f6dd>{error_exit+0} <ffffffff803bab2d>{__down+237}
Mar 29 16:07:02 reis kernel: <ffffffff803baaba>{__down+122} <ffffffff80134ec0>{default_wake_function+0}
Mar 29 16:07:02 reis kernel: <ffffffff803bdc4c>{__down_failed+53} <ffffffff802255f0>{selinux_inode_permission+0}
Mar 29 16:07:02 reis kernel: <ffffffff8843e84b>{:nvidia:nv_printf+92} <ffffffff8843c7e7>{:nvidia:.text.lock.nv+45}
Mar 29 16:07:02 reis kernel: <ffffffff801aac19>{chrdev_open+1097} <ffffffff8019bdfd>{dentry_open+301}
Mar 29 16:07:02 reis kernel: <ffffffff8019bf7e>{filp_open+62} <ffffffff8019c157>{get_unused_fd+455}
Mar 29 16:07:02 reis kernel: <ffffffff801b139a>{getname+138} <ffffffff8019c4ac>{sys_open+76}
Mar 29 16:07:02 reis kernel: <ffffffff8010ed0a>{system_call+126}
Mar 29 16:07:02 reis kernel: note: grep[5270] exited with preempt_count 1
Mar 29 16:07:02 reis kernel: scheduling while atomic: grep/0x10000001/5270
Mar 29 16:07:02 reis kernel:
Mar 29 16:07:02 reis kernel: Call Trace:<ffffffff803bb13a>{schedule+122} <ffffffff8013a67c>{__call_console_drivers+76}
Mar 29 16:07:02 reis kernel: <ffffffff803bd18f>{cond_resched+47} <ffffffff801824f7>{unmap_vmas+2087}
Mar 29 16:07:02 reis kernel: <ffffffff80189cb5>{exit_mmap+293} <ffffffff80137043>{mmput+51}
Mar 29 16:07:02 reis kernel: <ffffffff8013e639>{do_exit+313} <ffffffff80110647>{oops_end+199}
Mar 29 16:07:02 reis kernel: <ffffffff801242cf>{do_page_fault+1871} <ffffffff80135086>{__wake_up+326}
Mar 29 16:07:02 reis kernel: <ffffffff802331f5>{cond_compute_av+37} <ffffffff80230a8c>{context_struct_compute_av+684 }
Mar 29 16:07:02 reis kernel: <ffffffff8010f6dd>{error_exit+0} <ffffffff803bab2d>{__down+237}
Mar 29 16:07:02 reis kernel: <ffffffff803baaba>{__down+122} <ffffffff80134ec0>{default_wake_function+0}
Mar 29 16:07:02 reis kernel: <ffffffff803bdc4c>{__down_failed+53} <ffffffff802255f0>{selinux_inode_permission+0}
Mar 29 16:07:02 reis kernel: <ffffffff8843e84b>{:nvidia:nv_printf+92} <ffffffff8843c7e7>{:nvidia:.text.lock.nv+45}
Mar 29 16:07:02 reis kernel: <ffffffff801aac19>{chrdev_open+1097} <ffffffff8019bdfd>{dentry_open+301}
Mar 29 16:07:02 reis kernel: <ffffffff8019bf7e>{filp_open+62} <ffffffff8019c157>{get_unused_fd+455}
Mar 29 16:07:02 reis kernel: <ffffffff801b139a>{getname+138} <ffffffff8019c4ac>{sys_open+76}
Mar 29 16:07:02 reis kernel: <ffffffff8010ed0a>{system_call+126}
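The call trace above passes through the nvidia module (:nvidia:nv_printf and :nvidia:.text.lock.nv) on the open() path (sys_open, chrdev_open), and the "Tainted: P" flag records that a proprietary module was loaded at the time. When comparing several such oopses, a short sketch like this can pull out just those two pieces. The default log path /var/log/messages is an assumption; pass the real file as the first argument.

```shell
#!/bin/sh
# Sketch: summarise kernel oopses saved in a syslog file.
# Prints the distinct taint flags, then each distinct nvidia call-trace
# frame (x86_64 prints module frames as {:module:symbol+offset}).
# The default path is an assumption; pass another file as $1.
oops_summary() {
    log=${1:-/var/log/messages}
    # "Tainted: P" means a proprietary module was loaded when the oops hit.
    grep -o 'Tainted: [A-Z ]*[A-Z]' "$log" | sort -u
    # nvidia frames, deduplicated, in first-seen order.
    grep -o '{:nvidia:[^}]*}' "$log" | awk '!seen[$0]++'
}
```

Run against the log above, this should report the taint flag P followed by the two nvidia frames, each listed once even though the trace repeats them.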
03-30-05, 07:19 AM #5
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8

In my case I am not getting an oops. I don't get anything.

Spaceballs (edited for TV): Sir, we ain't found mookh.
03-30-05, 07:25 AM #6
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8

Attached is the nvidia-bug-report.log from my home machine. Don't read too much into the driver version; the problem has been the same for the past few driver versions as well.
Attached file: nvidia-bug-report.txt (94.1 KB)
03-30-05, 07:41 AM #7
atrlinux
Registered User
 
Join Date: Mar 2005
Posts: 9

Fedora Core 3, NVIDIA driver 7167, AMD64, Asus K8V-X
Problem: a few minutes after boot, the system freezes (only the mouse moves)
Solution (workaround): disable cpuspeed
Conclusion: NVIDIA driver 7167 has a bug (it does not work with the cpuspeed daemon)
03-30-05, 08:30 AM #8
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8

I believe at least all of the 6k and 7k series drivers exhibit this behavior. I recommend that nVIDIA document that cpuspeed should be disabled, or sent SIGUSR1 to force a specific rate.

Note: killall -SIGUSR1 cpuspeed
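Besides signalling cpuspeed, the 2.6 cpufreq sysfs interface lets you hold the clock fixed by selecting the "performance" governor for each CPU. The sketch below does that. It assumes cpuspeed has already been stopped (typically service cpuspeed stop on Fedora/RHEL, plus chkconfig cpuspeed off to keep it off at boot) so the daemon and the script do not fight over the clock, and that the kernel offers the performance governor. SYSFS_ROOT is a hypothetical override used only to make the example testable.

```shell
#!/bin/sh
# Sketch: pin every CPU to one cpufreq governor (default "performance",
# i.e. maximum clock) through sysfs. Must run as root on a real system.
# Assumes cpuspeed is already stopped; SYSFS_ROOT is a hypothetical override.
SYSFS_ROOT=${SYSFS_ROOT:-/sys/devices/system/cpu}

pin_governor() {
    gov=${1:-performance}
    rc=0
    for d in "$SYSFS_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue          # no cpufreq support on this CPU
        # Skip CPUs whose driver does not offer the requested governor.
        if ! grep -qw "$gov" "$d/scaling_available_governors" 2>/dev/null; then
            echo "$(basename "$(dirname "$d")"): $gov not available" >&2
            rc=1
            continue
        fi
        echo "$gov" > "$d/scaling_governor" || rc=1
    done
    return $rc
}
```

The function returns non-zero if any CPU could not be switched, so a wrapper can fall back to simply leaving cpuspeed disabled.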

03-30-05, 11:59 AM #9
atrlinux
Registered User
 
Join Date: Mar 2005
Posts: 9

I installed a 32-bit version of Linux (Fedora Core 3) on an AMD64 CPU (Asus K8V-X) and everything works fine with NVIDIA driver version 7167 (32-bit). Cpuspeed also works fine!
The problem is with the 64-bit version of the NVIDIA driver.
03-31-05, 10:29 AM #10
comag
Registered User
 
 
Join Date: Sep 2004
Posts: 54

I have the same setup (AMD64, FC3, Asus board, NVIDIA 7167 driver) and no problems.
cpuspeed and Cool'n'Quiet are enabled.
__________________
Fedora Core 4 (2.6.17) on an x86_64
Geforce 6600GT
03-31-05, 03:16 PM #11
AronRubin
Registered User
 
Join Date: Mar 2005
Posts: 8

I could have a bad touch or something, but I have had the same problem (with different occurrence rates) across three different setups. On the machine at work (MSI motherboard, Athlon 64 3800+ Socket 939, NVIDIA 6800) it did not occur nearly as often. Usually right before a big deadline or meeting.

Aron
03-31-05, 03:57 PM #12
chunkey
#!/?*
 
Join Date: Oct 2004
Posts: 662

@AronRubin:

Code:
NVRM: loading NVIDIA Linux x86_64 NVIDIA Kernel Module  1.0-7167  Fri Feb 25 09:11:39 PST 2005
NVRM: WARNING: Your Linux kernel has problems in its implementation of
NVRM: the change_page_attr kernel interface.  The NVIDIA kernel
NVRM: module will attempt to work around these problems, but
NVRM: system stability may be affected.  It is recommended that
NVRM: you update to a 2.6.11 or newer kernel.
Update to 2.6.11 or newer as the warning says, or, if the update doesn't help, disable SELinux (Security-Enhanced Linux) and see if that helps...
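Before trying the suggestions above, it can help to confirm the running kernel against the 2.6.11 that the NVRM warning recommends. The sketch below compares only the first three numeric components of the release string and ignores distribution suffixes such as -1.667 (an assumption about how release strings are formatted). For SELinux, getenforce shows the current mode, setenforce 0 switches to permissive mode until the next reboot, and disabling it fully means setting SELINUX=disabled in /etc/selinux/config (or booting with selinux=0).

```shell
#!/bin/sh
# Sketch: is the running kernel at least 2.6.11?
# Compares the first three numeric components; anything after the first
# character that is not a digit or dot (e.g. "-1.667") is ignored.

# version_ge A B: succeed (exit 0) if version A >= version B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/[^0-9.].*/, "", a); sub(/[^0-9.].*/, "", b)
        na = split(a, x, "."); nb = split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_kernel [RELEASE]: defaults to the running kernel (uname -r).
check_kernel() {
    rel=${1:-$(uname -r)}
    if version_ge "$rel" 2.6.11; then
        echo "kernel $rel: OK (>= 2.6.11)"
    else
        echo "kernel $rel: older than 2.6.11, consider updating"
        return 1
    fi
}
```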
All times are GMT -5.
Copyright 1998 - 2014, nV News.