Old 05-13-10, 06:40 AM   #1
crsd
Registered User
 
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
panics with 195.36.24

Hi,

I'm getting the following panics with nvidia-driver 195.36.24 on 9.0-CURRENT r207995 amd64:

With debug.witness.watch=1 (reproducible on xorg-server start):

Code:
blockable sleep lock (sleep mutex) select mtxpool @ sys/kern/sys_generic.c:1479

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
Process 1509 (xdm) thread 0xffffff005da09000 (100218)
exclusive sx user map (user map) r = 0 (0xffffff005d564b68) locked @ /home/yuri/src/FreeBSD/head/sys/vm/vm_map.c:2991
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 3
dynamic pcpu    = 0xffffff807f3e8780
curthread    = 0xffffff005d9eeb40: pid 1511 "Xorg"
curpcb       = 0xffffff8058913d40
fpcurthread  = none
idlethread   = 0xffffff000340a780: pid 11 "idle: cpu3"
curpmap         = 0
tssp            = 0xffffffff80e8cc38
commontssp      = 0xffffffff80e8cc38
rsp0            = 0xffffff8058913d40
gs32p           = 0xffffffff80e8ba70
ldt             = 0xffffffff80e8bab0
tss             = 0xffffffff80e8baa0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1511 tid 100219 td 0xffffff005d9eeb40
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
witness_checkorder() at witness_checkorder+0x948
_mtx_lock_flags() at _mtx_lock_flags+0x78
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x6c2160 ---

With debug.witness.watch=0 (random):

Code:
mi_switch: switch in a critical section

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 2
dynamic pcpu    = 0xffffff807f3e1780
curthread    = 0xffffff0005db2000: pid 1518 "Xorg"
curpcb       = 0xffffff80588aad40
fpcurthread  = 0xffffff0005db2000: pid 1518 "Xorg"
idlethread   = 0xffffff000340a3c0: pid 11 "idle: cpu2"
curpmap         = 0
tssp            = 0xffffffff80e8cbd0
commontssp      = 0xffffffff80e8cbd0
rsp0            = 0xffffff80588aad40
gs32p           = 0xffffffff80e8ba08
ldt             = 0xffffffff80e8ba48
tss             = 0xffffffff80e8ba38
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1518 tid 100198 td 0xffffff0005db2000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x801c20cc0 ---
Old 05-13-10, 11:59 AM   #2
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: panics with 195.36.24

Is this specific to 195.36.24 (vs. e.g. 195.36.15) or FreeBSD 9? Or neither?
Old 05-13-10, 12:20 PM   #3
crsd
Registered User
 
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Default Re: panics with 195.36.24

Another panic, seen with all drivers available for amd64 (195.22, 195.36.15, 195.36.24) after the recent changes to vm (on X server start):
Code:
mutex page lock not owned at /home/yuri/src/FreeBSD/head/sys/vm/vm_page.c:1572

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 1
dynamic pcpu    = 0xffffff807f3da780
curthread    = 0xffffff0005dcd780: pid 1518 "Xorg"
curpcb       = 0xffffff80587c9d40
fpcurthread  = none
idlethread   = 0xffffff000340a000: pid 11 "idle: cpu1"
curpmap         = 0
tssp            = 0xffffffff80e8cb68
commontssp      = 0xffffffff80e8cb68
rsp0            = 0xffffff80587c9d40
gs32p           = 0xffffffff80e8b9a0
ldt             = 0xffffffff80e8b9e0
tss             = 0xffffffff80e8b9d0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1518 tid 100153 td 0xffffff0005dcd780
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
_mtx_assert() at _mtx_assert+0xdc
vm_page_wire() at vm_page_wire+0x37
nv_alloc_system_pages() at nv_alloc_system_pages+0x21a
nv_alloc_pages() at nv_alloc_pages+0xdd
_nv020074rm() at _nv020074rm+0x7f

Otherwise 195.22 is stable (after commenting out the assert in sys/vm/vm_page.c, which I also did for the other driver versions).

The panics reported in my previous post are specific to FreeBSD 9 and occur with both 195.36.15 and 195.36.24.
Old 05-13-10, 02:37 PM   #4
arundel
Registered User
 
Join Date: May 2008
Posts: 36
Default Re: panics with 195.36.24

This thread might be relevant to your panic:

http://www.mail-archive.com/freebsd-...msg122234.html

Alan Cox is currently changing some vm stuff.
Old 05-13-10, 02:43 PM   #5
crsd
Registered User
 
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Default Re: panics with 195.36.24

Yes, I started that thread; I should have linked it here too. :-) Thanks
Old 06-23-10, 04:59 AM   #6
crsd
Registered User
 
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Unhappy Re: panics with 195.36.24

Still getting a lot of random (?) panics with 256.35 on -CURRENT/amd64 r209358:
Code:
mi_switch: switch in a critical section

cpuid        = 1
dynamic pcpu = 0xffffff807f399400
curthread    = 0xffffff011c337000: pid 1952 "Xorg"
curpcb       = 0xffffff80796f2d40
fpcurthread  = none
idlethread   = 0xffffff000351d000: tid 100005 "idle: cpu1"
curpmap      = 0
tssp         = 0xffffffff80eadd68
commontssp   = 0xffffffff80eadd68
rsp0         = 0xffffff80796f2d40
gs32p        = 0xffffffff80eacba0
ldt          = 0xffffffff80eacbe0
tss          = 0xffffffff80eacbd0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1952 tid 100225 td 0xffffff011c337000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscallenter() at syscallenter+0xf0
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016d5ebc, rsp = 0x7fffffffe9e8, rbp = 0x801c217c0 ---
Old 07-07-10, 08:38 AM   #7
bigknife
Registered User
 
Join Date: Jul 2010
Posts: 2
Default Re: panics with 195.36.24

I suspect these panics are caused by the use of spin mutexes for &filep->event_mtx. It is certainly not safe to hold a spin mutex across selrecord(). Given that the nvidia driver uses a regular interrupt handler (rather than a filter), it should be safe to simply convert the event_mtx locks to regular mutexes. To do that, replace 'MTX_SPIN' with 'MTX_DEF' in the mtx_init() calls in nvidia_ctl.c and nvidia_dev.c, and replace all calls to mtx_lock_spin() and mtx_unlock_spin() with calls to mtx_lock() and mtx_unlock().
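
To make the reasoning above concrete, here is a small userland model in C of why holding a spin mutex across selrecord() can end in "mi_switch: switch in a critical section". It is only a sketch: every model_* name is hypothetical, nothing here is the real kernel API or the NVIDIA driver source, and the critical-section bookkeeping is deliberately simplified to a single counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only: these are NOT the real FreeBSD kernel functions. */
struct thread_model {
    int  critnest;   /* simplified stand-in for the thread's critical-section depth */
    bool panicked;   /* set where the real kernel would panic */
};

/* In this model a spin mutex keeps the thread in a critical section while held. */
void model_lock_spin(struct thread_model *td)
{
    td->critnest++;
}

void model_unlock_spin(struct thread_model *td)
{
    assert(td->critnest > 0);
    td->critnest--;
}

/* In this model a default (sleep) mutex does not enter a critical section. */
void model_lock_def(struct thread_model *td)   { (void)td; }
void model_unlock_def(struct thread_model *td) { (void)td; }

/* Stand-in for mi_switch(): switching inside a critical section is fatal. */
void model_mi_switch(struct thread_model *td)
{
    if (td->critnest > 0)
        td->panicked = true;
}

/* Stand-in for selrecord(): its internal sleep mutex blocks only when contended. */
void model_selrecord(struct thread_model *td, bool contended)
{
    if (contended)
        model_mi_switch(td);
}

/* Poll handler that holds a spin mutex across selrecord(). */
bool model_poll_spin(bool contended)
{
    struct thread_model td = { 0, false };

    model_lock_spin(&td);
    model_selrecord(&td, contended);
    model_unlock_spin(&td);
    return td.panicked;
}

/* Same handler after converting the lock to a default mutex. */
bool model_poll_def(bool contended)
{
    struct thread_model td = { 0, false };

    model_lock_def(&td);
    model_selrecord(&td, contended);
    model_unlock_def(&td);
    return td.panicked;
}
```

In this model the spin variant only fails when the inner lock happens to be contended, which would be consistent with the panics looking random when debug.witness.watch=0, while WITNESS objects to the lock order on every acquisition.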
Old 07-07-10, 11:12 AM   #8
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: panics with 195.36.24

That's a good point. However, looking at that piece of code again, I don't think the mutex should be held across the call to selrecord(), even if it were of type MTX_DEF (see nv_kern_post()). I'll try to make some time to get this fixed soon.
Old 07-08-10, 09:14 AM   #9
bigknife
Registered User
 
Join Date: Jul 2010
Posts: 2
Default Re: panics with 195.36.24

Yes, selrecord() and selwakeup() do have internal locking (albeit a single global mutex, ugh). One minor nit: if devfs_get_cdevpriv() fails for some reason in nvidia_ctl_poll() or nvidia_dev_poll(), the function should return a mask of 0 rather than the errno value from devfs_get_cdevpriv().
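
The nit about the return value can be illustrated with a short userland sketch in C. A poll handler returns a mask of ready events, so an errno value returned from it gets read as event bits. All model_* names and the struct are hypothetical stand-ins, the choice of ENOENT as the failure code and POLLIN | POLLPRI as the reported events are assumptions, and none of this is the actual driver code.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical per-open state; not the driver's real structure. */
struct nv_file_model {
    int event_pending;
};

/*
 * Hypothetical stand-in for devfs_get_cdevpriv(): returns 0 on success
 * or a nonzero errno value (ENOENT is assumed here) on failure.
 */
int model_get_cdevpriv(struct nv_file_model *attached, struct nv_file_model **out)
{
    assert(out != NULL);
    if (attached == NULL)
        return ENOENT;
    *out = attached;
    return 0;
}

/* Buggy shape: the errno value leaks out as if it were an event mask. */
int model_poll_buggy(struct nv_file_model *attached, int events)
{
    struct nv_file_model *filep = NULL;
    int status = model_get_cdevpriv(attached, &filep);

    if (status != 0)
        return status;
    return filep->event_pending ? (events & (POLLIN | POLLPRI)) : 0;
}

/* Suggested shape: on failure report "no events ready" with a mask of 0. */
int model_poll_fixed(struct nv_file_model *attached, int events)
{
    struct nv_file_model *filep = NULL;
    int status = model_get_cdevpriv(attached, &filep);

    if (status != 0)
        return 0;
    return filep->event_pending ? (events & (POLLIN | POLLPRI)) : 0;
}
```

With the buggy shape, a caller that treats any nonzero result as "something is ready" would wake up on a descriptor that has no private data at all; the fixed shape simply reports nothing ready.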
Old 07-13-10, 12:46 PM   #10
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: panics with 195.36.24

I'll fix that, too.
Old 08-20-10, 08:47 PM   #11
crsd
Registered User
 
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Default Re: panics with 195.36.24

So everything looks stable again with 256.44, thanks!
All times are GMT -5.