nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   NVIDIA FreeBSD (http://www.nvnews.net/vbulletin/forumdisplay.php?f=47)
-   -   panics with 195.36.24 (http://www.nvnews.net/vbulletin/showthread.php?t=150920)

crsd 05-13-10 05:40 AM

panics with 195.36.24
 
Hi,

I'm getting the following panics with nvidia-driver 195.36.24 on FreeBSD 9.0-CURRENT r207995 amd64:

with debug.witness.watch=1 (reproducible on xorg-server start):

Code:

blockable sleep lock (sleep mutex) select mtxpool @ sys/kern/sys_generic.c:1479

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
Process 1509 (xdm) thread 0xffffff005da09000 (100218)
exclusive sx user map (user map) r = 0 (0xffffff005d564b68) locked @ /home/yuri/src/FreeBSD/head/sys/vm/vm_map.c:2991
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 3
dynamic pcpu    = 0xffffff807f3e8780
curthread    = 0xffffff005d9eeb40: pid 1511 "Xorg"
curpcb      = 0xffffff8058913d40
fpcurthread  = none
idlethread  = 0xffffff000340a780: pid 11 "idle: cpu3"
curpmap        = 0
tssp            = 0xffffffff80e8cc38
commontssp      = 0xffffffff80e8cc38
rsp0            = 0xffffff8058913d40
gs32p          = 0xffffffff80e8ba70
ldt            = 0xffffffff80e8bab0
tss            = 0xffffffff80e8baa0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1511 tid 100219 td 0xffffff005d9eeb40
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
witness_checkorder() at witness_checkorder+0x948
_mtx_lock_flags() at _mtx_lock_flags+0x78
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x6c2160 ---

with debug.witness.watch=0 (random):

Code:

mi_switch: switch in a critical section

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 2
dynamic pcpu    = 0xffffff807f3e1780
curthread    = 0xffffff0005db2000: pid 1518 "Xorg"
curpcb      = 0xffffff80588aad40
fpcurthread  = 0xffffff0005db2000: pid 1518 "Xorg"
idlethread  = 0xffffff000340a3c0: pid 11 "idle: cpu2"
curpmap        = 0
tssp            = 0xffffffff80e8cbd0
commontssp      = 0xffffffff80e8cbd0
rsp0            = 0xffffff80588aad40
gs32p          = 0xffffffff80e8ba08
ldt            = 0xffffffff80e8ba48
tss            = 0xffffffff80e8ba38
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1518 tid 100198 td 0xffffff0005db2000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x801c20cc0 ---


zander 05-13-10 10:59 AM

Re: panics with 195.36.24
 
Is this specific to 195.36.24 (vs. e.g. 195.36.15) or FreeBSD 9? Or neither?

crsd 05-13-10 11:20 AM

Re: panics with 195.36.24
 
Another panic, seen with all drivers available for amd64 (195.22, 195.36.15, 195.36.24) after recent changes to the VM code (on X server start):
Code:

mutex page lock not owned at /home/yuri/src/FreeBSD/head/sys/vm/vm_page.c:1572

db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
db:1:locks>  show alllocks
db:1:alllocks>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid        = 1
dynamic pcpu    = 0xffffff807f3da780
curthread    = 0xffffff0005dcd780: pid 1518 "Xorg"
curpcb      = 0xffffff80587c9d40
fpcurthread  = none
idlethread  = 0xffffff000340a000: pid 11 "idle: cpu1"
curpmap        = 0
tssp            = 0xffffffff80e8cb68
commontssp      = 0xffffffff80e8cb68
rsp0            = 0xffffff80587c9d40
gs32p          = 0xffffffff80e8b9a0
ldt            = 0xffffffff80e8b9e0
tss            = 0xffffffff80e8b9d0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1518 tid 100153 td 0xffffff0005dcd780
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
_mtx_assert() at _mtx_assert+0xdc
vm_page_wire() at vm_page_wire+0x37
nv_alloc_system_pages() at nv_alloc_system_pages+0x21a
nv_alloc_pages() at nv_alloc_pages+0xdd
_nv020074rm() at _nv020074rm+0x7f

Otherwise 195.22 is stable (after commenting out the assertion in sys/vm/vm_page.c, which I also did for the other driver versions).

The panics reported in my previous post are specific to FreeBSD 9 and occur with both 195.36.15 and 195.36.24.

arundel 05-13-10 01:37 PM

Re: panics with 195.36.24
 
This thread might be interesting in connection with your panic:

http://www.mail-archive.com/freebsd-...msg122234.html

Alan Cox is currently changing some VM code.

crsd 05-13-10 01:43 PM

Re: panics with 195.36.24
 
Yes, I started that thread, should have linked it here too. :-) Thanks

crsd 06-23-10 03:59 AM

Re: panics with 195.36.24
 
Still getting a lot of random (?) panics with 256.35 on -CURRENT/amd64 r209358:
Code:

mi_switch: switch in a critical section

cpuid        = 1
dynamic pcpu = 0xffffff807f399400
curthread    = 0xffffff011c337000: pid 1952 "Xorg"
curpcb      = 0xffffff80796f2d40
fpcurthread  = none
idlethread  = 0xffffff000351d000: tid 100005 "idle: cpu1"
curpmap      = 0
tssp        = 0xffffffff80eadd68
commontssp  = 0xffffffff80eadd68
rsp0        = 0xffffff80796f2d40
gs32p        = 0xffffffff80eacba0
ldt          = 0xffffffff80eacbe0
tss          = 0xffffffff80eacbd0
spin locks held:
db:0:kdb.enter.panic>  bt
Tracing pid 1952 tid 100225 td 0xffffff011c337000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscallenter() at syscallenter+0xf0
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016d5ebc, rsp = 0x7fffffffe9e8, rbp = 0x801c217c0 ---


bigknife 07-07-10 07:38 AM

Re: panics with 195.36.24
 
I suspect these panics are caused by the use of spin mutexes for &filep->event_mtx. It is not safe to hold a spin mutex across selrecord(), because selrecord() acquires a sleep mutex internally (the "select mtxpool" lock in the traces above). Given that the nvidia driver uses a regular interrupt handler (rather than a filter), it should be safe to simply convert the event_mtx locks to regular mutexes. To do that, replace 'MTX_SPIN' with 'MTX_DEF' in the mtx_init() calls in nvidia_ctl.c and nvidia_dev.c, and replace all calls to mtx_lock_spin() and mtx_unlock_spin() with calls to mtx_lock() and mtx_unlock().
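
To make the locking rule concrete, here is a small user-space model of it, not driver code. It only mimics the ordering check: a thread that holds a spin mutex may not acquire a sleep mutex. All names (wm_thread, wm_mtx, wm_lock, wm_poll_path_ok) are made up for illustration, and the real kernel check is more involved than this.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel
 * or in the nvidia driver. */
enum wm_class { WM_SPIN, WM_SLEEP };

struct wm_thread {
    int spin_held;              /* spin mutexes currently held */
};

struct wm_mtx {
    enum wm_class cls;
    bool locked;
};

/* Try to take a mutex. Refuse a sleep mutex while a spin mutex is held,
 * which is the ordering the kernel panics on in the traces above. */
static bool
wm_lock(struct wm_thread *td, struct wm_mtx *m)
{
    if (m->cls == WM_SLEEP && td->spin_held > 0)
        return false;
    if (m->cls == WM_SPIN)
        td->spin_held++;
    m->locked = true;
    return true;
}

static void
wm_unlock(struct wm_thread *td, struct wm_mtx *m)
{
    if (m->cls == WM_SPIN)
        td->spin_held--;
    m->locked = false;
}

/* Model of the poll path: hold event_mtx, then let "selrecord" take its
 * internal sleep mutex. Returns true if the sequence is legal. */
static bool
wm_poll_path_ok(enum wm_class event_mtx_class)
{
    struct wm_thread td = { 0 };
    struct wm_mtx event_mtx = { event_mtx_class, false };
    struct wm_mtx sel_mtx = { WM_SLEEP, false };
    bool ok;

    if (!wm_lock(&td, &event_mtx))
        return false;
    ok = wm_lock(&td, &sel_mtx);    /* what selrecord() does internally */
    if (ok)
        wm_unlock(&td, &sel_mtx);
    wm_unlock(&td, &event_mtx);
    return ok;
}
```

In this model wm_poll_path_ok(WM_SPIN) fails and wm_poll_path_ok(WM_SLEEP) succeeds, which corresponds to the MTX_SPIN to MTX_DEF change suggested above.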

zander 07-07-10 10:12 AM

Re: panics with 195.36.24
 
That's a good point. However, looking at that piece of code again, I don't think the mutex should be held across the call to selrecord(), even if it were of type MTX_DEF (see nv_kern_post()). I'll try to make some time to get this fixed soon.
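
A rough user-space sketch of a poll path that never holds the driver mutex across selrecord() might look like the following. This is only a model under assumptions, not the actual fix: pm_file, its fields, pm_selrecord() and pm_poll() are invented for illustration, and the "mutex" is just a flag so the ordering can be checked.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical user-space model; field and function names are
 * illustrative and are not taken from the nvidia driver. */
struct pm_file {
    bool mtx_held;            /* stands in for holding filep->event_mtx */
    bool event_pending;       /* assumed "event queued" flag */
    bool recorded;            /* set once interest has been registered */
    bool recorded_under_lock; /* set if that happened with the lock held */
};

/* Stand-in for selrecord(): notes whether the caller held the lock. */
static void
pm_selrecord(struct pm_file *fp)
{
    fp->recorded = true;
    if (fp->mtx_held)
        fp->recorded_under_lock = true;
}

static int
pm_poll(struct pm_file *fp, int events)
{
    bool pending;

    /* Register interest first, with no driver lock held, so an event
     * posted after this point still finds the waiter. */
    pm_selrecord(fp);

    /* Hold the lock only long enough to sample the driver state. */
    fp->mtx_held = true;
    pending = fp->event_pending;
    fp->mtx_held = false;

    return pending ? (events & POLLIN) : 0;
}
```

Registering interest before sampling the flag is one possible ordering. It is chosen here so that the model does not miss an event posted between the two steps.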

bigknife 07-08-10 08:14 AM

Re: panics with 195.36.24
 
Yes, selrecord() and selwakeup() do have internal locking (albeit a single global mutex, ugh). One minor nit: if devfs_get_cdevpriv() fails for some reason in nvidia_ctl_poll() or nvidia_dev_poll(), the function should return a mask of 0 rather than the errno value from devfs_get_cdevpriv().
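
As an illustration of why the return value matters, here is a user-space sketch. A poll handler returns an event mask, so a nonzero errno handed back in its place gets read as "ready" bits. cp_get_priv() is a hypothetical stand-in for devfs_get_cdevpriv() (the real function takes no 'stored' argument), and both poll functions are simplified to pretend an event is always ready.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for devfs_get_cdevpriv(): returns 0 on success or
 * an errno value on failure. */
static int
cp_get_priv(void *stored, void **datap)
{
    if (stored == NULL)
        return ENOENT;
    *datap = stored;
    return 0;
}

/* Problem case: the errno value is returned where a poll mask is expected. */
static int
cp_poll_leaky(void *stored, int events)
{
    void *priv;
    int error = cp_get_priv(stored, &priv);

    if (error != 0)
        return error;           /* caller sees this as a ready mask */
    return events & POLLIN;
}

/* Suggested behaviour: report "nothing ready" when the lookup fails. */
static int
cp_poll_fixed(void *stored, int events)
{
    void *priv;

    if (cp_get_priv(stored, &priv) != 0)
        return 0;
    return events & POLLIN;
}
```

With a failed lookup, cp_poll_leaky() returns a nonzero value that the caller would treat as poll bits, while cp_poll_fixed() returns an empty mask.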

zander 07-13-10 11:46 AM

Re: panics with 195.36.24
 
I'll fix that, too.

crsd 08-20-10 07:47 PM

Re: panics with 195.36.24
 
So everything looks stable again with 256.44, thanks!


All times are GMT -5.