#1
Registered User
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Hi,
I'm getting the following panics with nvidia-driver 195.36.24 on 9.0-CURRENT r207995 amd64. With debug.witness.watch=1 (reproducible on xorg-server start):
Code:
blockable sleep lock (sleep mutex) select mtxpool @ sys/kern/sys_generic.c:1479
db:0:kdb.enter.panic> run lockinfo
db:1:lockinfo> show locks
db:1:locks> show alllocks
Process 1509 (xdm) thread 0xffffff005da09000 (100218)
exclusive sx user map (user map) r = 0 (0xffffff005d564b68) locked @ /home/yuri/src/FreeBSD/head/sys/vm/vm_map.c:2991
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.panic> show pcpu
cpuid = 3
dynamic pcpu = 0xffffff807f3e8780
curthread = 0xffffff005d9eeb40: pid 1511 "Xorg"
curpcb = 0xffffff8058913d40
fpcurthread = none
idlethread = 0xffffff000340a780: pid 11 "idle: cpu3"
curpmap = 0
tssp = 0xffffffff80e8cc38
commontssp = 0xffffffff80e8cc38
rsp0 = 0xffffff8058913d40
gs32p = 0xffffffff80e8ba70
ldt = 0xffffffff80e8bab0
tss = 0xffffffff80e8baa0
spin locks held:
db:0:kdb.enter.panic> bt
Tracing pid 1511 tid 100219 td 0xffffff005d9eeb40
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
witness_checkorder() at witness_checkorder+0x948
_mtx_lock_flags() at _mtx_lock_flags+0x78
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x6c2160 ---
Code:
mi_switch: switch in a critical section
db:0:kdb.enter.panic> run lockinfo
db:1:lockinfo> show locks
db:1:locks> show alllocks
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.panic> show pcpu
cpuid = 2
dynamic pcpu = 0xffffff807f3e1780
curthread = 0xffffff0005db2000: pid 1518 "Xorg"
curpcb = 0xffffff80588aad40
fpcurthread = 0xffffff0005db2000: pid 1518 "Xorg"
idlethread = 0xffffff000340a3c0: pid 11 "idle: cpu2"
curpmap = 0
tssp = 0xffffffff80e8cbd0
commontssp = 0xffffffff80e8cbd0
rsp0 = 0xffffff80588aad40
gs32p = 0xffffffff80e8ba08
ldt = 0xffffffff80e8ba48
tss = 0xffffffff80e8ba38
spin locks held:
db:0:kdb.enter.panic> bt
Tracing pid 1518 tid 100198 td 0xffffff0005db2000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscall() at syscall+0x102
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016c0ecc, rsp = 0x7fffffffe9e8, rbp = 0x801c20cc0 ---

#2
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
Is this specific to 195.36.24 (vs. e.g. 195.36.15) or FreeBSD 9? Or neither?

#3
Registered User
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Another panic occurs with all drivers available for amd64 (195.22, 195.36.15, 195.36.24) after recent changes to the VM code (on X server start):
Code:
mutex page lock not owned at /home/yuri/src/FreeBSD/head/sys/vm/vm_page.c:1572
db:0:kdb.enter.panic> run lockinfo
db:1:lockinfo> show locks
db:1:locks> show alllocks
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.panic> show pcpu
cpuid = 1
dynamic pcpu = 0xffffff807f3da780
curthread = 0xffffff0005dcd780: pid 1518 "Xorg"
curpcb = 0xffffff80587c9d40
fpcurthread = none
idlethread = 0xffffff000340a000: pid 11 "idle: cpu1"
curpmap = 0
tssp = 0xffffffff80e8cb68
commontssp = 0xffffffff80e8cb68
rsp0 = 0xffffff80587c9d40
gs32p = 0xffffffff80e8b9a0
ldt = 0xffffffff80e8b9e0
tss = 0xffffffff80e8b9d0
spin locks held:
db:0:kdb.enter.panic> bt
Tracing pid 1518 tid 100153 td 0xffffff0005dcd780
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
_mtx_assert() at _mtx_assert+0xdc
vm_page_wire() at vm_page_wire+0x37
nv_alloc_system_pages() at nv_alloc_system_pages+0x21a
nv_alloc_pages() at nv_alloc_pages+0xdd
_nv020074rm() at _nv020074rm+0x7f

The panics reported in my previous post are specific to FreeBSD 9 and occur with both 195.36.15 and 195.36.24.

#4
Registered User
Join Date: May 2008
Posts: 36
This thread might be interesting in connection with your panic:
http://www.mail-archive.com/freebsd-...msg122234.html
Alan Cox is currently changing some VM stuff.

#5
Registered User
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Yes, I started that thread, should have linked it here too. :-) Thanks

#6
Registered User
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
Still getting a lot of random (?) panics with 256.35 on -CURRENT/amd64 r209358:
Code:
mi_switch: switch in a critical section
cpuid = 1
dynamic pcpu = 0xffffff807f399400
curthread = 0xffffff011c337000: pid 1952 "Xorg"
curpcb = 0xffffff80796f2d40
fpcurthread = none
idlethread = 0xffffff000351d000: tid 100005 "idle: cpu1"
curpmap = 0
tssp = 0xffffffff80eadd68
commontssp = 0xffffffff80eadd68
rsp0 = 0xffffff80796f2d40
gs32p = 0xffffffff80eacba0
ldt = 0xffffffff80eacbe0
tss = 0xffffffff80eacbd0
spin locks held:
db:0:kdb.enter.panic> bt
Tracing pid 1952 tid 100225 td 0xffffff011c337000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
mi_switch() at mi_switch+0x341
turnstile_wait() at turnstile_wait+0x243
_mtx_lock_sleep() at _mtx_lock_sleep+0xd6
_mtx_lock_flags() at _mtx_lock_flags+0xe1
selrecord() at selrecord+0x81
nvidia_dev_poll() at nvidia_dev_poll+0x57
devfs_poll_f() at devfs_poll_f+0x61
kern_select() at kern_select+0x4f2
select() at select+0x5d
syscallenter() at syscallenter+0xf0
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (93, FreeBSD ELF64, select), rip = 0x8016d5ebc, rsp = 0x7fffffffe9e8, rbp = 0x801c217c0 ---

#7
Registered User
Join Date: Jul 2010
Posts: 2
I suspect these panics are caused by the use of spin mutexes for &filep->event_mtx. It is certainly not safe to hold a spin mutex across selrecord(). Given that the nvidia driver uses a regular interrupt handler (rather than a filter), it should be safe to simply convert the event_mtx locks to be a regular mutex. To do that, replace 'MTX_SPIN' with 'MTX_DEF' in the mtx_init() calls in nvidia_ctl.c and nvidia_dev.c and replace all calls to mtx_lock_spin() and mtx_unlock_spin() with calls to mtx_lock() and mtx_unlock() instead.

#8
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
That's a good point. However, looking at that piece of code again, I don't think the mutex should be held across the call to selrecord(), even if it were of type MTX_DEF (see nv_kern_post()). I'll try to make some time to get this fixed soon.

#9
Registered User
Join Date: Jul 2010
Posts: 2
Yes, selrecord() and selwakeup() do have internal locking (albeit a single global mutex, ugh). One minor nit: if devfs_get_cdevpriv() fails for some reason in nvidia_ctl_poll() or nvidia_dev_poll(), then the function should return a mask of 0 rather than the errno value from devfs_get_cdevpriv().
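
For anyone following along, the two points discussed here (don't hold event_mtx across selrecord(), and return an empty mask when the private-data lookup fails) can be modelled in a rough userland sketch. This is not the driver's code and it does not build as a kernel module: a pthread mutex stands in for event_mtx, and every model_* name, MODEL_POLLIN, and the lock_held flag are made up for illustration. It only models the lock ordering and the return value, not the real select wakeup path.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a poll event bit; not a kernel constant. */
#define MODEL_POLLIN 0x0001

/* Hypothetical per-open state, loosely mirroring "filep" in the thread. */
struct model_file {
    pthread_mutex_t event_mtx;  /* stands in for filep->event_mtx */
    int event_pending;          /* protected by event_mtx */
    int lock_held;              /* debug flag: nonzero while event_mtx is held */
    int selrecord_calls;        /* counts registrations, for inspection */
};

static void model_file_init(struct model_file *f)
{
    pthread_mutex_init(&f->event_mtx, NULL);
    f->event_pending = 0;
    f->lock_held = 0;
    f->selrecord_calls = 0;
}

/* Stand-in for devfs_get_cdevpriv(): returns 0 on success, an errno value on failure. */
static int model_get_priv(struct model_file *f, struct model_file **out)
{
    if (f == NULL)
        return ENOENT;
    *out = f;
    return 0;
}

/* Stand-in for selrecord(): it does its own locking, so the caller's
 * mutex must already be released when we get here. */
static void model_selrecord(struct model_file *f)
{
    assert(!f->lock_held);
    f->selrecord_calls++;
}

/* Poll handler model: the return value is an event mask, never an errno. */
static int model_poll(struct model_file *f, int events)
{
    struct model_file *priv;
    int pending;

    if (model_get_priv(f, &priv) != 0)
        return 0;               /* lookup failed: report no events */

    /* Sample the state under the lock, then drop the lock. */
    pthread_mutex_lock(&priv->event_mtx);
    priv->lock_held = 1;
    pending = priv->event_pending;
    priv->lock_held = 0;
    pthread_mutex_unlock(&priv->event_mtx);

    if (pending)
        return events & MODEL_POLLIN;

    /* Nothing pending: register interest with no driver lock held. */
    model_selrecord(priv);
    return 0;
}
```

Calling model_poll() with a NULL file returns 0 instead of leaking ENOENT into the mask, and the assert in model_selrecord() would trip if the lock were still held at that point.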

#10
NVIDIA Corporation
Join Date: Aug 2002
Posts: 3,740
I'll fix that, too.

#11
Registered User
Join Date: Jan 2008
Location: Russia, Krasnodar
Posts: 7
So everything looks stable again with 256.44, thanks!