Old 11-07-11, 12:31 PM   #1
natewiebe13
Registered User
 
Join Date: Nov 2011
Posts: 8
Default Segfault with 285.05.09

I'm using Ubuntu 11.10 x64 with an NVIDIA GTX 285M and the NVIDIA 285.05.09 drivers. When I run Blender 2.6 (the svn version, which I compile myself), I get a segfault. If I remove the NVIDIA driver and use the Mesa or nouveau drivers instead, there is no segfault. Other Ubuntu users with NVIDIA cards have confirmed the same result.

Here is a core-dump and backtrace:

Code:
Reading symbols from /home/nate/comp/install/linux/blender...(no debugging symbols found)...done.
[New LWP 5116]
[New LWP 5090]
[New LWP 5112]
[New LWP 5093]
[New LWP 5092]
[New LWP 5095]
[New LWP 5114]
[New LWP 5115]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Core was generated by `./blender'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f3900dd1a08 in _nv007tls ()
   from /usr/lib/nvidia-current/tls/libnvidia-tls.so.285.05.09
(gdb) bt
#0  0x00007f3900dd1a08 in _nv007tls ()
   from /usr/lib/nvidia-current/tls/libnvidia-tls.so.285.05.09
#1  0x00007f390501dce3 in __nptl_deallocate_tsd ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f390501df0a in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f390579c89d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) quit
Old 12-07-11, 05:34 PM   #2
edscottwilson
Registered User
 
Join Date: Dec 2011
Posts: 2
Default Re: Segfault with 285.05.09

Same problem here with Gentoo Linux, the 290.10 driver and a GeForce 7300 LE card, running Rodent Filemanager.

Quote:
Program terminated with signal 11, Segmentation fault.
#0 0x00007fdae266ea08 in _nv007tls ()
from /usr/lib64/tls/libnvidia-tls.so.290.10
(gdb) where
#0 0x00007fdae266ea08 in _nv007tls ()
from /usr/lib64/tls/libnvidia-tls.so.290.10
#1 0x00007fdae4de9f69 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2 0x00007fdae4dea922 in start_thread () from /lib64/libpthread.so.0
#3 0x00007fdae49527dd in clone () from /lib64/libc.so.6
Old 12-08-11, 05:41 AM   #3
sandipt
NVIDIA Corporation
 
Join Date: Dec 2010
Posts: 260
Default Re: Segfault with 285.05.09

Please provide an nvidia-bug-report, reproduction steps and, if possible, any test model you are using.
Old 12-09-11, 05:35 AM   #4
dr_barnowl
Registered User
 
Join Date: Dec 2011
Posts: 4
Default Re: Segfault with 285.05.09

The kernel logs are too large for the forum; nvidia-bug-report attached.

http://dl.dropbox.com/u/10616420/nvidia-crash/kern.log
http://dl.dropbox.com/u/10616420/nvi.../kern.log.1.gz
http://dl.dropbox.com/u/10616420/nvi.../kern.log.2.gz
http://dl.dropbox.com/u/10616420/nvi.../kern.log.3.gz
http://dl.dropbox.com/u/10616420/nvi.../kern.log.4.gz

I had quite a long post written out, then spoiled things by noting at the end of it that my system uptime had reached 1 hour 20 minutes. Naturally, the system crashed as soon as I did that, so now I'm posting from my laptop. That crash can't be seen in the log: it should be in kern.log at Dec 9 10:10 or thereabouts, but apparently the system locked up before it could write anything to disk. There is more interesting stuff after that, with plenty of segfaults, Xid errors and other detail in these logs.
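
For anyone who wants a quick overview of logs like these before reading them in full, here is a minimal Python sketch. It is only an illustration and makes two assumptions that are not stated in this thread: that the driver's Xid reports contain the text "NVRM: Xid", and that the kernel's segfault lines have the usual "name[pid]: segfault at ... in library[...]" shape. The names SEGFAULT_RE and summarize_kern_log are made up for this example.

```python
import re
from collections import Counter

# Assumed shape of a kernel segfault line:
#   "... name[pid]: segfault at <addr> ip <addr> sp <addr> error N in lib.so[...]"
SEGFAULT_RE = re.compile(r"(\S+)\[\d+\]: segfault at .* in (\S+?)\[")


def summarize_kern_log(lines):
    """Count Xid reports and segfaults grouped by (process, library)."""
    xid_count = 0
    segfaults = Counter()
    for line in lines:
        # Assumption: NVIDIA kernel messages for Xid errors contain this text.
        if "NVRM: Xid" in line:
            xid_count += 1
        match = SEGFAULT_RE.search(line)
        if match:
            segfaults[(match.group(1), match.group(2))] += 1
    return xid_count, segfaults
```

Passing an open file works because the function only iterates over lines, for example summarize_kern_log(open("kern.log", errors="replace")). If most segfaults point at the same library, that is a hint worth including in a bug report; if they are spread over unrelated libraries, a hardware or memory problem becomes more plausible.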

System :

Ubuntu "Oneiric" Linux 64-bit
Windows Vista 32-bit
Core 2 Quad, nForce chipset, 6GB of RAM
ASUS DirectCU II 560Ti

No overclocking, conservative memory timings. GPU set to lowest clock speed in xorg.conf
System runs mprime (from SSH on 64-bit Linux) and Memtest86 with no test problems.

Symptoms :

Crashes, lockups and hard resets. When applications start crashing, a lockup or reset is not far behind, but sometimes you get no real warning. The amount of uptime I get varies.

Once the system has crashed once, it remains in an unstable state, even after reboots. It's like there's some bad state that accumulates in the card. Note that this "unstable state" only applies to Linux, not Windows - you can reboot to Windows and start playing games.

Windows runs fine with one exception. It will play Skyrim for hours, etc. The exception is that after I've been running Linux, Windows will hard reset the machine the first time it boots, just after it reaches the login screen (when it initializes Aero?). The next time it boots, it runs fine. This supports the idea that there is "bad state" in the GPU card - it makes Windows crash, but the process of initialization cleans it out.

Reproduction :

No real recipe. "Doing graphical stuff" is what seems to provoke problems. I see a lot of crashes in chromium-browser and npviewer.bin (wrapping Adobe Flash), but even drawing the (Unity, composited) desktop causes problems. Events I've associated with crashes include scrolling windows, seeing notifier popups and coming out of the screensaver, but once you get to the unstable state, basically anything causes problems.

I rebooted after the crash mentioned at the top of this post, processes were crashing before the desktop had even finished loading.

Things I've tried :

Drivers 280, 285, 290, 275

Oneiric ships with 280. Other driver versions are available from x-swat repository on Launchpad.

nouveau (although I'm not sure I gave it a fair shake - I think the "bad state" may have still been present, but it still crashed, even in Unity 2D).

Things I have left to try :

Driver 270
- not sure this will work well on my hardware, I remember having to upgrade Natty from this driver to support non-basic screen resolutions. But I don't remember Natty crashing like this.

Ubuntu Natty
- I don't recall Natty crashing, even after upgrading the nvidia driver to support my new 560Ti card.

My old 8800 GTS
- I never had stability problems with this card, although the newest drivers I ran with it were the 270 drivers on Natty. It would be interesting to see if the newer drivers can make it unstable.

New graphics hardware
- I still like to play games, and even if going back to the 8800 GTS fixes my Linux stability issues, I'll be disappointed that I can't play modern games at the fidelity I'd like.

Things I'm not going to try :
- 32-bit version of Ubuntu

It's more important for my system to be stable in 64-bit Linux than it is for me to play games - this system is a "working" system first. I actually need a 64-bit OS because I work on some pretty RAM hungry Java applications, or I'd try out the 32-bit version of Ubuntu - anecdotally, this may be (more) stable.

-----

If it wasn't for Windows running games just fine for extended periods, with no texture corruption, etc, I would have pegged this for a hardware fault right from the start. The system is currently running Minecraft and Chrome / YouTube in Windows Vista (32) with zero apparent problems, despite having just rebooted from Linux where it had an uptime measured in seconds.

Update : while writing this, the machine has managed to crash while running nouveau again (from my portable eSATA drive with my office copy of Ubuntu on it, that usually runs on an Intel or ATI machine and doesn't have the nvidia drivers on it). Again, maybe this was down to the accumulation of bad state, but my experiments with nouveau haven't really been any more stable than the nvidia drivers so far.

Hopefully my enormous ramblings will be helpful.
Attached Files
File Type: gz nvidia-bug-report.log.gz (62.9 KB, 43 views)
Old 12-26-11, 05:03 AM   #5
edscottwilson
Registered User
 
Join Date: Dec 2011
Posts: 2
Default Re: Segfault with 285.05.09

I've now posted an nvidia-bug-report with the following information.

The bug apparently is due to a race condition between threads, and therefore is not reproducible on demand. I have not yet pinpointed the race, but it happens in current svn code for "rodent filemanager" apparently when two threads are producing graphic output via the gtk library. These threads are protected with a mutex (GDK_THREADS_ENTER/LEAVE) mechanism.

Latest core dump reveals that after a program fork, the wrong process is inheriting the gtk loop and thus all graphic output is blocked. I don't know exactly how gtk determines which process keeps the gtk event loop after a fork, but in my experience, the first process which has graphic i/o keeps the event loop. This is the latest traceback:

Core was generated by `rodent-forked /raid/home/SVN/xffm/CURRENT/rodent'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f20f9761c50 in _nv019tls () from /usr/lib64/tls/libnvidia-tls.so.290.10
(gdb) where
#0 0x00007f20f9761c50 in _nv019tls () from /usr/lib64/tls/libnvidia-tls.so.290.10
#1 0x00007f20fba15966 in fork () from /lib64/libc.so.6
#2 0x00007f210072b30b in Tubo_threads (fork_function=0x7f210071d539 <fork_function>,
fork_function_data=0x7fffbe080910, stdin_fd=0x0, stdout_f=0x7f210071b9f5 <rfm_operate_stdout>,
stderr_f=0x7f210071b96e <rfm_operate_stderr>,
tubo_done_f=0x7f210071d517 <run_fork_finished_function>, user_function_data=0x80c900,
reap_child=0, check_valid_ansi_sequence=1) at ../../libs/tubo/src/tubo.c:109
#3 0x00007f210071e88c in thread_run (in_widgets_p=0x682328, argv=0x7fffbe080910,
stdout_f=0x7f210071b9f5 <rfm_operate_stdout>, stderr_f=0x7f210071b96e <rfm_operate_stderr>,
stdin_fd=0x0) at ../../libs/rfm/primary/primary-run.i:444
#4 0x00007f210071eb8b in private_rfm_thread_run_argv (in_widgets_p=0x682328, argv=0x7fffbe0849e0,
interm=1, stdout_f=0, stderr_f=0, stdin_fd=0x0) at ../../libs/rfm/primary/primary-run.i:508
#5 0x00007f210071f026 in rfm_thread_run_argv (widgets_p=0x682328, argv=0x7fffbe0849e0, interm=1)
at ../../libs/rfm/primary/primary-run.c:60
#6 0x00007f21004d448c in rodent_open_in_terminal_activate (menuitem=0x7f20e0065840,
user_data=0x682328) at ../../libs/rfm/rodent/rodent_popup.i:997
#7 0x00007f20fd63825e in g_closure_invoke () from /usr/lib64/libgobject-2.0.so.0
#8 0x00007f20fd64e6f7 in ?? () from /usr/lib64/libgobject-2.0.so.0
#9 0x00007f20fd64fbb6 in g_signal_emit_valist () from /usr/lib64/libgobject-2.0.so.0
#10 0x00007f20fd650143 in g_signal_emit () from /usr/lib64/libgobject-2.0.so.0
#11 0x00007f20ff9a06ee in gtk_widget_activate () from /usr/lib64/libgtk-x11-2.0.so.0
#12 0x00007f20ff893ced in gtk_menu_shell_activate_item () from /usr/lib64/libgtk-x11-2.0.so.0
#13 0x00007f20ff89558b in ?? () from /usr/lib64/libgtk-x11-2.0.so.0
#14 0x00007f20ff885a28 in ?? () from /usr/lib64/libgtk-x11-2.0.so.0
#15 0x00007f20fd63825e in g_closure_invoke () from /usr/lib64/libgobject-2.0.so.0
#16 0x00007f20fd64e340 in ?? () from /usr/lib64/libgobject-2.0.so.0
#17 0x00007f20fd64f9fb in g_signal_emit_valist () from /usr/lib64/libgobject-2.0.so.0
#18 0x00007f20fd650143 in g_signal_emit () from /usr/lib64/libgobject-2.0.so.0
#19 0x00007f20ff99cd7f in ?? () from /usr/lib64/libgtk-x11-2.0.so.0
#20 0x00007f20ff87dee3 in gtk_propagate_event () from /usr/lib64/libgtk-x11-2.0.so.0
#21 0x00007f20ff87ef9b in gtk_main_do_event () from /usr/lib64/libgtk-x11-2.0.so.0
#22 0x00007f20ff4f3e6c in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#23 0x00007f20fcf4e262 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#24 0x00007f20fcf52928 in ?? () from /usr/lib64/libglib-2.0.so.0
#25 0x00007f20fcf52e35 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#26 0x00007f20ff87f437 in gtk_main () from /usr/lib64/libgtk-x11-2.0.so.0
#27 0x00000000004033e4 in main (argc=2, argv=0x6294f0) at ../../src/rfm/fm/rodent.c:493

I will keep looking for the race condition which is triggering the segfault.
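
A side note on frames #0 and #1 of the traceback above: a driver function called directly from fork() is what one would expect if the library had registered a fork handler (pthread_atfork in C), because such handlers run inside the fork() call itself. That is an assumption about libnvidia-tls, not something the trace proves. The sketch below uses Python's os.register_at_fork, which is only an analogue of that mechanism, to show when each handler runs. EVENTS and fork_once are made-up names for this example.

```python
import os

EVENTS = []

# Handlers are registered once and stay registered for the life of the process.
os.register_at_fork(
    before=lambda: EVENTS.append("before"),
    after_in_parent=lambda: EVENTS.append("after_in_parent"),
    after_in_child=lambda: EVENTS.append("after_in_child"),
)


def fork_once():
    """Fork a child that exits at once; return the events seen by the parent."""
    EVENTS.clear()
    pid = os.fork()
    if pid == 0:
        # Child: "after_in_child" ran here, in the child's own copy of EVENTS.
        os._exit(0)
    os.waitpid(pid, 0)
    return list(EVENTS)
```

The point of the sketch is that handler code runs in the forking thread, before fork() returns, in whatever state the rest of the process happens to be in at that moment. In a program where other threads are active at the time of the fork, that is one place a race could hide.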
Old 01-17-12, 04:56 AM   #6
sandipt
NVIDIA Corporation
 
Join Date: Dec 2010
Posts: 260
Default Re: Segfault with 285.05.09

I tried Ubuntu 11.10 x64 with Blender 2.60, the Rodent file manager and the Chrome browser, but was not able to reproduce the problem.
Old 01-17-12, 06:24 AM   #7
mrpollo
developer
 
Join Date: Jan 2012
Posts: 2
Default Re: Segfault with 285.05.09

Same problem here:

#0 0x00007f05f95e6a08 in _nv007tls () from /usr/lib/tls/libnvidia-tls.so.290.10
#1 0x00007f0601e24ed9 in __nptl_deallocate_tsd () at pthread_create.c:155
#2 0x00007f0601e258c8 in start_thread (arg=<value optimized out>) at pthread_create.c:307
#3 0x00007f05fd08802d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

I'm running Debian Linux. It only seems to happen on Quadro cards: I also tested on a GeForce GTX 460 and did not see it there. In my case, setting "export __GL_SINGLE_THREADED=1" before launching the application seems to solve the issue.
Crashes in __nptl_deallocate_tsd might be caused by a dangling thread-specific data destructor, for example a key that was never released with pthread_key_delete.
Older drivers such as 195.36.31 or 256.53 also avoid the issue; it only happens with recent drivers.
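
For anyone who wants to check whether the workaround makes a difference on their own machine, here is a minimal Python sketch. The only thing taken from this thread is the __GL_SINGLE_THREADED=1 environment variable; build_env and run_once are hypothetical helper names, and the sketch assumes a POSIX system where a process killed by a signal is reported with a negative return code.

```python
import os
import signal
import subprocess


def build_env(single_threaded, base=None):
    """Return a copy of the environment with the workaround switched on or off."""
    env = dict(os.environ if base is None else base)
    if single_threaded:
        env["__GL_SINGLE_THREADED"] = "1"
    else:
        env.pop("__GL_SINGLE_THREADED", None)
    return env


def run_once(cmd, single_threaded):
    """Run cmd to completion; return True if it was killed by SIGSEGV."""
    result = subprocess.run(cmd, env=build_env(single_threaded))
    # On POSIX, subprocess reports death by signal N as returncode -N.
    return result.returncode == -signal.SIGSEGV
```

Calling run_once(["./blender"], False) and then run_once(["./blender"], True) a few times each gives a rough comparison. Since the crash looks timing dependent, a single clean run with the variable set does not prove much; several runs in each configuration are more convincing. Setting the variable only for the child process also avoids leaving it exported in the shell for unrelated programs.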
Old 01-17-12, 06:30 AM   #8
sandipt
NVIDIA Corporation
 
Join Date: Dec 2010
Posts: 260
Default Re: Segfault with 285.05.09

Please provide an nvidia-bug-report and reproduction steps, if possible.

Old 01-17-12, 06:41 AM   #9
mrpollo
developer
 
Join Date: Jan 2012
Posts: 2
Default Re: Segfault with 285.05.09

Thanks for the quick reply, here's the bug report:

http://dl.dropbox.com/u/5743483/nvid...-report.log.gz

It is hard to reproduce, as it only happens when performing a fluid simulation (SPH) in a proprietary application (RealFlow).
Old 03-21-12, 05:41 AM   #10
mjakt
Registered User
 
Join Date: Mar 2012
Posts: 1
Default Re: Segfault with 285.05.09

Has there been any resolution to this problem?

I'm having identical issues with 290.10-13.1 running on OpenSUSE 12.1 with a GeForce GTX 560 Ti.

The segmentation fault is triggered by my own application whilst processing data in several concurrent threads. The segmentation fault only happens if I'm not running the application from GDB, but the backtrace of the core gives me:


Program terminated with signal 11, Segmentation fault.
#0 0x00007ff0c8171a08 in _nv007tls () from /usr/lib64/tls/libnvidia-tls.so.290.10
(gdb) bt
#0 0x00007ff0c8171a08 in _nv007tls () from /usr/lib64/tls/libnvidia-tls.so.290.10
#1 0x00007ff0c8e82ca3 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2 0x00007ff0c8e82f13 in start_thread () from /lib64/libpthread.so.0
#3 0x00007ff0c844f10d in clone () from /lib64/libc.so.6

so, pretty much identical.
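
To make "pretty much identical" a little more concrete, here is a minimal Python sketch that compares the innermost frames of two gdb backtraces by function name, ignoring addresses and library paths. FRAME_RE, frame_functions and same_signature are made-up names, and the regular expression only assumes the frame layout seen in the traces pasted in this thread.

```python
import re

# Matches gdb frame lines such as
#   "#1 0x00007ff0c8e82ca3 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def frame_functions(backtrace):
    """Return the function names of a gdb backtrace, innermost frame first."""
    frames = {}
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            # Keyed by frame number, so a repeated "#0" line is counted once.
            frames[int(match.group(1))] = match.group(2)
    return [frames[i] for i in sorted(frames)]


def same_signature(bt_a, bt_b, depth=3):
    """Compare the innermost `depth` function names of two backtraces."""
    a, b = frame_functions(bt_a), frame_functions(bt_b)
    return len(a) >= depth and a[:depth] == b[:depth]
```

With depth=3, the comparison looks at the crashing function, its caller and the caller's caller, so two reports would count as the same when all three names agree. A match by name does not prove a common cause, but it is a reasonable first filter when collecting reports.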

As suggested by a previous poster:

export __GL_SINGLE_THREADED=1

seems to prevent the crash.

For me, on this hardware the crash is very reproducible, but on an older computer running 275.21-7.1 I have not seen this issue.

For now I'll use the workaround above, but I'd be interested to hear if a resolution to the problem is found. Also, if it's useful, I can supply code and data that (so far at least) consistently cause the bug to occur.

cheers,

Martin