nV News Forums > Linux Support Forums > NVIDIA Linux

02-16-12, 01:15 PM   #1
kdt3rd
Registered User
 
Join Date: Feb 2012
Posts: 4
295.20 crash on fork call

Hi -

290.10 was fine, but with the 295.20 driver series I am seeing a crash during the call to fork(). Basically, I have a QThread running that monitors a background process. If I run the program in gdb, it works, which I presume is because gdb zeroes memory as it loads objects. If I run it normally, it crashes with this call stack:

Thanks in advance,

#0 0x00007fc639eb6c0f in _nv022tls () from /usr/lib/nvidia-current/tls/libnvidia-tls.so.295.20
#1 0x00007fc63ed83a01 in ?? () from /usr/lib/nvidia-current/libGL.so.1
#2 0x00007fc63b738425 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95
#3 0x00000000006e6829 in Core::SystemProcess::execute(bool) ()
#4 0x000000000053ef55 in ProcessThread::run() ()
#5 0x00007fc642152775 in ?? () from /usr/lib/libQtCore.so.4
#6 0x00007fc63c5c19ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#7 0x00007fc63b77470d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()
02-16-12, 01:56 PM   #2
danix
NVIDIA Corporation
 
 
Join Date: Feb 2010
Location: Santa Clara, CA
Posts: 237
Re: 295.20 crash on fork call

Hi kdt3rd,

What application is this? If it's your own app, can you provide it, or at least a cut-down version that captures your forking code and reproduces the problem? If you don't want the app to be shared publicly, you can upload it to the VDPAU file drop described here: http://www.nvnews.net/vbulletin/showthread.php?t=123819

Please also provide an nvidia-bug-report.log file: http://www.nvnews.net/vbulletin/showthread.php?t=46678
02-16-12, 03:00 PM   #3
kdt3rd
Registered User
 
Join Date: Feb 2012
Posts: 4
Re: 295.20 crash on fork call

It is our own application, so I can't upload it; however, the problem is trivial to recreate. Attached is a "forktest" Qt application that exhibits the problem. It is perhaps the most complicated way imaginable to draw a dark grey square, but the code is simple. Steps to reproduce:

% qmake
% make
% ./forktest

Then hit the "do it" button.

The zip file also includes the nvidia-bug-report.log file.
Attached file: forktest.zip (58.5 KB)
02-21-12, 06:20 PM   #4
kdt3rd
Registered User
 
Join Date: Feb 2012
Posts: 4
Re: 295.20 crash on fork call

As a follow-up: this is not related to Qt at all. I have reduced my test case to a simple GLUT app with a pthread:

% make
% ./forktest

If you hit 'm', it forks in the main thread and works fine.
If you hit the spacebar, it creates a thread, forks in that thread, and crashes.

Thanks in advance,
Attached file: forktest.zip (1.6 KB)
02-26-12, 12:10 AM   #5
ticpu
Registered User
 
Join Date: Feb 2012
Posts: 1
Re: 295.20 crash on fork call

That would explain the gnome-shell crashes since 295.20; gnome-shell probably forks from another thread to generate thumbnails of recent items.
02-28-12, 10:06 AM   #6
HDave
Registered User
 
Join Date: Nov 2007
Posts: 13
Re: 295.20 crash on fork call

Could this crash leave the system totally frozen, i.e. unresponsive to keyboard input and mouse clicks, while the mouse cursor still moves on the screen?

I am seeing this with 295.20...
02-29-12, 12:03 AM   #7
kdt3rd
Registered User
 
Join Date: Feb 2012
Posts: 4
Re: 295.20 crash on fork call

So far my system hasn't locked up the way you describe, but I suppose it could, depending on the nature of the bug.
02-29-12, 02:54 AM   #8
sandipt
NVIDIA Corporation
 
 
Join Date: Dec 2010
Posts: 260
Re: 295.20 crash on fork call

NVIDIA internally filed bug 941836 to track this issue.

03-05-12, 06:09 PM   #9
bobo1on1
Registered User
 
Join Date: Dec 2006
Posts: 5
Re: 295.20 crash on fork call

I'm having a similar issue with popen(), but the child process hangs instead of crashing.
The backtrace shows it hanging in libGL while locking a mutex, which will never be unlocked because the child process has only one thread.
03-31-12, 09:42 AM   #10
graingert
Registered User
 
Join Date: Mar 2012
Posts: 1
Re: 295.20 crash on fork call

This seems to be fixed in 295.33:

$ ./forktest
starting fork...
in child
exiting child
child successfully waited for in parent...
05-23-12, 03:28 AM   #11
martvdsanden
Registered User
 
Join Date: May 2012
Location: The Netherlands
Posts: 1
Re: 295.20 crash on fork call

I am seeing the same problem in 295.49. It was fine before a driver update, though I'm not sure which version I had before.

I'm running a multi-threaded program that uses OpenGL (GL windows set up directly through X11, no Qt or GLUT), but it also needs to fork() to capture a process's output.

The stack trace of the child process below shows that it is blocked on a mutex inside libGL. Since fork() only "clones" the calling thread, there is no other thread left to unlock this mutex, so this is a deadlock.

This looks to me like a bug in libGL. Does anyone have any ideas?

I should note that it only happens once in a while. I fork a lot (a few times per second), and usually the first child hangs within about a minute.

Child process stack trace:

0xb776b424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb776b424 in __kernel_vsyscall ()
#1 0xb75715a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb756cebb in _L_lock_764 () from /lib/i386-linux-gnu/libpthread.so.0
#3 0xb756cd75 in __pthread_mutex_lock (mutex=0xb5502660) at pthread_mutex_lock.c:82
#4 0xb54be671 in ?? () from /usr/lib/nvidia-current-updates/libGL.so.1
#5 0xb7574464 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:26
#6 0xb1deb8f1 in yb::HttpServiceCGIResponse::doWrite (this=0xb07070c0, fd=41) at /home/mart/Projects/yb/yb/net/httpservicecgiresponse.cc:53
#7 0xb1dc794d in yb::ConfigServiceRequestHandler::onWrite (this=0xb07079f0) at /home/mart/Projects/yb/plugins/configservice/configservicerequesthandler.cc:319
#8 0xb1de42ec in operator() (this=0xb0700aa0) at /home/mart/Projects/yb/yb/net/httpservice.cc:107
#9 std::_Function_handler<void(), yb::HttpService::asyncWrite(const std::shared_ptr<yb::HttpServiceRequest>&)::<lambda ()>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/4.6/functional:1778
#10 0x0808b4c0 in operator() (this=0xb1a291ec) at /usr/include/c++/4.6/functional:2161
#11 yb::WorkerThread::loop (this=0x8c9a758, context=...) at /home/mart/Projects/yb/yb/threading/workerthread.cc:289
#12 0x0808bfc5 in operator() (this=0x8c9ac90) at /home/mart/Projects/yb/yb/threading/workerthread.cc:133
#13 __call<void> (this=0x8c9ac90, __args=...) at /usr/include/c++/4.6/functional:1287
#14 operator()<> (this=0x8c9ac90) at /usr/include/c++/4.6/functional:1378
#15 std::thread::_Impl<std::_Bind_result<void, yb::WorkerThread::increaseThreads(size_t)::<lambda ()>()> >::_M_run(void) (this=0x8c9ac84) at /usr/include/c++/4.6/thread:117
#16 0xb7624007 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#17 0xb54c3a01 in ?? () from /usr/lib/nvidia-current-updates/libGL.so.1
#18 0x5c8b0824 in ?? ()
#19 0xbab80424 in ?? ()
#20 0x65000000 in ?? ()
#21 0x001015ff in ?? ()
#22 0xd3890000 in ?? ()
#23 0xfff0013d in ?? ()
#24 0xc30173ff in ?? ()
#25 0x0f8122e8 in ?? ()
#26 0x9fc18100 in ?? ()

Last edited by martvdsanden; 05-23-12 at 04:03 AM. Reason: Added that it doesn't happen every fork
06-05-12, 04:31 PM   #12
bji
Registered User
 
Join Date: Jun 2012
Posts: 1
Re: 295.20 crash on fork call

I can confirm that this issue still exists in the 295.53 drivers.

I strongly suspect that the issue is as others have surmised:

- The NVIDIA GL driver calls pthread_atfork() to register a callback that runs in the child process after fork()
- This callback in the child process attempts to lock a pthread mutex
- That mutex was already locked by some other thread in the parent when fork() was called, so it is also locked in the child, where the owning thread no longer exists. The child's attempt to lock it therefore hangs forever.

There is a workaround for this: compile a shared library that replaces __register_atfork() (the glibc function that pthread_atfork() forwards to) with a no-op, and use LD_PRELOAD to force that version to be used. This prevents the NVIDIA GL library from registering its atfork() callbacks and, as far as I can tell, has no bad side effects.

To create such a shared library, create a file fix_nvidia.c with these contents:

Code:
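#include <stdio.h>

/* Replacement for glibc's __register_atfork(), which pthread_atfork()
   forwards to. Loaded with LD_PRELOAD, it is found before libc's version,
   so prepare/parent/child fork handlers are dropped instead of registered. */
int __register_atfork(void (*prepare) (void), void (*parent) (void),
                      void (*child) (void), void *dso_handle)
{
    (void)prepare; (void)parent; (void)child; (void)dso_handle;

    /* Marker so you can confirm the shim is active; safe to remove. */
    fprintf(stderr, "__register_atfork ignored\n");
    fflush(stderr);
    return 0;   /* report success so the caller carries on normally */
}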
Then compile this into a shared library with:

gcc -fPIC -c fix_nvidia.c
gcc -shared -o fix_nvidia.so fix_nvidia.o

Now you can use LD_PRELOAD to force this function to be used:

LD_PRELOAD=/path/to/fix_nvidia.so your_program your_commandline_args

If it is working as expected, you will see a few such lines when your program starts up:

__register_atfork ignored
__register_atfork ignored

(I traced both of the above calls in my program; they came from the NVIDIA GL library.)

You can remove those fprintf calls and recompile if you don't want to see those messages; I included them just so that you can verify that you have done everything correctly.

If your forked child makes any OpenGL calls, forcing the atfork() callbacks to be ignored will likely cause problems. I call exec() immediately after fork() (as I suspect most people do) and make no OpenGL calls in the child, so whatever NVIDIA's child atfork() callback does is pointless in my program anyway.

I should add that if some other library or application code in your program legitimately calls pthread_atfork(), using this workaround will likely break your program. However, I expect that pthread_atfork() is very rarely used and for the vast majority of programs, the workaround will not cause any problems.

All times are GMT -5.


Copyright 1998 - 2014, nV News.