nV News Forums (http://www.nvnews.net/vbulletin/index.php)
-   NVIDIA Linux (http://www.nvnews.net/vbulletin/forumdisplay.php?f=14)
-   -   295.20 crash on fork call (http://www.nvnews.net/vbulletin/showthread.php?t=174244)

kdt3rd 02-16-12 12:15 PM

295.20 crash on fork call
Hi -

290.10 was fine, but with the 295.20 driver I am seeing a crash during the call to fork(). Basically, I have a QThread that runs and monitors a background process. If I run the program in gdb, it works, I presume because gdb is zeroing memory as it loads objects. If I run it normally, it crashes with the call stack below.

Thanks in advance,

#0 0x00007fc639eb6c0f in _nv022tls () from /usr/lib/nvidia-current/tls/libnvidia-tls.so.295.20
#1 0x00007fc63ed83a01 in ?? () from /usr/lib/nvidia-current/libGL.so.1
#2 0x00007fc63b738425 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95
#3 0x00000000006e6829 in Core::SystemProcess::execute(bool) ()
#4 0x000000000053ef55 in ProcessThread::run() ()
#5 0x00007fc642152775 in ?? () from /usr/lib/libQtCore.so.4
#6 0x00007fc63c5c19ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#7 0x00007fc63b77470d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

danix 02-16-12 12:56 PM

Re: 295.20 crash on fork call
Hi kdt3rd,

What application is this? If it's your own app, can you provide it, or at least a cut down version that captures your forking code and is able to reproduce the problem? If you don't want the app to be shared publicly, you can upload it to the VDPAU file drop described here: http://www.nvnews.net/vbulletin/showthread.php?t=123819

Please also provide an nvidia-bug-report.log file: http://www.nvnews.net/vbulletin/showthread.php?t=46678

kdt3rd 02-16-12 02:00 PM

Re: 295.20 crash on fork call
1 Attachment(s)
It is our own application. I can't upload it, but the problem is trivial to reproduce. Attached is a "forktest" Qt application that exhibits it. It is perhaps the most complicated way imaginable to draw a dark grey square, but the code itself is simple. Steps to reproduce:

% qmake
% make
% ./forktest

Then hit the "do it" button.

Also enclosed in the zip file is the nvidia bug report log file...

kdt3rd 02-21-12 05:20 PM

Re: 295.20 crash on fork call
1 Attachment(s)
As a follow-up, this is not related to Qt at all. I have updated my test case to be a simple GLUT app with a pthread:

% make
% ./forktest

If you hit 'm', it will do the fork in the main thread and work successfully.
If you hit the spacebar, it will create a thread, do the fork in there, and crash.
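
For readers without the attachment, here is a minimal sketch of just the "fork inside a pthread" part. It is not the attached test case: it has no GLUT or OpenGL code, and the function names are made up for illustration. On its own it should run cleanly; the crash needs libGL loaded in the process.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: fork, let the child exit at once, and reap it in the parent.
 * The child's exit status (or -1 on error) is passed back via the return value. */
static void *fork_worker(void *arg)
{
    int status = 0;
    pid_t pid;

    (void)arg;
    pid = fork();
    if (pid == 0)
        _exit(0);                       /* child: nothing else to do */
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return (void *)(intptr_t)-1;
    return (void *)(intptr_t)(WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Hypothetical helper: runs fork_worker in a new thread and returns its result. */
int fork_from_worker_thread(void)
{
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, fork_worker, NULL) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

Calling fork_from_worker_thread() from main() and linking with -pthread is enough to exercise it; a return value of 0 means the child was forked and reaped normally.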

thanks in advance,

ticpu 02-25-12 11:10 PM

Re: 295.20 crash on fork call
That would explain the crash in gnome-shell since 295.20, which probably forks in another thread to get thumbnails of recent items.

HDave 02-28-12 09:06 AM

Re: 295.20 crash on fork call
Would this crash result in a system that is totally frozen -- e.g. unresponsive to the keyboard and mouse clicks, yet still showing the mouse cursor moving on the screen?

I am seeing this with 295.20...

kdt3rd 02-28-12 11:03 PM

Re: 295.20 crash on fork call
My system hasn't locked up in the manner you mention so far, but I suppose it could, depending on the nature of the bug.

sandipt 02-29-12 01:54 AM

Re: 295.20 crash on fork call
NVIDIA internally filed bug 941836 to track this issue.

bobo1on1 03-05-12 05:09 PM

Re: 295.20 crash on fork call
I'm having a similar issue with popen(), but the child process hangs instead of crashing.
The backtrace shows that it's hanging in libGL while locking a mutex, which will never be unlocked since the child process only has one thread.

graingert 03-31-12 08:42 AM

Re: 295.20 crash on fork call
This seems to be fixed in 295.33

$ ./forktest
starting fork...
in child
exiting child
child successfully waited for in parent...

martvdsanden 05-23-12 02:28 AM

Re: 295.20 crash on fork call
I am seeing the same problem in 295.49. It was fine before a driver update, but I'm not sure which version that was.

I'm running a multi-threaded program which uses OpenGL (direct X11 setup of GL windows, no Qt or GLUT), but also needs to fork() to capture a process's output.

The stack trace of the child process below clearly shows that it is hanging on a mutex inside libGL. Since fork() only "clones" the calling thread, there will be no other thread in the child to unlock this mutex, so this is a deadlock.
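
The mechanism can be shown without libGL at all. The sketch below is only an illustration with made-up names: a helper thread holds an ordinary mutex while the main thread forks, and the child then probes the mutex with pthread_mutex_trylock() so that it reports the state instead of hanging the way a plain pthread_mutex_lock() would.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t demo_locked;    /* posted once the holder thread owns demo_lock */
static sem_t demo_release;   /* posted when the holder thread may unlock */

/* Helper thread: takes the mutex and keeps it until told to let go. */
static void *demo_holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&demo_locked);
    sem_wait(&demo_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found the mutex locked, 0 if it was free, -1 on error. */
int child_sees_locked_mutex(void)
{
    pthread_t thread;
    int status = 0;
    pid_t pid;

    sem_init(&demo_locked, 0, 0);
    sem_init(&demo_release, 0, 0);
    if (pthread_create(&thread, NULL, demo_holder, NULL) != 0)
        return -1;
    sem_wait(&demo_locked);             /* the other thread now owns the mutex */

    pid = fork();                       /* only the calling thread is cloned */
    if (pid == 0) {
        /* Child: the holder thread does not exist here, but the mutex memory
         * was copied in its locked state. trylock reports that via EBUSY. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    sem_post(&demo_release);            /* parent: let the holder finish */
    pthread_join(thread, NULL);
    sem_destroy(&demo_locked);
    sem_destroy(&demo_release);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the parent the mutex is released normally, but the child's copy stays locked forever, which is why a blocking lock call in the child never returns.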

This seems to me to be a bug inside libGL. Does anyone have any ideas?

I should note that it happens only once in a while. I do a lot of forks (a few per second), and it usually takes about a minute before the first child hangs.

Child process stack-trace:

0xb776b424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb776b424 in __kernel_vsyscall ()
#1 0xb75715a2 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb756cebb in _L_lock_764 () from /lib/i386-linux-gnu/libpthread.so.0
#3 0xb756cd75 in __pthread_mutex_lock (mutex=0xb5502660) at pthread_mutex_lock.c:82
#4 0xb54be671 in ?? () from /usr/lib/nvidia-current-updates/libGL.so.1
#5 0xb7574464 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:26
#6 0xb1deb8f1 in yb::HttpServiceCGIResponse::doWrite (this=0xb07070c0, fd=41) at /home/mart/Projects/yb/yb/net/httpservicecgiresponse.cc:53
#7 0xb1dc794d in yb::ConfigServiceRequestHandler::onWrite (this=0xb07079f0) at /home/mart/Projects/yb/plugins/configservice/configservicerequesthandler.cc:319
#8 0xb1de42ec in operator() (this=0xb0700aa0) at /home/mart/Projects/yb/yb/net/httpservice.cc:107
#9 std::_Function_handler<void(), yb::HttpService::asyncWrite(const std::shared_ptr<yb::HttpServiceRequest>&)::<lambda ()>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/4.6/functional:1778
#10 0x0808b4c0 in operator() (this=0xb1a291ec) at /usr/include/c++/4.6/functional:2161
#11 yb::WorkerThread::loop (this=0x8c9a758, context=...) at /home/mart/Projects/yb/yb/threading/workerthread.cc:289
#12 0x0808bfc5 in operator() (this=0x8c9ac90) at /home/mart/Projects/yb/yb/threading/workerthread.cc:133
#13 __call<void> (this=0x8c9ac90, __args=...) at /usr/include/c++/4.6/functional:1287
#14 operator()<> (this=0x8c9ac90) at /usr/include/c++/4.6/functional:1378
#15 std::thread::_Impl<std::_Bind_result<void, yb::WorkerThread::increaseThreads(size_t)::<lambda ()>()> >::_M_run(void) (this=0x8c9ac84) at /usr/include/c++/4.6/thread:117
#16 0xb7624007 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#17 0xb54c3a01 in ?? () from /usr/lib/nvidia-current-updates/libGL.so.1
#18 0x5c8b0824 in ?? ()
#19 0xbab80424 in ?? ()
#20 0x65000000 in ?? ()
#21 0x001015ff in ?? ()
#22 0xd3890000 in ?? ()
#23 0xfff0013d in ?? ()
#24 0xc30173ff in ?? ()
#25 0x0f8122e8 in ?? ()
#26 0x9fc18100 in ?? ()

bji 06-05-12 03:31 PM

Re: 295.20 crash on fork call
I can confirm that this issue still exists in the 295.53 drivers.

I strongly suspect that the issue is as others have surmised:

- The NVidia GL driver uses a pthread_atfork() call to register a fork callback that runs in the child process
- This fork callback in the child process attempts to lock a pthread_mutex
- That pthread_mutex was already locked by some other thread in the parent at the time that the fork() call was made, and thus has a locked state in the child process. Thus the attempt to lock the mutex in the child process hangs.
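
For comparison, the conventional way to use pthread_atfork() without this hang is to take the lock in the prepare handler and release it in both the parent and child handlers. The sketch below shows that pattern with made-up names and a stand-in mutex; I have no idea what the NVidia library does internally, so this is not a description of the driver or of any fix.

```c
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in library lock */
static sem_t lib_lock_taken;

/* prepare: runs in the forking thread before fork, so the lock is owned by
 * that thread (and not by some other thread) at the moment of the fork. */
static void atfork_prepare(void) { pthread_mutex_lock(&lib_lock); }
/* parent and child: both sides release the lock again after the fork. */
static void atfork_release(void) { pthread_mutex_unlock(&lib_lock); }

/* Another thread that holds the lock for a short while (50 ms). */
static void *brief_holder(void *arg)
{
    struct timespec pause = { 0, 50 * 1000 * 1000 };

    (void)arg;
    pthread_mutex_lock(&lib_lock);
    sem_post(&lib_lock_taken);
    nanosleep(&pause, NULL);
    pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Returns 0 if the child could take the lock, 1 if it was stuck locked, -1 on error. */
int child_lock_state_with_handlers(void)
{
    static int registered = 0;
    pthread_t thread;
    int status = 0;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(atfork_prepare, atfork_release, atfork_release) != 0)
            return -1;
        registered = 1;
    }
    sem_init(&lib_lock_taken, 0, 0);
    if (pthread_create(&thread, NULL, brief_holder, NULL) != 0)
        return -1;
    sem_wait(&lib_lock_taken);          /* the other thread holds the lock now */

    pid = fork();                       /* prepare handler waits for the holder */
    if (pid == 0)
        _exit(pthread_mutex_trylock(&lib_lock) == 0 ? 0 : 1);

    pthread_join(thread, NULL);
    sem_destroy(&lib_lock_taken);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the prepare handler in place, fork() simply waits until the other thread lets go of the lock, so the child never inherits it in a state that nobody can undo.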

There is a workaround for this. You must compile a shared library that overrides __register_atfork() (the glibc function that pthread_atfork() calls to do the actual registration) with a no-op function, and use LD_PRELOAD to force this version to be used. This will prevent the NVidia GL library from registering its atfork callback and, as far as I can tell, has no bad side effects.

To create such a shared library, create a file fix_nvidia.c with these contents:


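#include <stdio.h>

/* Replaces glibc's __register_atfork() so that no atfork handlers are
   ever registered; the message just confirms the override is in effect. */
int __register_atfork(void (*prepare) (void), void (*parent) (void),
                      void (*child) (void), void *dso_handle)
{
    fprintf(stderr, "__register_atfork ignored\n");
    return 0;
}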

Then compile this into a shared library with:

gcc -fPIC -c fix_nvidia.c
gcc -shared -o fix_nvidia.so fix_nvidia.o

Now you can use LD_PRELOAD to force this function to be used:

LD_PRELOAD=/path/to/fix_nvidia.so your_program your_commandline_args

If it is working as expected, you will see a few such lines when your program starts up:

__register_atfork ignored
__register_atfork ignored

(I traced the above calls in my program and they were both from the NVidia GL library)

You can remove the fprintf call and recompile if you don't want to see those messages; I included it just so that you can verify that you have done everything correctly.

It is likely that if your forked program is making any OpenGL calls, forcing the atfork() callback to be ignored will result in problems. I am immediately doing an exec() after my fork() (as I suspect most people are) and don't make any OpenGL calls, so whatever NVidia's child atfork() callback is doing is pointless in my program anyway.
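
To be clear about what I mean by doing an exec() right after the fork(), here is a rough sketch of that pattern. The function name is made up and this is not my actual code; it just runs a command through /bin/sh and returns its exit status. The child makes no OpenGL calls, or any other calls, between fork() and exec().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `command` via /bin/sh and returns its exit
 * status, or -1 if the fork or wait failed or the child died abnormally. */
int run_shell_command(const char *command)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the process image immediately. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* only reached if exec itself failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 3") returns 3.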

I should add that if some other library or application code in your program legitimately calls pthread_atfork(), using this workaround will likely break your program. However, I expect that pthread_atfork() is very rarely used and for the vast majority of programs, the workaround will not cause any problems.
