nV News Forums

 
 


eudoxos 02-25-12 08:09 AM

CUDA ok, OpenCL crashes in libcuda.so
 
Running Linux, I have nVidia driver 290.10, CUDA and the nVidia GPU computing SDK installed (latest downloadable versions). The card is a GTX560Ti; it is no longer used as the graphics card in the computer, an ATI card is. The nVidia card is detected just fine, and I can run CUDA programs on it.

The OpenCL runtime reports the platform / device correctly as NVIDIA CUDA / GeForce GTX 560 Ti, but whenever I run some code on it, it reports an error compiling the program (which compiles just fine with both the Intel and AMD SDKs), i.e. clBuildProgram returns CL_BUILD_PROGRAM_FAILURE and clGetProgramBuildInfo(...,CL_PROGRAM_BUILD_LOG...) crashes:

Code:

#0 __strlen_sse42 () at ../sysdeps/x86_64/multiarch/strlen-sse4.S:32
#1 0x00007ffff22c0e67 in ?? () from /usr/lib/nvidia-current/libcuda.so
#2 0x0000000000402e85 in main (argc=2, argv=0x7fffffffdfc8) at test-chain.cc:107 // this is where clGetProgramBuildInfo is called

Any hint?

(I posted this first on the nvidia forum without any response. Comparing Intel/nVidia/ATI on OpenCL support, both Intel and AMD were very responsive when I reported bugs in their compilers. nVidia seems to be happy that it sold the card, pushing CUDA everywhere and not giving a damn about OpenCL. Am I wrong? The card was not the cheapest one, and if I ever finish the code I am working on, I will definitely recommend that customers go for ATI.)

eudoxos 03-15-12 07:49 AM

Re: CUDA ok, OpenCL crashes in libcuda.so
 
It turns out this was due to LD_LIBRARY_PATH not including the directory with libnvidia-compiler.so. The stupid people at nvidia don't check for errors from dlopen; for the money they get, they could at least hire competent programmers.

AaronP 03-15-12 03:08 PM

Re: CUDA ok, OpenCL crashes in libcuda.so
 
Hi eudoxos,

Thanks for reporting this. The only place I can find in the code that loads libnvidia-compiler.so does properly check for dlopen failures. Can you please post the test case you're using to reproduce this problem and an nvidia-bug-report.log.gz file?

eudoxos 03-19-12 08:52 AM

Re: CUDA ok, OpenCL crashes in libcuda.so
 
2 Attachment(s)
Hi AaronP,

Good to hear that you check for errors when opening libnvidia-compiler.so, but apparently you don't react accordingly anyway.

I added two attachments: one is the output of nvidia-bug-report, the other is the source file plus a (trivial) Makefile. The source basically only opens the platform and device given by the command-line options and compiles a trivial piece of source code (you need to have cl.hpp somewhere around). It takes two args, platform number and device number, but you will see that when you run it without args.

My configuration is such that the nvidia directory with libcuda.so, libnvidia-compiler.so and others is NOT in LD_LIBRARY_PATH or in /etc/ld.so.conf. The reason is that it also contains nvidia's libGL.so, whereas I need the libGL.so from fglrx (the X server runs on the ATI card). For that reason, /etc/OpenCL/vendors/nvidiaocl64.icd contains the absolute path to libcuda.so, i.e. /usr/lib/nvidia-current/libcuda.so; therefore the nvidia platform is discovered by the OpenCL runtime, unlike libnvidia-compiler.so, which is not found by dlopen.
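
A quick way to confirm this kind of setup from the outside is to ask the dynamic loader directly whether it can resolve a library by its bare name. The following is only a minimal sketch, not anything taken from the driver: `probe_library` is a hypothetical helper name, and the assumption is that the driver looks the compiler library up by the plain name `libnvidia-compiler.so`, as the description above suggests.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper (not part of any NVIDIA or OpenCL API).
// Returns an empty string if dlopen can resolve `name` through the normal
// search path; otherwise returns the loader's error text.
std::string probe_library(const char* name) {
    assert(name != nullptr);
    dlerror();  // clear any stale error state before the call
    void* handle = dlopen(name, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* msg = dlerror();
        return msg ? std::string(msg) : std::string("unknown dlopen failure");
    }
    dlclose(handle);
    return std::string();
}
```

Calling `probe_library("libnvidia-compiler.so")` from a small `main` in the same environment as the failing program (same LD_LIBRARY_PATH) and printing the result should show whether the library is reachable. On older glibc versions the program may need to be linked with `-ldl`.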

I think you should be able to reproduce the bug trivially on a machine with nvidia installed as normal by moving libnvidia-compiler.so somewhere out of reach of dlopen (I can see with strace that the lib is being searched for). Then, after cleaning ~/.nv/ComputeCache and running the attached program, you should get the crash.

Cheers, Vaclav

---

Additional backtrace for this program, when compiled with -g and run with "gdb --args ./main 1 0".

Code:

** OpenCL ready: platform "NVIDIA CUDA", device "GeForce GTX 560 Ti".
Error building source. Build log follows.

Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:39
39        ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S: No such file or directory.
(gdb) bt
#0  __strlen_sse2_pminub () at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:39
#1  0x00007ffff22353c7 in ?? () from /usr/lib/nvidia-current/libcuda.so
#2  0x0000000000405521 in cl::detail::GetInfoFunctor1<int (*)(_cl_program*, _cl_device_id*, unsigned int, unsigned long, void*, unsigned long*), _cl_program*, _cl_device_id*>::operator() (this=0x7fffffffdbd0, param=4483, size=0,
    value=0x0, size_ret=0x7fffffffdba8) at ./cl.hpp:1009
#3  0x0000000000405053 in cl::detail::GetInfoHelper<cl::detail::GetInfoFunctor1<int (*)(_cl_program*, _cl_device_id*, unsigned int, unsigned long, void*, unsigned long*), _cl_program*, _cl_device_id*>, std::string>::get (f=...,
    name=4483, param=0x7fffffffde60) at ./cl.hpp:745
#4  0x0000000000404868 in cl::detail::getInfo<int (*)(_cl_program*, _cl_device_id*, unsigned int, unsigned long, void*, unsigned long*), _cl_program*, _cl_device_id*, std::string> (f=0x4014b0 <clGetProgramBuildInfo@plt>,
    arg0=@0x7fffffffddb0: 0xf93170, arg1=@0x7fffffffdc78: 0x710430, name=4483, param=0x7fffffffde60) at ./cl.hpp:1027
#5  0x0000000000403f3c in cl::Program::getBuildInfo<std::string> (this=0x7fffffffddb0, device=..., name=4483,
    param=0x7fffffffde60) at ./cl.hpp:2916
#6  0x0000000000403499 in cl::Program::getBuildInfo<4483> (this=0x7fffffffddb0, device=..., err=0x0) at ./cl.hpp:2925
#7  0x0000000000401ef5 in main (argc=3, argv=0x7fffffffdf88) at main.cpp:54
(gdb)
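
For readers following the backtrace: in frame #2, param=4483 is CL_PROGRAM_BUILD_LOG (0x1183), and size=0, value=0x0 with a non-null size_ret is the size-query half of the usual two-call pattern for reading a string from an OpenCL info function. The sketch below illustrates that pattern only. `read_info_string` and its callable parameter are hypothetical names introduced here so the example compiles without OpenCL headers; it is not the cl.hpp implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration. `query` must be callable as
//   int query(std::size_t size, void* value, std::size_t* size_ret)
// and return 0 on success, like an OpenCL clGet*Info call with its leading
// arguments already bound, for example:
//   auto q = [&](std::size_t n, void* v, std::size_t* r) {
//       return clGetProgramBuildInfo(program, device,
//                                    CL_PROGRAM_BUILD_LOG, n, v, r);
//   };
template <typename Query>
bool read_info_string(Query query, std::string& out) {
    std::size_t needed = 0;
    // Call 1: size=0 and value=NULL ask only for the required byte count.
    if (query(0, nullptr, &needed) != 0) return false;
    if (needed == 0) {
        out.clear();
        return true;
    }
    std::vector<char> buf(needed);
    // Call 2: fetch the bytes into a buffer of exactly that size.
    if (query(buf.size(), buf.data(), nullptr) != 0) return false;
    // The text is expected to be NUL-terminated; keep what precedes the NUL.
    out.assign(buf.begin(), std::find(buf.begin(), buf.end(), '\0'));
    return true;
}
```

Because the segfault in the backtrace happens inside libcuda.so during the first call, application code like this would not avoid it; the sketch only shows which of the two calls is failing.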


AaronP 03-20-12 10:42 AM

Re: CUDA ok, OpenCL crashes in libcuda.so
 
Thanks for the detailed report. I identified the problem and filed internal bug number 957326.


All times are GMT -5.