
Anybody into CUDA?



walterman
09-02-08, 06:49 PM
I just installed the SDK and Toolkit, and I'm reading the documentation.

The first step is always the hardest: building and compiling the first project :)

I'm going through the samples in SDK\Projects. In particular, there is one called 'cppIntegration'.

I can compile and run all the projects, but I still don't know how to create a project from scratch in Visual Studio.

I think everybody would like to write standard C++ code and call a CUDA function to do the hard work.

If somebody has gotten through these first steps, please share your knowledge with us :)

hemmy
11-28-08, 12:31 PM
I got to about where you are and just kinda gave up.

Indeed, it would be easy if we could write it in C/C++ :(

ViN86
11-29-08, 07:51 PM
I'm pretty sure you need to add a reference to some DLL or something, but I can't figure it out either.

The software seems very powerful, but damn, is it hard to get going :lol:

walterman
11-30-08, 02:53 PM
I'm waiting for the CUDA 2.1 release. I heard that it will support VS 2008, which is what I'm using now. I need that version because I use SSE4 assembler ops in my code. I did some tests with VS 2005, but I was not able to compare my C++ code with the CUDA code due to the missing SSE4 ops.

nekrosoft13
12-01-08, 09:37 AM
Planning to implement it in your BloodRayne 2 patch?

jcrox
12-01-08, 10:04 AM
The software seems very powerful, but damn, is it hard to get going :lol:

My Java instructor, who does some sort of research with GPGPUs, said the exact same thing about CUDA: it can be a real pain to get up and running, but once you figure it out, it's apparently some pretty cool stuff.

If anyone does get it up and running, please do tell how you did it... I really want to try it out soon.

walterman
12-01-08, 01:11 PM
Planning to implement it in your BloodRayne 2 patch?

In the benchmark part only.

In the game, I see two problems:

1) I don't know if Direct3D 8 will work with CUDA.
2) The game is limited by the gfx card's power (and I don't want to lose frame rate).


If anyone does get it up and running, please do tell how you did it... I really want to try it out soon.

Once you install VS 2005 and the CUDA SDK, check the 'nvidia cuda sdk\projects\template' folder. It's the simplest project, and it works in VS 2005.

My problem is that I already have a fully working solution that I want to add CUDA support to, and it's a VS 2008 project.

Bad Sector
01-13-09, 07:38 PM
I played with CUDA and wrote a bit about it (http://www.badsectoracula.com/blog/playingwithcuda.html). It's a very interesting technology, and once I find some more time I plan to play more with it :-)

ViN86
01-13-09, 07:51 PM
I am going to be working with PDE solvers here in grad school, so I may give CUDA a try when I start working on the project.

Tuork
01-15-09, 06:07 PM
Does anyone know any simple guides to get started with CUDA?

CUDA is part of my "want to learn" list for 2009.
:)

Bman212121
01-19-09, 12:39 AM
i m not familiar with CUDA but i want to know some how about it still i think you might have to change your plat forum may be it will work.wish you good luck


Until then you might want to work on your grammar. :lol:

walterman
02-19-09, 02:54 PM
Finally, I have the whole thing working: VS 2008 + CUDA 2.1.

Setting up a new project by hand is not easy, and I lost a lot of time trying to get my first program to compile.

Here is a quick guide based on my own experience:


1) Download and install CUDA 2.1 and the CUDA SDK from the nVidia site.

2) Open VS 2008 and start a new C++ Win32 Console project.

3) Right-click the name of the project in the Solution Explorer and select 'Custom Build Rules'. A new window will open. Click 'Import' and select the file "C:\CUDA SDK\common\CUDA.rules". Then tick the checkbox for the CUDA rules.

4) Rename the main .cpp file to .cu.

5) Right-click the .cu file and select 'Properties'. Under General, set Tool to 'CUDA Build Rule 2.1.0'.

6) Open the project properties and change:

   C/C++
      General
         Additional Include Directories: $(CUDA_INC_PATH);$(NVSDKCUDA_ROOT)\common\inc
         Debug Information Format: Program Database (/Zi)
      Code Generation
         Runtime Library: Multi-threaded Debug (/MTd)
   Linker
      General
         Enable Incremental Linking: No (/INCREMENTAL:NO)
         Additional Library Directories: $(CUDA_LIB_PATH);$(NVSDKCUDA_ROOT)\common\lib
      Input
         Additional Dependencies: cudart.lib cutil32D.lib
      Optimization
         Enable COMDAT Folding: Do Not Remove Redundant COMDATs (/OPT:NOICF)

Then edit your .cu file and put your kernel there.

I tried my Perlin code on the GPU, and at the moment it sucks. It's only slightly faster than my multi-threaded SSE3 code, even though this gfx card has 240 'cores' versus my quad-core CPU. Obviously, I need to learn some CUDA tricks to speed up my code, because it shouldn't be this slow.

Also, I still don't know how to debug CUDA code. It's a pain in the ass right now when something doesn't work.

Tuork
02-19-09, 03:10 PM
Thanks for the input, mate.
I'll give this a go whenever I get some free time... which is no time soon :(

walterman
02-20-09, 10:34 AM
After a long night of fighting with CUDA, I managed to finish the first alpha version of my benchmarking tool.

You can get the tool here:
http://www.speedyshare.com/566199380.html

Unrar it, run 'run_tests.bat', and you should see something like this:

http://i42.tinypic.com/2qaqwlk.png

The best time of my GTX285 in the 256x256 test was 0.024s, and the best time of my 3.6 GHz Xeon 3350 quad was 0.65s (using the old tool that comes with my BR2 patch). So my GPU is running around 27x faster than my CPU in this test. Finally, good results.
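For anyone reproducing "best time" comparisons like this, here is a minimal C++ sketch (modern C++11 `<chrono>`, which VS 2008 does not have) of a best-of-N timer plus the speedup ratio. `bestOfSeconds` and `speedup` are hypothetical helper names, not part of the BR2 tool.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Run `work` `runs` times and return the fastest wall-clock time in seconds.
// Taking the minimum filters out one-off noise (OS scheduling, first-call
// warm-up such as CUDA context creation) better than an average does.
template <typename F>
double bestOfSeconds(F work, int runs) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(stop - start).count();
        best = std::min(best, secs);
    }
    return best;
}

// Speedup of the GPU over the CPU, e.g. 0.65 s / 0.024 s is roughly 27x.
double speedup(double cpuSeconds, double gpuSeconds) {
    assert(cpuSeconds > 0.0 && gpuSeconds > 0.0);
    return cpuSeconds / gpuSeconds;
}
```

When timing a GPU kernel this way, keep in mind that a kernel launch returns before the GPU has finished, so the timed callable has to wait for completion (cudaThreadSynchronize() in the CUDA 2.x runtime, cudaDeviceSynchronize() in later releases); otherwise the measurement only covers the launch itself.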

There is still a lot of room for optimization, so this is going really well.

There are some problems with FP 'precision'. The GPU's ALUs do not fully follow the IEEE FP standard, and there are some errors in the 1024x1024 test.
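Since GPU single-precision results can differ slightly from a CPU reference, a verification step usually compares with a tolerance instead of exact equality. A minimal sketch, assuming the results are stored as float arrays; `firstMismatch` and the tolerance values are hypothetical and would need tuning for the actual Perlin output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare a GPU result against a CPU reference element by element.
// An element passes if it is within `absTol` OR within `relTol` of the
// reference magnitude; the absolute term handles values near zero,
// where a relative error is meaningless.
// Returns the index of the first mismatch, or -1 if everything matches.
long firstMismatch(const std::vector<float>& reference,
                   const std::vector<float>& gpu,
                   float absTol, float relTol) {
    assert(reference.size() == gpu.size());
    for (std::size_t i = 0; i < reference.size(); ++i) {
        float diff = std::fabs(reference[i] - gpu[i]);
        float limit = std::fmax(absTol, relTol * std::fabs(reference[i]));
        if (!(diff <= limit)) {  // written this way so NaN also counts as a mismatch
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Reporting the first failing index (rather than a plain pass/fail) makes it easier to see whether errors cluster in one region, such as the edges of a larger test image.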

I would like to see your results.

walterman
02-21-09, 02:25 PM
Well, I was doing something wrong, and the test results are not valid.

The kernel was not running because I was selecting invalid <<<numBlocks, numThreads>>> configurations. The only real results for the first run are [128,512] to [32768,8].
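One common way a kernel silently fails to run is a launch configuration outside the hardware limits. A minimal C++ sketch, assuming the compute capability 1.x limits (G80/GT200: at most 512 threads per block and 65535 blocks in a 1D grid) and one thread per pixel, that lists the power-of-two configurations a host program could safely try. `validLaunchConfigs` and the constants are hypothetical names; on real hardware the limits should come from cudaGetDeviceProperties().

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Assumed limits for compute capability 1.x GPUs (G80, GT200).
// Newer GPUs allow more; read them from cudaGetDeviceProperties() in practice.
const int kMaxThreadsPerBlock = 512;
const int kMaxBlocksPerGrid = 65535;

// List every (numBlocks, numThreads) pair with a power-of-two thread count
// that covers exactly `totalThreads` work items without exceeding the limits.
// A launch outside these limits fails with an error that is only visible
// if the host checks for it (e.g. with cudaGetLastError()).
std::vector<std::pair<int, int>> validLaunchConfigs(int totalThreads) {
    assert(totalThreads > 0);
    std::vector<std::pair<int, int>> configs;
    for (int threads = kMaxThreadsPerBlock; threads >= 1; threads /= 2) {
        if (totalThreads % threads != 0) continue;  // must divide evenly
        int blocks = totalThreads / threads;
        if (blocks > kMaxBlocksPerGrid) break;      // fewer threads => even more blocks
        configs.push_back(std::make_pair(blocks, threads));
    }
    return configs;
}
```

For a 256x256 image (65536 pixels) this yields [128, 512] through [32768, 2]; [65536, 1] is rejected because it needs more than 65535 blocks.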

So the GPU is only 1.25x faster than the CPU, which is a really bad result.

I have a new version that runs 2.5x faster than the CPU.

This has been a big disappointment.

Note: I have deleted the download link.

walterman
02-21-09, 03:30 PM
Z:\code\Visual Studio Projects\BloodRayne 2\br2cudaPerlin\Debug>br2cudaperlin 200 256

BloodRayne 2 FSAA Patch - CUDA Perlin Benchmark Tool 0.11 Alpha
---------------------------------------------------------------

Running Benchmarks ...
----------------------
[128, 512] Total Time: 0.263607s
[256, 256] Total Time: 0.256560s
[512, 128] Total Time: 0.256693s
[1024, 64] Total Time: 0.255465s
[2048, 32] Total Time: 0.251548s
[4096, 16] Total Time: 0.267286s
[8192, 8] Total Time: 0.306301s
[16384, 4] Total Time: 0.469901s
[32768, 2] Total Time: 0.908473s

Best Config [2048, 32]: 0.251548s

Running Verification Test at [2048, 32] ...
--------------------------------------------
Everything OK :)

Now it works, but the performance isn't as great as expected.

You can leech it here: http://www.megaupload.com/?d=Y9M8S5FB

walterman
02-26-09, 05:16 PM
I have a new version.

It uses two methods: texture fetching (TF) and shared memory (SM).

BloodRayne 2 FSAA Patch - CUDA Perlin Benchmark Tool 0.15 Alpha
---------------------------------------------------------------

Running Benchmarks ...
----------------------
TF [128, 512] Total Time: 0.192191s
SM [128, 512] Total Time: 0.107221s
TF [256, 256] Total Time: 0.191223s
SM [256, 256] Total Time: 0.103024s
TF [512, 128] Total Time: 0.190813s
SM [512, 128] Total Time: 0.126796s
TF [1024, 64] Total Time: 0.189704s
SM [1024, 64] Total Time: 0.189470s
TF [2048, 32] Total Time: 0.189634s
SM [2048, 32] Total Time: 0.390741s
TF [4096, 16] Total Time: 0.198291s
SM [4096, 16] Total Time: 0.942939s
TF [8192, 8] Total Time: 0.255677s
SM [8192, 8] Total Time: 2.238887s
TF [16384, 4] Total Time: 0.435167s
SM [16384, 4] Total Time: 6.500768s
TF [32768, 2] Total Time: 0.856746s
SM [32768, 2] Total Time: 22.913676s

Best Config (Shared Memory) [256, 256]: 0.103024s

Running Verification Test at (Shared Memory) [256, 256] ...
------------------------------------------------------------
Everything OK :)

BloodRayne 2 FSAA Patch - CUDA Perlin Benchmark Tool 0.15 Alpha
---------------------------------------------------------------

Running Benchmarks ...
----------------------
TF [512, 512] Total Time: 0.697142s
SM [512, 512] Total Time: 0.370065s
TF [1024, 256] Total Time: 0.692771s
SM [1024, 256] Total Time: 0.374493s
TF [2048, 128] Total Time: 0.690623s
SM [2048, 128] Total Time: 0.464357s
TF [4096, 64] Total Time: 0.688960s
SM [4096, 64] Total Time: 0.712639s
TF [8192, 32] Total Time: 0.690871s
SM [8192, 32] Total Time: 1.504626s
TF [16384, 16] Total Time: 0.702908s
SM [16384, 16] Total Time: 3.673776s
TF [32768, 8] Total Time: 0.974903s
SM [32768, 8] Total Time: 8.863379s

Best Config (Shared Memory) [512, 512]: 0.370065s

Running Verification Test at (Shared Memory) [512, 512] ...
------------------------------------------------------------
Everything OK :)

BloodRayne 2 FSAA Patch - CUDA Perlin Benchmark Tool 0.15 Alpha
---------------------------------------------------------------

Running Benchmarks ...
----------------------
TF [2048, 512] Total Time: 2.361321s
SM [2048, 512] Total Time: 1.282897s
TF [4096, 256] Total Time: 2.369100s
SM [4096, 256] Total Time: 1.336673s
TF [8192, 128] Total Time: 2.361914s
SM [8192, 128] Total Time: 1.677807s
TF [16384, 64] Total Time: 2.360700s
SM [16384, 64] Total Time: 2.639611s
TF [32768, 32] Total Time: 2.360139s
SM [32768, 32] Total Time: 5.692210s

Best Config (Shared Memory) [2048, 512]: 1.282897s

Running Verification Test at (Shared Memory) [2048, 512] ...
------------------------------------------------------------
Everything OK :)

It's 6.5x faster than the CPU. It will be hard to make it any faster.

You can leech it here: http://www.speedyshare.com/455357158.html

I have problems running it on my old G80. If somebody can try it, I would like to know whether it works with other cards.

Dreamweavernoob
02-28-09, 06:08 PM
How do you learn this stuff? I have always wanted to learn about software development but never knew where to start :(

walterman
02-28-09, 06:27 PM
How do you learn this stuff? I have always wanted to learn about software development but never knew where to start :(

If you want to do it professionally, go to university or some other kind of higher education.

Personally, I started with BASIC on my Sinclair ZX Spectrum 25 years ago, and I haven't stopped learning since.

Time & patience make the master.

Dreamweavernoob
03-01-09, 03:12 PM
If you want to do it professionally, go to university or some other kind of higher education.

Personally, I started with BASIC on my Sinclair ZX Spectrum 25 years ago, and I haven't stopped learning since.

Time & patience make the master.

Much respect for you, dude.

Phyxion
03-02-09, 02:14 AM
How do you learn this stuff? I have always wanted to learn about software development but never knew where to start :(
Start with some VERY basic examples. Good languages to start with are C# and maybe Java, though I'd suggest C#. After working through some more difficult C#, you can take a look at C++. There are some pretty good tutorials available on how to get started.

lightman
03-02-09, 05:59 AM
Start with some VERY basic examples. Good languages to start with are C# and maybe Java, though I'd suggest C#. After working through some more difficult C#, you can take a look at C++. There are some pretty good tutorials available on how to get started.

I would advise against C#. C++, with all its shortcomings, is a widely used, cross-platform, standardized language. C# is not so widespread outside the Microsoft/.NET environment (Mono tries to replicate most of the functionality and the environment, but only up to .NET 2.0 as far as I know, although its implementation of the C# standard is complete).

To start, either go with Java and/or C++ if you want to begin with an OO language, or go with (clean) C.

C has the advantage of not letting any wrongdoing go unnoticed. You have to be very careful with memory allocation/deallocation (think of it as explicit, manual garbage collection); otherwise you risk ending up with segfaults really quickly. You have to thoroughly understand pointers and how things are stored in memory, which is always a good thing. And when you're ready, you can delve into techniques that are harder to master, e.g. pointer arithmetic.
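To make the points about manual memory management and pointer arithmetic concrete, here is a small sketch written in C style (malloc/free rather than new/delete) so it stays close to plain C while compiling as C++. `sumOfSquares` is a hypothetical example function.

```cpp
#include <cassert>
#include <cstdlib>

// Allocate a buffer by hand, fill it using pointer arithmetic, and free it.
// Every malloc needs exactly one matching free; forgetting it leaks memory,
// and touching the buffer after free (or past its end) is undefined
// behaviour that often shows up as a segfault.
long sumOfSquares(int n) {
    assert(n > 0);
    int* values = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (values == NULL) return -1;  // allocation can fail
    for (int* p = values; p < values + n; ++p) {
        *p = static_cast<int>(p - values);  // pointer difference = element index
    }
    long total = 0;
    for (int i = 0; i < n; ++i) {
        long v = *(values + i);  // *(values + i) is the same as values[i]
        total += v * v;
    }
    std::free(values);  // release exactly once
    return total;
}
```

For example, sumOfSquares(4) computes 0 + 1 + 4 + 9 = 14.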

If you want to start with C, look for the abundant online tutorials and the K&R book.

For C++, the Stroustrup book can be a little hard in places, but it's still one of the best books out there.

walterman
03-04-09, 04:31 PM
A new "beta" version:
http://www.speedyshare.com/366621969.html

It will benchmark your CPU vs your GPU.

It supports multi-GPU rigs too.

You can specify the number of GPUs to use on the command line. According to nVidia's documentation, you need to disable SLI to use multiple GPUs in CUDA.

Examples:

br2perlin 1 5 -> This will use just 1 GPU
br2perlin 2 5 -> This will use 2 GPUs
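One straightforward way to use N GPUs for an image effect is to give each GPU a contiguous band of rows. A minimal sketch of that partitioning, assuming an even split is good enough (identical cards); `RowRange` and `splitRows` are hypothetical names, not the library's API. In the CUDA 2.x runtime each GPU is normally driven from its own host thread that calls cudaSetDevice() with its device index before doing any work.

```cpp
#include <cassert>
#include <vector>

// Half-open range of image rows [begin, end) assigned to one device.
struct RowRange {
    int begin;
    int end;
};

// Split `rows` rows across `devices` GPUs as evenly as possible.
// The first (rows % devices) devices each get one extra row.
std::vector<RowRange> splitRows(int rows, int devices) {
    assert(rows >= 0 && devices > 0);
    std::vector<RowRange> ranges;
    int base = rows / devices;
    int extra = rows % devices;
    int start = 0;
    for (int d = 0; d < devices; ++d) {
        int count = base + (d < extra ? 1 : 0);
        RowRange r = {start, start + count};
        ranges.push_back(r);
        start += count;
    }
    return ranges;
}
```

With mismatched cards (say an 8800GTX next to a GTX 285), an even split leaves the faster card idle at the end, so a split weighted by measured speed is usually better.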

The library also supports using the CPU and GPU at the same time. When I designed it, I thought CPU+GPU would be faster, but due to the asynchronous nature of CUDA, it ends up slower than the CPU or GPU alone.
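Even in theory the gain from mixing is small when one side is much faster. A sketch of an idealized throughput-weighted split (not necessarily how the BR2 library divides the work): each processor gets a share proportional to 1 / (its time for the whole job), so both would finish together. `Split` and `balanceCpuGpu` are hypothetical names, and the model deliberately ignores transfer, launch and synchronization overhead.

```cpp
#include <cassert>

// Result of an idealized CPU+GPU work split.
struct Split {
    double gpuFraction;   // share of the work sent to the GPU
    double idealSeconds;  // predicted time if both run fully in parallel
};

// Give each processor a share proportional to its throughput
// (jobs per second = 1 / time for the whole job). Overheads are ignored,
// which is exactly what can make the mixed mode slower in practice.
Split balanceCpuGpu(double cpuSeconds, double gpuSeconds) {
    assert(cpuSeconds > 0.0 && gpuSeconds > 0.0);
    double cpuRate = 1.0 / cpuSeconds;
    double gpuRate = 1.0 / gpuSeconds;
    Split s;
    s.gpuFraction = gpuRate / (cpuRate + gpuRate);
    s.idealSeconds = 1.0 / (cpuRate + gpuRate);
    return s;
}
```

With the Xeon 3350 / GTX 285 times posted below (0.660127 s and 0.106165 s), this puts about 86% of the work on the GPU and predicts at best roughly 0.091 s in total, only about 14% less than the GPU alone, so a little per-call overhead is enough to erase the gain.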

My BR2 patch now uses the new CUDA code, so the Perlin effects run on the GPU.

Unfortunately, if you only have one gfx card, this is not a good idea, because the frame rate drops due to the resources used by the CUDA calculations. But if you have two gfx cards, you won't lose any fps, and the Perlin code will run faster on the GPU (allowing bigger and more complex effects).

Basically, I've written this to use my old 8800GTX to run the Perlin effects and my GTX285 to render the shiny graphics at 1920x1200 SSAA 2x :)

The results of my Xeon 3350 @ 3.6 GHz + eVGA GTX 285 SSC:
CPU SSE3 4 Threads
Total Time: 0.660127, Min: -0.699944, Max: 0.798931, Range: 1.498875
GPU
Total Time: 0.106165

On my system, the GPU is about 6.2x faster than the CPU.

Tuork
03-04-09, 06:15 PM
Hey mate, you think you could whip up a simple tutorial on how to work with CUDA?

I know several people here, including myself, would greatly appreciate it.
:)

walterman
03-05-09, 10:16 AM
A quick update (0.46):
http://www.speedyshare.com/633582280.html

It adds 'kernel launch error detection'.

It seems some cards refuse to launch the kernel (e.g. the 8600 GTS).

On my old 8800GTX, it takes 0.22s.