Need Gigaflops from GCC on P4
Does anybody know how to configure GCC?
I've got Slackware 8, which for stability reasons has been set up entirely using the '386 instruction set. I think they also have the GCC compiler set up to generate only 386 instructions.
Using this, I can get 0.7 billion double-precision floating-point operations per second computing the inner product of two long vectors (on a 1.8 GHz Pentium 4). That's not bad, but I think I should be able to get more like 8.0 Gflops (roughly 10X) if I can invoke the SIMD/SSE2 capabilities of my Pentium 4. The GCC compiler is supposed to support this. I just want to do a little subatomic physics at home in my spare time.
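A minimal sketch of the kind of kernel being benchmarked, assuming the usual flop count for an inner product of length n: n multiplies plus n adds, i.e. 2n flops. The function names are hypothetical. The SSE2 variant uses the SSE2 intrinsics from <emmintrin.h> (`_mm_loadu_pd`, `_mm_mul_pd`, `_mm_add_pd`), which operate on two doubles per instruction. It takes that path only when the compiler defines `__SSE2__` (for example with -msse2 or -march=pentium4) and otherwise falls back to the scalar loop. Flags alone make the instructions available, but the compiler may still emit scalar code for the plain loop, which is why the sketch spells out the packed operations explicitly.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain scalar inner product: n multiplies + n adds = 2n flops. */
double dot_scalar(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* SSE2 inner product: two doubles per multiply/add in 128-bit registers.
   Falls back to the scalar loop if SSE2 code generation is not enabled. */
double dot_sse2(const double *x, const double *y, size_t n)
{
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();          /* two partial sums */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(x + i);     /* unaligned loads are safe */
        __m128d b = _mm_loadu_pd(y + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];        /* combine the two lanes */
    for (; i < n; i++)                       /* leftover element if n is odd */
        sum += x[i] * y[i];
    return sum;
#else
    return dot_scalar(x, y, n);
#endif
}

/* Rate in Gflops for one inner product of length n that took `seconds`. */
double gflops(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)n / seconds / 1e9;
}
```

Timing one long call with any wall-clock timer and passing the elapsed seconds to `gflops()` gives a figure comparable to the 0.7 Gflops quoted above. Note that the SSE2 version adds the terms in a different order, so for general data its result can differ from the scalar one in the last few bits.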
I think I may need the -b MACHINE option, which in turn requires (I think?) a library configured for P4 instead of i386. I've tried the -mcpu=pentium4, -march=pentium4 and -msse2 compile options, with no effect. I haven't tried -mfpmath=sse.
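One way to check whether flags like these actually changed the generated code is to compile a small diagnostic with exactly the same flags. This sketch assumes GCC's predefined macros `__SSE2__` (SSE2 instructions enabled) and `__SSE2_MATH__` (scalar floating-point math done in SSE2 registers, which is what -mfpmath=sse selects), the C99 `FLT_EVAL_METHOD` from <float.h> (0 means expressions are evaluated in their own type, 2 means in long double as on the x87), and, on x86 with newer GCC releases, the `__builtin_cpu_supports("sse2")` builtin for the runtime side. The function name `report_fp_config` is hypothetical. On 32-bit x86, -msse2 by itself only makes the instructions available; scalar double arithmetic stays on the x87 unless -mfpmath=sse is also given, which may explain seeing "no effect".

```c
#include <float.h>
#include <stdio.h>

/* Prints what this translation unit was compiled to do and returns 1 if
   scalar double arithmetic uses SSE2 registers, 0 otherwise. */
int report_fp_config(FILE *out)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Runtime question: can this CPU execute SSE2 at all? */
    fprintf(out, "CPU supports SSE2: %s\n",
            __builtin_cpu_supports("sse2") ? "yes" : "no");
#endif
    /* Compile-time question: was the compiler allowed to emit SSE2? */
#ifdef __SSE2__
    fputs("__SSE2__ defined: compiler may emit SSE2 instructions\n", out);
#else
    fputs("__SSE2__ not defined: compiler will not emit SSE2 instructions\n", out);
#endif
#ifdef FLT_EVAL_METHOD
    fprintf(out, "FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
#endif
    /* Did it actually choose SSE2 for ordinary double arithmetic? */
#ifdef __SSE2_MATH__
    fputs("__SSE2_MATH__ defined: double arithmetic uses SSE2 registers\n", out);
    return 1;
#else
    fputs("__SSE2_MATH__ not defined: double arithmetic does not use SSE2"
          " (x87 FPU on 32-bit x86)\n", out);
    return 0;
#endif
}
```

Calling `report_fp_config(stdout)` from a build with the default flags and again from one built with `gcc -O2 -march=pentium4 -mfpmath=sse` shows whether the flags took effect. The "CPU supports SSE2" line and the macro lines answer different questions: a Pentium 4 can run SSE2 even when the distribution's compiler defaults never emit it.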
Do I need to recompile some libraries for pentium4? Do I need to set up GCC as if it's doing a cross-compile (for a different platform)? Am I even asking the right questions?
Wow, what a level!
I think gnu.org is a good place to start (I think SSE and 3DNow! are supported only in assembly, but maybe I am wrong).
OK, I was wrong; what I said only applies to my GCC version. Here are two useful links:
Here is a discussion about the subject:
Look for the "MMX/3DNow!/SSE/SSE2 compilers" thread. As you can see there, SSE2 instructions may take longer than x87 FPU instructions.
And here is a depiction of what you can expect:
I realize the x86 architecture is not well suited to number crunching (look at the great results of the humble 300 MHz G3 Apple iBook).
Copyright ©1998 - 2014, nV News.