Need Gigaflops from GCC on P4
Anybody know how to configure GCC?
I've got Slackware 8, which for stability reasons has been built entirely for the i386 instruction set. I think they also set up GCC to generate only i386 instructions.
Using this, I get about 0.7 billion double-precision floating-point operations per second (0.7 Gflops) computing the inner product of two long vectors on a 1.8 GHz Pentium 4. That's not bad, but I think I should be able to get more like 8.0 Gflops (roughly 10x) if I can use the SIMD/SSE2 capabilities of the Pentium 4. GCC is supposed to support this. I just want to do a little subatomic physics at home in my spare time.
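For reference, here is a minimal sketch of the kind of measurement described above, assuming a plain scalar loop and counting one multiply plus one add (2 flops) per element. The names `dot_scalar` and `measure_gflops` are hypothetical, and `clock()` measures CPU time, so treat the result as a rough figure. Variables are declared at the top of each block so it also builds with older GCC defaults (C89).

```c
#include <stdlib.h>
#include <time.h>

/* Plain scalar inner product: one multiply and one add per element,
   i.e. 2 floating-point operations per element. */
double dot_scalar(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical helper: time `reps` inner products of length n and
   return the rate in Gflops, counting 2*n flops per call.
   Returns -1.0 if allocation fails, 0.0 if the run was too short
   for clock() to resolve. */
double measure_gflops(size_t n, int reps)
{
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    volatile double sink = 0.0;  /* keeps the work from being optimized away */
    clock_t start, stop;
    double seconds;
    size_t i;
    int r;

    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.5;
    }

    start = clock();
    for (r = 0; r < reps; r++)
        sink += dot_scalar(x, y, n);
    stop = clock();

    free(x);
    free(y);

    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return 2.0 * (double)n * (double)reps / seconds / 1e9;
}
```

Calling something like `measure_gflops(1000000, 100)` from a small `main` and printing the result gives a Gflops figure that can be compared across different compiler flags.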
I think I may need the -b MACHINE option, which in turn (I think) requires libraries configured for the P4 instead of i386. I've tried the -mcpu=pentium4, -march=pentium4, and -msse2 compile options, with no effect. I haven't tried -mfpmath=sse.
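One possible explanation, offered as an assumption rather than a diagnosis: as far as I know, GCC releases of this era do not automatically turn a scalar loop like `sum += x[i] * y[i]` into packed SSE2 code, so -msse2 or -march=pentium4 alone may not change much, and -mfpmath=sse mainly moves scalar math from the x87 stack to scalar SSE2 (one double per instruction). A sketch of requesting packed operations explicitly, assuming a GCC recent enough to ship `<emmintrin.h>` and compiled with something like `gcc -O2 -msse2` (or `-march=pentium4`); `dot_sse2` is a hypothetical name:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; needs -msse2 or -march=pentium4 */
#include <stddef.h>

/* Inner product using packed SSE2: each _mm_mul_pd/_mm_add_pd pair
   processes two doubles at once. Unaligned loads are used, so the
   arrays do not have to be 16-byte aligned. */
double dot_sse2(const double *x, const double *y, size_t n)
{
    __m128d acc = _mm_setzero_pd();  /* two running partial sums */
    double partial[2];
    double sum;
    size_t i;

    for (i = 0; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(vx, vy));
    }

    /* Combine the two lanes of the accumulator. */
    _mm_storeu_pd(partial, acc);
    sum = partial[0] + partial[1];

    /* Handle the leftover element when n is odd. */
    for (; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

Because the two lanes accumulate separately, the summation order differs from a straight scalar loop, so results on real data can differ in the last few bits. Whether this gets anywhere near the hoped-for speedup on long vectors is not established here; it only shows how to ask GCC for packed SSE2 instructions.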
Do I need to recompile some libraries for pentium4? Do I need to set up GCC as if it's doing a cross-compile (for a different platform)? Am I even asking the right questions?