OK, I was wrong: what I said only holds for my gcc version. Here are two useful links.
First, a discussion of the subject:
http://compilers.iecc.com/comparch/index/2002-05
Look for the "MMX/3Dnow!/SSE/SSE2 compilers" thread. As you can see there, SSE2 instructions may take longer than x87 FPU instructions.
Second, an illustration of what you can expect:
http://www.andrew.cmu.edu/~komarek/w...leskyPerf.html
I realize the x86 architecture is not well suited to number crunching (look at the great results of the humble 300 MHz G3 Apple iBook).
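If you want to check the SSE2-vs-FPU point on your own machine and gcc version, something like the following could serve as a starting point. It's only a sketch: I'm assuming a plain dot-product loop is representative of your workload, and the names `dot` and `time_dot` are my own, not from either link.

```c
/* Tiny floating-point kernel for comparing x87 and SSE2 code generation.
   Build the same file twice with different -mfpmath settings and compare. */
#include <stddef.h>
#include <time.h>

/* Dot product of two double arrays: a scalar FP loop whose generated
   instructions change with -mfpmath=387 vs -mfpmath=sse. */
double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Call dot() `reps` times and return the elapsed CPU time in seconds.
   The summed results go to *sink so the work is not optimized away. */
double time_dot(const double *a, const double *b, size_t n,
                int reps, double *sink)
{
    clock_t start = clock();
    double acc = 0.0;
    for (int r = 0; r < reps; ++r) {
        /* GCC/Clang-specific compiler barrier: the "memory" clobber stops
           the compiler from hoisting the (invariant) dot() call out of
           the loop and computing it only once. */
        __asm__ volatile("" ::: "memory");
        acc += dot(a, b, n);
    }
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call `time_dot` from a small `main` with arrays large enough to run for a second or so, then compile once with `gcc -O2 -mfpmath=387` and once with `gcc -O2 -msse2 -mfpmath=sse` and compare the times. Don't be surprised if the sums differ slightly between the two builds: the x87 unit keeps intermediates in 80-bit precision, while SSE2 rounds every operation to 64-bit double.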
Bye
E.