OK, I was wrong; what I said only holds for my gcc version. Here are two useful links:

Here is a discussion about the subject:

Look for the MMX/3DNow!/SSE/SSE2 compilers thread. As you can see there, SSE2 instructions can take longer than the equivalent x87 FPU instructions.

And here is an overview of what you can expect:

I realize the x86 architecture is not well suited to number crunching (look at the great results of the humble 300 MHz G3 Apple iBook).

I store all my stuff in /dev/null, so I don't need zip, gzip or rar.

