Well, as I said, the number of execution units doesn't add up. I'm not sure the person who wrote those shaders tested adequately for parallelism. The possibility of executing two FP instructions per clock is still open if the two instructions have to be issued in parallel rather than serially, which would mean they need independent data (neither one can consume the other's result).
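To make the independent-data point concrete, here is a toy model I put together. It is not a description of the actual hardware: the in-order, dual-issue behaviour, the one-clock latency, and the register names are all assumptions for illustration. It counts clocks for a stream of FP instructions when a second instruction can only pair with the first if it doesn't read a result produced in the same clock.

```python
def cycles_needed(instrs, issue_width=2):
    """Count clocks for an in-order unit that issues up to issue_width
    instructions per clock (hypothetical model, read-after-write hazards only).

    instrs: list of (dest, src_a, src_b) register names in program order.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        written = set()  # registers produced earlier in this same clock
        issued = 0
        while i < len(instrs) and issued < issue_width:
            dest, src_a, src_b = instrs[i]
            # An instruction that reads a result from this clock must wait.
            if src_a in written or src_b in written:
                break
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Serial chain: each instruction consumes the previous result.
dependent = [("r0", "a", "b"), ("r1", "r0", "c"),
             ("r2", "r1", "d"), ("r3", "r2", "e")]

# Same instruction count, but every instruction has its own inputs.
independent = [("r0", "a", "b"), ("r1", "c", "d"),
               ("r2", "e", "f"), ("r3", "g", "h")]

print(cycles_needed(dependent))    # 4 clocks: nothing can pair up
print(cycles_needed(independent))  # 2 clocks: two FP ops per clock
```

So a shader written as one long dependent chain would only ever show one FP op per clock in this model, even if the hardware could do two.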
"Physics is like sex. Sure, it may give some practical results, but that's not why we do it." - Richard P. Feynman
