Originally posted by Uttar
Your understanding of the shuffle instruction is not correct.
The traditional (read: NV20, NV25, NV30, ...) architectures work on Vec4s. The R300 works on Vec3s and on scalars at the same time.
This results in improved performance if you can run both of them in parallel.
All they're doing is saying "Do this before that instead of after that" - nothing more. This will result in no IQ difference, and the shader will still work in all cases.
IMO, this is a perfectly valid optimization, and ATI is really only removing it to make sure people who don't know what they're talking about don't spread BS about them cheating.
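To make the quoted point about reordering concrete, here is a rough Python sketch. It is a toy model and not real R300 behaviour: the pairing rule, the instruction format, and every name in it (count_cycles, run, original, reordered) are made up for illustration. It only assumes that one Vec3 op and one scalar op can share a cycle when they sit next to each other and neither depends on the other.

```python
# Toy model (assumption, not actual R300 hardware or driver behaviour).
# Each instruction is (unit, dest, srcs). Arithmetic is a stand-in: dest = sum(srcs).

def count_cycles(program):
    """Count cycles when adjacent, independent vec3/scalar ops may share one cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        unit, dest, srcs = program[i]
        if i + 1 < len(program):
            unit2, dest2, srcs2 = program[i + 1]
            independent = dest not in srcs2 and dest2 not in srcs and dest != dest2
            if unit != unit2 and independent:
                i += 2          # both ops issue together
                cycles += 1
                continue
        i += 1                  # op issues on its own
        cycles += 1
    return cycles

def run(program, regs):
    """Execute the program in order and return the final register contents."""
    regs = dict(regs)
    for _unit, dest, srcs in program:
        regs[dest] = sum(regs[s] for s in srcs)
    return regs

inputs = {"x": 1, "y": 2, "w": 3}

# Order as written: the two vec3 ops come first, then the two scalar ops.
original = [
    ("vec3", "a", ("x", "y")),
    ("vec3", "b", ("a", "x")),
    ("scalar", "s", ("w", "w")),
    ("scalar", "t", ("s", "w")),
]

# Same instructions, interleaved so each vec3 op sits next to a scalar op.
reordered = [
    ("vec3", "a", ("x", "y")),
    ("scalar", "s", ("w", "w")),
    ("vec3", "b", ("a", "x")),
    ("scalar", "t", ("s", "w")),
]

print(run(original, inputs) == run(reordered, inputs))  # True: same results
print(count_cycles(original), count_cycles(reordered))  # 3 2
```

In this toy model the reordered version produces identical register values in fewer cycles, which is the kind of "do this before that" change being described. Whether the actual driver replacement is limited to this is a separate question.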
I don't believe it works without external pre-calculation, but if what you said were true, I really think ATI is stupid to give up an 8% performance boost just because people THINK it's a cheat. ATI could simply prove it is not a cheat by implementing it properly in the next driver, without driver detection and code alteration.
To me, removing it does not make them look any more innocent than proving it to be a valid optimization would.
But thanks for your view. I just can't believe it. If it's valid, why remove it? I just don't get it, sorry.