Originally posted by Uttar
Your understanding of shuffle instruction is not correct.
The traditional (read: NV20, NV25, NV30, ...) architectures work on Vec4s. The R300 works on Vec3s and on Scalars at the same time.
This results in improved performance if you can run both of them in parallel.
All they're doing is saying "Do this before that instead of after that" - nothing more. This will result in no IQ difference, and the shader will still work in all cases.
IMO, this is a perfectly valid optimization, and ATI is really only removing it to make sure people who don't know what they're talking about don't spread BS about them cheating.
OMFG I love you Uttar!....that was the most brilliant, perfect explanation of the ATi situation I've ever seen written.
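For anyone who wants to see the reordering point in concrete terms, here is a rough toy sketch in Python. It is not how the R300 or ATI's driver actually schedules anything: the `Instr` type, the "pair two adjacent instructions if one is Vec3, the other is scalar, and neither depends on the other" rule, and the four-instruction shader are all made up for illustration. It only shows the general idea from Uttar's post: moving independent instructions around changes how many can run in parallel, but not what the shader computes.

```python
from dataclasses import dataclass
from operator import add, mul
from typing import Callable, Tuple


def vmul(a, b):
    """Component-wise Vec3 multiply."""
    return tuple(p * q for p, q in zip(a, b))


def vadd(a, b):
    """Component-wise Vec3 add."""
    return tuple(p + q for p, q in zip(a, b))


@dataclass(frozen=True)
class Instr:
    # Hypothetical instruction record, invented for this sketch.
    unit: str               # "vec3" or "scalar"
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    op: Callable            # function applied to the source values


def independent(a, b):
    """True if neither instruction reads or overwrites the other's result."""
    return a.dest != b.dest and a.dest not in b.srcs and b.dest not in a.srcs


def count_cycles(program):
    """Toy in-order issue: two adjacent instructions share a cycle only if
    they use different units and are independent of each other."""
    cycles, i = 0, 0
    while i < len(program):
        paired = (
            i + 1 < len(program)
            and program[i].unit != program[i + 1].unit
            and independent(program[i], program[i + 1])
        )
        i += 2 if paired else 1
        cycles += 1
    return cycles


def run(program, inputs):
    """Execute the instructions sequentially and return all register values."""
    regs = dict(inputs)
    for ins in program:
        regs[ins.dest] = ins.op(*(regs[s] for s in ins.srcs))
    return regs


v1 = Instr("vec3", "a", ("x", "y"), vmul)    # a = x * y
v2 = Instr("vec3", "b", ("a", "z"), vadd)    # b = a + z  (needs a)
s1 = Instr("scalar", "s", ("w", "w"), mul)   # s = w * w
s2 = Instr("scalar", "t", ("s", "w"), add)   # t = s + w  (needs s)

original = [v1, v2, s1, s2]   # all Vec3 work first, then all scalar work
shuffled = [v1, s1, v2, s2]   # same instructions, interleaved

inputs = {"x": (1.0, 2.0, 3.0), "y": (2.0, 2.0, 2.0),
          "z": (0.5, 0.5, 0.5), "w": 3.0}

print(count_cycles(original), count_cycles(shuffled))
print(run(original, inputs) == run(shuffled, inputs))
```

In this toy model the shuffled order needs fewer cycles because each Vec3 instruction sits next to an independent scalar one, while the final register values are identical in both orders. That is the "do this before that instead of after that" argument: the dependencies are respected, so the output cannot change.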