Hellbinder
09-19-03, 01:15 PM
Here it is, and this is brand new info (at least some of it):
http://www.beyond3d.com/forum/viewtopic.php?t=8005
Check out the diagram.
That pretty much lays out how we got to where we are today with game performance. You can see that ATI simply has more execution units per pipeline. Nvidia also has its texture lookup shared with one of the FP ALUs in each pipeline, while ATI has it in separate hardware. I also found it interesting that Nvidia seems to have quite a few more native math capabilities (which some of us already knew, but it's good to point out for everyone else).
This was a shocker...
Actually, because of constant propagation optimization, it should execute in 1 cycle on an R3x0 (eventually). Something like:
add_sat oC0, (c0+c1-c2), v1
We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well. However, it will get better. That's also why one should be careful when trying to determine our internal architecture based on shader code.
From Sireric of ATI. Apparently they are only just beginning to optimize their own shader compiler. Everything you see currently is based on the simple raw calculation performance of the hardware. :eek:
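For anyone wondering what "constant propagation" buys you here, below is a rough Python sketch of the idea. It is not ATI's compiler and it does not model the R3x0. The three-instruction "naive" shader is my assumption about what the original code looked like (the quote only shows the folded result), and the function names are made up for illustration. The point is only that c0+c1-c2 involves nothing but constants, so it can be computed once up front, leaving a single saturating add per pixel.

```python
def saturate(x):
    """Clamp to [0, 1], like the _sat modifier on a shader instruction."""
    return max(0.0, min(1.0, x))


def naive_pixel(c0, c1, c2, v1):
    """Assumed unoptimized form: three ALU ops for every pixel.

    add     r0,  c0, c1
    sub     r0,  r0, c2
    add_sat oC0, r0, v1
    """
    r0 = [a + b for a, b in zip(c0, c1)]
    r0 = [a - b for a, b in zip(r0, c2)]
    return [saturate(a + b) for a, b in zip(r0, v1)]


def fold_constants(c0, c1, c2):
    """Done once when the constants are known, not once per pixel."""
    return [a + b - c for a, b, c in zip(c0, c1, c2)]


def folded_pixel(k, v1):
    """Optimized form: one op per pixel, add_sat oC0, (c0+c1-c2), v1."""
    return [saturate(a + b) for a, b in zip(k, v1)]


# Example values chosen to be exact in binary floating point.
c0 = [0.5, 0.25, 0.0, 1.0]
c1 = [0.25, 0.25, 0.5, 0.0]
c2 = [0.125, 0.0, 0.25, 0.5]
v1 = [0.5, 0.75, -1.0, 0.25]   # interpolated vertex color for one pixel

k = fold_constants(c0, c1, c2)
result_naive = naive_pixel(c0, c1, c2, v1)
result_folded = folded_pixel(k, v1)
```

Both versions give the same color, but the folded one does a third of the per-pixel work. That would also explain the warning in the quote: if the compiler rearranges ops like this, counting shader instructions tells you little about what the hardware actually executes.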