Solid Snake
06-18-05, 12:22 AM
"I also scratched my head over what kind of wild marketing optimism led to these numbers. Things I could think of:
If texture interpolation now can be done in FP32:
Bilinear interpolation per one component takes 4 multiplies, 3 adds and 2 subs: 9 ops, x4 channels = 36 ops.
If the result blend operation (when writing the results into an FP32 buffer) can be done in FP32, it would add another 12 ops.
Add the original 8 ops of the other shader unit (4xMADD).
We have 56 ops/cycle/pipe.
Assuming 32 PS pipes, that would be 1792 FP ops/cycle.
At 550 MHz, we have 985 GFlops.
Well, we do have a (theoretical) teraflop, or we are almost there. I am not too sure about the number of PS pipes (24 or 32?). Anyway, I did not count the VS pipes (8, likely?). Also, if anisotropic filtering can be done in FP32, that would double the texture unit FP ops, and we would be at 1.8 Tflops.
Now, before we claim a record, let's see how it compares to a Cray supercomputer, model X1E, in a one-cabinet liquid-cooled configuration:
2.3 TFlops, memory bandwidth 3200Gb/s
The teraflops could be comparable, but it seems we have a memory bandwidth problem: the Cray has 100x more bandwidth to do any useful work with all these FP units.
Roman"
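
For anyone who wants to rerun Roman's arithmetic with a different pipe count, here is a rough sketch in Python. The per-pipe op counts are his guesses from the quote above, not confirmed hardware specs, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope peak FLOPS estimate using the op counts guessed in the quote.
# None of these numbers are confirmed hardware specs.

BILINEAR_OPS = (4 + 3 + 2) * 4   # 4 mul + 3 add + 2 sub per component, x4 channels = 36
BLEND_OPS = 12                   # assumed FP32 blend on write to an FP32 buffer
SHADER_OPS = 8                   # other shader unit: 4 x MADD, 2 ops each
OPS_PER_PIPE = BILINEAR_OPS + BLEND_OPS + SHADER_OPS   # 56 ops/cycle/pipe


def peak_gflops(pipes, clock_mhz, ops_per_pipe=OPS_PER_PIPE):
    """Theoretical peak in GFlops: pipes * ops per cycle per pipe * clock."""
    return pipes * ops_per_pipe * clock_mhz / 1000.0


if __name__ == "__main__":
    # The quote is unsure whether there are 24 or 32 pixel shader pipes.
    for pipes in (24, 32):
        print(f"{pipes} PS pipes at 550 MHz: {peak_gflops(pipes, 550):.1f} GFlops")
```

With 32 pipes this gives the 985 GFlops figure from the quote; with 24 pipes it comes out closer to 739 GFlops, so the pipe count matters a lot for the "teraflop" claim.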