SpiffMistroII
11-16-06, 07:36 AM
Someone got owned badly, didn't he?
http://www.rage3d.com/board/showthread.php?t=33873447&page=2
Someone got owned??? What is this now? The World Of WarCraft forum??
Sorry boy no one got "OWNED".
Originally Posted by DemoCoder
First of all, the X1950XTX does not perform a branch in 500 microseconds (0.5 ms). Are you insane? This chip runs at 650 MHz, and 500 microseconds would mean 650 x 10^6 * 0.5 * 10^-3 = 325 * 10^3 = 325,000 cycles of latency! What you've done is look at GPUBench scores and fail to understand what was being plotted.
What is being plotted is the average time for a single branch across all pixels. This includes the latency of waiting for free shader units so the code can actually be run. It is not a measurement of a single branch on a single shader unit. That would complete much more quickly. This benchmark is a fairly accurate measurement of performance on branch-intensive shaders.
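For readers who want to check the cycle count quoted above, here is a minimal sketch of the arithmetic. The function name `cycles_in` is made up for illustration; the 650 MHz clock and the 500 microsecond figure are taken from the quoted post. Integer math is used so the result is exact.

```python
def cycles_in(clock_hz: int, duration_us: int) -> int:
    """Number of clock cycles that elapse in duration_us microseconds."""
    # cycles = (cycles per second) * (seconds); divide by 1e6 to convert us to s
    return clock_hz * duration_us // 1_000_000


# Figures from the quoted post: 650 MHz core clock, 500 microseconds.
CLOCK_HZ = 650_000_000
DURATION_US = 500

print(f"{cycles_in(CLOCK_HZ, DURATION_US):,} cycles")  # prints "325,000 cycles"
```

This only confirms the conversion from time to cycles. It says nothing about which reading of the GPUBench plot is correct, since an average taken across all pixels also includes time spent waiting for free shader units.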
Quote:Secondly, the comparisons you link to are not "dynamic branching" tests; they are tests of z-cull functionality. Even the GeForce3 has had this, well before DirectX9. It is no more a test of shader branching functionality than early stencil reject, or alpha-kill.
This is exactly the kind of technique formerly used to do branching. As it requires more overhead, it should not perform better than a true PS 3.0 branch. However, early z branching on ATI completes 20 times more quickly than either early z branching or PS 3.0 branching on the G80.
Quote:Third, traversing a BSP with dynamic branching is not so much testing DB performance, but gather operations as well. There are a gazillion variables to consider in any BSP traversal technique, so unless you are prepared to post sample code that reproduces the problem, or at least explain in pseudo-code detail the algorithm, data layout, etc. you are using, the claims are kinda meaningless.
Needless to say, we do not have a "gazillion" variables in our efficient implementation. The representation of our tree in the texture is very cache coherent for our purposes. I am not going to release information on our algorithm. But suffice it to say, this is definitely a test of dynamic branching.
Quote:Fourth, anyone doing pointer-chasing algorithms would do well to sign up to the CUDA program, as CUDA claims to expose a linear on-chip local storage model with a C programming model that allows gather/scatter "pointer chasing" style code to run a lot faster, as well as offering inter-thread communication and synchronization.
Inter-thread communication and synchronization are not needed and not desired. We have sent a registration in to nVidia to get further information on CUDA, but performance increases are not really expected.
Quote:Maybe if Mike Houston claimed that G80 DB performance was 20x worse than an R580, people might take it more seriously, but you've made a post where you misinterpreted GPUBench figures, and then claimed you have some private benchmark test, without providing any details.
I misinterpreted nothing. Perhaps you misinterpreted what I was saying about the GPUBench figures. I have already given far too much information out regarding our engine. I cannot really give out much more. The GPUBench figures stand on their own.
-Raystonn