Uttar
11-11-02, 03:10 AM
Hello everyone,
I wouldn't want to criticize ATI by saying this. I think that for a card released so long before the NV30, they did a very good job. Too bad their margins are so small, but since they're ready to sell R300s at $399, hey, let them do so! :)
Now, what am I going to talk about? Well, I think that simply asking for more AA samples doesn't make any kind of sense anymore. It made sense a few years ago. But today, it just won't cut it.
Using more samples is, IMO, brute force. You aren't fixing the REASON the jaggies are created. You're just searching for a workaround.
I can already imagine your reaction... "Yeah, great. But that would have been done a long time ago if it was possible".
Actually, no. Let me begin by talking about how Triangle Setup works. I'll simply quote an excellent article about it, available at http://www.extremetech.com/article2/0,3973,471298,00.asp
First off, the triangle setup operation computes the slope (or steepness) of a triangle edge using vertex information at each of edge's two endpoints. You may recall the equation of a straight line being y=mx+b, where y is the y-axis value, x is the x-axis value, b is the value of y when x=0 (the "y intercept"), and m is the slope (or the ratio of the rate of change between x and y values).
The slope is often called delta x/delta y, dx/dy, Dx/Dy, or literally change in x/change in y). Using the slope information, an algorithm called a digital differential analyzer (DDA) can calculate x,y values to see which pixels each triangle side (line segment) touches. The process operates horizontal scan line by horizontal scan line. The DDA figures out the x-value of the pixels touched by a given triangle side in each successive scan-line. (Watt, p. 143)
What it really does is determine how much the x value of the pixel touched by a given triangle side changes per scan line, and increments it by that value on each subsequent scan-line.
To actually calculate the y value of the triangle edge for a given integer value of x, as we move incrementally along the x axis one pixel at a time, we use the slope value. For every single pixel increment along the x-axis, we must increment the y-axis value of the triangle edge by Dy, which is equal to the slope m when x is incremented by one pixel.
Note that each scan line is the next incremental y coordinate in screen space. The y values of non-vertex points on the triangle edge are approximated by the DDA algorithm, and are non-integer floating-point values that typically fall between two integer y values (scan lines). The algorithm finds the nearest y value (scan line number) to assign to y.
This can be seen in the stair-step "jaggie" effect along edges that 3D systems try to reduce using higher resolution display or anti-aliasing techniques that we'll describe soon. Ultimately, the result of the DDA operation is that we now have x,y values for all scan line crossing points of each line segment in a triangle.
Now, the bold is my own personal addition.
The jaggies, thus, are created because the algorithm finds the nearest y value. Also, please note that there is only ONE division there, and that it happens during initialization. Triangle Setup is thus quite fast, really.
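To make the quoted description concrete, here is a minimal sketch of that kind of edge walk. It is only an illustration: the function name dda_edge and the restriction to integer endpoints with x0 < x1 are my assumptions, not something taken from the article or from any real hardware.

```python
def dda_edge(x0, y0, x1, y1):
    """Walk an edge one pixel at a time along x (hypothetical helper).

    Assumes integer endpoints, x0 < x1 and y >= 0, to keep the sketch short.
    """
    slope = (y1 - y0) / (x1 - x0)  # the single division, done once at setup
    pixels = []
    y = float(y0)
    for x in range(x0, x1 + 1):
        # Snap the exact y to the nearest scan line: this rounding is
        # what produces the stair-step pattern.
        pixels.append((x, int(y + 0.5)))
        y += slope  # per-pixel work is just an addition
    return pixels


print(dda_edge(0, 0, 8, 3))
```

For the edge from (0, 0) to (8, 3), the rows come out as 0, 0, 1, 1, 2, 2, 2, 3, 3: the true edge climbs smoothly, but the snapped rows climb in steps.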
Now, there IS another solution to jaggies. During that line calculation algorithm, for each pixel, you calculate how far off the approximation is, using ONE division (I wrote a program to check whether this would work, and the system I'm describing seems to work great). That gives you how much of the pixel is covered by the line.
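One possible reading of this, sketched under assumptions: instead of throwing away the fractional part of y when snapping to a scan line, keep it and treat it as a coverage split between the two rows the edge passes between. The name edge_coverage is made up, and this sketch takes the coverage from the fractional part directly rather than from the per-pixel division mentioned above, since the exact form of that division isn't spelled out. The idea resembles what Xiaolin Wu's line algorithm does.

```python
import math


def edge_coverage(x0, y0, x1, y1):
    """Return (x, row, cov_row, cov_next_row) per x step (hypothetical helper).

    Assumes integer endpoints and x0 < x1. cov_row + cov_next_row == 1.
    """
    slope = (y1 - y0) / (x1 - x0)  # still a single setup division
    result = []
    y = float(y0)
    for x in range(x0, x1 + 1):
        row = math.floor(y)
        frac = y - row  # how far the exact edge sits past this row
        # The closer the exact y is to a row, the more that row is covered.
        result.append((x, row, 1.0 - frac, frac))
        y += slope
    return result


for entry in edge_coverage(0, 0, 4, 1):
    print(entry)
```

At x = 1 on the edge from (0, 0) to (4, 1) the exact y is 0.25, so row 0 gets coverage 0.75 and row 1 gets 0.25, instead of row 0 getting everything.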
Now, it all becomes harder. You'll want to blend the color of the main pixel of that part of the line onto the other pixel it covers. How transparent that blend is depends on how much of the other pixel is covered.
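As a rough illustration of "transparency depends on coverage", the sketch below uses the coverage value directly as the alpha of an ordinary linear blend. The function name blend and the RGB-tuple representation are assumptions made for the example.

```python
def blend(src, dst, coverage):
    """Linear blend of two RGB tuples (hypothetical helper).

    coverage = 1.0 returns src unchanged, coverage = 0.0 returns dst.
    """
    return tuple(coverage * s + (1.0 - coverage) * d for s, d in zip(src, dst))


edge_color = (1.0, 0.0, 0.0)   # color of the triangle's edge pixel
neighbor = (0.0, 0.0, 1.0)     # color already stored in the nearby pixel

# The edge covers a quarter of the neighboring pixel.
print(blend(edge_color, neighbor, 0.25))
```

With a coverage of 0.25, the neighbor ends up a quarter edge color and three quarters its old color, which is what softens the step.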
The problem, here, is that this would require PERFECT front-to-back ordering.
However, there are workarounds. But those are VERY difficult to implement in hardware. That's why I'm really not sure the NV30 will use such a good algorithm - it's a LOT harder than multisampling, IMO.
You've got to use a buffer which stores the overdraw count of each pixel, then another one which describes how each pixel has to be blended onto a nearby pixel IF the overdraw count is the same. That second buffer also has to hold the influence (which was determined during triangle setup, remember) on that other pixel, to determine the alpha value of the blend. Of course, using another Z Buffer instead of that whole overdraw thing would be more efficient, but a lot more costly.
But then you've also got to fix the problem that alpha blending IS order dependent... So you may have to store final colors in another buffer for a while, or you could store the colors directly in the second buffer, the one which also describes how a pixel has to be blended onto another one. That could be interesting.
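To show why the order matters, here is a small sketch that blends the same two half-transparent colors over a black background in both orders. The helper name over and the colors are made up for the example; the blend itself is the usual alpha * src + (1 - alpha) * dst.

```python
def over(src, dst, alpha):
    """Blend src onto dst with the given alpha (hypothetical helper)."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


background = (0.0, 0.0, 0.0)
red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)

# Same two fragments, same alpha, drawn in opposite orders.
red_then_green = over(green, over(red, background, 0.5), 0.5)
green_then_red = over(red, over(green, background, 0.5), 0.5)

print(red_then_green)  # (0.25, 0.5, 0.0)
print(green_then_red)  # (0.5, 0.25, 0.0)
```

Whichever fragment is drawn last dominates the result, so the final color depends on draw order unless something (sorting, or extra buffers like the ones described above) takes care of it.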
Now, of course, I wasn't very clear in describing how to fix those problems. But it really isn't easy to explain :)
I'm not sure of the exact performance cost of this system. It might cost a fair bit, but I'd be very surprised if it cost more than 2X MultiSampling. Possibly, if the scene isn't rendered front-to-back correctly, it might cost a little more. But it should never cost anywhere near as much as 4X MultiSampling. And the quality would be EXCELLENT.
Okay, so I've got no idea if anyone understood me. But anyway, I don't know if such a system will be in the NV30. It would be nice if it were, because having nearly perfect AA at 2X MultiSampling cost would certainly be nice, but I guess we'll see at Comdex :)
Uttar
EDIT: Looks like I forgot to point out that this would be very cheap using Z3, because Z3 supports order-independent transparency. But Z3 will certainly not be in the NV30. However, I think there are ways to implement this besides Z3.