Don't ask for more AA samples; ask for a better algorithm


Uttar
11-11-02, 03:10 AM
Hello everyone,

I wouldn't want to criticize ATI by saying this. I think that for a card released so long before the NV30, they did a very good job. Too bad their margins are so small, but since they're ready to sell R300s at $399, hey, let them do so! :)

Now, what am I going to talk about? Well, I think that simply asking for more AA samples doesn't make sense anymore. It made sense a few years ago. But today, it just won't cut it.

Using more samples is, IMO, brute force. You aren't fixing the REASON the jaggies are created. You're just searching for a workaround.

I can already imagine your reaction... "Yeah, great. But that would have been done a long time ago if it was possible".

Actually, no. Let me begin by talking about how Triangle Setup works. I'll simply quote an excellent article about it, available at http://www.extremetech.com/article2/0,3973,471298,00.asp


First off, the triangle setup operation computes the slope (or steepness) of a triangle edge using vertex information at each of the edge's two endpoints. You may recall the equation of a straight line being y=mx+b, where y is the y-axis value, x is the x-axis value, b is the value of y when x=0 (the "y intercept"), and m is the slope (or the ratio of the rate of change between x and y values).

(The slope is often called delta x/delta y, dx/dy, Dx/Dy, or literally change in x/change in y.) Using the slope information, an algorithm called a digital differential analyzer (DDA) can calculate x,y values to see which pixels each triangle side (line segment) touches. The process operates horizontal scan line by horizontal scan line. The DDA figures out the x-value of the pixels touched by a given triangle side in each successive scan-line. (Watt, p. 143)

What it really does is determine how much the x value of the pixel touched by a given triangle side changes per scan line, and increments it by that value on each subsequent scan-line.

To actually calculate the y value of the triangle edge for a given integer value of x, as we move incrementally along the x axis one pixel at a time, we use the slope value. For every single pixel increment along the x-axis, we must increment the y-axis value of the triangle edge by Dy, which is equal to the slope m when x is incremented by one pixel.

Note that each scan line is the next incremental y coordinate in screen space. The y values of non-vertex points on the triangle edge are approximated by the DDA algorithm, and are non-integer floating-point values that typically fall between two integer y values (scan lines). The algorithm finds the nearest y value (scan line number) to assign to y.

This can be seen in the stair-step "jaggie" effect along edges that 3D systems try to reduce using higher resolution display or anti-aliasing techniques that we'll describe soon. Ultimately, the result of the DDA operation is that we now have x,y values for all scan line crossing points of each line segment in a triangle.


Now, the bold (the sentence about finding the nearest y value) is my own addition.

The jaggies, thus, are created because the algorithm finds the nearest y value. Also, please note that there is only ONE division there, and that it happens during initialization. Triangle Setup is thus quite fast, really.
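To make the quoted description concrete, here is a minimal sketch of that kind of DDA edge walk. It is only an illustration under my own assumptions (integer endpoints, x0 < x1, a hypothetical function name `dda_edge`), not the code any real chip or the article uses. It steps one pixel at a time along x, adds the slope to y, and snaps y to the nearest scan line, which is exactly where the stair steps come from.

```python
import math

def dda_edge(x0, y0, x1, y1):
    """Walk an edge one pixel at a time along x, snapping y to the nearest scan line.

    Assumes integer endpoints with x0 < x1 (illustrative restriction).
    """
    if x1 <= x0:
        raise ValueError("this sketch only handles x0 < x1")
    m = (y1 - y0) / (x1 - x0)  # the single division, done once during setup
    pixels = []
    y = float(y0)
    for x in range(x0, x1 + 1):
        # Rounding to the nearest scan line throws away the fractional part:
        # this is the source of the jaggies.
        pixels.append((x, math.floor(y + 0.5)))
        y += m  # incremental update, no further division
    return pixels

print(dda_edge(0, 0, 4, 2))
```

Note that the fractional part of y is computed anyway and then discarded by the rounding; the rest of this post is about using it instead.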

Now, there IS another solution to jaggies. During that line calculation, for each pixel, you calculate how far off the approximation is, using ONE division (I wrote a small program to see if this would work, and the system I'm describing seems to work great). That gives you how much of a pixel is covered by the line.

Now, it all becomes harder. You'll want to blend the color of the line's main pixel onto the other pixel that the line partly covers. How transparent that blend is depends on how much of that pixel is covered.
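Here is a rough sketch of the coverage idea for a single, mostly horizontal edge. The names `split_coverage` and `antialiased_edge` are made up for illustration, and the weighting (fractional part of the ideal y shared between the two bracketing pixels) is one plausible way to do it, not necessarily the exact per-pixel division described above.

```python
import math

def split_coverage(y):
    """Split an ideal edge position y between the two scan lines that bracket it.

    Returns ((row, weight), (row + 1, weight)); the two weights sum to 1.
    """
    row = math.floor(y)
    frac = y - row  # how far the true edge sits past the upper pixel
    return (row, 1.0 - frac), (row + 1, frac)

def antialiased_edge(x0, y0, x1, y1):
    """For each x step, return (x, (row, weight), (row + 1, weight)).

    Assumes integer x0 < x1 (illustrative restriction).
    """
    m = (y1 - y0) / (x1 - x0)  # setup division, as in the plain DDA
    result = []
    for i, x in enumerate(range(x0, x1 + 1)):
        y = y0 + i * m
        result.append((x, *split_coverage(y)))
    return result

for entry in antialiased_edge(0, 0, 4, 1):
    print(entry)
```

Each weight would then be used as the alpha value when blending the edge color into that pixel, which is where the ordering problems discussed next come in.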

The problem, here, is that this would require PERFECT front-to-back ordering.
However, there are workarounds. But those are VERY difficult to implement in hardware. That's why I'm really not sure the NV30 will use such a good algorithm - it's a LOT harder than multisampling IMO.

You've got to use a buffer which holds the overdraw count of each pixel, then another one which describes how each pixel has to be blended onto a nearby pixel IF the overdraw count is the same. That second buffer also has to hold the influence on that other pixel (which was determined during triangle setup, remember) to determine the alpha value of the blending. Of course, using another Z Buffer instead of that whole overdraw thing would be more efficient, but a lot more costly.

But then you've also got to fix the problem that alpha blending IS order dependent... So you may have to store final colors in another buffer for a while, or you could store the colors directly in the second buffer, which also describes how a pixel has to be blended onto another one. That could be interesting.
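A tiny illustration of why the order matters, using the standard "over" blend on a single color channel. The fragment values are arbitrary numbers picked for the example.

```python
def blend_over(src, alpha, dst):
    """Standard 'over' blend of one color channel: src on top of dst."""
    return src * alpha + dst * (1.0 - alpha)

background = 0.0
frag_a = (1.0, 0.5)  # (color, coverage used as alpha), arbitrary example values
frag_b = (0.2, 0.5)

# Same two fragments, same background, different submission order.
a_then_b = blend_over(frag_b[0], frag_b[1], blend_over(frag_a[0], frag_a[1], background))
b_then_a = blend_over(frag_a[0], frag_a[1], blend_over(frag_b[0], frag_b[1], background))

print(a_then_b, b_then_a)
```

The two results differ, so an edge pixel blended by coverage ends up with a different color depending on which triangle happens to be drawn first. That is why some extra buffer, or order-independent transparency, is needed.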


Now, of course, I wasn't very clear in describing on how to fix those problems. But it really isn't easy to explain :)
I'm not sure of the exact performance cost of this system. It might cost a fair bit, but I'd be very surprised if it cost more than 2X MultiSampling. Possibly, if the scene isn't rendered front-to-back correctly, it might cost a little more. But it should never cost anywhere near as much as 4X MultiSampling. And the quality would be EXCELLENT.


Okay, so I've got no idea if anyone understood me. But anyway, I don't know if such a system will be in the NV30. It would be nice if it was, because having nearly perfect AA at 2X Multisampling cost is certainly nice, but I guess we'll see that at Comdex :)


Uttar

EDIT: Looks like I forgot to point out that this would be very cheap using Z3, because Z3 supports order-independent transparency. But Z3 will certainly not be in the NV30. However, I think there are ways to implement this besides Z3.

egdusp
11-11-02, 03:41 AM
As far as I understand you, you are describing an edge AA, similar to the Matrox Parhelia's.
We all know that the Parhelia has some problems, maybe resulting from that "front to back" requirement. Matrox is planning to fix this with their next card, so I guess NV should be able to do it right on their first try.

egdusp

Uttar
11-11-02, 03:58 AM
EDIT: I was mistaken about how FAA works. Corrections have been made. If something about FAA isn't correct, please tell me. I used the faa_16x paper from Matrox for information about it, but I could have misunderstood something.

Actually, no. Yes, it's also Edge AA, but it's not done in the same manner as FAA.

Matrox's system is an excellent idea. Here is a basic summary of how it works:
1. Determine triangle edges
2. Render the scene
3. For those edges, use 16 samples to smooth them.
4. Add those antialiased pixels.

My system, in EVERY case, never uses more than 2 samples. And it destroys jaggies quite well, too! :)

The performance of my algorithm, since information is written for EVERY possible jaggy, is HEAVILY dependent on overdraw. If the scene isn't well sorted front-to-back like it should be, performance isn't as great.
I don't think that's how FAA works, because it simply asks for more samples for that pixel. My system, instead, calculates what the color of that second pixel would be. So it has to write some more info.

The two big differences between this algorithm and Matrox's one are:
1. Matrox's system doesn't care about how much a pixel is covered
2. My algorithm's quality setting is a percentage: how much of a pixel has to be covered by a line for it to be drawn. So, the lower that value, the higher the memory bandwidth & blending costs.

Matrox's algorithm still uses samples. Mine doesn't.

Was I clear enough?

Uttar

StealthHawk
11-11-02, 07:14 AM
is this a hypothetical algorithm you are suggesting? or something that you have actually managed to implement via software?

Uttar
11-11-02, 08:10 AM
Actually, I'm still working a little on this whole idea of implementing it in software. I didn't begin programming it yet, because I've got to plan everything first.

I think I'm going to use DrawPrimitiveUP with points, with no Z Buffer or anything. That way it's all done in software, but in the end you still see it on screen.

The first versions will obviously use back-to-front ordering, while the next ones would try to make it work in front-to-back systems

In one way, I certainly hope nVidia has such a wonderful algorithm. In another, that would mean my program would be useless if it isn't ready before Comdex :)


Uttar

pastor
11-11-02, 10:22 AM
Originally posted by Uttar
Actually, I'm still working a little on this whole idea of implementing it in software. I didn't begin programming it yet, because I've got to plan everything first.

I think I'm going to use DrawPrimitiveUP with points, with no Z Buffer or anything. That way it's all done in software, but in the end you still see it on screen.

The first versions will obviously use back-to-front ordering, while the next ones would try to make it work in front-to-back systems

In one way, I certainly hope nVidia has such a wonderful algorithm. In another, that would mean my program would be useless if it isn't ready before Comdex :)


Uttar

Come on Uttar! It's no longer a mystery that you are an nVidia employee (prophet?) :D

Uttar
11-11-02, 10:42 AM
For the last time...
I do NOT work for nVidia and I am NOT an nVidia-affiliated developer or anything.


Uttar

tazdevl
11-11-02, 11:21 AM
Interesting thoughts Uttar... one thing to note, you can pick up an OEM 9700 Pro for $311 and retail for $320. That's a big enough difference for folks to start quoting street prices rather than MSRP.

sbp
11-11-02, 11:28 AM
Y'all keep teasing Uttar about being an Nvidia employee, the NV30 will be even more delayed. ;)

tieros
11-11-02, 12:06 PM
Is that your interpretation of nVidia's recently granted patent found here (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/search-bool.html&r=2&f=G&l=50&co1=AND&d=ft00&s1=nvidia&OS=nvidia&RS=nvidia) , or is it a different algorithm?

When I was reading your post, it reminded me of something, but I wasn't sure if it was the Z3 document posted last week, or the patent.

Uttar
11-11-02, 12:23 PM
LOL, sbp. Now *that's* an odd way to think about it! :)

Actually, that isn't my interpretation of anything. It's just an idea I'd like to try.

I've never read that patent, tieros. I'll try to read it tomorrow.


Uttar

Typedef Enum
11-11-02, 03:49 PM
Having used the Matrox Parhelia for the last 6-7 weeks, their implementation, though not 100% perfect, is really a pretty darn good way to tackle the problem.

The best example, IMHO, is the following...Take a game like NOLF. When you start a brand new game, they put the character in front of a very large window. Without any AA, the jaggies are horribly bad. When you apply nVidia AA, it obviously helps out, but leaves a lot to be desired.

On the other hand, the FAA implementation really does a heck of a job on the window. For real small objects, it's not quite as clear...But for really large objects, the difference is more obvious.

The same goes for Nascar 2002. If you take a look @ the inside of the car, the job it does on specific objects is really amazing.

Based on everything I have heard (which I can't really expand upon), it seems that FAA will be completely ironed out in the not too distant future.

Uttar
11-14-02, 11:26 AM
While the algorithm kinda seems to work, I've got a few problems with gamma correction. I incorrectly assumed that two nearby pixels of a (128,128,128) color would look the same as one (255,255,255) for our dumb eye.
I'd guess that's EXACTLY what the R300's "Gamma Correct FSAA" is all about...
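To put rough numbers on that, here is a small sketch assuming a plain power-law display gamma of 2.2 (the real display curve may differ, so treat the exact figures as approximate). It compares the light emitted by two (128,128,128) pixels with one (255,255,255) pixel, and computes which 8-bit value actually corresponds to half intensity.

```python
GAMMA = 2.2  # assumed display gamma; real monitors and sRGB differ slightly

def to_linear(value8):
    """Convert an 8-bit framebuffer value to linear light intensity (0..1)."""
    return (value8 / 255.0) ** GAMMA

def to_encoded(linear):
    """Convert linear light intensity (0..1) back to an 8-bit framebuffer value."""
    return round(255.0 * linear ** (1.0 / GAMMA))

two_half_grey = 2 * to_linear(128)  # light from two (128,128,128) pixels
one_white = to_linear(255)          # light from one (255,255,255) pixel
half_intensity_value = to_encoded(0.5)

print(two_half_grey, one_white, half_intensity_value)
```

Under that assumption the two grey pixels together emit well under half the light of the white one, and a coverage of 50% would have to be written as roughly 186 rather than 128 to look right. A fixed multiplier on the influence can only approximate this, since the correction is a curve and not a constant factor.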

I'll try to get a better result... Maybe by multiplying influence by a 1.2 factor or something. I'll have to see what works best...

However, from my understanding, that ALSO happens with supersampling. I'll have to try it on my GF2 to be sure, I'll do that tomorrow too.


Uttar

EDIT: typo corrected: replaced "gamme" by "gamma"

AngelGraves13
11-14-02, 06:10 PM
wow.....now I feel like I don't know anything about computers......lol. Instead of learning math in school I was too busy listenin' to Social Distortion and greasing up my hair into a pomp and dressing all 50's......oh well, at least I had some chicks chasing after me.

Uttar
11-16-02, 04:54 AM
Bah... I'm giving up on making this work.

Making it work with lines is indeed VERY easy, but making it work with triangles is a LOT harder.

So, if nVidia uses this system, I'd be impressed. Once you fix about fifty billion problems, you should get a very high quality result, with a small performance hit.


Uttar

Mod
11-16-02, 05:41 AM
Don't give up so easily. Good things generally take months or years to be made. Work on this algorithm for, uh, let's say 5 or 6 more months, and then you can judge its feasibility.

You could study more algorithms to get some ideas on what to do. Check whether anyone has already made something useful or related to what you're doing.

Don't be shy, email any specialist about anything you need. The worst thing that could happen is getting a bad answer, but most of them would give a useful one.

Be humble.

Uttar
11-16-02, 10:01 AM
Hmm, did a little research... Sounds like someone already thought of this whole algorithm. In 1992 :)

A similar approach was published in Dr. Dobb's Journal in June 1992. It was called "Wu Antialiasing", and it's what I was thinking about, except that the article shows a few ways to make it much faster than what I initially supposed.

For the full article, here's the link:
http://www.whisqu.se/per/docs/graphics75.htm

Thus, if I wanted to continue this idea, I should try to make it work with filled triangles using blending.

I'd REALLY like to find the source code this article is referring to. It's easier to make a wheel rounder than to actually invent the wheel :)


Uttar

EDIT: Found it. And to think I had the exact same article, with source code, in Zen of Graphics Programming by Michael Abrash, and never noticed it! LOL, now this is funny :) However, the article provides no perfect solution for filled polygons. Looks like that's what I'll have to figure out if I get the time...

Uttar
11-16-02, 10:37 AM
There it is! http://www.whisqu.se/per/docs/graphics78.htm


I trust, however, that you can see how easy it would be to improve image quality by antialiasing with the DDA approach. For example, we could simply average the four surrounding pixels as we did for simple, unweighted antialiasing in this column last year. Or, we could take a Wu antialiasing approach (see my June column) and average the two bracketing pixels along each axis according to proximity. If we had cycles to waste (which, given that this is real-time animation on a PC, we don't), we could improve image quality by putting the source pixels through a low-pass filter sized in X and Y according to the ratio of the source and destination dimensions (that is, how much the destination is scaled up or down from the source).


More than 10 years ago, Michael Abrash, currently employed by Microsoft (he worked on the XBox), proposed multiple solutions for improving the quality of textured polygons.

"Averaging the four surrounding pixels" - That's what supersampling & multisampling is all about.

"Wu antialiasing" - Pretty much what I described in the original thread

Solution 3, however, is VERY strange. It makes me think of anisotropy, but it doesn't quite seem to be it...
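For the first option, read the way I read it above (as a supersampling-style resolve), a minimal sketch could look like this. The function name `downsample_2x2` and the list-of-rows greyscale image format are just assumptions for the example; the quoted column is about texture mapping, so this is my interpretation and not Abrash's code.

```python
def downsample_2x2(img):
    """Average each 2x2 block of a greyscale image given as a list of equal-length rows.

    The image is assumed to have been rendered at twice the target resolution.
    """
    height, width = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]

# A 4x4 render of a diagonal edge, resolved down to 2x2.
hi_res = [
    [255, 255, 255, 255],
    [0, 255, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 255],
]
print(downsample_2x2(hi_res))
```

Unlike the Wu approach, every output pixel costs four rendered samples whether or not an edge passes through it, which is the brute force I was complaining about in the first post.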

Note that today, Wu Antialiasing could be done using floating point. That would result in PERFECT antialiasing! :)

Now, the reason Solution 2 with floating point isn't currently used in GPUs is that it has problems with adjacent polygons. The solution to this, of course, is alpha blending. When Wu Antialiasing was invented in 1991, alpha blending was WAY too costly. And thus the whole idea of making it work with polygons was pretty much abandoned.

As amusing as this sounds, ONE system enables good quality, nearly free Wu Antialiasing. Z3 :) I can't explain all the reasons, but the main one is that order-independent transparency fixes a LOT of problems with Wu Antialiasing.

Now, there are other ways than Z3 to do this. I have imagined one, but I'm not sure its quality would be as good. However, performance costs would obviously be lower.


Conclusion? I don't think I'm going to abandon this idea just yet. It was already thought of by Michael Abrash himself (who, in case you need a refresher, worked with Carmack on Quake 1) - it just never was implemented because it was too costly at the time.

However, seeing how Abrash worked on the XBox... Maybe he tried to make nVidia use Wu Antialiasing for the XBox GPU, but it was decided it was going to take too much time to make it work... Thus resulting in making it a NV30 technology.

Which means that if my insane and super complex speculation is right, we could all be getting cheap, excellent quality antialiasing announced at Comdex! Exciting! :)


Uttar

StealthHawk
11-16-02, 10:01 PM
Originally posted by Uttar
However, seeing how Abrash worked on the XBox... Maybe he tried to make nVidia use Wu Antialiasing for the XBox GPU, but it was decided it was going to take too much time to make it work... Thus resulting in making it a NV30 technology.

Which means that if my insane and super complex speculation is right, we could all be getting cheap, excellent quality antialiasing announced at Comdex! Exciting! :)


Uttar
only 2 more days till NV30 is launched, i can't wait! the only question is whether or not we will get benchmarks...there definitely will be some technology previews, hopefully they will at least expose some of the quality and performance hits we can expect with enabling those kick ass IQ features