If you'd like to see the OpenGL aniso specs, here they are:

http://oss.sgi.com/projects/ogl-samp...nisotropic.txt
Here's the relevant portion:

"Anisotropic texture filtering substantially changes Section 3.8.5.

Previously a single scale factor P was determined based on the

pixel's projection into texture space. Now two scale factors,

Px and Py, are computed.

Px = sqrt(dudx^2 + dvdx^2)

Py = sqrt(dudy^2 + dvdy^2)

Pmax = max(Px,Py)

Pmin = min(Px,Py)

N = min(ceil(Pmax/Pmin),maxAniso)

Lamda' = log2(Pmax/N)

where maxAniso is the smaller of the texture's value of

TEXTURE_MAX_ANISOTROPY_EXT or the implementation-defined value of

MAX_TEXTURE_MAX_ANISOTROPY_EXT.

It is acceptable for implementation to round 'N' up to the nearest

supported sampling rate. For example an implementation may only

support power-of-two sampling rates.

It is also acceptable for an implementation to approximate the ideal

functions Px and Py with functions Fx and Fy subject to the following

conditions:

1. Fx is continuous and monotonically increasing in |du/dx| and |dv/dx|.

Fy is continuous and monotonically increasing in |du/dy| and |dv/dy|.

2. max(|du/dx|,|dv/dx|) <= Fx <= |du/dx| + |dv/dx|.

max(|du/dy|,|dv/dy|) <= Fy <= |du/dy| + |dv/dy|."
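To make the quoted calculation concrete, here's a minimal Python sketch of the degree selection as the spec text above describes it. The function and argument names are my own (not from the spec), and it assumes the shorter footprint axis is nonzero so the division is safe.

```python
import math

def aniso_select(dudx, dvdx, dudy, dvdy, max_aniso):
    """Return (N, lambda') following the quoted formulas.

    Names are illustrative only; assumes Pmin is nonzero.
    """
    px = math.sqrt(dudx ** 2 + dvdx ** 2)   # footprint length along screen x
    py = math.sqrt(dudy ** 2 + dvdy ** 2)   # footprint length along screen y
    p_max = max(px, py)
    p_min = min(px, py)
    n = min(math.ceil(p_max / p_min), max_aniso)  # number of samples
    lam = math.log2(p_max / n)                    # MIP level of detail
    return n, lam

# A footprint 8 texels long in x and 1 texel long in y, maxAniso = 16
print(aniso_select(8.0, 0.0, 0.0, 1.0, 16))  # (8, 0.0)
# Same footprint, but maxAniso = 2: fewer samples, blurrier MIP level
print(aniso_select(8.0, 0.0, 0.0, 1.0, 2))   # (2, 2.0)
```

The second call shows the trade-off: when N is capped, the leftover stretch gets pushed into a higher (blurrier) MIP level.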

Note that the spec allows an approximation of the Px and Py functions, but the rest of the aniso degree selection calculation must stay the same. It appears that ATI uses a sum of absolute values for its approximation. The 8500 presumably uses the sum of two absolute values, whereas the 9700 apparently uses the sum of four (I'm not certain exactly what is used, but this does seem to be the case based on the data that's been put out to date). It seems to me that this is getting awfully close to the calculation power needed to just use the true functions Px and Py.

By the way, to see how I arrived at this conclusion as to how ATI apparently does the math, consider the following two functions:

8500-type method:

Fx = |du/dx| + |dv/dx|

Fy = |du/dy| + |dv/dy|

You might see how this could produce lines similar to:

\/
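One way to see how far the 8500-type formula strays from the true Px is to feed both a unit-length derivative vector and rotate it. This is only a sketch of the math as I've written it above, not a claim about what the hardware does; the helper names are made up for illustration.

```python
import math

def true_p(du, dv):
    # The spec's ideal function: Euclidean length of the derivative vector
    return math.sqrt(du * du + dv * dv)

def approx_f(du, dv):
    # The 8500-type guess: sum of absolute values
    return abs(du) + abs(dv)

def ratio_at(angle_deg):
    # Ratio F/P for a unit-length derivative vector at the given angle
    t = math.radians(angle_deg)
    du, dv = math.cos(t), math.sin(t)
    return approx_f(du, dv) / true_p(du, dv)

for angle in range(0, 91, 15):
    print(f"{angle:3d} deg  F/P = {ratio_at(angle):.3f}")
```

The ratio is 1 when the vector lines up with the u or v axis and peaks at sqrt(2) at 45 degrees, so the approximation is angle-dependent even though it stays within the bounds the spec allows.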

9700-type method:

Fx = |(|du/dx - a| + |dv/dx - a|) - (|du/dx + a| + |dv/dx + a|)|

Fy = similar

(Disclaimer: Written as is, this probably won't conform to the spec. Adjusting the value of a and applying scale factors, such as adding a constant to the entire equation or multiplying the entire equation by a constant, should be enough to make it conform.)

And this should produce lines similar to:

\_/

Note that I actually based these calcs on MIP map selection algorithms, which appear to be very closely related. (If you look at the math, it's almost identical, and the differences in MIP lines seem to coincide perfectly with the differences in aniso selection algorithms.)

After looking at this, it seems even more obvious to me that, even from a transistor count perspective, it would have been a good idea to go with the real function instead. The only possible problem may be the square root. That function may take a fair number of transistors, but I'm not really sure how many...
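On the square root, one observation, offered as a rough sketch and not as a claim about how any chip does it: mathematically, the quoted formulas can be evaluated from the squared lengths alone. N is the smallest integer n with n^2 * Pmin^2 >= Pmax^2 (capped at maxAniso), and log2(Pmax/N) equals 0.5 * log2(Pmax^2) - log2(N). The function name below is hypothetical.

```python
import math

def aniso_select_no_sqrt(dudx, dvdx, dudy, dvdy, max_aniso):
    """Same (N, lambda') as the spec formulas, using only squared lengths.

    Illustrative sketch; says nothing about real hardware.
    """
    px2 = dudx * dudx + dvdx * dvdx   # Px squared
    py2 = dudy * dudy + dvdy * dvdy   # Py squared
    p_max2 = max(px2, py2)
    p_min2 = min(px2, py2)
    # Smallest n with n^2 * Pmin^2 >= Pmax^2, i.e. ceil(Pmax/Pmin), capped
    n = 1
    while n < max_aniso and n * n * p_min2 < p_max2:
        n += 1
    # log2(sqrt(x)) = 0.5 * log2(x), so no explicit square root is needed
    lam = 0.5 * math.log2(p_max2) - math.log2(n)
    return n, lam

print(aniso_select_no_sqrt(3.0, 4.0, 0.0, 1.0, 16))
```

For the call shown, Px = 5 and Py = 1, so this should agree with the direct formulas (N = 5, Lamda' of about 0) up to floating-point rounding.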