Oh, and one last thing, for those who care about this technical stuff.

I just realized that the square root is absolutely trivial. Since a logarithm has to be taken after calculating Px and Py anyway, all you need to do is keep every value before that log as a square, and then divide the result of the log by 2, because log(sqrt(x)) = log(x)/2. In binary hardware that division costs nothing: it is just a right shift that drops the last bit.
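To make the identity concrete, here is a minimal sketch in Python. It assumes Px and Py are the lengths of the texture-coordinate derivative vectors along screen x and y, and that the log in question is base 2; the function and parameter names (`lod_with_sqrt`, `lod_without_sqrt`, `dudx`, and so on) are made up for illustration and are not taken from any real hardware or driver. It also assumes at least one derivative is non-zero, since log2(0) is undefined.

```python
import math

def lod_with_sqrt(dudx, dvdx, dudy, dvdy):
    """Straightforward version: take the square roots, then the log."""
    px = math.sqrt(dudx * dudx + dvdx * dvdx)
    py = math.sqrt(dudy * dudy + dvdy * dvdy)
    return math.log2(max(px, py))

def lod_without_sqrt(dudx, dvdx, dudy, dvdy):
    """Keep everything squared, take the log, then halve the result."""
    px2 = dudx * dudx + dvdx * dvdx  # Px squared
    py2 = dudy * dudy + dvdy * dvdy  # Py squared
    # The larger square belongs to the larger length, so the comparison
    # can be done on the squares as well.
    return 0.5 * math.log2(max(px2, py2))

# Both versions agree up to floating-point rounding.
a = lod_with_sqrt(3.0, 4.0, 1.0, 2.0)
b = lod_without_sqrt(3.0, 4.0, 1.0, 2.0)
print(math.isclose(a, b))
```

In floating point the halving shows up as a multiply by 0.5; on a fixed-point log value the same step would be a one-bit right shift.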

So, I really see no reason why ATI should have gone the route they did with the Radeon 9700.

It now seems to me that it was more of an engineering cost issue. They felt that their aniso implementation was "good enough," and would rather spend their time on other things.
