The pipeline is 24-bit-per-component floating point through and through. It takes 32- or 16-bit RGBA data as input, expands it to 96 bits (four 24-bit float components), manipulates it at 96 bits (writing intermediate results as 128 bits to preserve power-of-2 framebuffer alignment), then dithers as a final step back to 10:10:10:2 RGBA for output through the DAC, all with no performance difference.
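To make the expand, manipulate, dither flow concrete, here is a rough Python sketch of it. This is only an illustration, not how the hardware actually does it: the function names, the bit layout of the packed words, and the use of simple random dithering are all my assumptions, and Python floats are 64-bit, so this does not model 24-bit float precision.

```python
import random

def expand_rgba8(pixel):
    """Expand a packed 32-bit RGBA value (8 bits per channel) to four floats in [0, 1].
    The R-in-the-top-byte layout is an assumption made for this sketch."""
    r = (pixel >> 24) & 0xFF
    g = (pixel >> 16) & 0xFF
    b = (pixel >> 8) & 0xFF
    a = pixel & 0xFF
    return (r / 255.0, g / 255.0, b / 255.0, a / 255.0)

def modulate(rgba, factor):
    """Example of an intermediate operation carried out at float precision."""
    return tuple(c * factor for c in rgba)

def quantize_dithered(value, max_code, rng):
    """Quantize a float in [0, 1] to an integer in [0, max_code].
    Adding uniform noise in [0, 1) before truncating rounds up with a
    probability equal to the fractional part (random dither, assumed here;
    the real dither pattern is not stated in the post)."""
    scaled = min(max(value, 0.0), 1.0) * max_code
    return min(int(scaled + rng.random()), max_code)

def pack_rgb10a2(rgba, rng):
    """Dither four floats down to a 10:10:10:2 word (hypothetical bit layout)."""
    r, g, b, a = rgba
    r10 = quantize_dithered(r, 1023, rng)
    g10 = quantize_dithered(g, 1023, rng)
    b10 = quantize_dithered(b, 1023, rng)
    a2 = quantize_dithered(a, 3, rng)
    return (r10 << 22) | (g10 << 12) | (b10 << 2) | a2

if __name__ == "__main__":
    rng = random.Random(0)
    texel = expand_rgba8(0x4080C0FF)   # 32-bit input
    shaded = modulate(texel, 0.5)      # work stays in float
    print(hex(pack_rgb10a2(shaded, rng)))
```

The point of doing it this way is that quantization happens exactly once, at the very end, so rounding error never accumulates across intermediate passes.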

This is the same approach as the Kyro I and II, which rendered everything internally at 32-bit, making their 16-bit output look better than the competition's equivalent.
