Clearly all the back-ends except "vesa" have problems.
Well, I guess the reason vesa and via don't have these problems is that the via card uses shared memory, and vesa does too (or at least its rendering happens in system RAM).
Therefore these drivers don't need the additional readback from video RAM, which is what hurts so much in the other results. Furthermore, vesa does not have any accelerated routines, which is why it's consistent: blending is always done by the CPU.
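
To illustrate why the readback matters, here is a minimal sketch in Python. It is not the actual X server or driver code, and the names blend_over and blend_span are made up for illustration. It only shows that CPU-side "over" blending has to read every destination pixel before it can write the result, so if the destination sits in video RAM, each of those reads goes across the bus.

```python
def blend_over(src, alpha, dst):
    """Blend one 8-bit channel: src over dst with coverage alpha (0..255)."""
    # The destination value is an input to the formula, so it must be read first.
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_span(src_color, coverage, dst_row):
    """Blend a constant colour into a row of destination values.

    dst_row stands in for the destination surface (hypothetical; in a
    software fallback this would be the framebuffer contents).
    """
    out = []
    for a, d in zip(coverage, dst_row):
        # Reading d is the "readback": cheap from system RAM,
        # expensive when the surface lives in video memory.
        out.append(blend_over(src_color, a, d))
    return out


if __name__ == "__main__":
    # Coverage 0 keeps the destination, coverage 255 replaces it.
    print(blend_span(255, [0, 128, 255], [10, 0, 20]))
```

For subpixel AA the same thing happens per colour channel with separate coverage values, so the amount of destination data touched does not shrink.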

btw, gtk is WORSE than qt! Here's the 6800GT on gtk+2.0-2.8.16:
I guess this is because you don't get numbers as bad as mine (I haven't seen any results comparable to mine). Your subpixel slowdown factor is quite similar to what I get on my fast machines.

I wonder whether anybody can explain why my fastest machine performs only half as fast as my slowest.

In fact, my Duron800/FX5200 is able to render subpixel-AA strings twice as fast as my 1.8 GHz Sempron (4 times more L2 cache, faster architecture) with a GF6600/256 MB RAM.
It's the same software (same xorg version, same nvidia driver, same kernel).
It was not my intention to buy a new one and get something that performs like crap!

Regards, Clemens
