Well, Cg is sometimes slower than HLSL, even on an NV3x. This was shown by pocketmoon over at Beyond3D. So on a non-NV3x, I wouldn't be too optimistic about performance.
Microsoft has a LOT of experience with compilers. nVidia doesn't. I guess it'll get better in the future. But for now, Cg is still slower than HLSL.