Re: My own Benchmark
I think it would be better to add another pass in which the program uses only the vertex shader to build the shadow volume (i.e., by inserting degenerate quads).
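For anyone unfamiliar with the degenerate-quad approach, here is a minimal sketch, assuming a point light and a hypothetical vertex layout where each vertex also stores the normal of the face it belongs to. The per-vertex extrusion rule that would normally live in the GLSL vertex shader is emulated on the CPU in C so it can be checked. `ShadowVertex`, `build_degenerate_quad` and `extrude_vertex` are made-up names for illustration, not anything from the benchmark itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vertex layout: position plus the normal of its face. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Vec4;
typedef struct { Vec3 pos; Vec3 faceNormal; } ShadowVertex;

static float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Done once at load time: for an edge (a, b) shared by two faces with
   normals n0 and n1, emit a quad whose four vertices lie on the edge,
   so it has zero area until the vertex shader stretches it. */
static void build_degenerate_quad(Vec3 a, Vec3 b, Vec3 n0, Vec3 n1,
                                  ShadowVertex out[4]) {
    assert(out != NULL);
    out[0] = (ShadowVertex){ a, n0 };
    out[1] = (ShadowVertex){ b, n0 };
    out[2] = (ShadowVertex){ b, n1 };
    out[3] = (ShadowVertex){ a, n1 };
}

/* CPU emulation of the vertex-shader step: if the vertex's face points
   away from the light, i.e. dot(n, pos - light) > 0, push the vertex to
   infinity along the light-to-vertex direction (w = 0); otherwise pass
   it through unchanged (w = 1). */
static Vec4 extrude_vertex(ShadowVertex v, Vec3 lightPos) {
    Vec3 toVertex = { v.pos.x - lightPos.x,
                      v.pos.y - lightPos.y,
                      v.pos.z - lightPos.z };
    if (dot3(v.faceNormal, toVertex) > 0.0f) {
        return (Vec4){ toVertex.x, toVertex.y, toVertex.z, 0.0f };
    }
    return (Vec4){ v.pos.x, v.pos.y, v.pos.z, 1.0f };
}
```

The appeal of this arrangement is that silhouette detection moves from the CPU to the GPU: the quads are built once, and each frame only the per-vertex test runs, so quads on non-silhouette edges either stay collapsed or are stretched as a whole.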
I modified the GLSL shader to null out most of the fragment-level operations and still got less than 200 fps, so I assume the program is mainly CPU-limited in this situation.