suburbanguy
08-11-09, 07:28 AM
Part 1 (http://www.pcgameshardware.com/aid,692145/Nvidias-Chief-Scientist-Bill-Dally-about-GPU-technology-DirectX-11-and-Intels-Larrabee/News/)
Part 2 (http://www.pcgameshardware.com/aid,692145/Nvidias-Chief-Scientist-Bill-Dally-about-GPU-technology-DirectX-11-and-Intels-Larrabee/News/?page=2)
Part 3 (http://www.pcgameshardware.com/aid,692145/Nvidias-Chief-Scientist-Bill-Dally-about-GPU-technology-DirectX-11-and-Intels-Larrabee/News/?page=3)
select quotes:
PCGH: You have replaced David Kirk as chief scientist. How do you think you will be a different kind of chief scientist? What will you do differently than your predecessor?
Bill Dally: David is an Nvidia fellow now. And I think that I complement him well. I mean David is a real expert on graphics, in particular on ray tracing, and I am a real expert at parallel computing. And so I think - both graphics and parallel computing are very important areas for Nvidia. And so I think, I'm gonna continue many of the things David did. I think he did a wonderful job as chief scientist. I'm gonna try to expand our research in particular into a lot of areas that have to do with how we implement our GPUs.
He created Nvidia Research, and he focused it very much on the application end of things - how to deliver better graphics and a number of things having to do with GPU computing and in particular on ray tracing. We're continuing those activities and actually still consider them as really important areas. But also in our mission of looking 5-10 years forward, it's very important to us to look at things like VLSI design, computer architecture, compilers and programming systems. And so I'm expanding Nvidia Research into those directions so we can have a long-term view, identify strategic opportunities there as well as in the application spaces.
PCGH: Now you've mentioned it two times: Ray tracing. I'll have to go into that direction a bit. Intel made a lot of fuss about ray tracing in the last 18 months or so. Do you think that's going to be a major part of computer and especially gaming graphics in the foreseeable future, until 2015 maybe?
Bill Dally: It's interesting that they've made a big fuss about it while we've had a demonstration of real-time ray tracing at Siggraph last year. It's one thing making a fuss, it's another thing demonstrating it running real-time on GPUs.
But to answer that, what I see as most likely for game graphics going forward is hybrid graphics, where you start out by rasterizing the scene and then you make a decision at each fragment: whether that fragment can be rendered just with a shader calculating using local information, or if it's a specular surface or if it's a transparent surface or if there is a silhouette edge and soft shadows are important. Then you may need to cast rays to compute a very accurate and photorealistic color for that point. So I think it's gonna be a hybrid version where some pixels are rendered conventionally and some pixels involve ray tracing and that gives us the most efficient use of our computational resources - using ray tracing where it does the most good.
PCGH: While we're at it. Intel also made a big fuss about Larrabee.
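(Poster's aside: a rough sketch of the per-fragment decision Dally describes here. This is only an illustration, not Nvidia's actual pipeline. The `Fragment` fields and the function names `needs_ray_tracing` and `shade_hybrid` are made up for the example, and the "ray"/"raster" labels just stand in for the two rendering paths.)

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical per-fragment flags; a real renderer would derive
    # these from material and geometry data.
    specular: bool = False
    transparent: bool = False
    silhouette_edge: bool = False
    soft_shadows_matter: bool = False


def needs_ray_tracing(frag: Fragment) -> bool:
    """Return True if local shading is assumed not to be enough."""
    return (
        frag.specular
        or frag.transparent
        or (frag.silhouette_edge and frag.soft_shadows_matter)
    )


def shade_hybrid(fragments):
    """Label each rasterized fragment with the path it would take."""
    return ["ray" if needs_ray_tracing(f) else "raster" for f in fragments]


if __name__ == "__main__":
    frags = [
        Fragment(),                      # plain diffuse surface
        Fragment(specular=True),         # mirror-like surface
        Fragment(silhouette_edge=True),  # edge, but shadows not important
        Fragment(silhouette_edge=True, soft_shadows_matter=True),
    ]
    print(shade_hybrid(frags))
```

The point of the split is that only the fragments flagged for ray casting pay the extra cost; everything else stays on the conventional shader path.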
Bill Dally: M-hm.
PCGH: They are aiming for a mostly programmable architecture there. They state that they have only 10 percent dedicated to graphics of the whole die, the rest being completely programmable according to Intel. And still they want to compete in the high-end with your GPUs. Do you think that's feasible right now?
Bill Dally: First of all, right now, Larrabee is a bunch of View-graphs. So, until they actually have a product, it's difficult to say how good it is or what it does. You have to be careful not to read too much into View-graphs - it's easy to be perfect when all you have to do is be a View-graph. It's much harder when you have to deliver a product that actually works.
But to the question of the degree of fixed function hardware: I think it puts them at a very serious disadvantage. Our understanding of Larrabee, which is based on their paper at Siggraph last summer and the two presentations at the Game Developers Conference in April, is that they have fixed function hardware for texture filtering, but they do not have any fixed function hardware either for rasterization or compositing and I think that that puts them at a very serious disadvantage. Because for those parts of the graphics pipeline they're gonna have to pay 20 times or more energy than we will for those computations. And so, while we also have the option of doing rasterization in software if we want - we can write a kernel for that running on our Streaming Multiprocessors - we also have the option of using our rasterizer to do it and do it far more efficiently. So I think it puts them at a very big disadvantage power-wise to not have fixed function hardware for these critical functions. Because everybody in a particular envelope is dominated by their power consumption. It means that at a given power value they're going to deliver much lower performance graphics.
I think also that the fact that they've adopted an x86 instruction set puts them at a disadvantage. It's a complex instruction set, it's got instruction prefixes, it only has eight registers and while they claim that this gives them code compatibility, it gives them code compatibility only if they want to run one core without the SIMD extension. To use the 32 cores or use the 16-wide SIMD extension, they have to write a parallel program, so they have to start over again anyway. And they might as well have started over with a clean instruction set and not carry the area and power cost of interpreting a very complicated instruction set - that puts them at a disadvantage as well.
So while we're very concerned about Larrabee - Intel is a very capable company, and you always worry when a very capable company starts eating your lunch - we're not too worried about Larrabee, at least based on what they've disclosed so far.
PCGH: So you think - based on available information - that it [Larrabee], when it comes out in early 2010, will not be competitive with the high-end GPUs of that time?
Bill Dally: That's our view, yes.
PCGH: When do you think we're going to see products on the shelves that were influenced by your work?
Bill Dally: I've had small influences on some of the products that are going to be coming out towards the end of this year but those products were largely defined and it was just little tweaks toward the end. It's really gonna be the products in about the 2011 time frame that I will be involved in from the earlier stages.
PCGH: That's about the same time we're expecting the new game consoles. Is that also an opportunity Nvidia is looking forward to? What's your take on that?
Bill Dally: We're certainly very interested in game console opportunities. [smiles]
PCGH: How fast is the ALU/FLOP-ratio evolving? Is the move towards more FLOPS accelerating in the future?
Bill Dally: Texturing and FLOPS actually tend to hold a pretty constant ratio and that's driven by what the shaders we consider important are using. We're constantly benchmarking against different developers' shaders and seeing what our performance bottlenecks are. If we're gonna be texture limited on our next generation, we pop another texture unit down. Our architecture is very modular and that makes it easy to re-balance.
The ratio of FLOPS to bandwidth - off-chip bandwidth - is increasing. This is, I think, driven by two things. One is, fortunately, the shaders are becoming more complex. That's what they want anyway. The other is, it's just much less expensive to provide FLOPS than it is [to provide] bandwidth. So you tend to provide more of the thing which is less expensive and then try to completely saturate the critical expensive resource, which is the memory bandwidth.
PCGH: Do you think a large leap in available bandwidth would be necessary for next generation hardware - like for DirectX 11 with its focus on random R/W (Scatter, Gather operations etc.), which should benefit greatly from more or at least more granular memory access?
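(Poster's aside: one common way to put numbers on the FLOPS-versus-bandwidth trade-off is a simple roofline-style bound. That model is my addition, not something Dally mentions. The function names and the figures in the example are hypothetical, chosen only to make the arithmetic easy to follow.)

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic.

    flops_per_byte is how much arithmetic a shader does per byte it
    moves to or from off-chip memory.
    """
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def balance_point(peak_gflops, bandwidth_gbs):
    """FLOPs per byte a shader needs before compute becomes the limit."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical chip: 1000 GFLOPS peak, 100 GB/s off-chip bandwidth.
    peak, bw = 1000.0, 100.0
    print(balance_point(peak, bw))
    for intensity in (2.0, 10.0, 20.0):
        print(intensity, attainable_gflops(peak, bw, intensity))
```

Under this model, a shader doing less arithmetic per byte than the balance point is bandwidth-bound, which fits the remark that more complex shaders make better use of cheap FLOPS while the expensive memory bandwidth stays saturated.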
Bill Dally: Almost everything would benefit from more bandwidth and being able to do it at a finer grain. But I don't think that there's gonna be any large jumps. I think we're gonna evolve our memory bandwidth as the GDDR memory components evolve and track that increase.
PCGH: Talking about multi-platform titles. Current generation game consoles use hardware that is four to five years old - at least the design. The PC's graphics can be better than on the consoles. Is it difficult to motivate game developers to build better graphics for PC games when they are developing a multi-platform title which may end up being graphics bound on the consoles?
Bill Dally: I don't have direct experience with that; our devtech people tend to work with the game developers. They seem to be getting them to use our GPUs to the best of their capabilities. But I don't know quite what challenges they face in doing that.
PCGH: Another topic: In contrast to your competitor, Nvidia's GPUs, at least the high-end ones, have in the last couple of years always been very large, physically. AMD is going the route of having a medium-sized die and scaling it with X2 configurations for high-end needs; Nvidia is producing very large GPUs. Is that a trend which could change in the future or don't you think you have reached the limits of integration in single chips, the "Big Blocks of Graphics"?
Bill Dally: We're trying to always deliver the best performance and value to our customers and we're gonna continue doing that. And for any given generation there's an economic decision that has to be made about how large to make the die. Our architecture is very scalable, so the larger we make the die, the more performance we deliver. We also deliver duplex configurations as in the GTX 295 and so, if we build a very large die and then also put two of them together, we can deliver even more performance. And so for each generation we're gonna do the calculation and decide what is the most economic way of delivering the best performance for our customers.
PCGH: Is this decision driven more by financial economics or power economics? Like if we go for this and that power envelope we can squeeze a dual configuration on one PCI Express card.
Bill Dally: It's one combined calculation. We are trying to deliver the best performance subject to a number of constraints that set the edges of the envelope. Some of those constraints are financial and some of those constraints are physical things like board area and things like power, you know, how large your die can be and fit that particular type of package.