View Full Version : SLI of 4
Daneel Olivaw
01-12-05, 12:30 PM
Anyone else think the load balancing on such a rig might be too much even for a strong CPU? We already see the CPU load of 2-way SLI at resolutions below 1280...
see front page and
http://www.digit-life.com/news.html?112573#112573
and btw, no mobo I know of has more than 20 PCIE channels
einstein_314
01-12-05, 01:37 PM
I'm also trying to figure out how this would work. From what I understand, when you have 2 sli cards (say 2 6800 GTs) and are running them in sli, the pcie slots on the board are running at 8x. If you have a dual GPU sli card (like the one in the article) the pcie slot it's in runs at 16x so each gpu gets 8x, just like having 2 separate cards. For them to have 2 of these dual gpu cards in sli, wouldn't they have to have each slot at 8x meaning each gpu would only be at 4x?
and btw, no mobo I know of has more than 20 PCIE channels
Same here which means that it's impossible on current mobos. I think that this dual gpu on one board sli is intended for those mobos with only 1 pcie 16x slot. Then they can utilize sli with their 1 card.
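The lane arithmetic in this post can be sketched in a few lines. This is only an illustration under the post's own assumptions: the board has a fixed pool of graphics lanes that is divided evenly among the slots in use, and a dual-GPU card splits its slot's lanes evenly between its two GPUs. The function name and parameters are made up for the example.

```python
def lanes_per_gpu(graphics_lanes: int, slots_in_use: int, gpus_per_card: int) -> int:
    """Lanes each GPU ends up with, assuming an even split at every level."""
    lanes_per_slot = graphics_lanes // slots_in_use
    return lanes_per_slot // gpus_per_card


# Two single-GPU cards sharing 16 graphics lanes: each slot runs at 8x.
print(lanes_per_gpu(16, 2, 1))  # 8
# One dual-GPU card in a single 16x slot: each GPU still gets 8x.
print(lanes_per_gpu(16, 1, 2))  # 8
# Two dual-GPU cards sharing the same 16 lanes: each GPU drops to 4x.
print(lanes_per_gpu(16, 2, 2))  # 4
```

With 32 graphics lanes (two full 16x slots), the same call gives 8x per GPU for two dual-GPU cards.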
Well, the other question that I haven't seen anyone ask is: will the drivers work with four-GPU SLI? And how will it handle the rendering modes, SFR and AFR? I just don't think that Nvidia is ready with any sort of drivers that will handle four GPUs yet...
superklye
01-12-05, 02:45 PM
Same here which means that it's impossible on current mobos. I think that this dual gpu on one board sli is intended for those mobos with only 1 pcie 16x slot. Then they can utilize sli with their 1 card.
I thought the dual gpu cards required an SLi board?
I think they use the SLI link between each other, so it might not need an SLI board to use 1 dual GPU card.
And since the GPUs' SLI links are already in use, you probably couldn't use 2 dual boards in SLI either.. dunno tho..
superklye
01-12-05, 02:57 PM
I think they use the SLI link between each other, so it might not need an SLI board to use 1 dual GPU card.
And since the GPUs' SLI links are already in use, you probably couldn't use 2 dual boards in SLI either.. dunno tho..
Nope, I just checked:
http://www.anandtech.com/video/showdoc.aspx?i=2315
"The reader should understand this before beginning the review: these solutions are somewhat limited in application until NVIDIA changes its philosophy on multi-GPU support in ForceWare drivers. In order to get any multi-GPU support at all, the driver must detect an SLI capable motherboard. This means that we had to go back to the 66.81 driver in order to test Intel SLI. It also means that even if the 3D1 didn't require a special motherboard BIOS in order to boot video, it wouldn't be able to run in SLI mode unless it were in an SLI motherboard."
CaptNKILL
01-12-05, 04:24 PM
Nope, I just checked:
http://www.anandtech.com/video/showdoc.aspx?i=2315
"The reader should understand this before beginning the review: these solutions are somewhat limited in application until NVIDIA changes its philosophy on multi-GPU support in ForceWare drivers. In order to get any multi-GPU support at all, the driver must detect an SLI capable motherboard. This means that we had to go back to the 66.81 driver in order to test Intel SLI. It also means that even if the 3D1 didn't require a special motherboard BIOS in order to boot video, it wouldn't be able to run in SLI mode unless it were in an SLI motherboard."
Wow, that's actually really good news. That means that it's a driver limitation rather than a hardware limitation. Since it's based on software, anything is possible... you KNOW someone somewhere is going to produce a card with 4 6800 GPUs on it, plus an SLI connector to run more than one of them at once :D
They did it with 3dfx VSA-100 chips, and those werent that powerful. They would have to do it with a 6800 (or even a 6600).
r2d2d3d4d5
01-12-05, 04:39 PM
I'm also trying to figure out how this would work. From what I understand, when you have 2 sli cards (say 2 6800 GTs) and are running them in sli, the pcie slots on the board are running at 8x. If you have a dual GPU sli card (like the one in the article) the pcie slot it's in runs at 16x so each gpu gets 8x, just like having 2 separate cards. For them to have 2 of these dual gpu cards in sli, wouldn't they have to have each slot at 8x meaning each gpu would only be at 4x?
The TYAN Thunder K8WEX supports dual 16x SLI (courtesy of an nVidia IO-4 running in addition to the CK8-04). It's a dual Opteron board, however (you might need two CPUs to run this setup in any future board).
Daneel Olivaw
01-12-05, 04:50 PM
Well, the other question that I haven't seen anyone ask is: will the drivers work with four-GPU SLI? And how will it handle the rendering modes, SFR and AFR? I just don't think that Nvidia is ready with any sort of drivers that will handle four GPUs yet...
strong cpu to do the load balancing on such a rig? We already see the load of SLI of 2 on the CPU at res below 1280...
I was close no? ;)
This is some cool stuff. Gigabyte's experiments are pretty interesting; I wonder if Nvidia assisted them in the design? What other designs are being looked into? With dual core CPUs coming soon, the limitations posed by SLI should be answered. Hell, when Unreal3 comes around, that game may be enjoying 100fps rates at high quality settings with AA :D.
and btw, no mobo I know of has more than 20 PCIE channels
you can always do what tyan is doing with its chipset to support beyond 20 pci-e lanes
here (http://www.theinquirer.net/?article=18462)
i saw a picture of the board i just cant find it right now
r2d2d3d4d5
01-12-05, 07:01 PM
you can always do what tyan is doing with its chipset to support beyond 20 pci-e lanes
here (http://www.theinquirer.net/?article=18462)
i saw a picture of the board i just cant find it right now
here??? (http://www.theinquirer.net/?article=20030) also here (http://www.nvnews.net/vbulletin/showthread.php?t=39238)
Daneel Olivaw
01-13-05, 12:57 PM
Instead of making dual core this, dual northbridge that, dual CPU there, dual channel here, Raid0 under here, why not build a beowulf cluster while we're at it? :lol:
(j/k)
http://www.beowulf.org/
http://www.beowulf.org/ hmmmmmmm beowulf mmmmmmm now if only Microsoft would incorporate something like beowulf into Advanced Server....
Daneel Olivaw
01-14-05, 08:12 AM
Liiiiiiiiiinnnnnnnnuuuuuuuuuuuuuxxxxxxxxxxx (freeeeeeedooooommmmm) (Mel Gibson)
lightman
01-14-05, 10:04 AM
They did it with 3dfx VSA-100 chips, and those werent that powerful. They would have to do it with a 6800 (or even a 6600).
They were able to do it with the VSA-100 because they weren't so powerful.
See, let's say that you have a frame that with one gpu takes X to be rendered. With 2 gpus, the time would be :
T_2 = X/2 + T_overhead
(please note that that's a lower limit, considering a perfect load splitting between the two gpus)
where T_overhead is the time spent by the driver to analyze the frame, split the work between the 2 gpus, and recompose the final frame.
Now, moving to n gpus, the time would be :
T_n = X/n + T_overhead_n
The problem is that, at some point, T_overhead_n will be higher than X/n. At that point, for each gpu you add to the card, you are in fact raising the time needed to render a frame, thus lowering performance.
Now, the interesting thing you have to consider here is that the faster the single gpu, the faster T_overhead_n will reach the point of being higher than X/n.
So in fact, using fast gpus like a 6800 or a 6800ultra will mean deviating more from the theoretic 100% performance gain (in the case of 2 gpus).
The only way you have to overcome this limiting factor is by lowering T_overhead_n, either recoding the "divide and conquer" part, or by using a faster cpu (which handles the splitting code). Or maybe effectively moving the analysis and splitting code to one of the gpus ;).
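To put numbers on the argument above, here is a minimal sketch of T_n = X/n + T_overhead_n. The post does not say how T_overhead_n grows, so the linear form (a fixed cost per GPU added) is purely an assumption for illustration, as are the function names and the sample values.

```python
def frame_time(x: float, n: int, overhead_per_gpu: float) -> float:
    """T_n = X/n + T_overhead_n, ASSUMING T_overhead_n = overhead_per_gpu * n."""
    return x / n + overhead_per_gpu * n


def best_gpu_count(x: float, overhead_per_gpu: float, max_gpus: int = 16) -> int:
    """GPU count (1..max_gpus) that gives the lowest frame time under the model."""
    return min(range(1, max_gpus + 1),
               key=lambda n: frame_time(x, n, overhead_per_gpu))


# A slow GPU: one frame takes 16 time units on a single chip.
print(best_gpu_count(16.0, 1.0))  # 4
# A GPU four times faster, same driver overhead: the useful count shrinks.
print(best_gpu_count(4.0, 1.0))   # 2
```

Under this assumed overhead, the faster chip stops benefiting from extra GPUs sooner, which is the point made about the VSA-100 versus the 6800.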
The problem is that, at some point, T_overhead_n will be higher than X/n. At that point, for each gpu you add to the card, you are in fact raising the time needed to render a frame, thus lowering performance.
Now, the interesting thing you have to consider here is that the faster the single gpu, the faster T_overhead_n will reach the point of being higher than X/n.
So in fact, using fast gpus like a 6800 or a 6800ultra will mean deviating more from the theoretic 100% performance gain (in the case of 2 gpus).
The only way you have to overcome this limiting factor is by lowering T_overhead_n, either recoding the "divide and conquer" part, or by using a faster cpu (which handles the splitting code). Or maybe effectively moving the analysis and splitting code to one of the gpus .
OK, well, clusters have way more overhead, but that doesn't stop system admins from adding more nodes to a cluster to add more firepower to their server farms. So if you run a high-end render farm and there is a way to cut render times by adding more GPUs in parallel via PCIe, then you would do it. I have a feeling that Lucasfilm or Pixar would do something like that.
lightman
01-16-05, 01:51 PM
OK, well, clusters have way more overhead, but that doesn't stop system admins from adding more nodes to a cluster to add more firepower to their server farms. So if you run a high-end render farm and there is a way to cut render times by adding more GPUs in parallel via PCIe, then you would do it. I have a feeling that Lucasfilm or Pixar would do something like that.
Given that I am the sysadm for two clusters, one with fast ethernet interconnect, and the other one with gigabit (for nfs+tcp) and Infiniband (for actual interprocess communication), I think I am quite qualified to talk about clusters and parallel computing in general ;)
That said, every kind of parallel process has a limit in scalability, and that limit is given by the "serial" time mentioned above (that is, the T_overhead_n part in the above equation).
Now, let me elaborate a little more the above equations.
The general form for the time required by a parallel process is :
T = T_serial + T_communication + T_computation
when you have n "processing units" the above equation can be written as :
T = T_serial + T_communication + T_total_computation/n
You can see that in the above equation you have two terms, T_serial and T_communication, that don't change with the number of processing units.
Now, again, the faster the processing units, the sooner (in terms of the number of processing units) you reach the point where T_total_computation/n is lower than T_serial + T_communication, at which point adding any more processing units makes no sense (in fact, it's counterproductive, because you would actually slow down the computation).
The upper limit for n depends a lot on the type of computation you're doing. There are cases in which that limit, e.g. for a cluster using a gigabit interconnect, is 6-8 P4 @ 3.0GHz nodes, and others in which it is on the order of 100s or 1000s of nodes.
In the case of a render farm such as the ones that Pixar or LucasFilm are using, you have to remember that they usually have thousands of frames to render, each one requiring quite a lot of computation. Usually, the workload is, in that case, distributed such that each processing unit (node of a cluster, or gpu, or whatever) processes a number of frames.
Now, in the case of SLI, each gpu isn't processing a different frame, but a different part of the same frame. Moreover, each part of the said frame requires progressively less processing with each gpu you add to the (hypothetical) board. Not to mention the fact that your driver has to be smart enough to correctly balance the processing needed by each part of the frame, and so, in the end, the processing done by each gpu.
The more powerful the single gpu is, the faster you reach the limit for the number of [useful] gpus.
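The general form T = T_serial + T_communication + T_total_computation/n can be played with directly. The sketch below takes the two fixed terms as constants, exactly as written in the equation; the function names and sample numbers are invented for the example. Note that with strictly constant fixed terms the total time only flattens out. An actual slowdown needs an overhead that grows with n, as in the earlier T_overhead_n form.

```python
import math


def parallel_time(t_serial: float, t_comm: float, t_total_comp: float, n: int) -> float:
    """T = T_serial + T_communication + T_total_computation / n."""
    return t_serial + t_comm + t_total_comp / n


def speedup(t_serial: float, t_comm: float, t_total_comp: float, n: int) -> float:
    """Time on one unit divided by time on n units."""
    one = parallel_time(t_serial, t_comm, t_total_comp, 1)
    return one / parallel_time(t_serial, t_comm, t_total_comp, n)


def crossover_units(t_serial: float, t_comm: float, t_total_comp: float) -> int:
    """Smallest n at which T_total_computation/n drops below T_serial + T_communication."""
    return math.floor(t_total_comp / (t_serial + t_comm)) + 1


# Slow units: 16 time units of total computation, 2 units of fixed cost.
print(crossover_units(1.0, 1.0, 16.0))  # 9
print(speedup(1.0, 1.0, 16.0, 8))       # 4.5 on 8 units, far from the ideal 8
# Units twice as fast, same fixed cost: the crossover arrives at a smaller n.
print(crossover_units(1.0, 1.0, 8.0))   # 5
```

Past the crossover, the fixed terms dominate the frame time, so each extra unit buys less and less.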
Daneel Olivaw
01-18-05, 07:24 PM
Elegantly put. We already see this overhead when the GPU workload is smaller (for example at 1024x768 and below) on dual 6800GT/U.
Lezmaka
01-25-05, 12:23 PM
The TYAN Thunder K8WEX supports dual 16x SLI (courtesy of an nVidia IO-4 running in addition to the CK8-04). It's a dual Opteron board, however (you might need two CPUs to run this setup in any future board).
I don't know if it's an arbitrary requirement, but a spec/config sheet I saw for a workstation with this board said SLI was only available with dual processors.
r2d2d3d4d5
01-25-05, 03:39 PM
I don't know if it's an arbitrary requirement, but a spec/config sheet I saw for a workstation with this board said SLI was only available with dual processors.
I was mostly thinking of this pic from The Inquirer (http://www.theinquirer.net/?article=20030). The second nVidia chip is hooked up to the second CPU. Makes me wonder if it's a requirement of this configuration. In the future a second chip might not be necessary at all....
r2d2d3d4d5
01-25-05, 06:46 PM
More info here (http://www.theinquirer.net/?article=20913) & here (http://www.nvidia.com/page/nforce_pro.html).
Opterons, however, support more than one HyperTransport link - 2-way Opterons, for example, support three. One connects the two chips together, meaning each chip has two spare. Each chip can use one HTT link to connect to the 2200 MCP. The 2200 supports 22 lanes of PCI Express, SATA and PATA RAID, integrated security features and native Gigabit Ethernet. However, with the spare HTT link each chip has, you can also throw in a 2050 MCP and grab yourself another 20 lanes of PCI Express along with more SATA ports. In fact, in a large-scale Opteron system, you can link together a single 2200 with multiple 2050s and get yourself a bandwidth-busting 80 lanes of PCI Express, as well as configuring a RAID array across 16 SATA drives.
:drooling: :drooling: :drooling: :drooling: :drooling: