How to make Win7 handle eight GPUs properly?


Romant
02-20-12, 12:13 AM
Hi,

The hardware consists of a TYAN FT72B7015 barebone (http://tyan.com/product_SKU_spec.aspx?ProductType=BB&pid=412&SKU=600000188), two Xeon 5690 CPUs, 144 GB of RAM and 8 Zotac GTX 580 cards with 3GB of RAM each. This system used to work under XP64 without any problems; however, it does not work properly under Win7 64 Ultimate (the original question can be found here (http://social.technet.microsoft.com/Forums/en-US/w7itproperf/thread/265b7dff-3063-4056-871f-1dee4a505359)).

After a number of experiments I have localized the problem: the system works just fine when only seven (not eight) GTX 580 cards are installed; however, as soon as the eighth card is installed, the OS becomes very slow (even simple Windows GUI elements draw extremely slowly, all applications that use CUDA run very slowly, and the FurMark test is slow and jumpy). All of this happens with the latest official Nvidia drivers (285.62).

I have the impression that this happens because of a shortage of the resources that the OS distributes among the installed devices. My good experience with XP64, as well as a number of examples of successful FT72 + GTX 580 based builds, makes me think that a configuration with 8 GTX 580 cards can also be handled properly by Win7.

What must be done to make such a configuration work properly under Win7 64 Ultimate? Maybe the drivers can be tweaked in some way?

Thanks in advance.

Redeemed
02-20-12, 12:22 AM
I'm sorry... after reading "two Xeon 5690 CPUs, 144GB of RAM and 8 Zotac GTX 580 cards with 3GB of RAM each" my brain shut down and I couldn't stop drooling.

Such a system is a million percent out of my league. :lol: No idea what the issue is. My best guess is that you broke a performance barrier- the system was operating so fast that it actually appeared as if it were slowing down.

Yup, I'm certain that is what's happening. :lol:

t3hl33td4rg0n
02-20-12, 12:55 AM
flux capacitor?

Redeemed
02-20-12, 02:23 AM
OH! I'd never considered that. :o Very good possibility. :D

Q
02-20-12, 10:43 AM
I have a feeling that it is due to a limitation in the new graphics framework provided under Vista/7. In many ways the graphics card in the 2000/XP environment was just another device with another driver. With the new model everything has changed.

You've got a long road ahead of you. You will need to submit a ticket with Microsoft formally, and I have a feeling they're going to tell you to contact Nvidia. I hope I'm wrong, because they will undoubtedly point the finger at Microsoft.

I wonder if anyone is having an issue with running 8 ATI GPUs? It would be nice to isolate whether it is a Windows infrastructure bug versus an NVIDIA driver issue. Since the whole machine is slowing, I'd have to believe that it was Windows 7.

When you boot into safe mode, is the entire machine slow? This should bypass the driver and let you know if it is a resource allocation issue.

ViN86
02-20-12, 10:47 AM
I assume this is a computation machine. The link to the server rack mount says it supports 8 GPU's and has adequate power, so I'm not quite sure what the issue is.

You are pushing the RAM limitation on Windows 7 Ultimate x64. Have you considered Windows Server or Linux? I guess it depends on how you are using these GPU's. If you are writing your own code for them in C/C++ (for CUDA) then Linux would be a better choice anyways. If you're using specific Windows software, then I would make sure Windows server would fix the issue before you take the plunge.

Good luck.

EDIT: The more I look at this machine, the more I think you shouldn't be running a consumer version of Windows on it. It needs Windows Server or some other Enterprise software (or just run Linux).

Romant
02-20-12, 01:45 PM
Yes, this machine is a number cruncher.

>>> When you boot into safe mode, is the entire machine slow? This should bypass the driver and let you know if it is a resource allocation issue.
Yes, it is slow. Maybe not that slow, but slower than normal anyway.

>>> It needs Windows Server or some other Enterprise software (or just run Linux).
I thought of it but Nvidia does not provide Geforce drivers for server Windows versions.

Migration to Linux is not an option either: the software that runs on this machine is Windows-specific.

What I'm currently trying to do is get rid of the High Definition Audio devices (Device Manager shows 32 devices of this kind when all 8 graphics cards are plugged in).
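
For anyone who wants to check that count without clicking through Device Manager, here is a minimal Python sketch. It assumes Python 3.7+ is installed on the machine and that the stock `wmic` tool is available; `count_devices` and `list_pnp_device_names` are names made up for this example, and the search string is only a guess at how the entries are labelled.

```python
import subprocess
import sys


def count_devices(names, needle="High Definition Audio"):
    """Count device names that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return sum(1 for name in names if needle in name.lower())


def list_pnp_device_names():
    """Return PnP device names as reported by WMI (Windows only).

    Assumption: `wmic` is on the PATH, as on a stock Windows 7 install.
    """
    out = subprocess.run(
        ["wmic", "path", "Win32_PnPEntity", "get", "Name"],
        capture_output=True, text=True, check=True,
    ).stdout
    lines = [line.strip() for line in out.splitlines()]
    # Drop blank lines and the "Name" column header.
    return [line for line in lines if line and line != "Name"]


if __name__ == "__main__" and sys.platform == "win32":
    names = list_pnp_device_names()
    print(count_devices(names), "High Definition Audio entries")
```

Running it once with seven cards and once with eight would show whether the audio entries scale with the card count as expected.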

ViN86
02-20-12, 02:37 PM
May I ask what software you are using?

Romant
02-20-12, 02:56 PM
>>> May I ask what software you are using?
It is a proprietary project; the software is not well known. What I can say is that it is not that simple to migrate it to Linux.

ViN86
02-20-12, 03:13 PM
Have you considered contacting Nvidia or Microsoft?

Nvidia may have drivers for Server systems. Computation machines such as the one you're using are not completely new. Surely they must have, at least, a beta driver for you to test. They should aim to help you since you have 8 cards, and whatever work you do would be great promotion/testing for their cards.

Redeemed
02-20-12, 03:39 PM
Have you considered running Tesla cards instead?

I cannot believe it Slawter- his rig makes yours look like a smart phone. :lol:

Redeemed
02-20-12, 03:48 PM
Hopefully an iPhone ;)

:p :lol:

ViN86
02-20-12, 04:07 PM
>>> Have you considered running Tesla cards instead?

That's another option. But that would be a very large price tag for 8 high end Tesla cards, which typically cost twice as much as their GeForce counterparts. And the capital investment has already been made, so it'd be a pain to sell/buy all the cards.

The only real performance difference between Quadro cards and the GeForce "equivalents" is the double precision floating point performance. There are some bandwidth discrepancies too that can be addressed via BIOS updates.

This may seem like a silly question, but have you looked at other people who run the same or similar setups? What OS do they use?

Also, was this setup purchased or assembled? There are vendors that sell computation rigs like that with multiple GTX 580's. Do they support Windows? If so what version and how many GPU's are they using? Another option to consider is scaling back the PCIe width on the cards to 4x or 8x (if they aren't scaled back already, which they probably are).

Romant
02-20-12, 05:24 PM
>>> Have you considered contacting Nvidia or Microsoft?
I would like to, but I have no idea how. Nvidia's support pages link to forums where nobody answers; Microsoft's forums are a bit more responsive, but not by much. How do I contact them so that they actually do something?

>>> Tesla
Well, GTX580 cards are a bit cheaper than Teslas; I would say the price difference is quite considerable. In addition, single-precision floats are enough for my purposes.

>>> Other people/companies that build something similar.
Yes, I have contacted one company. They say they do exactly the same things with exactly the same OS and drivers on almost identical hardware (they use an FT77 barebone instead of the FT72), and everything works just fine for them, no problems at all. I would have suspected a hardware problem, but this rig worked under XP64!

ViN86
02-20-12, 05:42 PM
You also get business class support and drivers. That's a BIG difference between GeForce and Tesla. By the looks of it, this system is used for professional workloads. So a few thousand bucks shouldn't be a problem.

While true for large corporations, smaller companies aren't so keen on dropping a couple thousand on something that should work the way it is. ;)

There are commercial systems available that include 4+ GPU's and can be set up with 580's, so it should be possible. They don't seem to have driver problems. But then again, I'm pretty sure they run custom Linux variants.

ViN86
02-20-12, 05:45 PM
Very interesting. I would continue running 7 until you can figure it out.

Is it possible the 8th card is damaged/defective?

Romant
02-20-12, 05:49 PM
>>> Very interesting. Have you considered testing the cards 1 at a time? Start with 1, add 1, add 1, add 1, etc.?
Yes, that's exactly what I have done. 1, 2, 3, 4, 5, 6, 7 - OK; as soon as all eight cards are plugged in - slowdown.

ViN86
02-20-12, 05:50 PM
Yes, sorry I re-read the OP and edited the above post. I was wondering if it's a single card that's causing the problem (one is defective).

Also, the QUOTE button is easier. ;)

Romant
02-20-12, 05:50 PM
>>> Is it possible the 8th card is damaged/defective?
I have replugged the cards into different slots.

ViN86
02-20-12, 05:54 PM
Ok, have you tried 1 at a time? As in test card 1, remove, test card 2, remove, test card 3, remove, etc. ?

Also have you tested all the PCIe slots individually? Working card test in slot 1, test in slot 2, test in slot 3, etc.

It takes a very long time to troubleshoot these kinds of things, but it will prevent you from going insane. :)

Romant
02-20-12, 06:17 PM
I'm already a bit insane :-), lots of things have been tried.

Testing each card in each slot (with only one card plugged in at a time) gives 64 combinations at, say, 8 minutes each. 512 minutes of pure action :-)

ViN86
02-20-12, 06:19 PM
:rofl

I can't think of a better way to spend a day. It shouldn't take that long though. I'd test each card individually (8 cards), then if all cards work, test one of them in each of the 8 slots, so it'd only be 16 boots instead of 64, so 128 minutes roughly. ;)
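
The two test plans discussed above can be compared with a quick sketch. The 8-minute boot time is the rough figure from the thread; the function names and the choice of card 1 and slot 1 as the references are assumptions for illustration only.

```python
MINUTES_PER_BOOT = 8  # rough per-test estimate quoted in the thread


def exhaustive_plan(cards, slots):
    """Every card in every slot, one card installed at a time."""
    return [(card, slot)
            for card in range(1, cards + 1)
            for slot in range(1, slots + 1)]


def reduced_plan(cards, slots, reference_card=1, reference_slot=1):
    """Each card once in a reference slot, then one reference card in each slot."""
    card_pass = [(card, reference_slot) for card in range(1, cards + 1)]
    slot_pass = [(reference_card, slot) for slot in range(1, slots + 1)]
    return card_pass + slot_pass


def total_minutes(plan):
    """Estimated wall-clock time for a list of (card, slot) boots."""
    return len(plan) * MINUTES_PER_BOOT


if __name__ == "__main__":
    for name, plan in (("exhaustive", exhaustive_plan(8, 8)),
                       ("reduced", reduced_plan(8, 8))):
        print(f"{name}: {len(plan)} boots, about {total_minutes(plan)} minutes")
```

The reduced plan lists the reference card in the reference slot twice, so one of the 16 boots could be skipped. Neither plan exercises the full eight-card configuration itself.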

Romant
02-21-12, 09:10 AM
I have tested the system using the method described above. All GTX580 cards were physically removed from the system; each card was tested separately, each slot was tested separately, and the cards were also tested in groups. The result is the same: as soon as the number of installed cards reaches eight, the system bogs down.

I have done all these tests on a clean system (reinstalled Win7).

Arghhh ...

ViN86
02-21-12, 11:01 AM
I don't know if this is related:
http://forums.nvidia.com/index.php?showtopic=218028

Have you tried updating the BIOS? Maybe contact Tyan?

http://www.tyan.com/support_download_bios.aspx?model=B.FT72B7015

ViN86
02-21-12, 12:32 PM
Well, the time and resources spent just to get this working are also expensive for a company.

A GeForce Solution costs around $4800. An equivalent Tesla solution around $19000.

Based on our "rates" at work, you'd probably have 2 work weeks to get it running (and also keep it running!). Otherwise the Tesla solution would be cheaper already.
And that's just considering the cost for 1 person, no resources included. The effective duration would be shorter.
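
The break-even reasoning above can be made explicit with a small sketch. The two prices are the approximate figures quoted; the 40-hour work week and the function names are assumptions added for illustration.

```python
GEFORCE_COST = 4800   # approximate figure quoted above, USD
TESLA_COST = 19000    # approximate figure quoted above, USD
HOURS_PER_WEEK = 40   # assumption: a standard 40-hour work week


def break_even_hours(hourly_rate, cheap=GEFORCE_COST, expensive=TESLA_COST):
    """Hours of troubleshooting after which the price gap is used up."""
    return (expensive - cheap) / hourly_rate


def implied_hourly_rate(weeks, cheap=GEFORCE_COST, expensive=TESLA_COST):
    """Hourly rate at which `weeks` of work costs as much as the price gap."""
    return (expensive - cheap) / (weeks * HOURS_PER_WEEK)


if __name__ == "__main__":
    print("Price gap (USD):", TESLA_COST - GEFORCE_COST)
    print("Rate implied by a two-week break-even:", implied_hourly_rate(2))
```

With the quoted prices the gap is $14,200, so a break-even of two 40-hour weeks corresponds to a rate of about $177.50 per hour for one person.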

The Tesla solution is not that expensive for a company, big or small.

I've worked for some small companies that would disagree. I mean, he does have 7 GPU's running. Just having issues with that 8th.