02-11-11, 01:53 PM   #1
jesmith
Registered User
 
Join Date: Mar 2007
Posts: 64
Using Dual Tesla's (Amazon EC2)

I'm using the new Amazon EC2 dual-Tesla configuration.

I'm running OpenGL, and using the graphics card to render to a frame buffer, which I then compress and send as a video stream.

My plan is to have half my server processes use one Tesla card and the other half use the other card, but I can't figure out how to talk to the second card.

This demonstrates my issue:

[root@ip-10-17-162-227 tmp]# echo $XAUTHORITY
/var/gdm/:0.Xauth
[root@ip-10-17-162-227 tmp]# export DISPLAY=:0.0
[root@ip-10-17-162-227 tmp]# xterm
Warning: Cannot convert string "nil2" to type FontStruct
^C
# note that this xterm worked
[root@ip-10-17-162-227 tmp]# export DISPLAY=:0.1
[root@ip-10-17-162-227 tmp]# xterm
Warning: This program is an suid-root program or is being run by the root user.
The full text of the error or warning message cannot be safely formatted
in this environment. You may get a more descriptive message by running the
program as a non-root user or by removing the suid bit on the executable.
xterm Xt error: Can't open display: %s
# note that this xterm didn't
[root@ip-10-17-162-227 tmp]#

I can open an xterm on :0.0, but not on :0.1. Everything I've read says that when you have two cards, this is how you direct things to run on one or the other.

Apparently not. So does anyone know how to get a process to use the second card?
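
The plan described above (half the processes on one card, half on the other) can be sketched as a small launcher that picks an X screen per worker. This is only an illustration: it assumes the X server on display :0 has already been configured with one screen per GPU, and the function names `display_for_worker` and `run_on_screen` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a worker index to an X screen on display :0.
# Assumes one X screen per GPU (:0.0, :0.1, ...), assigned round-robin.
display_for_worker() {
    worker_index=$1
    num_screens=$2
    echo ":0.$((worker_index % num_screens))"
}

# Run a command with DISPLAY set for the given worker.
# Arguments after the first two are the command line to execute.
run_on_screen() {
    worker_index=$1
    num_screens=$2
    shift 2
    DISPLAY=$(display_for_worker "$worker_index" "$num_screens") "$@"
}
```

For example, `run_on_screen 3 2 ./render_server` would start a (hypothetical) `./render_server` with `DISPLAY=:0.1`, so even-numbered workers land on the first screen and odd-numbered workers on the second.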
Attached Files
File Type: gz nvidia-bug-report.log.gz (24.7 KB, 92 views)
02-11-11, 02:23 PM   #2
AaronP
NVIDIA Corporation
 
Join Date: Mar 2005
Posts: 2,487
Re: Using Dual Tesla's (Amazon EC2)

You have to actually configure X to use both cards. Normally I'd say to use "nvidia-xconfig -a --separate-x-screens", but I have no idea whether that will work on Amazon EC2. It's worth a shot. Check /var/log/Xorg.0.log to see how your screens were actually initialized.
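
A rough way to carry out the check suggested above is to count how many distinct NVIDIA X screens appear in the server log. The sketch below assumes the driver prefixes its per-screen log messages with `NVIDIA(0)`, `NVIDIA(1)`, and so on; the function name `count_nvidia_screens` is made up for this example.

```shell
#!/bin/sh
# Regenerate the X configuration with one X screen per GPU (the command from
# the post). It modifies the X config file, so it is left commented out here:
#   nvidia-xconfig -a --separate-x-screens

# Count distinct NVIDIA X screens mentioned in an Xorg log read from stdin.
# Assumption: per-screen driver messages are prefixed "NVIDIA(<screen number>)".
count_nvidia_screens() {
    grep -o 'NVIDIA([0-9][0-9]*)' | sort -u | wc -l | tr -d ' '
}
```

Running `count_nvidia_screens < /var/log/Xorg.0.log` after restarting X should print 2 if both cards were initialized as separate screens.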
02-11-11, 02:32 PM   #3
jesmith
Registered User
 
Join Date: Mar 2007
Posts: 64
Re: Using Dual Tesla's (Amazon EC2)

Did that (it's in the nvidia-bug-report). Relevant stuff from xorg.conf:

Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "Tesla M2050"
BusID "PCI:0:3:0"
Option "NoLogo" "true"
Option "AllowGLXWithComposite" "true"
Option "ModeValidation" "NoWidthAlignmentCheck" #AUTO
Option "ModeValidation" "AllowNon60HzDFPModes,NoVertRefreshCheck,NoHorizSyncCheck,NoWidthAlignmentCheck,NoDFPNativeResolutionCheck"

EndSection

Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "Tesla M2050"
BusID "PCI:0:4:0"
Option "NoLogo" "true"
Option "AllowGLXWithComposite" "true"
Option "ModeValidation" "NoWidthAlignmentCheck" #AUTO
Option "ModeValidation" "AllowNon60HzDFPModes,NoVertRefreshCheck,NoHorizSyncCheck,NoWidthAlignmentCheck,NoDFPNativeResolutionCheck"

EndSection

Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "ConnectedMonitor" "CRT"
Option "TwinView" "False"
Option "MetaModes" "1920x1440_60"
SubSection "Display"
Depth 24
Modes "1920x1440_60"
EndSubSection
EndSection

Section "Screen"
Identifier "Screen1"
Device "Device1"
Monitor "Monitor1"
DefaultDepth 24
Option "ConnectedMonitor" "CRT"
Option "TwinView" "False"
Option "MetaModes" "1920x1440_60"
SubSection "Display"
Depth 24
Modes "1920x1440_60"
EndSubSection
EndSection
02-11-11, 07:18 PM   #4
AaronP
NVIDIA Corporation
 
Join Date: Mar 2005
Posts: 2,487
Re: Using Dual Tesla's (Amazon EC2)

Sorry, I missed that one was attached. Looks like your Device1 bus ID was correct in Xorg.0.log.old:
Code:
(--) PCI: (0:3:0) nVidia Corporation unknown chipset (0x06de) rev 163, Mem @ 0xd2000000/25, 0xc0000000/26, 0xc4000000/26, I/O @ 0xc100/7, BIOS @ 0xd7000000/19
(--) PCI: (0:4:0) nVidia Corporation unknown chipset (0x06de) rev 163, Mem @ 0xd4000000/25, 0xc8000000/26, 0xcc000000/26, I/O @ 0xc180/7, BIOS @ 0xd7080000/19
but became incorrect when you ran the server again:
Code:
(--) PCI: (0:3:0) nVidia Corporation unknown chipset (0x06de) rev 163, Mem @ 0xd2000000/25, 0xc0000000/26, 0xc4000000/26, I/O @ 0xc100/7, BIOS @ 0xd7000000/19
(--) PCI: (0:5:0) nVidia Corporation unknown chipset (0x06de) rev 163, Mem @ 0xd4000000/25, 0xc8000000/26, 0xcc000000/26, I/O @ 0xc180/7, BIOS @ 0xd7080000/19
...
(WW) NVIDIA: No matching Device section for instance (BusID PCI:0:5:0) found
Maybe the PCI topology is changing on different Amazon servers?
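
The mismatch described above can be detected mechanically by comparing the BusID values in xorg.conf with the PCI addresses the server logs at startup. The following is a sketch under the assumption that the log lines look like the `(--) PCI: (0:5:0) nVidia ...` lines quoted in this post; the function names are invented for illustration.

```shell
#!/bin/sh
# Print NVIDIA PCI addresses from an Xorg log (stdin) in BusID form,
# e.g. "(--) PCI: (0:5:0) nVidia ..." becomes "PCI:0:5:0".
log_bus_ids() {
    sed -n 's/^(--) PCI:[* ]*(\([0-9:]*\)) nVidia.*/PCI:\1/p'
}

# Print the BusID values configured in an xorg.conf (stdin).
conf_bus_ids() {
    sed -n 's/^[[:space:]]*BusID[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print every configured BusID that the log does not report.
# Usage: stale_bus_ids /etc/X11/xorg.conf /var/log/Xorg.0.log
stale_bus_ids() {
    conf_file=$1
    log_file=$2
    conf_bus_ids < "$conf_file" | while read -r id; do
        log_bus_ids < "$log_file" | grep -qxF "$id" || echo "$id"
    done
}
```

With the configuration and second log excerpt from this thread, `stale_bus_ids` would report `PCI:0:4:0`, the Device1 address that no longer matches any card.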
02-15-11, 03:00 PM   #5
jesmith
Registered User
 
Join Date: Mar 2007
Posts: 64
Re: Using Dual Tesla's (Amazon EC2)

Awesome. I wrote a little script to fix the BusID values at boot time, and now it works correctly!

Thanks!
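
The boot-time script itself was not posted. One possible shape for such a fix, offered purely as a sketch, is to read the current PCI addresses from `lspci` and write them into the BusID lines of xorg.conf in order. It assumes `lspci` prints addresses as hexadecimal `bus:device.function` without a domain prefix, and that the Nth NVIDIA device should map to the Nth BusID line; both function names are hypothetical.

```shell
#!/bin/sh
# Convert an lspci address such as "00:03.0" (hexadecimal) to the decimal
# form used by xorg.conf, e.g. "PCI:0:3:0".
lspci_to_bus_id() {
    bus=${1%%:*}
    rest=${1#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Replace the Nth BusID value in an xorg.conf (stdin) with the Nth argument
# and print the updated configuration to stdout.
rewrite_bus_ids() {
    awk -v ids="$*" '
        BEGIN { n = split(ids, id, " "); i = 0 }
        /^[ \t]*BusID[ \t]/ && i < n {
            i++
            sub(/"[^"]*"/, "\"" id[i] "\"")
        }
        { print }
    '
}

# Possible use at boot (10de is NVIDIA's PCI vendor ID):
#   ids=$(lspci -d 10de: | while read -r addr _; do lspci_to_bus_id "$addr"; done)
#   rewrite_bus_ids $ids < /etc/X11/xorg.conf > /tmp/xorg.conf.new
```

Writing to a separate file first, as in the commented usage, leaves the original configuration intact until the result has been inspected.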

Are there any command-line utilities that I could use to check the memory and/or GPU utilization on these cards at run time? (I'd like to check that all my scripts are working correctly and that the processes are being distributed the way I want.)
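
The thread does not record an answer to this question. One candidate is `nvidia-smi`, which ships with the NVIDIA driver; the sketch below assumes a driver release whose `nvidia-smi` supports the `--query-gpu` option (older releases may only offer the plain `nvidia-smi -q` report). The wrapper names `gpu_stats` and `format_gpu_stats` are hypothetical.

```shell
#!/bin/sh
# Query per-GPU utilization and memory as CSV lines: "index, util, used, total".
# Assumption: the installed nvidia-smi supports --query-gpu.
gpu_stats() {
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits
}

# Turn those CSV lines (stdin) into a short human-readable summary.
format_gpu_stats() {
    awk -F', *' '{ printf "GPU %s: %s%% util, %s/%s MiB\n", $1, $2, $3, $4 }'
}
```

Running `gpu_stats | format_gpu_stats` periodically while the server processes are active would show whether the load is split across both cards.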
All times are GMT -5.