nV News Forums


asenjo 10-18-10 02:03 PM

Unable to install Nvidia driver in SLES 11
 
Hi all,

I'm sorry if this is not the appropriate forum for this issue. I'm trying to use CUDA on an HP ProLiant DL580 server (32 cores, 128 GB of main memory) connected to a brand-new Tesla S2050 with 4 Fermi GPUs. The server is running SLES 11 SP1 x86_64 in runlevel 3 with this kernel:

Code:

yuca:~ # uname -a
Linux yuca 2.6.32.23-0.3-default #1 SMP 2010-10-07 14:57:45 +0200 x86_64 x86_64 x86_64 GNU/Linux

The lspci command reports the following:

Code:

yuca:~ # lspci
00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 22)
....
....
81:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
81:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
86:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
87:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
87:01.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
87:02.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
87:03.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
8a:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
8f:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
90:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
90:01.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
90:02.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
90:03.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
93:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)

So it seems that the server can see the Tesla S2050 through the PCI bus.

However, when I try to install the latest NVIDIA driver (NVIDIA-Linux-x86_64-260.19.12.run.sh), I first get a warning: "You do not appear to have an NVIDIA GPU supported by the 260.19.12 NVIDIA Linux graphics driver installed in this system". I don't understand why, because on this page: http://www.nvidia.com/object/product...-S2050-us.html, SLES 11 is listed as one of the supported operating systems for the S2050, and here: http://www.nvidia.com/object/linux-d...12-driver.html the S2050 is listed as supported.

Then, although the module compiles successfully, it cannot be loaded: "ERROR: Unable to load the kernel module 'nvidia.ko'.". Looking at the /var/log/nvidia-installer.log file, I found these two important messages:

Code:

-> Kernel module load error: insmod: error inserting './kernel/nvidia.ko': -1
  No such device

and, in the kernel messages section:

Code:

  [276290.049770] NVRM: No NVIDIA graphics adapter found!

I would be really grateful if you could help me or point out what I'm doing wrong. Thank you very much in advance.

The whole nvidia-installer.log file follows:
Code:

yuca:~ # cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Mon Oct 18 17:44:07 2010
installer version: 260.19.12

PATH:
/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin
/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin

option status:
  license pre-accepted              : false
  update                            : false
  force update                      : false
  expert                            : false
  uninstall                          : false
  driver info                        : false
  precompiled interfaces            : true
  no ncurses color                  : false
  query latest version              : false
  no questions                      : false
  silent                            : false
  no recursion                      : false
  no backup                          : false
  kernel module only                : false
  sanity                            : false
  add this kernel                    : false
  no runlevel check                  : false
  no network                        : false
  no ABI note                        : false
  no RPMs                            : false
  no kernel module                  : false
  force SELinux                      : default
  no X server check                  : false
  no cc version check                : false
  run distro scripts                : true
  no nouveau check                  : false
  run nvidia-xconfig                : false
  sigwinch work around              : true
  force tls                          : (not specified)
  force compat32 tls                : (not specified)
  X install prefix                  : (not specified)
  X library install path            : (not specified)
  X module install path              : (not specified)
  OpenGL install prefix              : (not specified)
  OpenGL install libdir              : (not specified)
  compat32 install chroot            : (not specified)
  compat32 install prefix            : (not specified)
  compat32 install libdir            : (not specified)
  utility install prefix            : (not specified)
  utility install libdir            : (not specified)
  installer prefix                  : (not specified)
  doc install prefix                : (not specified)
  kernel name                        : (not specified)
  kernel include path                : (not specified)
  kernel source path                : (not specified)
  kernel output path                : (not specified)
  kernel install path                : (not specified)
  precompiled kernel interfaces path : (not specified)
  precompiled kernel interfaces url  : (not specified)
  proc mount point                  : /proc
  ui                                : (not specified)
  tmpdir                            : /tmp
  ftp mirror                        : ftp://download.nvidia.com
  RPM file list                      : (not specified)
  selinux chcon type                : (not specified)

Using: nvidia-installer ncurses user interface
WARNING: You do not appear to have an NVIDIA GPU supported by the 260.19.12
        NVIDIA Linux graphics driver installed in this system.  For further
        details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in
        the README available on the Linux driver download page at
        www.nvidia.com.
-> License accepted.
-> Installing NVIDIA driver version 260.19.12.
-> Performing CC sanity check with CC="cc".
-> Performing CC version check with CC="cc".
-> Kernel source path: '/lib/modules/2.6.32.23-0.3-default/source'
-> Kernel output path: '/lib/modules/2.6.32.23-0.3-default/build'
-> Performing rivafb check.
-> Performing nvidiafb check.
-> Performing Xen check.
-> Cleaning kernel module build directory.
  executing: 'cd ./kernel; make clean'...
-> Building kernel module:
  executing: 'cd ./kernel; make module SYSSRC=/lib/modules/2.6.32.23-0.3-defau
  lt/source SYSOUT=/lib/modules/2.6.32.23-0.3-default/build'...
  NVIDIA: calling KBUILD...
  make -C /lib/modules/2.6.32.23-0.3-default/build \
          KBUILD_SRC=/usr/src/linux-2.6.32.23-0.3 \
          KBUILD_EXTMOD="/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel" -f /u
  sr/src/linux-2.6.32.23-0.3/Makefile \
          modules
  test -e include/linux/autoconf.h -a -e include/config/auto.conf || (                \
          echo;                                                                \
          echo "  ERROR: Kernel configuration is invalid.";                \
          echo "        include/linux/autoconf.h or include/config/auto.conf are mis
  sing.";        \
          echo "        Run 'make oldconfig && make prepare' on kernel src to fix it
  .";        \
          echo;                                                                \
          /bin/false)
  mkdir -p /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.tmp_versions
  ; rm -f /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.tmp_versions/
  *
  make -f /usr/src/linux-2.6.32.23-0.3/scripts/Makefile.build obj=/tmp/selfgz3
  5869/NVIDIA-Linux-x86_64-260.19.12/kernel
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nv.o.d 
  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclude -I
  include2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.23-0
  .3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz35869/N
  VIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-prot
  otypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-func
  tion-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m
  64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulat
  e-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-
  sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector -fo
  mit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-after-statem
  ent -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Linux
  -x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-error
  -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=kern
  el -mno-red-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s" -D
  "KBUILD_BASENAME=KBUILD_STR(nv)"  -D"KBUILD_MODNAME=
  KBUILD_STR(nvidia)" -D"DEBUG_HASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35
  869/NVIDIA-Linux-x86_64-260.19.12/kernel/.tmp_nv.o /tmp/selfgz35869/NVIDIA-L
  inux-x86_64-260.19.12/kernel/nv.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nv_gvi.o
  .d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclud
  e -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.
  23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz358
  69/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-
  prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-
  function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O
  2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccum
  ulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -
  Wno-sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector
  -fomit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-after-sta
  tement -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Lin
  ux-x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-erro
  r -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=ke
  rnel -mno-red-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s"
  -D"KBUILD_BASENAME=KBUILD_STR(nv_gvi)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)
  " -D"DEBUG_HASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-Linux-x
  86_64-260.19.12/kernel/.tmp_nv_gvi.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-26
  0.19.12/kernel/nv_gvi.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nv-vm.o.
  d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclude
  -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.23
  -0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz35869
  /NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-pr
  ototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-fu
  nction-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mtune
  =generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoin
  g-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-comp
  are -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector -fomit-frame
  -pointer -fasynchronous-unwind-tables -g -Wdeclaration-after-statement -Wno-
  pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Linux-x86_64-2
  60.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-error -D__KERNE
  L__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=kernel -mno-r
  ed-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_B
  ASENAME=KBUILD_STR(nv_vm)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -D"DEBUG_H
  ASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19
  .12/kernel/.tmp_nv-vm.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kerne
  l/nv-vm.c
  /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-vm.c: In function
  ‘nv_sg_map_buffer’:
  /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-vm.c:151: warning:
  assignment makes integer from pointer without a cast
  /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-vm.c:236: warning:
  label ‘done’ defined but not used
  /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-vm.c:144: warning:
  unused variable ‘count’
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.os-agp.o
  .d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclud
  e -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.
  23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz358
  69/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-
  prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-
  function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O
  2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccum
  ulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-prote
  ctor -fomit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-afte
  r-statement -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVID
  IA-Linux-x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wn
  o-error -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmo
  del=kernel -mno-red-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s
  )=#s" -D"KBUILD_BASENAME=KBUILD_STR(os_agp)"  -D"KBUILD_MODNAME=KBUILD_STR(n
  vidia)" -D"DEBUG_HASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-L
  inux-x86_64-260.19.12/kernel/.tmp_os-agp.o /tmp/selfgz35869/NVIDIA-Linux-x86
  _64-260.19.12/kernel/os-agp.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.os-inter
  face.o.d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -I
  include -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2
  .6.32.23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/sel
  fgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-
  prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-
  function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O
  2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccum
  ulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -
  Wno-sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector
  -fomit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-after-sta
  tement -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Li
  nux-x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-err
  or -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=k
  ernel -mno-red-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s"
  -D"KBUILD_BASENAME=KBUILD_STR(os_interface)"  -D"KBUILD_MODNAME=KBUILD_STR(n
  vidia)" -D"DEBUG_HASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-L
  inux-x86_64-260.19.12/kernel/.tmp_os-interface.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/os-inte
  rface.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.os-regis
  try.o.d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Ii
  nclude -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.
  6.32.23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/self
  gz35869/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wst
  rict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-impl
  icit-function-declaration -Wno-format-security -fno-delete-null-pointer-chec
  ks -O2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -m
  accumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -p
  ipe -Wno-sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-prot
  ector -fomit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-aft
  er-statement -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVI
  DIA-Linux-x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-error -D__KERNEL__ -DMODU
  LE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=kernel -mno-red-zone -U
  DEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KB
  UILD_STR(os_registry)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -D"DEBUG_HASH=
  10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/
  kernel/.tmp_os-registry.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/ker
  nel/os-registry.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nv-i2c.o
  .d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclud
  e -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.
  23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz358
  69/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-
  prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-
  function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O
  2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-arg
  s -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -
  mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector -fomit-frame-poin
  ter -fasynchronous-unwind-tables -g -Wdeclaration-after-statement -Wno-point
  er-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19
  .12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-error -D__KERNEL__ -
  DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=kernel -mno-red-zo
  ne -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENA
  ME=KBUILD_STR(nv_i2c)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -D"DEBUG_HASH=
  10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/
  kernel/.tmp_nv-i2c.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/n
  v-i2c.c
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nvacpi.o
  .d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iinclud
  e -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6.32.
  23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfgz358
  69/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-
  prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-
  function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O
  2 -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccum
  ulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -
  Wno-sign-compare -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector
  -fomit-frame-pointer -fasynchronous-unwind-tables -g -Wdeclaration-after-sta
  tement -Wno-pointer-sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Li
  nux-x86_64-260.19.12/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-err
  or -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=k
  ernel -mno-red-zone -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE -D"KBUILD_STR(s)=#s"
  -D"KBUILD_BASENAME=KBUILD_STR(nvacpi)"  -D"KBUILD_MODNAME=KBUI
  LD_STR(nvidia)" -D"DEBUG_HASH=10" -D"DEBUG_HASH2=49" -c -o /tmp/selfgz35869/
  NVIDIA-Linux-x86_64-260.19.12/kernel/.tmp_nvacpi.o /tmp/selfgz35869/NVIDIA-L
  inux-x86_64-260.19.12/kernel/nvacpi.c
    ld -m elf_x86_64  -r -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/ke
  rnel/nvidia.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-kerne
  l.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv.o /tmp/selfgz35
  869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv_gvi.o /tmp/selfgz35869/NVIDIA-Li
  nux-x86_64-260.19.12/kernel/nv-vm.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260
  .19.12/kernel/os-agp.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel
  /os-interface.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/os-reg
  istry.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nv-i2c.o /tmp/
  selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nvacpi.o
  (cat /dev/null;  echo kernel//tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12
  /kernel/nvidia.ko;) > /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/
  modules.order
  make -f /usr/src/linux-2.6.32.23-0.3/scripts/Makefile.modpost
    scripts/mod/modpost -m -a -i /usr/src/linux-2.6.32.23-0.3-obj/x86_64/defau
  lt/Module.symvers -I /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/M
  odule.symvers  -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/Modu
  le.symvers -S -w  -N /usr/src/linux-2.6.32.23-0.3-obj/x86_64/default/Module
  .supported -s
  WARNING: could not find /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kerne
  l/.nv-kernel.o.cmd for /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel
  /nv-kernel.o
    cc -Wp,-MD,/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/.nvidia.m
  od.o.d  -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.3/include -Iin
  clude -Iinclude2 -I/usr/src/linux-2.6.32.23-0.3/include -I/usr/src/linux-2.6
  .32.23-0.3/arch/x86/include -include include/linux/autoconf.h  -I/tmp/selfg
  z35869/NVIDIA-Linux-x86_64-260.19.12/kernel -D__KERNEL__ -Wall -Wundef -Wstr
  ict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-impli
  cit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mtune=generi
  c -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args
  -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -mn
  o-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-stack-protector -fomit-frame-pointe
  r -fasynchronous-unwind-tables -g -Wdeclaration-after-statement -Wno-pointer
  -sign -fno-strict-overflow  -I/tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.1
  2/kernel -Wall -MD -Wsign-compare -Wno-cast-qual -Wno-error -D__KERNEL__ -DM
  ODULE -DNVRM -DNV_VERSION_STRING=\"260.19.12\" -mcmodel=kernel -mno-red-zone
  -UDEBUG -U_DEBUG -DNDEBUG  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_ST
  R(nvidia.mod)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -D"DEBUG_HASH=10" -D"D
  EBUG_HASH2=49" -DMODULE -c -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12
  /kernel/nvidia.mod.o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/n
  vidia.mod.c
    ld -r -m elf_x86_64 -T /usr/src/linux-2.6.32.23-0.3/scripts/module-common.
  lds --build-id -o /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nvid
  ia.ko /tmp/selfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nvidia.o /tmp/se
  lfgz35869/NVIDIA-Linux-x86_64-260.19.12/kernel/nvidia.mod.o
  NVIDIA: left KBUILD.
-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most
      frequently when this kernel module was built against the wrong or
      improperly configured kernel sources, with a version of gcc that differs
      from the one used to build the target kernel, or if a driver such as
      rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel
      module from obtaining ownership of the NVIDIA graphics device(s), or
      NVIDIA GPU installed in this system is not supported by this NVIDIA
      Linux graphics driver release.
     
      Please see the log entries 'Kernel module load error' and 'Kernel
      messages' at the end of the file '/var/log/nvidia-installer.log' for
      more information.
-> Kernel module load error: insmod: error inserting './kernel/nvidia.ko': -1
  No such device
-> Kernel messages:
  [  36.101352] eth0: no IPv6 routers present
  [  546.591477] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:24:36:a3:02:9c:08:00 SRC=150.214.109.13
  DST=150.214.109.84 LEN=64 TOS=0x00 PREC=0x00 TTL=63 ID=23295 DF PROTO=TCP
  SPT=35095 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT
  (020405B4010303030101080A1E58ADF30000000004020000)
  [  556.864993] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:24:36:a3:02:9c:08:00 SRC=150.214.109.13
  DST=150.214.109.84 LEN=64 TOS=0x00 PREC=0x00 TTL=63 ID=21435 DF PROTO=TCP
  SPT=41638 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT
  (020405B4010303030101080A1E58AE590000000004020000)
  [  577.509559] JBD: barrier-based sync failed on cciss/c0d0p4 - disabling
  barriers
  [  645.751341] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:24:36:a3:02:9c:08:00 SRC=150.214.109.13
  DST=150.214.109.84 LEN=64 TOS=0x00 PREC=0x00 TTL=63 ID=18974 DF PROTO=TCP
  SPT=47962 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT
  (020405B4010303030101080A1E58B1D20000000004020000)
  [  700.135815] nvidia: module license 'NVIDIA' taints kernel.
  [  700.135819] Disabling lock debugging due to kernel taint
  [  700.542134] NVRM: No NVIDIA graphics adapter found!
  [44240.339713] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:15:e8:af:22:02:08:00 SRC=95.79.26.146
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=45 ID=2151 DF PROTO=TCP
  SPT=35377 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A00FCD5260000000001030307)
  [45980.056651] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:15:e8:af:22:02:08:00 SRC=66.240.52.5
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=45 ID=41866 DF PROTO=TCP
  SPT=37427 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A08CA97270000000001030307)
  [172620.812792] SFW2-INext-DROP-DEFLT IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:0c:29:fc:f0:d0:08:00 SRC=150.214.109.7
  DST=150.214.109.84 LEN=353 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
  SPT=67 DPT=68 LEN=333
  [172620.815096] SFW2-INext-DROP-DEFLT IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:0c:29:22:e7:aa:08:00 SRC=150.214.109.1
  DST=150.214.109.84 LEN=353 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
  SPT=67 DPT=68 LEN=333
  [204601.250607] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:15:e8:af:22:02:08:00 SRC=87.233.170.130
  DST=150.214.109.84 LEN=48 TOS=0x00 PREC=0x00 TTL=108 ID=62661 PROTO=TCP
  SPT=17905 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT (0204056401010402)
  [208778.459994] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:15:e8:af:22:02:08:00 SRC=87.233.170.130
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=44 ID=4033 DF PROTO=TCP
  SPT=58973 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0707A3150000000001030306)
  [238402.525866] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=42300 DF PROTO=TCP
  SPT=59420 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DC82F230000000001030307)
  [238860.247562] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=53885 DF PROTO=TCP
  SPT=53465 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DCF33610000000001030307)
  [238865.168467] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=41865 DF PROTO=TCP
  SPT=53572 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DCF46C00000000001030307)
  [238869.130913] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=1872 DF PROTO=TCP
  SPT=53687 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DCF563A0000000001030307)
  [238876.887722] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=7247 DF PROTO=TCP
  SPT=53887 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DCF74930000000001030307)
  [238882.609385] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:1c:eb:a8:a2:2c:08:00 SRC=119.226.71.89
  DST=150.214.109.84 LEN=60 TOS=0x00 PREC=0x00 TTL=36 ID=48747 DF PROTO=TCP
  SPT=54021 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 OPT
  (020405640402080A0DCF8AF20000000001030307)
  [275011.347300] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:24:36:a3:02:9c:08:00 SRC=150.214.109.13
  DST=150.214.109.84 LEN=64 TOS=0x00 PREC=0x00 TTL=63 ID=16543 DF PROTO=TCP
  SPT=45715 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT
  (020405B4010303030101080A1E5DAAC60000000004020000)
  [275799.921757] NVRM: No NVIDIA graphics adapter found!
  [275840.621579] SFW2-INext-ACC-TCP IN=eth0 OUT=
  MAC=d8:d3:85:62:aa:84:00:24:36:a3:02:9c:08:00 SRC=150.214.109.13
  DST=150.214.109.84 LEN=64 TOS=0x00 PREC=0x00 TTL=63 ID=36444 DF PROTO=TCP
  SPT=35639 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 OPT
  (020405B4010303030101080A1E5DCB2C0000000004020000)
  [275917.899033] NVRM: No NVIDIA graphics adapter found!
  [276290.049770] NVRM: No NVIDIA graphics adapter found!
ERROR: Installation has failed.  Please see the file
      '/var/log/nvidia-installer.log' for details.  You may find suggestions
      on fixing installation problems in the README available on the Linux
      driver download page at www.nvidia.com.
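
In a log this long, the two entries that matter (the insmod failure and the NVRM kernel message) are easy to miss among the firewall lines. A minimal Python sketch for pulling them out, assuming the log layout shown in this post; the helper name `extract_load_failure` is hypothetical and not part of any NVIDIA tool:

```python
def extract_load_failure(log_text):
    """Return (load_errors, nvrm_messages) found in nvidia-installer log text."""
    lines = log_text.splitlines()
    load_errors = []    # list of (error line, wrapped reason) tuples
    nvrm_messages = []  # distinct messages logged by the NVRM kernel module
    prefix = "-> Kernel module load error:"
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            # The reason (for example "No such device") is wrapped onto the next line.
            reason = lines[index + 1].strip() if index + 1 < len(lines) else ""
            load_errors.append((line[len("-> "):].strip(), reason))
        elif "NVRM:" in line:
            message = line.split("NVRM:", 1)[1].strip()
            if message not in nvrm_messages:
                nvrm_messages.append(message)
    return load_errors, nvrm_messages


# Sample lines copied from the log in this post.
sample = """\
-> Kernel module load error: insmod: error inserting './kernel/nvidia.ko': -1
  No such device
-> Kernel messages:
  [275917.899033] NVRM: No NVIDIA graphics adapter found!
  [276290.049770] NVRM: No NVIDIA graphics adapter found!
"""
errors, messages = extract_load_failure(sample)
print(errors)
print(messages)
```

To run it against the real file, pass the contents of /var/log/nvidia-installer.log instead of `sample`.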


AaronP 10-19-10 08:44 AM

Re: Unable to install Nvidia driver in SLES 11
 
You abridged your lspci log and didn't attach an nvidia-bug-report.log.gz so it's hard to say for sure, but it looks like only the Tesla bridges are visible, and that the GPUs themselves are not. This could be caused by a system BIOS problem, or a power problem on the Tesla boxes, or loose cables, or a variety of other problems. I would recommend contacting HP support.
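
The distinction made here, bridges visible but no GPU endpoints, can be checked mechanically against full `lspci` output. A minimal Python sketch, assuming GPU endpoints are reported under the lspci class names "VGA compatible controller" or "3D controller" (an assumption; no such line appears in this thread). The function name `summarize_nvidia_devices` is hypothetical:

```python
import re

# Matches plain lspci lines such as
# "86:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch ... (rev a3)"
LSPCI_LINE = re.compile(r"^(?P<slot>[0-9a-fA-F:.]+)\s+(?P<cls>[^:]+):\s+(?P<desc>.*)$")

# Assumption: GPU endpoints show up under one of these class names.
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def summarize_nvidia_devices(lspci_text):
    """Return (bridge_slots, gpu_slots) for NVIDIA devices in lspci output."""
    bridges, gpus = [], []
    for line in lspci_text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if not match or "nvidia" not in match.group("desc").lower():
            continue  # not a device line, or not an NVIDIA device
        if match.group("cls") == "PCI bridge":
            bridges.append(match.group("slot"))
        elif match.group("cls") in GPU_CLASSES:
            gpus.append(match.group("slot"))
    return bridges, gpus


# Sample lines copied from the first post.
sample = """\
81:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
86:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
87:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
"""
bridges, gpus = summarize_nvidia_devices(sample)
print(len(bridges), "NVIDIA bridges,", len(gpus), "NVIDIA GPU endpoints")
```

A result with bridges but an empty GPU list matches the situation described in this reply: the switch hardware enumerates, but nothing behind it does.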

AaronP 10-19-10 08:48 AM

Re: Unable to install Nvidia driver in SLES 11
 
Oh wait, maybe I misread -- if you bought the Tesla S2050 directly from NVIDIA instead of through HP, then I'd suggest you contact NVIDIA support instead. ;)

You should have a developer relations / support contact person that should be able to help you through any setup issues.

asenjo 10-19-10 10:14 AM

Re: Unable to install Nvidia driver in SLES 11
 
Thank you, AaronP. I've contacted both HP and NVIDIA support, but I'm still waiting for help. It's true that there are many possible causes for this problem. I will double-check the PCIe cables, but they should be properly connected. Regarding a possible BIOS or power problem, are you aware of any tool or Linux command I can use to check whether the GPUs are running? Is there any particular BIOS setting that must be enabled for the server to communicate with the Tesla box? To isolate the problem, I will first connect the Tesla to another server to determine whether the problem is on the HP side or the NVIDIA side. Thank you very much.

asenjo 10-21-10 10:54 AM

Re: Unable to install Nvidia driver in SLES 11
 
Update. I've received news from nVidia support:

Quote:

"We now have a clear response from HP. There is a problem with the BIOS of the DL580 which results in the GPUs in the S2050 chassis not being recognised. Also, the bandwidth between the DL580 and the GPUs is not as high as it should be; the bandwidth problem has been observed when testing an earlier NVIDIA product (S1070) with the DL580. HP is working on the BIOS problem, and also HP and NVIDIA are cooperating to resolve the bandwidth problem."

So, I'm waiting... Thanks.

ltpvu 10-21-10 06:27 PM

Re: Unable to install Nvidia driver in SLES 11
 
Hi AaronP. I have the same problem. We bought two S2050 servers from NVIDIA to install in our cluster, which runs CentOS 5.3. I could not get the driver installed because of errors that are exactly the same as asenjo's.

Here is the PCI device list of one cluster node, which is connected to one PCI port of the Tesla S2050:

Quote:

03:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
04:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
04:01.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
04:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
04:03.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
07:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
08:00.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])
08:02.0 PCI bridge: nVidia Corporation Tesla S870 (rev a3) (prog-if 00 [Normal decode])

Our cluster consists of Altus 1702 servers from Penguin. Could you tell me whether this is also a BIOS problem, and how I can fix it?

One odd thing is that the drivers on the NVIDIA CD (e.g. NVIDIA-Linux-x86_64-195.36.15-pkg2.run) may not support the S2050. I could not find the Tesla S2050 in the support list of the README file in the doc directory of the installation package. I want to use NVIDIA-Linux-x86_64-260.19.12.run, since the S2050 is in its support list, but it sounds as though CUDA supports only the Red Hat 2.6.18-194 kernel.

I have four more questions and would appreciate your answers:

1. Will the CUDA 3.2 package work on CentOS 5.3 (kernel 2.6.18-128.1.1.el5.530g0000, gcc 4.1.2)?

2. If not, will the driver package (e.g. NVIDIA-Linux-x86_64-195.36.15-pkg2.run) work with the Tesla S2050?

3. If I install NVIDIA-Linux-x86_64-260.19.12.run (which is said to support the Tesla S2050), will it work with an older CUDA toolkit such as 3.1 or 3.0?

4. Could you show me how to install these S2050 drivers manually? I tried copying 'nvidia.ko' to "/lib/modules/$(uname -r)/kernel/drivers/video/", but it does not work. When I ran "modprobe nvidia", I got the error message "FATAL: Module nvidia not found".
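
On the "Module nvidia not found" message in question 4: `modprobe` looks modules up in the `modules.dep` index that `depmod` generates, so a `.ko` file copied by hand is not found until that index is rebuilt (typically with `depmod -a`). A minimal Python sketch that checks whether a module file is listed in the index; the function name `module_is_indexed` and the sample index contents are hypothetical, and this only explains the modprobe message, not the missing GPUs:

```python
import posixpath


def module_is_indexed(modules_dep_text, module_file="nvidia.ko"):
    """Return True if modules.dep text has an entry whose path ends in module_file."""
    for line in modules_dep_text.splitlines():
        # Each entry has the form "<module path>: <dependency paths...>".
        path, sep, _deps = line.partition(":")
        if sep and posixpath.basename(path.strip()) == module_file:
            return True
    return False


# Hypothetical index contents, for illustration only.
sample_index = """\
kernel/drivers/video/fb.ko:
kernel/drivers/video/vga16fb.ko: kernel/drivers/video/fb.ko
"""
print(module_is_indexed(sample_index))
```

On a real system the text would come from /lib/modules/$(uname -r)/modules.dep; if the check is false after copying the file, rebuilding the index is the usual next step.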

Thank you


All times are GMT -5.
