Old 09-14-12, 04:45 PM   #1
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default K1000M on Thinkpad W530: card falls off the bus

Hi,

I've been racking my brain trying to get Bumblebee to work. After some generous help on the #bumblebee IRC channel, I started doing some low-level testing and managed to narrow down the problem.

Distro is Gentoo. Kernel version is 3.2 (I can try with 3.5 or other kernels but I suspect the problem will persist). Driver versions I have tried are 304.22, 304.43 and 304.48.

Either doing this:

Code:
# modprobe nvidia
# nvidia-xconfig -query-gpu-info
or this:

Code:
# nvidia-xconfig -query-gpu-info
from a tty as soon as the system has started (no X running, though I suspect the results would be the same with X) yields this in dmesg:

Code:
[   48.990268] nvidia: module license 'NVIDIA' taints kernel.
[   48.990271] Disabling lock debugging due to kernel taint
[   49.030980] nvidia 0000:01:00.0: power state changed by ACPI to D0
[   49.030984] nvidia 0000:01:00.0: power state changed by ACPI to D0
[   49.030987] nvidia 0000:01:00.0: enabling device (0004 -> 0007)
[   49.030992] nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   49.030999] nvidia 0000:01:00.0: setting latency timer to 64
[   49.031003] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[   49.031104] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  304.22  Mon Jul  9 21:07:07 PDT 2012
[   54.728024] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[   54.728039] NVRM: os_pci_init_handle: invalid context!
[   54.728041] NVRM: os_pci_init_handle: invalid context!
[   54.728045] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[   54.728048] NVRM: os_pci_init_handle: invalid context!
[   54.728049] NVRM: os_pci_init_handle: invalid context!
[   54.971061] NVRM: RmInitAdapter failed! (0x26:0xffffffff:1181)
[   54.971071] NVRM: rm_init_adapter(0) failed
[   54.974870] NVRM: RmInitAdapter failed! (0x23:0x2f:675)
[   54.974873] NVRM: rm_init_adapter(0) failed
lspci output:

Code:
01:00.0 VGA compatible controller [0300]: nVidia Corporation Device [10de:0ffc] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: nvidia
No Bumblebee component (bbswitch, bumblebeed, optirun) was run; as I said, I managed to narrow the problem down to this.
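
For anyone comparing their own logs against the dmesg output above, here is a minimal sketch of a filter that pulls out the PCI address of each GPU reported as "fallen off the bus". The function name fallen_gpus is made up for illustration, and the pattern assumes the exact NVRM message wording shown in this post; other driver versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA driver or Bumblebee).
# Reads kernel log text on stdin and prints the PCI address of every GPU
# that an NVRM line reports as "fallen off the bus", one per line,
# with duplicates removed.
fallen_gpus() {
    # Assumes the message format quoted in this thread:
    #   NVRM: GPU at 0000:01:00.0 has fallen off the bus.
    sed -n 's/.*NVRM: GPU at \([0-9a-fA-F:.]*\) has fallen off the bus.*/\1/p' |
        sort -u
}
```

It could be used as `dmesg | fallen_gpus`; empty output would mean no such message was logged.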

Any ideas?

Thanks in advance
Cheers
GODLiKE
Old 09-16-12, 10:26 PM   #2
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

Forgot to attach nvidia-bug-report: http://www.vicarious.com.ar/~godlike...-report.log.gz (be patient, it's my home connection)
Old 09-18-12, 12:06 AM   #3
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: K1000M on Thinkpad W530: card falls off the bus

Don't you need to put 'optirun' in front of any command you want to run on the NVIDIA card? For instance, lspci tells you 'unknown header type 7f' because the card is off (i.e. in a low-power state), so if you run 'optirun lspci' you should see more useful information. If you run nvidia-settings, you also need to specify the X display to use, i.e. "optirun nvidia-settings -c :8".

And you shouldn't need to worry about nvidia-xconfig; just edit the config file that Bumblebee is using (e.g. on Ubuntu it is /etc/bumblebee/xorg.conf.nvidia).
Old 09-18-12, 08:41 AM   #4
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

optirun is only needed when you want to run an application (e.g. a game) on the dedicated GPU. Commands such as nvidia-xconfig and nvidia-smi do not need optirun, as they work at a lower level.

Moreover, what optirun basically does is run whatever you put after "optirun" on a separate X server running on the dedicated GPU, and then draw the results back to the main display. "optirun lspci" does not make sense in this scenario.
Old 09-18-12, 10:05 AM   #5
rockob
Registered User
 
Join Date: Nov 2008
Posts: 95
Default Re: K1000M on Thinkpad W530: card falls off the bus

Perhaps, but in bumblebee you need to use optirun to enable the nvidia card and the nvidia libraries. Otherwise you're just using the intel card and the intel libraries and the nvidia card is turned off.

This is why your lspci command couldn't get any details about the nvidia card, and why "optirun lspci" makes perfect sense. For instance, on my system:

Code:
optirun lspci -s 1:00.0  -v
which gives:

Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 050e
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at d0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	[virtual] Expansion ROM at f1000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidia_current, nouveau, nvidiafb
whereas what you tried:

Code:
lspci -s 1:00.0  -v
gives

Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev ff) (prog-if ff)
	!!! Unknown header type 7f
Old 09-18-12, 12:12 PM   #6
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

I should have mentioned it before, but lspci errors out only after I get the "fallen off the bus" error (which happens whenever I try to use the GPU).

Here's what I get after a clean reboot with nothing loaded (no bbswitch, no nvidia module, nothing). I can also modprobe nvidia and run lspci afterwards, and the result is the same. I also tried modprobing both nvidia and bbswitch and manually power-cycling the card, which works. Only after doing something that actually uses the card (be it optirun, nvidia-xconfig, nvidia-smi, or a CUDA program) does the GPU fall off the bus, and the lspci output is then as shown in my first post.

Code:
panther godlike # lspci -d 10de: -vvnn
01:00.0 VGA compatible controller [0300]: nVidia Corporation Device [10de:0ffc] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device [17aa:21f5]
	Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at f0000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Region 1: Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
	Region 3: Memory at d0000000 (64-bit, prefetchable) [disabled] [size=32M]
	Region 5: I/O ports at 5000 [disabled] [size=128]
	Expansion ROM at f1000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900 v1] #19
	Kernel modules: nvidia
The difference in your case is surely that you started Bumblebee before running those commands and, by default, Bumblebee turns off the card. During my debug sessions I set Bumblebee not to turn off the card when loaded (which, in turn, made bbswitch keep the card on when modprobing it).
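
The two lspci states discussed in this thread (a readable device at "rev a1" versus an unreadable one at "rev ff") can be told apart mechanically. Below is a minimal sketch, assuming that "(rev ff)" on the device line means the configuration space reads back as all ones, which is what happens both when the card is powered off and after it has fallen off the bus. The function name unreadable_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and prints the slot
# (first field, e.g. 01:00.0) of every device line that shows "(rev ff)".
# Indented detail lines such as "!!! Unknown header type 7f" are ignored
# because they do not contain the "(rev ff)" marker.
unreadable_devices() {
    awk '/\(rev ff\)/ { print $1 }'
}
```

For example, `lspci -d 10de: | unreadable_devices` would list only NVIDIA devices in that state. Note that it cannot distinguish "switched off by bbswitch" from "fell off the bus"; that still needs the dmesg output.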
Old 09-18-12, 09:44 PM   #7
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

I just booted an Ubuntu 12.04 x64 live CD and can confirm that the GPU is working. At least nvidia-xconfig -query-gpu-info now gives me something.
Old 09-19-12, 03:07 AM   #8
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

Fixed it. I was missing these two kernel options:

Code:
CONFIG_NO_HZ:                                                
                                                             
This option enables a tickless system: timer interrupts will 
only trigger on an as-needed basis both when the system is   
busy and when the system is idle.                            


CONFIG_RCU_FAST_NO_HZ:                                          
                                                                
This option causes RCU to attempt to accelerate grace periods   
in order to allow CPUs to enter dynticks-idle state more        
quickly.  On the other hand, this option increases the overhead 
of the dynticks-idle checking, particularly on systems with     
large numbers of CPUs.
The second one depends on the first. After enabling both of those, I could query my GPU. I don't know why both are needed, but I'm guessing it's something to do with the interrupts. On my main desktop machine, only the first one is set.
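
To check whether a given kernel has these two options set, something along the following lines may help. It is only a sketch: check_kconfig is a made-up name, and it simply looks for "OPTION=y" or "OPTION=m" lines in whatever kernel config text it is fed on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads a kernel .config on stdin and, for each
# option name passed as an argument, prints "<option>: set" if the config
# contains <option>=y or <option>=m, and "<option>: missing" otherwise
# (including the "# <option> is not set" case).
check_kconfig() {
    cfg=$(cat)    # slurp stdin once so it can be searched per option
    for opt in "$@"; do
        if printf '%s\n' "$cfg" | grep -Eq "^${opt}=(y|m)$"; then
            echo "$opt: set"
        else
            echo "$opt: missing"
        fi
    done
}
```

On Gentoo the config usually lives at /usr/src/linux/.config, so a call could look like `check_kconfig CONFIG_NO_HZ CONFIG_RCU_FAST_NO_HZ < /usr/src/linux/.config`. If the running kernel was built with CONFIG_IKCONFIG_PROC, `zcat /proc/config.gz | check_kconfig ...` should work as well.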

Anyway, I'm off to sleep. Hope this helps somebody.

Old 09-19-12, 03:32 PM   #9
Godlikearg
Registered User
 
Join Date: Jan 2009
Posts: 12
Default Re: K1000M on Thinkpad W530: card falls off the bus

One more thing: after doing more testing at the request of the Bumblebee guys, I could see that IOMMU kernel configuration has an impact too. Without this option compiled in:

Code:
CONFIG_CALGARY_IOMMU:                                       
                                                            
Support for hardware IOMMUs in IBM's xSeries x366 and x460  
systems. Needed to run systems with more than 3GB of memory 
properly with 32-bit PCI devices that do not support DAC    
(Double Address Cycle). Calgary also supports bus level     
isolation, where all DMAs pass through the IOMMU.  This     
prevents them from going anywhere except their intended     
destination. This catches hard-to-find kernel bugs and      
mis-behaving drivers and devices that do not use the DMA-API
properly to set up their DMA buffers.  The IOMMU can be     
turned off at boot time with the iommu=off parameter.       
Normally the kernel will make the right choice by itself.   
If unsure, say Y.
I do not get "has fallen off the bus", but I do get the following messages (rminitcontext etc.) and the card is unusable.