Old 11-17-05, 09:58 PM   #1
yitzhakbg
Registered User
 
Join Date: Feb 2003
Location: ISRAEL
Posts: 31
Default Dismayed by X crashes after logout

At this juncture, all I can do is express my dismay. The X server crash phenomenon after logout with multiple local X servers under GDM has been known for a long time. There have been numerous posts, but no solution. We have tried all the suggestions on the forum posts, but to no avail. Worse yet, there is only silence from Nvidia. We're close to losing an important client whose patience has run out. Shame I can't attach a few tears to this post after all the sleepless nights I've invested in attempting to apply a solution.
It is definitely an Nvidia issue. Users of the Linux open nv driver seldom see the crash. I'd be happy to forgo the accelerated video and use nv, but we have not been able to run the nv driver on multiple local X servers. Multi-local X server users with chips from other manufacturers do not have the problem.
In misery,
Yitzhak
P.S. Don't get me wrong, fellows. I'm sure you Nvidia people work hard and the Linux community appreciates that. But unfortunately, we need a solution or a workaround. What choice do we have now?
__________________
Best,
Yitzhak
Old 11-17-05, 10:31 PM   #2
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: Dismayed by X crashes after logout

Please post a bug report, along with information about your hardware. This is clearly a problem specific to your environment, and not global to all users.

Thanks,
Lonni
Old 11-22-05, 10:25 AM   #3
yitzhakbg
Registered User
 
Join Date: Feb 2003
Location: ISRAEL
Posts: 31
Default Re: Dismayed by X crashes after logout

Thank you very much for your willingness to assist.
Attached is a report after a typical crash. The crash invariably occurs after a logout on the multi-user system. In this case, it was X2. It may not crash immediately after logout, or at all, but we almost never see crashes until after the first logout.
Preventing a reset of the X server (AlwaysRestartServer=false in gdm.conf) does not help.
Like many of my colleagues running multiple local X servers, I sense it is an Nvidia problem. Indicative is the fact that your bug report script logs only the first X server (X0). I have included the other X server logs in the attached zip file together with gdm.conf.
Believe me, we're desperate.
Attached Files
File Type: zip nvidia-bug-report.zip (43.0 KB, 148 views)
__________________
Best,
Yitzhak
Old 11-22-05, 11:52 AM   #4
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: Dismayed by X crashes after logout

Yitzhak,
Does this problem also exist with 1.0-7676?

Which vendor's nForce2 motherboard are you using? Which BIOS version?

In my experience, ruby-kernel issues are often tied to the specific hardware & OS configuration in use. Is shipping a system to NVIDIA where this reproduces an option for you?

Thanks,
Lonni
Old 11-22-05, 02:18 PM   #5
mppardee
Registered User
 
Join Date: Jul 2004
Posts: 8
Default Re: Dismayed by X crashes after logout

Hello,
We too have an X crashing problem after logout, but we have also found a scenario where a similar problem occurs on initialization and is repeatable. We are NOT using ruby or anything kernel-specific, and the problem occurs on both Asus VIA and nForce4 boards. We filed a very detailed bug report on November 18th and haven't heard anything. Indeed, the problem is specific to situations where more than one X server is running, but it is not specific to a board or distro. So far we have not found one user or company that has been able to avoid this crashing with multiple users, using many different approaches, boards, and distros. Also, using the nv driver prevents the problem, but obviously that has other drawbacks, including that more recent cards won't even work with the nv driver.

Here is the bug report I filed. Is linux-bugs@nvidia.com still the correct address?
Dear NVIDIA,
My company sells multi-user computers, and we are in the process of
moving to Ubuntu (instead of standard debian) and switching to the
latest nvidia drivers (required because newer cards don't work with
the old nvidia drivers).

Summary:
Now we are seeing problems with X servers crashing and not being able
to start back up again. We have finally found a repeatable scenario
for this normally intermittent problem. There are 4 cards, each
independent with their own keyboard and mouse. The X server on :3
won't come up, and complains about /dev/nvidia1 in the gdm log, but x
server :3 should not be trying to use /dev/nvidia1. This same
problem occurs intermittently in other configurations, i.e. all
screens come up fine and 1/2 hour later when someone logs out/in one
screen crashes and won't come back up until we /etc/init.d/gdm
restart.

Key Points:
1. Please tell us what we need to do to help you debug this. we are
willing to put in the time to debug it, but we've tried everything we
can possibly think of over the past 2 weeks. We are willing to sign
an NDA, send you hardware, make a trip to your headquarters, etc.,
whatever it takes. we even have a multi-user live cd we can give you
so you don't need to spend time configuring your software
2. it is mainly intermittent, and occurs ~ 1/20 times when a user is
logging out/in -- however, we now have a situation in which all cards
will not start up and it is repeatable
3. the messages in /var/log/gdm/:X when display X crashes and won't
restart reference a /dev/nvidiaY device where Y is a different display
-- in other words, one video card is trying to interfere with another
card
4. running /etc/init.d/gdm restart will get the cards back up again
after a crash (but the repeatable problem where the fourth card won't
come up can't be fixed in any way)
5. we have finally found a completely repeatable situation under which
this occurs, so we can debug and get you any information you need
6. we have read about several other people having this exact same
problem, some on your forums. we have tried all the suggestions to no
avail
7. we have even tried your "leaked" 8168 drivers, which behave the same or worse
8. we have replicated this on several different motherboards. we saw
one thread where you blamed the motherboard, so we bought an asus
nvidia nforce4 chipset motherboard so we could have a complete nvidia
solution. We don't know of any motherboard that could possibly be
"better".
9. This particular repeatable problem happens to occur on a dual
pci-e motherboard with 2 pci-e cards and two pci cards, but as I
mentioned it happens with normal agp boards, so dual pci-e isn't the
problem
10. I know you probably think multi-user isn't important, but more
and more people are switching to it. if we can't resolve this, we
will have to look at other brands of video cards and nvidia's linux
and multi-user friendly reputation won't be as good. If we find the
root cause of this problem, that could fix other bugs or prevent more
serious problems down the road, so it really is in your best interest
to get this fixed.

If you have any questions, please let me know.
a bug report is available with extra log and configuration files included:
http://groovix.com/nvidia-bug-report...groovix.tar.gz

Thank You,
Michael Pardee
Open Sense Solutions LLC
http://opensensesolutions.com
Old 11-22-05, 02:26 PM   #6
zander
NVIDIA Corporation
 
Join Date: Aug 2002
Posts: 3,740
Default Re: Dismayed by X crashes after logout

@mppardee: your problem isn't necessarily related to yitzhakbg's. In your case, the problem appears to be related to exhaustion of the kernel's virtual address space. It is likely that disabling vesafb or, if it's already disabled, increasing the kernel's virtual address space size with the vmalloc kernel parameter (e.g. to 196MB) will help.
Old 11-22-05, 02:28 PM   #7
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: Dismayed by X crashes after logout

Michael,
I apologize for the delay in responding to your email. I've just sent you a reply, and am including it here as well:

Looking at the log file, it seems that the kernel's virtual address space is barely large enough to support your configuration under normal circumstances, and, depending on what's going on in the system, the space remaining with all but the last X server up can be fragmented enough for the register mapping attempt to fail.

The system in question is a Linux/x86 system with 2GB of RAM; the
kernel's virtual address space is < 128MB to begin with.

A couple of things worth trying:
- make sure vesafb is disabled (it's unclear if it's actually
attached to cards in this case).
- use the 'vmalloc' kernel parameter to increase the kernel's
virtual address space size (e.g. to 196MB).
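
For anyone wanting to check the second suggestion on their own machine, here is a minimal sketch. It assumes a kernel that reports `VmallocTotal` in `/proc/meminfo` and exposes the boot parameters in `/proc/cmdline`. The function names are illustrative only and are not part of any NVIDIA tool.

```python
import re


def vmalloc_total_mb(meminfo_text):
    """Return VmallocTotal from /proc/meminfo contents in MB, or None if absent."""
    match = re.search(r"^VmallocTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) / 1024 if match else None


def vmalloc_boot_param(cmdline_text):
    """Return the value of a vmalloc= boot parameter (e.g. '256M'), or None."""
    for token in cmdline_text.split():
        if token.startswith("vmalloc="):
            return token.split("=", 1)[1]
    return None
```

Pass the contents of `/proc/meminfo` to the first function and the contents of `/proc/cmdline` to the second. If the second returns None, the kernel was booted without an explicit `vmalloc=` setting.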

If neither of the above helps, and you are still willing to ship a system that easily reproduces this problem, please let me know, and I'll provide you with a shipping address. Getting a system that easily reproduces this is the best way to ensure that development can investigate further.

Thanks,
Lonni J Friedman
NVIDIA Corporation
Old 11-23-05, 02:25 PM   #8
mppardee
Registered User
 
Join Date: Jul 2004
Posts: 8
Default Re: Dismayed by X crashes after logout

For anyone else tracking this issue:
Using vmalloc=256M as a kernel parameter allowed us to initialize all screens successfully in that pci-e machine. However, our main problem is the crashes and failure to restart that occur after several login/logout cycles on all of our machines, which happens even with vmalloc=512M and vesafb disabled. The logs looked the same, so we thought the initialization problem was the same as the crashing problem. The main symptom is I/O errors on /dev/nvidiaX in the gdm log for display Y, where Y and X aren't the same.
We are sending a machine to nvidia for testing -- thanks!
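
The symptom described above (display Y's gdm log mentioning /dev/nvidiaX with X different from Y) could be checked with a rough sketch like the one below. It assumes that display :N is supposed to use /dev/nvidiaN, which holds for the setup described in this thread but may not hold elsewhere. The function name is hypothetical, and the exact wording of the log lines is not assumed; only the device path is matched.

```python
import re

DEVICE_RE = re.compile(r"/dev/nvidia(\d+)")


def foreign_device_lines(display, log_text):
    """Return log lines that mention a /dev/nvidiaN other than the display's own.

    Assumes display :N is meant to use /dev/nvidiaN, which may not match
    every configuration.
    """
    hits = []
    for line in log_text.splitlines():
        indices = {int(n) for n in DEVICE_RE.findall(line)}
        if any(n != display for n in indices):
            hits.append(line)
    return hits
```

For display :3, pass `3` and the text of that display's log under /var/log/gdm/. Any line returned is worth a closer look. `/dev/nvidiactl` is ignored because it carries no device index.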

Old 11-23-05, 04:59 PM   #9
yitzhakbg
Registered User
 
Join Date: Feb 2003
Location: ISRAEL
Posts: 31
Default Re: Dismayed by X crashes after logout

I'm pretty sure that Mike and I have the same problem. We also report that the problem does not occur until at least one logout (resetting X) has occurred. It also occurs at around the same frequency. The heck of it is that since it's a multi-user system, having to reset gdm knocks out the other two or three innocent users of the system.
Before we get down to the nitty-gritty, can one of you tell us how we can reset only the offending X server without having to restart GDM (or XDM, KDM)? That would take a lot of pain away while we work on solving the problem.
__________________
Best,
Yitzhak
Old 11-23-05, 05:28 PM   #10
netllama
NVIDIA Corporation
 
Join Date: Dec 2004
Posts: 8,763
Default Re: Dismayed by X crashes after logout

You could just kill the X process; however, I'm not sure that's going to serve as a reasonable workaround.

-Lonni
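
A minimal sketch of what "kill the X process" for a single display might look like on Linux, assuming the server binary is named `X` or `Xorg` and that the display (e.g. `:2`) appears as its own argument on the command line. All function names here are hypothetical. Whether the display manager then brings that one server back up cleanly is not confirmed anywhere in this thread.

```python
import os
import signal


def x_server_pids(display, processes):
    """Return PIDs whose command line looks like an X server on `display`.

    `processes` maps PID -> argv list. `display` is a string such as ":2".
    """
    pids = []
    for pid, argv in processes.items():
        if not argv:
            continue
        name = os.path.basename(argv[0])
        if name in ("X", "Xorg") and display in argv[1:]:
            pids.append(pid)
    return sorted(pids)


def read_processes(proc_root="/proc"):
    """Build the PID -> argv mapping from /proc/<pid>/cmdline (Linux only)."""
    processes = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        processes[int(entry)] = argv
    return processes


def kill_x_server(display):
    """Send SIGTERM to the X server(s) on `display`; returns the PIDs signalled."""
    pids = x_server_pids(display, read_processes())
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling `kill_x_server(":2")` as root would signal only the server on display :2 and leave the other seats alone. The matching step is kept separate from the kill so that it can be checked against a process list first.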
Old 11-24-05, 05:33 PM   #11
yitzhakbg
Registered User
 
Join Date: Feb 2003
Location: ISRAEL
Posts: 31
Default Re: Dismayed by X crashes after logout

After having implemented the multi-user local X server solution over the past years on different hardware platforms, both Intel and AMD, I am all the more convinced that Mike and I are talking about the same thing. Daniel Weingartner in Brazil, a multi-user pioneer, reluctantly had to abandon Nvidia in favor of SiS 315 boards, which solved the problem for him.
If I may suggest, you folks at Nvidia can easily implement your own multi-user system for testing in less than an hour. For you, it's a piece of cake. Just install a standard Linux distro with an AGP card and two or three PCI cards. Install Aivis' faketty module, see:
http://lkml.org/lkml/2005/10/4/25
It's quick and simple.
I've included xorg.conf and gdm.conf and you're off and running. Start GDM with three or four local servers and then crash the system by repeatedly pressing <CTRL><ALT><BKSP>. You can do it the polite way, by logging in and logging out until the servers crash, but <CTRL><ALT><BKSP> is quicker.
Believe me, if we had Nvidia working smoothly, the rapidly growing multi-user local X server community would embrace you. If you could provide us with a reasonably priced multi-GPU card (it absolutely must be multi-GPU with separate PCI addresses, like the Matrox G-550), we'd all be storming ahead. In the meantime, I'm up a creek and need a paddle fast. If I don't get a solution, we'll have to switch the video boards. Painful, but what can I do?
Yitzhak
Attached Files
File Type: zip confs.zip (10.5 KB, 136 views)
__________________
Best,
Yitzhak
Old 11-26-05, 07:24 PM   #12
mppardee
Registered User
 
Join Date: Jul 2004
Posts: 8
Default Re: Dismayed by X crashes after logout

Yitzhak, we are sending nvidia a machine on Monday, already set up with automatic logins/logouts to exhibit the problem. Multi-user is only a small fraction of their customer base and they are busy trying to get a new release out, so we have to do everything we can to make their job easier.

If the next driver release fixes the "NVRM: RmInitAdapter failed!" problem, I'm hoping that might fix our problem too. In case anyone cares, here's more info from /var/log/messages using the leaked nvidia 8168 driver, captured right before a total system freeze (7676 doesn't freeze the machine; a gdm restart gets things back to normal):

Nov 26 18:05:13 localhost kernel: [4294842.118000] NVRM: RmInitAdapter failed! (0x23:0xffffffff:676)
Nov 26 18:05:13 localhost kernel: [4294842.118000] e1165849
Nov 26 18:05:13 localhost kernel: [4294842.118000] Modules linked in: fuse rfcomm l2cap bluetooth cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave cpufreq_ondemand cpufreq_conservative nvidia agpgart video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi i2c_acpi_ec button battery container ac ipv6 af_packet analog gameport snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device pcspkr rtc ohci1394 shpchp pci_hotplug snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core dm_mod tsdev evdev sr_mod sbp2 ieee1394 psmouse mousedev parport_pc lp parport md ext3 jbd thermal processor fan usbhid skge sata_sil forcedeth ehci_hcd ohci_hcd usbcore sd_mod ide_cd cdrom ide_generic sata_nv libata scsi_mod amd74xx ide_core unix tileblit font bitblit cfbcopyarea cfbimgblt cfbfillrect softcursor capability commoncap
Nov 26 18:05:13 localhost kernel: [4294842.118000] CPU: 0
Nov 26 18:05:13 localhost kernel: [4294842.118000] EIP: 0060:[pg0+551372873/1069995008] Tainted: P VLI
Nov 26 18:05:13 localhost kernel: [4294842.118000] EFLAGS: 00013016 (2.6.12-10-386)
Nov 26 18:05:13 localhost kernel: [4294842.118000] EIP is at _nv002298rm+0x15/0x1c [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] eax: e4700000 ebx: c99a5800 ecx: 00000001 edx: 00000050
Nov 26 18:05:13 localhost kernel: [4294842.118000] esi: c9b08000 edi: ca3d0400 ebp: c94f5e10 esp: c94f5e10
Nov 26 18:05:13 localhost kernel: [4294842.118000] ds: 007b es: 007b ss: 0068
Nov 26 18:05:13 localhost kernel: [4294842.118000] Process Xorg (pid: 9713, threadinfo=c94f4000 task=c9971020)
Nov 26 18:05:13 localhost kernel: [4294842.118000] Stack: c94f5e40 e12aa06a ca3d0400 c99a5800 00000140 00000001 c99a5800 c9b08000
Nov 26 18:05:13 localhost kernel: [4294842.118000] 00000001 00075e40 c99a5800 c9b08000 c94f5e70 e116b9f1 c99a5800 00000140
Nov 26 18:05:13 localhost kernel: [4294842.118000] 00000001 00000023 c9e1dd40 00003297 00000000 c94f5ea8 00000008 00000007
Nov 26 18:05:13 localhost kernel: [4294842.118000] Call Trace:
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+552702058/1069995008] _nv004895rm+0x8a/0x94 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551397873/1069995008] rm_set_interrupts+0x129/0x144 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+553637341/1069995008] os_release_sema+0x21/0x3e [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551368050/1069995008] _nv002248rm+0x12/0x18 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551368002/1069995008] _nv002340rm+0x12/0x18 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551396955/1069995008] rm_init_adapter+0x77/0x8c [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551396937/1069995008] rm_init_adapter+0x65/0x8c [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+551396918/1069995008] rm_init_adapter+0x52/0x8c [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+553624975/1069995008] nv_kern_open+0x116/0x207 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+553627720/1069995008] nv_kern_isr+0x0/0x5b [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [pg0+553625081/1069995008] nv_kern_open+0x180/0x207 [nvidia]
Nov 26 18:05:13 localhost kernel: [4294842.118000] [chrdev_open+215/240] chrdev_open+0xd7/0xf0
Nov 26 18:05:13 localhost kernel: [4294842.118000] [dentry_open+190/391] dentry_open+0xbe/0x187
Nov 26 18:05:13 localhost kernel: [4294842.118000] [filp_open+65/73] filp_open+0x41/0x49
Nov 26 18:05:13 localhost kernel: [4294842.118000] [sys_chown+54/65] sys_chown+0x36/0x41
Nov 26 18:05:13 localhost kernel: [4294842.118000] [sys_open+56/179] sys_open+0x38/0xb3
Nov 26 18:05:13 localhost kernel: [4294842.118000] [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
Nov 26 18:05:13 localhost kernel: [4294842.118000] Code: 8b 4d 14 8b 55 10 d1 ea 8b 80 7c 01 00 00 66 89 0c 50 89 ec 5d c3 55 89 e5 8b 45 0c 8b 4d 14 8b 55 10 c1 ea 02 8b 80 7c 01 00 00 <89> 0c 90 89 ec 5d c3 55 89 e5 83 ec 10 56 53 8b 5d 0c 8b 75 10
Nov 26 18:05:14 localhost kernel: [4294842.118000] <3>irq 18: nobody cared (try booting with the "irqpoll" option.
Nov 26 18:05:14 localhost kernel: [4294842.846000] [__report_bad_irq+49/116] __report_bad_irq+0x31/0x74
Nov 26 18:05:14 localhost kernel: [4294842.846000] [note_interrupt+125/162] note_interrupt+0x7d/0xa2
Nov 26 18:05:14 localhost kernel: [4294842.846000] [__do_IRQ+133/177] __do_IRQ+0x85/0xb1
Nov 26 18:05:14 localhost kernel: [4294842.846000] [do_IRQ+25/36] do_IRQ+0x19/0x24
Nov 26 18:05:14 localhost kernel: [4294842.846000] [common_interrupt+26/32] common_interrupt+0x1a/0x20
Nov 26 18:05:14 localhost kernel: [4294842.846000] [schedule+1167/1188] schedule+0x48f/0x4a4
Nov 26 18:05:14 localhost kernel: [4294842.846000] [sys_sched_yield+89/98] sys_sched_yield+0x59/0x62
Nov 26 18:05:14 localhost kernel: [4294842.846000] [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
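
For anyone collecting the same evidence, a small sketch that pulls the NVRM lines and their timestamps out of a syslog-style file such as /var/log/messages. It assumes the line layout shown in the log above (month, day, time, host, `kernel:`) and does not attempt to interpret the codes in parentheses. The function name is made up for illustration.

```python
import re

NVRM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:.*?NVRM:\s+(?P<msg>.*)$"
)


def nvrm_events(syslog_text):
    """Return (timestamp, message) pairs for every NVRM line in the log text."""
    events = []
    for line in syslog_text.splitlines():
        match = NVRM_RE.match(line)
        if match:
            events.append((match.group("stamp"), match.group("msg")))
    return events
```

Comparing these timestamps with the login/logout times in the gdm logs may help show whether each failure follows an X server reset.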

Thanks,
Mike