
gigabit problems...


sanity
07-12-05, 05:04 PM
I've got a pair of nforce4 Ultra boards linked together via a Gigabit crossover. One is a Linux fileserver with a pair of SATA drives in a RAID0. The other is a dual-boot box.

The idea is to be able to share my files easily between Windows/Linux, without losing any performance over a local drive, or dealing with kludges like a FAT32 partition. Plus, it's always nice to have a dedicated BitTorrent machine that I can leave downloading while the desktop shuts down / reboots...

Anyway, I understood from reading nVidia's site that the nForce4 boards do both SATA and gigabit ethernet entirely outside of the PCI bus. If that's true, I would expect to easily saturate my gigabit link, since each hard drive is capable of 60 MB/s and gigabit tops out around 125 MB/s theoretical (call it 100 in practice).

Anyway, I tested one file -- forget how big it was -- and it was moving plenty fast: 1.5 seconds to copy over NFS, after I tweaked NFS and cached the file in the server's RAM. In other words, RAM->gigabit->RAM works the way I'd expect. It takes about 2 seconds to cache the file on the server (cat file > /dev/null), so disk->RAM isn't quite as fast as I wanted, but it's acceptable, especially considering I have 1 GB of RAM on the server and 2 GB on the desktop to fill up with cache.

However, when I try to read a file that I haven't read before on either machine, over the gigabit, it takes 5-6 seconds. That doesn't seem plausible even if everything were going over the PCI bus -- I should still be able to get at least 60 MB/s, and I'm not even getting half that.

This happens whether I'm using the forcedeth driver patched to support jumbo frames (9000 MTU) or nVidia's nvnet driver in hardware mode (required for jumbo support), and whether I optimize for CPU or throughput.
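
(For reference, a minimal sketch of how I'm bumping the MTU on the Linux side -- eth0 here is just a placeholder for whichever interface the crossover sits on, and the change doesn't survive a reboot unless it also goes into the distro's network config:)

ip link set dev eth0 mtu 9000   # iproute2 way
ifconfig eth0 mtu 9000          # classic way, same effect
ip link show eth0               # confirm the MTU actually changed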

I really, really don't want to have to go buy a gigabit and/or SATA PCI Express card, especially since the gigabit and SATA support is most of the reason I got this board for the server.

Is it possible for me to tune some things and saturate my gigabit link while simultaneously reading from the disk? Or was nVidia lying to us about gigabit and SATA being separate from the PCI bus? Or am I just missing some good Linux drivers?

And, while I'm at it, how should I configure my crossover network? I want to use 10.1.1.1 for the server and 10.1.1.2 for the desktop, so what should my broadcast/netmask be? And how do I tell WinXP about this?

LBJM
07-12-05, 07:22 PM
If you don't know how to set up your network, do a search on Google and read the Linux HOWTOs. If you leave your file server on all the time, I would suggest you run a DHCP server on it; it will auto-configure your XP machine.
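
(A minimal sketch of that idea, assuming dnsmasq as the DHCP server and the crossover link on eth1 -- both the package choice and the interface name are placeholders, not something from this thread:)

# /etc/dnsmasq.conf -- hand out one address on the crossover link only
interface=eth1
bind-interfaces
dhcp-range=10.1.1.2,10.1.1.2,255.255.255.252,12h

Restart dnsmasq, then do an ipconfig /renew on the XP box.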

Don't just try to read a file over the gigabit connection; jumbo frames are more for transferring large files than anything else.

After you've set up your network, let's make sure jumbo frames are actually working: copy a 4 GB file (if you don't have one, make one). Jumbo frames reduce CPU load, so check both machines while the file is copying; both should be at around 50% CPU usage. If it's at 100%, jumbo frames are not working. If all your settings on both machines are correct and it still doesn't work, at that point I would suggest a switch that supports jumbo frames; the crossover cable can't do it. Most gigabit switches do not support jumbo frames.
This one does: http://www.newegg.com/Product/Product.asp?Item=N82E16833129012
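
(A rough sketch of that test -- /mnt/server is just a placeholder for wherever the share is mounted:)

dd if=/dev/zero of=/tmp/testfile bs=1M count=4096   # make a 4 GB test file
time cp /tmp/testfile /mnt/server/                  # copy it over the gigabit link
# watch CPU on both boxes while the copy runs, e.g. with top or vmstat 1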

I ordered a D-Link gigabit router, so I'll test whether that supports jumbo frames.

My system and my file server work fine with gigabit and jumbo frames, but I'm using a Broadcom controller on my file server. I have my cable router and my SMC switch plugged into it, so the router acts as a DHCP server.

retsam
07-12-05, 09:13 PM
Question: is that Linux machine being used as a gateway to the internet?
Why do you have to use large frames for this? You probably don't need them.


And to answer your question about the netmask: for just two hosts it should be 255.255.255.252. This is done right in XP's TCP/IP configuration control panel.
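
(A minimal sketch of the Linux side of that, using iproute2 -- eth1 is a placeholder for whatever the crossover interface is called; on the XP box you just type the address and mask into the TCP/IP properties dialog:)

# 10.1.1.0/30 gives exactly two usable hosts: 10.1.1.1 and 10.1.1.2
ip addr add 10.1.1.1/30 brd + dev eth1   # "brd +" derives the broadcast, 10.1.1.3
ip link set eth1 up
# Windows side: IP 10.1.1.2, netmask 255.255.255.252, no gateway needed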

retsam
07-12-05, 09:22 PM
Jumbo frames are more for transferring large files than anything else.
Not really; it has been shown that jumbo frames can increase throughput by as much as 50%, and they decrease CPU overhead and TCP overhead.

Is it possible for me to tune some things and saturate my gigabit link while simultaneously reading from the disk? Or was nVidia lying to us about gigabit and SATA being separate from the PCI bus? Or am I just missing some good Linux drivers?
Oh, I just thought of something: do you have full duplex on? This sounds like the reason you are having long waits. Also, you might have timing issues; Windows has been known to do this from time to time.

sanity
07-12-05, 10:06 PM
Ok, time to clarify a few things:

Yes, the Linux machine is a gateway to the Internet, but that's irrelevant. Its primary purpose is to be a fileserver, and two SATA disks are more than enough to saturate gigabit -- and will benefit from jumbo frames, I hope.

Now, if you think I'm being slowed down by routing Internet traffic, I could probably find a spare NIC somewhere for the desktop (Windows) machine, so that no packets go over the Gigabit other than file serving packets.

I'm trying DHCP, and it's not working. I also can't find a DHCP option for the broadcast address. I do have another DHCP server working elsewhere (I've even done network booting), so I know how to do DHCP, and it's still not working here.

When I plug the Windows box straight into my main network, DHCP works (off my actual Internet router). When I plug it into the Linux fileserver (which plugs into the main network, which plugs into the normal router), DHCP does NOT work, from either Windows or Linux.

I've been Googling, but I can't find these three settings for a simple point-to-point-ish crossover network: available IPs, netmask, and broadcast address. And I still don't know how to set a broadcast address in Windows anyway. I may have to sacrifice a whole 255.255.255.0 network to this crossover, but that's not a huge issue.

Jumbo frames do work, and they do increase throughput. From Linux to Linux (since I can't get Windows working on the crossover), it takes much longer without jumbo frames than with them; I don't remember exactly how much.

And why wouldn't jumbo frames work over a crossover cable?

I think I checked for full duplex, but even if it were off, how would that affect an NFS transfer, which is only sending data one way anyway? And even if it would, why do I get a full 100 MB/s (or at least 90-95) when transferring from RAM over the gigabit, and at least 80-90 MB/s from disk to RAM (with no network involved), but barely a third of that from disk to gigabit?

This means, by the way, that the disk access must be slowing down the network access, and vice versa. That would suggest the SATA and network controllers are fighting for PCI bandwidth. But the network controller is supposed to sit on something called a MAC interface or PHY or some odd name nVidia invented/stole, and the paper about it says specifically NOT PCI, because PCI can't saturate gigabit at full duplex (full duplex needs up to 250 MB/s, while plain 32-bit/33 MHz PCI tops out at 133 MB/s shared); it can only saturate one direction at a time. And, as in my case, if you use a PCI gigabit card, you'll just about saturate your PCI bus, meaning everything else on the PCI bus (sound, video capture card, the other 10/100 network card) slows down at the same time.

BTW, I'm pretty sure my CPU usage never got below 50%, so I doubt it's CPU, I'm pretty sure it's PCI.

The question is, is my gigabit card really on the PCI, or is there some other way of getting to it? And if there is another way, do nVidia's Linux drivers support it? And if so, why isn't it working?

One more Linux-related question... I'm trying to install Linux on the desktop using NFS on the fileserver as root (basically diskless, except it'll have swap and /boot on the local machine). At a certain point during the install, everything just freezes, and a while later I get messages in the logs saying the NFS server stopped responding, then it's OK, then it stops again... I suspect that if I left it for a couple of WEEKS it would finish just fine, but why does my NFS die after only a few seconds of activity? And why can I still ssh and do other things across the link? (NFS is dying, but the network connection is fine, IP is fine, etc.)

retsam
07-12-05, 10:42 PM
I think I checked for full duplex, but even if it were off, how would that affect an NFS transfer?
TCP ACKs would bring it to a halt if the link were running half duplex. Also check whether you're getting collision problems; to find that out you would have to sniff the interface.
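
(A quick way to check both of those on the Linux end -- eth1 is a placeholder for the crossover interface, and ethtool needs to be installed:)

ethtool eth1           # look for "Speed: 1000Mb/s" and "Duplex: Full"
ip -s link show eth1   # per-interface error and collision counters
netstat -i             # the same counters in one quick table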

When I plug the Windows box straight into my main network, DHCP works (off my actual Internet router). When I plug it into the Linux fileserver (which plugs into the main network, which plugs into the normal router), DHCP does NOT work, from either Windows or Linux.
DHCP is a broadcast protocol; it will not make it past your Linux box. You want to hard-wire addresses into the Linux and Windows boxes that are directly connected. In other words, this is how it should look (remember that your Linux box is now acting as a router between two different networks):

 10.1.1.2/24             10.1.1.1/24 | dhcp-client               dhcp-server
    [] ---------------------- [] --------------------------------- []
  windows                    linux                           internet router

And why wouldn't jumbo frames work over a crossover cable?
No reason it shouldn't... but you did say it's faster with a 9000 MTU.


BTW, I'm pretty sure my CPU usage never got below 50%, so I doubt it's CPU, I'm pretty sure it's PCI.
Wow, that's actually a huge performance issue right there.


The question is, is my gigabit card really on the PCI, or is there some other way of getting to it? And if there is another way, do nVidia's Linux drivers support it? And if so, why isn't it working?
The nForce4 NIC hangs off the chipset, not the PCI bus, so there shouldn't be any PCI-related issues at all.

LBJM
07-13-05, 02:43 AM
With jumbo frames, CPU usage is going to sit at around 50%, not any lower; if it's at 100%, that means they're not working. What are you using to test your throughput?
Are you seeing 30,000 KB/s? That equals 30 MB/s. You'll get 30-40 MB/s with gigabit and jumbo frames, not 90 MB/s. The faster the connection, the higher the overhead; I'd rather not go into detail about that.
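
(For what it's worth, a minimal way to measure the raw link speed with nothing but RAM involved -- the address and port here are placeholders:)

dd if=/dev/zero bs=1M count=1024 | nc -lp 8080 -q 0   # on the server: push 1 GB straight from RAM
time nc 10.1.1.1 8080 -q 0 > /dev/null                # on the client: time the transfer
# 1024 MB divided by the elapsed time = MB/s for the link alone, no disk in the picture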

With newer NICs such as the nForce4's, you don't need a crossover cable; they can auto-detect and do the crossover in hardware. If both machines are running nForce4s, hook them together with a regular Cat 5e or Cat 6 cable (Cat 6 is backwards compatible with Cat 5e). Cat 5 is too slow to get the full gigabit speed.

A question for you about the Windows box: is it running a firewall? If so, disable it by uninstalling it. Sygate and ZoneAlarm don't work with jumbo frames, but Outpost Firewall does. I'm not sure whether Tiny does or not; I haven't tried it.

But right now your biggest concern is making sure your file server is configured right.
Check out http://www.linuxquestions.org/questions/ -- I'm sorry, but I don't help people with their personal Linux configs anymore.

retsam
07-13-05, 02:53 AM
With newer NICs such as the nForce4's, you don't need a crossover cable; they can auto-detect and do the crossover in hardware.
I have never seen that before outside of high-end server boards and high-end networking equipment. Wow, that's nice to know.



Cat 5 is too slow to get the full gigabit speed.
At the distances he is using it for, Cat 5, 5e, and 6 don't make much of a difference.

I think he's trying to fix way too many things at once; he needs to focus on one thing at a time.

LBJM
07-13-05, 03:03 AM
At the distances he is using it for, Cat 5, 5e, and 6 don't make much of a difference.

I think he's trying to fix way too many things at once; he needs to focus on one thing at a time.

It matters for anything longer than 3' of cable. I know this for a fact; I learned it the hard way. I do agree he's doing too much at once; he needs to start with his fileserver config first, then go from there.

sanity
07-13-05, 08:59 PM
It should be a Cat 6 crossover, 10 feet. Should I have gotten a straight-through cable? Should I get a shorter one?

30-40 MB/s max? Uh, I'll go back and check tonight, but I'm pretty sure I was getting 80-90 when I wasn't touching the disk. And where does this "overhead" come from? If you don't want to talk about it, can you point me to a paper or something?

And what overhead, in particular, are you talking about? TCP overhead? Samba overhead? These tests were with NFS, Linux to Linux. The Windows box is dual-boot.

What's a good Linux tool to sniff for collisions, and how do I know when they are collisions (and not something else)?

Now, if the nForce4 NIC is hanging off the chipset, then where's the SATA II controller? And why do they lag each other? I'll do some more tests on that tonight, but I bet that if I do straight RAM-to-RAM transfers and also generate disk activity at the same time, they will both slow each other down.
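
(The test I have in mind, roughly -- /dev/md0 and /huge/file are placeholders for the real array device and a big file that's already cached:)

dd if=/dev/md0 of=/dev/null bs=1M count=4096   # server, terminal 1: keep the disks busy
nc -lp 8080 -q 0 < /huge/file                  # server, terminal 2: serve the cached file
time nc 10.1.1.1 8080 -q 0 > /dev/null         # client: time the RAM-to-RAM transfer
# if the cached transfer only slows down while dd is running, the NIC and the
# disk controller really are contending for something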

The fileserver problem can wait, actually. If it's a software problem, I can always fix it later. But if this gigabit problem means gigabit can't do what I want it to, or that I have to buy a PCI-E gigabit card to take the load off the PCI bus...

BTW, about PCI vs. "hanging off the chipset": if it's hanging off the chipset, why does it show up as a PCI device under Linux? And if it shows up as a PCI device, could it be that I don't have the right drivers, or does it just look that way, the same way PCI-E devices look like PCI?

LBJM
07-14-05, 03:17 AM
http://www.linuxquestions.org/questions/

sanity
07-14-05, 12:00 PM
Thanks, but the issue of why my SATA and gigabit ethernet lag each other is (probably) nvidia-specific, not Linux-specific.

If you like, I can try to reproduce this using Windows as the server.

Maybe there's some other sort of bus, in the chipset, shared between SATA and Gigabit?

But, the main question is, can I solve this simply by buying a PCI-E Gigabit card?

LBJM
07-14-05, 01:37 PM
Thanks, but the issue of why my SATA and gigabit ethernet lag each other is (probably) nvidia-specific, not Linux-specific.

If you like, I can try to reproduce this using Windows as the server.

Maybe there's some other sort of bus, in the chipset, shared between SATA and Gigabit?

But, the main question is, can I solve this simply by buying a PCI-E Gigabit card?

Buying a PCI-E gigabit card will not help. There is nothing wrong with the hardware; I'm not having any trouble with my gigabit setup. The problem you seem to be having is that you're unsure whether your network is set up the way you need it, or maybe you're thinking the whole setup should be faster than it is. Only you, by doing research on that, can know for sure. Try running a Windows server setup if you like. The important thing is to make sure, without any doubt, that you have everything set up the way you need it to be. The fewer variables you have, the easier it is to troubleshoot.

sanity
07-16-05, 10:13 AM
My network setup is fine, actually.

Can you explain to me why I still have the speed problem even when using netcat? That is, I get exactly as much speed as I expect as long as I don't touch the disk, but if there's disk activity at the same time, I get 30 MB/s instead of 80-90?

Anyway, I've ordered a couple of PCI-E gigabit cards, since I haven't gotten any suggestions (here or elsewhere) on something to try in software. They'll be useful even if I don't end up needing them for this.

LBJM
07-16-05, 04:17 PM
Because accessing the hard drive is what causes the bottleneck. When it comes to doing things over the network, the hard drives are what slows everything down; the different RAID levels and SCSI U160 and U320 were made for exactly this reason.

Since you ordered the PCI-E cards, I would like to know what kind of performance you get out of them.

sanity
07-18-05, 02:44 PM
I have two hard drives in there. Each is capable of 60 MB/s, and together (striped RAID) they get at least 100 MB/s, sometimes 110-120. Gigabit is about 125 MB/s theoretical maximum, call it 100 in practice.

I've said this at least twice in this thread. Go back and read.

I SHOULD BE ABLE TO SATURATE THE PIPE.

But I can't saturate it while I'm accessing the disk, and I can't saturate the disk while I'm pushing too much across the network. Therefore, for whatever reason, my hard drive CONTROLLER, not any single drive, is fighting with my network card for some sort of bandwidth. The kind of performance I'm getting suggests it's actually a PCI bus, which according to nVidia and to people on this forum is not how nForce4 gigabit is supposed to work.

But on my board, at least, it is. I can get at least 60 MB/s per drive, so why am I getting 30 or less over a pipe that can move at least 80-90 from RAM?

If these gigabit cards do perform as they're supposed to, I will make a mental note to avoid nForce in the future.

LBJM
07-18-05, 08:17 PM
Why don't you do a simple Google search?! You will see what the truth is. It's not the nForce4 that's the source of "your" problem.

sanity
07-18-05, 10:23 PM
What kind of Google search?

I came here because I was out of Google searches!

Maybe it's not the nforce4, and I'll have to send these cards back or think of something else creative to do with them. But I'd like to know what the bottleneck actually is, and whether I can do something about it.

retsam
07-18-05, 11:14 PM
OK, did you do what I said? You could have timing issues in Windows. There are performance counters for this, located in Administrative Tools; use them to see what might be wrong with your network. Just turn on the performance counters for TCP, UDP, and the other network protocols, then do a file transfer while they're active, and please post what you find.
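
(If the graph is too much to squint at, a command-line sketch that dumps a readable summary instead -- counter names can differ a bit between Windows versions, so treat this as a starting point only:)

typeperf "\Network Interface(*)\Bytes Total/sec" -si 1 -sc 30
# samples the per-NIC byte counters once a second for 30 seconds while you copy a file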

LBJM
07-19-05, 04:32 PM
He's using Linux on both machines; one machine dual-boots Windows.

try this kind of search: slow gigabit performance under linux with Jumbo frames

http://www.google.com/search?q=slow+gigabit+performance+under+linux+with+Jumbo+frames&hl=en&hs=zXJ&lr=&client=firefox&rls=org.mozilla:en-US:unofficial&start=0&sa=N

I will try to help you out with this later. I didn't mean to be so rude earlier; it just felt like you weren't trying to do anything.

Also, here's a PDF you might want to read: "The Performance of a Linux NFS Implementation" http://www.wpi.edu/Pubs/ETD/Available/etd-0523102-121726/unrestricted/boumenot.pdf

sanity
07-20-05, 12:14 AM
My gigabit performance, Linux to Linux, with Jumbo frames, is fine. It's my NFS that's broken, but I'll fix that later.

retsam, which counters? There are some 20-30 counters for the various protocols, and they just seem to make a graph... If you really want all the counters, how can I get a summary without having to squint at 30 lines on a graph?

The very first real problem to solve is, why, when I am reading from the disk on the Linux server, does my network traffic slow down?

I do:

nc -lp 8080 -q 0 < /huge/file # on the fileserver
time nc 10.1.1.1 8080 -q 0 > /dev/null # on the client

Those two commands set up a TCP connection between the fileserver and the Linux desktop, reading from a huge file. The command run on the server waits for the client connection, which is why I'm timing how long the client takes. The -q 0 means close immediately after end-of-file, so I know exactly when the transfer ended.

The first time I ran that pair of commands on a given file (I think it was about 177 MB), it took 5-6 seconds to finish. The second time, it took about 2 seconds. It was much, much faster the second time because the file was already cached in RAM on the server.

However, if I find another file of about the same size (170 MB this time, I think), this command:

time cat /huge/file > /dev/null # on the server

takes only 1.5 seconds.

A rough calculation shows that my network, when transferring from the RAM cache, gets close to 80-90 MB/s one way (177 MB / 2 s is roughly 88 MB/s). Reading a file of the same size from disk into the RAM cache, with no network involved, got 110-120 MB/s (170 MB / 1.5 s is roughly 113 MB/s), which is what I'd expect -- this is a pair of 250 GB drives in RAID 0 (striping), each benchmarked at 60 MB/s on its own.

However, when reading straight from the disk to the network, the transfer took 5-6 seconds, which works out to something like 20-30 MB/s, 40 tops.

Are my calculations really that far off? And even if you ignore my mental math, the raw numbers are real: 1.5 seconds disk-to-RAM, 2 seconds RAM-through-net, 5+ seconds disk-to-net.

The problem is not likely to be on the client at all, because the client can't tell the difference -- as far as it's concerned, it's getting the exact same transfer. My numbers can't be skewed by client-side caching either, because I'm using netcat, which does no caching -- it goes straight from the TCP connection to the null device. As far as I know, it can't be my network cable either: it has "cat 6" printed on it, it's red (and by my color scheme, red is ALWAYS crossover), and I can get reasonable transfers through it. The problem is exclusively on the server.

Conclusion: obviously, the network card and the disk controller are fighting over SOME sort of bandwidth. By my rough calculations, it's behaving exactly as if they were sharing a plain PCI bus (133 MB/s theoretical maximum for 32-bit/33 MHz PCI), but I could be wrong -- maybe my RAM is the culprit (cheap DDR400). Maybe 1.8 GHz isn't enough, even though my CPU usage is about nil. Maybe I'm being screwed over by Chaintech, my mobo manufacturer. Maybe the Gods of Gigabit hate me. I don't know.

But I know one thing -- this is NOT right, and it probably can't be solved by a simple STFW.

The good news is, my PCI-E gigabit cards will get here in a day or two, and then I can eliminate another couple of guesses.

sanity
07-22-05, 01:29 PM
Or not.

Got the new PCI-E gigabit cards. They eliminate the netcat problem: now I get at least 90 MB/s, even when it's coming off the disk.

But NFS still performs the way netcat used to.

LBJM
07-22-05, 03:56 PM
That's interesting. Which PCI-E NICs did you buy?