Bug 139529

Summary: Access NFS Server Freezes the machine
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Joseph Loo <jloo>
Component: Kernel Assignee: Neil Brown <nfbrown>
Status: RESOLVED WORKSFORME QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: jloo
Version: unspecified   
Target Milestone: ---   
Hardware: x86-64   
OS: Other   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Joseph Loo 2005-12-16 02:34:59 UTC
I have SUSE 10.0 with the latest kernel updates. It is running on an AMD processor on an ASUS k8vSE motherboard. The machine has 1.5 Gbytes of RAM. It is running the 64-bit version of the operating system.

I installed nfs-utils from the boxed version of SUSE 10.0. Autofs has been enabled with auto.net. With the original version, I have a Cyrix 933 machine with basically the same configuration, except that it has only 1 Gbyte of RAM.

The Cyrix has a /export/home exported with the default parameters.
The AMD64 has /export/home & /export/home0. I have all my users automounted to their respective file.

On the Cyrix machine I could do an ls /net/crab/export/home0 and get a listing with no problem. Likewise I could do ls /net/sirus and the directories would appear with no problem.

After the kernel update a few weeks ago, whenever the Cyrix machine did an ls /net/crab/export/home0, it would freeze the AMD64 machine. The only way to regain control is to do a reset. The Cyrix machine has none of these symptoms. It seems to be specific to the 64-bit code.
Comment 1 Olaf Kirch 2005-12-19 10:51:59 UTC
Joseph, could you enable verbose kernel logging to the console on the NFS
server (klogconsole -l8 -r0) and switch to the virtual console (Alt-F1)?
I'd like to know whether the kernel prints any oops messages.

Neil, could you take care of this one, please?
Comment 2 Neil Brown 2005-12-20 00:15:13 UTC
Yes, kernel messages would be helpful.

Also, a few more details about what works and what doesn't.
You say that 'ls /net/crab/export/home0' will freeze the AMD64 machine (the server).  Does any NFS access to that server cause it to freeze, or just that particular access?
What about 'ls /net/crab/export/home' ??
What if you mount manually rather than using automount?
e.g. on sirus (the Cyrix machine I assume)
   mkdir /test
   mount crab:/export/home0 /test
   ls -l /test
   echo hello > /test/testfile
   rm /test/testfile

Does any or all of that work?  If not, at which point does 'crab' stop responding?
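
One way to make the answer easy to read off is to run the steps above one at a time with a time limit, so the last step printed is the one at which 'crab' stopped responding. What follows is only a minimal sketch: the `run_step` and `manual_mount_test` names and the 30 second limit are assumptions made for illustration, and `timeout` is assumed to be the GNU coreutils tool. A process blocked on a hard NFS mount may not react to the signal that `timeout` sends, so a hang is still possible.

```shell
#!/bin/sh
# Hypothetical helper: run one command with a time limit and report the result.
# STEP_TIMEOUT (seconds) is an assumed default; adjust as needed.
STEP_TIMEOUT="${STEP_TIMEOUT:-30}"

run_step() {
    echo "STEP: $*"
    if timeout "$STEP_TIMEOUT" "$@"; then
        echo "  ok"
        return 0
    else
        rc=$?
        # GNU timeout exits with status 124 when the time limit was reached
        if [ "$rc" -eq 124 ]; then
            echo "  timed out after ${STEP_TIMEOUT}s"
        else
            echo "  failed (exit $rc)"
        fi
        return "$rc"
    fi
}

# Same sequence as the manual test: mount, list, write a file, remove it.
manual_mount_test() {
    server_export="$1"   # e.g. crab:/export/home0
    mnt="$2"             # e.g. /test
    run_step mkdir -p "$mnt" &&
    run_step mount "$server_export" "$mnt" &&
    run_step ls -l "$mnt" &&
    run_step sh -c 'echo hello > "$1/testfile"' sh "$mnt" &&
    run_step rm "$mnt/testfile"
}

# Runs only when both arguments are given.
if [ "$#" -eq 2 ]; then
    manual_mount_test "$1" "$2"
fi
```

Saved under a name of your choice and run as root with the arguments crab:/export/home0 /test, it stops at the first step that fails or times out.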

Thanks,
NeilBrown
Comment 3 Joseph Loo 2005-12-20 06:50:39 UTC
As for comment 1:
The screen dump was extremely long, so I took some snippets from the screen:
Unable to handle kernel null pointer dereference at 0000000000000098 RIP:
<ffffffff8815375a>{:sk981in:FreeTxDescriptors+186}
PGD 0
OOP :0000[1]
CPU 0
Moduleslinked in:NLS_utf8 nvidia hfsplus vfat fat subfs freq_table ipv6 autofs4
nfsd exportfs snd_pcm_oss snd_mixer_oss button batter ac snd_seq ...


PID 0, Com: swapper tainted:pf u 2.6.13.-15.7-default
RIP:0010 [<ffffffff8815375a7>] <ffffffff8815375a>{:sk981ub:FreeTxDescriptor+186}


Call Trace <IRQ><ffffffff88153888>{:sk91in:xmitFrane+72><ffffffff88153>{sklin:sk
gexmit+89}
<ffffffff802e93d7>{ip_finish_output+455}<ffffffff802c5d62>{dev_queue_xmit+242}
<ffffffff802e6b9d>{ip_dst_oupput+109}<fffffffff802e9799>{ip-ouptut+201}

As for comment 2:
It also freezes the machine.
Comment 4 Neil Brown 2005-12-20 07:43:19 UTC
"If not, at which point does 'crab' stop responding?" ????

I'm guessing at the mount command.  Is that correct?

The stack trace seems to point to a problem in the network driver rather than in NFS.  Could you try to confirm that by trying various other network accesses to 'crab', e.g.
   ping
   ping -s 2000
   ssh
and anything else that you happen to have a server for on 'crab'.
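
The checks above could be run in one pass along the following lines. This is a sketch only: the `check` and `network_checks` names are made up for illustration, the packet count of 3 and the 10 second ssh connect timeout are arbitrary choices, and the ssh check assumes key-based login is already set up.

```shell
#!/bin/sh
# Hypothetical helper: run one check quietly and print PASS or FAIL with its label.
check() {
    label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

network_checks() {
    host="$1"   # e.g. crab
    # -c limits the number of echo requests; -s sets the payload size in bytes
    check "ping"          ping -c 3 "$host"
    check "ping -s 2000"  ping -c 3 -s 2000 "$host"
    # BatchMode avoids waiting at a password prompt (key-based login assumed)
    check "ssh"           ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true
}

# Runs only when a host name is given.
if [ "$#" -eq 1 ]; then
    network_checks "$1"
fi
```

The 2000-byte ping is the interesting one, since it exceeds a standard 1500-byte Ethernet MTU and so exercises fragmented transmits.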

If none of them fail, try to explicitly set the transport protocol that nfs is using with
   mount -o udp ....
and
   mount -o tcp ....
and see if one works better than the other.
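
As a rough sketch, the two transports could be tried back to back like this. The function names and the `DRY_RUN` switch are assumptions of this sketch (with `DRY_RUN=1` the commands are only printed, not executed); the export and mount point are passed as arguments.

```shell
#!/bin/sh
# Sketch: try the same NFS export over UDP and then over TCP.
# DRY_RUN=1 only prints the commands; this switch belongs to this sketch,
# it is not a feature of mount itself.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

try_transport() {
    proto="$1"; export_path="$2"; mnt="$3"
    echo "== $proto =="
    if run mount -o "$proto" "$export_path" "$mnt" && run ls -l "$mnt"; then
        echo "$proto: ok"
    else
        echo "$proto: failed"
    fi
    # Unmount before the next transport; ignore errors if the mount failed
    run umount "$mnt" 2>/dev/null || true
}

compare_transports() {
    for proto in udp tcp; do
        try_transport "$proto" "$1" "$2"
    done
}

# Runs only when export and mount point are both given,
# e.g. crab:/export/home0 /test
if [ "$#" -eq 2 ]; then
    compare_transports "$1" "$2"
fi
```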

Also, please report what network card you have on 'crab'.

Finally, if you have a digital camera available, a photograph of the dump messages may be very helpful.

Thanks,
Comment 5 Joseph Loo 2005-12-21 03:12:16 UTC
I have bad news on the bug. It just went away for no reason. The only thing that is different is that I rebooted my sirus computer. It was running for several days with a BitTorrent client.

I do have one comment. I never had a problem with the other TCP/IP services at all. I regularly use ssh between the two machines.
Comment 6 Neil Brown 2006-04-27 02:21:53 UTC
Ok, let's close this as 'WORKSFORME' seeing it works-for-you now...

Reopen if it happens again.