Bugzilla – Bug 153295
access to CIFS mount returns Input/output error after suspend
Last modified: 2008-06-25 09:53:42 UTC
when waking up after a suspend all cifs mounts return "Input/output error": pell:~ # mount /install/ pell:~ # ls /install/ . aix cdimages images redhat suse vmware .. bilder cisco lost+found sernet tmp pell:~ # ls /install /bin/ls: /install: Input/output error pell:~ # cat /proc/mounts |grep install //oscar/data /install cifs rw,mand,nosuid,nodev,noexec,unc=\\oscar\data,username=lufsuser,domain=,rsize=16384,wsize=57344 0 0
well, in the above example, right after the first "ls /install" I did a suspend to ram for 5 seconds and then did the next "ls /install".
Steve, can you look into this please?
Two questions, after the suspend and restart can you report a) what the output of: "cat /proc/fs/cifs/DebugData" so we can see what the state of the session(s) and shares (tree connections) are. 2) whether cifsd, the cifs background thread which handles response processing, has exited (e.g. "ps -A | grep cifsd") 3) enable debug logging ("echo 1 > /proc/fs/cifs/cifsFYI") and report any output that is put in dmesg by the suspend and restart (e.g. execute the dmesg command or equivalent after the restart)
1: ---snip--- pell:~ # cat /proc/fs/cifs/DebugData Display Internal CIFS Data Structures for Debugging --------------------------------------------------- CIFS Version 1.40 Active VFS Requests: 4 Servers: 1) Name: 192.168.16.128 Domain: TUX-NET Mounts: 1 OS: Windows 5.1 NOS: Windows 2000 LAN Manager Capability: 0xe3fd SMB session status: 2 Shares: 1) \\192.168.16.128\c$ Uses: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x700ff PathComponentMax: 255 Status: 1 type: DISK 2) \\192.168.16.128\f$ Uses: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x700ff PathComponentMax: 255 Status: 3 type: DISK DISCONNECTED ---snap--- (the second mount here has a different problem, it's reported to be busy and can't be unmounted any more. Let's just concentrate on the c$ share here) 2: there's no more cifsd process running 3: is the killing of the cifsd thread good !? : fs/cifs/cifsfs.c: CIFS VFS: in cifs_statfs as Xid: 137592 with uid: 0 fs/cifs/cifssmb.c: In QFSInfo fs/cifs/cifssmb.c: OldQFSInfo fs/cifs/cifsfs.c: CIFS VFS: leaving cifs_statfs (xid = 137592) rc = -5 fs/cifs/cifsfs.c: In cifs_put_super fs/cifs/connect.c: CIFS VFS: in cifs_umount as Xid: 137593 with uid: 0 fs/cifs/cifssmb.c: In tree disconnect fs/cifs/connect.c: No session or bad tcon fs/cifs/connect.c: CIFS VFS: leaving cifs_umount (xid = 137593) rc = -5 CIFS VFS: cifs_umount failed with return code -5 fs/cifs/cifsfs.c: Devname: //192.168.16.128/c$ flags: 64 fs/cifs/connect.c: CIFS VFS: in cifs_mount as Xid: 137594 with uid: 0 fs/cifs/connect.c: Username: administrator fs/cifs/connect.c: UNC: \\192.168.16.128\c$ ip: 192.168.16.128 fs/cifs/connect.c: Socket created fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x7fffffff fs/cifs/connect.c: Existing smb sess not found fs/cifs/transport.c: For smb_command 114 fs/cifs/transport.c: Sending smb of length 47 fs/cifs/connect.c: Demultiplex PID: 22419 fs/cifs/connect.c: rfc1002 length 0x79) fs/cifs/connect.c: Security Mode: 0x3 Capabilities: 0xe3fd Time Zone: 65476 fs/cifs/connect.c: In sesssetup fs/cifs/transport.c: For smb_command 115 fs/cifs/transport.c: Sending smb of length 272 fs/cifs/connect.c: rfc1002 length 0x87) fs/cifs/connect.c: UID = 2048 fs/cifs/connect.c: CIFS Session Established successfully fs/cifs/connect.c: file mode: 0x180 dir mode: 0x1c0 fs/cifs/connect.c: Next tcon - fs/cifs/transport.c: For smb_command 117 fs/cifs/transport.c: Sending smb of length 90 fs/cifs/connect.c: rfc1002 length 0x42) fs/cifs/connect.c: Tcon flags: 0x1 fs/cifs/connect.c: CIFS Tcon rc = 0 fs/cifs/cifssmb.c: In QFSDeviceInfo fs/cifs/transport.c: For smb_command 50 fs/cifs/transport.c: Sending smb of length 68 fs/cifs/connect.c: rfc1002 length 0x44) fs/cifs/cifssmb.c: In QFSAttributeInfo fs/cifs/transport.c: For smb_command 50 fs/cifs/transport.c: Sending smb of length 68 fs/cifs/connect.c: rfc1002 length 0x50) fs/cifs/connect.c: CIFS VFS: leaving cifs_mount (xid = 137594) rc = 0 fs/cifs/inode.c: CIFS VFS: in cifs_read_inode as Xid: 137595 with uid: 0 fs/cifs/inode.c: Getting info on fs/cifs/transport.c: For smb_command 50 fs/cifs/transport.c: Sending smb of length 74 fs/cifs/connect.c: rfc1002 length 0x8a) fs/cifs/inode.c: Old time 0 fs/cifs/inode.c: New time 203752098 fs/cifs/inode.c: Directory inode usbcore: deregistering driver usb-storage uhci_hcd 0000:00:1d.2: remove, state 1 usb usb4: USB disconnect, address 1 uhci_hcd 0000:00:1d.2: USB bus 4 deregistered uhci_hcd 0000:00:1d.1: remove, state 1 usb usb3: USB disconnect, address 1 uhci_hcd 0000:00:1d.1: USB bus 3 deregistered uhci_hcd 0000:00:1d.0: remove, state 1 usb usb2: USB disconnect, address 1 uhci_hcd 0000:00:1d.0: USB bus 2 deregistered ehci_hcd 0000:00:1d.7: remove, state 1 usb usb1: USB disconnect, address 1 ehci_hcd 0000:00:1d.7: USB bus 1 deregistered ieee1394: Node removed: ID:BUS[0-00:1023] GUID[00061b0201229cb4] Stopping tasks: ===============================================<7> fs/cifs/connect.c: cifsd thread killed ===============================================================================| pnp: Device 00:0b disabled. pnp: Device 00:0a disabled. Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Back to C! PCI: Setting latency timer of device 0000:00:1e.0 to 64 ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11 ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 5 (level, low) -> IRQ 5 PCI: Setting latency timer of device 0000:00:1f.5 to 64 ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11 ACPI: PCI Interrupt 0000:02:00.1[B] -> Link [LNKB] -> GSI 5 (level, low) -> IRQ 5 ACPI: PCI Interrupt 0000:02:00.2[C] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11 pnp: Device 00:07 does not supported activation. pnp: Device 00:08 does not supported activation. pnp: Device 00:0a activated. pnp: Device 00:0b activated. Restarting tasks... done ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, 1.1.4 ipw2100: Copyright(c) 2003-2005 Intel Corporation ACPI: PCI Interrupt 0000:02:02.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11 ipw2100: Detected Intel PRO/Wireless 2100 Network Connection ACPI: PCI Interrupt 0000:02:00.2[C] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11 ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[11] MMIO=[c0202000-c02027ff] Max Packet=[2048] IR/IT contexts=[4/4] ACPI: PCI Interrupt 0000:00:1d.7[D] -> Link [LNKH] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.7 to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: debug port 1 PCI: cache line size of 32 is not supported by device 0000:00:1d.7 ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:1d.7: irq 11, io mem 0xc0000000 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: new device found, idVendor=0000, idProduct=0000 usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb1: Product: EHCI Host Controller usb usb1: Manufacturer: Linux 2.6.16-rc6-git1-2-default ehci_hcd usb usb1: SerialNumber: 0000:00:1d.7 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 6 ports detected USB Universal Host Controller Interface driver v2.3 ACPI: PCI Interrupt 0000:00:1d.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.0 to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.0: irq 11, io base 0x00001800 usb usb2: new device found, idVendor=0000, idProduct=0000 usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb2: Product: UHCI Host Controller usb usb2: Manufacturer: Linux 2.6.16-rc6-git1-2-default uhci_hcd usb usb2: SerialNumber: 0000:00:1d.0 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.1[B] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.1 to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.1: irq 11, io base 0x00001820 usb usb3: new device found, idVendor=0000, idProduct=0000 usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb3: Product: UHCI Host Controller usb usb3: Manufacturer: Linux 2.6.16-rc6-git1-2-default uhci_hcd usb usb3: SerialNumber: 0000:00:1d.1 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:1d.2[C] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.2 to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1d.2: irq 11, io base 0x00001840 usb usb4: new device found, idVendor=0000, idProduct=0000 usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1 usb usb4: Product: UHCI Host Controller usb usb4: Manufacturer: Linux 2.6.16-rc6-git1-2-default uhci_hcd usb usb4: SerialNumber: 0000:00:1d.2 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected Initializing USB Mass Storage driver... usbcore: registered new driver usb-storage USB Mass Storage support registered. eth0 renamed to funk ieee1394: Host added: ID:BUS[0-00:1023] GUID[00061b0201229cb4]
Steven, what's the status of this? How shall we proceed?
I am puzzled - your trace shows cifsd getting killed rather than being put to sleep - I am not sure why - it may be something I don't understand about the suspend sequence of events but basically if cifsd process dies (is not killed) and it is not due to unmount - we will get EIO on that mount until it is umounted and mounted again.
Steve, can you reproduce it? Stopping tasks: ===============================================<7> fs/cifs/connect.c: cifsd thread killed ===============================================================================| pnp: Device 00:0b disabled. Is it opclock or dnotify thread dying? ... no, it should be the one in connect.c . ...ouch, could someone fix that code to follow codingstyle, please? AFAICS, kernel returns -EINTR from _recvmsg, and cifsd commits suicide. Can you try attached patch?
Created attachment 89236 [details] this may help...
Patch makes sense and looks fine. Fixing inconsistencies in the number of spaces etc. in fs/cifs/connect.c mostly had been put off because about 1/2 of connect.c is going away with the rewrite of the SMB session setup code necessitated by the addition of legacy lanman (for Windows 9x servers to obsolete smbfs), plaintext password and strong ntlmv2 (and in the future kerberos) password support.
Now in cifs development tree (should be autopulled into mm soon, will also push it with the cifs 1.44 into mainline when 2.6.18 opens up)
(Warning, the patch also changes code, so cifs should now survive the suspend/resume cycle. But someone -- not me -- needs to check that :-).
if you send me a URL to a kernel-default i386 RPM I'm volunteering to test it asap :-).
Don't have that - but I made a version for backlevel (earlier) kernels at http://us1.samba.org/samba/ftp/cifs-cvs/cifs-1.44.tar.gz
CIFS had one path where reconnect would not work after resume - but the code (1.45) now includes the fixes for suspend/resume including reconnect errors (a missing try_to_freeze). Will be including in the cifs rollup that we are testing.
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Closing old LATER+REMIND bugs as WONTFIX - if you still plan to work on it, feel free to reopen and set to ASSIGNED. In case the report saw repeated reopen comments, it's due to bugzilla timing out on the huge request ;(