Bugzilla – Bug 115200
IPsec crashes kernel
Last modified: 2006-02-16 15:54:30 UTC
The kernel crashes during startup of IPsec when trying to transmit packets over the tunnel before it is fully set up.
It appears that just enough load is needed to trigger the bug at any time, not only during startup.
Is this on PowerPC?
Perhaps related: <http://marc.theaimsgroup.com/?l=linux-kernel&m=112578092026623&w=2>
Could you provide one or more oopses please?
There were none; not even xmon said anything.
I've contacted Olaf Hering and we can use lychee (attached to the HMC) to test it again on PPC. BTW: I've been running beta4 for 4 days on my gateway (ipsec + nat-t) and have no problems with it. It is a PII dual machine. On i386 it seems to work without problems. The only exception is that the default setting of "plutowait=yes" was changed to "no" in the new openswan version. This causes the tunnels to be created on demand, so transmission of the first packet per tunnel fails (e.g. ping reports a "Resource temporarily unavailable" error). Setting it back to "plutowait=yes" works around the problem.
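The plutowait workaround mentioned above would look roughly like this in /etc/ipsec.conf (a sketch only; the exact section layout depends on your existing openswan 2.x config):

```
# /etc/ipsec.conf (sketch)
config setup
        # openswan 2.4 changed the default to plutowait=no, which brings
        # tunnels up on demand and loses the first packet per tunnel;
        # restoring the old default makes pluto wait for each negotiation
        # to finish during startup.
        plutowait=yes
```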
Same with my Beta 4 x86 laptop.
plutowait does not make any difference.
Yes, plutowait changes the pluto behaviour (user space) only, and it seems there is a kernel problem.
The system running openswan is behind a SuSE Linux 9.3 router/gateway, and I have to use IPsec NAT-T as the internal network is a private one. I also have plutowait=yes in the config. SysRq is active, but doesn't help in this situation; I have to turn off the system. /etc/sysconfig/syslog: KERNEL_LOGLEVEL=7
NAT-T also does not make a difference.
I've reproduced the problem, but not on the system where I tried to do it :-) My ipsec gw to suse-ras crashed while I tried to "torture" a test system. As I wrote, it transferred about 1.2GB in the last 4 days, and today I rsynced the beta4 CD ISOs without any problems:

# l beta4-cds.log
-rw-r--r-- 1 root root 292 Sep 5 12:55 beta4-cds.log
# tail -2 beta4-cds.log
wrote 116 bytes read 3594998622 bytes 596729.81 bytes/sec
total size is 3594559488 speedup is 1.00

About 15 min after the sync was finished, the machine stopped. No reaction to SysRq or Num Lock, and it was not possible to wake up the screen saver on the text console. An attempt to boot the system failed; the last boot message I was able to read was:

Boot logging started on /dev/tty1(/dev/console) ...

After this, two messages were displayed in a loop:

ERROR: no hit for procs_blocked
ERROR: no hit for procs_running

A reiserfsck --rebuild-tree /dev/sda3 (a normal reiserfsck displayed no errors) put 13 files into /lost+found... There are no log messages about any error. The last message from Pluto is that it received a "Delete SA payload: replace IPSEC State" and tried to reinit the connection.
The test machines: PPC with Beta4 kernel+openswan vs. 9.3-i386 + openswan-2.4.0 are still working fine with load average: 4.13, 2.35, 1.64 on the PPC machine.
Jiri, can you take a look at this please? There are 3 machines that we can use: g167.suse.de (currently 9.3-i386), lychee.suse.de and cube.suse.de, both PPC64 and attached to the HMC, reachable via "vncviewer boettger.suse.de:42". lychee.suse.de currently runs a tunnel to g167 and has the xmon debugger enabled.
Now, there is also a tunnel between g167 and cube.suse.de...
"ERROR: no hit for procs_blocked" seems to come from /sbin/startpar trying to read /proc/stat. It seems something is hosing the system memory big time. This is probably related to the L-K discussion referenced in comment #3, as that seems to be a case of bad memory corruption as well (the fact that it died in sha1_update is probably a red herring).
Okay, I have an oops now. Simply did a ping -i 0.01 -s 4096 10.10.103.167 on cube.
Created attachment 48814 [details] Oops
This is 100% reproducible. It oopsed in the exact same location, with the exact same backtrace. (The oops attached above is a bit mangled though - my mouse sometimes pastes things twice; in this case please ignore everything after the first syscall_exit.)
The way an skbuff is laid out in memory, we have an skb head pointing to some memory that holds the packet data, followed by some "shared info". If we write past the end of the packet data, we clobber this shared info, which among other things contains an array of struct page pointers. Calling put_page on random "addresses" is a great way to corrupt your memory.
Now we just need to find out if this is the tunnel mode stuff writing past the end of the packet, or the ESP code. Marius, could you check if this happens in transport mode as well? Or in tunnel mode, but AH mode only? I will be in a phone call in 2 minutes from now, so feel free to take cube.
Created attachment 48858 [details] Oops - cube in transport mode
If I change the config to auth=ah type=tunnel, no oops happens; the packets are transferred:

02:52:16.762251 00:00:1c:b5:6f:96 > 00:02:55:6f:97:29, ethertype IPv4 (0x0800), length 98: IP (tos 0x0, ttl 64, id 15086, offset 1480, flags [none], length: 84) 10.10.103.167 > 10.10.1.180: ah
02:52:16.762670 00:00:1c:b5:6f:96 > 00:02:55:6f:97:29, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 15087, offset 0, flags [+], length: 1500) 10.10.103.167 > 10.10.1.180: AH(spi=0xed7cccc2,sumlen=16,seq=0x10da): ESP(spi=0xcb3fa21b,seq=0x10da)
It happens also with the vanilla kernel; mailed netdev.
2.6.12 works, 2.6.13-rc3 crashed like this:

cpu 0x1: Vector: 300 (Data Access) at [c00000004b92f220]
    pc: c000000000211950: .memcpy+0x1d0/0x280
    lr: d0000000001a87e4: .esp_output+0x3f4/0x560 [esp4]
    sp: c00000004b92f4a0
   msr: 8000000000009032
   dar: 4c766234
 dsisr: 42000000
  current = 0xc00000004b9c8030
  paca    = 0xc000000000517800
    pid   = 3949, comm = ping
enter ? for help
1:mon> t
[link register   ] d0000000001a87e4 .esp_output+0x3f4/0x560 [esp4]
[c00000004b92f4a0] d0000000001a8548 .esp_output+0x158/0x560 [esp4] (unreliable)
[c00000004b92f580] c0000000003b04e8 .xfrm4_output+0x238/0x450
[c00000004b92f640] c00000000036c188 .ip_push_pending_frames+0x4d8/0x590
[c00000004b92f700] c00000000038fbac .raw_sendmsg+0x66c/0x840
[c00000004b92f830] c00000000039b7cc .inet_sendmsg+0x6c/0xb0
[c00000004b92f8d0] c00000000032a41c .sock_sendmsg+0x16c/0x1c0
[c00000004b92fae0] c00000000032b4c4 .sys_sendmsg+0x204/0x340
[c00000004b92fd10] c00000000034da24 .compat_sys_sendmsg+0x14/0x30
[c00000004b92fd90] c00000000034e400 .compat_sys_socketcall+0x290/0x2b0
[c00000004b92fe30] c00000000000d600 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 0000000007f518f8
SP (ffbd38a0) is in userspace

Looking further.
This is a different bug that was fixed in rc4, see <http://marc.theaimsgroup.com/?l=linux-kernel&m=112183465507337&w=2>.
it's in the first 34 of ~olh/patch*
Created attachment 49008 [details] patch-2.6.13-rc2-git3-21 this causes the oops
The patch Herbert posted to the netdev list fixes the crash. Odd, I can ping ok with vanilla 2.6.13, but I can't with our current cvs + the patch from Herbert. cube runs the "fixed" kernel. Please check why ping doesn't work, while it does with vanilla 2.6.13.

g224:~ # ssh root@g167.suse.de "rcipsec restart"
g224:~ # ssh root@cube.suse.de "rcipsec restart ; sync"
g224:~ # ssh root@cube.suse.de "sync ; sleep 1 ; ping -i 0.01 -s 4096 10.10.103.167"
Created attachment 49017 [details] ipsec-crash-bug115200.patch patch from Herbert which fixes the crash in put_page
re comment #29: g167 replies to the echo request, cube seems to unwrap it properly, but something's still wrong...
The unwrapped and reassembled packet has a bad IP checksum. Still checking.
Marius, can you confirm that this works properly with 9.3 talking to 9.3?
When I use "ping -s 1500", I can see two packets coming into cube:

Packet 1: fragment, 1480 octets, checksum 1234
Packet 2: fragment, 48 bytes, offset 1480, checksum ABCD

This is then reassembled into one big packet of 1508 bytes, simply copying the checksum from the first packet.
Can I reboot into a vanilla kernel? Just type reboot if yes.
Works with the vanilla kernel, so it must be one of our patches.
interesting...
Can you please try reverting the following patches:

patches.suse/netfilter-02-output-hooks
patches.suse/netfilter-03-input-hooks
patches.suse/netfilter-04-policy-lookup
patches.suse/netfilter-05-policy-checks
patches.suse/netfilter-06-policy-match
Yes, this helps.

- patches.suse/netfilter-02-output-hooks
- patches.suse/netfilter-03-input-hooks
- patches.suse/netfilter-04-policy-lookup
- patches.suse/netfilter-05-policy-checks
- patches.suse/netfilter-06-policy-match
++okir patches.suse/netfilter-02-output-hooks
++okir patches.suse/netfilter-03-input-hooks
++okir patches.suse/netfilter-04-policy-lookup
++okir patches.suse/netfilter-05-policy-checks
++okir patches.suse/netfilter-06-policy-match
(In reply to comment #33)
> Marius, can you confirm that this works properly with 9.3 talking to 9.3?

Yes, it works fine on 9.3 (openswan-2.4.0rc4 => openswan-2.2.0 in this case):

root@Nirwana:~ # uname -a
Linux Nirwana 2.6.11.4-21.8-smp #1 SMP Tue Jul 19 12:42:37 UTC 2005 i686 i686 i386 GNU/Linux
root@Nirwana:~ # rpm -q openswan
openswan-2.4.0-4.1
root@Nirwana:~ # ping -i 0.01 -s 4096 ras.suse.de
PING ras.suse.de (10.0.9.7) 4096(4124) bytes of data.
4104 bytes from 10.0.9.7: icmp_seq=1 ttl=62 time=98.8 ms
[...]
--- ras.suse.de ping statistics ---
30 packets transmitted, 6 received, 80% packet loss, time 361ms
rtt min/avg/max/mdev = 98.807/192.206/286.146/64.127 ms, pipe 24
root@Nirwana:~ # ping -i 0.01 -s 4096 app-team
PING app-team.suse.de (10.10.2.53) 4096(4124) bytes of data.
4104 bytes from 10.10.2.53: icmp_seq=1 ttl=62 time=96.2 ms
[...]
--- app-team.suse.de ping statistics ---
19 packets transmitted, 3 received, 84% packet loss, time 232ms
rtt min/avg/max/mdev = 96.263/130.920/166.074/28.503 ms, pipe 13

I've tried other sizes (1024, 1500, 2048, ...) as well.
Ahmm.. the ipsec peer is suse-ras (sles9) in the above case.
NO, it doesn't. I've tested it on 9.3-i386 <=> 9.3-i386 now, using different openswan versions:

openswan 2.4.0rc4 vs. 2.4.0rc4
openswan 2.2.0+fix vs. openswan 2.2.0+fix (enckeylen, Bug #95863)
openswan 2.2.0(DVD) vs. openswan 2.2.0(DVD)

root@Xanthos:~ # uname -a
Linux Xanthos 2.6.11.4-21.8-smp #1 SMP Tue Jul 19 12:42:37 UTC 2005 i686 i686 i386 GNU/Linux
Trinity:~ # uname -a
Linux Trinity 2.6.11.4-21.7-smp #1 SMP Thu Jun 2 14:23:14 UTC 2005 i686 i686 i386 GNU/Linux

=> Using -s 1024 works, using (1500 or) 4096 does not. The peer (xanthos) does not respond to the ping, independently of the openswan version:

16:04:05.451754 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 40279, offset 0, flags [+], length: 1500) 192.168.110.5 > 192.168.110.1: ESP(spi=0x8f1cbcd2,seq=0xbdd3)
16:04:05.451939 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 86: IP (tos 0x0, ttl 64, id 40279, offset 1480, flags [none], length: 72) 192.168.110.5 > 192.168.110.1: esp
16:04:05.452384 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 40278, offset 0, flags [+], length: 1500) 192.168.110.5 > 192.168.110.1: icmp 1480: echo request seq 1
16:04:05.452488 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 40280, offset 0, flags [+], length: 1500) 192.168.110.5 > 192.168.110.1: ESP(spi=0x8f1cbcd2,seq=0xbdd4)
16:04:05.452506 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 86: IP (tos 0x0, ttl 64, id 40280, offset 1480, flags [none], length: 72) 192.168.110.5 > 192.168.110.1: esp
16:04:05.452506 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 40278, offset 1480, flags [+], length: 1500) 192.168.110.5 > 192.168.110.1: icmp
16:04:05.453105 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1230: IP (tos 0x0, ttl 64, id 40281, offset 0, flags [none], length: 1216) 192.168.110.5 > 192.168.110.1: ESP(spi=0x8f1cbcd2,seq=0xbdd5)
16:04:05.453105 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 1178: IP (tos 0x0, ttl 64, id 40278, offset 2960, flags [none], length: 1164) 192.168.110.5 > 192.168.110.1: icmp
16:04:05.453105 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype IPv4 (0x0800), length 4138: IP (tos 0x0, ttl 64, id 40278, offset 0, flags [none], length: 4124, bad cksum 5a73 (->7033)!) 192.168.110.5 > 192.168.110.1: icmp 4104: echo request seq 1
16:04:10.451210 00:40:05:40:91:02 > 00:e0:81:24:21:ca, ethertype ARP (0x0806), length 60: arp who-has 192.168.110.1 tell 192.168.110.5
16:04:10.451238 00:e0:81:24:21:ca > 00:40:05:40:91:02, ethertype ARP (0x0806), length 42: arp reply 192.168.110.1 is-at 00:e0:81:24:21:ca
But no crash happens here.
Good. So the problem is that these policy patches cause the reassembled packet to be routed through ip_rcv once more, which checks the IP csum. This needs an added ip_send_check somewhere (preferably somewhere in xfrm_input.c). I will not have any time to debug this tomorrow, so if someone would like to try this, be my guest. Jiri?
Created attachment 49085 [details] Possible patch What the heck. Here's a patch you may want to try.
Sorry, I won't be able to look into this until next week.
I'm pretty positive the patch above should fix it. If it doesn't, I'll get back to this tomorrow.
this patch fixes it.
Great! Thanks for verifying this. I committed the patch to the HEAD branch.
I still have problems with ipsec and RC4. After some time (10-15 minutes) my system freezes, and this only happens if I use ipsec.
That's nice. Please open a new bug; this one is about a different, already fixed crash.
Still not fixed.
Yes.
Please continue the discussion in bug #117749.
No, that's a different bug.
I'm willing to be educated... Please describe what symptoms you are seeing.
See comment #1.
I've updated the system to 2.6.14-rc5-20051021130504-default. No crash till now, but again several 'Resource temporarily unavailable' errors.
I'm sorry, but with all the other stuff going on I currently don't have the time to debug this properly. Marius and/or Jiri, can you look into this please and try to reproduce?
I've tried it in the last two weeks. What happens is not directly a crash but a system lockup/deadlock, and it depends on ipsec / ipsec triggers it. Most of my tests were with openswan with NAT traversal behind a router, but it happens with racoon as well. Without ipsec the machines run stable.

I've tried it with the original 10.0 kernel-smp-2.6.13-15, the current one (kernel-smp-2.6.13-15.4) from /work/CDs/all/full-10.0-i386 with nmi_watchdog=1 enabled, as well as with kernel-debug-2.6.13-15.3 with kdb=on. I've never seen any oops. The kind of lockup differs every time. Sometimes it happens directly while ipsec establishes the SAs, sometimes it works fine for multiple hours and "stops without any reason".

Interesting is that the lockups differ from each other. Every time, the machine closes all connections (e.g. an ssh to it), and the serial console and keyboard do not respond any more. In many cases it is also not possible to ping the machine. But sometimes (rarely) I can still access the internet behind the machine, except for the tunnelled networks and the machine itself.

For example, yesterday night I transferred 18GB over the tunnel and then the tunnel expired because the IP changed on the router. In the hope that the current kernel-smp-2.6.13-15.4 works better, I restarted the tunnel and contacted Lars via IRC to tell him about it. And then, after half an hour, the lockup happened again. Nothing worked (keyboard and serial console were not responding) and I phoned Uwe to write Lars that I had a lockup again. He wrote it, and after ~2 min the message he wrote appeared in my IRC client. This means that the tunnel "works", but the machine is busy with something. Strange.
What's the status of this bug? Isn't this a dup of bug #117749?
See comment #56
Alright, I see...
I committed a patch to CVS that may fix this problem. Please try the upcoming KOTD.
Christoph, is this fixed?
No, it isn't. With the patch from attachment #49085 I didn't have any crashes -- but that patch had some bad side effects in ipv6 AFAIK; that's why it's disabled right now. Olaf?
I've been using/testing the latest patch in STABLE:

* Mo Feb 06 2006 - mason@suse.de
- patches.fixes/xfrm-endless-loop: Fix infinite loop in xfrm_lookup (117749, 115200). Add error_nopol: back

and didn't have a single crash since then. Just now I saw these messages in /var/log/messages:

Feb 8 13:09:56 t41p klogd: xfrm_lookup bailing out after 11 loops
Feb 8 13:09:56 t41p klogd: [<c0277dac>] xfrm_lookup+0x9c/0x473
Feb 8 13:09:56 t41p klogd: [<c024fc51>] ip_output+0x1e3/0x242
Feb 8 13:09:56 t41p klogd: [<c0115912>] default_wake_function+0x0/0xc
Feb 8 13:09:56 t41p klogd: [<c0276d49>] xfrm4_output_finish+0x8a/0x36a
Feb 8 13:09:56 t41p klogd: [<c0248738>] ip_route_output_flow+0x77/0x8b
Feb 8 13:09:56 t41p klogd: [<c02643bb>] raw_sendmsg+0x2e6/0x70f
Feb 8 13:09:56 t41p klogd: [<c0263f61>] raw_recvmsg+0x6e/0x171
Feb 8 13:09:56 t41p klogd: [<c026af1e>] inet_sendmsg+0x35/0x3f
Feb 8 13:09:56 t41p klogd: [<c0229dde>] sock_sendmsg+0xd2/0xec
Feb 8 13:09:56 t41p klogd: [<c0127a23>] autoremove_wake_function+0x0/0x2d
Feb 8 13:09:56 t41p klogd: [<c0127a23>] autoremove_wake_function+0x0/0x2d
Feb 8 13:09:56 t41p klogd: [<c0280df5>] _spin_unlock_irq+0x5/0x7
Feb 8 13:09:56 t41p klogd: [<c027fc2b>] schedule+0x4dd/0x4f7
Feb 8 13:09:56 t41p klogd: [<c01ae0c3>] copy_from_user+0x54/0x80
Feb 8 13:09:56 t41p klogd: [<c022fa5a>] verify_iovec+0x3e/0x70
Feb 8 13:09:56 t41p klogd: [<c022a209>] sys_sendmsg+0x186/0x1df
Feb 8 13:09:56 t41p klogd: [<c022aa72>] sys_recvmsg+0x1a1/0x1b5
Feb 8 13:09:56 t41p klogd: [<c0240002>] tc_ctl_tclass+0x1b8/0x1dc
Feb 8 13:09:56 t41p klogd: [<c0139672>] get_page_from_freelist+0x6f/0x2c4
Feb 8 13:09:56 t41p klogd: [<c012a110>] ktime_get+0x10/0x2c
Feb 8 13:09:56 t41p klogd: [<c0280df5>] _spin_unlock_irq+0x5/0x7
Feb 8 13:09:56 t41p klogd: [<c027fbe7>] schedule+0x499/0x4f7
Feb 8 13:09:56 t41p klogd: [<c0139510>] __free_pages+0x3f/0x107
Feb 8 13:09:56 t41p klogd: [<c0280e34>] _spin_lock_irqsave+0x9/0xd
Feb 8 13:09:56 t41p klogd: [<c022b236>] sys_socketcall+0x183/0x19e
Feb 8 13:09:56 t41p klogd: [<c011baa1>] sys_gettimeofday+0x27/0x53
Feb 8 13:09:56 t41p klogd: [<c0102a5b>] sysenter_past_esp+0x54/0x79
These messages should have vanished with current kernels; can somebody please retest whether this is now fixed?
AFAIK the upstream patch hasn't been merged into the SUSE tree yet.
As far as I can tell it seems to be fixed; it didn't crash at least after this change:

* Do Feb 09 2006 - okir@suse.de
- patches.fixes/xfrm-endless-loop: Updated with upstream patch (117749, 115200).

But I still have recurring "connection hangs", where we run into some kind of timeout (with ras-beta), which of course doesn't have much to do with this bug ;) Feel free to close FIXED.
Until someone proves me otherwise, I consider this a duplicate of 117749.

*** This bug has been marked as a duplicate of 117749 ***