|
Bugzilla – Full Text Bug Listing |
| Summary: | named segfaults upon starting | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.2 | Reporter: | Martin Jedamzik <martin.jedamzik> |
| Component: | Network | Assignee: | E-mail List <kernel-maintainers> |
| Status: | RESOLVED DUPLICATE | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | alexandre, benvanderjagt, edwin, jreuter, katzyn, martin.jedamzik, meissner, mike, petr.m, ug |
| Version: | Factory | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 11.2 | ||
| Whiteboard: | |||
| Found By: | Field Engineer | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | corefile of named; /var/log/messages with a trace | ||
|
Description
Martin Jedamzik
2009-10-04 09:36:31 UTC
Created attachment 320955 [details]
corefile of named
Please test again with the next milestone and not with factory. I cannot reproduce this here on milestone 8.

On Milestone 8 named works for me just fine. But not on 11.2-RC1 (x86_64):

# tail /var/log/messages
Oct 18 11:47:54 main named[10418]: starting BIND 9.6.1-P1 -t /var/lib/named -u named
Oct 18 11:47:54 main named[10418]: built with '--prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib64' '--includedir=/usr/include/bind' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-openssl' '--enable-threads' '--with-libtool' '--enable-runidn' '--with-libxml2' '--with-dlz-mysql' 'CFLAGS=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fno-strict-aliasing' 'LDFLAGS=-L/usr/lib64'
Oct 18 11:47:54 main kernel: [47646.058637] type=1503 audit(1255834074.636:21): operation="capable" pid=10418 parent=10417 profile="/usr/sbin/named" name="sys_resource"
Oct 18 11:47:54 main kernel: [47646.058679] named[10418]: segfault at 7f3af4fff92c ip 00007f3af4ddef77 sp 00007fff523e8f50 error 4
Oct 18 11:47:54 main kernel: [47646.058705] note: named[10418] exited with preempt_count 1

Also, this problem does not appear on 11.2-RC1/i586.

I think that's a kernel problem:

note: named[27799] exited with preempt_count 1
BUG: scheduling while atomic: named/27799/0x10000002

I'll attach the messages file.

Created attachment 323030 [details]
/var/log/messages with a trace
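To check whether another machine shows the same symptoms, the three kernel messages quoted above can be searched for directly. The following is only a minimal sketch: the function name `find_named_crash` is made up for illustration, and the patterns are taken from the log lines in this report, so other kernel versions may word them differently.

```shell
#!/bin/sh
# Print log lines that match the failure signatures quoted in this report:
#   named[PID]: segfault at ...
#   note: named[PID] exited with preempt_count N
#   BUG: scheduling while atomic: named/PID/...
# find_named_crash is a hypothetical helper name, not part of any package.
# $1 is the log file to scan; it defaults to /var/log/messages as used above.
find_named_crash() {
    logfile="${1:-/var/log/messages}"
    grep -E 'named\[[0-9]+\]: segfault at|named\[[0-9]+\] exited with preempt_count|scheduling while atomic: named/' "$logfile"
}
```

Calling `find_named_crash` with no argument scans /var/log/messages (which usually requires root); it prints nothing and returns a non-zero status if none of the signatures are present.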
*** Bug 548573 has been marked as a duplicate of this bug. ***

Still happening on 11.2 RC2.

(In reply to comment #7)
> I think that's a kernel problem:

The problem does not occur on the default kernel version, only on the desktop-flavour ...

I built 2.6.32-rc5-git5 using a copy of the config from 2.6.31.5-0.1-desktop, and the error doesn't occur. Sorry that I don't know how to find what kernel patch fixes it, but at least it's a functional workaround for servers that need/desire low latency.

(In reply to comment #12)
> I built 2.6.32-rc5-git5 using a copy of the config from 2.6.31.5-0.1-desktop,
> and the error doesn't occur. Sorry that I don't know how to find what kernel
> patch fixes it, but at least it's a functional workaround for servers that
> need/desire low latency.

@Benjamin please tell us what you need to reproduce ... As I stated earlier, I have two systems available for testing. If you need any assistance, just tell me.

@Martin Jedamzik
Do you want me to reproduce the error or solution? For my solution, after installing openSUSE 11.2-RC2 and all available updates, I installed make, gcc-c++, and ncurses-devel. I downloaded Kernel 2.6.32-rc5 and the patch to git5, patched the source, and copied it to /usr/src/linux-2.6.32-rc5-git5. I copied the /boot/config-2.6.31.5-0.1-desktop to /usr/src/linux-2.6.32-rc5-git5/.config and ran "make menuconfig". I didn't change anything, just exited and saved, so that it would fill in the blanks for me. I ran "make", "make install", "make modules_install", and then mkinitrd (I forget the command line, but it was something like "mkinitrd -i /boot/initrd-2.6.32.... -k /boot/vmlinuz-2.6.32..."). I modified /boot/grub/menu.lst so that the new kernel would boot first, and I added the line "initrd /boot/initrd-2.6.32-rc5-git5-0.1-desktop" to the entry. I rebooted, and everything seems to be working perfectly, including named.
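The rebuild described in the comment above can be written down roughly as follows. This is a hedged sketch, not the commenter's actual script: the helper names `step` and `rebuild_kernel`, the `RUN` switch, and the use of `make -C` instead of changing into the source directory are assumptions made for illustration. The order of the make targets follows the comment. Downloading and patching the source, the mkinitrd call, and the menu.lst edit are left out because the comment does not give their exact command lines.

```shell
#!/bin/sh
# Sketch of the kernel rebuild from the comment above.
# By default the commands are only printed; set RUN=1 to execute them (as root).
SRC=/usr/src/linux-2.6.32-rc5-git5            # source tree path named in the comment
OLD_CONFIG=/boot/config-2.6.31.5-0.1-desktop  # desktop-flavour config named in the comment

# step (hypothetical helper): run the command if RUN=1, otherwise just print it.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# rebuild_kernel (hypothetical helper): the steps in the order the comment lists them.
rebuild_kernel() {
    step cp "$OLD_CONFIG" "$SRC/.config"
    step make -C "$SRC" menuconfig       # exit and save so new options get default values
    step make -C "$SRC"
    step make -C "$SRC" install
    step make -C "$SRC" modules_install
}
```

Running `rebuild_kernel` without `RUN=1` prints the five commands so they can be reviewed before anything is changed on the system.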
I don't know how to pick and choose particular kernel patches without botching up my kernel sources, but somewhere between the 2.6.31.5 and 2.6.32-rc5-git5 kernels a patch fixed the bug causing the "scheduling while atomic" crash when trying to use named. I guess it's possible to build half-way-in-between kernels until the exact version with the patch is discovered, then add that to the final kernel in openSUSE, since making a kernel version number change this late in the game is a bad idea. If I can find the time, I'll do that, but I don't know if I can in the next week.

(In reply to comment #14)
> @Martin Jedamzik
> Do you want me to reproduce the error or solution? For my solution, ....

@Benjamin
Sorry, this is a misunderstanding, mostly from my side. :-) My comment (11) was meant as a hint to Uwe and to make things easier. At least it works with the default kernel, so the reason for the segfault must be one of the desktop optimizations, mustn't it? The big mistake on my side was comment 13, thinking you were responsible for fixing the bug. Sorry! Since you found out that the segfault does not happen anymore with 2.6.32-rc5-git5, I'm quite sure Uwe will present us a working solution quite soon. Unfortunately I'm still able to reproduce the bug, so if a test system is needed ...

I tried a self-compiled vanilla 2.6.31 and 2.6.32 with our desktop flavour settings, and both times bind did not crash. It must be one of our patches, and I will try to find out which one. Since I have very little experience with kernel patches, it will take quite some time.

I had the same problem; it seems to be caused by the AppArmor profile for BIND. It can be solved by granting capability sys_resource in /etc/apparmor.d/usr.sbin.named

It happens on 11.2 GM as well (kernel 2.6.31.5-0.1-desktop).

See also http://lists.opensuse.org/opensuse-de/2009-11/msg00722.html
Also from this thread it looks like AppArmor is causing the issue.
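For the AppArmor workaround mentioned above, a small check like the one below shows whether the profile already grants the capability and, if not, which rule would be needed. It is a sketch under assumptions: `has_sys_resource` and `check_profile` are hypothetical helper names, and the pattern only recognises a rule that lists `sys_resource` on its own line, not one that combines several capabilities in a single rule.

```shell
#!/bin/sh
# has_sys_resource (hypothetical helper): succeed if the profile file $1 contains
# an uncommented rule of the form "capability sys_resource,".
has_sys_resource() {
    grep -Eq '^[[:space:]]*capability[[:space:]]+sys_resource[[:space:]]*,' "$1"
}

# check_profile (hypothetical helper): report the state of the profile and,
# if the rule is missing, print the line to add. The default path is the one
# named in the comment above.
check_profile() {
    profile="${1:-/etc/apparmor.d/usr.sbin.named}"
    if has_sys_resource "$profile"; then
        echo "sys_resource already granted in $profile"
    else
        echo "add this line inside the profile block of $profile:"
        echo "  capability sys_resource,"
    fi
}
```

After editing the profile, it would typically have to be reloaded, for example with `apparmor_parser -r /etc/apparmor.d/usr.sbin.named`, before restarting named. The thread does not say which reload method the commenters used.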
(In reply to comment #19)
> See also http://lists.opensuse.org/opensuse-de/2009-11/msg00722.html Also from
> this thread it looks like AppArmor is causing the issue.

Don't know about that. I installed kernel 2.6.31.5-0.1-default and the problem is gone.

I think we can wait and pick it up with the next maintenance update. Workaround is available.

Why wasn't this fix in the security update? It was checked in on Nov 25!

If this needs a kernel fix, then no separate SWAMPID is required. It will just be in the next kernel update if a kernel developer has committed a fix.

This is the same AppArmor issue documented in bnc#557302

*** This bug has been marked as a duplicate of bug 557302 *** |