Bug 544181 - named segfaults upon starting
Summary: named segfaults upon starting
Status: RESOLVED DUPLICATE of bug 557302
Duplicates: 548573
Alias: None
Product: openSUSE 11.2
Classification: openSUSE
Component: Network
Version: Factory
Hardware: x86-64 openSUSE 11.2
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-04 09:36 UTC by Martin Jedamzik
Modified: 2009-12-13 21:52 UTC
CC List: 10 users

See Also:
Found By: Field Engineer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
corefile of named (1.21 MB, application/octet-stream)
2009-10-04 09:38 UTC, Martin Jedamzik
/var/log/messages with a trace (82.77 KB, application/octet-stream)
2009-10-19 09:38 UTC, Uwe Gansert

Description Martin Jedamzik 2009-10-04 09:36:31 UTC
named always segfaults upon starting (bind-9.6.1P1-1.9.x86_64)

The core file is attached. supportconfig won't finish (it hangs at DNS).
Comment 1 Martin Jedamzik 2009-10-04 09:38:22 UTC
Created attachment 320955 [details]
corefile of named
Comment 2 Uwe Gansert 2009-10-09 10:17:41 UTC
Please test again with the next milestone and not with Factory.
I cannot reproduce this here on Milestone 8.
Comment 5 Eugene Ryazanov 2009-10-18 03:21:01 UTC
On Milestone 8 named works for me just fine.

But not on 11.2-RC1 (x86_64):

# tail /var/log/messages
Oct 18 11:47:54 main named[10418]: starting BIND 9.6.1-P1 -t /var/lib/named -u named
Oct 18 11:47:54 main named[10418]: built with '--prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib64' '--includedir=/usr/include/bind' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-openssl' '--enable-threads' '--with-libtool' '--enable-runidn' '--with-libxml2' '--with-dlz-mysql' 'CFLAGS=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fno-strict-aliasing' 'LDFLAGS=-L/usr/lib64'
Oct 18 11:47:54 main kernel: [47646.058637] type=1503 audit(1255834074.636:21): operation="capable" pid=10418 parent=10417 profile="/usr/sbin/named" name="sys_resource"
Oct 18 11:47:54 main kernel: [47646.058679] named[10418]: segfault at 7f3af4fff92c ip 00007f3af4ddef77 sp 00007fff523e8f50 error 4
Oct 18 11:47:54 main kernel: [47646.058705] note: named[10418] exited with preempt_count 1
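
A minimal sketch for reading the kernel's segfault line in the log above. It assumes the usual x86 page-fault error code layout (bit 0: page present, bit 1: write access, bit 2: user mode); the function name `decode_segfault` and the regular expression are illustrative, not part of any tool mentioned in this report.

```python
import re

# Pattern for the kernel's x86 segfault report, as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_segfault(line):
    """Return the fields of a segfault line plus a decoded page-fault error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "page_present": bool(err & 1),  # bit 0: 0 = page not present, 1 = protection fault
        "write": bool(err & 2),         # bit 1: 0 = read access, 1 = write access
        "user_mode": bool(err & 4),     # bit 2: 0 = kernel mode, 1 = user mode
    }

line = ("Oct 18 11:47:54 main kernel: [47646.058679] named[10418]: segfault at "
        "7f3af4fff92c ip 00007f3af4ddef77 sp 00007fff523e8f50 error 4")
info = decode_segfault(line)
print(info)
```

Under that assumption, "error 4" reads as a user-mode read of a page that was not present. The line alone does not say why the page was missing.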
Comment 6 Eugene Ryazanov 2009-10-18 03:49:24 UTC
Also, this problem does not appear on 11.2-RC1/i586.
Comment 7 Uwe Gansert 2009-10-19 09:37:15 UTC
I think that's a kernel problem:

note: named[27799] exited with preempt_count 1
BUG: scheduling while atomic: named/27799/0x10000002

I'll attach the messages file
Comment 8 Uwe Gansert 2009-10-19 09:38:09 UTC
Created attachment 323030 [details]
/var/log/messages with a trace
Comment 9 Uwe Gansert 2009-10-21 14:35:19 UTC
*** Bug 548573 has been marked as a duplicate of this bug. ***
Comment 10 Alexandre Rogoski 2009-10-30 12:26:15 UTC
Still happening on 11.2 RC2.
Comment 11 Martin Jedamzik 2009-10-30 12:46:05 UTC
(In reply to comment #7)
> I think that's a kernel problem:

the problem does not occur on the default kernel version, only on the desktop-flavour ...
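
Since the crash depends on the kernel flavour, it may help to state the flavour in test reports. A small sketch, assuming the release string ends in the flavour name as in "2.6.31.5-0.1-desktop"; `kernel_flavour` is a hypothetical helper name.

```python
import platform

def kernel_flavour(release):
    """Return the text after the last '-' in a kernel release string, or None."""
    head, sep, tail = release.rpartition("-")
    return tail if sep else None

print(kernel_flavour("2.6.31.5-0.1-desktop"))
print(kernel_flavour("2.6.31.5-0.1-default"))
# Flavour of the running kernel, if its release string follows the same scheme.
print(kernel_flavour(platform.release()))
```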
Comment 12 Benjamin Vander Jagt 2009-11-03 00:06:23 UTC
I built 2.6.32-rc5-git5 using a copy of the config from 2.6.31.5-0.1-desktop, and the error doesn't occur.  Sorry that I don't know how to find what kernel patch fixes it, but at least it's a functional workaround for servers that need/desire low latency.
Comment 13 Martin Jedamzik 2009-11-03 08:05:41 UTC
(In reply to comment #12)
> I built 2.6.32-rc5-git5 using a copy of the config from 2.6.31.5-0.1-desktop,
> and the error doesn't occur.  Sorry that I don't know how to find what kernel
> patch fixes it, but at least it's a functional workaround for servers that
> need/desire low latency.

@Benjamin Please tell us what you need to reproduce ...
As I stated earlier, I have two systems available for testing.
If you need any assistance, just tell me.
Comment 14 Benjamin Vander Jagt 2009-11-03 14:59:15 UTC
@Martin Jedamzik
Do you want me to reproduce the error or the solution?  For my solution, after installing openSUSE 11.2-RC2 and all available updates, I installed make, gcc-c++, and ncurses-devel.  I downloaded Kernel 2.6.32-rc5 and the patch to git5, patched the source, and copied it to /usr/src/linux-2.6.32-rc5-git5.  I copied /boot/config-2.6.31.5-0.1-desktop to /usr/src/linux-2.6.32-rc5-git5/.config and ran "make menuconfig".  I didn't change anything, just exited and saved, so that it would fill in the blanks for me.  I ran "make", "make install", "make modules_install", and then mkinitrd (I forget the command line, but it was something like "mkinitrd -i /boot/initrd-2.6.32.... -k /boot/vmlinuz-2.6.32...").  I modified /boot/grub/menu.lst so that the new kernel would boot first, and I added the line "initrd /boot/initrd-2.6.32-rc5-git5-0.1-desktop" to the entry.  I rebooted, and everything seems to be working perfectly, including named.

I don't know how to pick and choose particular kernel patches without botching up my kernel sources, but somewhere between the 2.6.31.5 and 2.6.32-rc5-git5 kernels a patch fixed the bug causing the "scheduling while atomic" crash when trying to use named.  I guess it's possible to build half-way-in-between kernels until the exact version with the patch is discovered, then add that patch to the final kernel in openSUSE, since changing the kernel version number this late in the game is a bad idea.  If I can find the time, I'll do that, but I don't know if I can in the next week.
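
The "half-way-in-between kernels" idea is a binary search over an ordered list of kernel snapshots. A minimal sketch, assuming the oldest snapshot is broken, the newest is fixed, and everything after the first fixed one stays fixed. The snapshot names and the `named_starts` check are hypothetical stand-ins for "build this kernel, boot it, start named".

```python
def first_fixed(candidates, is_fixed):
    """Binary-search an ordered list for the first entry where is_fixed() is True.

    Assumes candidates[0] is broken, candidates[-1] is fixed, and that every
    entry after the first fixed one is also fixed.
    """
    lo, hi = 0, len(candidates) - 1   # invariant: lo is broken, hi is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]

# Hypothetical snapshots between the broken and the fixed kernel.
snapshots = ["snap-%02d" % i for i in range(16)]
tested = []

def named_starts(snapshot):
    """Stand-in for a real build-and-boot test; pretends the fix landed in snap-11."""
    tested.append(snapshot)
    return snapshot >= "snap-11"

result = first_fixed(snapshots, named_starts)
print(result, len(tested))
```

With 16 snapshots the search needs only a handful of builds instead of 16. `git bisect` automates the same search on a kernel git tree.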
Comment 15 Martin Jedamzik 2009-11-03 19:30:25 UTC
(In reply to comment #14)
> @Martin Jedamzik
> Do you want me to reproduce the error or solution?  For my solution, ....

@Benjamin Sorry, this is a misunderstanding, mostly from my side. :-)

My comment (11) was meant as a hint to Uwe and to make things easier. At least it works with the default kernel, so the reason for the segfault must be one of the desktop optimizations, mustn't it?

The big mistake on my side was comment 13, thinking you were responsible for fixing the bug. Sorry!

Since you found out that the segfault does not happen anymore with 2.6.32-rc5-git5, I'm quite sure Uwe will present us with a working solution quite soon.
 
Unfortunately I'm still able to reproduce the bug, so if a test system is needed ...
Comment 16 Uwe Gansert 2009-11-06 10:39:42 UTC
I tried a self-compiled vanilla 2.6.31 and 2.6.32 with our desktop flavour settings, and both times bind did not crash.
It must be one of our patches, and I will try to find out which one.
Since I have very little experience with kernel patches, it will take quite some time.
Comment 17 Michal Kubeček 2009-11-13 03:03:01 UTC
I had the same problem; it seems to be caused by the AppArmor profile for BIND. It can be solved by granting the capability sys_resource in /etc/apparmor.d/usr.sbin.named.
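
The audit line in comment 5 (operation="capable", name="sys_resource", profile="/usr/sbin/named") matches this workaround. A minimal sketch of the profile change, assuming the standard AppArmor rule syntax `capability sys_resource,` inside the profile block. The sample profile is hypothetical and heavily shortened, and `grant_capability` is an illustrative helper, not an AppArmor tool.

```python
def grant_capability(profile_text, capability):
    """Insert 'capability <name>,' after the profile's opening brace if missing."""
    rule = "capability %s," % capability
    lines = profile_text.splitlines()
    if any(line.strip() == rule for line in lines):
        return profile_text            # already granted; leave unchanged
    for i, line in enumerate(lines):
        if line.rstrip().endswith("{"):
            lines.insert(i + 1, "  " + rule)
            return "\n".join(lines) + "\n"
    raise ValueError("no profile block found")

# Hypothetical, heavily shortened profile; the real file has many more rules.
sample = """/usr/sbin/named {
  capability net_bind_service,
  capability setuid,
}
"""
patched = grant_capability(sample, "sys_resource")
print(patched)
```

After editing the real file, the profile has to be reloaded before named is restarted, for example with `apparmor_parser -r /etc/apparmor.d/usr.sbin.named`.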
Comment 18 Edwin Boersma 2009-11-13 14:21:20 UTC
It happens on 11.2 GM as well (kernel 2.6.31.5-0.1-desktop).
Comment 19 Lars Müller 2009-11-17 11:26:42 UTC
See also http://lists.opensuse.org/opensuse-de/2009-11/msg00722.html  Also from this thread it looks like AppArmor is causing the issue.
Comment 20 Edwin Boersma 2009-11-17 11:36:17 UTC
(In reply to comment #19)
> See also http://lists.opensuse.org/opensuse-de/2009-11/msg00722.html  Also from
> this thread it looks like AppArmor is causing the issue.

Don't know about that. I installed kernel 2.6.31.5-0.1-default and the problem is gone.
Comment 22 Christian Dengler 2009-11-23 20:20:12 UTC
I think we can wait and pick it up with the next maintenance update. A workaround is available.
Comment 23 Marcus Meissner 2009-12-01 17:21:14 UTC
Why wasn't this fix in the security update? It was checked in on Nov 25!
Comment 25 Marcus Meissner 2009-12-02 10:09:19 UTC
If this needs a kernel fix, then no separate SWAMPID is required. It will just be in the next kernel update if a kernel developer has committed a fix.
Comment 26 Jeff Mahoney 2009-12-13 21:52:10 UTC
This is the same AppArmor issue documented in bnc#557302.

*** This bug has been marked as a duplicate of bug 557302 ***