Bug 555653 - kernel: ... BUG: scheduling while atomic: cupsd ...
Summary: kernel: ... BUG: scheduling while atomic: cupsd ...
Status: RESOLVED DUPLICATE of bug 557302
Alias: None
Product: openSUSE 11.2
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86-64 openSUSE 11.2
Priority: P5 - None, Severity: Major
Target Milestone: ---
Assignee: Jeff Mahoney
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-15 17:50 UTC by Elmar Stellnberger
Modified: 2009-12-01 14:59 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
cups error_log, loglevel debug (3.99 KB, text/plain), 2009-11-16 13:09 UTC, Elmar Stellnberger
console output of cups-deviced (2.87 KB, text/plain), 2009-11-17 10:28 UTC, Elmar Stellnberger
tail -f /var/log/messages (4.68 KB, text/plain), 2009-11-19 13:57 UTC, Elmar Stellnberger
strace -f cupsd (125.60 KB, text/plain), 2009-11-19 16:19 UTC, Elmar Stellnberger
dmesg -s 1000000 (22.93 KB, application/x-bzip), 2009-11-19 16:55 UTC, Elmar Stellnberger
/var/log/messages* (137.55 KB, application/x-bzip-compressed-tar), 2009-11-19 18:43 UTC, Elmar Stellnberger

Description Elmar Stellnberger 2009-11-15 17:50:18 UTC
User-Agent:       Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.10 (like Gecko) SUSE

> cups start
Starting cupsdcupsd: Child exited on signal 11!
startproc:  exit status of parent of /usr/sbin/cupsd: 3

> tail -f /var/log/messages
Nov 15 18:42:42 linux-k7n1 kernel: [29409.355480] cupsd[2632]: segfault at 7fe59cfd7c10 ip 00007fe59cfd7c10 sp 00007fffafaf04e8 error 14
Nov 15 18:42:42 linux-k7n1 kernel: [29409.355507] note: cupsd[2632] exited with preempt_count 1


Reproducible: Always
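
For readers decoding the segfault line above: on x86 the kernel prints the page-fault error code after "error" in hexadecimal, so "error 14" means 0x14. The following is a minimal sketch that decodes the bits, assuming the standard x86 page-fault error-code layout (bit 0: protection violation, bit 1: write, bit 2: user mode, bit 4: instruction fetch). The function name decode_pf_error is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Decode the x86 page-fault error code from a kernel
# "segfault at ... error N" line. N is hexadecimal.
# decode_pf_error is a hypothetical helper name.
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then
        p="protection violation"
    else
        p="page not present"
    fi
    if [ $((code & 16)) -ne 0 ]; then
        a="instruction fetch"
    elif [ $((code & 2)) -ne 0 ]; then
        a="write"
    else
        a="read"
    fi
    if [ $((code & 4)) -ne 0 ]; then
        m="user mode"
    else
        m="kernel mode"
    fi
    echo "$a, $p, $m"
}

decode_pf_error 14
```

For the value 14 this reports a user-mode instruction fetch from a page that is not present, which fits the log line, where the faulting address and ip are identical.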
Comment 1 Elmar Stellnberger 2009-11-15 20:13:20 UTC
Interestingly, cups starts fine if I set CUPSD_OPTIONS="-f" in /etc/sysconfig/cups. Nonetheless, installing my printer driver fails because lpinfo -v hangs.
Comment 2 Elmar Stellnberger 2009-11-15 20:29:25 UTC
Sometimes cupsd terminates unmotivatedly and has to be restarted.
Comment 3 Elmar Stellnberger 2009-11-16 13:08:22 UTC
Updated to the newest cups-1.3.11-25.1.x86_64 from the build service (repositories/Printing/openSUSE_11.2/). The error is still the same: cupsd only works with the -f or -F option. LogLevel debug is set in cupsd.conf. Without -f there is no message in error_log; with -F I get a SIGSEGV when trying to add a printer; see attachment.
Comment 4 Elmar Stellnberger 2009-11-16 13:09:30 UTC
Created attachment 327696 [details]
cups error_log, loglevel debug
Comment 5 Johannes Meixner 2009-11-17 08:38:03 UTC
Do you again have AppArmor running
like in bug #474403 or bug #539401?

If yes,
switch off AppArmor completely and retry.
Comment 6 Johannes Meixner 2009-11-17 08:49:05 UTC
Attachment #327696 [details] shows
------------------------------------------------------------------------
E [16/Nov/2009:14:01:09 +0100] PID 4183 
 (/usr/lib64/cups/daemon/cups-deviced) crashed on signal 11!
I [16/Nov/2009:14:01:40 +0100] Scheduler shutting down normally.
------------------------------------------------------------------------
so it is not cupsd that segfaults but cups-deviced,
and because of this cupsd is "shutting down normally".

If the root cause is not AppArmor
(i.e. if cups-deviced also segfaults without AppArmor running),
find out more about cups-deviced by running it as root:

/usr/lib64/cups/daemon/cups-deviced 1 0 4 requested-attributes=all

and report its results.
Comment 7 Elmar Stellnberger 2009-11-17 10:28:22 UTC
Created attachment 327864 [details]
console output of cups-deviced

hope that helps.
Comment 8 Johannes Meixner 2009-11-17 10:32:01 UTC
Do you again have AppArmor running
like in bug #474403 or bug #539401?
Comment 9 Johannes Meixner 2009-11-17 10:35:31 UTC
As far as I can see, attachment #327864 [details]
does not show any error.

What does the following print for you?

/usr/lib64/cups/daemon/cups-deviced 1 0 4 \
 requested-attributes=all 1>/dev/null ; echo $?
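
The value printed by "echo $?" above can be read as follows: shells report a process killed by signal N as exit status 128+N, so a cups-deviced killed by signal 11 would show 139. A minimal sketch of that interpretation; describe_status is a hypothetical helper name, and a program that deliberately exits with a status above 128 would be misread by it.

```shell
#!/bin/sh
# describe_status is a hypothetical helper that interprets a shell
# exit status. A status above 128 is assumed to mean "killed by
# signal (status - 128)", which is the usual shell convention.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    elif [ "$status" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with error status $status"
    fi
}

# Possible use with the command from this comment (run as root):
#   /usr/lib64/cups/daemon/cups-deviced 1 0 4 requested-attributes=all >/dev/null
#   describe_status $?
describe_status 139
```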
Comment 10 Elmar Stellnberger 2009-11-17 12:00:18 UTC
No, AppArmor is disabled for cups and cups-deviced.

> /usr/lib64/cups/daemon/cups-deviced 1 0 4 \
>  requested-attributes=all 1>/dev/null ; echo $?
DEBUG: No address specified and no Address line in /etc/cups/snmp.conf...
DEBUG: [cups-deviced] Added device "beh"...
DEBUG: [cups-deviced] Added device "ipp"...
DEBUG: [cups-deviced] Added device "socket"...
DEBUG: [cups-deviced] Added device "hpfax"...
DEBUG: [cups-deviced] Added device "lpd"...
DEBUG: [cups-deviced] Added device "pipe"...
DEBUG: [cups-deviced] Added device "smb"...
DEBUG: [cups-deviced] Added device "hal"...
DEBUG: [cups-deviced] Added device "scsi"...
DEBUG: [cups-deviced] Added device "hp"...
DEBUG: [cups-deviced] Added device "http"...
0
Comment 11 Johannes Meixner 2009-11-17 14:25:35 UTC
Strange:

Comment #0:
"cupsd ... segfault"

Comment #2:
"Sometimes cupsd terminates unmotivatedly"

Comment #4 the attachment therein:
"cupsd ... normally" but "cups-deviced crashed on signal 11"

Comment #10:
cups-deviced works well

I have no idea what exactly goes wrong on your system
except that something seems to be wrong somehow...

Currently I can only set this back to you as "needinfo"
so that you can describe how to reproduce a crash
of cupsd or cups-deviced or anything related.
Comment 12 Johannes Meixner 2009-11-17 16:14:24 UTC
Is CUPS the only part which causes trouble on this system?
Does the rest of this system (X, desktop, applications,...)
work stably and reliably (provided there is a reasonable
amount of load for the rest of the system)?
Comment 13 Elmar Stellnberger 2009-11-17 18:12:44 UTC
everything works perfect; except cups and lpr (lpq, ...).
Comment 14 Johannes Meixner 2009-11-18 08:33:46 UTC
Are the packages cups-libs, cups, and cups-client perhaps
mixed up, with different versions installed?
It is crucial that cups and cups-client have exactly
the same version as cups-libs. What is the output of
  rpm -q cups-libs cups-client cups

I still need a description of how you can reproduce a crash
of cupsd or cups-deviced or anything related like lpr/lpq/...

When you have a reproducible crash case on your system,
provide a gdb backtrace as follows:

Prerequisite:
Install the cups-debuginfo package whose version
exactly matches the version of your installed
packages cups-libs, cups, and cups-client so that
  rpm -q cups-libs cups-client cups cups-debuginfo
shows the exact same version for all those packages.
Without the cups-debuginfo package installed, a gdb backtrace
is useless to me because it would not show the name of the function
where the crash actually happened, so I could not find
the point of interest in the sources.
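
The version comparison described in this prerequisite can be automated. A minimal sketch: same_versions is a hypothetical helper that reads one version string per line and succeeds only if all lines are identical. The rpm query format in the comment uses the standard %{VERSION} and %{RELEASE} tags; the version strings in the demo are taken from comment 3 purely as sample input.

```shell
#!/bin/sh
# same_versions is a hypothetical helper: reads one version string
# per line on stdin and succeeds only if there is at least one line
# and all lines are identical.
same_versions() {
    awk 'NR == 1 { v = $0 } $0 != v { bad = 1 } END { exit (NR == 0 || bad) }'
}

# On an RPM-based system the input could come from (not run here):
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' \
#       cups-libs cups-client cups cups-debuginfo | same_versions
if printf '%s\n' 1.3.11-25.1 1.3.11-25.1 1.3.11-25.1 | same_versions; then
    echo "versions match"
else
    echo "version mismatch"
fi
```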

Assume it is the cupsd which crashes,
then do the following:

Run /usr/sbin/cupsd in gdb with:

  gdb /usr/sbin/cupsd
  run -f

("-f" is passed as argument for /usr/sbin/cupsd
so that here "/usr/sbin/cupsd -f" is run).
Wait for the crash.
Then do at the gdb prompt "(gdb)"

  where

to see a backtrace.
Post the gdb backtrace here.

Additionally attach about the last 100 lines
(i.e. the last part which looks of interest
in relation to the crash) from /var/log/cups/error_log
as MIME type "text/plain" to this bug so that
I may have a chance to see which incident
leads to the crash.


If it is not cupsd but e.g. lpstat which crashes,
post a gdb backtrace for /usr/bin/lpstat
plus the exact lpstat command which you called
plus the last part of interest from /var/log/cups/error_log
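
The interactive gdb steps above can also be run non-interactively. The sketch below only prints the gdb command line instead of running it; gdb_cmd is a hypothetical helper name. It assumes the gdb options -batch, -ex and --args, and the gdb setting "set follow-fork-mode child", which makes gdb follow the forked child and may help when a daemon is started without -f. Whether that setting yields a usable backtrace for cupsd was not tried in this report.

```shell
#!/bin/sh
# gdb_cmd is a hypothetical helper that prints (does not run) a
# non-interactive gdb invocation which runs a program and then
# prints a backtrace with "where".
#   $1: "child" to follow the forked child, anything else otherwise
#   remaining arguments: the program and its arguments
gdb_cmd() {
    mode=$1
    shift
    if [ "$mode" = child ]; then
        printf "gdb -batch -ex 'set follow-fork-mode child' -ex run -ex where --args"
    else
        printf 'gdb -batch -ex run -ex where --args'
    fi
    printf ' %s' "$@"
    printf '\n'
}

# Foreground case, equivalent to "run -f" followed by "where":
gdb_cmd parent /usr/sbin/cupsd -f
# Daemon case (assumption: following the child reaches the crash):
gdb_cmd child /usr/sbin/cupsd
```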
Comment 15 Elmar Stellnberger 2009-11-18 16:51:09 UTC
  Unfortunately I could not reproduce the crash when invoking cups with the -f option. However, without -f there seems to be no way to obtain a backtrace with gdb. When running cups with -f my printer keeps switching on and off all the time without being able to print (only disconnecting it from the PC or shutting down cups helps; I think I may have seen this bug once before, perhaps before 11.1).

 Horrible. Even downgrading to RC1, MS7, os11.1 or os11.0 did not make cups run without crashing (though cups runs without -f on 11.1 and 11.0).
 Do you think the errors pertain to the environment cups runs in, or might it help to let the build service build an older version of cups?
Comment 17 Johannes Meixner 2009-11-19 09:15:31 UTC
I am not enough of an expert to decide whether the root cause is
in CUPS or in its environment, but when "/usr/sbin/cupsd -f"
works and the default "/usr/sbin/cupsd" (i.e. run in the
background as a "daemon") doesn't, it looks from my point
of view very much like an error in the environment in
your special case (but I have no idea how to find out
what exactly makes your case so "special").
I think this in particular because if cupsd usually
crashed in openSUSE 11.2 I would have seen many bug reports,
but up to now yours is the only one.

How did you install openSUSE 11.2?
Was it a new installation from scratch or an update?
If the latter from which older openSUSE version?

Please try the maximum
  LogLevel debug2
in /etc/cups/cupsd.conf to log all debugging information
and perhaps get some helpful messages in the CUPS error_log.

As some kind of desperate attempt finally try
"Reinstalling the Printing System" according to
http://en.opensuse.org/SDB:CUPS_-_Reinstalling_the_Printing_System
Comment 18 Elmar Stellnberger 2009-11-19 11:16:15 UTC
  Unfortunately, even with LogLevel debug2 nothing went into the error_log of cupsd before it crashed. After carefully following all steps at SDB:CUPS_-_Reinstalling_the_Printing_System, everything crashed as usual.
  The thing is that I am working on a plain new installation of openSUSE 11.2. I had not actually done anything before trying to run cups (except some system configuration tasks). I have formerly had problems with the printing system whenever my computer got cracked. The fact that the problem only seems to affect my computer is no good sign either. This time there should not have been any possibility to crack it, since I kept it disconnected until I could enable AppArmor for all running services, even before I did the first update. There must be a massive backdoor somewhere in the distro. It actually seems to be impossible to connect my computer to the internet without giving control of it to these crackers. Tools like my self-developed checkroot can only help if a computer gets cracked later on. I will need somebody who can help me; otherwise my only option will be not to use Linux again and thereby to give up my commitment to openSUSE, Linux and my testing activity (although I don't like that idea at all).

  Anyway. Many thanks for your effort!
Comment 19 Johannes Meixner 2009-11-19 13:51:17 UTC
I don't think crackers are the root cause here.

When I Google for "exited with preempt_count 1"
I get results related to kernel "Oops" issues.

What does
  grep -i "oops" /var/log/messages
or more precisely
  grep "Oops:" /var/log/messages
print for you?

If you have kernel oopses, the root cause is your kernel,
more precisely your kernel on your particular hardware,
which does not work reliably and stably, and as a side effect
you see various crashes related to CUPS (see comment #11).
But then it is a bit unexpected that the rest of your system
could work "perfect" (see comment #12 and comment #13).
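
A search for "Oops:" alone would miss other kernel trouble reports, such as the "BUG: scheduling while atomic" message in this bug's title. A minimal sketch of a broader search follows. kernel_trouble is a hypothetical helper name, and its pattern list is an assumption assembled from messages quoted in this report, not a complete list. The last line of the demo input is a made-up placeholder.

```shell
#!/bin/sh
# kernel_trouble is a hypothetical helper: prints lines from the
# given log files that look like kernel problem reports. The pattern
# list is an assumption based on messages quoted in this report.
kernel_trouble() {
    grep -h -E 'Oops:|BUG: |segfault at|preempt_count' "$@"
}

# Demo on a temporary file. The first two lines are quoted from the
# description; the third is a placeholder, not from any real log.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
Nov 15 18:42:42 linux-k7n1 kernel: [29409.355480] cupsd[2632]: segfault at 7fe59cfd7c10 ip 00007fe59cfd7c10 sp 00007fffafaf04e8 error 14
Nov 15 18:42:42 linux-k7n1 kernel: [29409.355507] note: cupsd[2632] exited with preempt_count 1
Nov 15 18:42:43 linux-k7n1 example: placeholder line without any match
EOF
kernel_trouble "$tmp"
rm -f "$tmp"
```

On a real system the argument would be /var/log/messages (run as root).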
Comment 20 Elmar Stellnberger 2009-11-19 13:57:37 UTC
Created attachment 328434 [details]
tail -f /var/log/messages

No kernel oops; just segfault and register-dump messages.
Comment 21 Elmar Stellnberger 2009-11-19 14:00:08 UTC
 As older versions of cups from 11.1 and 11.0 no longer compile without errors under 11.2 (build service), I have started to look for rpm-based distros featuring cups 1.4; perhaps the packages from Mandriva are worth a try.
Comment 22 Johannes Meixner 2009-11-19 15:12:17 UTC
Attachment #328434 [details]
looks similar to issues like
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301021

If I am right, the issue here is also a bug in the kernel
so that I change the component of this bug report
accordingly from "Printing" to "Kernel".
Comment 23 Johannes Meixner 2009-11-19 16:06:22 UTC
Please attach the output for a cupsd crash from
  strace -f /usr/sbin/cupsd &>/tmp/strace-f.cupsd
(written to /tmp/strace-f.cupsd) as MIME type text/plain to this bug.
Comment 24 Elmar Stellnberger 2009-11-19 16:19:50 UTC
Created attachment 328474 [details]
strace -f cupsd
Comment 25 Elmar Stellnberger 2009-11-19 16:31:12 UTC
 I personally can't see any similarity except that both programs terminated for no apparent reason (called a kernel Oops, right?). For ifconfig the cause was a paging request; in our case it seems to be unknown yet.
Comment 26 Jiri Slaby 2009-11-19 16:47:20 UTC
Just in case, can you attach whole dmesg?

dmesg -s 1000000 >dmesg.out
Comment 27 Elmar Stellnberger 2009-11-19 16:55:25 UTC
Created attachment 328490 [details]
dmesg -s 1000000
Comment 28 Jiri Slaby 2009-11-19 18:26:04 UTC
(In reply to comment #27)
> Created an attachment (id=328490) [details]
> dmesg -s 1000000

Unfortunately the kernel ring buffer had already been overwritten. The machine has been up for several hours with many suspend cycles.

Could you
* find last 0.000000 kernel line in /var/log/messages* and attach the one which you find it in and all from that moment on (e.g. in one tarball)?
* retry after reboot
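
The first request above, everything from the last "0.000000" kernel line onward, can be extracted mechanically. A minimal sketch: since_last_boot is a hypothetical helper name, and it assumes the boot marker appears as a bracketed "[    0.000000]" timestamp in the log. The demo lines are invented sample input, not taken from any attachment.

```shell
#!/bin/sh
# since_last_boot is a hypothetical helper: prints a log file from
# the last line carrying a "[    0.000000]" kernel timestamp to the
# end. It prints nothing if the file has no such line.
since_last_boot() {
    # Pass 1 remembers the last matching line number, pass 2 prints
    # from that line on; the same file is read twice.
    awk '
        NR == FNR { if ($0 ~ /\[ *0\.000000\]/) last = FNR; next }
        last && FNR >= last
    ' "$1" "$1"
}

# Demo with invented sample lines.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
kernel: [    0.000000] sample line from an earlier boot
kernel: [  120.000000] sample line between boots
kernel: [    0.000000] sample line from the latest boot
kernel: [   42.000000] sample line after the latest boot
EOF
since_last_boot "$tmp"
rm -f "$tmp"
```

Rotated logs that are compressed would need to be decompressed first, for example with zcat.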
Comment 29 Jiri Slaby 2009-11-19 18:29:33 UTC
(In reply to comment #28)
> * find last 0.000000 kernel line in /var/log/messages* and attach the one which
> you find it in and all from that moment on (e.g. in one tarball)?

Or just tar 'em all.
Comment 30 Elmar Stellnberger 2009-11-19 18:32:27 UTC
> grep 0[.]000000 /var/log/messages*
> # nothing found!

 Why do you think that there has been nothing about the cups crash in the attached dmesg? I have let it crash immediately before taking the dmesg!
Comment 31 Jiri Slaby 2009-11-19 18:36:40 UTC
(In reply to comment #30)
> > grep 0[.]000000 /var/log/messages*
> > # nothing found!

Doesn't your logrotate setup gzip them?

>  Why do you think that there has been nothing about the cups crash in the
> attached dmesg?

I don't think that. Cups is broken and touches memory which doesn't belong to it. Hence it crashes.

But you hit a different problem before the cups crash. There has to be some bug triggered in the kernel, and I need the logs to find it.

> I have let it crash immediately before taking the dmesg!

Yup, you probably hit two bugs.
Comment 32 Elmar Stellnberger 2009-11-19 18:43:11 UTC
Created attachment 328508 [details]
/var/log/messages*

 There is only a /var/log/messages, no messagesXY.gz or something else that starts with 'mess'. By the way what is the difference between viewing dmesg and /var/log/messages? I always keep looking at the latter.
Comment 33 Jeff Mahoney 2009-11-19 19:05:52 UTC
Elmar, can you try an updated kernel from http://ftp.suse.com/pub/projects/kernel/kotd/openSUSE-11.2/ ? I believe this is a bug I ran into and fixed independently yesterday.
Comment 34 Elmar Stellnberger 2009-11-19 21:41:04 UTC
 Cups crashes are resolved with 2.6.31.6-0.0.0.23.273000e-desktop. Nonetheless, my proprietary Brother driver doesn't seem to work with newer versions of cups.
Comment 35 Jeff Mahoney 2009-11-20 17:14:57 UTC
I opened bug 557302 to explain this appropriately and make for easier searching. Closing this as a duplicate of that one.

*** This bug has been marked as a duplicate of bug 557302 ***
Comment 37 Jeff Mahoney 2009-12-01 14:59:13 UTC
*** Bug 539401 has been marked as a duplicate of this bug. ***