|
Bugzilla – Full Text Bug Listing |
| Summary: | kernel bug in …/mm/slab.c:3175 and subsequent deep freeze | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.2 | Reporter: | Anton Samsonov <avsco> |
| Component: | Kernel | Assignee: | E-mail List <kernel-maintainers> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | bpetkov, jeffm |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 12.2 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
Photograph of the first two crashes
Photograph of the third crash
Photograph of a crash similar to the original one
Photograph of an alternate crash with the same “nohz=off highres=off” options as above
Photograph of the final screen of the same boot sequence as on previous picture
Photograph of a crash with “powersaved=off nohz=off highres=off processor.max_cstate=1” options |
||
|
Description
Anton Samsonov
2013-01-19 12:55:17 UTC
Created attachment 520988 [details]
Photograph of the first two crashes
Created attachment 520989 [details]
Photograph of the third crash
There has been a development. Today the system did not boot even with the “nohz=off highres=off” options: it printed one crash report, followed by a deep freeze (see trace03.png in the new attachments). Although I could not scroll back to see the first lines, this looked very similar to what I was experiencing originally, before applying updates yesterday, though the numbers after the function names differed from those in the very first text transcript posted here. The other difference was that the system fell into a deep freeze instead of entering single-user mode.
I retried, just in case I had misspelled the options. The outcome was a new story: after quickly displaying a couple of crash reports, the boot got stuck for a while at the message about “fsck /boot” (trace04.png), then after 30 seconds it displayed another crash report, and after 23 more seconds it ultimately halted on a watchdog timer (trace05.png).
The strangest thing here is that fsck reported the /boot partition (labeled “LinuxBoot”) as having 130'560 files, while in reality it has 361 files and 11 folders. I must also add that in previous attempts, when a single-user prompt was available but the system was unable to shut down and I used the REISUB magic SysRq sequence to force a reboot, new messages about fsck and mount appeared after SIGTERM was sent, and again after SIGKILL.
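One possible reading of the file count, stated as an assumption since the exact message is only visible in the photographs: the summary line printed by e2fsck reports used and total inodes as `used/total files`, so 130'560 may be the inode capacity of the partition rather than the number of files stored on it. A minimal sketch of extracting both numbers from such a line follows; the sample line, its used-inode figure, and its block figures are invented for illustration, and `parse_fsck_summary` is a hypothetical helper.

```python
import re

# Matches both summary forms printed by e2fsck:
#   "<device>: clean, <used>/<total> files, <used>/<total> blocks"
#   "<device>: <used>/<total> files (<pct>% non-contiguous), <used>/<total> blocks"
SUMMARY = re.compile(
    r"^(?P<device>.+?): (?:clean, )?"
    r"(?P<inodes_used>\d+)/(?P<inodes_total>\d+) files"
    r"(?: \([^)]*\))?, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def parse_fsck_summary(line):
    """Return the counters from an e2fsck summary line, or None if it does not match."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    result = {"device": match.group("device")}
    for key in ("inodes_used", "inodes_total", "blocks_used", "blocks_total"):
        result[key] = int(match.group(key))
    return result


# Hypothetical line: only the label and the 130560 total come from the report.
sample = "LinuxBoot: clean, 380/130560 files, 41000/522080 blocks"
info = parse_fsck_summary(sample)
print(info["inodes_used"], "inodes in use out of", info["inodes_total"])
```

If the first number of the pair is roughly comparable to the real file and folder count, the large figure is just capacity and not a sign of corruption by itself.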
Booting with the complete set of failsafe options still worked, though, so I then started to check other configurations. Trying to boot with “powersaved=off nohz=off highres=off processor.max_cstate=1” resulted in a previously unseen crash referring to CPU cache tuning, but still caused by some ext4 handling routines (see trace06.png). I was finally able to boot with
> apm=off edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset
or something like this; I don't remember exactly which one of “edd” or “nomodeset” was excluded.
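To avoid relying on memory for which options were active on a boot that succeeded, the running kernel's command line can be read from `/proc/cmdline` and compared against the candidates. The following is a minimal sketch, not a definitive tool: the candidate list is copied from the options mentioned above, `parse_cmdline` and `report` are hypothetical helper names, and quoted values containing spaces are assumed not to occur.

```python
from pathlib import Path

# Option names tried in this report (taken from the comment above).
CANDIDATES = [
    "apm", "edd", "powersaved", "nohz", "highres",
    "processor.max_cstate", "nomodeset",
]


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Assumption: no quoted values with embedded spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def report(cmdline):
    """Return (present, missing) for the candidate options."""
    params = parse_cmdline(cmdline)
    present = {name: params[name] for name in CANDIDATES if name in params}
    missing = [name for name in CANDIDATES if name not in params]
    return present, missing


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        present, missing = report(path.read_text())
        print("present:", present)
        print("missing:", missing)
```

Saving this output after each successful boot would record exactly which of “edd” or “nomodeset” was left out.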
At this point I would start to consider the possibility of a real hardware failure, if openSUSE were the only operating system on this computer. But my primary OS is Windows 7, and it boots just fine, [almost] never breaks the fake-RAID on shutdown, and runs demanding modern videogames as well as CPU-, GPU- and RAM-intensive BOINC computations that are cross-validated against other nodes. Of course, that is not strong proof, but I'm more inclined to suspect a software bug, given that the situation worsened each time after updating openSUSE.
Created attachment 520996 [details]
Photograph of a crash similar to the original one
Created attachment 520997 [details]
Photograph of an alternate crash with the same “nohz=off highres=off” options as above
Created attachment 520998 [details]
Photograph of the final screen of the same boot sequence as on previous picture
Created attachment 520999 [details]
Photograph of a crash with “powersaved=off nohz=off highres=off processor.max_cstate=1” options
Just to be sure it was not a hardware failure, I occasionally ran MemTest86+ and several live-boot systems, including Debian, Mint, Clonezilla, Fedora, Chakra (Arch), as well as openSUSE Factory. Some of those systems activated my logical volumes automatically, so I mounted and fsck'ed them, as well as the /boot partition, without any errors. Not to mention that the main system, Windows 7, functioned perfectly the whole time: running heavy videogames, decoding HD video, crunching BOINC tasks for CPU and GPU, and doing large file processing with results identical to other machines.
So I will now install openSUSE 12.3 from scratch. Let's see how quickly it reaches the deteriorated state, too.
Sorry for the delay. The screenshots you've taken are, unfortunately, all of secondary crashes and can't be used for debugging. The system is already in an unstable state when they occur.
Has your experience with 12.3 been any better?
(In reply to comment #9)
> Has your experience with 12.3 been any better?
The experience is always the same (more or less): for several months after the installation, everything works just fine, but eventually the system deteriorates to an unusable state, where only a single-user prompt is available, which doesn't help as the system either crashes or hangs on transition to higher runlevels. By the time this happens, a new version of openSUSE is usually available, so, after several attempts to fix the problem, I install the new version from scratch. This gives me another few months, and the cycle repeats.
If/when the same happens to openSUSE 12.3, I'll try to report back, but, again, I have absolutely no idea how to provide more helpful dumps for such cases.
Is this issue still of interest or can we close?
(In reply to comment #11)
> Is this issue still of interest or can we close?
I've updated to 12.3 and 13.1 since then, with less usage than before, so the problems (if they persist) didn't have much time to accumulate. Thus it may be better to close this entry and perhaps open a new one, if/when necessary, against a recent openSUSE version.
This report is against openSUSE 12.2 which is no longer under maintenance. If you are able to reproduce it with openSUSE 13.1 or openSUSE Factory, please re-open and reset the "Product" field to the appropriate release. |