Bug 1006417 - System doesn't boot with new kernel 4.8.3 (32-bit)
Status: RESOLVED FIXED
Duplicates: 1004949 1006632 1007746
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: i686 Other
Importance: P5 - None, Critical
Assigned To: Borislav Petkov
Reported: 2016-10-23 13:25 UTC by Petr Matula
Modified: 2016-11-09 10:51 UTC (History)

Attachments
dmesg of working kernel 4.7.6-1-default on Pentium III (Katmai) (59.49 KB, text/plain)
2016-10-27 20:34 UTC, Ralph Gauer
dmesg of working kernel 4.7.6-1-default on Pentium-S (63.83 KB, text/plain)
2016-10-27 20:52 UTC, Ralph Gauer
dmesg of working kernel 4.7.6-1-default on Athlon64 inside Virtualbox (42.42 KB, text/plain)
2016-10-27 20:59 UTC, Wolfgang Bauer

Description Petr Matula 2016-10-23 13:25:14 UTC
After running "zypper dup", which updated the kernel to version 4.8.3-1, my three i686 computers (different hardware) no longer boot.
The computers restart again and again.
The last message on screen is "Loading initial ramdisk ..."; after that the computer reboots itself.
The kernel flavors are 2x pae and 1x default.
Comment 1 Takashi Iwai 2016-10-24 06:38:09 UTC
Does the issue happen even if you blacklist nouveau as in bug 1006420?
Comment 2 Petr Matula 2016-10-24 11:00:05 UTC
Yes, it still happens; blacklisting nouveau has no effect on this bug.

I have this bug (1006417) on my i686 computers only.

(Bug 1006420 is on my 64-bit computer only)
Comment 3 Ralph Gauer 2016-10-25 17:15:00 UTC
I can confirm this on a Pentium III (Katmai).
Kernel 4.7.6-1 is booting without problem.
Comment 4 Jiri Slaby 2016-10-26 09:41:41 UTC
Could you guys remove the "quiet" kernel option in grub and add "debug" instead? Approximately where does it reboot? You can also add "boot_delay=100" (or =10) to slow down the output.
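
For reference, a minimal Python sketch of the command line edit described above. The function name and the sample command line are hypothetical; the real line has to be edited in the GRUB entry itself.

```python
def debug_cmdline(cmdline, boot_delay=100):
    """Return cmdline without "quiet" and with debug options appended."""
    # Drop "quiet" so early kernel messages are not suppressed, and drop
    # any existing boot_delay= value so it is not passed twice.
    args = [
        arg
        for arg in cmdline.split()
        if arg != "quiet" and not arg.startswith("boot_delay=")
    ]
    if "debug" not in args:
        args.append("debug")
    args.append("boot_delay={}".format(boot_delay))
    return " ".join(args)


# Hypothetical example line, not taken from any system in this report.
print(debug_cmdline("root=/dev/sda2 splash=silent quiet showopts"))
```

Passing boot_delay=10 gives the shorter delay mentioned above.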
Comment 5 Petr Matula 2016-10-26 11:22:58 UTC
It's still the same.
First message "Loading Linux 4.8.3-1-pae"
Second message "Loading initial ramdisk ..."
and immediately after that computer reboots itself
Comment 6 Ralph Gauer 2016-10-26 20:53:30 UTC
The same here, now with kernel 4.8.4 on i686 (Pentium III Katmai) and
kernel 4.8.3 on i586 Pentium-S.
The debug and boot_delay parameters don't help.
Immediately after "Loading initial ramdisk ..." the system is rebooting without any other messages/output.
Comment 7 Ralph Gauer 2016-10-27 07:17:18 UTC
Maybe it's an illegal instruction in an early phase.
Pentium III (Katmai) doesn't have "sse2".
Pentium-S doesn't have "mmx", "sse" or "sse2".
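
A quick way to list which of these flags a CPU lacks, as a hedged Python sketch. It assumes the usual "flags : ..." line format of /proc/cpuinfo; the function name and the sample text are made up for illustration and are not taken from the machines in this report.

```python
def missing_cpu_flags(cpuinfo_text, wanted=("mmx", "sse", "sse2")):
    """Return the wanted flags that the first "flags" line does not list."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag not in present]
    # No "flags" line found: report every wanted flag as missing.
    return list(wanted)


# Hypothetical excerpt for illustration only.
sample = (
    "model name\t: Pentium III (Katmai)\n"
    "flags\t\t: fpu vme de pse tsc msr pae mce cx8 mmx fxsr sse\n"
)
print(missing_cpu_flags(sample))
```

On a running system the input would be the contents of /proc/cpuinfo, e.g. missing_cpu_flags(open("/proc/cpuinfo").read()).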
Comment 8 Wolfgang Bauer 2016-10-27 08:28:19 UTC
(In reply to Ralph Gauer from comment #7)
> Maybe it's an illegal instruction in an early phase.
> Pentium III (Katmai) doesn't have "sse2".
> Pentium-S doesn't have "mmx", "sse" or "sse2".

I'm having this problem in a 32bit Tumbleweed VM (in VirtualBox) too, the host is an AMD Athlon64 which definitely has "mmx", "sse" and "sse2"...

The 4.7.6 kernel boots fine here as well.
Comment 9 Takashi Iwai 2016-10-27 13:15:03 UTC
*** Bug 1004949 has been marked as a duplicate of this bug. ***
Comment 10 Takashi Iwai 2016-10-27 13:35:04 UTC
I checked the installation on KVM, and the latest TW 32bit image could be installed / run fine.  So it's likely depending on BIOS / CPU / whatever.

Boris, do you know of any change in x86 code that may affect 32bit boot?
Comment 11 Takashi Iwai 2016-10-27 13:39:55 UTC
Just to be sure: did anyone try "dis_ucode_ldr" boot option?
Comment 12 Wolfgang Bauer 2016-10-27 14:07:14 UTC
(In reply to Takashi Iwai from comment #11)
> Just to be sure: did anyone try "dis_ucode_ldr" boot option?

Yes, that was the first thing I tried.
But it didn't have any (positive) effect.

I tried it again to be sure (I could have made a typo), and no, it doesn't help here.
Comment 13 Borislav Petkov 2016-10-27 15:10:16 UTC
(In reply to Takashi Iwai from comment #10)
> Boris, do you know of any change in x86 code that may affect 32bit boot?

Hmm, nothing rings a bell.

The only thing I can think of is bisection. Maybe people can try 4.8.1,
4.8.2 and this way gradually narrow it down.

Also, can people upload dmesg from a working kernel?

Thanks.
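
To illustrate the narrowing-down idea, here is a small Python sketch of a bisection over an ordered list of kernel versions. The version list and the "boots" results are purely hypothetical; which release actually introduced the problem is not known at this point.

```python
def first_bad(versions, boots):
    """Binary search for the first version that fails to boot.

    Assumes versions[0] boots, versions[-1] does not, and every
    version after the first bad one is bad as well.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(versions[mid]):
            good = mid
        else:
            bad = mid
    return versions[bad]


# Hypothetical test results, for illustration only.
versions = ["4.7.6", "4.8", "4.8.1", "4.8.2", "4.8.3"]
pretend_bad = {"4.8.2", "4.8.3"}
print(first_bad(versions, lambda v: v not in pretend_bad))
```

In practice "boots" is a manual step: install that kernel, reboot, and note the result.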
Comment 14 Ralph Gauer 2016-10-27 20:34:03 UTC
Created attachment 699697 [details]
dmesg of working kernel 4.7.6-1-default on Pentium III (Katmai)
Comment 15 Ralph Gauer 2016-10-27 20:52:26 UTC
Created attachment 699699 [details]
dmesg of working kernel 4.7.6-1-default on Pentium-S
Comment 16 Wolfgang Bauer 2016-10-27 20:59:15 UTC
Created attachment 699700 [details]
dmesg of working kernel 4.7.6-1-default on Athlon64 inside Virtualbox

I tried booting kernel-vanilla-4.8.4 from the Tumbleweed repo, with the same problem.

I also tried booting *without* an initrd (by removing the initrd line from the boot menu entry), and it behaved the same.
So I suppose we can rule out a problem with the initrd.

A dmesg from me too is attached, kernel 4.7.6-1-default inside VirtualBox with an Athlon64 3000+ on the host side.
Comment 17 Ralph Gauer 2016-10-28 07:46:40 UTC
(In reply to Takashi Iwai from comment #9)
> *** Bug 1004949 has been marked as a duplicate of this bug. ***

Kernel 4.8.4 32-bit with Tumbleweed in a Virtualbox VM does not boot, even if the IDE controller is removed and only the SATA controller with the hard drive is left.
Comment 18 Wolfgang Bauer 2016-10-28 13:42:42 UTC
Hm.
The latest 32bit Krypton LiveCD (based on Tumbleweed) with kernel 4.8.4-pae boots fine on this host and also as a guest in VMware, but "crashes" (i.e. reboots immediately after the kernel/initrd is loaded) when running inside VirtualBox (all on the same host).

So it may indeed be specific to the BIOS or to certain hardware other than the CPU...
Comment 19 Borislav Petkov 2016-10-28 13:57:51 UTC
Can you reproduce with qemu/kvm instead?
Comment 20 Wolfgang Bauer 2016-10-28 14:06:20 UTC
(In reply to Wolfgang Bauer from comment #18)
> but "crashes" (i.e.
> reboots immediately after the kernel/initrd is loaded) when running inside
> VirtualBox (all on the same host).

I noticed that if I enable IO-APIC in the VM settings (under "System"), the 4.8.4 kernel works; if it's disabled, the VM reboots immediately after the kernel is loaded.
Maybe this helps in finding the problem?

Also, when booting the LiveCD I noticed that this message is briefly displayed after the kernel is loaded, before the system reboots:
"Probing EDD (edd=off to disable)... OK"
(I don't see that message when booting the Tumbleweed installation)
edd=off doesn't help though (it only makes this message disappear).
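
For anyone who wants to flip the IO-APIC setting from the command line rather than the GUI, a hedged Python sketch that only builds the VBoxManage call. The VM name "tw32" is a placeholder, and the VM presumably has to be powered off for the change to apply.

```python
def ioapic_command(vm_name, enabled):
    """Build the VBoxManage argument list that toggles IO-APIC for a VM."""
    state = "on" if enabled else "off"
    return ["VBoxManage", "modifyvm", vm_name, "--ioapic", state]


# "tw32" is a hypothetical VM name; substitute the real one.
print(" ".join(ioapic_command("tw32", enabled=True)))
```

The returned list can be passed to subprocess.run() to actually apply the setting.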
Comment 21 Wolfgang Bauer 2016-10-28 14:07:04 UTC
(In reply to Borislav Petkov from comment #19)
> Can you reproduce with qemu/kvm instead?

Sorry, I can't try that.
My CPU has no virtualization support...
Comment 22 Borislav Petkov 2016-10-28 14:29:45 UTC
(In reply to Wolfgang Bauer from comment #21)
> My CPU has no virtualization support...

You don't absolutely need hw virtualization support to install a guest in qemu.
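
A rough sketch of what a pure-emulation invocation could look like, written as a small Python helper that builds the qemu command. The ISO path and memory size are placeholders; "-accel tcg" selects software emulation and "-no-reboot" makes qemu exit instead of looping through reboots.

```python
def qemu_tcg_command(iso_path, memory_mb=1024):
    """Build a qemu-system-i386 command line that avoids KVM entirely."""
    return [
        "qemu-system-i386",
        "-accel", "tcg",       # software emulation, no hw virtualization
        "-m", str(memory_mb),  # guest RAM in MiB
        "-cdrom", iso_path,    # boot medium, e.g. a LiveCD image
        "-boot", "d",          # boot from the CD-ROM drive
        "-no-reboot",          # exit on guest reboot instead of looping
    ]


# "tumbleweed-i586.iso" is a hypothetical file name.
print(" ".join(qemu_tcg_command("tumbleweed-i586.iso")))
```

Expect it to be noticeably slower than KVM, but it should be enough to see whether the guest reboots right after the kernel is loaded.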
Comment 23 Borislav Petkov 2016-10-28 14:42:17 UTC
(In reply to Wolfgang Bauer from comment #20)
> I noticed that if I enable IO-APIC in the VM settings (under "System") the
> 4.8.4 kernel works, if it's disabled it immediately reboots after the kernel
> is loaded.
> Maybe this helps in finding the problem?

Yap, that rings a bell. I'm willing to put some money on this:

ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")

which is already queued for stable.

Wanna apply it on top of your kernel, rebuild and retest?

I can help out along the way if you'd like :)
Comment 24 Wolfgang Bauer 2016-10-28 14:56:46 UTC
(In reply to Borislav Petkov from comment #23)
> Yap, that rings a bell. I'm willing to put some money on this:
> 
> ff8560512b8d ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
> 
> which is already queued for stable.
> 
> Wanna apply it on top of your kernel, rebuild and retest?

This one I suppose:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ff8560512b8d4b7ca3ef4fd69166634ac30b2525

I will try it out and report back.
Comment 25 Borislav Petkov 2016-10-28 14:58:34 UTC
(In reply to Wolfgang Bauer from comment #24)
> This one I suppose:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=ff8560512b8d4b7ca3ef4fd69166634ac30b2525
> 
> I will try it out and report back.

Exactly and thanks!
Comment 26 Wolfgang Bauer 2016-10-28 19:29:08 UTC
(In reply to Wolfgang Bauer from comment #24)
> This one I suppose:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=ff8560512b8d4b7ca3ef4fd69166634ac30b2525
> 
> I will try it out and report back.

Unfortunately this doesn't help with VirtualBox.

Here are my packages, if somebody wants to try that patch on real hardware:
http://download.opensuse.org/repositories/home:/wolfi323:/branches:/Kernel:/stable/standard/i586/
Comment 27 Borislav Petkov 2016-10-28 19:52:52 UTC
Yeah, this got reported on lkml too and we're debugging:

https://lkml.kernel.org/r/ca86ac5f-ce41-4773-98c5-9f55bac43503@default

I'll let you know once a fix is found.

Thanks for giving it a try anyway.
Comment 28 Borislav Petkov 2016-10-28 19:54:30 UTC
(In reply to Wolfgang Bauer from comment #26)
> Here are my packages, if somebody wants to try that patch on real hardware:
> http://download.opensuse.org/repositories/home:/wolfi323:/branches:/Kernel:/
> stable/standard/i586/

Yes, other people on the bug with real hardware, please try this package.

Thanks.
Comment 29 Petr Matula 2016-10-28 20:31:12 UTC
I'm trying it right now (old computer processor VIA CN700). 
And it's fixed. The computer is booting.
Comment 30 Borislav Petkov 2016-10-28 20:40:08 UTC
Good,

btw, this fix 

85533a1ae7b6 ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")

is in 4.8.5 which you can get from here:

http://kernel.opensuse.org/packages/stable

HTH.
Comment 31 Wolfgang Bauer 2016-10-28 22:21:51 UTC
(In reply to Borislav Petkov from comment #27)
> Yeah, this got reported on lkml too and we're debugging:
> 
> https://lkml.kernel.org/r/ca86ac5f-ce41-4773-98c5-9f55bac43503@default
> 
> I'll let you know once a fix is found.

Thank you.
But it's not a problem for me anyway, especially as I have a workaround now (enabling IO-APIC). ;-)

I will still give the patch posted in that mailing list thread a try though:
https://lkml.org/lkml/2016/10/28/581
Comment 32 Borislav Petkov 2016-10-28 23:10:50 UTC
(In reply to Wolfgang Bauer from comment #31)
> I will still give the patch posted in that mailing list thread a try though:
> https://lkml.org/lkml/2016/10/28/581

Sure, and please do report whether it worked or not...

Thanks.
Comment 33 Wolfgang Bauer 2016-10-29 00:54:05 UTC
Yes, it works.

I applied the patch to kernel 4.8.5 from Kernel:stable, and the system now boots successfully regardless of whether IO-APIC is enabled or not.
Comment 34 Borislav Petkov 2016-10-29 12:53:59 UTC
Ok, fix is queued:

http://git.kernel.org/tip/1e90a13d0c3dc94512af1ccb2b6563e8297838fa

Closing.
Comment 35 Ian Jones 2016-11-01 11:40:10 UTC
*** Bug 1007746 has been marked as a duplicate of this bug. ***
Comment 36 Egon Niessner 2016-11-02 13:17:04 UTC
Is there a Tumbleweed ISO file to download for an i586 CPU,
with a Linux kernel that contains the fix from comment #34?

I think, openSUSE-Tumbleweed-DVD-i586-Snapshot20161031-Media.iso 
does not contain this fix.
Comment 37 Borislav Petkov 2016-11-02 13:41:15 UTC
(In reply to Egon Niessner from comment #36)
> Is there a Tumbleweed ISO file to download for an i586 CPU,

Is it really a 32-bit-only CPU, or could you theoretically upgrade to
64-bit?
Comment 38 Egon Niessner 2016-11-02 14:12:57 UTC
(In reply to Borislav Petkov from comment #37)
> (In reply to Egon Niessner from comment #36)
> > Is there a Tumbleweed ISO file to download for an i586 CPU,
> 
> Is it really a 32-bit-only CPU or you could theoretically upgrade to
> 64-bit?

Yes, it is a 32 bit AMD Athlon(tm) CPU.
Regards 
Egon
Comment 39 Borislav Petkov 2016-11-02 15:27:58 UTC
Ok, so I'm being told 4.8.6 is in the queue which contains
the fix. I'd keep checking the Changes* files here
http://download.opensuse.org/tumbleweed/iso/ for the new kernel version
to appear.

HTH.
Comment 40 Jiri Slaby 2016-11-02 20:06:07 UTC
*** Bug 1006632 has been marked as a duplicate of this bug. ***
Comment 41 Wolfgang Bauer 2016-11-06 16:25:30 UTC
FYI, kernel 4.8.6 is included in today's new Tumbleweed snapshot, so it should be on the latest TW iso files too.

And I can confirm that it boots successfully here (inside VirtualBox).
Comment 42 Egon Niessner 2016-11-08 14:20:14 UTC
I tried to boot with a DVD containing
openSUSE-Tumbleweed-DVD-i586-Snapshot20161105-Media.iso
(it contains  kernel-default-4.8.6-2.1.i586)

When booting the rescue system in safe mode (and with every other kernel setting
that can be selected before booting),
the rescue system crashes after loading the initrd.
The last message after the backtrace is
"DWARF2 unwinder stuck at resume_userspace +  0xe/0x13
 leftover inexact backtrace"

So Tumbleweed is not bootable on a real i586 single-core system.

(On a pentium 4 system with two 32 bit cores, this dvd can be booted.)
Comment 43 Wolfgang Bauer 2016-11-08 14:46:37 UTC
(In reply to Egon Niessner from comment #42)
> During boot of the rescue system in save mode (and all selectable kernel
> settings
> before booting),
> the rescue system crashes after loading the initrd
> with the last message after backtrace
> "DWARF2 unwinder stuck at resume_userspace +  0xe/0x13
>  leftover inexact backtrace"

I suppose you should file a new bug report about this though.

The original problem was different (a crash immediately after "Loading initial ramdisk..." without any further messages), and has been confirmed as fixed by the reporter too.

Btw, it's not a general problem with single core systems.
Mine is single core too (though 64bit actually), and the reporter's probably as well (I don't think a VIA CN700 is dual core)... ;-)
Comment 44 Egon Niessner 2016-11-09 10:51:25 UTC
I created a new bug-report 1009246 for this problem.
Regards 
Egon