Bug 548021 - 11.2 Milestone 7 won't run on AMD x86_64
Summary: 11.2 Milestone 7 won't run on AMD x86_64
Status: RESOLVED WONTFIX
Alias: None
Product: openSUSE 11.2
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86-64 openSUSE 11.2
Importance: P3 - Medium : Major with 10 votes
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL: http://git.kernel.org/?p=linux/kernel...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-18 21:38 UTC by Forgotten User eDStDj8Y1e
Modified: 2010-09-03 17:45 UTC
CC List: 5 users

See Also:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
hwinfo tried with opensuse 11.0, 32 bit (464.22 KB, text/plain)
2009-11-26 17:42 UTC, Frank Hübner
boot messages without noacpi (Brandon's iso) (35.68 KB, text/plain)
2009-12-17 21:06 UTC, Frank Hübner
boot messages with noacpi (Brandon's iso) (7.40 KB, text/plain)
2009-12-17 21:10 UTC, Frank Hübner
boot log with noacpi (Brandon's iso) (260.04 KB, application/octet-stream)
2009-12-19 09:14 UTC, Frank Hübner
end of boot messages without noacpi (60.30 KB, text/plain)
2009-12-19 15:42 UTC, Frank Hübner

Description Forgotten User eDStDj8Y1e 2009-10-18 21:38:10 UTC
+++ This bug was initially created as a clone of Bug #541396 +++

Using RC1.

ISO burned to DVD. Booting from the DVD gives the expected splash screen and,
finally, the menu with the option to boot from the hard drive. If ANY option is
taken besides boot from hard drive (I didn't test this one), the load fails.

The failure shows up as the screen going dark; in my case, the status light on
the LCD unit also shows loss of signal. If you select TEXT for the display, a
series of modules being loaded shows, and then the whole thing hangs
(last 2 modules listed: 
[   0.184001]  [<ffffffff819d80b0>] kernel_init+0x102/0x15c
[   0.184001]  [<ffffffff8100d6ca>] child_rip+0xa/0x20.)

I tried again with ACPI disabled and text mode, and sure enough, I am in the text mode of the install, doing an install repair (because 11.1's GRUB got twisted somehow).

So a reboot and ACPI only disabled and the full graphics system ran just fine
for an upgrade (from 11.1 64 bit).

This is interesting in that *All* SuSE installs (32 or 64 from 10.2 to 11.1)
along with a few other distros have worked on this system without having to
disable anything. 

Config info: 

AMD Athlon 64 Dual Core 4200+
896MB Ram
ATI RS690 (Radeon X1200 Series)
ATI SB600 IDE & Non-Raid-5 SATA
Dual 500GB SATA drives (with /home being RAID1)

Reason for testing this, problems found with 11.0-1 in the set up of RAID1
while doing install. Seems that GRUB is not correctly configured. As you can
see, I didn't get that far this time.
Comment 1 Forgotten User eDStDj8Y1e 2009-11-01 17:26:15 UTC
After install completes, system will not complete boot correctly. 

Message is given (I'm copying this by hand):

The superblock could not be read or does not describe a correct ext2
file system.  If the device is valid and it really contains an ext2 
file system (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
 [etc etc]

Yes, I can recreate this at will.

It seems that this is a problem when RAID1 is involved. 

I think this is a show stopper.
Comment 2 Forgotten User eDStDj8Y1e 2009-11-01 21:25:18 UTC
Well, this is interesting. After having the problem of telling the system to start an RC1 and it failing each time, and my default also failing I thought I'd burn an RC2 and test it. At this point I had not tried either of the last two items in the GRUB menu (as listed below).

A COLD boot was attempted using the newly burned RC2 system but the DVD was not recognized.

So GRUB came up with a list of items:

Desktop -- openSUSE 11.2 RC 1 - 2.6.31.3-1
Failsafe -- openSUSE 11.2 RC 1 - 2.6.31.3-1
openSUSE 11.1 - 2.6.27.29-0.1 (default)
Failsafe -- openSUSE 11.1 2.6.27.29-0.1
openSUSE 11.1
Failsafe -- openSUSE 11.1

I selected "openSUSE 11.1" in an attempt to go back to the last running system (which was a 10.3 that had been upgraded to 11.1 w/ RAID 1 for /home) and what came up was 11.2 RC1!

Seems that the UPGRADE install on RC1 is not correctly configuring GRUB. The 11.2 RC1 that came up needed to have the users redefined, which then connected them immediately to the /home locations (which is correctly, RAID1). So the upgrade misses some configuration items as well. Accordingly I am changing the severity and priority to a lower value.
Comment 3 Frank Hübner 2009-11-16 19:52:11 UTC
I do not have a raid system, but I do have problems which look very similar. 

Installation hangs after kernel is loaded with messages as Steve mentioned:

[   0.184001]  [<ffffffff819d80b0>] kernel_init+0x102/0x15c
[   0.184001]  [<ffffffff8100d6ca>] child_rip+0xa/0x20.)


When booting in secure mode or with ACPI switched off, I get messages like those in bug 548165:

[    6.169153] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.506022] ata1.00: qc timeout (cmd 0xec)
[    6.506032] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.626035] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.627152] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
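
A minimal sketch for pulling the failing ports out of messages like the ones above. It is only an illustration: the regular expression assumes the dmesg-style `[ time] ataN.MM: message` layout seen in this excerpt, and the helper name `summarize_ata` is hypothetical, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches dmesg-style lines such as:
#   [    6.169153] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
# The ".00" device suffix is optional (link messages use plain "ata3:").
ATA_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def summarize_ata(lines):
    """Group messages by ATA port and list ports that logged a failure.

    A port counts as failed if any of its messages contains
    "failed to IDENTIFY" or "qc timeout" (an assumption based on the
    excerpt above, not an exhaustive list of libata errors).
    """
    messages = defaultdict(list)
    for line in lines:
        match = ATA_LINE.match(line.strip())
        if match:
            messages[match.group("port")].append(match.group("msg"))
    failed = sorted(
        port
        for port, msgs in messages.items()
        if any("failed to IDENTIFY" in m or "qc timeout" in m for m in msgs)
    )
    return dict(messages), failed


# The five lines quoted in this comment, used as sample input.
SAMPLE = """\
[    6.169153] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.506022] ata1.00: qc timeout (cmd 0xec)
[    6.506032] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.626035] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.627152] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
""".splitlines()

messages, failed = summarize_ata(SAMPLE)
print("ports with failures:", failed)
```

On the sample, the link on ata3 comes up but the device on it still fails IDENTIFY, while ata4 reports no error at all.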


(I have pictures of the screen available; if they are of interest, I can post them too.)

When I switch the SATA controller off (booting from IDE CD-ROM or USB), booting succeeds (net install, check installation media, as well as the GNOME live system).

Anyway, this occurs only with the openSUSE 11.2 64-bit system; the 32-bit system installs fine.
The 11.1 64-bit system also runs fine. 

Then I tried another distro (Sabayon) with the same kernel (2.6.31): same results, it can boot only if the SATA controller is switched off.

Searching the web shows up issues with this kernel and serial ATA. Can this be a kernel issue?

What is the difference between 32 and 64 bit kernel?
Comment 4 Brandon Philips 2009-11-25 09:48:37 UTC
Can you please try and reproduce with 11.2 final and attach the output from hwinfo to the bug?

hwinfo > 548021.hwinfo
Comment 5 Forgotten User eDStDj8Y1e 2009-11-25 15:14:12 UTC
I'll try to get this for you next week. I'm currently on holiday and away from the system that has the problem.

Since I am a bit new at this, could you tell me how to get the output from hwinfo captured when running the install? I currently do not have a functional USB stick (it died).
Comment 6 Frank Hübner 2009-11-26 17:42:54 UTC
Created attachment 329690
hwinfo tried with opensuse 11.0, 32 bit
Comment 7 Brandon Philips 2009-12-07 18:23:35 UTC
A fix for this is available in Kernel 2.6.31.6 which isn't available on the install media. 

So, please install 11.2 with ACPI disabled (I believe you said everything worked in that case).  Then install the latest Kernel (2.6.31.6 or greater) on the new install and everything should work with ACPI enabled.
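
As a rough illustration of the "2.6.31.6 or greater" check, the sketch below compares dotted kernel version strings numerically. The helper names are hypothetical, and the comparison is only a heuristic: a distribution kernel can carry a backported fix under an older version number, so a version below 2.6.31.6 does not prove the fix is missing.

```python
import re

# First upstream stable release said in this comment to contain the fix.
FIXED_IN = (2, 6, 31, 6)


def kernel_version(text):
    """Return the leading dotted numeric version as a tuple of ints.

    "2.6.31.3-1" -> (2, 6, 31, 3); the package release suffix after
    the dash is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(text, fixed_in=FIXED_IN):
    """True if the version in `text` is at or above `fixed_in`."""
    return kernel_version(text) >= fixed_in


# Kernel strings taken from the GRUB menu listed in comment 2.
for candidate in ("2.6.31.3-1", "2.6.27.29-0.1", "2.6.31.6"):
    print(candidate, "->", has_fix(candidate))
```

Tuples compare element by element, so 2.6.31.10 sorts above 2.6.31.6, which a plain string comparison would get wrong.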
Comment 8 Frank Hübner 2009-12-07 20:28:55 UTC
(In reply to comment #7)
> A fix for this is available in Kernel 2.6.31.6 which isn't available on the
> install media. 
> 
> So, please install 11.2 with ACPI disabled (I believe you said everything
> worked in that case).  Then install the latest Kernel (2.6.31.6 or greater) on
> the new install and everything should work with ACPI enabled.

This will not work for me. I cannot access the disk when ACPI is disabled. If possible, could you please provide an installation disk (minimal, and with access to, e.g., a USB hard disk) or give me a hint on how to create one myself?
Comment 9 Forgotten User eDStDj8Y1e 2009-12-08 01:26:21 UTC
(In reply to comment #7)
> A fix for this is available in Kernel 2.6.31.6 which isn't available on the
> install media. 
> 
> So, please install 11.2 with ACPI disabled (I believe you said everything
> worked in that case).  Then install the latest Kernel (2.6.31.6 or greater) on
> the new install and everything should work with ACPI enabled.

Everything does not work in this case. Only the install works, to a point. From comment #2 above:

"Seems that the UPGRADE install on RC1 is not correctly configuring GRUB. The
11.2 RC1 that came up needed to have the users redefined, which then connected
them immediately to the /home locations (which is correctly, RAID1). So the
upgrade misses some configuration items as well. Accordingly I am changing the
severity and priority to a lower value."

Well, the 11.2 final is similar.

Note that it took running an 11.1 recovery install to fix things; the cause was that GRUB's menu is built wrong. Once the 11.1 recovery ran and fixed fstab (among other things), the 11.2 DVD FAILED to be recognized at boot, so the system booted from the hard drive, where I selected the 11.1 entry, which actually ran 11.2.

[from original posting "Reason for testing this, problems found with 11.0-1 in the set up of RAID1 while doing install. Seems that GRUB is not correctly configured. As you can see, I didn't get that far this time."]

So I will re-open this, because it is not fixed. It might be time for the decision makers to make the hard decision to rebuild 11.2 with a working kernel.
Comment 17 Brandon Philips 2009-12-16 06:27:26 UTC
Could you test this kISO?

http://beta.suse.com/private/bphilips/bnc548021/openSUSE-11.2-bnc548021.iso

48c0ffda0b88c8a5ca4ba696a5804aaa  openSUSE-11.2-bnc548021.iso

The usage instructions for a kISO follow.

  1. Boot with kISO and select operation to perform from the boot
     menu.  Boot loader loads kernel and initrd images into memory.

  2. Wait until boot loader finishes loading.

  3. All the needed contents from the kISO are contained in kernel and
     initrd images, so kISO can be removed from the drive once boot
     loader finishes loading.  Remove kISO media from the drive and
     put in release installation media.

  4-1. If the release installation media was put in before hardware
       probe is complete, installation system starts and proceeds the
       same way.

  4-2. If the media was put in too late, linuxrc will complain that it
       can't find product repository and activate manual setup.
       Nothing to worry about.  You just need to press enter a few
       more times.  Proceed to #5.

  5. Make sure the release installation media is in the drive and
     select language and keyboard map.  It will give you Main Menu.
     Select "Start Installation or System" -> "Start Installation or
     Update" -> "CD-ROM".  All the selections are the default, so just
     pressing enter several times is enough.  linuxrc will report that
     no update was found which can be safely ignored.  Installation
     continues as usual.  Because manual mode was activated,
     installation system will ask a few more questions.
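
Before burning the kISO, the download can be checked against the MD5 sum posted with the link above. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and it assumes the ISO has been saved under the file name given in this comment.

```python
import hashlib

# Checksum and file name as posted with the kISO link in this comment.
EXPECTED_MD5 = "48c0ffda0b88c8a5ca4ba696a5804aaa"
ISO_NAME = "openSUSE-11.2-bnc548021.iso"


def md5_hex(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size pieces (1 MiB by default),
    so a large ISO is never loaded into memory at once."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def iso_matches(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 (case-insensitive)."""
    return md5_hex(read_chunks(path)) == expected.strip().lower()
```

With the ISO in the current directory, `iso_matches(ISO_NAME)` returning `False` would point to a corrupted or incomplete download. MD5 is adequate for catching transfer errors like that, but it is not a safeguard against deliberate tampering.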
Comment 19 Frank Hübner 2009-12-16 21:28:32 UTC
(In reply to comment #17)
> Could you test this kISO?
> 
> http://beta.suse.com/private/bphilips/bnc548021/openSUSE-11.2-bnc548021.iso
> 
> 48c0ffda0b88c8a5ca4ba696a5804aaa  openSUSE-11.2-bnc548021.iso
> 

I tried it, it's getting better: the kernel boots with "noacpi", but not without. 

With "noacpi", the kernel recognizes my SATA disk. I did not run the installation completely, just booted until the installation program started. Then I checked manually with fdisk -l from console 2 that the disk is recognized (the original installation disk did not recognize the hard disk).
Comment 20 Brandon Philips 2009-12-17 00:18:49 UTC
(In reply to comment #19)
> (In reply to comment #17)
> > Could you test this kISO?
> > 
> > http://beta.suse.com/private/bphilips/bnc548021/openSUSE-11.2-bnc548021.iso
> > 
> > 48c0ffda0b88c8a5ca4ba696a5804aaa  openSUSE-11.2-bnc548021.iso
> > 
> 
> I tried it, it's getting better: the kernel boots with "noacpi", but not
> without. 

How does it not work without noacpi? Can you provide some more details? Maybe serial console capture?
Comment 21 Frank Hübner 2009-12-17 21:06:45 UTC
Created attachment 333287
boot messages without noacpi (Brandon's iso)
Comment 22 Frank Hübner 2009-12-17 21:10:48 UTC
Created attachment 333289
boot messages with noacpi (Brandon's iso)

I had some difficulties logging the console messages. The log starts later.

It looks like I have some trouble when booting with a serial console. Without a serial console, booting with noacpi works fine; at least I do not get these messages (or I am not able to see them).
Comment 23 Frank Hübner 2009-12-17 21:17:10 UTC
I forgot to mention that when doing a normal boot without noacpi and without a serial console, the messages look different. On the screen I can only see the last messages (ACPI errors); they do not appear if I use the serial console. 

I will at least try to log the messages at a higher baud rate. The last time I tried, I was not able to because of slow hardware (I use an old Palm device for logging).
Comment 24 Frank Hübner 2009-12-19 09:14:57 UTC
Created attachment 333548
boot log with noacpi (Brandon's iso)

This is the full boot log with noacpi as written to /var/log/boot.msg.
I first tried to log it with a serial console (attachment 333289), but stopped as it took very long at 4800 baud.
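
For a rough sense of why 4800 baud is slow for a log of this size, the sketch below estimates the raw transfer time. It assumes 8N1 framing (10 bits on the wire per byte) and that the 260.04 KB attachment size (taken as 1024-byte KB) approximates what the serial console would have to carry; neither is stated in this report, and a real boot adds the time the kernel itself needs between messages.

```python
def serial_transfer_seconds(size_bytes, baud, bits_per_byte=10):
    """Estimate the seconds needed to push `size_bytes` over a serial line.

    bits_per_byte=10 models 8N1 framing: 1 start bit, 8 data bits,
    1 stop bit. This is an assumed setting, not one taken from the report.
    """
    if baud <= 0:
        raise ValueError("baud rate must be positive")
    return size_bytes * bits_per_byte / baud


# Size of the boot log attachment in this comment, assuming 1 KB = 1024 bytes.
BOOT_LOG_BYTES = 260.04 * 1024

for baud in (4800, 115200):
    seconds = serial_transfer_seconds(BOOT_LOG_BYTES, baud)
    print(f"{baud:>6} baud: about {seconds / 60:.1f} minutes")
```

The estimate scales inversely with the baud rate, so a capture device that can keep up with 115200 baud would cut the raw transfer time by a factor of 24.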
Comment 25 Frank Hübner 2009-12-19 15:42:19 UTC
Created attachment 333557
end of boot messages without noacpi

This is the end of the boot messages taken with Brandon's iso. It completes attachment 333287 (start of the boot messages).
Comment 26 Frank Hübner 2009-12-22 07:43:30 UTC
I have now solved my problem: after updating the BIOS, I was able to boot with the 11.2 installation disk. 

Strange, because openSUSE 11.0 (32 bit) and 11.1 (64 bit) work fine with the old BIOS.
Comment 27 Brandon Philips 2009-12-23 06:38:37 UTC
Frank seems to have his issue solved. Steven?
Comment 28 Forgotten User eDStDj8Y1e 2009-12-25 01:00:26 UTC
Steve -- no "n" suffix.

No, my issue is not solved. I have upgraded my BIOS and retried. If I do not specify NOACPI, then the screen goes black and the only way out is to hit the <reset> button (the 3 finger salute doesn't do it).

As I noted when this bug report was opened, openSUSE 10.3 - 11.1 32 and 64 bit worked fine. 11.2 64-bit RC1, RC2 and Final do not. I did not try 11.2 32-bit as this is now a fully functioning file server and it is set up as a 64-bit machine.

If you want or need logs from this machine, you will have to tell me how to capture them, because I don't have a USB device to boot and operate from.

Also, not that it made any difference, I have also upgraded to 2G of RAM from 1G (which is shared with the built in video adapter).

Meanwhile, once the install is complete, the kernel that is installed has the appropriate support for ACPI. I say that because I do not have to tell GRUB noACPI for the boot.
Comment 29 Brandon Philips 2010-02-27 00:31:27 UTC
(In reply to comment #28)
> Meanwhile, once the install is complete, the kernel that is installed has the
> appropriate support for ACPI. I say that because I do not have to tell GRUB
> noACPI for the boot.

So, once it is installed everything is OK?
Comment 30 Greg Kroah-Hartman 2010-08-25 23:28:29 UTC
Closing due to lack of response.
Comment 31 Forgotten User eDStDj8Y1e 2010-08-27 05:04:11 UTC
Actually I responded, and that response somehow didn't post.

Yes, once it is installed, things ran OK. I say ran, because that machine is now gone.

But understand, the install could not be completed with the 11.2 DVD. Once the system was loaded to disk, I had to use an 11.1 DVD to go through the repair steps. Once that was done, THEN the system ran correctly, handling RAID1.

I have not seen anything that says that 11.3 has been fixed for this RAID problem (or GRUB, etc.). So I have another system with 8GB RAM and a dual core AMD that is running 64bit 11.2 -- IN PRODUCTION -- but no RAID drives. 

Once I get everything together for a new server system, I might consider upgrading to 11.3 64bit with RAID 0. But the plans for that are in October at the earliest.

Regards,
Steve Thompson
Comment 32 Jeff Mahoney 2010-09-03 17:45:51 UTC
openSUSE 11.2 is in security-maintenance mode. Please reopen if this issue still occurs with 11.3 or Factory.