Bug 118305

Summary: LTC18070-Default boot loader fails to install for both upgrade and new install
Product: [openSUSE] SUSE LINUX 10.0 Reporter: LTC BugProxy <bugproxy>
Component: Installation Assignee: Olaf Dabrunz <odabrunz>
Status: RESOLVED FIXED QA Contact: Klaus Kämpf <kkaempf>
Severity: Normal    
Priority: P5 - None    
Version: Beta 4   
Target Milestone: ---   
Hardware: PowerPC-64   
OS: Linux   
See Also: https://bugzilla.linux.ibm.com/show_bug.cgi?id=18070
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: sles9-10migrate_bootLoaderBug.GIF
sles9-10migrate_bootLoaderBug2.GIF
oss_SuSE10_bootLoaderProblem-1.jpg
oss_SuSE10_bootLoaderProblem.jpg

Description LTC BugProxy 2005-09-21 21:25:38 UTC
LTC Owner is: thinh@us.ibm.com
LTC Originator is: marksmit@us.ibm.com


Problem description:
Testing basic installation of OSS-SUSE10 on Power5 Vscsi and Veth client lpar.
Both new installation and upgrade from (successfully installed) SLES9_SP2 fail
to install the Boot Loader.

Performing an NFS install of Beta4.  
Lpar contains two Vscsi disks:
sda (20GB) (Logical Volume created on VIO server)
sdb (146GB entire physical disk assigned to Vscsi disk)

Hardware Environment
    Machine type (p5-550 SF4)
    Cpu type (Power5):
    Describe any special hardware you think might be relevant to this problem:
System is a "no-hmc" VIO server (alpha VIO).

alpha VIO is accessed: http://codeine01.austin.ibm.com/  (padmin : padmin)
(use this to lower BSO firewall).

After being blocked from starting a "new installation", the lpar: 
codeine03.austin.ibm.com:1  (vnc pw: don2rry) was installed with SLES9_SP2.
The installation was re-started, this time choosing "upgrade" of the existing
install.  All defaults were selected.
Upgrade proceeded until attempting to install the boot loader.
System has failed to proceed, despite several attempts to re-define the boot 
loader.

The lpar: codeine03.austin.ibm.com:1  is still accessible via vncviewer for 
inspection.  I left it in the failing state.  Please proceed to click "yes" to 
retry.  It will send you to the manual boot loader configuration (and 
installation) tabs within the vncviewer.
I tried both Lilo (default) and "ppc" but continue to get the failure.

Created an attachment (id=12251)
Initial boot loader error screen


Created an attachment (id=12252)
Default boot loader selections


we will take a look

Hmm... Olaf said that there were issues with installing a bootloader on iSeries
(legacy).  RC1 was just released today, so you could try that, too. ;)  And I
don't see this on the list of most annoying bugs:
http://www.opensuse.org/Bugs:most_annoying_bugs

Hi Mike,
I have to wait until RC1 is mirrored to
http://software.linux.ibm.com/pub/suse/beta_cds/opensuse-10/SL-OSS-current/iso/
before I can access it (usually a few days).  At that time I will attempt to
recreate.
In the mean time, Thinh, please indicate when/if you are done with the 
investigation of its current failing state. 

Mark,
Have you tried to resolve these errors before installing?
(errlog on the VIO server)

---------------------------------------------------------------------------
LABEL:          CLIENT_FAILURE
IDENTIFIER:     C972F43B

Date/Time:       Wed Sep  7 14:06:29 CDT 2005
Sequence Number: 40
Machine Id:      00CD66BF4C00
Node Id:         codeine01
Class:           S
Type:            TEMP
Resource Name:   vhost1

Description
Misbehaved Virtual SCSI Client

Probable Causes
Bad IU, or SRP Violation

Failure Causes
Bad IU, or SRP Violation

        Recommended Actions
        Remove Virtual SCSI Client, then Configure the same instance

Detail Data
ADDITIONAL INFORMATION
        module: target_trans_event      rc: 00000000FFFFFFD8    location: 00000002
        data:  1 1 0 0 0
---------------------------------------------------------------------------
LABEL:          CLIENT_FAILURE
IDENTIFIER:     C972F43B

Date/Time:       Wed Sep  7 09:29:27 CDT 2005
Sequence Number: 39
Machine Id:      00CD66BF4C00
Node Id:         codeine01
Class:           S
Type:            TEMP
Resource Name:   vhost1

Description
Misbehaved Virtual SCSI Client

Probable Causes
Bad IU, or SRP Violation

Failure Causes
Bad IU, or SRP Violation

        Recommended Actions
        Remove Virtual SCSI Client, then Configure the same instance

Detail Data
ADDITIONAL INFORMATION
        module: target_trans_event      rc: 00000000FFFFFFD8    location: 00000002
        data:  1 1 0 0 0
---------------------------------------------------------------------------
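
To see at a glance which virtual SCSI adapter keeps logging these entries, the log text can be tallied per label and resource. This is only a minimal sketch: it assumes the error log has been saved as plain text in the layout quoted above, and the function name summarize_errlog is made up for illustration.

```shell
#!/bin/sh
# Count error-log entries per LABEL and Resource Name.
# Reads error-log text in the layout quoted above on stdin.
# (summarize_errlog is a hypothetical helper, not a VIO server command.)
summarize_errlog() {
    awk '
        # Remember the label of the entry currently being read.
        $1 == "LABEL:" { label = $2 }
        # Each entry has one "Resource Name:" line; count it under its label.
        $1 == "Resource" && $2 == "Name:" { count[label " " $3]++ }
        END { for (key in count) print key, count[key] }
    ' | sort
}
```

Fed the two entries above (for example with summarize_errlog < errlog.txt, where errlog.txt is a hypothetical saved copy), it prints "CLIENT_FAILURE vhost1 2".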


Mark,
I'm done with the machine.

I think the "Misbehaved Virtual SCSI Client" error comes from the Linux ibmvscsi client.

We need to look at the OSS-SUSE10 source to see whether ibmvscsi is at the latest level.

downloaded rc1 iso's and created network install images
aborted previous "upgrade" installation (blocked by this bug) and started a 
new nfs install.
did an over-ride on the default offered "new installation" and instead chose 
to upgrade the existing  install  (that was aborted).
1. will see if upgrade succeeds or recreate this bug.
2. if recreate, then will attempt "new installation" with new rc1 isos.
3. if recreate on "new", then will delete vscsi disks, per VIOServer error log 
recommendation, then will retry #2.
4. if recreate, then will update VIOServer code and system f/w to latest 
GA6/53D available, and again do #3 and then #2.

did all 1, 2, 3, 4 steps.  bug continues to recreate in all cases.  A
different distro is also having partition problems with this 2 disk vscsi combo:
sda = 20GB (logical lvm vscsi) , sdb = 136GB (physical disk vscsi).

just to re-iterate:  can install SLES9SP2 ok, but cannot upgrade that install, 
nor can install new with oss_SuSE10, both due to this bug.

Created an attachment (id=12422)
boot loader problem with just one disk

removed sdb (physical volume 136GB disk) and recreated with just sda.

Created an attachment (id=12423)
same problem with only the 136GB physical volume vscsi

This time I removed the lvm 20GB disk and put the physical volume 136GB disk
back in.  problem still recreates.
tried setting sda3 as default loader loc.  doesn't help.
tried removing and adding a new one at sda1.  doesn't help.

I'm stuck.

recreated on a 2nd, hmc-attached VIOS, and also on a non-VIO no-hmc (aka genesis)
OpenPower 710 (just IPR disks).
Comment 1 LTC BugProxy 2005-09-21 21:26:16 UTC
Created attachment 50573 [details]
sles9-10migrate_bootLoaderBug.GIF
Comment 2 LTC BugProxy 2005-09-21 21:26:57 UTC
Created attachment 50574 [details]
sles9-10migrate_bootLoaderBug2.GIF
Comment 3 LTC BugProxy 2005-09-21 21:27:24 UTC
Created attachment 50575 [details]
oss_SuSE10_bootLoaderProblem-1.jpg
Comment 4 LTC BugProxy 2005-09-21 21:27:46 UTC
Created attachment 50576 [details]
oss_SuSE10_bootLoaderProblem.jpg
Comment 5 Olaf Hering 2005-09-22 06:19:56 UTC
Yes, you are right. The URL http://www.opensuse.org/Bugs:most_annoying_bugs
should _clearly_ mention that yast does ONLY know about msdos partition tables.
And as a result, installation will require many many manual tweaks. They are
mentioned behind the links on this page:

http://www.opensuse.org/POWER@SUSE

assigning to webmaster of that site.
Comment 6 LTC BugProxy 2005-09-26 15:42:42 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Owner|thinh@us.ibm.com            |gjlynx@us.ibm.com
        Owning Team|LTC Internal Support        |SuSE




------- Additional Comments From thinh@us.ibm.com(prefers email via th2tran@austin.ibm.com)  2005-09-26 11:39 EDT -------
assign to OpenSuse team. 
Comment 7 LTC BugProxy 2005-09-28 19:43:36 UTC
---- Additional Comments From marksmit@us.ibm.com  2005-09-28 15:38 EDT -------
ok, restarted a netboot install and enabled ssh to have a command prompt.
then created manual partitions:

inst-sys:~ # fdisk -l

Disk /dev/sda: 146.8 GB, 146814976000 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          14      112423+   c  W95 FAT32 (LBA)
/dev/sda2              15         137      987997+  82  Linux swap / Solaris
/dev/sda3             138        3177    24418800   83  Linux

and then started the install.  The previous Sles9 install is erased. (I think 
that is the problem, but I'm trying to do an OSS10 install to a blank disk).

I tried #2, but since there is no o/s that doesn't work.
Start Installation or System
1) Start Installation or Update
2) Boot Installed System
3) Start Rescue System

So I again tried #1
From command line, pdisk -l says
pdisk: No valid block 1 on '/dev/sda'

So I mounted /boot with /dev/sda1.
created an /etc/lilo.conf per web page suggestion
scp of CD1/suseboot/ initrd64 and linux64.gz to /boot/initrd and /boot/vmlinux

Ran lilo to try to write /dev/sda, but lilo fails and gives a segmentation 
fault.

Am I supposed to run lilo to make /dev/sda bootable?
Or do I need to install Sles9 and then perform an upgrade type install?

I don't seem to understand the workaround process for this problem.
Comment 8 LTC BugProxy 2005-09-28 20:02:33 UTC
---- Additional Comments From marksmit@us.ibm.com  2005-09-28 15:56 EDT -------
Ok, sorry for the last append.  I think I needed to continue trying like 
this.  It seems to be installing now. Please let me know if this is the 
correct action:
used created partitions, and started yast
choose to update existing system.
choose to show all partitions, so I can see the empty /dev/sda3.
choose to update /dev/sda3.
manually choose update method to base update on installing all packages needed 
for KDE install (instead of just updating existing install - default 
selection - which is empty).
start install.
upon reboot, again choose ssh and then go run lilo on installation.

does this look right? 
Comment 9 Olaf Hering 2005-09-30 15:04:46 UTC
Mark,

the partitioner is supposed to work ok on pseries. Only the bootloader would be
an issue on pseries.
The instructions are mostly for Macs, have to tweak them a bit this weekend.

The link on the most annoying bugs page is still missing.
Comment 10 LTC BugProxy 2005-11-05 22:21:17 UTC
---- Additional Comments From marksmit@us.ibm.com  2005-11-05 17:12 EDT -------
This recreates on OSS 10.1 Alpha 2, power Lpar served by IVM "alpha" VIO
server (Vscsi disk and Veth devices)
I checked http://www.opensuse.org/POWER@SUSE but do not understand how to get 
the bootloader installed. 
Comment 11 Olaf Hering 2005-11-08 14:03:32 UTC
Mark,

http://www.opensuse.org/PPC:Boot_pseries
this explains how to create a lilo.conf after install.
Comment 12 LTC BugProxy 2005-12-20 03:47:37 UTC
---- Additional Comments From marksmit@us.ibm.com  2005-12-19 22:40 EDT -------
This problem is also blocking ppc64 installs (power5 h/w) on Sles10, preview2.
I do not see a way to work around it, nor tell it to continue installing 
despite no boot loader defined.
So I do not understand how to tell the system to do a new install, and fix the 
boot loader later.
Do you wish for this bug to be duplicated to Sles10?  Or are you already aware 
of this problem on Sles10 previews? 
Comment 13 LTC BugProxy 2005-12-21 17:20:38 UTC
---- Additional Comments From marksmit@us.ibm.com  2005-12-21 12:11 EDT -------
documenting helpful hints from Olaf:
to work around this problem & fix it after install finishes, 
boot with start_shell option on command prompt (to fix without a rescue boot)
change proposed config -> pick bootloader -> Bootloader Installation
-> pick twisty next to default proposed "ppc" and pick other option:
"do not install boot loader"
finish install. system drops back to shell.
work around possible problem in shell:
To reset the terminal to a usable size, type:
<RETURN>
stty cols 80 rows 24

assuming the root partition is on sda3:
type 'echo Root: /dev/sda3 > /etc/yast.inf' followed by 'exit' or 'ctrl d'
This will do the very same thing as 'booting into the installed system'
from within yast.
yast.inf is the "communication channel" between yast and linuxrc.
then "exit" and system will boot and mount Root normally.
(next it may take you to /usr/lib/YaST2/startup/YaST2.ssh first)
At this point, create a suitable /etc/lilo.conf file (sample in link above),
and run 'lilo' to install bootloader.  it will create a /etc/yaboot.conf for
you and seems to change the SMS bootlist to insert /dev/sda as the 1st boot
device.
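
The yast.inf hand-off described above can be wrapped in a small helper so the root device is not mistyped in the install shell. This is a sketch only: write_yast_inf is a hypothetical name, and the one-line "Root: <device>" format is taken directly from the echo command in this comment, nothing more is assumed about yast.inf.

```shell
#!/bin/sh
# Hypothetical helper: write the "Root:" line that linuxrc reads from yast.inf.
#   $1 = root partition, e.g. /dev/sda3
#   $2 = file to write (defaults to /etc/yast.inf; override it for a dry run)
write_yast_inf() {
    root_dev=$1
    target=${2:-/etc/yast.inf}
    # Refuse anything that does not look like a device path.
    case "$root_dev" in
        /dev/*) ;;
        *) echo "expected a /dev/... path, got: $root_dev" >&2; return 1 ;;
    esac
    printf 'Root: %s\n' "$root_dev" > "$target"
}
```

In the install shell the sequence would then be: stty cols 80 rows 24, then write_yast_inf /dev/sda3, then exit. Creating /etc/lilo.conf and running lilo still has to be done by hand afterwards, as described above.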
Comment 14 LTC BugProxy 2006-01-23 19:45:21 UTC
---- Additional Comments From marksmit@us.ibm.com  2006-01-23 14:43 EDT -------
I was able to install Sles10 preview2 ok using the documented workaround.
On beta1 for Sles10, I attempted a "new install" without workaround.
The autopartitioner correctly proposed removing the previous install and
proposed creating newer partitions for the install.  I chose the defaults and
the install got to bootloader installation at the end of the "new
installation" where it failed to install the boot loader.
So I still seem to have a problem.
To attempt recreate, I removed all installations and attempted an install.  In 
this case autopartitioner refuses to propose a scenario, but when I manually 
(expert path) propose a partitioning scheme, the install succeeds.
sda1 1 block prep boot
sda2 1GB swap
sda3 12GB reiser for /
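
Before starting an install, it can help to check whether a disk already carries a PReP boot partition like the sda1 above. The sketch below assumes 'fdisk -l' output in the msdos-label layout shown in the earlier comment and relies on partition Id 41 being "PPC PReP Boot"; both function names are made up for illustration.

```shell
#!/bin/sh
# Print "device id" pairs from `fdisk -l` text on stdin (msdos label layout).
list_partition_ids() {
    awk '/^\/dev\// {
        # An optional "*" boot flag shifts the Id column by one.
        id = ($2 == "*") ? $6 : $5
        print $1, id
    }'
}

# Succeed only if some partition has Id 41 (PPC PReP Boot).
has_prep_partition() {
    list_partition_ids | awk '$2 == "41" { found = 1 } END { exit !found }'
}
```

For example, 'fdisk -l /dev/sda | has_prep_partition || echo "no PReP boot partition"' would flag the FAT32/swap/Linux layout from the earlier comment, since none of its Ids is 41.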

Is it expected behavior for the Sles10 autopartitioner to require manual 
partitioning on blank disks?  Or should I open a bug?

I am attempting recreate of the "re-install" where the bootloader install
failed.  This time it will be from Sles10 beta1 (manual partition) to
another "re-install" of the same beta1, using autopartitioner's defaults.
Comment 15 LTC BugProxy 2006-01-25 07:30:21 UTC
---- Additional Comments From marksmit@us.ibm.com  2006-01-25 02:27 EDT -------
clarified during conference phone call today that autopartitioner should 
propose a scheme for blank disks.  I've created an "install scenario" for
testing that and will have the team pursue it.
In the mean time, the Sles10-beta2 patches uploaded to ftp3 server by SuSE, 
when patched onto beta1 isos, do fix this bug for a "new install".  Since that
is a different release, should we reject this on the IBM side
with "alt_solution_avail"  or should we change it to "accepted" so it can then
be closed? 
Comment 16 Olaf Hering 2006-05-03 15:26:53 UTC
works slightly better in 10.1
Comment 17 LTC BugProxy 2006-05-05 01:45:11 UTC
----- Additional Comments From marksmit@us.ibm.com  2006-05-04 21:43 EDT -------
I am downloading OSS10.1 RC3 to try it.
question about deltaiso's:
I read here how to do it:
http://en.opensuse.org/Download_Instructions#Applying_Delta_ISOs
but my install server is Sles9, which does not contain the deltarpm package.
Can I instead just mount -o loop <rc1-rc3-delta.iso> and then copy the contents
over the existing rc1 content?

Or do I need to find a version of deltarpm for Sles9? 
Comment 18 Olaf Hering 2006-05-05 05:04:51 UTC
Mark, the codebase of SLES10 and 10.1 is identical. If it works with Beta11, it will work with 10.1 rc3 as well
Comment 19 LTC BugProxy 2006-10-11 03:16:12 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |FIXEDAWAITINGTEST
         Resolution|                            |FIX_BY_DISTRO




------- Additional Comments From marksmit@us.ibm.com  2006-10-10 23:11 EDT -------
Appears fixed in OpenSUSE 10.2  (alpha 4) for most ppc64 configs.  I am
still investigating one scenario, a VIO served lpar with one lvm vio disk (sda,
11GB) and one entire physical disk (sdb, 72GB, also vio), to see if I can
recreate a yaboot.conf problem.   Part of that scenario had a Fedora
installation existing on sda, and the OpenSUSE install proposed a default
scenario (that I accepted) which preserved pieces of the previous install.

However, closing this one as fixed.