Bugzilla – Bug 118305
LTC18070-Default boot loader fails to install for both upgrade and new install
Last modified: 2016-02-13 05:47:49 UTC
LTC Owner is: thinh@us.ibm.com
LTC Originator is: marksmit@us.ibm.com

Problem description:
Testing basic installation of OSS-SUSE10 on a Power5 Vscsi and Veth client lpar. Both a new installation and an upgrade from a (successfully installed) SLES9_SP2 fail to install the boot loader. Performing an NFS install of Beta4.

Lpar contains two Vscsi disks:
sda (20GB) (Logical Volume created on VIO server)
sdb (146GB entire physical disk assigned to Vscsi disk)

Hardware Environment
Machine type (p5-550 SF4)
Cpu type (Power5):

Describe any special hardware you think might be relevant to this problem:
System is a "no-hmc" VIO server (alpha VIO). alpha VIO is accessed: http://codeine01.austin.ibm.com/ (padmin : padmin) (use this to lower BSO firewall).

After being blocked from starting a "new installation", the lpar codeine03.austin.ibm.com:1 (vnc pw: don2rry) was installed with SLES9_SP2. The installation was then restarted, this time choosing "upgrade" of the existing install. All defaults were selected. The upgrade proceeded until it attempted to install the boot loader. The system has failed to proceed, despite several attempts to redefine the boot loader.

The lpar codeine03.austin.ibm.com:1 is still accessible via vncviewer for inspection. I left it in the failing state. Please click "yes" to retry. It will send you to the manual boot loader configuration (and installation) tabs within the vncviewer. I tried both Lilo (default) and "ppc" but continue to get the failure.

Created an attachment (id=12251)
Initial boot loader error screen

Created an attachment (id=12252)
Default boot loader selections

we will take a look

Hmm... Olaf said that there were issues with installing a bootloader on iSeries (legacy). RC1 was just released today, so you could try that, too. ;) And I don't see this on the list of most annoying bugs: http://www.opensuse.org/Bugs:most_annoying_bugs

Hi Mike, I will wait until RC1 is mirrored to http://software.linux.ibm.com/pub/suse/beta_cds/opensuse-10/SL-OSS-current/iso/ before I can access it (usually a few days). At that time I will attempt to recreate. In the meantime, Thinh, please indicate when/if you are done with the investigation of its current failing state.

Mark, have you tried to resolve these errors before installing? (errlog on the VIO server)

---------------------------------------------------------------------------
LABEL: CLIENT_FAILURE
IDENTIFIER: C972F43B
Date/Time: Wed Sep 7 14:06:29 CDT 2005
Sequence Number: 40
Machine Id: 00CD66BF4C00
Node Id: codeine01
Class: S
Type: TEMP
Resource Name: vhost1
Description
Misbehaved Virtual SCSI Client
Probable Causes
Bad IU, or SRP Violation
Failure Causes
Bad IU, or SRP Violation
Recommended Actions
Remove Virtual SCSI Client, then Configure the same instance
Detail Data
ADDITIONAL INFORMATION
module: target_trans_event rc: 00000000FFFFFFD8 location: 00000002 data: 1 1 0 0 0
---------------------------------------------------------------------------
LABEL: CLIENT_FAILURE
IDENTIFIER: C972F43B
Date/Time: Wed Sep 7 09:29:27 CDT 2005
Sequence Number: 39
Machine Id: 00CD66BF4C00
Node Id: codeine01
Class: S
Type: TEMP
Resource Name: vhost1
Description
Misbehaved Virtual SCSI Client
Probable Causes
Bad IU, or SRP Violation
Failure Causes
Bad IU, or SRP Violation
Recommended Actions
Remove Virtual SCSI Client, then Configure the same instance
Detail Data
ADDITIONAL INFORMATION
module: target_trans_event rc: 00000000FFFFFFD8 location: 00000002 data: 1 1 0 0 0
---------------------------------------------------------------------------

Mark, I'm done with the machine. I think the "Misbehaved Virtual SCSI Client" error comes from the linux ibmvscsi client. We need to look at the OSS-SUSE10 source to see if ibmvscsi is at the latest level.
downloaded rc1 iso's and created network install images. aborted the previous "upgrade" installation (blocked by this bug) and started a new nfs install. did an over-ride on the default offered "new installation" and instead chose to upgrade the existing install (that was aborted).

1. will see if upgrade succeeds or recreates this bug.
2. if recreate, then will attempt "new installation" with new rc1 isos.
3. if recreate on "new", then will delete vscsi disks, per VIOServer error log recommendation, then will retry #2.
4. if recreate, then will update VIOServer code and system f/w to latest GA6/53D available, and again do #3 and then #2.

did all of steps 1, 2, 3, 4. bug continues to recreate in all cases.

A different distro is also having partition problems with this 2 disk vscsi combo: sda = 20GB (logical lvm vscsi), sdb = 136GB (physical disk vscsi).

just to re-iterate: can install SLES9SP2 ok, but cannot upgrade that install, nor install new with oss_SuSE10, both due to this bug.

Created an attachment (id=12422)
boot loader problem with just one disk

removed sdb (physical volume 136GB disk) and recreated with just sda.

Created an attachment (id=12423)
same problem with only the 136GB physical volume vscsi

This time I removed the lvm 20GB disk and put the physical volume 136GB disk back in. problem still recreates.
tried setting sda3 as default loader loc. doesn't help.
tried removing and adding a new one at sda1. doesn't help.
I'm stuck.

recreated on a 2nd VIOS (hmc-attached), and also on a nonVIO no-hmc (aka genesis) OpenPower 710 (just IPR disks).
Created attachment 50573 sles9-10migrate_bootLoaderBug.GIF
Created attachment 50574 sles9-10migrate_bootLoaderBug2.GIF
Created attachment 50575 oss_SuSE10_bootLoaderProblem-1.jpg
Created attachment 50576 oss_SuSE10_bootLoaderProblem.jpg
Yes, you are right. The URL http://www.opensuse.org/Bugs:most_annoying_bugs should _clearly_ mention that yast ONLY knows about msdos partition tables, and that as a result installation will require many, many manual tweaks. They are mentioned behind the links on this page: http://www.opensuse.org/POWER@SUSE

assigning to webmaster of that site.
changed:

What        |Removed              |Added
----------------------------------------------------------------------------
Owner       |thinh@us.ibm.com     |gjlynx@us.ibm.com
Owning Team |LTC Internal Support |SuSE

------- Additional Comments From thinh@us.ibm.com(prefers email via th2tran@austin.ibm.com) 2005-09-26 11:39 EDT -------
assign to OpenSuse team.
---- Additional Comments From marksmit@us.ibm.com 2005-09-28 15:38 EDT -------
ok, restarted a netboot install and enabled ssh to have a command prompt. then created manual partitions:

inst-sys:~ # fdisk -l

Disk /dev/sda: 146.8 GB, 146814976000 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          14      112423+   c  W95 FAT32 (LBA)
/dev/sda2              15         137      987997+  82  Linux swap / Solaris
/dev/sda3             138        3177    24418800   83  Linux

and then started the install. The previous Sles9 install is erased. (I think that is the problem, but I'm trying to do an OSS10 install to a blank disk). I tried #2, but since there is no o/s that doesn't work.

Start Installation or System
1) Start Installation or Update
2) Boot Installed System
3) Start Rescue System

So I again tried #1.

From command line, pdisk -l says
pdisk: No valid block 1 on '/dev/sda'

So I mounted /boot with /dev/sda1.
created an /etc/lilo.conf per web page suggestion
scp of CD1/suseboot/ initrd64 and linux64.gz to /boot/initrd and /boot/vmlinux

Ran lilo to try to write /dev/sda, but lilo fails and gives a segmentation fault.

Am I supposed to run lilo to make /dev/sda bootable? Or do I need to install Sles9 and then perform an upgrade type install? I don't seem to understand the workaround process for this problem.
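
As a sanity check on the fdisk listing above, the Blocks column can be recomputed from the cylinder ranges and the reported geometry. The sketch below is only an illustration: the helper names part_sectors and report are hypothetical, and the 63-sector offset for the first partition is an assumption (it is the usual one-track gap at the front of an msdos-labelled disk, and it happens to reproduce the listed value).

```shell
#!/bin/sh
# Cross-check the Blocks column of the fdisk listing from its cylinder ranges.
# Geometry from the listing: 255 heads * 63 sectors/track = 16065 sectors per cylinder.
# part_sectors and report are hypothetical helper names used only in this sketch.

SECTORS_PER_CYL=$((255 * 63))

# $1 = start cylinder, $2 = end cylinder (inclusive), $3 = sectors skipped at the front
part_sectors() {
    echo $(( ($2 - $1 + 1) * SECTORS_PER_CYL - $3 ))
}

# fdisk counts 1 KiB blocks (2 sectors each); a trailing "+" marks an odd sector count.
# $1 = device name, $2..$4 = arguments for part_sectors
report() {
    sectors=$(part_sectors "$2" "$3" "$4")
    plus=""
    if [ $((sectors % 2)) -eq 1 ]; then
        plus="+"
    fi
    echo "$1 $((sectors / 2))$plus"
}

report /dev/sda1 1 14 63     # assumption: first partition starts after one 63-sector track
report /dev/sda2 15 137 0
report /dev/sda3 138 3177 0
```

With those inputs the three lines come out as 112423+, 987997+ and 24418800, matching the listing, so the partition table itself looks internally consistent.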
---- Additional Comments From marksmit@us.ibm.com 2005-09-28 15:56 EDT -------
Ok, sorry for the last append. I think I needed to continue trying like this. It seems to be installing now. Please let me know if this is the correct action:

used created partitions, and started yast
choose to update existing system.
choose to show all partitions, so I can see the empty /dev/sda3.
choose to update /dev/sda3.
manually choose update method to base update on installing all packages needed for KDE install (instead of just updating existing install, the default selection, which is empty).
start install.
upon reboot, again choose ssh and then go run lilo on installation.

does this look right?
Mark, the partitioner is supposed to work ok on pseries. Only the bootloader would be an issue on pseries. The instructions are mostly for Macs, have to tweak them a bit this weekend. The link on the most annoying bugs page is still missing.
---- Additional Comments From marksmit@us.ibm.com 2005-11-05 17:12 EDT -------
This recreates on OSS 10.1 Alpha 2, power Lpar served by IVM "alpha" VIO server (Vscsi disk and Veth devices).

I checked http://www.opensuse.org/POWER@SUSE but do not understand how to get the bootloader installed.
Mark, http://www.opensuse.org/PPC:Boot_pseries this explains how to create a lilo.conf after install.
---- Additional Comments From marksmit@us.ibm.com 2005-12-19 22:40 EDT ------- This problem is also blocking ppc64 installs (power5 h/w) on Sles10, preview2. I do not see a way to work around it, nor tell it to continue installing despite no boot loader defined. So I do not understand how to tell the system to do a new install, and fix the boot loader later. Do you wish for this bug to be duplicated to Sles10? Or are you already aware of this problem on Sles10 previews?
---- Additional Comments From marksmit@us.ibm.com 2005-12-21 12:11 EDT -------
documenting helpful hints from Olaf:

to work around this problem & fix it after install finishes,

boot with start_shell option on command prompt (to fix without a rescue boot)

change proposed config -> pick bootloader -> Bootloader Installation -> pick twisty next to default proposed "ppc" and pick other option: "do not install boot loader"

finish install. system drops back to shell.

work around possible problem in shell:
To reset the terminal to a usable size, type:
<RETURN>
stty cols 80 rows 24

assuming the root partition is on sda3:
type 'echo Root: /dev/sda3 > /etc/yast.inf' followed by 'exit' or 'ctrl d'

This will do the very same thing as 'booting into the installed system' from within yast. yast.inf is the "communication channel" between yast and linuxrc.

then "exit" and system will boot and mount Root normally.
(next it may take you to /usr/lib/YaST2/startup/YaST2.ssh first)

At this point, create a suitable /etc/lilo.conf file (sample in link above), and run 'lilo' to install bootloader. it will create a /etc/yaboot.conf for you and seems to change the SMS bootlist to insert /dev/sda as the 1st boot device.
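
The yast.inf step of the workaround above can be sketched as a small shell helper. This is only an illustration under stated assumptions: write_yast_inf and its optional second argument are hypothetical (they are not part of yast or linuxrc) and exist so the step can be tried against a scratch file first; the "Root: <device>" line and the /etc/yast.inf path are taken from the comment.

```shell
#!/bin/sh
# Sketch of the yast.inf step from the workaround.
# write_yast_inf is a hypothetical helper, not part of yast or linuxrc.

# Write the "Root: <device>" line that linuxrc reads from yast.inf.
# $1 = root partition (for example /dev/sda3)
# $2 = file to write (defaults to /etc/yast.inf)
write_yast_inf() {
    root_dev="$1"
    target="${2:-/etc/yast.inf}"
    echo "Root: $root_dev" > "$target"
}

# Dry run against a scratch file, so nothing on the running system is touched.
scratch="$(mktemp)"
write_yast_inf /dev/sda3 "$scratch"
cat "$scratch"
rm -f "$scratch"
```

In the shell that the installer drops back to, the equivalent sequence would be 'stty cols 80 rows 24', then write_yast_inf /dev/sda3 (or the plain echo from the comment), then 'exit'. The root partition is assumed to be sda3 as in the comment; adjust it to the actual layout.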
---- Additional Comments From marksmit@us.ibm.com 2006-01-23 14:43 EDT -------
I was able to install Sles10 preview2 ok using the documented workaround.

On beta1 for Sles10, I attempted a "new install" without workaround. The autopartitioner correctly proposed removing the previous install and proposed creating newer partitions for the install. I chose the defaults and the install got to bootloader installation at the end of the "new installation" where it failed to install the boot loader. So I still seem to have a problem.

To attempt recreate, I removed all installations and attempted an install. In this case autopartitioner refuses to propose a scenario, but when I manually (expert path) propose a partitioning scheme, the install succeeds.

sda1 1 block prep boot
sda2 1GB swap
sda3 12GB reiser for /

Is it expected behavior for the Sles10 autopartitioner to require manual partitioning on blank disks? Or should I open a bug?

I am attempting recreate of the "re-install" where the bootloader install failed. This time it will be from Sles10 beta1 (manual partition) to another "re-install" of the same beta1, using autopartitioner's defaults.
---- Additional Comments From marksmit@us.ibm.com 2006-01-25 02:27 EDT -------
clarified during conference phone call today that autopartitioner should propose a scheme for blank disks. I've created an "install scenario" for testing that and will have the team pursue it.

In the meantime, the Sles10-beta2 patches uploaded to ftp3 server by SuSE, when patched onto beta1 isos, do fix this bug for a "new install". Since that is a different release, should we reject this on the IBM side with "alt_solution_avail" or should we change it to "accepted" so it can then be closed?
works slightly better in 10.1
----- Additional Comments From marksmit@us.ibm.com 2006-05-04 21:43 EDT ------- I am downloading OSS10.1 RC3 to try it. question about deltaiso's: I read here how to do it: http://en.opensuse.org/Download_Instructions#Applying_Delta_ISOs but my install server is Sles9, which does not contain the deltarpm package. Can I instead just mount -o loop <rc1-rc3-delta.iso> and then copy the contents over the existing rc1 content? Or do I need to find a version of deltarpm for Sles9?
Mark, the codebase of SLES10 and 10.1 is identical. If it works with Beta11, it will work with 10.1 rc3 as well
changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |ASSIGNED |FIXEDAWAITINGTEST
Resolution |         |FIX_BY_DISTRO

------- Additional Comments From marksmit@us.ibm.com 2006-10-10 23:11 EDT -------
Appears fixed in OpenSUSE 10.2 (alpha 4) for most ppc64 configs.

I am investigating only one scenario, a VIO served lpar with one lvm vio disk (sda, 11GB) and one entire physical disk (sdb, 72GB, also vio), to see if I can recreate a yaboot.conf problem. Part of that scenario had a Fedora installation existing on sda, and the OpenSUSE install proposed a default scenario (that I accepted) which preserved pieces of the previous install.

However, closing this one as fixed.