|
Bugzilla – Full Text Bug Listing |
| Summary: | No operating system found | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Marco Michna <mmichna> |
| Component: | Basesystem | Assignee: | Jiri Srain <jsrain> |
| Status: | VERIFIED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Blocker | ||
| Priority: | P5 - None | CC: | aj, burnus, duwe, exigentsky, linuxblacksmith, nix, peter |
| Version: | Beta 4 Plus | ||
| Target Milestone: | --- | ||
| Hardware: | i386 | ||
| OS: | SUSE Other | ||
| Whiteboard: | |||
| Found By: | Component Test | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | |||
| Bug Blocks: | 97395 | ||
| Attachments: |
Copy of MBR
YaST2 logs
YaST2 logs
New MBR without bs=446
log from failed install
C program to check and maybe fix the problem
YaST logs; rather large as this is an update with existing logs from the old system
Disk Image
blkid output
fdisk -l output
save_y2logs
/var/log/YaST2 tar ball
dd if=/dev/hda of=mbr bs=512 count=1 |
||
|
Description
Marco Michna
2005-08-09 13:56:39 UTC
This is a different one from #100728, as we have seen. Very strange...

*** Bug 104034 has been marked as a duplicate of this bug. ***

*** Bug 103011 has been marked as a duplicate of this bug. ***

Created attachment 45848 [details]
Copy of MBR
Used # dd if=/dev/hda of=/foo/mbr.iso bs=446 count=1 to capture image

Created attachment 45849 [details]
YaST2 logs
Last 2 posts from Bug 103011. Attached are the YaST2 logs.

Created attachment 45850 [details]
YaST2 logs
Last 2 posts from Bug 103011. Attached are the YaST2 logs.

Good news / bad news:
Using the HD image of the failing machine in qemu works flawlessly. So this is not a problem of the MBR code alone but only with interaction from the BIOS. The exact circumstances remain to be determined.

I am seeing this same issue on a Toshiba Portege laptop that has only one internal hard drive. I have attempted the install several times, and even fdisk'd the drive between two installs, with no change in results. Is there any information I can provide to help?

Generally, it looks like BIOS information is required here. I assume the behaviour on some int 0x13 calls makes a difference. To get a working system meanwhile, Marco "Daemon" Michna has reported that a /boot partition at the beginning of the disk saves the day.

I noticed that the install copies the packages from CD1 to disk, thinks it's done and then reboots. It never asks for disk 2, 3 or 4. I am seeing this problem on a Compaq Evo 500 desktop.

That's a different issue. This bug is about the system failing to reboot. Could the people affected please attach their whole MBR (see the dd command above, but WITHOUT the bs=446; the result must be a 512-byte file), in addition to what the BIOS setup's main page says about the disk's geometry?

(In reply to comment #10)
> I noticed that the install copies the packages from cd1 to disk, thinks its done
> and then reboots. Never does ask for disk 2,3 or 4. I am seeing this problem
> on a compaq evo 500 desktop.

My experience shows that this is correct behavior. If it was able to boot, it would complete the install by running through the rest of the disk, prompting for root's password, user creation, configure H/W, etc.

Created attachment 45942 [details]
New MBR without bs=446
Used # dd if=/dev/hda of=/foo/mbr.iso count=1 to capture image
Output on screen:
Rescue:/ # dd if=/dev/hda of=/foo/mbr1.iso count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.023333 seconds, 21.9 kB/s
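
Why the request insisted on a 512-byte dump: in the classic MBR layout the first 446 bytes hold only boot code, the four 16-byte partition entries follow at offset 446, and the sector ends with the 0x55 0xAA signature, so a `bs=446` capture leaves out exactly the part under suspicion. The following is a minimal sketch of how such a dump could be sanity-checked. The function names `mbr_is_complete` and `mbr_active_partition` are hypothetical and do not come from FixCHS.c; only the standard layout constants are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MBR_SIZE        512  /* one full sector */
#define PARTTAB_OFFSET  446  /* boot code ends here; the partition table starts */
#define PART_ENTRY_SIZE 16   /* four entries of 16 bytes each */
#define SIG_OFFSET      510  /* boot signature bytes 0x55 0xAA */

/* Return 1 if the dump holds a whole MBR sector with a valid signature, else 0.
 * A dump taken with bs=446 fails the length check before any byte is read. */
int mbr_is_complete(const unsigned char *buf, size_t len)
{
    return len >= MBR_SIZE
        && buf[SIG_OFFSET] == 0x55
        && buf[SIG_OFFSET + 1] == 0xAA;
}

/* Return the number (1-4) of the first primary partition flagged active (0x80),
 * or 0 if no entry is active. Expects a full 512-byte sector. */
int mbr_active_partition(const unsigned char *buf)
{
    assert(buf != NULL);
    for (int i = 0; i < 4; i++) {
        const unsigned char *entry = buf + PARTTAB_OFFSET + i * PART_ENTRY_SIZE;
        if (entry[0] == 0x80)
            return i + 1;
    }
    return 0;
}
```

A caller would read the dump file into a buffer, reject it if `mbr_is_complete` returns 0, and otherwise use `mbr_active_partition` to find the entry whose C/H/S bytes are worth inspecting.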
AMIBIOS SETUP - STANDARD CMOS SETUP
Pri Master:
Type - User
Size - 3249Mb
Cyln - 6296
Head - 16
WPcom - 0
Sec - 63
LBA Mode - On
Blk Mode - On
PIO Mode - 4
32Bit Mode - Off

(In reply to comment #7)
> good news / bad news:
>
> Using the HD image of the failing machine in qemu works flawlessly.
> So this is not a problem of the MBR code alone but only with interaction from
> the BIOS. The exact circumstances remain to be determined.

Not sure if this helps any, but SuSE 9.3 Pro does NOT have this problem.

(In reply to comment #9)
> To get a working system meanwhile, Marco "Daemon" Michna has reported a /boot
> partition at the beginning of the disk saves the day.

Just noticed that if I change SWAP to use /dev/hda2 and / to use /dev/hda1, the install is able to finish with no error messages. The install was automatically partitioning swap to be /dev/hda1 and / to be /dev/hda2. I have tested this using both /dev/hdc and /dev/hda.

*** Bug 104566 has been marked as a duplicate of this bug. ***

As per my bug report 104566, here is the recovery procedure:
* Boot from CD1
* Select "Rescue System" from the boot menu
* Pick a keyboard map when it asks
* Login as "root"
* Type "mkdir suntel"
* Type "fdisk -l /dev/hda" (or /dev/sda etc) to list your partition table
* Mount the correct "Linux" root partition, which in my case is /dev/hda3, to the suntel directory with "mount /dev/hda3 suntel"
* type "chroot suntel"
* type "mount /proc"
* type "grub-install /dev/hda" (or /dev/sda or whatever your boot hard disk is)
* type "umount /proc"
* type "exit"
* type "umount suntel"
* eject CD1 from your drive
* type "reboot"
Your system should now boot correctly into Linux and continue with the rest of the installation (CDs 2-4).
Note that this uses the version of grub from your newly updated hard disk to rewrite the MBR, so grub does know how to deal with the hard disks and partition structure... Weird problem.

I had this problem on my IBM R50e notebook.

(In reply to comment #12)
> (In reply to comment #10)
> > I noticed that the install copies the packages from cd1 to disk, thinks its done
> > and then reboots. Never does ask for disk 2,3 or 4. I am seeing this problem
> > on a compaq evo 500 desktop.
>
> My experience shows that is correct behavior. If it was able to boot, it would
> complete the install by running through the rest of the disk, prompting for
> root's password, user creation, configure H/W, etc..

It never asks me for root password, user creation, etc. Just reboots, then No operating system found.

Created attachment 46137 [details]
log from failed install

Thomas, Arvin, the ycp code complains about a wrong list index at line 584:
StorageDevices.ycp:198 invalid index 0 (max -1) in YCPValue YCPListRep::value(int) const

Adrian just experienced the same bug on his laptop.

Sorry, discard comments 20-22, wrong bug :-(

(In reply to comment #16)
> (In reply to comment #9)
>
> > To get a working system meanwhile, Marco "Daemon" Michna has reported a /boot
> > partition at the beginning of the disk saves the day.
>
> Just noticed that if I change SWAP to use /dev/hda2 and / to use /dev/hda1, the
> install is able to finish with no error messages. The install was automatically
> partitioning swap to be /dev/hda1 and / to be /dev/hda2. I have tested this
> using both /dev/hdc and /dev/hda.

I switched my partitioning in this way and it worked for me as well.

The solution seems to be:
1st: check what the BIOS thinks the disk's geometry is.
See the main setup screen or alike, or, once linux is running,
insmod edd and see /sys/firmware/edd/int13_dev80/legacy_*
2nd: make sure the active partition's LBA matches the C/H/S address.
use fdisk "x"pert menu to adjust it to the BIOS' geometry. After
"r"eturn to the main menu, "l"ist will inform you of potential
problems.
This all probably doesn't matter as long as the BIOS' LBA read works
perfectly, but the default case is that the BIOS is broken in one way or the
other. Sometimes LBA read is affected...
Jiri, can we implement the above in any way? Note that only the C/H/S address
of "our" Linux partition needs to be changed to match the LBA, a few bits to
change in the MBR.
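
The adjustment described above amounts to recomputing the three C/H/S bytes of a partition entry from its LBA start under the geometry the BIOS reports. The following is a minimal sketch of that arithmetic, not the code of FixCHS.c: the names `chs_bytes`, `lba_to_chs` and `chs_matches` are hypothetical, and clamping to the last head and sector of cylinder 1023 for addresses beyond the C/H/S range is an assumed (though common) convention.

```c
#include <assert.h>

/* The three C/H/S bytes as stored in an MBR partition entry. */
typedef struct {
    unsigned char head;    /* head number */
    unsigned char sector;  /* bits 0-5: sector (1-based), bits 6-7: cylinder bits 8-9 */
    unsigned char cyl;     /* cylinder bits 0-7 */
} chs_bytes;

/* Convert an LBA to packed C/H/S bytes for a geometry of `heads` heads and
 * `spt` sectors per track. Addresses past cylinder 1023 are clamped
 * (assumption: last head and sector of cylinder 1023). */
chs_bytes lba_to_chs(unsigned long lba, unsigned heads, unsigned spt)
{
    assert(heads >= 1 && heads <= 256 && spt >= 1 && spt <= 63);

    unsigned long c = lba / ((unsigned long)heads * spt);
    unsigned h = (unsigned)((lba / spt) % heads);
    unsigned s = (unsigned)(lba % spt) + 1;   /* sectors count from 1 */

    if (c > 1023) {
        c = 1023;
        h = heads - 1;
        s = spt;
    }

    chs_bytes out;
    out.head = (unsigned char)h;
    out.sector = (unsigned char)(s | ((c >> 2) & 0xC0));
    out.cyl = (unsigned char)(c & 0xFF);
    return out;
}

/* Return 1 if the three on-disk bytes equal the expected triple, else 0. */
int chs_matches(const unsigned char on_disk[3], chs_bytes want)
{
    return on_disk[0] == want.head
        && on_disk[1] == want.sector
        && on_disk[2] == want.cyl;
}
```

As a worked case using numbers from this report: a partition starting on cylinder 34 (counting from 1) under fdisk's 255-head, 63-sector geometry begins at LBA 33 * 16065 = 530145. With 255 heads that LBA packs to `00 01 21`, which matches the on-disk triple FixCHS printed for partition 2. With the 16 heads and 63 sectors shown in the AMIBIOS setup screen, the same LBA packs to `0f 81 0d`, so a BIOS that trusts the C/H/S bytes would read the wrong sector.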

Torsten, I need a non-interactive solution; fdisk is unusable for me. As far as I understand your comment, the solution means modifying the contents of the MBR in some cases, but it is not specified how to do it. Also, we shouldn't modify it if it contains generic code or vendor-specific code. Could you please provide a script which does it all? Or at least one which detects the wrong situation (in which case I would fall back to the boot loader in the MBR)?

What do you use to manipulate the partition table? Just point me at the code and I'll hack something up.

Created attachment 46324 [details]
C program to check and maybe fix the problem
Can everyone experiencing this problem compile and run this test program,
please?
If this program does not report an error on the active partition, your problem lies somewhere else.

(In reply to comment #28)
> Created an attachment (id=46324) [edit]
> C program to check and maybe fix the problem
>
> Can everyone experiencing this problem compile and run this test program,
> please?

Not sure how you want to test this. When I ran this from a server in rescue mode, it produced the following errors.
./FixCHS.c: line 8: unsigned: command not found
./FixCHS.c: line 98: unexpected EOF while looking for matching `''
./FixCHS.c: line 156: syntax error: unexpected end of file

Torsten, how and when am I supposed to run the program? It doesn't seem to write anything to MBR...

Jiri: As I wrote in a private e-mail Wed, 17 Aug 2005 15:49:08 +0200: "The write is still #ifdef'ed out and commented out..." So that nobody gets hurt until it's at least a few times tested. The usage is e.g. "FixCHS /dev/hda 2", assuming that you boot off the primary P-ATA master, second partition. Always give it a whole disk as 1st arg. Optionally give a number 1-4 for a primary part to fix (with the write enabled in the source, of course ;-) Otherwise it will only report.
"Linux": the magic word is "... compile ...". I used "gcc -m32 -O2 -Wall -o FixCHS FixCHS.c" to achieve this. You'll need to do this from an installed system, as the rescue system most likely has no compiler.

OK, so am I supposed not to change anything now? How do you want it to be tested? Otherwise, I need to know when you want me to run it (before GRUB is installed, after GRUB is installed, doesn't matter,...)

Take out the #ifdef notyet and the "/* */" comment chars. Compile a 32-bit binary (see the gcc command above). Get it into the installation environment. Run it any time before the reboot with the int13-0x80 disk as first arg, and the number of the Linux partition you installed grub on as the second, IFF you installed a generic MBR and activated the former partition.
Running the program shouldn't hurt, unless there's some weird other MBR code that has different requirements (The one Windoze installs works the same as the one we use!)

Ok, sorry about not seeing the compile piece. :) Compiled FixCHS.c as mentioned in comment #32 (used "gcc -m32 -O2 -Wall -o FixCHS FixCHS.c") on another machine running SUSE10. I copied FixCHS to a floppy. I reinstalled and stopped the first reboot. Mounted the floppy and ran the program.
/media/floppy # ./FixCHS /dev/hdc 2
Partition 2 mismatch [ 00 01 21 ] [5f 01 41 ] (Fixed)
/media/floppy # fdisk -l

Disk /dev/hdc: 3249 MB, 3249340416 bytes
255 heads, 63 sectors/track, 395 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdc1               1          33      265041   82  Linux swap / Solaris
/dev/hdc2   *          34         395     2907765   83  Linux

Rebooted and still get "No Operating System" message. Is this how you wanted it tested?

Yes. Thanks. The writing code is disabled, for safety reasons, so the "Fixed" is a fake; it doesn't change anything. The next beta will run this program in "armed" mode. BTW, isn't this system booting off /dev/hda?

If you're daring you can edit FixCHS.c according to comment #34's first line, to arm the program.

(In reply to comment #36)
>
> BTW, isn't this system booting off /dev/hda?
>

I have 2 PCs that are identical except one has the IDE drive as master on secondary (/dev/hdc), and the other has the IDE drive as master on primary (/dev/hda). On both boxes I get the same results. BTW, since these are test boxes, I am willing to try whatever I can to help out.

(In reply to comment #37)
> If you're daring you can edit FixCHS.c according to comment #34's first line,
> to arm the program.

Sorry, but I guess I am missing something. I tried comment #34's "Take out the #ifdef notyet and the "/* */" comment chars.", but I get the following error.
Modified:
#ifdef notyet
/* write(fd, buf, 512); */
#endif
To:
write(fd, buf, 512);
#endif
Results:
dellc600:~/temp # gcc -m32 -O2 -Wall -o FixCHS FixCHS.c
FixCHS.c:145:2: error: #endif without #if

If I remove the lines "#ifdef notyet" and "#endif", and change "/* write(fd, buf, 512); */" to "write(fd, buf, 512);", it compiles without any errors. Now when I run it, I still get the fixed message. But when I reboot, I get the following error on both the hdc and hda PCs:
GRUB Loading stage1.5.
GRUB loading, please wait...
Error 17
and a blinking cursor.
Did I modify FixCHS.c correctly?

Yes, perfectly. You're now 2 steps further into the boot process. Does the partition in question still contain a valid file system? Anyway, for you the original problem is solved; a new install (be it the whole installation or just the boot loader) should fail no more.

YaST now calls the armed version of the script while installing if
- the generic code is being written to MBR and
- a partition is activated
This should solve the problem. Done in SVN, will go to Beta3.

Stop the Press!
This is crucial:
@@ -66,6 +66,8 @@
check_part(int n, int do_fix)
{
int c, h, s;
+ unsigned LBA;
+
part_entry * parttab = (part_entry *)(buf+PARTTAB_OFFSET);
n--;
@@ -75,7 +77,9 @@
if(parttab[n].sysId == 0) /* unused partition */
return 0;
- set_hsc(h,s,c,parttab[n].lstart);
+ LBA = parttab[n].lstart;
+
+ set_hsc(h,s,c,LBA);
I'll mail you the current source.

(In reply to comment #40)
> Yes, perfectly. You're now 2 steps further into the boot process.
> Does the partition in question still contain a valid file system?
>

What is the best way to test if you have a valid file system? I tried to reinstall, and right after agreeing to the license agreement I get a popup window with the following text.
The partitioning on disk /dev/hda is not readable by partitioning tool parted, which is used to change the partition table. You can use the partitions on disk /dev/hda as they are. You can format them and assign mount points to them, but you cannot add, edit, resize or remove partitions from that disk with this tool.
And I have an OK button. On my /dev/hdc PC, I had to use fdisk to delete everything to get it back to a working state. After fdisking the drive I can then change the partitions in YaST.

The pop-up is because of the serious bug in my code that the patch in comment #42 (sic!) fixes. And this is also why I called it "daring" in comment #37. It should help to delete and re-create the partition in question, preferably with a tool that respects the BIOS' c/h/s mapping.

No problem... Let me know if you would like me to try any additional tests or patches.

Please note that modifying the grub setup in yast from a fully working SUSE 10 Beta 1 system also leaves my machine unable to boot with the same problem. The procedure in Comment #18 once again fixes the problem. Why is yast not correctly running grub-install /hard/disk ???

This FixCHS is now part of yast2-bootloader. To compile it, a 32-bit devel environment is required. This moves yast2-bootloader past baselibs-32bit and extends the rebuild cycle unnecessarily. Please move that simple C file to some other package; I'm sure it will never change. Remind me why it has to be 32-bit anyway? I see unsigned int and unsigned char there.

Hmm, any suggestion to which package?
I could put it into master-boot-code, but this one is shipped with the BSD license (and FixMBR contains a piece of fdisk code, according to comments). Torsten, would it be possible to put it into the GRUB package? Well, it doesn't make sense either, as it is needed for LILO as well, but creating a new package is not possible (according to schedule). And no, I don't know why it has to be 32-bit, but as it fiddles with the MBR code, I guess there is a reason. Torsten, any idea?

master-boot-code is perfect. The snippet from fdisk is buggy anyway, a rewrite more than welcome (I'll do it myself, no prob). This would ease the merge with #105828 a lot! Updated code with sane copyright is in your inbox, Jiri. Please put it into master-boot-code!

Updated master-boot-code package has been checked in already; yast2-bootloader without FixCHS is in SVN and will be submitted after beta3 is out.

*** Bug 106593 has been marked as a duplicate of this bug. ***

Is this one supposed to be fixed in beta3 or AFTER beta3? I've just encountered the problem with an SL 9.2.9 late beta to SL 10.0 beta3 update. "No operating system" after the reboot.

Making this a blocker in case it is supposed to be fixed for beta3. Should I attach any logs?

Yes, attach the logs, please.

At least the output of
dd if=/dev/hda of=disk.img bs=1024 count=1
fdisk -l
blkid

I'm seeing this with beta2 and beta3 on an IBM ThinkPad T42p. I believe beta1 installed without problems.

I was just about to report the same problem. The trouble is with my Toshiba Tecra A4.

Can someone please provide the information requested in comments #54 and #55 (from Beta3, or Beta4 once it is out)? I cannot reproduce it, and without logs I cannot find what's wrong or fix it.

I'll post my YaST log folder and the info from #55 in a minute.

Created attachment 48344 [details]
YaST logs; rather large as this is an update with existing logs from the old system
Created attachment 48345 [details]
Disk Image
Created attachment 48346 [details]
blkid output
Created attachment 48347 [details]
fdisk -l output

*** Bug 114429 has been marked as a duplicate of this bug. ***

*** Bug 103761 has been marked as a duplicate of this bug. ***

Actually there is no info missing in this bug. Fixed packages have been submitted for RC1. Let's leave this bug open until we have tested the fix.

OK, as the fixed package has been submitted, I'm marking this bug as resolved/fixed. Please reopen (and attach new logs) if you still have problems.

Still happens with beta4plus. What logs do you want / need? This was a fresh installation, /dev/hda1 is swap, /dev/hda2 is /root, no other partitions, no Windows on the machine.

If you already rebooted, then tar /var/log/YaST2. If you are before reboot, tar both /var/log/YaST2 and /mnt/var/log/YaST2.

Created attachment 48815 [details]
save_y2logs
Of course I rebooted, otherwise we would not have seen "no operating system" ;-)
Logs salvaged with rescue system.

Hmm, this is the result of the policy "Do not touch MBR if there is generic code or unknown code". In this case, Torsten's script detected Generic MBR, thus MBR wasn't touched. So, Torsten, what to do? Run FixMBR even if code in MBR wasn't touched? Is it safe? Doesn't it make things even worse?

Nope, FixCHS on _our_ partitions is always a good idea (all of them :-). fdisk is broken (admitted in its manpage), and for parted I don't know. With LBA-enforcing boot mechanisms this isn't so much of a problem, but we have to deal with a lot of legacy stuff in this area. I would only recommend against running FixCHS for partition types of ancient OSes that might rely on a particular C/H/S; OTOH, these will ensure for themselves that the mapping is correct, and FixCHS shouldn't be needed on these anyway. BTW, it's a good idea not to waste the first cylinders in a primary partition as swap...

That's the YaST recommended setup. Our new trainee installed the system; I'm pretty sure he did nothing special.

I was under the impression that it was a good idea to use the first cylinders in a partition for swap as it was faster...

Indeed (assuming you meant to say "disk"). But as we race drivers say: "To finish first, you have to finish first." A system that doesn't boot isn't fast ;-) And once you start to swap, you're _a_lot_ slower than with a little more memory.

Yes. I meant disk. I also agree that a system that doesn't boot isn't fast; however, Linux (LILO and GRUB) has supported the "swap as first partition" configuration since at least 1998. I don't see why this should change now... RAM is also cheap, and preferred of course; however, in the real world most systems at some time in their life have to rely on swap. Why would you make it less efficient than necessary?

Torsten, I now (just committed to SVN) run fix_chs on each partition which is being activated, regardless of whether the MBR itself is changed. Hope it is what you wanted (if not, please reopen ASAP). Resolving as FIXED.

Does still happen with RC1 here.

And RC1 logs? I need to know whether fix_chs was run. Also attach the resulting MBR, please.

Created attachment 49022 [details]
/var/log/YaST2 tar ball
Created attachment 49023 [details]
dd if=/dev/hda of=mbr bs=512 count=1
The system is still in this state, if you need further information or tests.

Thanks. Torsten, how can I fix this?
Running command /usr/sbin/fix_chs /dev/hda 2
Command output:
$["exit":1, "stderr":"fix_chs: edd module loaded?\n chdir to /sys/firmware/edd/int13_dev80: No such file or directory\n", "stdout":""]
(hope it is readable for you)
Is "modprobe edd" sufficient? BTW: Is it important whether the script is run before or after the code in the MBR is eventually replaced?

First: _*This*_ bug _is_ fixed. For further proceeding: Jiri, yes, modprobe edd does the trick. This module asks the BIOS what _it_ thinks the geometry is, the only relevant geo for booting :-/
AJ, Adrian: your prob has a 50% chance to be a duplicate of Bug #115330. I'll now check with Seife whether it is.

RC1 installed without (unexpected) manual intervention. Before, I had to create and install the grub menu manually; this time it was just there :)) I would propose to mark the bug as verified. |
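
The fix_chs failure quoted above comes down to the BIOS geometry not being readable until the edd module is loaded. The following is a minimal sketch of how a tool might read that geometry from the `legacy_*` attributes under /sys/firmware/edd/int13_dev80. It is not the code of fix_chs: the names `bios_geometry`, `read_uint_file` and `read_bios_geometry` are hypothetical, and it assumes the attributes `legacy_max_cylinder`, `legacy_max_head` and `legacy_sectors_per_track` hold decimal numbers, with the first two being zero-based maxima.

```c
#include <assert.h>
#include <stdio.h>

/* Geometry as the BIOS reports it for one int 13h device. */
typedef struct {
    unsigned cylinders;
    unsigned heads;
    unsigned sectors;   /* sectors per track */
} bios_geometry;

/* Read one unsigned decimal number from a text file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_uint_file(const char *path, unsigned *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%u", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Fill *g from the legacy_* files in edd_dir, for example
 * "/sys/firmware/edd/int13_dev80". Returns -1 if any file cannot be read,
 * which is the situation when the edd module is not loaded. */
int read_bios_geometry(const char *edd_dir, bios_geometry *g)
{
    assert(edd_dir != NULL && g != NULL);

    char path[512];
    unsigned max_cyl, max_head, spt;

    snprintf(path, sizeof path, "%s/legacy_max_cylinder", edd_dir);
    if (read_uint_file(path, &max_cyl) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/legacy_max_head", edd_dir);
    if (read_uint_file(path, &max_head) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/legacy_sectors_per_track", edd_dir);
    if (read_uint_file(path, &spt) != 0)
        return -1;

    g->cylinders = max_cyl + 1;  /* maxima are zero-based (assumption) */
    g->heads = max_head + 1;
    g->sectors = spt;
    return 0;
}
```

Returning an error instead of guessing a geometry keeps the caller from rewriting C/H/S bytes with values the BIOS never reported; the caller can then load the module and retry.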