Bugzilla – Bug 115372
INIT: cannot execute /sbin/mingetty after installation
Last modified: 2005-09-09 15:24:36 UTC
After installing Beta 4 Plus (I went home during installation) the system prints out lots of INIT: cannot execute "/sbin/mingetty" and INIT: Id "1" respawning too fast: disabled for 5 minutes (the same for Id 2-6). On Ctrl-Alt-Delete it only prints INIT: cannot execute "/sbin/shutdown". This also happens after a reboot. The last thing that actually seemed to work is configuring the serial ports. After that, the system cannot exec /bin/rm or /bin/mount due to "Permission denied". This also happens in Failsafe startup. There are no obvious kernel messages. If the System is booted into another os, I can successfully chroot into the new installation root. The machine is an IBM T42p, currently booted into 9.3, name is g147. New installation root can be found at /root2.
Which filesystem do you use?
Forgot to mention that: Reiser. It's a pretty standard installation.
You might validate the md5sums of your downloaded ISO image (file MD5SUMS on the mirror); maybe this package was broken and the installation failed that way.
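For an ISO-based install, the suggested checksum validation could look roughly like the sketch below. The helper name verify_iso is made up for illustration, and the sketch assumes the MD5SUMS file uses the usual md5sum layout ("<md5>  <filename>", file names without spaces).

```shell
#!/bin/sh
# verify_iso: check one downloaded file against an MD5SUMS-style list.
# Usage: verify_iso <sums-file> <file>
# Hypothetical helper; expects lines of the form "<md5>  <filename>".
verify_iso() {
    sums_file=$1
    target=$2
    name=$(basename "$target")
    # Look up the expected checksum for this file name
    # (a leading "*" marks binary mode in md5sum output and is ignored).
    expected=$(awk -v n="$name" '{ f = $2; sub(/^\*/, "", f); if (f == n) { print $1; exit } }' "$sums_file")
    if [ -z "$expected" ]; then
        echo "no entry for $name" >&2
        return 2
    fi
    # Compute the checksum of the local copy and compare.
    actual=$(md5sum "$target" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "$name: OK"
    else
        echo "$name: MISMATCH" >&2
        return 1
    fi
}
```

When the image sits next to the MD5SUMS file, plain "md5sum -c MD5SUMS" from that directory does the same job for all listed files at once.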
I installed via SLP.
I can try another installation, but I didn't want to change anything so that someone interested in a forensic analysis still gets the laptop smoking hot. It is sitting here in 3.2.6, if someone is interested in taking a look at it. I don't know where to look, I've never seen a similar problem. Please tell me what I should do.
OK, Matthias. Can you mount the hdd from somewhere else (rescue system maybe) and provide, let's say, 500 lines of /var/log/messages? This would help I guess.
(comment #0)
> If the System is booted into another os, I can successfully chroot into the new
> installation root. The machine is an IBM T42p, currently booted into 9.3, name
> is g147. New installation root can be found at /root2.
Feel free to log in and analyze. /var/log/messages is empty, so is /var/boot.log. I think the harddisk hasn't been mounted read-write prior to failure (/bin/mount failed: Permission denied). SysRQ didn't work either. Strangely enough I got some debug output (apparently the currently running process, pid 0) when I pressed Alt+ScrollLock by accident.
This does not need to be a kernel problem, maybe the dynamic loader is broken or the glibc. The very same kernel is being used during installation, so it obviously is able to execute binaries. Since the system was unattended during install, we do not know what exactly started the problems. It's just that every mysterious bug is declared a kernel bug eventually... Please provide steps on how to reproduce the problem, else I will close this one as WORKSFORME.
Actually this is pretty definitely not a kernel problem, as copying an old kernel (+initrd +System.map +modules) from SL9.3 didn't behave vastly differently. Changing component as soon as we have a culprit. I also thought about the dynamic loader. glibc would be another possibility. I'll try copying the corresponding files from 9.3 (will produce other side effects, I know, but might help to narrow this one down). First I'll try installing 10.0b4+ on another partition, this time using ext3, just to make sure. Wait for results here.
Ok, installation on another partition using ext3 worked out-of-the-box. Will do a filesystem diff now.
Did you also try re-installing it with reiserfs? Might not necessarily be a problem with reiser...
Created attachment 48954 [details] Filesystem differences
Created attachment 48955 [details] Reduced filesystem diff I removed all differences that are obviously not the source of the problem (i.e. all /opt references, /proc, config files of higher level system services etc.)
Ok, as the initrd was different (and the output suggests that something already fails in initrd) I tried just copying the initrd from the ext3 installation. Kernel panic at end of booting from initrd as anticipated, the messages look exactly the same as with the installed version. As /boot/grub/stage2 was different, I also copied the working version over the other, and also did a mkinitrd in the chroot environment. Not that this helped anything. I don't see any additional differences that could influence booting, I'm pretty much lost now. If nobody has any additional ideas, I'll try an installation with reiser again. Just to make sure.
I do get udev[xxx]: run_program: exec of program '/sbin/udev.mount.sh' failed during booting of the working installation as well. So the initrd failures do not seem to influence behavior, it must be something of the base system. Again, machine is g147, it is up and running SL10.0b4+, non-working installation in /root2.
Did you fsck the reiser partition? Please also try to connect the machine to a serial console and capture all boot messages.
fsck told me this is a perfectly healthy reiserfs. I also just installed once more on a different partition using reiserfs, no problems this time. Where do I get a serial crossover cable?
Actually this won't work. The laptop doesn't have a serial console.
fsck will not do a very thorough check by default. Did you try reiserfsck --check? BTW g147 isn't reachable right now.
Was just configuring after installation. Works again. Doing the reiserfsck in a screen now. Feel free to attach. No corruptions found.
So if nobody is interested in a post-mortem analysis, I will try to install on /dev/hda7 again tomorrow, nuking the broken partition. I have to do some more serious work on this laptop. In any case, I don't know what went wrong with the installation, I cannot find any serious differences between the different file systems, so I have no clue where to continue searching, but given the circumstances I'm pretty frightened that this could happen for customers as well.
*** Bug 115247 has been marked as a duplicate of this bug. ***
Was the broken install an upgrade or a fresh install?
Fresh install. The log of bug 115247 looks completely different from this one. I don't think it is about the same bug.
Matthias, can you please save the partition to some other machine so we can loopback mount it? I am unable to look into this today. Chris, this could be a file system problem - do you mind having a look?
Ok, I copied the system to 'delen', /dev/sda7. The partition is actually much larger than the filesystem. I've set up a serial link from delen to 'ivanova', the serial console is running in a root screen there. Haven't configured grub to use serial, will do soon. But the system stops booting even earlier than before (maybe a grub misconfiguration - will check that). Actually it tries to exec /sbin/udev.input_device.sh, which needs /bin/bash, which is not there in the initrd. Then it just exits to /bin/sh. I'll try building a new initrd in a chroot environment for this system.
Ok, added ata_piixi, and now the system behaves exactly like on the laptop. So at least we have a completely reproducible and apparently hardware- and kernel-independent problem. Grub is also set up for serial console, baud rate is 38400. A root screen with the console is running, attach with screen -x on ivanova. The only problem is that SYSRQ (i.e. sending a serial break) doesn't work any more once the system has crashed. Hell, even the power button doesn't do any good, the only possibility to reboot the system is to unplug the power cord of the live system! W/o ACPI (failsafe settings) I can switch off the system with the power button, but SYSRQ doesn't work, and the boot messages change a bit. Though the problem seems to remain the same. Attaching boot log files now.
Created attachment 49201 [details] Boot log with Failsafe settings.
Ok, the partition with the failing filesystem is /dev/sda7 on delen, right? Could you please boot delen off a different disk so that I can look at /dev/sda7 with a running kernel?
Created attachment 49206 [details] Boot log with standard settings. The major difference at the end of the log is just due to the different runlevel selection.
Ok, the system is now booted on SL10.0b2, the partition-under-test is mounted on /os3. Feel free to login and reboot as you like. You will have to call me in order to reset the machine. I'll reinstall the laptop now.
Great, this made things much easier. The filesystem is not corrupt, but udev is doing something strange. Hannes has an idea so I've cc'd him.
Oh well. mkinitrd refuses to include any udev script which defaults to /bin/bash, as bash might not be available in the initrd. Additionally the corresponding udev rules should be commented out in the initrd, but somehow udev still tries to execute these scripts. Pls try the following: unpack the initrd
  mkdir /tmp/initrd-tmp; cd /tmp/initrd-tmp; zcat /boot/initrd | cpio -idumv
and attach the contents of etc/udev/rules.d/50-udev.rules. To get the real problem solved: please create a dummy /sbin/udev.mount.sh program, e.g.
  #!/bin/sh
  :
then re-run mkinitrd and reboot. The message should then vanish and the real problem can be attacked.
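Once the initrd is unpacked, a quick way to see which helper scripts cannot run inside it is to compare each script's "#!" line against what the tree actually contains. The following is only a sketch: missing_interpreters is a made-up helper name, it only looks at <root>/sbin, and an absolute symlink inside the tree would be resolved against the host rather than the initrd.

```shell
#!/bin/sh
# missing_interpreters: list scripts under <root>/sbin whose "#!"
# interpreter does not exist inside <root> (an unpacked initrd tree).
# Hypothetical helper, not part of mkinitrd or udev.
missing_interpreters() {
    root=$1
    for script in "$root"/sbin/*; do
        [ -f "$script" ] || continue
        # Read only the first line; skip files that are not "#!" scripts.
        first=$(head -n 1 "$script" 2>/dev/null)
        case $first in
            '#!'*) ;;
            *) continue ;;
        esac
        # Strip "#!", optional spaces, and any interpreter arguments.
        interp=$(printf '%s\n' "$first" | sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
        # Report the script if its interpreter is absent from the tree.
        [ -e "$root$interp" ] || echo "${script#"$root"}: needs $interp"
    done
}
```

Run as "missing_interpreters /tmp/initrd-tmp" after the cpio step; a /bin/bash script in an initrd that ships only /bin/sh would be listed.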
Created attachment 49222 [details] etc/udev/rules.d/50-udev.rules from initrd
During mkinitrd: Shared libs: lddlibc4: cannot read header from '/sbin/udev.mount.sh': No such file or directory. Trying a reboot anyway.
Created attachment 49226 [details] Boot log with patched initrd
From the kernel side, the filesystem isn't corrupt. This must be an issue with the initramfs. Hannes, any other ideas?
On IRC Hannes points out they have made some changes in this area for RC1. If a new install works now I think we can close this.
You could try booting with init=/bin/bash. If that works I doubt it's a mkinitrd problem.
As I cannot reproduce it even with beta4,
Ok, I have an even simpler way to reproduce now; booting with init=/bin/bash:
  ls /                    # works
  mount -o remount,ro /
  ls /                    # : Permission denied
echo /* indicates that the filesystem is still there. Alternatively, a 'mount -o remount,rw /' triggers the bug as well (note that the filesystem was already mounted r/w). I am still able to write something to the filesystem with 'echo /* >/tmp/blub'. But no executables are executed any more. I even tried executing a binary from another partition that had been mounted before, with the same error message. The system is currently in this state, console in root screen on ivanova as usual.
First sentence was interrupted; I have only seen this issue once, but I'm afraid that this could happen for customers as well. As long as we don't even know what's going on here, we cannot simply close it.
Checked with latest kernel from stable, same effect.
The console right now seems frozen. Could you please boot with init=/bin/bash again?
Was just doing another test. I created a new reiser on /dev/sda5 and copied all files with 'cp -a' to the new filesystem. I have just booted the new system, and it behaves exactly the same. I can retry with ext3 as well, but I think this pretty much rules out a broken filesystem. I also get a 'Badness in pci_get_subsys at drivers/pci/search.c:234' plus a stack trace (in screen history log) when rebooting with SysRQ-U SysRQ-B. I'm now rebooting into /dev/sda7. Reducing severity, as it only occurred once so far.
ok, this is very strange indeed. I'm booted on /dev/sda6 (the disk that works). /dev/sda7 (the bad partition) is mounted on /os3.
  chroot /os3
  ls                        # works
  mount                     # works
  mount -o rw,remount /     # works
  ls                        # permission denied
  exit
  mount -o rw,remount /os3  # works
  ls /os3                   # works
Some part of the install on /os3 is corrupted.
Created attachment 49421 [details] trace output of mount
mount apparently specifies the wrong flags (e.g. noexec) for remounting. Don't know yet where the flags come from. BTW the trace shows 'hda7' which is a leftover from the laptop, the partition is now sda7. Checking that.
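To confirm from inside the affected shell that the remount really left the filesystem noexec, the kernel's own view in /proc/mounts can be read. Since no external binary can be executed in that state, the sketch below sticks to shell builtins. The function names mount_opts and has_noexec are made up for illustration; the optional second argument (an alternative table file) exists only to make the helper easy to try out.

```shell
#!/bin/sh
# mount_opts: print the option string for <mountpoint> from a
# /proc/mounts-style table (fields: device mountpoint fstype options ...).
# The last matching line wins, since later mounts shadow earlier ones.
# Uses only shell builtins, so it still works when exec() is denied.
mount_opts() {
    mp=$1
    table=${2:-/proc/mounts}
    opts=
    while read -r dev dir fstype options rest; do
        if [ "$dir" = "$mp" ]; then opts=$options; fi
    done < "$table"
    if [ -n "$opts" ]; then printf '%s\n' "$opts"; else return 1; fi
}

# has_noexec: succeed if <mountpoint> is currently mounted noexec.
has_noexec() {
    case ",$(mount_opts "$@")," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, 'has_noexec / && echo "root is noexec"' directly after the remount would show whether the flag was applied.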
/etc/fstab on that filesystem had 'user' in the root filesystem's mount options, so the first remount cleared the exec flag and other mount flags ('user' implies noexec, nosuid and nodev unless overridden by later options). We saw the results in various settings above. Matthias is not sure why the flag was in the fstab; maybe this happened by accident during the installation. As the bug only triggered once so far, this is the most likely explanation.
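A check for this condition on other installations could be sketched as follows. It is only an illustration: root_noexec_in_fstab is a made-up name, and the option handling covers just the cases relevant here ('user', 'users' and 'noexec' turn execution off, a later 'exec' turns it back on, as described in mount(8)).

```shell
#!/bin/sh
# root_noexec_in_fstab: succeed if the "/" entry of an fstab-style file
# carries options that result in noexec. Per mount(8), "user" and "users"
# imply noexec,nosuid,nodev unless a later option overrides them.
# Hypothetical helper; uses shell builtins only.
root_noexec_in_fstab() {
    fstab=${1:-/etc/fstab}
    result=1
    while read -r dev dir fstype options rest; do
        # Skip comments and blank lines.
        case $dev in '#'*|'') continue ;; esac
        if [ "$dir" != "/" ]; then continue; fi
        noexec=0
        # Walk the comma-separated options left to right; later ones win.
        old_ifs=$IFS
        IFS=,
        for opt in $options; do
            case $opt in
                user|users|noexec) noexec=1 ;;
                exec) noexec=0 ;;
            esac
        done
        IFS=$old_ifs
        if [ "$noexec" -eq 1 ]; then result=0; fi
    done < "$fstab"
    return "$result"
}
```

Called as 'root_noexec_in_fstab /os3/etc/fstab && echo "root entry would end up noexec"', it would have flagged the broken installation from the working system. Options such as user_xattr are not affected, because only exact option names are matched.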