Bug 115372 - INIT: cannot execute /sbin/mingetty after installation
Summary: INIT: cannot execute /sbin/mingetty after installation
Status: RESOLVED FIXED
Duplicates: 115247
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 4 Plus
Hardware: i386 All
Priority/Severity: P5 - None : Critical
Target Milestone: ---
Assignee: Chris L Mason
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-06 09:29 UTC by Matthias Hopf
Modified: 2005-09-09 15:24 UTC

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Filesystem differences (44.57 KB, application/x-gzip)
2005-09-06 16:22 UTC, Matthias Hopf
Reduced filesystem diff (3.92 KB, text/plain)
2005-09-06 16:23 UTC, Matthias Hopf
Boot log with Failsafe settings. (17.05 KB, text/plain)
2005-09-08 12:51 UTC, Matthias Hopf
Boot log with standard settings. (26.92 KB, text/plain)
2005-09-08 13:03 UTC, Matthias Hopf
etc/udev/rules.d/50-udev.rules from initrd (11.26 KB, text/plain)
2005-09-08 14:47 UTC, Matthias Hopf
Boot log with patched initrd (16.11 KB, text/plain)
2005-09-08 14:54 UTC, Matthias Hopf
trace output of mount (6.25 KB, text/plain)
2005-09-09 15:09 UTC, Matthias Hopf

Description Matthias Hopf 2005-09-06 09:29:39 UTC
After installing Beta 4 Plus (I went home during installation) the system prints
out lots of

INIT: cannot execute "/sbin/mingetty"

and

INIT: Id "1" respawning too fast: disabled for 5 minutes
(the same for Id 2-6).

On Ctrl-Alt-Delete it only prints
INIT: cannot execute "/sbin/shutdown"

This also happens after a reboot. The last thing that actually seemed to work is
configuring the serial ports. After that, the system cannot exec /bin/rm or
/bin/mount due to "Permission denied". This also happens in Failsafe startup.
There are no obvious kernel messages.

If the system is booted into another OS, I can successfully chroot into the new
installation root. The machine is an IBM T42p, currently booted into 9.3, name
is g147. New installation root can be found at /root2.
Comment 1 Dr. Werner Fink 2005-09-06 09:43:25 UTC
Which filesystem do you use?
Comment 2 Matthias Hopf 2005-09-06 09:51:07 UTC
Forgot to mention that: Reiser
It's a pretty standard installation.
Comment 3 Michael Gross 2005-09-06 11:53:47 UTC
You might validate the md5sums of your downloaded ISO image (file MD5SUMS on
the mirror); maybe a package was broken and the installation failed that way.
Comment 4 Matthias Hopf 2005-09-06 12:33:27 UTC
I installed via SLP.
Comment 5 Matthias Hopf 2005-09-06 12:38:37 UTC
I can try another installation, but I didn't want to change anything, so that
anyone interested in a forensic analysis still gets the laptop smoking hot.

It is sitting here in 3.2.6, if someone is interested in taking a look at it. I
don't know where to look; I've never seen a similar problem.

Please tell me what I should do.
Comment 6 Michael Gross 2005-09-06 13:00:30 UTC
OK, Matthias.
Can you mount the hard disk from somewhere else (a rescue system, maybe) and provide,
let's say, 500 lines of /var/log/messages? This would help, I guess.
Comment 7 Matthias Hopf 2005-09-06 13:06:45 UTC
(comment #0)
> If the System is booted into another os, I can successfully chroot into the new
> installation root. The machine is an IBM T42p, currently booted into 9.3, name
> is g147. New installation root can be found at /root2.

Feel free to log in and analyze.
/var/log/messages is empty, so is /var/boot.log. I think the harddisk hasn't
been mounted read-write prior to failure (/bin/mount failed: Permission denied).
SysRQ didn't work either. Strangely enough I got some debug output (apparently
the currently running process, pid 0) when I pressed Alt+ScrollLock by accident.
Comment 9 Hubert Mantel 2005-09-06 13:29:25 UTC
This does not need to be a kernel problem, maybe the dynamic loader is broken or
the glibc. The very same kernel is being used during installation, so it
obviously is able to execute binaries. Since the system was unattended during
install, we do not know what exactly started the problems. It's just that every
mysterious bug is declared a kernel bug eventually...
Please provide steps on how to reproduce the problem, else I will close this one
as WORKSFORME.
Comment 10 Matthias Hopf 2005-09-06 13:45:38 UTC
Actually this is pretty definitely not a kernel problem, as booting with an old
kernel (+initrd +System.map +modules) copied from SL9.3 didn't behave much differently.
Changing component as soon as we have a culprit.

I also thought about the dynamic loader. glibc would be another possibility.
I'll try copying the corresponding files from 9.3 (will produce other side effects,
I know, but might help to narrow this one down).

First I'll try installing 10.0b4+ on another partition, this time using ext3,
just to make sure. Wait for results here.
Comment 11 Matthias Hopf 2005-09-06 14:41:34 UTC
Ok, installation on another partition using ext3 worked out-of-the-box.

Will do a filesystem diff now.
Comment 12 Michael Gross 2005-09-06 14:52:11 UTC
Did you also try re-installing it with reiserfs?
Might not necessarily be a problem with reiser...
Comment 13 Matthias Hopf 2005-09-06 16:22:02 UTC
Created attachment 48954 [details]
Filesystem differences
Comment 14 Matthias Hopf 2005-09-06 16:23:33 UTC
Created attachment 48955 [details]
Reduced filesystem diff

I removed all differences that are obviously not the source of the problem
(i.e. all /opt references, /proc, config files of higher level system services
etc.)
Comment 15 Matthias Hopf 2005-09-06 16:38:08 UTC
Ok, as the initrd was different (and the output suggests that something already
fails in initrd) I tried just copying the initrd from the ext3 installation.
Kernel panic at the end of booting from the initrd, as anticipated; the messages look
exactly the same as with the installed version.

As /boot/grub/stage2 was different, I also copied the working version over the
other, and also did a mkinitrd in the chroot environment. Not that this helped
anything.

I don't see any additional differences that could influence booting, I'm pretty
much lost now. If nobody has any additional ideas, I'll try an installation with
reiser again. Just to make sure.
Comment 16 Matthias Hopf 2005-09-06 16:42:44 UTC
I do get

udev[xxx]: run_program: exec of program '/sbin/udev.mount.sh' failed

during booting of the working installation as well. So the initrd failures do
not seem to influence behavior, it must be something of the base system.

Again, machine is g147, it is up and running SL10.0b4+, non-working installation
in /root2.
Comment 17 Olaf Kirch 2005-09-07 10:08:14 UTC
Did you fsck the reiser partition? Please also try to connect the machine 
to a serial console and capture all boot messages. 
Comment 18 Matthias Hopf 2005-09-07 10:26:15 UTC
fsck told me this is a perfectly healthy reiserfs.
I also just installed once more on a different partition using reiserfs, no
problems this time.

Where do I get a serial crossover cable?
Comment 19 Matthias Hopf 2005-09-07 10:29:13 UTC
Actually this won't work. The laptop doesn't have a serial console.
Comment 20 Olaf Kirch 2005-09-07 10:39:16 UTC
fsck will not do a very thorough check by default. Did you try   
reiserfsck --check?  
 
BTW g147 isn't reachable right now. 
Comment 21 Matthias Hopf 2005-09-07 12:12:13 UTC
Was just configuring after installation. Works again.

Doing the reiserfsck in a screen now. Feel free to attach.
No corruptions found.
Comment 22 Matthias Hopf 2005-09-07 13:47:59 UTC
So if nobody is interested in a post-mortem analysis, I will try to install on
/dev/hda7 again tomorrow, nuking the broken partition. I have to do some more
serious work on this laptop.

In any case, I don't know what went wrong with the installation, I cannot find
any serious differences between the different file systems, so I have no clue
where to continue searching, but given the circumstances I'm pretty frightened
that this could happen for customers as well.
Comment 23 Hubert Mantel 2005-09-07 14:06:26 UTC
*** Bug 115247 has been marked as a duplicate of this bug. ***
Comment 24 Chris L Mason 2005-09-07 14:14:15 UTC
Was the broken install an upgrade or a fresh install? 
Comment 25 Matthias Hopf 2005-09-07 14:38:33 UTC
Fresh install.

The log of bug 115247 looks completely different from this one. I don't think it
is about the same bug.
Comment 26 Olaf Kirch 2005-09-08 07:06:44 UTC
Matthias, can you please save the partition to some other machine so we can
loopback mount it? I am unable to look into this today.

Chris, this could be a file system problem - do you mind having a look?
Comment 27 Matthias Hopf 2005-09-08 11:13:06 UTC
Ok, I copied the system to 'delen', /dev/sda7. The partition is actually much
larger than the filesystem.

I've set up a serial link from delen to 'ivanova', the serial console is running
in a root screen there. Haven't configured grub to use serial, will do soon.

But the system stops booting even earlier than before (maybe a grub
misconfiguration - will check that). Actually it tries to exec
/sbin/udev.input_device.sh, which needs /bin/bash, which is not there in the
initrd. Then it just exits to /bin/sh.

I'll try building a new initrd in a chroot environment for this system.
Comment 28 Matthias Hopf 2005-09-08 12:50:13 UTC
Ok, added ata_piix, and now the system behaves exactly like on the laptop.
So at least we have a completely reproducible and apparently hardware- and
kernel-independent problem.

Grub is also set up for serial console, baud rate is 38400. A root screen with
the console is running, attach with screen -x on ivanova.

The only problem is that SYSRQ (i.e. sending a serial break) no longer works
once the system has crashed. Hell, even the power button doesn't do any good;
the only way to reboot the system is to unplug the power cord of the
live system!

W/o ACPI (failsafe settings) I can switch off the system with the power button,
but SYSRQ doesn't work, and the boot messages change a bit. Though the problem
seems to remain the same.

Attaching boot log files now.
Comment 29 Matthias Hopf 2005-09-08 12:51:04 UTC
Created attachment 49201 [details]
Boot log with Failsafe settings.
Comment 30 Chris L Mason 2005-09-08 13:00:36 UTC
Ok, the partition with the failing filesystem is /dev/sda7 on delen, right? 
 
Could you please boot delen off a different disk so that I can look at /dev/sda7 with a 
running kernel? 
Comment 31 Matthias Hopf 2005-09-08 13:03:32 UTC
Created attachment 49206 [details]
Boot log with standard settings.

The major difference at the end of the log is just due to the different
runlevel selection.
Comment 32 Matthias Hopf 2005-09-08 13:13:30 UTC
Ok, the system is now booted on SL10.0b2, the partition-under-test is mounted on
/os3. Feel free to login and reboot as you like. You will have to call me in
order to reset the machine.

I'll reinstall the laptop now.
Comment 33 Chris L Mason 2005-09-08 13:42:31 UTC
Great, this made things much easier.   
   
The filesystem is not corrupt, but  udev is doing something strange.  Hannes has an idea 
so I've cc'd him.  
Comment 34 Hannes Reinecke 2005-09-08 14:00:09 UTC
Oh well.
mkinitrd refuses to include any udev script which defaults to /bin/bash, as bash
might not be available in the initrd.
Additionally the corresponding udev rules should be commented out in the initrd,
but somehow udev still tries to execute these scripts.

Please try the following: unpack the initrd (mkdir /tmp/initrd-tmp; cd
/tmp/initrd-tmp; zcat /boot/initrd | cpio -idumv) and attach the contents of
etc/udev/rules.d/50-udev.rules.

To get the real problem solved:

Please create a dummy /sbin/udev.mount.sh program, e.g.

#!/bin/sh
:

re-run mkinitrd and reboot.
The message should then vanish and the real problem can be attacked.
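
A minimal sketch of the dummy-script step, assuming a POSIX shell. The function name and the optional root-prefix argument are hypothetical; the prefix is only there so the step can be tried against a scratch directory before touching the real /sbin:

```shell
#!/bin/sh
# Write the no-op /sbin/udev.mount.sh suggested above.
# $1 is an optional root prefix (hypothetical convenience, defaults to "").
make_dummy_udev_mount() {
    root=${1:-}
    mkdir -p "$root/sbin" || return 1
    # The script body is just the shell no-op ":".
    printf '#!/bin/sh\n:\n' > "$root/sbin/udev.mount.sh" || return 1
    chmod 755 "$root/sbin/udev.mount.sh"
}
```

Called without an argument it writes to /sbin/udev.mount.sh on the running system (needs root); mkinitrd then has to be re-run as described above.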
Comment 35 Matthias Hopf 2005-09-08 14:47:51 UTC
Created attachment 49222 [details]
etc/udev/rules.d/50-udev.rules from initrd
Comment 36 Matthias Hopf 2005-09-08 14:53:37 UTC
During mkinitrd:

Shared libs: lddlibc4: cannot read header from '/sbin/udev.mount.sh': No such
file or directory.

Trying a reboot anyway.
Comment 37 Matthias Hopf 2005-09-08 14:54:30 UTC
Created attachment 49226 [details]
Boot log with patched initrd
Comment 38 Chris L Mason 2005-09-09 12:18:18 UTC
From the kernel side, the filesystem isn't corrupt.  This must be an issue with the 
initramfs.  Hannes, any other ideas? 
Comment 39 Chris L Mason 2005-09-09 12:49:14 UTC
On IRC Hannes points out they have made some changes in this area for RC1.  If a new 
install works now I think we can close this. 
Comment 40 Hannes Reinecke 2005-09-09 13:02:28 UTC
You could try booting with init=/bin/bash.

If that works I doubt it's a mkinitrd problem.
Comment 41 Matthias Hopf 2005-09-09 13:24:23 UTC
As I cannot reproduce it even with beta4, 

Ok, I have an even simpler way to reproduce now:

booting with init=/bin/bash
ls / works
mount -o remount,ro /
ls / : Permission denied
echo /* indicates that the filesystem is still there.

Alternatively, a 'mount -o remount,rw /' triggers the bug as well (note that the
filesystem was already mounted r/w). I am still able to write something to the
filesystem with 'echo /* >/tmp/blub'. But no executables are executed any more.
I even tried executing a binary from another partition that has been mounted
before, with the same error message.

The system is currently in this state, console in root screen on ivanova as usual.
Comment 42 Matthias Hopf 2005-09-09 13:28:50 UTC
First sentence was interrupted:

I have only seen this issue once, but I'm afraid that this could happen for
customers as well. As long as we don't even know what's going on here, we cannot
simply close it.
Comment 43 Matthias Hopf 2005-09-09 13:53:20 UTC
Checked with latest kernel from stable, same effect.
Comment 44 Chris L Mason 2005-09-09 14:00:15 UTC
The console right now seems frozen.  Could you please boot with init=/bin/bash again? 
Comment 45 Matthias Hopf 2005-09-09 14:30:07 UTC
Was just doing another test.

I created a new reiser on /dev/sda5 and copied all files with 'cp -a' to the new
filesystem. I have just booted the new system, and it behaves exactly the same.
I can retry with ext3 as well, but I think this pretty much ruled out a broken
filesystem.

I also get a 'Badness in pci_get_subsys at drivers/pci/search.c:234' when
rebooting with SysRQ-U SysRQ-B, plus a stack trace (in screen history log).

I'm now rebooting into /dev/sda7.

Reducing severity, as it only occurred once so far.
Comment 46 Chris L Mason 2005-09-09 14:47:52 UTC
ok, this is very strange indeed.   
   
I'm booted on /dev/sda6 (the disk that works).  /dev/sda7 (the bad partition) is mounted  
on /os3.  
  
chroot /os3  
ls   # works 
mount # works 
mount -o rw,remount / # works 
ls # permission denied 
exit 
 
mount -o rw,remount /os3 # works 
ls /os3 # works 
 
Some part of the install on /os3 is corrupted. 
Comment 47 Matthias Hopf 2005-09-09 15:09:01 UTC
Created attachment 49421 [details]
trace output of mount

mount apparently specifies the wrong flags (e.g. noexec) for remounting. Don't
know yet where the flags come from.

BTW the trace shows 'hda7' which is a leftover from the laptop, the partition
is now sda7. Checking that.
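
The flags in effect after the remount can also be read back without running any external program, which helps in this state because exec fails for every binary. A minimal sketch using only shell builtins; it assumes /proc is mounted, and the function names are made up for illustration:

```shell
#!/bin/sh
# Print the option field recorded for a mount point.
# $1: mount point, $2: mounts table (assumed default: /proc/mounts).
# Only builtins are used, so this still runs when nothing can be exec'd.
mount_opts() {
    target=$1
    table=${2:-/proc/mounts}
    found=
    while read -r dev mnt fstype opts rest || [ -n "$dev" ]; do
        # Keep the last match: later entries shadow earlier ones.
        [ "$mnt" = "$target" ] && found=$opts
    done < "$table"
    [ -n "$found" ] && echo "$found"
}

# Succeed if a comma-separated option string contains noexec.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_noexec "$(mount_opts /)" && echo "root is noexec"` could be typed at the init=/bin/bash prompt before and after the remount.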
Comment 48 Andreas Gruenbacher 2005-09-09 15:24:36 UTC
/etc/fstab on that filesystem had ``user'' in the root filesystem's mount   
options, so the first remount cleared the exec and other mount flags. We saw
the results in various settings above. 
 
Matthias is not sure why the flag was in the fstab; maybe this happened by 
accident during the installation. As the bug only triggered once so far, this 
is the most likely explanation.
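
For reference, mount(8) documents that the user option implies noexec, nosuid and nodev unless a later option in the same list overrides them. A rough sketch of a check for this situation follows; the helper names are hypothetical, and it only models the exec/noexec part of the option handling:

```shell
#!/bin/sh
# Report "exec" or "noexec" for a comma-separated mount option string,
# honouring option order: later options override earlier ones.
effective_exec() {
    state=exec
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            user|users|noexec) state=noexec ;;  # user/users imply noexec
            exec|defaults)     state=exec ;;
        esac
    done
    IFS=$old_ifs
    echo "$state"
}

# Print the options field of the root entry in an fstab-style file.
# $1: path to the file (assumed default: /etc/fstab).
root_fstab_opts() {
    fstab=${1:-/etc/fstab}
    while read -r dev mnt fstype opts rest || [ -n "$dev" ]; do
        case $dev in ''|'#'*) continue ;; esac   # skip blanks and comments
        if [ "$mnt" = "/" ]; then
            echo "$opts"
            return 0
        fi
    done < "$fstab"
    return 1
}
```

Running `effective_exec "$(root_fstab_opts /os3/etc/fstab)"` against the affected installation would be one way to flag such an entry. Options are matched as whole words, so user_xattr is not mistaken for user.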