Bug 117560 - Final Fails to boot on Gateway M680XL
Summary: Final Fails to boot on Gateway M680XL
Status: RESOLVED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Final
Hardware: x86 SUSE Other
Priority: P2 - High
Severity: Major
Target Milestone: ---
Assignee: Thomas Renninger
QA Contact: Klaus Kämpf
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-16 21:17 UTC by William Beebe
Modified: 2005-10-16 14:46 UTC
CC List: 1 user

See Also:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Hardware information log file. (211.38 KB, text/plain)
2005-09-16 21:25 UTC, William Beebe
Output of acpidmp from system booted with ACPI=off (118.67 KB, text/plain)
2005-10-11 16:08 UTC, William Beebe
Output of dmesg from system booted with ACPI=off (17.06 KB, text/plain)
2005-10-11 16:09 UTC, William Beebe
Latest dmesg with pci=noacpi (46.29 KB, text/plain)
2005-10-13 02:38 UTC, William Beebe
Description William Beebe 2005-09-16 21:17:49 UTC
System currently runs SuSE 10 Beta 3. CDROM images were downloaded and burned
onto media. All five CDROMs pass boot testing on other machines. When CDROM 1 is
allowed to boot on the Gateway, there is a brief flash in *text* mode with the
message "Could not find SuSE Linux installation", then the screen goes
completely black.

If RC1's CD1 is booted in safe mode, the standard graphical installer is
displayed. However, when the graphical installer is ready to do a clean install
and the clean install button is pressed, the installation process locks up.

In both cases there is no way to check alternate screens for any kind of
messages. The only way to get the machine back in both cases is to disconnect
the power adapter and remove the battery pack.
Comment 1 William Beebe 2005-09-16 21:25:53 UTC
Created attachment 50208
Hardware information log file.

Saved via YaST2 Hardware Information module (YaST Control Center > Hardware >
Hardware Information).
Comment 2 Jiří Suchomel 2005-09-19 07:39:29 UTC
"Could not find SuSE Linux installation" - looks like YaST didn't even start.
Steffen, any idea?
Comment 3 Steffen Winterfeldt 2005-09-19 09:06:11 UTC
Sounds like a hardware or kernel problem to me.  
 
BTW, at least in the first case you can switch to console 4 _before_ it 
locks up and watch the kernel messages. 
  
Assigning to our notebook people.  
Comment 4 Christian Zoz 2005-09-27 13:48:07 UTC
Huh, this is hard to debug remotely.

Please switch to console 4 early and watch the messages. What are the last lines?
Do that in both cases, normal and safe mode. (Can you attach a serial console?)

Did it boot with Beta3 CDs?

What do you mean by "clean install button"?
Comment 5 William Beebe 2005-10-07 01:29:45 UTC
I finally got the final DVD version to fail, and I was able to look at screen
four to find the following errors:

ACPI-0325: *** Warning: No handler for Region [ERAM] (dfff6800) [EmbeddedControl]
ACPI-0287: *** Error: Region EmbeddedControl(3) has no handler
ACPI-1174: *** Method execution failed [\_PR_.CPU0._PPC] (Node dffea480),
AE_NOT_EXIST
ACPI-0132: *** Warning: Error evaluating _PPC

At this point SuSE 10 is unable to access the DVD drive. I believe this is the
same problem that plagued SuSE 10 RC1 with regular CDROM, but I can't be sure
because it locked up sooner and more completely.
Comment 6 Klaus Kämpf 2005-10-07 07:15:03 UTC
Did you try the "failsafe" boot option?!
Comment 7 Olaf Kirch 2005-10-07 07:25:25 UTC
Cc'ing the acpi wizard... 
Comment 8 Thomas Renninger 2005-10-07 08:06:12 UTC
Who are you talking about?!?

Hopefully the machine is booting with acpi=off?
Also try to boot pci=noacpi.
The DSDT exported by the BIOS is probably badly broken (judging by the warning
messages). However, the messages above should not be fatal. I expect something
is also going wrong when ACPI processes devices and tries to assign resources
(just a guess), so pci=noacpi could help.

You should first try to install with pci=noacpi.
If this works attach dmesg and acpidmp, please.

If you can boot/install only with acpi=off, then:
Edit /etc/sysconfig/kernel and remove the modules processor, thermal, and fan
from the INITRD_MODULES="..." variable, then run mkinitrd.
Also change ACPI_MODULES="" to ACPI_MODULES="NONE" in /etc/sysconfig/powersave/common.
Try to reboot without acpi=off, but with pci=noacpi (maybe it even works
without it?), and attach dmesg and acpidmp.

If neither of these boots or installs, we have a problem (we should then
probably try to get the system up with an older kernel; was Linux already
running on this machine?) ...
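The two config edits above can be sketched as plain text transformations. This is a minimal illustration, assuming both sysconfig files use simple single-line KEY="value" assignments; the helper names are hypothetical, the sketch does not read or write the real files, and mkinitrd still has to be run by hand afterwards.

```python
import re

# Hypothetical helpers sketching the two config edits from comment 8.
# Assumption: both sysconfig files use single-line KEY="value" assignments.

def remove_initrd_modules(text, drop=("processor", "thermal", "fan")):
    """Drop the given modules from the INITRD_MODULES="..." line."""
    def rewrite(match):
        kept = [m for m in match.group(1).split() if m not in drop]
        return 'INITRD_MODULES="' + " ".join(kept) + '"'
    return re.sub(r'^INITRD_MODULES="([^"]*)"', rewrite, text, flags=re.MULTILINE)


def disable_acpi_modules(text):
    """Replace ACPI_MODULES="..." with ACPI_MODULES="NONE"."""
    return re.sub(r'^ACPI_MODULES="[^"]*"', 'ACPI_MODULES="NONE"', text,
                  flags=re.MULTILINE)


# Example contents; "piix" and "reiserfs" are made-up placeholder modules.
kernel_cfg = 'INITRD_MODULES="processor thermal fan piix reiserfs"\n'
print(remove_initrd_modules(kernel_cfg), end="")        # INITRD_MODULES="piix reiserfs"
print(disable_acpi_modules('ACPI_MODULES=""\n'), end="")  # ACPI_MODULES="NONE"
```

Working on the text rather than editing in place keeps the sketch side-effect free; any other lines in the files pass through unchanged.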
Comment 9 William Beebe 2005-10-07 10:44:13 UTC
In answer to Klaus Kaempf: Booting in failsafe did allow the system to boot into
the graphical installer but resulted in a complete lockup of the system after
packages were selected and the installation button was clicked.

Yes, SuSE 10 Beta 3 is currently running on this machine. The Beta 3
installation went very smoothly with no problems at all. And I have built and run
kernels up to 2.6.13.2 on the machine as well under Beta 3.
Comment 10 Thomas Renninger 2005-10-07 11:19:38 UTC
So it's possibly ec_burst mode; could you try with the ec_burst=0 boot parameter?
Can you attach the whole dmesg and acpidmp output from booting the distribution, please?
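When collecting dmesg from several test boots, it helps to record which of the workarounds discussed in this bug were active. A rough sketch, assuming the input is a kernel command line string such as the contents of /proc/cmdline, that options are whitespace-separated, and that pci= may carry comma-separated sub-options; the function names are hypothetical.

```python
# Hypothetical helper: report which ACPI-related workarounds from this bug
# (acpi=off, pci=noacpi, ec_burst=0) appear on a kernel command line.
# Assumption: options are whitespace-separated; a bare word without "="
# is a flag with no value; values may hold comma-separated sub-options.
WORKAROUNDS = {("acpi", "off"), ("pci", "noacpi"), ("ec_burst", "0")}


def parse_cmdline(cmdline):
    """Split a command line into (key, value-or-None) pairs."""
    opts = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts.append((key, value if sep else None))
    return opts


def active_workarounds(cmdline):
    """Return the workaround options present, e.g. ['pci=noacpi']."""
    found = []
    for key, value in parse_cmdline(cmdline):
        for sub in (value or "").split(","):
            if (key, sub) in WORKAROUNDS:
                found.append(f"{key}={sub}")
    return found


print(active_workarounds("root=/dev/hda2 vga=0x317 pci=noacpi resume=/dev/hda1"))
# ['pci=noacpi']
```

Only exact key/value matches are reported, so unrelated options such as root= or vga= are ignored.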
Comment 11 William Beebe 2005-10-11 16:06:47 UTC
While on holiday Monday, October 10 (Columbus Day, the American celebration of a
European navigation error), I managed to install SuSE 10 final with ACPI=off.
After installation the system booted with acpi=off set as a boot option in
GRUB. Enabling ACPI causes the notebook to fail. I've attached the output from
dmesg and acpidmp.
Comment 12 William Beebe 2005-10-11 16:08:17 UTC
Created attachment 53649
Output of acpidmp from system booted with ACPI=off
Comment 13 William Beebe 2005-10-11 16:09:14 UTC
Created attachment 53650
Output of dmesg from system booted with ACPI=off
Comment 14 Thomas Renninger 2005-10-11 16:21:46 UTC
Does pci=noacpi not work in place of acpi=off?

If you can boot with pci=noacpi, you should also be able to boot without
pci=noacpi using this kernel (no idea whether it is already available as a YOU
update):
ftp.suse.com/pub/people/trenn/10_0_kernel_reboot_fix/kernel-default-2.6.13-2.i586.rpm

If you cannot boot with pci=noacpi, please stay tuned; I will try to look into
the acpidump output ASAP.
Comment 15 William Beebe 2005-10-11 17:55:18 UTC
Yes, the system with the stock kernel boots with pci=noacpi. I just rebooted the
system, and the kernel is now behaving much better with regards to power
management. The notebook fan is now off (it was on constantly before the reboot)
and only comes on when I'm pushing the notebook.

Thank you.

kernel version:
Linux version 2.6.13-15-default (geeko@buildhost) (gcc version 4.0.2 20050901
(prerelease) (SUSE Linux)) #1 Tue Sep 13 14:56:15 UTC 2005
Comment 16 Thomas Renninger 2005-10-12 05:45:14 UTC
So with the next YOU update kernel, or the one I linked above, you could possibly
also boot without pci=noacpi (which should not make much difference in
functionality for you). It might be needed if some device gets a wrong IRQ
assigned, but this shouldn't be the case.

Do you still get the ACPI errors you reported in #5?
I think not: if your system runs cooler now, you probably have cpufreq running,
and the errors in #5 should prevent cpufreq from running correctly.
If you still see these errors, please provide dmesg with pci=noacpi again.
If not, could we close this one?
Comment 17 William Beebe 2005-10-13 02:37:38 UTC
I can't tell if I still get the ACPI messages in comment #5. I am going to
attach the latest dmesg output for you to look at. And yes, it looks like the
thermal stuff is running. Since there seems to be nothing else, I'll let you
decide after reviewing the latest dmesg if this issue needs to be closed.
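One way to answer whether the comment #5 messages are still present is to scan the saved dmesg text for ACPI interpreter lines of the same shape. This is a minimal sketch, assuming each message sits on a single dmesg line containing "ACPI-NNNN: *** " as in the lines quoted in comment #5; the helper names and the choice to flag _PPC and EmbeddedControl are illustrative, not an official check.

```python
import re

# Matches ACPI interpreter messages shaped like those quoted in comment #5,
# e.g. "ACPI-0287: *** Error: Region EmbeddedControl(3) has no handler".
# Assumption: one message per line, always containing "ACPI-<digits>: *** ".
ACPI_MSG = re.compile(r"ACPI-(\d+): \*\*\* (.*)")


def acpi_messages(dmesg_text):
    """Return (number, message) pairs for every matching ACPI line."""
    return [(m.group(1), m.group(2).strip())
            for m in map(ACPI_MSG.search, dmesg_text.splitlines()) if m]


def has_ppc_or_ec_errors(dmesg_text):
    """True if any ACPI message mentions the _PPC method or the EC region."""
    return any("_PPC" in msg or "EmbeddedControl" in msg
               for _, msg in acpi_messages(dmesg_text))


sample = """\
ACPI-0287: *** Error: Region EmbeddedControl(3) has no handler
ACPI-0132: *** Warning: Error evaluating _PPC
PCI: No IRQ known for interrupt pin A of device 0000:00:01.0.
"""
print(has_ppc_or_ec_errors(sample))  # True
```

Using search rather than match lets the pattern still find messages when dmesg lines carry a timestamp or other prefix.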
Comment 18 William Beebe 2005-10-13 02:38:49 UTC
Created attachment 53906
Latest dmesg with pci=noacpi
Comment 19 Christian Zoz 2005-10-13 08:35:34 UTC
This bug looks like it belongs to the acpi wizard himself. Have fun ;)
Comment 20 Thomas Renninger 2005-10-15 09:27:39 UTC
This is really strange... you should still get the ACPI errors from comment #5
if the powersave daemon tries to activate cpufreq. Have you enabled cpufreq
(SpeedStep/PowerNow!) in the BIOS now, or deactivated cpufreq in
/etc/sysconfig/powersave/cpufreq? That would make sense.

If you encounter IRQ problems with a device, you should try what dmesg suggests:
PCI: No IRQ known for interrupt pin A of device 0000:00:01.0. Please try using
pci=biosirq.

With the next YOU update kernel you may also be able to boot without
pci=noacpi; the IRQ message above should then also vanish from dmesg.

Can we close this one?
Comment 21 William Beebe 2005-10-16 14:37:59 UTC
Yes, close it. If there are still issues after the next YOU kernel update then I will open another bug report. Right now the system is working fine. It's my primary development system and I do not have too much time to reboot and retest in the middle of an ongoing effort. Everything is working right now: video (1680x1050 hardware accelerated via the ATI driver), wired and wireless, full USB, and the CPU. I appreciate everyone's help and comments, but I have a looming deadline. I'll be free again in November to look at this final issue. But I am not going to mess with a working stable system at this particular moment.
Comment 22 Thomas Renninger 2005-10-16 14:46:29 UTC
Thanks for your help.