Bug 117177

Summary: problem booting with acpi
Product: [openSUSE] SUSE LINUX 10.0 Reporter: John Smith <lbjunk>
Component: Kernel Assignee: Thomas Renninger <trenn>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: uli.iske
Version: RC 1   
Target Milestone: ---   
Hardware: i686   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: acpidmp output
acpidump - running 2.6.13-8-smp, acpi=ht
screenshot - kernel 2.6.13-8-smp, no further parameters
Here's a Dump, Kernel is tainted because of vmware modules
acpidmp output

Description John Smith 2005-09-15 08:22:45 UTC
I cannot boot SUSE using the default boot option from the LILO menu. Typing
acpi=off or acpi=oldboot as a boot option lets me boot up without any problems.
Obviously Failsafe mode works too. When I try to boot up normally it freezes. I
have a desktop PC I built myself with a Pentium 4 processor and 1GB RAM on a
Fujitsu Siemens D1675 motherboard.
Comment 1 Andreas Mäder 2005-09-30 09:59:13 UTC
Do you use the smp-kernel (hyperthreading) ?
...your problem sounds familiar to me.

I have tested two computers:
-on the "old" one, with the default kernel: no problems
-while the newer one (P6@3.0GHz, with smp-kernel) freezes during boot.

The last messages were:
Loading kernel/drivers/acpi/processor.ko
ACPI: CPU0 (power states: C1[C1])
ACPI: CPU0 (power states: C1[C1])               -> shouldn't it be "CPU1" ?

Customized, as well as newer kernels (2.6.13.2-200509...), show the same
behaviour. I think the problem is in processor.ko, because when I don't
include it in the initial ramdisk, the boot process continues and the system
freezes later (acpid loads the module anyway).

Now I use "acpi=ht", which at least enables hyperthreading, but acpid doesn't
work and features like powering down the system aren't available...
Comment 2 Thomas Renninger 2005-09-30 11:04:31 UTC
Can you attach the acpidmp output, please?
Can you confirm that the last lines you see are (or similar):

pci_link-0186 [07] acpi_pci_link_get_poss: Error evaluating _PRS
ACPI: Power Resource [C258] (off)
ACPI: Power Resource [C259] (off)
ACPI: Power Resource [C25A] (off)
ACPI: Power Resource [C25B] (off)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 17 devices
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.1
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)

Or at least, do you somewhere see this line:
pci_link-0186 [07] acpi_pci_link_get_poss: Error evaluating _PRS
Comment 3 Uli Iske 2005-10-05 11:31:28 UTC
Created attachment 51479 [details]
acpidmp output
Comment 4 Uli Iske 2005-10-05 11:32:57 UTC
Same here after updating from SUSE 9.3 to 10.0. Hardware is a Fujitsu-Siemens
Scenic_W B8015 with the latest firmware and BIOS installed.
Comment 5 Andreas Mäder 2005-10-06 11:18:53 UTC
Created attachment 51580 [details]
acpidump - running 2.6.13-8-smp, acpi=ht

Is it a Fujitsu-Siemens-specific problem? I have a
Celsius M420 (Board D1688).
Comment 6 Andreas Mäder 2005-10-06 11:26:55 UTC
Created attachment 51581 [details]
screenshot - kernel 2.6.13-8-smp, no further parameters

Here's the screenshot (sorry for the quality) when
the problem occurs. I could not find any of the outputs
you mentioned... unfortunately 'boot.msg' is created later during the boot
process.
Comment 7 Thomas Renninger 2005-10-06 11:56:47 UTC
Could you try to boot with acpi=oldboot?
Then modify /etc/sysconfig/kernel and remove the processor, thermal and fan
modules from the INITRD_MODULES="..." variable. Then run mkinitrd.
Also change the ACPI_MODULES variable to
ACPI_MODULES="NONE"
in /etc/sysconfig/powersave/common
and the CPUFREQD_MODULE variable to
CPUFREQD_MODULE="off"
in /etc/sysconfig/powersave/cpufreq
->Reboot
Can you boot now?
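
The edits above could be scripted roughly as follows. This is only a sketch: strip_acpi_modules and set_var are hypothetical helper names, not SUSE tools, and it assumes each variable sits on a single line in the VAR="value" form. Work on backup copies first.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical, not part of any SUSE package.

# Remove processor, thermal and fan from the INITRD_MODULES="..." line.
# $1: path to a sysconfig-style file (e.g. /etc/sysconfig/kernel)
strip_acpi_modules() {
    file=$1
    # Extract the current module list from the file.
    current=$(sed -n 's/^INITRD_MODULES="\(.*\)"$/\1/p' "$file")
    kept=""
    for m in $current; do
        case $m in
            processor|thermal|fan) ;;            # drop the suspected modules
            *) kept="${kept:+$kept }$m" ;;       # keep everything else
        esac
    done
    # Rewrite the line through a temporary file (no reliance on sed -i).
    sed "s|^INITRD_MODULES=.*|INITRD_MODULES=\"$kept\"|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Set VAR="value" on an existing line.
# $1: file, $2: variable name, $3: new value
set_var() {
    sed "s|^$2=.*|$2=\"$3\"|" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Intended use (as root, after booting with acpi=oldboot):
#   strip_acpi_modules /etc/sysconfig/kernel && mkinitrd
#   set_var /etc/sysconfig/powersave/common  ACPI_MODULES    NONE
#   set_var /etc/sysconfig/powersave/cpufreq CPUFREQD_MODULE off
```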
Comment 8 Uli Iske 2005-10-06 14:51:49 UTC
Yes, can boot now.
Comment 9 Thomas Renninger 2005-10-06 15:19:51 UTC
Great.
So it is the thermal, fan or processor module that causes the system freeze.
Hmmm, acpidmp does not show obvious errors...

First you should enable SysRq:
echo 1 >/proc/sys/kernel/sysrq and change ENABLE_SYSRQ="no" to
ENABLE_SYSRQ="yes" in /etc/sysconfig/sysctl. Now, if you hit the <SysRq>
(ALT-Print) and <h> keys you should get a message at the end of
/var/log/messages like:
kernel: SysRq : HELP : loglevel0-8 reBoot Crashdump tErm Full kIll saK showMem
Nice powerOff showPc unRaw Sync showTasks Unmount
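
To check the current setting before testing, something like this sketch may help. sysrq_state is a hypothetical helper; it assumes the usual meaning of the value (0 is off, 1 is fully on, other values are a bitmask of allowed functions on kernels that support that).

```shell
#!/bin/sh
# Sketch only; sysrq_state is a hypothetical helper name.
# $1: path to the SysRq control file (normally /proc/sys/kernel/sysrq)
sysrq_state() {
    # Read the value; report "unknown" if the file cannot be read.
    v=$(cat "$1" 2>/dev/null) || { echo unknown; return 1; }
    case $v in
        0) echo disabled ;;
        1) echo enabled ;;
        *) echo "partial (bitmask $v)" ;;   # assumption: kernel supports bitmasks
    esac
}

# Intended use:
#   sysrq_state /proc/sys/kernel/sysrq
```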

I expect the processor module to be the bad one...
Could you try to load the modules step by step manually.
Always wait some seconds after loading a module to be sure that it really does
not freeze the machine. Also have a look into /var/log/messages for error
messages after each step.

First try:
modprobe processor nocst=1
Still running? Then try:
rmmod processor; modprobe processor
Still running? Then try:
modprobe thermal
Still running? Then try:
modprobe fan

If the machine freezes, hit <SysRq><t> and try to write down the last
invoked methods (the important parts should look like: {invoked_method+100} ...).
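
The step-by-step loading above could be wrapped in a small script, sketched here. RUN, WAIT, step and probe_sequence are hypothetical names. By default the script only prints the commands (dry run); the sync before each step is an assumption that flushing to disk helps log messages survive a hard freeze.

```shell
#!/bin/sh
# Sketch only; all names here are hypothetical.
RUN=${RUN-echo}    # default: dry run (just print); set RUN="" to really execute
WAIT=${WAIT-0}     # seconds to wait after each step, e.g. WAIT=10 on the real box

# Announce a command, flush pending writes, run it, then wait.
step() {
    echo "step: $*"
    sync                       # flush logs before the risky command
    $RUN "$@" || return 1
    sleep "$WAIT"
}

# Same order as suggested in the comment above.
probe_sequence() {
    step modprobe processor nocst=1 &&
    step rmmod processor &&
    step modprobe processor &&
    step modprobe thermal &&
    step modprobe fan
}

# Intended use (as root, on a console where you can watch the output):
#   RUN="" WAIT=10 sh thisscript.sh   # after adding a call to probe_sequence
```

Checking /var/log/messages after each "step:" line shows which module was loaded last before a freeze.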

Thanks for your help.
Comment 10 Uli Iske 2005-10-07 05:27:00 UTC
Looks like it's the processor module, after trying modprobe processor nocst=1
the machine totally freezes, no sysreq possible. Tried in an xterminal and also
on tty1.
Comment 11 Thomas Renninger 2005-10-07 08:22:57 UTC
That's bad. You could either:

  a) live without the processor module, which means you also can't load the
     thermal module, which manages thermal critical shutdowns ...

  b) recompile the kernel so we can try to find out why it hangs and
     possibly fix it.

You will probably need a bit of time for b, but be sure that I will help you as
much as I can, as I am very interested in why the processor module freezes the
machine.

To recompile your kernel (and enable ACPI debug output) you need to install the
kernel-source package. You also have to install the ncurses, ncurses-devel and
gcc packages.
Then do:
"cd /usr/src/linux" and "cp arch/i386/defconfig.smp .config" and "make
menuconfig" -> choose the Power management and then the ACPI options.
At this point disable "Print ACPI errors and warnings" and enable "Debug
Statements". Leave the menu by choosing Exit repeatedly until you are asked
to save the configuration.
Now call "make;make install;make modules_install;mkinitrd" to compile and
install the kernel (this will take about 1 hour depending on the power of your
machine).
Tell me if you get stuck or have any problems.

Let me know whether you would like to help at this point and I will tell you
how to enable several ACPI debug levels to track down the freeze. Do you have
ICQ/IRC or some chat program? Maybe it's easier to communicate that way.

Thanks a lot for your help.
Comment 12 Thomas Renninger 2005-11-14 14:09:04 UTC
What is the status here?
Maybe you could load the processor module with max_cstate=1?
Comment 13 Uli Iske 2005-11-15 06:49:26 UTC
Created attachment 57329 [details]
Here's a Dump, Kernel is tainted because of vmware modules
Comment 14 Uli Iske 2005-11-15 06:50:59 UTC
max_cstate=1 isn't working.
Comment 15 Thomas Renninger 2005-12-22 10:34:04 UTC
Sorry for the long delay...
Could you please provide acpidmp output? Maybe I have a patch that could solve this issue.
Comment 16 Uli Iske 2005-12-22 10:50:16 UTC
Created attachment 61689 [details]
acpidmp output

Here's the new acpidmp as requested.
Comment 17 Thomas Renninger 2005-12-22 13:49:18 UTC
OK, I expect this is it.
However, I am not so sure anymore whether the patch is safe.
There have been some bug reports (no C-states anymore at all) that might have been caused by this patch. I'd like to wait a bit to see whether this patch really caused regressions on other machines.
You can find the patch here:
http://bugzilla.kernel.org/show_bug.cgi?id=5452
in comment #13.
If you are used to compiling a kernel, it would be interesting to know whether it really helps. If not, here are some hints:

1) Install kernel-source.rpm
2) Download the patch and patch the source:
   cd /usr/src/linux
   patch -F3 -p1 -i /tmp/downloaded_patch
   (if this fails you can try to replace the lines by hand).
3) Copy the smp config file:
   cp arch/i386/defconfig.smp .config
4) Compile (may take an hour) and install the kernel
   make;make install;make modules_install
5) Create an initrd
   mkinitrd
6) Adjust boot loader
   Could already be OK.
   Check ls -al /boot for the new kernel and the vmlinuz link
   and compare it with your bootloader configs (/boot/grub/menu.lst).
7) Reboot.
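
Step 6 can be sketched as a small check. check_boot_entry is a hypothetical helper, and it assumes /boot/vmlinuz is a symlink to the versioned kernel image and that the boot loader config names either the link or its target.

```shell
#!/bin/sh
# Sketch only; check_boot_entry is a hypothetical helper name.
# $1: boot directory (normally /boot)
# $2: boot loader config (normally /boot/grub/menu.lst)
check_boot_entry() {
    # Resolve the vmlinuz link to the kernel image it points at.
    target=$(readlink "$1/vmlinuz") || { echo "no vmlinuz link in $1"; return 1; }
    name=${target##*/}
    if grep -q -F -e "$name" "$2"; then
        echo "config references $name directly"
    elif grep -q -e '/vmlinuz[[:space:]]' "$2"; then
        echo "config uses the vmlinuz link, which points to $name"
    else
        echo "config does not reference $name or the vmlinuz link"
        return 1
    fi
}

# Intended use:
#   check_boot_entry /boot /boot/grub/menu.lst
```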

If this is too much work, stay tuned. Once it has been in mainline for some more days, I think it could be added; still, I had better ask for review and an ACK for this one.
Comment 18 Thomas Renninger 2005-12-22 20:12:49 UTC
OK, the bug reports turned out not to be caused by this patch.
Rereading it, I am sure it is safe.
It only takes effect if an attempt is made to add another CPU device with the same ACPI id, and that must not happen.
-> The patch has been added to 10.0 and will be in the next YOU update kernel.
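
To see whether a machine might be affected, one could look for the same CPU being announced twice, as in the "ACPI: CPU0 (power states: ...)" lines quoted in comment 1. This is only a sketch: dup_acpi_cpus is a hypothetical helper, and it assumes that a repeated line of that form is a symptom of the duplicate ACPI id, which this report does not confirm.

```shell
#!/bin/sh
# Sketch only; dup_acpi_cpus is a hypothetical helper name.
# $1: file containing kernel messages (e.g. saved dmesg output)
# Prints each CPU name that appears more than once in
# "ACPI: CPUn (power states: ...)" lines.
dup_acpi_cpus() {
    sed -n 's/.*ACPI: \(CPU[0-9][0-9]*\) (power states.*/\1/p' "$1" |
        sort | uniq -d
}

# Intended use (booted with acpi=ht or similar, so that logs exist):
#   dmesg > /tmp/boot.txt
#   dup_acpi_cpus /tmp/boot.txt
```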

Until then you may want to use a kernel from here:
ftp.suse.com/pub/people/mantel/kotd/10.0-i386/SL100_BRANCH/kernel-smp-2.6.13*.rpm
It may take a while until it shows up there.
This kernel will (after some more fixes are added) become the next YOU update kernel, so it is safe to use it on your 10.0 system.
It would be nice if you could tell me whether it works for you.
Comment 19 Thomas Renninger 2005-12-22 20:23:09 UTC
*** Bug 117452 has been marked as a duplicate of this bug. ***
Comment 20 Uli Iske 2005-12-23 07:32:21 UTC
Works for me. Sorry for that, but there is a space missing after "wrong ACPI id "

ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "BIOS reporting wrong ACPI id"

Thanks, I think this could be closed.