Bugzilla – Full Text Bug Listing
| Summary: | problem booting with acpi | | |
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | John Smith <lbjunk> |
| Component: | Kernel | Assignee: | Thomas Renninger <trenn> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | CC: | uli.iske |
| Version: | RC 1 | | |
| Target Milestone: | --- | | |
| Hardware: | i686 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Attachments:
- acpidmp output
- acpidump - running 2.6.13-8-smp, acpi=ht
- screenshot - kernel 2.6.13-8-smp, no further parameters
- Here's a Dump, Kernel is tainted because of vmware modules
- acpidmp output

Description
John Smith
2005-09-15 08:22:45 UTC
Do you use the smp-kernel (hyperthreading)? ...your problem sounds familiar to me.

I have tested two computers:
- on the "old" one, with the default kernel: no problems
- while the newer one (P6@3.0GHz, with smp-kernel) freezes during boot.

The last messages were:
Loading kernel/drivers/acpi/processor.ko
ACPI: CPU0 (power states: C1[C1])
ACPI: CPU0 (power states: C1[C1])
-> shouldn't it be "CPU1"?

Customized as well as newer kernels (2.6.13.2-200509...) show the same behaviour. I think the problem is in processor.ko, because when I don't include it in the initial ramdisk, the boot process continues and the system freezes later (acpid loads the module anyway).

Now I use "acpi=ht", which at least enables hyperthreading, but acpid doesn't work and features like powering down the system aren't available...

Can you attach acpidmp output, please.

Can you confirm that the last lines you see are (or similar):
pci_link-0186 [07] acpi_pci_link_get_poss: Error evaluating _PRS
ACPI: Power Resource [C258] (off)
ACPI: Power Resource [C259] (off)
ACPI: Power Resource [C25A] (off)
ACPI: Power Resource [C25B] (off)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 17 devices
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.1
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)

Or at least, do you somewhere see this line:
pci_link-0186 [07] acpi_pci_link_get_poss: Error evaluating _PRS

Created attachment 51479 [details]
acpidmp output
Same here, after updating from SUSE 9.3 to 10.0. Hardware is a Fujitsu-Siemens Scenic_W B8015 with latest firmware and BIOS installed.

Created attachment 51580 [details]
acpidump - running 2.6.13-8-smp, acpi=ht
Is it a Fujitsu-Siemens-specific problem? I have a
Celsius M420 (Board D1688).
Created attachment 51581 [details]
screenshot - kernel 2.6.13-8-smp, no further parameters
Here's the screenshot (sorry for the quality) taken when
the problem occurs. I could not find any of the outputs
you mentioned... unfortunately 'boot.msg' is created later during the boot
process.
Could you try to boot with acpi=oldboot.
Then modify /etc/sysconfig/kernel and throw out the processor, thermal and fan modules in the INITRD_MODULES="..." variable. Then invoke mkinitrd.
Also change the ACPI_MODULES variable to ACPI_MODULES="NONE" in /etc/sysconfig/powersave/common and the CPUFREQD_MODULE variable to CPUFREQD_MODULE="off" in /etc/sysconfig/powersave/cpufreq
-> Reboot
Can you boot now?

Yes, can boot now.

Great.
So it is the thermal, fan or processor module that causes the system freeze.
Hmmm, acpidmp does not show obvious errors...
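
For anyone repeating the INITRD_MODULES step above, here is a minimal sketch of the edit. The helper name strip_modules and the example module list are hypothetical; the real list in /etc/sysconfig/kernel differs per machine, and the result still has to be written back by hand before running mkinitrd.

```shell
# Hypothetical helper: print a space-separated module list with the
# named modules removed. It does not touch any file.
strip_modules() {
    list=$1
    shift
    out=""
    for m in $list; do
        keep=1
        for drop in "$@"; do
            if [ "$m" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$m"
        fi
    done
    printf '%s\n' "$out"
}

# Example value only (an assumption, not taken from the affected machine).
strip_modules "sata_sil processor thermal fan reiserfs" processor thermal fan
```

The printed list is what would go back into INITRD_MODULES="..."; modules not named on the command line keep their original order.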
First you should enable SysRq:
echo 1 >/proc/sys/kernel/sysrq
and change ENABLE_SYSRQ="no" to ENABLE_SYSRQ="yes" in /etc/sysconfig/sysctl.
Now, if you hit the <SysRq> (ALT-Print) and <h> keys, you should get a message
at the end of /var/log/messages like:
kernel: SysRq : HELP : loglevel0-8 reBoot Crashdump tErm Full kIll saK showMem
Nice powerOff showPc unRaw Sync showTasks Unmount
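
The persistent part of this (flipping ENABLE_SYSRQ in /etc/sysconfig/sysctl) could be scripted roughly as below. The function name set_sysconfig_var is hypothetical, and the sketch assumes the variable already exists in the file on a line of its own, in the VAR="value" form quoted above, and that the new value contains no | or & characters.

```shell
# Hypothetical helper: set VAR="value" in a sysconfig-style file.
# Only lines that start with VAR= are rewritten; everything else is kept.
set_sysconfig_var() {
    file=$1
    var=$2
    value=$3
    tmp=$(mktemp) || return 1
    if sed "s|^${var}=.*|${var}=\"${value}\"|" "$file" > "$tmp"; then
        # Overwrite in place so ownership and permissions stay unchanged.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (as root), not executed here:
#   set_sysconfig_var /etc/sysconfig/sysctl ENABLE_SYSRQ yes
#   echo 1 >/proc/sys/kernel/sysrq
```

The same helper would also cover the ACPI_MODULES and CPUFREQD_MODULE changes mentioned earlier in the thread, given the matching file paths.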
I expect the processor module to be the bad one...
Could you try to load the modules step by step manually.
Always wait some seconds after loading a module to be sure that it really does
not freeze the machine. Also have a look into /var/log/messages for error
messages after each step.
First try:
modprobe processor nocst=1
Still running? Then try:
rmmod processor; modprobe processor
Still running? Then try:
modprobe thermal
Still running? Then try:
modprobe fan
If the machine freezes, hit <SysRq><t> and try to write down the last
invoked methods (the important parts should look like: {invoked_method+100} ...).
Thanks for your help.
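
The step-by-step sequence above can be sketched as a small script. By default it only prints the commands; setting RUN=1 (as root) actually executes them. The RUN and WAIT variables, the sync before each step, and the 10-second default pause are assumptions added for illustration and are not part of the original instructions.

```shell
# Print the load sequence; with RUN=1, execute each step, wait, and show
# the tail of /var/log/messages so errors can be noted before the next step.
load_steps() {
    while IFS= read -r cmd; do
        echo "+ $cmd"
        if [ "${RUN:-0}" = "1" ]; then
            sync    # flush logs to disk in case the next step freezes the box
            sh -c "$cmd" </dev/null || return 1
            sleep "${WAIT:-10}"
            tail -n 5 /var/log/messages
        fi
    done <<'EOF'
modprobe processor nocst=1
rmmod processor; modprobe processor
modprobe thermal
modprobe fan
EOF
}

# Dry run: only lists the four steps.
load_steps
```

If the machine hangs, the last "+ ..." line on the console shows which step was in progress.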
Looks like it's the processor module: after trying modprobe processor nocst=1 the machine freezes completely, no SysRq possible. Tried in an X terminal and also on tty1.

That's bad. You could either:
a) live without the processor module, which means you also can't load the
thermal module, which manages thermal critical shutdowns ...
b) recompile the kernel, and we can try to find out why it hangs and
possibly fix it.
You will probably need a bit of time for b), but be sure that I will help you as
much as I can, as I am very interested in why the processor module freezes the machine.
To recompile your kernel (and enable ACPI debug output) you need to install the
kernel-source package. You also have to install the ncurses, ncurses-devel and
gcc packages.
Then do:
"cd /usr/src/linux" and "cp arch/i386/defconfig.smp .config" and "make
menuconfig" -> choose the Power management and then the ACPI options.
At this point disable "Print ACPI errors and warnings" and enable "Debug
Statements". Leave the menu by choosing Exit repeatedly until you are asked
to save the configuration.
Now call "make;make install;make modules_install;mkinitrd" to compile and
install the kernel (this will take about 1 hour, depending on the power of your
machine).
Tell me if you get stuck or have any problems.
Let me know whether you would like to help at this point, and I will tell you how
to enable several ACPI debug levels to track down the freeze. Do you have ICQ/IRC
or some chat program? Maybe it's easier to communicate that way.
Thanks a lot for your help.
What is the status here? Maybe you could load the processor module with max_cstate=1?

Created attachment 57329 [details]
Here's a Dump, Kernel is tainted because of vmware modules
max_cstate=1 isn't working.

Sorry for the long delay... Could you please provide acpidmp? Maybe I have a patch that could solve this issue.

Created attachment 61689 [details]
acpidmp output
Here's the new acpidmp as requested.
OK, I expect this is it. However, I am not so sure anymore whether the patch is safe. There have been some bug reports (no C-states anymore at all) that might have been caused by this patch. I'd like to wait a bit to see whether this patch really caused regressions on other machines.
You find the patch here: http://bugzilla.kernel.org/show_bug.cgi?id=5452 in comment #13.
If you are used to compiling a kernel, it would be interesting to know whether it really helps. If not, here are some hints:
1) Install kernel-source.rpm
2) Download the patch and patch the source:
cd /usr/src/linux
patch -F3 -p1 -i /tmp/downloaded_patch
(if this fails you can try to replace the lines by hand).
3) Copy the smp config file:
cp arch/i386/defconfig.smp .config
4) Compile (may take an hour) and install the kernel:
make;make install;make modules_install
5) Create an initrd:
mkinitrd
6) Adjust boot loader. Could already be OK. Check ls -al /boot for the new kernel and the vmlinuz link and compare it with your bootloader configs (/boot/grub/menu.lst).
7) Reboot.
If this is too much work, stay tuned; if it is in mainline for some more days, I think it could be added. Still, I'd better ask for review and an ACK for this one.

OK, the bug report turned out not to come from this patch. Rereading it, I am sure it is safe. It only comes into play if an attempt is made to add another CPU device to the same ACPI id, and that must not happen.
-> Patch is added to 10.0 and it will be in the next YOU update kernel. Until then you may want to use a kernel from here:
ftp.suse.com/pub/people/mantel/kotd/10.0-i386/SL100_BRANCH/kernel-smp-2.6.13*.rpm
It may take a while until it pops up there. This kernel will (after some more fixes are added) become the next YOU update kernel, so it is safe to use it on your 10.0 system. Would be nice if you could tell me whether it works for you.

*** Bug 117452 has been marked as a duplicate of this bug. ***

Works for me.
Sorry for that, but there is a space missing after "wrong ACPI id ":
ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "BIOS reporting wrong ACPI id"

Thanks, I think this could be closed.