|
Bugzilla – Full Text Bug Listing |
| Summary: | HAL doesn't always start properly | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Magnus Boman <mboman> |
| Component: | Basesystem | Assignee: | Danny Al-Gaaf <dalgaaf> |
| Status: | VERIFIED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | andreas.hanke, fred.blaise, markus.kriewald, nix, wittemar |
| Version: | Beta 1 plus | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | boot.msg; messages; bootchart graph of a boot process where hald disappeared; bootchart graph of a boot process where hald survived; hal_output.txt |
||
|
Description
Magnus Boman
2006-11-02 21:35:58 UTC
Created attachment 103606 [details]
boot.msg
Created attachment 103607 [details]
messages
Please change this line in /etc/init.d/haldaemon:

    HALDAEMON_PARA="--daemon=yes --retain-privileges";

to:

    HALDAEMON_PARA="--daemon=yes --retain-privileges --verbose=yes --use-syslog";

and, if this happens again, attach the part of /var/log/messages written since boot.

I ran into this as well. It seems to be a race, as it cannot be reproduced reliably. However, D-Bus was always working fine; only HAL did not run.

Adjusting summary. -> Beta1 Plus

Just for the log: The change as proposed by Danny (comment #4) makes it impossible to reproduce the problem at any time for me (HAL always gets started properly).

(In reply to comment #8)
> The change as proposed by Danny (comment #4) makes it impossible to reproduce
> the problem at any time for me (HAL always gets started properly).

This sounds very familiar. It's the same in bug 218184: hald doesn't start properly, but as soon as the debug parameters are added, it does.

Adding myself to CC (for a reason, please don't remove me again, thanks).

*** Bug 218184 has been marked as a duplicate of this bug. ***

I have also been seeing this bug for approximately the last month. I run the latest Factory, updated daily with smart. I see the problem on about 20% of boots; however, it is MUCH more likely to happen if I have just done a "smart upgrade".

Created attachment 104740 [details]
bootchart graph of a boot process where hald disappeared
Created attachment 104741 [details]
bootchart graph of a boot process where hald survived
Andreas, thanks a lot for the graphs. That's a great idea for narrowing down the cause of this bug.

Did anyone run into this with Beta2? So far, I have not run into this issue on my systems running Beta2.

I am still seeing this problem with the latest Factory (is it in sync with Beta2?).

    # date
    Sat Nov 11 21:05:42 EET 2006
    # smart update;smart upgrade -y
    Loading cache...
    Updating cache... ################################################################### [100%]
    Fetching information for 'SUSE Factory'...
    -> ftp://mirrors.kernel.org/opensuse/distribution/SL-OSS-factory/inst-source/media.1/media
    media ################################################################### [ 100%]
    Updating cache... ################################################################### [100%]
    Channels have no new packages.
    Saving cache...
    Loading cache...
    Updating cache... ################################################################### [100%]
    Computing transaction...
    No interesting upgrades available.

Peter, can you please try whether HAL survives if you delay the start? You can test that by replacing

    startproc -p $HALDAEMON_PID $HALDAEMON_BIN $HALDAEMON_PARA

with

    sleep 5 && tartproc -p $HALDAEMON_PID $HALDAEMON_BIN $HALDAEMON_PARA

in '/etc/init.d/haldaemon'. Thanks!

(In reply to comment #16)
> sleep 5 && tartproc -p $HALDAEMON_PID $HALDAEMON_BIN $HALDAEMON_PARA

Of course, this should read

    sleep 5 && startproc -p $HALDAEMON_PID $HALDAEMON_BIN $HALDAEMON_PARA

Knowing that the desired way to debug hald is --daemon=yes --verbose=yes --use-syslog, I have ignored this because it makes the problem irreproducible. Instead I have changed the startproc invocation to be as follows:

    HALDAEMON_PARA="--daemon=no"
    startproc -l /tmp/hal_output.txt -p $HALDAEMON_PID $HALDAEMON_BIN $HALDAEMON_PARA

You can find my /tmp/hal_output.txt attached. Maybe it's at least a bit useful.

Created attachment 104803 [details]
hal_output.txt
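A captured output file such as /tmp/hal_output.txt can be checked for a fatal assertion with a small helper. This is only a minimal sketch: the function name `hal_log_has_assertion` is hypothetical, and the search pattern is an assumption based solely on the `** ERROR **: ... assertion failed` line quoted in this report, so other failure messages would not be caught.

```shell
#!/bin/sh
# Hypothetical helper, not part of HAL or the init script.
# Prints any fatal-assertion lines found in a captured hald output file.
# Returns 0 if at least one such line is found, 1 if none, 2 if the file
# cannot be read.
# Assumption: failures look like the "** ERROR **: ... assertion failed"
# line quoted in this bug report.
hal_log_has_assertion() {
    log_file="$1"
    [ -r "$log_file" ] || return 2
    grep -E 'ERROR.*assertion failed' -- "$log_file"
}
```

For example, `hal_log_has_assertion /tmp/hal_output.txt && echo "hald aborted on an assertion"` would flag a boot where the daemon died this way.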
** ERROR **: file blockdev.c: line 835 (hotplug_event_begin_add_blockdev): assertion failed: (d_it != NULL)
aborting...

I also see this error in /var/log/messages when it doesn't work.

I have made the change requested in comment #16. As the problem is difficult to reproduce reliably, I can't tell whether it made any difference. I will report if it reoccurs. (Note: the problem occurs most reliably on the first and second reboot after a "smart upgrade". Maybe something starts up a bit slower the first few times after it has been upgraded?)

Just a quick note: in my opinion the Severity of this bug should be upgraded. It causes me major annoyance, but it would send a non-expert Linux user running for another platform if it affected them. As it is, I can't figure out a reliable way to stop it happening or to reproduce it. When it happens, it's possible to reboot 3 or 4 times without fixing it, at which point I usually revert to a manual "ifconfig up" on an ethernet cable. One or two reboots later it usually fixes itself and I return to using wifi as normal.

Forget about smart; it has absolutely and definitely nothing to do with this and just causes confusion here.

Only the engineers should touch the "Severity" field. Be patient; I'm very confident that this report will be handled properly nevertheless.

I think now it's time to wait and see whether the information about the failed assertion in file blockdev.c: line 835 points in the right direction.

Hm ... the g_assert() call is IMO really strange; it is the only case in the whole code where the entire daemon dies because a device could not be found. And the code does not look really secure/safe, because it only tries to get the parent device from the gdl and not from the tdl. That could be a little bit racy. I'll take a look at this.

Danny, we should really make HAL issue such warnings using syslog. It would have spared us a lot of time.
I reported bug 218184 (comment #10). Here it seems that hald isn't crashing anymore since I upgraded to Beta2 with smart...

Could you check whether this still happens with the package from http://beta.suse.com/private/dkukawka/hal/testpackages/hal-0.5.8_git20061106-6/ ?

Danny, do you mean me? I've installed hal-0.5.8_git20061106-4.x86_64 and it's working now.

Marcel, you wrote in comment 26 that it worked for you even with stock Beta2 packages, but not for me. So your information from comment 30 doesn't really apply, sorry.

I'm testing hal-0.5.8_git20061106-5.i586.rpm right now on the very same machine where stock Beta2 had the problem. So far it looks good, but I have rebooted only 5 times and would like to test it more.

Sorry for my English... I just wanted to know whether I should try hal-0.5.8_git20061106-5.x86_64.rpm, even though it has been working since the update to Beta2 with hal-0.5.8_git20061106-4.x86_64... I hate English ;-)

mboman also could no longer reproduce the bug. If the bug occurs again, reopen this bug. I submitted a new package to STABLE.

I have tested the test package hal-0.5.8_git20061106-5.i586.rpm by rebooting the system 30 times after installing it. There was not a single failure. For verification, I downgraded hal to the stock Beta2 package, and then it failed again on the very first attempt. So, assuming that the new hal submission includes the patch from the test package, the bug is fixed.

I have also upgraded to your package, and after 10 reboots I have not yet been able to reproduce a failure. Looks good.

*** Bug 220912 has been marked as a duplicate of this bug. *** |
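The repeated-reboot verification described in the comments above could be tallied with something like the following. This is a rough sketch under stated assumptions, not anything the testers are known to have used: both function names and the results-file format are hypothetical, the daemon's process name is assumed to be `hald`, and `pidof` is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helpers for tallying whether hald survived each boot.
# Assumptions: the process is named "hald" and pidof is available.

# Append one line per boot to a results file: "ok" if the process is
# running at the time of the call, "failed" otherwise.
record_boot_result() {
    result_file="$1"
    process_name="${2:-hald}"
    if pidof "$process_name" >/dev/null 2>&1; then
        echo "ok" >> "$result_file"
    else
        echo "failed" >> "$result_file"
    fi
}

# Print a one-line summary in the form "boots=N ok=N failed=N".
# Returns 2 if the results file cannot be read.
summarize_boot_results() {
    result_file="$1"
    [ -r "$result_file" ] || return 2
    # grep -c exits non-zero when the count is 0, so ignore its status.
    ok=$(grep -c '^ok$' "$result_file" || true)
    failed=$(grep -c '^failed$' "$result_file" || true)
    echo "boots=$((ok + failed)) ok=$ok failed=$failed"
}
```

`record_boot_result` would need to be called once per boot, late enough that hald has either settled or died (for example from a local boot script), and `summarize_boot_results` run after the test series.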