Bugzilla – Full Text Bug Listing
| Summary: | hdd goes to sleep too often - maybe damages disk | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.3 | Reporter: | kolA flash <kolAflash> |
| Component: | Basesystem | Assignee: | E-mail List <bnc-team-screening> |
| Status: | RESOLVED WORKSFORME | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | ||
| Priority: | P5 - None | CC: | auxsvr, duge, forgotten_5bOUleMVRM, forgotten_shd9M8nrvK, jdelvare, rmilasan |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 12.3 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | 99-apm-level.rules; 99-apm-level.rules; 99-apm-level.rules; Script to set APM_level after s2ram; Comment 43 iotop output as textfile; iotop output: iotop_2013-06-18_dirty_expire_centisecs-500_APM-level-128.txt; Those are my current bios settings.; hdparm related files from kubuntu-14.04.1-desktop-amd64 | | |
P.S. We should have a look at external (USB) hard disks too. I haven't tested those yet, but maybe they have the same problem!

P.S. The hard disks in my desktop PC are: WDC WD20EARS-00MVWB0 Caviar Green 2TB. Maybe they go to sleep so often because they are "green"?

Tested with a USB hard disk:
==================
/sbin/hdparm -B /dev/sdb
/dev/sdb:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
APM_level = not supported
==================
The "Load_Cycle_Count" is not rising, except when I switch the hard disk off and on. I'm not sure, but I think whether "hdparm -B" works depends on the USB-to-SATA controller.

This seems to be a nice way to do this for all disks at detection time: http://forums.opensuse.org/english/get-technical-help-here/hardware/482678-hdparm-3.html

/etc/udev/rules.d/99-sda-apm.rules
==================
KERNEL=="sda", ACTION=="add", SUBSYSTEM=="block", RUN+="/usr/sbin/hdparm -B 192 $devnode"
==================
/etc/pm/sleep.d/99sda-apm
==================
#!/bin/bash
. $PM_UTILS_LIBDIR/functions
case "$1" in
thaw|resume)
/usr/sbin/hdparm -B 192 /dev/sda
;;
*)
;;
esac
exit 0
==================

It looks like we can forget about those 3.5 inch "WD20EARS-00MVWB0 Caviar Green" disks in my desktop PC. http://www.howtoeverything.net/linux/hardware/why-some-hard-disks-wont-spin-down-hdparm
==== quote
If you own a "green drive" that comes with built-in power saving features, you might also see "APM_level=not supported" here. In that case you can't set the APM_level manually, but you can still change the spin-down time by using e.g. "sudo hdparm -S180 /dev/sda" to spin down sda after 15 minutes.
====
I also tested putting them into standby with:
hdparm -y /dev/sdX
This makes a clear sound which I don't hear during normal usage while "Load_Cycle_Count" counts up. So at least the disks are not completely spinning down while "Load_Cycle_Count" rises (but maybe they are parking the heads).
So setting the spin-down time using "hdparm -S" also won't help, at least in my case.

No questions? If you need any more information, please tell me! I think this is a critical bug, but I don't have the ability to make changes to openSUSE packages. I'm also just one person, without the capacity to test everything I've seen and every idea I had to fix it. But I've seen the bug on 4 notebooks, and I think it's a "critical" bug because it may break users' hardware and thereby destroy important user data. So please either tell me what more information you need, or at least that you're working on it, so I at least get a "Thanks for the report, we're working on it." Or tell me that I reported total bulls**t and the priority shouldn't be critical but something like "maybe nice to have". Thanks, colAflash

I encountered the same problem with a Lenovo X121e: fast-increasing load/unload cycles due to frequent head parks. This seems to be common behavior for laptop hard disks (and probably some desktop hard disks, too). If I remember correctly, the data sheet specifies 600,000 load/unload cycles, resulting in a lifetime of 1-2 years. I use the package laptop-mode-tools, which allows more suitable settings to be specified in its configuration file, even separately for battery and AC operation. IMHO the OS is not to blame, because the timeout is set by the hard disk itself (or possibly the BIOS), but it would still be a good idea for the user to receive a warning when a problematic disk is detected during installation.

On Windows my notebook hard disk (HTS723232A7A364) doesn't go to sleep so often. This is the data sheet of my hard disk: http://www.hgst.com/tech/techlib.nsf/techdocs/5781663792A88E8B8625772F0082E860/$file/TS_Z7K320_DS_final.pdf It mentions 600,000 load/unload cycles, but I think this is more like a maximum after which the disk is no longer reliable.

Thanks for the report, I'll look into it.
Please check whether you have /usr/lib/pm-utils/power.d/harddrive. It should not be there any more (bnc#663067), but just to be sure. Also, regarding those "green" HDDs, I found an interesting bit in the hdparm manpage:
-J Get/set the Western Digital (WD) Green Drive's "idle3" timeout value. This timeout controls how often the drive parks its heads and enters a low power consumption state.
The factory default is eight (8) seconds, which is a very poor choice for use with Linux. Leaving it at the default will result in hundreds of thousands of head load/unload cycles in a very short period of time.
So this default should be changed in openSUSE out of the box, but I don't know where and how yet.
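To put the manpage's "hundreds of thousands of cycles" into numbers, here is a back-of-the-envelope sketch. It assumes a worst case that the manpage does not state: something wakes the drive again right after every park, so it parks roughly once per timeout period, around the clock. The function name cycles_per_day is made up for illustration.

```shell
#!/bin/bash
# Worst-case head load/unload cycles per day for a given idle (park) timeout.
# Assumption: the drive is woken again right after every park, 24 hours a day,
# so this is an illustrative upper bound, not a measured figure.
cycles_per_day() {   # usage: cycles_per_day TIMEOUT_SECONDS (whole seconds)
    case $1 in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    local timeout=$((10#$1))
    if [ "$timeout" -eq 0 ]; then
        echo "timeout must be positive" >&2
        return 1
    fi
    echo $(( 86400 / timeout ))   # integer division: whole cycles per day
}
```

With the factory default, cycles_per_day 8 prints 10800; at that rate the 600,000 cycles mentioned earlier in the thread would be used up in about 56 days (600,000 / 10,800). Real usage has long busy or powered-off periods, so actual wear is lower, but the order of magnitude shows why the manpage is so blunt.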
I don't have /usr/lib/pm-utils/power.d/harddrive on my notebook or on my PC. What does or did it do, and up to which version of openSUSE?

I don't really understand why the hdparm manpage says: "...is a very poor choice for use with Linux." Isn't it a poor choice for BSD or Windows too? Does Windows change this value? Sometimes I boot Windows 7 on that PC, but the value is still "8" (read via "hdparm -J"). Nevertheless, the manpage says it's not safe to use "-J", so maybe that's why openSUSE shouldn't change this value. Windows also doesn't have a mechanism for that, right?

What about my notebook disk? The APM_level value is adjusted by openSUSE, and on Windows the hard disk doesn't go to sleep so often. So that looks like our fault and should be much easier to fix!

http://idle3-tools.sourceforge.net/ This Linux tool is also about setting the idle3 value for Western Digital disks.

Hi,

(In reply to comment #15)
> I don't have
> /usr/lib/pm-utils/power.d/harddrive
> on my notebook and on my pc. What should it do or did it do, until which
> version of openSUSE?

It was removed in 11.4. It had been used for setting different hdparm parameters on battery/AC, which was more dangerous than useful.

> I don't really understand why the hdparm manpage says: ...is a very poor
> choice for use with Linux.
> Isn't it a poor choice for BSD or Windows too? Does Windows change this value?
> Sometimes I boot Windows 7 on that pc, but the value is still at "8" (got the
> value via "hdparm -J").

I think that window$ filesystems don't access the disk as often as Linux ones (for example, journal writes by ext3/4 wake the disk every few seconds, I guess), so they don't wake it so soon after it is put to sleep. I don't know what to think apart from that. I'll add Robert to CC; I heard he has some experience with this. Robert, what do you think we should do about this?

Hi, sorry for the delay, but I lost track of the bug for a bit.
Anyway, I'm not sure either what to say. I guess the reporter is right and this should be fixed (so we won't break disks), but something has to do it. Who sets this value? Is it pm-utils or something else? Is this only openSUSE, or every distro out there? If not, how did the other distros fix this? So far I've only seen what's going on, but I didn't get from this bug who or what sets the value. Is it the kernel, based on the disk/BIOS?

BTW, I forgot to add: are we sure we don't have the same issue in 12.2/12.1?

> Who sets this value? Is this from pm-utils or what? Is this only openSUSE or
> any distro out there. If not, how the other distros fixed this?

This has nothing to do with pm-utils. The manufacturer sets this value when making the drive. I have found nothing about this being fixed in other distros apart from this Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/607560 It says that the journal had been written too often and that this was fixed in kernel 2.6.something.

Guide to increasing the head parking timeout for WD drives: http://www.storagereview.com/how_to_stop_excessive_load_cycles_on_the_western_digital_2tb_caviar_green_wd20ears_with_wdidle3 (Requires a DOS tool provided by WD, wdidle. hdparm does the same, but its manpage says it's better to use the DOS tool.)

Btw, maybe someone from SUSE could talk to WD and the hdparm upstream maintainer and arrange for WD to help make this tweak possible in Linux too?

But this happened on some laptops too; manufacturers tend to produce HDDs with aggressive head parking times, and the WD Caviar Green is not the only one.

Ideas for possible solutions:
1. Write an ugly hack script (probably a systemd unit?) that reads the HDD head parking time and sets it to something reasonable if it's too short. It would have to run during boot and probably on resume. Seems like something we should avoid.
2. Fix all the stuff that accesses the disk unnecessarily or too often, like the journal, syslog, etc.
I don't know if this is possible at all.

Please assign it to someone else. All I do is maintain hdparm, and there is nothing more I can do apart from gathering this information and throwing in some ideas, thanks.

> Btw maybe someone from SUSE could talk to WD and hdparm upstream guy and
> arrange that WD helps with making this tweak possible in linux too?

Oh, I forgot that Moritz mentioned idle3-tools before (http://idle3-tools.sourceforge.net/), so I take this one back :)

Sorry for that, but it's probably my fault:
=== THIS IS ABOUT TWO DIFFERENT BUGS! ===
I probably should have filed two bug reports from the beginning. So should we split them now? Please do so if you think it's a good idea (I'm not very experienced in bug reporting).

BUG 1: I think a good fix is easy to find.
My notebook HDD, and other notebook HDDs I saw, park their heads because of the "APM_level", which can be safely fixed using /sbin/hdparm. I think openSUSE is responsible for this value and sets it at some point while booting and when coming back from suspend-to-disk (for more details see my first message). Also, changing this value won't influence the disk when it is not running openSUSE, because (as far as I know) the value resets after a power cycle.

BUG 2: I think a good fix is hard to find.
My WD Green HDDs park their heads after 8 seconds because of the Green-specific "idle3" value. This value seems to be a factory setting of those disks. On Windows it's not a problem, because Windows doesn't write back so often when idle. But Linux does a lot more writebacks, so the HDD goes to sleep after 8 seconds and then wakes up again because of some writeback. Changing the "idle3" value on the HDD is a difficult choice, because it's a factory default stored in the HDD and a change persists across power cycles. A maybe better solution would be to extend the time until writeback in openSUSE. But I feel that's not so easy, and it also brings the disadvantage of lost writes in case of power loss.
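Whether a longer writeback delay helps can be reasoned about with a deliberately simplified model. It assumes (not from the kernel or any drive spec) that dirty data is flushed exactly every N seconds, nothing else touches the disk, and the drive parks after a fixed idle timeout. Note that the kernel's /proc/sys/vm/dirty_expire_centisecs is in hundredths of a second, so a value of 3000 means 30 s. The function name parks_per_hour is hypothetical.

```shell
#!/bin/bash
# Simplified model (an assumption): dirty data is flushed exactly every
# WRITEBACK seconds, nothing else touches the disk, and the drive parks its
# heads after IDLE_TIMEOUT seconds without I/O. Each flush that finds the
# heads parked costs one load/unload cycle.
parks_per_hour() {   # usage: parks_per_hour WRITEBACK_SECONDS IDLE_TIMEOUT_SECONDS
    local wb=$1 idle=$2
    if [ "$wb" -le 0 ]; then
        echo "writeback interval must be positive" >&2
        return 1
    fi
    if [ "$wb" -gt "$idle" ]; then
        echo $(( 3600 / wb ))   # heads park in every gap: one cycle per flush
    else
        echo 0                  # the next flush always arrives before the timeout
    fi
}
```

Under this model, doubling the delay from 30 s to 60 s with an 8 s timeout only halves the cycles (parks_per_hour 30 8 gives 120, parks_per_hour 60 8 gives 60), and the count only drops to zero when flushes come more often than the drive's timeout. Real workloads are not periodic, as the iotop output further down shows, so treat these numbers as orders of magnitude only.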
Hi, regarding the first bug, I believe openSUSE does not set that, and 128 is the default (see this Arch article: https://wiki.archlinux.org/index.php/Hdparm). We could create a udev rule as the article mentions and always set it to 254, since the default (128) doesn't make sense when the disk wakes up again so soon anyway. Robert, could you please decide whether we should do it, and then do it? Thanks... And about the second bug, you may try to open it, why not, if you believe that frequent writebacks are a bug. Also, they might not be the only thing accessing and waking the drive often.

When I set my APM_level to 192, the HDD stopped parking its heads all the time. I'm not sure what 192 really does, but I hope it keeps some power-saving features but doesn't park the heads anymore. Does anyone know more about that? So maybe we should use 192 as the value, not 254. But I'm not sure that's a good decision for all HDDs. An additional question: if we set a udev rule, will it also set the APM_level after s2ram and s2disk? Because after s2ram/s2disk, my HDD's APM_level resets to 128 (I put a script into /etc/pm/sleep.d/ to work around this).

Are we 100% sure that setting the disk to 192 is a good idea for most disks? There are disks (like in my case) that don't even have the APM_level, so for this type of disk the setting is just ignored, I suppose; but for those that support it, is it OK? Anyway, if someone would like to, please test this rule:
SUBSYSTEM!="block", GOTO="apm_level_end"
ACTION!="add|change", GOTO="apm_level_end"
KERNEL=="sr*", GOTO="apm_level_end"
KERNEL=="sd*[!0-9]|hd*[!0-9]", RUN+="/sbin/hdparm -B 192 /dev/$name"
LABEL="apm_level_end"
Here I'm not sure whether ACTION should be both 'add' and 'change'. 'add' usually happens when the disk is added or at boot time, and 'change' any time something changes. Maybe this rule would be better:
SUBSYSTEM!="block", GOTO="apm_level_end"
ACTION!="add|change", GOTO="apm_level_end"
KERNEL=="sr*", GOTO="apm_level_end"
KERNEL=="sd*[!0-9]|hd*[!0-9]", ATTR{queue/rotational}=="1", RUN+="/sbin/hdparm -B 192 /dev/$name"
LABEL="apm_level_end"
This adds another check on the disk, i.e. whether it really is a spinning disk or an SSD. If queue/rotational is 0, the disk is presumably an SSD, so we don't need to change its APM_level.
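Before installing the rule, it can help to see which disks it would actually touch. This is a minimal sketch that mirrors the rule's KERNEL and ATTR{queue/rotational}=="1" conditions without calling hdparm. It assumes that /sys/block lists whole disks only (which is why it needs no [!0-9] partition filter); the SYSFS_ROOT override and the function name rule_targets are hypothetical, added only so the logic can be tried on a fake tree.

```shell
#!/bin/bash
# Preview which block devices the proposed 99-apm-level rule would act on.
# Assumption: /sys/block lists whole disks only, so no partition filter is needed.
# SYSFS_ROOT is a hypothetical override for trying this on a fake sysfs tree.
sysfs_root="${SYSFS_ROOT:-/sys}"

rule_targets() {
    local dev rot
    for dev in "$sysfs_root"/block/sd* "$sysfs_root"/block/hd*; do
        # Skip unmatched globs and devices without queue information.
        [ -r "$dev/queue/rotational" ] || continue
        read -r rot < "$dev/queue/rotational"
        # Same test as ATTR{queue/rotational}=="1": SSDs report 0 and are skipped.
        if [ "$rot" = "1" ]; then
            echo "${dev##*/}"
        fi
    done
}
```

Calling rule_targets on a real system should list the spinning disks (for example sda on a single-HDD notebook); SSDs and optical drives (sr*) should not appear.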
Created attachment 544475 [details]
99-apm-level.rules
This is the final rule, which I've tested and which seems to do the job. Please test it and let me know if it works. Maybe the name 99-apm-level.rules is not the best, but it's temporary.
Another app that people with WD Green disks which don't support APM could try: http://hd-idle.sourceforge.net/ Maybe we should add this to openSUSE as well.
Created attachment 544480 [details]
99-apm-level.rules
Just cleaned up some mistakes I made.
I filed another bug report about the "Western Digital" idle3 issue (the one I described as the second bug in my comment 22). Please use that bug report for all further discussion about it: https://bugzilla.novell.com/show_bug.cgi?id=825461 This bug report will be used for further discussion about how to set the APM_level on ordinary HDDs (what Robert Milasan just wrote about).
Created attachment 544515 [details]
99-apm-level.rules
Another rework. These rules should now also work at boot time.
Created attachment 544516 [details]
Script to set APM_level after s2ram
@Robert Milasan
I tested your udev rule (attachment id=544480). After boot APM_level is set to 192.
But after returning from s2ram and s2disk, the APM_level is back to 128. I tried both the standby button on my ThinkPad and the KDE buttons to trigger s2ram and s2disk.
As I wrote before, I created a script /etc/pm/sleep.d/99hdparm which sets the value again after s2ram and s2disk. Maybe this is the only way?
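To check whether the value really survived a resume, the level can be read back from "hdparm -B" output. A small sketch, assuming the "APM_level = ..." line format shown in this report; the function names apm_level and apm_is are made up for illustration.

```shell
#!/bin/bash
# Print the value after "APM_level =" from "hdparm -B" output on stdin,
# e.g. "128" or "not supported".
apm_level() {
    sed -n 's/^[[:space:]]*APM_level[[:space:]]*=[[:space:]]*//p'
}

# Succeed only if the reported level equals the expected one.
apm_is() {   # usage: hdparm -B /dev/sda | apm_is 192
    [ "$(apm_level)" = "$1" ]
}
```

After resuming, something like /usr/sbin/hdparm -B /dev/sda | apm_is 192 || echo "APM_level was reset" shows at a glance whether the sleep hook ran.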
Did you try the last attached rule (comment #31)? I made some mistakes in the other ones. Please also tell me exactly how you do s2ram and s2disk, so I can test it. I've got a small laptop/netbook that I'm testing with. OK, on s2disk the APM_level is reset to 254, which is weird, but maybe there is no kernel 'add' or 'change' event when coming back from s2disk. I still don't know for sure why, but it seems to be that way.

@Robert Milasan Just tried attachment 544516 [details] from comment 31. But I couldn't see any change in behavior compared to attachment 544480 [details] (I did see the changes in the file). If I suspend my notebook, the APM_level goes to 128, not to 254. There seems to be some other difference from your notebook.

Ways to s2ram my notebook. These 3 behave the same:
- Press FN + F3 (F3 marked with a blue moon) on my ThinkPad keyboard.
- Click suspend in my KDE menu.
- Execute: sudo /usr/sbin/pm-suspend
After resuming from s2ram the APM_level is 128, but my pm script successfully fixes that.

Behaves a little differently:
sudo /usr/sbin/s2ram
After resuming from s2ram the APM_level is 128. But it looks like my pm script is NOT run, so it doesn't fix the APM_level.

Can you try to create a script in /etc/pm/sleep.d/99disks and add the following:
#!/bin/bash
case "$1" in
hibernate|suspend)
#nothing
;;
thaw|resume)
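# send a "change" uevent to each disk so the udev APM rule runs again
# (loop: a redirect to a glob matching several disks is an "ambiguous redirect" in bash)
for ev in /sys/block/[hs]d*/uevent; do
echo 'change' > "$ev"
done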
;;
*)
;;
esac
exit 0
Then test s2disk/s2ram and let me know. Of course, remove the 99hdparm script first.
Regarding s2ram, I'm not sure exactly what is supposed to be run; maybe the script needs to be in /usr/lib/pm-utils/sleep.d, but I have no clue.
It looks like the /etc/pm/sleep.d/99disks script works for s2ram and s2disk, except when using /usr/sbin/s2ram. But I guess that's because this tool doesn't care about the pm scripts.

Most probably, but it's good that it works. Now let's talk to Vojtech to see what's the best way to implement this. @Vojtech: please check the ideas we implemented here and let me know what you think would be the best way to continue. The rules and scripts seem to work, but maybe you have better ideas on this.

I don't want to add another pm-utils script, because Cristian and Frederic want to drop pm-utils soon and they would be angry about this. Instead, I would put this:
#!/bin/bash
[ "$1" = "post" ] || exit 0
for ev in /sys/block/[hs]d*/uevent; do echo change > "$ev"; done
into /usr/lib/systemd/system-sleep. Apart from that, this should do the trick, I think.

Just a few random comments...

I've seen conflicting reports about how Windows behaves. Some claim that the disk doesn't sleep so often under Windows, others claim that the disk doesn't wake up so often under Windows. Which is true? We need to know what Windows really does. Does it arbitrarily change the power setting values of all disks? Or does it simply not wake up the disks so often? If Linux wakes the disk more often than Windows does, then tweaking the hdparm -B value is the wrong fix.

On a laptop with a charged battery (or a desktop/server machine powered by a charged UPS, for that matter) the risk of losing unwritten disk data is rather low. So it makes a lot of sense to ask the filesystems/kernel to leave the disk alone and only write to it when they really have to. Power savings should only be disabled when on AC power with no battery/UPS, or when on battery/UPS and the energy level is getting low.

The rationale for aggressive power management in general, and disk head parking in particular, is twofold: it saves power and, on laptops, it prevents disk damage if shocks occur.
Disabling power savings unconditionally will change the negative comparison from "Linux killed my disk due to high Load Cycle Count, Windows is better" to "battery lasts longer under Windows than under Linux" and "Linux fails to park the disk's head and I lost my data in a shock because of this." I don't think this is what we want.

Now I'm not saying that anyone involved in this bug has the power or energy to make it happen; I certainly won't, I'm busy enough as is. But ideally what is needed is better cooperation between the Linux kernel, the hardware, and the power management policies. Disabling power savings altogether is only a temporary band-aid.

I would also add that some power saving settings can be misleading. For example, on disks which go to sleep very often (every 8 seconds, for example), the default value of /proc/sys/vm/dirty_expire_centisecs (30 seconds) makes no sense. When data is being written to the disk, the disk will wake up every 30 seconds, go to sleep after 8 seconds, wake up again after 22 seconds, etc. This is counterproductive, as the power saved while sleeping is lost when waking up / unparking / parking, and performance suffers as well. While you would think that increasing dirty_expire_centisecs is the standard way to save more power, here the proper solution would be to lower dirty_expire_centisecs to, say, 5 seconds. This would ensure that the disk only goes to sleep once all data has been written to it. OTOH, increasing dirty_expire_centisecs would not help significantly unless you make it insanely large (say 30 minutes), but then the risk of data loss would increase, and performance might suffer when the writeback finally triggers. As a last note, it is unfortunate that the dirty settings are system-wide. Being able to tune them per disk would be quite useful, methinks.

@Jean Delvare I think that's right. I also want my notebook to consume as little power as possible!
Some time ago I did a power measurement with powertop. I activated the HDD and then put it to sleep with "hdparm -y /dev/sda".
HD active: 7350 mA
HD standby: 6650 mA
These values are not totally exact, because they were actually oscillating a little. Nevertheless, as long as there's no other way to prevent so many load cycles, the APM_level should be set to 192 or 254. Parking and unparking the HDD so many times will certainly break disks and cause data loss!

By the way: the 8 seconds were related to the other issue, which I split off into bug 825461. For this issue I observed times between 5 and 20 seconds (varying).

I put my HDD back to an APM_level of 128 for testing. This is what happened in about 90 seconds. Meanwhile, I just did some scrolling in the terminal window. My desktop was running KDE, with a minimized Firefox and Okular (PDF viewer).
I wanted to repeat the test with a lower dirty_expire_centisecs value. But the problem didn't appear (still testing with the same dirty_expire_centisecs value of 3000, my system default). The HDD didn't go to sleep even once in about 25 seconds, and there was no I/O (according to iotop).
This makes me think something else is wrong at a lower level. The hdparm manpage says about the "-B" option:
----
Possible settings range from values 1 through 127 (which permit spin-
down), and values 128 through 254 (which do not permit spin-down).
----
So there should be no spin-down at all, as long as I don't go below 128. Or maybe this is about stopping the disk rotation, and Load_Cycle_Count is about parking the heads without stopping the disk? I don't know...
But every time Load_Cycle_Count increases I hear a click. So I think there is some physical degradation for sure! And there's some degradation of my ears and head from that clicking...
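The quoted ranges are about spin-down only; head parking, which is what Load_Cycle_Count counts, is a separate matter (confirmed later in the thread). A tiny helper that labels a -B value by those ranges, plus 255, which the hdparm manpage documents as disabling APM altogether on drives that allow it. The function name apm_describe is illustrative.

```shell
#!/bin/bash
# Label an hdparm -B (APM) value using the ranges from the hdparm manpage:
# 1-127 permit spin-down, 128-254 do not, 255 disables APM (if the drive allows it).
# This says nothing about head parking, which is what Load_Cycle_Count counts.
apm_describe() {   # usage: apm_describe LEVEL
    case $1 in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    local level=$((10#$1))
    if [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
        echo "spin-down permitted"
    elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
        echo "no spin-down"
    elif [ "$level" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid"
        return 1
    fi
}
```

Both 128 and 192 come out as "no spin-down", which is consistent with the observation above: the drive keeps spinning at 128, yet it can still park its heads.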
Altogether I'm very satisfied with my battery power and I don't think it became worse since I switched to APM_level 192.
Some more confusion: hdparm manpage says:
-S Put the drive into idle (low-power) mode, and also set the standby (spindown)
timeout for the drive. This timeout value is used by the drive to determine
how long to wait (with no disk activity) before turning off the spindle motor
to save power. Under such circumstances, the drive may take as long as 30
seconds to respond to a subsequent disk access, though most drives are much
quicker. The encoding of the timeout value is somewhat peculiar. A value of
zero means "timeouts are disabled": the device will not automatically enter
standby mode.
How does this value interact with the -B APM_level? And, as far as I can see, there's no way to read the -S value from the drive, only a way to set it.
However we decide, I think openSUSE should provide an easier way to adjust this value (together with a warning for the user to be careful). Currently it's not easy to set this value in openSUSE persistently (including after s2ram). Maybe there should be a configuration value in /etc/sysconfig/ide, or somewhere else where it's easily adjustable, so users can change it if the HDD is "clicking" too often.
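The manpage calls the -S encoding "somewhat peculiar". As we read the hdparm manpage (worth double-checking against the installed version, which also warns that some older drives interpret the values differently): 0 disables the timeout, 1-240 are multiples of 5 seconds, 241-251 are multiples of 30 minutes, 252 is 21 minutes, 253 is a vendor-defined 8 to 12 hours, 254 is reserved and 255 is 21 minutes 15 seconds. The sketch below decodes a value you would pass to -S into seconds; it cannot read the drive's current setting. The function name standby_seconds is hypothetical.

```shell
#!/bin/bash
# Convert an hdparm -S value into a standby timeout in seconds, following the
# encoding described in the hdparm manpage (as we read it; older drives may differ).
# Prints "disabled", "vendor-defined" or "reserved" for the special values.
standby_seconds() {   # usage: standby_seconds VALUE
    case $1 in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    local v=$((10#$1))
    if   [ "$v" -eq 0 ];   then echo "disabled"
    elif [ "$v" -le 240 ]; then echo $(( v * 5 ))              # 5 s units: 5 s .. 20 min
    elif [ "$v" -le 251 ]; then echo $(( (v - 240) * 1800 ))   # 30 min units: 30 min .. 5.5 h
    elif [ "$v" -eq 252 ]; then echo 1260                      # 21 minutes
    elif [ "$v" -eq 253 ]; then echo "vendor-defined"          # 8 to 12 hours
    elif [ "$v" -eq 254 ]; then echo "reserved"
    elif [ "$v" -eq 255 ]; then echo 1275                      # 21 min 15 s
    else echo "invalid"; return 1
    fi
}
```

For example, standby_seconds 180 prints 900, matching the "-S180 ... after 15 minutes" example quoted earlier in this report.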
==============================
======== test results ========
// My system default
# cat /proc/sys/vm/dirty_expire_centisecs
3000
# date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Di 18. Jun 21:09:12 CEST 2013
193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115543
# date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Di 18. Jun 21:10:44 CEST 2013
193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115555
// deleted the lines without i/o
# sudo iotop -o --time
TIME TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
21:09:13 Total DISK READ: 0.00 B/s | Total DISK WRITE: 46.89 K/s
21:09:13 339 be/3 root 0.00 B/s 3.91 K/s 0.00 % 5.13 % [jbd2/dm-2-8]
21:09:13 23593 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:1]
21:09:13 17043 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/1:1]
21:09:16 Total DISK READ: 0.00 B/s | Total DISK WRITE: 418.97 K/s
21:09:18 Total DISK READ: 0.00 B/s | Total DISK WRITE: 15.66 K/s
21:09:18 347 be/4 root 0.00 B/s 0.00 B/s 0.00 % 34.62 % [flush-253:2]
21:09:23 Total DISK READ: 0.00 B/s | Total DISK WRITE: 35.24 K/s
21:09:23 339 be/3 root 0.00 B/s 11.75 K/s 0.00 % 33.98 % [jbd2/dm-2-8]
21:09:31 1908 be/4 moritz 0.00 B/s 3.92 K/s 0.00 % 0.00 % kdeinit4: plasma-desktop [kdeinit]
21:09:36 Total DISK READ: 0.00 B/s | Total DISK WRITE: 7.83 K/s
21:09:37 Total DISK READ: 0.00 B/s | Total DISK WRITE: 66.57 K/s
21:09:37 717 be/3 root 0.00 B/s 3.92 K/s 0.00 % 33.40 % [jbd2/dm-1-8]
21:09:38 Total DISK READ: 0.00 B/s | Total DISK WRITE: 11.74 K/s
21:09:38 339 be/3 root 0.00 B/s 3.91 K/s 0.00 % 5.59 % [jbd2/dm-2-8]
21:09:45 1246 be/4 root 0.00 B/s 3.91 K/s 0.00 % 0.00 % nmbd -D -s /etc/samba/smb.conf
21:09:46 373 be/4 root 0.00 B/s 58.72 K/s 0.00 % 0.00 % systemd-journald
21:09:46 992 be/4 root 0.00 B/s 3.91 K/s 0.00 % 0.00 % rsyslogd -n
21:09:47 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.91 K/s
21:09:48 373 be/4 root 0.00 B/s 11.75 K/s 0.00 % 0.00 % systemd-journald
21:09:50 Total DISK READ: 0.00 B/s | Total DISK WRITE: 50.90 K/s
21:09:51 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.91 K/s
21:09:51 339 be/3 root 0.00 B/s 46.96 K/s 0.00 % 35.07 % [jbd2/dm-2-8]
21:09:52 373 be/4 root 0.00 B/s 15.66 K/s 0.00 % 0.00 % systemd-journald
21:09:58 Total DISK READ: 0.00 B/s | Total DISK WRITE: 31.32 K/s
21:09:58 339 be/3 root 0.00 B/s 11.75 K/s 0.00 % 34.38 % [jbd2/dm-2-8]
21:10:00 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.92 K/s
21:10:07 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.92 K/s
21:10:07 717 be/3 root 0.00 B/s 0.00 B/s 0.00 % 32.06 % [jbd2/dm-1-8]
21:10:11 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.92 K/s
21:10:12 Total DISK READ: 0.00 B/s | Total DISK WRITE: 62.64 K/s
21:10:17 Total DISK READ: 0.00 B/s | Total DISK WRITE: 86.14 K/s
21:10:18 Total DISK READ: 0.00 B/s | Total DISK WRITE: 11.75 K/s
21:10:18 347 be/4 root 0.00 B/s 0.00 B/s 0.00 % 37.28 % [flush-253:2]
21:10:20 Total DISK READ: 0.00 B/s | Total DISK WRITE: 3.92 K/s
21:10:21 23789 be/4 root 0.00 B/s 0.00 B/s 0.00 % 41.87 % udisksd --no-debug
21:10:21 23786 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0]
21:10:22 Total DISK READ: 0.00 B/s | Total DISK WRITE: 23.48 K/s
21:10:23 339 be/3 root 0.00 B/s 15.66 K/s 0.00 % 5.24 % [jbd2/dm-2-8]
21:10:25 Total DISK READ: 0.00 B/s | Total DISK WRITE: 23.57 K/s
21:10:25 717 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.07 % [jbd2/dm-1-8]
21:10:26 Total DISK READ: 0.00 B/s | Total DISK WRITE: 349.60 K/s
21:10:26 717 be/3 root 0.00 B/s 39.28 K/s 0.00 % 9.34 % [jbd2/dm-1-8]
21:10:26 19604 be/4 moritz 0.00 B/s 310.32 K/s 0.00 % 0.16 % firefox
21:10:30 Total DISK READ: 0.00 B/s | Total DISK WRITE: 11.79 K/s
21:10:30 339 be/3 root 0.00 B/s 3.93 K/s 0.00 % 5.31 % [jbd2/dm-2-8]
21:10:31 Total DISK READ: 0.00 B/s | Total DISK WRITE: 55.00 K/s
21:10:31 717 be/3 root 0.00 B/s 11.79 K/s 0.00 % 4.61 % [jbd2/dm-1-8]
21:10:31 8794 be/4 moritz 0.00 B/s 3.93 K/s 0.00 % 0.00 % okular /home/moritz/Moritz/main.lecture_notes.pdf --icon okular -caption Okular
21:10:38 Total DISK READ: 0.00 B/s | Total DISK WRITE: 11.79 K/s
21:10:38 339 be/3 root 0.00 B/s 3.93 K/s 0.00 % 5.39 % [jbd2/dm-2-8]
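The manual "date; smartctl ... | grep Load_Cycle_Count" pairs above can be automated. A minimal sketch, assuming the usual smartctl attribute table layout where RAW_VALUE is the 10th column (as in the lines above); the function names and the log file name in the usage note are made up for illustration.

```shell
#!/bin/bash
# Print the raw Load_Cycle_Count from "smartctl -A" output on stdin.
# Assumes the usual attribute table, where RAW_VALUE is the 10th column
# (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw).
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10; exit }'
}

# Print "<unix time> <count>" for one disk (smartctl usually needs root).
log_reading() {   # usage: log_reading /dev/sda
    local dev=${1:-/dev/sda}
    printf '%s %s\n' "$(date +%s)" "$(/usr/sbin/smartctl -A "$dev" | load_cycle_count)"
}
```

Running while sleep 60; do log_reading /dev/sda; done >> load_cycles.log collects timestamped counts, so the rate can be judged over hours instead of a single 90-second window.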
Created attachment 544595 [details]
Comment 43 iotop output as textfile

I tried putting a wrapper around hdparm by renaming hdparm to hdparm.org and putting a sh script at /usr/sbin/hdparm. The sh script logged whenever something called hdparm (and then called the real hdparm with the same parameters). Then I rebooted and tried s2ram. It looks like hdparm isn't called from anywhere else. So the APM_level of 128 after s2ram and after a reboot must be either:
- set by another tool,
- set by the kernel, or
- the HDD's default.

Created attachment 544600 [details]
iotop output: iotop_2013-06-18_dirty_expire_centisecs-500_APM-level-128.txt
Tested again with APM_level 128, and the issue is back. This time I used a dirty_expire_centisecs of 500 (starting at 22:44:18).
# /usr/sbin/hdparm -B /dev/sda
/dev/sda:
APM_level = 128
# date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Di 18. Jun 22:44:58 CEST 2013
# echo 500 > /proc/sys/vm/dirty_expire_centisecs
# date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Di 18. Jun 22:45:18 CEST 2013
193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115636
# date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Di 18. Jun 22:46:25 CEST 2013
193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115651
I also agree with Jean on this, and I would prefer to close this as RESOLVED/WONTFIX or RESOLVED/INVALID. Playing with this setting won't end well. If some user believes that implementing the rule and script I wrote before will help him/her, then fine, but I don't see fully implementing this in 12.3 as viable.

Note that hdparm -y leads to an even more aggressive power saving mode. It not only parks the head but also typically spins down the motor. A more accurate measurement would be with -B 128, comparing the maximum and minimum values with CPU and graphics idle (the HDD would then be the only thing that changes, from active idle to low power idle).

I am curious where you get the mA values from; powertop doesn't show anything like that on my ThinkPad X230. I can only get values from /proc/acpi/battery/BAT0/state, and only when on battery. mA values aren't very useful anyway if you don't know the voltage. Is it 11.1 V on your system, as it is on mine? Oddly enough, despite the 11.1 V nominal battery voltage, ACPI claims 12.3 V for me, so I don't know if I can trust it. 7000 mA seems way too much to me for an idle laptop. At 11-12 V this is about 80 W, which is rather unrealistic. My X230 uses 9.5 W idle with the internal display off, measured at the power plug. Either way, mW values from /proc/acpi/battery/BAT0/state (or from an external wattmeter, as I use) are more interesting IMHO, because you can compare them with the drive vendor's specs. For your drive they say:
* Performance idle 1.7 W
* Active idle 1.0 W
* Low power idle 0.8 W
I admit I have no idea what performance idle is supposed to be, but I suppose the disk doesn't spend much time in this state, as it consumes almost as much as during reads and writes. So basically I think we're talking about a 200 mW difference in your case. I am curious whether you would find the same difference by measurement.

@Jean Delvare I have to be on battery (no AC) to get power consumption values.
I just found out that I didn't get those 7350 mA and 6650 mA values from powertop but from KDE's System Monitor (ksysguard). It has a sensor called "acpi/battery/0/batteryusage", and its unit is mA. It always shows nearly the same value as powertop (example: powertop shows 17.2 W and ksysguard shows 17117 mA); ksysguard just updates the value more frequently. I don't know why ksysguard talks about mA. I know that's something completely different from W (https://en.wikipedia.org/wiki/Watt https://en.wikipedia.org/wiki/Ampere). But the values I wrote should be something like:
HD active: 7.3 W
HD standby: 6.6 W
In case you wonder: I really did nothing on my notebook for about 10 minutes to get it down to 6.6 W. I waited until the value stopped fluctuating before I started testing the HDD power consumption.

@Robert Milasan I understand that this isn't something that feels nice. But consider: I have seen this behavior on all notebooks running openSUSE that I checked:
Thinkpad x220
Thinkpad x220t (Tablet)
R-Series Thinkpad (about 8 years old)
L-Series Thinkpad
This tells me this issue isn't rare! And it's an issue that breaks hardware!!! So it should be fixed!

One more note: I was using laptop-mode-tools for some time after I bought my notebook. After upgrading to openSUSE 12.3 I noticed that laptop-mode-tools no longer gave me any power savings (on openSUSE 12.2 it still saved a lot). So I uninstalled laptop-mode-tools, and after that the HDD clicking issue began. So on notebooks with laptop-mode-tools running, this bug may not appear.

(In reply to comment #43)
> So there shouldn't be no spindown at all, as long as I don't go below 128. Or
> maybe this is about stopping the disk rotation and Load_Cycle_Count is about
> parking the head without stopping the disk? I don't know...

I confirm that Load_Cycle_Count is about head parking and not about spinning the motor down/up.
I think the attribute for that is Start_Stop_Count.
> But every time Load_Cycle_Count increases I hear a click-sound. So I think
> there is some physical degeneration for sure! And there's some degeneration in
> my ear and head by that clicking...
Yes, it certainly causes hardware wear-out, although you should keep in mind that these laptop / green drives are _meant_ to park / unpark their heads relatively often, and are thus much more robust in this regard than traditional desktop drives or older drives. So these cycles should be limited, but avoiding them completely is not the goal.
> Altogether I'm very satisfied with my battery power and I don't think it
> became worse since I switched to APM_level 192.
If I can trust the datasheet and read it correctly, the maximum loss would be around 0.2 W, which would be about 1% of your system's total power consumption. I wouldn't expect you to notice the difference unless you try very hard to measure it. However, the savings may be larger for other drives, and the percentage may be even higher on other mobile devices. This is why the decision can't be made arbitrarily by the OS.
> Some more confusion: hdparm manpage says:
> -S Put the drive into idle (low-power) mode, and also set the standby
> (spindown)
> timeout for the drive. This timeout value is used by the drive to determine
> how long to wait (with no disk activity) before turning off the spindle motor
> to save power. Under such circumstances, the drive may take as long as 30
> seconds to respond to a subsequent disk access, though most drives are much
> quicker. The encoding of the timeout value is somewhat peculiar. A value of
> zero means "timeouts are disabled": the device will not automatically enter
> standby mode.
> How does this values interact with the -B APM_level?
I'm not sure, but I think -B operates at a higher level, specifying a general performance/power balance, while -S, -y and -Y are lower level, possibly IDE-specific.
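The "somewhat peculiar" -S encoding quoted above, and the meaning of the -B ranges, are spelled out in the hdparm(8) man page. The following sketch decodes both as documented there (worth double-checking against the man page of the installed hdparm version); the function names are made up for illustration. Note that "no spin-down" for -B 128-254 says nothing about head parking, which is what Load_Cycle_Count counts.

```shell
#!/bin/sh
# Hypothetical decoders for hdparm -S and -B values, per the hdparm(8) man page.

# standby_timeout_seconds N: translate an hdparm -S value into seconds.
standby_timeout_seconds() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "invalid value: $n" >&2; return 1 ;;
    esac
    if [ "$n" -eq 0 ]; then
        echo 0                            # timeouts disabled
    elif [ "$n" -le 240 ]; then
        echo $(( n * 5 ))                 # 1..240: multiples of 5 seconds
    elif [ "$n" -le 251 ]; then
        echo $(( (n - 240) * 1800 ))      # 241..251: units of 30 minutes
    elif [ "$n" -eq 252 ]; then
        echo 1260                         # 21 minutes
    elif [ "$n" -eq 253 ]; then
        echo vendor                       # vendor-defined, 8 to 12 hours
    elif [ "$n" -eq 254 ]; then
        echo reserved
    elif [ "$n" -eq 255 ]; then
        echo 1275                         # 21 minutes plus 15 seconds
    else
        echo "out of range: $n" >&2; return 1
    fi
}

# apm_level_class N: classify an hdparm -B value.
# "no-spindown" does not rule out head parking (Load_Cycle_Count).
apm_level_class() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo invalid; return 1 ;;
    esac
    if [ "$n" -ge 1 ] && [ "$n" -le 127 ]; then
        echo spindown-allowed
    elif [ "$n" -ge 128 ] && [ "$n" -le 254 ]; then
        echo no-spindown
    elif [ "$n" -eq 255 ]; then
        echo apm-disabled
    else
        echo invalid
        return 1
    fi
}
```

For instance, `standby_timeout_seconds 180` prints 900, i.e. the 15 minutes of `hdparm -S180`.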
> And, as far as I can see,
> there's no way to get the -S value from the drive. Just a way to set it.
Correct, my experience is the same :-( I'd say this is a good reason to leave it alone and use -B instead.
> How ever we decide: I think openSUSE should provide an easier possibility to
> adjust this value (together with a warning for the user to be careful).
I agree. While ideally the user shouldn't have to care about these details, in practice this seems to be a recurring issue. So while I don't think openSUSE should arbitrarily change the default, an easy way to let the user change it (e.g. through sysconfig) would be welcome... At least until a better solution is found and implemented.
> # date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
> Di 18. Jun 21:09:12 CEST 2013
> 193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115543
> # date; /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
> Di 18. Jun 21:10:44 CEST 2013
> 193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 115555
This is +12 in 90 seconds, i.e. one cycle every 7.5 seconds. Note that this is between dirty_expire_centisecs (30 s) and dirty_writeback_centisecs (5 s). For comparison, I get +0 for my idle laptop (X230) and also +0 for my busy desktop (WD Green). I'll test the desktop again when the drive is idle; I expect the same results as yours then.
(In reply to comment #45)
> It looks like hdparm isn't used from somewhere else. The APM_level of 128 after
> s2ram and after a reboot must be ether:
> - set by an other tool
> - set by the kernel
> - is the hdds default
It is either the HDD's default (each drive can have its own) or set by the BIOS. The kernel isn't involved. Note that if I set hdparm -B 128 (from the default 254) on my X230 laptop, I get +15 Load_Cycle_Count over 90 seconds, so basically the same as you. This makes me wonder if Lenovo changed their mind between the X220 and the X230 to avoid the issue you're reporting. Or maybe this is a BIOS setting.
Or maybe the Z7K500 series drives (mine) have a different default from the Z7K320 (yours).
As I said, while using openSUSE 12.2 (before March 2013) I didn't have this issue because of laptop-mode-tools. My current Load_Cycle_Count is 115658! Most of those 115658 cycles must have accumulated after I uninstalled laptop-mode-tools. And if you look at the tests I posted yesterday, you can see that this value would continue to rise dramatically if I hadn't changed the APM_level. My HDD's datasheet (see comment 10) mentions a maximum of about 600,000 cycles. So this issue could really shorten the lifetime of my hard disk to much less than 3 years, and I bought this notebook to use it for 5 or 6 years. For me this isn't such a big issue anymore, because now I know how to handle it. But it may break a lot of other people's HDDs.
Created attachment 544702 [details]
Those are my current bios settings.
I did a few BIOS updates via the Windows tool, but the power settings should be the default ones.
I tried resetting the values via "Load default values", but that didn't change anything. It also didn't re-enable the fingerprint reader I had switched off (which makes me think maybe "Load default values" isn't working at all).
I just remembered that in Windows there are settings in the Control Panel to control when the HDD goes to sleep.
https://www.alesis.com/emails/nov08/images/tips/5.3-XP-Turn-Off-HD-Sleep.jpg
http://cdn.overclock.net/9/97/350x700px-LL-97206128_b888ce6294c942f999dad2c.png
For example, Windows XP gives me these default power profiles regarding the HDD timeout (German name / English translation):
Minimale Batterieauslastung / minimal battery consumption: AC: never, Battery: 3 min.
Minimaler Energieverbrauch / minimal energy consumption: AC: never, Battery: 15 min.
Desktop: AC: never, Battery: 10 min.
Laptop: AC: 30 min., Battery: 5 min.
Presentation: AC: never, Battery: never
Dauerbetrieb / always running: AC: never, Battery: 30 min.
I guess HDD sleep is already treated as an operating system issue. I know the right way would be for the hard drive to set a good and safe default, but maybe some vendors just rely on Windows doing that...
@Jean Delvare What kind of HDD is in your ThinkPad? In mine I have a:
HardDisk: HITACHI HTS723232A7A364 (Size 320072933376 Byte ~ 298 GB)
This could be a default behavior, written into the disk's firmware or hard-wired somewhere in the disk, which Windows overrides when it runs. Any hard disk different from mine, even one with just another firmware, could explain why you don't suffer from this issue... I know, it's really hard to find out the real reason... :-/
FWIW I measured the difference between -B 254 and -B 128 on my Z7K500 drive, and it's about -0.5 W, measured with an external wattmeter. Surprisingly, /proc/acpi/battery/BAT0/state did not reveal a significant difference.
(In reply to comment #49)
> I was using laptop-mode-tools for some time after I bought my notebook. After
> upgrading to openSUSE 12.3 I recognized, that laptop-mode-tools gave me no more
> power saving (on openSUSE 12.2 it still used a lot). So I uninstalled
> laptop-mode-tools and after that, the hdd-clicking issue began.
> So on notebook
> with laptop-mode-tools this bug may not appear, as long as laptop-mode-tools
> are running.
Indeed, if you look at /etc/laptop-mode/laptop-mode.conf, you'll see:
CONTROL_HD_POWERMGMT=1
BATT_HD_POWERMGMT=128
LM_AC_HD_POWERMGMT=254
NOLM_AC_HD_POWERMGMT=254
I _do_ have laptop-mode-tools installed on my laptop, and it's on AC most of the time, which explains why I had -B value 254. It goes down to 128 when I switch to battery, and back to 254 when I plug the AC adapter back in. This also explains why /proc/acpi/battery/BAT0/state did not reveal a significant difference: readings are only available on battery, where the -B value was 128 all along. Doing the proper comparison now, the difference is about 420 mW. Thanks for the tip about powertop only showing power consumption when on battery - same here, of course.
Still a problem on openSUSE 13.2!
Meanwhile I replaced my notebook's hard disk (the rest of the hardware is still the same), but the problem persists...
My new hard disk:
Hitachi Travelstar (1 TB)
Model Number: HGST HTS541010A7E630
Firmware Revision: SE0OA430
We should really think about this again!
This can break people's hardware!!!
"laptop-mode" shouldn't currently touch this on my notebook. The file
/etc/laptop-mode/laptop-mode.conf
is set to:
CONTROL_HD_POWERMGMT=0
I'm currently using the following two files to keep my disk from suspending and waking up again all the time.
This file takes care of it when the system boots up:
/etc/udev/rules.d/99-apm-level.rules
==========
SUBSYSTEM!="block|usb", GOTO="apm_level_end"
ACTION!="add|change", GOTO="apm_level_end"
KERNEL=="sr*", GOTO="apm_level_end"
KERNEL=="sd*[!0-9]|hd*[!0-9]", IMPORT{program}="ata_id --export $devnode"
KERNEL=="sd*[!0-9]|hd*[!0-9]", ATTR{queue/rotational}=="1", ENV{ID_ATA_FEATURE_SET_APM}=="1", RUN+="/sbin/hdparm -B 192 $devnode"
LABEL="apm_level_end"
==========
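To check whether the rule above would fire for a given disk without rebooting, one can look at the properties udev has recorded for it. This is a sketch that assumes ata_id has already been run for the disk (the ID_ATA_FEATURE_SET_APM* properties only exist then); the function names and the UDEVADM override are hypothetical and only there to make testing easy.

```shell
#!/bin/sh
# Hypothetical check of the udev properties the APM rule matches on.
UDEVADM=${UDEVADM:-udevadm}

# apm_udev_props DEVICE: list the APM-related properties udev knows about.
apm_udev_props() {
    "$UDEVADM" info --query=property --name="$1" | grep '^ID_ATA_FEATURE_SET_APM'
}

# would_set_apm DEVICE: succeed if udev reports APM support, mirroring the
# ENV{ID_ATA_FEATURE_SET_APM}=="1" match in the rule.
would_set_apm() {
    "$UDEVADM" info --query=property --name="$1" | grep -qx 'ID_ATA_FEATURE_SET_APM=1'
}
```

Because the rule also matches ACTION=="change", running `udevadm control --reload` followed by `udevadm trigger --action=change --subsystem-match=block` should re-apply it after editing, without a reboot.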
This file takes care of it after waking up from suspend:
/usr/lib/systemd/system-sleep/99hdparm
(instead of /etc/pm/sleep.d/99hdparm, which doesn't work anymore)
==========
#!/bin/sh
# Argument 1: either pre or post, depending on whether the machine is going to sleep or waking up
# Argument 2: suspend, hibernate or hybrid-sleep, depending on which is being invoked
case $1/$2 in
pre/*)
#nothing
;;
post/*)
/usr/sbin/hdparm -B 192 /dev/sda
;;
esac
==========
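The hook above hardcodes /dev/sda. A hypothetical variant that re-applies the level to every rotational sd* disk, mirroring the queue/rotational check of the udev rule, could look like this. APM_LEVEL, SYSFS_BLOCK and HDPARM are illustrative variables, not an existing openSUSE configuration.

```shell
#!/bin/sh
# Hypothetical variant of /usr/lib/systemd/system-sleep/99hdparm.
APM_LEVEL=${APM_LEVEL:-192}
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}
HDPARM=${HDPARM:-/usr/sbin/hdparm}

# set_apm_all: apply APM_LEVEL to every rotational sd* disk.
set_apm_all() {
    for dev in "$SYSFS_BLOCK"/sd*; do
        [ -r "$dev/queue/rotational" ] || continue
        [ "$(cat "$dev/queue/rotational")" = 1 ] || continue
        # Disks without APM support (e.g. behind some USB bridges) make hdparm
        # fail; ignore that so one disk cannot break the resume hook.
        "$HDPARM" -B "$APM_LEVEL" "/dev/${dev##*/}" >/dev/null 2>&1 || true
    done
}

# systemd passes $1 = pre|post and $2 = suspend|hibernate|hybrid-sleep.
case "$1/$2" in
    post/*) set_apm_all ;;
esac
```

Called without arguments the case matches nothing, so the file can be sourced or tested safely.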
Created attachment 615483 [details]
hdparm related files from kubuntu-14.04.1-desktop-amd64
More information:
Tested with an openSUSE 13.2 x86_64 KDE live image on my notebook (booted from a USB stick) on AC power.
Ran "hdparm -B /dev/sda" after booting. Result: 128
Tested kubuntu-14.04.1-desktop-amd64.iso on my notebook (booted from a USB stick).
Ran "hdparm -B /dev/sda" after booting. Result: 254
It looks like these files (extracted from the Kubuntu system) are the reason why Kubuntu has a value of 254
(a copy of these files is attached in the zip file above):
/etc/hdparm.conf
/lib/hdparm/hdparm-functions
/usr/lib/grub/i386-pc/hdparm.mod
/usr/lib/pm-utils/power.d/95hdparm-apm
I propose openSUSE should do it the same way!
(maybe we have to do it via systemd instead of power.d)
Nevertheless, the Kubuntu file "95hdparm-apm" says "On battery we set hdparm -B 128".
I think openSUSE should use a value of 192 or higher even on battery!
Every 2.5 inch HDD I have shows dramatically increasing load/unload counts for values below 192.
Spending a little more power is better than breaking people's HDDs!!!
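A minimal sketch of the proposal above, choosing 254 on AC and 192 (instead of 128) on battery. It reads the kernel's power_supply sysfs class rather than going through pm-utils; the variable and function names are hypothetical, and a machine without any mains adapter entry (e.g. a desktop PC) is treated as being on AC.

```shell
#!/bin/sh
# Hypothetical power-source-aware APM level selection.
POWER_SUPPLY_DIR=${POWER_SUPPLY_DIR:-/sys/class/power_supply}
AC_LEVEL=${AC_LEVEL:-254}
BATTERY_LEVEL=${BATTERY_LEVEL:-192}

# ac_online: succeed if a mains adapter reports online, or if none exists.
ac_online() {
    found_mains=no
    for ps in "$POWER_SUPPLY_DIR"/*; do
        [ -r "$ps/type" ] || continue
        [ "$(cat "$ps/type")" = Mains ] || continue
        found_mains=yes
        [ "$(cat "$ps/online" 2>/dev/null)" = 1 ] && return 0
    done
    [ "$found_mains" = no ]
}

# apm_level_for_power: print the -B value for the current power source.
apm_level_for_power() {
    if ac_online; then
        echo "$AC_LEVEL"
    else
        echo "$BATTERY_LEVEL"
    fi
}
```

A boot or resume hook could then run `hdparm -B "$(apm_level_for_power)" /dev/sda`; switching between AC and battery at runtime would additionally need a hook that fires on power source changes.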
An easy solution is to put the hdparm command into /etc/init.d/boot.local - works for me.
@Norbert Jurkeit Probably! But test what happens if you put your notebook/PC into standby/s2ram or hibernate/s2disk. Afterwards the value will probably be 128, because s2ram and s2disk reset the value, but boot.local is not run when waking from s2ram/s2disk. To handle s2ram/s2disk, use the file /usr/lib/systemd/system-sleep/99hdparm as I described. Nevertheless, this bug report is now more about finding a general strategy to fix this issue for all openSUSE users, e.g. keeping openSUSE from accessing the HDD every few seconds, or setting the hdparm/APM value to 192 or 254.
Reassigned the bug to Jean.
It turns out I'm not actually working on this, sorry.
Same problem on my Dell computer (openSUSE 42.1 / 42.2):
HDD model HGST HTS721010A9E630
smartctl -A /dev/sdb
193 Load_Cycle_Count: 9968
Five minutes later it's already 9971 (+3).
This happened on a brand new computer after less than 5 months of usage, while switching the computer on/off about once per day (either by "halt" or by "s2disk").
Please fix this bug, because it kind of damages hardware.
(In reply to Moritz Duge from comment #65)
> Same problem on my Dell computer (openSUSE 42.1 / 42.2):
>
> HDD model
> HGST HTS721010A9E630
>
> smartctl -A /dev/sdb
> 193 Load_Cycle_Count: 9968
> Five minutes later it's already 9971 (+3).
>
> This happened on a brand new computer fter less than 5 month of usage!
> While switching the computer on/off about once per day. (either by "halt" or
> by "s2disk")
>
> Please fix this bug, because it kind of damages hardware.
The power management for this is handled by hdparm/systemd on new systems. I checked all the laptops around and the power cycle count is really low. Please open a new issue detailing what hardware it is happening on (not just the HDD). I am closing this bug as it is reported against an out-of-support release.
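Reports like "+3 in five minutes" or "+12 in 90 seconds" are easy to reproduce with a small sampling script. This sketch assumes the standard `smartctl -A` attribute table, where the raw value is the tenth column (abbreviated listings such as "193 Load_Cycle_Count: 9968" would not parse); the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for measuring head load/unload rates.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# load_cycle_count DEVICE: print the raw Load_Cycle_Count value.
# In the standard "smartctl -A" table the raw value is the 10th column.
load_cycle_count() {
    "$SMARTCTL" -A "$1" | awk '$2 == "Load_Cycle_Count" { print $10; exit }'
}

# cycle_rate DELTA SECONDS: summarize how often the heads were parked.
cycle_rate() {
    if [ "$1" -gt 0 ]; then
        awk -v d="$1" -v s="$2" \
            'BEGIN { printf "%d cycles in %d s (one every %.1f s)\n", d, s, s / d }'
    else
        echo "$1 cycles in $2 s"
    fi
}

# sample_load_cycles DEVICE SECONDS: read the counter twice, SECONDS apart.
sample_load_cycles() {
    before=$(load_cycle_count "$1")
    sleep "$2"
    after=$(load_cycle_count "$1")
    cycle_rate $((after - before)) "$2"
}
```

Run as root on an idle system, e.g. `sample_load_cycles /dev/sda 90`. With the numbers from this thread, `cycle_rate 12 90` prints "12 cycles in 90 s (one every 7.5 s)" and `cycle_rate 3 300` prints "3 cycles in 300 s (one every 100.0 s)".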
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0
Hi, first of all, I think this is a critical bug because it may damage the hard disk. But I apologize for the trouble if I'm wrong about that.
My system:
OS: openSUSE 12.3 x86_64
Model: Thinkpad x220 (Type 4291-36G)
CPU: Intel i7-2620M
HardDisk: HITACHI HTS723232A7A364 (Size 320072933376 Byte ~ 298 GB)
When my notebook is idle I hear a strange clicking, about one click every 5-20 seconds. After a while I started investigating and found that with every click this value goes up by 1:
sudo /usr/sbin/smartctl -A /dev/sda | grep Load_Cycle_Count
Right now it's at: 114825
It's the same click sound as when I put my notebook into s2ram or power it down. To me this looks like my hard disk is stopping very often to save power. As far as I know that's too often, because so many power cycles may damage the disk after a while.
So I checked this value:
==================
# sudo /sbin/hdparm -B /dev/sda
/dev/sda:
APM_level = 128
==================
After changing it with:
sudo /sbin/hdparm -B 192 /dev/sda
and putting that command into "/etc/init.d/boot.local" (for setting it at boot) and "/etc/pm/sleep.d/99hdparm" (newly created, for setting it after waking from standby), the clicking stopped, or maybe I just stopped hearing it because it didn't happen so often anymore. Also, the "Load_Cycle_Count" grew much more slowly (maybe +10 a day, which is about the number of times a day I put my notebook into standby).
My full "/etc/pm/sleep.d/99hdparm" looks like this:
==================
#!/bin/bash
case "$1" in
  hibernate|suspend)
    #nothing
    ;;
  thaw|resume)
    /sbin/hdparm -B 192 /dev/sda
    ;;
  *)
    ;;
esac
exit 0
==================
I also saw this behavior on:
Thinkpad x220t (Tablet)
R-Series Thinkpad (about 8 years old)
L-Series Thinkpad
Putting the notebooks on AC or battery doesn't change the behavior.
On my desktop PC I have sdb and sdc in a software mirror RAID configured by the motherboard.
When running "smartctl -A" on sdb or sdc I get "SMART Disabled". But after enabling SMART using
/usr/sbin/smartctl -s on /dev/sdX
I can see the "Load_Cycle_Count" reported by smartctl rising quite fast too (about +3 per minute). But I don't hear any clicking from these disks.
Mainboard: ASUS M4A785TD-V EVO
==================
# dmraid -r
/dev/sdc: pdc, "pdc_bebjhhcbgg", mirror, ok, 3906249984 sectors, data@ 0
/dev/sdb: pdc, "pdc_bebjhhcbgg", mirror, ok, 3906249984 sectors, data@ 0
==================
I can't set the APM_level:
==================
# /sbin/hdparm -B 192 /dev/sdc
/dev/sdc:
setting Advanced Power Management level to 0xc0 (192)
HDIO_DRIVE_CMD failed: Input/output error
APM_level = not supported
==================
Are my thoughts correct? Is one HDD power cycle every 5 to 20 seconds too much? If yes, this should be fixed VERY SOON, because every day it damages people's hard disks.
Thanks
colAflash
Reproducible: Always
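The question "is one cycle every 5 to 20 seconds too much?" can be answered with back-of-the-envelope arithmetic against the roughly 600,000 load/unload cycles the drive's datasheet specifies (see comment 10). This sketch assumes the disk idles all day and parks its heads at a constant rate, which overstates real-world wear; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wear estimate, assuming continuous idling at a constant rate.

# cycles_per_day SECONDS_PER_CYCLE
cycles_per_day() {
    awk -v spc="$1" 'BEGIN { printf "%d\n", 86400 / spc }'
}

# days_to_rating SECONDS_PER_CYCLE CURRENT_COUNT RATED_CYCLES
days_to_rating() {
    awk -v spc="$1" -v cur="$2" -v max="$3" \
        'BEGIN { printf "%.1f\n", (max - cur) * spc / 86400 }'
}
```

With the count of 114825 reported here, one cycle every 5 s means 17280 cycles per day and `days_to_rating 5 114825 600000` prints 28.1; one cycle every 20 s means 4320 per day and `days_to_rating 20 114825 600000` prints 112.3. So the remaining rated cycles would last roughly one to four months of continuous idling.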