Bugzilla – Bug 1154665
too much HDD load cycles (hdparm -B)
Last modified: 2020-01-16 18:22:46 UTC
Looks like this is still present in openSUSE Leap 15.1, breaking hard disks.

Bug 825461 - WD (Western Digital) Green disks do too much load cycles (writebacks idle3)

I just had to swap a disk because of read errors, and it looks pretty much like the Load_Cycle_Count was in a really unhealthy state. Sadly, I didn't test how Load_Cycle_Count changes over a few minutes before swapping the disk. But 408151 is clearly close to the machine's working time in my office, counted in minutes:
4 years, 210 days per year, 8 hours a day, one load cycle every minute = 4*210*8*60 = 403200 ~ 408151

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   152   149   021    Pre-fail  Always       -       1400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       350
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3744
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       350
191 G-Sense_Error_Rate      0x0032   092   092   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       408151
194 Temperature_Celsius     0x0022   113   099   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
240 Head_Flying_Hours       0x0032   097   097   000    Old_age   Always       -       2427
241 Total_LBAs_Written      0x0032   200   200   000    Old_age   Always       -       1618739076
242 Total_LBAs_Read         0x0032   200   200   000    Old_age   Always       -       1298508156
254 Free_Fall_Sensor        0x0032   200   200   000    Old_age   Always       -       0
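The estimate in this comment can be re-checked with a few lines of Python. This is only a sketch: the constants are copied from the SMART output in this report, and the helper names (`office_minutes`, `cycles_per_minute`) are made up for illustration. The second helper shows how the growth rate could be measured from two Load_Cycle_Count readings taken a few minutes apart, which is the test that was not done before the disk was swapped.

```python
# Raw values copied from the SMART table in this report.
LOAD_CYCLE_COUNT = 408151
POWER_ON_HOURS = 3744


def office_minutes(years, days_per_year, hours_per_day):
    """Minutes of uptime under the reporter's office-hours assumption."""
    return years * days_per_year * hours_per_day * 60


def cycles_per_minute(count_before, count_after, minutes):
    """Average load cycles per minute between two Load_Cycle_Count readings."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (count_after - count_before) / minutes


estimate = office_minutes(4, 210, 8)
print(estimate)                              # 403200
print(f"{LOAD_CYCLE_COUNT / estimate:.3f}")  # 1.012, i.e. about one cycle per office minute

# Cross-check against the drive's own counter. How Power_On_Hours is
# counted is vendor-specific, so treat this figure as rough.
print(f"{LOAD_CYCLE_COUNT / POWER_ON_HOURS:.1f} load cycles per power-on hour")
```

To measure a live disk, read the raw Load_Cycle_Count twice (for example with `smartctl -A`) and pass both readings and the elapsed minutes to `cycles_per_minute`.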
Hello, can you help to find where the problem is? I am not familiar with this, thanks!
Created attachment 822148
openSUSE-15.1 workaround

(In reply to Alynx Zhou from comment #1)
> Hello, can you help to find where the problem is? I am not familiar with this,
> thanks!

It's pretty much described in the referenced bug 825461.

Abstract:
In openSUSE, HDDs run at APM_level 128 by default. This causes the HDD to go to sleep very soon (e.g. after idling for 5 seconds; for some disks you can actually hear a click sound when that happens). But because openSUSE writes to disk regularly (e.g. logfiles once a minute), the HDD will spin up and down every minute, which isn't healthy.

Possible solutions:
Change APM_level to at least 192 (alternatively, increase the writeback timeout on mounts, but that may cause data loss).

I changed the APM_level for my remaining HDDs using the attached files:
/etc/udev/rules.d/10-hdd-hdparm.rules
/etc/scripts/hdd-apm-level.bash

My broken disk is a WD3200BEKX-75B7WT0.
Other disks for which I observed a rapidly rising Load_Cycle_Count (if APM_level<192):
- HGST HTS721010A9E630
- HGST HTS541010A7E630

So this isn't just a problem for WD disks.
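The attached files are not reproduced in this report, so the following is not the reporter's workaround, only a rough Python sketch of the same idea: query the APM level with `hdparm -B <device>` and raise it to 192 if it is lower. The function names are hypothetical, and the parser assumes hdparm prints a line of the form `APM_level = <number>`, which may differ between hdparm versions. The device path `/dev/sda` is just an example.

```python
import re
import subprocess

MIN_APM_LEVEL = 192  # threshold suggested in this report


def parse_apm_level(hdparm_output):
    """Return the APM level as an int, or None if no numeric level is reported.

    Assumes a line like ' APM_level = 128' in the output of `hdparm -B`.
    """
    match = re.search(r"APM_level\s*=\s*(\d+)", hdparm_output)
    return int(match.group(1)) if match else None


def set_apm_command(device, level=MIN_APM_LEVEL):
    """Build the hdparm argv that sets the APM level (1..255) on a device."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be in 1..255")
    return ["hdparm", "-B", str(level), device]


def raise_apm_if_low(device, dry_run=True):
    """Query the device and return the command needed to raise its APM level.

    Returns None when the level is already high enough or cannot be read.
    With dry_run=False the command is also executed (needs root).
    """
    result = subprocess.run(["hdparm", "-B", device],
                            capture_output=True, text=True, check=True)
    current = parse_apm_level(result.stdout)
    if current is None or current >= MIN_APM_LEVEL:
        return None
    command = set_apm_command(device)
    if not dry_run:
        subprocess.run(command, check=True)
    return command
```

Calling `raise_apm_if_low("/dev/sda")` as root only reports the command it would run; pass `dry_run=False` to apply it. On many drives the setting may not survive a power cycle, which is presumably why the reporter applies it from a udev rule.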
It seems that this "WD Green idle3 timer problem" is a known issue. It's very nicely described e.g. at [1].

But I'm not sure what we can do about it. WD Green doesn't support APM, so one has to use either 'hdparm -J' (which is not perfect, and upstream recommends using the official WD tool instead) or idle3-tools/idle3ctl. I also read that the different WD Green series behave differently when setting the idle3 value, so sometimes it's necessary to try several ways before one is successful.

Because of that, I don't think that this is something we can fix globally within hdparm.

[1] https://wiki.archlinux.org/index.php/hdparm#Power_management_for_Western_Digital_Green_drives
I'm closing this as WONTFIX because I think that it's not reasonably fixable via hdparm.