Bug 1154665

Summary: too much HDD load cycles (hdparm -B)
Product: [openSUSE] openSUSE Distribution
Reporter: Moritz Duge <duge>
Component: Basesystem
Assignee: Kristyna Streitova <kstreitova>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: alynx.zhou, jochenbl
Version: Leap 15.1
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: openSUSE-15.1 workaround

Description Moritz Duge 2019-10-21 16:17:36 UTC
It looks like the following bug is still present in openSUSE Leap 15.1 and is breaking hard disks:
Bug 825461 - WD (Western Digital) Green disks do too much load cycles (writebacks idle3)

I just had to swap a disk because of read errors.
And it looks very much like the Load_Cycle_Count was in a really unhealthy state.

Sadly I didn't test how Load_Cycle_Count changed over a few minutes before swapping the disk. But 408151 clearly matches the machine's working time in my office, counted in minutes:
4 years, 210 days per year, 8 hours a day, one load cycle every minute = 4*210*8*60 = 403200 ~ 408151
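
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 4 years, 210 days and 8 hours are the rough figures from this comment, and "one load cycle per minute" is the hypothesis being tested, not a measured rate.

```python
# Rough working-time estimate, ASSUMING one load cycle per minute of uptime.
YEARS = 4
WORK_DAYS_PER_YEAR = 210
HOURS_PER_DAY = 8
MINUTES_PER_HOUR = 60

estimated_cycles = YEARS * WORK_DAYS_PER_YEAR * HOURS_PER_DAY * MINUTES_PER_HOUR
observed_cycles = 408151  # Load_Cycle_Count raw value from the SMART output

# Relative gap between the estimate and the observed counter.
relative_difference = abs(observed_cycles - estimated_cycles) / observed_cycles

print(estimated_cycles)               # 403200
print(f"{relative_difference:.1%}")   # 1.2%
```

The estimate and the observed counter differ by roughly 1.2%, which is what makes the once-per-minute pattern plausible.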

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   152   149   021    Pre-fail  Always       -       1400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       350
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3744
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       350
191 G-Sense_Error_Rate      0x0032   092   092   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       408151
194 Temperature_Celsius     0x0022   113   099   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
240 Head_Flying_Hours       0x0032   097   097   000    Old_age   Always       -       2427
241 Total_LBAs_Written      0x0032   200   200   000    Old_age   Always       -       1618739076
242 Total_LBAs_Read         0x0032   200   200   000    Old_age   Always       -       1298508156
254 Free_Fall_Sensor        0x0032   200   200   000    Old_age   Always       -       0
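
To compare counters from a table like the one above without reading them off by hand, the rows can be parsed in Python. This is a minimal sketch: parse_smart_attributes is a hypothetical helper name, and it assumes the ten-column attribute layout shown here, with the raw value in the last column.

```python
def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME to the integer RAW_VALUE of each attribute row."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes


# Two rows copied from the SMART output in this report.
sample = """\
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3744
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       408151
"""

attrs = parse_smart_attributes(sample)
cycles_per_hour = attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]
print(round(cycles_per_hour))  # 109
```

With the values reported here, this works out to about 109 load cycles per reported power-on hour.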
Comment 1 Alynx Zhou 2019-10-22 02:57:22 UTC
Hello, can you help find where the problem is? I am not familiar with this, thanks!
Comment 2 Moritz Duge 2019-10-22 08:32:24 UTC
Created attachment 822148 [details]
openSUSE-15.1 workaround

(In reply to Alynx Zhou from comment #1)
> Hello, can you help find where the problem is? I am not familiar with this,
> thanks!

It's pretty much described in the referenced bug 825461.

Abstract:
In openSUSE, HDDs run at APM_level 128 by default. This causes the HDD to go to sleep very soon (e.g. after idling for 5 seconds - for some disks you can actually hear a click sound when that happens).
But because openSUSE writes to disk regularly (e.g. logfiles once a minute), the HDD will spin up and down every minute, which isn't healthy.

Possible solutions:
Change APM_level to at least 192. (Alternatively, increase the writeback timeout on mounts, but that may cause data loss.)
I changed the APM_level for my remaining HDDs using the attached files:
  /etc/udev/rules.d/10-hdd-hdparm.rules
  /etc/scripts/hdd-apm-level.bash
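
The attached files are not reproduced here. As a rough illustration of the approach only, the following Python sketch finds rotational disks and builds the matching hdparm call. The function names and the restriction to sd* devices are assumptions made for this example, not the contents of the attachment; `hdparm -B` and the sysfs queue/rotational flag are the standard interfaces.

```python
import glob
import os
import subprocess

APM_LEVEL = 192  # level suggested in this comment; hdparm -B accepts 1-255


def rotational_disks(sys_block="/sys/block"):
    """Return /dev paths of sd* devices whose sysfs rotational flag is 1."""
    disks = []
    pattern = os.path.join(sys_block, "sd*", "queue", "rotational")
    for flag_path in sorted(glob.glob(pattern)):
        with open(flag_path) as flag_file:
            if flag_file.read().strip() == "1":
                # .../<name>/queue/rotational -> <name>
                name = flag_path.split(os.sep)[-3]
                disks.append("/dev/" + name)
    return disks


def hdparm_command(device, level=APM_LEVEL):
    """Build the argument list for `hdparm -B <level> <device>`."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


def apply_apm_level(level=APM_LEVEL):
    """Run hdparm for every rotational disk (needs root and hdparm installed)."""
    for disk in rotational_disks():
        subprocess.run(hdparm_command(disk, level), check=False)
```

Calling apply_apm_level() as root would set the level on each detected disk. Setting the level once does not necessarily make it permanent, so it would need to be reapplied at boot or when a disk is added, which is what a udev rule is for.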

My broken disk is a WD3200BEKX-75B7WT0.
Other disks on which I observed a rapidly rising Load_Cycle_Count (if APM_level<192):
- HGST HTS721010A9E630
- HGST HTS541010A7E630
So this isn't just a problem for WD disks.
Comment 3 Kristyna Streitova 2019-12-06 13:24:38 UTC
It seems that this "WD Green idle3 timer problem" is a known issue. It's very nicely described e.g. at [1].

But I'm not sure what we can do about it. WD Green doesn't support APM, so one has to use either 'hdparm -J' (which is not perfect; upstream recommends using the official WD tool instead) or idle3-tools/idle3ctl. I also read that different WD Green series behave differently when setting the idle3 value, so sometimes it's necessary to try several methods before one succeeds.

So because of that, I don't think that this is something we can fix globally within hdparm.


[1] https://wiki.archlinux.org/index.php/hdparm#Power_management_for_Western_Digital_Green_drives
Comment 4 Kristyna Streitova 2020-01-16 18:22:46 UTC
I'm closing this as WONTFIX because I think that it's not reasonably fixable via hdparm.