MrAbz
21-05-2014, 17:04
Originally Posted by RikT
Originally Posted by happyman
Still no reply.
Many thanks for all the information; it's really complete.
Our engineers have confirmed that they'll change the drive. We only need you to confirm in writing that you've got a backup of all the data and they are allowed to go ahead with the change.
Hi, with reference to Ticket #2014051277000478 -
/dev/sda has failed on this server and I am still awaiting ANY form of communication from you as to whether this disk will be replaced or NOT. Still no reply 48 hours after the last ticket. I am currently looking at switching ALL of my servers to Hetzner, as they offer the same hardware and service for slightly less than you do. This is not the first time I've had a disk fail; previously I've had to let the lease run out and rent a "new" server from SYS/KS, because I either never heard back or jumped through the hoops and then got the silent treatment on further replies.
Unless this gets resolved in the next 24 hours, I will be looking to move my data elsewhere.
sudo hdparm --write-sector 0x004989f8 --yes-i-know-what-i-am-doing /dev/sda
/dev/sda: re-writing sector 4819448: succeeded
smartctl 6.2 2013-04-20 r3812 [x86_64-linux-3.10.23-xxxx-std-ipv6-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K3000
Device Model: Hitachi HDS723020BLA642
Serial Number: MN1240F33B1LDD
LU WWN Device Id: 5 000cca 369ef4c93
Firmware Version: MN6OAA10
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Mar 1 19:35:31 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 28) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 321) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 001 001 016 Pre-fail Always FAILING_NOW 4294967295
2 Throughput_Performance 0x0005 135 135 054 Pre-fail Offline - 86
3 Spin_Up_Time 0x0007 181 181 024 Pre-fail Always - 300 (Average 333)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 18
5 Reallocated_Sector_Ct 0x0033 065 065 005 Pre-fail Always - 597
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 138 138 020 Pre-fail Offline - 25
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 614
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 16
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 38
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 39
194 Temperature_Celsius 0x0002 153 153 000 Old_age Always - 39 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0032 066 066 000 Old_age Always - 785
197 Current_Pending_Sector 0x0022 001 001 000 Old_age Always - 20933
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 1169 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1169 occurred at disk power-on lifetime: 601 hours (25 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 ca 05 01 04 Error: UNC at LBA = 0x040105ca = 67175882
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 c8 05 01 40 00 3d+13:34:29.203 READ FPDMA QUEUED
60 08 f0 00 06 01 40 00 3d+13:34:29.175 READ FPDMA QUEUED
60 08 e8 f8 05 01 40 00 3d+13:34:29.175 READ FPDMA QUEUED
60 08 e0 f0 05 01 40 00 3d+13:34:29.175 READ FPDMA QUEUED
60 08 d8 e8 05 01 40 00 3d+13:34:29.175 READ FPDMA QUEUED
Error 1168 occurred at disk power-on lifetime: 577 hours (24 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 ca 05 01 04 Error: UNC at LBA = 0x040105ca = 67175882
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 c8 05 01 40 00 2d+13:26:28.510 READ FPDMA QUEUED
60 08 f0 00 06 01 40 00 2d+13:26:28.468 READ FPDMA QUEUED
60 08 e8 f8 05 01 40 00 2d+13:26:28.468 READ FPDMA QUEUED
60 08 e0 f0 05 01 40 00 2d+13:26:28.468 READ FPDMA QUEUED
60 08 d8 e8 05 01 40 00 2d+13:26:28.468 READ FPDMA QUEUED
Error 1167 occurred at disk power-on lifetime: 553 hours (23 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 ca 05 01 04 Error: UNC at LBA = 0x040105ca = 67175882
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 08 c8 05 01 40 00 1d+13:56:16.781 READ FPDMA QUEUED
60 08 00 00 06 01 40 00 1d+13:56:16.781 READ FPDMA QUEUED
60 08 f0 f8 05 01 40 00 1d+13:56:16.778 READ FPDMA QUEUED
60 08 e8 f0 05 01 40 00 1d+13:56:16.778 READ FPDMA QUEUED
60 08 e0 e8 05 01 40 00 1d+13:56:16.778 READ FPDMA QUEUED
Error 1166 occurred at disk power-on lifetime: 529 hours (22 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 ca 05 01 04 Error: UNC at LBA = 0x040105ca = 67175882
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 08 c8 05 01 40 00 14:05:34.629 READ FPDMA QUEUED
60 00 00 30 68 bc 40 00 14:05:34.628 READ FPDMA QUEUED
60 00 00 30 67 bc 40 00 14:05:34.628 READ FPDMA QUEUED
60 08 00 a8 1c c1 40 00 14:05:34.627 READ FPDMA QUEUED
60 08 00 50 10 c1 40 00 14:05:34.627 READ FPDMA QUEUED
Error 1165 occurred at disk power-on lifetime: 515 hours (21 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 06 ca 05 01 04 Error: UNC at LBA = 0x040105ca = 67175882
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 00 c8 05 01 40 00 00:01:23.891 READ FPDMA QUEUED
60 08 00 10 40 c4 40 00 00:01:23.695 READ FPDMA QUEUED
60 08 00 80 11 47 40 00 00:01:23.685 READ FPDMA QUEUED
60 08 00 e8 3f c4 40 00 00:01:23.684 READ FPDMA QUEUED
60 10 00 30 1b 04 40 00 00:01:23.658 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 510 -
# 2 Short offline Completed without error 00% 138 -
# 3 Short offline Completed without error 00% 10 -
# 4 Short offline Completed without error 00% 0 -
# 5 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Drive failure expected in less than 24 hours. SAVE ALL DATA.
40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448
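For anyone wanting to check how the LBA in that error row relates to the register bytes, here is a rough Python sketch. It assumes the usual 28-bit layout (low nibble of DH, then CH, CL, SN), which reproduces both LBAs printed in this thread; the function names are my own and not part of smartctl. The summary error log only carries these 28-bit registers, so treat the result as a hint rather than a guaranteed full address.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the SN/CL/CH/DH register bytes into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_error_row(row):
    """Parse the leading 'ER ST SC SN CL CH DH' hex bytes of an error row."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    return {
        "unc": bool(er & 0x40),  # bit 6 of the error register: uncorrectable data
        "count": sc,             # sector count register
        "lba": lba28_from_registers(sn, cl, ch, dh),
    }


row = "40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448"
info = parse_error_row(row)
print(hex(info["lba"]), info["lba"], info["count"], info["unc"])
```

The decoded value is the same sector number that was passed to hdparm --write-sector.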
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:00 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda] CDB:
Feb 25 11:19:00 rescue kernel: Read(10): 28 00 00 49 89 f0 00 00 08 00
Feb 25 11:19:00 rescue kernel: ata1: EH complete
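The CDB bytes in that kernel message can be decoded by hand too. A minimal sketch, assuming the standard READ(10) layout (opcode 0x28, a big-endian 4-byte LBA in bytes 2 to 5, a big-endian 2-byte transfer length in bytes 7 to 8); parse_read10 is just an illustrative name:

```python
def parse_read10(cdb_hex):
    """Return (lba, block_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: number of blocks
    return lba, length


lba, length = parse_read10("28 00 00 49 89 f0 00 00 08 00")
first, last = lba, lba + length - 1
print(f"read of {length} sectors, LBA {first}..{last}")
```

That request starts at LBA 4819440 (0x004989f0), the same starting LBA smartctl reports for Error 1161 in the full log.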
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 56
I have already removed the drive from the RAID array using mdadm, pending physical drive removal and an array rebuild. About 4 days ago, the server went down with ~1200 SMART errors for Current Pending Sectors and was investigated by SYS. However, the Control Panel gives no information about any interventions made, unlike OVH, where you can view all the details of each intervention. Here is the full log from smartctl:
root@rescue:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K3000
Device Model: Hitachi HDS723020BLA642
Serial Number: MN1240F33B1LDD
LU WWN Device Id: 5 000cca 369ef4c93
Firmware Version: MN6OAA10
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Feb 25 15:25:38 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 28) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 135 135 054 Pre-fail Offline - 85
3 Spin_Up_Time 0x0007 181 181 024 Pre-fail Always - 300 (Average 333)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 17
5 Reallocated_Sector_Ct 0x0033 075 075 005 Pre-fail Always - 561
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 135 135 020 Pre-fail Offline - 26
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 514
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 15
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 37
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 38
194 Temperature_Celsius 0x0002 157 157 000 Old_age Always - 38 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0032 068 068 000 Old_age Always - 749
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 56
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 1162 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1162 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 f8 89 49 e0 00 00:09:31.999 READ DMA
c8 00 08 f0 89 49 e0 00 00:09:31.999 READ DMA
ca 00 08 f0 89 49 e0 00 00:09:31.999 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:31.991 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:31.990 READ NATIVE MAX ADDRESS EXT
Error 1161 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f0 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f0 = 4819440
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 f0 89 49 e0 00 00:09:28.935 READ DMA
c8 00 08 e8 89 49 e0 00 00:09:28.935 READ DMA
ca 00 08 e8 89 49 e0 00 00:09:28.935 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:28.927 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:28.927 READ NATIVE MAX ADDRESS EXT
Error 1160 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 e8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989e8 = 4819432
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 89 49 e0 00 00:09:25.887 READ DMA
c8 00 08 e0 89 49 e0 00 00:09:25.887 READ DMA
ca 00 08 e0 89 49 e0 00 00:09:25.887 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:25.875 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:25.875 READ NATIVE MAX ADDRESS EXT
Error 1159 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 e0 89 49 00 Error: UNC 8 sectors at LBA = 0x004989e0 = 4819424
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e0 89 49 e0 00 00:09:22.815 READ DMA
c8 00 08 d8 89 49 e0 00 00:09:22.815 READ DMA
ca 00 08 d8 89 49 e0 00 00:09:22.815 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:22.807 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:22.807 READ NATIVE MAX ADDRESS EXT
Error 1158 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 d8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989d8 = 4819416
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 d8 89 49 e0 00 00:09:19.759 READ DMA
c8 00 08 d0 89 49 e0 00 00:09:19.759 READ DMA
ca 00 08 d0 89 49 e0 00 00:09:19.759 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:19.751 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:19.751 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 510 -
# 2 Short offline Completed without error 00% 138 -
# 3 Short offline Completed without error 00% 10 -
# 4 Short offline Completed without error 00% 0 -
# 5 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
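To put the two dumps side by side (25 Feb above, 1 Mar earlier in the thread), the attribute rows can be parsed and compared with something like the sketch below. The regex and helper name are my own assumptions about the row layout as pasted here, not anything provided by smartmontools, and only a few rows are copied in for brevity.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table as pasted in this thread:
# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: fields} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = {
                "value": int(m.group(3)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(8),
                "raw": int(m.group(9)),
            }
    return attrs


FEB_25 = """\
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 075 075 005 Pre-fail Always - 561
196 Reallocated_Event_Count 0x0032 068 068 000 Old_age Always - 749
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 56
"""
MAR_01 = """\
1 Raw_Read_Error_Rate 0x000b 001 001 016 Pre-fail Always FAILING_NOW 4294967295
5 Reallocated_Sector_Ct 0x0033 065 065 005 Pre-fail Always - 597
196 Reallocated_Event_Count 0x0032 066 066 000 Old_age Always - 785
197 Current_Pending_Sector 0x0022 001 001 000 Old_age Always - 20933
"""

before = parse_attributes(FEB_25)
after = parse_attributes(MAR_01)

# Show how each raw counter moved between the two dumps.
for name in before:
    delta = after[name]["raw"] - before[name]["raw"]
    print(f"{name}: {before[name]['raw']} -> {after[name]['raw']} ({delta:+d})")

# An attribute counts as failing once its normalized VALUE is at or below THRESH.
failing = [n for n, a in after.items() if a["value"] <= a["thresh"]]
print("failing on 1 Mar:", failing)
```

Only Raw_Read_Error_Rate is at or below its threshold in the later dump, which matches the FAILING_NOW flag, while the pending-sector count is the raw value that grew the most.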