OVH Community, your new community space.

Email support is useless...


MrAbz
21-05-2014, 17:04
Quote Originally Posted by RikT
It's more a case of: submit a ticket, do a rain dance, make a sacrifice to the server gods and hope they are pleased by your offering.
So much this ^

Quote Originally Posted by happyman
Oh, OK. So there's no way to see whether it's been assigned, its status, any updates, etc.

So it's a case of submitting the message and hoping for the best?
Yep. There's no acknowledgement email to let you know they've received your message; just keep your fingers crossed that they see it and don't skim past it in their ticket queue.

Still no reply.

RikT
21-05-2014, 16:28
It's more a case of: submit a ticket, do a rain dance, make a sacrifice to the server gods and hope they are pleased by your offering.

happyman
21-05-2014, 15:25
Quote Originally Posted by elcct
That's the ticket system...
Oh, OK. So there's no way to see whether it's been assigned, its status, any updates, etc.

So it's a case of submitting the message and hoping for the best?

elcct
21-05-2014, 15:15
Quote Originally Posted by happyman
Out of interest, when you say 'ticket', where do you submit tickets? I can only see the e-mail form for sending a message to them (via http://www.soyoustart.com/en/contact...r-services.xml). Is there a ticketing system for SYS?

Thanks
Daniel
That's the ticket system...

happyman
21-05-2014, 14:53
Out of interest, when you say 'ticket', where do you submit tickets? I can only see the e-mail form for sending a message to them (via http://www.soyoustart.com/en/contact...r-services.xml). Is there a ticketing system for SYS?

Thanks
Daniel

MrAbz
21-05-2014, 13:22
What I did with the last server was rent a new one, transfer the data over and let the old lease expire... Long-winded, and I was out of pocket that month (around the date of the OP), but that still isn't the point: the whole customer base, me included, pays for servers with 99% uptime, whether it's an OVH, SYS or KS server.

Don't get me wrong, the servers are amazing (when they work), but the absolute lack of consideration for customers' issues after the point of purchase, beyond the usual copypasta responses, is terrible.

EDIT: Only 2 hours have elapsed so far on this new ticket, which I'll allow them, but there's still no response.

happyman
21-05-2014, 12:37
This is appalling, and it's exactly what I'm dreading when my SYS server has hardware issues, which is bound to happen at some point. Maybe a year or more from now, but it will happen.

MrAbz
21-05-2014, 11:55
OK, so I'm digging this thread up again, as I've now had a second drive fail. Support were amazing: I provided all the information required, and then I got a message back saying:

Many thanks for all the information, it's really complete.

Our engineers have confirmed that they'll change the drive. We only need you to confirm in writing that you've got a backup of all the data and they are allowed to go ahead with the change.
Obviously I replied immediately giving the A-OK.

Then nothing.

I emailed back yesterday asking for an update: nothing.

So I guess I'm getting the silent treatment this time instead of the usual fobbing off. This is probably the last email I will ever send to OVH/SYS/KS, and I've dug this thread up so potential customers can see what it's like when things go wrong (and they do! Horribly!):

Hi, with reference to Ticket #2014051277000478 -

/dev/sda has failed on this server and I am still awaiting ANY form of communication from you as to whether this disk will be replaced or NOT. There has still been no reply 48 hours after the last ticket. I am currently looking at switching ALL of my servers to Hetzner, as they offer the same hardware/service for slightly less than you charge. This is not the first time I've had a disk fail: previously I've had to let the lease run out and rent a "new" server from SYS/KS, because I either didn't hear back, or jumped through the hoops and then got the silent treatment on further replies.

Unless this is resolved in the next 24 hours, I will be looking at moving my data elsewhere.
Please note I have already given them the outputs of:
smartctl -a -d ata /dev/sda
smartctl -a -d ata /dev/sdb
fdisk -l
cat /proc/mdstat

Which, for any sysadmin, is more than enough to diagnose a dying disk.

Yet they fail to act on any of it. The "Guarantees and SLA" section on their pages should be removed and replaced with something better suited: a picture of Oles bathing in all the money we fools have given them.
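cat /proc/mdstat is one of the outputs listed above. As a rough illustration of what a sysadmin reads in it, here is a minimal Python sketch that flags degraded md arrays. It assumes the usual /proc/mdstat layout, where each array's status line ends in something like "[2/1] [_U]" (members expected/active, then one character per member, "_" meaning missing or failed) and faulty members are tagged "(F)". The sample text and device names are hypothetical, not taken from this server.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not from this server):
# "[2/1]" = 2 members expected, 1 active; "[_U]" = first member missing/failed.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[0](F)
      1952461760 blocks [2/1] [_U]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")
FAULTY_RE = re.compile(r"(\w+)\[\d+\]\(F\)")


def degraded_arrays(mdstat_text):
    """Map each degraded array to (expected, active, members tagged faulty)."""
    degraded = {}
    name, members = None, ""
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # e.g. "md2 : active raid1 sdb2[1] sda2[0](F)"
            name, members = header.group(1), header.group(2)
            continue
        status = STATUS_RE.search(line)
        if name and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[name] = (expected, active, FAULTY_RE.findall(members))
            name = None  # only the first status line after a header counts
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # {'md2': (2, 1, ['sda2'])}
```

On the server itself the input would simply be open("/proc/mdstat").read(). A member that has already been removed with mdadm no longer appears in the device list, so only the "[2/1] [_U]" part would show the degradation.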

LawsHosting
02-03-2014, 23:23
Quote Originally Posted by theatheist
Waiting this long for hardware diagnosis/replacement is an absolute disgrace.
He should think himself lucky it's not a £100+ KS server...


theatheist
02-03-2014, 03:50
Waiting this long for hardware diagnosis/replacement is an absolute disgrace.

Trying to get a reply within 24-48 hours from SYS support is like trying to get blood out of a stone. I'm sure they are doing a great job, but they clearly don't have enough people, as the replies always seem to come from the same couple of reps.

MrAbz
01-03-2014, 20:03
Quote Originally Posted by Myatu
Backup, backup, backup...
Don't worry: the RAID1 member is marked as failed, sda has been "removed" from the array with mdadm (again), and any other data has been downloaded to another server. I do know what I'm doing, and I'm not with OVH for cheap Minecraft/game "servers".

Andy
01-03-2014, 18:56
I've moved to Hetzner. Got the same server I had with OVH for 60% of the cost and the support is 10000x better (replies in under an hour and then quicker once a ticket is assigned to someone).

Myatu
01-03-2014, 18:49
Well, at least you now have de facto proof that it is failing. Backup, backup, backup...

P.S.: Notice that the normalized Raw_Read_Error_Rate value (001) is now below its threshold (016), which is why it shows FAILING_NOW. There was a question about that in another post, so this serves as a nice example. Post was: http://forum.ovh.co.uk/showthread.ph...r_Rate&p=59814
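The point about the threshold can be made concrete: smartctl's WHEN_FAILED column comes from comparing the normalized VALUE (and WORST) columns against THRESH, not from the RAW_VALUE. The minimal Python sketch below re-implements that comparison for three attribute lines copied from the FAILED report in this thread. It is a simplification: treating a zero threshold as "never fails" is an assumption, and any other special cases smartctl handles are ignored.

```python
# Three attribute lines copied from the FAILED smartctl report in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   016    Pre-fail  Always   FAILING_NOW 4294967295
  5 Reallocated_Sector_Ct   0x0033   065   065   005    Pre-fail  Always       -       597
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       20933
"""


def parse_attributes(text):
    """Parse smartctl -A style rows into dicts (header/blank lines are skipped)."""
    attrs = []
    for line in text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may contain spaces, keep it whole
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return attrs


def when_failed(attr):
    """Simplified WHEN_FAILED: compare normalized VALUE/WORST with THRESH."""
    if attr["thresh"] == 0:
        return "-"  # assumption: a zero threshold never trips
    if attr["value"] <= attr["thresh"]:
        return "FAILING_NOW"
    if attr["worst"] <= attr["thresh"]:
        return "In_the_past"
    return "-"


for a in parse_attributes(SAMPLE):
    print(f'{a["name"]:<24} value={a["value"]:03d} thresh={a["thresh"]:03d} raw={a["raw"]} -> {when_failed(a)}')
```

Current_Pending_Sector comes out as "-" despite a raw count of 20933, because its threshold is 000; that is why the raw pending count, rather than a pass/fail flag, is what posters in this thread read directly.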

MrAbz
01-03-2014, 18:37
Quote Originally Posted by Myatu
Overall the HDD is just fine though:
Not anymore. I just ran another test after force-writing to that sector you highlighted...
Code:
sudo hdparm --write-sector 0x004989f8 --yes-i-know-what-i-am-doing /dev/sda

/dev/sda:
re-writing sector 4819448: succeeded
SMART test now FAILED!

Code:
smartctl 6.2 2013-04-20 r3812 [x86_64-linux-3.10.23-xxxx-std-ipv6-64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K3000
Device Model:     Hitachi HDS723020BLA642
Serial Number:    MN1240F33B1LDD
LU WWN Device Id: 5 000cca 369ef4c93
Firmware Version: MN6OAA10
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Mar  1 19:35:31 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   28) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 321) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   016    Pre-fail  Always   FAILING_NOW 4294967295
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   181   181   024    Pre-fail  Always       -       300 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   065   065   005    Pre-fail  Always       -       597
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   138   138   020    Pre-fail  Offline      -       25
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       614
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       38
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0032   066   066   000    Old_age   Always       -       785
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       20933
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1169 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1169 occurred at disk power-on lifetime: 601 hours (25 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 ca 05 01 04  Error: UNC at LBA = 0x040105ca = 67175882

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 c8 05 01 40 00   3d+13:34:29.203  READ FPDMA QUEUED
  60 08 f0 00 06 01 40 00   3d+13:34:29.175  READ FPDMA QUEUED
  60 08 e8 f8 05 01 40 00   3d+13:34:29.175  READ FPDMA QUEUED
  60 08 e0 f0 05 01 40 00   3d+13:34:29.175  READ FPDMA QUEUED
  60 08 d8 e8 05 01 40 00   3d+13:34:29.175  READ FPDMA QUEUED

Error 1168 occurred at disk power-on lifetime: 577 hours (24 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 ca 05 01 04  Error: UNC at LBA = 0x040105ca = 67175882

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 c8 05 01 40 00   2d+13:26:28.510  READ FPDMA QUEUED
  60 08 f0 00 06 01 40 00   2d+13:26:28.468  READ FPDMA QUEUED
  60 08 e8 f8 05 01 40 00   2d+13:26:28.468  READ FPDMA QUEUED
  60 08 e0 f0 05 01 40 00   2d+13:26:28.468  READ FPDMA QUEUED
  60 08 d8 e8 05 01 40 00   2d+13:26:28.468  READ FPDMA QUEUED

Error 1167 occurred at disk power-on lifetime: 553 hours (23 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 ca 05 01 04  Error: UNC at LBA = 0x040105ca = 67175882

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 08 c8 05 01 40 00   1d+13:56:16.781  READ FPDMA QUEUED
  60 08 00 00 06 01 40 00   1d+13:56:16.781  READ FPDMA QUEUED
  60 08 f0 f8 05 01 40 00   1d+13:56:16.778  READ FPDMA QUEUED
  60 08 e8 f0 05 01 40 00   1d+13:56:16.778  READ FPDMA QUEUED
  60 08 e0 e8 05 01 40 00   1d+13:56:16.778  READ FPDMA QUEUED

Error 1166 occurred at disk power-on lifetime: 529 hours (22 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 ca 05 01 04  Error: UNC at LBA = 0x040105ca = 67175882

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 08 c8 05 01 40 00      14:05:34.629  READ FPDMA QUEUED
  60 00 00 30 68 bc 40 00      14:05:34.628  READ FPDMA QUEUED
  60 00 00 30 67 bc 40 00      14:05:34.628  READ FPDMA QUEUED
  60 08 00 a8 1c c1 40 00      14:05:34.627  READ FPDMA QUEUED
  60 08 00 50 10 c1 40 00      14:05:34.627  READ FPDMA QUEUED

Error 1165 occurred at disk power-on lifetime: 515 hours (21 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 ca 05 01 04  Error: UNC at LBA = 0x040105ca = 67175882

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 c8 05 01 40 00      00:01:23.891  READ FPDMA QUEUED
  60 08 00 10 40 c4 40 00      00:01:23.695  READ FPDMA QUEUED
  60 08 00 80 11 47 40 00      00:01:23.685  READ FPDMA QUEUED
  60 08 00 e8 3f c4 40 00      00:01:23.684  READ FPDMA QUEUED
  60 10 00 30 1b 04 40 00      00:01:23.658  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       510         -
# 2  Short offline       Completed without error       00%       138         -
# 3  Short offline       Completed without error       00%        10         -
# 4  Short offline       Completed without error       00%         0         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
EDIT: LOL, looks like we'll see how fast OVH/SYS replace this disk now, with this interesting item:
Code:
Drive failure expected in less than 24 hours. SAVE ALL DATA.
PS Andy, yes, it seems like a sinking ship at the moment; it's only the price that keeps me here. As you said in that post, when the servers work fine, everything is amazing, no qualms. It's only when you start receiving overused hardware from the tens of users before you that you start to see why OVH gets bad-mouthed on WHT, for example. This is the 2nd drive I've had fail on me from OVH. The last occurrence was on an old-old mSP, well over a year ago, and it was replaced within hours, and you could see the intervention on the Support pages... With SYS, when you look for details of the Hardware Intervention, all you see is:


Which is... Very helpful

Andy
01-03-2014, 17:56
See the link in my signature. I thought I had it bad but you have it even worse...!

Myatu
01-03-2014, 17:05
The error:
40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448
relates to:

Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:00 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda] CDB:
Feb 25 11:19:00 rescue kernel: Read(10): 28 00 00 49 89 f0 00 00 08 00
Feb 25 11:19:00 rescue kernel: ata1: EH complete
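To check that the SMART error and the kernel messages refer to the same area of the disk, both encodings can be decoded by hand. The Python sketch below assumes the 28-bit LBA layout used by smartctl's summary error log (low nibble of DH, then CH, CL, SN) and the standard SCSI READ(10) layout (bytes 2-5 = LBA, bytes 7-8 = transfer length, both big-endian). Decoded this way, the Read(10) quoted here covers LBAs 4819440-4819447, the eight sectors immediately before the 4819448 reported by SMART; the Read(10) logged at 11:19:03 (28 00 00 49 89 f8 ...) starts at exactly 4819448.

```python
def lba_from_error_registers(row):
    """Decode an 'ER ST SC SN CL CH DH' row from smartctl's ATA error log.

    Assumes the 28-bit LBA layout: DH low nibble, then CH, CL, SN (high to low).
    Returns (lba, sector_count_register).
    """
    _er, _st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return lba, sc


def lba_from_read10_cdb(cdb):
    """Decode a SCSI READ(10) CDB as printed by the kernel ('Read(10): 28 ...').

    Returns (starting_lba, transfer_length_in_blocks).
    """
    b = bytes(int(x, 16) for x in cdb.split())
    if len(b) != 10 or b[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length
    return lba, length


print(lba_from_error_registers("40 51 08 f8 89 49 00"))      # (4819448, 8)
print(lba_from_read10_cdb("28 00 00 49 89 f0 00 00 08 00"))  # (4819440, 8)
print(lba_from_read10_cdb("28 00 00 49 89 f8 00 00 08 00"))  # (4819448, 8)
```

The same register decoding reproduces the "LBA = 0x040105ca = 67175882" lines in the later FAILED report from the row "40 51 06 ca 05 01 04".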
Because a self-test is a read operation, the HDD does not automatically reallocate a failed sector (by default it only does that on a write operation).

You can force a write on the sector by using hdparm (but obviously you would lose any data in that sector - backup, backup, backup).
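The SMART entry quoted above reports 8 unreadable sectors starting at LBA 4819448 (0x004989f8), while one hdparm --write-sector call rewrites a single sector. A minimal sketch of a hypothetical helper that only builds, and never runs, one command per sector in such a range, reusing the exact flags from the hdparm command in this thread. Each generated command destroys the contents of its sector, so the backup advice applies; printing rather than executing (e.g. via subprocess) is deliberate.

```python
def write_sector_commands(start_lba, count, device="/dev/sda"):
    """Build, but do not run, one hdparm --write-sector command per sector.

    Hypothetical helper. Every generated command overwrites (destroys) the
    contents of that sector, so only use it on data you have backed up.
    """
    return [
        f"hdparm --write-sector {lba} --yes-i-know-what-i-am-doing {device}"
        for lba in range(start_lba, start_lba + count)
    ]


# "UNC 8 sectors at LBA = 0x004989f8" from the SMART error log
for command in write_sector_commands(0x004989F8, 8):
    print(command)
```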

Overall the HDD is just fine though:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
A bad sector isn't uncommon, especially on a multi-terabyte disk. There's a reason why you don't have access to the full 2TB. This particular HDD has 512 bytes per sector, so 52 sectors (provided that value is correct) is a tiny drop in that huge bucket.

So no, I wouldn't replace that HDD if I were an OVH tech.
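The "tiny drop in that huge bucket" can be checked with the figures from the smartctl output (User Capacity 2,000,398,934,016 bytes, 512-byte sectors). A short Python sketch, also applied to the 20,933 pending sectors shown in the later FAILED report in this thread:

```python
CAPACITY_BYTES = 2_000_398_934_016  # "User Capacity" from the smartctl output
SECTOR_BYTES = 512                  # "Sector Size: 512 bytes logical/physical"
TOTAL_SECTORS = CAPACITY_BYTES // SECTOR_BYTES  # 3,907,029,168 addressable sectors


def bad_sector_share(sectors):
    """Return (bytes affected, fraction of the whole disk) for a sector count."""
    return sectors * SECTOR_BYTES, sectors / TOTAL_SECTORS


# 52 pending sectors (webmin figure) vs 20,933 in the later FAILED report
for count in (52, 20_933):
    nbytes, share = bad_sector_share(count)
    print(f"{count:>6} sectors = {nbytes:>10,} bytes = {share:.2e} of the disk")
```

Either way the share of the disk is tiny (52 sectors is 26,624 bytes); what changed between the reports was the count itself, which rose from 52 to 20,933 in about 48 power-on hours (566 h to 614 h).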

MrAbz
01-03-2014, 16:47
Tweeted Oles, FWIW...

ctype_alnum
01-03-2014, 15:11
Tweet Oles and see what he has to say for himself about this? 8 days is unacceptable.

MrAbz
01-03-2014, 12:47
Quote Originally Posted by NeddySeagoon
It's dead, Jim ...
Try telling support that! It's been an ongoing issue for 8 days.

NeddySeagoon
01-03-2014, 12:05
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       56
So ... you have 56 sectors that the drive knows it can't read, maybe many more.

It's dead, Jim ...

MrAbz
28-02-2014, 23:59
I'm posting here in the hope that someone gives support a kick up the backside and they actually do something about my support tickets.

It all started on 21/02/2014 at 15:02... The SYS control panel is so basic that all it shows is "Hardware diagnosis", whereas OVH's own support system, which I've used for a while now, details what each intervention involved. I figured that was the end of it; however, when I noticed the server becoming unstable and repeatedly going into rescue mode, I knew I had an issue.

So I checked /var/log/messages and, lo and behold, saw the signs that the HDD was dying. So I opened a ticket [#2014022577000324, if anyone from OVH/SYS is reading]:
Quote Originally Posted by Me to Support
Hi, I noticed on webmin that one of the HDDs is failing, with Current Pending Sectors reported by SMART. The drive in question is:

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K3000
Device Model: Hitachi HDS723020BLA642
Serial Number: MN1240F33B1LDD
LU WWN Device Id: 5 000cca 369ef4c93
Firmware Version: MN6OAA10
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Feb 25 11:21:23 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

---------------------

The disk has been removed from the RAID array and is ready for immediate replacement.

Many Thanks,
I was hoping that someone checking their systems would notice there had been an intervention earlier. Instead, I get a response asking for logs:
Quote Originally Posted by Support to Me
Dear Customer

I can not see any problem with the drive. Is it possible that the drive has left the software RAID, but not because of a hardware problem?

If the drive has a hardware problem, could you show me the logs? The smartctl check will show whether the drive is malfunctioning or not, on the overall test.

Thanks

Kind regards,
So I oblige:
I have already removed the drive from the RAID using mdadm, pending drive removal and RAID array rebuild. About 4 days ago the server went down with ~1200 SMART errors for Current Pending Sectors and was investigated by SYS; however, the Control Panel gives no information about any interventions made, unlike OVH, where you can view all the details of each intervention. Here is the full log from smartctl:


root@rescue:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K3000
Device Model: Hitachi HDS723020BLA642
Serial Number: MN1240F33B1LDD
LU WWN Device Id: 5 000cca 369ef4c93
Firmware Version: MN6OAA10
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Feb 25 15:25:38 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 28) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       85
  3 Spin_Up_Time            0x0007   181   181   024    Pre-fail  Always       -       300 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   075   075   005    Pre-fail  Always       -       561
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       514
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       15
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       38
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0032   068   068   000    Old_age   Always       -       749
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       56
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1162 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1162 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 f8 89 49 e0 00 00:09:31.999 READ DMA
c8 00 08 f0 89 49 e0 00 00:09:31.999 READ DMA
ca 00 08 f0 89 49 e0 00 00:09:31.999 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:31.991 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:31.990 READ NATIVE MAX ADDRESS EXT

Error 1161 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f0 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f0 = 4819440

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 f0 89 49 e0 00 00:09:28.935 READ DMA
c8 00 08 e8 89 49 e0 00 00:09:28.935 READ DMA
ca 00 08 e8 89 49 e0 00 00:09:28.935 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:28.927 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:28.927 READ NATIVE MAX ADDRESS EXT

Error 1160 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 e8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989e8 = 4819432

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 89 49 e0 00 00:09:25.887 READ DMA
c8 00 08 e0 89 49 e0 00 00:09:25.887 READ DMA
ca 00 08 e0 89 49 e0 00 00:09:25.887 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:25.875 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:25.875 READ NATIVE MAX ADDRESS EXT

Error 1159 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 e0 89 49 00 Error: UNC 8 sectors at LBA = 0x004989e0 = 4819424

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e0 89 49 e0 00 00:09:22.815 READ DMA
c8 00 08 d8 89 49 e0 00 00:09:22.815 READ DMA
ca 00 08 d8 89 49 e0 00 00:09:22.815 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:22.807 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:22.807 READ NATIVE MAX ADDRESS EXT

Error 1158 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 d8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989d8 = 4819416

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 d8 89 49 e0 00 00:09:19.759 READ DMA
c8 00 08 d0 89 49 e0 00 00:09:19.759 READ DMA
ca 00 08 d0 89 49 e0 00 00:09:19.759 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:19.751 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:19.751 READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 510 -
# 2 Short offline Completed without error 00% 138 -
# 3 Short offline Completed without error 00% 10 -
# 4 Short offline Completed without error 00% 0 -
# 5 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
So now we get to the swings and roundabouts...
Quote Originally Posted by Support to Me
Dear Customer

we would need the specific logs that point to the hardware problem, not all the logs.

Kind regards,
So I send /var/log/messages as an attachment and hear nothing back from support for 36 hours. After I email them again asking what is going to happen to the server, I receive:
Quote Originally Posted by Support to Me
yes, we would need more specific logs, we cannot work with the whole log file. You have to point us to the right logs where it says that the drive needs replacement.

Kind regards,
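Since support wanted only the relevant lines rather than the whole file, this is roughly what pulling them out of /var/log/messages could look like. A minimal Python sketch: the match patterns are assumptions based on the kernel messages quoted in this thread (other kernels and drivers word them differently), and the sample lines are copied from that excerpt.

```python
import re

# Assumed patterns, based on the kernel messages quoted in this thread.
ERROR_PATTERNS = [
    r"Medium Error",
    r"Unrecovered read error",
    r"Read\(10\):",
]


def disk_error_lines(lines, device="sda"):
    """Keep only syslog lines that look like read errors, plus lines tagged [device]."""
    regex = re.compile("|".join(ERROR_PATTERNS + [re.escape(f"[{device}]")]))
    return [line.rstrip("\n") for line in lines if regex.search(line)]


SAMPLE = [
    "Feb 25 11:19:00 rescue kernel: ata1.00: configured for UDMA/133\n",
    "Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda] Unhandled sense code\n",
    "Feb 25 11:19:00 rescue kernel: Sense Key : Medium Error [current] [descriptor]\n",
    "Feb 25 11:19:00 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed\n",
    "Feb 25 11:19:00 rescue kernel: Read(10): 28 00 00 49 89 f0 00 00 08 00\n",
    "Feb 25 11:19:00 rescue kernel: ata1: EH complete\n",
]

for line in disk_error_lines(SAMPLE):
    print(line)
```

On the server, the input would be the open file itself, e.g. with open("/var/log/messages") as f: print("\n".join(disk_error_lines(f))).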
So now I get a bit of blue air out of my system and reply with:
Quote Originally Posted by Me to Support
I've provided the information; you "briefly scanned through it". I shall paste the specific points below again for your attention:

SMART Short Test Results:

196 Reallocated_Event_Count 0x0032   068   068   000    Old_age   Always       -       749
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       56

SMART Long Test Results:

Error 1162 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f8 89 49 00 Error: UNC 8 sectors at LBA = 0x004989f8 = 4819448

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 f8 89 49 e0 00 00:09:31.999 READ DMA
c8 00 08 f0 89 49 e0 00 00:09:31.999 READ DMA
ca 00 08 f0 89 49 e0 00 00:09:31.999 WRITE DMA
ef 10 02 00 00 00 a0 00 00:09:31.991 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 00:09:31.990 READ NATIVE MAX ADDRESS EXT

From /var/log/messages:

Feb 25 11:18:57 rescue kernel: Sense Key : Medium Error [current] [descriptor]
Feb 25 11:18:57 rescue kernel: Descriptor sense data with sense descriptors (in hex):
Feb 25 11:18:57 rescue kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 25 11:18:57 rescue kernel: 00 49 89 e8
Feb 25 11:18:57 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:18:57 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Feb 25 11:18:57 rescue kernel: sd 0:0:0:0: [sda] CDB:
Feb 25 11:18:57 rescue kernel: Read(10): 28 00 00 49 89 e8 00 00 08 00
Feb 25 11:18:57 rescue kernel: ata1: EH complete
Feb 25 11:19:00 rescue kernel: ata1.00: configured for UDMA/133
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda] Unhandled sense code
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:00 rescue kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:00 rescue kernel: Sense Key : Medium Error [current] [descriptor]
Feb 25 11:19:00 rescue kernel: Descriptor sense data with sense descriptors (in hex):
Feb 25 11:19:00 rescue kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 25 11:19:00 rescue kernel: 00 49 89 f0
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:00 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Feb 25 11:19:00 rescue kernel: sd 0:0:0:0: [sda] CDB:
Feb 25 11:19:00 rescue kernel: Read(10): 28 00 00 49 89 f0 00 00 08 00
Feb 25 11:19:00 rescue kernel: ata1: EH complete
Feb 25 11:19:03 rescue kernel: ata1.00: configured for UDMA/133
Feb 25 11:19:03 rescue kernel: sd 0:0:0:0: [sda] Unhandled sense code
Feb 25 11:19:03 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:03 rescue kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 25 11:19:03 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:03 rescue kernel: Sense Key : Medium Error [current] [descriptor]
Feb 25 11:19:03 rescue kernel: Descriptor sense data with sense descriptors (in hex):
Feb 25 11:19:03 rescue kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 25 11:19:03 rescue kernel: 00 49 89 f8
Feb 25 11:19:03 rescue kernel: sd 0:0:0:0: [sda]
Feb 25 11:19:03 rescue kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Feb 25 11:19:03 rescue kernel: sd 0:0:0:0: [sda] CDB:
Feb 25 11:19:03 rescue kernel: Read(10): 28 00 00 49 89 f8 00 00 08 00
Feb 25 11:19:03 rescue kernel: ata1: EH complete

From webmin UI:
Errors logged: 1167 errors detected
Raw Read Error Rate: 4294967295
Reallocated Sector Ct: 575
Current Pending Sector: 52

Full SMART attributes:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   016    Pre-fail  Always   FAILING_NOW 4294967295
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   181   181   024    Pre-fail  Always       -       300 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   075   075   005    Pre-fail  Always       -       575
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       566
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       38
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0032   067   067   000    Old_age   Always       -       763
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       52
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0


All of this has been provided in previous tickets. I will again remove the SOFTWARE RAID from sda, so the drive can be replaced ASAP and rebuilt upon completion of the replacement.

Many Thanks,
And I am still waiting, more than 24 hours later...