
why is a critical issue taking almost 12 hours without a fix?


Neil
08-04-2010, 12:16
Quote Originally Posted by HandsomeChap
Ok so if we provide the log and tick a box then even though the server is still technically available and pinging, etc, the replacement will still be completed within 1 hour as opposed to 12?

Useful to know in case the worst should ever happen!
It may be done in 1 hour, but we do not guarantee that. You need to look at the 'Repair' times under Level 1, which for HG, MG and EG servers is 2 hours. If a technician has to do some pre-checks first, you can add the 1-hour intervention time, which makes 3 hours in total; after that we are breaking the SLA. For SP servers the repair time is 4 hours.

If your server is in RAID 0, you will need to write on the ticket that you have made a backup and are ready for the drives to be replaced.
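
To make the arithmetic above concrete, here is a minimal sketch of how the worst-case Level 1 deadline adds up. The hour figures are the ones quoted in this post; the function name, the lookup table and the example timestamp are made up for illustration and are not an OVH tool. Whether the intervention hour is also added on top of the 4-hour SP repair time is not stated in the post, so treat that case as an assumption.

```python
from datetime import datetime, timedelta

# Repair (GTR) times in hours, as quoted in the post above.
REPAIR_HOURS = {"HG": 2, "MG": 2, "EG": 2, "SP": 4}
# Intervention (GTI) time in hours, added when pre-checks are needed.
INTERVENTION_HOURS = 1


def level1_deadline(opened_at, server_range, pre_checks=True):
    """Return the latest time a Level 1 repair should be finished by."""
    hours = REPAIR_HOURS[server_range]
    if pre_checks:
        hours += INTERVENTION_HOURS
    return opened_at + timedelta(hours=hours)


# Hypothetical intervention opened at 14:20 on an EG server.
opened = datetime(2010, 4, 7, 14, 20)
print(level1_deadline(opened, "EG"))  # 2010-04-07 17:20:00
```

With pre-checks an EG server gets 1 + 2 = 3 hours, which matches the "3 hours in total" mentioned above.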

HandsomeChap
08-04-2010, 12:01
Ok so if we provide the log and tick a box then even though the server is still technically available and pinging, etc, the replacement will still be completed within 1 hour as opposed to 12?

Useful to know in case the worst should ever happen!

Neil
08-04-2010, 11:09
Quote Originally Posted by HandsomeChap
So just to confirm: anyone using anything other than RAID 0 (i.e. people being sensible with their data) who has a single drive failure will never come under the Level 1 SLA for a hard drive repair, only the Level 2 guarantee (within 12 hours)?
Unless you supply the smartctl log and tick the box that I mentioned in my previous post, which automatically creates the intervention (and makes it Level 1), it will have to be processed by our incident team, which makes it Level 2.

HandsomeChap
08-04-2010, 10:05
So just to confirm: anyone using anything other than RAID 0 (i.e. people being sensible with their data) who has a single drive failure will never come under the Level 1 SLA for a hard drive repair, only the Level 2 guarantee (within 12 hours)?

Neil
08-04-2010, 09:37
Quote Originally Posted by turbanator
I don't want to sound too persuasive, but this is a critical hardware fault and I opened the ticket 24 hours ago. How much longer will I have to wait before I can start using my server to its full capability?

SLA

If your server becomes unavailable, Ovh guarantees intervention and repair time on levels 1 and 2, from 30 minutes to 4 hours, 24/7. The availability of the network is 99.9%, 99.95% and 99.99% monthly. In case of non-compliance with the SLA, penalties are automatically calculated.

Level 1
(Unavailable server, faulty component)
- Intervention (GTI) 1 hour
- Repair (GTR) 2 hours

As you can see, I have a Level 1 fault. I should not be waiting more than 3 hours, and it has already been 2 hours and sadly nothing has been started.

And my server is pinging only because it is in rescue mode. It won't if it is in HD boot.
All finished at 18:55:14. Regarding the SLA, it comes under Level 2 because the server did not go offline (the operating system is on RAID 1). If you want your ticket to be handled more quickly, then when opening it, copy the smartctl log shown in Rescue Mode into the ticket and tick the box that has the following statement:

By checking this box, you confirm the request for intervention on the server and authorise OVH staff to intervene on the dedicated server. I also have been warned by OVH the possible consequences of this intervention, including the risk of losing all or part of the data stored on the server. OVH can not be liable for loss or unavailability of such data.
This will then automatically open an intervention which will go straight through to the datacentre and they will replace the hard drive.

turbanator
07-04-2010, 18:02
OK, they have intervened and are replacing the drive. I will let you know when an update is sent.

turbanator
07-04-2010, 17:24
Quote Originally Posted by Neil
As soon as a technician is free, you will receive a notification on the ticket when they are about to do it.
I don't want to sound too persuasive, but this is a critical hardware fault and I opened the ticket 24 hours ago. How much longer will I have to wait before I can start using my server to its full capability?

SLA

If your server becomes unavailable, Ovh guarantees intervention and repair time on levels 1 and 2, from 30 minutes to 4 hours, 24/7. The availability of the network is 99.9%, 99.95% and 99.99% monthly. In case of non-compliance with the SLA, penalties are automatically calculated.

Level 1
(Unavailable server, faulty component)
- Intervention (GTI) 1 hour
- Repair (GTR) 2 hours

As you can see, I have a Level 1 fault. I should not be waiting more than 3 hours, and it has already been 2 hours and sadly nothing has been started.

And my server is pinging only because it is in rescue mode. It won't if it is in HD boot.

Neil
07-04-2010, 16:53
Quote Originally Posted by turbanator
How long does it take for them to replace the drive?
As soon as a technician is free, you will receive a notification on the ticket when they are about to do it.

turbanator
07-04-2010, 16:22
How long does it take for them to replace the drive?

Neil
07-04-2010, 15:05
Hi

Well, if you had pasted the Rescue Mode log, then it would have been something like this:

smartctl --all /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar T7K250 series
Device Model: HDT722525DLA380
Serial Number: VDS41DT4ELXJNJ
Firmware Version: V44OA96A
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 1
Local Time is: Wed Apr 7 16:00:35 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

This way we have the model and serial number, so we know which drive to replace. An intervention is scheduled, and hopefully they will replace the right one. Also, since your home directory was in RAID 0, the data has been lost. Next time, please send an email rather than posting on the forum to complain about a ticket that is not complete.
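
As a rough illustration of why that log matters, the sketch below pulls the two identifying fields out of text shaped like the smartctl output quoted above. It assumes the output has already been saved as a string (for example from `smartctl --all /dev/sda`, as in the log); the function name and the `WANTED` tuple are hypothetical helpers, not part of smartmontools.

```python
# Fields from the information section that identify the physical drive.
WANTED = ("Device Model", "Serial Number")


def drive_identity(smartctl_output):
    """Return the model and serial number found in smartctl text output."""
    found = {}
    for line in smartctl_output.splitlines():
        # Split on the first colon only; values may contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            found[key.strip()] = value.strip()
    return found


# A few lines copied from the log above.
sample = """\
Model Family: Hitachi Deskstar T7K250 series
Device Model: HDT722525DLA380
Serial Number: VDS41DT4ELXJNJ
Firmware Version: V44OA96A
"""
print(drive_identity(sample))
# {'Device Model': 'HDT722525DLA380', 'Serial Number': 'VDS41DT4ELXJNJ'}
```

Pasting these two values into the ticket is what lets the datacentre match the request to a physical disk.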

turbanator
07-04-2010, 14:57
Quote Originally Posted by Neil
Hi

Your server is online and pinging, and you have supplied details about which hard drive has failed. If you were in Rescue Mode you should have copied the smartctl log, which would show you which hard drive has failed (if any). It may be that your RAID just needs rebuilding.

That is the first thing I pasted when I opened the ticket, Neil. If you look below, it's there:


The server does not boot via HD boot or NetBoot. Rescue mode pro shows:

RAID
State of md1 (RAID 1) : degraded

Disk sda (698.6 GB) has a RED background [ERROR]

All the SMART data is showing old_age or pre_fail

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   233   230   021    Pre-fail  Always       -       8325
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4101
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   100   093   000    Old_age   Always       -       50
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

Disk SDA needs to be replaced ASAP.

Thanks.


What more should I paste? It took me 10 tries to get the server rebooted.
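
For anyone reading an attribute table like the one in this ticket: in smartctl output the TYPE column (Pre-fail or Old_age) only names the category of each attribute. An attribute counts as failed when its normalised VALUE is at or below THRESH, and WHEN_FAILED shows "-" when that has never happened. The sketch below applies that rule to a few rows; the function name is hypothetical, and it assumes each attribute sits on a single line with the usual ten columns. A clean result here does not rule out a faulty disk, it only summarises what the table itself reports.

```python
def failed_attributes(table_text):
    """Return names of attributes at or below threshold, or flagged as failed."""
    failed = []
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        when_failed = fields[8]
        if value <= thresh or when_failed != "-":
            failed.append(name)
    return failed


# Three rows taken from the ticket text above.
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 100 093 000 Old_age Always - 50
"""
print(failed_attributes(sample))  # []
```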

Neil
07-04-2010, 14:53
Hi

Your server is online and pinging, and you have supplied details about which hard drive has failed. If you were in Rescue Mode you should have copied the smartctl log, which would show you which hard drive has failed (if any). It may be that your RAID just needs rebuilding.

turbanator
07-04-2010, 14:20
Can a member of OVH staff please check ticket number 413822?

I have pasted the results showing my HDD failing, but there has been no response to it. It needs to be replaced urgently.