OVH Community, your new community space.

Is this disk broken?


wminside
29-06-2010, 20:49
Quote Originally Posted by marks
Obviously, those commands don't change a thing. They're just reading information.

If you tell me the ticket number, I could try to see if there is more info.
The ticket ID is 480722. I really appreciate it, Mark.

marks
29-06-2010, 09:19
Obviously, those commands don't change a thing. They're just reading information.

If you tell me the ticket number, I could try to see if there is more info.

wminside
28-06-2010, 20:52
OVH's tech team already fixed it. But they haven't told me yet what was wrong.

In the tickets they only say "software diagnosis".

After taking a look at the history on the server, this is all they do:
Code:
  517  cd /boot/grub/
  518  cat grub.conf
  519  fdisk -l
They do just that every time. Now it works. What's that supposed to fix? They are just checking my grub config

marks
28-06-2010, 15:43
Everything looks all right.

Soft RAID has no errors, so it doesn't look like there is any problem there.

Could you give more details? If the server doesn't boot up from the HD, put it back in rescue mode and check the logs for information on why it didn't boot.

wminside
28-06-2010, 13:34
Code:
root@rescue:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Thu Apr 15 21:13:57 2010
     Raid Level : raid1
     Array Size : 205696 (200.91 MiB 210.63 MB)
  Used Dev Size : 205696 (200.91 MiB 210.63 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jun 28 14:20:03 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 917de67b:7048b473:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
         Events : 0.17920

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
root@rescue:~# mdadm -D /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Thu Apr 15 21:14:03 2010
     Raid Level : raid1
     Array Size : 77418432 (73.83 GiB 79.28 GB)
  Used Dev Size : 77418432 (73.83 GiB 79.28 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Jun 28 14:20:11 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 692b3be2:30b97038:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
         Events : 0.3501908

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
Code:
root@rescue:~# fdisk -l

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          26      205793   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2              26        9664    77418496   fd  Linux raid autodetect
/dev/sda3            9664        9729      523872   82  Linux swap / Solaris

Disk /dev/sdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0001dd95

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          26      205793   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2              26        9664    77418496   fd  Linux raid autodetect
/dev/sdb3            9664        9729      523872   82  Linux swap / Solaris

Disk /dev/md2: 79.2 GB, 79276474368 bytes
2 heads, 4 sectors/track, 19354608 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md1: 210 MB, 210632704 bytes
2 heads, 4 sectors/track, 51424 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table
Now it doesn't boot from the HD.

Do I need to install grub on /dev/sda1 and set it up to be a boot partition?

marks
28-06-2010, 12:35
The hardware of the disk is fine.

Maybe you can check the state of the software RAID:

# mdadm -D /dev/md1

Post that output plus the output of fdisk -l so we can check further.
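
Before digging into the full mdadm -D output, /proc/mdstat gives a quick overview: a member map like [UU] means both halves of the mirror are present, while [_U] means one is missing. A minimal sketch of that check, assuming the usual /proc/mdstat layout for RAID1; `md_degraded` is a hypothetical helper name, not an mdadm feature:

```shell
# md_degraded: read /proc/mdstat-style text on stdin and print the name of
# every array whose member map (e.g. [_U]) shows a missing device.
# Hypothetical helper; assumes the standard mdstat layout.
md_degraded() {
    awk '
        # An array header looks like: "md2 : active raid1 sdb2[1]"
        /^md[0-9]+ :/ { name = $1 }
        # The status line that follows ends in a map such as [UU] or [_U];
        # an underscore marks a missing member.
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}

# Usage: md_degraded < /proc/mdstat
```

No output means every array reports all of its members.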

wminside
28-06-2010, 12:03
Edit: Trying what marks just posted.

Code:
root@rescue:~# smartctl --attributes /dev/sda
smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       781
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       38
225 Load_Cycle_Count        0x0030   200   200   000    Old_age   Offline      -       4321
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       2034
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       2
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3054073515
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   099   099   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0
root@rescue:~# smartctl --health /dev/sda
smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
It took like two seconds to complete both "tests".

I believe this just fixed it:

Code:
sfdisk -d /dev/sdb | sfdisk /dev/sda --force
mdadm --add /dev/md1 /dev/sda1
mdadm --add /dev/md2 /dev/sda2
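
After re-adding the partitions, the kernel rebuilds the mirror in the background, and /proc/mdstat shows a progress line until it finishes. A small sketch for pulling out that percentage, assuming the usual "recovery = NN%" or "resync = NN%" line format; `md_resync_progress` is a hypothetical helper name:

```shell
# md_resync_progress: read /proc/mdstat-style text on stdin and print
# "<array> <percent>" for every array that is currently rebuilding.
# Hypothetical helper; assumes the standard mdstat progress line.
md_resync_progress() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # Progress line, e.g. "[=>....]  recovery =  8.5% (...) finish=..."
        name != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print name, $i; break }
        }
    '
}

# Usage: md_resync_progress < /proc/mdstat
```

Once it prints nothing and both arrays show [UU], the rebuild should be done.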

marks
28-06-2010, 11:56
Yes, smartctl has a bug when running on SSD drives, and that's the reason the drive shows in red. It's just an incompatibility with the standard smartctl commands.

You can use these commands:

# smartctl --attributes /dev/sda - displays all SMART attributes
# smartctl --health /dev/sda - displays overall SMART health status of the drive
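
As a rough guide to reading the attribute table: an attribute is in trouble when its normalized VALUE has dropped to or below a non-zero THRESH. A minimal sketch of that check, assuming the 10-column layout that smartctl 5.40 prints in this thread; `smart_failing` is a made-up helper name, not part of smartmontools:

```shell
# smart_failing: read `smartctl --attributes` output on stdin and print the
# name of every attribute whose normalized VALUE is at or below a non-zero
# THRESH. Hypothetical helper; assumes the column layout shown in this thread.
smart_failing() {
    awk '
        # Attribute rows start with a numeric ID and have 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0
            thresh = $6 + 0
            if (thresh > 0 && value <= thresh) print $2
        }
    '
}

# Usage (as root): smartctl --attributes /dev/sda | smart_failing
```

An empty result only means no attribute has crossed its threshold; it says nothing about problems the drive does not report through SMART.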

Myatu
28-06-2010, 04:42
Looks fine to me. It's an SSD with only 780 working hours, so I doubt it's an "old age" issue.

Boot your system as usual and type "mdadm --add /dev/md* /dev/sda*", where "/dev/md*" is your RAID array (probably 1) and "/dev/sda*" is the matching partition, and it'll resync it with /dev/sdb. If it fails again, it would need deeper investigation. If not, it was probably a haywire bit/byte...

wminside
28-06-2010, 03:36
Quote Originally Posted by yonatan
what did you run on that disk?
how long did it work for you?

broken mirror could happen from Hard Reboot.

I am not sure that smartctl is a good way to check there.

regarding:
http://travaux.ovh.net/?do=details&id=3467

what did they actually fix there?
I took the "all tests" option, but I believe that particular log was there right after I logged into that web panel. There's something wrong with that drive. I mean, it's red in the picture. I bet that isn't good lol.

yonatan
28-06-2010, 02:34
what did you run on that disk?
how long did it work for you?

broken mirror could happen from Hard Reboot.

I am not sure that smartctl is a good way to check there.

regarding:
http://travaux.ovh.net/?do=details&id=3467

what did they actually fix there?

wminside
28-06-2010, 02:11
My RAID1 got degraded. It's displaying sdb but not sda anymore ([_U]), so I just booted into pro-rescue mode. Here is the sda log:

Code:
smartctl 5.40 2010-02-03 r3060 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     INTEL SSDSA2M080G2GC
Serial Number:    CVPO004503E3080BGN
Firmware Version: 2CV102HD
User Capacity:    80,026,361,856 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Mon Jun 28 03:00:15 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (   1) seconds.
Offline data collection
capabilities: 			 (0x75) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (   1) minutes.
Conveyance self-test routine
recommended polling time: 	 (   1) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       780
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       51
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       36
225 Load_Cycle_Count        0x0030   200   200   000    Old_age   Offline      -       4321
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       1706
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       1
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       4127815179
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   099   099   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Here's a screenshot (image not preserved).

I also noticed I have half my usual swap memory:
Code:
Swap:   523864k total
I set it up to have 512MB on each disk.
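
A rough sanity check for the swap question: the fdisk output posted later in this thread lists two swap partitions of 523872 blocks each (sda3 and sdb3), and the reported total of 523864k is about the size of one of them, which would suggest only one is active. A small sketch for listing what is actually in use, assuming the standard five-column /proc/swaps layout; `swap_summary` is a hypothetical helper name:

```shell
# swap_summary: read /proc/swaps-style text on stdin, print each active swap
# area with its size in kB, then the total.
# Hypothetical helper; assumes the standard /proc/swaps columns:
# Filename Type Size Used Priority
swap_summary() {
    awk '
        # Skip the header row; column 3 is the size in kB.
        NR > 1 { total += $3; printf "%s %d kB\n", $1, $3 }
        END    { printf "total %d kB\n", total }
    '
}

# Usage: swap_summary < /proc/swaps
```

If only one partition shows up, the other one is simply not enabled as swap at the moment.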

It's an SSD, btw.

What's going on?