Your SYS server is faulty? You pay for testing it


andy7
02-03-2017, 10:24
The happy ending came, finally.

Support - 28/02/2017 15:17


Dear Customer,

All servers go through a compliance test before we assign the server to
customers.

As mentioned, the logs don't indicate any issues. The rescue mode does make
use of the RAM; however, the usage is minimal, as it only runs the core
services the rescue mode needs to operate.

In your case, whilst I have offered you 50% of the month, I can look to raise
the offer to 100%. If you confirm this, I will apply this to the server.


The current server, ns3029898.ip-188-165-210.eu, has expired and has been
recycled by the system. I can assign you another server of the same
type and specification for 1 month.

Kind Regards,
Danny
SoYouStart Support

Me - 28/02/2017 15:40

Hi Danny,
Yes, it's a satisfactory solution. I can confirm I am interested in staying with SYS if you assign me a replacement server.

Support - 28/02/2017 17:20


Dear Customer,

I will make the arrangements for the new server.

It will be of the same spec. This will be done in the next 48 hours.

It will be set for 1 month.

Kind Regards,
Danny
SoYouStart Support

Support - 28/02/2017 20:12


Dear Customer,

As agreed I have assigned you the new server for 1 month:

ns3001569.ip-37-59-49.eu

Kind Regards,
Danny
SoYouStart Support

andy7
24-02-2017, 21:48
The saga continues.

Support - 21/02/2017 17:06


Dear Customer,

The RAM test is a complete test.

These are the logs from the previous RAM:

---------------------

######################
# Test on thread 1
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 2
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 3
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 4
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 5
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 6
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 7
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 8
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

------------------------------------

What do you mean when you say the RAM test isn't complete? The test covers
the full RAM modules. It is run from the rescue mode to verify the issue.

The RAM replacement we did was not needed, but we did it at your request.

Kind Regards,
Danny
SoYouStart Support

Me - 21/02/2017 17:27

The software you use for testing does not test the RAM occupied by the system, which makes the test incomplete and unreliable. The only proper way to test RAM is to run memtest86 booted directly from the BIOS rather than from within an operating system.

As I said, I have an Ubuntu Xenial deployment which I run successfully on many OVH machines without any issues. The only problem was with that SYS server, and the server started to work properly exactly when you replaced the RAM.


Support - 21/02/2017 20:42


Dear Customer,

This is an extract I have found on the memtest website:

As much as possible of the system memory is tested. Unfortunately memtest86+
can usually not test all of the memory. The reason for this is that todays
processors have become so complex that they require a small amount of memory
to keep accounting data of the processor state. If memtest were to write
over these areas the state of the processor becomes invalid and it's
behaviour unpredictable. Alas it is also impossible to relocate these areas
in the memory.

The rescue mode has a self-relocating environment whereby it would move to a
different part of the RAM. Whilst the system does this, it's still not 100% of
the RAM, but it's sufficient to determine if the RAM has issues.

Kind Regards,
Danny
SoYouStart Support

Me - 21/02/2017 21:12

"The rescue mode has a self-relocating environment whereby it would move to a different part of the RAM": this is simply not true; the Linux kernel does not relocate anywhere. With your simple memtest you are not even checking all available memory, because the Linux kernel's memory design makes that impossible.

It's also possible that your memory was not properly inserted into the sockets and that, by inserting it again, you corrected the situation.

You did not quote the whole memtest86 description: "If this part of the memory is defective you will know soon enough though as the processor, or parts of the processor simply won't work correctly if this part of your memory is defective." Because you have changed or at least re-inserted the RAM and my server no longer restarts, it simply means that there are no more problems with the hardware and that this small amount of memory is not faulty.

I was not able to use the server for the month I paid for, and once the memory was replaced the server started to behave normally, as it should.

You have two options: keep me as a paying customer by refunding the month I spent discovering hardware problems, or suspend my server, in which case I will not pay for further months, will be unhappy, and will never come back to SYS again.

Support - 22/02/2017 15:14


Dear Customer,

If the memory wasn't inserted properly, the RAM wouldn't be detected, or
the memory test would show errors or incorrect RAM values.

As for the relocating, it's the test itself which manages the RAM, rather
than the kernel.

Whilst part of the rescue system uses the RAM, it is only a small amount and it
doesn't use the whole stick of RAM in the server. If that very same part of
the RAM were faulty, the logs would have indicated the issue.

Currently your server has been suspended due to billing actions rather than
any action I have taken.

In light of this, I do wish to find you a resolution. I can look to offer you
50% of the payments on your next renewal upon your confirmation.

Kind Regards,
Danny
SoYouStart Support

Me - 25/02/2017 03:44

Sorry, but you have no idea what you are writing about. Your rescue mode takes a huge amount of RAM, including the kernel, web server, sshd server and so on.

50% is not a satisfactory answer for a rental in which I was able to use my server only for the last 3 days of the contract, just to test that it was finally working.

If you offered 150%, that would be the correct answer, but I can make a concession and accept a 100% refund for the period when I was not able to use the server.

heise
21-02-2017, 17:48
Don't expect too much from SoYouStart. I also had a server running MariaDB exclusively that rebooted every few days. I just ordered a new one after the third reboot. I ran their tests and opened a ticket. Since their tests didn't show any errors and the reboots happened less than daily, it was for sure a hard-to-diagnose hardware problem. They didn't fix the problem before the end of the month, so I feel sorry for the customer after me who will experience the same hardware problems...

Dani
21-02-2017, 17:06
Hi

The server has been suspended for billing reasons, as it expired a few days ago.

In relation to the ticket itself, the logs which we have provided show that the server passed the memory test for the RAM.

You can reply to the ticket if you wish to have more information regarding the server.

Thanks

Danny

andy7
21-02-2017, 10:27
Update: my server got suspended.

I paid 30 quid and spent one month fixing OVH hardware, unbelievable!

andy7
17-02-2017, 17:04
I am attaching a conversation between me and SYS 'support'. In summary: it seems that they want me to pay for diagnosing their faulty hardware myself. Mind the delays in contact from 'support'.

EDIT 01/03/2017 - The issue was resolved, I got a working replacement and was refunded for a month I could not use the server.

Me - 26/01/2017 07:32

Hi, it seems that my server is being forcefully reset by OVH without any apparent reason. Could you please check your logs to tell me what is going on?


Support - 26/01/2017 13:17


Dear Customer,

What do you mean by force reset?

On our side I am unable to locate any force reset.

Can you provide me with more information?

Kind Regards,
Danny
SoYouStart Support

Me - 27/01/2017 08:36

OK - if you exclude any restarts forced by OVH, the situation is even worse.

My server restarts at random intervals, usually just after boot. Sometimes I'm able to log in after boot, but usually only for a very short time, and then it reboots again.

It's a pretty standard build based on Xenial, which I run successfully on other machines, both OVH and Kimsufi, without any issues.

This server is the only one which behaves unpredictably, so I tend to think that there's something wrong with the hardware. Why hardware? Because it is completely random: the intervals between reboots are random.

What can we do in this situation? Are you able to deliver a replacement server? If I were you, I would boot this server with memtest86 for a few cycles, as such random reboots usually result from misbehaving RAM.

In summary: it's impossible to use the current hardware and we must sort it out.

Support - 27/01/2017 08:41


Dear Customer,

Yes, a hardware issue is the logical explanation for the issue.

You can boot the server into the rescue environment to test the hardware.

Manager >> Net boot >> rescue64-pro.

This will allow you to get the logs.

If the hardware is defective, we will look to replace it.

Kind Regards,
Danny
SoYouStart Support

Me - 27/01/2017 17:27

Hi, yeah, I know how to use the rescue CD, but there is nothing in the logs, as the server resets itself instantly. From my experience it's usually a result of bad RAM.

I do not clearly understand how I could test the RAM by myself from rescue64, as memtest86 does not allow networking to run. Memtest86 usually requires a clean boot from the BIOS, with KVM for example, which is not available on my server.

Could you please change the hardware, then, possibly keeping the disk drives?

Thanks,

Me - 27/01/2017 23:30

To be clear: the server is in rescue mode, waiting for your intervention with hardware.

Me - 29/01/2017 21:26

I do not clearly understand. You promise a 99.9% SLA.

The facts are: you have delivered a server which restarts itself under a normal workload, probably because of faulty RAM.

You are responding to my support requests with delays of at least 12 hours.

And currently the delay since our last conversation is longer than 48 hours.

It is SLA = 0%.

I hope you will finally deliver a working server on Monday and fully refund the time since server delivery, which I have spent diagnosing the RAM faults I have experienced without having the proper tools for doing so, i.e. without KVM access.

Let's have a new beginning. I hope this time the server will run smoothly.
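For reference, the arithmetic behind a 99.9% figure can be sketched as below. This only illustrates the numbers: what the provider's SLA actually covers (hardware availability, support response times, or neither) is not stated in the thread, and the 30-day period and function names are assumptions.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Downtime budget, in minutes, for a given SLA over the period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


def availability_percent(downtime_hours, period_days=30):
    """Availability achieved if the service was down for downtime_hours."""
    period_hours = period_days * 24
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    # 99.9% over an assumed 30-day month leaves about 43 minutes of downtime.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes allowed")
    # A 48-hour outage in the same month corresponds to roughly 93.3%.
    print(f"{availability_percent(48):.1f}% availability")
```

By that measure a single 48-hour gap already exceeds a 99.9% monthly budget many times over, although it is well short of the "0%" claimed above.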

Support - 30/01/2017 09:33


Dear Customer,

The issue may be due to the RAM.

The rescue mode allows a RAM test. Currently the RAM test is in progress.

I also checked the drives and they appear to be healthy.

If the RAM is faulty, we can look to replace the RAM itself.

Kind Regards,
Danny
SoYouStart Support

Me - 31/01/2017 05:42

OK, could you please tell me the results of the RAM test? The drives were OK, no problems with them.

Support - 31/01/2017 08:29


Dear Customer,

Here are the drive logs:

sda

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.14.77-mod-std-ipv6-64-rescue]
(local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, [1]www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Intel 730 and DC S3500/S3700 Series SSDs
Device Model: INTEL SSDSC2BB240G4
Serial Number: BTWL34130B0S240NGN
LU WWN Device Id: 5 5cd2e4 04b4e36d2
Firmware Version: D2010370
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Jan 31 10:20:06 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2) seconds.
Offline data collection
capabilities: (0x79) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       14424
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       160
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       156
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       634 (81 4824)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   079   072   000    Old_age   Always       -       21 (Min/Max 18/28)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       156
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       30
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       799008
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       4638
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       863574
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   096   096   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       799008
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       762347

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 14117 -
# 2 Short offline Completed without error 00% 14115 -
# 3 Short offline Completed without error 00% 14115 -
# 4 Short offline Completed without error 00% 14066 -
# 5 Short offline Completed without error 00% 8700 -
# 6 Short offline Completed without error 00% 8698 -
# 7 Short offline Completed without error 00% 8698 -
# 8 Short offline Completed without error 00% 8601 -
# 9 Short offline Completed without error 00% 8599 -
#10 Short offline Completed without error 00% 8599 -
#11 Short offline Completed without error 00% 8596 -
#12 Short offline Completed without error 00% 8595 -
#13 Short offline Completed without error 00% 19 -
#14 Short offline Completed without error 00% 19 -
#15 Short offline Completed without error 00% 18 -
#16 Short offline Completed without error 00% 18 -
#17 Short offline Completed without error 00% 14 -
#18 Short offline Completed without error 00% 13 -
#19 Short offline Completed without error 00% 13 -
#20 Short offline Completed without error 00% 5 -
#21 Short offline Completed without error 00% 5 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

sdb

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.14.77-mod-std-ipv6-64-rescue]
(local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, [2]www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BB240G6
Serial Number: PHWA6115003Z240AGN
LU WWN Device Id: 5 5cd2e4 04c71f2e1
Firmware Version: G2010150
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Jan 31 10:20:06 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2) seconds.
Offline data collection
capabilities: (0x79) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       311
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       96
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       94
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       4612755750
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   066   000    Old_age   Always       -       27 (Min/Max 24/34)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       94
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       28044
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       102
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       46
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18391
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       28044
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       33137
243 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       33817

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3 -
# 2 Short offline Completed without error 00% 0 -
# 3 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

RAM logs:

######################
# Test on thread 1
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 2
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 3
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 4
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 5
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 6
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 7
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

######################
# Test on thread 8
######################
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 3832MB (4018143232 bytes)
got 3832MB (4018143232 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.

All logs indicate the components are operational.

Kind Regards,
Danny
SoYouStart Support



[1] http://www.smartmontools.org
[2] http://www.smartmontools.org
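A quick way to sanity-check a smartctl attribute table like the ones in this message is to parse the rows and flag a few error counters whose raw value is not zero. This is a rough sketch: the set of watched attribute IDs is an illustrative choice, not smartmontools guidance, the helper names are hypothetical, and it assumes each attribute sits on a single line (rows wrapped across two lines have to be joined first).

```python
# A few rows from the sda attribute table in the ticket, one per line.
SAMPLE_TABLE = """\
  5 Reallocated_Sector_Ct   0x0032 100 100 000 Old_age  Always - 0
  9 Power_On_Hours          0x0032 100 100 000 Old_age  Always - 14424
184 End-to-End_Error        0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always - 0
197 Current_Pending_Sector  0x0032 100 100 000 Old_age  Always - 0
199 CRC_Error_Count         0x003e 100 100 000 Old_age  Always - 0
"""

# ASSUMPTION: counters that would normally stay at 0 on a healthy drive.
WATCHED_IDS = {5, 184, 187, 197, 199}


def parse_attributes(table):
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (only the first raw token is used).
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def suspicious(table):
    """Names of watched attributes whose raw value is not zero."""
    attrs = parse_attributes(table)
    return [name for attr_id, (name, raw) in sorted(attrs.items())
            if attr_id in WATCHED_IDS and raw != 0]


if __name__ == "__main__":
    print(suspicious(SAMPLE_TABLE) or "no watched counter is non-zero")
```

Clean SMART counters only describe the drives; they say nothing about the state of the RAM.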

Me - 01/02/2017 07:07

Yeah, I have repeated this test and it did not detect any errors for me either. But the server still restarts itself, so, given the nature of that particular test, there is a chance that it does not test all of the memory. The thing is that I am not able to see anything in the logs, as the server restarts instantly. My last idea is that you connect a KVM at your cost to allow me to understand the reasons. As I said previously, I have the same Linux build on at least 5 other OVH machines and they work properly without a single issue.

Me - 02/02/2017 06:37

The other option: if you are able to provide a replacement machine, I can install the server from scratch.

Support - 02/02/2017 08:54


Dear Customer,

On the same server this is the current time:

10:52:14 up 1 day, 17:31,

The server has been online without any restarts for 24 hours.

We can look to leave the server in this rescue mode for another 24 hours to see
if it reboots. Currently I am still unable to locate the issue. If I am still
unable to resolve the issue I will forward the ticket to a higher level team
for further checks.

Kind Regards,
Danny
SoYouStart Support

Me - 02/02/2017 17:58

The thing is that in rescue mode the server works properly, but the kernel you provide is old and the load is artificial.

Previously it rebooted itself at the beginning, after I installed my system, and later it worked properly for a few days without load, when I did not use it.

But when I started to use the server, which means heavy RAM operations, it started to reboot again, and now it reboots immediately after reaching init level 3, without leaving any traces in the logs.

Support - 03/02/2017 09:09


Dear Customer,

For your server you can change the kernel to the OS one if you wish.

By default we load the OVH kernel.

As for the server hardware, whilst we don't see any issues I can replace the
RAM as a preventive measure.

Kind Regards,
Danny
SoYouStart Support

Me - 09/02/2017 07:26

Hi, please replace the RAM or provide a new machine. Using the OVH kernel is not possible in my case.

BTW, I rented an OVH Cloud Instance just a few days ago and the same setup works fine, so this SoYouStart server is the only one out of dozens of machines that is not working properly. I'm not able to use it because of constant restarts.

Support - 14/02/2017 14:53


Dear Customer,

When is a good time for us to replace the RAM?

For the kernel you can just use the stock kernel from the OS.

Kind Regards,
Danny
SoYouStart Support

Me - 14/02/2017 16:30

Any time. I'm not able to use my server, and it has been in rescue mode since the date I created this ticket.

Support - 14/02/2017 17:28

Hello,

The RAM will be replaced shortly.

For any other questions or concerns, please feel free to contact us through a
support ticket or by phone at 0333 370 0427. We’re here to help you!

We thank you again for choosing SoyouStart,

Danny
SoyouStart UK Support

Me - 17/02/2017 10:17

Hi, after you changed the RAM, the server is working properly. Thanks.

But I was not able to use it before, as it was broken. Could you please extend the rental period by one month?

Support - 17/02/2017 14:30


Dear Customer,

The logs which I have provided previously show the RAM on the server as
healthy.

The test is an industry standard test whereby the RAM is tested on all
aspects.

I did the RAM replacement as an exception. I'm not sure what the extended
rental period is for.

Kind Regards,
Danny
SoYouStart Support

Me - 17/02/2017 15:48

On 17th of January you provided me with a broken server. I spent dozens of hours diagnosing why it reboots itself and then contacted you for support. The RAM tests you ran were insufficient to diagnose the fault, as they are incomplete: they do not test the full RAM. After three weeks of conversations you finally fixed the hardware and delivered a working server. I do not plan to pay 30 pounds for testing your hardware; it is definitely you who is going to pay for it.