OVH Community, your new community space.

Problem with write speed degrading over days/weeks down to 4MB/sec or less.


AshleyUk
17-02-2015, 22:21
What is the output of cat /proc/mdstat now and when you next have the issue?

Ashley
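A minimal sketch for capturing that output with a timestamp, so that a snapshot taken while writes are healthy can be compared with one taken during a slowdown. It assumes Linux md arrays and a POSIX shell, and `md_busy` and `md_snapshot` are hypothetical helper names, not part of mdadm. `md_busy` lists any array whose /proc/mdstat entry shows a resync, recovery, check or reshape in progress, since that kind of background work competes with normal I/O; `md_snapshot` also reads each array's `sync_action` file from sysfs.

```shell
# Hypothetical helpers for comparing md RAID state over time (Linux md only).
# Source this file into a shell, then call the functions.

# md_busy [FILE]: print "<array> <action> <progress>" for every array whose
# mdstat entry shows background work. FILE defaults to /proc/mdstat;
# pass "-" to read from standard input.
md_busy() {
  awk '
    /^md[0-9]+ :/ { dev = $1 }                # remember the current array name
    /(resync|recovery|check|reshape) *=/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^(resync|recovery|check|reshape)$/)
          print dev, $i, $(i + 2)             # array, action, percent done
    }' "${1:-/proc/mdstat}"
}

# md_snapshot: timestamp, raw /proc/mdstat, each array's sync_action
# (idle, resync, check, ...), then any busy arrays found by md_busy.
md_snapshot() {
  date '+%Y-%m-%d %H:%M:%S'
  cat /proc/mdstat
  for f in /sys/block/md*/md/sync_action; do
    [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
  done
  md_busy
}
```

Running `md_snapshot > mdstat-ok.txt` now and `md_snapshot > mdstat-slow.txt` during the next slowdown gives two files that can be compared with `diff`.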

heise
17-02-2015, 16:02
Quote Originally Posted by mgould73
/# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 23584 MB in 2.00 seconds = 11807.91 MB/sec
Timing buffered disk reads: 428 MB in 3.00 seconds = 142.48 MB/sec

/# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads: 23636 MB in 2.00 seconds = 11833.19 MB/sec
Timing buffered disk reads: 436 MB in 3.00 seconds = 145.32 MB/sec
root@ns304751:/#
This would indicate that both HDDs are fine even after days of running, while as a RAID they cannot perform as expected. That would point to a software problem.

marks
17-02-2015, 12:37
Just my little contribution, a bit different from the others: I agree that if the problem is fixed by a software reboot, it seems to point to a software issue. But if the problem appeared only recently, I understand why you're trying to see if there is a hardware issue.

To be honest, if you get stuck, one way to move forward would be to reinstall the OS. If the issue then recurs in exactly the same way, it is probably hardware.

Careimages
17-02-2015, 00:29
I just think it's unlikely that many people run a configuration where different partitions of the same physical disks are used with different Raid strategies, so there's a greater chance of hitting an obscure bug in the software raid controller (or some other associated bit of software) with such a setup. As I said in my first message, any problem that is cleared (even if only temporarily) by a reboot definitely implies a software problem rather than a hardware one. Perhaps some sort of buffer thrashing happens after a certain number of reads/writes? I'm just throwing out suggestions; I'm no expert on the intricacies of the raid controller code.

But if it's disk capacity rather than data security you're after, why run Raid at all? You're unlikely to see much, if any, benefit from Raid 0 on these types of server unless your application is very specialised.

mgould73
16-02-2015, 23:56
Quote Originally Posted by Careimages
There could well be an issue with using two different software raid types on the same physical disks. What is the thinking behind using Raid 0 for /home? With modern disk speeds Raid 0 strikes me as pretty pointless.
It's software raid, and mixing raid types should never cause problems. You can have many partitions striping or mirroring on the same disks.

I have over 50 servers, never had a problem like this. Plus this server was working fine for 6 months, until just recently.

And people use raid 0 striping for increased space/speed. For you it may be pointless, but for others it can save the cost of other servers with larger disks (as long as you're not worried about the data), as well as increase disk throughput.

Careimages
16-02-2015, 21:14
There could well be an issue with using two different software raid types on the same physical disks. What is the thinking behind using Raid 0 for /home? With modern disk speeds Raid 0 strikes me as pretty pointless.

mgould73
16-02-2015, 20:56
Quote Originally Posted by heise
What do you get after a few days of running your server for

hdparm -tT /dev/sda
hdparm -tT /dev/sdb
???
/# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads: 23584 MB in 2.00 seconds = 11807.91 MB/sec
Timing buffered disk reads: 428 MB in 3.00 seconds = 142.48 MB/sec

/# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads: 23636 MB in 2.00 seconds = 11833.19 MB/sec
Timing buffered disk reads: 436 MB in 3.00 seconds = 145.32 MB/sec
root@ns304751:/#

heise
16-02-2015, 20:52
What do you get after a few days of running your server for

hdparm -tT /dev/sda
hdparm -tT /dev/sdb
???
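`hdparm -t` times buffered reads from the device, so this check exercises the read path only. A minimal sketch for repeating the same per-disk check on a schedule and keeping just the buffered-read figure; the device list /dev/sda and /dev/sdb comes from this thread, while `hdparm_rate` and `disk_reads` are hypothetical helper names. It needs root.

```shell
# hdparm_rate: extract the "Timing buffered disk reads" figure from
# hdparm -t / -tT output read on standard input.
hdparm_rate() {
  awk '/buffered disk reads/ { print $(NF - 1), $NF }'
}

# disk_reads [DEVICE...]: one timestamped buffered-read line per disk.
# Defaults to the two disks discussed in the thread; run as root.
disk_reads() {
  [ "$#" -gt 0 ] || set -- /dev/sda /dev/sdb
  for d in "$@"; do
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$d" \
      "$(hdparm -t "$d" | hdparm_rate)"
  done
}
```

Logging `disk_reads` alongside a write test at the same times would show whether the disks' read figures stay flat while writes degrade.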

mgould73
16-02-2015, 20:27
Quote Originally Posted by heise
See the unofficial FAQ on hardware diagnostics. Maybe that will show the error.
No errors were detected in any SMART tests, but the write speed issues persist. After a reboot, even with no programs running, the server may last a few days and then slowly start to lose write speed. Read speeds are always OK.

I don't know if it's the disk(s) or the raid that is the problem, but something is wrong.

Here is a test from about one day after the server was rebooted:

# sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 65.6433 s, 16.4 MB/s
/home#

(it has gone as low as 5MB/sec)


/home is raid 0
/ is raid 1

Both show just about the same write speeds when this occurs. A reboot from the cpanel brings speeds back to normal for a period of time.


After a reboot:
# sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.03028 s, 178 MB/s
/home#
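With `sync; dd ...; sync`, dd prints its rate before the trailing `sync` runs, so part of the 1 GB may still be sitting in the page cache when the figure is reported. A minimal sketch of a repeatable write test that instead uses GNU dd's `conv=fdatasync`, which makes dd flush the file's data to disk before it reports its timing, and that logs only the rate. The file name `ddtest.<pid>` and the helper names `dd_rate` and `write_test` are hypothetical; the test file is deleted afterwards.

```shell
# dd_rate: print the throughput figure from dd's summary on stdin,
# i.e. the last comma-separated field of the line containing "copied".
dd_rate() {
  awk -F', ' '/copied/ { rate = $NF } END { print rate }'
}

# write_test [DIR] [SIZE_MB]: write SIZE_MB MiB of zeros into DIR with
# GNU dd, flush to disk before dd reports, then delete the test file.
write_test() {
  wt_dir=${1:-.}
  wt_mb=${2:-1024}
  wt_file="$wt_dir/ddtest.$$"
  wt_out=$(dd if=/dev/zero of="$wt_file" bs=1M count="$wt_mb" conv=fdatasync 2>&1)
  rm -f "$wt_file"
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(printf '%s\n' "$wt_out" | dd_rate)"
}
```

Calling `write_test /home` and `write_test /` once an hour from a small script would record when the slowdown starts on the raid 0 and raid 1 arrays, rather than only that it has happened.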

Is there anything else I can try, or is it time for a ticket?

heise
11-02-2015, 20:02
See the unofficial FAQ on hardware diagnostics. Maybe that will show the error.

mgould73
06-02-2015, 19:44
Quote Originally Posted by Careimages
A problem that goes away with a reboot suggests some sort of software issue. Check memory and CPU utilisation and any other processes that are running when you get the slow write.
CPU usage is under 10% and memory usage is around 3%. I can stop everything I have running and it still happens.

Sometimes it will stay ok for 3 weeks, sometimes 2 days.

But read speeds are not affected, so I am confused.

Careimages
06-02-2015, 17:35
A problem that goes away with a reboot suggests some sort of software issue. Check memory and CPU utilisation and any other processes that are running when you get the slow write.
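A minimal sketch of one way to act on that suggestion while a slow write is in progress: sample /proc/loadavg and a few /proc/meminfo fields that relate to buffered writes. The field choice is an assumption for illustration: MemFree, Dirty and Writeback are standard, while LowFree only appears on kernels built with highmem support, which a 32-bit kernel may be. `meminfo_kb` and `mem_snapshot` are hypothetical helper names.

```shell
# meminfo_kb FIELD [FILE]: print the value (in kB) of one /proc/meminfo
# field, or nothing if the field is absent. Pass "-" to read stdin.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# mem_snapshot: one timestamped line with the 1/5/15-minute load average
# and whichever of the selected memory fields this kernel reports.
mem_snapshot() {
  line="$(date '+%Y-%m-%d %H:%M:%S') load=$(cut -d' ' -f1-3 /proc/loadavg)"
  for k in MemFree LowFree Dirty Writeback; do
    v=$(meminfo_kb "$k")
    [ -n "$v" ] && line="$line $k=${v}kB"
  done
  printf '%s\n' "$line"
}
```

Running `mem_snapshot` every few seconds during a slow dd, and again after a reboot, would show whether memory or dirty-page figures differ between the two states.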

mgould73
06-02-2015, 16:50
Hello, I have an ESAT-1 server that has been working well for some time now. Just in the past month I have started to notice problems with incoming speeds to the server. At first I thought it was a bandwidth problem, but after some diagnosis I found that write speeds across the raid are degraded. The only way I have been able to fix it is to reboot the server, after which write speeds return to normal. It also seems to come on quickly: speeds can be normal just hours before the problem happens. I've stopped all services/processes, but write speeds stay like this until a full reboot. READ speeds are NOT affected!

Running: sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync

Results:
1073741824 bytes (1.1 GB) copied, 225.577 s, 4.8 MB/s

After a reboot:
1073741824 bytes (1.1 GB) copied, 4.66193 s, 230 MB/s

I have no idea what could be causing this, anyone have any ideas I can check?

Ubuntu 14.04 32-bit, software raid 0.

I have run smartctl -a on the disks; no errors or problems were reported.

Thanks
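`smartctl -a` prints a lot, so a short per-disk summary can make the two disks easier to compare over time. A minimal sketch, assuming ATA drives and smartmontools' usual attribute table layout (raw value in the tenth column); the four attribute names are standard smartmontools names for sector and cable errors, but not every drive reports all of them. `smart_attrs` and `smart_summary` are hypothetical helper names; run as root.

```shell
# smart_attrs: from "smartctl -A" output on stdin, print the name and raw
# value of attributes that commonly indicate bad sectors or link errors.
smart_attrs() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2, $10 }'
}

# smart_summary [DEVICE...]: overall health verdict plus the attributes
# above for each disk. Defaults to the two disks discussed in the thread.
smart_summary() {
  [ "$#" -gt 0 ] || set -- /dev/sda /dev/sdb
  for d in "$@"; do
    printf '== %s: %s\n' "$d" \
      "$(smartctl -H "$d" | awk '/overall-health/ { print $NF }')"
    smartctl -A "$d" | smart_attrs
  done
}
```

A raw value that grows between runs is more telling than a single "PASSED" verdict.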