OVH Community, your new community space.

USB I/O Errors


mgv
10-07-2009, 06:29
Myatu: Thanks for the reply, Myatu. Very constructive, and some food for thought. I need to run to work just now, but the USB drive in question should hopefully be back online by the time I get home (it will hit the 48-hour mark in a couple of hours), and I will certainly give it a try then. As for the USB cable, those were my thoughts exactly, and I have asked for the cables to be changed if they haven't been already. And no, I am not pestering OVH, apart from the fact that the drive is offline and I can't run these tests until they get it back online. I had thoughts along similar lines, but you have given the actual commands for testing, so thanks for that. I have a feeling the problem lies deeper. We shall see, and I will keep you posted.

Myatu
10-07-2009, 04:00
It's a bit tricky, and could be a number of things, but you may have to adjust the I/O buffer a bit.

Have a look at what it's currently at with

Code:
cat /sys/block/sdX/queue/max_sectors_kb
where sdX is your USB drive. Likely it will be at 240, seemingly the Linux default.

Now, depending on the make/model of the USB drive, this may have to go up or down. I guess this all depends on what OVH could stock up on at the time...

Some USB drives really don't like this to be higher than 128 (some no higher than 64!). On the other hand, other USB drives will see better I/O performance and no I/O errors with a higher value, up to 1024. (Values go in multiples of 8 on a 32-bit OS [8, 16, 24, etc.] and multiples of 16 on a 64-bit OS [16, 32, 48, etc.].)

Start with a lower value, e.g.:

Code:
echo 128 > /sys/block/sdX/queue/max_sectors_kb
Where sdX, again, is your USB drive.
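
A minimal shell sketch of the step above, assuming you run it as root. The function name set_max_sectors and its three arguments are made up for illustration; the rounding simply applies the multiples-of-8/16 rule of thumb from this thread. Values written under /sys do not survive a reboot, so the setting has to be reapplied after one.

```shell
#!/bin/sh
# set_max_sectors FILE VALUE STEP  (hypothetical helper, not a standard tool)
#   FILE  : path to the sysfs file, e.g. /sys/block/sdX/queue/max_sectors_kb
#   VALUE : requested size in KB
#   STEP  : 8 or 16, per the rule of thumb in this thread
set_max_sectors() {
    file=$1
    value=$2
    step=$3
    # Round the requested value down to the nearest multiple of STEP.
    rounded=$(( value / step * step ))
    echo "old: $(cat "$file")"
    echo "$rounded" > "$file" || return 1
    echo "new: $(cat "$file")"
}

# Example (as root, with sdX replaced by your USB drive):
#   set_max_sectors /sys/block/sdX/queue/max_sectors_kb 128 8
```

If the kernel rejects the write, the value may be above what the device supports; max_hw_sectors_kb in the same directory should show that ceiling.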

To test the impact on the I/O (and errors), first change to your USB drive and make a temporary directory for testing purposes. E.g.:

Code:
mkdir /mnt/usb1/temp
cd /mnt/usb1/temp
(This assumes that /mnt/usb1 is the actual mount point for your USB drive.)

Now use dd to test write throughput at different block sizes:

Code:
dd if=/dev/zero of=testfile bs=<block size> count=<number of blocks>
For <block size> you could use 1024, for example (1 KB blocks). If you then specify 102400 as the count parameter, dd generates a file of count * block size = 100 MiB (105 MB). For example:

Code:
# dd if=/dev/zero of=testfile bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.388348 s, 270 MB/s
Or, to use 512 KB blocks and a 1 GB file:

Code:
# dd if=/dev/zero of=testfile bs=524288 count=2000
2000+0 records in
2000+0 records out
1048576000 bytes (1.0 GB) copied, 10.2572 s, 102 MB/s
(You can delete the file "testfile" afterwards)
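
To compare several block sizes in one go, something like the sketch below might help. The function name dd_bench is made up for illustration, and conv=fsync is an addition that is not in the commands above: it makes dd flush the data to the drive before it reports its timing. Without it, very high figures (such as the 270 MB/s in the sample output) likely reflect the page cache rather than the USB drive itself.

```shell
#!/bin/sh
# dd_bench DIR BS COUNT  (hypothetical helper)
#   DIR   : a directory on the USB drive, e.g. /mnt/usb1/temp
#   BS    : block size in bytes
#   COUNT : number of blocks to write
dd_bench() {
    dir=$1
    bs=$2
    count=$3
    testfile="$dir/testfile"
    # dd prints its statistics on stderr; keep only the summary line.
    dd if=/dev/zero of="$testfile" bs="$bs" count="$count" conv=fsync 2>&1 | tail -n 1
    # Clean up the test file after each run.
    rm -f "$testfile"
}

# Example: write 100 MiB with three different block sizes
#   for bs in 4096 65536 524288; do
#       dd_bench /mnt/usb1/temp "$bs" $(( 104857600 / bs ))
#   done
```

Run the same loop after each change to max_sectors_kb so the numbers are comparable.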

Not only does this show the increase or decrease in performance, you can also check whether the setting still causes I/O errors with:

Code:
dmesg | tail
or alternatively, by tracking it in real-time in a second terminal window with:

Code:
tail -f /var/log/syslog
You would see messages like

Code:
Jul 10 03:43:56 ashanti kernel: [17681643.076000] Buffer I/O error on device sdb6, logical block 28334
or

Code:
Jul 10 03:43:57 ashanti kernel: [17681643.076000] EXT3-fs error (device sdb6) in ext3_dirty_inode: IO failure
if there's still a problem. If so, decrease (or increase) the max_sectors_kb value and try again. (Remember to use multiples of 8 or 16, depending on your OS.)
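
Rather than eyeballing the log after every run, you could count the error lines before and after a test. A rough sketch, assuming the messages look like the two samples above; the helper name count_io_errors is hypothetical, and the pattern only matches the phrases "I/O error" and "IO failure", so other kinds of failure would be missed.

```shell
#!/bin/sh
# count_io_errors: reads log text on stdin and prints how many lines
# mention "I/O error" or "IO failure" (hypothetical helper name).
count_io_errors() {
    grep -c -E 'I/O error|IO failure'
}

# Example:
#   before=$(dmesg | count_io_errors)
#   ... run the dd test ...
#   after=$(dmesg | count_io_errors)
#   echo "new error lines: $(( after - before ))"
```

Keep in mind that the dmesg buffer has a fixed size, so on a busy machine old lines can scroll out and skew the difference.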

If the I/O errors continue though, it might be worth asking OVH if they can replace the USB cable for you. I'm not saying that this *is* the cause in this case, but it is a *possible* cause (so don't badger OVH).

Gosh, night shifts... What to do!?

mgv
10-07-2009, 00:44
Well, I was thinking more of a BIOS error of some sort! Each time the drives go offline with I/O errors, the OVH techs seem to get them back without any issue or change to the actual system (software/configs). And the time the "Power Supply" went, when I asked what the issue was, it was the same errors. So my feeling is that it is a BIOS error of sorts.

freshwire
09-07-2009, 22:09
3 servers sounds like some software/config issue to me

mgv
09-07-2009, 19:06
Is there anyone else out there with these issues? I have 3 dedicated servers, each with a couple of the 500 GB USB drives. Every now and again, without notice or error, one of the USB drives starts throwing I/O errors and becomes inaccessible. I try rescue mode and run some scans for the USB drive, but it does not show up. I reboot the server, and then it is totally gone and does not show anywhere, e.g. in fdisk -l. This has happened on many occasions on all 3 servers, 2 or 3 times in a couple of months on 1 server. I raise a ticket and am told the SLA does not cover this (OK, I can accept that), so it could take up to 48 hours (in most cases over) to fix. Eventually I get the email that an intervention will happen within 15 minutes, then all of a sudden the USB drive is back on. On only 1 occasion have I lost the data on the USB drives. I have been told on 1 occasion that it was the "Power Supply". Anyway, I just wanted some confirmation that this is not just my issue. I have spoken to a couple of guys and they have had the issues as well.

Thanks for listening.