OVH Community, your new community space.

Raidcard borked?


OVHelp
02-08-2009, 16:14
Still no response; the server is stuck in a Rescue Mode loop. 3 days and counting...

OVHelp
01-08-2009, 18:58
As a follow up, I am currently logged onto my other server and going through some debugging steps.

Is there any reason WHY we are unable to increase the Stripe Size on Raid0 arrays? Anything above the default of 64 produces errors.

OVHelp
01-08-2009, 15:16
Quote Originally Posted by derchris
Ok, FYI, this doesn't mean tw_cli is not working.
Because I get the same error on mine.



Looking at the errors, it looks like it is not c0, but you would be able to check with the show command.

So please check again.
And are you using an LSI card? The fact of the matter is I was able to confirm my other EG AMD with LSIUtil. Not to mention that this is a fresh format, so wth could have gone wrong?

The commands you suggested above display nothing promising.

//rescue> show c0

Error: (CLI:041) Invalid shell command.
//rescue> show

No controller found.
Make sure appropriate AMCC/3ware device driver(s) are loaded.

//rescue> info

No controller found.
Make sure appropriate AMCC/3ware device driver(s) are loaded.
As the operator said, he wasn't sure why the LSI was installed either - I'm sure a simple look by a technician could have this all solved. As it was a "new" kit, it could simply be poor configuration...

derchris
01-08-2009, 13:04
Ok, FYI, this doesn't mean tw_cli is not working.
Because I get the same error on mine.

//derchris> show c0

Error: (CLI:041) Invalid shell command.
//derchris> show

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    8006-2LP     2       2        1       0        2       -       -

//derchris> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-1    OK             -      -       698.637   ON     -        -

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     698.63 GB   1465149168    5QK0L8CC
p1     OK               u0     698.63 GB   1465149168    5QK0LHLN
Looking at the errors, it looks like it is not c0, but you would be able to check with the show command.

So please check again.
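
The check described above (run show, then see which controller ID is actually listed) can be sketched in Python. This is only a minimal illustration that parses text in the same shape as the two outputs pasted in this thread; the function name list_controllers and the sample strings are assumptions made for the example, not part of tw_cli itself.

```python
import re
from typing import List


def list_controllers(show_output: str) -> List[str]:
    """Return controller IDs (c0, c1, ...) found in the text printed by `show`.

    Hypothetical helper: it only looks at the first column of each line,
    so the header row ("Ctl ...") and the dashed separator are ignored.
    """
    ids = []
    for line in show_output.splitlines():
        fields = line.split()
        if fields and re.fullmatch(r"c\d+", fields[0]):
            ids.append(fields[0])
    return ids


# Sample text copied from the two `show` outputs in this thread.
SAMPLE_WITH_CONTROLLER = """\
Ctl Model Ports Drives Units NotOpt RRate VRate BBU
------------------------------------------------------------------------
c0 8006-2LP 2 2 1 0 2 - -
"""

SAMPLE_NO_CONTROLLER = """\
No controller found.
Make sure appropriate AMCC/3ware device driver(s) are loaded.
"""

if __name__ == "__main__":
    print(list_controllers(SAMPLE_WITH_CONTROLLER))
    print(list_controllers(SAMPLE_NO_CONTROLLER))
```

An empty list corresponds to the "No controller found." case, which is a hint that the card is not a 3ware/AMCC one at all rather than that the controller ID is wrong.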

OVHelp
01-08-2009, 05:08
Quote Originally Posted by Neil
I see you are running Debian (or so it appears). What happens if you run tw_cli and then info? Do you get any output?
As stated 3 times before, tw_cli commands do NOT work; all come back with errors/no information. I have an LSI card installed, and I was able to create Raid0/1 on my other EG AMD box with the same controller using lsiutil (this should be documented better, FYI).

rescue:~# tw_cli
//rescue> show c0

Error: (CLI:041) Invalid shell command.
//rescue> maint remove c0 u0 p0

Error: (CLI:003) Specified controller does not exist.
//rescue> maint remove c0 u0 p1

Error: (CLI:003) Specified controller does not exist.
//rescue>
And so on... item still not resolved; the server has some def. configuration problems and needs to be checked in person - please stay on topic.

Neil
31-07-2009, 23:39
I see you are running Debian (or so it appears). What happens if you run tw_cli and then info? Do you get any output?

DedicatedPros
31-07-2009, 22:17
They don't knock doors down anymore, they get all sneaky and no one knowz you got taken

Try taking a few pictures of your local Security Service office, then you'd know the feeling

freshwire
31-07-2009, 22:04
Quote Originally Posted by DedicatedPros
It is safe to post IP addresses here, just by doing a traceroute on my domain you find out the hostname of mine, and its IP address, just like doing the same with microsoft.com, paypal.com, verisign.com, or even cia.gov (fbi.gov is a bit more secure though)
209.85.229.138 and 84.53.178.40 are what my firefox connects to for the duration of the request. The first IP does seem to change every so often. The second seems to switch between that and ..178.48.

There is no way to hide that information: if you can connect to it, then you can know the IP. It probably isn't the IP of the actual hosting server, but it's the IP you connect to in order to reach it. Probably something like a 'firewall server'.

I think they use some sort of system between their various NS servers to work out the real requests from the lookups. Some sort of time delay between looking up on one and then the other? Anyway, that's just guessing and I probably should stop now before they knock the door down lol!

OVHelp
31-07-2009, 18:12
Just got off the phone with Tech Support - the guy was clueless. He told me to run hdparm - pretty useless if the problem is that there is no raid (nor can I create one), and fdisk -l isn't even displaying the HDDs...

Apparently my kit was brand new - which I guess is good, BUT since it has an LSI card, which is NOT normal for EG AMD servers, and the kit is new, it makes me wonder whether it has been properly set up/configured (download/upload speeds were much lower than on our other EG Bestof servers, with the EXACT same software/settings...). Again, the operator couldn't help - and I'm not sure what else I can provide Tech Support with.

Please let me know, OVH Team.

DedicatedPros
31-07-2009, 17:36
It is safe to post IP addresses here, just by doing a traceroute on my domain you find out the hostname of mine, and its IP address, just like doing the same with microsoft.com, paypal.com, verisign.com, or even cia.gov (fbi.gov is a bit more secure though)

freshwire
31-07-2009, 17:13
I don't think there is any real risk in publishing the IP address, since any sort of attack can just pick any random OVH IP. Unless you have people who hate you, there are no problems

91.121.175.121

OVHelp
31-07-2009, 16:54
Is it safe to post server address publicly?

94.23.214.173 - Please post when you have it Neil - I will remove after

@Neil: While you're looking at my server, could you also check how old the HDDs are? It was running exceptionally slowly while downloading/uploading (see my other post the other day), and I wanted to see whether it was either the HDDs or the NIC.

Neil
31-07-2009, 16:46
Quote Originally Posted by OVHelp
Thanks for your replies - in response to Raid0 and having a business server, the screenshot just HAPPENED to be after I tried Raid0, the same applies for Raid1.

And no, I spent days trying to figure out why tw_cli commands did not work when following the various guides scattered across these forums. I then tried a suggestion from the OVH Team, to use lsiutil - and that seemed to do the trick (I have been able to do Raid0/1 on numerous other EG AMD servers, and now with a reformat, the whole thing seems to have gone to sh*t).

The lspci command shows:



Something is definitely wrong and needs to be sorted - no response from OVH Tech Support still...
What's your server address, so I can check the RAID card and see where your request is?

OVHelp
31-07-2009, 16:27
Thanks for your replies - in response to Raid0 and having a business server, the screenshot just HAPPENED to be after I tried Raid0, the same applies for Raid1.

And no, I spent days trying to figure out why tw_cli commands did not work when following the various guides scattered across these forums. I then tried a suggestion from the OVH Team, to use lsiutil - and that seemed to do the trick (I have been able to do Raid0/1 on numerous other EG AMD servers, and now with a reformat, the whole thing seems to have gone to sh*t).

The lspci command shows:

0000:02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 08)
Something is definitely wrong and needs to be sorted - no response from OVH Tech Support still...
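
Since the whole thread turns on which CLI matches the card, here is a minimal Python sketch of picking the tool from an lspci line. The pairing itself (3ware cards with tw_cli, the LSI Fusion-MPT card with lsiutil) comes from this thread; the function name suggest_raid_tool and the exact substrings it searches for are assumptions for illustration, and real lspci wording can differ between cards and versions.

```python
def suggest_raid_tool(lspci_output: str) -> str:
    """Guess which RAID CLI fits the controller named in lspci output.

    Hypothetical helper. The substring checks below are assumptions based
    on the vendor names mentioned in this thread, not an exhaustive list.
    """
    for line in lspci_output.splitlines():
        lower = line.lower()
        if "3ware" in lower or "amcc" in lower:
            return "tw_cli"
        if "lsi logic" in lower or "fusion-mpt" in lower:
            return "lsiutil"
    return "unknown"


# The lspci line posted above.
SAMPLE_LSPCI = (
    "0000:02:00.0 SCSI storage controller: LSI Logic / Symbios Logic "
    "SAS1064ET PCI-Express Fusion-MPT SAS (rev 08)"
)

if __name__ == "__main__":
    print(suggest_raid_tool(SAMPLE_LSPCI))
```

For the line posted above this picks lsiutil, which matches the experience that tw_cli reports "No controller found." on this box.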

Neil
31-07-2009, 14:46
The RAID controller in the EG AMD range should (99.9%!) be 3Ware. Have you tried the tw_cli command?

derchris
31-07-2009, 13:10
How about a fsck on the FS?
Looks like the journal got corrupted.
If the Raid works without problems when booting from HDD, then it is most likely that you are using the wrong CLI Raid tool to manage the Raid.
You sure it is an LSI?

On the other hand it is very funny claiming that your business can't have a downtime, but then trying to create a Raid 0.
Just my 2 cents

DedicatedPros
31-07-2009, 09:21
Well I have no clue how to help you fix this, but doing experimental work on a server that you need to be up is mad...

OVHelp
31-07-2009, 05:31
Today we reformatted one of our servers, an EG AMD, and installed 'lsiutil' (this should really be installed by default, OVH, if you stick us with a hardware card - none of your guides explain how to fix raids with LSI cards...).

It is showing that we have NO raid, be it 0 or 1:



And when I go to create one (be it in Rescue or Normal mode), I am met with the following error:



[Note: In normal mode the error is listed as kernel:journal commit I/O error - and then the server becomes unresponsive]

I have NO idea what's going on, but I require immediate help as my business can't have downtime like this - lost sales are irreplaceable.