
server crashing


ovhfan2
09-01-2010, 18:54
Quote Originally Posted by yonatan
Code:
yum install lm_sensors -y
sensors
normal output for a hardworking i7 machine (hosts a LARGE forum with 70K users/day):

Code:
root@ns beastserv ~ # sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +66°C  (high =  +100°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +65°C  (high =  +100°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +65°C  (high =  +100°C)

.............................................
Yonatan, do you get any ALARM warnings after the cores info, like this?


w83627dhg-isa-0ca0
Adapter: ISA adapter
VCore: +0.79 V (min = +0.92 V, max = +1.48 V) ALARM
in1: +5.12 V (min = +10.72 V, max = +13.20 V) ALARM
AVCC: +3.36 V (min = +2.96 V, max = +3.63 V)
3VCC: +3.36 V (min = +2.24 V, max = +1.38 V) ALARM
in4: +1.09 V (min = +1.35 V, max = +1.65 V) ALARM
in5: +0.78 V (min = +1.13 V, max = +1.38 V) ALARM
in6: +2.46 V (min = +4.53 V, max = +4.86 V) ALARM
VSB: +3.36 V (min = +2.96 V, max = +3.63 V)
VBAT: +3.30 V (min = +2.96 V, max = +3.63 V)
Case Fan: 0 RPM (min = 715 RPM, div = 32) ALARM
CPU Fan: 0 RPM (min = 715 RPM, div = 32) ALARM
Aux Fan: 0 RPM (min = 715 RPM, div = 32) ALARM
fan4: 0 RPM (min = 715 RPM, div = 32) ALARM
fan5: 0 RPM (min = 715 RPM, div = 32) ALARM
Sys Temp: +43°C (high = +75°C, hyst = +70°C) [thermistor]
CPU Temp: +0.0°C (high = +85.0°C, hyst = +75.0°C) [CPU diode ]
AUX Temp: +0.0°C (high = +80.0°C, hyst = +75.0°C) [CPU diode ]
vid: +3.500 V

Should I be worried about those ALARM bits?

These figures are for a new, non-production i7-4T.
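
For anyone wanting to pull just the flagged readings out of a long `sensors` dump, here is a minimal sketch. `alarm_labels` is a made-up helper name, and it assumes the flag shows up as the literal word ALARM at the end of a line, as in the output above.

```shell
# List the label of every reading that is flagged with ALARM.
# Reads `sensors` output on stdin, so it also works on a saved copy.
# Assumption: the label is everything before the first colon.
alarm_labels() {
    awk -F: '/ALARM[ \t]*$/ { print $1 }'
}

# Usage (assumes lm-sensors is installed):
#   sensors | alarm_labels
```

Readings such as 0 RPM on every fan header, or a minimum limit that is higher than the maximum, may point to unconnected or unconfigured inputs rather than real faults, but that is something to check against the board and not something the output proves.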

sic
27-12-2009, 22:04
ok. i just installed those how do you access the test results?

Since that last intervention the box has not crashed once! So that is like 19 hours and no defects! It has not been like that for about a week now so fingers crossed!

I would just like to say thanks to all on here who have taken the time to help me with this problem. Especially over the festive season.

yonatan
27-12-2009, 13:33
Yeah, rescue mode won't let you install that.
you need to boot in normal mode..

sic
27-12-2009, 13:09
W: Not using locking for read only lock file /var/lib/dpkg/lock
E: Unable to write to /var/cache/apt/
E: The package lists or status file could not be parsed or opened.


am i being dumb? This was the response i got.

edit - Oh hang on i am still in rescue mode? Would that be the issue?

yonatan
27-12-2009, 13:01
Quote Originally Posted by sic
Hey yonatan thanks for the reply. unfortunately i am running ubuntu desktop. Would you happen to know the code for that o/s?

log on to your desktop
open a terminal

type

sudo bash

then use the debian method (the apt-get commands from my earlier post).

ubuntu is based on debian, so the same apt-get commands work.

sic
27-12-2009, 12:55
Hey yonatan thanks for the reply. unfortunately i am running ubuntu desktop. Would you happen to know the code for that o/s?

yonatan
27-12-2009, 12:05
Quote Originally Posted by sic
The intervention on xxx has been
completed.

This operation was closed at 2009-12-27 00:29:50

Here are the details of this operation:
CPU cooling check

If you need any further information regarding this
intervention, please do not hesitate to contact our
technical support.


I just got this reply back, so what does that mean exactly? Did they find a fault?

this means they have opened the rack and checked the heatsink.

you can check your cpu temp with sensors, to see if the temperature is back down to normal.
also, look at your /var/log/messages for any kernel messages about the CPU temperature - just to be sure.
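
The log check can be sketched as a small grep helper. The function name and the search pattern are assumptions: the exact kernel wording varies between versions, and on Debian or Ubuntu the kernel messages may also land in /var/log/kern.log or /var/log/syslog.

```shell
# Print log lines that look thermal-related (case-insensitive match).
# $1 = log file to search; defaults to /var/log/messages.
# Assumption: relevant kernel messages mention "temperature",
# "thermal" or "throttl(ed/ing)".
thermal_log_lines() {
    grep -iE 'temperature|thermal|throttl' "${1:-/var/log/messages}"
}

# Usage (reading the log usually needs root):
#   thermal_log_lines
#   thermal_log_lines /var/log/kern.log
```

No output simply means nothing matched the pattern, which is not proof that the CPU never overheated.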

for debian:

Code:
apt-get install lm-sensors
modprobe coretemp
sensors-detect
sensors
for centos:

Code:
yum install lm_sensors -y
sensors


normal output for a hardworking i7 machine (hosts a LARGE forum with 70K users/day):

Code:
root@ns beastserv ~ # sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +66°C  (high =  +100°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +65°C  (high =  +100°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +65°C  (high =  +100°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +64°C  (high =  +100°C)

coretemp-isa-0004
Adapter: ISA adapter
Core 4:      +67°C  (high =  +100°C)

coretemp-isa-0005
Adapter: ISA adapter
Core 5:      +65°C  (high =  +100°C)

coretemp-isa-0006
Adapter: ISA adapter
Core 6:      +65°C  (high =  +100°C)

coretemp-isa-0007
Adapter: ISA adapter
Core 7:      +64°C  (high =  +100°C)

root@ns beastserv ~ #
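
To keep an eye on the readings without going through eight blocks each time, a rough sketch like the following reduces the output to the hottest core. `max_core_temp` is a hypothetical helper, and it assumes the `Core N: +NN°C` line layout shown above.

```shell
# Print the highest core temperature (degrees C) found in
# `sensors` output read from stdin.
max_core_temp() {
    awk '/^Core [0-9]+:/ {
        t = $3                    # third field, e.g. "+66°C"
        gsub(/[^0-9.]/, "", t)    # keep only digits and the decimal point
        if (t + 0 > max) max = t + 0
    } END { print max + 0 }'
}

# Usage (assumes lm-sensors is installed):
#   sensors | max_core_temp
```

Fed the output above, this should print 67. It prints 0 when no `Core` lines are found, for example if the coretemp module is not loaded.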

sic
27-12-2009, 11:20
The intervention on xxx has been
completed.

This operation was closed at 2009-12-27 00:29:50

Here are the details of this operation:
CPU cooling check

If you need any further information regarding this
intervention, please do not hesitate to contact our
technical support.


I just got this reply back, so what does that mean exactly? Did they find a fault?

sic
27-12-2009, 11:15
well i tried to run the cpu test in rescue mode again but every time i tried it kept crashing, about 5 times so far?

I have sent this off to them again so fingers crossed i suppose.

Andy
27-12-2009, 01:23
Indeed. It's their way of putting it off hoping it goes away. Just do as they ask I guess, and take it from there.

sic
26-12-2009, 22:05
lmao. ovh responded saying that as monitoring is off and the cpu temp is now ok they want the test results again! surely one test should be sufficient?

Shephard
26-12-2009, 14:36
Quote Originally Posted by Andy
On the server status page of the manager, find monitoring and then click turn off. Easy. It's a big icon with an exclamation mark in it I believe.

I did that and I logged in to the server at random times. It still went down, but now I did not get notified. So that would be a useless patch for me.

Thanks though.

Andy
26-12-2009, 11:20
You definitely have grounds for OVH to check the server then. Drop them a support ticket.

sic
26-12-2009, 11:10
i managed to get the server into rescue mode, as it went offline last night and was still off this morning. it would not respond to ping either.

Ok in rescue mode i am currently testing the cpu and it came back with a message saying that the system froze. it now does not respond to ping again! So got to presume that was my problem. lol

sic
25-12-2009, 19:07
glad you enjoyed your dinner!

those both sound like fantastic ideas! Any idea on how i do them?

lol

Andy
25-12-2009, 17:57
Yes thanks, we are. Dinner was gorgeous.

By the way, check you don't have another NIC running in bridged mode (such as VMware etc) as that can cause OVH to kick your server off the network if the MAC isn't specified in their routers.

Keep an eye on your logs as well to see if it reports anything in the times it "crashes".

sic
25-12-2009, 17:52
thanx andy i just did that so fingers crossed!

I trust you and yours are enjoying the festivities!

Andy
25-12-2009, 14:45
On the server status page of the manager, find monitoring and then click turn off. Easy. It's a big icon with an exclamation mark in it I believe.

sic
25-12-2009, 13:50
hey andy, i hear you bro sometimes folks also feel neglected at xmas!

how do i turn off monitoring?

Andy
25-12-2009, 13:29
Servers feel lonely at christmas. While everyone is off eating dinner they get neglected and get no visitors to sites they host!

The server may not be crashing at all, but OVH may be detecting it as down if their monitoring system sees that it has stopped responding to ping, and then restarting it. I had this problem when there was nothing wrong with my server. I turned monitoring off and have had no issues since.
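
Logging reachability yourself from a second machine, and comparing the timestamps with the OVH emails, might help to tell the two cases apart. A minimal sketch, where `probe_once` is a hypothetical helper and the probe command is passed in as arguments so any check can be plugged in:

```shell
# Run one probe command and print a timestamped "up" or "down" line,
# depending on whether the command succeeded.
probe_once() {
    if "$@" >/dev/null 2>&1; then
        printf '%s up\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    else
        printf '%s down\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    fi
}

# Example loop (203.0.113.10 is a placeholder address, the 30 second
# interval and the ping.log file name are arbitrary choices):
#   while true; do
#       probe_once ping -c 1 -W 2 203.0.113.10 >> ping.log
#       sleep 30
#   done
```

A "down" line only says the probe failed from where it was run, so a network problem on the second machine would look the same as a crashed server.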

sic
25-12-2009, 10:05
Hey Yonatan, i tried that the first time it did it and the checks came back clear, but i will try it again.

@shephard no it is the old style 'XL'

merry christmas!

Shephard
25-12-2009, 03:11
Hello,

Is it an i7-2T? I have the very same problem on 3 i7-2T servers.

yonatan
24-12-2009, 15:03
boot it in rescue-pro and run hardware tests

sic
24-12-2009, 10:43
Hey thanks for taking the time to respond. I will try that. But how do i stop the server from crashing in the first place?

MicroChip123
24-12-2009, 10:20
Have you turned ping off?

If you have, switch the monitoring off from the control panel or you will keep getting it put into rescue mode.

sic
24-12-2009, 10:13
Hi guys,

i am after a little help please. I have had a server running ubuntu desktop for about a year now and it has been great. However, about 3 days ago the server just started crashing. NX will connect like normal and then the server goes offline and will not respond to ping. I get 'that' email from ovh saying they have raised a ticket and about 5/10 minutes later it comes back online. It has done this about 50 times in 3 days, making the server pretty much unusable.

I am unsure as to what would be my best course of action? Any help would be much appreciated.

And yes i am not the world's best at this sort of thing.

Merry christmas one and all.