
Server Crashes/Timeouts


marks
30-04-2010, 14:22
I see that an engineer intervened on the server twice yesterday because it became unreachable. On both occasions, the server was found showing a black screen and came back online after a reboot.

You should be able to find out more about it yourself, and also rule out some software causes (for example, by booting an up-to-date network kernel through netboot).

You could also list every time the server has rebooted over the last few weeks (by filtering the logs).
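A rough sketch of that log filtering in Python. It assumes the kernel's "Linux version" boot banner ends up in kern.log with the usual syslog timestamp in front of it, which is what a stock Debian setup normally does; the function name `find_boots` is made up for this example.

```python
import re
import sys

# Assumption: every boot writes a "kernel: ... Linux version ..." banner line
# to kern.log, prefixed with a syslog timestamp such as "Apr 30 08:33:09".
BOOT_BANNER = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:.*Linux version"
)

def find_boots(lines):
    """Return the syslog timestamp of every line that carries the boot banner."""
    boots = []
    for line in lines:
        match = BOOT_BANNER.match(line)
        if match:
            boots.append(match.group(1))
    return boots

if __name__ == "__main__":
    # Log files are given on the command line; nothing is read otherwise.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for stamp in find_boots(log):
                print(path, "boot at", stamp)
```

It would be run against the current and rotated logs, for example `python list_boots.py /var/log/kern.log /var/log/kern.log.1` (the script name is arbitrary). Compressed rotations would need to be unpacked first.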

If you want more help with this, it's better that we handle it directly by email or phone. Afterwards, you can post the result on the forum, or just get help from other users here.

But this ticket doesn't show where the problem is.

Zoom
30-04-2010, 12:18
Please refer to ticket # 435071

If I don't do a hard reboot, it takes hours for it to come back online. Which logs specifically should I look at? From what I can see in most logs, there is no specific reason at all.

marks
30-04-2010, 11:54
This could have two causes: either your server crashes and reboots, or there is a network connectivity problem.

You can tell which one it is from the logs, because they show whether the server was actually restarting at the moment the monitoring system detected the issue.

If that's the case, check the logs around the moment it went down. Did it crash suddenly and restart right after? How many times does the same thing repeat? Support can try to give you more information if an engineer intervened on the server.
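One way to check "crashed and restarted right after" is to measure how long the log was silent before each boot. The sketch below is only a heuristic and rests on assumptions: boots are recognised by the kernel's "Linux version" banner, timestamps use English month names, and the year is supplied by hand because syslog does not record it. The names `parse_stamp` and `silence_before_boots` are made up for this example.

```python
import re
import sys
from datetime import datetime

STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s")

def parse_stamp(line, year):
    """Parse the leading syslog timestamp; the year is passed in by the caller."""
    match = STAMP.match(line)
    if not match:
        return None
    # %b expects English month abbreviations under the default C locale.
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")

def silence_before_boots(lines, year):
    """For each boot banner, return (last entry before it, boot time, gap)."""
    results = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # garbage or continuation line without a timestamp
        if "Linux version" in line and previous is not None:
            results.append((previous, stamp, stamp - previous))
        previous = stamp
    return results

if __name__ == "__main__":
    # Assumes the log entries belong to the current year.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for last, boot, gap in silence_before_boots(log, datetime.now().year):
                print(f"last entry {last}, boot {boot}, silent for {gap}")
```

A gap of a few seconds or minutes points towards a crash followed by an automatic restart. A gap of hours suggests the machine hung and stayed down until someone rebooted it by hand.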

If it's a network connectivity issue, we'll have to investigate further, for which we would need the server.

Send an email to support if you want us to look further into it.

Zoom
30-04-2010, 08:02
Hi all,

I am having a serious problem with my EG-09 Max, and I haven't found any clue as to what is causing it. Hopefully someone can help me discover the underlying cause.

Basically, the server becomes unpingable and is completely cut off from the Internet, with no SSH access. OVH RTM mails me saying it's not pinging at all and that it opened a ticket, blah blah. Thinking it was a hardware issue, I even reordered the server to be sure it wasn't on my end, since this has been happening almost every day for the last 3-4 months. But the problem still hasn't gone away. Only a HARD reboot seems to bring it back online. Here is the system setup:

Debian Lenny 5.0.4 x64 Fully updated RAID 0
Linux ns2* 2.6.32.2-xxxx-grs-ipv4-64 #1 SMP Tue Dec 29 14:41:12 UTC 2009 x86_64 GNU/Linux

Hardware can be found here: http://www.ovh.com/fr/produits/eg_max.xml

Did all tests in rescue mode: All passed
I don't have a REALTEK nic which some people had problems with.
I looked through /var/log/* and here is what I saw
messages - only seems to report things from the hard reboot up until bootup completes
dmesg - pretty much same as above with hardware info only
debug - pretty much same as messages and dmesg
kern.log - Most common errors are as follows
Apr 30 08:33:09 ns2** kernel: TCP: Peer 8*.*.*.*:40247/38933 unexpectedly shrunk window 209988050:209996377 (repaired) [and hundreds like that]
Apr 30 06:09:43 ns2** kernel: grsec: From 1*.*.*.*: Segmentation fault occurred at 00007f196baff000 in /usr/lib/jvm/java-6-sun-1.6.0.12/jre/bin/java[java:7888] uid/euid:1000/1000 gid/egid:1000/1000, parent /home/***/***/bin/wrapper[wrapper:7674] uid/euid:1000/1000 gid/egid:1000/1000 [and thousands like that]
syslog - Combination of all of the above
In some logs I see a big line of ^@^@^@^@^@^@^@^@^@^@^@^ between crashes and reboot
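The ^@ characters are how most viewers display NUL bytes. A long run of them in a log usually means the machine went down before buffered data reached the disk, so the last readable line before the run is a reasonable estimate of when the server stopped. Here is a minimal Python sketch for locating those runs. The 8-byte threshold and the name `nul_runs` are arbitrary choices made for this example.

```python
import re
import sys

# Eight or more consecutive NUL bytes; the threshold is an arbitrary choice.
NUL_RUN = re.compile(rb"\x00{8,}")

def nul_runs(data):
    """Return (last text line before the run, run length) for each NUL run."""
    found = []
    for match in NUL_RUN.finditer(data):
        before = data[:match.start()].rstrip(b"\n")
        last_line = before.rsplit(b"\n", 1)[-1]
        length = match.end() - match.start()
        found.append((last_line.decode("utf-8", "replace"), length))
    return found

if __name__ == "__main__":
    # Log files are given on the command line and read as raw bytes.
    for path in sys.argv[1:]:
        with open(path, "rb") as log:
            for last_line, length in nul_runs(log.read()):
                print(f"{path}: {length} NUL bytes after: {last_line}")
```

Comparing the timestamp on that last line with the time of the monitoring alert should show whether the server stopped before or after it became unreachable.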

The segmentation fault led me to this: http://forums.sun.com/thread.jspa?threadID=5437179 which says to use the kernel 2.6.32.2-xxxx-std-ipv4-64. How would I do that?

So any ideas or suggestions? I appreciate any help.