OVH Community, your new community space.

Very poor service; Any ideas?


Thelen
12-06-2013, 13:03
wow, just wow.

oles drunk and typing random **** into routers?

raxxeh
12-06-2013, 12:59
Jesus I never thought I would actually see packetloss originating on the HG network.


http://puu.sh/3dXXG.png

@Neil, might want to get your network ops to take a look at that.

I brushed off someone complaining about loss yesterday and said it was their connection, but jesus was I wrong. I had never seen rbx-s3-6k lose a packet until today.


edit:

http://puu.sh/3dYb7.png

Box doesn't even EXIST for his ISP anymore either.

Server -> his ip: http://puu.sh/3dYbp.png

edit: another winner http://puu.sh/3dYvS.png

chris6273
12-06-2013, 11:01
Quote Originally Posted by raxxeh
"Please provide traceroutes"

"no issues here"

"please provide traceroutes"

"no issues with that"

That's the gist.

In every single reply I've explicitly stated to them that internal traffic is fine, but that I cannot break exactly 60.0MB/s externally no matter how hard I try, and yet they seem to gloss over this simple fact.

I'm not a .uk customer, so I don't really expect much out of the UK service team; but the subsidiary where my server is are hilariously bad at comprehension.
Escalate the ticket to the manager if you can and mark the replies as -3 on the rating; that's what I did and it has made a lot more progress.

By the way as far as my ticket goes, it appears as though they have moved our server to a different router.

Old: vss-9b-6k.fr.eu

New: vss-9a-6k.fr.eu

If you are on vss-9b-6k.fr.eu keep an eye on your servers' packetloss! You may run into the same problem we have had. Until we monitor the server for a few days we cannot be sure whether the problem is completely resolved but the Network Administrator says the problem should be fixed.

Just so everyone knows: as soon as I stopped accepting silly support and took a firm attitude on the ticket (as you have probably noticed on this thread, I don't take any messing when I've had enough), notified the manager and rated the replies accordingly, the ticket was escalated to the data-center team. Maybe 'Support' knew the 'support' they were providing me was inadequate and were afraid of their managers getting involved?


In fact, in my last reply to the technical ticket itself (separate from the incident ticket) yesterday morning I put everything out in black and white. I said I wasn't happy and that I expect their replies to be more proactive instead of going round in circles, as well as pointing out the silly attitude they have towards MTR test results. I also escalated the ticket to the manager straight after I replied to it and rated it.
Their reply to that was that suddenly, out of the blue, my results (evidence/MTR tests and quality graph) were absolutely valid and fine, and that their support had run tests and passed the results to the data-center technicians.
Looking at their results, they sent 10,000 ICMP packets to our server and nearly 3000 got lost.
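
To put that result in perspective, here is a minimal sketch of the loss rate it implies. The exact number of lost packets was not posted, so 3000 is an assumed round figure standing in for "nearly 3000", and the function name is purely illustrative.

```python
def loss_percent(sent: int, lost: int) -> float:
    """Return packet loss as a percentage of the packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


# Assumption: "nearly 3000" is rounded up to 3000 for illustration.
print(f"{loss_percent(10_000, 3_000):.1f}% loss")
```

Under that assumption the test shows roughly 30% loss, far above anything that could be put down to measurement noise.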


You guys with 10Gbps servers should try having a very firm attitude - Don't let 'support' push you away with excuses until you get the problem sorted. I've seen the prices of those 10Gbps servers and I'd not be having any of it until it is fixed. Escalate the ticket to managers if they don't get the problem fixed.

Plenty of the public and potential customers will see this thread and if they see that support can't/won't provide 'support' when the problem is very evident then OVH are not going to be very popular.

I just want to make it clear that anyone on vss-9b-6k.fr.eu should keep an eye on their servers for the same problems as we have been having.

raxxeh
12-06-2013, 10:05
Quote Originally Posted by chris6273
Have you received any 'help' from the 'support' through your OVH manager panel?
"Please provide traceroutes"

"no issues here"

"please provide traceroutes"

"no issues with that"

That's the gist.

In every single reply I've explicitly stated to them that internal traffic is fine, but that I cannot break exactly 60.0MB/s externally no matter how hard I try, and yet they seem to gloss over this simple fact.

I'm not a .uk customer, so I don't really expect much out of the UK service team; but the subsidiary where my server is are hilariously bad at comprehension.

chris6273
12-06-2013, 09:20
Quote Originally Posted by Neil
7Gbps is optimistic since the Mega RAID Controllers can only handle up to 6Gbps maximum, and then below that you have the performance of the hard disks themselves, where did you get the 7Gbps figure from?
To start off with I said I'd expect the connection to be capable of at least 7Gbps, not the server:

Quote Originally Posted by chris6273
Granted there are overheads involved but I'd expect the connection to be capable of at least 7Gbps.
I got the 7Gbps figure from the total connection speed (10Gbps) minus the overheads. I'm not sure of an exact figure, but it is a rough approximation of what I would expect.

Optimistic? As elcct said, you can serve data from the RAM which means these servers 'Should' be capable of 10Gbps.

Next, if you had read the thread you'd know that these customers have had no problems achieving 10Gbps internally (take note that these servers must be capable of 10Gbps then), but they are having problems externally. Now, are you going to say that this is a hard drive restriction? Because if they can do it internally, it's 100% going to be an issue with your network which is preventing them from transferring at the same speed to an external network.

Quote Originally Posted by Neil
So far we have not had a ping/mtr or traceroute from you showing any packetloss. As Myatu has already mentioned: "Note that with an MTR or traceroute, you have to keep in mind that ICMP is low priority. In other words, no reply to an ICMP doesn't mean it isn't functioning correctly -- it's just that it had 'better things to do' with other traffic, like serving up web pages over TCP."
Right, to me that sounds contradictory, and it is the same silly issue I've been having with your support. You are saying you want evidence in the form of an MTR showing packetloss, but that the packetloss could be caused by low priority ICMP, so in the end the evidence (the MTR) would be inaccurate?

In other words; Why on earth do you want MTR tests when the results from them are going to be made inaccurate in the first place due to low priority ICMP? How are you going to see what is wrong and where there is something wrong with the inaccurate results? Your support are just going to tell customers these results are not valid as the packetloss is 'normal' on the routers on your network as they have done to me.

Out of interest, was that directed at me? Because I've already sent your 'Support' numerous MTR results. Here is one:

| Host - % | Sent | Recv | Best | Avrg | Wrst | Last |
|------------------------------------------------|------|------|------|------|------|------|
| edge1-r-0-0-1.evansnet.co.uk - 0 | 183 | 183 | 0 | 0 | 2 | 0 |
| 217.32.141.2 - 0 | 183 | 183 | 7 | 10 | 15 | 9 |
| 217.32.140.206 - 0 | 183 | 183 | 7 | 19 | 210 | 8 |
| 213.120.161.98 - 0 | 183 | 183 | 12 | 14 | 20 | 14 |
| 31.55.164.179 - 0 | 183 | 183 | 12 | 15 | 20 | 15 |
| 31.55.164.107 - 0 | 183 | 183 | 12 | 14 | 23 | 13 |
| acc1-10GigE-0-3-0-5.bm.21cn-ipp.bt.net - 0 | 183 | 183 | 12 | 15 | 22 | 14 |
| core1-te0-4-0-4.ealing.ukcore.bt.net - 0 | 183 | 183 | 18 | 25 | 33 | 25 |
| peer1-xe2-0-0.telehouse.ukcore.bt.net - 0 | 183 | 183 | 19 | 22 | 79 | 19 |
| ldn-1-6k.uk.eu - 58 | 56 | 24 | 0 | 28 | 74 | 19 |
| rbx-g1-a9.fr.eu - 0 | 183 | 183 | 23 | 26 | 63 | 27 |
| vss-9b-6k.fr.eu - 71 | 48 | 14 | 25 | 86 | 385 | 25 |
| xxxxxxxx.kimsufi.com - 58 | 56 | 24 | 0 | 23 | 26 | 23 |
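
A rough way to read output like the above, sketched in Python. It applies a common rule of thumb rather than anything OVH-specific: loss shown at an intermediate hop that disappears at the next hop is usually that router deprioritising ICMP replies, while loss that continues through every later hop to the destination is more likely real. The loss figures are copied from the table; the function name is made up for illustration, and the result is a hint about where to look, not proof.

```python
from typing import List, Optional, Tuple

# (host, loss %) pairs copied from the MTR output above.
HOPS: List[Tuple[str, int]] = [
    ("edge1-r-0-0-1.evansnet.co.uk", 0),
    ("217.32.141.2", 0),
    ("217.32.140.206", 0),
    ("213.120.161.98", 0),
    ("31.55.164.179", 0),
    ("31.55.164.107", 0),
    ("acc1-10GigE-0-3-0-5.bm.21cn-ipp.bt.net", 0),
    ("core1-te0-4-0-4.ealing.ukcore.bt.net", 0),
    ("peer1-xe2-0-0.telehouse.ukcore.bt.net", 0),
    ("ldn-1-6k.uk.eu", 58),
    ("rbx-g1-a9.fr.eu", 0),
    ("vss-9b-6k.fr.eu", 71),
    ("xxxxxxxx.kimsufi.com", 58),
]


def first_persistent_loss(hops: List[Tuple[str, int]]) -> Optional[str]:
    """Return the earliest hop from which every later hop, including the
    destination, also shows loss. Returns None if the destination is clean."""
    first = None
    # Walk back from the destination until a hop with no loss is found.
    for host, loss in reversed(hops):
        if loss == 0:
            break
        first = host
    return first


print(first_persistent_loss(HOPS))
```

On this data the loss at ldn-1-6k.uk.eu does not carry on to rbx-g1-a9.fr.eu, so by this rule it looks like deprioritised ICMP, whereas the loss starting at vss-9b-6k.fr.eu continues through to the server itself.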



Quote Originally Posted by raxxeh
Given that I can actually get that speed internally, it cannot be a local configuration issue to my server, at least, I can't think of anything that would cause that to happen.
That's the point this thread has been trying to make re: 10Gbps servers. If a server can achieve 10Gbps internally, then the problem is OVH's network not having enough capacity to carry the traffic outside the datacenter. That, or there is a restriction in place somewhere.

Have you received any 'help' from the 'support' through your OVH manager panel?

raxxeh
11-06-2013, 23:48
Quote Originally Posted by marks
60MB/s is 480Mbps, which is quite some bandwidth, though I agree that you should be able to do more. Can you exceed that with peaks of traffic?

Could you show us an MTR? Could you show us the output of these commands as well, please?
Inbound speeds are fine and can be pushed up to 500-600MB/s, as expected.

Outbound speeds will not, under any circumstance, break 60.0MB/s, regardless of what I am doing.

This includes iperf to datacentres outside OVH, FTP/HTTP traffic, streaming traffic. Not just one service affected.


I can however, achieve true 10gbps internally, via iperf and ~5gbit FTP (ramdisk to ramdisk on 10gbit boxes, peaked out at around 400-500MB/s, still faster than 60.0MB/s for external OVH traffic).


Quote Originally Posted by chris6273
Whether it is quite some bandwidth or not, a 10Gbps box should be running at 10Gbps (Unless the traffic limitation has been reached). I remember the old specifications for the old servers before the limits went into place.

At 480Mbps the server is running at less than a twentieth of the capacity it is supposed to. It's like me buying a car capable of travelling 100 Miles Per Hour but it only lets me travel at 4.8 miles per hour and then being told to take it on a 'dyno' by the manufacturer/supplier and give them the results from it despite it being obvious there is a problem. Granted there are overheads involved but I'd expect the connection to be capable of at least 7Gbps.

I had the same attitude/view point from support last year when I gave the mini Kimsufi box a try. They didn't even bother helping. For some reason I could max out the 100Mbps downstream completely but only managed to get about 30Mbps upstream at the most, at any time of day. Eventually I gave up as the server was only being leased for 1 month. As this server I am working on now belongs to a company I work for, I'm not going to back down until the problems are solved with the outages and packetloss.

Why don't you utilize engineers more around the datacenter? Why don't you ask them to check out these packetloss problems and speed problems? Don't you think it would be solved much more quickly if you did? They will have to get involved anyway, right?

Yeah, I love OVH and still swear by them; but if this is me being forced back onto a lower speed plan as punishment for keeping an old box and actually using the burst capability, it will hurt a little bit.

I have made an effort to ensure it never broke 40T externally just so I could keep the burst.


This is the internal 10gbit iperf: http://puu.sh/3dCA2.png (bwm-ng is running in dynamic mode, so it shifts to bytes dynamically)

I don't currently have access to iperf services running on leaseweb, redstation, hetzner, or nforce, but these locations would all result in a net total of 60MB/s even while being run all at once.

Given that I can actually get that speed internally, it cannot be a local configuration issue to my server, at least, I can't think of anything that would cause that to happen.

elcct
11-06-2013, 13:00
Quote Originally Posted by elcct
It's getting ridiculous...

I am getting max download speed of 100Kb/s now from London or Warsaw.
OK, I have exceeded 5TB on one of the servers; that's why. Sorry about that.

elcct
11-06-2013, 12:36
Quote Originally Posted by Neil
7Gbps is optimistic since the Mega RAID Controllers can only handle up to 6Gbps maximum, and then below that you have the performance of the hard disks themselves, where did you get the 7Gbps figure from?
You can serve data from RAM.

Neil
11-06-2013, 12:34
Quote Originally Posted by chris6273
Whether it is quite some bandwidth or not, a 10Gbps box should be running at 10Gbps (Unless the traffic limitation has been reached). I remember the old specifications for the old servers before the limits went into place.

At 480Mbps the server is running at less than a twentieth of the capacity it is supposed to. It's like me buying a car capable of travelling 100 Miles Per Hour but it only lets me travel at 4.8 miles per hour and then being told to take it on a 'dyno' by the manufacturer/supplier and give them the results from it despite it being obvious there is a problem. Granted there are overheads involved but I'd expect the connection to be capable of at least 7Gbps.

I had the same attitude/view point from support last year when I gave the mini Kimsufi box a try. They didn't even bother helping. For some reason I could max out the 100Mbps downstream completely but only managed to get about 30Mbps upstream at the most, at any time of day. Eventually I gave up as the server was only being leased for 1 month. As this server I am working on now belongs to a company I work for, I'm not going to back down until the problems are solved with the outages and packetloss.

Why don't you utilize engineers more around the datacenter? Why don't you ask them to check out these packetloss problems and speed problems? Don't you think it would be solved much more quickly if you did? They will have to get involved anyway, right?
7Gbps is optimistic since the Mega RAID Controllers can only handle up to 6Gbps maximum, and then below that you have the performance of the hard disks themselves, where did you get the 7Gbps figure from?

So far we have not had a ping/mtr or traceroute from you showing any packetloss. As Myatu has already mentioned: "Note that with an MTR or traceroute, you have to keep in mind that ICMP is low priority. In other words, no reply to an ICMP doesn't mean it isn't functioning correctly -- it's just that it had 'better things to do' with other traffic, like serving up web pages over TCP."

elcct
11-06-2013, 12:21
It's getting ridiculous...

I am getting max download speed of 100Kb/s now from London or Warsaw.

chris6273
11-06-2013, 11:47
Quote Originally Posted by marks
60MB/s is 480Mbps, which is quite some bandwidth, though I agree that you should be able to do more. Can you exceed that with peaks of traffic?
Whether it is quite some bandwidth or not, a 10Gbps box should be running at 10Gbps (Unless the traffic limitation has been reached). I remember the old specifications for the old servers before the limits went into place.

At 480Mbps the server is running at less than a twentieth of the capacity it is supposed to. It's like me buying a car capable of travelling 100 Miles Per Hour but it only lets me travel at 4.8 miles per hour and then being told to take it on a 'dyno' by the manufacturer/supplier and give them the results from it despite it being obvious there is a problem. Granted there are overheads involved but I'd expect the connection to be capable of at least 7Gbps.
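
For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 1000 Mbps) and ignores protocol overheads; the function names are only illustrative.

```python
def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def share_of_link(mb_per_s: float, link_gbps: float) -> float:
    """Fraction of the link's nominal capacity used at the given rate."""
    return mbytes_to_mbits(mb_per_s) / (link_gbps * 1000)


rate = mbytes_to_mbits(60)
share = share_of_link(60, 10)
print(f"{rate:.0f} Mbps, {share:.1%} of a 10Gbps link")
```

60MB/s works out to 480Mbps, or 4.8% of the nominal 10Gbps, which is just under the one-twentieth (5%) mark.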

I had the same attitude/view point from support last year when I gave the mini Kimsufi box a try. They didn't even bother helping. For some reason I could max out the 100Mbps downstream completely but only managed to get about 30Mbps upstream at the most, at any time of day. Eventually I gave up as the server was only being leased for 1 month. As this server I am working on now belongs to a company I work for, I'm not going to back down until the problems are solved with the outages and packetloss.

Why don't you utilize engineers more around the datacenter? Why don't you ask them to check out these packetloss problems and speed problems? Don't you think it would be solved much more quickly if you did? They will have to get involved anyway, right?

marks
11-06-2013, 10:48
60MB/s is 480Mbps, which is quite some bandwidth, though I agree that you should be able to do more. Can you exceed that with peaks of traffic?

Could you show us an MTR? Could you show us the output of these commands as well, please?

raxxeh
11-06-2013, 02:19
Something has changed in the last week or so on their network.

My 10G box appears to have been limited back to about 500mbit max outbound (it's an older box, from before the new 300mbit limits went into place).

Before the 3rd, the server would have no issues occasionally bursting to 3-4gbit out for short periods of time (10-15 seconds). Some time around then the bursts stopped (I can't pinpoint it exactly, as some days go by with no speed bursts).


They're insisting that it's an internal issue, and yet I am able to maintain 10Gbits internally for as long as I try (2 iperf sessions to 2 other 10gbit boxes), whenever I try, but am not able to exceed 60MB/s outside of the datacentre at all, at any time of the day.

It's a little disappointing to have had this happen when I've kept the 40TB limit so I am still able to have the burst for those few seconds every now and again, on top of the fact that I've never actually gone over 40TB in a month, I don't think I've ever been close.
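
As a rough sense of scale for that 40TB figure, the sketch below works out the average rate that would use the whole allowance, and how much a constant 60MB/s would move in a month. It assumes a 30-day month and decimal units (1 TB = 10^12 bytes); the function names are only illustrative, and OVH's actual accounting may differ.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def avg_mbps_for_quota(quota_tb: float) -> float:
    """Average rate in Mbps that uses exactly quota_tb (decimal TB) in a month."""
    bits = quota_tb * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6


def tb_per_month(mb_per_s: float) -> float:
    """Decimal TB transferred in a month at a constant rate of mb_per_s MB/s."""
    return mb_per_s * 1e6 * SECONDS_PER_MONTH / 1e12


print(f"{avg_mbps_for_quota(40):.1f} Mbps average uses 40TB in a month")
print(f"{tb_per_month(60):.2f} TB moved in a month at a constant 60MB/s")
```

Under these assumptions, staying under 40TB means averaging below roughly 123Mbps, so short bursts of several Gbit are compatible with the limit as long as they stay short.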

Anyway, this issue might be unrelated to you guys (no packet loss for me) but there are a lot of things changing on their network that aren't exactly in the best interest for customers, it seems.

elcct
11-06-2013, 00:17
My users also are experiencing problems with servers.



You can see random drops of traffic. It's happening on dozens of servers since Friday.

Myatu
10-06-2013, 20:00
During a period where you are experiencing dropped packets, try using a third party source (other than Think Broadband) to perform ping tests. For example, use HE or Level3:

http://lg.he.net
http://lookingglass.level3.net

The graphic from TB is all good and well, but isn't very useful in determining where the cause is. Using the looking glass at HE allows you to test it from various locations on different networks. If they correspond with your server not replying to pings, then you can investigate it further.

Note that with an MTR or traceroute, you have to keep in mind that ICMP is low priority. In other words, no reply to an ICMP doesn't mean it isn't functioning correctly -- it's just that it had 'better things to do' with other traffic, like serving up web pages over TCP.

chris6273
10-06-2013, 13:19
Quote Originally Posted by cartwright118
You may want to speak to the technical support about your packet loss, but as regards your server being unreachable, I would submit an incident. If you haven't disabled the automatic intervention then as soon as your server is unreachable via ping it automatically opens an intervention and they hopefully fix it. I would only open an incident request if the server is offline due to hardware problems and not a software related issue from an incorrect configuration. For configuration issues it's best to wait for the technical support as they may be able to help you more.

Technical support work normal business hours, whereas incident support work 24/7.

Have you thought that maybe you are being DOS'ed?
From the experience I have received so far, technical 'support' hasn't been supportive at all.

Every piece of evidence I have presented to them has been more or less dismissed as not enough and additional quality graphs I have presented them, they have classed as unreliable. I just don't know what to do.

How on earth are customers supposed to get support with products if every piece of evidence they submit is dismissed?

I've even suggested as per the evidence where the problem may lie and I have been receiving replies dismissing them as if they think their network has no faults whatsoever; without them even checking whether they are at fault!

I'm also not pleased at the response times of the 'support'; on the website it states:

"4 hour guaranteed response time - 24/7"

Yet the only time a response has come within this 4 hour window was today at 11:34.

Anyone know what I should do next?

chris6273
06-06-2013, 21:45
I received a reply from Technical Support saying an intervention hasn't been triggered yet. They have requested MTR tests which I have done (One initiated when the problem is occurring and another just before the problem happens - luck of the draw as it's unpredictable).

The tests I completed show that the router before the server stops responding when the server becomes unreachable, and starts responding again a few seconds before the server becomes reachable again.

As for the intervention, I have a smart explanation for the absence of one: The problem probably lies outside the network which monitors the server. Would this make sense?

Also, since no intervention is being triggered, presumably there isn't actually anything wrong with the server itself but with the network, as the outside world can't communicate with it whereas OVH's systems can?

What do you think?

As for being DOS'ed I doubt it due to the sudden gaps in the graph as shown here:

http://www.thinkbroadband.com/ping/s...5-06-2013.html

Could maybe be QOS kicking in on a faulty/overloaded port?

cartwright118
06-06-2013, 08:43
You may want to speak to the technical support about your packet loss, but as regards your server being unreachable, I would submit an incident. If you haven't disabled the automatic intervention then as soon as your server is unreachable via ping it automatically opens an intervention and they hopefully fix it. I would only open an incident request if the server is offline due to hardware problems and not a software related issue from an incorrect configuration. For configuration issues it's best to wait for the technical support as they may be able to help you more.

Technical support work normal business hours, whereas incident support work 24/7.

Have you thought that maybe you are being DOS'ed?

chris6273
05-06-2013, 22:31
Quote Originally Posted by cartwright118
Did the invention not kick in? ... Did you submit an incident?
Invention? I've submitted a Technical Support Request; Is that enough or shall I submit an incident? I was put off by the charge if it isn't diagnosed to be their fault as the problem seems to be intermittent.

cartwright118
05-06-2013, 19:12
Did the invention not kick in? ... Did you submit an incident?

chris6273
05-06-2013, 19:04
Hi guys,

For the past few days we have been suffering from very poor service from OVH.

For some reason there appears to be major packetloss to our server as you can see from this graph:

http://www.thinkbroadband.com/ping/s...5-06-2013.html

Yesterday it seemed to clear up around 4pm and resumed at 2am. When it cannot be pinged, the server is completely unreachable, including the website it's hosting.

Does anyone have any idea as to what is going on? The server isn't restarting, and before yesterday, when we restarted it to see if that was causing the problem, it had a 71 day uptime.

Anyone?

Cheers,
Chris