OVH Community, your new community space.

Failover IPs constantly failing


Diovan
02-06-2015, 18:40
Quote Originally Posted by marks
Hi Kamilleri

Are you saying that this is always happening (when someone on your same rack modified the vMAC you have a problem?)? if it's your case, let us have a look at your server.

I would say that's not the case at all, it doesn't happen as a rule. there can be the occasional bug, but generally speaking, even modifications done by yourself are put through without any problem. We just recommend not to do it on a live server, but it's not a problem that happens all the time.
My IPs have gone offline... without any notice! I have not even logged in to the server for the last 10 days, and did not edit anything! Then how could this happen?
This must be happening due to something else!
And please solve this problem ASAP...

Kamilleri
02-06-2015, 17:43
Quote Originally Posted by marks
Hi Kamilleri

Are you saying that this is always happening (when someone on your same rack modified the vMAC you have a problem?)? if it's your case, let us have a look at your server.

I would say that's not the case at all, it doesn't happen as a rule. there can be the occasional bug, but generally speaking, even modifications done by yourself are put through without any problem. We just recommend not to do it on a live server, but it's not a problem that happens all the time.
Actually it has nothing to do with my server at all.

This is what's happening:
1. You buy a server and set up a bunch of virtual machines.
2. You order and configure all your failover IPs. Everything is perfect: all IPs are online and working. No letters from OVH about misconfiguration.
3. You don't touch the Control Panel anymore (as you don't need to change the already existing vMACs).
4. After some time, let's say a month or two, all your failover IPs (except the main IP) suddenly become unavailable at the same time. You were asleep at that moment. They are not receiving any incoming connections at all, neither ICMP nor service traffic. Nothing.
5. You try rebooting the virtual machines, re-enabling their network, rechecking all firewalls, disabling the firewalls. Nothing helps; none of the IPs on your account can be reached.
6. Since nothing helps, you remember that changing anything in the "IPs" tab of the Control Panel actually triggers some heavy operation that locks the whole IP manager for 20-40 seconds for all failover IPs.
7. You try creating or removing any vMAC, for example on a failover IP that is not yet in use.
8. The whole IP manager is locked. Something is happening on the backend; you wait and watch those AJAX loading animations.
9. The network is back. All failover IPs that were unavailable become available again. By all appearances, I fixed the issue by triggering any action in the IP Manager.

At the moment when every failover IP became unavailable at the same time, I wasn't managing my server.
It also couldn't be a software problem on the server, because nothing helped except triggering an action in the IP Manager. I really tried everything. There was production stuff running, and this network shutdown was unexpected to me.

I have experienced this 3 times already.

Are you saying that this is always happening (when someone on your same rack modified the vMAC you have a problem?)? if it's your case, let us have a look at your server.
I don't know a way to reproduce this issue.
For me, it looks like a sudden network shutdown on all failover IPs. I'm not changing or managing anything at that moment.
This makes me think that a MAC routing issue appears after some other customer starts changing something in their Control Panel, or maybe when two of them change something at the same time. I don't know when and why this happens; I'm not even sure it appears after some other customer starts changing anything. The network just disappears, and I'm forced to go to the Control Panel and retrigger any action to bring the failover IPs back online. This is rather strange and annoying behavior.

I left TICKET#2015042819005231 with logs and graphs for all our servers when we experienced our previous network loss, but didn't get any useful feedback at all.
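
Because these outages arrive without warning, a timestamped record of when each failover IP stops answering can back up a ticket like the one above. What follows is only a minimal sketch, assuming a Linux host with iputils ping. The addresses are documentation placeholders, and the PROBE variable and check_ips function are hypothetical names chosen for illustration, not anything provided by OVH.

```shell
#!/bin/sh
# Sketch: report which failover IPs stop answering ICMP.
# Placeholder addresses (RFC 5737 documentation range), not real failover IPs.
FO_IPS="198.51.100.25 198.51.100.26"

# PROBE is the command used to test one address (assumption: iputils ping,
# where -W is a timeout in seconds). Override it for dry runs.
PROBE="${PROBE:-ping -c 1 -W 2}"

# Print a space-separated list of the addresses that did not answer.
check_ips() {
    down=""
    for ip in "$@"; do
        if ! $PROBE "$ip" >/dev/null 2>&1; then
            down="${down:+$down }$ip"
        fi
    done
    printf '%s\n' "$down"
}

# Word splitting of FO_IPS is intended here.
DOWN="$(check_ips $FO_IPS)"
if [ -n "$DOWN" ]; then
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') DOWN: $DOWN"
fi
```

Run from cron on a machine outside the affected server and appended to a file, this would give the start and end time of each outage, which is the kind of evidence a ticket needs.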

marks
02-06-2015, 15:38
Quote Originally Posted by Kamilleri
Thanks for breaking in!
The issue with vMACed failover IPs becoming unavailable does not appear when somebody is modifying something. It appears by itself, after a random period of time. Could be several days or months - just sudden failover IP unavailability, while the main IP is still available at that moment.

Because going to the Control Panel and triggering some operation like deleting/creating a vMAC immediately fixes it, we suspect that this happens when some other customer makes modifications to their vMACs, and then something goes wrong on the backend (routing tables? network hardware?) and other customers start to experience sudden failover IP unavailability. Not on a massive scale, maybe rack-dependent.
Hi Kamilleri

Are you saying that this is always happening (when someone on your same rack modified the vMAC you have a problem?)? if it's your case, let us have a look at your server.

I would say that's not the case at all, it doesn't happen as a rule. there can be the occasional bug, but generally speaking, even modifications done by yourself are put through without any problem. We just recommend not to do it on a live server, but it's not a problem that happens all the time.

alvaroag
02-06-2015, 14:54
From my perspective, the problem with VMACs is about flexibility. Every change must go through the Manager, and it takes me some minutes of waiting to move an IP from one server to another. Using techniques such as ProxyARP I can make the changes in a minute, just by changing the firewall configuration and applying it. And, of course, this way I'm not exposed to such bugs in the Manager/backend. I haven't suffered them when I used VMACs, but less risk is better.

Quote Originally Posted by wn6
I configured the "Routed (brouter)" scheme using the Hetzner wiki. IPv4 seems to be working fine; I'm testing it. But I'm failing to set up IPv6 routing. Can anybody help?
Consider that the network parameters for Hetzner are different from the ones required for OVH. In particular, the gateway address is totally different. With OVH, if your prefix is "2607:5300:xxyy:zznn::/64", your gateway would be "2607:5300:xxyy:zzFF:FF:FF:FF:FF", that is, the gateway is outside your prefix. Under some Linux distros, you may need to add a route to the gateway (/128) before the default gateway, or adding the default gateway will fail. You may also need to run "sysctl net.ipv6.conf.eth0.proxy_ndp=1" and "sysctl net.ipv6.conf.eth0.forwarding=1" (changing the interface name if required) after the interface is up, so that traffic can be forwarded between interfaces. Firewall rules may also be required to allow such traffic. BTW, what platform are you using?
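
The gateway rule and the order of the route commands described above can be sketched roughly as follows. This is an illustration under assumptions, not a verified OVH recipe: the prefix is a documentation placeholder, eth0 is assumed, ipv6_gateway_for is a hypothetical helper name, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: derive the out-of-prefix IPv6 gateway and print the setup commands.
# PREFIX64 holds the first four groups of the /64 (placeholder value).
PREFIX64="2001:db8:1234:5678"
IFACE="eth0"

# Keep the first three groups and the high byte of the fourth group,
# then fill the remainder with ff, following the rule given in the post.
ipv6_gateway_for() {
    head="${1%:*}"                          # first three groups
    last="${1##*:}"                         # fourth group, e.g. 5678
    padded="$(printf '%04x' "0x${last}")"   # restore dropped leading zeros
    printf '%s:%sff:ff:ff:ff:ff\n' "$head" "${padded%??}"
}

GW6="$(ipv6_gateway_for "$PREFIX64")"

# The gateway is outside the /64, so it needs its own on-link route first.
echo "ip -6 route add ${GW6} dev ${IFACE}"
echo "ip -6 route add default via ${GW6} dev ${IFACE}"
# Settings mentioned in the post, applied once the interface is up.
echo "sysctl -w net.ipv6.conf.${IFACE}.proxy_ndp=1"
echo "sysctl -w net.ipv6.conf.${IFACE}.forwarding=1"
```

Printing instead of executing makes it possible to check the derived gateway against the one shown in the Manager before touching a live host.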

Kamilleri
02-06-2015, 13:40
Quote Originally Posted by marks
You are welcome to use other methods, but the vMAC method of internet bridging is perfectly good. We would just recommend not making modifications to the vMACs on a live server. Normally operations on vMACs go through without any problem, but they are not subject to SLA, so if there is one, you could experience some downtime until our engineers can check the server. So it's better to put the server in "maintenance" mode before making modifications.
Thanks for breaking in!
The issue with vMACed failover IPs becoming unavailable does not appear when somebody is modifying something. It appears by itself, after a random period of time. Could be several days or months - just sudden failover IP unavailability, while the main IP is still available at that moment.

Because going to the Control Panel and triggering some operation like deleting/creating a vMAC immediately fixes it, we suspect that this happens when some other customer makes modifications to their vMACs, and then something goes wrong on the backend (routing tables? network hardware?) and other customers start to experience sudden failover IP unavailability. Not on a massive scale, maybe rack-dependent.

marks
02-06-2015, 13:11
Quote Originally Posted by alvaroag
To avoid that kind of problem, it's better to avoid using VMACs. This is impossible (I think) on VMware, but it's perfectly possible on any Linux-based hypervisor.
You are welcome to use other methods, but the vMAC method of internet bridging is perfectly good. We would just recommend not making modifications to the vMACs on a live server. Normally operations on vMACs go through without any problem, but they are not subject to SLA, so if there is one, you could experience some downtime until our engineers can check the server. So it's better to put the server in "maintenance" mode before making modifications.

wn6
02-06-2015, 06:33
Quote Originally Posted by alvaroag
To avoid that kind of problem, it's better to avoid using VMACs. This is impossible (I think) on VMware, but it's perfectly possible on any Linux-based hypervisor.
I configured the "Routed (brouter)" scheme using the Hetzner wiki. IPv4 seems to be working fine; I'm testing it. But I'm failing to set up IPv6 routing. Can anybody help?

alvaroag
01-06-2015, 19:49
What OS are you using? (Host)

Diovan
01-06-2015, 18:28
Quote Originally Posted by alvaroag
Diovan, not sure if that would fall under SLA, because they can state "Your server is online and with network connection" [but without all the additional, important IP addresses]. A little tricky, but they may tell you that.

As always, I recommend checking your configuration. If the main IP (the one which comes with the server) is "A.B.C.D", every FO IP "E.F.G.H" should be configured with mask "255.255.255.255" and with gateway "A.B.C.254". Also, make sure the VM is on the correct bridge (this applies especially to Proxmox).

If everything is OK, try deleting the VMAC and creating it again. It takes some minutes, and may require you to restart your VM/CT, but it's better than having it offline.

I'm sure this is a bug in the SYS Manager backend. I don't know its structure, but it's surely on the backend. Probably, the backend is erroneously changing the VMAC or port for your FO IP directly on the router, but not in the database. That would explain why you see it as OK in the manager, but it doesn't work with the router.
Whatever the issue is... The IE customer service don't even answer the phone! The same thing happened to me about 20 days ago. I called them... they said it's not a problem on their end!
Now it's happening... once again... :@

alvaroag
01-06-2015, 15:46
Diovan, not sure if that would fall under SLA, because they can state "Your server is online and with network connection" [but without all the additional, important IP addresses]. A little tricky, but they may tell you that.

As always, I recommend checking your configuration. If the main IP (the one which comes with the server) is "A.B.C.D", every FO IP "E.F.G.H" should be configured with mask "255.255.255.255" and with gateway "A.B.C.254". Also, make sure the VM is on the correct bridge (this applies especially to Proxmox).

If everything is OK, try deleting the VMAC and creating it again. It takes some minutes, and may require you to restart your VM/CT, but it's better than having it offline.

I'm sure this is a bug in the SYS Manager backend. I don't know its structure, but it's surely on the backend. Probably, the backend is erroneously changing the VMAC or port for your FO IP directly on the router, but not in the database. That would explain why you see it as OK in the manager, but it doesn't work with the router.

Diovan
01-06-2015, 15:15
My IPs are ALSO OFFLINE...

149.202.31.XXX
46.105.76.XXX
91.121.234.XXX

They have been offline for almost the last 8 hours, and yesterday they were offline for almost 14 hours!
No ticket response from the SYS team!
What's going on? What about the SLA?

Please solve the problem ASAP...

alvaroag
01-06-2015, 15:05
To avoid that kind of problem, it's better to avoid using VMACs. This is impossible (I think) on VMware, but it's perfectly possible on any Linux-based hypervisor.

wn6
01-06-2015, 08:11
The same issue with all failover IPs.

Kamilleri
25-05-2015, 09:00
It's happened again...

zenithteq
19-05-2015, 07:00
@marks, 6 of my FO IPs have been down for over a week now, I have raised a ticket regarding this and called the support guys on a regular basis and have never had anything constructive back from the techies or engineers. The ticket reference is SD #13509 & SYS #13425 which was to do with being unable to delete vMACs.

Any help or escalation with these issues will be appreciated.

Thank you.

zenithteq
13-05-2015, 06:50
@marks, my IP woes still go on, I have raised a number of tickets and managed to talk to one of the support team to follow up on the issue but no resolution yet.

The IPs that I currently have issues with are in the 178.32.48.xxx and 178.32.161.xxx ranges and have been down for the past couple of days.

Our server is in RBX-2

marks
12-05-2015, 11:21
it's probably to do with this task reported in status, to do with the upgrading of a number of switches:

http://status.ovh.net/?do=details&id=9463

it's fixed now, so it should be back. Could you confirm?

Deboshir
11-05-2015, 15:33
Quote Originally Posted by zenithteq
Also two other IPs have stopped working, there were no changes made on my side to the VMs.

As alvaroag explains, the vMAC must be getting duplicated and assigned to another IP and hence the traffic is interrupted. Could SYS support please escalate this as a problem rather than an issue?
+1 for the same issue. I have just added a newly purchased IP from another subnet on the same ESXi machine - it works instantly. 2 other IPs configured before (transferred from another VM) are down. Seems like a switch MAC issue?

Careimages
11-05-2015, 11:24
Quote Originally Posted by alvaroag
Are you sure? I didn't know about that. Will try it on my next server upgrade.
Yep, that's how I've got it on my machines and it works a treat. I set up the route commands in post-up and pre-down scripts for vmbr1
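
The route command quoted in this thread appears without its destination address. Assuming the missing argument is the failover IP itself (an assumption, not something confirmed here), the post-up and pre-down hooks for vmbr1 could be generated along these lines. The addresses are documentation placeholders and route_hooks is a hypothetical helper name; the script only prints lines for review and changes nothing on the host.

```shell
#!/bin/sh
# Sketch: print post-up / pre-down hook lines for a routed bridge.
# Assumption: each failover IP is routed as a /32 host route onto the bridge.
BRIDGE="vmbr1"
FO_IPS="198.51.100.25 198.51.100.26"   # placeholders, not real failover IPs

# $1 = bridge name, remaining arguments = failover IPs.
route_hooks() {
    bridge="$1"
    shift
    for ip in "$@"; do
        printf '    post-up ip route add %s/32 dev %s\n' "$ip" "$bridge"
        printf '    pre-down ip route del %s/32 dev %s\n' "$ip" "$bridge"
    done
}

# Word splitting of FO_IPS is intended here.
route_hooks "$BRIDGE" $FO_IPS
```

On a Debian-style host the printed lines would go under the vmbr1 stanza in /etc/network/interfaces, so the routes follow the bridge up and down.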

alvaroag
11-05-2015, 04:47
Quote Originally Posted by Careimages
With Proxmox you don't even need to use ProxyARP to avoid using a VMAC, all you need to do is
ip route add dev vmbr1

on the physical host (assuming you've got vmbr1 set up as per the Proxmox guide).
Are you sure? I didn't know about that. Will try it on my next server upgrade.

Careimages
11-05-2015, 02:54
Quote Originally Posted by alvaroag
You are not the only one being unable to delete the VMAC. Yesterday, I helped a guy with a problem with his server on SYS, and he had the same problem. To solve it quickly, he had to order a new IP. Calling SYS support might be the fastest way to solve this problem.

BTW, the best way to avoid using VMACs is using ProxyARP. This can be done on Proxmox (with both VMs and CTs). I suppose it can be done on any Linux-based hypervisor, such as Xen. You just need to install Shorewall and configure it.
With Proxmox you don't even need to use ProxyARP to avoid using a VMAC, all you need to do is
ip route add dev vmbr1

on the physical host (assuming you've got vmbr1 set up as per the Proxmox guide).

Kamilleri
10-05-2015, 17:49
Quote Originally Posted by alvaroag
BTW, the best way to avoid using VMACs is using ProxyARP. This can be done on Proxmox (with both VMs and CTs). I suppose it can be done on any Linux-based hypervisor, such as Xen. You just need to install Shorewall and configure it.
Is Shorewall really needed to run Proxy ARP?
Is there no extra configuration on the VMs? Just the public IP configuration, like now, but without the need for a special MAC address?

alex
07-05-2015, 21:41
Quote Originally Posted by zenithteq
As alvaroag explains, the vMAC must be getting duplicated and assigned to another IP and hence the traffic is interrupted. Could SYS support please escalate this as a problem rather than an issue?
They don't know the solution; that is the reason there is no public reply.

alvaroag
07-05-2015, 17:22
You are not the only one being unable to delete the VMAC. Yesterday, I helped a guy with a problem with his server on SYS, and he had the same problem. To solve it quickly, he had to order a new IP. Calling SYS support might be the fastest way to solve this problem.

BTW, the best way to avoid using VMACs is using ProxyARP. This can be done on Proxmox (with both VMs and CTs). I suppose it can be done on any Linux-based hypervisor, such as Xen. You just need to install Shorewall and configure it.

zenithteq
07-05-2015, 15:11
I now have another funny issue with the vMAC on my IP Management page...am unable to delete the vMAC associated with an IP. Have raised a support ticket for this.

Also two other IPs have stopped working, there were no changes made on my side to the VMs.

As alvaroag explains, the vMAC must be getting duplicated and assigned to another IP and hence the traffic is interrupted. Could SYS support please escalate this as a problem rather than an issue?

alvaroag
02-05-2015, 00:08
I bet, for some unknown reason, the IP is getting assigned to another MAC or port, so the router will not route it anymore (at least not from that server). That won't be reflected in the manager, because the manager's database is not altered. So, when the user assigns it a VMAC, the router configuration is changed according to the manager database. Then, when the user deletes the VMAC again, the router configuration is changed to reflect that change, and everything gets back to normal. That would most probably be an error in the manager backend, which may be changing the MAC/port assignment for some IPs without needing to.

alex
01-05-2015, 23:54
Quote Originally Posted by zenithteq
I have come across similar issues with sudden drop in connectivity on my FO IPs. I am unsure as to what causes it. As Kamilleri says, I add or remove a vMAC from an unused IP and usually the connectivity is restored after a short while.

When the issue is raised with SYS/OVH Support, we first talk about the config on the VMs, go back and forth a few times, and never find out what the underlying issue is.

I am glad that someone else has experienced the same issue. It's quite worrying when this happens to production VMs.

It has affected various single IPs and blocks for me, the most recent was with 178.32.161.172.xx and 46.105.218.xx.
I do believe that the issue is at switch level, as we had exactly the same issue and all our IPs are used for VMs with vMACs. One solution was to switch the IPs to another server and back, as support sucks - clueless about such issues.

zenithteq
01-05-2015, 00:15
I have come across similar issues with sudden drop in connectivity on my FO IPs. I am unsure as to what causes it. As Kamilleri says, I add or remove a vMAC from an unused IP and usually the connectivity is restored after a short while.

When the issue is raised with SYS/OVH Support, we first talk about the config on the VMs, go back and forth a few times, and never find out what the underlying issue is.

I am glad that someone else has experienced the same issue. It's quite worrying when this happens to production VMs.

It has affected various single IPs and blocks for me, the most recent was with 178.32.161.172.xx and 46.105.218.xx.

marks
28-04-2015, 12:59
Quote Originally Posted by QuickClick
Is anyone else having issues with failover IPs at the moment? We seem to constantly have connection issues with them; at the moment the following IPs do not ping:

91.121.218.xxx
87.98.241.xxx
188.165.0.xxx
188.165.28.xxx
188.165.20.xxx
87.98.225.xxx
5.196.151.xxx

Anyone else on the same class?
how do you use the IPs? for virtualisation or just normal extra IPs?

I can't see anything reported in status that would have affected your IPs, at least for what I know of the problem you had:

http://status.ovh.net/

Kamilleri
28-04-2015, 10:46
My failover IPs became unavailable 30 min ago.
Solved by going into the IP manager and triggering any action, like attaching a virtual MAC.

I couldn't understand why they became unavailable. Probably some other user used the IP manager, and something went wrong for the whole rack? I don't know. There was a topic related to this somewhere.

alvaroag
26-04-2015, 18:53
Everything working fine for me, currently working with 16 FO IPs with no problem.

Have you checked your configuration? If your server's main IP is "A.B.C.D", all your FO IPs should be configured with mask "/32" or "255.255.255.255", and gateway "A.B.C.254". Some Linux distros require additional steps for this, as they won't allow a gateway outside of the network.
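
The "additional steps" usually come down to adding an on-link route to the gateway before the default route. A minimal sketch, assuming iproute2 inside the VM and eth0 as the interface; the addresses are documentation placeholders, gateway_for is a hypothetical helper name, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: print the commands for one failover IP with an off-subnet gateway.
MAIN_IP="203.0.113.10"   # host's main IP, i.e. A.B.C.D (placeholder)
FO_IP="198.51.100.25"    # failover IP, i.e. E.F.G.H (placeholder)
IFACE="eth0"

# Gateway is the main IP with its last octet replaced by 254 (A.B.C.254).
gateway_for() {
    printf '%s.254\n' "${1%.*}"
}

GW="$(gateway_for "$MAIN_IP")"

echo "ip addr add ${FO_IP}/32 dev ${IFACE}"
# With a /32 mask the gateway is not on-link, so route to it explicitly first.
echo "ip route add ${GW} dev ${IFACE}"
echo "ip route add default via ${GW} dev ${IFACE}"
```

If the default route is added before the host route to the gateway, the kernel typically rejects it because the next hop is unreachable, which is the failure some distros hit.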

QuickClick
26-04-2015, 17:41
Is anyone else having issues with failover IPs at the moment? We seem to constantly have connection issues with them; at the moment the following IPs do not ping:

91.121.218.xxx
87.98.241.xxx
188.165.0.xxx
188.165.28.xxx
188.165.20.xxx
87.98.225.xxx
5.196.151.xxx

Anyone else on the same class?