Saturday, December 21, 2013

Enhancing VM Mobility with VxLAN, OVSDB and EVPN

Organizations are increasingly using virtual machine mobility to optimize server resources, ensure application performance and aid in disaster avoidance. VM live migration has typically relied on growing the L2 broadcast domain so that VMs remain reachable at their existing addresses after a move. This has driven the increasing use of VLANs and the need for L2 extension over the WAN. As a result, organizations are looking for ways to overcome the limits of VLAN scale and for methods of extending the L2 domain over the WAN that deliver the best performance. VxLAN has emerged as an alternative technology to VLANs, and EVPN has emerged as a better way to transport VMs over the WAN. Together these technologies can enable VM live migration over the WAN, or long distance vMotion in VMware parlance, but they all need to work together effectively, and this is where OVSDB, VxLAN routing and a new technology from Juniper called ORE come into play.

VxLAN Increases VLAN Scale
Organizations are increasingly looking to VxLAN as a solution. The primary goals behind this network architecture are to move beyond the traditional limit of 4,094 VLANs and to enable VM mobility across Layer 3 subnets. VxLAN is a tunneling technology used to create an overlay network so that virtual machines can communicate with each other and migrate both within a data center and between data centers. VxLAN enables multi-tenant networks at scale as a component of logical, software-based networks that can be created on demand, and it lets enterprises use capacity wherever it is available by supporting VM live migration. VxLAN implements Layer 2 network isolation using MAC-in-IP encapsulation with a 24-bit segment identifier, which scales well beyond the 4K limit of VLANs.
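To make the encapsulation concrete, here is a simplified Python sketch of what a VTEP does when it wraps an Ethernet frame in VxLAN: an 8-byte header carrying the 24-bit segment identifier (VNI) is prepended and the result is carried in UDP between VTEP IP addresses. The VNI, addresses and UDP port here are illustrative; this is a sketch of the header format, not a production encapsulation path.

```python
# Minimal sketch of VxLAN encapsulation: prepend the 8-byte VxLAN header
# carrying a 24-bit VNI, then send the result over UDP to a remote VTEP.
# Port 4789 is the IANA-assigned VxLAN port; some early stacks used 8472.
import socket
import struct

VXLAN_PORT = 4789

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VxLAN header: flags byte (I bit set), reserved bytes, 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Byte 0: 0x08 marks the VNI as valid; bytes 1-3 reserved.
    # Bytes 4-6: VNI; byte 7 reserved (VNI is shifted into the top 24 bits).
    header = struct.pack("!B3xI", 0x08, vni << 8)
    return header + inner_frame

def send_to_vtep(inner_frame: bytes, vni: int, remote_vtep_ip: str) -> None:
    """Send the encapsulated frame to a remote VTEP over UDP."""
    packet = vxlan_encapsulate(inner_frame, vni)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(packet, (remote_vtep_ip, VXLAN_PORT))
    sock.close()

if __name__ == "__main__":
    dummy_frame = bytes(14)  # zeroed Ethernet header as a stand-in for a real frame
    send_to_vtep(dummy_frame, vni=5000, remote_vtep_ip="192.0.2.1")
```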

OVSDB Provides Control for VxLAN
In an earlier blog, see this link, I noted a concern with early implementations of VxLAN: they relied on network flooding for MAC address resolution, which was unlikely to scale. That has since been addressed by the use of OVSDB in some SDN controllers as a mechanism to manage state in an overlay network. OVSDB, the Open vSwitch Database Management Protocol, is a configuration and management protocol, complementary to OpenFlow, that is used to manage Open vSwitch deployments. A database server and a switch daemon use the OVSDB protocol to exchange configuration and state information. With OVSDB you can create, configure and delete ports and tunnels, and create, configure and delete queues. Juniper has implemented OVSDB support on the MX Series and on the QFX5100 switch so that they can interoperate with SDN controllers, such as VMware's NSX, that use VxLAN to connect to virtual machines.
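To illustrate the protocol itself, here is a minimal Python sketch that speaks OVSDB (JSON-RPC over TCP) to a database server and asks which databases it exposes; real controllers go on to use the transact and monitor methods to create ports and tunnels and to track MAC-to-VTEP state. The address and port are assumptions (6640 is the usual OVSDB port; some older deployments use 6632).

```python
# Minimal sketch of an OVSDB (JSON-RPC over TCP) client call.
import json
import socket

def ovsdb_call(host: str, port: int, method: str, params: list) -> dict:
    """Send one JSON-RPC request to the OVSDB server and return the parsed reply."""
    request = {"method": method, "params": params, "id": 0}
    with socket.create_connection((host, port)) as sock:
        sock.sendall(json.dumps(request).encode())
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
            try:
                return json.loads(data.decode())  # full reply received
            except ValueError:
                continue  # reply not complete yet, keep reading
    raise RuntimeError("connection closed before a full reply was received")

if __name__ == "__main__":
    reply = ovsdb_call("192.0.2.10", 6640, "list_dbs", [])
    print(reply["result"])  # e.g. ["Open_vSwitch", "hardware_vtep"]
```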

VxLAN Routing and SDN
Juniper has announced support for VxLAN on the MX Series routers and on the QFX5100 top-of-rack (TOR) switch, both of which can act as VTEPs, or VxLAN Tunnel End Points. VxLAN is used as a virtual network tunneling protocol by some SDN controllers, such as VMware's NSX. When registered with the NSX controller, the MX Series platforms can be configured to provide Layer 3 gateway services via the VMware NSX API, allowing the NSX controller to coordinate the creation of VxLAN tunnels. Juniper is delivering VxLAN routing capabilities on the MX Series that allow virtual machines to communicate with other IP subnets and other IP networks. VxLAN routing allows application decisions to be centralized and managed independently of individual switches, routers and other data center devices in a VMware NSX environment. The MX can operate independently of VMware NSX, using standard routing and forwarding tables (RIBs and FIBs), or it can register with the NSX controller to provide external routing services.
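The routing behavior is easier to see with a small example. The following Python sketch (not Junos configuration) shows conceptually what a VxLAN Layer 3 gateway does: each VNI maps to an IP subnet, and traffic leaving one segment is looked up to find which other segment, or the external routing table, owns the destination. The VNIs, subnets and VTEP addresses are invented for illustration.

```python
# Conceptual sketch of inter-VNI routing at a VxLAN Layer 3 gateway.
import ipaddress

# VNI -> (subnet, next-hop VTEP holding that segment)
VNI_TABLE = {
    5001: (ipaddress.ip_network("10.1.1.0/24"), "192.0.2.11"),
    5002: (ipaddress.ip_network("10.1.2.0/24"), "192.0.2.12"),
}

def route_between_segments(src_vni: int, dst_ip: str):
    """Return (dst_vni, next_hop_vtep) for an inter-subnet packet,
    or None if the destination is outside the known overlay segments."""
    dst = ipaddress.ip_address(dst_ip)
    for vni, (subnet, vtep) in VNI_TABLE.items():
        if dst in subnet:
            if vni == src_vni:
                return None  # same segment: bridge it, don't route
            return vni, vtep
    return None  # hand off to the external routing table

if __name__ == "__main__":
    # A VM on VNI 5001 sends to 10.1.2.20: the gateway routes into VNI 5002.
    print(route_between_segments(5001, "10.1.2.20"))  # (5002, '192.0.2.12')
```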

VxLAN to EVPN Stitching for L2 Extension
Juniper recently announced support for EVPN as a better way to do L2 extension over the WAN. See this link. EVPN delivers multipoint connectivity among Ethernet LAN sites across an MPLS backbone. It is similar to VPLS but adds BGP control-plane-driven MAC address learning to avoid flooding the network, and it increases the number of MAC addresses and VLANs that can be supported. BGP capabilities such as constrained route distribution, route reflectors and inter-AS operation are reused to provide better convergence in the event of network failures. Some organizations see VxLAN as the more suitable technology within a data center, while EVPN is seen as better suited for the WAN. Juniper has enabled extending VxLANs over the WAN by providing the capability to stitch VxLAN to EVPN. This means you can create a mesh of VxLAN tunnels in the data center and extend them to another data center over EVPN. It also means that EVPN can be extended to a TOR that does not support MPLS-based EVPN but does support VxLAN.

[Diagram: stitching VxLAN in the data center to EVPN over the WAN (epvnvxlan.png)]
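Conceptually, stitching means the gateway ties a VxLAN VNI on the data center side to an EVPN instance on the WAN side and re-originates MAC reachability learned from one side into the other. The Python sketch below illustrates that bookkeeping; it is not a real BGP or OVSDB implementation, and all names and addresses are made up.

```python
# Conceptual sketch of VxLAN-to-EVPN stitching at the data center edge.
from dataclasses import dataclass

@dataclass
class MacRoute:
    mac: str       # MAC address of the VM
    next_hop: str  # local VTEP IP or remote PE loopback
    origin: str    # "vxlan" (learned in the DC) or "evpn" (learned over the WAN)

class StitchingGateway:
    def __init__(self, vni: int, evi: int):
        self.vni = vni       # VxLAN segment on the data center side
        self.evi = evi       # EVPN instance on the MPLS WAN side
        self.mac_table = {}  # mac -> MacRoute

    def learn_from_vxlan(self, mac: str, vtep_ip: str):
        """MAC seen behind a local VTEP: advertise it into EVPN over BGP."""
        self.mac_table[mac] = MacRoute(mac, vtep_ip, "vxlan")
        print(f"advertise EVPN MAC route: {mac} in EVI {self.evi} via {vtep_ip}")

    def learn_from_evpn(self, mac: str, remote_pe: str):
        """MAC advertised by a remote PE: program it toward the local VTEPs."""
        self.mac_table[mac] = MacRoute(mac, remote_pe, "evpn")
        print(f"program VxLAN forwarding: {mac} in VNI {self.vni} via {remote_pe}")

if __name__ == "__main__":
    gw = StitchingGateway(vni=5001, evi=100)
    gw.learn_from_vxlan("52:54:00:aa:bb:01", "192.0.2.11")   # local VM
    gw.learn_from_evpn("52:54:00:cc:dd:02", "198.51.100.1")  # VM in the other DC
```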

Translating between SDN Types
SDN controllers such as VMware's NSX or Juniper's Contrail have their own control plane and data plane methods: OVSDB with VxLAN for NSX, and BGP with IP VPNs for Contrail. If you want to connect hypervisors that are controlled by different SDN environments, you need a gateway between them to translate between their tunnels. The MX Series can create connections between two SDN systems that use different control plane protocols. For example, it can take control plane information learned via OVSDB from VMware's NSX on one interface, do the same via BGP from Juniper's Contrail on another, and share that information between them, encapsulating the data plane traffic appropriately based on what it has learned. The two controllers can be on the same subnet or on different subnets, with the MX routing between them; the connections might be on the same LAN or in a different data center. The MX can also program L2 and L3 entries into VMware's NSX using the OVSDB API. This lets you try out multiple SDN systems, communicate between them, or migrate resources from one to the other.
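A simple way to picture the translation is as a table of endpoints, each owned by one SDN domain, with the gateway choosing the egress encapsulation of whichever domain the destination was learned from. The sketch below assumes an NSX-style domain using VxLAN and a Contrail-style domain using MPLS over GRE; the domain names, encapsulations and addresses are illustrative only.

```python
# Conceptual sketch of a gateway translating between two SDN domains.
# endpoint IP -> (owning domain, tunnel parameters learned from that domain)
ENDPOINTS = {
    "10.1.1.10": ("nsx",      {"encap": "vxlan", "vtep": "192.0.2.11", "vni": 5001}),
    "10.2.2.20": ("contrail", {"encap": "mpls-over-gre", "pe": "198.51.100.2", "label": 300}),
}

def egress_encapsulation(dst_ip: str) -> dict:
    """Return the tunnel parameters needed to reach dst_ip, whichever domain owns it."""
    domain, params = ENDPOINTS[dst_ip]
    return {"domain": domain, **params}

if __name__ == "__main__":
    # Traffic from the NSX domain toward a Contrail-managed VM is
    # re-encapsulated as MPLS over GRE, and vice versa.
    print(egress_encapsulation("10.2.2.20"))
    print(egress_encapsulation("10.1.1.10"))
```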

Overlay Packet Replication
With the various tunnel types in use there is still a need to optimize network broadcasting. BUM traffic (Broadcast, Unknown unicast and Multicast) can put an excessive load on the network. Layer 2 frames whose destinations have not been learned by the switch must be flooded to all devices in a broadcast domain, and some Layer 2 frames must be flooded to more than one device. BUM traffic must be processed efficiently so that addresses can be resolved and data traffic continues to flow when VMs move. To solve this problem Juniper has implemented what we call the Overlay Packet Replicator (ORE) capability in the MX Series routers. The MX Series can act as the data center edge device for L2 extension and as a gateway between SDN systems. Without the MX acting as a gateway in SDN environments, you would need to deploy an appliance (usually an x86 server running an application in a VM) to perform the BUM traffic replication.

Without ORE, when a server needs to send a BUM packet such as an ARP request or a DHCP message, a proprietary packet is sent to an x86 virtual machine dedicated to BUM replication. That virtual machine converts the packet into a standard multicast or broadcast packet and forwards it to all intended receivers. This is a sub-optimal method: the replication load becomes an ever-growing burden that doesn't scale, is subject to performance degradation and is unreliable.

With ORE, when the server needs to send a BUM packet, a proprietary packet is sent to the MX, which converts it into a standard multicast or broadcast packet and forwards it to all intended receivers. This is a far better method because the conversion and replication are done on purpose-built hardware; Juniper's programmable silicon enables this functionality and provides much greater scale and performance.

[Diagram: overlay packet replication with ORE on the MX Series (Ore.png)]
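The replication job itself is simple to describe, which is why it belongs in hardware. The Python sketch below shows the head-end replication logic: a BUM frame on a given VNI is copied once per remote VTEP in that segment's flood list and sent as a unicast VxLAN packet to each; on the MX this fan-out happens in purpose-built silicon. The flood list and addresses are invented for illustration.

```python
# Conceptual sketch of head-end (ingress) replication of BUM traffic.
import socket
import struct

VXLAN_PORT = 4789

# VNI -> remote VTEPs that have members in that segment (the flood list)
FLOOD_LISTS = {
    5001: ["192.0.2.11", "192.0.2.12", "192.0.2.13"],
}

def replicate_bum(inner_frame: bytes, vni: int) -> None:
    """Unicast one VxLAN-encapsulated copy of the frame to every remote VTEP."""
    header = struct.pack("!B3xI", 0x08, vni << 8)  # flags byte + 24-bit VNI
    packet = header + inner_frame
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for vtep in FLOOD_LISTS.get(vni, []):
        sock.sendto(packet, (vtep, VXLAN_PORT))
    sock.close()

if __name__ == "__main__":
    broadcast_arp = bytes(42)  # stand-in for an ARP request frame
    replicate_bum(broadcast_arp, vni=5001)
```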

To learn more about the gateway capabilities of the MX Series, see this whitepaper: Integrating SDN into the Data Center.
