Network planning with ExpressRoute for Office 365

Before adding Azure ExpressRoute to your network architecture, it is a good idea to think through and plan for how application requests will be translated into network traffic.

ExpressRoute for Office 365 provides layer 3 connectivity between the customer’s network and Microsoft’s datacenters. The layer 3 connectivity is provided through Border Gateway Protocol (BGP) routing advertisements that offer direct routes to Office 365’s front end servers. From the perspective of the devices on-premises, Azure ExpressRoute is seen as an alternative to the internet when evaluating the correct TCP/IP path to Office 365.
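As a rough illustration of that path evaluation: routers generally prefer the most specific matching route, so a prefix learned through BGP over ExpressRoute is typically chosen ahead of a default route to the internet. The sketch below is a minimal Python model of that longest-prefix-match rule. The prefixes are documentation ranges rather than real Office 365 ranges, and the route table and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical route table: (prefix, next hop).
# 198.51.100.0/24 is a documentation range (RFC 5737) standing in for a
# prefix advertised over ExpressRoute; it is NOT a real Office 365 range.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "internet"),
    (ipaddress.ip_network("198.51.100.0/24"), "expressroute"),
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # Longest prefix wins; the default route (/0) matches every IPv4 address,
    # so matches is never empty for IPv4 destinations.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

With this table, next_hop("198.51.100.25") selects the ExpressRoute path, and any destination outside the advertised prefix falls back to the internet path.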

Azure ExpressRoute adds a direct path to a specific set of supported features and services that are offered by Office 365 servers within Microsoft’s datacenters. Azure ExpressRoute doesn’t replace internet connectivity to Microsoft datacenters or basic internet services such as domain name resolution.

The following table highlights a few differences between the internet and Azure ExpressRoute connections in the context of Office 365.

| Differences in network planning | Internet network connection | ExpressRoute network connection |
| --- | --- | --- |
| Access to required internet services, including DNS name resolution, certificate revocation verification, and content delivery networks | Yes | Requests to Microsoft owned DNS and/or CDN infrastructure may use the ExpressRoute network. |
| Access to Office 365 services, including Exchange Online, SharePoint Online, Skype for Business Online, Office Online, and Office 365 Portal and Authentication | Yes, all applications and features | Yes, specific applications and features |
| On-premises security at perimeter | Yes | Yes |
| High availability planning | Failover to an alternate internet network connection | Failover to an alternate ExpressRoute connection* |
| Direct connection with a predictable network profile | No | Yes |
| IPv6 connectivity | Yes | No |

*Failover to an internet connection is only recommended for simple network topologies.

The following resources are here to help guide your planning.

If you’re using an existing Azure ExpressRoute circuit and would like to enable connectivity to Office 365 over this circuit, first evaluate if your existing number of circuits, egress locations, and size of circuits are suitable for extending to Office 365. Most customers require additional bandwidth and many require additional circuits.

Adding connectivity to Office 365 over existing Azure ExpressRoute circuits is done by configuring Microsoft peering alongside the Azure Private or Azure Public peering configuration on the circuit. A single circuit can provide all three of these peering relationships.

Azure ExpressRoute subscriptions are customer-centric: each subscription is tied to a customer, a customer may have multiple Azure ExpressRoute circuits, and the customer may access many Microsoft cloud resources over those circuits. For example, a single customer can choose to access an Azure hosted virtual machine, an Office 365 test tenant, and an Office 365 production tenant over a pair of Azure ExpressRoute circuits.

If you’re using Azure ExpressRoute today with Azure, there are a few differences that you’ll need to know about.

| Peering relationship | Azure Private | Azure Public | Microsoft |
| --- | --- | --- | --- |
| Services | IaaS: Azure Virtual Machines | PaaS: Azure Public | SaaS: Office 365 and CRM Online |
| Connection initiation | Customer-to-Microsoft, Microsoft-to-Customer | Customer-to-Microsoft | Customer-to-Microsoft, Microsoft-to-Customer |
| QoS support | No QoS | No QoS | QoS¹ |

¹ QoS supports Skype for Business only at this time.

Every Office 365 customer has unique bandwidth needs depending on the number of users at each office location, how active those users are with each Office 365 application, and other factors such as the use of on-premises or hybrid equipment and network security configurations.

Having too little bandwidth will result in congestion, retransmissions of data, and unpredictable delays. Having too much bandwidth will result in unnecessary cost. On an existing network, bandwidth is often referred to in terms of the amount of available headroom on the circuit as a percentage. Having 10% headroom will likely result in congestion and having 80% headroom generally means unnecessary cost. Typical headroom target allocations are 20% to 50%.
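The headroom figures above can be turned into a quick check. The following Python sketch is illustrative only: the function names are hypothetical, peak utilization is assumed to be measured in the same unit as circuit capacity, and the 20% and 50% defaults simply restate the typical target range given above.

```python
def headroom_percent(circuit_mbps: float, peak_mbps: float) -> float:
    """Unused capacity at peak, as a percentage of the circuit size."""
    if circuit_mbps <= 0:
        raise ValueError("circuit_mbps must be positive")
    return (circuit_mbps - peak_mbps) * 100 / circuit_mbps


def classify_headroom(percent: float, low: float = 20.0, high: float = 50.0) -> str:
    """Compare a headroom percentage with the typical 20% to 50% target range."""
    if percent < low:
        return "below typical target (congestion risk)"
    if percent > high:
        return "above typical target (possible unnecessary cost)"
    return "within typical target"
```

For example, a 1,000 Mbps circuit that peaks at 700 Mbps has 30% headroom, which falls within the typical range, while the same circuit peaking at 900 Mbps has only 10%.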

To find the right level of bandwidth, the best mechanism is to test your existing network consumption. This is the only way to get a true measure of usage and need, because every network configuration and set of applications is in some ways unique. We offer a few bandwidth calculators to get an estimate for your Exchange Online, OneDrive for Business, and Skype for Business Online bandwidth needs; however, these calculators won’t account for other network traffic that may traverse the Azure ExpressRoute circuit such as CRM Online, identity synchronization, and so on.

Once you have an estimated baseline that includes all network applications, pilot Office 365 with a small group that comprises the different profiles of end users in your organization to determine actual usage, and use the two measurements to estimate the amount of bandwidth you’ll require for each office location.
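One possible way to combine the two measurements is sketched below in Python. This is an assumption-laden illustration, not a Microsoft calculator: it assumes pilot usage scales linearly with the number of users, that the baseline and pilot peaks coincide, and that all of this traffic shares one circuit. The function and parameter names are hypothetical.

```python
def estimate_circuit_mbps(
    baseline_peak_mbps: float,
    pilot_peak_mbps: float,
    pilot_users: int,
    site_users: int,
    target_headroom: float = 0.3,
) -> float:
    """Estimate the circuit size needed for one office location.

    Assumptions (not from any official guidance): per-user demand measured
    in the pilot scales linearly to the whole site, and the baseline and
    Office 365 peaks occur at the same time.
    """
    if pilot_users <= 0:
        raise ValueError("pilot_users must be positive")
    if not 0 <= target_headroom < 1:
        raise ValueError("target_headroom must be in [0, 1)")
    per_user_mbps = pilot_peak_mbps / pilot_users
    projected_peak = baseline_peak_mbps + per_user_mbps * site_users
    # Size the circuit so the projected peak leaves the target headroom free.
    return projected_peak / (1 - target_headroom)
```

For instance, a 200 Mbps baseline, a 50-person pilot peaking at 10 Mbps, and 1,000 users at the site project to a 400 Mbps peak; with a 20% headroom target, that suggests a circuit of roughly 500 Mbps. Treat the result as a starting point to validate against real measurements.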

It is also important to note that the throttling mechanisms for Exchange Online and SharePoint Online are unaffected by Azure ExpressRoute. All of the guidance on the Office 365 performance tuning site applies to customers regardless of their use of ExpressRoute.

Once your bandwidth needs are determined per location, you can determine the number and size of circuits to acquire. Refer to the Azure content for more detail on the different circuit sizes and billing models that are available to suit your needs.

Securing Azure ExpressRoute connectivity starts with the same principles as securing internet connectivity. Many customers choose to deploy network and perimeter controls along the ExpressRoute path connecting their on-premises network to Office 365 and other Microsoft clouds. These controls may include firewalls, application proxies, data leakage prevention, intrusion detection, intrusion prevention systems, and so on. In many cases customers apply different levels of controls to traffic initiated from on-premises going to Microsoft, versus traffic initiated from Microsoft going to customer on-premises network.

When considering options and topology solutions for maintaining the desired level of network and perimeter control for ExpressRoute for Office 365 connections, it is important to understand that they’re closely related to the overall network topology and ExpressRoute connectivity model that you choose to deploy. The following table provides some examples:

| ExpressRoute Integration Option | Network Security Perimeter Model |
| --- | --- |
| Co-located at a cloud exchange | Install new or leverage existing security/perimeter infrastructure in the co-location facility where the ExpressRoute connection is established. |
| Co-located at a cloud exchange | Leverage co-location facility purely for routing/interconnect purposes and back haul connections from co-location facility into the on-premises security/perimeter infrastructure. |
| Point-to-Point Ethernet | Terminate the Point-to-Point ExpressRoute connection in the existing on-premises security/perimeter infrastructure location. |
| Point-to-Point Ethernet | Install new security/perimeter infrastructure specific to the ExpressRoute path and terminate the Point-to-Point connection there. |
| Any-to-Any IPVPN | Leverage an existing on-premises security/perimeter infrastructure at all locations that egress into the IPVPN used for ExpressRoute for Office 365 connectivity. |
| Any-to-Any IPVPN | Hairpin the IPVPN used for ExpressRoute for Office 365 to specific on-premises locations designated to serve as the security/perimeter. |

In addition to the above options, some service providers may offer managed security/perimeter functionality as a part of their integration solutions with Azure ExpressRoute.

When considering the topology placement of the network/security perimeter options used for ExpressRoute for Office 365 connections, the following are additional considerations:

  • The depth and type of network/security controls may have an impact on the performance and scalability of the Office 365 user experience.

  • Outbound (on-premises->Microsoft) and inbound (Microsoft->on-premises) [if enabled] flows may have different requirements.

  • Office 365 requirements for ports/protocols and necessary IP subnets are the same whether traffic is routed through ExpressRoute for Office 365 or via the Internet.

  • Topological placement of the customer network/security controls determines the ultimate end-to-end network path between the user and the Office 365 service, and can have a substantial impact on network latency.

  • Customers are encouraged to design their security/perimeter topology for use with ExpressRoute for Office 365 in accordance with best practices for redundancy, high availability and disaster recovery.

In the following example, Woodgrove Bank compares the different Azure ExpressRoute connectivity options alongside the perimeter security models discussed above.

Example 1: Securing Azure ExpressRoute

Woodgrove Bank is considering implementing Azure ExpressRoute. After planning the optimal architecture for Routing with ExpressRoute for Office 365 and using the above guidance to understand bandwidth requirements, they’re determining the best method for securing their perimeter.

For Woodgrove, a multi-national organization with locations on multiple continents, security must span all perimeters. The optimal connectivity option for Woodgrove is a multi-point connection with multiple peering locations around the globe to service the needs of their employees on each continent. Each continent has redundant Azure ExpressRoute circuits, and security must span all of them.

The existing infrastructure that Woodgrove already has is reliable and can handle the additional work. As a result, Woodgrove Bank is able to use that infrastructure for both their Azure ExpressRoute and internet perimeter security. If this weren’t the case, Woodgrove could choose to purchase additional equipment to supplement their existing equipment or to handle a different type of connection.

When you consider high availability, consider it from the perspective of the person using the service. This applies to on-premises infrastructure services and applications just as it applies to internet-based services such as Office 365. There are many factors that can influence a person’s availability experience, ranging from the components of the Office 365 service itself, to all the on-premises components the person must rely on to use Office 365, and everything in between.

Often the network path to Office 365 includes many on-premises components that are not redundant or designed to be highly available. Many customers have experienced poor availability when using Office 365 due to a lack of availability from these intermediate components, even when Office 365 is available for everyone else.

If you plan to use ExpressRoute for Office 365 for your production network traffic, it’s critical to evaluate all factors of your networking topology, ExpressRoute connections, and associated on-premises infrastructure with the individual end user experience in mind. Starting with the Office 365 service itself and moving toward the end user, here are several considerations you should take into account when planning your connectivity and availability strategy for Office 365.

Service Availability

  • Office 365 services are covered by well-defined service level agreements, which include uptime and availability metrics for individual services. One reason Office 365 can maintain such high service availability levels is the ability of individual components to fail over seamlessly between the many Microsoft datacenters, using the global Microsoft network. This failover extends from the datacenter and network to the multiple internet egress points, and is seamless from the perspective of the people using the service.

  • ExpressRoute provides a 99.9% availability SLA on individual dedicated circuits between the Microsoft Network Edge and the ExpressRoute provider or partner infrastructure. These service levels are applied at the ExpressRoute circuit level, which consists of two independent interconnects between the redundant Microsoft equipment and the network provider equipment in each peering location.

Provider Availability

  • Microsoft’s service level arrangements stop at your ExpressRoute provider or partner. This is also the first place you can make choices that will influence your availability level. You should closely evaluate the architecture, availability, and resiliency characteristics your ExpressRoute provider offers between your network perimeter and your provider’s connection at each Microsoft peering location. Pay close attention to both the logical and physical aspects of redundancy, peering equipment, carrier-provided WAN circuits, and any additional value-add services such as NAT services or managed firewalls.

Customer Availability

  • Your on-premises network perimeter and ExpressRoute egress require a deep review. From your WAN infrastructure, to equipment at the egress points, to perimeter networks that connect into ExpressRoute circuits, your review should examine how availability and resiliency are affected by the network topology. These portions of your connectivity scenarios are not covered by ExpressRoute or Office 365 SLAs, but they play a critical role in the end-to-end service availability as perceived by end users.

  • Your internet availability is still critical. Every location where people will be using Office 365 must have access to the internet, regardless of ExpressRoute connectivity. Office 365 relies on a number of system dependencies such as domain name resolution, certificate validation, content delivery networks, as well as access to some Office 365 service endpoints that aren’t available over ExpressRoute connections.

  • Focus on the people using and operating Office 365. If a failure of any one component would affect people’s experience using the service, look for ways to limit the total percentage of people affected. If a failover mode is operationally complex, consider people’s experience of a long time to recovery, and look for operationally simple and automated failover modes.

Designing your availability plan

We strongly recommend that you plan and design high availability and resiliency into your end-to-end connectivity scenarios for Office 365. A design should include:

  • no single points of failure.

  • minimizing the number of people affected and duration of that impact for most anticipated failure modes.

  • optimizing for simple, repeatable, and automatic recovery process from most anticipated failure modes.

  • supporting the full demands of your network traffic and functionality through redundant paths, without substantial degradation.

Your connectivity scenarios should include a network topology that is optimized for multiple independent and active network paths to Office 365. This will yield a better end-to-end availability than a topology that is optimized only for redundancy at the individual device or equipment level.
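The benefit of independent, active paths can be seen with standard availability arithmetic: components that must all work are multiplied together, while independent parallel paths fail only when every path fails at once. The Python sketch below is a simplified model under stated assumptions. It treats failures as statistically independent, which real networks only approximate, and apart from the 99.9% circuit figure every availability value used with it is hypothetical.

```python
from math import prod
from typing import Iterable


def series_availability(components: Iterable[float]) -> float:
    """Availability of one path whose components must all be working."""
    return prod(components)


def parallel_availability(paths: Iterable[float]) -> float:
    """Availability of several independent paths where any one suffices."""
    return 1 - prod(1 - a for a in paths)


# Hypothetical example: a 99.9% circuit behind an on-premises edge assumed
# to be 99.5% available, first as a single path, then as two independent paths.
single_path = series_availability([0.999, 0.995])
two_paths = parallel_availability([single_path, single_path])
```

In this hypothetical case a single path is available about 99.4% of the time, while two fully independent paths reach roughly 99.996%. Redundant devices inside one path raise that path's figure, but they cannot remove the shared location or circuit as a common point of failure.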

Tip: If your users are distributed across multiple continents or geographic regions and each of those locations connects over redundant WAN circuits to a single on-premises location where a single ExpressRoute circuit is located, your users will experience lower end-to-end service availability than they would with a network topology that includes independent ExpressRoute circuits connecting the different regions to the nearest peering location.

We recommend provisioning ExpressRoute circuits into different geo-redundant peering locations for every region where people will use ExpressRoute connectivity for Office 365 services. This allows each region to remain connected during a disaster that affects a major location such as a datacenter or peering location. We also recommend these connections be configured in an active/active manner allowing end user traffic to be distributed across multiple network paths. This reduces the scope of people affected during equipment or component level outages.

All of the failover scenarios discussed so far have been between ExpressRoute circuits, and are entirely independent of internet egress and the availability of the internet network path. Some customers have considered using the internet as a network path to fail over to in the event ExpressRoute is unavailable. In simple network topologies, this should be designed into the failover plan. Once a network topology begins to include different bandwidth capabilities at different locations, complex traffic routing, or application-level configuration through the use of .PAC or WPAD configurations, automated failover from an ExpressRoute network path to an internet network path becomes impractical. If your network topology includes these more complicated components, we recommend thorough testing to understand the failover process and experience.

Example 2: Failover and High Availability

Woodgrove Bank’s multi-geographic design has undergone a review of routing, bandwidth, and security, and now must go through a high availability review. Woodgrove thinks about high availability as covering three categories: resiliency, reliability, and redundancy.

Resiliency allows Woodgrove to recover from failures quickly. Reliability allows Woodgrove to offer a consistent outcome within the system. Redundancy allows Woodgrove to move between one or more mirrored instances of infrastructure.

Within each edge configuration, Woodgrove has redundant Firewalls, Proxies, and IDS. For North America, Woodgrove has one edge configuration in their Dallas datacenter and another edge configuration in their Virginia datacenter. The redundant equipment at each location offers resiliency to that location.

The network configuration at Woodgrove Bank is built based on a few key principles:

  • Within each geographic region, there are multiple Azure ExpressRoute circuits.

  • Each circuit within a region can support all of the network traffic within that region.

  • Routing will clearly prefer one or the other path depending on availability, location, and so on.

  • Failover between Azure ExpressRoute circuits happens automatically without additional configuration or action required by Woodgrove.

  • Failover between Internet circuits happens automatically without additional configuration or action required by Woodgrove.
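The second principle, that each circuit can carry the whole region, can be checked with a simple capacity test. The Python sketch below is a hypothetical helper rather than anything Woodgrove or Microsoft publishes: it takes the circuit sizes in a region and the regional peak demand, and reports whether the remaining circuits can still carry that peak after any single circuit is lost.

```python
from typing import Sequence


def survives_single_circuit_loss(
    circuit_mbps: Sequence[float], regional_peak_mbps: float
) -> bool:
    """True if the region can lose any one circuit and still carry its peak."""
    if len(circuit_mbps) < 2:
        # A single circuit is itself a single point of failure.
        return False
    total = sum(circuit_mbps)
    # Remove each circuit in turn and compare what is left with the peak.
    return all(total - lost >= regional_peak_mbps for lost in circuit_mbps)
```

For example, two 1,000 Mbps circuits serving an 800 Mbps regional peak pass the check, while a 1,000 Mbps circuit paired with a 500 Mbps circuit does not, because losing the larger circuit leaves only 500 Mbps.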

In this configuration, with redundancy at the physical and virtual level, Woodgrove Bank is able to offer local resiliency, regional resiliency, and global resiliency in a reliable way. Woodgrove elected this configuration after evaluating a single Azure ExpressRoute circuit per region as well as the possibility of failing over to the internet.

If Woodgrove were unable to have multiple Azure ExpressRoute circuits per region, routing traffic originating in North America to the Azure ExpressRoute circuit in Asia Pacific would add an unacceptable level of latency, and the required DNS forwarder configuration would add complexity.

Leveraging the internet as a backup configuration removes the predictable, consistent connection offered by Azure ExpressRoute. This breaks Woodgrove’s reliability principle, resulting in an inconsistent experience using the connection. Additionally, manual configuration would be required to fail over, considering the BGP advertisements that have been configured, NAT configuration, DNS configuration, and the proxy configuration.

Still have questions about how to plan for and implement traffic management or Azure ExpressRoute? Read the rest of our network and performance guidance or the Azure ExpressRoute FAQ.

There are many different types of Azure ExpressRoute providers. The design of Azure ExpressRoute allows customers to pick and choose the right provider for each location where an Azure ExpressRoute circuit is provisioned. For some customers, this means all new network providers and for others it means more functionality from an existing provider.

When selecting an Azure ExpressRoute provider, evaluate the locations where you want circuits based on all of your previous planning around routing, bandwidth, security, and high availability. Once you have determined the optimal locations, evaluate the connectivity option: point-to-point, multi-point, or hosted. Remember, you can mix and match the connectivity options so long as the bandwidth and other redundant components support your routing and high availability design.

With these two variables identified, location and connectivity type, review the current list of providers by region.

See Also

Network connectivity to Office 365

Azure ExpressRoute for Office 365

Routing with ExpressRoute for Office 365

Implementing ExpressRoute for Office 365

Using BGP communities in ExpressRoute for Office 365 scenarios (preview)

Media Quality and Network Connectivity Performance in Skype for Business Online

Office 365 performance tuning using baselines and performance history

Performance troubleshooting plan for Office 365

Office 365 URLs and IP address ranges

Office 365 network and performance tuning
