Incident: Top of Rack Switch Outage in Tokyo, JP (NRT1) Datacenter
Outage Start Time: 12:25 AM (EST) on November 7, 2017
Outage End Time: 1:30 AM (EST) on November 7, 2017
Reason for Outage
During this period of time, some customers connected to a single cabinet/top of rack switch in our Tokyo (NRT1) datacenter experienced a loss of connectivity.
Identifying the Core Issue / Resolution
Our investigation identified an issue with the software running on the affected switch. A full reload of the device resolved it.
We are working with our hardware manufacturer, Juniper, at the highest levels, and have identified several flaws in their current code base related to how firewall Access Control Lists (ACLs) are programmed into the hardware. Over time, these flaws result in valid customer traffic being discarded. With Juniper’s assistance, we’ve also identified a code revision that may solve these issues, and we are currently testing it in our lab. In addition, we’ve changed how our internal provisioning systems generate and deploy ACLs to help mitigate this issue.
We are improving our internal procedures around incident response and network device troubleshooting, based upon the lessons learned from this particular outage. Though hardware failures are an unfortunate (and rare) fact of life, we endeavor to diagnose and recover from these issues as quickly as possible.
As a customer, if you are interested in greater switch diversity, you can review the “switch ID” note for each device. It is exposed in our API and, in the customer portal, appears below the facility code on the server detail page.
This 8-character field identifies the physical (top of rack) switch that each server is connected to and depends on for connectivity. Most instance types are available on multiple switches in each datacenter, and we are happy to work with you to promote greater switch diversity in your deployment if that is of interest. As we work to introduce some provisioning-time options around diversity, please don’t hesitate to drop us a note (firstname.lastname@example.org) if we can assist.
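To check how your own deployment is spread across switches, you could group your servers by their switch ID. The following is a minimal sketch in Python, not our API client. It assumes you have already retrieved your device list and that each record carries the switch ID under a key named `switch_id`. That key name, the `hostname` and `facility` keys, the two function names, and the sample values are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def servers_by_switch(devices):
    """Group hostnames by the top-of-rack switch each server is connected to.

    `devices` is a list of dicts. The "switch_id" and "hostname" keys are
    assumed names for this sketch, not confirmed API field names.
    """
    groups = defaultdict(list)
    for device in devices:
        groups[device["switch_id"]].append(device["hostname"])
    return dict(groups)


def shared_switch_groups(devices):
    """Return only the switches that carry more than one of your servers."""
    return {
        switch: hosts
        for switch, hosts in servers_by_switch(devices).items()
        if len(hosts) > 1
    }


# Hypothetical sample data: the 8-character switch IDs below are made up.
devices = [
    {"hostname": "web-1", "facility": "nrt1", "switch_id": "a1b2c3d4"},
    {"hostname": "web-2", "facility": "nrt1", "switch_id": "a1b2c3d4"},
    {"hostname": "db-1", "facility": "nrt1", "switch_id": "e5f6a7b8"},
]

for switch, hosts in shared_switch_groups(devices).items():
    print(f"switch {switch} is shared by: {', '.join(hosts)}")
```

In this sample, `web-1` and `web-2` sit behind the same switch, so a single top of rack failure would take both offline. Servers that appear together in the output are candidates for redeployment on a different switch.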