Following the large Gmail outage earlier this week, Google has launched a status page for their most important hosted services, called the Google Apps Status Dashboard. It’s hard for me to believe that they created this in two days, so they must have been preparing it for a while, but it’s a useful page nonetheless. In Google’s words:
The Google Apps Status Dashboard represents an additional layer of transparency that we believe will be particularly useful for our business users, and it’s also relevant to users of our consumer products. The Status Dashboard is the best place to check for information on service availability for Google Apps anywhere in the world.
The incident reports posted so far are great; they include information not only about when the Gmail issues started and ended, but also about the actual reason for the outage:
On Tuesday, February 24, 2009, an unexpected service disruption occurred during a routine maintenance event in a data center. In this particular case, users were directed towards an alternate data center in preparation for the maintenance tasks, but the new software that optimizes the location of user data had the unexpected side effect of triggering a latent bug in the Gmail code. The bug caused the destination data center to become overloaded when users were directed to it, and which in turn caused multiple downstream overload conditions as user traffic was automatically shifted in response to the failures. […]
Google engineers take system outages very seriously. This commitment is demonstrated in our drive to build resiliency into everything that we develop. Despite this commitment, we’re not perfect, and we don’t always get it right the first time.
It’s reassuring to know that even Google sometimes makes mistakes, and I like the fact that they are upfront about it. Let’s hope they continue to be this open about outages in the future.