Microsoft is giving Windows Azure customers a 33 percent credit on a full month's charges for the affected services because of an outage on Feb. 29 caused by a “leap day bug.” Azure executive Bill Laing announced the credit Friday night in a post that also explained in more detail how the leap day bug took down Azure.
We’ve already noted that core parts of Azure were inaccessible to many customers for more than 12 hours because the service failed to calculate a valid date while creating SSL certificates. Laing now offers a more thorough root cause analysis.
According to Laing, new “transfer” certificates are generated when virtual machines are created, ensuring that application details are encrypted. “When the GA (guest agent) creates the transfer certificate, it gives it a one year validity range,” Laing wrote. “It uses midnight UST of the current day as the valid-from date and one year from that date as the valid-to date. The leap day bug is that the GA calculated the valid-to date by simply taking the current date and adding one to its year. That meant that any GA that tried to create a transfer certificate on leap day set a valid-to date of February 29, 2013, an invalid date that caused the certificate creation to fail.”
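To make the failure concrete, here is a minimal Python sketch of the date arithmetic Laing describes. It is not Microsoft's guest agent code, and the function names (`naive_valid_to`, `safe_valid_to`, `validity_range`) are hypothetical. It assumes the quote's “UST” means UTC. The clamp-to-Feb.-28 fix is also an assumption, shown as one reasonable option rather than the patch Microsoft actually shipped.

```python
from datetime import date, datetime, time, timezone


def naive_valid_to(valid_from: date) -> date:
    # Mirrors the bug described above: keep month and day, add one to the year.
    # For Feb. 29, 2012 this asks for Feb. 29, 2013, which does not exist,
    # so the date constructor raises ValueError.
    return date(valid_from.year + 1, valid_from.month, valid_from.day)


def safe_valid_to(valid_from: date) -> date:
    # Hypothetical fix (an assumption, not Microsoft's actual patch):
    # if the same month/day does not exist next year, clamp to Feb. 28.
    try:
        return valid_from.replace(year=valid_from.year + 1)
    except ValueError:
        return date(valid_from.year + 1, 2, 28)


def validity_range(now: datetime) -> tuple[datetime, datetime]:
    # Valid-from is midnight UTC of the current day; valid-to is one year later.
    today_utc = now.astimezone(timezone.utc).date()
    start = datetime.combine(today_utc, time(0), tzinfo=timezone.utc)
    end = datetime.combine(safe_valid_to(today_utc), time(0), tzinfo=timezone.utc)
    return start, end


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_valid_to(leap_day)
    except ValueError as err:
        print("certificate creation fails:", err)
    print(safe_valid_to(leap_day))  # 2013-02-28
```

On any other day of the year the naive and safe versions agree. The two diverge only on Feb. 29, which is why the bug stayed hidden until the leap day arrived.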
Storage clusters and SQL Azure remained unaffected, but the outage did affect Windows Azure Compute, the Access Control Service, Azure Service Bus, SQL Azure Portal, and Data Sync Services. Because of the “extraordinary nature” of the outage, Microsoft is providing a “33% credit to all customers of Windows Azure Compute, Access Control, Service Bus and Caching for the entire affected billing month(s) for these services, regardless of whether their service was impacted.”