Tape Salvages Gmail Disaster

The bug was created by an otherwise unidentified defect in a storage software update

Late Monday, Google said that the bug that ate multiple copies of people's Gmail on Sunday in multiple Google data centers, erasing years of e-mails, attachments, folders, chat logs, all-important contact lists and personalized settings, was caused by an otherwise unidentified defect in a storage software update. The only thing that ultimately saved the situation, the company said, was old-fashioned tape back-ups.

Google engineering VP and 24x7 site reliability czar Ben Treynor explained:

"Imagine the sinking feeling of logging in to your Gmail account and finding it empty. That's what happened to 0.02% of Gmail users yesterday, and we're very sorry. The good news is that email was never lost and we've restored access for many of those affected. Though it may take longer than we originally expected, we're making good progress and things should be back to normal for everyone soon.

"I know what some of you are thinking: how could this happen if we have multiple copies of your data, in multiple data centers? Well, in some rare instances software bugs can affect several copies of the data. That's what happened here. Some copies of mail were deleted, and we've been hard at work over the last 30 hours getting it back for the people affected by this issue.

"To protect your information from these unusual bugs, we also back it up to tape. Since the tapes are offline, they're protected from such software bugs. But restoring data from them also takes longer than transferring your requests to another data center, which is why it's taken us hours to get the email back instead of milliseconds.

"So what caused this problem? We released a storage software update that introduced the unexpected bug, which caused 0.02% of Gmail users to temporarily lose access to their email. When we discovered the problem, we immediately stopped the deployment of the new software and reverted to the old version.

"As always, we'll post a detailed incident report outlining what happened to the Apps Status Dashboard, as well as the corrective actions we're taking to help prevent it from occurring again. If you were affected by this issue, it's important to note that email sent to you between 6:00 PM PST on February 27 and 2:00 PM PST on February 28 was likely not delivered to your mailbox, and the senders would have received a notification that their messages weren't delivered.

"Thanks for bearing with us as we fix this, and sorry again for the scare."

More Stories By Maureen O'Gara

Maureen O'Gara, the most read technology reporter for the past 20 years, is the Cloud Computing and Virtualization News Desk editor of SYS-CON Media. She is the publisher of the famous "Billygrams" and has been the editor-in-chief of "Client/Server News" for more than a decade. One of the most respected technology reporters in the business, Maureen can be reached by email at maureen(at)sys-con.com or paperboy(at)g2news.com, and by phone at 516 759-7025. Twitter: @MaureenOGara
