Email Outage

Here is a broad timeline and summary of this outage:

At around 8:30 GMT on April 24th, we detected that one of the storage units serving part of our email infrastructure had become non-operational, causing email services to stop working for a subset of our customers.
We immediately began restoring access to this storage unit, and our engineers worked over the next few hours to restore functionality to the affected email accounts.
As of 17:00 GMT on April 24th, access to all email accounts affected by this issue was restored. All affected accounts were at that time able to send and receive email via webmail and through external clients.
Emails that were queued during the outage were relayed to the individual email accounts once email services were restored, and mail delivery should now be working without interruption.
As of this update, we are still working to restore this storage unit to a fully operational state. This process is taking longer than we originally anticipated. We have some of the best storage experts in the industry working around the clock to bring the unit back online, and we are being advised that this process can take several days to complete.

What is the current impact? Is anything still non-functional?

Customers who were affected by this downtime are currently unable to view emails received prior to the outage. Specifically, this applies to emails that were stored on the affected storage unit and accessed via webmail or IMAP. If, however, you were downloading your emails to a local email client such as Outlook or Thunderbird using the POP protocol, there is no further impact to your service from this issue.

When will you fix this issue? What can I expect in the interim?

Our goal is to restore access to your stored emails as soon as possible, and we are sparing no effort to do so. We care deeply about providing you with best-in-class services, and we apologize sincerely for the inconvenience this outage has caused you. We are now being advised that bringing up the affected storage unit is, unfortunately, not a matter of hours but of days. In the interim, we will keep you informed of all meaningful developments. As much as it pains us, we cannot offer a concrete ETA at this time, but we will continue to update you regularly.

What happened? Can you explain in detail what really went wrong?

One of the storage units that stores email for a subset of our customers went offline as a result of a set of pre-planned activities we were carrying out to bring a new storage cluster online. We build our systems with multiple layers of fail-safes and safeguards, and we are still conducting a post-mortem on what we could have done better to prevent this from happening. Once this investigation is complete, we promise you a full account of it, along with a detailed summary of the steps we will take to prevent a similar event in the future.

What can I do if I have additional questions?

While we realize you will still have questions, we hope this post adds to your understanding of this outage. We promise to work with you and to provide transparent and meaningful updates as soon as we are able. This post will be our primary medium of communication with you, so we are directing our contact center agents to refer you here to keep communication clear and consistent. However, we will remain available on our regular support channels in case you require any additional assistance during this time.

Thank you for your continued patience and support.

5 Responses to “Email Outage”

  1. admin says:

    Why is bringing this storage cluster online taking so long?

    The way in which this storage cluster went offline has required specialized techniques to restore. As we mentioned previously in this post, we have some of the foremost experts in the industry working on this around the clock.

    When will my mail be available?

    At this time we cannot confirm a concrete timeline.

    Can’t you restore data using backups?

    As mentioned in our last update, the storage cluster failed during a planned activity to bring another storage cluster online. Our backup strategy here consisted of creating periodic snapshots of the data. These snapshots were stored on the redundant, highly available systems that make up this same storage cluster. Unfortunately, because the storage cluster is not functional at this time, we do not currently have access to these backups.

    When can I expect the next update?

    We will continue our updates – and provide you with one every 24 hours – or sooner if we have additional details to share. Thank you for your continued patience.
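As a rough illustration of the backup limitation described in this update (snapshots kept on the same storage cluster as the data they protect), the sketch below models each copy of the data with a failure domain and checks which copies remain reachable when one domain fails. The class, the function names, and the labels `cluster-a` and `cluster-b` are hypothetical and do not describe our actual systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (a snapshot, for example)."""
    name: str
    failure_domain: str  # hypothetical label, e.g. the cluster holding the copy


def surviving_copies(copies, failed_domain):
    """Return the copies that are still reachable when one domain fails."""
    return [c for c in copies if c.failure_domain != failed_domain]


def has_isolated_backup(primary_domain, backups):
    """True if at least one backup lives outside the primary's failure domain."""
    return any(b.failure_domain != primary_domain for b in backups)


# Snapshots stored on the same cluster as the primary data (assumed layout).
same_cluster = [
    Copy("snapshot-daily", "cluster-a"),
    Copy("snapshot-weekly", "cluster-a"),
]

# The same layout with one extra snapshot held on a separate cluster.
with_offsite = same_cluster + [Copy("snapshot-offsite", "cluster-b")]

print(surviving_copies(same_cluster, "cluster-a"))
print(surviving_copies(with_offsite, "cluster-a"))
```

In this model, every snapshot in `same_cluster` becomes unreachable together with the primary data, while `with_offsite` still has one reachable copy after `cluster-a` fails.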

  2. admin says:

    Apologies for the delay in updating the forum post.

    We assure you that this issue is being treated with the highest priority, and our system administrators are working around the clock with experts from the storage industry to restore access to all services. Unfortunately, this is proving to be a very time-consuming operation, and as of now we cannot provide you with an ETA, but we will update this forum as soon as we have any new information.

    We deeply and sincerely regret the inconvenience caused. We appreciate your patience and co-operation.

  3. admin says:

    How did this happen?

    We were bringing a new storage cluster online, and during this process data was inadvertently removed from an existing storage cluster.

    What are you doing to get my mail back?

    The removal of this data has made it very challenging to restore your messages. As we’ve mentioned, we are working with storage industry experts to reconstruct the data, and work continues around the clock towards this end. However, this is a very time-consuming process, and it may take a few weeks to complete.

    Are my messages gone forever?

    It is possible that some of your messages may be unrecoverable; however, we have not yet completed our work on the restoration. We will continue our recovery efforts, with an eye towards restoring as many messages for as many customers as quickly as possible.

    Should I be worried that this is going to happen again?

    The error that occurred was highly unusual. We are carefully analyzing what happened and are implementing additional safeguards to prevent this particular event from recurring.

  4. admin says:

    Summary

    Mail restoration efforts continue
    Information on IMAP availability

    Updates on recovery process

    The process of bringing the storage cluster back online continues. Our engineering team has started to make some progress in recovering critical pieces of metadata that will help in rebuilding the storage cluster, identifying its contents, and starting to recover the emails it contains. Please note that this process is highly time-intensive, since it requires rebuilding individual files distributed across multiple redundant disks and reconstructing the inbox structure for all affected users piece by piece. As highlighted in yesterday’s update, this will take several weeks to complete. While our engineering team remains hopeful that we will be able to restore a portion of the data, we are unable to share specifics on exactly how much or what is recoverable at this time. We are committed to keeping you updated throughout this process.

    Activities in the next 24H

    Here is the broad set of activities our various teams are working on over the next 24H:

    The email recovery process continues as explained above. We are monitoring and driving its progress around the clock, and efforts will continue 24 hours a day, 7 days a week.

    On 25/04/2014, we shut down IMAP services for all affected email users so that emails they had stored offline (via clients like Outlook & Thunderbird) could be backed up to local storage and would not be deleted by an IMAP sync. We are now starting to reach out with specific instructions that allow users to store those emails in local folders, in preparation for turning IMAP services back on for all users. We will send out detailed communication about this over the next 24H. Please reach out to our support team with any additional questions.

    Timeline for next update

    We will send out the next update in the next 24H (or earlier in case of any other important developments).
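For readers who want a sense of what backing up offline mail to local storage can look like, here is a minimal sketch that uses Python's standard `mailbox` module. It is not the official procedure mentioned in this update. It assumes the mail client keeps the folder in an mbox-format file (your client may use a different format), that the client is closed while the copy runs, and the function name `backup_mbox` is purely illustrative.

```python
import mailbox


def backup_mbox(source_path, backup_path):
    """Copy every message from one mbox file into another and return the count.

    Assumption: source_path points at an existing mbox-format file.
    The backup file is created if it does not exist yet; messages are
    appended to it if it does.
    """
    # create=False makes this raise mailbox.NoSuchMailboxError on a wrong path
    # instead of silently creating an empty source file.
    source = mailbox.mbox(str(source_path), create=False)
    backup = mailbox.mbox(str(backup_path))
    copied = 0
    try:
        for message in source:
            backup.add(message)
            copied += 1
        backup.flush()  # write the pending messages to disk
    finally:
        backup.close()
        source.close()
    return copied
```

A call such as `backup_mbox("Inbox", "Inbox.backup")`, with placeholder paths, would copy the folder before any IMAP sync has a chance to alter it. Copying the messages to a separate file, rather than moving them, leaves the client's own folder untouched.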

  5. admin says:

    Our engineering team continues to work on email recovery and on bringing the storage cluster online, and we are starting to see some limited progress. This process is very time-intensive, and we continue to work 24×7 on it. As we communicated earlier, it is likely to take several weeks to complete.

    We have scheduled IMAP service restoration for Monday, May 5th. We are reaching out to some users to assist them with backing up their emails locally before we restore IMAP functionality.

    We will continue to post updates here as they become available. Thank you for your continued patience and understanding.
