Reading Incident Reports
Understand how we communicate during and after service incidents.
Incident Lifecycle
Every incident goes through these stages:
flowchart LR
A[🔴 Investigating] --> B[🟡 Identified]
B --> C[🔧 Monitoring]
C --> D[✅ Resolved]
| Stage | What It Means |
|---|---|
| Investigating | We've detected an issue and are determining the cause |
| Identified | We know what's wrong and are working on a fix |
| Monitoring | A fix has been deployed and we're confirming that it works |
| Resolved | The issue is fully fixed and verified |
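For readers who consume status updates programmatically, the stages in the table above can be sketched as an ordered enumeration. This is only an illustration: the names `Stage`, `next_stage`, and `is_open` are hypothetical, and the strictly linear progression is an assumption taken from the flowchart, not a description of how the status page is actually implemented.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Lifecycle stages, numbered in the order shown in the flowchart."""
    INVESTIGATING = 1
    IDENTIFIED = 2
    MONITORING = 3
    RESOLVED = 4


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None once resolved.

    Assumes incidents only move forward, one stage at a time.
    """
    if stage is Stage.RESOLVED:
        return None
    return Stage(stage + 1)


def is_open(stage: Stage) -> bool:
    """An incident counts as open until it reaches RESOLVED."""
    return stage < Stage.RESOLVED
```

With this sketch, `next_stage(Stage.IDENTIFIED)` gives `Stage.MONITORING`, and `is_open` stays true for every stage except `RESOLVED`.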
Incident Severity Levels
| Severity | Impact |
|---|---|
| Critical | Complete service outage |
| Major | Significant functionality impacted |
| Minor | Degraded performance or partial issue |
| Informational | Maintenance or known issue with workaround |
What's in an Incident Report?
During the Incident
- Title – Brief description of the issue
- Affected Services – Which services are impacted
- Current Status – The stage in the lifecycle
- Updates – Timestamped progress notes
After Resolution
- Duration – How long the incident lasted
- Root Cause – What caused the issue
- Resolution – What we did to fix it
- Prevention – Steps taken to prevent recurrence (for major incidents)
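If you archive incident reports on your side, the fields listed above map naturally onto a small record type. The following is a minimal sketch, not the status page's real data format: the class names, field names, and types are all assumptions chosen to mirror the two lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Update:
    """One timestamped progress note."""
    timestamp: str  # e.g. "09:15 UTC"
    stage: str      # lifecycle stage at the time of the update
    note: str


@dataclass
class IncidentReport:
    # Fields available while the incident is in progress
    title: str
    affected_services: List[str]
    current_status: str
    updates: List[Update] = field(default_factory=list)
    # Fields filled in after resolution
    duration: Optional[str] = None
    root_cause: Optional[str] = None
    resolution: Optional[str] = None
    prevention: Optional[str] = None  # only expected for major incidents

    def is_complete(self) -> bool:
        """True once duration, root cause, and resolution are all present."""
        required = (self.duration, self.root_cause, self.resolution)
        return all(value is not None for value in required)
```

Keeping `prevention` out of the completeness check reflects the note that prevention steps are only published for major incidents.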
Example Incident Timeline
Sample Incident
09:15 UTC – Investigating
We're investigating reports of website timeouts affecting shared hosting customers.
09:32 UTC – Identified
The issue has been traced to a network switch failure in our EU datacenter.
10:05 UTC – Monitoring
A replacement switch has been installed. We're monitoring to confirm full restoration.
10:45 UTC – Resolved
All services have been restored. Total downtime: 1 hour 30 minutes. A full post-mortem will follow.
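The total downtime in the sample is simply the time from the first update to the Resolved update. A short sketch of that arithmetic, using the timestamps from the sample above, is shown here. The helper name `minutes_between` is hypothetical, and the sketch assumes all updates fall on the same day.

```python
from datetime import datetime

# (time, stage) pairs copied from the sample timeline above
timeline = [
    ("09:15", "Investigating"),
    ("09:32", "Identified"),
    ("10:05", "Monitoring"),
    ("10:45", "Resolved"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Total downtime: first update to the Resolved update
total = minutes_between(timeline[0][0], timeline[-1][0])
hours, minutes = divmod(total, 60)
print(f"Total downtime: {hours} hour {minutes} minutes")

# Time spent in each stage before the next update was posted
for (start, stage), (end, _) in zip(timeline, timeline[1:]):
    print(f"{stage}: {minutes_between(start, end)} min")
```

The per-stage figures (17, 33, and 40 minutes) add up to the 90 minutes reported in the final update.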
How to Find Past Incidents
On the Status Page
Scroll down to the "Past Incidents" section to see recent incidents grouped by date.
In Your Email
If you're subscribed, you'll have email records of all incidents that occurred since subscribing.
Understanding Impact
Partial outage – Some users or services are affected, but not everyone:
- Only EU servers affected
- Only email service, not websites
- Intermittent rather than complete failure
Full outage – The entire service is unavailable for all users.
Degraded performance – The service works but is slower than normal:
- Longer page load times
- Timeout errors under heavy load
- Delayed email delivery
What We Do During Incidents
- Alert – Our monitoring detects the issue automatically
- Triage – Engineers assess severity and impact
- Communicate – We post to the status page and send alerts
- Fix – We work to resolve the underlying issue
- Verify – We confirm the fix works before marking resolved
- Review – For major incidents, we conduct a post-mortem
Post-Incident Reviews
For significant incidents, we publish a detailed review including:
- Timeline – Detailed sequence of events
- Root Cause Analysis – What failed and why
- Impact Assessment – Who was affected and how
- Lessons Learned – What we'll do differently
- Action Items – Specific improvements we're making
Have Questions About an Incident?
If you need more information about a specific incident:
- Check the status page for updates
- Wait for the post-mortem (for major incidents)
- Contact support with specific questions
Related Articles
- Return to Monitoring & Status home
- Get notified about future incidents