Incident Summary: On 2023-05-11 at 07:24, our engineering team was alerted to an issue impacting our Fido Phone API, specifically the WhatsApp and Telegram signals. This postmortem covers the timeline of the incident, the actions taken to resolve it, and the root causes: an external supplier bug and a concurrent cloud provider issue.
Timeline:
- [5 DAYS AGO, 14:48 CEST]: The incident was reported, and our engineering team began investigating.
- [5 DAYS AGO, 17:12 CEST]: The issue was identified, and a fix for WhatsApp was deployed.
- [5 DAYS AGO, 8:08 CEST]: The remaining issue affecting Telegram was identified, and a fix was deployed to restore normal parameters for Telegram signal coverage.
Resolution: The incident has been resolved, and as of 2023-05-16, the Fido Phone API is functioning as expected. We appreciate the patience of our customers throughout this period and encourage them to reach out to our support team if they have any further questions or concerns.
Root Cause Analysis: Our investigation determined that the incident was caused by a combination of two factors: an external supplier bug and a cloud provider issue.
- External Supplier Bug: An update from one of our external suppliers introduced a bug in their software that impaired communication between our API and the WhatsApp and Telegram messaging platforms. The bug disrupted signal coverage for both platforms and affected the user experience.
- Cloud Provider Issue: During the same period, our cloud provider experienced a temporary infrastructure problem. This hindered the normal functioning of our API, worsened the disruption in signal coverage, and prolonged the impact on WhatsApp and Telegram signals.
Actions Taken:
- Incident Identification ([5 DAYS AGO, 14:48 CEST]): The incident was promptly identified, and our engineering team began investigating its scope and impact.
- Fix Deployment for WhatsApp ([5 DAYS AGO, 17:12 CEST]): After the issue was identified, a fix was developed and deployed to restore signal coverage for WhatsApp.
- Fix Deployment for Telegram ([5 DAYS AGO, 8:08 CEST]): Following the resolution of the WhatsApp issue, our team deployed another fix to restore normal parameters for Telegram signal coverage.
- Collaboration with External Supplier: We engaged in close collaboration with our external supplier to address and rectify the bug in their software. Together, we worked to ensure that future updates are thoroughly tested and validated to prevent similar incidents.
- Communication with Cloud Provider: We communicated the impact of the incident to our cloud provider, who promptly addressed the infrastructure problem. We are working with them to implement measures that will prevent or mitigate such issues in the future.
Preventive Measures:
- Supplier Management and Testing: We will enhance our supplier management processes, including conducting rigorous testing and validation of software updates before deploying them in our systems.
- Redundancy and Resilience: To mitigate the impact of cloud provider issues, we will explore redundancy measures, such as multi-region deployment or backup systems, to ensure uninterrupted service delivery.
- Incident Response and Communication: We acknowledge the importance of timely and transparent communication with our customers during incidents. Moving forward, we will provide regular updates to affected users and ensure that our support team is readily available to address any concerns or questions.
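As an illustration of the redundancy measure described above, the following is a minimal sketch of ordered failover between regions. It is not our production implementation: the region names, `call_with_failover`, and `demo_send` are hypothetical and exist only for this example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every configured region fails to serve the request."""


def call_with_failover(regions: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each region in order and return the first successful result."""
    errors = []
    for region in regions:
        try:
            return send(region)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{region}: {exc}")
    # Every region failed, so surface all collected errors to the caller.
    raise AllRegionsFailed("; ".join(errors))


def demo_send(region: str) -> str:
    """Hypothetical stand-in for a signal lookup; 'primary' is simulated as down."""
    if region == "primary":
        raise ConnectionError("simulated outage")
    return f"served by {region}"


if __name__ == "__main__":
    print(call_with_failover(["primary", "backup"], demo_send))
```

In this sketch a request only fails when every listed region fails, so an outage limited to one region would not interrupt service on its own. It does not address a supplier-side bug, which would affect all regions equally.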
Conclusion: The incident affecting the Fido Phone API, specifically the signal coverage for WhatsApp and Telegram, has been resolved. We apologize for any inconvenience caused and appreciate the patience shown during this incident.