Summary: The outage is resolved and all services are running normally. The list of actions is described below. If you have any questions, feel free to reach out to us.

Dear customers and users,

An issue occurred on the 13th of October 2021, preventing users from accessing Carbone services between 9:20 am UTC+2 and 11:50 am UTC+2; you can learn more in the chronology of events below. It was caused by an outage at our hosting provider OVHCloud. Following this event, we are making the following list of actions a priority.

Post-mortem & actions

Updated: 2021/10/14 4:00 pm UTC+2

To all the businesses and people worldwide who depend on us, we are sorry for any inconvenience caused by the outage across our services. Carbone availability was severely impacted by this extreme event at our hosting provider OVHCloud, yet Carbone services should be able to handle any kind of exceptional scenario. We learned that centralization increases the risk of failure when everything is managed by a single entity. That is why we are going to decentralize our infrastructure with the following actions:

Even though our architecture is replicated in multiple locations for high availability, this does not protect against an outage at our hosting provider, a DNS disruption, or both. That is why we are going to replicate part of the Carbone services at another hosting provider and create a new dedicated domain with another DNS provider. It will be used only as a fallback in the worst-case scenario: if our main hosting provider or DNS does not respond, the relay domain and its copy of the Carbone Render API will take over. We are going to adapt all Carbone Render SDKs to make the fallback system native and automatic, so that you do not experience service disruptions.
When our services became accessible again, our main storage was still unavailable because OVHCloud was still recovering its storage network. It was not possible to log in to Carbone Account or to upload new templates through the Carbone Render API. Only a few people were affected by this issue because most templates were cached. It took us some time to find the origin of the issue, and we had to manually switch storage access to another replica located in a different zone. As a solution, we are going to adapt our internal systems to switch automatically to a storage replica in a different location if the primary replica does not respond.
For users and customers, it was difficult to get visibility into Carbone availability. Until now, the only way to check was the Carbone Render API endpoint GET render.carbone.io/status, but the servers were not responding. Our only communication channel during the event was Twitter. To solve this visibility issue, we have just deployed a status page for our services: https://status.carbone.io
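
To illustrate the fallback idea behind the first two actions, here is a minimal sketch in Python of "try the primary, then switch to a replica". It is not the actual Carbone SDK or internal implementation. The names `call_with_fallback`, `AllEndpointsFailed`, and `get_status` are hypothetical, the fallback host is a made-up placeholder, and the `https://` scheme on the primary host is an assumption; only `render.carbone.io/status` comes from the text above.

```python
import urllib.request
from typing import Callable, Sequence

# The primary host is the one named in this article (https scheme assumed).
PRIMARY = "https://render.carbone.io"
# Hypothetical relay domain, used here only as a placeholder.
FALLBACK = "https://render-fallback.example.com"


class AllEndpointsFailed(Exception):
    """Raised when neither the primary nor any fallback endpoint answered."""


def call_with_fallback(
    endpoints: Sequence[str],
    request: Callable[[str], str],
) -> str:
    """Try each endpoint in order and return the first successful response.

    `request` takes a base URL and either returns a response body or raises
    OSError (connection refused, DNS failure, timeout, ...).
    """
    errors = []
    for base_url in endpoints:
        try:
            return request(base_url)
        except OSError as exc:
            # Network-level failure: remember it and move on to the next host.
            errors.append(f"{base_url}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


def get_status(base_url: str) -> str:
    """Example request: GET <base_url>/status with a short timeout."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
        return resp.read().decode()
```

A caller would then write something like `call_with_fallback([PRIMARY, FALLBACK], get_status)`. The same ordered-list approach could apply to storage replicas: list the primary replica first and the replica in another zone second, so the switch no longer needs a manual step.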

We are aware of the impact of the event, and we apologize to all those affected. We are now working to make our infrastructure decentralized and more resilient. We will post updates on the progress of these actions on Twitter, LinkedIn, and through our newsletter (to subscribe, log in to your account > Profile > check "Subscribe to the Carbone Newsletter").

Feel free to reach out to us on the chat; the team is available to answer your questions.

Enjoy creating with Carbone, and Cheers! - The Carbone Team

Outage Chronological events

Updated: 2021/10/13 12:00 am UTC+2

09:20 am (UTC +2 France/Paris) OVHCloud (hosting provider) stopped responding; thousands of sites and services became inaccessible globally. All Carbone servers and services were inaccessible: Carbone Render API, Carbone Studio, Carbone Account, and the Carbone website. We opened a communication feed on Twitter to post updates until everything was resolved: https://twitter.com/carbone_io/status/1448190940980649987
10:00 am (UTC +2 France/Paris) OVHCloud Update: Octave Klaba, co-founder of OVHCloud, confirmed that the problem was related to a reconfiguration of the network following a human error. A solution is being deployed. (https://twitter.com/olesovhcom/status/1448196879020433409) OVHCloud also posted a statement on Twitter: https://twitter.com/OVHcloud/status/1448225243198369798
10:10 am (UTC +2 France/Paris) OVHCloud Update: The maintenance scheduled this morning was planned to cope with the very large number of DDoS attacks in recent days. OVHCloud had decided to increase their DDoS processing capacity by adding new infrastructure in their data center in VH (US-EAST). The problem was due to a misconfiguration of the router. (https://twitter.com/olesovhcom/status/1448199383170834434)
10:23 am (UTC +2 France/Paris) OVHCloud Update: The router that caused the general breakdown at OVH has been removed from the network. The access to the websites is back in place, the servers are accessible again. (https://twitter.com/olesovhcom/status/1448202696071254016)
10:27 am (UTC +2 France/Paris) Carbone services are starting to be accessible, and all data are safe.
10:41 am (UTC +2 France/Paris) Internal DNS problem solved for the Account website and Carbone Render API V2 and V3.
10:48 am (UTC +2 France/Paris) Carbone Render and Carbone Studio are now accessible, but you can't upload new templates. The OVH storage network is still not accessible; we are working on a solution.
11:52 am (UTC +2 France/Paris) The OVHCloud storage is now accessible; Carbone Render, Carbone Studio, and your account are fully available. The Carbone environment is back on track, and all services are running normally.