Cloudflare has published a report explaining the cause of a severe network outage that disrupted global internet traffic for several hours and affected millions of users. The incident began at 11:20 UTC and was traced to an internal configuration error. A permissions update to the ClickHouse database cluster, intended to improve security, caused a critical feature file to exceed its size limit. When the oversized file was loaded, the software that consumed it failed, causing significant downtime across Cloudflare’s services.
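The failure mode described above, a generated file growing past a fixed limit, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `MAX_FEATURES`, `parse_feature_file`, `FeatureLoader`, and the feature counts are invented for illustration, and the limit of 200 is an assumed value that the article does not state. This is not Cloudflare's code. It shows how a query change that returns every row twice can push a file over the cap, and one defensive pattern: validate the new file and keep the last known-good configuration instead of failing request processing.

```python
from dataclasses import dataclass

# Hypothetical cap on how many features the loader accepts (assumed value).
MAX_FEATURES = 200


class FeatureFileTooLarge(Exception):
    """Raised when a feature file has more entries than MAX_FEATURES."""


@dataclass(frozen=True)
class FeatureConfig:
    features: tuple


def parse_feature_file(rows: list) -> FeatureConfig:
    """Build a config from raw rows, rejecting files over the size limit."""
    if len(rows) > MAX_FEATURES:
        raise FeatureFileTooLarge(
            f"{len(rows)} features exceeds the limit of {MAX_FEATURES}"
        )
    return FeatureConfig(features=tuple(rows))


class FeatureLoader:
    """Holds the active config and only swaps in files that pass validation."""

    def __init__(self, initial: FeatureConfig) -> None:
        self.active = initial

    def try_update(self, rows: list) -> bool:
        """Return True if the new file was applied, False if it was rejected."""
        try:
            candidate = parse_feature_file(rows)
        except FeatureFileTooLarge:
            # Fail safe: keep serving with the previous known-good config
            # rather than letting the error stop request processing.
            return False
        self.active = candidate
        return True


# Hypothetical scenario: 150 features normally, then a query change
# returns every row twice, producing 300 entries.
good_rows = [f"feature_{i}" for i in range(150)]
loader = FeatureLoader(parse_feature_file(good_rows))
applied = loader.try_update(good_rows + good_rows)
print(applied, len(loader.active.features))
```

The design choice here is to treat an invalid generated file as a rejected update rather than a fatal error, so the system keeps running on the previous configuration while the bad file is investigated.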
Because of coinciding events, the outage was initially suspected to be a massive DDoS attack, but it actually stemmed from failures in the Bot Management module, which halted request processing and produced errors across the platform. Key services such as CAPTCHA and email security were heavily affected, disrupting authentication and traffic management. After Cloudflare halted propagation of the bad file and rolled back to a stable version, the system recovered by 17:06 UTC. Cloudflare’s CEO described the incident as serious and unacceptable. It underscores the fragility of relying on major cloud providers, as similar outages at Microsoft Azure and AWS in recent weeks have shown.
👉 Read the original: Cyber Security News