The world’s internet infrastructure is increasingly reliant on just a few providers like Cloudflare, AWS, and Microsoft Azure. Photo: The Verge

A significant portion of the global web was disrupted on the evening of November 18 when Cloudflare experienced a service outage. The incident is yet another alarming reminder of how heavily the internet depends on a small number of infrastructure providers.

Cloudflare is the latest web infrastructure giant to suffer a large-scale disruption in the past month, leaving major platforms like X, ChatGPT, Spotify, Canva, and even DownDetector unresponsive for several hours.

This is just one link in a growing chain of failures that, according to Mehdi Daoudi, CEO and co-founder of performance monitoring platform Catchpoint, should serve as “a wake-up call” for companies everywhere.

“Businesses are putting all their eggs in one basket and are then shocked when something breaks,” Daoudi told The Verge. “Companies must ensure their systems are resilient and fault-tolerant.”

The Cloudflare incident came shortly after similar outages affected Microsoft Azure and Amazon Web Services, which disrupted large swathes of the internet that rely on these platforms.

Cloudflare plays a similar role: it operates a content delivery network (CDN) to keep websites running smoothly, while also offering DDoS protection, DNS services, and various other infrastructure tools.

In 2024, Cloudflare reported that roughly 20% of global web traffic passed through its network. The company also serves 35% of Fortune 500 firms and “millions” of other customers.

Its high performance and strong security track record have made Cloudflare a widely preferred choice. But this latest incident underscores how dependent the web has become on just a few key infrastructure players.

After an AWS outage recently took down messaging app Signal, Signal President Meredith Whittaker noted that the company “had no other choice” but to run on one of the dominant providers. “Almost the entire tech stack is now in the hands of just three or four giants,” she wrote.

While it may be impossible for companies to completely avoid reliance on these few infrastructure providers, the recent string of failures shows the urgent need for contingency planning. “Outages will keep happening and become more frequent. Their impact will also be more widespread,” Daoudi warned. “The question is, what will you do to prepare?”

While the AWS and Azure failures were traced back to DNS, the system that translates domain names into IP addresses, Cloudflare has identified its issue as stemming from a single file.

“The root cause was an automatically generated configuration file designed to manage malicious traffic,” said Cloudflare spokesperson Jackie Dutton. “The file exceeded its expected size and triggered failures in the traffic-handling systems of multiple Cloudflare services.”

It may sound far-fetched that a single file could disrupt so much of the internet, but at Cloudflare’s scale, it’s entirely plausible. “At this level of infrastructure, even a small deviation can have massive consequences,” said Rob Lee, Chief AI & Research Officer at the SANS Institute. “These systems are built for speed, so anything that slows or halts decision-making can trigger cascading failures. In high-performance environments, even a millisecond delay can cause a complete traffic jam.”

According to Lee, configuration files like the one mentioned by Cloudflare “dictate routing policies, load balancing decisions, and global traffic distribution.”

If such a file suddenly balloons in size, “it could slow down parsing, create memory errors, spark CPU contention, or cause logic breakdowns in dependent systems,” he explained.
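
One common defense against this failure mode is to validate a generated file before it reaches the systems that depend on it, and to keep serving the last known good version if the check fails. The sketch below is purely illustrative and does not reflect Cloudflare's actual software: the function name `load_generated_config`, the JSON list format, and the `MAX_ENTRIES` limit are all assumptions made for the example.

```python
import json

# Hypothetical upper bound on entries; a real system would tune this value.
MAX_ENTRIES = 200


def load_generated_config(raw, last_known_good, max_entries=MAX_ENTRIES):
    """Return the parsed entries, or last_known_good if the file is unusable.

    The new file is rejected when it is not valid JSON, is not a list,
    or holds more entries than max_entries.
    """
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed file: keep serving the previous configuration.
        return last_known_good

    if not isinstance(entries, list) or len(entries) > max_entries:
        # Unexpected shape or size: refuse the update instead of crashing.
        return last_known_good

    return entries
```

The design choice here is to treat an oversized file as a rejected update rather than a fatal error, so a bad generation run degrades to stale configuration instead of an outage.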

AWS, too, has previously blamed automation errors for chain-reaction failures, a type of glitch that experts believe is bound to recur.

“Are you going to complain every time Cloudflare sneezes?” Daoudi asked. “Or will you build your system to withstand it?”
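
In practice, “withstanding it” often starts with not depending on a single path. As a minimal sketch under stated assumptions, the snippet below tries a list of providers in order and returns the first successful response. The `fetch_with_failover` helper and the idea of passing each provider as a plain callable are hypothetical simplifications; real failover also involves health checks, timeouts, and DNS or routing changes that are out of scope here.

```python
from typing import Callable, Sequence


def fetch_with_failover(providers: Sequence[Callable[[], str]]) -> str:
    """Call each provider in order and return the first successful result.

    Raises RuntimeError if every provider fails.
    """
    errors = []
    for fetch in providers:
        try:
            return fetch()
        except Exception as exc:  # any failure triggers the next provider
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} providers failed")
```

A caller would pass, for example, a function that fetches through the primary CDN followed by one that fetches from a secondary provider or directly from the origin server.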

Du Lam