Cloudflare December 5th 2025 Outage

December 5, 2025

Overview

On December 5th, 2025, Cloudflare experienced a 25-minute outage affecting approximately 28% of HTTP traffic.

TL;DR: A configuration change, combined with a killswitch (a feature flag that disables certain code paths), caused code to access object properties that were null or undefined.

This follows the previous outage on November 18th, 2025, which was also caused by a code exception rooted in assumptions about the existence of the underlying data being accessed: https://blog.cloudflare.com/18-november-2025-outage/

Official post-mortem

How Requests Flow Through Cloudflare’s WAF

When a request hits Cloudflare, the WAF evaluates it against a set of rules to decide whether to block, allow, or modify it.

   User's Browser

        │  GET https://example.com/api/users

   ┌─────────────────────────────────────────┐
   │         Cloudflare Edge Server          │
   │                                         │
   │  "Check if this request is malicious    │
   │   before forwarding to origin"          │
   └─────────────────────────────────────────┘


   ┌─────────────────────────────────────────┐
   │             WAF Ruleset                 │
   │                                         │
   │  Rule 1: If SQL injection → BLOCK       │
   │  Rule 2: If XSS attack   → BLOCK        │
   │  Rule 3: EXECUTE test-ruleset  ◄────────┼── Special rule
   │  Rule 4: If bot traffic  → CHALLENGE    │
   │                                         │
   └─────────────────────────────────────────┘

Most rules have simple actions like block or log. The execute action is special. It triggers evaluation of another ruleset. Cloudflare uses this internally to test new rules before releasing them publicly.
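As a sketch of what such a ruleset might look like (the type names and rule IDs here are my own illustration, not Cloudflare's actual schema), an execute rule carries a pointer to another ruleset instead of a terminal action:

```typescript
// Hypothetical rule shapes -- illustrative, not Cloudflare's real schema.
type RuleAction = "block" | "log" | "challenge" | "execute";

interface Rule {
	id: string;
	action: RuleAction;
	// Only meaningful for "execute": the ruleset to evaluate next.
	targetRuleset?: string;
}

const managedRuleset: Rule[] = [
	{ id: "sqli-1", action: "block" },
	{ id: "xss-1", action: "block" },
	// The special rule: instead of a terminal action, it triggers
	// evaluation of another (e.g. internal test) ruleset.
	{ id: "run-tests", action: "execute", targetRuleset: "test-ruleset" },
	{ id: "bot-1", action: "challenge" },
];

const executeRules = managedRuleset.filter((r) => r.action === "execute");
console.log(executeRules.length); // → 1
```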

The Two-Pass Architecture

Based on the brief code snippet in the blog post and the accompanying explanation, I'm going to assume Cloudflare's code processes rules in two separate passes:

class WAFProcessor {
	async processRequest(request: Request) {
		const ruleset = this.loadRuleset();
		const ruleResults: RuleResult[] = [];
		const subRulesetResults: SubRulesetResult[] =
			[];

		// PASS 1: Evaluate each rule
		for (const rule of ruleset.rules) {
			const result = await this.evaluateRule(
				rule,
				request,
				subRulesetResults
			);
			ruleResults.push(result);
		}

		// PASS 2: Stitch sub-ruleset results back onto rule results
		this.attachResults(
			ruleResults,
			subRulesetResults
		);

		return ruleResults;
	}
}

Why two passes? Sub-rulesets might run in parallel, or results might be stored separately for memory efficiency. Either way, the results get attached in a second loop:

Pass 1: Evaluate rules, store results separately
─────────────────────────────────────────────────

   ruleResults[]              subRulesetResults[]
   ┌────────────────────┐     ┌─────────────────┐
   │ Rule 1 (execute)   │────▶│ Sub-results 0   │
   │ results_index: 0   │     ├─────────────────┤
   ├────────────────────┤     │ Sub-results 1   │
   │ Rule 2 (execute)   │────▶│                 │
   │ results_index: 1   │     └─────────────────┘
   ├────────────────────┤
   │ Rule 3 (block)     │
   │ (no sub-results)   │
   └────────────────────┘


Pass 2: Attach results back to rule objects
─────────────────────────────────────────────

   for each ruleResult:
       if action == "execute":
           ruleResult.execute.results = subRulesetResults[index]

The Bug

The bug lived in how these two passes communicated. Let’s look at the full code:

interface RuleResult {
	ruleId: string;
	action: "block" | "log" | "execute" | "skip";
	execute?: {
		results_index: number;
		results?: SubRulesetResult;
	};
}

class WAFProcessor {
	async evaluateRule(
		rule: Rule,
		request: Request,
		subRulesetResults: SubRulesetResult[]
	): Promise<RuleResult> {
		// Check if killswitch is active for this rule
		if (this.isKillswitched(rule.id)) {
			// Skip the action, but still return result metadata
			return {
				ruleId: rule.id,
				action: rule.action, // ← Still says "execute"!
				// execute: ???      // ← Never created!
			};
		}

		// Normal execution for "execute" action
		if (rule.action === "execute") {
			const subResults = await this.runSubRuleset(
				rule.targetRuleset,
				request
			);
			const index =
				subRulesetResults.push(subResults) - 1;

			return {
				ruleId: rule.id,
				action: "execute",
				execute: {
					// ← This object gets created
					results_index: index,
				},
			};
		}

		// Handle other actions...
	}

	attachResults(
		ruleResults: RuleResult[],
		subRulesetResults: SubRulesetResult[]
	) {
		for (const result of ruleResults) {
			// THE BUG: Assumes execute object exists if action is "execute"
			if (result.action === "execute") {
				result.execute.results =
					subRulesetResults[
						result.execute.results_index
					];
				//     ▲
				//     └── 💥 result.execute is undefined when killswitched!
			}
		}
	}
}

The two functions had incompatible assumptions:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  evaluateRule thinks:                                       │
│    "I'll skip the action but keep the action type           │
│     so logs show what kind of rule it was"                  │
│                                                             │
│  attachResults thinks:                                      │
│    "If action is 'execute', someone definitely ran it       │
│     and created the execute object"                         │
│                                                             │
│  These assumptions worked fine for years...                 │
│  until someone killswitched an "execute" rule.              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
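The failure mode is easy to reproduce with the RuleResult shape sketched above (my reconstruction, not Cloudflare's actual code): a killswitched result keeps `action: "execute"` but never gets an `execute` object, so the second pass dereferences undefined:

```typescript
interface SubRulesetResult { matched: string[] }

interface RuleResult {
	ruleId: string;
	action: "block" | "log" | "execute" | "skipped";
	execute?: { results_index: number; results?: SubRulesetResult };
}

// What evaluateRule returns for a killswitched "execute" rule:
// the action string survives, but the execute object was never created.
const killswitched: RuleResult = { ruleId: "run-tests", action: "execute" };

const subRulesetResults: SubRulesetResult[] = [{ matched: [] }];

let crash: Error | null = null;
try {
	if (killswitched.action === "execute") {
		// The same dereference attachResults performs.
		// At runtime, killswitched.execute is undefined here.
		killswitched.execute!.results =
			subRulesetResults[killswitched.execute!.results_index];
	}
} catch (e) {
	crash = e as Error;
}

console.log(crash?.name); // → TypeError
```

The non-null assertion (`!`) is what lets this compile; it stands in for the fact that the real code trusted the action string instead of checking the object.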

How the Code Could Be Structured

Option 1: Change the action type when skipped

The simplest fix. If you skip a rule, don’t claim it’s still an “execute” action:

async evaluateRule(rule: Rule, request: Request): Promise<RuleResult> {
    if (this.isKillswitched(rule.id)) {
        return {
            ruleId: rule.id,
            action: "skipped",        // ← Changed from rule.action
            originalAction: rule.action,  // ← Keep for logging
        };
    }
    // ...
}

Option 2: Check for the object before accessing it

Defensive programming in the second pass:

attachResults(ruleResults: RuleResult[], subRulesetResults: SubRulesetResult[]) {
    for (const result of ruleResults) {
        if (result.action === "execute" && result.execute) {
            //                             ▲
            //                             └── Guard clause
            result.execute.results = subRulesetResults[result.execute.results_index];
        }
    }
}

Option 3: Eliminate the second pass entirely

Why store an index and look it up later? Just attach the results immediately:

async evaluateRule(rule: Rule, request: Request): Promise<RuleResult> {
    if (this.isKillswitched(rule.id)) {
        return { ruleId: rule.id, action: "skipped" };
    }

    if (rule.action === "execute") {
        const subResults = await this.runSubRuleset(rule.targetRuleset, request);

        return {
            ruleId: rule.id,
            action: "execute",
            execute: {
                results: subResults,  // ← Attach immediately, no second pass
            },
        };
    }
    // ...
}

This eliminates the bug entirely because there’s no second pass with mismatched assumptions.

Option 4: Use types to make invalid states unrepresentable

The real fix. Design your types so the bug can’t exist:

// Instead of one type with optional fields...
type RuleResult =
	| { action: "block"; ruleId: string }
	| { action: "log"; ruleId: string }
	| {
			action: "skipped";
			ruleId: string;
			originalAction: string;
	  }
	| {
			action: "execute";
			ruleId: string;
			execute: { results: SubRulesetResult };
	  };
//    ▲
//    └── If action is "execute", execute MUST exist.
//        TypeScript enforces this at compile time.

function attachResults(result: RuleResult) {
	if (result.action === "execute") {
		// TypeScript KNOWS result.execute exists here
		// No runtime check needed, no bug possible
		console.log(result.execute.results);
	}
}

Cloudflare noted that this bug didn’t occur in their Rust-based FL2 proxy. Rust’s type system would force you to handle the “skipped” case explicitly. You can’t just leave the execute field undefined and hope for the best.

Why Increase the Request Buffer Size?

What is a Request Buffer?

When you submit a form or upload data to a website, that data travels in the HTTP request body. Before Cloudflare forwards the request to the origin server, the WAF needs to scan it for malicious content like SQL injection or XSS payloads.

But scanning happens in memory. Cloudflare can’t just stream the body through; it needs to hold it somewhere to analyze it. That “somewhere” is the request buffer.
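A minimal sketch of the idea (the function, the truncation behavior, and the limit handling are my assumptions, not Cloudflare's implementation): accumulate body chunks up to a cap, then hand the complete buffer to the scanner.

```typescript
// Illustrative only: hold a request body in memory, up to a size limit,
// so a scanner can inspect the payload before it is forwarded.
const BUFFER_LIMIT = 128 * 1024; // old limit: 128KB

function bufferBody(
	chunks: Uint8Array[]
): { body: Uint8Array; truncated: boolean } {
	const total = chunks.reduce((n, c) => n + c.length, 0);
	const size = Math.min(total, BUFFER_LIMIT);
	const body = new Uint8Array(size);
	let offset = 0;
	for (const chunk of chunks) {
		if (offset >= size) break;
		// Copy only as much of this chunk as still fits in the buffer.
		const slice = chunk.subarray(0, Math.min(chunk.length, size - offset));
		body.set(slice, offset);
		offset += slice.length;
	}
	return { body, truncated: total > BUFFER_LIMIT };
}

// A 500KB body exceeds the old 128KB buffer: only part of it is scannable.
const { body, truncated } = bufferBody([new Uint8Array(500 * 1024)]);
console.log(body.length, truncated); // → 131072 true
```

Truncation is just one possible policy for oversized bodies; as noted below, Cloudflare hasn't documented what actually happens past the limit.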

┌──────────────────────────────────────────────────────────────────┐
│                        HTTP Request                              │
├──────────────────────────────────────────────────────────────────┤
│  Headers: POST /api/submit                                       │
│           Content-Type: application/json                         │
│           Content-Length: 500000                                 │
├──────────────────────────────────────────────────────────────────┤
│  Body: { "data": "...500KB of content..." }                      │
│                                                                  │
│        ┌─────────────────────────────────────────────────────┐   │
│        │                  Request Buffer                     │   │
│        │                                                     │   │
│        │  WAF loads body here to scan for:                   │   │
│        │  • SQL injection patterns                           │   │
│        │  • XSS payloads                                     │   │
│        │  • Known exploit signatures                         │   │
│        │  • Malicious file uploads                           │   │
│        │                                                     │   │
│        │  Buffer size: 128KB (old) → 1MB (new)               │   │
│        └─────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘

Why 1MB?

Cloudflare was responding to CVE-2025-55182, a critical vulnerability in React Server Components. From their post:

“We started rolling out an increase to our buffer size to 1MB, the default limit allowed by Next.js applications. We wanted to make sure as many customers as possible were protected.”

The exact behavior for content beyond the 128KB buffer limit isn’t documented. It could be truncated, streamed without scanning, or handled some other way. What we do know is that increasing the buffer to match Next.js’s 1MB default ensures the WAF can analyze the full request body that applications will accept.

The Tradeoff

Larger buffers mean more memory per request. At Cloudflare’s scale (millions of requests per second), increasing from 128KB to 1MB is significant.
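Some back-of-envelope arithmetic makes the tradeoff concrete (the concurrency figure is a hypothetical illustration, not Cloudflare data): worst-case buffering memory scales linearly with buffer size, so an 8x larger buffer means 8x the memory when every in-flight request is buffering.

```typescript
// Worst-case memory if N requests are buffering bodies simultaneously.
const bufferedMemoryGB = (concurrentRequests: number, bufferBytes: number) =>
	(concurrentRequests * bufferBytes) / 1024 ** 3;

const CONCURRENT = 10_000; // hypothetical concurrent buffered requests

const oldGB = bufferedMemoryGB(CONCURRENT, 128 * 1024); // 128KB buffers
const newGB = bufferedMemoryGB(CONCURRENT, 1024 * 1024); // 1MB buffers

console.log(oldGB); // → 1.220703125
console.log(newGB); // → 9.765625
```

Roughly 1.2GB versus 9.8GB under these assumptions: the same workload now reserves an order of magnitude more memory for request bodies.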

Lessons Learned

Two-pass architectures need clear contracts. When different parts of your code make assumptions about data shape, those assumptions need to be explicit and enforced, preferably by types.

The “off” path is rarely tested. The killswitch had been used many times, but never on an “execute” rule. The bug hid for years in an untested code path.

Simpler code is safer code. The second pass existed for some reason (parallelism? memory?), but it introduced a coupling between two functions that could fall out of sync. Sometimes the simpler design is worth the tradeoff.