CDP vs. BiDi: Browser Automation Protocol Internals for Scrapers


The Scraper
Most scraping engineers use Playwright or Puppeteer without thinking much about the protocol layer. That's fine until it isn't: until you need to do something the high-level API doesn't expose, diagnose a subtle timing issue, or understand why one automation approach gets detected and another doesn't.
The protocol layer matters. This is what's actually happening when Playwright tells a browser to do something.
The Chrome DevTools Protocol (CDP)
CDP was originally built for Chrome DevTools (the inspector, profiler, and debugger you open with F12). It grew into the de facto standard protocol for browser automation. Puppeteer is built on CDP. Playwright originally used CDP for Chromium and built separate protocols for Firefox and WebKit.
CDP is a JSON-over-WebSocket protocol. The automation client connects to a WebSocket endpoint exposed by the browser, sends JSON commands, and receives JSON responses and events.
    // Client → Browser: navigate to URL
    {
      "id": 1,
      "method": "Page.navigate",
      "params": { "url": "https://example.com" }
    }

    // Browser → Client: response (errorText is present only if navigation failed)
    {
      "id": 1,
      "result": { "frameId": "A1B2C3", "loaderId": "D4E5F6" }
    }

    // Browser → Client: event (unsolicited)
    {
      "method": "Page.loadEventFired",
      "params": { "timestamp": 1714521600.123 }
    }
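The exchange above follows one rule: a message that carries an `id` is the response to the command with that `id`, and a message without one is an event. Here is a minimal sketch of that bookkeeping in plain Python, with no browser involved. The `CdpDispatcher` class and its method names are hypothetical, invented for illustration, and are not part of Playwright or CDP.

```python
import itertools
import json


class CdpDispatcher:
    """Hypothetical helper: pairs CDP responses with commands by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # id -> method name still waiting for a response
        self.results = {}  # id -> result payload of a completed command
        self.events = []   # (method, params) tuples in arrival order

    def build_command(self, method, params=None):
        """Serialize a command and remember its id until the response arrives."""
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def handle_message(self, raw):
        """Route one incoming message; returns 'response' or 'event'."""
        msg = json.loads(raw)
        if "id" in msg:
            method = self.pending.pop(msg["id"])
            if "error" in msg:
                raise RuntimeError(f"{method} failed: {msg['error'].get('message')}")
            self.results[msg["id"]] = msg.get("result", {})
            return "response"
        self.events.append((msg["method"], msg.get("params", {})))
        return "event"


dispatcher = CdpDispatcher()
wire = dispatcher.build_command("Page.navigate", {"url": "https://example.com"})

# Events can arrive before the response to an earlier command.
kind_event = dispatcher.handle_message(
    '{"method": "Page.loadEventFired", "params": {"timestamp": 1714521600.123}}'
)
kind_resp = dispatcher.handle_message(
    '{"id": 1, "result": {"frameId": "A1B2C3", "loaderId": "D4E5F6"}}'
)
```

Playwright's `CDPSession` does this matching for you. The sketch only shows why responses and events can be interleaved on the same socket without ambiguity.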
CDP exposes domains (logical groupings of methods and events):
Page: navigation, lifecycle events, screenshots
Network: request interception, response inspection, cookie management
Runtime: JavaScript execution in the page context
DOM: DOM inspection and manipulation
Emulation: device emulation, viewport, geolocation spoofing
Target: managing multiple browser contexts and pages
Accessing CDP Directly from Playwright
    import asyncio

    from playwright.async_api import async_playwright


    async def use_cdp_directly():
        async with async_playwright() as p:
            browser = await p.chromium.launch()
            context = await browser.new_context()
            page = await context.new_page()

            # Get a CDP session for this page (Chromium only)
            client = await context.new_cdp_session(page)

            await page.goto("https://example.com")

            # Enable network tracking and block URLs via CDP directly
            await client.send("Network.enable")
            await client.send("Network.setBlockedURLs", {
                "urls": ["*.analytics.com/*", "*.ads.google.com/*"]
            })

            # Capture all network requests as they happen
            requests = []
            client.on("Network.requestWillBeSent", lambda event: requests.append(event))

            await page.reload()
            await page.wait_for_load_state("networkidle")

            print(f"Captured {len(requests)} network requests")
            await browser.close()


    asyncio.run(use_cdp_directly())
CDP for Scraper-Specific Operations
Intercepting requests before they're sent:
    async def intercept_and_modify(page, cdp_client):
        """Pause matching API requests and add a header before they are sent."""
        await cdp_client.send("Fetch.enable", {
            "patterns": [{"urlPattern": "*/api/v*", "requestStage": "Request"}]
        })

        async def handle_fetch(event):
            # Fetch.continueRequest replaces the whole header set, so start
            # from the original headers and then apply the override.
            headers = dict(event["request"]["headers"])
            headers["X-Custom-Header"] = "modified"
            await cdp_client.send("Fetch.continueRequest", {
                "requestId": event["requestId"],
                "headers": [{"name": k, "value": v} for k, v in headers.items()],
            })

        cdp_client.on("Fetch.requestPaused", handle_fetch)
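One detail is easy to miss when modifying headers this way. The `headers` list passed to `Fetch.continueRequest` overrides the request's headers rather than extending them, and the paused request reports its headers as a name-to-value object rather than a list. Here is a small sketch of a merge helper that bridges the two shapes. The name `merge_cdp_headers` is hypothetical, and treating header names case-insensitively is an assumption made for the example.

```python
def merge_cdp_headers(original, overrides):
    """Build a Fetch.continueRequest-style header list.

    `original` is the name -> value mapping from the paused request,
    `overrides` is a name -> value mapping of headers to add or replace.
    Names are compared case-insensitively; the override's spelling wins.
    """
    merged = {}
    for name, value in original.items():
        merged[name.lower()] = (name, value)
    for name, value in overrides.items():
        merged[name.lower()] = (name, value)
    return [{"name": name, "value": value} for name, value in merged.values()]


original_headers = {"Accept": "application/json", "User-Agent": "Mozilla/5.0"}
entries = merge_cdp_headers(
    original_headers,
    {"X-Custom-Header": "modified", "accept": "text/html"},
)
```

Inside a `Fetch.requestPaused` handler, the result would go in the `headers` field, with `event["request"]["headers"]` as the first argument.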
Spoofing geolocation:
    await cdp_client.send("Emulation.setGeolocationOverride", {
        "latitude": 48.8566,
        "longitude": 2.3522,
        "accuracy": 100
    })
Overriding timezone (anti-fingerprint):
    await cdp_client.send("Emulation.setTimezoneOverride", {
        "timezoneId": "Europe/Paris"
    })
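Geolocation and timezone overrides are most convincing when they agree with each other, so it can help to keep them in one place. The sketch below groups them into a single profile and generates the two CDP commands shown above. `LocationProfile`, `PARIS`, and `apply_profile` are hypothetical names for illustration. `cdp_client` is assumed to be a Playwright CDP session like the one created earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationProfile:
    """Hypothetical bundle of location-related overrides that should agree."""

    latitude: float
    longitude: float
    timezone_id: str
    accuracy: float = 100.0

    def cdp_commands(self):
        """Return (method, params) pairs for the CDP Emulation domain."""
        return [
            ("Emulation.setGeolocationOverride", {
                "latitude": self.latitude,
                "longitude": self.longitude,
                "accuracy": self.accuracy,
            }),
            ("Emulation.setTimezoneOverride", {
                "timezoneId": self.timezone_id,
            }),
        ]


PARIS = LocationProfile(latitude=48.8566, longitude=2.3522, timezone_id="Europe/Paris")


async def apply_profile(cdp_client, profile):
    """Send every override in the profile over an existing CDP session."""
    for method, params in profile.cdp_commands():
        await cdp_client.send(method, params)
```

A call such as `await apply_profile(client, PARIS)` before the first navigation would apply both overrides together. Whether the target site also compares them against the exit IP's location is outside what this sketch covers.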
WebDriver BiDi: The New Protocol
WebDriver BiDi (bidirectional) is the W3C's standardization effort for next-generation browser automation. Where CDP grew organically out of DevTools, BiDi was designed from scratch to be a proper standard: cross-browser, formally specified, and not tied to Chrome's internals.
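Playwright has been transitioning toward BiDi for Firefox (CDP is still used for Chromium). Depending on your Playwright version and configuration, Firefox automation in 2026 may be running over BiDi under the hood, so check which transport your setup actually uses before relying on protocol-specific behavior.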
BiDi uses the same JSON-over-WebSocket transport as CDP but with a cleaner, more formally defined structure:
    // BiDi command
    {
      "id": 1,
      "method": "browsingContext.navigate",
      "params": {
        "context": "abc123",
        "url": "https://example.com",
        "wait": "complete"
      }
    }

    // BiDi response
    {
      "id": 1,
      "type": "success",
      "result": {
        "navigation": "def456",
        "url": "https://example.com"
      }
    }
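Unlike CDP, every message a BiDi browser sends carries an explicit `type` field (`success`, `error`, or `event`), so a client can route messages without inferring their kind from the presence of an `id`. A minimal sketch of that routing follows. The function name and the tuple shapes it returns are hypothetical choices for the example.

```python
import json


def classify_bidi_message(raw):
    """Return a (kind, key, payload) tuple for one incoming BiDi message."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "success":
        # Response to the command with the same id.
        return ("success", msg["id"], msg["result"])
    if kind == "error":
        # Error responses carry an error code and a human-readable message.
        detail = f"{msg.get('error')}: {msg.get('message')}"
        return ("error", msg.get("id"), detail)
    if kind == "event":
        # Events are identified by method name, not by id.
        return ("event", msg["method"], msg.get("params", {}))
    raise ValueError(f"unexpected BiDi message type: {kind!r}")


ok = classify_bidi_message(
    '{"id": 1, "type": "success", '
    '"result": {"navigation": "def456", "url": "https://example.com"}}'
)
failed = classify_bidi_message(
    '{"id": 2, "type": "error", "error": "no such frame", "message": "context not found"}'
)
loaded = classify_bidi_message(
    '{"type": "event", "method": "browsingContext.load", '
    '"params": {"context": "abc123", "url": "https://example.com"}}'
)
```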
BiDi's module system is cleaner than CDP's domains:
browsingContext: navigation, contexts (like tabs/frames)
network: request/response interception
script: JavaScript execution and evaluation
log: console output capture
session: capabilities and session management
BiDi vs CDP for Scrapers: The Practical Differences
| Feature | CDP | BiDi |
|---|---|---|
| Browser support | Chromium only | Chrome, Firefox, Safari (partial) |
| Specification | Informal, Google-driven | W3C standard |
| Request interception | Mature, well-documented | Mature in Playwright 1.44+ |
| Low-level control | More granular | More abstracted |
| Stability | Changes with Chrome | Versioned standard |
| Fingerprint exposure | CDP connection detectable | BiDi detectable differently |
For scraping in 2026, the choice is mostly made for you, because Playwright abstracts the protocol away. The key case where the protocol still matters is detection: CDP-driven automation leaves traces that a page can observe.
Detection Through Protocol Artifacts
This is the part that matters for anti-bot evasion. CDP leaves artifacts that detection systems look for.
The CDP Detection Vector
When a browser is controlled via CDP, several signals can leak:
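Runtime bindings. Playwright injects __playwright_binding__ and related properties into the page context (exact names vary between versions). These are visible to JavaScript running on the page unless they are hidden or isolated.
CDP connection side effects. Page JavaScript cannot see the DevTools WebSocket itself, so detection systems infer an attached client from its side effects. A commonly cited example is the Runtime domain: once a client has sent Runtime.enable, objects passed to console methods are serialized for that client, and a page can observe the serialization, for instance through a getter on a logged object.
window.__cdp and related properties. Some automation stacks and CDP operations can leave properties on the page that are not present in non-automated browsers. Which names appear depends on the tool and its version.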
rebrowser-patches (an open-source set of patches for Puppeteer and Playwright, not a fork) specifically targets detection vectors of this kind. The list below is a summary, so check the project's README for what the current release actually patches:
1. The CDP binding injection pattern
2. The Runtime.enable domain exposure
3. Page.addScriptToEvaluateOnNewDocument artifacts
4. navigator.webdriver = true
BiDi Detection Differences
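BiDi automation is detectable through different mechanisms. The automation WebSocket runs between your client and the browser, so the target site does not see it directly. What the site can see are side effects inside the page, such as navigator.webdriver reporting true in a WebDriver-controlled session, plus the browser's own fingerprint. Firefox automation via BiDi has a different fingerprint than Chrome automation via CDP, and anti-bot systems are trained on both.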
The practical upshot: no automation protocol is invisible. The goal isn't to make the protocol undetectable. It's to make the browser state (fingerprint, behavior, cookies, IP) legitimate enough that the detection system doesn't investigate whether you're automated.
Useful CDP Patterns for Scrapers
Blocking Unnecessary Resources
Reduce bandwidth and increase speed by blocking images, fonts, and tracking scripts:
    async def setup_resource_blocking(page, client):
        await client.send("Network.enable")
        await client.send("Network.setBlockedURLs", {
            "urls": [
                "*.png", "*.jpg", "*.gif", "*.svg", "*.webp",  # Images
                "*.woff", "*.woff2", "*.ttf",                  # Fonts
                "*.google-analytics.com/*",                    # Analytics
                "*.doubleclick.net/*",                         # Ads
            ]
        })
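This can reduce page load time substantially, on the order of 50-70% for image-heavy pages where you only need the data. The actual gain depends on how much of the page weight the blocked resources account for.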
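Before shipping a block list, it is worth previewing which URLs the patterns would actually catch. The sketch below approximates the matching in plain Python under one stated assumption: `*` is the only wildcard and a pattern must match the whole URL. Chrome's real matcher may differ in edge cases, and `wildcard_to_regex` and `blocked_by` are hypothetical helper names.

```python
import re

BLOCK_PATTERNS = [
    "*.png", "*.jpg", "*.gif", "*.svg", "*.webp",
    "*.woff", "*.woff2", "*.ttf",
    "*.google-analytics.com/*",
    "*.doubleclick.net/*",
]


def wildcard_to_regex(pattern):
    """Translate a '*'-only wildcard pattern into a compiled regex."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def blocked_by(url, patterns=BLOCK_PATTERNS):
    """Return the first pattern that matches the whole URL, or None."""
    for pattern in patterns:
        if wildcard_to_regex(pattern).fullmatch(url):
            return pattern
    return None


plain_image = blocked_by("https://cdn.example.com/hero.png")
versioned_image = blocked_by("https://cdn.example.com/hero.png?v=2")
analytics = blocked_by("https://www.google-analytics.com/collect")
api_call = blocked_by("https://example.com/api/products")
```

Under this approximation, `*.png` misses `hero.png?v=2` because nothing in the pattern covers the query string. If cache-busted assets matter on your target, a pattern such as `*.png*` is one way to widen the match.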
Extracting Full Network HAR
For API discovery (pre-scraper research):
    async def capture_har(page, client):
        """Capture a simplified, HAR-like log of a page's requests and responses."""
        await client.send("Network.enable")
        entries = []

        def on_request(event):
            entries.append({
                'type': 'request',
                'url': event['request']['url'],
                'method': event['request']['method'],
                'headers': event['request']['headers'],
                'requestId': event['requestId'],
            })

        def on_response(event):
            entries.append({
                'type': 'response',
                'url': event['response']['url'],
                'status': event['response']['status'],
                'headers': dict(event['response']['headers']),
                'requestId': event['requestId'],
            })

        client.on("Network.requestWillBeSent", on_request)
        client.on("Network.responseReceived", on_response)

        await page.goto("https://target.com/product/123")
        await page.wait_for_load_state("networkidle")

        # Find API endpoints (both request and response entries are kept)
        api_calls = [e for e in entries if '/api/' in e.get('url', '')]
        return api_calls
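The capture above returns requests and responses as separate entries that share a `requestId`. For API discovery it is usually easier to read one record per request. Here is a sketch of that pairing step, assuming entries shaped like the ones built in `capture_har`. The `pair_entries` name and the output field names are hypothetical.

```python
def pair_entries(entries):
    """Merge request and response entries that share a requestId.

    Returns one dict per requestId, in first-seen order. Redirect chains
    reuse a requestId in CDP, so later entries overwrite earlier ones here.
    """
    by_id = {}
    for entry in entries:
        record = by_id.setdefault(entry["requestId"], {"requestId": entry["requestId"]})
        if entry["type"] == "request":
            record["url"] = entry["url"]
            record["method"] = entry["method"]
            record["request_headers"] = entry["headers"]
        else:
            record["status"] = entry["status"]
            record["response_headers"] = entry["headers"]
    return list(by_id.values())


sample_entries = [
    {"type": "request", "requestId": "1", "method": "GET",
     "url": "https://target.com/api/v1/products/123", "headers": {"Accept": "*/*"}},
    {"type": "request", "requestId": "2", "method": "GET",
     "url": "https://target.com/static/app.js", "headers": {}},
    {"type": "response", "requestId": "1", "status": 200,
     "url": "https://target.com/api/v1/products/123",
     "headers": {"content-type": "application/json"}},
]

paired = pair_entries(sample_entries)
json_apis = [
    r for r in paired
    if "json" in r.get("response_headers", {}).get("content-type", "")
]
```

Filtering on the response content type, as in `json_apis`, tends to find data endpoints that a path check for `/api/` would miss. Requests that never received a response simply lack the `status` field.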
The Level of Abstraction to Work At
For the vast majority of scraping tasks, Playwright's high-level API is the right level. It handles CDP/BiDi session management, multi-frame handling, and the edge cases you don't want to debug yourself.
Drop to the CDP level when:
You need to intercept and modify requests before they're sent
You need to set browser properties that Playwright doesn't expose directly (advanced emulation, specific Chrome flags)
You're doing API discovery on a new target (HAR capture)
You're debugging a detection issue and need to understand exactly what the browser is exposing
The protocol is the plumbing. Know it exists, understand when to touch it, and let Playwright handle it the rest of the time.

Author
The Scraper
Engineer and Webscraping Specialist
About Author
The Scraper is a software engineer and web scraping specialist, focused on building production-grade data extraction systems. His work centers on large-scale crawling, anti-bot evasion, proxy infrastructure, and browser automation. He writes about real-world scraping failures, silent data corruption, and systems that operate at scale.

