CDP vs. BiDi: Browser Automation Protocol Internals for Scrapers

The Scraper

Last updated on May 27, 2026

Scraping Techniques

Most scraping engineers use Playwright or Puppeteer without thinking much about the protocol layer. That's fine until it isn't: until you need to do something the high-level API doesn't expose, diagnose a subtle timing issue, or understand why one automation approach gets detected and another doesn't.

The protocol layer matters: it is what actually happens on the wire when Playwright tells a browser to do something.


The Chrome DevTools Protocol (CDP)

CDP was originally built for Chrome DevTools: the inspector, profiler, and debugger you open with F12. It grew into the de facto standard protocol for automating Chromium-based browsers. Puppeteer is built on CDP. Playwright uses CDP for Chromium and originally built its own separate protocols for Firefox and WebKit.

CDP is a JSON-over-WebSocket protocol. The automation client connects to a WebSocket endpoint exposed by the browser, sends JSON commands, and receives JSON responses and events.


// Client → Browser: navigate to URL
{
    "id": 1,
    "method": "Page.navigate",
    "params": {
        "url": "https://example.com"
    }
}

// Browser → Client: response
{
    "id": 1,
    "result": {
        "frameId": "A1B2C3",
        "loaderId": "D4E5F6"
    }
}

// Browser → Client: event (unsolicited)
{
    "method": "Page.loadEventFired",
    "params": {
        "timestamp": 1714521600.123
    }
}
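
The three message shapes above can be told apart mechanically: a reply echoes the id of the command it answers, while an event has a method but no id. The sketch below is a minimal illustration of that bookkeeping, not how Playwright or Puppeteer implement it; the CdpCorrelator class and its method names are hypothetical.

```python
import itertools
import json


class CdpCorrelator:
    """Pairs CDP replies with the commands that caused them (hypothetical helper)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # command id -> method name

    def command(self, method, params=None):
        """Build a command frame as JSON text and remember its id."""
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params or {}})

    def classify(self, raw):
        """Return (kind, method) for an incoming frame."""
        msg = json.loads(raw)
        if "id" in msg:
            # Replies carry the id of the command they answer.
            method = self._pending.pop(msg["id"], None)
            kind = "error" if "error" in msg else "response"
            return kind, method
        # Events carry a method name and no id.
        return "event", msg["method"]


correlator = CdpCorrelator()
correlator.command("Page.navigate", {"url": "https://example.com"})
reply = '{"id": 1, "result": {"frameId": "A1B2C3", "loaderId": "D4E5F6"}}'
event = '{"method": "Page.loadEventFired", "params": {"timestamp": 1714521600.123}}'
print(correlator.classify(reply))
print(correlator.classify(event))
```

A real client would attach a future or callback to each pending id instead of just the method name, so the caller of a command can await its result.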


CDP exposes domains (logical groupings of methods and events):

  • Page — navigation, lifecycle events, screenshots

  • Network — request interception, response inspection, cookie management

  • Runtime — JavaScript execution in the page context

  • DOM — DOM inspection and manipulation

  • Emulation — device emulation, viewport, geolocation spoofing

  • Target — managing multiple browser contexts and pages


Accessing CDP Directly from Playwright


import asyncio

from playwright.async_api import async_playwright

async def use_cdp_directly():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()

        # Get a CDP session for this page (Chromium only)
        client = await context.new_cdp_session(page)

        await page.goto("https://example.com")

        # Turn on Network domain events and block matching URLs via CDP
        await client.send("Network.enable")
        await client.send("Network.setBlockedURLs", {
            "urls": ["*.analytics.com/*", "*.ads.google.com/*"]
        })

        # Capture network requests from this point on (here: during the reload)
        requests = []
        client.on("Network.requestWillBeSent", lambda event: requests.append(event))

        await page.reload()
        await page.wait_for_load_state("networkidle")

        print(f"Captured {len(requests)} network requests")
        await browser.close()

asyncio.run(use_cdp_directly())
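
Once the events are collected, a quick summary by hostname shows which third parties a page talks to. This is a small sketch that assumes each captured event exposes the URL under event["request"]["url"], as Network.requestWillBeSent does; the requests_by_host name and the sample events are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def requests_by_host(events):
    """Count captured Network.requestWillBeSent events per hostname."""
    hosts = Counter()
    for event in events:
        # data: and blob: URLs have no hostname, so group them separately.
        host = urlsplit(event["request"]["url"]).hostname or "(no host)"
        hosts[host] += 1
    return hosts.most_common()


# Hand-written sample events containing only the field used here.
sample = [
    {"request": {"url": "https://example.com/"}},
    {"request": {"url": "https://example.com/app.js"}},
    {"request": {"url": "https://cdn.example.net/logo.png"}},
    {"request": {"url": "data:image/png;base64,AAAA"}},
]
for host, count in requests_by_host(sample):
    print(f"{count:3d}  {host}")
```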



CDP for Scraper-Specific Operations

Intercepting requests before they're sent:


async def intercept_and_modify(page, cdp_client):
    """Intercept API requests and add a header before they are sent."""
    await cdp_client.send("Fetch.enable", {
        "patterns": [{"urlPattern": "*/api/v*", "requestStage": "Request"}]
    })

    async def handle_fetch(event):
        # When "headers" is supplied, Fetch.continueRequest replaces the whole
        # header list, so start from the original headers and override one.
        headers = dict(event["request"]["headers"])
        headers["X-Custom-Header"] = "modified"
        await cdp_client.send("Fetch.continueRequest", {
            "requestId": event["requestId"],
            "headers": [
                {"name": name, "value": value}
                for name, value in headers.items()
            ]
        })

    cdp_client.on("Fetch.requestPaused", handle_fetch)
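
One detail is easy to get wrong here: Fetch.requestPaused reports the request headers as a name-to-value object, while Fetch.continueRequest expects a list of name/value entries and treats that list as the complete header set. A small pure function can do the conversion and the merge. This is a sketch; merge_fetch_headers is a hypothetical helper name, not part of Playwright or CDP.

```python
def merge_fetch_headers(original, overrides):
    """Return a Fetch.continueRequest header list: original headers plus overrides.

    Header names are compared case-insensitively, so overriding "user-agent"
    replaces an existing "User-Agent" entry instead of duplicating it.
    """
    merged = {name.lower(): (name, value) for name, value in original.items()}
    for name, value in overrides.items():
        merged[name.lower()] = (name, value)
    return [{"name": name, "value": value} for name, value in merged.values()]


original = {"Accept": "*/*", "User-Agent": "Mozilla/5.0"}
for header in merge_fetch_headers(original, {"X-Custom-Header": "modified"}):
    print(header)
```

Inside a Fetch.requestPaused handler, the first argument would be event["request"]["headers"] and the result would go into the "headers" field of Fetch.continueRequest.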


Spoofing geolocation:


await cdp_client.send("Emulation.setGeolocationOverride", {
    "latitude": 48.8566,
    "longitude": 2.3522,
    "accuracy": 100
})


Overriding timezone (anti-fingerprint):


await cdp_client.send("Emulation.setTimezoneOverride", {
    "timezoneId": "Europe/Paris"
})



WebDriver BiDi: The New Protocol

BiDi (bidirectional) is the W3C's specification for next-generation browser automation. Where CDP grew organically out of DevTools, BiDi was designed from scratch as a proper standard: cross-browser, formally specified, and not tied to Chrome's internals.

Playwright has been moving toward BiDi for Firefox, while CDP is still used for Chromium. Depending on the Playwright version and configuration, Firefox automation in 2026 may run over BiDi under the hood rather than over Playwright's own Firefox protocol.

BiDi uses the same JSON-over-WebSocket transport as CDP but with a cleaner, more formally defined structure:


// BiDi command
{
    "id": 1,
    "method": "browsingContext.navigate",
    "params": {
        "context": "abc123",
        "url": "https://example.com",
        "wait": "complete"
    }
}

// BiDi response
{
    "id": 1,
    "type": "success",
    "result": {
        "navigation": "def456",
        "url": "https://example.com"
    }
}
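
Unlike CDP, every message a BiDi browser sends carries an explicit type field: "success" or "error" for command replies, and "event" for notifications. The sketch below builds the navigate command shown above and describes incoming messages by type. The function names are hypothetical, and the context id "abc123" is the placeholder from the example, not a real identifier.

```python
import json

# Readiness states accepted by browsingContext.navigate.
VALID_WAIT = {"none", "interactive", "complete"}


def build_navigate(command_id, context, url, wait="complete"):
    """Build a browsingContext.navigate command as a dict."""
    if wait not in VALID_WAIT:
        raise ValueError(f"unsupported wait value: {wait!r}")
    return {
        "id": command_id,
        "method": "browsingContext.navigate",
        "params": {"context": context, "url": url, "wait": wait},
    }


def describe_bidi_message(raw):
    """Summarize a message received from the browser, based on its type."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "success":
        return f"command {msg['id']} succeeded"
    if kind == "error":
        return f"command {msg.get('id')} failed: {msg.get('error')}"
    if kind == "event":
        return f"event {msg['method']}"
    raise ValueError(f"unexpected message type: {kind!r}")


print(json.dumps(build_navigate(1, "abc123", "https://example.com")))
print(describe_bidi_message('{"id": 1, "type": "success", "result": {}}'))
```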


BiDi groups its commands and events into modules, the counterpart of CDP's domains but with a smaller, more consistently named surface:

  • browsingContext — navigation, contexts (like tabs/frames)

  • network — request/response interception

  • script — JavaScript execution and evaluation

  • log — console output capture

  • session — capabilities and session management


BiDi vs CDP for Scrapers: The Practical Differences

Feature              | CDP                       | BiDi
---------------------|---------------------------|----------------------------------
Browser support      | Chromium only             | Chrome, Firefox, Safari (partial)
Specification        | Informal, Google-driven   | W3C standard
Request interception | Mature, well-documented   | Mature in Playwright 1.44+
Low-level control    | More granular             | More abstracted
Stability            | Changes with Chrome       | Versioned standard
Fingerprint exposure | CDP connection detectable | BiDi detectable differently

For scraping in 2026, the choice is mostly made for you: Playwright abstracts the protocol away. The key case where the protocol matters is detection, because CDP use leaves traces.


Detection Through Protocol Artifacts

This is the part that matters for anti-bot evasion. CDP leaves artifacts that detection systems look for.


The CDP Detection Vector

When a browser is controlled via CDP, several signals can leak:

Runtime bindings. Playwright injects properties such as __playwright__binding__ into the page context. These are visible to JavaScript running on the page.

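CDP WebSocket connection. Page JavaScript cannot see the DevTools WebSocket itself, but it can observe side effects of an attached client. The best-known case involves the Runtime domain: once Runtime.enable has been called, objects passed to console methods are serialized for the client, and a page can notice this by logging an object with a getter and checking whether the getter ran.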

window.__cdp and related properties. Certain CDP operations populate internal properties that are not present in non-automated browsers.

rebrowser-patches (an open-source set of patches for Puppeteer and Playwright) specifically targets these detection vectors:


// What rebrowser-patches removes/patches:
// 1. The CDP binding injection pattern
// 2. The Runtime.enable domain exposure
// 3. Page.addScriptToEvaluateOnNewDocument artifacts
// 4. navigator.webdriver = true



BiDi Detection Differences

BiDi connections are detectable through different mechanisms, primarily the WebSocket upgrade request pattern and the specific headers used. Firefox automation via BiDi has a different fingerprint than Chrome automation via CDP, and anti-bot systems are trained on both.

The practical upshot: no automation protocol is invisible. The goal isn't to make the protocol undetectable; it's to make the browser state (fingerprint, behavior, cookies, IP) legitimate enough that the detection system doesn't investigate whether you're automated.


Useful CDP Patterns for Scrapers


Blocking Unnecessary Resources

Reduce bandwidth and increase speed by blocking images, fonts, and tracking scripts:


async def setup_resource_blocking(page, client):
    await client.send("Network.enable")
    await client.send("Network.setBlockedURLs", {
        "urls": [
            "*.png", "*.jpg", "*.gif", "*.svg", "*.webp",  # Images
            "*.woff", "*.woff2", "*.ttf",                   # Fonts
            "*.google-analytics.com/*",                     # Analytics
            "*.doubleclick.net/*",                          # Ads
        ]
    })


This can cut page load time substantially, by roughly 50-70% on image-heavy pages where you only need the data; the exact saving depends on the site.
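
Before shipping a block list, it helps to preview which URLs it would catch. The sketch below approximates the wildcard matching with a regular expression in which "*" matches any run of characters and the pattern must cover the whole URL. That matching rule is an assumption made for illustration, not Chrome's documented algorithm, so confirm the result against real traffic; compile_block_pattern and is_blocked are hypothetical helpers.

```python
import re


def compile_block_pattern(pattern):
    """Translate a '*' wildcard pattern into a regular expression."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


def is_blocked(url, patterns):
    """Return True if any pattern matches the entire URL."""
    return any(compile_block_pattern(p).fullmatch(url) for p in patterns)


patterns = ["*.png", "*.woff2", "*.google-analytics.com/*"]
for url in [
    "https://shop.example/img/hero.png",
    "https://shop.example/api/products",
    "https://www.google-analytics.com/collect",
    "https://shop.example/img/hero.png?v=2",
]:
    print(is_blocked(url, patterns), url)
```

Under this approximation a pattern like "*.png" misses image URLs that carry a query string, which is one reason to test a list against captured requests rather than trusting it by inspection.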


Extracting Full Network HAR

For API discovery (pre-scraper research):


async def capture_har(page, client):
    """Collect request/response metadata for API discovery (a HAR-like log, not a full HAR file)."""
    await client.send("Network.enable")
    
    entries = []
    
    def on_request(event):
        entries.append({
            'type': 'request',
            'url': event['request']['url'],
            'method': event['request']['method'],
            'headers': event['request']['headers'],
            'requestId': event['requestId'],
        })

    def on_response(event):
        entries.append({
            'type': 'response',
            'url': event['response']['url'],
            'status': event['response']['status'],
            'headers': dict(event['response']['headers']),
            'requestId': event['requestId'],
        })

    client.on("Network.requestWillBeSent", on_request)
    client.on("Network.responseReceived", on_response)

    await page.goto("https://target.com/product/123")
    await page.wait_for_load_state("networkidle")

    # Find API endpoints
    api_calls = [e for e in entries if '/api/' in e.get('url', '')]
    return api_calls
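
The list built above keeps requests and responses as separate entries. For API discovery it is usually easier to read one record per call, joined on requestId. A minimal sketch, assuming entries shaped like the dictionaries appended in on_request and on_response; pair_entries is a hypothetical helper.

```python
def pair_entries(entries):
    """Merge 'request' and 'response' entries that share a requestId."""
    paired = {}
    for entry in entries:
        record = paired.setdefault(
            entry["requestId"],
            {"requestId": entry["requestId"], "url": entry["url"],
             "method": None, "status": None},
        )
        if entry["type"] == "request":
            record["method"] = entry["method"]
            record["url"] = entry["url"]
        elif entry["type"] == "response":
            record["status"] = entry["status"]
    return list(paired.values())


# Hand-written sample in the same shape as the captured entries.
sample = [
    {"type": "request", "requestId": "1", "url": "https://target.com/api/v1/items", "method": "GET"},
    {"type": "request", "requestId": "2", "url": "https://target.com/api/v1/cart", "method": "POST"},
    {"type": "response", "requestId": "1", "url": "https://target.com/api/v1/items", "status": 200},
]
for record in pair_entries(sample):
    print(record["method"], record["status"], record["url"])
```

A status of None means no response was recorded for that request, for example because it was blocked or was still in flight when capture stopped.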



The Level of Abstraction to Work At

For 95% of scraping tasks, Playwright's high-level API is the right level. It handles CDP/BiDi session management, multi-frame handling, and the edge cases you don't want to debug yourself.

Drop to the CDP level when:

  • You need to intercept and modify requests before they're sent

  • You need to set browser properties that Playwright doesn't expose directly (advanced emulation, specific Chrome flags)

  • You're doing API discovery on a new target (HAR capture)

  • You're debugging a detection issue and need to understand exactly what the browser is exposing

The protocol is the plumbing. Know it exists, understand when to touch it, and let Playwright handle it the rest of the time.

Author

The Scraper

Engineer and Webscraping Specialist

About Author

The Scraper is a software engineer and web scraping specialist, focused on building production-grade data extraction systems. His work centers on large-scale crawling, anti-bot evasion, proxy infrastructure, and browser automation. He writes about real-world scraping failures, silent data corruption, and systems that operate at scale.
