Powerful C# Web Scraping: Tools, Proxies, & Techniques

Sarah Whitmore

Last edited on May 4, 2025

Scraping Techniques

Diving into C# Web Scraping: Tools and Tactics

Leveraging C# for web scraping opens up a world of possibilities for businesses. In today's data-driven landscape, the ability to automatically gather and analyze web data is a significant advantage. Think price tracking, competitor analysis, market trend monitoring, or aggregating customer feedback – web scraping makes it feasible at scale.

And guess what? Using C# and the .NET ecosystem for these tasks can be surprisingly straightforward and efficient. We're going to walk through how you can scrape websites using C# paired with a powerful technique: controlling headless browsers. We'll cover everything from setting up your environment to pulling out the data you need.

Crucially, we'll also explore how to perform web scraping responsibly and effectively, minimizing the chances of encountering blocks. Ready to dive in?

Is C# a Solid Choice for Web Scraping?

Absolutely. C# offers a compelling combination of performance and ease of use, making it a strong contender for web scraping projects. It's built on the robust .NET platform, known for its speed and efficiency. Plus, C# boasts a large, active community. This means plenty of libraries, documentation, and online discussions are available, so if you hit a snag, chances are someone else has already figured it out.

Choosing Your C# Web Scraping Toolkit

When it comes to scraping the web with C#, you have several approaches. Let's look at the common ones:

  • Raw HTTP Requests + Regex: You could use libraries like HttpClient to fetch raw HTML and then parse it with regular expressions. However, Regex can be notoriously tricky to write correctly and extremely brittle – minor changes to a website's structure can break your scraper.

  • HTTP Requests + HTML Parsers: A step up involves fetching HTML and then using a dedicated parser library like Html Agility Pack. While better than Regex, these parsers often struggle with modern websites that rely heavily on JavaScript to load content dynamically. They essentially analyze only the initial HTML source, potentially missing crucial data loaded later (see the sketch after this list).

  • Targeting APIs or XHR Requests: Sometimes, websites load data through background API calls (XHR). If you can identify and replicate these requests, it can be very efficient. However, APIs aren't always available, documented, or stable, and figuring them out requires some reverse-engineering.

  • Headless Browsers: This is often the most robust and reliable method, especially for complex, JavaScript-heavy sites. Libraries like Playwright or PuppeteerSharp allow you to programmatically control a real browser (like Chrome, Firefox, or WebKit) behind the scenes.
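
To ground the first two approaches, here's a minimal sketch of the HTTP-plus-parser route using HttpClient and Html Agility Pack. Treat the URL and XPath query as placeholders for your target site:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // install via: dotnet add package HtmlAgilityPack

class ParserDemo
{
    static async Task Main()
    {
        // Fetch the raw HTML. This returns only the initial source,
        // not content rendered later by JavaScript.
        using var http = new HttpClient();
        string html = await http.GetStringAsync("https://example.com/");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Query the static HTML with XPath (placeholder query)
        var heading = doc.DocumentNode.SelectSingleNode("//h1");
        Console.WriteLine(heading?.InnerText ?? "No <h1> found");
    }
}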

Using a headless browser means your code interacts with the website just like a real user would: it runs JavaScript, handles dynamic content, allows clicks, form submissions, and scrolling. This dramatically increases the chances of successfully extracting the data you need. For C#, the standout library in this category is Playwright for .NET.

A Quick Word on Playwright vs. PuppeteerSharp

Both Playwright and PuppeteerSharp are capable .NET libraries for controlling headless browsers. PuppeteerSharp is a port of the popular Node.js Puppeteer library. Playwright, backed by Microsoft, offers a slightly broader API, supports multiple browser engines (Chromium, Firefox, WebKit) out-of-the-box, and is often considered easier to get started with, especially within the .NET ecosystem. That's why we'll be focusing on Playwright for our examples.

Staying Under the Radar: Avoiding Blocks

So, can websites tell you're scraping them? Sometimes, yes. While web scraping itself is generally legal for publicly accessible data, many websites employ measures to detect and block automated access to protect their resources or data.

Detection often relies on two things:

  1. Suspicious Request Signatures: Simple HTTP clients might send requests lacking standard browser headers (like User-Agent). Headless browsers like Playwright mitigate this significantly because they use a real browser engine, sending typical headers.

  2. Unusual Browsing Patterns: Making hundreds of requests from the same IP address in a short period is a dead giveaway. Visiting pages in a robotic sequence or hitting specific endpoints repeatedly can also trigger alarms.

The most effective way to handle the IP address issue is by using proxies. Proxies act as intermediaries, masking your real IP address. By routing your requests through different proxy IPs, each connection appears to come from a unique visitor, making it much harder for websites to identify and block your scraping activity based on volume.

This is where services like Evomi come in. We provide ethically sourced residential proxies that offer IP addresses from real devices worldwide. This makes your requests look indistinguishable from genuine user traffic. Evomi provides high-quality proxies with reliable connections and excellent support, all based out of Switzerland, known for its quality standards. You can even try our residential, mobile, or datacenter proxies with a completely free trial to see how they work.

When you use a proxy service, you'll typically get connection details (endpoint address, port, username, password) to configure in your scraper.
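
For context, here's roughly how those details would plug into a plain HttpClient setup (a minimal sketch; the endpoint and credentials below are placeholders). We'll pass the same kind of details to Playwright in the example that follows:

using System.Net;
using System.Net.Http;

// Placeholder proxy endpoint and credentials - substitute your provider's values
var handler = new HttpClientHandler
{
    Proxy = new WebProxy("http://proxy.example.com:1000")
    {
        Credentials = new NetworkCredential("PROXY_USER", "PROXY_PASS")
    }
};
using var client = new HttpClient(handler);
// Requests sent through 'client' now exit via the proxy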

Let's Scrape! C# and Playwright in Action

Okay, theory time is over. Let's get practical.

Step 1: Setting Up Your C# Scraping Project

First, you'll need a development environment. If you don't have one, Visual Studio (Community Edition is free) or Visual Studio Code with the C# Dev Kit are excellent choices for Windows, macOS, or Linux.

1. Create a new C# project. A Console App targeting a modern .NET version (such as .NET 8) is a good starting point for a simple scraper.

2. Add the Playwright library. You can do this via the NuGet Package Manager in Visual Studio (Tools > NuGet Package Manager > Manage NuGet Packages for Solution...) or by using the .NET CLI:
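
dotnet add package Microsoft.Playwright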

3. Install the necessary browser binaries. Playwright needs these to function. Run the following command in your project's terminal (you might need PowerShell):
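
dotnet build
pwsh bin/Debug/netX.Y/playwright.ps1 install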

(Replace netX.Y with your project's target framework, e.g., net8.0). This command downloads the browser engines Playwright controls.

Step 2: Basic Scraping and Proxy Integration

Now let's write some code. Open your Program.cs file and replace its contents with something like this:

using System;
using System.Threading.Tasks;
using Microsoft.Playwright;

class Program
{
    static async Task Main(string[] args)
    {
        // --- Evomi Proxy Configuration ---
        // Replace with your actual Evomi credentials
        var proxySettings = new Proxy
        {
            Server = "rp.evomi.com:1000", // Example: Evomi Residential HTTP endpoint
            Username = "YOUR_EVOMI_USERNAME",
            Password = "YOUR_EVOMI_PASSWORD"
        };
        // --- End Proxy Configuration ---

        Console.WriteLine("Initializing Playwright...");
        using var playwright = await Playwright.CreateAsync();

        Console.WriteLine("Launching browser with proxy settings...");
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true, // Set to false to see the browser window
            Proxy = proxySettings
        });

        Console.WriteLine("Opening new page...");
        var page = await browser.NewPageAsync();

        string targetUrl = "https://geo.evomi.com/"; // Target site to check IP
        Console.WriteLine($"Navigating to {targetUrl}...");
        await page.GotoAsync(targetUrl);

        Console.WriteLine($"Page loaded: {await page.TitleAsync()}");

        // Example: Extracting the displayed IP address
        var ipElement = page.Locator("#ip-address"); // Adjust selector based on geo.evomi.com's structure
        string displayedIp = await ipElement.InnerTextAsync();
        Console.WriteLine($"IP address detected by website: {displayedIp}");

        // Optional: Take a screenshot
        string screenshotPath = "website_screenshot.png";
        await page.ScreenshotAsync(new PageScreenshotOptions
        {
            Path = screenshotPath
        });
        Console.WriteLine($"Screenshot saved to {screenshotPath}");

        Console.WriteLine("Closing browser...");
        await browser.CloseAsync();

        Console.WriteLine("Scraping task finished.");
    }
}

Breaking Down the Code:

  • It starts by defining the proxy configuration using Evomi's residential proxy endpoint (`rp.evomi.com:1000` for HTTP). Remember to replace the placeholders with your actual Evomi credentials.

  • Playwright.CreateAsync() initializes the Playwright environment.

  • playwright.Chromium.LaunchAsync() starts a Chromium browser instance. We pass our proxySettings here. Headless = true means the browser runs in the background; set it to false if you want to see the browser window pop up.

  • browser.NewPageAsync() opens a new tab.

  • page.GotoAsync() navigates the tab to the specified URL (we're using Evomi's IP checker tool as an example).

  • page.Locator("#ip-address").InnerTextAsync() finds an element with the ID `ip-address` (you'd need to inspect `geo.evomi.com` to find the correct selector for the IP display) and extracts its text content.

  • page.ScreenshotAsync() saves a screenshot of the current view.

  • Finally, it closes the browser and finishes.

When you run this code, the output should show the IP address of the Evomi proxy server, not your own machine's IP. This confirms the proxy connection is working!
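
To try it yourself, build and run the project from its folder:

dotnet run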

Step 3: Capturing Screenshots

As seen above, taking a basic screenshot is simple:

await page.ScreenshotAsync(new()
{
    Path = "page_capture.jpg"
});

But Playwright offers more control:

  • Full Page Screenshot: Capture the entire scrollable page, not just the visible part.

    await page.ScreenshotAsync(new()
    {
        Path = "full_page.png",
        FullPage = true
    });
  • Element Screenshot: Capture only a specific element.

    // Example: Screenshot only the element with id 'main-content'
    await page.Locator("#main-content").ScreenshotAsync(new()
    {
        Path = "element_capture.png"
    });

Step 4: Extracting Data with Locators

Screenshots are nice, but usually, you want the actual data. Playwright's `Locator` system is key here. Locators define how to find elements on the page.

You can select elements using various strategies:

  • Text Content

  • CSS Selectors

  • XPath Expressions

  • Combinations and filtering

Text Locators

Select elements based on the text they contain. This is often resilient to minor HTML structure changes.

// Find an element containing the text "Submit" and click it
// (unquoted text= matching is case-insensitive and matches substrings)
await page.Locator("text=Submit").ClickAsync();

// Quoting the text inside the selector requires an exact, case-sensitive match
await page.Locator("'Submit'").ClickAsync();

// Find any element containing the text "product details" (case-insensitive)
var detailsLocator = page.Locator("text=/product details/i");
string detailsText = await detailsLocator.InnerTextAsync();

CSS Selectors

Use standard CSS selectors, from simple IDs and classes to complex combinations.

// Get text from the first H2 heading inside an element with class 'product-info'
string title = await page.Locator(".product-info h2").First.TextContentAsync() ?? "Title not found";

// Find all list items within an ordered list with id 'results'
var listItems = page.Locator("ol#results li");
int count = await listItems.CountAsync();
Console.WriteLine($"Found {count} result items.");

// Get the 'href' attribute of a link with class 'download-link'
string? downloadUrl = await page.Locator("a.download-link").GetAttributeAsync("href");

XPath Selectors

XPath provides another powerful way to navigate the HTML structure.

// Select an input field whose 'name' attribute is 'username'
var usernameInput = page.Locator("xpath=//input[@name='username']");
await usernameInput.FillAsync("myUser");

// Find the second paragraph tag on the page
string secondParagraph = await page.Locator("xpath=(//p)[2]").TextContentAsync() ?? "";

You can often get the XPath or CSS selector for an element by using your browser's developer tools (right-click the element -> Inspect -> right-click in the Elements panel -> Copy -> Copy selector / Copy XPath).
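
Combining and Filtering Locators

The strategy list above also mentioned combinations and filtering. As a brief sketch (the selectors and text below are placeholders), Playwright's locators can be chained and narrowed with Filter:

// Within '#results', find rows, then keep only those containing "In Stock"
var inStockRows = page.Locator("#results .row")
    .Filter(new() { HasText = "In Stock" });
Console.WriteLine($"In-stock rows: {await inStockRows.CountAsync()}");

// Chaining scopes a search: grab the link inside the first in-stock row
var firstLink = inStockRows.First.Locator("a");
string? href = await firstLink.GetAttributeAsync("href");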

Wrapping Up

We've journeyed through the essentials of web scraping using C# and the Playwright library. You've seen how to set up your project, control a headless browser, integrate proxies like those from Evomi to avoid detection, capture screenshots, and extract specific data using various locators.

C# provides a performant and developer-friendly platform for these tasks, and Playwright makes interacting with modern web pages remarkably robust. Keep experimenting, respect website terms of service, and happy scraping!


Author

Sarah Whitmore

Digital Privacy & Cybersecurity Consultant

About Author

Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.
