Powerful C# Web Scraping: Tools, Proxies, & Techniques





Sarah Whitmore
Scraping Techniques
Diving into C# Web Scraping: Tools and Tactics
Leveraging C# for web scraping opens up a world of possibilities for businesses. In today's data-driven landscape, the ability to automatically gather and analyze web data is a significant advantage. Think price tracking, competitor analysis, market trend monitoring, or aggregating customer feedback – web scraping makes it feasible at scale.
And guess what? Using C# and the .NET ecosystem for these tasks can be surprisingly straightforward and efficient. We're going to walk through how you can scrape websites using C# paired with a powerful technique: controlling headless browsers. We'll cover everything from setting up your environment to pulling out the data you need.
Crucially, we'll also explore how to perform web scraping responsibly and effectively, minimizing the chances of encountering blocks. Ready to dive in?
Is C# a Solid Choice for Web Scraping?
Absolutely. C# offers a compelling combination of performance and ease of use, making it a strong contender for web scraping projects. It's built on the robust .NET platform, known for its speed and efficiency. Plus, C# boasts a large, active community. This means plenty of libraries, documentation, and online discussions are available, so if you hit a snag, chances are someone else has already figured it out.
Choosing Your C# Web Scraping Toolkit
When it comes to scraping the web with C#, you have several approaches. Let's look at the common ones:
Raw HTTP Requests + Regex: You could use a library like HttpClient to fetch raw HTML and then parse it with regular expressions. However, Regex can be notoriously tricky to write correctly and extremely brittle – minor changes to a website's structure can break your scraper.
HTTP Requests + HTML Parsers: A step up involves fetching HTML and then using a dedicated parser library like Html Agility Pack (see the short sketch after this list). While better than Regex, these parsers often struggle with modern websites that rely heavily on JavaScript to load content dynamically. They essentially analyze the initial HTML source, potentially missing crucial data loaded later.
Targeting APIs or XHR Requests: Sometimes, websites load data through background API calls (XHR). If you can identify and replicate these requests, it can be very efficient. However, APIs aren't always available, documented, or stable, and figuring them out requires some reverse-engineering.
Headless Browsers: This is often the most robust and reliable method, especially for complex, JavaScript-heavy sites. Libraries like Playwright or PuppeteerSharp allow you to programmatically control a real browser (like Chrome, Firefox, or WebKit) behind the scenes.
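Before we dig into headless browsers, here's a minimal sketch of the parser approach (the second option) using the Html Agility Pack NuGet package. The URL and XPath query are placeholders, not part of any real site:

using System;
using HtmlAgilityPack;

class ParserSketch
{
    static void Main()
    {
        // HtmlWeb fetches the page and parses it in one step
        var web = new HtmlWeb();
        var doc = web.Load("https://example.com"); // placeholder URL

        // Queries run against the initial HTML source only;
        // content rendered later by JavaScript won't be found
        var headings = doc.DocumentNode.SelectNodes("//h2");
        if (headings != null)
        {
            foreach (var heading in headings)
                Console.WriteLine(heading.InnerText.Trim());
        }
    }
}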
Using a headless browser means your code interacts with the website just like a real user would: it runs JavaScript, handles dynamic content, allows clicks, form submissions, and scrolling. This dramatically increases the chances of successfully extracting the data you need. For C#, the standout library in this category is Playwright for .NET.
A Quick Word on Playwright vs. PuppeteerSharp
Both Playwright and PuppeteerSharp are capable .NET libraries for controlling headless browsers. PuppeteerSharp is a port of the popular Node.js Puppeteer library. Playwright, backed by Microsoft, offers a slightly broader API, supports multiple browser engines (Chromium, Firefox, WebKit) out-of-the-box, and is often considered easier to get started with, especially within the .NET ecosystem. That's why we'll be focusing on Playwright for our examples.
Staying Under the Radar: Avoiding Blocks
So, can websites tell you're scraping them? Sometimes, yes. While web scraping itself is generally legal for publicly accessible data, many websites employ measures to detect and block automated access to protect their resources or data.
Detection often relies on two things:
Suspicious Request Signatures: Simple HTTP clients might send requests lacking standard browser headers (like User-Agent). Headless browsers like Playwright mitigate this significantly because they use a real browser engine, sending typical headers.
Unusual Browsing Patterns: Making hundreds of requests from the same IP address in a short period is a dead giveaway. Visiting pages in a robotic sequence or hitting specific endpoints repeatedly can also trigger alarms.
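On the pattern side, one simple (if partial) mitigation is to pace your requests with randomized delays. A one-line sketch for use inside an async scraping loop:

// Wait a random 2-6 seconds between page visits to look less robotic
await Task.Delay(Random.Shared.Next(2000, 6000));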
The most effective way to handle the IP address issue is by using proxies. Proxies act as intermediaries, masking your real IP address. By routing your requests through different proxy IPs, each connection appears to come from a unique visitor, making it much harder for websites to identify and block your scraping activity based on volume.
This is where services like Evomi come in. We provide ethically sourced residential proxies that offer IP addresses from real devices worldwide. This makes your requests look indistinguishable from genuine user traffic. Evomi provides high-quality proxies with reliable connections and excellent support, all based out of Switzerland, known for its quality standards. You can even try our residential, mobile, or datacenter proxies with a completely free trial to see how they work.
When you use a proxy service, you'll typically get connection details (endpoint address, port, username, password) to configure in your scraper.
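For reference, if you were scraping with plain HttpClient rather than a headless browser, those details plug in like this. This is a minimal sketch using the same example endpoint as the Playwright code below, with placeholder credentials:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class ProxySketch
{
    static async Task Main()
    {
        // Placeholder proxy details -- substitute your provider's values
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy("http://rp.evomi.com:1000")
            {
                Credentials = new NetworkCredential("YOUR_USERNAME", "YOUR_PASSWORD")
            },
            UseProxy = true
        };

        using var client = new HttpClient(handler);
        // Every request now goes out through the proxy
        string body = await client.GetStringAsync("https://geo.evomi.com/");
        Console.WriteLine($"Fetched {body.Length} characters through the proxy.");
    }
}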
Let's Scrape! C# and Playwright in Action
Okay, theory time is over. Let's get practical.
Step 1: Setting Up Your C# Scraping Project
First, you'll need a development environment. If you don't have one, Visual Studio (Community Edition is free) or Visual Studio Code with the C# Dev Kit are excellent choices for Windows, macOS, or Linux.
1. Create a new C# project. A Console App targeting modern .NET (e.g., .NET 8) is a good starting point for a simple scraper.
2. Add the Playwright library. You can do this via the NuGet Package Manager in Visual Studio (Tools > NuGet Package Manager > Manage NuGet Packages for Solution...) or by using the .NET CLI:
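dotnet add package Microsoft.Playwright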
3. Install the necessary browser binaries. Playwright needs these to function. Build the project once (this generates the install script), then run it from your project's terminal (you'll need PowerShell):
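dotnet build
pwsh bin/Debug/netX.Y/playwright.ps1 install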
(Replace netX.Y with your project's target framework, e.g., net8.0.) The install script downloads the browser engines Playwright controls.
Step 2: Basic Scraping and Proxy Integration
Now let's write some code. Open your Program.cs file and replace its contents with something like this:
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;

class Program
{
    static async Task Main(string[] args)
    {
        // --- Evomi Proxy Configuration ---
        // Replace with your actual Evomi credentials
        var proxySettings = new Proxy
        {
            Server = "rp.evomi.com:1000", // Example: Evomi Residential HTTP endpoint
            Username = "YOUR_EVOMI_USERNAME",
            Password = "YOUR_EVOMI_PASSWORD"
        };
        // --- End Proxy Configuration ---

        Console.WriteLine("Initializing Playwright...");
        using var playwright = await Playwright.CreateAsync();

        Console.WriteLine("Launching browser with proxy settings...");
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true, // Set to false to see the browser window
            Proxy = proxySettings
        });

        Console.WriteLine("Opening new page...");
        var page = await browser.NewPageAsync();

        string targetUrl = "https://geo.evomi.com/"; // Target site to check IP
        Console.WriteLine($"Navigating to {targetUrl}...");
        await page.GotoAsync(targetUrl);
        Console.WriteLine($"Page loaded: {await page.TitleAsync()}");

        // Example: Extracting the displayed IP address
        var ipElement = page.Locator("#ip-address"); // Adjust selector based on geo.evomi.com's structure
        string displayedIp = await ipElement.InnerTextAsync();
        Console.WriteLine($"IP address detected by website: {displayedIp}");

        // Optional: Take a screenshot
        string screenshotPath = "website_screenshot.png";
        await page.ScreenshotAsync(new PageScreenshotOptions
        {
            Path = screenshotPath
        });
        Console.WriteLine($"Screenshot saved to {screenshotPath}");

        Console.WriteLine("Closing browser...");
        await browser.CloseAsync();
        Console.WriteLine("Scraping task finished.");
    }
}
Breaking Down the Code:
It starts by defining the proxy configuration using Evomi's residential proxy endpoint (`rp.evomi.com:1000` for HTTP). Remember to replace the placeholders with your actual Evomi credentials.
`Playwright.CreateAsync()` initializes the Playwright environment.
`playwright.Chromium.LaunchAsync()` starts a Chromium browser instance. We pass our `proxySettings` here. `Headless = true` means the browser runs in the background; set it to `false` if you want to see the browser window pop up.
`browser.NewPageAsync()` opens a new tab.
`page.GotoAsync()` navigates the tab to the specified URL (we're using Evomi's IP checker tool as an example).
`page.Locator("#ip-address").InnerTextAsync()` finds an element with the ID `ip-address` (you'd need to inspect `geo.evomi.com` to find the correct selector for the IP display) and extracts its text content.
`page.ScreenshotAsync()` saves a screenshot of the current view.
Finally, it closes the browser and finishes.
When you run this code, the output should show the IP address of the Evomi proxy server, not your own machine's IP. This confirms the proxy connection is working!
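To try it, run the project from its directory with the standard .NET CLI command:

dotnet run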
Step 3: Capturing Screenshots
As seen above, taking a basic screenshot is simple:
await page.ScreenshotAsync(new()
{
    Path = "page_capture.jpg"
});
But Playwright offers more control:
Full Page Screenshot: Capture the entire scrollable page, not just the visible part.
await page.ScreenshotAsync(new() { Path = "full_page.png", FullPage = true });
Element Screenshot: Capture only a specific element.
// Example: Screenshot only the element with id 'main-content'
await page.Locator("#main-content").ScreenshotAsync(new() { Path = "element_capture.png" });
Step 4: Extracting Data with Locators
Screenshots are nice, but usually, you want the actual data. Playwright's `Locator` system is key here. Locators define how to find elements on the page.
You can select elements using various strategies:
Text Content
CSS Selectors
XPath Expressions
Combinations and filtering
Text Locators
Select elements based on the text they contain. This is often resilient to minor HTML structure changes.
// Find a button with the exact text "Submit" and click it
await page.Locator("text=Submit").ClickAsync();
// Shortcut using single quotes inside double quotes (assumed text selector)
await page.Locator("'Submit'").ClickAsync();
// Find any element containing the text "product details" (case-insensitive)
var detailsLocator = page.Locator("text=/product details/i");
string detailsText = await detailsLocator.InnerTextAsync();
CSS Selectors
Use standard CSS selectors, from simple IDs and classes to complex combinations.
// Get text from the first H2 heading inside an element with class 'product-info'
string title = await page.Locator(".product-info h2").First.TextContentAsync() ?? "Title not found";
// Find all list items within an ordered list with id 'results'
var listItems = page.Locator("ol#results li");
int count = await listItems.CountAsync();
Console.WriteLine($"Found {count} result items.");
// Get the 'href' attribute of a link with class 'download-link'
string? downloadUrl = await page.Locator("a.download-link").GetAttributeAsync("href");
XPath Selectors
XPath provides another powerful way to navigate the HTML structure.
// Select an input field whose 'name' attribute is 'username'
var usernameInput = page.Locator("xpath=//input[@name='username']");
await usernameInput.FillAsync("myUser");
// Find the second paragraph tag on the page
string secondParagraph = await page.Locator("xpath=(//p)[2]").TextContentAsync() ?? "";
You can often get the XPath or CSS selector for an element by using your browser's developer tools (right-click the element -> Inspect -> right-click in the Elements panel -> Copy -> Copy selector / Copy XPath).
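Combining and Filtering Locators
Finally, the fourth strategy from the list above: locators can be chained and filtered to narrow a match. A short sketch, where the class names and text are placeholders:

// Chain locators: match buttons only inside the product grid
var gridButtons = page.Locator(".product-grid").Locator("button");
await gridButtons.First.ClickAsync();

// Filter a set of elements by the text they contain
var saleItems = page.Locator(".product-card").Filter(new() { HasText = "Sale" });
Console.WriteLine($"Products on sale: {await saleItems.CountAsync()}");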
Wrapping Up
We've journeyed through the essentials of web scraping using C# and the Playwright library. You've seen how to set up your project, control a headless browser, integrate proxies like those from Evomi to avoid detection, capture screenshots, and extract specific data using various locators.
C# provides a performant and developer-friendly platform for these tasks, and Playwright makes interacting with modern web pages remarkably robust. Keep experimenting, respect website terms of service, and happy scraping!

Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
About Author
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.