Scrape Websites With cURL and Proxies (Live Examples)

Michael Chen

Last edited on May 4, 2025

Scraping Techniques

Getting Started with Web Scraping using cURL and Proxies

Using cURL with a proxy is a straightforward way to gather web data without running into immediate blocks. It’s a classic combination for a reason.

cURL itself is a remarkably versatile command-line tool. It's flexible, widely supported across operating systems and programming environments, and relatively easy to pick up.

This makes cURL a solid choice for web scraping tasks. However, sending repeated requests from your own IP address using basic cURL commands is a surefire way to get flagged or blocked by websites.

So, today we'll explore how to effectively use cURL for scraping, focusing on integrating proxies to keep your operations smooth and undetected. We'll cover the basics of cURL, why you'd choose it, how to configure proxy authentication, and other techniques to mask your scraping activity.

We'll provide practical examples you can test yourself. Let’s dive in!

What Exactly is cURL?

cURL, which stands for Client URL, is fundamentally a tool for transferring data with URLs. It lets you send requests to a server, specify data or headers, and retrieve the response, much like a web browser does under the hood, but from your command line.

You can perform various HTTP actions: fetch a webpage's HTML, interact with an API endpoint, submit form data, and more. For insights on specific tasks like file transfers, check out our guide on how to download files using cURL with proxies.

Its power in web scraping comes from this direct interaction capability and its extensive set of options (over 200 flags!) that allow fine-grained control over requests.

A simple command might look like this, fetching the content of a test site:

curl http://httpbin.org/html

This returns the basic HTML structure of that page.

But cURL handles complexity too. You can combine flags for more advanced scenarios, like fetching multiple sequentially numbered pages through a specific proxy server:

curl \
  -U your_username:your_password \
  -x http://rp.evomi.com:1000 \
  http://example.com/page/[1-5]

(Remember to replace your_username:your_password with your actual Evomi credentials and adjust the proxy endpoint/port if needed, e.g., `dc.evomi.com:2000` for datacenter proxies).

This command attempts to fetch pages 1 through 5 from `example.com`, routing each request through the specified Evomi residential proxy (`rp.evomi.com` on port `1000`). This technique of using ranges in the URL is known as "URL globbing".
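
If you'd rather save each page to its own file than dump everything to the terminal, cURL can substitute the current glob value into the output filename using `#1` together with the `-o` flag. Here's a minimal sketch; the credentials and target site are placeholders:

# Fetch pages 1 through 5 via the proxy and write each response
# to its own file (page_1.html, page_2.html, ...).
curl \
  -U your_username:your_password \
  -x http://rp.evomi.com:1000 \
  "http://example.com/page/[1-5]" \
  -o "page_#1.html"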

It's crucial to understand that cURL is a data transfer tool, not a web browser. It won't parse HTML, execute JavaScript, or render pages. It simply fetches the raw response from the server.

Therefore, cURL excels at scraping data from:

  • Static HTML websites (where content doesn't rely heavily on JavaScript).

  • Backend API endpoints (often returning structured data like JSON; see the quick example after this list).

  • Direct XHR (XMLHttpRequest) requests discovered via browser developer tools.
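
As a quick illustration of the API case, httpbin's `/json` endpoint returns structured data that needs no HTML parsing or rendering at all:

# Fetch a sample JSON endpoint directly; the response is machine-readable as-is.
curl --silent https://httpbin.org/json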

How to Configure Proxy Access in cURL Commands

When using proxies with cURL, especially authenticated ones like those from Evomi, you need to provide the connection details. There are a few common ways to do this:

  1. Directly within the cURL command itself.

  2. Using environment variables or command aliases.

  3. Through a dedicated cURL configuration file (`.curlrc`).

Using a reliable proxy service is key to avoiding blocks during scraping. Evomi offers robust residential proxies sourced ethically, alongside datacenter, mobile, and static ISP options, all designed for performance and reliability. We even offer free trials for residential, mobile and datacenter proxies if you want to test them out.

The most basic method for using an authenticated proxy in cURL involves the `-x` (or `--proxy`) flag for the proxy address and port, and the `-U` (or `--proxy-user`) flag for the username and password.

Here's the syntax using the short flags:

curl \
  -U your_username:your_password \
  -x http://rp.evomi.com:1000 \
  https://api.ipify.org?format=json

And here's the equivalent using the long flags:

curl \
  --proxy-user your_username:your_password \
  --proxy http://rp.evomi.com:1000 \
  https://api.ipify.org?format=json

Both commands achieve the same result: fetching your public IP address (as seen by the target server) via the specified Evomi residential proxy. This confirms the proxy is working.
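
If the proxy is working, the response should show an IP address from the proxy pool rather than your own, along these lines (the address below is just an illustrative placeholder):

{"ip":"203.0.113.42"}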

While simple, embedding credentials directly in commands can be repetitive and might expose them in your command history or process list, which could be a security concern on shared systems. Let's look at more secure and convenient methods.

Can cURL Use Proxy Environment Variables?

Yes, cURL automatically checks for specific environment variables to configure proxy settings. This is a convenient way to set your proxy details once per shell session, or even persistently.

You typically set these using the `export` command in Bash-compatible shells. The relevant variables are `http_proxy` for HTTP URLs and `https_proxy` for HTTPS URLs.

Here’s how you might set them up for an Evomi residential proxy:

export http_proxy="http://your_username:your_password@rp.evomi.com:1000"
export https_proxy="http://your_username:your_password@rp.evomi.com:1000"

# Verify they are set (optional)
echo $http_proxy
echo $https_proxy

# Now run a cURL command without explicit proxy flags
curl https://api.ipify.org?format=json

After exporting these variables, any subsequent `curl` command in that shell session targeting an HTTP or HTTPS URL will automatically use the specified proxy and credentials.

The main consideration here is that these are *environment* variables. Other programs in the same shell environment that also respect these variables (like `wget`, `pip`, `apt`, etc.) will *also* use this proxy. If that's your intention, great! If not, you might prefer a method that only affects cURL.
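
If you only want the proxy applied to a single request, you can scope the variable to that one command instead of exporting it, a standard shell idiom:

# The variable only exists for this one command; other programs and
# later commands in the same shell are unaffected.
https_proxy="http://your_username:your_password@rp.evomi.com:1000" \
  curl "https://api.ipify.org?format=json"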

Setting Up a cURL Alias for Proxied Requests

Another neat trick is to create a command-line alias. An alias is essentially a shortcut or a custom command name that expands to a longer, predefined command.

You could create an alias, say `curl_evomi`, that automatically includes your Evomi proxy settings whenever you use it.

Here's how you'd define such an alias in your shell:

alias curl_evomi='curl -x http://rp.evomi.com:1000 -U your_username:your_password'
# Now use the alias like a regular command
curl_evomi https://api.ipify.org?format=json

(Note: Online shell environments might not support alias creation effectively for testing.)

When you type `curl_evomi https://api.ipify.org?format=json`, the shell replaces `curl_evomi` with the full command `curl -x http://rp.evomi.com:1000 -U your_username:your_password`, and then appends the URL. This keeps your regular `curl` command unchanged while providing a specific command for proxied requests.

To make this alias permanent, add the `alias` definition line to your shell's startup file (e.g., `~/.bashrc`, `~/.bash_profile`, or `~/.zshrc` depending on your shell) and then reload your shell configuration (e.g., `source ~/.bashrc`) or open a new terminal window.
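
For example, on a Bash setup that might look like this (swap in `~/.zshrc` if you use zsh):

# Persist the alias and reload the shell configuration.
echo "alias curl_evomi='curl -x http://rp.evomi.com:1000 -U your_username:your_password'" >> ~/.bashrc
source ~/.bashrc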

Creating and Using a `.curlrc` Configuration File

For settings you want cURL to use automatically for *every* invocation, without relying on environment variables or aliases, you can use a configuration file named `.curlrc`.

cURL looks for this file in your home directory by default (`~/.curlrc` on Linux/macOS, `_curlrc` in your user profile directory on Windows). Any valid cURL command-line options can be placed in this file, one per line.

To configure your Evomi proxy, you could create a `.curlrc` file with the following content:

# ~/.curlrc configuration for Evomi Proxy
proxy = "rp.evomi.com:1000"
proxy-user = "your_username:your_password"
# You can add other default options here too, e.g.:
# user-agent = "MyCustomScraper/1.0"

With this file in place, any `curl` command you run will automatically use these proxy settings, unless you override them with command-line flags for a specific request. This is often considered the cleanest way to set persistent defaults for cURL.
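
For example, if the proxy is configured in `.curlrc` but you want one particular request to bypass it, you can override it from the command line with `--noproxy`:

# .curlrc still applies to every other request; this one skips the proxy entirely.
curl --noproxy "*" "https://api.ipify.org?format=json"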

Passing Custom Headers with cURL

Beyond using proxies, sending appropriate HTTP headers is crucial for mimicking legitimate browser traffic and avoiding detection. Many websites check headers like `User-Agent`, `Accept-Language`, etc.

You can add custom headers to your cURL requests using the `-H` or `--header` flag, and you can use it multiple times for multiple headers.

For example, let's set a custom `User-Agent` and an `Accept-Language` header, and use the `-v` (`--verbose`) flag to see the outgoing request details:

curl https://httpbin.org/headers \
  -H "User-Agent: MyDataFetcherBot/2.1" \
  -H "Accept-Language: en-US,en;q=0.9" \
  -v

In the verbose output (usually prefixed with `>`), you'll see these headers being sent to the server:

> GET /headers HTTP/1.1
> Host: httpbin.org
> Accept: */*
> User-Agent: MyDataFetcherBot/2.1
> Accept-Language: en-US,en;q=0.9

How do you find realistic headers? Your browser's developer tools are your best friend. Open the Network tab, load the target page, click on a request, and examine the "Request Headers" section. You can copy these values individually, or right-click the request and use "Copy as cURL" to grab the entire request as a ready-made cURL command (available in Chrome and Firefox DevTools).

Tools like Evomi's Browser Fingerprint Checker can also show you what information your browser typically exposes.

Just like proxy settings, you can make common headers persistent using environment variables (though less common for headers), aliases, or by adding `header = "Header-Name: HeaderValue"` lines to your `.curlrc` file.
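
For instance, a `.curlrc` that always sends a custom User-Agent and language preference would contain lines like these (the values are placeholders; match them to whatever your target expects):

# ~/.curlrc — default headers added to every request
header = "User-Agent: MyDataFetcherBot/2.1"
header = "Accept-Language: en-US,en;q=0.9"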

How to POST JSON Data using cURL

Sending data to a server, often in JSON format for APIs, is another common task. cURL provides the `--json` flag as a convenient shortcut for POSTing JSON.

Using `--json '{ "key": "value" }'` is equivalent to using:

  • -X POST (explicitly set method to POST)

  • -H "Content-Type: application/json"

  • -H "Accept: application/json" (often helpful)

  • -d '{ "key": "value" }' (or `--data`)

Here’s an example using the shortcut:

curl https://httpbin.org/post \
  --json '{ "query": "scraping data", "page": 1 }'

Note: The `--json` flag is relatively new (introduced in cURL 7.82.0). If you encounter an "unknown option" error, your cURL version predates it. In that case, fall back to the explicit flags:

curl https://httpbin.org/post \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{ "query": "scraping data", "page": 1 }'

For larger JSON payloads, putting the data directly on the command line can be cumbersome. You can instead store your JSON in a file (e.g., `payload.json`) and tell cURL to read the data from that file using the `@` prefix with the `-d` or `--data` flag:

# Assuming payload.json contains: { "query": "scraping data", "page": 1 }
curl https://httpbin.org/post \
  -X POST \
  -H "Content-Type: application/json" \
  -d @payload.json

Check out our article on working with JSON in Python for more details on handling this data format.

Extracting Data from Responses with cURL and Grep

You've successfully fetched data using cURL, perhaps through a proxy and with custom headers. Now, how do you extract the specific information you need from the response?

If you're dealing with JSON responses from an API, tools like `jq` are excellent for parsing and filtering. However, if you're scraping HTML, a common command-line approach involves piping the cURL output to `grep` (Global Regular Expression Print).

`grep` allows you to search the input text (in this case, the HTML source from cURL) for lines matching a specified pattern, often a regular expression.

For instance, let's try to extract the main heading (h1 tag) from a simple example page:

curl http://httpbin.org/html --silent \
  | grep -Eo '<h1>.*</h1>'

Let's break down this command:

  • curl http://httpbin.org/html --silent: Fetches the HTML content. `--silent` suppresses progress meters and error messages, outputting only the page data.

  • |: The pipe symbol sends the output of the `curl` command as input to the `grep` command.

  • grep -Eo '<h1>.*</h1>': Searches the input for the pattern.

    • -E: Use extended regular expressions.

    • -o: Only output the matched part of the line(s), not the entire line.

    • '<h1>.*</h1>': The pattern. It looks for the literal text `<h1>`, followed by any character (`.`) repeated zero or more times (`*`), followed by the literal text `</h1>`.

This command would output something like: <h1>Herman Melville - Moby Dick</h1>.
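
If you only need the text inside the tags, you can strip the markup with an extra `sed` step, for example:

# Extract the h1 element, then remove the surrounding tags to keep only the text.
curl http://httpbin.org/html --silent \
  | grep -Eo '<h1>.*</h1>' \
  | sed -E 's/<[^>]+>//g'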

While `grep` can work for simple cases, parsing HTML with regular expressions is often fragile and complex. If the website structure changes slightly, your regex might break. Whenever possible, look for APIs or structured data formats (like JSON-LD embedded in HTML, or RSS feeds) provided by the site, as these are much more reliable to parse.
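
To illustrate that point, here's the earlier IP check handled as JSON with `jq` instead of a regex (this assumes `jq` is installed on your system):

# Fetch the JSON response through the proxy and pull out just the "ip" field.
curl --silent \
  -U your_username:your_password \
  -x http://rp.evomi.com:1000 \
  "https://api.ipify.org?format=json" \
  | jq -r '.ip'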

Wrapping Up

Today we explored how to leverage the power of cURL for web scraping while using proxies to avoid blocks. You learned several methods for configuring proxy authentication (command line, environment variables, aliases, `.curlrc`), the importance of setting appropriate headers, how to POST JSON data, and a basic technique for extracting information using `grep`.

Using these techniques, especially with reliable and ethically sourced proxies like those from Evomi, you can build effective and robust web scraping solutions directly from the command line. Happy scraping!

Author

Michael Chen

AI & Network Infrastructure Analyst

About Author

Michael bridges the gap between artificial intelligence and network security, analyzing how AI-driven technologies enhance proxy performance and security. His work focuses on AI-powered anti-detection techniques, predictive traffic routing, and how proxies integrate with machine learning applications for smarter data access.
