Async Web Scraping with Aiohttp: A Proxy Integration Guide

David Foster

Last edited on May 15, 2025

Scraping Techniques

Diving into Asynchronous Web Scraping with Aiohttp and Proxies

Extracting data from the web, or web scraping, is a cornerstone technique for everything from tracking e-commerce prices to aggregating news feeds or monitoring financial markets. It's about using code to fetch information automatically and efficiently. If you're venturing into building your own scraper, getting comfortable with some coding is essential.

Python is a fantastic choice for this, largely thanks to its straightforward syntax and powerful libraries. When it comes to making HTTP requests, Python offers several popular options, including aiohttp, httpx, and the classic requests library. Each has its own strengths, which you can explore further in our comparison of httpx vs aiohttp vs requests.

This guide focuses on aiohttp, an asynchronous library. What does "asynchronous" mean here? It means aiohttp can juggle multiple web requests simultaneously without getting stuck waiting for each one to finish. This makes it incredibly efficient for tasks requiring many concurrent connections. However, to truly harness this power without running into roadblocks like IP bans, integrating proxies is crucial. Proxies help mask your origin and distribute your requests, making your scraping activities less likely to be flagged by target websites.

Here’s what we'll cover:

  • The basics of how aiohttp achieves concurrency.

  • Getting aiohttp set up on your machine.

  • Integrating proxies (like Evomi's residential proxies) with aiohttp.

  • Smart strategies for using proxies with aiohttp effectively.

Let's get started!

What is Aiohttp Anyway?

Think about needing to grab real-time data from several different online sources, perhaps fetching currency exchange rates from multiple financial websites at once. With traditional, synchronous libraries (like the classic `requests` library), your program would make a request, wait for the response, process it, then move to the next request, one by one. This sequential process can become a significant bottleneck, especially when dealing with many slow or unresponsive servers.

Aiohttp tackles this differently using Python's asyncio framework. It allows your program to initiate a request and then immediately move on to other tasks (like starting another request) without waiting for the first one to complete. When a response arrives, aiohttp handles it. This non-blocking approach means you can manage numerous HTTP operations concurrently, drastically speeding up I/O-bound tasks like web scraping.
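
To make that concrete, here's a minimal sketch (hypothetical URLs, no proxy yet) of how several requests can run concurrently with aiohttp and asyncio.gather:

import asyncio
import aiohttp

# Hypothetical endpoints purely for illustration
URLS = [
    "https://example.com/rates/usd",
    "https://example.com/rates/eur",
    "https://example.com/rates/gbp",
]

async def fetch_status(session, url):
    # Each request awaits its own response without blocking the others
    async with session.get(url) as response:
        return url, response.status

async def main():
    async with aiohttp.ClientSession() as session:
        # gather() schedules all requests at once and waits for them together
        results = await asyncio.gather(*(fetch_status(session, url) for url in URLS))
        for url, status in results:
            print(f"{url} -> {status}")

asyncio.run(main())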

Setting Up Aiohttp with Proxies

So, aiohttp lets you fire off requests rapidly. That's great for speed, but it also increases the chances of overwhelming a target server or triggering its defenses. Websites often monitor incoming traffic, and a sudden flood of requests from a single IP address is a classic sign of automated scraping. This can lead to rate limiting (slowing you down), CAPTCHAs, or outright IP blocks.

This is where proxies become indispensable. By routing your aiohttp requests through proxy servers, you change the source IP address for each request or group of requests. For high-volume scraping, simple datacenter proxies might not be enough. Rotating residential proxies, which use IP addresses assigned by ISPs to real home users, are often the gold standard. They provide a high degree of anonymity and legitimacy, making it harder for websites to distinguish your scraper from genuine user traffic. Evomi offers ethically sourced residential proxies starting at just $0.49/GB, perfect for these kinds of tasks.

Let's translate this into practical code.

What You'll Need

Before diving into the code, ensure you have:

  • Python 3.7 or a newer version installed.

  • Access to proxy servers. For robust scraping, consider rotating residential proxies. Evomi provides these, along with mobile and datacenter options, and even offers a completely free trial to test them out.

Got everything? Great, let's proceed.

Installing Aiohttp

Getting aiohttp is simple using pip, Python's package installer. Open your terminal or command prompt and type:
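
pip install aiohttp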

To verify the installation, you can check the installed package details:
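
pip show aiohttp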

This command should display information about the installed aiohttp library, including its version.

A Basic Script Using Aiohttp with a Proxy

Now that the setup is complete, let's write a simple Python script. We'll use `aiohttp` to fetch product names from webscraper.io's test e-commerce site, routing the request through an Evomi residential proxy.

Step 1: Import Libraries

First, we need to import the necessary Python libraries: aiohttp for the web requests and asyncio to run our asynchronous code.

import aiohttp
import asyncio

Step 2: Configure Your Proxy

Define the proxy server details. We'll use Evomi's residential proxy endpoint format. Remember to replace placeholders with your actual credentials.

# Evomi Residential Proxy Configuration (replace with your details)
# Format: http://username:password@hostname:port
proxy_url = "http://user-xyz:pass123@rp.evomi.com:1000"

# If your proxy doesn't require auth embedded in the URL,
# you might use BasicAuth like this (adjust accordingly):
# proxy_url = "http://rp.evomi.com:1000"
# proxy_auth = aiohttp.BasicAuth('user-xyz', 'pass123')

# For this example, we embed auth in the proxy_url.

Note: Evomi offers different ports for HTTP (1000), HTTPS (1001), and SOCKS5 (1002) for residential proxies. Ensure you use the correct one for your needs.

Step 3: Write the Async Fetching Function

Let's create the asynchronous function that performs the actual web request via the proxy.

# Async function to fetch product names
async def fetch_product_names(session):
    target_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    print(f"Attempting to fetch data from: {target_url}")

    # Use the proxy configured earlier
    try:
        async with session.get(target_url, proxy=proxy_url) as response:
            # Check if the request was successful
            response.raise_for_status() # Raises an exception for bad status codes (4xx or 5xx)

            html_content = await response.text()

            # Basic extraction: find lines containing product title links (anchors with class="title")
            # This is a simplified approach; a real scraper would use libraries like BeautifulSoup
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]

            # Further clean-up might be needed depending on HTML structure
            product_names = [line for line in product_lines if 'href' in line] # Filter for lines likely containing titles

            print("Successfully fetched data.")
            return product_names
    except aiohttp.ClientError as e:
        print(f"An error occurred: {e}")
        return [] # Return empty list on error

  • async def fetch_product_names(session): Defines an asynchronous function. async signals it can perform non-blocking operations.

  • target_url: The web page we want to scrape.

  • session.get(target_url, proxy=proxy_url): Makes an HTTP GET request using the provided session, directing it through our configured proxy_url.

  • response.raise_for_status(): A good practice to check if the request succeeded (status code 2xx).

  • await response.text(): Asynchronously gets the response body as text. await pauses this function until the text is ready, allowing other tasks to run.

  • Extracting Names: This example uses basic string searching ('class="title"') to find lines likely containing product titles. For robust scraping, libraries like Beautiful Soup or lxml are recommended for parsing HTML (a short parsing sketch follows this list).

  • Error Handling: The try...except block catches potential connection or HTTP errors.
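
For reference, here's a minimal parsing sketch using Beautiful Soup instead of raw string matching (assumes beautifulsoup4 is installed and that product titles sit in a.title links, as they do on this test site):

# Requires: pip install beautifulsoup4
from bs4 import BeautifulSoup

def extract_titles(html_content):
    # Parse the page and return the text of each product title link (a.title)
    soup = BeautifulSoup(html_content, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("a.title")]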

Step 4: Create the Main Async Function

This function orchestrates the process: it creates the aiohttp session and calls our fetching function.

# Main async function to manage the process
async def main():
    # Create a client session to manage connections
    async with aiohttp.ClientSession() as session:
        print("Client session started.")
        product_titles = await fetch_product_names(session)

        if product_titles:
            print("\n--- Extracted Product Title Lines ---")
            for title_line in product_titles:
                # Print the raw line containing the title - further parsing needed for clean titles
                print(title_line)
            print("------------------------------------")
        else:
            print("No product titles extracted.")

        print("Client session closed.")

  • async def main(): The main entry point for our async operations.

  • aiohttp.ClientSession(): Creates a session object. Using a session is efficient as it can reuse connections and manage cookies.

  • await fetch_product_names(session): Calls our fetching function and waits for it to complete.

  • Printing Results: Loops through the returned list and prints the lines identified as potentially containing titles.

Step 5: Run the Async Code

Finally, use asyncio.run() to execute the main function.

# Entry point to run the main async function
if __name__ == "__main__":
    print("Starting scraper...")
    asyncio.run(main())
    print("Scraper finished.")

Step 6: Putting It All Together

Here’s the complete script:

import aiohttp
import asyncio

# Evomi Residential Proxy Configuration (replace with your details)
# Format: http://username:password@hostname:port
proxy_url = "http://user-xyz:pass123@rp.evomi.com:1000"

# Async function to fetch product names
async def fetch_product_names(session):
    target_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    print(f"Attempting to fetch data from: {target_url} via proxy {proxy_url.split('@')[-1]}") # Hide credentials in log
    try:
        # Use the proxy configured earlier
        async with session.get(target_url, proxy=proxy_url) as response:
            # Check if the request was successful
            response.raise_for_status() # Raises an exception for bad status codes (4xx or 5xx)

            html_content = await response.text()

            # Basic extraction: find lines containing product title links (anchors with class="title")
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]
            product_names = [line for line in product_lines if 'href' in line] # Filter for lines likely containing titles

            print(f"Successfully fetched data from {target_url}.")
            return product_names
    except aiohttp.ClientError as e:
        print(f"An error occurred while fetching {target_url}: {e}")
        return [] # Return empty list on error

# Main async function to manage the process
async def main():
    # Create a client session to manage connections
    async with aiohttp.ClientSession() as session:
        print("Client session started.")
        product_titles = await fetch_product_names(session)

        if product_titles:
            print("\n--- Extracted Product Title Lines ---")
            for title_line in product_titles:
                print(title_line)
            print("------------------------------------")
        else:
            print("No product titles extracted.")

        print("Client session closed.")

# Entry point to run the main async function
if __name__ == "__main__":
    print("Starting scraper...")
    asyncio.run(main())
    print("Scraper finished.")

If everything runs correctly, the script will connect through the specified Evomi proxy and print the HTML lines containing the product titles from the target page. This basic example demonstrates proxy integration; next, we'll explore more advanced techniques like rotation.

Advanced Aiohttp Proxy Strategies

The simple script works, but for serious scraping, relying on a single proxy IP isn't ideal. Let's enhance our script to handle multiple proxies and rotate them, significantly improving robustness and reducing the likelihood of blocks.

Managing and Rotating Multiple Proxies

We'll modify the code to use a list of proxies and select one randomly for each request. This distributes the load and makes the scraping pattern less predictable. To keep the example simple, we'll fetch the laptop category page of our test site several times, each time through a different proxy.

Step 1: Import Additional Libraries

We'll need the random library for selecting proxies. For better data extraction you could also use re (regular expressions) or a parsing library like BeautifulSoup (recommended; install it with pip install beautifulsoup4), but we'll stick to basic string methods here for simplicity.

import aiohttp
import asyncio
import random
# import re # If using regex for extraction
# from bs4 import BeautifulSoup # If using BeautifulSoup for parsing

Step 2: Define Your Proxy List

Create a list containing your proxy connection strings. We'll use Evomi's datacenter proxy endpoint format as an example. Datacenter proxies (starting at $0.30/GB with Evomi) can be cost-effective for some tasks, though residential might be needed for stricter sites.

# List of Evomi Datacenter Proxies (replace with your actual proxies)
# Format: http://username:password@hostname:port
proxy_list = [
    "http://user-dc1:pass123@dc.evomi.com:2000",
    "http://user-dc2:pass456@dc.evomi.com:2000",
    "http://user-dc3:pass789@dc.evomi.com:2000",
    "http://user-dc4:passabc@dc.evomi.com:2000",
    "http://user-dc5:passdef@dc.evomi.com:2000",
]

Reminder: Evomi datacenter proxies use ports 2000 (HTTP), 2001 (HTTPS), and 2002 (SOCKS5).

Step 3: Update the Fetching Function

Modify the function to accept a URL and a specific proxy from the list for each call. We'll also add error handling specific to proxy connections.

# Updated async function to fetch data from a specific URL using a specific proxy
async def fetch_page_data(session, page_url, proxy):
    # Log proxy host, hide credentials
    proxy_host = "N/A"
    if proxy:
        try:
            # Attempt to extract host safely
            proxy_host = proxy.split('@')[-1].split(':')[0]
        except IndexError:
            proxy_host = "Invalid Format" # Or handle as needed

    print(f"Fetching {page_url} using proxy {proxy_host}...")
    try:
        async with session.get(page_url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=15)) as response:
            response.raise_for_status() # Check for HTTP errors
            html_content = await response.text()

            # --- Extraction Logic ---
            # Replace this with more robust parsing (e.g., BeautifulSoup)
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]
            product_names = [line for line in product_lines if 'href' in line]
            # --- End Extraction Logic ---

            print(f"Successfully fetched {page_url}")
            return product_names

    # Handle proxy-specific connection errors
    except aiohttp.ClientProxyConnectionError as e:
        print(f"Proxy Connection Error for {proxy_host}: {e}")
        return None # Indicate failure for this proxy/page
    # Handle other client-side errors (DNS failures, bad responses, etc.)
    except aiohttp.ClientError as e:
        print(f"Client Error fetching {page_url} via {proxy_host}: {e}")
        return None
    # Handle timeouts specifically
    except asyncio.TimeoutError:
        print(f"Timeout fetching {page_url} via {proxy_host}")
        return None

  • The function now takes page_url and proxy as arguments.

  • random.choice(proxy_list) will be used in the main loop to pick a proxy.

  • We added specific error handling for aiohttp.ClientProxyConnectionError.

  • A timeout (e.g., 15 seconds) is added to prevent hanging on unresponsive proxies/servers.

  • The extraction logic remains basic; consider using BeautifulSoup for real-world scenarios:

    # Example with BeautifulSoup (install first: pip install beautifulsoup4)
    # from bs4 import BeautifulSoup
    # soup = BeautifulSoup(html_content, 'html.parser')
    # titles = [a['title'] for a in soup.select('a.title')]
    # return titles

Step 4: Update the Main Function

The main function will now manage the loop for multiple pages, select proxies randomly, and gather results.

# Updated main function for rotating proxies and multiple pages
async def main():
    base_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    # In a real scraper, target_urls would hold the paginated URLs you want (page 1, 2, 3, ...).
    # For this example, we simply fetch the same page multiple times through different proxies.
    num_requests = 5  # Make 5 requests in total
    target_urls = [base_url] * num_requests  # Re-use the base URL for demo purposes
    all_results = {}  # Dictionary to store results per proxy

    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(num_requests):
            selected_proxy = random.choice(proxy_list)
            page_url = target_urls[i]  # In a real case, this would be page_url_1, page_url_2 etc.

            # Create an asyncio task for each request
            task = asyncio.create_task(fetch_page_data(session, page_url, selected_proxy))
            tasks.append((selected_proxy, task))  # Store proxy with its task

        # Wait for all tasks to complete
        results = await asyncio.gather(*(task for _, task in tasks))

        # Process results
        print("\n--- Scraping Results ---")
        for i, (selected_proxy, _) in enumerate(tasks):
            proxy_host = selected_proxy.split('@')[-1]
            if results[i] is not None:
                print(f"Proxy {proxy_host}: Successfully fetched {len(results[i])} items.")
                if proxy_host not in all_results:
                    all_results[proxy_host] = []
                all_results[proxy_host].extend(results[i])  # Append results for this proxy
            else:
                print(f"Proxy {proxy_host}: Failed to fetch data.")

    # Optional: Print combined results per proxy
    print("\n--- Combined Data Per Proxy ---")
    for proxy_host, data in all_results.items():
        print(f"\nData from Proxy: {proxy_host}")
        # Print first few items as example
        for item in data[:3]:
            print(f"  - {item}")
        if len(data) > 3:
            print(f"  ... and {len(data) - 3} more items")

    print("\nClient session closed.")

  • We define the number of requests/pages to fetch.

  • random.choice(proxy_list) selects a proxy for each request.

  • asyncio.create_task() creates tasks for concurrent execution.

  • asyncio.gather() runs all tasks concurrently and collects their results.

  • The results are processed, showing which proxy fetched what data (or if it failed).

Step 5: Run the Updated Code

Use the standard asyncio entry point:

# Run the main async function
if __name__ == "__main__":
    if not proxy_list:
        print("Error: Proxy list is empty. Please add proxies.")
    else:
        print("Starting multi-proxy scraper...")
        asyncio.run(main())
        print("Scraper finished.")

Step 6: The Complete Rotating Proxy Script

Here is the full code combining these changes:

import aiohttp
import asyncio
import random

# import re # Uncomment if using regex
# from bs4 import BeautifulSoup # Uncomment if using BeautifulSoup

# List of Evomi Datacenter Proxies (replace with your actual proxies)
proxy_list = [
    "http://user-dc1:pass123@dc.evomi.com:2000",
    "http://user-dc2:pass456@dc.evomi.com:2000",
    "http://user-dc3:pass789@dc.evomi.com:2000",
    "http://user-dc4:passabc@dc.evomi.com:2000",
    "http://user-dc5:passdef@dc.evomi.com:2000",
]


# Updated async function to fetch data from a specific URL using a specific proxy
async def fetch_page_data(session, page_url, proxy):
    proxy_host_for_log = proxy.split('@')[-1] if '@' in proxy else proxy  # Log proxy host, hide credentials
    print(f"Fetching {page_url} using proxy {proxy_host_for_log}...")
    try:
        # Increased timeout
        async with session.get(
            page_url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=20)
        ) as response:
            response.raise_for_status()  # Check for HTTP errors
            html_content = await response.text()

            # --- Basic Extraction Logic ---
            product_lines = [
                line.strip()
                for line in html_content.splitlines()
                if 'class="title"' in line
            ]
            product_names = [line for line in product_lines if 'href' in line]
            # --- End Extraction Logic ---

            print(f"Successfully fetched {page_url} via {proxy_host_for_log}")
            return product_names

    except aiohttp.ClientProxyConnectionError as e:
        print(f"Proxy Connection Error for {proxy_host_for_log}: {e}")
        return None
    except aiohttp.ClientError as e:
        print(f"Client Error fetching {page_url} via {proxy_host_for_log}: {e}")
        return None
    except asyncio.TimeoutError:
        print(f"Timeout fetching {page_url} via {proxy_host_for_log}")
        return None


# Updated main function for rotating proxies and multiple pages
async def main():
    base_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    num_requests = 5
    target_urls = [base_url] * num_requests
    all_results = {}

    # Use a TCPConnector to limit concurrent connections if needed
    # connector = aiohttp.TCPConnector(limit=10)  # Cap the pool at 10 concurrent connections (use limit_per_host for a per-host cap)
    # async with aiohttp.ClientSession(connector=connector) as session:
    async with aiohttp.ClientSession() as session:  # Default connector
        tasks = []
        # Select proxies for this run
        proxies_in_use = random.sample(proxy_list, min(num_requests, len(proxy_list)))

        for i in range(num_requests):
            # Cycle through the selected proxies if num_requests > len(proxies_in_use)
            selected_proxy = proxies_in_use[i % len(proxies_in_use)]
            page_url = target_urls[i]

            task = asyncio.create_task(
                fetch_page_data(session, page_url, selected_proxy)
            )
            tasks.append((selected_proxy, task))

        # Capture exceptions too
        results = await asyncio.gather(*(task for _, task in tasks), return_exceptions=True)

        print("\n--- Scraping Results ---")
        successful_fetches = 0
        failed_fetches = 0

        for i, (selected_proxy, _) in enumerate(tasks):
            proxy_host = selected_proxy.split('@')[-1] if '@' in selected_proxy else selected_proxy
            if isinstance(results[i], Exception):
                print(f"Proxy {proxy_host}: Task failed with exception: {results[i]}")
                failed_fetches += 1
            elif results[i] is not None:
                print(f"Proxy {proxy_host}: Successfully fetched {len(results[i])} items.")
                if proxy_host not in all_results:
                    all_results[proxy_host] = []
                all_results[proxy_host].extend(results[i])
                successful_fetches += 1
            else:
                # This case might happen if fetch_page_data returned None without an exception
                print(f"Proxy {proxy_host}: Failed to fetch data (returned None).")
                failed_fetches += 1

        print(f"\nSummary: {successful_fetches} successful fetches, {failed_fetches} failed fetches.")

    # Optional: Print combined results per proxy
    print("\n--- Combined Data Per Proxy (Sample) ---")
    for proxy_host, data in all_results.items():
        print(f"\nData from Proxy: {proxy_host} ({len(data)} total items)")
        for item in data[:3]:
            print(f"  - {item}")
        if len(data) > 3:
            print(f"  ... and {len(data) - 3} more items")

    print("\nClient session closed.")


# Run the main async function
if __name__ == "__main__":
    if not proxy_list:
        print("Error: Proxy list is empty. Please add proxies.")
    else:
        print(f"Starting multi-proxy scraper with {len(proxy_list)} proxies...")
        asyncio.run(main())
        print("Scraper finished.")

When you run this script, you'll see output indicating which proxy is being used for each request. The final summary will show the data collected via each proxy IP, demonstrating successful rotation. This approach significantly enhances the resilience of your scraper against IP-based blocks.

Handling Proxy Authentication

Our examples already incorporate proxy authentication directly within the proxy URL string:

http://USERNAME:PASSWORD@HOSTNAME:PORT

aiohttp automatically parses this format. When you pass a URL like "http://user-dc1:pass123@dc.evomi.com:2000" to the proxy parameter in session.get(), aiohttp handles the necessary Proxy-Authorization header for Basic Authentication.

Alternatively, if your proxy provider requires it, or if you simply prefer to keep credentials separate, you can use aiohttp.BasicAuth:

proxy_url_no_auth = "http://dc.evomi.com:2000"
auth = aiohttp.BasicAuth("user-dc1", "pass123")

# ... inside your async function ...
async with session.get(target_url, proxy=proxy_url_no_auth, proxy_auth=auth) as response:
    # ... rest of the code

Both methods achieve the same result. Using the embedded format is often more convenient when managing lists of proxies.
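
One practical note on the embedded format: if a username or password contains characters such as @, :, or /, it needs to be URL-encoded first. A small helper along these lines (a sketch; the function name is ours) keeps that safe:

from urllib.parse import quote

def build_proxy_url(username, password, host, port, scheme="http"):
    # Percent-encode credentials so special characters don't break the URL
    return f"{scheme}://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"

# Example with placeholder credentials containing special characters
proxy_url = build_proxy_url("user-dc1", "p@ss:123", "dc.evomi.com", 2000)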

Securing Connections with SSL

When scraping sites over HTTPS or handling sensitive data, ensuring your connection is encrypted via SSL/TLS is vital. aiohttp handles SSL verification by default when connecting to HTTPS URLs.

Our examples used HTTP URLs (http://...). If you target HTTPS sites (https://...), aiohttp will automatically perform an SSL handshake. By default, it verifies the server's SSL certificate against your system's trusted Certificate Authorities (CAs); you can also point it at the certifi bundle explicitly if you prefer.

You generally don't need to manually configure SSL unless:

  1. You need to trust a self-signed certificate (common in testing environments).

  2. You want to disable SSL verification (strongly discouraged for production as it opens you to man-in-the-middle attacks).

  3. You need to specify a particular set of CAs.

To customize SSL behavior, you create an ssl.SSLContext:

import ssl

# Create a default SSL context (recommended starting point)
ssl_context = ssl.create_default_context()

# Example: Load custom CA bundle (if needed)
# ssl_context.load_verify_locations(cafile='/path/to/custom/ca.crt')

# Example: Disable verification (DANGEROUS - for testing only)
# ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
# ssl_context.check_hostname = False
# ssl_context.verify_mode = ssl.CERT_NONE

# Pass the context to the session.get call
async with session.get(https_url, proxy=proxy, ssl=ssl_context) as response:
    # ...
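
If you'd rather pin verification to certifi's CA bundle explicitly, a minimal sketch (assuming the certifi package is installed) looks like this:

import ssl
import certifi  # pip install certifi

# Trust certifi's curated CA bundle instead of the system default store
ssl_context = ssl.create_default_context(cafile=certifi.where())

# Then pass it to your requests as shown above:
# async with session.get(https_url, proxy=proxy, ssl=ssl_context) as response:
#     ...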

For most scraping tasks involving standard HTTPS websites, the default SSL handling in aiohttp is sufficient and secure.

Best Practices for Aiohttp Proxy Usage

You've now got the technical skills to integrate and rotate proxies with aiohttp. To maximize effectiveness and minimize disruptions, consider these best practices:

Tips for Staying Under the Radar

  • Choose Quality Proxies: Not all proxies are equal. Opt for reputable providers like Evomi, known for reliable and ethically sourced residential or mobile proxies. These blend in better with normal user traffic compared to datacenter IPs, especially on stricter websites. Our Swiss base also reflects a commitment to quality and privacy. You can always verify proxy performance using tools like our free Proxy Tester.

  • Implement Smart Rotation: Don't just use multiple proxies; rotate them intelligently. Avoid hitting the same domain repeatedly with the same IP in a short period. The random rotation shown earlier is a good start. For large-scale scraping, consider session-based rotation (keeping one IP for a user's "session" on a site) or geographic targeting if needed. A small rotation helper is sketched after this list.

  • Mimic Human Behavior: Automation is fast, humans aren't always. Introduce random delays between requests to avoid predictable, machine-like patterns.

    # Inside your loop, before each request (uses the random and asyncio modules already imported)
    sleep_time = random.uniform(1.5, 4.5)  # Wait 1.5 to 4.5 seconds
    print(f"Sleeping for {sleep_time:.2f} seconds...")
    await asyncio.sleep(sleep_time)  # ...then make the request

  • Manage Headers and Fingerprints: Send realistic User-Agent strings and other HTTP headers that match common browsers. Be aware of browser fingerprinting techniques websites might use. Tools like Evomi's Browser Fingerprint Checker can show what sites see, and our antidetect browser, Evomium (free for customers), is designed to manage these fingerprints effectively.

  • Respect robots.txt: While not technically related to anonymity, respecting a site's robots.txt file (which outlines scraping rules) is good practice and can prevent legal or ethical issues.

  • Use SSL/TLS Correctly: Always use HTTPS where available and ensure SSL verification is enabled unless you have a very specific, understood reason to disable it.

  • Handle CAPTCHAs Gracefully: If you encounter CAPTCHAs, integrate a solving service. Don't just give up or hammer the site. Check out options in our review of top CAPTCHA solvers.
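
To make the rotation ideas above concrete, here is a small provider-agnostic helper (a sketch only; the class and method names are our own) supporting random, round-robin, and sticky selection:

import itertools
import random

class ProxyRotator:
    """Tiny helper: random choice per request, round-robin, or a sticky proxy per session key."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self._cycle = itertools.cycle(self.proxies)
        self._sticky = {}  # session key -> proxy

    def random_proxy(self):
        # Any proxy at random; a good default for independent requests
        return random.choice(self.proxies)

    def next_proxy(self):
        # Round-robin: spreads requests evenly across the pool
        return next(self._cycle)

    def sticky_proxy(self, session_key):
        # Keep the same proxy for everything tied to one logical "session"
        if session_key not in self._sticky:
            self._sticky[session_key] = self.next_proxy()
        return self._sticky[session_key]

# Usage sketch:
# rotator = ProxyRotator(proxy_list)
# proxy = rotator.sticky_proxy("account-42")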

Common Problems and Fixes

Even with best practices, you might hit snags. Here are common aiohttp proxy-related errors and how to approach them:

  • aiohttp.ClientProxyConnectionError: This usually means your script couldn't even reach the proxy server.

    • Check Proxy Details: Double-check the IP/hostname, port, username, and password. Typos are common!

    • Verify Proxy Status: Is the proxy online and working? Use a tool like Evomi's Proxy Tester or simple curl -x http://user:pass@proxy:port http://example.com from your terminal.

    • Firewall Issues: Ensure no local or network firewall is blocking the connection to the proxy port.

  • aiohttp.ClientHttpProxyError / Status Code 407 Proxy Authentication Required: The connection to the proxy worked, but authentication failed.

    • Check Credentials: Verify the username and password again.

    • Authentication Format: Ensure you're using the correct authentication method (Basic Auth is common, handled by the URL format or aiohttp.BasicAuth). Check your provider's documentation.

    • IP Authorization: Some providers require you to authorize the IP address *from which* you are connecting to the proxy. Check your Evomi dashboard or provider's settings.

  • aiohttp.ClientHttpProxyError / Other 4xx/5xx Status Codes from Proxy: The proxy responded, but with an error (e.g., 403 Forbidden, 502 Bad Gateway).

    • Proxy Restrictions: The proxy itself might be blocked from accessing the target site, or it might have internal issues. Try a different proxy from your pool.

    • Provider Issue: There might be a temporary problem with the proxy service. Check the provider's status page or contact support.

  • asyncio.TimeoutError or aiohttp.ServerTimeoutError: The request took too long.

    • Increase Timeout: The default timeout might be too short for slow proxies or target sites. Increase it in aiohttp.ClientTimeout(total=...) passed to the request or session (see the brief sketch after this list).

    • Proxy Performance: The specific proxy might be slow or overloaded. Rotate to a different one.

    • Target Server Slow: The website you're scraping might be slow to respond.

  • aiohttp.ClientSSLError: An issue occurred during the SSL handshake with the *target* server (when using HTTPS).

    • Outdated Certificates: Ensure your system's CA certificates (often managed by `certifi`) are up to date (`pip install --upgrade certifi`).

    • Server Configuration: The target website might have an invalid or misconfigured SSL certificate. You might need to investigate further or, as a last resort (and if you understand the risks), customize the SSL context to be less strict (see SSL section above).

    • Proxy Interference (Less Common): Some proxies (especially transparent ones, not typically used for scraping this way) might interfere with SSL. Ensure you're using appropriate HTTP/S or SOCKS proxies designed for this.
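
For reference on timeouts, here's a minimal sketch of setting a session-wide timeout (the values are arbitrary):

import asyncio
import aiohttp

async def main():
    # Generous defaults applied to every request made through this session
    timeout = aiohttp.ClientTimeout(total=60, connect=20)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get("https://example.com") as response:
            print(response.status)
        # A single request can still override the session default:
        # async with session.get(url, timeout=aiohttp.ClientTimeout(total=120)) as resp:
        #     ...

asyncio.run(main())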

Wrapping Up

Hopefully, this guide provides a solid foundation for using `aiohttp` with proxies for your web scraping projects. The asynchronous nature of `aiohttp` offers significant performance benefits, while proxies provide the necessary means to scrape responsibly and avoid interruptions. Remember that successful scraping often involves combining the right tools (`aiohttp`, quality proxies like those from Evomi) with smart strategies (rotation, delays, header management).

Python's ecosystem offers many tools for web scraping beyond `aiohttp`. To explore other options, take a look at our overview of the best Python web scraping libraries.

Diving into Asynchronous Web Scraping with Aiohttp and Proxies

Extracting data from the web, or web scraping, is a cornerstone technique for everything from tracking e-commerce prices to aggregating news feeds or monitoring financial markets. It's about using code to fetch information automatically and efficiently. If you're venturing into building your own scraper, getting comfortable with some coding is essential.

Python is a fantastic choice for this, largely thanks to its straightforward syntax and powerful libraries. When it comes to making HTTP requests, Python offers several popular options, including aiohttp, httpx, and the classic requests library. Each has its own strengths, which you can explore further in our comparison of httpx vs aiohttp vs requests.

This guide focuses on aiohttp, an asynchronous library. What does "asynchronous" mean here? It means aiohttp can juggle multiple web requests simultaneously without getting stuck waiting for each one to finish. This makes it incredibly efficient for tasks requiring many concurrent connections. However, to truly harness this power without running into roadblocks like IP bans, integrating proxies is crucial. Proxies help mask your origin and distribute your requests, making your scraping activities less likely to be flagged by target websites.

Here’s what we'll cover:

  • The basics of how aiohttp achieves concurrency.

  • Getting aiohttp set up on your machine.

  • Integrating proxies (like Evomi's residential proxies) with aiohttp.

  • Smart strategies for using proxies with aiohttp effectively.

Let's get started!

What is Aiohttp Anyway?

Think about needing to grab real-time data from several different online sources, perhaps fetching currency exchange rates from multiple financial websites at once. With traditional, synchronous libraries (like the standard `requests` library), your program would make a request, wait for the response, process it, then move to the next request, one by one. This sequential process can become a significant bottleneck, especially when dealing with many slow or unresponsive servers.

Aiohttp tackles this differently using Python's asyncio framework. It allows your program to initiate a request and then immediately move on to other tasks (like starting another request) without waiting for the first one to complete. When a response arrives, aiohttp handles it. This non-blocking approach means you can manage numerous HTTP operations concurrently, drastically speeding up I/O-bound tasks like web scraping.

Setting Up Aiohttp with Proxies

So, aiohttp lets you fire off requests rapidly. That's great for speed, but it also increases the chances of overwhelming a target server or triggering its defenses. Websites often monitor incoming traffic, and a sudden flood of requests from a single IP address is a classic sign of automated scraping. This can lead to rate limiting (slowing you down), CAPTCHAs, or outright IP blocks.

This is where proxies become indispensable. By routing your aiohttp requests through proxy servers, you change the source IP address for each request or group of requests. For high-volume scraping, simple datacenter proxies might not be enough. Rotating residential proxies, which use IP addresses assigned by ISPs to real home users, are often the gold standard. They provide a high degree of anonymity and legitimacy, making it harder for websites to distinguish your scraper from genuine user traffic. Evomi offers ethically sourced residential proxies starting at just $0.49/GB, perfect for these kinds of tasks.

Let's translate this into practical code.

What You'll Need

Before diving into the code, ensure you have:

  • Python 3.7 or a newer version installed.

  • Access to proxy servers. For robust scraping, consider rotating residential proxies. Evomi provides these, along with mobile and datacenter options, and even offers a completely free trial to test them out.

Got everything? Great, let's proceed.

Installing Aiohttp

Getting aiohttp is simple using pip, Python's package installer. Open your terminal or command prompt and type:

To verify the installation, you can check the installed package details:

This command should display information about the installed aiohttp library, including its version.

A Basic Script Using Aiohttp with a Proxy

Now that the setup is complete, let's write a simple Python script. We'll use `aiohttp` to fetch product names from a test e-commerce site, routing the request through an Evomi residential proxy. For this example, we'll target a test scraping site.

Step 1: Import Libraries

First, we need to import the necessary Python libraries: aiohttp for the web requests and asyncio to run our asynchronous code.

import aiohttp
import asyncio

Step 2: Configure Your Proxy

Define the proxy server details. We'll use Evomi's residential proxy endpoint format. Remember to replace placeholders with your actual credentials.

# Evomi Residential Proxy Configuration (replace with your details)
# Format: http://username:password@hostname:port
proxy_url = "http://user-xyz:pass123@rp.evomi.com:1000"

# If your proxy doesn't require auth embedded in the URL,
# you might use BasicAuth like this (adjust accordingly):
# proxy_url = "http://rp.evomi.com:1000"
# proxy_auth = aiohttp.BasicAuth('user-xyz', 'pass123')

# For this example, we embed auth in the proxy_url.

Note: Evomi offers different ports for HTTP (1000), HTTPS (1001), and SOCKS5 (1002) for residential proxies. Ensure you use the correct one for your needs.

Step 3: Write the Async Fetching Function

Let's create the asynchronous function that performs the actual web request via the proxy.

# Async function to fetch product names
async def fetch_product_names(session):
    target_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    print(f"Attempting to fetch data from: {target_url}")

    # Use the proxy configured earlier
    try:
        async with session.get(target_url, proxy=proxy_url) as response:
            # Check if the request was successful
            response.raise_for_status() # Raises an exception for bad status codes (4xx or 5xx)

            html_content = await response.text()

            # Basic extraction: find lines containing product titles (usually in links within description class)
            # This is a simplified approach; a real scraper would use libraries like BeautifulSoup
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]

            # Further clean-up might be needed depending on HTML structure
            product_names = [line for line in product_lines if 'href' in line] # Filter for lines likely containing titles

            print("Successfully fetched data.")
            return product_names
    except aiohttp.ClientError as e:
        print(f"An error occurred: {e}")
        return [] # Return empty list on error
  • async def fetch_product_names(session): Defines an asynchronous function. async signals it can perform non-blocking operations.

  • target_url: The web page we want to scrape.

  • session.get(target_url, proxy=proxy_url): Makes an HTTP GET request using the provided session, directing it through our configured proxy_url.

  • response.raise_for_status(): A good practice to check if the request succeeded (status code 2xx).

  • await response.text(): Asynchronously gets the response body as text. await pauses this function until the text is ready, allowing other tasks to run.

  • Extracting Names: This example uses basic string searching ('class="title"') to find lines likely containing product titles. For robust scraping, libraries like Beautiful Soup or lxml are recommended for parsing HTML.

  • Error Handling: The try...except block catches potential connection or HTTP errors.

Step 4: Create the Main Async Function

This function orchestrates the process: it creates the aiohttp session and calls our fetching function.

# Main async function to manage the process
async def main():
    # Create a client session to manage connections
    async with aiohttp.ClientSession() as session:
        print("Client session started.")
        product_titles = await fetch_product_names(session)

        if product_titles:
            print("\n--- Extracted Product Title Lines ---")
            for title_line in product_titles:
                # Print the raw line containing the title - further parsing needed for clean titles
                print(title_line)
            print("------------------------------------")
        else:
            print("No product titles extracted.")

        print("Client session closed.")
  • async def main(): The main entry point for our async operations.

  • aiohttp.ClientSession(): Creates a session object. Using a session is efficient as it can reuse connections and manage cookies.

  • await fetch_product_names(session): Calls our fetching function and waits for it to complete.

  • Printing Results: Loops through the returned list and prints the lines identified as potentially containing titles.

Step 5: Run the Async Code

Finally, use asyncio.run() to execute the main function.

# Entry point to run the main async function
if __name__ == "__main__":
    print("Starting scraper...")
    asyncio.run(main())
    print("Scraper finished.")

Step 6: Putting It All Together

Here’s the complete script:

import aiohttp
import asyncio

# Evomi Residential Proxy Configuration (replace with your details)
# Format: http://username:password@hostname:port
proxy_url = "http://user-xyz:pass123@rp.evomi.com:1000"

# Async function to fetch product names
async def fetch_product_names(session):
    target_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    print(f"Attempting to fetch data from: {target_url} via proxy {proxy_url.split('@')[-1]}") # Hide credentials in log
    try:
        # Use the proxy configured earlier
        async with session.get(target_url, proxy=proxy_url) as response:
            # Check if the request was successful
            response.raise_for_status() # Raises an exception for bad status codes (4xx or 5xx)

            html_content = await response.text()

            # Basic extraction: find lines containing product titles (usually in links within description class)
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]
            product_names = [line for line in product_lines if 'href' in line] # Filter for lines likely containing titles

            print(f"Successfully fetched data from {target_url}.")
            return product_names
    except aiohttp.ClientError as e:
        print(f"An error occurred while fetching {target_url}: {e}")
        return [] # Return empty list on error

# Main async function to manage the process
async def main():
    # Create a client session to manage connections
    async with aiohttp.ClientSession() as session:
        print("Client session started.")
        product_titles = await fetch_product_names(session)

        if product_titles:
            print("\n--- Extracted Product Title Lines ---")
            for title_line in product_titles:
                print(title_line)
            print("------------------------------------")
        else:
            print("No product titles extracted.")

        print("Client session closed.")

# Entry point to run the main async function
if __name__ == "__main__":
    print("Starting scraper...")
    asyncio.run(main())
    print("Scraper finished.")

If everything runs correctly, the script will connect through the specified Evomi proxy and print the HTML lines containing the product titles from the target page. This basic example demonstrates proxy integration; next, we'll explore more advanced techniques like rotation.

Advanced Aiohttp Proxy Strategies

The simple script works, but for serious scraping, relying on a single proxy IP isn't ideal. Let's enhance our script to handle multiple proxies and rotate them, significantly improving robustness and reducing the likelihood of blocks.

Managing and Rotating Multiple Proxies

We'll modify the code to use a list of proxies and select one randomly for each request. This distributes the load and makes the scraping pattern less predictable. We'll aim to scrape product names from the first few pages of the laptop category on our test site.

Step 1: Import Additional Libraries

We'll need the random library for selecting proxies and potentially re (regular expressions) or a parsing library like BeautifulSoup (recommended, but we'll stick to basic string methods for simplicity here, install with pip install beautifulsoup4 if you want to use it) for better data extraction.

import aiohttp
import asyncio
import random
# import re # If using regex for extraction
# from bs4 import BeautifulSoup # If using BeautifulSoup for parsing

Step 2: Define Your Proxy List

Create a list containing your proxy connection strings. We'll use Evomi's datacenter proxy endpoint format as an example. Datacenter proxies (starting at $0.30/GB with Evomi) can be cost-effective for some tasks, though residential might be needed for stricter sites.

# List of Evomi Datacenter Proxies (replace with your actual proxies)
# Format: http://username:password@hostname:port
proxy_list = [
    "http://user-dc1:pass123@dc.evomi.com:2000",
    "http://user-dc2:pass456@dc.evomi.com:2000",
    "http://user-dc3:pass789@dc.evomi.com:2000",
    "http://user-dc4:passabc@dc.evomi.com:2000",
    "http://user-dc5:passdef@dc.evomi.com:2000",
]

Reminder: Evomi datacenter proxies use ports 2000 (HTTP), 2001 (HTTPS), and 2002 (SOCKS5).

Step 3: Update the Fetching Function

Modify the function to accept a URL and a specific proxy from the list for each call. We'll also add error handling specific to proxy connections.

# Updated async function to fetch data from a specific URL using a specific proxy
async def fetch_page_data(session, page_url, proxy):
    # Log proxy host, hide credentials
    proxy_host = "N/A"
    if proxy:
        try:
            # Attempt to extract host safely
            proxy_host = proxy.split('@')[-1].split(':')[0]
        except IndexError:
            proxy_host = "Invalid Format" # Or handle as needed

    print(f"Fetching {page_url} using proxy {proxy_host}...")
    try:
        async with session.get(page_url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=15)) as response:
            response.raise_for_status() # Check for HTTP errors
            html_content = await response.text()

            # --- Extraction Logic ---
            # Replace this with more robust parsing (e.g., BeautifulSoup)
            product_lines = [line.strip() for line in html_content.splitlines() if 'class="title"' in line]
            product_names = [line for line in product_lines if 'href' in line]
            # --- End Extraction Logic ---

            print(f"Successfully fetched {page_url}")
            return product_names

    # Handle proxy-specific connection errors
    except aiohttp.ClientProxyConnectionError as e:
        print(f"Proxy Connection Error for {proxy_host}: {e}")
        return None # Indicate failure for this proxy/page
    # Handle other potential client errors (timeout, DNS issues, etc.)
    except aiohttp.ClientError as e:
        print(f"Client Error fetching {page_url} via {proxy_host}: {e}")
        return None
    # Handle timeouts specifically
    except asyncio.TimeoutError:
        print(f"Timeout fetching {page_url} via {proxy_host}")
        return None
  • The function now takes page_url and proxy as arguments.

  • random.choice(proxy_list) will be used in the main loop to pick a proxy.

  • We added specific error handling for aiohttp.ClientProxyConnectionError.

  • A timeout (e.g., 15 seconds) is added to prevent hanging on unresponsive proxies/servers.

  • The extraction logic remains basic; consider using BeautifulSoup for real-world scenarios:

    # Example with BeautifulSoup (install first: pip install beautifulsoup4)
    # from bs4 import BeautifulSoup
    # soup = BeautifulSoup(html_content, 'html.parser')
    # titles = [a['title'] for a in soup.select('a.title')]
    # return titles

Step 4: Update the Main Function

The main function will now manage the loop for multiple pages, select proxies randomly, and gather results.

# Updated main function for rotating proxies and multiple pages
async def main():
    base_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    # Let's try to scrape the first 3 pages (assuming pagination exists or modify URLs accordingly)
    # NOTE: This site might not have simple pagination; adjust target URLs as needed.
    # For this example, we'll just fetch the same page multiple times with different proxies.
    num_requests = 5  # Make 5 requests in total
    target_urls = [base_url] * num_requests  # Re-use the base URL for demo purposes
    all_results = {}  # Dictionary to store results per proxy

    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(num_requests):
            selected_proxy = random.choice(proxy_list)
            page_url = target_urls[i]  # In a real case, this would be page_url_1, page_url_2 etc.

            # Create an asyncio task for each request
            task = asyncio.create_task(fetch_page_data(session, page_url, selected_proxy))
            tasks.append((selected_proxy, task))  # Store proxy with its task

        # Wait for all tasks to complete
        results = await asyncio.gather(*(task for _, task in tasks))

        # Process results
        print("\n--- Scraping Results ---")
        for i, (selected_proxy, _) in enumerate(tasks):
            proxy_host = selected_proxy.split('@')[-1]
            if results[i] is not None:
                print(f"Proxy {proxy_host}: Successfully fetched {len(results[i])} items.")
                if proxy_host not in all_results:
                    all_results[proxy_host] = []
                all_results[proxy_host].extend(results[i])  # Append results for this proxy
            else:
                print(f"Proxy {proxy_host}: Failed to fetch data.")

    # Optional: Print combined results per proxy
    print("\n--- Combined Data Per Proxy ---")
    for proxy_host, data in all_results.items():
        print(f"\nData from Proxy: {proxy_host}")
        # Print first few items as example
        for item in data[:3]:
            print(f"  - {item}")
        if len(data) > 3:
            print(f"  ... and {len(data) - 3} more items")

    print("\nClient session closed.")
  • We define the number of requests/pages to fetch.

  • random.choice(proxy_list) selects a proxy for each request.

  • asyncio.create_task() creates tasks for concurrent execution.

  • asyncio.gather() runs all tasks concurrently and collects their results.

  • The results are processed, showing which proxy fetched what data (or if it failed).

Step 5: Run the Updated Code

Use the standard asyncio entry point:

# Run the main async function
if __name__ == "__main__":
    if not proxy_list:
        print("Error: Proxy list is empty. Please add proxies.")
    else:
        print("Starting multi-proxy scraper...")
        asyncio.run(main())
        print("Scraper finished.")

Step 6: The Complete Rotating Proxy Script

Here is the full code combining these changes:

import aiohttp
import asyncio
import random

# import re # Uncomment if using regex
# from bs4 import BeautifulSoup # Uncomment if using BeautifulSoup

# List of Evomi Datacenter Proxies (replace with your actual proxies)
proxy_list = [
    "http://user-dc1:pass123@dc.evomi.com:2000",
    "http://user-dc2:pass456@dc.evomi.com:2000",
    "http://user-dc3:pass789@dc.evomi.com:2000",
    "http://user-dc4:passabc@dc.evomi.com:2000",
    "http://user-dc5:passdef@dc.evomi.com:2000",
]


# Updated async function to fetch data from a specific URL using a specific proxy
async def fetch_page_data(session, page_url, proxy):
    proxy_host_for_log = proxy.split('@')[-1] if '@' in proxy else proxy  # Log proxy host, hide credentials
    print(f"Fetching {page_url} using proxy {proxy_host_for_log}...")
    try:
        # Increased timeout
        async with session.get(
            page_url, proxy=proxy, timeout=aiohttp.ClientTimeout(total=20)
        ) as response:
            response.raise_for_status()  # Check for HTTP errors
            html_content = await response.text()

            # --- Basic Extraction Logic ---
            product_lines = [
                line.strip()
                for line in html_content.splitlines()
                if 'class="title"' in line
            ]
            product_names = [line for line in product_lines if 'href' in line]
            # --- End Extraction Logic ---

            print(f"Successfully fetched {page_url} via {proxy_host_for_log}")
            return product_names

    except aiohttp.ClientProxyConnectionError as e:
        print(f"Proxy Connection Error for {proxy_host_for_log}: {e}")
        return None
    except aiohttp.ClientError as e:
        print(f"Client Error fetching {page_url} via {proxy_host_for_log}: {e}")
        return None
    except asyncio.TimeoutError:
        print(f"Timeout fetching {page_url} via {proxy_host_for_log}")
        return None


# Updated main function for rotating proxies and multiple pages
async def main():
    base_url = "http://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"
    num_requests = 5
    target_urls = [base_url] * num_requests
    all_results = {}

    # Use a TCPConnector to limit concurrent connections if needed
    # connector = aiohttp.TCPConnector(limit=10) # Limit to 10 concurrent connections per host
    # async with aiohttp.ClientSession(connector=connector) as session:
    async with aiohttp.ClientSession() as session:  # Default connector
        tasks = []
        # Select proxies for this run
        proxies_in_use = random.sample(proxy_list, min(num_requests, len(proxy_list)))

        for i in range(num_requests):
            # Cycle through the selected proxies if num_requests > len(proxies_in_use)
            selected_proxy = proxies_in_use[i % len(proxies_in_use)]
            page_url = target_urls[i]

            task = asyncio.create_task(
                fetch_page_data(session, page_url, selected_proxy)
            )
            tasks.append((selected_proxy, task))

        # Capture exceptions too
        results = await asyncio.gather(*(task for _, task in tasks), return_exceptions=True)

        print("\n--- Scraping Results ---")
        successful_fetches = 0
        failed_fetches = 0

        for i, (selected_proxy, _) in enumerate(tasks):
            proxy_host = selected_proxy.split('@')[-1] if '@' in selected_proxy else selected_proxy
            if isinstance(results[i], Exception):
                print(f"Proxy {proxy_host}: Task failed with exception: {results[i]}")
                failed_fetches += 1
            elif results[i] is not None:
                print(f"Proxy {proxy_host}: Successfully fetched {len(results[i])} items.")
                if proxy_host not in all_results:
                    all_results[proxy_host] = []
                all_results[proxy_host].extend(results[i])
                successful_fetches += 1
            else:
                # This case might happen if fetch_page_data returned None without an exception
                print(f"Proxy {proxy_host}: Failed to fetch data (returned None).")
                failed_fetches += 1

        print(f"\nSummary: {successful_fetches} successful fetches, {failed_fetches} failed fetches.")

    # Optional: Print combined results per proxy
    print("\n--- Combined Data Per Proxy (Sample) ---")
    for proxy_host, data in all_results.items():
        print(f"\nData from Proxy: {proxy_host} ({len(data)} total items)")
        for item in data[:3]:
            print(f"  - {item}")
        if len(data) > 3:
            print(f"  ... and {len(data) - 3} more items")

    print("\nClient session closed.")


# Run the main async function
if __name__ == "__main__":
    if not proxy_list:
        print("Error: Proxy list is empty. Please add proxies.")
    else:
        print(f"Starting multi-proxy scraper with {len(proxy_list)} proxies...")
        asyncio.run(main())
        print("Scraper finished.")

When you run this script, you'll see output indicating which proxy is being used for each request. The final summary will show the data collected via each proxy IP, demonstrating successful rotation. This approach significantly enhances the resilience of your scraper against IP-based blocks.

Handling Proxy Authentication

Our examples already incorporate proxy authentication directly within the proxy URL string:

http://USERNAME:PASSWORD@HOSTNAME:PORT

aiohttp automatically parses this format. When you pass a URL like "http://user-dc1:pass123@dc.evomi.com:2000" to the proxy parameter in session.get(), aiohttp handles the necessary Proxy-Authorization header for Basic Authentication.

Alternatively, if your proxy provider requires it, or if you simply prefer to keep credentials separate from the URL, you can use aiohttp.BasicAuth:

proxy_url_no_auth = "http://dc.evomi.com:2000"
auth = aiohttp.BasicAuth("user-dc1", "pass123")

# ... inside your async function ...
async with session.get(target_url, proxy=proxy_url_no_auth, proxy_auth=auth) as response:
    # ... rest of the code

Both methods achieve the same result. Using the embedded format is often more convenient when managing lists of proxies.
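
For instance, if you keep usernames and passwords in a separate structure, you can assemble the embedded format programmatically. This is a minimal sketch: the credentials list is hypothetical, the host and port follow the Evomi datacenter format used above, and urllib.parse.quote guards against special characters (such as @ or :) appearing in passwords.

from urllib.parse import quote

# Hypothetical credentials; replace with your own
credentials = [
    ("user-dc1", "pass123"),
    ("user-dc2", "pass456"),
]
proxy_host = "dc.evomi.com"
proxy_port = 2000

# Build embedded-auth proxy URLs, percent-encoding the credentials
proxy_list = [
    f"http://{quote(user, safe='')}:{quote(password, safe='')}@{proxy_host}:{proxy_port}"
    for user, password in credentials
]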

Securing Connections with SSL

When scraping sites over HTTPS or handling sensitive data, ensuring your connection is encrypted via SSL/TLS is vital. aiohttp handles SSL verification by default when connecting to HTTPS URLs.

Our examples used HTTP URLs (http://...). If you target HTTPS sites (https://...), aiohttp automatically performs an SSL handshake. By default, it verifies the server's SSL certificate against a trusted set of Certificate Authorities (CAs), typically your operating system's CA store or a bundle such as certifi if your environment provides one.

You generally don't need to manually configure SSL unless:

  1. You need to trust a self-signed certificate (common in testing environments).

  2. You want to disable SSL verification (strongly discouraged for production as it opens you to man-in-the-middle attacks).

  3. You need to specify a particular set of CAs.

To customize SSL behavior, you create an ssl.SSLContext:

import ssl

# Create a default SSL context (recommended starting point)
ssl_context = ssl.create_default_context()

# Example: Load custom CA bundle (if needed)
# ssl_context.load_verify_locations(cafile='/path/to/custom/ca.crt')

# Example: Disable verification (DANGEROUS - for testing only)
# ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
# ssl_context.check_hostname = False
# ssl_context.verify_mode = ssl.CERT_NONE

# Pass the context to the session.get call
async with session.get(https_url, proxy=proxy, ssl=ssl_context) as response:
    # ...

For most scraping tasks involving standard HTTPS websites, the default SSL handling in aiohttp is sufficient and secure.
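
If you'd rather configure TLS once for the whole session instead of per request, you can attach the context to the session's connector. A minimal sketch reusing the ssl_context from above (https_url and proxy are placeholders, as in the previous snippet):

# Apply the SSL context to every request made through this session
connector = aiohttp.TCPConnector(ssl=ssl_context)

# ... inside your async function ...
async with aiohttp.ClientSession(connector=connector) as session:
    async with session.get(https_url, proxy=proxy) as response:
        html = await response.text()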

Best Practices for Aiohttp Proxy Usage

You've now got the technical skills to integrate and rotate proxies with aiohttp. To maximize effectiveness and minimize disruptions, consider these best practices:

Tips for Staying Under the Radar

  • Choose Quality Proxies: Not all proxies are equal. Opt for reputable providers like Evomi, known for reliable and ethically sourced residential or mobile proxies. These blend in better with normal user traffic compared to datacenter IPs, especially on stricter websites. Our Swiss base also reflects a commitment to quality and privacy. You can always verify proxy performance using tools like our free Proxy Tester.

  • Implement Smart Rotation: Don't just use multiple proxies; rotate them intelligently. Avoid hitting the same domain repeatedly with the same IP in a short period. The random rotation shown earlier is a good start. For large-scale scraping, consider session-based rotation (keeping one IP for a user's "session" on a site) or geographic targeting if needed; a minimal round-robin rotator with sticky sessions is sketched after this list.

  • Mimic Human Behavior: Automation is fast; humans aren't always. Introduce random delays between requests to avoid predictable, machine-like patterns, for example:

    # Inside your async loop, before each request (random and asyncio are already imported above)
    sleep_time = random.uniform(1.5, 4.5)  # wait between 1.5 and 4.5 seconds
    print(f"Sleeping for {sleep_time:.2f} seconds...")
    await asyncio.sleep(sleep_time)  # non-blocking pause; now make the request
  • Manage Headers and Fingerprints: Send realistic User-Agent strings and other HTTP headers that match common browsers (see the header example after this list). Be aware of browser fingerprinting techniques websites might use. Tools like Evomi's Browser Fingerprint Checker can show what sites see, and our antidetect browser, Evomium (free for customers), is designed to manage these fingerprints effectively.

  • Respect robots.txt: While not technically related to anonymity, respecting a site's robots.txt file (which outlines scraping rules) is good practice and can prevent legal or ethical issues.

  • Use SSL/TLS Correctly: Always use HTTPS where available and ensure SSL verification is enabled unless you have a very specific, understood reason to disable it.

  • Handle CAPTCHAs Gracefully: If you encounter CAPTCHAs, integrate a solving service. Don't just give up or hammer the site. Check out options in our review of top CAPTCHA solvers.
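
As a concrete starting point for the rotation tip above, here is a minimal round-robin rotator with optional "sticky" sessions. It is a sketch that assumes the proxy_list defined earlier; the helper names (next_proxy, proxy_for_session) are illustrative, not part of aiohttp.

import itertools
import random

# Shuffle once, then cycle through the pool in a fixed order
rotation = itertools.cycle(random.sample(proxy_list, len(proxy_list)))

def next_proxy():
    # Return the next proxy in the round-robin rotation
    return next(rotation)

# Sticky sessions: keep the same IP for all requests in one logical "session"
session_proxies = {}

def proxy_for_session(session_id):
    if session_id not in session_proxies:
        session_proxies[session_id] = next_proxy()
    return session_proxies[session_id]

And for the headers tip, a browser-like header set can be applied once on the session so that every request carries it. The values below are illustrative; combine this with the proxy settings shown earlier as needed.

import aiohttp
import asyncio

# Illustrative browser-like headers; keep them consistent with one real browser
browser_headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

async def demo():
    # Headers passed here are sent with every request made through the session
    async with aiohttp.ClientSession(headers=browser_headers) as session:
        async with session.get("http://webscraper.io/test-sites/e-commerce/allinone") as response:
            print(response.status)

asyncio.run(demo())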

Common Problems and Fixes

Even with best practices, you might hit snags. Here are common aiohttp proxy-related errors and how to approach them:

  • aiohttp.ClientProxyConnectionError: This usually means your script couldn't even reach the proxy server.

    • Check Proxy Details: Double-check the IP/hostname, port, username, and password. Typos are common!

    • Verify Proxy Status: Is the proxy online and working? Use a tool like Evomi's Proxy Tester, or run a quick check from your terminal: curl -x http://user:pass@proxy:port http://example.com.

    • Firewall Issues: Ensure no local or network firewall is blocking the connection to the proxy port.

  • aiohttp.ClientHttpProxyError / Status Code 407 Proxy Authentication Required: The connection to the proxy worked, but authentication failed.

    • Check Credentials: Verify the username and password again.

    • Authentication Format: Ensure you're using the correct authentication method (Basic Auth is common, handled by the URL format or aiohttp.BasicAuth). Check your provider's documentation.

    • IP Authorization: Some providers require you to authorize the IP address *from which* you are connecting to the proxy. Check your Evomi dashboard or provider's settings.

  • aiohttp.ClientHttpProxyError / Other 4xx/5xx Status Codes from Proxy: The proxy responded, but with an error (e.g., 403 Forbidden, 502 Bad Gateway).

    • Proxy Restrictions: The proxy itself might be blocked from accessing the target site, or it might have internal issues. Try a different proxy from your pool (a retry helper that does this automatically is sketched after this list).

    • Provider Issue: There might be a temporary problem with the proxy service. Check the provider's status page or contact support.

  • asyncio.TimeoutError or aiohttp.ServerTimeoutError: The request took too long.

    • Increase Timeout: The default timeout might be too short for slow proxies or target sites. Increase it in aiohttp.ClientTimeout(total=...) passed to the request or session.

    • Proxy Performance: The specific proxy might be slow or overloaded. Rotate to a different one.

    • Target Server Slow: The website you're scraping might be slow to respond.

  • aiohttp.ClientSSLError: An issue occurred during the SSL handshake with the *target* server (when using HTTPS).

    • Outdated Certificates: Ensure your system's CA certificates (often managed by `certifi`) are up to date (`pip install --upgrade certifi`).

    • Server Configuration: The target website might have an invalid or misconfigured SSL certificate. You might need to investigate further or, as a last resort (and if you understand the risks), customize the SSL context to be less strict (see SSL section above).

    • Proxy Interference (Less Common): Some proxies (especially transparent ones, not typically used for scraping this way) might interfere with SSL. Ensure you're using appropriate HTTP/S or SOCKS proxies designed for this.
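
To act on several of the fixes above automatically (swap proxies, back off, retry), you can wrap the earlier fetch function in a small retry helper. This is a sketch that reuses fetch_page_data and proxy_list from the rotating-proxy script; fetch_with_retries and max_attempts are illustrative names, not aiohttp APIs.

import asyncio
import random

async def fetch_with_retries(session, page_url, max_attempts=3):
    # Try up to max_attempts different proxies before giving up
    for attempt in range(1, max_attempts + 1):
        proxy = random.choice(proxy_list)  # pick a fresh proxy each attempt
        result = await fetch_page_data(session, page_url, proxy)
        if result is not None:
            return result
        # Small, growing backoff before retrying through a different IP
        await asyncio.sleep(random.uniform(1.0, 3.0) * attempt)
    print(f"Giving up on {page_url} after {max_attempts} attempts")
    return None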

Wrapping Up

Hopefully, this guide provides a solid foundation for using `aiohttp` with proxies for your web scraping projects. The asynchronous nature of `aiohttp` offers significant performance benefits, while proxies provide the necessary means to scrape responsibly and avoid interruptions. Remember that successful scraping often involves combining the right tools (`aiohttp`, quality proxies like those from Evomi) with smart strategies (rotation, delays, header management).

Python's ecosystem offers many tools for web scraping beyond `aiohttp`. To explore other options, take a look at our overview of the best Python web scraping libraries.

Author

David Foster

Proxy & Network Security Analyst

About Author

David is an expert in network security, web scraping, and proxy technologies, helping businesses optimize data extraction while maintaining privacy and efficiency. With a deep understanding of residential, datacenter, and rotating proxies, he explores how proxies enhance cybersecurity, bypass geo-restrictions, and power large-scale web scraping. David’s insights help businesses and developers choose the right proxy solutions for SEO monitoring, competitive intelligence, and anonymous browsing.

