Scraping Amazon Data at Scale: Proxy-Based Solutions

David Foster

Last edited on May 4, 2025

Scraping Techniques

The Scalability Hurdle in Web Scraping

So, you've written some slick code to grab data from a website. It works perfectly for one page, maybe ten. But what happens when you need data from thousands, or even hundreds of thousands, of pages? Scaling up is often where web scraping projects hit a wall. It's arguably one of the trickiest parts of the whole process.

The question that inevitably pops up is: "How do I stop my IP address from getting blocked?" Sending too many requests too quickly from the same IP is a surefire way to get flagged and denied access.

Thankfully, there's a well-established and highly effective solution that's also relatively straightforward to implement: using proxies.

Why Proxies are Your Best Friend for Amazon Scraping

Think of a proxy server as an intermediary. It sits between your computer and the website you're trying to access (like Amazon). When you send a request, it goes to the proxy first. The proxy then forwards your request to Amazon using its own IP address, receives the response, and sends it back to you.

This simple mechanism is incredibly powerful for scaling. By distributing your requests across many proxy IPs, no single address sends enough traffic to trigger Amazon's rate limits or bot detection. This dramatically reduces the chances of getting blocked and lets you gather data faster and more reliably. Proxies offer a practical and cost-effective way to overcome the IP ban challenge when scraping at scale.

Getting Started: A Practical Example

Let's dive into a concrete example. A common reason to scrape Amazon is for competitor analysis – checking prices, product descriptions, ratings, etc. We can achieve this with a bit of Python code.

First, we'll need a few popular Python libraries. If you don't have them installed, you can add them with pip (e.g., pip install requests beautifulsoup4 lxml).

# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import csv  # We'll use this later to handle product IDs

A crucial step often missed by beginners is using a requests.Session object. Why? Sessions allow you to persist certain parameters across requests, like cookies and headers. More importantly for performance, they reuse the underlying network connection (TCP connection), making multiple requests faster than establishing a new connection each time. We'll also use this session object to configure our proxy settings.
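
To make the benefit concrete, here is a minimal sketch contrasting standalone requests with a session. The httpbin.org endpoint is just a public test service used for illustration, not part of the scraper:

import requests

# Without a session: every call negotiates a new TCP/TLS connection
for _ in range(3):
    requests.get("https://httpbin.org/get", timeout=10)

# With a session: headers persist and the underlying connection is reused
session = requests.Session()
session.headers.update({"Accept-Language": "en-US,en;q=0.9"})
for _ in range(3):
    response = session.get("https://httpbin.org/get", timeout=10)
    print(response.status_code)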

Integrating Proxies Seamlessly

Here’s where Evomi proxies come into play. We can add our proxy details directly to the requests.Session object. This way, every request made using this session will automatically be routed through the proxy.

Evomi provides you with proxy credentials typically in a format like username:password@endpoint:port. For instance, if you're using our rotating residential proxies (which are excellent for tasks like this as they automatically change the IP address), the setup in Python looks something like this:

# Basic structure with session setup
import requests
from bs4 import BeautifulSoup


# Placeholder functions we'll define later
def load_asins_from_csv(filepath):
    pass


def fetch_product_page(session, asin):
    pass


def extract_product_info(html_content, asin):
    pass


def run_scraper():
    # Initialize the session
    session = requests.Session()

    # Set standard headers to mimic a real browser
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Upgrade-Insecure-Requests': '1'
    })

    # Configure Evomi proxies (replace with your actual credentials)
    # Using residential proxies (rp.evomi.com) via HTTP (port 1000) as an example
    proxy_user = 'YOUR_USERNAME'
    proxy_pass = 'YOUR_PASSWORD'
    proxy_host = 'rp.evomi.com'
    proxy_port = '1000'  # HTTP port for residential
    proxies = {
        'http': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}',
        'https': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}'  # Often the same for HTTP/S setup
    }
    session.proxies.update(proxies)

    print("Session configured with headers and proxies:")
    # print(session.headers) # Uncomment to verify headers
    # print(session.proxies) # Uncomment to verify proxies

    # --- Add scraping logic here ---


# Entry point for the script
if __name__ == "__main__":
    run_scraper()

Note the headers like User-Agent. These are important! Websites like Amazon often check headers to filter out basic bots. Using realistic headers increases your chances of success. We've configured an Evomi residential proxy here; these IPs rotate automatically, giving you a fresh identity for different requests, which is ideal for avoiding detection.
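
Before pointing the session at Amazon, it can be worth a quick sanity check that requests really are leaving through the proxy and that the IP rotates. One way, sketched below, is to hit a public IP-echo service (api.ipify.org is used here purely as an example) a few times and compare the reported addresses:

# Rough rotation check: each request through a rotating proxy should report a different exit IP
def check_proxy_rotation(session, attempts=3):
    for i in range(attempts):
        try:
            ip = session.get("https://api.ipify.org", timeout=10).text
            print(f"Attempt {i + 1}: exit IP is {ip}")
        except requests.exceptions.RequestException as e:
            print(f"Attempt {i + 1} failed: {e}")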

Handling Amazon Product Identifiers (ASINs)

To target specific products on Amazon, we use their unique identifier: the ASIN (Amazon Standard Identification Number). You can usually find it in the product's URL or the 'Product Details' section on the page.

[Image: Diagram showing where to find an Amazon ASIN on a product page]
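
If your input is a list of product URLs rather than ASINs, a small helper can usually pull the identifier out of the /dp/ or /gp/product/ part of the path. This is a rough heuristic (ASINs are normally ten alphanumeric characters), so treat it as a starting point:

import re

# Extract an ASIN from a typical Amazon product URL (best-effort heuristic)
def asin_from_url(url):
    match = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})", url)
    return match.group(1) if match else None

print(asin_from_url("https://www.amazon.com/dp/B081FGTPB7?ref=example"))  # -> B081FGTPB7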

For scraping multiple products, you'll likely have a list of ASINs, perhaps in a CSV file. Here's a simple function to read ASINs from a CSV (assuming one ASIN per row in the first column):

import csv

# Function to load ASINs from a CSV file
def load_asins_from_csv(filepath):
    asin_list = []
    try:
        with open(filepath, mode='r', newline='', encoding='utf-8') as csvfile:
            reader = csv.reader(csvfile)
            # Skip header row if present (optional)
            # next(reader, None)
            for row in reader:
                if row: # Ensure row is not empty
                    asin_list.append(row[0].strip())
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
    except Exception as e:
        print(f"An error occurred reading the CSV: {e}")
    return asin_list

Next, we need a function to make the actual request using our configured session. We pass the session object and the ASIN to construct the target URL.

# Function to fetch the product page HTML
def fetch_product_page(session, asin):
    # Construct the URL for the Amazon product page (using amazon.com)
    product_url = f"https://www.amazon.com/dp/{asin}"
    try:
        response = session.get(product_url, timeout=15) # Added timeout
        response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)
        return response.text, asin # Return HTML content and ASIN
    except requests.exceptions.RequestException as e:
        print(f"Request failed for ASIN {asin}: {e}")
        return None, asin

Returning the HTML content along with the ASIN helps keep track of which data belongs to which product, especially if errors occur.

Extracting the Data You Need

Once we have the HTML content, we need to parse it to extract the specific pieces of information we want (like product title and price). BeautifulSoup combined with CSS selectors is a great way to do this. CSS selectors provide a concise way to pinpoint elements on a web page.
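
If CSS selectors are new to you, here is a tiny, self-contained illustration. The HTML fragment is invented purely for the example; it mimics the span#productTitle element we target below:

from bs4 import BeautifulSoup

# A made-up fragment that mimics Amazon's product title markup
html = '<span id="productTitle"> Example Product Name </span>'
soup = BeautifulSoup(html, 'lxml')

# 'span#productTitle' means: a <span> element whose id attribute is "productTitle"
element = soup.select_one('span#productTitle')
print(element.get_text(strip=True))  # -> Example Product Name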

# Function to parse HTML and extract product data
def extract_product_info(html_content, asin):
    if not html_content:
        return None
    soup = BeautifulSoup(html_content, 'lxml') # Using lxml parser
    product_data = {'asin': asin, 'title': None, 'price': None}
    try:
        # Extract product title (selector might need adjustment based on page structure)
        title_element = soup.select_one('span#productTitle')
        if title_element:
            product_data['title'] = title_element.get_text(strip=True)

        # Extract price (this selector often works, but can vary)
        # It looks for common price patterns like elements with class 'a-offscreen'
        # or specific price block elements.
        price_element = soup.select_one('span.a-price > span.a-offscreen')
        if not price_element: # Try alternative common selector
            price_element = soup.select_one('span#priceblock_ourprice')
        if not price_element: # Another alternative
            price_element = soup.select_one('span#price_inside_buybox')

        if price_element:
            product_data['price'] = price_element.get_text(strip=True)
    except Exception as e:
        print(f"Error parsing data for ASIN {asin}: {e}")

    # Basic validation: only return data if title and price were found
    if product_data['title'] and product_data['price']:
        return product_data
    else:
        print(f"Could not extract complete data for ASIN {asin}")
        return None # Return None if essential data is missing

We store the extracted data in a Python dictionary. It's good practice to include some error handling (like the try...except block) because website structures can change, or elements might be missing on certain pages.

Putting It All Together and Testing

Now, let's update our main execution function (run_scraper) to tie everything together. We'll load the ASINs, loop through them, fetch each page, parse the data, and print the results.

Crucially, always test your scraper on a *small* number of ASINs first! Don't immediately unleash it on thousands. This helps you catch errors in your selectors or logic without wasting resources or potentially getting blocked during debugging.
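
One easy way to enforce this while developing is to cap the list right after loading it, for example with a temporary slice (the limit of 3 below is arbitrary):

# Temporary cap for trial runs; remove once the selectors are verified
TEST_LIMIT = 3
asins_to_process = asins_to_process[:TEST_LIMIT]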

# Updated main function to run the scraper
def run_scraper():
    # Session and proxy setup (same configuration as shown earlier)
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Upgrade-Insecure-Requests': '1'
    })
    proxy_user = 'YOUR_USERNAME'
    proxy_pass = 'YOUR_PASSWORD'
    proxy_host = 'rp.evomi.com'
    proxy_port = '1000'
    proxies = {
        'http': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}',
        'https': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}'
    }
    session.proxies.update(proxies)

    # --- Scraping Logic ---
    asin_file = 'asins_to_scrape.csv' # Name of your CSV file
    asins_to_process = load_asins_from_csv(asin_file)

    if not asins_to_process:
        print("No ASINs loaded. Exiting.")
        return

    print(f"Loaded {len(asins_to_process)} ASINs. Starting scraping...")
    results = []
    for asin in asins_to_process:
        print(f"Processing ASIN: {asin}")
        html_content, fetched_asin = fetch_product_page(session, asin)
        if html_content:
            product_info = extract_product_info(html_content, fetched_asin)
            if product_info:
                print(f"Successfully extracted: {product_info}")
                results.append(product_info)
            else:
                print(f"Failed to extract data for ASIN: {asin}")
        else:
            print(f"Failed to fetch page for ASIN: {asin}")

        # Optional: Add a small delay between requests to be polite
        # import time
        # time.sleep(1) # Sleep for 1 second

    print("\nScraping complete.")
    print(f"Successfully extracted data for {len(results)} products.")
    # Here you would typically save 'results' to a file (CSV, JSON, database, etc.)
    # print(results)

# Entry point
if __name__ == "__main__":
    run_scraper()

Create a file named asins_to_scrape.csv in the same directory as your script, and add a few test ASINs, one per line (e.g., B081FGTPB7, B07VGRJDFY).
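
If you would rather generate that test file from Python than create it by hand, a short snippet like this does the job (the ASINs are the same example values mentioned above):

import csv

# Write a small test file with one ASIN per row
test_asins = ["B081FGTPB7", "B07VGRJDFY"]
with open("asins_to_scrape.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for asin in test_asins:
        writer.writerow([asin])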

Complete Example Code

Here is the full Python script incorporating all the parts discussed. This provides a solid foundation for scraping Amazon product data using Evomi proxies.

import requests
from bs4 import BeautifulSoup
import csv
import time # Optional: for adding delays


# Function to load ASINs from a CSV file
def load_asins_from_csv(filepath):
    asin_list = []
    try:
        with open(filepath, mode='r', newline='', encoding='utf-8') as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                if row: # Ensure row is not empty
                    asin_list.append(row[0].strip())
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
    except Exception as e:
        print(f"An error occurred reading the CSV: {e}")
    return asin_list


# Function to fetch the product page HTML
def fetch_product_page(session, asin):
    product_url = f"https://www.amazon.com/dp/{asin}"
    try:
        response = session.get(product_url, timeout=15)
        response.raise_for_status() # Check for HTTP errors
        print(f"Request successful for {asin} (Status: {response.status_code})")
        return response.text, asin
    except requests.exceptions.Timeout:
        print(f"Request timed out for ASIN {asin}")
        return None, asin
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error for ASIN {asin}: {e.response.status_code}")
        return None, asin
    except requests.exceptions.RequestException as e:
        print(f"Request failed for ASIN {asin}: {e}")
        return None, asin


# Function to parse HTML and extract product data
def extract_product_info(html_content, asin):
    if not html_content:
        return None

    soup = BeautifulSoup(html_content, 'lxml')
    product_data = {'asin': asin, 'title': None, 'price': None}

    try:
        title_element = soup.select_one('span#productTitle')
        if title_element:
            product_data['title'] = title_element.get_text(strip=True)

        # Try common price selectors sequentially
        price_element = soup.select_one('span.a-price > span.a-offscreen')
        if not price_element:
            price_element = soup.select_one('span#priceblock_ourprice') # Older layout?
        if not price_element:
            price_element = soup.select_one('span#price_inside_buybox') # Inside buy box?
        # Add more selectors here if needed based on page variations

        if price_element:
            product_data['price'] = price_element.get_text(strip=True)
        else:
            # If no price found, try getting text from a broader price container
            price_container = soup.select_one('div#corePrice_feature_div span.a-price-whole')
            if price_container:
                price_fraction = soup.select_one('div#corePrice_feature_div span.a-price-fraction')
                currency_symbol = soup.select_one('div#corePrice_feature_div span.a-price-symbol')
                whole = price_container.get_text(strip=True)
                fraction = price_fraction.get_text(strip=True) if price_fraction else '00'
                symbol = currency_symbol.get_text(strip=True) if currency_symbol else '$'  # Default symbol
                product_data['price'] = f"{symbol}{whole}.{fraction}"

    except Exception as e:
        print(f"Error parsing data for ASIN {asin}: {e}")

    if product_data['title'] and product_data['price']:
        return product_data
    else:
        missing = []
        if not product_data['title']:
            missing.append("title")
        if not product_data['price']:
            missing.append("price")
        print(f"Could not extract ({', '.join(missing)}) for ASIN {asin}")
        return None


# Main execution function
def run_scraper():
    # --- Evomi Proxy Configuration ---
    # Replace with your actual Evomi credentials and desired proxy type/port
    proxy_user = 'YOUR_USERNAME'
    proxy_pass = 'YOUR_PASSWORD'
    proxy_host = 'rp.evomi.com'  # Example: Residential endpoint
    proxy_port = '1000'        # Example: HTTP port for residential

    proxies = {
        'http': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}',
        'https': f'http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}'
    }

    # --- Session Setup ---
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Upgrade-Insecure-Requests': '1',
        'Referer': 'https://www.google.com/' # Add a referer
    })
    session.proxies.update(proxies)
    print("Session configured. Proxies enabled.")

    # --- Scraping Logic ---
    asin_file = 'asins_to_scrape.csv' # Your input file
    asins_to_process = load_asins_from_csv(asin_file)

    if not asins_to_process:
        print("No ASINs loaded or file not found. Exiting.")
        return

    print(f"Loaded {len(asins_to_process)} ASINs. Starting scraping...")
    results = []
    processed_count = 0
    for asin in asins_to_process:
        processed_count += 1
        print(f"\n[{processed_count}/{len(asins_to_process)}] Processing ASIN: {asin}")
        html_content, fetched_asin = fetch_product_page(session, asin)
        if html_content:
            product_info = extract_product_info(html_content, fetched_asin)
            if product_info:
                print(f"--> Success: Extracted {product_info['title']} - {product_info['price']}")
                results.append(product_info)
            # else: (Error message already printed in extract_product_info)
        # else: (Error message already printed in fetch_product_page)

        # Optional delay between requests (uncomment, and add "import random" at the top of the script)
        # time.sleep(random.uniform(1, 3))  # Random delay of 1-3 seconds

    # --- Output Results ---
    print("\n--------------------")
    print("Scraping complete.")
    print(f"Successfully extracted data for {len(results)} out of {len(asins_to_process)} products.")
    print("--------------------")

    # Example: Save results to a new CSV file
    if results:
        output_file = 'amazon_product_data.csv'
        try:
            with open(output_file, mode='w', newline='', encoding='utf-8') as outfile:
                writer = csv.DictWriter(outfile, fieldnames=['asin', 'title', 'price'])
                writer.writeheader()
                writer.writerows(results)
            print(f"Results saved to {output_file}")
        except Exception as e:
            print(f"Error writing results to CSV: {e}")
    else:
        print("No data extracted to save.")


# Script entry point
if __name__ == "__main__":
    run_scraper()

Final Thoughts on Scaling Your Scraping

Incorporating proxies into your web scraping toolkit is a fundamental step for scaling operations and navigating around IP blocks. As demonstrated, integrating Evomi's proxies, particularly our residential proxies which offer automatic rotation, is quite straightforward within a Python script using the requests library.

This method significantly boosts the robustness of your Amazon data gathering efforts. Remember that successful scraping isn't just about code; it's also about using the right tools. Evomi provides ethically sourced, reliable proxies (backed by Swiss quality standards) at competitive price points, ensuring your scraping projects run smoothly and effectively. Give your scraping project the edge it needs!

Author

David Foster

Proxy & Network Security Analyst

About Author

David is an expert in network security, web scraping, and proxy technologies, helping businesses optimize data extraction while maintaining privacy and efficiency. With a deep understanding of residential, datacenter, and rotating proxies, he explores how proxies enhance cybersecurity, bypass geo-restrictions, and power large-scale web scraping. David’s insights help businesses and developers choose the right proxy solutions for SEO monitoring, competitive intelligence, and anonymous browsing.
