Scrape in Python with Undetected ChromeDriver & Proxies
Stealthy Web Scraping: Combining Python, Undetected ChromeDriver, and Proxies
Selenium is a heavyweight champion in the world of browser automation and web scraping. Typically, it works hand-in-glove with standard browser drivers, like the official ChromeDriver for Google Chrome. While effective, these standard drivers aren't exactly invisible. Savvy websites have developed methods to detect automation scripts using Selenium, often leading to blocks or CAPTCHAs.
The main giveaway? Standard drivers tend to leak specific automation-related properties. Enter Undetected ChromeDriver, a clever Python library designed specifically to patch these leaks. By modifying ChromeDriver, it significantly lowers the chances of your script being flagged as a bot.
Integrating Undetected ChromeDriver into your Selenium web scraping projects can be a game-changer. It often leads to more successful data gathering and can even cut down on costs by reducing the need for aggressive IP rotation (though proxies remain crucial!). Since it's an open-source library, adding Undetected ChromeDriver is a smart move for anyone serious about web scraping with Selenium.
Getting Started: Installation
Like any Python package, you'll need to install Undetected ChromeDriver first. Fire up your terminal or command prompt and run:
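pip install undetected-chromedriver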
One handy thing: you don't need to install Selenium separately. Undetected ChromeDriver pulls it in as a dependency, along with other necessary bits.
Another neat feature: Undetected ChromeDriver automatically downloads a compatible ChromeDriver binary for you. This saves you the hassle of manually finding and matching driver versions, a common task in older Selenium setups.
With the installation complete, you can import the library into your Python script:
import undetected_chromedriver as uc
Using Undetected ChromeDriver: A Practical Guide
Making Basic Web Requests
Fetching a webpage (making a GET request) is fundamental to web scraping. With Undetected ChromeDriver, the process is very similar to standard Selenium, but under the hood, it's doing more to stay hidden.
import undetected_chromedriver as uc
import time

def fetch_webpage(target_url):
    # Initialize the Undetected ChromeDriver
    # Use the 'options' argument if you need custom configurations later
    browser = uc.Chrome()
    try:
        # Navigate to the desired URL
        print(f"Attempting to load: {target_url}")
        browser.get(target_url)
        # Pause briefly to observe the browser (optional)
        print("Page loaded. Pausing for 5 seconds...")
        time.sleep(5)  # Keep the browser open for a few seconds
        print("Finished.")
    finally:
        # Ensure the browser is closed even if errors occur
        browser.quit()

# Example: try accessing a site known for bot detection
fetch_webpage('https://nowsecure.nl')  # A site that tests browser fingerprinting
In this script, we define a function fetch_webpage. It initializes an instance of the modified Chrome browser using uc.Chrome(), then navigates to the specified URL with the familiar get() method.
We've added a time.sleep(5) just so you can see the browser window open and load the page before it automatically closes. The try...finally block ensures the browser closes properly, even if something goes wrong during the page load.
We're using nowsecure.nl as an example target because it actively checks for bot-like browser properties. Standard Selenium might struggle here, making it a decent test for Undetected ChromeDriver's capabilities.
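Fixed sleeps are fine for demos, but they waste time on fast pages and time out on slow ones. Selenium's explicit waits, available through the Selenium dependency that Undetected ChromeDriver pulls in, are a sturdier option. Here is a minimal sketch, assuming that the presence of the <body> element is an adequate readiness signal for your target (the function name and timeout are our own illustrative choices):

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def fetch_when_ready(target_url, timeout=15):
    browser = uc.Chrome()
    try:
        browser.get(target_url)
        # Block until the <body> element exists, up to 'timeout' seconds
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        print("Page is ready.")
    finally:
        browser.quit()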
Capturing Website Source Code
Simply visiting a page isn't enough for scraping; you need the underlying HTML content. You can then feed this HTML to a parsing library like Beautiful Soup 4 to extract the data you need.
import undetected_chromedriver as uc
import time

def get_page_html(target_url):
    # Initialize the Undetected ChromeDriver
    browser = uc.Chrome()
    html_source = None  # Variable to store the HTML
    try:
        # Navigate to the URL
        print(f"Fetching HTML from: {target_url}")
        browser.get(target_url)
        # Wait a moment for potential dynamic content loading (adjust as needed)
        time.sleep(2)
        # Get the page source HTML
        html_source = browser.page_source
        print("Successfully retrieved HTML source.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Close the browser
        browser.quit()
    # Return the captured HTML
    return html_source

# Example usage:
page_content = get_page_html('https://nowsecure.nl')
if page_content:
    # Print the first 500 characters as a sample
    print("\n--- HTML Source (First 500 chars) ---")
    print(page_content[:500])
    print("...")
else:
    print("Failed to retrieve page content.")
This version builds on the previous one. The key addition is html_source = browser.page_source, which grabs the full HTML source of the currently loaded page and stores it in the html_source variable. The function then returns this HTML after closing the browser.
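From here you can hand the returned HTML to a parser. A minimal sketch using Beautiful Soup 4 (installed separately with pip install beautifulsoup4; the title and link extraction are just illustrations):

from bs4 import BeautifulSoup

html = get_page_html('https://nowsecure.nl')
if html:
    soup = BeautifulSoup(html, 'html.parser')
    # Example: print the page title and count the links
    print(soup.title.string if soup.title else "No <title> found")
    links = [a.get('href') for a in soup.find_all('a', href=True)]
    print(f"Found {len(links)} links.")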
Customizing Browser Behavior
Undetected ChromeDriver aims for stealth out-of-the-box, but sometimes you need to tweak its settings for specific scraping tasks or further optimization.
Keep in mind that altering default settings requires careful testing. Some options might inadvertently make your browser *more* detectable, while others might be necessary for specific scenarios (like running without a visible browser window).
import undetected_chromedriver as uc
import time

def fetch_with_options(target_url):
    # Create a ChromeOptions object
    options = uc.ChromeOptions()
    # Example: run in headless mode (no visible browser window)
    # Note: headless detection is sophisticated; test thoroughly!
    # '--headless=new' is the flag for modern Chrome versions
    options.add_argument('--headless=new')
    # Example: disable loading images (can speed up scraping)
    # options.add_argument('--blink-settings=imagesEnabled=false')
    # Initialize the driver with the specified options
    browser = uc.Chrome(options=options)
    html_source = None
    try:
        print(f"Fetching (with options) from: {target_url}")
        browser.get(target_url)
        time.sleep(2)  # Allow time for page load in headless mode
        html_source = browser.page_source
        print("Successfully retrieved HTML source in headless mode.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        browser.quit()
    return html_source

# Example usage:
page_content = fetch_with_options('https://nowsecure.nl')
if page_content:
    print("\nRetrieved content successfully (headless).")
    # print(page_content[:500])  # Optionally print content
else:
    print("Failed to retrieve page content (headless).")
Here, we introduce uc.ChromeOptions(). We create an options object, add arguments like --headless=new to run the browser without a GUI, and pass the options when creating the uc.Chrome instance.
Many other options exist, like disabling image loading or setting window sizes. One particularly powerful option is customizing the User-Agent string.
The User-Agent is a piece of information your browser sends with every request, identifying itself (browser type, version, OS, etc.). Changing the User-Agent can sometimes help blend in better, especially if the default doesn't match common user profiles.
import undetected_chromedriver as uc
import time
from selenium.webdriver.common.by import By

def fetch_with_custom_ua(target_url):
    options = uc.ChromeOptions()
    options.add_argument('--headless=new')
    # Define a custom User-Agent string
    custom_user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/119.0.0.0 Safari/537.36')
    options.add_argument(f'--user-agent={custom_user_agent}')
    browser = uc.Chrome(options=options)
    html_source = None
    try:
        # Optional: verify the User-Agent the browser instance is actually sending
        browser.get('https://httpbin.org/user-agent')  # A site that echoes the UA
        time.sleep(1)
        ua_info = browser.find_element(By.TAG_NAME, "pre").text
        print(f"User-Agent seen by httpbin: {ua_info}")
        # Now navigate to the actual target
        print(f"Fetching (with custom UA) from: {target_url}")
        browser.get(target_url)
        time.sleep(2)
        html_source = browser.page_source
        print("Successfully retrieved HTML source with custom User-Agent.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        browser.quit()
    return html_source

# Example usage:
page_content = fetch_with_custom_ua('https://nowsecure.nl')
# Check results...
We added another argument to set the --user-agent. To confirm it's working, we first visit https://httpbin.org/user-agent, which simply reports back the headers it received, letting us see the User-Agent our browser actually sends before proceeding to the main target.
You can experiment with different user agents or even rotate through a list of common, up-to-date ones. Other useful options include specifying browser versions or even providing a path to a specific ChromeDriver binary (though usually unnecessary).
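To rotate user agents in practice, one simple approach is to pick a string at random per browser session. A short sketch; the USER_AGENTS list below is illustrative, so fill it with current, realistic strings:

import random
import undetected_chromedriver as uc

# Illustrative list; keep these strings current and consistent with real browsers
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

def browser_with_random_ua():
    options = uc.ChromeOptions()
    options.add_argument(f'--user-agent={random.choice(USER_AGENTS)}')
    return uc.Chrome(options=options)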
Integrating Proxies
While Undetected ChromeDriver helps mimic a real user's browser, aggressive anti-bot systems also monitor IP address behavior. Sending too many requests from a single IP is a classic red flag. This is where proxies become essential.
Proxies act as intermediaries, routing your requests through different IP addresses. This makes it much harder for websites to track and block your scraping activity based solely on your IP. Fortunately, adding a proxy to Undetected ChromeDriver is straightforward.
import undetected_chromedriver as uc
import time
from selenium.webdriver.common.by import By

def fetch_via_proxy(target_url, proxy_string):
    # proxy_string format example: "http://username:password@host:port" or "http://host:port"
    options = uc.ChromeOptions()
    options.add_argument('--headless=new')
    options.add_argument('--ignore-certificate-errors')  # Often needed with proxies
    # Add the proxy server argument
    options.add_argument(f'--proxy-server={proxy_string}')
    print(f"Configured proxy: {proxy_string}")
    browser = uc.Chrome(options=options)
    html_source = None
    try:
        # Check our apparent IP address via the proxy
        print("Checking IP address via proxy...")
        browser.get('https://httpbin.org/ip')
        time.sleep(2)
        ip_info = browser.find_element(By.TAG_NAME, "pre").text
        print(f"IP address seen by httpbin: {ip_info}")
        # Now navigate to the actual target via the proxy
        print(f"Fetching (via proxy) from: {target_url}")
        browser.get(target_url)
        time.sleep(3)  # Allow a bit more time for the proxy connection
        html_source = browser.page_source
        print("Successfully retrieved HTML source via proxy.")
    except Exception as e:
        print(f"An error occurred while using proxy: {e}")
    finally:
        browser.quit()
    return html_source

# --- Evomi Proxy Example ---
# Replace with your actual Evomi credentials and desired endpoint/port.
# Format: protocol://username:password@host:port
# Residential proxies (HTTP): "http://YOUR_USERNAME:YOUR_PASSWORD@rp.evomi.com:1000"
# Datacenter proxies (HTTP):  "http://YOUR_USERNAME:YOUR_PASSWORD@dc.evomi.com:2000"
# The placeholder below must be replaced with a real, working proxy string.
proxy_details = "http://username:password@rp.evomi.com:1000"  # <-- REPLACE THIS

page_content = fetch_via_proxy('https://nowsecure.nl', proxy_details)
# Check results...
# Evomi's Free Proxy Tester (https://proxy-tester.evomi.com/) can verify proxy status separately.
We add another argument, --proxy-server=YOUR_PROXY_STRING, passing our proxy details to the options. The example shows a common format including protocol, username, password, host, and port, referencing Evomi's residential proxy endpoint structure. Remember to replace the placeholder with the actual credentials and endpoint Evomi provides. One caveat worth testing: Chrome tends to ignore credentials embedded in the --proxy-server flag and may show an authentication prompt instead, so if username/password authentication fails for you, consider whitelisting your IP with your provider or handling proxy auth through a helper such as Selenium Wire.
Using different proxy types like residential, mobile, or datacenter proxies can significantly impact success rates. Residential and mobile proxies often perform best against strict anti-bot measures as they use IPs associated with real devices. Evomi offers a range of these options, starting from competitive prices like $0.49/GB for residential proxies. You can explore the specifics on the Evomi pricing page.
Note: If you use an invalid proxy string, the script might run without errors but fail to load the page correctly, often capturing an error page from the proxy itself instead of the target website's content.
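A rough way to catch this silently-wrong case is to scan the captured source for Chrome's network-error markers. This is a heuristic sketch only; the 'ERR_' strings are an assumption based on Chrome's built-in error pages and worth verifying against your own failure cases:

def looks_like_proxy_failure(html):
    # Chrome's built-in error pages usually embed codes such as
    # ERR_PROXY_CONNECTION_FAILED or ERR_TUNNEL_CONNECTION_FAILED.
    # Heuristic only: confirm against the failures you actually see.
    return html is None or 'ERR_PROXY' in html or 'ERR_TUNNEL' in html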
Beyond Undetected ChromeDriver: Other Strategies
Undetected ChromeDriver is a powerful tool, but it's part of a larger scraping ecosystem. If you're still facing blocks, consider these points:
Alternative Libraries: Frameworks like Playwright and Pyppeteer (a Python port of Puppeteer) offer similar browser automation capabilities and have their own communities developing anti-detection techniques. Switching would require code adaptation.
Headless vs. Headful: Experiment with running your browser visibly (headful) versus invisibly (headless). Some sites are better at detecting headless browsers, even modified ones.
Proxy Strategy: Don't just use *a* proxy, use proxies *strategically*. Rotate IPs regularly, use high-quality residential or mobile proxies for tough targets, and match proxy geolocation to the target site if necessary (see the sketch after this list).
User Agent Rotation: Cycle through a list of current, common user agents instead of using just one static string.
Behavioral Analysis: Mimic human behavior. Add random delays between actions, avoid predictable scraping patterns, navigate through login pages naturally instead of hitting internal pages directly.
Anti-Detect Browsers: For maximum stealth, specialized browsers like Evomium (free for Evomi customers) are designed to manage and spoof many browser fingerprint parameters automatically, often simplifying the setup compared to manual configuration in code.
Fingerprint Checking: Use tools like Evomi's Browser Fingerprint Checker to see what information your configured browser might be leaking.
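To make the proxy-rotation and random-delay ideas concrete, here is a hedged sketch that starts a fresh browser per URL with a proxy drawn from a pool. The pool contents and delay range are illustrative, and credentials are omitted per the authentication caveat above:

import random
import time
import undetected_chromedriver as uc

# Illustrative pool; fill with real endpoints from your provider
PROXY_POOL = [
    "http://rp.evomi.com:1000",
    "http://dc.evomi.com:2000",
]

def scrape_rotating(urls):
    for url in urls:
        options = uc.ChromeOptions()
        options.add_argument(f'--proxy-server={random.choice(PROXY_POOL)}')
        browser = uc.Chrome(options=options)
        try:
            browser.get(url)
            print(f"{url}: {len(browser.page_source)} characters retrieved")
        finally:
            browser.quit()
        # Random pause between targets to avoid a machine-gun request pattern
        time.sleep(random.uniform(2.0, 6.0))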
Quick Recap
Let's summarize the core steps for using Undetected ChromeDriver with options and proxies:
Install the library:
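pip install undetected-chromedriver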
Import it:
import undetected_chromedriver as uc
Adapt the Python code below, configuring options and replacing the placeholder proxy details with your actual credentials (e.g., from Evomi):
import undetected_chromedriver as uc
import time

def scrape_with_proxy(target_url, proxy_string):
    options = uc.ChromeOptions()
    options.add_argument('--headless=new')  # Or remove for a visible browser
    # Example: use a realistic user agent
    options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
    options.add_argument('--ignore-certificate-errors')
    # Configure the proxy
    options.add_argument(f'--proxy-server={proxy_string}')
    browser = uc.Chrome(options=options)
    html_source = None
    try:
        # Mask user/pass in the log by keeping only the part after '@'
        print(f"Attempting to fetch {target_url} via proxy {proxy_string.split('@')[-1]}...")
        # Optional: check the IP first
        # browser.get('https://httpbin.org/ip')
        # print(f"Current IP: {browser.find_element('tag name', 'pre').text}")
        browser.get(target_url)
        time.sleep(3)  # Adjust wait time as needed
        html_source = browser.page_source
        print("Successfully retrieved page source.")
    except Exception as e:
        print(f"Scraping failed: {e}")
    finally:
        browser.quit()
    return html_source

# --- Configuration ---
# Replace with your actual proxy details from Evomi or another provider
# Format: protocol://username:password@host:port
proxy_config = "http://YOUR_USERNAME:YOUR_PASSWORD@rp.evomi.com:1000"  # <-- REPLACE THIS
target_website = 'https://httpbin.org/headers'  # Example site that shows request headers

# --- Execution ---
content = scrape_with_proxy(target_website, proxy_config)
if content:
    print("\n--- Received Content Sample ---")
    print(content[:600])
    print("...")
By combining Undetected ChromeDriver's stealth capabilities with the IP anonymity provided by quality proxies like those from Evomi, you significantly increase your chances of successful and sustainable web scraping in Python.

About the Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.