How to Fix Failed Python Requests: Retry & Proxy Strategies
Sarah Whitmore
Tackling Failed Python Requests: Smart Retries and Proxy Power
Python's `requests` library is a fantastic tool in any developer's arsenal, particularly favored for tasks like web scraping. It simplifies the process of sending HTTP requests compared to the standard `urllib` library, making interactions with web servers and APIs much more straightforward.

When you're building web scrapers, you'll likely lean heavily on `requests` because it's relatively easy to use and debug. This is crucial because, let's face it, requests fail. Understanding *why* they fail and how to handle those failures gracefully is key to building robust scraping applications.
Getting Your Feet Wet with Python Requests
Before we dive in, we'll assume you've got Python set up and are comfortable using an Integrated Development Environment (IDE) like VS Code or PyCharm. If you haven't already, you'll need to install the `requests` library. Open your terminal or command prompt and type:
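```bash
pip install requests
```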
This command fetches and installs the library, making it available in your Python environment.
Like any Python library, you need to import `requests` before using it in your script:

```python
import requests
```
Sending a simple GET request is quite intuitive. You call the `get()` method with the URL you want to access:
```python
import requests

def fetch_data(url):
    try:
        response = requests.get(url)
        # We'll check the status code to see if it worked
        print(f"Request to {url} returned status code: {response.status_code}")
        # You might want to return the response object for further processing
        # return response
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")

# Let's try fetching the Evomi homepage
target_url = 'https://evomi.com'
fetch_data(target_url)
```
Checking the `status_code` of the response object is fundamental. It tells you whether your request was successful or if something went wrong. We'll use these codes to decide how to handle failures.
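As a quick illustration (a minimal sketch using an httpbin test endpoint), you can branch on the code directly, or let `requests` raise an exception for error responses via `raise_for_status()`:

```python
import requests

response = requests.get('https://httpbin.org/status/404')

if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Resource not found.")
else:
    # raise_for_status() turns any other 4xx/5xx code into an HTTPError
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e}")
```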
Decoding Failed Request Responses
Every HTTP request results in a status code. While codes like 200 OK mean success, we're more interested in the ones that signal trouble, especially during web scraping.
403 Forbidden
This code means the server understood your request but refuses to authorize it. You might lack the necessary permissions or credentials to access the resource, or perhaps your IP address has been flagged or banned. Overcoming a 403 often requires valid authentication credentials.
If the site uses basic authentication, you can include credentials directly in your request:
```python
import requests

def fetch_protected_data(url, username, password):
    try:
        credentials = (username, password)
        response = requests.get(url, auth=credentials)
        print(f"Status Code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")

# Example usage (replace with actual URL and credentials)
# fetch_protected_data('https://some-protected-resource.com/data', 'my_user', 'my_secret_pass')
```
Keep in mind that many websites use more complex authentication methods (like login forms with sessions and CSRF tokens) which require more sophisticated handling, often involving session objects and posting data.
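As a rough illustration of that pattern (a sketch only; the login URL, form field names, and CSRF token location are assumptions that vary from site to site), a session-based login might look like this:

```python
import requests
from bs4 import BeautifulSoup  # assumes BeautifulSoup is installed for HTML parsing

# Hypothetical login URL and field names -- adjust to the actual site
LOGIN_URL = 'https://example.com/login'

session = requests.Session()

# 1. Load the login page and extract the CSRF token from a hidden form field
login_page = session.get(LOGIN_URL)
soup = BeautifulSoup(login_page.text, 'html.parser')
csrf_token = soup.find('input', {'name': 'csrf_token'})['value']

# 2. Post the credentials along with the token; the session keeps the cookies
payload = {
    'username': 'my_user',
    'password': 'my_secret_pass',
    'csrf_token': csrf_token,
}
response = session.post(LOGIN_URL, data=payload)
print(f"Login attempt returned status code: {response.status_code}")

# 3. Later requests through the same session carry the authenticated cookies
# protected = session.get('https://example.com/protected-page')
```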
429 Too Many Requests
Ah, the bane of many scrapers. A 429 error indicates you've hit a rate limit – you're sending requests too frequently from the same IP address. The server is asking you to slow down. This is where retry strategies and proxies become essential.
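One practical detail worth knowing: many servers that return a 429 also send a `Retry-After` header indicating how long to wait. A minimal sketch of honoring it (assuming the header carries a number of seconds rather than a date) could look like this:

```python
import time
import requests

response = requests.get('https://httpbin.org/status/429')

if response.status_code == 429:
    # Retry-After may be missing; fall back to a default delay of 10 seconds
    wait_seconds = int(response.headers.get('Retry-After', 10))
    print(f"Rate limited. Waiting {wait_seconds}s before retrying...")
    time.sleep(wait_seconds)
    response = requests.get('https://httpbin.org/status/429')
```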
500 Internal Server Error
This is a generic "something broke" error on the server side. It's not your fault, but the server encountered an unexpected condition. Retrying the request after a short delay often resolves this, as it might be a temporary glitch.
502 Bad Gateway
Similar to a 500 error, a 502 usually means one server acting as a gateway or proxy received an invalid response from an upstream server it was trying to access. Again, this is a server-side issue, and a retry might succeed if the upstream issue is resolved quickly.
503 Service Unavailable
This typically means the server is temporarily overloaded or down for maintenance. It's unable to handle the request right now. While you can retry, success depends entirely on the server becoming available again.
504 Gateway Timeout
Like 502, this involves a gateway or proxy server. However, this time, the gateway didn't receive a *timely* response from the upstream server. It could be due to network congestion or the upstream server being slow. Retrying, possibly with increasing delays (backoff), is a common approach.
Building a Resilient Retry Strategy
The `requests` library, combined with Python's standard libraries, provides the necessary building blocks to handle most transient errors automatically. For errors like 403 (Forbidden), you might need specific credentials, and for 429 (Too Many Requests), proxies offer a great solution (more on that later). But for the 5xx server errors and temporary glitches, retries are your best friend.
Let's explore two common ways to implement retries.
Method 1: The Simple Loop with Fixed Delay
A straightforward approach is to wrap your request in a loop that retries a fixed number of times with a pause between attempts.
```python
import requests
import time

def fetch_with_simple_retry(url, max_retries=3, delay=5):
    """
    Attempts to fetch a URL, retrying on non-200/404 status codes.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)  # Added a timeout
            # Success or 'Not Found' are considered final
            if response.status_code == 200 or response.status_code == 404:
                print(f"Attempt {attempt + 1}: Success or Not Found ({response.status_code}).")
                return response
            else:
                print(f"Attempt {attempt + 1}: Failed with status {response.status_code}. Retrying in {delay}s...")
        except requests.exceptions.Timeout:
            print(f"Attempt {attempt + 1}: Request timed out. Retrying in {delay}s...")
        except requests.exceptions.ConnectionError:
            print(f"Attempt {attempt + 1}: Connection error. Retrying in {delay}s...")
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt + 1}: An unexpected error occurred: {e}. Retrying in {delay}s...")
        # Wait before the next attempt, unless it's the last one
        if attempt < max_retries - 1:
            time.sleep(delay)
        else:
            print(f"Attempt {attempt + 1}: Max retries reached. Giving up.")
    return None  # Or raise an exception

# Example usage
target_url = 'https://httpbin.org/delay/3'  # A site that delays its response
fetch_with_simple_retry(target_url)
```
Here, we import the `time` library for the `sleep()` function. The function takes the URL, the maximum number of retries, and the delay between retries. It loops, making the request inside a `try...except` block to catch potential network issues or timeouts. If the status code isn't 200 (OK) or 404 (Not Found, often treated as a final state), it waits and tries again. This is simple, but it might hammer a struggling server if the delay is too short.
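One common refinement, shown here as a sketch rather than part of the original snippet, is to add random jitter to the fixed delay so that multiple clients don't all retry at the same instant:

```python
import random
import time

def wait_with_jitter(base_delay):
    # Sleep for the base delay plus up to 50% random jitter
    jitter = random.uniform(0, base_delay * 0.5)
    time.sleep(base_delay + jitter)

# Inside the retry loop above, you could replace time.sleep(delay) with:
# wait_with_jitter(delay)
```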
Method 2: Sophisticated Retries with HTTPAdapter
For more fine-grained control, especially for implementing exponential backoff (increasing delays between retries), you can use the `HTTPAdapter` and `Retry` utilities from `requests` and its underlying library, `urllib3`.
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def fetch_with_adapter_retry(url):
    """
    Fetches a URL using a session with a configured retry strategy.
    """
    session = requests.Session()

    # Configure the retry strategy
    retry_strategy = Retry(
        total=5,  # Total number of retries
        backoff_factor=1,  # Multiplier for delay: {backoff factor} * (2 ** ({number of total retries} - 1))
        status_forcelist=[429, 500, 502, 503, 504],  # Status codes to force a retry on
        allowed_methods=["HEAD", "GET", "OPTIONS"]  # Methods to retry (important for idempotency)
    )

    # Mount the strategy to the session for HTTP and HTTPS
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    try:
        response = session.get(url, timeout=15)  # Use the session to make the request
        print(f"Request to {url} returned status code: {response.status_code}")
        return response
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {url} after multiple retries: {e}")
        return None

# Example usage
target_url = 'https://httpbin.org/status/503'  # A site that returns 503
fetch_with_adapter_retry(target_url)
```
First, we import `HTTPAdapter` and `Retry`. We create a `requests.Session` object, which persists certain parameters across requests. Then, we define a `Retry` object, specifying:
- `total`: The maximum number of retry attempts.
- `backoff_factor`: Controls the delay between attempts. A factor of 1 means delays grow roughly as 0s, 2s, 4s, 8s, 16s... for subsequent retries (see the quick calculation after this list).
- `status_forcelist`: A list of HTTP status codes that should trigger a retry.
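To make that schedule concrete, here is a quick back-of-the-envelope calculation based on urllib3's documented formula (a sketch; exact timings can differ slightly between urllib3 versions, and recent versions skip the first sleep entirely):

```python
# Approximate sleep before each retry with backoff_factor=1
backoff_factor = 1

for retry_number in range(1, 6):
    delay = 0 if retry_number <= 1 else backoff_factor * (2 ** (retry_number - 1))
    print(f"Retry {retry_number}: sleep ~{delay}s")

# Retry 1: sleep ~0s
# Retry 2: sleep ~2s
# Retry 3: sleep ~4s
# Retry 4: sleep ~8s
# Retry 5: sleep ~16s
```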
We create an `HTTPAdapter` with this retry strategy and `mount` it to our session for both `http://` and `https://` prefixes. Any request made using this `session` object will automatically use the configured retry logic. This method is generally preferred for production code, as it's more robust and respects server load better via backoff.
Using Proxies to Navigate Request Limits (Especially 429s)
The `429 Too Many Requests` error is a direct challenge to your IP address's reputation with the target server. While retrying might eventually work once the rate limit window passes, a more effective strategy, especially for large-scale scraping, is using proxies.
By routing your requests through different proxy servers, you change the source IP address seen by the target server. If you hit a rate limit on one IP, you can simply switch to another. High-quality proxy providers like Evomi offer vast pools of IPs (Residential, Mobile, Datacenter) allowing you to distribute your requests and avoid hitting those limits.
Let's adapt the `HTTPAdapter` example to incorporate rotating proxies for handling 429s proactively.
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def fetch_with_proxies_and_retry(url):
    """
    Fetches a URL using a session with retries and rotating proxies.
    Note: This example assumes a rotating proxy endpoint.
    """
    session = requests.Session()

    # Proxy configuration - replace with your actual Evomi credentials and endpoint
    # Example for Evomi rotating residential proxies (HTTP)
    proxy_user = "YOUR_USERNAME"
    proxy_pass = "YOUR_PASSWORD"
    proxy_host = "rp.evomi.com"
    proxy_port = 1000  # Use 1001 for HTTPS, 1002 for SOCKS5
    proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}"

    proxies = {
        "http": proxy_url,
        "https": proxy_url,  # Use the same proxy for HTTPS traffic
    }
    session.proxies = proxies  # Set proxies for the session

    # Configure retry strategy - Note: We might remove 429 if proxies handle it
    retry_strategy = Retry(
        total=3,  # Fewer retries, as proxies help avoid some issues
        backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504],  # Retrying mainly server-side issues
        allowed_methods=["HEAD", "GET", "OPTIONS"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    try:
        # Verify IP change (optional, good for testing)
        # ip_check_url = 'https://geo.evomi.com/'
        # pre_response = session.get(ip_check_url, timeout=10)
        # print(f"Current IP via proxy: {pre_response.json().get('ip')}")

        response = session.get(url, timeout=20)  # Increased timeout for proxy latency
        print(f"Request to {url} via proxy returned status code: {response.status_code}")

        # Specific handling for 429 if needed (though rotating proxies should mitigate it)
        if response.status_code == 429:
            print("Received 429 despite proxy. Rotating proxy might need time or pool exhausted.")
            # Depending on proxy type, manual rotation logic might be needed here
            # For Evomi rotating residential, each new request *should* use a new IP
        return response
    except requests.exceptions.ProxyError as e:
        print(f"Proxy error occurred: {e}")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {url} via proxy after retries: {e}")
        return None

# Example usage
target_url = 'https://httpbin.org/ip'  # Check the IP the target server sees
fetch_with_proxies_and_retry(target_url)
```
In this setup, we configure the session to use proxies, with placeholders for Evomi's rotating residential proxy endpoint (`rp.evomi.com:1000`). Each request sent through this session will automatically be routed via a proxy. If you're using a provider like Evomi with rotating residential or mobile proxies, each connection (or session, depending on setup) typically gets a new IP address automatically. This drastically reduces the chance of hitting 429 errors tied to a single IP.
Notice we adjusted the `Retry` strategy, removing `429` from the `status_forcelist`, because the primary strategy for handling it is now IP rotation via the proxy. We still keep retries for server-side errors (5xx). If you're using sticky sessions (where you keep the same proxy IP for a while), you'd need more complex logic to manually switch proxies upon receiving a 429, as sketched below.
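Here's a rough idea of what that manual switch could look like (a sketch only; the pool of proxy endpoints and the rotation approach are assumptions that depend on your provider's setup):

```python
import requests

# Hypothetical pool of sticky-session proxy URLs (placeholders, not real endpoints)
PROXY_POOL = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
]

def fetch_rotating_on_429(url, max_attempts=3):
    """Try each proxy in turn, switching whenever the server answers 429."""
    for attempt, proxy_url in enumerate(PROXY_POOL[:max_attempts], start=1):
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            response = requests.get(url, proxies=proxies, timeout=20)
            if response.status_code != 429:
                print(f"Attempt {attempt}: got {response.status_code} via this proxy")
                return response
            print(f"Attempt {attempt}: rate limited (429), switching proxy...")
        except requests.exceptions.ProxyError as e:
            print(f"Attempt {attempt}: proxy error ({e}), switching proxy...")
    return None

# Example usage
# fetch_rotating_on_429('https://httpbin.org/ip')
```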
Considering proxies? Evomi offers ethically sourced Residential, Mobile, and fast Datacenter proxies. We even offer a free trial to test them out!
Wrapping Up
Handling failed requests is non-negotiable for reliable web scraping or API interaction in Python. You've learned two primary ways to implement automatic retries:
- A simple `for` loop with `time.sleep()` for basic retry needs.
- Using `requests.Session`, `HTTPAdapter`, and `urllib3.util.Retry` for more control, including exponential backoff.
Furthermore, you saw how proxies, particularly rotating ones like those offered by Evomi, are incredibly effective at overcoming rate limits (429 errors) by changing your source IP address. Combining smart retry logic with a robust proxy infrastructure gives your Python scripts the resilience needed to navigate the unpredictable nature of the web.

Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
About Author
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.