Master Expedia Scraping with Python and Proxies





Sarah Whitmore
Diving into Expedia Data: A Python Scraping Guide
Expedia, the well-known online travel agency, aggregates a wealth of information on hotels, flights, activities, and travel packages. It's a goldmine for data.
Extracting data from Expedia can provide significant advantages, whether for market analysis, keeping tabs on competitors, or simply snagging the best hotel rates for your next trip.
This guide will walk you through effective techniques for scraping Expedia, featuring a step-by-step tutorial using Python to gather hotel pricing information.
Why Scraping Expedia Can Be Tricky
Like many contemporary websites, Expedia relies heavily on JavaScript to load content dynamically and manage user interactions. This presents a challenge for traditional web scrapers that primarily parse static HTML, such as Beautiful Soup. These tools often can't execute the necessary JavaScript, meaning they might miss crucial data loaded after the initial page load.
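To make the limitation concrete, here is a minimal sketch using only Python's standard library. The HTML string and the data-card attribute are made-up stand-ins for a JavaScript-driven page, not Expedia's actual markup. The results container is empty in the raw HTML because a script would fill it in later, so a static parser finds no hotel cards.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML, as a static HTTP client would receive it.
# The hotel cards are absent: a script would inject them after load.
RAW_HTML = """
<html>
  <body>
    <div id="results"></div>
    <script>/* cards are rendered here by JavaScript at runtime */</script>
  </body>
</html>
"""


class CardCounter(HTMLParser):
    """Counts elements that carry a (hypothetical) data-card attribute."""

    def __init__(self):
        super().__init__()
        self.cards = 0

    def handle_starttag(self, tag, attrs):
        if any(name == "data-card" for name, _ in attrs):
            self.cards += 1


counter = CardCounter()
counter.feed(RAW_HTML)
print(f"Hotel cards found in static HTML: {counter.cards}")
```

A parser only sees what the server sent. Anything a script adds afterwards exists only in a browser that actually runs that script.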
Essential Tools for Effective Expedia Scraping
To successfully scrape dynamic sites like Expedia, you'll need tools capable of rendering the page like a real browser. Browser automation libraries such as Puppeteer or Playwright, which drive a real browser (optionally in headless mode), are ideal. They can execute JavaScript, interact with page elements, and access the fully rendered content.
Playwright stands out as a particularly robust choice for tackling dynamic websites. It offers official support for several languages, including Python, JavaScript, Java, and C#, making it accessible to a broad range of developers.
Scraping Expedia Hotel Data with Python and Playwright
Let's get practical. The following sections detail how to use Python with Playwright to extract hotel details for a specific city and date range.
Setting Up Your Environment
First things first, you need Python installed on your system. If you don't have it yet, grab it from the official Python website and follow their installation instructions.
Next, install the Playwright library and the necessary browser binaries. Open your terminal or command prompt and run:
pip install playwright
playwright install  # Installs browser binaries (Chromium, Firefox, WebKit)
Extracting Hotel Information
To start scraping hotel data for a chosen location and dates, we'll instruct Playwright to open the relevant Expedia search results page.
This initial code snippet launches a Firefox browser instance (non-headless, so you can see it working) and navigates to an Expedia search results page for Rome, Italy, for dates in July 2024.
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL (simplified example - real URLs might be more complex)
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()
    page.goto(expedia_url)
    # Allow time for dynamic content to load
    print("Page loaded, waiting for dynamic content...")
    time.sleep(5)  # Fixed wait to allow for potentially slow loads
    print("Closing browser.")
    browser.close()
Note: While we're using a pre-constructed URL here, a more advanced script could automate filling in the search form on Expedia's homepage and clicking 'Search'. That's a neat challenge to try after working through this tutorial!
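The URL above interpolates the destination string directly, spaces, commas, and parentheses included. Browsers usually tolerate this, but encoding the query explicitly is safer. Here is a small sketch using the standard library's urlencode. The helper name build_search_url is our own invention, and the query parameters are simply the ones used in this tutorial rather than a documented Expedia API.

```python
from urllib.parse import urlencode


def build_search_url(destination, checkin_date, checkout_date, adults=2, rooms=1):
    """Build a Hotel-Search URL with every parameter percent-encoded."""
    query = urlencode({
        "destination": destination,
        "startDate": checkin_date,
        "endDate": checkout_date,
        "adults": adults,
        "rooms": rooms,
        "sort": "RECOMMENDED",
    })
    return f"https://www.expedia.com/Hotel-Search?{query}"


url = build_search_url("Rome (and vicinity), Lazio, Italy", "2024-07-10", "2024-07-17")
print(url)
```

Wrapping the URL construction in a function also makes it easy to reuse for other cities and date ranges later.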
Once the search results page is loaded, the next step is to identify and extract data from the individual hotel listing cards.

First, we need to locate all the hotel card elements on the page. Playwright uses selectors to find elements.
# Inside the 'with sync_playwright() as ...' block:
hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
print(f"Found {len(hotel_cards)} initial hotel cards.")
Now, we'll loop through these cards and pull out the specific pieces of information we want: the hotel name (title), its rating, and the price per night.

# Inside the 'with sync_playwright() as ...' block, after locating cards:
extracted_hotels = []
for card in hotel_cards:
    # Use locators relative to the card element. Taking .first avoids a
    # strict-mode error if a selector matches more than one element.
    content_section = card.locator('div.uitk-card-content-section').first
    # Extract title
    hotel_title_element = content_section.locator('h3').first
    hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
    # Extract rating (handle cases where it might be missing)
    rating_element = content_section.locator('span.uitk-badge-base-text').first
    hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
    # Extract price (handle cases where it might be missing)
    # More specific selector example
    price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500').first
    # Fallback if the primary price selector fails
    if not price_element.is_visible():
        # Original selector as fallback
        price_element = content_section.locator('div.uitk-type-500').first
    hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
    hotel_data = {
        'title': hotel_title,
        'rating': hotel_rating,
        'price': hotel_price
    }
    extracted_hotels.append(hotel_data)
    # Optional: Print progress for each hotel
    # print(f"Extracted: {hotel_data}")
You'll notice checks like .is_visible(). This is important because not all hotels display a rating (e.g., new listings) or a price (e.g., fully booked for the selected dates). These checks prevent the script from failing if an element isn't found, assigning a placeholder value like "No Rating" or "Price Unavailable" instead.
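Since the same check-then-read pattern repeats for every field, it can be folded into one helper. The sketch below is one possible shape, not part of the original script. The name text_or_default is hypothetical, and the function only assumes that the object passed in behaves like a Playwright Locator (count(), first, is_visible(), text_content()).

```python
def text_or_default(locator, default):
    """Return the stripped text of a locator's first match, or `default`.

    `locator` is expected to behave like a Playwright Locator, i.e. to
    provide count(), first, is_visible(), and text_content().
    """
    if locator.count() == 0:
        return default  # Nothing matched the selector at all
    element = locator.first
    if not element.is_visible():
        return default  # Matched, but not shown on the page
    text = element.text_content()
    return text.strip() if text else default


# Intended use inside the card loop (requires a live Playwright page):
# hotel_rating = text_or_default(
#     content_section.locator('span.uitk-badge-base-text'), "No Rating"
# )
```

Checking count() first means a selector that matches several elements, or none, falls through to a predictable result instead of raising.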
Finally, we can print the list of collected hotel data:
# Inside the 'with sync_playwright() as ...' block, after the loop:
print("\n--- Extracted Hotel Data ---")
for hotel in extracted_hotels:
    print(hotel)
print("--------------------------")
Here’s the combined code for finding and extracting initial hotel data:
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()
    # Increase default timeout for navigation
    page.set_default_navigation_timeout(60000)  # 60 seconds
    page.goto(expedia_url)
    print("Page loaded, waiting for dynamic content...")
    time.sleep(5)  # Wait for initial load
    # --- Scrape hotels ---
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} initial hotel cards.")
    extracted_hotels = []
    for card in hotel_cards:
        # .first avoids a strict-mode error if a selector matches several elements
        content_section = card.locator('div.uitk-card-content-section').first
        hotel_title_element = content_section.locator('h3').first
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
        rating_element = content_section.locator('span.uitk-badge-base-text').first
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
        # Attempt primary price selector first
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500').first
        if not price_element.is_visible():
            # Fallback selector
            price_element = content_section.locator('div.uitk-type-500').first
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)
    print("\n--- Initial Extracted Hotel Data ---")
    for hotel in extracted_hotels:
        print(hotel)
    print("------------------------------------")
    print("Closing browser.")
    browser.close()
Running this script will likely output a list like this (prices and availability vary):
[
    {
        'title': 'Hotel Artemide',
        'rating': '9.6',
        'price': '$450'
    },
    {
        'title': 'iQ Hotel Roma',
        'rating': '9.2',
        'price': '$380'
    },
    {
        'title': 'UNAHOTELS Decò Roma',
        'rating': '8.8',
        'price': '$320'
    },
    {
        'title': 'The Hive Hotel',
        'rating': '8.6',
        'price': '$295'
    },
    ...
]
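The scraped values are all strings, which is awkward for sorting or comparing prices. As a rough sketch of a follow-up step (not part of the original script), the snippet below converts ratings and prices to numbers and writes the rows as CSV. The helper first_number and the placeholder second row are our own assumptions. The sketch writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io
import re


def first_number(text):
    """Return the first number in `text` as a float, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


# Rows in the same shape as the script's output; the second row is a
# made-up placeholder showing how missing values are handled.
extracted_hotels = [
    {"title": "Hotel Artemide", "rating": "9.6", "price": "$450"},
    {"title": "Placeholder Hotel", "rating": "No Rating", "price": "Price Unavailable"},
]

rows = [
    {
        "title": hotel["title"],
        "rating": first_number(hotel["rating"]),
        "price": first_number(hotel["price"]),
    }
    for hotel in extracted_hotels
]

# Swap the buffer for open("hotels.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "rating", "price"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Placeholder strings such as "No Rating" become None, which the csv module writes as an empty cell.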
However, this initial list probably doesn't contain *all* the hotels available. Expedia often loads more results as you scroll or requires clicking a button like "Show More Results".

This is where the power of browser automation libraries like Playwright truly shines – we can simulate user actions like clicking buttons!
Loading All Search Results with Playwright
To capture the complete list of hotels, we need to repeatedly click the "Show More" button (or simulate scrolling, depending on the site's mechanism) until no more results load or the button vanishes. Then, we scrape the full set of cards.
Let's insert logic to handle this. Before scraping the cards, add the following code snippet inside the with sync_playwright() as ... block:
# Inside the 'with sync_playwright() as ...' block, before card scraping:
print("Checking for 'Show More Results' button...")
show_more_button_selector = 'button[data-stid="show-more-results"]'  # Example selector, verify this in browser dev tools
while page.locator(show_more_button_selector).is_visible():
    print("Found 'Show More Results' button, clicking...")
    try:
        page.locator(show_more_button_selector).click(timeout=10000)  # 10 second timeout for click
        print("Waiting for more results to load...")
        # Wait for network activity to settle or just a fixed delay
        page.wait_for_load_state('networkidle', timeout=15000)  # Wait up to 15s for network to be idle
        # Alternative fixed wait: time.sleep(4)
    except Exception as e:
        print(f"Could not click 'Show More' or timed out waiting: {e}")
        break  # Exit loop if button disappears or errors occur
print("'Show More Results' button no longer visible or process finished.")
# Now proceed to scrape *all* the cards that are currently loaded
This loop attempts to find the "Show More" button. If visible, it clicks it and waits for new content to load (using networkidle or a fixed delay). It repeats until the button is no longer found or an error occurs. This pattern is very useful for handling dynamically loading content on websites.
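One caveat with this pattern: if the button stays visible but stops loading anything, the loop never ends. The sketch below shows one way to guard against that, with a click cap and a check that the card count actually grew. It is written against plain callables so it can run without a browser. The function name load_all_results and the simulated page are our own assumptions.

```python
def load_all_results(has_more, click_more, count_cards, max_clicks=20):
    """Click "show more" until results stop growing or a cap is reached.

    has_more, click_more, and count_cards are callables supplied by the
    caller. Returns the number of clicks performed.
    """
    clicks = 0
    previous = count_cards()
    while clicks < max_clicks and has_more():
        click_more()
        clicks += 1
        current = count_cards()
        if current <= previous:
            # No new cards appeared, so further clicks are unlikely to help
            break
        previous = current
    return clicks


# Simulated page: 10 cards at first, 10 more per click, 35 in total
state = {"cards": 10}


def fake_click():
    state["cards"] = min(state["cards"] + 10, 35)


clicks = load_all_results(
    has_more=lambda: state["cards"] < 35,
    click_more=fake_click,
    count_cards=lambda: state["cards"],
)
print(f"Clicked {clicks} times, {state['cards']} cards loaded")
```

In the real script, has_more would wrap page.locator(show_more_button_selector).is_visible(), click_more would click the button and wait for the page to settle, and count_cards would call .count() on the hotel card locator.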
Here’s the complete script incorporating the "Show More" logic:
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()
    page.set_default_navigation_timeout(60000)  # 60 seconds
    page.goto(expedia_url)
    print("Page loaded, initial wait...")
    time.sleep(5)  # Wait for initial load
    # --- Handle "Show More Results" ---
    print("Checking for 'Show More Results' button...")
    # Note: Selector might change, inspect element if needed
    show_more_button_selector = 'button[data-stid="show-more-results"]'
    while page.locator(show_more_button_selector).is_visible():
        print("Found 'Show More Results' button, clicking...")
        try:
            page.locator(show_more_button_selector).click(timeout=10000)
            print("Waiting for more results to load...")
            # Wait for network activity to settle or just a fixed delay
            page.wait_for_load_state('networkidle', timeout=15000)
            # time.sleep(4)  # Alternative fixed wait
        except Exception as e:
            print(f"Could not click 'Show More' or timed out waiting: {e}")
            break
    print("'Show More Results' button no longer visible or process finished.")
    # --- Scrape all loaded hotels ---
    print("Scraping all loaded hotel cards...")
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} total hotel cards.")
    extracted_hotels = []
    for card in hotel_cards:
        # .first avoids a strict-mode error if a selector matches several elements
        content_section = card.locator('div.uitk-card-content-section').first
        hotel_title_element = content_section.locator('h3').first
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
        rating_element = content_section.locator('span.uitk-badge-base-text').first
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500').first
        if not price_element.is_visible():
            price_element = content_section.locator('div.uitk-type-500').first  # Fallback
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)
    print("\n--- Final Extracted Hotel Data ---")
    # Print only the first few and the total count for brevity
    for i, hotel in enumerate(extracted_hotels):
        if i < 5:  # Print first 5
            print(hotel)
        elif i == 5:
            print("...")  # Indicate more data exists
    print(f"(Total: {len(extracted_hotels)} hotels)")
    print("---------------------------------")
    print("Closing browser.")
    browser.close()
Avoiding Blocks: The Role of Proxies
Occasionally scraping a single city and date range is unlikely to raise alarms on Expedia, whose servers handle massive traffic. However, if you intend to collect data at scale (across numerous cities, varying dates, or tracking price changes over time), your activity might be flagged as bot-like, potentially leading to your IP address being blocked.
This is where proxies become essential for serious web scraping. A proxy server acts as an intermediary, routing your request to Expedia but masking your actual IP address. By using a pool of proxies, especially residential or mobile ones, your requests appear to originate from many different, legitimate users rather than a single source.
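As a rough illustration of spreading requests over a pool, the sketch below pairs each search URL with the next proxy in round-robin order. The proxy hostnames and the city placeholders are hypothetical, and the helper assign_proxies is our own. Each entry uses the "server" key that Playwright's proxy launch option expects.

```python
from itertools import cycle

# Hypothetical proxy endpoints; replace with the ones from your provider
PROXY_POOL = [
    {"server": "proxy-a.example.com:8000"},
    {"server": "proxy-b.example.com:8000"},
    {"server": "proxy-c.example.com:8000"},
]


def assign_proxies(search_urls, proxy_pool):
    """Pair each URL with the next proxy in the pool, wrapping around."""
    proxies = cycle(proxy_pool)
    return [(url, next(proxies)) for url in search_urls]


# Placeholder URLs standing in for real search pages
urls = [f"https://www.expedia.com/Hotel-Search?destination=city-{i}" for i in range(5)]
assignments = assign_proxies(urls, PROXY_POOL)
for url, proxy in assignments:
    print(proxy["server"], "->", url)
```

Many providers also offer a single rotating endpoint that changes the exit IP for you, in which case a client-side pool like this may be unnecessary.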
Services like Evomi provide access to large pools of ethically sourced proxies. With Evomi's residential proxies (starting at $0.49/GB), you can rotate your IP address frequently, significantly reducing the risk of detection and bans. We even offer a free trial on residential, mobile, and datacenter proxies so you can test them out.
Integrating a proxy into your Playwright script is straightforward. First, get the necessary proxy details (host, port, username, password) from your Evomi dashboard.
Then, modify the browser launch configuration within your script:
# Example using Evomi Residential Proxy (HTTP)
proxy_server = "rp.evomi.com:1000"  # Or HTTPS on 1001, SOCKS5 on 1002
proxy_username = "YOUR_EVOMI_USERNAME"
proxy_password = "YOUR_EVOMI_PASSWORD"
browser = playwright_instance.firefox.launch(
    headless=False,  # Keep False for testing, True for production
    proxy={
        'server': proxy_server,
        'username': proxy_username,
        'password': proxy_password,
    }
)
# ... rest of your script (new_page, goto, etc.)
With this setup, all requests made by this Playwright browser instance will be routed through the specified Evomi proxy server, enhancing anonymity and reducing the likelihood of blocks during large-scale scraping operations.
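Hardcoding credentials in a script makes them easy to leak through version control. One common alternative, sketched here as an assumption rather than anything Evomi or Playwright prescribes, is to read them from environment variables. The variable names PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD and the helper proxy_settings_from_env are hypothetical.

```python
import os


def proxy_settings_from_env(environ=os.environ):
    """Build a Playwright-style proxy dict from environment variables.

    PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD are hypothetical
    variable names chosen for this sketch.
    """
    server = environ.get("PROXY_SERVER")
    if not server:
        return None  # No proxy configured
    settings = {"server": server}
    username = environ.get("PROXY_USERNAME")
    password = environ.get("PROXY_PASSWORD")
    if username and password:
        settings.update(username=username, password=password)
    return settings


# Demonstration with an explicit mapping instead of the real environment
example_env = {
    "PROXY_SERVER": "rp.evomi.com:1000",
    "PROXY_USERNAME": "YOUR_EVOMI_USERNAME",
    "PROXY_PASSWORD": "YOUR_EVOMI_PASSWORD",
}
print(proxy_settings_from_env(example_env))
```

The returned dictionary can be passed as the proxy argument to firefox.launch(). When the function returns None, launch the browser without a proxy argument.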
Why Scrape Expedia Data? Potential Uses
Extracting data from Expedia opens doors to various insightful applications:
Dynamic Pricing Tools: Build tools that compare flight, hotel, or package prices across different dates or providers, helping users find optimal deals.
Market Intelligence: Analyze travel patterns, competitor pricing strategies, popular destinations, and customer preferences to inform business decisions in the travel sector.
Competitive Benchmarking: Hotels or travel agencies can monitor competitors' pricing, room availability, amenities, and special offers to adjust their own strategies effectively.
Sentiment Analysis: Collect and analyze customer reviews and ratings to gauge satisfaction levels, identify areas for improvement in services, or understand public perception of specific hotels or airlines.
Demand Forecasting: Use historical pricing and availability data to build models predicting future travel demand, peak seasons, and pricing fluctuations, aiding in resource planning and marketing efforts.
Wrapping Up
Using common tools like Python paired with Playwright makes extracting hotel information from Expedia quite manageable. You can adapt these techniques to scrape deeper details from individual hotel pages as well.
Remember, as your scraping activities scale up, incorporating proxies is crucial to maintain access and avoid IP bans. Choosing a dependable provider like Evomi, offering ethically sourced proxy pools (like residential, mobile, or datacenter options) with features like IP rotation, is key to successful and sustainable large-scale data collection.
Diving into Expedia Data: A Python Scraping Guide
Expedia, the well-known online travel agency, aggregates a wealth of information on hotels, flights, activities, and travel packages. It's a goldmine for data.
Extracting data from Expedia can provide significant advantages, whether for market analysis, keeping tabs on competitors, or simply snagging the best hotel rates for your next trip.
This guide will walk you through effective techniques for scraping Expedia, featuring a step-by-step tutorial using Python to gather hotel pricing information.
Why Scraping Expedia Can Be Tricky
Like many contemporary websites, Expedia relies heavily on JavaScript to load content dynamically and manage user interactions. This presents a challenge for traditional web scrapers that primarily parse static HTML, such as Beautiful Soup. These tools often can't execute the necessary JavaScript, meaning they might miss crucial data loaded after the initial page load.
Essential Tools for Effective Expedia Scraping
To successfully scrape dynamic sites like Expedia, you'll need tools capable of rendering the page like a real browser. Headless browsers controlled programmatically, like Puppeteer or Playwright, are ideal. They can execute JavaScript, interact with page elements, and access the fully rendered content.
Playwright stands out as a particularly robust choice for tackling dynamic websites. It offers official support for several languages, including Python, JavaScript, Java, and C#, making it accessible to a broad range of developers.
Scraping Expedia Hotel Data with Python and Playwright
Let's get practical. The following sections detail how to use Python with Playwright to extract hotel details for a specific city and date range.
Setting Up Your Environment
First things first, you need Python installed on your system. If you don't have it yet, grab it from the official Python website and follow their installation instructions.
Next, install the Playwright library and the necessary browser binaries. Open your terminal or command prompt and run:
pip install playwright
playwright install # Installs browser binaries (like Firefox, Chrome, WebKit)
Extracting Hotel Information
To start scraping hotel data for a chosen location and dates, we'll instruct Playwright to open the relevant Expedia search results page.
This initial code snippet launches a Firefox browser instance (non-headless, so you can see it working) and navigates to an Expedia search results page for Rome, Italy, for dates in July 2024.
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL (simplified example - real URLs might be more complex)
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
browser = playwright_instance.firefox.launch(
headless=False, # Set to True for background execution
)
page = browser.new_page()
page.goto(expedia_url)
# Allow time for dynamic content to load
print("Page loaded, waiting for dynamic content...")
time.sleep(5) # Increased wait time for potentially slow loads
print("Closing browser.")
browser.close()
Note: While we're using a pre-constructed URL here, a more advanced script could automate filling in the search form on Expedia's homepage and clicking 'Search'. That's a neat challenge to try after working through this tutorial!
Once the search results page is loaded, the next step is to identify and extract data from the individual hotel listing cards.

First, we need to locate all the hotel card elements on the page. Playwright uses selectors to find elements.
# Inside the 'with sync_playwright() as ...' block:
hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
print(f"Found {len(hotel_cards)} initial hotel cards.")
Now, we'll loop through these cards and pull out the specific pieces of information we want: the hotel name (title), its rating, and the price per night.

# Inside the 'with sync_playwright() as ...' block, after locating cards:
extracted_hotels = []
for card in hotel_cards:
# Use locators relative to the card element
content_section = card.locator('div.uitk-card-content-section')
# Extract title
hotel_title_element = content_section.locator('h3')
hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
# Extract rating (handle cases where it might be missing)
rating_element = content_section.locator('span.uitk-badge-base-text')
hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
# Extract price (handle cases where it might be missing)
# More specific selector example
price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
# Fallback if the primary price selector fails
if not price_element.is_visible():
# Original selector as fallback
price_element = content_section.locator('div.uitk-type-500')
hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
hotel_data = {
'title': hotel_title,
'rating': hotel_rating,
'price': hotel_price
}
extracted_hotels.append(hotel_data)
# Optional: Print progress for each hotel
# print(f"Extracted: {hotel_data}")
You'll notice checks like .is_visible()
. This is important because not all hotels display a rating (e.g., new listings) or a price (e.g., fully booked for the selected dates). These checks prevent the script from failing if an element isn't found, assigning a placeholder value like "No Rating"
or "Price Unavailable"
instead.
Finally, we can print the list of collected hotel data:
# Inside the 'with sync_playwright() as ...' block, after the loop:
print("\n--- Extracted Hotel Data ---")
for hotel in extracted_hotels:
print(hotel)
print("--------------------------")
Here’s the combined code for finding and extracting initial hotel data:
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
browser = playwright_instance.firefox.launch(
headless=False, # Set to True for background execution
)
page = browser.new_page()
# Increase default timeout for navigation
page.set_default_navigation_timeout(60000) # 60 seconds
page.goto(expedia_url)
print("Page loaded, waiting for dynamic content...")
time.sleep(5) # Wait for initial load
# --- Scrape hotels ---
hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
print(f"Found {len(hotel_cards)} initial hotel cards.")
extracted_hotels = []
for card in hotel_cards:
content_section = card.locator('div.uitk-card-content-section')
hotel_title_element = content_section.locator('h3')
hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
rating_element = content_section.locator('span.uitk-badge-base-text')
hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
# Attempt primary price selector first
price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
if not price_element.is_visible():
# Fallback selector
price_element = content_section.locator('div.uitk-type-500')
hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
hotel_data = {
'title': hotel_title,
'rating': hotel_rating,
'price': hotel_price
}
extracted_hotels.append(hotel_data)
print("\n--- Initial Extracted Hotel Data ---")
for hotel in extracted_hotels:
print(hotel)
print("------------------------------------")
print("Closing browser.")
browser.close()
Running this script will likely output a list like this (prices and availability vary):
[
{
'title': 'Hotel Artemide',
'rating': '9.6',
'price': '$450'
},
{
'title': 'iQ Hotel Roma',
'rating': '9.2',
'price': '$380'
},
{
'title': 'UNAHOTELS Decò Roma',
'rating': '8.8',
'price': '$320'
},
{
'title': 'The Hive Hotel',
'rating': '8.6',
'price': '$295'
},
...
]
However, this initial list probably doesn't contain *all* the hotels available. Expedia often loads more results as you scroll or requires clicking a button like "Show More Results".

This is where the power of browser automation libraries like Playwright truly shines – we can simulate user actions like clicking buttons!
Loading All Search Results with Playwright
To capture the complete list of hotels, we need to repeatedly click the "Show More" button (or simulate scrolling, depending on the site's mechanism) until no more results load or the button vanishes. Then, we scrape the full set of cards.
Let's insert logic to handle this. Before scraping the cards, add the following code snippet inside the with sync_playwright() as ...
block:
# Inside the 'with sync_playwright() as ...' block, before card scraping:
print("Checking for 'Show More Results' button...")
show_more_button_selector = 'button[data-stid="show-more-results"]' # Example selector, verify this in browser dev tools
while page.locator(show_more_button_selector).is_visible():
print("Found 'Show More Results' button, clicking...")
try:
page.locator(show_more_button_selector).click(timeout=10000) # 10 second timeout for click
print("Waiting for more results to load...")
# Wait for network activity to settle or just a fixed delay
page.wait_for_load_state('networkidle', timeout=15000) # Wait up to 15s for network to be idle
# Alternative fixed wait: time.sleep(4)
except Exception as e:
print(f"Could not click 'Show More' or timed out waiting: {e}")
break # Exit loop if button disappears or errors occur
print("'Show More Results' button no longer visible or process finished.")
# Now proceed to scrape *all* the cards that are currently loaded
This loop attempts to find the "Show More" button. If visible, it clicks it and waits for new content to load (using networkidle
or a fixed delay). It repeats until the button is no longer found or an error occurs. This pattern is very useful for handling dynamically loading content on websites.
Here’s the complete script incorporating the "Show More" logic:
import time
from playwright.sync_api import sync_playwright
# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1
# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")
with sync_playwright() as playwright_instance:
browser = playwright_instance.firefox.launch(
headless=False, # Set to True for background execution
)
page = browser.new_page()
page.set_default_navigation_timeout(60000) # 60 seconds
page.goto(expedia_url)
print("Page loaded, initial wait...")
time.sleep(5) # Wait for initial load
# --- Handle "Show More Results" ---
print("Checking for 'Show More Results' button...")
# Note: Selector might change, inspect element if needed
show_more_button_selector = 'button[data-stid="show-more-results"]'
while page.locator(show_more_button_selector).is_visible():
print("Found 'Show More Results' button, clicking...")
try:
page.locator(show_more_button_selector).click(timeout=10000)
print("Waiting for more results to load...")
# Wait for network activity to settle or just a fixed delay
page.wait_for_load_state('networkidle', timeout=15000)
# time.sleep(4) # Alternative fixed wait
except Exception as e:
print(f"Could not click 'Show More' or timed out waiting: {e}")
break
print("'Show More Results' button no longer visible or process finished.")
# --- Scrape all loaded hotels ---
print("Scraping all loaded hotel cards...")
hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
print(f"Found {len(hotel_cards)} total hotel cards.")
extracted_hotels = []
for card in hotel_cards:
content_section = card.locator('div.uitk-card-content-section')
hotel_title_element = content_section.locator('h3')
hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"
rating_element = content_section.locator('span.uitk-badge-base-text')
hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"
price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
if not price_element.is_visible():
price_element = content_section.locator('div.uitk-type-500') # Fallback
hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"
hotel_data = {
'title': hotel_title,
'rating': hotel_rating,
'price': hotel_price
}
extracted_hotels.append(hotel_data)
print("\n--- Final Extracted Hotel Data ---")
# Print only the first few and the total count for brevity
for i, hotel in enumerate(extracted_hotels):
if i < 5: # Print first 5
print(hotel)
elif i == 5:
print("...") # Indicate more data exists
print(f"(Total: {len(extracted_hotels)} hotels)")
print("---------------------------------")
print("Closing browser.")
browser.close()
Avoiding Blocks: The Role of Proxies
Scraping data for a single city and date range occasionally is unlikely to raise alarms on Expedia. Their servers handle massive traffic. However, if you intend to collect data at scale – across numerous cities, varying dates, or tracking price changes over time – your activity might be flagged as bot-like, potentially leading to your IP address being blocked.
This is where proxies become essential for serious web scraping. A proxy server acts as an intermediary, routing your request to Expedia but masking your actual IP address. By using a pool of proxies, especially residential or mobile ones, your requests appear to originate from many different, legitimate users rather than a single source.
Services like Evomi provide access to large pools of ethically sourced proxies. With Evomi's residential proxies (starting at $0.49/GB), you can rotate your IP address frequently, significantly reducing the risk of detection and bans. We even offer a free trial on residential, mobile, and datacenter proxies so you can test them out.
Integrating a proxy into your Playwright script is straightforward. First, get the necessary proxy details (host, port, username, password) from your Evomi dashboard.
Then, modify the browser launch configuration within your script:
# Example using Evomi Residential Proxy (HTTP)
proxy_server = "rp.evomi.com:1000"  # Or HTTPS on 1001, SOCKS5 on 1002
proxy_username = "YOUR_EVOMI_USERNAME"
proxy_password = "YOUR_EVOMI_PASSWORD"
browser = playwright_instance.firefox.launch(
    headless=False,  # Keep False for testing, True for production
    proxy={
        'server': proxy_server,
        'username': proxy_username,
        'password': proxy_password,
    }
)
# ... rest of your script (new_page, goto, etc.)
With this setup, all requests made by this Playwright browser instance will be routed through the specified Evomi proxy server, enhancing anonymity and reducing the likelihood of blocks during large-scale scraping operations.
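If you work with several proxy endpoints rather than one, you could cycle through them and launch each new browser with the next configuration. The sketch below is one way to do that, not a provider-specific recipe: the helper name proxy_configs and the example.com endpoints are hypothetical, and many residential proxy services already rotate IPs behind a single endpoint, in which case this is unnecessary. The dictionary keys match the proxy option used in the launch call above.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def proxy_configs(
    servers: List[str], username: str, password: str
) -> Iterator[Dict[str, str]]:
    """Return an endless iterator of Playwright-style proxy dicts."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    return (
        {"server": server, "username": username, "password": password}
        for server in cycle(servers)  # Wraps around after the last server
    )


# Hypothetical endpoints, for illustration only
configs = proxy_configs(
    ["proxy-a.example.com:1000", "proxy-b.example.com:1000"],
    "YOUR_USERNAME",
    "YOUR_PASSWORD",
)
print(next(configs)["server"])
print(next(configs)["server"])
```

Each call to next(configs) yields a dictionary that can be passed as proxy=next(configs) to playwright_instance.firefox.launch(), so every new browser instance uses the next endpoint in the list.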
Why Scrape Expedia Data? Potential Uses
Extracting data from Expedia opens doors to various insightful applications:
Dynamic Pricing Tools: Build tools that compare flight, hotel, or package prices across different dates or providers, helping users find optimal deals.
Market Intelligence: Analyze travel patterns, competitor pricing strategies, popular destinations, and customer preferences to inform business decisions in the travel sector.
Competitive Benchmarking: Hotels or travel agencies can monitor competitors' pricing, room availability, amenities, and special offers to adjust their own strategies effectively.
Sentiment Analysis: Collect and analyze customer reviews and ratings to gauge satisfaction levels, identify areas for improvement in services, or understand public perception of specific hotels or airlines.
Demand Forecasting: Use historical pricing and availability data to build models predicting future travel demand, peak seasons, and pricing fluctuations, aiding in resource planning and marketing efforts.
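Most of these uses start by turning the scraped price strings into numbers. As a small illustration of the price-comparison case, the sketch below parses values such as "$450" and picks the cheapest hotel. The helper names parse_price and cheapest are hypothetical, and the parsing assumes US-style formatting (comma as thousands separator, period as decimal point); other locales would need different handling.

```python
import re
from typing import Dict, List, Optional

# First run of digits, allowing thousands separators and an optional decimal part
_PRICE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: str) -> Optional[float]:
    """Return the first number found in a price string, or None if absent."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def cheapest(hotels: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the hotel with the lowest parseable price, or None if none parse."""
    priced = [(parse_price(hotel["price"]), hotel) for hotel in hotels]
    priced = [(price, hotel) for price, hotel in priced if price is not None]
    if not priced:
        return None
    return min(priced, key=lambda pair: pair[0])[1]


# Sample rows in the same shape the scraper produces
hotels = [
    {"title": "Hotel Artemide", "rating": "9.6", "price": "$450"},
    {"title": "The Hive Hotel", "rating": "8.6", "price": "$295"},
    {"title": "Fully Booked Hotel", "rating": "No Rating", "price": "Price Unavailable"},
]
print(cheapest(hotels))
```

Rows carrying the "Price Unavailable" placeholder are skipped rather than treated as zero, so a sold-out listing never wins the comparison.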
Wrapping Up
Using common tools like Python paired with Playwright makes extracting hotel information from Expedia quite manageable. You can adapt these techniques to scrape deeper details from individual hotel pages as well.
Remember, as your scraping activities scale up, incorporating proxies is crucial to maintain access and avoid IP bans. Choosing a dependable provider like Evomi, offering ethically sourced proxy pools (like residential, mobile, or datacenter options) with features like IP rotation, is key to successful and sustainable large-scale data collection.

Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
About Author
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.