Master Expedia Scraping with Python and Proxies

Sarah Whitmore

Last edited on May 4, 2025

Diving into Expedia Data: A Python Scraping Guide

Expedia, the well-known online travel agency, aggregates a wealth of information on hotels, flights, activities, and travel packages. It's a goldmine for data.

Extracting data from Expedia can provide significant advantages, whether for market analysis, keeping tabs on competitors, or simply snagging the best hotel rates for your next trip.

This guide will walk you through effective techniques for scraping Expedia, featuring a step-by-step tutorial using Python to gather hotel pricing information.

Why Scraping Expedia Can Be Tricky

Like many contemporary websites, Expedia relies heavily on JavaScript to load content dynamically and manage user interactions. This presents a challenge for traditional scraping tools that only parse static HTML, such as Beautiful Soup. Such parsers do not execute JavaScript, so they miss any data that is loaded after the initial page load.

Essential Tools for Effective Expedia Scraping

To successfully scrape dynamic sites like Expedia, you'll need tools capable of rendering the page like a real browser. Browser automation libraries such as Puppeteer or Playwright, which drive a real (optionally headless) browser, are ideal. They can execute JavaScript, interact with page elements, and access the fully rendered content.

Playwright stands out as a particularly robust choice for tackling dynamic websites. It offers official support for several languages, including Python, JavaScript, Java, and C#, making it accessible to a broad range of developers.

Scraping Expedia Hotel Data with Python and Playwright

Let's get practical. The following sections detail how to use Python with Playwright to extract hotel details for a specific city and date range.

Setting Up Your Environment

First things first, you need Python installed on your system. If you don't have it yet, grab it from the official Python website and follow their installation instructions.

Next, install the Playwright library and the necessary browser binaries. Open your terminal or command prompt and run:

pip install playwright
playwright install # Installs browser binaries (Chromium, Firefox, WebKit)

Extracting Hotel Information

To start scraping hotel data for a chosen location and dates, we'll instruct Playwright to open the relevant Expedia search results page.

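This initial code snippet launches a Firefox browser instance (non-headless, so you can see it working) and navigates to an Expedia search results page for Rome, Italy. The example uses dates in July 2024; replace them with future dates when you run it, since a search for past dates is unlikely to return results.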

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL (simplified example - real URLs might be more complex)
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"

print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False, # Set to True for background execution
    )
    page = browser.new_page()
    page.goto(expedia_url)

    # Allow time for dynamic content to load
    print("Page loaded, waiting for dynamic content...")
    time.sleep(5) # Fixed wait; increase it if the page loads slowly

    print("Closing browser.")
    browser.close()

Note: While we're using a pre-constructed URL here, a more advanced script could automate filling in the search form on Expedia's homepage and clicking 'Search'. That's a neat challenge to try after working through this tutorial!
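
The f-string above leaves the spaces, commas, and parentheses in the destination for the browser to escape. A minimal sketch of a more defensive approach is to let Python's urllib.parse.urlencode do the escaping. The helper name build_search_url is an assumption for illustration, and the query parameter names are simply the ones used in the URL above; Expedia may change them at any time.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.expedia.com/Hotel-Search"


def build_search_url(destination, checkin_date, checkout_date, adults=2, rooms=1):
    """Return a search URL with every query value percent-encoded.

    Hypothetical helper: the parameter names mirror the tutorial's URL.
    """
    params = {
        "destination": destination,
        "startDate": checkin_date,
        "endDate": checkout_date,
        "adults": adults,
        "rooms": rooms,
        "sort": "RECOMMENDED",
    }
    # urlencode() escapes spaces, commas, parentheses, '&', and so on
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Rome (and vicinity), Lazio, Italy", "2024-07-10", "2024-07-17"))
```

The resulting string can be passed to page.goto() in place of expedia_url. Encoding the values yourself also keeps the URL intact if a destination ever contains a character such as '&'.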

Once the search results page is loaded, the next step is to identify and extract data from the individual hotel listing cards.

Expedia hotel search results page showing hotel cards

First, we need to locate all the hotel card elements on the page. Playwright uses selectors to find elements.

    # Inside the 'with sync_playwright() as ...' block:
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} initial hotel cards.")

Now, we'll loop through these cards and pull out the specific pieces of information we want: the hotel name (title), its rating, and the price per night.

Close-up of an Expedia hotel card highlighting title, rating, and price

    # Inside the 'with sync_playwright() as ...' block, after locating cards:
    extracted_hotels = []
    for card in hotel_cards:
        # Use locators relative to the card element
        content_section = card.locator('div.uitk-card-content-section')

        # Extract title
        hotel_title_element = content_section.locator('h3')
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        # Extract rating (handle cases where it might be missing)
        rating_element = content_section.locator('span.uitk-badge-base-text')
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        # Extract price (handle cases where it might be missing)
        # More specific selector example
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
        # Fallback if the primary price selector fails
        if not price_element.is_visible():
            # Original selector as fallback
            price_element = content_section.locator('div.uitk-type-500')
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

        # Optional: Print progress for each hotel
        # print(f"Extracted: {hotel_data}")

You'll notice checks like .is_visible(). They matter because not every hotel displays a rating (e.g., new listings) or a price (e.g., fully booked for the selected dates). When a locator matches nothing, .is_visible() returns False, so the script assigns a placeholder such as "No Rating" or "Price Unavailable" instead of failing. Be aware that .is_visible() raises an error if the locator matches more than one element, so keep the selectors specific or narrow them with .first.
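
The repeated "read the text or fall back to a placeholder" pattern can be pulled into a small helper. The sketch below is one possible shape, not the tutorial's own method: the name text_or_default is hypothetical, and it checks whether anything matched (count()) rather than whether the element is visible, then reads only the first match so a selector that hits several elements does not raise.

```python
def text_or_default(locator, default):
    """Return the stripped text of the first match, or `default`.

    `locator` is expected to be a Playwright Locator; only count(),
    first, and text_content() are used.
    """
    if locator.count() == 0:
        return default  # Nothing matched, e.g. a listing without a rating
    text = locator.first.text_content()  # May be None
    if text is None or not text.strip():
        return default
    return text.strip()


# Usage inside the card loop (selectors taken from the tutorial code):
#     hotel_title = text_or_default(card.locator('h3'), "N/A")
#     hotel_rating = text_or_default(
#         card.locator('span.uitk-badge-base-text'), "No Rating")
```

Because this version tests presence instead of visibility, it would also return text from elements that exist in the page but are hidden; whether that matters depends on how Expedia renders its cards.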

Finally, we can print the list of collected hotel data:

    # Inside the 'with sync_playwright() as ...' block, after the loop:
    print("\n--- Extracted Hotel Data ---")
    for hotel in extracted_hotels:
        print(hotel)
    print("--------------------------")

Here’s the combined code for finding and extracting initial hotel data:

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"

print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()

    # Increase default timeout for navigation
    page.set_default_navigation_timeout(60000)  # 60 seconds

    page.goto(expedia_url)

    print("Page loaded, waiting for dynamic content...")
    time.sleep(5)  # Wait for initial load

    # --- Scrape hotels ---
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} initial hotel cards.")

    extracted_hotels = []
    for card in hotel_cards:
        content_section = card.locator('div.uitk-card-content-section')
        hotel_title_element = content_section.locator('h3')
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        rating_element = content_section.locator('span.uitk-badge-base-text')
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        # Attempt primary price selector first
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
        if not price_element.is_visible():
            # Fallback selector
            price_element = content_section.locator('div.uitk-type-500')
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

    print("\n--- Initial Extracted Hotel Data ---")
    for hotel in extracted_hotels:
        print(hotel)
    print("------------------------------------")

    print("Closing browser.")
    browser.close()

Running this script prints one dictionary per hotel. Collected together, the results look something like this (the values are illustrative; prices and availability vary):

[
  {
    'title': 'Hotel Artemide',
    'rating': '9.6',
    'price': '$450'
  },
  {
    'title': 'iQ Hotel Roma',
    'rating': '9.2',
    'price': '$380'
  },
  {
    'title': 'UNAHOTELS Decò Roma',
    'rating': '8.8',
    'price': '$320'
  },
  {
    'title': 'The Hive Hotel',
    'rating': '8.6',
    'price': '$295'
  },
  ...
]
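
The scraped ratings and prices are plain strings, which is awkward for sorting or averaging. As a rough sketch, the helpers below convert them to numbers and map the placeholder values to None. The function names are hypothetical, the price parser assumes US-style formatting such as "$1,450", and the second sample entry is invented to show the placeholder case.

```python
import re


def parse_price(text):
    """Extract a numeric amount from text such as '$1,450'; None if absent."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # e.g. "Price Unavailable"
    return float(match.group().replace(",", ""))


def parse_rating(text):
    """Convert a rating such as '9.6' to a float; None for placeholders."""
    try:
        return float(text)
    except ValueError:
        return None  # e.g. "No Rating"


if __name__ == "__main__":
    hotels = [
        {"title": "Hotel Artemide", "rating": "9.6", "price": "$450"},
        # Hypothetical entry showing the placeholder values
        {"title": "Example Listing", "rating": "No Rating", "price": "Price Unavailable"},
    ]
    for hotel in hotels:
        cleaned = {
            **hotel,
            "rating": parse_rating(hotel["rating"]),
            "price": parse_price(hotel["price"]),
        }
        print(cleaned)
```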

However, this initial list probably doesn't contain *all* the hotels available. Expedia often loads more results as you scroll or requires clicking a button like "Show More Results".

Expedia page showing the 'Show More' button at the bottom of the list

This is where the power of browser automation libraries like Playwright truly shines – we can simulate user actions like clicking buttons!

Loading All Search Results with Playwright

To capture the complete list of hotels, we need to repeatedly click the "Show More" button (or simulate scrolling, depending on the site's mechanism) until no more results load or the button vanishes. Then, we scrape the full set of cards.

Let's insert logic to handle this. Before scraping the cards, add the following code snippet inside the with sync_playwright() as ... block:

    # Inside the 'with sync_playwright() as ...' block, before card scraping:
    print("Checking for 'Show More Results' button...")
    show_more_button_selector = 'button[data-stid="show-more-results"]' # Example selector, verify this in browser dev tools
    while page.locator(show_more_button_selector).is_visible():
        print("Found 'Show More Results' button, clicking...")
        try:
            page.locator(show_more_button_selector).click(timeout=10000) # 10 second timeout for click
            print("Waiting for more results to load...")
            # Wait for network activity to settle or just a fixed delay
            page.wait_for_load_state('networkidle', timeout=15000) # Wait up to 15s for network to be idle
            # Alternative fixed wait: time.sleep(4)
        except Exception as e:
            print(f"Could not click 'Show More' or timed out waiting: {e}")
            break # Exit loop if button disappears or errors occur
    print("'Show More Results' button no longer visible or process finished.")
    # Now proceed to scrape *all* the cards that are currently loaded

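This loop looks for the "Show More" button. While the button is visible, the loop clicks it and waits for new content to load (using networkidle or a fixed delay); it stops when the button is no longer found or an error occurs. Two caveats apply. Playwright's documentation discourages relying on networkidle, and wait_for_load_state may return immediately if the page has already reached that state, so the fixed delay can be the more dependable option. The loop also has no upper bound on clicks, so consider adding one. With those points in mind, this pattern is very useful for handling dynamically loading content on websites.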
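
As a variation on the loop above, the sketch below waits until the number of hotel cards actually grows after each click, and caps the number of clicks so the loop cannot run forever. The function name load_all_results and its parameters are assumptions for illustration; it relies only on Locator methods that Playwright provides (count(), first, is_visible(), click()).

```python
import time


def load_all_results(page, button_selector, card_selector,
                     max_clicks=20, wait_s=10.0, poll_s=0.5):
    """Click the 'show more' button until it disappears or stops helping.

    Returns the number of clicks that loaded additional cards.
    """
    clicks = 0
    while clicks < max_clicks:
        button = page.locator(button_selector)
        if button.count() == 0 or not button.first.is_visible():
            break  # No button left: everything is loaded
        cards_before = page.locator(card_selector).count()
        button.first.click()

        # Poll until new cards appear, giving up after wait_s seconds
        deadline = time.monotonic() + wait_s
        while page.locator(card_selector).count() <= cards_before:
            if time.monotonic() > deadline:
                return clicks  # Nothing new loaded; stop clicking
            time.sleep(poll_s)
        clicks += 1
    return clicks


# Usage inside the 'with sync_playwright() as ...' block:
#     load_all_results(
#         page,
#         'button[data-stid="show-more-results"]',
#         '[data-stid="lodging-card-responsive"]',
#     )
```

Counting cards ties the wait to the thing the script actually needs, rather than to network activity, at the cost of a few extra locator queries per click.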

Here’s the complete script incorporating the "Show More" logic:

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()
    page.set_default_navigation_timeout(60000)  # 60 seconds
    page.goto(expedia_url)

    print("Page loaded, initial wait...")
    time.sleep(5)  # Wait for initial load

    # --- Handle "Show More Results" ---
    print("Checking for 'Show More Results' button...")
    # Note: Selector might change, inspect element if needed
    show_more_button_selector = 'button[data-stid="show-more-results"]'
    while page.locator(show_more_button_selector).is_visible():
        print("Found 'Show More Results' button, clicking...")
        try:
            page.locator(show_more_button_selector).click(timeout=10000)
            print("Waiting for more results to load...")
            # Wait for network activity to settle or just a fixed delay
            page.wait_for_load_state('networkidle', timeout=15000)
            # time.sleep(4) # Alternative fixed wait
        except Exception as e:
            print(f"Could not click 'Show More' or timed out waiting: {e}")
            break

    print("'Show More Results' button no longer visible or process finished.")

    # --- Scrape all loaded hotels ---
    print("Scraping all loaded hotel cards...")
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} total hotel cards.")

    extracted_hotels = []
    for card in hotel_cards:
        content_section = card.locator('div.uitk-card-content-section')
        hotel_title_element = content_section.locator('h3')
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        rating_element = content_section.locator('span.uitk-badge-base-text')
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
        if not price_element.is_visible():
            price_element = content_section.locator('div.uitk-type-500')  # Fallback
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

    print("\n--- Final Extracted Hotel Data ---")
    # Print only the first few and the total count for brevity
    for i, hotel in enumerate(extracted_hotels):
        if i < 5:  # Print first 5
            print(hotel)
        elif i == 5:
            print("...")  # Indicate more data exists
    print(f"(Total: {len(extracted_hotels)} hotels)")
    print("---------------------------------")

    print("Closing browser.")
    browser.close()
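
Printing is fine for a quick check, but the results are usually more useful in a file. The sketch below writes the collected dictionaries as CSV using Python's standard csv module. The function name write_hotels_csv is hypothetical, and the column names are the dictionary keys used in the script above.

```python
import csv
import sys


def write_hotels_csv(hotels, file_obj):
    """Write hotel dictionaries to an open text stream as CSV rows."""
    writer = csv.DictWriter(file_obj, fieldnames=["title", "rating", "price"])
    writer.writeheader()
    writer.writerows(hotels)


if __name__ == "__main__":
    sample = [{"title": "Hotel Artemide", "rating": "9.6", "price": "$450"}]
    # Written to stdout here; pass an open file object to save to disk
    write_hotels_csv(sample, sys.stdout)
```

To save to disk, open the target file with newline="" (as the csv module's documentation recommends) and pass the file object instead of sys.stdout, for example after the scraping loop with extracted_hotels as the first argument.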

Avoiding Blocks: The Role of Proxies

Scraping a single city and date range now and then is unlikely to raise alarms on Expedia, whose servers handle massive traffic. However, if you intend to collect data at scale – across numerous cities, varying dates, or tracking price changes over time – your activity might be flagged as bot-like, potentially leading to your IP address being blocked.

This is where proxies become essential for serious web scraping. A proxy server acts as an intermediary, routing your request to Expedia but masking your actual IP address. By using a pool of proxies, especially residential or mobile ones, your requests appear to originate from many different, legitimate users rather than a single source.

Services like Evomi provide access to large pools of ethically sourced proxies. With Evomi's residential proxies (starting at $0.49/GB), you can rotate your IP address frequently, significantly reducing the risk of detection and bans. We even offer a free trial on residential, mobile, and datacenter proxies so you can test them out.

Integrating a proxy into your Playwright script is straightforward. First, get the necessary proxy details (host, port, username, password) from your Evomi dashboard.

Then, modify the browser launch configuration within your script:

# Example using Evomi Residential Proxy (HTTP)
proxy_server = "rp.evomi.com:1000"  # Or HTTPS on 1001, SOCKS5 on 1002
proxy_username = "YOUR_EVOMI_USERNAME"
proxy_password = "YOUR_EVOMI_PASSWORD"
browser = playwright_instance.firefox.launch(
    headless=False,  # Keep False for testing, True for production
    proxy={
        'server': proxy_server,
        'username': proxy_username,
        'password': proxy_password,
    }
)
# ... rest of your script (new_page, goto, etc.)

With this setup, all requests made by this Playwright browser instance will be routed through the specified Evomi proxy server, enhancing anonymity and reducing the likelihood of blocks during large-scale scraping operations.
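
Hard-coding credentials in the script makes it easy to leak them through version control. One common alternative, sketched below, is to read the proxy details from environment variables and build the proxy dictionary from them. The helper name proxy_from_env and the variable names PROXY_SERVER, PROXY_USERNAME, and PROXY_PASSWORD are assumptions for illustration, not something Evomi or Playwright prescribes.

```python
import os


def proxy_from_env(environ=os.environ):
    """Build a Playwright proxy dict from environment variables.

    Returns None when PROXY_SERVER is not set, which leaves the
    browser on a direct connection.
    """
    server = environ.get("PROXY_SERVER")
    if not server:
        return None
    proxy = {"server": server}
    username = environ.get("PROXY_USERNAME")
    password = environ.get("PROXY_PASSWORD")
    if username and password:
        proxy["username"] = username
        proxy["password"] = password
    return proxy


# Usage inside the 'with sync_playwright() as ...' block:
#     browser = playwright_instance.firefox.launch(
#         headless=True,
#         proxy=proxy_from_env(),
#     )
```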

Why Scrape Expedia Data? Potential Uses

Extracting data from Expedia opens doors to various insightful applications:

  • Price Comparison Tools: Build tools that compare flight, hotel, or package prices across different dates or providers, helping users find optimal deals.

  • Market Intelligence: Analyze travel patterns, competitor pricing strategies, popular destinations, and customer preferences to inform business decisions in the travel sector.

  • Competitive Benchmarking: Hotels or travel agencies can monitor competitors' pricing, room availability, amenities, and special offers to adjust their own strategies effectively.

  • Sentiment Analysis: Collect and analyze customer reviews and ratings to gauge satisfaction levels, identify areas for improvement in services, or understand public perception of specific hotels or airlines.

  • Demand Forecasting: Use historical pricing and availability data to build models predicting future travel demand, peak seasons, and pricing fluctuations, aiding in resource planning and marketing efforts.

Wrapping Up

Using common tools like Python paired with Playwright makes extracting hotel information from Expedia quite manageable. You can adapt these techniques to scrape deeper details from individual hotel pages as well.

Remember, as your scraping activities scale up, incorporating proxies is crucial to maintain access and avoid IP bans. Choosing a dependable provider like Evomi, offering ethically sourced proxy pools (like residential, mobile, or datacenter options) with features like IP rotation, is key to successful and sustainable large-scale data collection.

Diving into Expedia Data: A Python Scraping Guide

Expedia, the well-known online travel agency, aggregates a wealth of information on hotels, flights, activities, and travel packages. It's a goldmine for data.

Extracting data from Expedia can provide significant advantages, whether for market analysis, keeping tabs on competitors, or simply snagging the best hotel rates for your next trip.

This guide will walk you through effective techniques for scraping Expedia, featuring a step-by-step tutorial using Python to gather hotel pricing information.

Why Scraping Expedia Can Be Tricky

Like many contemporary websites, Expedia relies heavily on JavaScript to load content dynamically and manage user interactions. This presents a challenge for traditional web scrapers that primarily parse static HTML, such as Beautiful Soup. These tools often can't execute the necessary JavaScript, meaning they might miss crucial data loaded after the initial page load.

Essential Tools for Effective Expedia Scraping

To successfully scrape dynamic sites like Expedia, you'll need tools capable of rendering the page like a real browser. Headless browsers controlled programmatically, like Puppeteer or Playwright, are ideal. They can execute JavaScript, interact with page elements, and access the fully rendered content.

Playwright stands out as a particularly robust choice for tackling dynamic websites. It offers official support for several languages, including Python, JavaScript, Java, and C#, making it accessible to a broad range of developers.

Scraping Expedia Hotel Data with Python and Playwright

Let's get practical. The following sections detail how to use Python with Playwright to extract hotel details for a specific city and date range.

Setting Up Your Environment

First things first, you need Python installed on your system. If you don't have it yet, grab it from the official Python website and follow their installation instructions.

Next, install the Playwright library and the necessary browser binaries. Open your terminal or command prompt and run:

pip install playwright
playwright install # Installs browser binaries (like Firefox, Chrome, WebKit)

Extracting Hotel Information

To start scraping hotel data for a chosen location and dates, we'll instruct Playwright to open the relevant Expedia search results page.

This initial code snippet launches a Firefox browser instance (non-headless, so you can see it working) and navigates to an Expedia search results page for Rome, Italy, for dates in July 2024.

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL (simplified example - real URLs might be more complex)
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"

print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False, # Set to True for background execution
    )
    page = browser.new_page()
    page.goto(expedia_url)

    # Allow time for dynamic content to load
    print("Page loaded, waiting for dynamic content...")
    time.sleep(5) # Increased wait time for potentially slow loads

    print("Closing browser.")
    browser.close()

Note: While we're using a pre-constructed URL here, a more advanced script could automate filling in the search form on Expedia's homepage and clicking 'Search'. That's a neat challenge to try after working through this tutorial!

Once the search results page is loaded, the next step is to identify and extract data from the individual hotel listing cards.

Expedia hotel search results page showing hotel cards

First, we need to locate all the hotel card elements on the page. Playwright uses selectors to find elements.

# Inside the 'with sync_playwright() as ...' block:
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} initial hotel cards.")

Now, we'll loop through these cards and pull out the specific pieces of information we want: the hotel name (title), its rating, and the price per night.

Close-up of an Expedia hotel card highlighting title, rating, and price
    # Inside the 'with sync_playwright() as ...' block, after locating cards:
    extracted_hotels = []
    for card in hotel_cards:
        # Use locators relative to the card element
        content_section = card.locator('div.uitk-card-content-section')

        # Extract title
        hotel_title_element = content_section.locator('h3')
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        # Extract rating (handle cases where it might be missing)
        rating_element = content_section.locator('span.uitk-badge-base-text')
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        # Extract price (handle cases where it might be missing)
        # More specific selector example
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
        # Fallback if the primary price selector fails
        if not price_element.is_visible():
             # Original selector as fallback
            price_element = content_section.locator('div.uitk-type-500')
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

        # Optional: Print progress for each hotel
        # print(f"Extracted: {hotel_data}")

You'll notice checks like .is_visible(). This is important because not all hotels display a rating (e.g., new listings) or a price (e.g., fully booked for the selected dates). These checks prevent the script from failing if an element isn't found, assigning a placeholder value like "No Rating" or "Price Unavailable" instead.

Finally, we can print the list of collected hotel data:

    # Inside the 'with sync_playwright() as ...' block, after the loop:
    print("\n--- Extracted Hotel Data ---")
    for hotel in extracted_hotels:
        print(hotel)
    print("--------------------------")

Here’s the combined code for finding and extracting initial hotel data:

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"

print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()

    # Increase default timeout for navigation
    page.set_default_navigation_timeout(60000)  # 60 seconds

    page.goto(expedia_url)

    print("Page loaded, waiting for dynamic content...")
    time.sleep(5)  # Wait for initial load

    # --- Scrape hotels ---
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} initial hotel cards.")

    extracted_hotels = []
    for card in hotel_cards:
        content_section = card.locator('div.uitk-card-content-section')
        hotel_title_element = content_section.locator('h3')
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        rating_element = content_section.locator('span.uitk-badge-base-text')
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        # Attempt primary price selector first
        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500')
        if not price_element.is_visible():
            # Fallback selector
            price_element = content_section.locator('div.uitk-type-500')
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

    print("\n--- Initial Extracted Hotel Data ---")
    for hotel in extracted_hotels:
        print(hotel)
    print("------------------------------------")

    print("Closing browser.")
    browser.close()

Running this script will likely output a list like this (prices and availability vary):

[
  {
    'title': 'Hotel Artemide',
    'rating': '9.6',
    'price': '$450'
  },
  {
    'title': 'iQ Hotel Roma',
    'rating': '9.2',
    'price': '$380'
  },
  {
    'title': 'UNAHOTELS Decò Roma',
    'rating': '8.8',
    'price': '$320'
  },
  {
    'title': 'The Hive Hotel',
    'rating': '8.6',
    'price': '$295'
  },
  ...
]

However, this initial list probably doesn't contain *all* the hotels available. Expedia often loads more results as you scroll or requires clicking a button like "Show More Results".

Expedia page showing the 'Show More' button at the bottom of the list

This is where the power of browser automation libraries like Playwright truly shines – we can simulate user actions like clicking buttons!

Loading All Search Results with Playwright

To capture the complete list of hotels, we need to repeatedly click the "Show More" button (or simulate scrolling, depending on the site's mechanism) until no more results load or the button vanishes. Then, we scrape the full set of cards.

Let's insert logic to handle this. Before scraping the cards, add the following code snippet inside the with sync_playwright() as ... block:

    # Inside the 'with sync_playwright() as ...' block, before card scraping:
    print("Checking for 'Show More Results' button...")
    show_more_button_selector = 'button[data-stid="show-more-results"]' # Example selector, verify this in browser dev tools
    while page.locator(show_more_button_selector).is_visible():
        print("Found 'Show More Results' button, clicking...")
        try:
            page.locator(show_more_button_selector).click(timeout=10000) # 10 second timeout for click
            print("Waiting for more results to load...")
            # Wait for network activity to settle or just a fixed delay
            page.wait_for_load_state('networkidle', timeout=15000) # Wait up to 15s for network to be idle
            # Alternative fixed wait: time.sleep(4)
        except Exception as e:
            print(f"Could not click 'Show More' or timed out waiting: {e}")
            break # Exit loop if button disappears or errors occur
    print("'Show More Results' button no longer visible or process finished.")
    # Now proceed to scrape *all* the cards that are currently loaded

This loop attempts to find the "Show More" button. If visible, it clicks it and waits for new content to load (using networkidle or a fixed delay). It repeats until the button is no longer found or an error occurs. This pattern is very useful for handling dynamically loading content on websites.

Here’s the complete script incorporating the "Show More" logic:

import time
from playwright.sync_api import sync_playwright

# Define search parameters
destination = "Rome (and vicinity), Lazio, Italy"
checkin_date = "2024-07-10"
checkout_date = "2024-07-17"
adults = 2
rooms = 1

# Construct the URL
expedia_url = f"https://www.expedia.com/Hotel-Search?destination={destination}&startDate={checkin_date}&endDate={checkout_date}&adults={adults}&rooms={rooms}&sort=RECOMMENDED"
print(f"Navigating to: {expedia_url}")

with sync_playwright() as playwright_instance:
    browser = playwright_instance.firefox.launch(
        headless=False,  # Set to True for background execution
    )
    page = browser.new_page()
    page.set_default_navigation_timeout(60000)  # 60 seconds
    page.goto(expedia_url)

    print("Page loaded, initial wait...")
    time.sleep(5)  # Wait for initial load

    # --- Handle "Show More Results" ---
    print("Checking for 'Show More Results' button...")
    # Note: Selector might change, inspect element if needed
    show_more_button_selector = 'button[data-stid="show-more-results"]'
    while page.locator(show_more_button_selector).is_visible():
        print("Found 'Show More Results' button, clicking...")
        try:
            page.locator(show_more_button_selector).click(timeout=10000)
            print("Waiting for more results to load...")
            # Wait for network activity to settle or just a fixed delay
            page.wait_for_load_state('networkidle', timeout=15000)
            # time.sleep(4) # Alternative fixed wait
        except Exception as e:
            print(f"Could not click 'Show More' or timed out waiting: {e}")
            break

    print("'Show More Results' button no longer visible or process finished.")

    # --- Scrape all loaded hotels ---
    print("Scraping all loaded hotel cards...")
    hotel_cards = page.locator('[data-stid="lodging-card-responsive"]').all()
    print(f"Found {len(hotel_cards)} total hotel cards.")

    extracted_hotels = []
    for card in hotel_cards:
        content_section = card.locator('div.uitk-card-content-section')
        # .first avoids a strict-mode error when a selector matches several elements
        hotel_title_element = content_section.locator('h3').first
        hotel_title = hotel_title_element.text_content().strip() if hotel_title_element.is_visible() else "N/A"

        rating_element = content_section.locator('span.uitk-badge-base-text').first
        hotel_rating = rating_element.text_content().strip() if rating_element.is_visible() else "No Rating"

        price_element = content_section.locator('div[data-test-id="price-summary"] .uitk-text .uitk-type-500').first
        if not price_element.is_visible():
            price_element = content_section.locator('div.uitk-type-500').first  # Fallback
        hotel_price = price_element.text_content().strip() if price_element.is_visible() else "Price Unavailable"

        hotel_data = {
            'title': hotel_title,
            'rating': hotel_rating,
            'price': hotel_price
        }
        extracted_hotels.append(hotel_data)

    print("\n--- Final Extracted Hotel Data ---")
    # Print only the first few and the total count for brevity
    for i, hotel in enumerate(extracted_hotels):
        if i < 5:  # Print first 5
            print(hotel)
        elif i == 5:
            print("...")  # Indicate more data exists
    print(f"(Total: {len(extracted_hotels)} hotels)")
    print("---------------------------------")

    print("Closing browser.")
    browser.close()
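
The script above only prints the results. For any follow-up analysis you will probably want them on disk, with prices stored as numbers. A minimal sketch using only the standard library follows. It assumes prices arrive as strings such as "$450" or "$1,250" with a comma as the thousands separator, and treats anything without digits (for example "Price Unavailable") as missing. The function names and the `hotels.csv` filename are placeholders.

```python
import csv
import re


def parse_price(text):
    """Return the first number in a price string as a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def save_hotels(hotels, path="hotels.csv"):
    """Write a list of {'title', 'rating', 'price'} dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["title", "rating", "price"])
        writer.writeheader()
        for hotel in hotels:
            row = dict(hotel)
            # Missing prices become None, which csv writes as an empty cell
            row["price"] = parse_price(hotel["price"])
            writer.writerow(row)
```

Calling `save_hotels(extracted_hotels)` after the scraping loop would write one row per hotel. The currency symbol is dropped, so this only makes sense when every search uses the same currency.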

Avoiding Blocks: The Role of Proxies

Occasionally scraping a single city and date range is unlikely to raise alarms on Expedia, whose servers handle massive traffic. However, if you intend to collect data at scale (across numerous cities, varying dates, or to track price changes over time), your activity might be flagged as bot-like, potentially leading to your IP address being blocked.

This is where proxies become essential for serious web scraping. A proxy server acts as an intermediary, routing your request to Expedia but masking your actual IP address. By using a pool of proxies, especially residential or mobile ones, your requests appear to originate from many different, legitimate users rather than a single source.

Services like Evomi provide access to large pools of ethically sourced proxies. With Evomi's residential proxies (starting at $0.49/GB), you can rotate your IP address frequently, significantly reducing the risk of detection and bans. We even offer a free trial on residential, mobile, and datacenter proxies so you can test them out.

Integrating a proxy into your Playwright script is straightforward. First, get the necessary proxy details (host, port, username, password) from your Evomi dashboard.

Then, modify the browser launch configuration within your script:

# Example using Evomi Residential Proxy (HTTP)
proxy_server = "rp.evomi.com:1000"  # Or HTTPS on 1001, SOCKS5 on 1002
proxy_username = "YOUR_EVOMI_USERNAME"
proxy_password = "YOUR_EVOMI_PASSWORD"
browser = playwright_instance.firefox.launch(
    headless=False,  # Keep False for testing, True for production
    proxy={
        'server': proxy_server,
        'username': proxy_username,
        'password': proxy_password,
    }
)
# ... rest of your script (new_page, goto, etc.)

With this setup, all requests made by this Playwright browser instance will be routed through the specified Evomi proxy server, enhancing anonymity and reducing the likelihood of blocks during large-scale scraping operations.
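
Hardcoding credentials in a script makes them easy to leak through version control or shared snippets. A minimal sketch that builds the same `proxy` dictionary from environment variables follows. The variable names `PROXY_SERVER`, `PROXY_USERNAME`, and `PROXY_PASSWORD` are placeholders chosen for this example, not names defined by Evomi or Playwright.

```python
import os


def proxy_from_env(environ=None):
    """Build a Playwright-style proxy dict from environment variables.

    Returns None when no proxy server is configured, so the caller can
    launch the browser without a proxy.
    """
    env = os.environ if environ is None else environ
    server = env.get("PROXY_SERVER")
    if not server:
        return None
    proxy = {"server": server}
    # Credentials are optional; include them only when both are set
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if username and password:
        proxy["username"] = username
        proxy["password"] = password
    return proxy
```

The result can be passed straight to the launch call, for example `playwright_instance.firefox.launch(headless=False, proxy=proxy_from_env())`. Accepting an optional `environ` mapping keeps the function easy to test without touching the real environment.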

Why Scrape Expedia Data? Potential Uses

Extracting data from Expedia opens doors to various insightful applications:

  • Dynamic Pricing Tools: Build tools that compare flight, hotel, or package prices across different dates or providers, helping users find optimal deals.

  • Market Intelligence: Analyze travel patterns, competitor pricing strategies, popular destinations, and customer preferences to inform business decisions in the travel sector.

  • Competitive Benchmarking: Hotels or travel agencies can monitor competitors' pricing, room availability, amenities, and special offers to adjust their own strategies effectively.

  • Sentiment Analysis: Collect and analyze customer reviews and ratings to gauge satisfaction levels, identify areas for improvement in services, or understand public perception of specific hotels or airlines.

  • Demand Forecasting: Use historical pricing and availability data to build models predicting future travel demand, peak seasons, and pricing fluctuations, aiding in resource planning and marketing efforts.

Wrapping Up

Python paired with Playwright makes extracting hotel information from Expedia quite manageable. You can adapt the same techniques to scrape deeper details from individual hotel pages as well.

Remember, as your scraping activities scale up, incorporating proxies is crucial to maintain access and avoid IP bans. Choosing a dependable provider like Evomi, offering ethically sourced proxy pools (like residential, mobile, or datacenter options) with features like IP rotation, is key to successful and sustainable large-scale data collection.

Author

Sarah Whitmore

Digital Privacy & Cybersecurity Consultant

About Author

Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.
