Build a Facebook & Amazon Scraper: Proxy & Safe Methods





Sarah Whitmore
Scraping Techniques
Building Your Own Facebook & Amazon Data Extractor with Proxies
With billions of users scrolling daily, Facebook is a treasure trove of publicly available information. Think about the marketplace, business pages, public group discussions – it's a vast digital landscape ripe for exploration. For data enthusiasts and businesses alike, this information holds immense potential.
So, how about we dive in and construct a tool to gather some of this data? Specifically, let's build a Facebook scraper.
Don't worry if coding isn't your strong suit or if you're just starting. This guide covers both the code-heavy path and simpler approaches.
We'll explore what Facebook scrapers are, identify effective tools, build a basic scraper using code, and extend it to pull data from Amazon too. And crucially, we'll learn how to do this responsibly and without triggering alarms.
This article continues our journey into programmatic SEO, where we're constructing a data-driven website piece by piece. Here’s a reminder of the series roadmap:
1. Programmatic SEO Fundamentals: Understanding what programmatic SEO entails, its benefits, site planning, and keyword strategies.
2. Automated Web Scraping Techniques: Choosing the right tools and languages for your targets, setting up an automated scraping backend, and exploring top scraping instruments.
3. Building a Facebook & Amazon Scraper (You are here!): Focusing on data collection from specific sites like Facebook and Amazon, data handling, and error management.
4. Programmatic SEO with WordPress: Leveraging scraped data to populate and manage a WordPress site effectively.
Let's get our hands dirty!
What Is a Facebook Scraper?
Essentially, a Facebook scraper is an automated script or tool designed to extract publicly accessible data from Facebook. This can include information from public profiles, business pages, marketplace listings, event details, public group comments, and more.
In our programmatic SEO project context, the Facebook scraper serves as a primary data source. Imagine building a niche site that aggregates specific Facebook Marketplace listings for a particular city. Instead of just any item, we'll target sought-after electronics (like smartphones, laptops, etc.) and gather all related listings. This provides users a streamlined view of attractive local deals for high-demand products.
Here’s a conceptual sketch of what the city-specific marketplace landing page might look like:
And when a user selects a product:

To power these pages, we need a structured way to store the gathered data. We'll cleverly combine data scraped from both Amazon (for reference prices) and Facebook (for local listings).
What's a Good Tool for Scraping Facebook?
For tackling dynamic sites like Facebook, Playwright stands out. It's a Node.js library (with support for Python, Java, and .NET too) that lets you automate real web browsers (Chromium, Firefox, WebKit) through code. This approach is excellent for interacting with JavaScript-heavy sites and often flies under the radar of basic anti-scraping measures.
Playwright offers several advantages. Firstly, it's open-source and free to use, unlike many commercial scraping APIs which can get pricey, especially at scale.
Secondly, being backed by Microsoft and available across multiple languages means learning its API provides transferable skills. Write your logic once, and adapting it to Python or C# later is much easier.
Finally, Playwright includes intuitive commands and a fantastic 'codegen' tool. This tool can record your interactions within a browser window and automatically generate the corresponding Playwright code. Even if you're not a coding wizard, you can perform the actions manually, and Playwright translates them into a script for you.
How Do I Extract Data From Amazon?
Extracting data from Amazon, or indeed most websites, follows a similar pattern when using a tool like Playwright. The process generally involves these key stages:
Define precisely which data points you need (e.g., product title, price, rating, description).
Navigate the automation tool (Playwright) to the specific Amazon product pages (URLs).
Develop functions or use code generation to target and extract the desired data elements from the page's HTML structure.
Set up a scheduler (like cron jobs or a dedicated task scheduler) to run your scraper periodically and keep the data fresh.
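For that last scheduling step, a plain cron entry is often all you need. Here's an illustrative crontab line (the paths, file name, and schedule are placeholders) that runs a scraper script every day at 3 AM and appends its output to a log file:

0 3 * * * cd /path/to/your/project && /usr/bin/node scraper.js >> scraper.log 2>&1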
Does Facebook Allow Scraping?
Officially, no. Facebook's terms of service generally prohibit automated data collection. Like most major platforms, they employ various techniques to detect and block scrapers. However, court rulings have generally upheld that scraping publicly accessible data is legal. The key is sticking to public information (like marketplace listings or public business pages) and not attempting to access private data or violating other laws (like copyright).
Fortunately for our project, both Facebook Marketplace listings and standard Amazon product pages are public. So, the primary challenge isn't legality but avoiding detection.
This is where proxies become essential. Using a reliable proxy service, like Evomi's Residential Proxies, is a game-changer. Instead of sending all your requests from a single IP address (a dead giveaway), residential proxies route your traffic through IP addresses belonging to real home internet connections worldwide. From Facebook's or Amazon's perspective, each request looks like it's coming from a different, genuine user.
Evomi provides ethically sourced proxies, ensuring compliance and reliability, backed by Swiss standards for quality and robust customer support. Our residential proxies start at just $0.49 per GB, offering a cost-effective way to scale your scraping operations. Integrating them is straightforward.
Since Playwright controls real browsers, it sends realistic browser headers, which helps avoid suspicion compared to simpler HTTP libraries that might use default, easily identifiable headers.
Also, remember to incorporate random delays between actions in your script. Instantaneous clicks and navigation patterns scream "bot." Pausing for a few seconds between loading a page and extracting data mimics human browsing behavior.
Other useful tactics include clearing cookies between sessions and removing tracking parameters from URLs, as these can also be used for identification.
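To make the delay and URL-cleanup ideas concrete, here's a minimal sketch of two helpers you could drop into a Playwright script (the tracking-parameter list is just an example, not exhaustive):

// Pause for a random interval between minMs and maxMs, e.g. await randomPause(2000, 5000);
function randomPause(minMs, maxMs) {
  const delay = minMs + Math.random() * (maxMs - minMs);
  return new Promise((resolve) => setTimeout(resolve, delay));
}

// Strip common tracking parameters before storing or revisiting a URL
function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  ['utm_source', 'utm_medium', 'utm_campaign', 'fbclid', 'ref'].forEach((param) => url.searchParams.delete(param));
  return url.toString();
}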
Building Your Facebook Scraper (and Amazon Scraper)
Our Facebook scraper's core task is to visit the marketplace for a designated city, search for a specific product, and extract details from the resulting listings.
We could enhance this later by adding features like filtering by search radius, validating product specifics (model numbers, condition), checking seller reputations, etc. But for now, let's keep the initial version focused and functional.
Interestingly, while the Facebook data is our ultimate goal for the programmatic site, the process often starts with Amazon. The Amazon scraper will fetch reference prices for our target products (stored perhaps in a simple database). This price context adds value. Then, armed with the product name, we search Facebook Marketplace in various cities for current local deals.
The overall scraping workflow looks like this:
Import necessary libraries (Playwright, database connector like `mysql2`).
Establish a connection to your database (e.g., MySQL, PostgreSQL).
Retrieve the list of products (and their Amazon URLs) to monitor from the database.
For each product:
Scrape its Amazon page, extract the current price, and update the database.
Retrieve the list of target cities from the database.
For each product again:
For each city:
Construct the Facebook Marketplace search URL for the product in that city.
Scrape the search results page, extracting details (title, price, URL) for each listing.
Save these listings to the database, associated with the product and city.
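Translated into code, that workflow might be orchestrated roughly like the sketch below. Every helper name here (getTrackedProducts, scrapeAmazonPrice, scrapeMarketplace, saveListings, and so on) is a placeholder; several of them are fleshed out through the rest of the article.

const { chromium } = require('playwright');

// High-level orchestration sketch - the helpers are placeholders, not real library calls
async function main() {
  const db = await dbConnect();
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // 1. Refresh Amazon reference prices
  const products = await getTrackedProducts(db); // e.g. [{ id, name, amazonUrl }, ...]
  for (const product of products) {
    const price = await scrapeAmazonPrice(page, product.amazonUrl);
    await updateReferencePrice(db, product.id, price);
  }

  // 2. Collect Facebook Marketplace listings per product and city
  const cities = await getTargetCities(db); // e.g. ['denver', 'boulder', ...]
  for (const product of products) {
    for (const city of cities) {
      const listings = await scrapeMarketplace(page, city, product.name);
      await saveListings(db, product.id, city, listings);
    }
  }

  await browser.close();
  await db.end();
}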
Alright, let's code!
First, ensure you have Playwright installed in your Node.js project:
npm install playwright
Facebook Marketplace URLs have a predictable structure, usually:
facebook.com/marketplace/[city_slug]/
For example:
https://www.facebook.com/marketplace/denver/
And searches simply append search/?query=[search_term]. For instance:
https://www.facebook.com/marketplace/denver/search/?query=iphone
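Because the pattern is so regular, you can generate these URLs with a tiny helper (a hypothetical function, but it mirrors the structure above):

// Build a Marketplace search URL from a city slug and a search term
function marketplaceSearchUrl(citySlug, query) {
  return `https://www.facebook.com/marketplace/${citySlug}/search/?query=${encodeURIComponent(query)}`;
}

// marketplaceSearchUrl('denver', 'iphone 13')
// => 'https://www.facebook.com/marketplace/denver/search/?query=iphone%2013'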
This predictability is great for automation. Here’s a JavaScript snippet using Playwright to load a search results page and extract basic listing info:
const { chromium } = require('playwright');

(async () => {
  // Launch browser (consider adding proxy settings here!)
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  const targetUrl = 'https://www.facebook.com/marketplace/denver/search/?query=iphone';
  await page.goto(targetUrl);
  console.log(`Navigated to: ${targetUrl}`);

  // Wait for the main container of listings to appear
  // Note: Selectors might change! Inspect element is your friend.
  const listingContainerSelector = '[aria-label="Collection of Marketplace items"]';
  try {
    await page.waitForSelector(listingContainerSelector, { timeout: 10000 }); // Wait max 10 seconds
    console.log('Listing container found.');
  } catch (error) {
    console.error(`Error waiting for selector: ${listingContainerSelector}`, error);
    await browser.close();
    return;
  }

  // Find all links within the container - these usually lead to individual listings
  const listingLinks = page.locator(`${listingContainerSelector} a`);
  const linkCount = await listingLinks.count();
  console.log(`Found ${linkCount} potential listing links.`);

  let scrapedListings = [];
  for (let i = 0; i < linkCount; i++) {
    const element = listingLinks.nth(i);
    try {
      const textContent = await element.innerText();
      const itemUrl = await element.getAttribute('href');
      // Basic filtering: check if URL seems like a marketplace item link
      if (itemUrl && itemUrl.includes('/marketplace/item/')) {
        scrapedListings.push({
          description: textContent.replace(/\n/g, ' | '), // Replace newlines for easier logging
          url: `https://www.facebook.com${itemUrl}` // Prepend domain if needed
        });
      }
    } catch (e) {
      // Sometimes elements might detach or have issues, log and continue
      console.log(`Skipping element at index ${i} due to error: ${e.message}`);
    }
    // Add a small random delay to appear more human
    await page.waitForTimeout(Math.random() * 500 + 100); // Delay 100-600 ms
  }

  console.log("--- Scraped Listings ---");
  console.log(JSON.stringify(scrapedListings, null, 2)); // Pretty print the JSON
  console.log("----------------------");

  await browser.close();
})();
Let's break down key parts of that script:
require('playwright'): Imports the Playwright library.
chromium.launch(): Starts a Chromium browser instance. You'll add proxy settings here later.
browser.newContext() / newPage(): Creates a clean browser session and opens a new tab.
page.goto(...): Navigates to our target Facebook Marketplace search URL (Denver, searching for 'iphone').
page.waitForSelector(...): Pauses the script until the main container holding the listings (identified by its `aria-label`) is loaded in the page. This prevents errors from trying to access elements before they exist. Using a timeout prevents indefinite waiting.
page.locator(...): Creates a locator object representing all the link (`a`) elements within the listing container.
listingLinks.count(): Gets the number of matching link elements found.
for loop: Iterates through each found link element.
element.innerText() / getAttribute('href'): Extracts the visible text content and the link's URL (`href` attribute) for each listing.
scrapedListings.push(...): Adds the extracted data (description and URL) as an object to our results array. We clean up the description slightly and ensure the URL is complete.
page.waitForTimeout(...): Introduces a small, randomized delay inside the loop to mimic human interaction speed.
console.log(...): Outputs the collected data in a readable JSON format.
browser.close(): Closes the browser instance.
Running this should print output similar to this structure (actual items will vary):
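The exact items will differ every run, but the JSON will have roughly this shape (these two listings are made up purely for illustration):

[
  {
    "description": "$350 | iPhone 13, 128GB - Great Condition | Denver, CO",
    "url": "https://www.facebook.com/marketplace/item/1234567890123456/"
  },
  {
    "description": "$120 | iPhone SE (2020), 64GB | Aurora, CO",
    "url": "https://www.facebook.com/marketplace/item/9876543210987654/"
  }
]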

Notice how the descriptions often bundle price, title, and location, separated by newline characters (which we replaced with ' | ' for clarity). The URLs typically follow the `/marketplace/item/ID/` pattern. This structured data is perfect for storing in a database.
You can integrate this with a database like MySQL. First, install a driver:
npm install mysql2
The mysql2 library is a popular choice for Node.js. You can then use code like this to connect and interact with your database:
const mysql = require('mysql2/promise'); // Using the promise wrapper

async function dbConnect() {
  try {
    const connection = await mysql.createConnection({
      host: 'your_db_host', // e.g., 'localhost' or IP address
      user: 'your_db_user',
      password: 'your_db_password',
      database: 'your_db_name'
    });
    console.log("Successfully connected to the database!");
    return connection;
  } catch (error) {
    console.error("Database connection failed:", error);
    throw error; // Re-throw error to handle it upstream
  }
}

async function insertListing(connection, listingData) {
  // Example: Assumes a table 'fb_listings' with columns: city, product_query, url, description
  const sql = 'INSERT INTO fb_listings (city, product_query, url, description) VALUES (?, ?, ?, ?)';
  try {
    const [results] = await connection.execute(sql, [
      listingData.city, // e.g., 'denver'
      listingData.product_query, // e.g., 'iphone'
      listingData.url,
      listingData.description
    ]);
    console.log(`Inserted listing with ID: ${results.insertId}`);
    return results.insertId;
  } catch (error) {
    console.error("Failed to insert listing:", error);
    // Consider more robust error handling/logging here
  }
}

// --- Inside your main scraper async function ---
// ... (after scraping)
/*
const connection = await dbConnect();
if (connection) {
  for (const listing of scrapedListings) {
    await insertListing(connection, {
      city: 'denver', // Pass these dynamically based on your loops
      product_query: 'iphone',
      url: listing.url,
      description: listing.description
    });
  }
  await connection.end(); // Close connection when done
  console.log("Database connection closed.");
}
*/
// ... rest of the scraper code ...
You'd integrate this database logic into your main scraper script. Instead of just logging the `scrapedListings` array, you'd loop through it and call `insertListing` for each item, passing the relevant city and product query along with the scraped data.
This database connection approach can also be adapted for your Amazon scraper, perhaps fetching product URLs from one table and storing scraped prices in another.
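As a rough sketch of that adaptation — the products table, its columns, and these function names are assumptions for illustration, not something defined earlier:

// Hypothetical 'products' table with columns: id, name, amazon_url, reference_price, price_updated_at
async function getTrackedProducts(connection) {
  const [rows] = await connection.execute('SELECT id, name, amazon_url FROM products');
  return rows;
}

async function updateReferencePrice(connection, productId, price) {
  await connection.execute(
    'UPDATE products SET reference_price = ?, price_updated_at = NOW() WHERE id = ?',
    [price, productId]
  );
}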
How to Scrape Product Prices From Amazon
Now, let's tackle grabbing product prices from Amazon. This time, we'll leverage Playwright's code generation feature.
Open your terminal and run this command (replace the URL with a current Amazon product page):
npx playwright codegen https://www.amazon.com/Google-Pixel-7a-Unlocked-Smartphone/dp/B0BZW1N34P/
This command launches two windows: one is a regular Chromium browser navigated to the URL, and the other is the Playwright Inspector/Recorder.

Click the "Record" button in the Inspector. Now, any action you take in the browser window (clicking, typing, scrolling) will be translated into Playwright code in the Inspector window.
Experiment with it! Try clicking around. To grab the price, don't click "Record" just yet. Instead, click the "Pick locator" button (it looks like a target reticle) in the Inspector. Then, hover over the price element on the Amazon page in the browser window and click it. The Inspector will generate a selector for that element.

Be mindful that generated selectors aren't always ideal for scraping dynamic data like prices. For example, you might initially get something like:
page.getByText('$374.00').first()
This selector finds the element based on its current text content ('$374.00'). This is useless for tracking price *changes* because if the price becomes $380.00, this selector will no longer find the element!
You need a more stable selector, usually based on CSS IDs or classes that Amazon uses structurally. Hover around the price and its containing elements using "Pick locator" until you find something more robust, like:
page.locator('.a-price .a-offscreen').first()
// Or perhaps: page.locator('#corePrice_feature_div .a-price-whole').first()
// Amazon's structure changes, so inspect carefully!
You can copy these generated selectors (or the entire recorded script) and integrate them into your dedicated Amazon scraping function. Since the elements you target on a product page are usually consistent, you can often hard-code these selectors.
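Putting those pieces together, a price-extraction function might look like the sketch below. It assumes you pass in an existing Playwright page, and it uses the .a-price .a-offscreen selector discussed above — treat that as a starting point and re-verify it with "Pick locator", since Amazon's markup changes:

// Sketch: grab the current price from an Amazon product page and return it as a number
async function scrapeAmazonPrice(page, productUrl) {
  await page.goto(productUrl, { waitUntil: 'domcontentloaded' });
  await page.waitForSelector('.a-price .a-offscreen', { timeout: 10000 });
  const priceText = await page.locator('.a-price .a-offscreen').first().textContent();
  // priceText looks like "$374.00" -> strip everything except digits and the decimal point
  return parseFloat(priceText.replace(/[^0-9.]/g, ''));
}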
Remember the crucial step: avoiding blocks. Running these scripts directly from your IP will likely get you blocked quickly by both Facebook and Amazon. This is where Evomi proxies are essential.
After signing up with Evomi, you'll get access to your proxy credentials and endpoints. For residential proxies, the endpoint might look like rp.evomi.com with a specific port, say 1000 for HTTP. You'll configure Playwright to use these:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://rp.evomi.com:1000', // Evomi residential HTTP endpoint
      username: 'YOUR_EVOMI_USERNAME', // Your Evomi proxy user
      password: 'YOUR_EVOMI_PASSWORD' // Your Evomi proxy password
    }
  });
  const context = await browser.newContext();
  const page = await context.newPage();

  // ... your scraping logic using page.goto, page.locator, etc. ...
  console.log('Launched browser with Evomi proxy configuration.');

  // Example: Check IP to verify proxy is working (optional)
  // await page.goto('https://geo.evomi.com/');
  // console.log('Current IP info:', await page.textContent('body'));

  // await page.goto('your_target_amazon_or_facebook_url');
  // ... perform scraping ...

  await browser.close();
})();
By routing your requests through Evomi's residential IPs, each connection appears unique and legitimate, drastically reducing the chances of detection and blocks. Consider trying out our free trial for Residential, Mobile, or Datacenter proxies to see the difference yourself!
With Playwright, robust selectors, and reliable proxies like Evomi's, you have a solid foundation. You can now expand these basic scrapers, add error handling, integrate database storage properly, and schedule them to run automatically using tools like cron jobs.
Conclusion
We've walked through the process of conceptualizing and building basic web scrapers for Facebook Marketplace and Amazon using Playwright and Node.js. You've seen how to identify target data, use Playwright's API and code generation features, and structure the scraping logic.
Crucially, we emphasized the importance of responsible scraping: sticking to public data and employing techniques like random delays and, most importantly, using high-quality residential proxies from a provider like Evomi to avoid IP blocks and maintain access.
While these examples provide a starting point, real-world scraping often requires more sophistication – handling dynamic loading (infinite scroll), complex JavaScript interactions, validating data quality, and robust error management. But the principles remain the same.
Armed with these techniques, you're well on your way to gathering the data needed for your programmatic SEO project or any other data-driven endeavor. Happy scraping!

Author
Sarah Whitmore
Digital Privacy & Cybersecurity Consultant
About Author
Sarah is a cybersecurity strategist with a passion for online privacy and digital security. She explores how proxies, VPNs, and encryption tools protect users from tracking, cyber threats, and data breaches. With years of experience in cybersecurity consulting, she provides practical insights into safeguarding sensitive data in an increasingly digital world.