Use Python’s Requests and BeautifulSoup to Scrape Product Prices

Web scraping allows developers to automate data collection from websites, making it a powerful tool for monitoring prices, extracting content, and populating datasets. In this blog post, we’ll explore how to scrape product prices from an e-commerce website using Python’s requests library in combination with BeautifulSoup. We’ll walk through setting up the environment, sending HTTP requests, parsing HTML, locating data with selectors, and optimizing the process for performance. By the end, you’ll have a functional script that collects price data across paginated product listings.

1. Setting Up Prerequisites

Before diving into code, ensure your environment includes Python 3 and two essential libraries: requests and beautifulsoup4. You can install them with pip:

pip install requests beautifulsoup4

These libraries enable HTTP requests and HTML parsing respectively. We’ll also use lxml for faster HTML parsing:

pip install lxml

For this tutorial, we’ll scrape sample product data from http://books.toscrape.com, a site designed specifically for web scraping practice.

2. Making HTTP Requests with Requests

The first step in any scraping script is retrieving the page content. We do this using the requests library:

import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)

if response.status_code == 200:
    html = response.text
    print("Page fetched successfully")
else:
    print(f"Failed to retrieve page: {response.status_code}")

This code sends a GET request to the URL and, if successful, gives us the raw HTML of the page. Always verify the response status before proceeding; site downtime or rate limiting can cause a scrape to fail.
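In practice it’s worth making the request a bit more defensive. The sketch below adds a timeout and a custom User-Agent header (the header value here is just a placeholder) and uses raise_for_status to turn HTTP error codes into exceptions:

import requests

url = 'http://books.toscrape.com/catalogue/page-1.html'

# Placeholder User-Agent string; substitute whatever identifies your client.
headers = {'User-Agent': 'price-scraper/1.0'}

try:
    # timeout keeps the script from hanging on an unresponsive server
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    html = response.text
except requests.RequestException as exc:
    print(f"Request failed: {exc}")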

3. Parsing HTML with BeautifulSoup

Once we have the HTML, the next step is to parse it into a navigable structure using BeautifulSoup:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

# Find all book listings on the page
products = soup.find_all('article', class_='product_pod')
print(f"Found {len(products)} products on the page")

We use find_all with the tag name and class to grab all product elements. This works because the site’s HTML structure wraps each book in an <article class="product_pod"> element. Understanding the site’s structure using browser DevTools is essential to writing reliable scraping logic.
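If you prefer CSS selectors, the same lookup can be written with select, which accepts the selector strings you can copy straight out of DevTools. A brief equivalent sketch:

# Equivalent lookup using a CSS selector string
products = soup.select('article.product_pod')

# select_one returns the first match, or None if nothing matches
first_link = soup.select_one('article.product_pod h3 a')
if first_link is not None:
    print(first_link.get('title'))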

4. Extracting Product Titles and Prices

Now let’s iterate through each product and pull out the title and price. Each element has well-defined tags with predictable class names—perfect for scraping.

for product in products:
    title = product.h3.a['title']
    price = product.find('p', class_='price_color').text
    print(f"Product: {title}, Price: {price}")

This snippet uses BeautifulSoup’s attribute access to read the title directly from the anchor’s title attribute, while the price sits in a <p> tag with class price_color. Combining tag names with CSS class names keeps your lookups precise.

Tip: Strip the currency symbol or convert strings to numerical format if you intend to sort or analyze prices later.
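For example, here is a small helper (a hypothetical function, assuming prices look like '£51.77') that strips the currency symbol and returns a float:

def parse_price(price_text):
    # Keep only digits and the decimal point, then convert to float
    cleaned = ''.join(ch for ch in price_text if ch.isdigit() or ch == '.')
    return float(cleaned)

print(parse_price('£51.77'))  # 51.77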

5. Pagination and Automation Tips

On sites with multiple pages (like this one), scraping all product prices requires iterating over paginated pages:

all_products = []

for page in range(1, 6):  # scrape first 5 pages
    url = f"http://books.toscrape.com/catalogue/page-{page}.html"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    products = soup.find_all('article', class_='product_pod')

    for product in products:
        title = product.h3.a['title']
        price = product.find('p', class_='price_color').text.strip()
        all_products.append({"title": title, "price": price})

print(f"Collected {len(all_products)} products")

This approach makes your script more robust and capable of collecting data across large product catalogs. Be sure to add small delays between requests (e.g., time.sleep(1)) to prevent hammering the server.
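A minimal sketch of how such a delay fits into the loop, using the standard library’s time module:

import time

for page in range(1, 6):
    url = f"http://books.toscrape.com/catalogue/page-{page}.html"
    # ... fetch and parse the page exactly as in the loop above ...
    time.sleep(1)  # pause one second so consecutive requests don't hammer the server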

Best Practices:

  • Respect robots.txt files and site terms of service.
  • Use headers to mimic a real browser request.
  • Cache responses during development to avoid hitting the site repeatedly (see the sketch after this list).
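For the caching point, one lightweight approach during development is to save the fetched HTML to a local file and reuse it on subsequent runs. A minimal sketch (the file name page-1.html is arbitrary):

import requests
from pathlib import Path

cache_file = Path('page-1.html')  # arbitrary local cache file

if cache_file.exists():
    # Reuse the cached copy instead of re-fetching during development
    html = cache_file.read_text(encoding='utf-8')
else:
    response = requests.get('http://books.toscrape.com/catalogue/page-1.html')
    html = response.text
    cache_file.write_text(html, encoding='utf-8')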

Final Thoughts

Using requests and BeautifulSoup, you can effectively scrape product listings, collect price data, and automate updates for e-commerce intelligence or personal price tracking tools. This technique forms the foundation for more advanced scraping tasks, including using headless browsers (e.g., Selenium) for dynamic content.

Going forward, consider storing scraped data in a CSV file or database, setting up cron jobs for periodic scraping, and implementing change detection logic for actionable alerts. Web scraping offers immense value—use it responsibly and ethically.
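As a starting point, here is a minimal sketch that writes the all_products list from the pagination example to a CSV file using Python’s built-in csv module:

import csv

with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'price'])
    writer.writeheader()            # header row: title,price
    writer.writerows(all_products)  # one row per scraped product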

 
