Web Scraping Price Data with BeautifulSoup and Writing to SQL

Web scraping is a valuable tool for monitoring prices, tracking competitors, or keeping data pipelines supplied with fresh inputs. In this article, we’ll walk through building an end-to-end pipeline that scrapes price data from an e-commerce site using BeautifulSoup and persists the cleaned results to a local SQLite database with SQLAlchemy. This tutorial is aimed at intermediate Python developers who want to combine web scraping with persistent storage.

1. Setting Up the Environment

Before diving into code, ensure you have the necessary Python packages installed. For this project, we’ll need:

pip install requests beautifulsoup4 sqlalchemy

We’ll use SQLite for storage; it comes bundled with Python, so there’s nothing extra to install.

2. Scraping the Page with BeautifulSoup

Let’s suppose we’re scraping prices from a mock e-commerce website that lists products within <div class="product">, with names in <h2> and prices in <span class="price">.

import requests
from bs4 import BeautifulSoup

URL = "https://example-ecommerce.com/products"

response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')

products = []

for item in soup.select("div.product"):
    name_elem = item.find("h2")
    price_elem = item.find("span", class_="price")
    
    if name_elem and price_elem:
        name = name_elem.text.strip()
        price = float(price_elem.text.strip().replace("$", ""))  # e.g. "$19.99" -> 19.99
        products.append({"name": name, "price": price})

print(products)

This code fetches the HTML, parses it with BeautifulSoup, and extracts each product’s name and price. Cleaning matters here: we strip whitespace and remove the dollar sign before converting the price to a float.
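
One caveat: float() will raise a ValueError if the site formats prices with thousands separators or a currency symbol other than “$”. If you expect that, a small helper makes the cleaning step more forgiving. The parse_price name and its rules below are our own sketch, not part of the tutorial’s code:

import re

def parse_price(raw):
    """Strip currency symbols and separators; return a float, or None if unparseable."""
    cleaned = re.sub(r"[^\d.]", "", raw)   # keep only digits and the decimal point
    try:
        return float(cleaned)
    except ValueError:
        return None   # let the caller skip products whose price can't be parsed

print(parse_price("$1,299.99"))   # 1299.99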

3. Defining the SQLAlchemy Database Model

With the scraped data in hand, let’s model our SQLite database schema using SQLAlchemy:

from sqlalchemy import Column, Integer, String, Float, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    price = Column(Float, nullable=False)

engine = create_engine('sqlite:///products.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

This model represents each product as a row in our products table with an auto-incrementing primary key.
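
If you want to confirm that Base.metadata.create_all() really created the table, SQLAlchemy’s inspector provides a quick, optional sanity check:

from sqlalchemy import inspect

inspector = inspect(engine)
print(inspector.get_table_names())                                   # ['products']
print([col["name"] for col in inspector.get_columns("products")])    # ['id', 'name', 'price']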

4. Inserting Scraped Data into SQLite

Now we’ll insert the scraped product data into the database:

for product in products:
    p = Product(name=product["name"], price=product["price"])
    session.add(p)

session.commit()  # Don’t forget to commit!

Here, we loop over the scraped data, create Product objects, and add them to the database session. We commit only once to reduce overhead and improve write performance.
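
If the scraped list grows large, the per-object add() calls can also be handed to the session in one go with add_all(); the snippet below is a minimal equivalent of the loop above:

rows = [Product(name=p["name"], price=p["price"]) for p in products]
session.add_all(rows)   # stage every row in a single call
session.commit()        # still a single commit at the end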

Tip: Add a uniqueness check (e.g., a unique constraint or a check before insert) to avoid duplicating data in case the scraper is run multiple times.
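
One way to implement that check, assuming product names uniquely identify items on the site (an assumption worth verifying), is to look each name up before inserting and update the stored price when a row already exists:

for product in products:
    existing = session.query(Product).filter_by(name=product["name"]).first()
    if existing:
        existing.price = product["price"]   # refresh the stored price instead of inserting a duplicate
    else:
        session.add(Product(name=product["name"], price=product["price"]))

session.commit()

Alternatively, declare the column as name = Column(String, unique=True, nullable=False) and handle sqlalchemy.exc.IntegrityError on commit.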

5. Running the Pipeline Automatically

To make this scraper useful over time, you can automate it with a scheduler such as cron (Linux/macOS) or Windows Task Scheduler. Wrapping the whole workflow in a single function keeps it easy to invoke either way:

def run_scraper():
    response = requests.get(URL)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    session = Session()
    for item in soup.select("div.product"):
        name_elem = item.find("h2")
        price_elem = item.find("span", class_="price")
        
        if name_elem and price_elem:
            name = name_elem.text.strip()
            price = float(price_elem.text.strip().replace("$", ""))
            p = Product(name=name, price=price)
            session.add(p)

    session.commit()
    print("Scraping completed and data saved.")

if __name__ == '__main__':
    run_scraper()
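
For example, to run the script every morning at 6:00 with cron, a crontab entry along these lines would work (the interpreter and script paths are placeholders for your own environment):

0 6 * * * /usr/bin/python3 /path/to/price_scraper.py >> /path/to/price_scraper.log 2>&1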

This function encapsulates the entire workflow, making it easy to reuse and automate. In a production setting, don’t forget to handle errors gracefully: add try/except blocks, logging, and request timeouts.
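
Here is a sketch of what that hardening might look like; the run_scraper_safely name and the logging setup are our own choices, not part of the original script:

import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

def run_scraper_safely():
    session = Session()
    try:
        response = requests.get(URL, timeout=10)   # avoid hanging indefinitely on a slow server
        response.raise_for_status()                # turn HTTP errors (403, 500, ...) into exceptions
        soup = BeautifulSoup(response.text, "html.parser")
        for item in soup.select("div.product"):
            name_elem = item.find("h2")
            price_elem = item.find("span", class_="price")
            if name_elem and price_elem:
                name = name_elem.text.strip()
                price = float(price_elem.text.strip().replace("$", ""))
                session.add(Product(name=name, price=price))
        session.commit()
        logger.info("Scraping completed and data saved.")
    except requests.RequestException as exc:
        logger.error("Request failed: %s", exc)
    except Exception:
        session.rollback()
        logger.exception("Unexpected error; rolled back the session.")
    finally:
        session.close()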

Conclusion

With BeautifulSoup and SQLAlchemy, you’ve built a flexible pipeline that bridges raw HTML data extraction and structured database storage. This type of scraper is just the beginning—you can extend it to monitor historical price changes, integrate with reporting tools, or feed a price-tracking dashboard. Always respect the target site’s robots.txt rules and avoid overwhelming servers with high-frequency scraping. Happy coding!
