Web Scraping Price Data with BeautifulSoup and Writing to SQL
Web scraping is a valuable tool for monitoring prices, tracking competitors, or feeding fresh data into your pipelines. In this article, we’ll walk through building an end-to-end pipeline that scrapes price data from an e-commerce site using BeautifulSoup and persists the cleaned results into a local SQLite database using SQLAlchemy. This tutorial is designed for intermediate Python developers who want to combine web scraping with persistent storage.
1. Setting Up the Environment
Before diving into code, ensure you have the necessary Python packages installed. For this project, we’ll need:
pip install requests beautifulsoup4 sqlalchemy
We’ll also use SQLite for simplicity, which comes bundled with Python.
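To confirm the bundled SQLite engine is available (and see which version you have), a quick one-liner in Python is enough; the printed version will vary by system:
# Quick check that the stdlib sqlite3 module and its bundled SQLite engine are present
import sqlite3
print(sqlite3.sqlite_version)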
2. Scraping the Page with BeautifulSoup
Let’s suppose we’re scraping prices from a mock e-commerce website that lists each product inside a <div class="product"> element, with the name in an <h2> tag and the price in a <span class="price"> tag.
import requests
from bs4 import BeautifulSoup

URL = "https://example-ecommerce.com/products"

response = requests.get(URL)
soup = BeautifulSoup(response.text, 'html.parser')

products = []
for item in soup.select("div.product"):
    name_elem = item.find("h2")
    price_elem = item.find("span", class_="price")
    if name_elem and price_elem:
        name = name_elem.text.strip()
        price = float(price_elem.text.strip().replace("$", ""))
        products.append({"name": name, "price": price})

print(products)
This code fetches the HTML, parses it using BeautifulSoup, and extracts the product name and price. Cleaning is important here: we strip whitespace and remove the dollar sign before converting to a float.
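Real listings often include thousands separators or extra symbols that would break the simple float() conversion above. Here is a minimal cleaning helper you could swap in; the parse_price name and the exact characters stripped are illustrative assumptions, not part of the original pipeline:
import re

def parse_price(raw: str) -> float:
    """Strip currency symbols, whitespace, and thousands separators
    before converting a price string such as '$1,299.99' to a float."""
    cleaned = re.sub(r"[^\d.]", "", raw)  # keep only digits and the decimal point
    if not cleaned:
        raise ValueError(f"No numeric price found in {raw!r}")
    return float(cleaned)

# Example: parse_price("$1,299.99") returns 1299.99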
3. Defining the SQLAlchemy Database Model
With the scraped data in hand, let’s model our SQLite database schema using SQLAlchemy:
from sqlalchemy import Column, Integer, String, Float, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    price = Column(Float, nullable=False)

engine = create_engine('sqlite:///products.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
This model represents each product as a row in the products table, with an auto-incrementing primary key.
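Once rows have been inserted (which we do in the next section), you can read them back through the same session to verify the schema works. This is just a quick sanity check using the standard query API:
# Fetch all stored products and print them, assuming the Product model above
for row in session.query(Product).order_by(Product.name).all():
    print(row.id, row.name, row.price)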
4. Inserting Scraped Data into SQLite
Now we’ll insert the scraped product data into the database:
for product in products:
    p = Product(name=product["name"], price=product["price"])
    session.add(p)

session.commit()  # Don’t forget to commit!
Here, we loop over the scraped data, create Product objects, and add them to the database session. We commit only once, after the loop, to reduce overhead and improve write performance.
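If you prefer, the loop can be collapsed into a single call with session.add_all(), which stages every object at once before the single commit. This is an equivalent alternative to the loop above, not a change in behavior:
# Equivalent bulk staging: build the objects, add them in one call, commit once
session.add_all(
    [Product(name=p["name"], price=p["price"]) for p in products]
)
session.commit()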
Tip: Add a uniqueness check (e.g., a unique constraint or a check before insert) to avoid duplicating data in case the scraper is run multiple times.
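One way to do this, sketched below under the assumption that the product name uniquely identifies a product, is to declare the name column unique and check for an existing row before inserting; adapt the key to whatever is actually unique on your target site:
# In the model above, make the name column unique so duplicates are rejected at the DB level:
name = Column(String, nullable=False, unique=True)

# And check before inserting, so a second run of the scraper skips products that already exist:
for product in products:
    if not session.query(Product).filter_by(name=product["name"]).first():
        session.add(Product(name=product["name"], price=product["price"]))
session.commit()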
5. Running the Pipeline Automatically
To make this scraper useful, you can automate it with a scheduler such as cron (Linux/macOS) or Windows Task Scheduler. Or embed everything in a single Python script:
def run_scraper():
    response = requests.get(URL)
    soup = BeautifulSoup(response.text, 'html.parser')
    session = Session()
    for item in soup.select("div.product"):
        name_elem = item.find("h2")
        price_elem = item.find("span", class_="price")
        if name_elem and price_elem:
            name = name_elem.text.strip()
            price = float(price_elem.text.strip().replace("$", ""))
            p = Product(name=name, price=price)
            session.add(p)
    session.commit()
    print("Scraping completed and data saved.")

if __name__ == '__main__':
    run_scraper()
This function encapsulates the entire workflow, making it easy to reuse and automate. Don’t forget to handle errors gracefully in a production setting—add try/except blocks, logging, and timeout settings.
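As a rough illustration, a hardened entry point might look like the sketch below. The timeout value, logger setup, and choice of exceptions to catch are all assumptions you would tune for your own deployment:
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("price_scraper")

def run_scraper_safely():
    session = Session()
    try:
        # Fail fast if the site is slow or unreachable
        response = requests.get(URL, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        for item in soup.select("div.product"):
            name_elem = item.find("h2")
            price_elem = item.find("span", class_="price")
            if name_elem and price_elem:
                name = name_elem.text.strip()
                price = float(price_elem.text.strip().replace("$", ""))
                session.add(Product(name=name, price=price))
        session.commit()
        logger.info("Scraping completed and data saved.")
    except requests.RequestException:
        logger.exception("Request to %s failed", URL)
    except Exception:
        session.rollback()
        logger.exception("Scrape failed; rolled back the session")
    finally:
        session.close()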
Conclusion
With BeautifulSoup and SQLAlchemy, you’ve built a flexible pipeline that bridges raw HTML data extraction and structured database storage. This type of scraper is just the beginning—you can extend it to monitor historical price changes, integrate with reporting tools, or feed a price-tracking dashboard. Always respect the target site’s robots.txt rules and avoid overwhelming servers with high-frequency scraping. Happy coding!
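For the robots.txt point, Python’s standard library can do the check for you. The sketch below assumes the rules live at the site root and that you scrape as a generic user agent:
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example-ecommerce.com/robots.txt")
rp.read()

# Only proceed if the path we want is allowed for our user agent
if rp.can_fetch("*", URL):
    run_scraper()
else:
    print("robots.txt disallows scraping this path; skipping.")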