Scrape and Schedule: Automate Web Reports Using Python & Cron

A daily email report with up-to-date financial data can save time, support better decisions, and reduce manual effort. In this guide, we’ll build an automated system using Python and `cron` that scrapes current stock prices from the web, formats them into an email-friendly summary, and delivers that summary every day. Along the way we’ll cover error handling, readable code, and safe handling of credentials.

1. Scraping Financial Data Using Python

To begin, we’ll use the `requests` and `BeautifulSoup` libraries to scrape financial data (install them with `pip install requests beautifulsoup4`). Yahoo Finance is a commonly used source and works well for demonstration purposes:

import requests
from bs4 import BeautifulSoup

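def fetch_stock_price(ticker):
    """Return the quoted price for ticker as a string, or "N/A" if not found."""
    url = f"https://finance.yahoo.com/quote/{ticker}"
    # Browser-like User-Agent; the default one sent by requests is often rejected.
    headers = {"User-Agent": "Mozilla/5.0"}
    # A timeout keeps an unattended run from hanging on a stalled connection.
    response = requests.get(url, headers=headers, timeout=10)
    # Raise requests.HTTPError on any 4xx or 5xx response.
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # This selector depends on the page's current markup and may need updating.
    price_tag = soup.find("fin-streamer", {"data-field": "regularMarketPrice"})
    return price_tag.text.strip() if price_tag else "N/A"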

# Example usage
print(fetch_stock_price("AAPL"))

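We set a browser-like `User-Agent` header because many sites block requests that identify themselves as scripts, or serve them different content. Given a ticker symbol, the function returns the current price as a string, or `N/A` if the price element is not found. Because it depends on the page’s HTML, a change to the site’s markup can break it without any error being raised.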
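Because the price comes back as text, a report that needs numeric values (for example, to compute a daily change) has to convert it first. Here is a minimal sketch, assuming prices arrive as strings such as `1,234.56` or the `N/A` placeholder. The helper name `parse_price` is hypothetical and not part of any library:

```python
def parse_price(text):
    """Convert a scraped price string to a float, or None if it is not numeric."""
    # Drop surrounding whitespace, thousands separators, and a leading dollar sign.
    cleaned = text.strip().replace(",", "").lstrip("$")
    try:
        return float(cleaned)
    except ValueError:
        # Covers the "N/A" placeholder and any other non-numeric text.
        return None


print(parse_price("1,234.56"))  # 1234.56
print(parse_price("N/A"))       # None
```

Returning `None` rather than raising lets the caller decide whether a missing price should abort the report or just be shown as unavailable.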

2. Formatting Data into a Report

Once we have the data, formatting it cleanly for email enhances readability. We’ll convert the data into a plain-text email body. The `email.mime` modules help craft a well-structured message:

from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

def build_email_report(data_dict):
    body_lines = [f"{ticker}: ${price}" for ticker, price in data_dict.items()]
    body_content = "\n".join(body_lines)

    msg = MIMEMultipart()
    msg["Subject"] = "Daily Financial Summary"
    msg.attach(MIMEText(body_content, "plain"))
    return msg

You can use this in combination with a `dict` of ticker symbols and their prices, like so:

sample_data = {
    "AAPL": fetch_stock_price("AAPL"),
    "GOOGL": fetch_stock_price("GOOGL"),
    "MSFT": fetch_stock_price("MSFT")
}
msg = build_email_report(sample_data)
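Before wiring up SMTP, it can help to inspect the message object and confirm that the subject and body look right. The sketch below repeats `build_email_report` so it runs on its own. The prices are placeholder values, not real quotes:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_email_report(data_dict):
    # Same logic as the function above, repeated so this snippet is self-contained.
    body = "\n".join(f"{ticker}: ${price}" for ticker, price in data_dict.items())
    msg = MIMEMultipart()
    msg["Subject"] = "Daily Financial Summary"
    msg.attach(MIMEText(body, "plain"))
    return msg


# Placeholder prices for illustration only.
msg = build_email_report({"AAPL": "100.00", "MSFT": "200.00"})

# A multipart message keeps its parts in a list; the text body is the first part.
text_part = msg.get_payload()[0]
body = text_part.get_payload(decode=True).decode("utf-8")

print(msg["Subject"])
print(body)
```

Checking the message this way separates formatting problems from delivery problems, which are harder to debug once cron is involved.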

3. Sending the Email Automatically

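With our message ready, we’ll send it using Python’s `smtplib`. For Gmail, you will generally need an App Password (created in your Google account settings after enabling 2-Step Verification) instead of your regular account password: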

import smtplib

def send_email(msg, sender, recipient, smtp_server, smtp_port, password):
    msg["From"] = sender
    msg["To"] = recipient

    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(sender, password)
        server.sendmail(sender, recipient, msg.as_string())

# Usage Example
# send_email(msg, "you@gmail.com", "you@gmail.com", "smtp.gmail.com", 587, "your_app_password")

Security tip: Never hardcode passwords. Use environment variables or secrets managers for storing credentials.
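One way to apply this tip is to read all credentials from the environment in a single place and fail with a clear message when one is missing. This is a minimal sketch. The helper name `load_email_settings` is hypothetical, and the variable names `EMAIL_USER`, `EMAIL_PASS`, and `EMAIL_RECIPIENT` are the ones the bundled script in this guide reads:

```python
import os

REQUIRED_VARS = ("EMAIL_USER", "EMAIL_PASS", "EMAIL_RECIPIENT")


def load_email_settings(env=os.environ):
    """Return the required settings as a dict, or raise if any are missing."""
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with a plain dict standing in for the real environment.
demo_env = {
    "EMAIL_USER": "you@gmail.com",
    "EMAIL_PASS": "placeholder",
    "EMAIL_RECIPIENT": "you@gmail.com",
}
print(sorted(load_email_settings(demo_env)))
```

Validating up front means a misconfigured cron job stops with one readable error instead of a `KeyError` partway through the run.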

4. Bundling Everything into a Script

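Let’s bundle all of the above into a single Python script called `daily_report.py`. Copy the imports and the three functions from the previous sections (`fetch_stock_price`, `build_email_report`, and `send_email`) to the top of the file, leave out the example calls, and add this entry point below them: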

import os

def main():
    tickers = ["AAPL", "GOOGL", "MSFT"]
    data = {ticker: fetch_stock_price(ticker) for ticker in tickers}
    msg = build_email_report(data)

    send_email(
        msg,
        sender=os.environ["EMAIL_USER"],
        recipient=os.environ["EMAIL_RECIPIENT"],
        smtp_server="smtp.gmail.com",
        smtp_port=587,
        password=os.environ["EMAIL_PASS"]
    )

if __name__ == "__main__":
    main()

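Before setting up automation, test manually: run `python3 daily_report.py` from a terminal and verify that the email lands in your inbox. Watch the output for tracebacks, and check that the report contains real prices rather than `N/A`, which would point to a broken tag selector or request throttling.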
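The script as written produces no log output of its own, and a single failed request aborts the whole run. One possible way to handle both is to wrap the per-ticker fetch in a helper that logs failures and continues. In this sketch, `collect_prices` and `fake_fetch` are hypothetical names. In the real script you would pass `fetch_stock_price` as the `fetch` argument:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("daily_report")


def collect_prices(tickers, fetch):
    """Call fetch(ticker) for each ticker; log failures and record "N/A" instead."""
    prices = {}
    for ticker in tickers:
        try:
            prices[ticker] = fetch(ticker)
        except Exception:
            # logger.exception records the message together with the traceback.
            logger.exception("Could not fetch %s", ticker)
            prices[ticker] = "N/A"
    return prices


def fake_fetch(ticker):
    # Stand-in fetcher for demonstration; simulates one failing ticker.
    if ticker == "BAD":
        raise ValueError("simulated failure")
    return "100.00"


print(collect_prices(["AAPL", "BAD"], fake_fetch))
```

By default the `logging` module writes to standard error, so any file that captures the script’s standard error will also capture these messages.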

5. Scheduling the Script with Cron

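Cron is a Unix tool that runs jobs on a fixed schedule. Run `crontab -e` and add the line below to run the script daily at 8:00 a.m. in the system’s local time zone. The five leading fields are minute, hour, day of month, month, and day of week, so `0 8 * * *` means minute 0 of hour 8 on every day: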

0 8 * * * /usr/bin/python3 /path/to/daily_report.py >> /path/to/log.txt 2>&1

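Use full paths for both the Python interpreter and the script, because cron runs jobs with a minimal `PATH`. The script does not need to be executable, since it is passed to the interpreter as an argument. The `>> /path/to/log.txt 2>&1` part appends both standard output and standard error to a log file for troubleshooting. Finally, cron does not load your shell profile, so the environment variables the script reads must be set explicitly, typically by exporting them in a small shell script that wraps your Python call.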
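As an alternative to a wrapper shell script, the variables can be loaded by the Python script itself at startup. This is a minimal sketch, assuming a file of `KEY=value` lines that only your user can read. The function name `load_env_file` and the file format are assumptions for illustration, not a standard:

```python
import os


def load_env_file(path, env=os.environ):
    """Read KEY=value lines from path into env, skipping blanks and # comments."""
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so values may contain "=" themselves.
            key, _, value = line.partition("=")
            # Values already present in the environment take precedence.
            env.setdefault(key.strip(), value.strip())
    return env
```

Calling `load_env_file("/path/to/report.env")` at the top of `main()` (the path is a placeholder) would populate `os.environ` before the credentials are read. Keep the file out of version control and restrict its permissions, since it holds the same secrets a wrapper script would.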

Final Thoughts

With this setup, you now have an automated pipeline that delivers an up-to-date financial summary to your inbox every morning. The combination of Python and `cron` is well suited to lightweight automation tasks like this one. As a next step, consider adding error reporting (e.g., emailing yourself on failure), expanding the scope of the report (market indices, currency rates), or switching from plain-text to HTML emails for better formatting. Automation can start small and still pay off substantially over time.

 
