Scraping Job Listings from Indeed Using Python and BeautifulSoup

Introduction

Web scraping offers developers an easy way to extract structured data from websites. One practical example is collecting job listings from job search portals like Indeed. In this tutorial, we’ll walk through how to create a Python-based scraper using BeautifulSoup that collects job titles, company names, locations, and summaries from Indeed search results.

1. Setting Up the Environment

To begin, we need to prepare our scraping tools. We’ll use requests to fetch HTML content and BeautifulSoup to parse it. You can install the required libraries using pip:

pip install requests beautifulsoup4

Now, create a new Python script and import the necessary modules:

import requests
from bs4 import BeautifulSoup
import time
import csv

2. Understanding the Indeed Page Structure

Before writing any scraping code, it’s essential to inspect the page structure of Indeed job search results. Use your browser’s developer tools (Right-click → Inspect) to locate the HTML elements that contain the job details.

Each job listing is typically enclosed in a div with the class name 'job_seen_beacon'. Note that Indeed updates its markup periodically, so verify these class names against the live page before relying on them. Within that div, you’ll find:

  • h2.jobTitle – for the job title
  • span.companyName – for the company name
  • div.companyLocation – for the job location
  • div.job-snippet – for the job summary
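
Because these class names can go stale, it helps to sanity-check the selectors before writing the full scraper. The snippet below is a minimal sketch (the query is a hypothetical example) that fetches one results page and counts how many elements each selector matches:

import requests
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=python+developer&l=remote"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# Zero matches for any selector means the markup has changed.
for selector in ["div.job_seen_beacon", "h2.jobTitle", "span.companyName",
                 "div.companyLocation", "div.job-snippet"]:
    print(selector, "->", len(soup.select(selector)), "matches")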

3. Writing the Scraper

Now let’s write a function to scrape job data from a single page of results:

def scrape_jobs(keyword, location, page=0):
    # Indeed's 'start' parameter counts results, not pages, at 10 results per page.
    url = f"https://www.indeed.com/jobs?q={keyword}&l={location}&start={page * 10}"
    # A browser-like User-Agent header helps avoid immediate bot rejection.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }

    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print(f"Failed to retrieve page (status {response.status_code})")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    # Each job card is wrapped in a div with the class 'job_seen_beacon'.
    job_cards = soup.find_all('div', class_='job_seen_beacon')

    jobs = []
    for card in job_cards:
        title_elem = card.find('h2', class_='jobTitle')
        company_elem = card.find('span', class_='companyName')
        location_elem = card.find('div', class_='companyLocation')
        summary_elem = card.find('div', class_='job-snippet')

        # Guard each field with a None check so one missing element
        # doesn't crash the whole scrape.
        job = {
            'title': title_elem.text.strip() if title_elem else None,
            'company': company_elem.text.strip() if company_elem else None,
            'location': location_elem.text.strip() if location_elem else None,
            # Replace newlines with spaces so multi-line snippets stay readable.
            'summary': summary_elem.text.strip().replace('\n', ' ') if summary_elem else None
        }

        jobs.append(job)

    return jobs

This function generates a search URL, makes a GET request with a user-agent header to simulate a browser, and parses job postings from the resulting HTML.
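
As a side note, requests can build the query string for you, which safely encodes spaces and special characters in the keyword or location. The following sketch is a drop-in replacement for the URL and request lines inside scrape_jobs:

# Let requests handle URL encoding of the query parameters.
params = {"q": keyword, "l": location, "start": page * 10}
response = requests.get("https://www.indeed.com/jobs", params=params, headers=headers)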

4. Pagination and Aggregating Results

Indeed paginates search results, often showing 10–15 per page. You can scrape multiple pages by looping and adjusting the page parameter. Let’s expand our scraper to gather results from the first few pages.

def scrape_multiple_pages(keyword, location, pages=3):
    all_jobs = []
    for page in range(pages):
        print(f"Scraping page {page + 1}")
        jobs = scrape_jobs(keyword, location, page)
        all_jobs.extend(jobs)
        time.sleep(1)  # Be respectful of Indeed’s servers
    return all_jobs

Running this function with scrape_multiple_pages("python developer", "new york") will return a list of job dictionaries from the first three pages.
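
For a quick look at the output, you can print the total count and the first record (the exact values depend on the live results at the time you run it):

all_jobs = scrape_multiple_pages("python developer", "new york")
print(f"Collected {len(all_jobs)} jobs")
if all_jobs:
    print(all_jobs[0])  # one dict with title, company, location, summary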

5. Saving Results to CSV

Structured output is most useful when saved to a CSV file for further analysis or reporting. Here’s how to do it:

def save_to_csv(jobs, filename="jobs.csv"):
    # newline='' prevents blank rows on Windows; utf-8 handles non-ASCII text.
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        # DictWriter maps each job dict's keys onto the CSV columns.
        writer = csv.DictWriter(file, fieldnames=['title', 'company', 'location', 'summary'])
        writer.writeheader()
        for job in jobs:
            writer.writerow(job)

# Example usage
results = scrape_multiple_pages("python developer", "remote", 5)
save_to_csv(results)

This will export a CSV file with columns for each field scraped.
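
To verify the export, you can read the file back with the standard library’s csv.DictReader (a minimal check, assuming the default jobs.csv filename):

import csv

with open("jobs.csv", newline="", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    rows = list(reader)

print(f"Read {len(rows)} rows")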

Tips and Considerations

  • Use randomized delays between requests to avoid being blocked (see the sketch after this list).
  • Rotate User-Agent headers or IP addresses for large-scale scraping.
  • Always check the website’s robots.txt file to see whether scraping is permitted.
  • Use logging and error handling for stability in production.
  • For pages that render content dynamically with JavaScript, consider Selenium or Playwright with a headless browser.
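
The first two tips can be combined into a small helper. The sketch below uses illustrative user-agent strings, and the retry count and delay bounds are arbitrary choices; it adds a randomized delay and a rotating User-Agent to each request:

import random
import time

import requests

# Illustrative pool of user-agent strings; expand or replace as needed.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def polite_get(url, retries=3):
    for attempt in range(retries):
        # Random delay between 1 and 4 seconds to avoid a fixed request rhythm.
        time.sleep(random.uniform(1, 4))
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response
        print(f"Attempt {attempt + 1} failed with status {response.status_code}")
    return None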

Conclusion

Scraping job listings from Indeed helps automate data collection for analytics, research, or career planning. By combining requests with BeautifulSoup and understanding HTML patterns, you can build powerful and flexible scrapers. Stay ethical and respectful when scraping, and always abide by a site’s terms of service.
