Web Scraping with Cheerio.js: Extract Job Listings into CSV
Web scraping has become an essential tool for gathering data from websites for analysis, automation, or reporting. In this tutorial, you’ll learn how to scrape job listings from an HTML page using Cheerio.js, a fast and flexible library that mimics jQuery’s syntax on the server side. We’ll walk through each step, from fetching and parsing HTML to storing your results in a structured CSV file.
1. Setting Up Your Environment
First, let’s get the basics out of the way. You’ll need Node.js installed. Then, create a new project and install the required packages:
mkdir cheerio-job-scraper
cd cheerio-job-scraper
npm init -y
npm install cheerio axios csv-writer
Here’s what each package does:
- axios: Handles HTTP requests to retrieve HTML content.
- cheerio: Parses and traverses HTML easily, similar to jQuery.
- csv-writer: Outputs our scraped data as a CSV file.
2. Fetching HTML with Axios
Let’s create a script that fetches HTML content from a sample jobs page:
const axios = require('axios');
const cheerio = require('cheerio');

async function fetchHTML(url) {
  try {
    const { data } = await axios.get(url);
    return data;
  } catch (error) {
    console.error(`Error fetching the page: ${error}`);
    throw error; // rethrow so callers don't continue with undefined HTML
  }
}

const url = 'https://example.com/jobs';
fetchHTML(url).then(html => console.log(html.slice(0, 500)));
This logs the first 500 characters of the fetched HTML, which you can examine to identify the exact structure around the job postings.
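Save the script as something like scrape.js (the filename is arbitrary) and run it to confirm the page is reachable:
node scrape.js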
3. Parsing Job Listings with Cheerio
Now that we have the HTML, we can parse it using Cheerio. Suppose our job listings look like this:
<div class="job">
  <h2 class="title">Frontend Developer</h2>
  <p class="company">Acme Inc.</p>
  <span class="location">Remote</span>
</div>
Here’s how to extract this data:
async function scrapeJobs(url) {
  const html = await fetchHTML(url);
  const $ = cheerio.load(html);
  const jobs = [];

  // Each .job element corresponds to one listing
  $('.job').each((index, element) => {
    const title = $(element).find('.title').text().trim();
    const company = $(element).find('.company').text().trim();
    const location = $(element).find('.location').text().trim();
    jobs.push({ title, company, location });
  });

  return jobs;
}

scrapeJobs(url).then(jobs => console.log(jobs));
This snippet gathers job data into a structured array of objects, which will be useful for outputting into CSV.
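For the sample markup above, the resulting array would look roughly like this (actual values depend on the page you scrape):

[
  { title: 'Frontend Developer', company: 'Acme Inc.', location: 'Remote' }
]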
4. Writing to a CSV File
Once the data is collected, we can write it to a CSV file using csv-writer:
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

function writeCSV(jobs, outputPath = 'jobs.csv') {
  const csvWriter = createCsvWriter({
    path: outputPath,
    header: [
      { id: 'title', title: 'Job Title' },
      { id: 'company', title: 'Company' },
      { id: 'location', title: 'Location' },
    ],
  });

  return csvWriter.writeRecords(jobs)
    .then(() => console.log('CSV file was written successfully.'));
}
Now we can connect it all together:
scrapeJobs(url).then(jobs => writeCSV(jobs));
This will produce a jobs.csv file in your directory with clean, structured job listings.
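With the sample listing shown earlier, jobs.csv would look something like:

Job Title,Company,Location
Frontend Developer,Acme Inc.,Remote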
5. Tips for Robust and Scalable Scraping
Here are practical tips to improve performance and maintainability:
- Rate Limiting: Introduce delays or use libraries like p-limit to avoid overloading servers (see the delay sketch after the pagination example below).
- Error Handling: Use try/catch at each step, especially around HTTP requests and element parsing.
- Selectors Maintenance: Use class names or HTML structures that are less likely to change; avoid brittle selectors.
- Pagination Support: Enhance your scraper to support multiple pages by detecting and following pagination links.
Example loop for pagination (simplified):
async function scrapeMultiplePages(baseUrl, numPages) {
  let allJobs = [];
  for (let i = 1; i <= numPages; i++) {
    const pageUrl = `${baseUrl}?page=${i}`;
    const jobs = await scrapeJobs(pageUrl);
    allJobs = allJobs.concat(jobs);
  }
  writeCSV(allJobs);
}
This makes your scraper more scalable and useful for larger job boards.
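As a minimal sketch of the rate-limiting tip, you can pause between page requests with a plain setTimeout-based delay. The function name and the 1-second interval here are illustrative choices, not part of the original example; p-limit is an alternative if you want to cap concurrent requests instead.

// Resolve after `ms` milliseconds; used to space out requests
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeMultiplePagesPolitely(baseUrl, numPages, pauseMs = 1000) {
  let allJobs = [];
  for (let i = 1; i <= numPages; i++) {
    const jobs = await scrapeJobs(`${baseUrl}?page=${i}`);
    allJobs = allJobs.concat(jobs);
    await delay(pauseMs); // illustrative pause; tune to the site's tolerance
  }
  writeCSV(allJobs);
}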
Conclusion
Using Cheerio.js with Node.js is a powerful way to extract data from static web pages. We walked through fetching HTML, parsing it using simple jQuery-like syntax, and outputting to CSV using best practices. With this solid foundation, you can build scrapers for e-commerce prices, real estate listings, news headlines, and more—all useful in automating and analyzing web data.
Just remember to respect robots.txt and website terms of service. Happy scraping!