Scrape Job Listings with Node.js and Cheerio
Web scraping offers immense value when it comes to automating data collection for personal or professional projects. Imagine being able to extract the latest job listings from your favorite job board automatically, saving time and helping you stay ahead in your job hunt. In this blog, we’ll walk through creating a Node.js script using Cheerio—a fast, flexible, and lean implementation of core jQuery—to extract job titles and links from job board HTML pages.
1. Setting Up the Environment
Before we dive into the code, we need to set up our development environment:
mkdir job-scraper
cd job-scraper
npm init -y
npm install axios cheerio
We install axios for HTTP requests and cheerio for parsing and traversing the HTML. These two packages form the backbone of our scraping utility.
2. Fetch HTML Content with Axios
We’ll start by fetching the page source of a sample job board. For demonstration purposes, let’s assume we’re scraping a mock site https://mockjobboard.com/jobs.
const axios = require('axios');
const cheerio = require('cheerio');

const JOB_BOARD_URL = 'https://mockjobboard.com/jobs';

async function fetchHTML(url) {
  try {
    const { data } = await axios.get(url);
    return data;
  } catch (error) {
    console.error(`Error fetching URL: ${url}`, error.message);
    return null;
  }
}
This fetchHTML function uses axios.get to retrieve the HTML. It returns the raw HTML source, which we'll pass to Cheerio for parsing, or null if the request fails.
3. Parse Job Titles and Links Using Cheerio
Now that we can fetch the HTML, let’s extract relevant information such as job titles and links. Suppose each job listing is within a <div class="job-card"> and contains an anchor tag for the job post.
function extractJobs(html) {
  const $ = cheerio.load(html);
  const jobs = [];

  $('.job-card').each((i, elem) => {
    const anchor = $(elem).find('a.job-link');
    const title = anchor.text().trim();
    const link = anchor.attr('href');

    // Skip cards missing the expected anchor or href
    if (!title || !link) return;

    jobs.push({ title, link: new URL(link, JOB_BOARD_URL).href });
  });

  return jobs;
}
Here we’re using Cheerio to mimic jQuery-like selectors. We loop through each .job-card and extract the text and href from the anchor tag. We also resolve relative URLs into full absolute URLs using the standard URL() constructor.
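To make that last step concrete, here is how the standard URL constructor resolves both relative and absolute hrefs (the job paths below are hypothetical examples, not real listings):

```javascript
// Resolving hrefs against the board's base URL (hypothetical paths):
const base = 'https://mockjobboard.com/jobs';

// A root-relative href becomes a full absolute URL:
const relative = new URL('/jobs/frontend-dev-123', base).href;
// An already-absolute href passes through unchanged:
const absolute = new URL('https://other.example/post-456', base).href;

console.log(relative); // https://mockjobboard.com/jobs/frontend-dev-123
console.log(absolute); // https://other.example/post-456
```

This is why the extractor can treat both kinds of href uniformly: the constructor only applies the base when the first argument is relative.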
4. Combine and Run
Let’s wire everything together in an asynchronous script to fetch and log job listings.
(async () => {
  const html = await fetchHTML(JOB_BOARD_URL);
  if (html) {
    const jobs = extractJobs(html);
    console.log(`\nFound ${jobs.length} job listings:\n`);
    jobs.forEach((job, index) => {
      console.log(`${index + 1}. ${job.title} - ${job.link}`);
    });
  }
})();
This code uses an immediately invoked async function expression (IIFE) so we can use await at the top level. Save everything to a single file such as index.js, run node index.js, and the job listings will print to the console.
5. Tips, Improvements & Automation
Performance: For large pages or multiple sources, add throttling to avoid overloading servers and hitting CAPTCHAs. Consider using setTimeout or libraries like bottleneck.
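As a minimal sketch of throttling, a promise-based delay between sequential requests works without any extra dependencies (scrapeOne here is a hypothetical per-URL callback, e.g. fetchHTML plus extractJobs from earlier):

```javascript
// Promise-based delay so sequential requests don't hammer the server.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process a list of board URLs one at a time, pausing between each.
// `scrapeOne` is a hypothetical async callback that handles a single URL.
async function scrapeAll(urls, scrapeOne, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    results.push(await scrapeOne(url));
    await sleep(delayMs); // be polite between requests
  }
  return results;
}
```

For heavier workloads, a dedicated rate limiter like bottleneck gives you finer control (concurrency caps, reservoirs), but this pattern covers the common case.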
Error handling: Always check for the presence of expected elements before accessing properties like .text() or .attr().
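A small sketch of that defensive style, applied to the fields pulled from one card (normalizeJob is a hypothetical helper, not part of the script above):

```javascript
// Return a clean job object, or null if the card was missing its anchor.
// `fields` is a hypothetical plain object holding what we read from one card.
function normalizeJob(fields, baseUrl) {
  const { title, link } = fields || {};
  if (!title || !title.trim() || !link) return null; // missing or empty
  return { title: title.trim(), link: new URL(link, baseUrl).href };
}
```

Returning null (and filtering those out) keeps one malformed card from crashing the whole run or polluting your results.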
Automation ideas: Run this script on a schedule using cron jobs or GitHub Actions to notify you via email or Slack when new entries match specified keywords.
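Before sending a notification, you'd typically filter the scraped listings against your keywords; a minimal sketch (matchKeywords is a hypothetical helper):

```javascript
// Keep only listings whose titles mention one of the given keywords.
function matchKeywords(jobs, keywords) {
  const needles = keywords.map((k) => k.toLowerCase());
  return jobs.filter((job) =>
    needles.some((k) => job.title.toLowerCase().includes(k))
  );
}
```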
Data storage: Outputs can be stored in JSON or sent to a database so your application can later filter or analyze historical job trends.
Conclusion
With minimal dependencies and clear logic, we built a useful scraper in under 100 lines of code. If you’re monitoring job openings, seeking freelance gigs, or just optimizing how you track job availability, this Node.js + Cheerio combo offers a fast and reliable solution. From here, you can add filters, expand to multiple pages, or integrate it into your personal dashboard.