From REST to JSON: Automate Data Pulls from Public APIs Using Python

Public APIs have become essential for developers wanting to acquire live, structured data — from weather updates and financial markets to COVID-19 statistics and space launches. But manually fetching data through web browsers or cURL commands doesn’t scale well. That’s where Python comes in.

In this article, we’ll build a flexible, reusable Python script that automates fetching JSON data from public APIs. You’ll learn how to use the requests library to interact with RESTful services and argparse to offer command-line capabilities. Along the way, we’ll explore good practices, optimizations, and tips to make your script production-ready.

1. Setting the Stage: Understanding REST and JSON APIs

Before we jump into coding, it’s important to understand the structure of REST APIs and how they deliver data — usually in JSON.

A REST API uses HTTP methods (GET, POST, PUT, DELETE) to allow interaction with web services. A public API typically exposes an endpoint like https://api.example.com/data that returns data in JSON format when you send a GET request.

Let’s look at a simple GET request using Python’s requests library:

import requests

response = requests.get('https://api.coindesk.com/v1/bpi/currentprice.json')

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Failed to retrieve data: {response.status_code}")

This snippet fetches the current Bitcoin price. If the response is successful (HTTP 200), it parses and prints the JSON.
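
Because response.json() returns ordinary Python dictionaries and lists, you can drill into the payload with normal indexing. The keys below reflect the shape of the CoinDesk BPI response at the time of writing; treat them as an assumption and adjust for whichever API you target:

# Assumes the CoinDesk payload nests rates under 'bpi' -> currency code
usd_rate = data['bpi']['USD']['rate']
print(f"Current BTC price (USD): {usd_rate}")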

2. Building the Base Script: Modular Fetch Logic

Let’s create a function to make our requests modular and reusable for any endpoint. We’ll also add error handling for robustness.

def fetch_json(url, headers=None, params=None):
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, headers=headers, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as err:
        print(f"Error fetching data: {err}")
        return None

This function centralizes our request logic and catches the full requests exception hierarchy, including connection failures, timeouts, and non-2xx responses, reporting the error instead of crashing.

Tip: Always call raise_for_status(): it converts 4xx and 5xx responses into Python exceptions, which saves debugging time.
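
To see the difference, a quick check against httpbin.org (a public request-testing service) shows a non-2xx status surfacing as an HTTPError; the exact message text may vary:

import requests

resp = requests.get('https://httpbin.org/status/404', timeout=10)
try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(f"Caught HTTP error: {err}")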

3. Adding CLI Power with Argparse

Giving our script a command-line interface makes it versatile and user-friendly. Here’s how we use argparse to accept a URL and output filename as arguments.

import argparse

def parse_args():
    parser = argparse.ArgumentParser(description='Fetch JSON data from a public API')
    parser.add_argument('url', help='The API endpoint to fetch from')
    parser.add_argument('-o', '--output', help='File to save the data', default='output.json')
    return parser.parse_args()

Now we can run the script like this:

python fetch_api.py https://api.coindesk.com/v1/bpi/currentprice.json -o bitcoin.json

This design lets us reuse the same fetch logic across many APIs without hardcoding endpoints or output paths. One natural extension, sketched below, is accepting query parameters from the command line.
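
A minimal sketch, assuming a hypothetical repeatable -p/--param flag (not part of the original script):

def parse_args():
    parser = argparse.ArgumentParser(description='Fetch JSON data from a public API')
    parser.add_argument('url', help='The API endpoint to fetch from')
    parser.add_argument('-o', '--output', help='File to save the data', default='output.json')
    # Hypothetical: repeatable -p key=value flags become query parameters
    parser.add_argument('-p', '--param', action='append', default=[],
                        help='Query parameter as key=value (repeatable)')
    return parser.parse_args()

# In main(), the pairs can then be passed through to fetch_json:
# params = dict(p.split('=', 1) for p in args.param)
# data = fetch_json(args.url, params=params)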

4. Saving and Formatting the JSON Response

Once we fetch the data, storing it to disk is straightforward. Still, using nicely indented JSON improves readability and debugging.

import json

def save_json(data, filepath):
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Data saved to {filepath}")

This function ensures the JSON is written with consistent indentation. It is also safer than writing str(data) or print output to a text file, since json.dump guarantees valid, re-parseable JSON.

Pro Tip: Consider dumping data in line-delimited JSON (each line is a JSON record) if you’re dealing with APIs that return arrays of objects — it’s more scalable in data pipelines.
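
A minimal sketch of that approach; save_ndjson is a hypothetical helper and assumes the API returns a list of objects:

def save_ndjson(records, filepath):
    # Write one JSON object per line (NDJSON); assumes records is an iterable of dicts
    with open(filepath, 'w') as f:
        for record in records:
            f.write(json.dumps(record) + '\n')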

5. Putting It All Together: The Complete Script

Here’s how all the parts integrate into a working CLI tool:

import requests
import argparse
import json

def fetch_json(url, headers=None, params=None):
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, headers=headers, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as err:
        print(f"Error fetching data: {err}")
        return None

def save_json(data, filepath):
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=4)
    print(f"Data saved to {filepath}")

def parse_args():
    parser = argparse.ArgumentParser(description='Fetch JSON data from a public API')
    parser.add_argument('url', help='The API endpoint to fetch from')
    parser.add_argument('-o', '--output', help='File to save the data', default='output.json')
    return parser.parse_args()

def main():
    args = parse_args()
    data = fetch_json(args.url)
    if data:
        save_json(data, args.output)

if __name__ == '__main__':
    main()

This script can be used with any REST API that returns JSON. It’s reusable, safe, and CLI-friendly — a great starting point for automation or data harvesting tasks.

6. Tips, Optimizations, and Scaling Considerations

  • Use retry logic: Wrap requests in a retry loop using urllib3.util.retry or a library like tenacity (see the sketch after this list).
  • Add throttling for rate-limited APIs: Use time.sleep() or implement exponential backoff.
  • Store timestamps in filenames if automating regular pulls (e.g. data_2024-06-08.json).
  • Cache responses for repetitive queries using libraries like requests-cache.
  • Extend to pagination by checking for next URLs or offset/limit fields in the response.
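
As a starting point for the first two items, here is a minimal retry-with-backoff sketch using urllib3's Retry class mounted on a requests Session. The retry counts and status codes are assumptions to tune per API:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    # Retry up to 3 times with exponential backoff; 429 covers rate limiting,
    # the 5xx codes cover transient server errors (tune both as needed)
    retries = Retry(total=3, backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504])
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retries)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session

session = make_session()
response = session.get('https://api.example.com/data', timeout=10)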

Automation doesn’t always mean complexity — and this script is a prime example. With a few lines of code, you can unlock the full potential of publicly available data APIs.

 
