Rate Limiting an API Without Redis: A Simple In-Memory Python Solution

Rate limiting is a common strategy to prevent abuse and ensure fair usage of APIs. In production environments, Redis is often used as a backend store to manage rate limits across distributed systems due to its speed and TTL (Time To Live) support. But what if you’re just building a local mock server, prototyping an internal tool, or creating a personal side project? You don’t need the overhead of spinning up Redis just for basic throttling.

In this article, we’ll walk through building a simple, effective in-memory rate limiter in Python. This approach works well for single-process, small-scale API services or development environments.

1. What Is a Rate Limiter and Why You Need One

A rate limiter restricts how often a client (usually identified by IP address or API token) can make requests to an API in a given time period. This prevents server overload, mitigates abuse, and ensures quality of service for all users.

There are different rate limiting strategies, such as:

  • Fixed Window: Count requests in a fixed interval (e.g., 10 requests per minute).
  • Sliding Window: Consider timestamps over a rolling window.
  • Token Bucket / Leaky Bucket: Steady refill mechanics, used in more complex systems.

For simplicity, we’ll implement a sliding window log using a Python decorator and an in-memory dictionary: each client’s recent request timestamps are kept in a list and pruned on every call.
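
For contrast, here is a minimal token bucket sketch. It is standalone and hypothetical, included only to illustrate the refill mechanics mentioned above; it isn’t used in the rest of this article:

import time

class TokenBucket:
    # Each client gets a bucket of `capacity` tokens, refilled at `refill_rate` per second
    def __init__(self, capacity=5, refill_rate=1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False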

2. The Basic Decorator Structure

We’ll create a decorator that you can place on any API endpoint (e.g., Flask route). Let’s start by drafting the structure:

import time
from functools import wraps
from flask import request, jsonify

# Dictionary to track requests per IP
request_logs = {}

# Configurable rate: max 5 requests per 60 seconds
MAX_REQUESTS = 5
WINDOW_SIZE = 60  # seconds

def rate_limiter(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        ip = request.remote_addr
        current_time = time.time()

        # Get request times for this IP
        request_times = request_logs.get(ip, [])

        # Filter out requests older than our window
        request_times = [t for t in request_times if current_time - t < WINDOW_SIZE]

        if len(request_times) >= MAX_REQUESTS:
            return jsonify({"error": "Rate limit exceeded. Try again later."}), 429

        # Save updated request record
        request_times.append(current_time)
        request_logs[ip] = request_times

        return func(*args, **kwargs)
    return wrapper

This code keeps a record of recent request timestamps per IP, pruning old entries and checking the remaining count against a threshold.

3. Integrating With Flask Endpoints

Let’s put the decorator into action with a real Flask endpoint:

from flask import Flask

app = Flask(__name__)

@app.route("/api/data")
@rate_limiter
def get_data():
    return jsonify({"message": "Success! Here's your data."})

if __name__ == '__main__':
    app.run(debug=True)

Now, when a client exceeds 5 requests per minute from the same IP address, it receives a 429 (Too Many Requests) response. Otherwise, the endpoint responds normally.
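
To see the limiter in action, fire a few requests at the running server. This small sketch uses the third-party requests library and assumes Flask’s default address, http://127.0.0.1:5000:

import requests

# The first 5 calls should return 200; the 6th should return 429
for i in range(6):
    r = requests.get("http://127.0.0.1:5000/api/data")
    print(i + 1, r.status_code, r.json())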

4. Handling Thread Safety and Performance

This in-memory dictionary approach works best in single-threaded, single-process environments (like local development or a small container), but consider thread safety when using Flask with threaded WSGI servers.

To keep the whole read-modify-write sequence atomic, guard it with a thread-safe construct like Python’s threading.Lock. Note that the dictionary lookup must happen inside the lock as well, not just the writes:

import threading
lock = threading.Lock()

# Inside the wrapper, everything from lookup to store runs under the lock:
with lock:
    request_times = request_logs.get(ip, [])
    request_times = [t for t in request_times if current_time - t < WINDOW_SIZE]
    if len(request_times) >= MAX_REQUESTS:
        return jsonify({"error": "Rate limit exceeded. Try again later."}), 429
    request_times.append(current_time)
    request_logs[ip] = request_times

This approach prevents race conditions when multiple threads try to update the same dictionary.

5. Limitations, Considerations, and Enhancements

This solution is simple and effective, but be aware of its limitations:

  • No Persistence: All data is lost on server restart.
  • Memory Growth: With many clients, the dictionary will grow and never shrink. Consider a periodic cleanup strategy or a TTL-based dict (see the sketch after this list).
  • Single-Process Only: Won’t work in multi-instance deployments without a shared store (use Redis or similar in production).
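
As a guard against unbounded memory growth, here is a minimal cleanup sketch. It reuses the request_logs, lock, and WINDOW_SIZE names from the snippets above and assumes the threaded setup from the previous section:

import threading
import time

def cleanup_loop(interval=300):
    # Periodically drop IPs whose most recent request has left the window
    while True:
        time.sleep(interval)
        cutoff = time.time() - WINDOW_SIZE
        with lock:
            stale = [ip for ip, times in request_logs.items()
                     if not times or times[-1] < cutoff]
            for ip in stale:
                del request_logs[ip]

# Daemon thread so it never blocks interpreter shutdown
threading.Thread(target=cleanup_loop, daemon=True).start()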

Some enhancements you might consider:

  • Rate-limit per API key instead of IP
  • Configure per-endpoint or per-client limits
  • Return Retry-After and rate-limit response headers so clients know when to retry (see the sketch after this list)
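
For example, the 429 response can carry a Retry-After header. This sketch reuses the wrapper’s names and estimates the wait as the time until the oldest tracked request leaves the window:

# Inside the wrapper, when the limit is hit:
if len(request_times) >= MAX_REQUESTS:
    retry_after = int(WINDOW_SIZE - (current_time - request_times[0])) + 1
    response = jsonify({"error": "Rate limit exceeded. Try again later."})
    response.headers["Retry-After"] = str(retry_after)
    return response, 429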

Also, you could extract the logic into a reusable class or middleware layer if you plan on applying it to multiple routes dynamically.
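
One way that extraction might look is a class whose instances act as decorators, so each route can carry its own limits. The RateLimiter name and its parameters here are hypothetical, a sketch rather than a finished design:

import threading
import time
from functools import wraps
from flask import request, jsonify

class RateLimiter:
    # Sliding-window limiter; create one instance per limit policy
    def __init__(self, max_requests=5, window_size=60):
        self.max_requests = max_requests
        self.window_size = window_size
        self.logs = {}
        self.lock = threading.Lock()

    def __call__(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            ip = request.remote_addr
            now = time.time()
            with self.lock:
                times = [t for t in self.logs.get(ip, [])
                         if now - t < self.window_size]
                if len(times) >= self.max_requests:
                    return jsonify({"error": "Rate limit exceeded."}), 429
                times.append(now)
                self.logs[ip] = times
            return func(*args, **kwargs)
        return wrapper

# Usage: stack @RateLimiter(max_requests=5, window_size=60) under @app.route(...)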

Final Thoughts

If you’re looking for a quick way to rate limit a local API during testing or lightweight deployments, with no extra infrastructure to run, an in-memory Python decorator like this is a solid choice. Just make sure you understand the limitations and transition to a more robust backend (like Redis or Memcached) when scaling up to production.

Happy coding and rate limiting responsibly!