‘SELECT * FROM JSON’: Querying Local JSON Files as Tables with Python
If you’ve ever wished you could run SQL queries directly on JSON files without setting up a full database, you’re in luck. Thanks to SQLite’s built-in JSON1 extension and a little Python script magic, you can treat local JSON files like SQL databases—enabling fast, structured queries with minimal overhead. Let’s dive into how you can turn raw JSON into queryable tables using Python and SQLite.
1. Why Query JSON with SQL?
JSON is everywhere—from APIs to config files—but searching, filtering, and analyzing data stored in JSON often requires writing complex loops or list comprehensions. SQL, on the other hand, is optimized for querying structured data. By combining Python, SQLite, and its JSON1 extension, we can:
- Run complex WHERE clause filters on nested JSON
- Use joins and aggregations
- Avoid converting JSON into pandas DataFrames or writing custom parsers
- Improve script reusability and query readability
This method is lightweight—no MySQL/PostgreSQL or other installations needed—and works entirely with built-in Python modules and SQLite.
2. Setting Up the Environment
First, make sure your Python version is 3.9 or higher (as recent SQLite builds include JSON1 by default).
Here’s a simple setup:
import sqlite3
import json
# Load your JSON file
with open('data.json') as f:
raw_data = json.load(f)
# Initialize the SQLite in-memory DB
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Enable the JSON1 extension (enabled by default in modern systems)
# No extra installation needed on standard Python setups
We’re using an in-memory SQLite database here for speed and no persistence—perfect for one-off scripts or ETL steps.
3. Converting JSON to a Queryable SQLite Table
Let’s suppose we have a JSON file like this:
[
{"id": 1, "name": "Alice", "address": {"city": "NYC", "zip": "10001"}},
{"id": 2, "name": "Bob", "address": {"city": "LA", "zip": "90001"}}
]
We first store this data in a table with a single JSON column called data
:
# Create a table with raw JSON content stored as TEXT
cur.execute('CREATE TABLE people (data TEXT)')
# Insert each JSON object as a row
for record in raw_data:
cur.execute('INSERT INTO people (data) VALUES (?)', (json.dumps(record),))
conn.commit()
Now each row in people
has a single text column that we can query using SQLite’s JSON functions.
4. Querying JSON Fields with SQL
To access fields inside the JSON, use SQLite’s json_extract
function. Here’s how to query all names:
cur.execute('SELECT json_extract(data, "$.name") FROM people')
print(cur.fetchall())
# Output: [('Alice',), ('Bob',)]
Query with conditions on nested fields:
cur.execute('''
SELECT json_extract(data, '$.name')
FROM people
WHERE json_extract(data, '$.address.city') = 'NYC'
''')
print(cur.fetchall())
# Output: [('Alice',)]
The $.address.city
is JSONPath syntax for accessing nested fields. You can also extract multiple fields:
cur.execute('''
SELECT
json_extract(data, '$.id') AS id,
json_extract(data, '$.name') AS name,
json_extract(data, '$.address.city') AS city
FROM people
''')
for row in cur.fetchall():
print(row)
This outputs a structured view akin to columns in an actual table.
5. Tips, Patterns, and Performance Considerations
Best practice #1: If you plan to query a specific field frequently, extract it into its own column outside the JSON blob:
# Hybrid table design
cur.execute('CREATE TABLE people_v2 (id INTEGER, name TEXT, city TEXT, raw_json TEXT)')
This lets you index structured fields while retaining raw JSON for full fidelity.
Best practice #2: Use json_type
, json_array_length
, and json_each
for advanced logic. For example, flattening arrays or dynamically joining to keys:
# Suppose "friends" is an array in each JSON object
cur.execute('''
SELECT json_extract(value, '$.name')
FROM people, json_each(people.data, '$.friends')
''')
Performance tip: For large JSON files, use a disk-backed Database instead of :memory:
and batch insert data.
conn = sqlite3.connect('json.db') # Use a file-based DB to persist and scale
Conclusion
Treating JSON like SQL tables may sound like a hack, but with SQLite and Python, it’s a powerful way to query structured data without an actual database. This can supercharge data processing scripts, rapid prototyping, or local data analysis workflows. With minimal setup, you get full SQL power on top of flexible JSON structures—making your next automation or reporting task a lot more elegant.
So next time you see a JSON file and need to answer questions like “how many users from NYC?” or “what’s the most common city?”, remember: you can SELECT * FROM JSON.
Useful links: