Transform CSV Data into JSON Using Python in 10 Lines
Converting CSV files into JSON format is a common task in data workflows. Whether you're cleaning up messy exports, integrating with APIs, or preparing datasets for machine learning, knowing how to make this transformation quickly and efficiently is invaluable. In this article, we'll walk through how to achieve this using Python's built-in `csv` and `json` modules, all in under 10 lines of code.
1. Why Convert CSV to JSON?
CSV (Comma-Separated Values) is a flat-file format that’s common for tabular data. However, JSON (JavaScript Object Notation) is a more flexible, structured format — better suited for integration with web APIs, configuration files, or JavaScript applications. JSON supports nesting and object hierarchies, making it ideal for structured data representation.
Let’s say you work in data engineering and receive a messy CSV export daily that needs to be converted to JSON before being consumed by a web service. Automating this transformation can save time and prevent errors.
2. Sample CSV Input and Use Case
Assume we have a CSV file called `data.csv` with the following content:
name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com
Charlie,35,charlie@foobar.com
We want to convert this into a list of JSON objects, like so:
[
  {
    "name": "Alice",
    "age": "30",
    "email": "alice@example.com"
  },
  {
    "name": "Bob",
    "age": "25",
    "email": "bob@example.com"
  },
  {
    "name": "Charlie",
    "age": "35",
    "email": "charlie@foobar.com"
  }
]
This format is cleaner, structured, and ready for API ingestion.
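If you want to follow along without an existing export, here is a minimal sketch that generates the sample `data.csv` shown above (the filename and column names come from the article; the writer setup is an assumption):

```python
import csv

# Recreate the sample data.csv used throughout this article.
sample_rows = [
    {"name": "Alice", "age": "30", "email": "alice@example.com"},
    {"name": "Bob", "age": "25", "email": "bob@example.com"},
    {"name": "Charlie", "age": "35", "email": "charlie@foobar.com"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age", "email"])
    writer.writeheader()  # emits the name,age,email header line
    writer.writerows(sample_rows)
```

Passing `newline=''` when opening the file is the documented way to let the `csv` module control line endings itself.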
3. Converting CSV to JSON in 10 Lines
Here’s the entire core logic written in 10 lines of Python:
import csv
import json

with open('data.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    rows = list(reader)

with open('data.json', 'w') as jsonfile:
    json.dump(rows, jsonfile, indent=2)
Let's break this down:

- `csv.DictReader` reads the header line and maps each row to a dictionary.
- `list(reader)` creates a list of dictionaries from the CSV rows.
- `json.dump` writes the data into a new file in JSON format, with pretty indentation.

That's it: each step is a simple one-liner that's easy to understand and extend.
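One thing the sample output makes visible: `DictReader` returns every field as a string, which is why `"age": "30"` is quoted in the JSON above. If you want real numbers in the output, you can cast the numeric columns yourself. A small sketch (the in-memory CSV text is just an illustration):

```python
import csv
import io
import json

# Illustrative input; in practice this would come from a file.
csv_text = "name,age,email\nAlice,30,alice@example.com\nBob,25,bob@example.com\n"

reader = csv.DictReader(io.StringIO(csv_text))
# Copy each row and replace the 'age' string with a proper int.
rows = [{**row, "age": int(row["age"])} for row in reader]

json_text = json.dumps(rows, indent=2)
```

After this, `"age": 30` appears unquoted in `json_text`.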
4. Handling Messy or Inconsistent CSV Files
Real-world CSVs are rarely clean. They may include trailing whitespace, inconsistent quotes, missing headers, or embedded commas. Here’s how you can handle common issues:
import csv
import json

with open('data.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]

with open('data.json', 'w') as jsonfile:
    json.dump(rows, jsonfile, indent=2)
This adds the following improvements:

- `skipinitialspace=True` trims whitespace after delimiters.
- Stripping keys and values ensures clean dictionary entries without accidental spacing.

For malformed rows or bad headers, you may want to use `try`/`except` blocks or clean up input files before parsing.
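One way that `try`/`except` idea can look in practice: reject rows that have surplus fields (which `DictReader` collects under the `None` key by default) or values that fail a cast. The messy sample input here is invented for illustration:

```python
import csv
import io

# Hypothetical messy input: one row has an extra field, one has a non-numeric age.
csv_text = "name,age\nAlice,30\nBob,25,oops\nCharlie,thirty-five\n"

clean, rejected = [], []
for row in csv.DictReader(io.StringIO(csv_text)):
    try:
        # Surplus fields land under the None key; treat that as malformed.
        if None in row:
            raise ValueError("too many fields")
        row["age"] = int(row["age"])  # also rejects non-numeric ages
        clean.append(row)
    except (ValueError, KeyError):
        rejected.append(row)
```

Keeping the rejected rows around (rather than silently dropping them) makes it easy to log or inspect what was filtered out.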
5. Tips for Large Files and Performance
While this solution works for small to medium files, loading large CSVs entirely into memory may cause bottlenecks. Here’s how you can stream rows and write line-by-line:
import csv
import json

with open('data.csv', 'r') as csvfile, open('data.json', 'w') as jsonfile:
    reader = csv.DictReader(csvfile)
    jsonfile.write('[')
    first = True
    for row in reader:
        if not first:
            jsonfile.write(',\n')
        json.dump(row, jsonfile)
        first = False
    jsonfile.write(']')
This version never holds the full dataset in memory and is suitable for gigabyte-scale CSV files. It writes valid JSON syntax while controlling memory use efficiently.
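A related option worth knowing for large files is to skip the enclosing array entirely and emit one JSON object per line (the NDJSON / JSON Lines convention), which many data tools consume and which streams naturally with no bracket bookkeeping. A sketch using an in-memory buffer in place of real files:

```python
import csv
import io
import json

# Illustrative input; in practice, read from data.csv and write to data.ndjson.
csv_text = "name,age\nAlice,30\nBob,25\n"

out = io.StringIO()
for row in csv.DictReader(io.StringIO(csv_text)):
    # One complete JSON object per line: no commas or brackets to manage.
    out.write(json.dumps(row) + "\n")

ndjson = out.getvalue()
```

Each line of the result is independently parseable with `json.loads`, so a downstream consumer can also process the file one line at a time.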
6. Wrapping Up
Using Python's `csv` and `json` modules, you can quickly and reliably transform flat CSV files into structured JSON objects with minimal code. We covered how to handle irregular data, write performance-conscious code, and stay within 10 lines for simple use cases. This pattern becomes especially powerful when scripted into daily ETL jobs, API backends, or data cleaning workflows.
Whether you’re working with data pipelines or automating repetitive exporting tasks, Python makes it easy to go from raw text to structured data in a clean and readable way.