Module 8.3

CSV and JSON

CSV and JSON are the most common data exchange formats. CSV stores tabular data like spreadsheets. JSON stores structured data like configurations and API responses. Mastering both formats is essential for data processing and web development.

50 min
Intermediate
Hands-on
What You'll Learn
  • Reading CSV files
  • Writing CSV files
  • Parsing JSON data
  • Nested JSON handling
  • Converting between formats
Contents
01

Data Formats

CSV and JSON serve different purposes. CSV is ideal for tabular data like databases and spreadsheets. JSON is perfect for nested, hierarchical data like API responses and configurations.

CSV Structure
name , age , city Header row
Alice , 30 , New York Data row 1
Bob , 25 , Los Angeles Data row 2
Charlie , 35 , Chicago Data row 3
First row = column names
Each row = one record
Commas separate values
CSV = Comma-Separated Values. Simple, human-readable tabular data.
JSON Structure
{
"name": "Alice",
"age": 30,
"active": true,
"skills": ["Python", "SQL"],
"address": {
"city": "New York",
"zip": "10001"
}
}
"name" Key-value string pair
30 Number value (no quotes)
true Boolean (true/false)
[...] Array (Python list)
{...} Nested object (dict)
JSON = JavaScript Object Notation. Supports nested, hierarchical data.

CSV Best For

  • Spreadsheet data
  • Database exports
  • Simple tabular records
  • Excel compatibility

JSON Best For

  • API responses
  • Configuration files
  • Nested/hierarchical data
  • Web applications
02

Reading CSV

Python's csv module provides two main ways to read CSV files: csv.reader for lists and csv.DictReader for dictionaries with column names as keys.

Using csv.reader

import csv

with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # Get header row
    print(f"Columns: {header}")
    
    for row in reader:
        print(row)  # Each row is a list

csv.reader returns each row as a list. Use next() to skip the header row if you want to process data separately.

Using csv.DictReader

import csv

with open('users.csv', 'r') as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        # row is a dict with column names as keys
        print(f"Name: {row['name']}, Age: {row['age']}")

DictReader automatically uses the first row as field names. Each row becomes a dictionary, making code more readable.

csv.reader

  • Returns lists
  • Access by index: row[0]
  • Faster for large files
  • Less readable code

csv.DictReader

  • Returns dicts
  • Access by name: row['name']
  • Self-documenting code
  • Slight overhead
03

Writing CSV

Write CSV files using csv.writer for lists or csv.DictWriter when you have dictionaries.

Using csv.writer

import csv

data = [
    ['name', 'age', 'city'],
    ['Alice', 30, 'New York'],
    ['Bob', 25, 'Los Angeles']
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

Always use newline='' when opening CSV files for writing on Windows to prevent extra blank rows.

Using csv.DictWriter

import csv

users = [
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}
]

with open('users.csv', 'w', newline='') as file:
    fieldnames = ['name', 'age', 'city']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()  # Write column names
    writer.writerows(users)

DictWriter requires fieldnames to define column order. Use writeheader() to write the header row.

Practice: CSV Files

Task: Write read_csv(filename) that prints each row as a formatted string.

Show Solution
import csv

def read_csv(filename):
    with open(filename, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            print(row)

# Test
read_csv('users.csv')

Task: Write save_to_csv(data, filename) that saves a list of dicts to CSV.

Show Solution
import csv

def save_to_csv(data, filename):
    if not data:
        return
    fieldnames = data[0].keys()
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

# Test
users = [{'name': 'Alice', 'age': 30}]
save_to_csv(users, 'test.csv')

Task: Write filter_csv(input, output, column, value) that copies only matching rows.

Show Solution
import csv

def filter_csv(input_file, output_file, column, value):
    with open(input_file, 'r') as f_in:
        reader = csv.DictReader(f_in)
        rows = [r for r in reader if r[column] == value]
    
    if rows:
        with open(output_file, 'w', newline='') as f_out:
            writer = csv.DictWriter(f_out, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

# Test
filter_csv('users.csv', 'nyc.csv', 'city', 'New York')
04

Reading JSON

Python's json module converts JSON data to Python objects. Objects become dicts, arrays become lists, and JSON types map naturally to Python types.

Loading from File

import json

# Load from file
with open('config.json', 'r') as file:
    data = json.load(file)

print(data['name'])
print(data['settings']['theme'])

json.load() reads from a file object. The result is a Python dict (or list if the JSON root is an array).

Loading from String

import json

json_string = '{"name": "Alice", "age": 30}'
data = json.loads(json_string)

print(data['name'])  # Alice
print(type(data))    # 

json.loads() (with 's' for string) parses a JSON string. Use this for API responses or data from other sources.

Type Mapping

JSON Type Python Type
object {}dict
array []list
string ""str
number (int)int
number (float)float
true/falseTrue/False
nullNone
05

Writing JSON

Convert Python objects to JSON using json.dump() for files or json.dumps() for strings.

Writing to File

import json

data = {
    'name': 'Alice',
    'age': 30,
    'skills': ['Python', 'SQL', 'Git']
}

with open('user.json', 'w') as file:
    json.dump(data, file, indent=2)

Use indent=2 for human-readable output. Without indent, JSON is written as a single line.

Converting to String

import json

data = {'name': 'Bob', 'active': True}

# Compact JSON
json_str = json.dumps(data)
print(json_str)  # {"name": "Bob", "active": true}

# Pretty JSON
pretty = json.dumps(data, indent=2)
print(pretty)

json.dumps() returns a string. Useful for sending JSON in API responses or storing in databases.

Practice: JSON Files

Task: Write load_json(filename) that loads and returns JSON data from a file.

Show Solution
import json

def load_json(filename):
    with open(filename, 'r') as f:
        return json.load(f)

# Test
data = load_json('config.json')
print(data)

Task: Write save_json(data, filename) that saves a dict to a pretty-printed JSON file.

Show Solution
import json

def save_json(data, filename):
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

# Test
config = {'theme': 'dark', 'lang': 'en'}
save_json(config, 'settings.json')

Task: Write update_json(filename, key, value) that updates a key in a JSON file.

Show Solution
import json

def update_json(filename, key, value):
    with open(filename, 'r') as f:
        data = json.load(f)
    
    data[key] = value
    
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

# Test
update_json('settings.json', 'theme', 'light')
06

Nested Data

Real-world JSON often contains nested objects and arrays. Navigate them using chained key access and handle missing keys gracefully.

Accessing Nested Data

import json

data = {
    'user': {
        'name': 'Alice',
        'address': {
            'city': 'New York',
            'zip': '10001'
        }
    },
    'orders': [
        {'id': 1, 'total': 99.99},
        {'id': 2, 'total': 149.99}
    ]
}

# Access nested values
city = data['user']['address']['city']
first_order = data['orders'][0]['total']
print(f"City: {city}, First order: ${first_order}")

Chain keys to drill down into nested structures. Use indices for arrays within the JSON.

Safe Access with .get()

# Risky: raises KeyError if missing
# city = data['user']['location']['city']

# Safe: returns None or default if missing
city = data.get('user', {}).get('location', {}).get('city', 'Unknown')

# Or use try/except
try:
    city = data['user']['location']['city']
except KeyError:
    city = 'Unknown'

Chained .get() calls prevent KeyError on missing keys. Provide empty dicts as defaults to continue the chain.

Iterating Nested Arrays

data = {
    'users': [
        {'name': 'Alice', 'skills': ['Python', 'SQL']},
        {'name': 'Bob', 'skills': ['JavaScript', 'React']}
    ]
}

for user in data['users']:
    print(f"{user['name']}:")
    for skill in user['skills']:
        print(f"  - {skill}")

Iterate over nested arrays using standard for loops. Each element can be a dict, list, or primitive value.

Practice: Nested Data

Task: Write get_nested(data, keys) that safely gets a value by a list of keys.

Show Solution
def get_nested(data, keys, default=None):
    result = data
    for key in keys:
        if isinstance(result, dict):
            result = result.get(key)
        elif isinstance(result, list) and isinstance(key, int):
            result = result[key] if key < len(result) else None
        else:
            return default
        if result is None:
            return default
    return result

# Test
data = {'user': {'address': {'city': 'NYC'}}}
print(get_nested(data, ['user', 'address', 'city']))

Task: Write flatten_json(data) that converts nested dict to flat with dot notation keys.

Show Solution
def flatten_json(data, prefix=''):
    result = {}
    for key, value in data.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            result.update(flatten_json(value, new_key))
        else:
            result[new_key] = value
    return result

# Test
nested = {'user': {'name': 'Alice', 'address': {'city': 'NYC'}}}
print(flatten_json(nested))
# {'user.name': 'Alice', 'user.address.city': 'NYC'}

Task: Write csv_to_json(csv_file, json_file) that converts a CSV file to JSON array.

Show Solution
import csv
import json

def csv_to_json(csv_file, json_file):
    with open(csv_file, 'r') as f:
        reader = csv.DictReader(f)
        data = list(reader)
    
    with open(json_file, 'w') as f:
        json.dump(data, f, indent=2)

# Test
csv_to_json('users.csv', 'users.json')

Task: Write json_to_csv(json_file, csv_file) that converts a JSON array to CSV.

Show Solution
import csv
import json

def json_to_csv(json_file, csv_file):
    with open(json_file, 'r') as f:
        data = json.load(f)
    
    if not data:
        return
    
    with open(csv_file, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)

# Test
json_to_csv('users.json', 'users_export.csv')

Task: Write merge_json(files, output) that merges multiple JSON files into one.

Show Solution
import json

def merge_json(files, output):
    merged = {}
    for filename in files:
        with open(filename, 'r') as f:
            data = json.load(f)
            merged.update(data)
    
    with open(output, 'w') as f:
        json.dump(merged, f, indent=2)

# Test
merge_json(['config1.json', 'config2.json'], 'merged.json')

Key Takeaways

CSV for Tabular Data

Use CSV for spreadsheet-like data with rows and columns. Simple and universal.

JSON for Structured Data

Use JSON for nested, hierarchical data. Perfect for APIs and configs.

DictReader is Cleaner

csv.DictReader makes code readable by using column names as keys.

newline for CSV

Always use newline='' when opening CSV files for writing on Windows.

indent for Readability

Use indent=2 in json.dump() for human-readable JSON output.

Safe Nested Access

Use .get() chains or try/except for safely accessing nested JSON data.

Knowledge Check

Quick Quiz

Test what you've learned about CSV and JSON handling

1 What is the main advantage of csv.DictReader over csv.reader?
2 What does json.loads() do?
3 Why use newline='' when writing CSV on Windows?
4 What does indent=2 do in json.dump()?
5 What method writes the header row in csv.DictWriter?
6 What Python type does a JSON array become when loaded?
Answer all questions to check your score