Reading & Writing Text Files
Data scientists spend much of their time working with files - reading datasets, saving results, and processing information. Understanding file handling is essential before diving into data analysis libraries.
Why File Handling Matters
Think about your daily work: you download CSV files, save analysis results, read configuration files, and export reports. All of this requires file handling - the ability to read from and write to files on your computer.
In data science, you'll constantly work with files containing datasets, model configurations, and output results. While libraries like Pandas make working with data files easier, understanding Python's built-in file handling gives you the foundation to work with any file type.
Opening Files: The open() Function
Python's built-in open() function is your gateway to working with files. It takes a filename
and a mode that specifies what you want to do with the file.
open(filename, mode)
Opens a file and returns a file object that you can use to read or write data. The mode determines whether you're reading, writing, or appending to the file.
Always remember: Files must be closed after use to free up system resources.
The best practice is using the with statement (covered in Section 04).
Here are the most common file modes:
| Mode | Description | Creates File? | Overwrites? |
|---|---|---|---|
| 'r' | Read (default) - opens file for reading | No (error if missing) | No |
| 'w' | Write - opens file for writing | Yes | Yes (erases content!) |
| 'a' | Append - adds to end of file | Yes | No (keeps content) |
| 'r+' | Read and write | No (error if missing) | Depends on operation |
Warning: Opening an existing file in 'w' mode erases all of its content before writing! If you want to add to a file, use 'a' (append) mode instead.
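The behavior in the mode table can be checked with a short experiment. The following is a minimal sketch that works inside a temporary directory so no real files are touched; the file names modes_demo.txt and missing.txt are made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file because it does not exist yet
    with open(path, "w") as file:
        file.write("first\n")

    # 'w' again erases the previous content before writing
    with open(path, "w") as file:
        file.write("second\n")

    # 'a' keeps the existing content and adds to the end
    with open(path, "a") as file:
        file.write("third\n")

    with open(path, "r") as file:
        final_content = file.read()

    # 'r' on a missing file raises FileNotFoundError
    try:
        open(os.path.join(tmp_dir, "missing.txt"), "r")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(repr(final_content))  # 'second\nthird\n'
print(missing_raised)       # True
```

The line "first" is gone because the second 'w' open truncated the file, while the 'a' open preserved "second".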
Reading Files
There are several ways to read content from a file. The examples below read a sample file named sample_text.txt.
Method 1: read() - Reads the entire file as one string:
# Read entire file as one string
with open("sample_text.txt", "r") as file:
    content = file.read()
    print(content)
Method 2: readline() - Reads one line at a time:
# Read one line at a time
with open("sample_text.txt", "r") as file:
    first_line = file.readline()
    second_line = file.readline()
    print("First:", first_line.strip())
    print("Second:", second_line.strip())
Method 3: readlines() - Reads all lines into a list:
# Read all lines into a list
with open("sample_text.txt", "r") as file:
    lines = file.readlines()
    print(f"Total lines: {len(lines)}")
    print(lines[:3])  # First 3 lines
Method 4: Loop through lines - Most memory-efficient for large files:
# Loop through file line by line (memory efficient!)
with open("sample_text.txt", "r") as file:
    for line in file:
        print(line.strip())
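To compare the four reading methods side by side, here is a small sketch that first writes a three-line temporary file (tiny.txt, a made-up name) and then reads it back each way. Note that read(), readline(), and readlines() all keep the trailing newline characters, which is why the examples above call strip().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny.txt")  # hypothetical sample file
    with open(path, "w") as file:
        file.write("alpha\nbeta\ngamma\n")

    with open(path, "r") as file:
        whole = file.read()  # entire file as one string

    with open(path, "r") as file:
        first = file.readline()  # one line, newline included

    with open(path, "r") as file:
        as_list = file.readlines()  # list of lines, newlines included

    with open(path, "r") as file:
        # Iterating reads one line at a time instead of loading everything
        stripped = [line.strip() for line in file]

print(repr(whole))  # 'alpha\nbeta\ngamma\n'
print(repr(first))  # 'alpha\n'
print(as_list)      # ['alpha\n', 'beta\n', 'gamma\n']
print(stripped)     # ['alpha', 'beta', 'gamma']
```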
Writing Files
Writing to files is just as straightforward. Use 'w' mode to create a new file
(or overwrite existing), or 'a' to append to an existing file.
# Writing to a new file (creates it if doesn't exist)
with open("output.txt", "w") as file:
    file.write("Line 1: Hello!\n")
    file.write("Line 2: This is Python.\n")
# The file now contains:
# Line 1: Hello!
# Line 2: This is Python.
Use writelines() to write multiple lines from a list:
# Writing multiple lines at once
lines = ["Apple\n", "Banana\n", "Cherry\n"]
with open("fruits.txt", "w") as file:
    file.writelines(lines)
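One detail worth seeing in action: writelines() does not add line endings for you, which is why each string in the list above ends with "\n". The sketch below uses a temporary file (fruits_demo.txt, a made-up name) to show what happens with and without the newlines.

```python
import os
import tempfile

fruits = ["Apple", "Banana", "Cherry"]  # note: no "\n" at the end

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fruits_demo.txt")  # hypothetical file name

    # Without newlines, the items run together on one line
    with open(path, "w") as file:
        file.writelines(fruits)
    with open(path, "r") as file:
        run_together = file.read()

    # Adding "\n" to each item gives one item per line
    with open(path, "w") as file:
        file.writelines(f"{fruit}\n" for fruit in fruits)
    with open(path, "r") as file:
        one_per_line = file.read()

print(repr(run_together))  # 'AppleBananaCherry'
print(repr(one_per_line))  # 'Apple\nBanana\nCherry\n'
```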
Appending adds content to the end without erasing:
# Append to existing file (doesn't erase!)
with open("fruits.txt", "a") as file:
    file.write("Dragonfruit\n")
# fruits.txt now contains:
# Apple
# Banana
# Cherry
# Dragonfruit
Practice Questions: Text Files
Test your understanding with these hands-on exercises.
Task: Read sample_text.txt and count how many lines it contains.
Expected output: Total lines: X (where X is the actual count)
Solution:
with open("sample_text.txt", "r") as file:
    lines = file.readlines()
print(f"Total lines: {len(lines)}")
Given:
items = ["Milk", "Bread", "Eggs", "Butter"]
Task: Write each item to a file called shopping_list.txt, one item per line.
Solution:
items = ["Milk", "Bread", "Eggs", "Butter"]
with open("shopping_list.txt", "w") as file:
    for item in items:
        file.write(item + "\n")
Task: Read sample_text.txt and print only the lines that contain the word "data" (case-insensitive).
Hint: Use .lower() for case-insensitive comparison.
Solution:
with open("sample_text.txt", "r") as file:
    for line in file:
        if "data" in line.lower():
            print(line.strip())
Given:
events = ["User login", "File uploaded", "Report generated"]
Task: Write each event to activity_log.txt with a timestamp. Each line should look like: 2025-12-20 10:30:45 - User login
Hint: Use from datetime import datetime
Solution:
from datetime import datetime
events = ["User login", "File uploaded", "Report generated"]
with open("activity_log.txt", "w") as file:
    for event in events:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        file.write(f"{timestamp} - {event}\n")
CSV File Operations
CSV (Comma-Separated Values) is one of the most common formats for storing tabular data. Every data scientist encounters CSV files daily - from spreadsheet exports to database dumps.
What is CSV?
A CSV file is simply a text file where each line represents a row of data, and values within each row are separated by commas (or sometimes semicolons or tabs). The first row typically contains column headers.
# Preview of sales_data.csv:
# product,quantity,price,revenue,date
# Laptop,5,45000,225000,2024-01-15
# Mouse,25,500,12500,2024-01-15
# ...
Python's built-in csv module makes it easy to read and write CSV files without worrying
about edge cases like commas inside quoted values or different line endings.
# Import the csv module
import csv
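To see why the csv module is worth using over a plain split(","), here is a small sketch with made-up data held in memory (io.StringIO stands in for a real file). It shows a quoted value that contains a comma, and a semicolon-separated variant read with the delimiter parameter.

```python
import csv
import io

# In-memory text stands in for files; the data is invented for illustration
comma_text = 'product,description\nLaptop,"15-inch, silver"\n'
semicolon_text = "product;price\nMouse;500\n"

# csv.reader understands the quotes and keeps the description in one piece
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# A different separator only needs the delimiter parameter
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))

# A naive split breaks the quoted value into two pieces
naive = comma_text.splitlines()[1].split(",")

print(comma_rows[1])      # ['Laptop', '15-inch, silver']
print(semicolon_rows[1])  # ['Mouse', '500']
print(naive)              # ['Laptop', '"15-inch', ' silver"']
```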
Reading CSV Files
The csv.reader() function reads CSV files row by row, returning each row as a list of values.
import csv
# Read the sales data CSV file
with open("sales_data.csv", "r") as file:
    reader = csv.reader(file)
    # Get header row
    header = next(reader)
    print("Columns:", header)
    # Read first 3 data rows
    for i, row in enumerate(reader):
        if i < 3:
            print(row)
You can access individual values by index:
import csv
with open("sales_data.csv", "r") as file:
    reader = csv.reader(file)
    next(reader)  # Skip header
    for row in reader:
        product = row[0]
        revenue = int(row[3])
        print(f"{product}: ₹{revenue:,}")
Writing CSV Files
Use csv.writer() to write data to CSV files. The writerow() method
writes a single row, while writerows() writes multiple rows at once.
import csv
# Data to write
students = [
["name", "age", "city", "score"],
["Priya", 22, "Mumbai", 85],
["Rahul", 24, "Delhi", 92],
["Ankit", 23, "Bangalore", 78]
]
# Write to CSV file
with open("new_students.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(students)  # Write all rows at once
Always pass newline="" when opening CSV files for writing. On Windows, this prevents blank rows from appearing between data rows.
Writing row by row gives you more control:
import csv
with open("scores.csv", "w", newline="") as file:
    writer = csv.writer(file)
    # Write header
    writer.writerow(["Student", "Math", "Science"])
    # Write data rows one by one
    writer.writerow(["Priya", 95, 88])
    writer.writerow(["Rahul", 87, 92])
    writer.writerow(["Meera", 91, 85])
Working with DictReader and DictWriter
Instead of accessing columns by index (which can be confusing), DictReader lets you
access values by column name. This makes your code much more readable!
import csv
# DictReader - access by column name (much cleaner!)
with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Access by column name - so much clearer!
        product = row["product"]
        revenue = int(row["revenue"])
        date = row["date"]
        print(f"{date}: {product} - ₹{revenue:,}")
Similarly, DictWriter lets you write using column names:
import csv
# Data as list of dictionaries
students = [
{"name": "Priya", "age": 22, "score": 85},
{"name": "Rahul", "age": 24, "score": 92},
{"name": "Ankit", "age": 23, "score": 78}
]
# DictWriter - specify column names (fieldnames)
with open("students_dict.csv", "w", newline="") as file:
    fieldnames = ["name", "age", "score"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()  # Write the header row
    writer.writerows(students)  # Write all data rows
Use DictReader/DictWriter when:
- Column order might change
- You want readable, self-documenting code
- Working with many columns
- Data naturally maps to dictionaries
Use csv.reader/csv.writer when:
- Simple files with few columns
- Processing data sequentially
- Performance is critical (slightly faster)
- No header row in file
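A file without a header row can still be read by column name if you supply the names yourself through DictReader's fieldnames parameter. The sketch below uses invented in-memory data and assumed column names, and also shows that both readers return every value as a string.

```python
import csv
import io

# Made-up data with no header row; io.StringIO stands in for a file
headerless = "Priya,22,85\nRahul,24,92\n"

# csv.reader: each row is a list, accessed by position
rows_as_lists = list(csv.reader(io.StringIO(headerless)))

# csv.DictReader: supply the column names since the file has none
reader = csv.DictReader(
    io.StringIO(headerless),
    fieldnames=["name", "age", "score"],  # assumed column names
)
rows_as_dicts = list(reader)

print(rows_as_lists[0])           # ['Priya', '22', '85']
print(rows_as_dicts[0]["score"])  # 85 (still a string, convert with int())
```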
Practice Questions: CSV Files
Test your understanding with these hands-on exercises.
Task: Read customers.csv and print each row as a dictionary.
Solution:
import csv
with open("customers.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)
Task: Count how many data rows (excluding header) are in customers.csv.
Solution:
import csv
with open("customers.csv", "r") as file:
    reader = csv.DictReader(file)
    count = sum(1 for row in reader)
print(f"Total rows: {count}")
Task: Read sales_data.csv and print only rows where the revenue is greater than 20000.
Hint: Convert the revenue string to int for comparison.
Solution:
import csv
with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        if int(row['revenue']) > 20000:
            print(row)
Task: Read sales_data.csv and calculate the average of the 'revenue' column.
Solution:
import csv
with open("sales_data.csv", "r") as file:
    reader = csv.DictReader(file)
    revenues = [int(row['revenue']) for row in reader]
average = sum(revenues) / len(revenues)
print(f"Average revenue: ₹{average:,.2f}")
JSON Data Handling
JSON (JavaScript Object Notation) has become the universal language for data exchange on the web. APIs, configuration files, and NoSQL databases all speak JSON - making it essential for data scientists.
What is JSON?
JSON is a lightweight text format that looks remarkably similar to Python dictionaries and lists. It's human-readable, easy to parse, and supported by virtually every programming language.
# Preview of students.json structure:
# {
# "students": [
# {"name": "Priya Sharma", "gpa": 3.8, ...},
# {"name": "Rahul Kumar", "gpa": 3.6, ...}
# ],
# "institution": "Data Science Academy"
# }
Notice how JSON maps directly to Python types:
| JSON Type | Python Type | Example |
|---|---|---|
| object | dict | {"key": "value"} |
| array | list | [1, 2, 3] |
| string | str | "hello" |
| number | int/float | 42 or 3.14 |
| true/false | True/False | true → True |
| null | None | null → None |
# Import the json module
import json
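The type mapping in the table can be confirmed by parsing a small JSON string and inspecting the resulting Python types. This is a minimal sketch with made-up keys. It also shows one caveat: JSON has no tuple type, so a Python tuple comes back as a list after a round trip.

```python
import json

# One value of each JSON type; the keys are invented for illustration
text = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "s": "hello", '
    '"i": 42, "f": 3.14, "t": true, "n": null}'
)
parsed = json.loads(text)

# Record the Python type each JSON value was converted to
type_names = {key: type(value).__name__ for key, value in parsed.items()}
print(type_names)

# Caveat: a tuple is written as a JSON array and read back as a list
round_trip = json.loads(json.dumps({"point": (1, 2)}))
print(round_trip)  # {'point': [1, 2]}
```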
Reading JSON Files
Use json.load() to read JSON from a file. It automatically converts JSON into
Python dictionaries and lists.
import json
# Read the students.json file
with open("students.json", "r") as file:
    data = json.load(file)
# Access top-level data
print(data["institution"]) # Data Science Academy
print(data["semester"]) # Fall 2024
# Access nested student data
students = data["students"]
print(f"Total students: {len(students)}")
Working with JSON arrays (lists of objects) is common when dealing with nested data:
import json
with open("students.json", "r") as file:
    data = json.load(file)
# Loop through the list of student dictionaries
for student in data["students"]:
    name = student["name"]
    gpa = student["gpa"]
    courses = ", ".join(student["courses"])
    print(f"{name} (GPA: {gpa}) - {courses}")
Writing JSON Files
Use json.dump() to write Python data to a JSON file. The indent parameter
makes the output human-readable (pretty-printed).
import json
# Python dictionary to save
student = {
"name": "Rahul Kumar",
"age": 24,
"enrolled": True,
"courses": ["Machine Learning", "Deep Learning"],
"gpa": 3.8
}
# Write to JSON file (indent=2 for pretty formatting)
with open("output.json", "w") as file:
    json.dump(student, file, indent=2)
The output file will look like this (nicely formatted):
# Contents of output.json:
# {
# "name": "Rahul Kumar",
# "age": 24,
# "enrolled": true,
# "courses": [
# "Machine Learning",
# "Deep Learning"
# ],
# "gpa": 3.8
# }
Use indent=2 or indent=4 for readable output. Omit indent for compact output (saves space but is harder to read).
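To make the size trade-off concrete, the sketch below serializes the same dictionary three ways with json.dumps(). The separators argument is an extra option not covered above; it removes the spaces after commas and colons for the smallest output.

```python
import json

student = {"name": "Rahul Kumar", "courses": ["Machine Learning", "Deep Learning"]}

default_text = json.dumps(student)                         # one line, spaces after , and :
compact_text = json.dumps(student, separators=(",", ":"))  # one line, no extra spaces
pretty_text = json.dumps(student, indent=2)                # multiple lines, indented

print(len(compact_text), len(default_text), len(pretty_text))
print(compact_text)
# {"name":"Rahul Kumar","courses":["Machine Learning","Deep Learning"]}
```

All three strings parse back to the same dictionary; only the whitespace differs.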
Working with JSON Strings
Sometimes you need to convert between JSON strings and Python objects without files. This is common when working with APIs or web services.
json.loads() - Parse a JSON string into Python (the 's' stands for 'string'):
import json
# JSON string (maybe from an API response)
json_string = '{"name": "Meera", "score": 91, "passed": true}'
# Parse string to Python dictionary
data = json.loads(json_string)
print(data["name"]) # Meera
print(data["passed"]) # True (boolean, not string!)
json.dumps() - Convert Python to a JSON string:
import json
# Python dictionary
student = {"name": "Vikram", "scores": [85, 90, 88]}
# Convert to JSON string
json_string = json.dumps(student)
print(json_string)
# Output: {"name": "Vikram", "scores": [85, 90, 88]}
# Pretty-printed string
pretty_json = json.dumps(student, indent=2)
print(pretty_json)
- json.load() / json.dump() - work with files
- json.loads() / json.dumps() - work with strings
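The two pairs of functions can be compared with a round trip. In this sketch an io.StringIO buffer stands in for a real file, which is an assumption made only to keep the example self-contained.

```python
import io
import json

record = {"name": "Vikram", "scores": [85, 90, 88]}

# String pair: dumps() returns a str, loads() parses a str
text = json.dumps(record)
from_text = json.loads(text)

# File pair: dump() writes to a file-like object, load() reads from one
buffer = io.StringIO()  # stands in for an open file
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
from_file = json.load(buffer)

print(from_text == record)  # True
print(from_file == record)  # True
```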
Practice Questions: JSON Data
Test your understanding with these hands-on exercises.
Given:
api_response = '{"status": "success", "count": 42, "data": [1, 2, 3]}'
Task: Parse this JSON string and print the count value.
Solution:
import json
api_response = '{"status": "success", "count": 42, "data": [1, 2, 3]}'
result = json.loads(api_response)
print(result["count"]) # 42
Given:
config = {
"app_name": "DataAnalyzer",
"version": "1.0",
"debug": True
}
Task: Save this config dictionary to a file called config.json with indentation.
Solution:
import json
config = {
    "app_name": "DataAnalyzer",
    "version": "1.0",
    "debug": True
}
with open("config.json", "w") as file:
    json.dump(config, file, indent=2)
Task: Read products.json, add a new product to the list, and save it back.
New product: {"name": "Laptop", "price": 999.99}
Solution:
import json
# Read existing data
with open("products.json", "r") as file:
    products = json.load(file)
# Add new product
products.append({"name": "Laptop", "price": 999.99})
# Save updated data
with open("products.json", "w") as file:
    json.dump(products, file, indent=2)
Given:
json_str = '''
{
"company": "TechCorp",
"employees": [
{"name": "Alice", "department": "Engineering"},
{"name": "Bob", "department": "Sales"},
{"name": "Carol", "department": "Engineering"}
]
}'''
Task: Parse the JSON and print the names of all employees in the Engineering department.
Solution:
import json
json_str = '''
{
"company": "TechCorp",
"employees": [
{"name": "Alice", "department": "Engineering"},
{"name": "Bob", "department": "Sales"},
{"name": "Carol", "department": "Engineering"}
]
}'''
data = json.loads(json_str)
for emp in data["employees"]:
    if emp["department"] == "Engineering":
        print(emp["name"])
# Output:
# Alice
# Carol
Context Managers (with Statement)
Throughout this lesson, you've seen the with statement. It's not just syntactic sugar -
it's a powerful pattern that prevents bugs and keeps your code clean. Let's understand why it matters.
The Problem with Manual File Handling
When you open a file, your operating system allocates resources to track it. If you forget to close the file, these resources stay locked - potentially causing issues like:
- Memory leaks in long-running programs
- Data not being written to disk (stuck in buffer)
- Other programs unable to access the file
- Maximum open file limit reached
Here's the old (problematic) way of handling files:
# The WRONG way - easy to forget close() or miss it on errors
file = open("data.txt", "r")
content = file.read()
# ... do something with content ...
# What if an error occurs here? close() never runs!
file.close() # Easy to forget this line
Even worse, if an error occurs between open() and close(), the file
never gets closed:
# Error prevents close() from running!
file = open("data.txt", "r")
content = file.read()
result = int(content) # ValueError if content isn't a number!
file.close() # This line NEVER executes if error above
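One manual way to guarantee that close() runs is a try/finally block. The sketch below is illustrative only: it creates a temporary data.txt with made-up non-numeric content so that int() fails, then checks that the file was still closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")
    with open(path, "w") as setup:
        setup.write("not a number")  # made-up content that int() rejects

    file = open(path, "r")
    try:
        content = file.read()
        result = int(content)  # raises ValueError for this content
    except ValueError:
        result = None
    finally:
        file.close()  # runs whether or not an error occurred

    closed_after = file.closed

print(result)        # None
print(closed_after)  # True
```

This works, but it takes several extra lines for every file you open and is easy to get wrong.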
The with Statement Solution
The with statement is Python's elegant solution. It automatically closes the file
when the block ends, even if an error occurs! The file object returned by open() acts as a "context manager" that handles this cleanup.
# The RIGHT way - file is ALWAYS closed automatically
with open("data.txt", "r") as file:
    content = file.read()
    # Even if an error occurs here...
# File is automatically closed here, guaranteed!
Context Manager (with statement)
A context manager ensures that resources are properly managed - acquired when
needed and released when done. The with statement handles setup and cleanup
automatically.
Rule: Always use with when working with files. There's no good
reason to use manual open/close in modern Python.
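To see the setup and cleanup steps explicitly, here is a minimal sketch of a home-made context manager built with the standard library's contextlib.contextmanager decorator. The tracked() function and the events list are hypothetical names used only for this illustration; file objects implement the same idea internally.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen

@contextmanager
def tracked(name):
    """Hypothetical resource that logs its setup and cleanup."""
    events.append(f"open {name}")       # setup: runs when the with block starts
    try:
        yield name                      # the with block body runs here
    finally:
        events.append(f"close {name}")  # cleanup: runs even after an error

try:
    with tracked("demo") as resource:
        events.append(f"use {resource}")
        raise ValueError("something went wrong")
except ValueError:
    events.append("error handled")

print(events)
# ['open demo', 'use demo', 'close demo', 'error handled']
```

The cleanup step is recorded before the error reaches the except clause, which is the same ordering you get when a file is closed by a with block.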
Here's how it handles errors gracefully:
# Even with errors, file gets closed!
try:
    with open("data.txt", "r") as file:
        content = file.read()
        result = int(content)  # Might raise ValueError
except ValueError:
    print("Could not convert to integer")
# File is closed regardless of error or success
Working with Multiple Files
You can open multiple files in a single with statement - useful for copying data
or comparing files:
# Open two files at once
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        # Process and write each line
        outfile.write(line.upper())
For many files, use separate lines for readability:
# Multiple files - cleaner with parentheses (Python 3.10+)
with (
    open("source.txt", "r") as source,
    open("backup.txt", "w") as backup,
    open("log.txt", "a") as log
):
    content = source.read()
    backup.write(content)
    log.write("Backup created successfully\n")
File Paths and Common Errors
Understanding file paths prevents many common errors. There are two types:
Relative path - a path relative to your current working directory:
- "data.txt"
- "data/input.txt"
- "../parent_folder/file.txt"
Absolute path - the full path from the system root:
- "/home/priya/data.txt" (Linux/Mac)
- "C:/Users/Priya/data.txt" (Windows)
Common file errors and solutions:
# FileNotFoundError - file doesn't exist
# Solution: Check path, use os.path.exists()
import os
if os.path.exists("data.txt"):
    with open("data.txt", "r") as file:
        content = file.read()
else:
    print("File not found!")
# PermissionError - no access rights
# Solution: Check file permissions, run as admin, or use different path
# UnicodeDecodeError - encoding mismatch
# Solution: Specify encoding
with open("data.txt", "r", encoding="utf-8") as file:
    content = file.read()
Always specify encoding="utf-8" when working with text files that might contain non-English characters. UTF-8 can encode any Unicode character, so it handles international text correctly as long as the file was saved as UTF-8.
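The errors above can also be handled with try/except instead of checking first. The following sketch uses a temporary directory with made-up file names and content: it writes UTF-8 text, reads it back correctly, shows that reading it with the wrong encoding raises UnicodeDecodeError, and catches FileNotFoundError for a missing file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "names.txt")  # hypothetical file name

    # Write and read with the same explicit encoding
    with open(path, "w", encoding="utf-8") as file:
        file.write("Zoë, José, 東京\n")
    with open(path, "r", encoding="utf-8") as file:
        round_trip = file.read()

    # Reading the same bytes as ASCII fails on the non-ASCII characters
    try:
        with open(path, "r", encoding="ascii") as file:
            file.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # Attempt the read and handle the missing-file case if it happens
    try:
        with open(os.path.join(tmp_dir, "missing.txt"), "r", encoding="utf-8") as file:
            file.read()
        message = "read ok"
    except FileNotFoundError:
        message = "File not found!"

print(round_trip.strip())  # Zoë, José, 東京
print(decode_failed)       # True
print(message)             # File not found!
```

Catching the exception avoids the small window between an os.path.exists() check and the open() call in which the file could be deleted or created by another program.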
Practice Questions: Context Managers
Test your understanding with these hands-on exercises.
Given (bad code):
file = open("data.txt", "r")
content = file.read()
print(content)
file.close()
Task: Rewrite this using a context manager (with statement).
Solution:
with open("data.txt", "r") as file:
    content = file.read()
    print(content)
# File automatically closed!
Task: Write code that checks if "report.txt" exists before reading it. Print "File not found" if it doesn't exist.
Solution:
import os
if os.path.exists("report.txt"):
    with open("report.txt", "r") as file:
        print(file.read())
else:
    print("File not found")
Task: Open two files simultaneously - read from "source.txt" and write its contents to "destination.txt".
Solution:
with open("source.txt", "r") as source, open("destination.txt", "w") as dest:
    content = source.read()
    dest.write(content)
Task: Read a large file (for example, "server_log.txt") line by line (memory efficient), count how many lines contain the word "error", and save the count to "error_count.txt".
Hint: Don't load entire file into memory - iterate line by line.
Solution:
error_count = 0
with open("server_log.txt", "r") as file:
    for line in file:  # Memory efficient - one line at a time
        if "error" in line.lower():
            error_count += 1
with open("error_count.txt", "w") as output:
    output.write(f"Total errors found: {error_count}")
Key Takeaways
Always Use with Statement
Context managers automatically close files, even if errors occur. Never use manual open/close.
Know Your File Modes
'r' for read, 'w' for write (overwrites!), 'a' for append. Wrong mode = data loss or errors.
CSV for Tabular Data
Use DictReader/DictWriter for cleaner code. Access columns by name instead of index.
JSON for Structured Data
load/dump for files, loads/dumps for strings. JSON maps directly to Python dicts and lists.
Specify Encoding
Use encoding="utf-8" for text files with international characters to avoid decode errors.
Check File Existence
Use os.path.exists() before reading. Handle FileNotFoundError gracefully in your code.