File I/O

Why File I/O?

Programs that only work with data in memory lose everything when they stop. File I/O allows your code to persist data, read configuration, process logs, and interact with other programs through files.

Key Concept

Files Are Persistent Storage

Files store data beyond program execution. Text files hold human-readable content, while binary files store raw bytes for images, audio, and executables.

Why it matters: File I/O is essential for configuration files, data processing, logging, exports, and integration with other systems.

Configuration Files

Store settings, preferences, and API keys that persist across runs.

Data Processing

Read CSV, JSON, and text files to analyze and transform data.

Logging

Write application logs for debugging and monitoring.

Data Exchange

Share data between programs using standard file formats.

File Modes

When opening a file, you specify a mode that determines what operations are allowed. Choose the wrong mode and you might accidentally overwrite important data or fail to read existing content.

File Opening Modes

`'r'`

Read

Open for reading (default). Raises FileNotFoundError if the file does not exist.

`'w'`

Write

Open for writing. Creates the file if it does not exist and truncates any existing content!

`'a'`

Append

Open for appending. Creates the file if it does not exist and preserves existing content.

`'rb'` / `'wb'`

Binary Modes

Read/write raw bytes. Use for images, audio, PDFs, and other non-text files.

`'r+'` / `'w+'`

Read + Write

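Open for both reading and writing. 'r+' requires an existing file and keeps its content, so use it when you need to modify a file in place. 'w+' creates the file if needed and truncates any existing content first.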
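The difference between 'r+' and 'w+' is easy to miss, so here is a minimal sketch of it. The file name config.txt and its contents are made up for illustration, and the file lives in a temporary directory so nothing real is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'config.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('mode=slow\n')

    # 'r+' keeps the content; writing starts at the current position (0 here),
    # so the first four characters are overwritten in place.
    with open(path, 'r+', encoding='utf-8') as f:
        f.write('MODE')
        f.seek(0)
        after_r_plus = f.read()

    # 'w+' truncates on open, so there is nothing left to read.
    with open(path, 'w+', encoding='utf-8') as f:
        after_w_plus = f.read()

print(repr(after_r_plus))  # 'MODE=slow\n'
print(repr(after_w_plus))  # ''
```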

# Different ways to open files
file = open('data.txt', 'r')    # Read mode (default)
file = open('data.txt', 'w')    # Write mode (overwrites!)
file = open('data.txt', 'a')    # Append mode
file = open('image.png', 'rb')  # Binary read
file = open('output.bin', 'wb') # Binary write

Warning: Write mode ('w') truncates the file immediately when opened, erasing all existing content!
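To see the truncation for yourself, the following sketch writes a file, appends to it, and then merely opens it with 'w' without writing anything. The file name notes.txt is an assumption for the example, and a temporary directory keeps the experiment away from real data.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('first\n')

    # Append keeps what is already there and adds to the end.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('second\n')
    with open(path, 'r', encoding='utf-8') as f:
        after_append = f.read()

    # Opening with 'w' empties the file even though nothing is written.
    with open(path, 'w', encoding='utf-8') as f:
        pass
    with open(path, 'r', encoding='utf-8') as f:
        after_write_open = f.read()

print(repr(after_append))      # 'first\nsecond\n'
print(repr(after_write_open))  # ''
```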

Context Managers

The with statement ensures files are properly closed even if errors occur. This is the recommended way to work with files in Python.

Context Manager Flow

with open('file.txt') as f:

File Opened: __enter__() is called.

Your Code Runs: read, write, process.

File Closed: __exit__() is called automatically, even if an error occurs.

The with statement guarantees cleanup, preventing resource leaks and data corruption.
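The flow above can be made visible with a small stand-in object. Tracker is a hypothetical class written only for this illustration; it is not a real file object, it just records the order in which the with statement calls its methods.

```python
class Tracker:
    """Hypothetical stand-in for a file object that records what happens."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not swallow the exception


tracker = Tracker()
try:
    with tracker:
        tracker.events.append('body')
        raise ValueError('simulated failure')
except ValueError:
    tracker.events.append('handled')

print(tracker.events)  # ['enter', 'body', 'exit', 'handled']
```

Note that 'exit' is recorded before the exception reaches the except clause, which is the same ordering a file object relies on to close itself.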

Manual vs Context Manager

Bad: Manual Close

# Manual close - risky!
file = open('data.txt')
content = file.read()
# If error happens here...
# file never gets closed!
file.close()

Good: With Statement

# Context manager - safe!
with open('data.txt') as file:
    content = file.read()
    # Even if error happens...
# File is ALWAYS closed

Always use the with statement for file operations. It handles cleanup automatically, even when exceptions occur.
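If you ever do need a manual close, wrapping it in try/finally gives roughly the same guarantee that with provides. The sketch below compares the two, using a temporary file with made-up contents, and checks the closed attribute afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('42\n')

    # Manual version: close() sits in finally so it runs on every path.
    manual = open(path, encoding='utf-8')
    try:
        value = int(manual.read())
    finally:
        manual.close()

    # Context manager version: the file is closed even though the body fails.
    try:
        with open(path, encoding='utf-8') as managed:
            int('not a number')
    except ValueError:
        pass

print(value)                          # 42
print(manual.closed, managed.closed)  # True True
```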

Reading Files

Python offers multiple ways to read file content: all at once, line by line, or in chunks. Choose based on file size and your processing needs.

Read Entire File

# Read entire file as string
with open('data.txt', 'r') as file:
    content = file.read()
    print(content)

# Read all lines as list
with open('data.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line.strip())

Use read() for small files. readlines() returns a list where each element is a line including the newline character.
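Because readlines() keeps the trailing newline on each element, it can help to compare it with read().splitlines(), which drops them. A small sketch, using a temporary file with made-up contents:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('alpha\nbeta\ngamma')

    with open(path, encoding='utf-8') as f:
        raw_lines = f.readlines()           # newlines are kept

    with open(path, encoding='utf-8') as f:
        clean_lines = f.read().splitlines() # newlines are removed

print(raw_lines)    # ['alpha\n', 'beta\n', 'gamma']
print(clean_lines)  # ['alpha', 'beta', 'gamma']
```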

Read Line by Line (Memory Efficient)

# Iterate directly - best for large files
with open('big_file.txt', 'r') as file:
    for line in file:
        print(line.strip())

# Read single line
with open('data.txt', 'r') as file:
    first_line = file.readline()
    second_line = file.readline()

Iterating over the file object is memory efficient because it reads one line at a time instead of loading everything into memory.
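One detail worth knowing about readline(): each call returns the next line, and once the end of the file is reached it returns an empty string rather than raising an error. A blank line inside the file still comes back as '\n', so the two cases are distinguishable. A short sketch with a temporary file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('one\n\ntwo\n')

    with open(path, encoding='utf-8') as f:
        first = f.readline()   # first line
        blank = f.readline()   # the empty line in the middle
        last = f.readline()    # last line
        at_end = f.readline()  # nothing left to read

print(repr(first), repr(blank), repr(last), repr(at_end))
# 'one\n' '\n' 'two\n' ''
```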

Reading with Encoding

# Specify encoding for non-ASCII characters
with open('data.txt', 'r', encoding='utf-8') as file:
    content = file.read()

# Handle encoding errors
with open('data.txt', 'r', encoding='utf-8', errors='ignore') as f:
    content = f.read()

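Always specify encoding='utf-8' when working with international text; without it, Python may fall back to a platform-dependent default. The errors parameter controls how undecodable bytes are handled: 'strict' (the default) raises UnicodeDecodeError, 'ignore' silently drops them, and 'replace' substitutes the U+FFFD replacement character.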
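The effect of the errors parameter is easiest to see on a file that is not valid UTF-8. The sketch below writes a word encoded as Latin-1 (an assumption chosen to force a decoding problem) and then reads it back three ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'latin1.txt')  # hypothetical file name
    # 'caf\u00e9' in Latin-1 contains the byte 0xE9, which is invalid UTF-8 here.
    with open(path, 'wb') as f:
        f.write('caf\u00e9\n'.encode('latin-1'))

    # Default behaviour (errors='strict'): decoding fails loudly.
    try:
        with open(path, encoding='utf-8') as f:
            f.read()
        strict_result = 'decoded'
    except UnicodeDecodeError:
        strict_result = 'UnicodeDecodeError'

    # 'ignore' drops the bad byte; 'replace' marks where it was.
    with open(path, encoding='utf-8', errors='ignore') as f:
        ignored = f.read()
    with open(path, encoding='utf-8', errors='replace') as f:
        replaced = f.read()

print(strict_result)   # UnicodeDecodeError
print(repr(ignored))   # 'caf\n'
print(repr(replaced))  # 'caf\ufffd\n'
```

Since 'ignore' loses data without any warning, 'replace' is often the safer choice when you cannot fix the source encoding.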

Practice: Reading Files

Task: Create a file 'sample.txt' with three lines, then read and print its content.

Solution

# First create the file
with open('sample.txt', 'w') as f:
    f.write("Line 1\nLine 2\nLine 3")

# Then read and print
with open('sample.txt', 'r') as f:
    content = f.read()
    print(content)

Task: Write a function count_lines(filename) that returns the number of lines in a file.

Solution

def count_lines(filename):
    # Iterate instead of calling readlines() so large files are not loaded into memory
    with open(filename, 'r') as f:
        return sum(1 for _ in f)

# Test
print(count_lines('sample.txt'))  # 3

Task: Write search_file(filename, word) that returns all lines containing the word.

Solution

def search_file(filename, word):
    matches = []
    with open(filename, 'r') as f:
        for line in f:
            if word in line:
                matches.append(line.strip())
    return matches

# Test
print(search_file('sample.txt', 'Line'))

Writing Files

Writing files creates new content or overwrites existing files. Append mode adds to the end without destroying existing data.

Write Mode (Overwrites)

# Write creates/overwrites file
with open('output.txt', 'w') as file:
    file.write("Hello, World!\n")
    file.write("This is line 2.\n")

# Write multiple lines at once
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open('output.txt', 'w') as file:
    file.writelines(lines)

Write mode erases everything in the file when opened. Use append mode if you want to preserve existing content.
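A common surprise with writelines() is that, despite the name, it does not add line endings; it writes each string exactly as given. A minimal sketch, using a temporary file and a made-up list of items:

```python
import os
import tempfile

items = ['apple', 'banana', 'cherry']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.txt')  # hypothetical file name

    # Strings without '\n' run together on one line.
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(items)
    with open(path, encoding='utf-8') as f:
        joined = f.read()

    # Add the newline yourself, for example with a generator expression.
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(f"{item}\n" for item in items)
    with open(path, encoding='utf-8') as f:
        separated = f.read()

print(repr(joined))     # 'applebananacherry'
print(repr(separated))  # 'apple\nbanana\ncherry\n'
```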

Append Mode (Preserves)

# Append adds to end of file
with open('log.txt', 'a') as file:
    file.write("New log entry\n")

# Append with timestamp
from datetime import datetime
with open('log.txt', 'a') as file:
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    file.write(f"[{timestamp}] Event occurred\n")

Append mode is perfect for log files where you want to add new entries without losing history.

Writing with Print

# Use print() with file parameter
with open('output.txt', 'w') as file:
    print("Using print!", file=file)
    print("Automatic newlines", file=file)
    print("Value:", 42, file=file)

Using print() with the file parameter adds a newline after each call and accepts multiple arguments, just like regular print().
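The usual sep and end arguments of print() keep working when output goes to a file. The sketch below writes three lines to a temporary file (the file name is an assumption) and reads the result back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        print('Value:', 42, file=f)            # arguments joined by a space
        print('a', 'b', 'c', sep=',', file=f)  # custom separator
        print('no newline', end='', file=f)    # suppress the trailing newline

    with open(path, encoding='utf-8') as f:
        content = f.read()

print(repr(content))  # 'Value: 42\na,b,c\nno newline'
```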

Practice: Writing Files

Task: Write a function save_list(filename, items) that writes each item on a new line.

Solution

def save_list(filename, items):
    with open(filename, 'w') as f:
        for item in items:
            f.write(f"{item}\n")

# Test
fruits = ['apple', 'banana', 'cherry']
save_list('fruits.txt', fruits)

Task: Write log_message(message) that appends timestamped messages to 'app.log'.

Solution

from datetime import datetime

def log_message(message):
    with open('app.log', 'a') as f:
        ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        f.write(f"[{ts}] {message}\n")

# Test
log_message("Application started")
log_message("User logged in")

Task: Write copy_file(source, dest) that copies content from one file to another.

Solution

def copy_file(source, dest):
    with open(source, 'r') as src:
        content = src.read()
    with open(dest, 'w') as dst:
        dst.write(content)

# Test
copy_file('sample.txt', 'sample_copy.txt')

Task: Write filter_file(source, dest, keyword) that copies only lines containing keyword.

Solution

def filter_file(source, dest, keyword):
    with open(source, 'r') as src:
        lines = src.readlines()
    with open(dest, 'w') as dst:
        for line in lines:
            if keyword in line:
                dst.write(line)

# Test
filter_file('app.log', 'errors.log', 'ERROR')

Task: Write number_lines(source, dest) that adds line numbers to each line.

Solution

def number_lines(source, dest):
    with open(source, 'r') as src:
        lines = src.readlines()
    with open(dest, 'w') as dst:
        for i, line in enumerate(lines, 1):
            dst.write(f"{i:4d}: {line}")

# Test
number_lines('sample.txt', 'numbered.txt')

Binary Mode

Binary mode reads and writes raw bytes without text encoding. Use it for images, audio, PDFs, executables, and any non-text file.

Reading Binary Files

# Read binary file
with open('image.png', 'rb') as file:
    data = file.read()
    print(f"File size: {len(data)} bytes")
    print(f"First 10 bytes: {data[:10]}")

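# Read binary in chunks (the := operator requires Python 3.8+)
total_bytes = 0
with open('large_file.bin', 'rb') as file:
    while chunk := file.read(1024):
        total_bytes += len(chunk)  # replace with your own processing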

Binary mode returns bytes objects instead of strings. Reading in chunks is memory efficient for large files.
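Chunked reading combines naturally with chunked writing. As a sketch, the function below copies a file piece by piece so that only one chunk is in memory at a time. The name copy_in_chunks and the default chunk size are assumptions for this example; the standard library's shutil.copyfile covers the same need in real code.

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path chunk by chunk; return the bytes copied."""
    copied = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # read() returns b'' at end of file, which ends the loop.
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            copied += len(chunk)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'large_file.bin')  # hypothetical file names
    target = os.path.join(tmp, 'copy.bin')
    with open(source, 'wb') as f:
        f.write(bytes(200_000))  # 200,000 zero bytes as sample data

    copied = copy_in_chunks(source, target, chunk_size=4096)
    with open(target, 'rb') as f:
        target_size = len(f.read())

print(copied, target_size)  # 200000 200000
```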

Writing Binary Files

# Copy binary file
with open('source.png', 'rb') as src:
    data = src.read()

with open('copy.png', 'wb') as dst:
    dst.write(data)

# Write raw bytes
with open('data.bin', 'wb') as file:
    file.write(b'\x00\x01\x02\x03')

Binary write expects bytes, not strings. Use the b prefix for byte literals or encode strings with .encode().
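To illustrate the bytes requirement, this sketch first tries to write a str to a binary file, which raises TypeError, and then writes the encoded form instead. The file name and the sample text are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')  # hypothetical file name

    with open(path, 'wb') as f:
        try:
            f.write('text')  # str is rejected in binary mode
            outcome = 'written'
        except TypeError:
            outcome = 'TypeError'
        # Encode the string to bytes before writing.
        f.write('h\u00e9llo'.encode('utf-8'))

    with open(path, 'rb') as f:
        raw = f.read()

text = raw.decode('utf-8')  # bytes back to str

print(outcome)              # TypeError
print(raw)                  # b'h\xc3\xa9llo'
print(len(raw), len(text))  # 6 5
```

The last line shows why bytes and characters are not interchangeable: the accented letter takes two bytes in UTF-8 but counts as one character.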

Text Mode

Returns strings
Handles encoding
Converts line endings
Cannot handle binary data

Binary Mode

Returns bytes
No encoding applied
Preserves exact bytes
Works with any file
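The line-ending difference in the comparison above can be observed by reading the same file in both modes. In this sketch the file is written with Windows-style line endings on purpose (an assumption for the demonstration); text mode with default settings translates them to '\n', while binary mode returns the bytes untouched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(b'one\r\ntwo\r\n')

    # Text mode: decodes to str and normalises line endings on read.
    with open(path, 'r', encoding='utf-8') as f:
        text_view = f.read()

    # Binary mode: exact bytes, no decoding, no translation.
    with open(path, 'rb') as f:
        binary_view = f.read()

print(repr(text_view))  # 'one\ntwo\n'
print(binary_view)      # b'one\r\ntwo\r\n'
```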

Practice: Binary Files

Task: Write get_file_size(filename) that returns file size in bytes using binary read.

Solution

def get_file_size(filename):
    with open(filename, 'rb') as f:
        data = f.read()
        return len(data)

# Test
print(f"Size: {get_file_size('sample.txt')} bytes")

Task: Write copy_binary(src, dst) that copies any file type using binary mode.

Solution

def copy_binary(src, dst):
    with open(src, 'rb') as source:
        data = source.read()
    with open(dst, 'wb') as dest:
        dest.write(data)

# Test - works for any file type
copy_binary('image.png', 'image_backup.png')

Task: Write is_png(filename) that checks if file starts with PNG signature bytes.

Solution

def is_png(filename):
    # PNG signature: 89 50 4E 47 0D 0A 1A 0A
    png_sig = b'\x89PNG\r\n\x1a\n'
    with open(filename, 'rb') as f:
        header = f.read(8)
    return header == png_sig

# Test
print(is_png('image.png'))  # True or False

Key Takeaways

Choose the Right Mode

Use 'r' for reading, 'w' for overwriting, and 'a' for appending; add 'b' (as in 'rb' or 'wb') for binary files.

Always Use With

Context managers ensure files are closed even if errors occur. Prefer them over calling close() manually.

Iterate for Large Files

Loop over file objects line by line instead of read() to handle large files efficiently.

Write Mode Truncates

Opening with 'w' erases existing content immediately. Use 'a' to preserve data.

Binary for Non-Text

Use 'rb' and 'wb' for images, audio, PDFs, and any file that is not plain text.

Specify Encoding

Use encoding='utf-8' for international text to avoid encoding errors.

Knowledge Check

Quick Quiz

1. What happens when you open a file with mode 'w'?

2. Why should you use the with statement?

3. Which method is most memory efficient for large files?

4. When should you use binary mode?

5. What does file.readline() return when called multiple times?

6. What is the difference between mode 'w' and mode 'a'?
