Module 8.2

Path Handling

pathlib provides object-oriented paths that work across operating systems. Instead of string manipulation and os.path functions, you get a clean, intuitive API for navigating file systems. Your code becomes more readable, and the same path-handling logic runs on Windows, macOS, and Linux.

40 min
Intermediate
Hands-on
What You'll Learn
  • Creating Path objects
  • Path components and parts
  • Navigating directories
  • Glob pattern matching
  • File operations with Path
01

Why pathlib?

Before pathlib, working with file paths meant string concatenation and os.path functions. pathlib treats paths as objects with methods and properties, making code cleaner and cross-platform by default.

Key Concept

Paths Are Objects, Not Strings

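A Path object knows its components, can navigate to parents and children, and provides methods for file operations. It prints with the native separator for the current operating system (backslashes on Windows, forward slashes on macOS and Linux), and on Windows it accepts either separator in the strings you pass in.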

Why it matters: Code that uses pathlib works on any operating system without changes. No more worrying about path separators.
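To see the separator handling without switching machines, here is a minimal sketch using the pure path classes PurePosixPath and PureWindowsPath. They never touch the file system, so both can be created on any OS. The folder names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths do no I/O, so both flavors can be built on any operating system
win = PureWindowsPath('C:/Users/alice') / 'report.txt'
posix = PurePosixPath('/home/alice') / 'report.txt'

print(win)    # C:\Users\alice\report.txt
print(posix)  # /home/alice/report.txt

# Windows paths treat / and \ as equivalent separators when parsing
print(PureWindowsPath(r'data\raw') == PureWindowsPath('data/raw'))  # True

# On POSIX, a backslash is an ordinary filename character, not a separator
print(PurePosixPath(r'data\raw').parts)  # ('data\\raw',)
```

The last line is a reminder that "transparent" separator handling runs one way: Windows accepts forward slashes, but Linux and macOS do not treat backslashes as separators.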

os.path (Old Way)
import os

# Joining paths
path = os.path.join('data', 'file.txt')

# Getting filename
name = os.path.basename(path)

# Getting directory
folder = os.path.dirname(path)

# Checking existence
exists = os.path.exists(path)
pathlib (Modern Way)
from pathlib import Path

# Joining paths
path = Path('data') / 'file.txt'

# Getting filename
name = path.name

# Getting directory
folder = path.parent

# Checking existence
exists = path.exists()

pathlib uses the / operator to join paths, which is more intuitive than os.path.join(). Properties like .name and .parent replace function calls.
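Because Path implements the os.PathLike protocol (Python 3.6+), the two styles interoperate: os.path functions, open(), and most standard-library APIs accept a Path directly. This sketch checks that both columns of the comparison agree. It only inspects path text, so data/file.txt does not need to exist.

```python
import os
import os.path
from pathlib import Path

p = Path('data') / 'file.txt'
s = os.path.join('data', 'file.txt')

# Both styles produce the same path text for the current OS
print(str(p) == s)                          # True
print(p.name == os.path.basename(s))        # True
print(str(p.parent) == os.path.dirname(s))  # True

# os.path functions accept Path objects directly via os.PathLike
print(os.path.basename(p))                  # file.txt

# os.fspath() converts a Path back to a plain str for APIs that require one
print(type(os.fspath(p)).__name__)          # str
```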

02

Creating Paths

Create Path objects from strings, join paths together, or get special locations like home and current directory.

Basic Path Creation

from pathlib import Path

# From string
p = Path('data/file.txt')

# Current directory
cwd = Path.cwd()

# Home directory
home = Path.home()

# Print paths
print(f"Path: {p}")
print(f"CWD: {cwd}")
print(f"Home: {home}")

Path.cwd() returns the current working directory. Path.home() returns the user's home folder regardless of operating system.
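A short sketch of how these two locations are typically used as starting points. It assumes the home directory can be determined (HOME on Linux/macOS, USERPROFILE on Windows); 'data' and 'notes.txt' are illustrative names that need not exist.

```python
from pathlib import Path

# Joining onto Path.cwd() gives an absolute path under the working directory
data_dir = Path.cwd() / 'data'
print(data_dir.is_absolute())  # True

# A literal '~' is just a path component until expanduser() replaces it
notes = Path('~') / 'notes.txt'
print(notes.parts[0])          # ~
print(notes.expanduser() == Path.home() / 'notes.txt')  # True
```

Unlike a shell, pathlib never expands ~ on its own, so call expanduser() (or start from Path.home()) whenever a path may come from user input.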

Joining Paths

from pathlib import Path

# Use / operator to join
data_dir = Path('project') / 'data'
file_path = data_dir / 'users.csv'

print(file_path)  # project/data/users.csv (project\data\users.csv on Windows)

# Join multiple parts
full_path = Path('home') / 'user' / 'docs' / 'file.txt'

# Or use joinpath method
config = Path('app').joinpath('config', 'settings.ini')

The / operator makes path joining readable. It automatically uses the correct separator for your operating system.
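The joining rules in a few lines. This sketch uses PurePosixPath so the printed output is identical on every OS; Path follows the same rules with the native separator. The reset on an absolute component mirrors os.path.join().

```python
from pathlib import PurePosixPath

base = PurePosixPath('project')

# Each / appends one or more components
print(base / 'data' / 'users.csv')        # project/data/users.csv

# joinpath() with several arguments is equivalent to chaining /
print(base.joinpath('data', 'users.csv') == base / 'data' / 'users.csv')  # True

# A str on the left also works, as long as the right side is a path
print('root' / PurePosixPath('config'))   # root/config

# Gotcha: joining an absolute path discards everything before it
print(base / '/etc' / 'hosts')            # /etc/hosts
```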

03

Path Anatomy

Every path has components you can access: the name, suffix, stem, parent, and parts. Understanding these makes file manipulation easy.

Path Components
Path('/home/user/documents/report.pdf')

.anchor   /
.parts    ('/', 'home', 'user', 'documents', 'report.pdf')
.parent   /home/user/documents
.name     report.pdf
.stem     report
.suffix   .pdf

Accessing Components

from pathlib import Path

p = Path('/home/user/project/data/sales.csv')

print(p.name)    # sales.csv
print(p.stem)    # sales
print(p.suffix)  # .csv
print(p.parent)  # /home/user/project/data
print(p.parts)   # ('/', 'home', 'user', 'project', 'data', 'sales.csv')

The stem is the filename without its final extension. The suffix includes the dot. .parts is a tuple of all path components.
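Beyond .parent, a path can also walk further up the tree or express itself relative to a base directory. A minimal sketch using .parents and relative_to() on the same example path; PurePosixPath keeps the output identical on every OS.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/user/project/data/sales.csv')

# .parents is a sequence of ancestors, nearest first
print(p.parents[0])         # /home/user/project/data  (same as p.parent)
print(p.parents[1])         # /home/user/project
print(list(p.parents)[-1])  # /

# relative_to() strips a leading base path
print(p.relative_to('/home/user'))  # project/data/sales.csv

# Relative paths have an empty anchor
print(repr(PurePosixPath('data/sales.csv').anchor))  # ''
```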

Multiple Suffixes

from pathlib import Path

p = Path('archive.tar.gz')

print(p.suffix)    # .gz (last suffix only)
print(p.suffixes)  # ['.tar', '.gz'] (all suffixes)
print(p.stem)      # archive.tar

# Common pattern: strip all suffixes by removing them one at a time.
# (str.replace() is unsafe here: it also deletes matching text elsewhere in the name.)
bare = p
while bare.suffix:
    bare = bare.with_suffix('')
print(bare.name)  # archive

Files like .tar.gz have multiple suffixes. Use .suffixes to get all of them as a list.
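Two edge cases worth knowing before relying on .suffix and .suffixes: a leading dot (a hidden file) does not start a suffix, while every later dot does, including dots inside version numbers. The file names here are illustrative.

```python
from pathlib import PurePath

# A leading dot marks a hidden file, not a suffix
dotfile = PurePath('.bashrc')
print(repr(dotfile.suffix))  # ''
print(dotfile.stem)          # .bashrc

# Every later dot starts a suffix, including dots in version numbers
versioned = PurePath('v1.2.tar.gz')
print(versioned.suffixes)    # ['.2', '.tar', '.gz']
print(versioned.stem)        # v1.2.tar
```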

Changing Components

from pathlib import Path

p = Path('data/report.txt')

# Change suffix
pdf_path = p.with_suffix('.pdf')
print(pdf_path)  # data/report.pdf

# Change name
new_name = p.with_name('summary.txt')
print(new_name)  # data/summary.txt

# Change stem, keeping the suffix (with_stem() requires Python 3.9+)
new_stem = p.with_stem('analysis')
print(new_stem)  # data/analysis.txt

These methods return new Path objects. The original path is unchanged because Path objects are immutable.
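A quick sketch confirming the immutability claim, plus two behaviors of with_suffix() that often surprise people: an empty string removes the suffix, and a suffix without a leading dot raises ValueError. The 'draft' label is a placeholder value.

```python
from pathlib import PurePosixPath

p = PurePosixPath('data/report.txt')
q = p.with_suffix('.pdf')

print(p)                  # data/report.txt  (the original is untouched)
print(q)                  # data/report.pdf
print(p.with_suffix(''))  # data/report  (an empty suffix removes it)

# with_suffix() rejects suffixes without a leading dot
try:
    p.with_suffix('pdf')
    rejected = False
except ValueError:
    rejected = True
print(rejected)           # True

# Immutable paths are hashable, so they can be dict keys or set members
labels = {p: 'draft'}
print(labels[PurePosixPath('data/report.txt')])  # draft
```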

Practice: Path Creation

Task: Write get_extension(filepath) that returns the file extension without the dot.

from pathlib import Path

def get_extension(filepath):
    return Path(filepath).suffix[1:]  # Remove the dot

# Test
print(get_extension('data/file.csv'))  # csv
print(get_extension('image.png'))      # png

Task: Write build_path(folder, name, ext) that creates a complete file path.

from pathlib import Path

def build_path(folder, name, ext):
    return Path(folder) / f"{name}.{ext}"

# Test
path = build_path('documents', 'report', 'pdf')
print(path)  # documents/report.pdf

Task: Write change_ext(filepath, new_ext) that returns path with new extension.

from pathlib import Path

def change_ext(filepath, new_ext):
    p = Path(filepath)
    if not new_ext.startswith('.'):
        new_ext = '.' + new_ext
    return p.with_suffix(new_ext)

# Test
print(change_ext('data.txt', 'csv'))  # data.csv
05

Glob Patterns

Glob patterns let you find files matching a pattern. Use * to match any run of characters within a single name, ? to match exactly one character, and ** to match any number of nested directories.

Basic Glob Patterns

from pathlib import Path

# All .txt files in current directory
for f in Path('.').glob('*.txt'):
    print(f)

# All Python files
for f in Path('src').glob('*.py'):
    print(f)

# Single character wildcard
for f in Path('.').glob('file?.txt'):
    print(f)  # file1.txt, fileA.txt, etc.

The * matches any number of characters. The ? matches exactly one character. glob() returns a generator of matching Path objects, in no guaranteed order.
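To see * and ? side by side without depending on your own files, this sketch creates a throwaway directory with tempfile and a few empty sample files (the names are chosen for illustration). Results are sorted because glob() order is not guaranteed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create a few empty sample files
    for name in ['file1.txt', 'fileA.txt', 'file10.txt', 'notes.md']:
        (root / name).write_text('')

    # Sort the generator's output for stable, comparable results
    all_txt = sorted(f.name for f in root.glob('*.txt'))
    one_char = sorted(f.name for f in root.glob('file?.txt'))

print(all_txt)   # ['file1.txt', 'file10.txt', 'fileA.txt']
print(one_char)  # ['file1.txt', 'fileA.txt']
```

file10.txt is excluded from the second result because ? matches exactly one character, and "10" is two.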

Recursive Glob

from pathlib import Path

# Find all Python files recursively
for f in Path('project').rglob('*.py'):
    print(f)

# Same as using ** pattern
for f in Path('project').glob('**/*.py'):
    print(f)

# All files in nested directories
for f in Path('.').rglob('*'):
    if f.is_file():
        print(f)

rglob() is recursive glob. It searches the directory and all subdirectories. The ** pattern matches any number of directories.
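A small sketch comparing glob('*.py'), rglob('*.py') and glob('**/*.py') on a temporary tree. The package layout is hypothetical; matches are shown relative to the root with forward slashes so the output is the same on every OS.

```python
import tempfile
from pathlib import Path


def rel_names(root, matches):
    # Express each match relative to root, with / separators, sorted
    return sorted(m.relative_to(root).as_posix() for m in matches)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg' / 'sub').mkdir(parents=True)
    for rel in ['main.py', 'pkg/util.py', 'pkg/sub/deep.py']:
        (root / rel).write_text('')

    top_level = rel_names(root, root.glob('*.py'))
    recursive = rel_names(root, root.rglob('*.py'))
    star_star = rel_names(root, root.glob('**/*.py'))

print(top_level)               # ['main.py']
print(recursive)               # ['main.py', 'pkg/sub/deep.py', 'pkg/util.py']
print(star_star == recursive)  # True
```

Note that ** also matches zero directories, which is why main.py at the top level appears in the recursive results.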

Common Patterns

Pattern          Matches
*.py             All Python files in the directory
data*.csv        CSV files whose names start with "data"
**/*.json        All JSON files, searched recursively
test_*.py        All test files in the directory
**/test_*.py     All test files, searched recursively
[0-9]*.txt       .txt files whose names start with a digit
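To check what a pattern from the table matches before running it over a real tree, PurePath.match() tests a single path without any file system access. Relative patterns are matched from the right-hand end of the path. match() does not support the recursive ** wildcard (it behaves like *), so this sketch sticks to the single-level patterns; the file names are illustrative.

```python
from pathlib import PurePosixPath

# match() compares against the trailing components of the path
print(PurePosixPath('reports/data_2024.csv').match('data*.csv'))  # True
print(PurePosixPath('tests/test_io.py').match('test_*.py'))       # True

# [0-9] is a character range: exactly one digit in that position
print(PurePosixPath('3notes.txt').match('[0-9]*.txt'))            # True
print(PurePosixPath('notes3.txt').match('[0-9]*.txt'))            # False
```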

Practice: Glob Patterns

Task: Write find_python_files(directory) that returns list of all .py files recursively.

from pathlib import Path

def find_python_files(directory):
    return list(Path(directory).rglob('*.py'))

# Test
py_files = find_python_files('.')
for f in py_files:
    print(f)

Task: Write find_by_extensions(directory, extensions) that finds files with any of the given extensions.

from pathlib import Path

def find_by_extensions(directory, extensions):
    d = Path(directory)
    files = []
    for ext in extensions:
        files.extend(d.rglob(f'*{ext}'))
    return files

# Test
images = find_by_extensions('.', ['.png', '.jpg', '.gif'])

Task: Write find_test_files(directory) that finds all files matching test_*.py pattern.

from pathlib import Path

def find_test_files(directory):
    return list(Path(directory).rglob('test_*.py'))

# Test
tests = find_test_files('.')
print(f"Found {len(tests)} test files")

Task: Write get_dir_size(directory) that returns total size in bytes of all files recursively.

from pathlib import Path

def get_dir_size(directory):
    total = 0
    for f in Path(directory).rglob('*'):
        if f.is_file():
            total += f.stat().st_size
    return total

# Test
size = get_dir_size('.')
print(f"Total: {size / 1024:.2f} KB")

Task: Write find_largest(directory, n) that returns the n largest files.

from pathlib import Path

def find_largest(directory, n=5):
    files = []
    for f in Path(directory).rglob('*'):
        if f.is_file():
            files.append((f, f.stat().st_size))
    files.sort(key=lambda x: x[1], reverse=True)
    return files[:n]

# Test
for path, size in find_largest('.', 3):
    print(f"{path}: {size} bytes")
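The solution sorts every file just to keep the top n. An alternative sketch uses heapq.nlargest() on a generator, which keeps only n candidates in memory at a time; find_largest_heap() is a hypothetical name, and the sample files and sizes are made up for the demonstration.

```python
import heapq
import tempfile
from pathlib import Path


def find_largest_heap(directory, n=5):
    # Lazily pair each file with its size, then keep only the n biggest
    sized = ((f, f.stat().st_size) for f in Path(directory).rglob('*') if f.is_file())
    return heapq.nlargest(n, sized, key=lambda item: item[1])


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, size in [('small.bin', 10), ('mid.bin', 200), ('big.bin', 3000)]:
        (root / name).write_bytes(b'x' * size)
    top_two = [(p.name, size) for p, size in find_largest_heap(root, 2)]

print(top_two)  # [('big.bin', 3000), ('mid.bin', 200)]
```

For small directories the sorted version is just as good; the heap mainly helps when the tree holds many thousands of files and n is small.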

Key Takeaways

Paths Are Objects

Use Path objects instead of strings for cleaner, cross-platform code.

Join with /

Use the / operator to join paths. It handles separators automatically.

Easy Component Access

Use .name, .stem, .suffix, and .parent to access path parts.

Glob for Patterns

Use glob() and rglob() to find files matching patterns.

Recursive with rglob

Use rglob() or ** pattern for recursive directory searches.

Cross-Platform

pathlib code runs on Windows, macOS, and Linux without changes, as long as you build paths with / instead of hardcoding OS-specific separators.

Knowledge Check

Quick Quiz

Test what you've learned about pathlib and path handling

1 What does Path('data') / 'file.txt' produce?
2 What is the difference between .suffix and .stem?
3 What does rglob('*.py') do?
4 What does mkdir(parents=True, exist_ok=True) do?
5 What does Path.cwd() return?
6 What is the purpose of Path.with_suffix()?