Path Handling

Why pathlib?

Before pathlib, working with file paths meant string concatenation and os.path functions. pathlib treats paths as objects with methods and properties, making code cleaner and cross-platform by default.

Key Concept

Paths Are Objects, Not Strings

A Path object knows its components, can navigate to parents and children, and provides methods for file operations. It handles Windows backslashes and Unix forward slashes transparently.

Why it matters: Code that builds paths with pathlib generally runs on any operating system without changes, because you never hard-code a path separator.

os.path (Old Way)

import os

# Joining paths
path = os.path.join('data', 'file.txt')

# Getting filename
name = os.path.basename(path)

# Getting directory
folder = os.path.dirname(path)

# Checking existence
exists = os.path.exists(path)

pathlib (Modern Way)

from pathlib import Path

# Joining paths
path = Path('data') / 'file.txt'

# Getting filename
name = path.name

# Getting directory
folder = path.parent

# Checking existence
exists = path.exists()

pathlib uses the / operator to join paths, which is more intuitive than os.path.join(). Properties like .name and .parent replace function calls.
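As a quick sanity check, the two styles can be run side by side. The sketch below uses only the standard library; the variable names are illustrative and not part of either API.

```python
import os
from pathlib import Path

# Build the same relative path both ways.
old_style = os.path.join('data', 'file.txt')
new_style = Path('data') / 'file.txt'

# str() converts a Path to the platform's native string form,
# so the two results compare equal on any operating system.
same_path = str(new_style) == old_style

# The property-based accessors mirror the os.path functions.
same_name = new_style.name == os.path.basename(old_style)
same_folder = str(new_style.parent) == os.path.dirname(old_style)

print(same_path, same_name, same_folder)
```

Because a Path converts cleanly to a string, you can adopt pathlib gradually and still pass paths to older code that expects strings.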

Creating Paths

Create Path objects from strings, join paths together, or get special locations like home and current directory.

Basic Path Creation

from pathlib import Path

# From string
p = Path('data/file.txt')

# Current directory
cwd = Path.cwd()

# Home directory
home = Path.home()

# Print paths
print(f"Path: {p}")
print(f"CWD: {cwd}")
print(f"Home: {home}")

Path.cwd() returns the current working directory. Path.home() returns the user's home folder regardless of operating system.
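The short sketch below illustrates two details worth knowing about these special locations: both are absolute paths, and a leading "~" in a string is only expanded when you ask for it. The file name notes.txt is a made-up example.

```python
from pathlib import Path

cwd = Path.cwd()
home = Path.home()

# Both special locations are absolute paths.
print(cwd.is_absolute(), home.is_absolute())

# "~" is not expanded automatically; expanduser() does it explicitly.
raw = Path('~/notes.txt')
expanded = raw.expanduser()
print(expanded == home / 'notes.txt')

# A relative path can be anchored to the working directory
# without touching the disk.
absolute = cwd / 'data' / 'file.txt'
print(absolute.is_absolute())
```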

Joining Paths

# Use / operator to join
data_dir = Path('project') / 'data'
file_path = data_dir / 'users.csv'

print(file_path)  # project/data/users.csv (project\data\users.csv on Windows)

# Join multiple parts
full_path = Path('home') / 'user' / 'docs' / 'file.txt'

# Or use joinpath method
config = Path('app').joinpath('config', 'settings.ini')

The / operator makes path joining readable. It automatically uses the correct separator for your operating system.
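To see the separator handling without switching machines, one option is to use the pure path classes, which model a specific platform's rules and never touch the filesystem. This is a minimal sketch; the path names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join, rendered under each platform's rules.
posix = PurePosixPath('project') / 'data' / 'users.csv'
windows = PureWindowsPath('project') / 'data' / 'users.csv'

print(posix)    # project/data/users.csv
print(windows)  # project\data\users.csv

# Caveat: joining with an absolute path discards everything to its left.
reset = PurePosixPath('project') / '/etc/hosts'
print(reset)    # /etc/hosts
```

The last case is a common surprise: if the right-hand side may be absolute, validate it before joining.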

Path Anatomy

Every path has components you can access: the name, suffix, stem, parent, and parts. Understanding these makes file manipulation easy.

Path Components

Path('/home/user/documents/report.pdf')

Property	Value
.anchor	/
.parts	('/', 'home', 'user', 'documents', 'report.pdf')
.parent	/home/user/documents
.name	report.pdf
.stem	report
.suffix	.pdf

Accessing Components

from pathlib import Path

p = Path('/home/user/project/data/sales.csv')

print(p.name)    # sales.csv
print(p.stem)    # sales
print(p.suffix)  # .csv
print(p.parent)  # /home/user/project/data
print(p.parts)   # ('/', 'home', 'user', 'project', 'data', 'sales.csv')

The stem is the filename without extension. The suffix includes the dot. Parts is a tuple of all path components.
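The component properties can be collected in one place to compare how different kinds of names behave. The helper below, describe(), is a hypothetical name introduced for this sketch; it uses PurePosixPath so the output is the same on every platform.

```python
from pathlib import PurePosixPath

def describe(path_str):
    """Return the main components of a POSIX-style path as a dict."""
    p = PurePosixPath(path_str)
    return {
        'anchor': p.anchor,
        'parent': str(p.parent),
        'name': p.name,
        'stem': p.stem,
        'suffix': p.suffix,
    }

info = describe('/home/user/project/data/sales.csv')
print(info['stem'], info['suffix'])      # sales .csv

# A leading dot marks a hidden file, not an extension.
hidden = describe('/home/user/.bashrc')
print(repr(hidden['suffix']))            # ''

# A relative path has an empty anchor.
relative = describe('data/sales.csv')
print(repr(relative['anchor']))          # ''
```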

Multiple Suffixes

p = Path('archive.tar.gz')

print(p.suffix)    # .gz (last suffix only)
print(p.suffixes)  # ['.tar', '.gz'] (all suffixes)
print(p.stem)      # archive.tar

# Common pattern: strip all suffixes from the end of the name
all_suffixes = ''.join(p.suffixes)
name_only = p.name[:len(p.name) - len(all_suffixes)]
print(name_only)  # archive

Files like .tar.gz have multiple suffixes. Use .suffixes to get all of them as a list.
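One caveat: .suffixes treats every dot after the first as the start of a suffix, so version-style names produce more "suffixes" than you might expect. A possible workaround, sketched below, is to strip only extensions from a known list. ARCHIVE_SUFFIXES and strip_archive_suffixes() are hypothetical names chosen for this example, and the list itself is an assumption you would adapt.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list of extensions that count as "archive" suffixes.
ARCHIVE_SUFFIXES = {'.tar', '.gz', '.bz2', '.xz', '.zip'}

def strip_archive_suffixes(filename):
    """Remove trailing archive extensions, leaving other dots alone."""
    p = PurePosixPath(filename)
    while p.suffix in ARCHIVE_SUFFIXES:
        p = p.with_suffix('')  # drop only the last suffix
    return p.name

versioned = PurePosixPath('data.v1.2.tar.gz')
print(versioned.suffixes)                          # ['.v1', '.2', '.tar', '.gz']
print(strip_archive_suffixes('data.v1.2.tar.gz'))  # data.v1.2
print(strip_archive_suffixes('archive.tar.gz'))    # archive
```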

Changing Components

p = Path('data/report.txt')

# Change suffix
pdf_path = p.with_suffix('.pdf')
print(pdf_path)  # data/report.pdf

# Change name
new_name = p.with_name('summary.txt')
print(new_name)  # data/summary.txt

# Change stem, keeping the suffix (with_stem requires Python 3.9+)
new_stem = p.with_stem('analysis')
print(new_stem)  # data/analysis.txt

These methods return new Path objects. The original path is unchanged because Path objects are immutable.
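A small sketch of what immutability means in practice: each with_* call produces a separate object, and because paths are hashable they can be used as set members or dictionary keys. The file names are illustrative.

```python
from pathlib import PurePosixPath

original = PurePosixPath('data/report.txt')

# Each with_* call returns a new object; nothing is modified in place.
backup = original.with_name(original.name + '.bak')
no_ext = original.with_suffix('')

print(original)  # data/report.txt
print(backup)    # data/report.txt.bak
print(no_ext)    # data/report

# Equal paths hash the same, so duplicates collapse in a set.
seen = {original, PurePosixPath('data/report.txt')}
print(len(seen))  # 1
```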

Practice: Path Creation

Task: Write get_extension(filepath) that returns the file extension without the dot.

Solution:

from pathlib import Path

def get_extension(filepath):
    return Path(filepath).suffix[1:]  # Remove the dot

# Test
print(get_extension('data/file.csv'))  # csv
print(get_extension('image.png'))      # png

Task: Write build_path(folder, name, ext) that creates a complete file path.

Solution:

from pathlib import Path

def build_path(folder, name, ext):
    return Path(folder) / f"{name}.{ext}"

# Test
path = build_path('documents', 'report', 'pdf')
print(path)  # documents/report.pdf

Task: Write change_ext(filepath, new_ext) that returns the path with a new extension.

Solution:

from pathlib import Path

def change_ext(filepath, new_ext):
    p = Path(filepath)
    if not new_ext.startswith('.'):
        new_ext = '.' + new_ext
    return p.with_suffix(new_ext)

# Test
print(change_ext('data.txt', 'csv'))  # data.csv

Directory Navigation

Navigate up to parents, list directory contents, check existence, and create or remove directories.

Parents and Ancestors

from pathlib import Path

p = Path('/home/user/project/src/main.py')

# Immediate parent
print(p.parent)  # /home/user/project/src

# All ancestors
for ancestor in p.parents:
    print(ancestor)
# /home/user/project/src
# /home/user/project
# /home/user
# /home
# /

The .parents property gives all ancestor directories. Index with [0] for parent, [1] for grandparent, etc.
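A common use of .parents is walking upward until some marker file is found, for example to locate a project root. The sketch below is one way to do it; find_project_root() and the default marker pyproject.toml are assumptions made for this example, and the demonstration builds its own temporary directory tree.

```python
import tempfile
from pathlib import Path

def find_project_root(start, marker='pyproject.toml'):
    """Return the nearest ancestor (or start itself) containing marker."""
    start = Path(start).resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    return None

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / 'pyproject.toml').touch()
    nested = root / 'src' / 'pkg'
    nested.mkdir(parents=True)

    found = find_project_root(nested)
    print(found == root)                 # True
    print(nested.parents[1] == root)     # True: [0] is src, [1] is root
    is_grandparent = nested.parents[1] == root
```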

Listing Directory Contents

# List immediate children
for item in Path('.').iterdir():
    if item.is_file():
        print(f"File: {item.name}")
    elif item.is_dir():
        print(f"Dir:  {item.name}")

# Check what something is
p = Path('data')
print(p.exists())   # True if exists
print(p.is_file())  # True if file
print(p.is_dir())   # True if directory

iterdir() yields Path objects for each item in the directory. Use is_file() and is_dir() to filter.
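iterdir() does not promise any particular order, so listings are usually sorted before display. The sketch below shows one option, directories first and then files; list_sorted() is a hypothetical helper, and the demonstration uses a temporary directory.

```python
import tempfile
from pathlib import Path

def list_sorted(directory):
    """Return entries with directories first, then files, each alphabetical."""
    entries = Path(directory).iterdir()
    # is_file() is False for directories, and False sorts before True.
    return sorted(entries, key=lambda p: (p.is_file(), p.name.lower()))

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'b.txt').touch()
    (base / 'a.txt').touch()
    (base / 'docs').mkdir()
    names = [p.name for p in list_sorted(base)]

print(names)  # ['docs', 'a.txt', 'b.txt']
```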

Creating and Removing

# Create directory (and parents if needed)
Path('output/reports').mkdir(parents=True, exist_ok=True)

# Remove empty directory
Path('temp').rmdir()

# Create empty file
Path('new_file.txt').touch()

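# Delete file (missing_ok requires Python 3.8+)
Path('old_file.txt').unlink(missing_ok=True)

Use parents=True to create any missing intermediate directories. exist_ok=True suppresses the error that mkdir() would otherwise raise if the directory already exists.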
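The full create-and-remove cycle can be tried safely inside a temporary directory. This sketch also shows that rmdir() refuses to delete a directory that still has contents; for whole trees, shutil.rmtree() from the standard library is the usual tool. All file and directory names here are illustrative.

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())

# Create nested directories; repeating the call is harmless with exist_ok.
reports = base / 'output' / 'reports'
reports.mkdir(parents=True, exist_ok=True)
reports.mkdir(parents=True, exist_ok=True)

marker = reports / 'done.txt'
marker.touch()
created = marker.is_file()

# rmdir() only works on empty directories.
try:
    reports.rmdir()
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# Empty the directory first, then remove it.
marker.unlink()
reports.rmdir()
removed = not reports.exists()

# Remove the rest of the temporary tree in one call.
shutil.rmtree(base)
print(created, rmdir_failed, removed)
```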

Practice: Directory Navigation

Task: Write list_files(directory) that returns a list of all file names in the directory.

Solution:

from pathlib import Path

def list_files(directory):
    d = Path(directory)
    return [f.name for f in d.iterdir() if f.is_file()]

# Test
print(list_files('.'))

Task: Write ensure_dir(path) that creates the directory and all parents if they don't exist.

Solution:

from pathlib import Path

def ensure_dir(path):
    Path(path).mkdir(parents=True, exist_ok=True)

# Test
ensure_dir('output/reports/2026')

Task: Write count_by_extension(directory) that returns a dict of extension counts.

Solution:

from pathlib import Path
from collections import Counter

def count_by_extension(directory):
    d = Path(directory)
    extensions = [f.suffix for f in d.iterdir() if f.is_file()]
    return dict(Counter(extensions))

# Test
print(count_by_extension('.'))

Glob Patterns

Glob patterns let you find files whose names match a pattern. Use * for any run of characters, ? for a single character, and ** to descend into subdirectories recursively.

Basic Glob Patterns

from pathlib import Path

# All .txt files in current directory
for f in Path('.').glob('*.txt'):
    print(f)

# All Python files
for f in Path('src').glob('*.py'):
    print(f)

# Single character wildcard
for f in Path('.').glob('file?.txt'):
    print(f)  # file1.txt, fileA.txt, etc.

The * matches any number of characters, including none. The ? matches exactly one character. glob() returns a lazy iterator of matching Path objects.
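To make the difference between * and ? concrete, the sketch below creates a few empty files in a temporary directory and globs them. The file names are made up for the demonstration, and results are sorted so the output is predictable.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ['file1.txt', 'fileA.txt', 'file10.txt', 'notes.md']:
        (base / name).touch()

    # * matches any run of characters.
    star = sorted(p.name for p in base.glob('*.txt'))
    # ? matches exactly one character, so file10.txt is excluded.
    question = sorted(p.name for p in base.glob('file?.txt'))

print(star)      # ['file1.txt', 'file10.txt', 'fileA.txt']
print(question)  # ['file1.txt', 'fileA.txt']
```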

Recursive Glob

# Find all Python files recursively
for f in Path('project').rglob('*.py'):
    print(f)

# Same as using ** pattern
for f in Path('project').glob('**/*.py'):
    print(f)

# All files in nested directories
for f in Path('.').rglob('*'):
    if f.is_file():
        print(f)

rglob() is recursive glob. It searches the directory and all subdirectories. The ** pattern matches any number of directories.
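The following sketch contrasts the three calls on a small temporary tree. The directory layout is invented for the example; paths are printed relative to the tree root in POSIX form so the output is the same on every platform.

```python
import tempfile
from pathlib import Path

def relative_names(paths, root):
    """Sorted POSIX-style paths relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in paths)

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / 'pkg' / 'sub').mkdir(parents=True)
    for rel in ['main.py', 'pkg/a.py', 'pkg/sub/b.py']:
        (project / rel).touch()

    top_only = relative_names(project.glob('*.py'), project)
    recursive = relative_names(project.rglob('*.py'), project)
    star_star = relative_names(project.glob('**/*.py'), project)

print(top_only)   # ['main.py']
print(recursive)  # ['main.py', 'pkg/a.py', 'pkg/sub/b.py']
print(star_star)  # ['main.py', 'pkg/a.py', 'pkg/sub/b.py']
```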

Common Patterns

Pattern	Matches
`*.py`	All Python files in the directory
`data*.csv`	CSV files whose names start with "data"
`**/*.json`	All JSON files, recursively
`test_*.py`	Test files in the directory
`**/test_*.py`	All test files, recursively
`[0-9]*.txt`	Text files whose names start with a digit
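Patterns can be tried against names without creating any files by using the match() method of a pure path. The sketch below checks only non-recursive patterns, since match() does not treat ** as a recursive wildcard; the sample file names are invented.

```python
from pathlib import PurePosixPath

# (file name, pattern) pairs to test; names are illustrative.
checks = [
    ('src/app.py', '*.py'),
    ('data_2024.csv', 'data*.csv'),
    ('tests/test_utils.py', 'test_*.py'),
    ('9lives.txt', '[0-9]*.txt'),
    ('notes.txt', '[0-9]*.txt'),
]

# A relative pattern is matched against the right-hand end of the path.
results = [PurePosixPath(name).match(pattern) for name, pattern in checks]

for (name, pattern), matched in zip(checks, results):
    print(f"{name!r} vs {pattern!r}: {matched}")
```

Only the last pair fails to match, because notes.txt does not start with a digit.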

Practice: Glob Patterns

Task: Write find_python_files(directory) that returns a list of all .py files, found recursively.

Solution:

from pathlib import Path

def find_python_files(directory):
    return list(Path(directory).rglob('*.py'))

# Test
py_files = find_python_files('.')
for f in py_files:
    print(f)

Task: Write find_by_extensions(directory, extensions) that finds files with any of the given extensions.

Solution:

from pathlib import Path

def find_by_extensions(directory, extensions):
    d = Path(directory)
    files = []
    for ext in extensions:
        files.extend(d.rglob(f'*{ext}'))
    return files

# Test
images = find_by_extensions('.', ['.png', '.jpg', '.gif'])

Task: Write find_test_files(directory) that finds all files matching the test_*.py pattern.

Solution:

from pathlib import Path

def find_test_files(directory):
    return list(Path(directory).rglob('test_*.py'))

# Test
tests = find_test_files('.')
print(f"Found {len(tests)} test files")

Task: Write get_dir_size(directory) that returns total size in bytes of all files recursively.

Solution:

from pathlib import Path

def get_dir_size(directory):
    total = 0
    for f in Path(directory).rglob('*'):
        if f.is_file():
            total += f.stat().st_size
    return total

# Test
size = get_dir_size('.')
print(f"Total: {size / 1024:.2f} KB")

Task: Write find_largest(directory, n) that returns the n largest files.

Solution:

from pathlib import Path

def find_largest(directory, n=5):
    files = []
    for f in Path(directory).rglob('*'):
        if f.is_file():
            files.append((f, f.stat().st_size))
    files.sort(key=lambda x: x[1], reverse=True)
    return files[:n]

# Test
for path, size in find_largest('.', 3):
    print(f"{path}: {size} bytes")

Key Takeaways

Paths Are Objects

Use Path objects instead of strings for cleaner, cross-platform code.

Join with /

Use the / operator to join paths. It handles separators automatically.

Easy Component Access

Use .name, .stem, .suffix, and .parent to access path parts.

Glob for Patterns

Use glob() and rglob() to find files matching patterns.

Recursive with rglob

Use rglob() or ** pattern for recursive directory searches.

Cross-Platform

pathlib code runs on Windows, macOS, and Linux without separator-related changes.

Knowledge Check

Quick Quiz

1. What does Path('data') / 'file.txt' produce?

2. What is the difference between .suffix and .stem?

3. What does rglob('*.py') do?

4. What does mkdir(parents=True, exist_ok=True) do?

5. What does Path.cwd() return?

6. What is the purpose of Path.with_suffix()?