Why pathlib?
Before pathlib (added to the standard library in Python 3.4), working with file paths meant string concatenation and os.path functions. pathlib treats paths as objects with methods and properties, which makes code cleaner and takes care of platform-specific separators for you.
Paths Are Objects, Not Strings
A Path object knows its components, can navigate to parents and children, and provides methods for file operations. It renders paths with the separator of the platform it runs on: backslashes on Windows, forward slashes on Unix-like systems.
Why it matters: Code that builds paths with pathlib instead of hard-coding separators usually runs unchanged on other operating systems. Hard-coded absolute locations such as C:\Users or /home still need care.
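To make the separator handling concrete, here is a small sketch using pathlib's pure path classes, which apply one platform's rules on any machine and never touch the filesystem. The directory and file names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Build the same logical path under each platform's rules.
posix_path = PurePosixPath('data') / 'reports' / 'q1.csv'
windows_path = PureWindowsPath('data') / 'reports' / 'q1.csv'

print(posix_path)    # data/reports/q1.csv
print(windows_path)  # data\reports\q1.csv

# The components are identical; only the rendered separator differs.
print(posix_path.parts == windows_path.parts)  # True

# Windows paths also accept forward slashes as input.
print(PureWindowsPath('data/reports/q1.csv') == windows_path)  # True
```

In everyday code you would normally use Path, which picks the right flavor for the machine it runs on.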
os.path (Old Way)
import os
# Joining paths
path = os.path.join('data', 'file.txt')
# Getting filename
name = os.path.basename(path)
# Getting directory
folder = os.path.dirname(path)
# Checking existence
exists = os.path.exists(path)
pathlib (Modern Way)
from pathlib import Path
# Joining paths
path = Path('data') / 'file.txt'
# Getting filename
name = path.name
# Getting directory
folder = path.parent
# Checking existence
exists = path.exists()
pathlib uses the / operator to join paths, which is more intuitive than os.path.join(). Properties like .name and .parent replace function calls.
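As a quick sanity check, the two styles can be compared side by side. This is only a sketch: it reuses the example file name from the listings above and assumes nothing about whether the file exists.

```python
import os
from pathlib import Path

path_str = os.path.join('data', 'file.txt')
path_obj = Path('data') / 'file.txt'

# Both approaches describe the same location.
print(str(path_obj) == path_str)                          # True
print(path_obj.name == os.path.basename(path_str))        # True
print(str(path_obj.parent) == os.path.dirname(path_str))  # True

# Path objects implement os.PathLike, so os functions and open() accept them directly.
print(os.fspath(path_obj) == path_str)                    # True
```

Because the two are interchangeable at this level, existing os.path code can be migrated one call at a time.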
Creating Paths
Create Path objects from strings, join paths together, or get special locations like home and current directory.
Basic Path Creation
from pathlib import Path
# From string
p = Path('data/file.txt')
# Current directory
cwd = Path.cwd()
# Home directory
home = Path.home()
# Print paths
print(f"Path: {p}")
print(f"CWD: {cwd}")
print(f"Home: {home}")
Path.cwd() returns the current working directory. Path.home() returns the user's home folder regardless of operating system.
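A typical use of these special locations is building a per-user file path. The sketch below assumes a hypothetical application folder named .myapp; nothing is created on disk.

```python
from pathlib import Path

cwd = Path.cwd()
home = Path.home()

# The current working directory is always an absolute path.
print(cwd.is_absolute())  # True

# Hypothetical per-user config location; the folder name is an assumption.
config_file = home / '.myapp' / 'settings.ini'
print(config_file.name)                   # settings.ini
print(config_file.parent.parent == home)  # True

# A relative path stays relative until it is resolved against the cwd.
relative = Path('data/file.txt')
print(relative.is_absolute())            # False
print(relative.resolve().is_absolute())  # True
```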
Joining Paths
# Use / operator to join
data_dir = Path('project') / 'data'
file_path = data_dir / 'users.csv'
print(file_path) # project/data/users.csv (project\data\users.csv on Windows)
# Join multiple parts
full_path = Path('home') / 'user' / 'docs' / 'file.txt'
# Or use joinpath method
config = Path('app').joinpath('config', 'settings.ini')
The / operator makes path joining readable. It automatically uses the correct separator for your operating system.
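One joining rule is worth knowing early: an absolute right-hand operand discards everything to its left. The sketch below uses PurePosixPath so the printed results are the same on every platform; the paths are illustrative only.

```python
from pathlib import PurePosixPath

base = PurePosixPath('project') / 'data'

# Relative pieces are appended as expected.
print(base / 'users.csv')   # project/data/users.csv

# An absolute right-hand operand replaces everything before it.
print(base / '/etc/hosts')  # /etc/hosts

# A single string containing separators is split into components.
print(base / 'raw/2024' == base / 'raw' / '2024')  # True

# joinpath() follows the same rules as the / operator.
print(base.joinpath('raw', '2024') == base / 'raw' / '2024')  # True
```

This matters when the right-hand side comes from user input: check it with is_absolute() before joining if it must stay inside a base directory.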
Path Anatomy
Every path has components you can access: the name, suffix, stem, parent, and parts. Understanding these makes file manipulation easy.
Path Components
Path('/home/user/documents/report.pdf')
.parts: ('/', 'home', 'user', 'documents', 'report.pdf')
.parent: /home/user/documents
.name: report.pdf
.stem: report
.suffix: .pdf
.anchor: /
Accessing Components
from pathlib import Path
p = Path('/home/user/project/data/sales.csv')
print(p.name) # sales.csv
print(p.stem) # sales
print(p.suffix) # .csv
print(p.parent) # /home/user/project/data
print(p.parts) # ('/', 'home', 'user', 'project', 'data', 'sales.csv')
The stem is the filename without its last extension. The suffix includes the leading dot. .parts is a tuple of all path components, starting with the root for absolute paths.
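Beyond the single-step properties, a path can also be walked upward or trimmed relative to a base. A brief sketch, using PurePosixPath so the output does not depend on the operating system:

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/user/project/data/sales.csv')

# .parent can be chained to climb several levels.
print(p.parent.parent)  # /home/user/project

# .parents is a sequence of all ancestors, nearest first.
print(p.parents[0])     # /home/user/project/data
print(p.parents[1])     # /home/user/project
print(len(p.parents))   # 5

# The name is always the stem followed by the suffix.
print(p.stem + p.suffix == p.name)  # True

# relative_to() drops a leading prefix.
print(p.relative_to('/home/user'))  # project/data/sales.csv
```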
Multiple Suffixes
p = Path('archive.tar.gz')
print(p.suffix) # .gz (last suffix only)
print(p.suffixes) # ['.tar', '.gz'] (all suffixes)
print(p.stem) # archive.tar
# Common pattern: strip all suffixes from the end of the name
all_suffixes = ''.join(p.suffixes)
name_only = p.name[:len(p.name) - len(all_suffixes)]
print(name_only) # archive
Files like .tar.gz have multiple suffixes. Use .suffixes to get all of them as a list.
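One caveat: .suffixes treats every dot-separated piece after the first as a suffix, so a versioned name like report.v1.2.txt yields three of them. A possible workaround, sketched below, is to recognize only a known list of compound extensions. The split_extension helper and the KNOWN_COMPOUND list are hypothetical names chosen for this example, not part of pathlib.

```python
from pathlib import PurePath

# Assumed list of compound extensions; extend it to suit your data.
KNOWN_COMPOUND = ('.tar.gz', '.tar.bz2', '.tar.xz')

def split_extension(filepath):
    """Return (base_name, extension), keeping known compound extensions whole."""
    name = PurePath(filepath).name
    for ext in KNOWN_COMPOUND:
        if name.endswith(ext):
            return name[:-len(ext)], ext
    # Fall back to pathlib's notion of the last suffix.
    p = PurePath(name)
    return p.stem, p.suffix

print(PurePath('report.v1.2.txt').suffixes)       # ['.v1', '.2', '.txt']
print(split_extension('backups/archive.tar.gz'))  # ('archive', '.tar.gz')
print(split_extension('report.v1.2.txt'))         # ('report.v1.2', '.txt')
print(split_extension('README'))                  # ('README', '')
```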
Changing Components
p = Path('data/report.txt')
# Change suffix
pdf_path = p.with_suffix('.pdf')
print(pdf_path) # data/report.pdf
# Change name
new_name = p.with_name('summary.txt')
print(new_name) # data/summary.txt
# Change stem (keep suffix); with_stem() requires Python 3.9+
new_stem = p.with_stem('analysis')
print(new_stem) # data/analysis.txt
These methods return new Path objects. The original path is unchanged because Path objects are immutable.
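A few edge cases of with_suffix() are easy to check directly. The following sketch uses PurePosixPath so the printed paths look the same on any platform; the file names are illustrative.

```python
from pathlib import PurePosixPath

p = PurePosixPath('data/report.txt')

renamed = p.with_suffix('.pdf')
print(renamed)  # data/report.pdf
print(p)        # data/report.txt (the original is untouched)

# An empty suffix removes the extension.
print(p.with_suffix(''))  # data/report

# A path without a suffix gets one appended.
print(PurePosixPath('data/README').with_suffix('.md'))  # data/README.md

# The new suffix must start with a dot, otherwise ValueError is raised.
try:
    p.with_suffix('pdf')
except ValueError as err:
    print(f"Rejected: {err}")
```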
Practice: Path Creation
Task: Write get_extension(filepath) that returns the file extension without the dot.
Solution
from pathlib import Path
def get_extension(filepath):
    return Path(filepath).suffix[1:]  # Remove the leading dot
# Test
print(get_extension('data/file.csv')) # csv
print(get_extension('image.png')) # png
Task: Write build_path(folder, name, ext) that creates a complete file path.
Solution
from pathlib import Path
def build_path(folder, name, ext):
    return Path(folder) / f"{name}.{ext}"
# Test
path = build_path('documents', 'report', 'pdf')
print(path) # documents/report.pdf
Task: Write change_ext(filepath, new_ext) that returns path with new extension.
Solution
from pathlib import Path
def change_ext(filepath, new_ext):
    p = Path(filepath)
    # with_suffix() requires a leading dot, so add one if it is missing
    if not new_ext.startswith('.'):
        new_ext = '.' + new_ext
    return p.with_suffix(new_ext)
# Test
print(change_ext('data.txt', 'csv')) # data.csv
Glob Patterns
Glob patterns let you find files whose names match a pattern. Use * for any run of characters within a single name (it does not cross directory separators), ? for exactly one character, and ** for a recursive directory search.
Basic Glob Patterns
from pathlib import Path
# All .txt files in current directory
for f in Path('.').glob('*.txt'):
    print(f)
# All Python files
for f in Path('src').glob('*.py'):
    print(f)
# Single character wildcard
for f in Path('.').glob('file?.txt'):
    print(f)  # file1.txt, fileA.txt, etc.
The * matches any number of characters, including none. The ? matches exactly one character. glob() returns a lazy iterator of matching Path objects, and the results are not guaranteed to be sorted.
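To see these wildcards in action without depending on whatever files happen to be in your directory, the sketch below creates a few empty files in a temporary folder. The file names are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ['file1.txt', 'fileA.txt', 'file10.txt', 'notes.md']:
        (root / name).touch()

    # glob() is lazy and unordered, so sort the names for stable output.
    txt_files = sorted(f.name for f in root.glob('*.txt'))
    single_char = sorted(f.name for f in root.glob('file?.txt'))

print(txt_files)    # ['file1.txt', 'file10.txt', 'fileA.txt']
print(single_char)  # ['file1.txt', 'fileA.txt']
```

Note that file10.txt is skipped by file?.txt because ? stands for exactly one character.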
Recursive Glob
# Find all Python files recursively
for f in Path('project').rglob('*.py'):
    print(f)
# Same as using ** pattern
for f in Path('project').glob('**/*.py'):
    print(f)
# All files in nested directories
for f in Path('.').rglob('*'):
    if f.is_file():
        print(f)
rglob() is recursive glob. It searches the directory and all subdirectories. The ** pattern matches any number of directories.
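The difference between a plain and a recursive search can be checked on a throwaway directory tree. This is a sketch with invented file names, built inside a temporary folder so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg' / 'sub').mkdir(parents=True)
    for rel in ['main.py', 'pkg/util.py', 'pkg/sub/deep.py', 'pkg/readme.md']:
        (root / rel).touch()

    # Non-recursive: only the top level is searched.
    top_level = sorted(f.name for f in root.glob('*.py'))
    # Recursive, written two equivalent ways.
    via_rglob = sorted(f.relative_to(root).as_posix() for f in root.rglob('*.py'))
    via_glob = sorted(f.relative_to(root).as_posix() for f in root.glob('**/*.py'))

print(top_level)              # ['main.py']
print(via_rglob)              # ['main.py', 'pkg/sub/deep.py', 'pkg/util.py']
print(via_rglob == via_glob)  # True
```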
Common Patterns
| Pattern | Matches |
|---|---|
| `*.py` | All Python files in directory |
| `data*.csv` | CSV files starting with "data" |
| `**/*.json` | All JSON files recursively |
| `test_*.py` | All test files |
| `**/test_*.py` | All test files recursively |
| `[0-9]*.txt` | Files starting with a digit |
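Single-name patterns like these can be tried out without touching the filesystem by using the match() method of a pure path. The sketch below checks a few made-up file names; the ** patterns are left out because they describe directory traversal and are better tested with glob() on a real directory tree.

```python
from pathlib import PurePosixPath

# (pattern, candidate file name) pairs; the names are invented examples.
checks = [
    ('*.py', 'main.py'),
    ('data*.csv', 'data_2024.csv'),
    ('test_*.py', 'test_utils.py'),
    ('[0-9]*.txt', '7days.txt'),
    ('[0-9]*.txt', 'notes.txt'),
]

results = [PurePosixPath(name).match(pattern) for pattern, name in checks]

for (pattern, name), matched in zip(checks, results):
    print(f"{pattern!r} vs {name!r}: {matched}")
# Only the last pair fails, because notes.txt does not start with a digit.
```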
Practice: Glob Patterns
Task: Write find_python_files(directory) that returns list of all .py files recursively.
Solution
from pathlib import Path
def find_python_files(directory):
    return list(Path(directory).rglob('*.py'))
# Test
py_files = find_python_files('.')
for f in py_files:
    print(f)
Task: Write find_by_extensions(dir, extensions) that finds files with any of the given extensions.
Solution
from pathlib import Path
def find_by_extensions(directory, extensions):
    d = Path(directory)
    files = []
    # Run one recursive search per extension and collect the results
    for ext in extensions:
        files.extend(d.rglob(f'*{ext}'))
    return files
# Test
images = find_by_extensions('.', ['.png', '.jpg', '.gif'])
Task: Write find_test_files(directory) that finds all files matching test_*.py pattern.
Solution
from pathlib import Path
def find_test_files(directory):
    return list(Path(directory).rglob('test_*.py'))
# Test
tests = find_test_files('.')
print(f"Found {len(tests)} test files")
Task: Write get_dir_size(directory) that returns total size in bytes of all files recursively.
Solution
from pathlib import Path
def get_dir_size(directory):
    total = 0
    for f in Path(directory).rglob('*'):
        # Skip directories; only regular files contribute to the total
        if f.is_file():
            total += f.stat().st_size
    return total
# Test
size = get_dir_size('.')
print(f"Total: {size / 1024:.2f} KB")
Task: Write find_largest(directory, n) that returns the n largest files.
Solution
from pathlib import Path
def find_largest(directory, n=5):
    files = []
    for f in Path(directory).rglob('*'):
        if f.is_file():
            files.append((f, f.stat().st_size))
    # Largest first, then keep the top n
    files.sort(key=lambda x: x[1], reverse=True)
    return files[:n]
# Test
for path, size in find_largest('.', 3):
    print(f"{path}: {size} bytes")
Key Takeaways
Paths Are Objects
Use Path objects instead of strings for cleaner, cross-platform code.
Join with /
Use the / operator to join paths. It handles separators automatically.
Easy Component Access
Use .name, .stem, .suffix, and .parent to access path parts.
Glob for Patterns
Use glob() and rglob() to find files matching patterns.
Recursive with rglob
Use rglob() or ** pattern for recursive directory searches.
Cross-Platform
pathlib picks the right separator on Windows, macOS, and Linux, so path-building code rarely needs platform-specific branches.