Module 4.2

Dictionaries and Sets

Think of a dictionary like a phonebook: you look up a name (key) to find a phone number (value) instantly. Unlike searching through a list one by one, dictionaries give you direct access to any value in constant time. Sets work similarly but only store unique keys, perfect for removing duplicates.

45 min
Intermediate
Hands-on
What You'll Learn
  • Dictionary creation and access patterns
  • Key-value manipulation methods
  • Hash tables and lookup efficiency
  • Set operations (union, intersection, difference)
  • Dictionary and set comprehensions
01

What Are Dictionaries?

Dictionaries are Python's built-in mapping type that store data as key-value pairs. Think of them like a real dictionary where you look up a word (the key) to find its definition (the value). Unlike lists where you access elements by position numbers (0, 1, 2...), dictionaries let you access values using meaningful keys like "name", "age", or "email". This makes your code more readable and lookups extremely fast - effectively constant time, no matter how much data you store!

For Absolute Beginners: Imagine a phonebook - you don't flip through every page to find "John Smith." You jump directly to the "S" section and find John instantly. That's exactly how dictionaries work in Python! The name is the key, the phone number is the value.
Key Concept

The Phonebook Analogy

A dictionary works exactly like a phonebook. You do not read through every entry to find someone. Instead, you jump directly to the name (key) and get their number (value). Python dictionaries use the same principle with hash tables - a special data structure that converts keys into memory addresses for instant access.

Lists (Slow Search)

To find "age", check position 0, then 1, then 2... until found. With 1000 items, might check all 1000!

Time: O(n) - grows with size
Dictionaries (Instant)

Ask for "age", get it immediately. Hash table calculates exact location. Always one step!

Time: O(1) - always constant

Why it matters: Looking up a value in a dictionary takes roughly the same time whether it has 10 items or 10 million items. Lists must scan through elements one by one, getting slower as they grow. For 1 million items, a list might need up to 1 million checks, while a dictionary still needs only a handful of steps.

Real Impact: Searching for a user in 1 million records can take noticeable time with a list, while a dictionary lookup still finishes almost instantly.
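You can get a rough feel for this gap yourself with the standard timeit module (a sketch; the exact numbers depend on your machine, but the dictionary should win by orders of magnitude):

```python
import timeit

# Same 100,000 keys stored two ways
n = 100_000
as_list = list(range(n))
as_dict = {i: True for i in range(n)}

# Worst case for the list: the item we search for is at the very end
list_time = timeit.timeit(lambda: (n - 1) in as_list, number=100)
dict_time = timeit.timeit(lambda: (n - 1) in as_dict, number=100)

print(f"list: {list_time:.4f}s  dict: {dict_time:.6f}s")
```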

Dictionary Structure Visualization

student = {"name": "Alice", "age": 21, "grade": "A"}

    KEY        VALUE
    "name"  -> "Alice"
    "age"   -> 21
    "grade" -> "A"

Each key maps directly to its value. Keys must be unique and immutable (strings, numbers, tuples).

Creating Dictionaries

There are multiple ways to create dictionaries in Python. Each method has its own use case. Let's explore them one by one, starting with the most common approach.

Using Curly Braces: This is the standard way to create dictionaries. Use curly braces { } with key-value pairs separated by colons :.

student = {"name": "Alice", "age": 21, "grade": "A"}
print(student)
# Output: {'name': 'Alice', 'age': 21, 'grade': 'A'}

This creates a dictionary with 3 key-value pairs. The keys ("name", "age", "grade") must be unique and are enclosed in quotes since they're strings. The values ("Alice", 21, "A") can be any data type - strings, numbers, lists, or even other dictionaries.

When to use: This is your default choice - 95% of the time, you'll use this method. It's clear, readable, and works for any key-value combination.

Key Rule: Each key must be in quotes if it's a string. Keys must be unique - if you repeat a key, the last value wins!
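The "last value wins" rule is easy to verify:

```python
# The key "mode" appears twice - the later value silently replaces the earlier one
config = {"mode": "test", "mode": "production"}
print(config)  # {'mode': 'production'}
```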

Using dict() Constructor: Use the dict() function with key=value syntax. Notice: no quotes needed for keys, and use = instead of :.

student = dict(name="Alice", age=21, grade="A")
print(student)
# Output: {'name': 'Alice', 'age': 21, 'grade': 'A'}

The dict() constructor produces the same result as curly braces but uses a cleaner syntax. Notice you write name="Alice" instead of "name": "Alice". The keys are written like variable names without quotes, and you use equals signs instead of colons.

When to use: When all your keys are simple words (letters, numbers, underscores only). Cleaner than curly braces for simple cases.

Limitation: Keys cannot have spaces or special characters. dict(first-name="Alice") raises a SyntaxError because of the hyphen!

From List of Tuples: Convert a list of (key, value) tuples into a dictionary. Very useful when reading data from files or databases.

pairs = [("name", "Alice"), ("age", 21), ("grade", "A")]
student = dict(pairs)
print(student)
# Output: {'name': 'Alice', 'age': 21, 'grade': 'A'}

You can create a dictionary from a list of tuples. Each tuple must have exactly 2 elements - the first becomes the key, the second becomes the value. The dict() function converts this list of pairs into a proper dictionary.

When to use: When your data comes from external sources like CSV files, database queries, or API responses. Many Python functions return data as lists of tuples.

Common Pattern: Read data from file → Get list of (key, value) pairs → Convert to dict for fast lookups!
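A closely related trick: zip() pairs up two parallel lists (say, a CSV header row and a data row) into exactly the tuples dict() expects. The lists here are illustrative:

```python
headers = ["name", "age", "grade"]
row = ["Alice", 21, "A"]

# zip() produces ("name", "Alice"), ("age", 21), ... which dict() consumes
student = dict(zip(headers, row))
print(student)  # {'name': 'Alice', 'age': 21, 'grade': 'A'}
```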

Empty Dictionary: Start with an empty dictionary and add items later. Useful when building dictionaries dynamically in loops.

empty = {}
also_empty = dict()
print(empty)       # Output: {}
print(also_empty)  # Output: {}

empty["name"] = "Alice"
empty["age"] = 21
print(empty)  # Output: {'name': 'Alice', 'age': 21}

You can create an empty dictionary using either {} or dict() - both produce identical results. Use {} for simplicity. Once created, you can add key-value pairs dynamically using the assignment syntax dict[key] = value.

When to use: When you need to build a dictionary gradually - for example, reading user input in a loop, or collecting data from multiple sources.

Important: {} always creates an empty dictionary, never an empty set. If you want an empty set, you must write set()!

Quick Comparison Summary
Method         Syntax            Best For
Curly braces   {"key": value}    Default choice - always works
dict()         dict(key=value)   Simple keys without spaces
From tuples    dict([(k, v)])    Converting existing data
Empty          {}                Building dynamically in loops

Accessing and Modifying Values

Once you have a dictionary, you need to work with the data inside it. Let's learn how to retrieve values, update them, and add new entries.

Accessing Values: Use square brackets [ ] with the key to retrieve its value.

student = {"name": "Alice", "age": 21, "grade": "A"}

print(student["name"])     # Output: Alice
print(student["age"])      # Output: 21

Use square brackets with the key to retrieve its value. When you write student["name"], Python looks up the key "name" and returns its associated value "Alice". The same syntax works for any key - keys can be strings, numbers, or tuples, and values can be any data type including integers, strings, lists, or even other dictionaries.

Think of it like: The key is the question ("What's the name?"), and the value is the answer ("Alice"). You ask with square brackets, Python gives you the answer instantly!

Fast Lookup: This is O(1) constant time - happens instantly even with millions of entries!

Modifying Existing Values: Change a value by assigning a new one to an existing key.

student = {"name": "Alice", "age": 21, "grade": "A"}

student["age"] = 22
print(student["age"])      # Output: 22
print(student)
# Output: {'name': 'Alice', 'age': 22, 'grade': 'A'}

To update an existing value, use the same assignment syntax dict[key] = new_value. Since the key "age" already exists in the dictionary, this replaces the old value (21) with the new value (22). The other key-value pairs remain unchanged.

Key Point: If the key already exists in the dictionary, assigning to it updates the value. The old value is replaced completely.

Safe Operation: Updating values never raises errors - if the key exists, it works!

Adding New Key-Value Pairs: Create new entries by assigning to a key that doesn't exist yet.

student = {"name": "Alice", "age": 22, "grade": "A"}

student["major"] = "Computer Science"
print(student)
# Output: {'name': 'Alice', 'age': 22, 'grade': 'A', 'major': 'Computer Science'}

student["email"] = "alice@university.edu"
student["gpa"] = 3.8
print(student)
# Output: {'name': 'Alice', 'age': 22, 'grade': 'A', 'major': 'Computer Science', 
#          'email': 'alice@university.edu', 'gpa': 3.8}

When you assign to a key that doesn't exist yet, Python automatically creates a new key-value pair. The dictionary started with 3 items, then we added "major", "email", and "gpa" to reach 6 items total. Dictionaries grow dynamically without requiring size declarations or manual resizing.

Smart Behavior: Python automatically detects if you're updating an existing key or creating a new one. Same syntax dict[key] = value does both!

No Limits: Dictionaries grow automatically as you add items. Start with 3 items, add 1000 more - it just works!

KeyError Warning: Trying to access a key that doesn't exist will crash your program with a KeyError. For example: print(student["phone"]) would fail if there's no "phone" key. Use the get() method for safe access!
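If you prefer square brackets, you can also catch the crash explicitly with try/except - a minimal sketch:

```python
student = {"name": "Alice", "age": 21}

try:
    print(student["phone"])  # key does not exist
except KeyError:
    print("No 'phone' key in this dictionary")
# Output: No 'phone' key in this dictionary
```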

Safe Access with get(): The get() method retrieves values without crashing if the key is missing.

student = {"name": "Alice", "age": 21}

email = student.get("email")
print(email)  # Output: None

The get() method safely retrieves values without crashing. When you call student.get("email") and the key doesn't exist, it returns None (Python's "nothing" value) instead of raising a KeyError. This prevents your program from crashing when accessing potentially missing keys.

Best Practice: Use get() when you're not 100% sure a key exists. Use square brackets [ ] only when you're certain the key is there.

Providing Default Values: Use get() with a second parameter to return a custom default instead of None.

student = {"name": "Alice", "age": 21}

email = student.get("email", "N/A")
print(email)  # Output: N/A

phone = student.get("phone", "No phone number")
print(phone)  # Output: No phone number

name = student.get("name", "Unknown")
print(name)  # Output: Alice

The get() method accepts an optional second parameter - the default value to return if the key doesn't exist. When we call student.get("email", "N/A"), it returns "N/A" because "email" is missing. However, when the key exists (like "name"), get() returns the actual value ("Alice") and ignores the default.

When to use defaults: When displaying data to users (show "Not specified" instead of None), or when you want a sensible fallback value (0 for counts, empty list for collections).

Smart Behavior: The default is only used if the key is missing. If the key exists, you get the real value!
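A classic use of get() with a default of 0 is counting - no key ever needs to exist in advance:

```python
# Tally votes: get(name, 0) returns 0 the first time a name appears
votes = ["alice", "bob", "alice", "carol", "alice", "bob"]
tally = {}
for name in votes:
    tally[name] = tally.get(name, 0) + 1

print(tally)  # {'alice': 3, 'bob': 2, 'carol': 1}
```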

Checking if a Key Exists: Use the in operator to test for key existence before accessing.

student = {"name": "Alice", "age": 21}

if "age" in student:
    print(f"Age: {student['age']}")
    # Output: Age: 21

if "email" in student:
    print(student["email"])
else:
    print("Email not found")
    # Output: Email not found

if "phone" not in student:
    print("No phone number on file")
    # Output: No phone number on file

Use the in operator to check if a key exists before accessing it. The expression "age" in student returns True if the key exists, False otherwise. This allows you to execute different code paths based on key existence. You can also use not in to check if a key is missing.

When to use in: When you need to do different things based on whether a key exists (show error message, use alternative data source, etc.).

Common Pattern: Use get() when you just need the value with a fallback. Use in when you need to execute different code paths.

Access Methods Comparison
Method              Example                 When Key Missing        Best Use
dict[key]           student["age"]          Crashes (KeyError)      When you're certain the key exists
get(key)            student.get("age")      Returns None            Safe access, can work with None
get(key, default)   student.get("age", 0)   Returns default value   Need a specific fallback value
key in dict         "age" in student        Returns False           Different logic for exists/missing

Practice: Dictionary Basics

Task: Create a dictionary called "inventory" with three products as keys and their quantities as values. Print the quantity of the second product.

Solution:
# Create inventory dictionary
inventory = {
    "apples": 50,
    "bananas": 30,
    "oranges": 45
}

# Print quantity of bananas
print(inventory["bananas"])  # Output: 30

Task: Create a contacts dictionary where each contact name maps to another dictionary containing "phone" and "email". Add two contacts and print the email of the first contact.

Solution:
# Nested dictionary for contacts
contacts = {
    "Alice": {"phone": "555-1234", "email": "alice@mail.com"},
    "Bob": {"phone": "555-5678", "email": "bob@mail.com"}
}

# Access nested value
print(contacts["Alice"]["email"])  # Output: alice@mail.com

Task: Create a user profile dictionary. Write code that safely accesses "bio" (which does not exist) using get() with a default message. Then check if "username" exists before printing it.

Solution:
# User profile dictionary
profile = {"username": "coder42", "email": "coder@dev.io"}

# Safe access with default value
bio = profile.get("bio", "No bio provided")
print(bio)  # Output: No bio provided

# Check existence before access
if "username" in profile:
    print(f"User: {profile['username']}")  # Output: User: coder42
02

Dictionary Methods

Python dictionaries come with powerful built-in methods for adding, removing, and iterating over data. Think of these methods as special tools that make working with dictionaries easier and safer. Instead of writing complex code with loops and if-statements, you can use methods like .get(), .update(), and .items() to accomplish tasks in just one line. Mastering these methods will make your code more efficient, readable, and "Pythonic" (the Python way of doing things!).

Beginner's Mental Model: Methods are like buttons on a remote control - each button does a specific job. Instead of manually doing complex operations, just press the right button (call the right method)! For example, dict.get(key) is the "safe lookup" button that won't crash if the key doesn't exist.
Method                 Purpose                   Returns                 Modifies Dict?
get(key, default)      Safe value access         Value or default        No
keys()                 Get all keys              dict_keys view          No
values()               Get all values            dict_values view        No
items()                Get key-value pairs       dict_items view         No
pop(key)               Remove and return value   Removed value           Yes
update(dict2)          Merge dictionaries        None                    Yes
setdefault(key, val)   Get or set if missing     Existing or new value   Maybe

Iterating Over Dictionaries

You can iterate over keys, values, or both using the keys(), values(), and items() methods. The items() method is most commonly used as it provides access to both.

student = {"name": "Alice", "age": 21, "grade": "A"}

# Iterate over keys (default behavior)
for key in student:
    print(key)  # name, age, grade

# Iterate over values
for value in student.values():
    print(value)  # Alice, 21, A

# Iterate over key-value pairs (most useful)
for key, value in student.items():
    print(f"{key}: {value}")

The items() method returns tuples that can be unpacked directly in the for loop. This is the preferred way to iterate when you need both keys and values.

Adding and Removing Elements

Use pop() to remove and return a value, del to remove without returning, and update() to merge dictionaries together.

student = {"name": "Alice", "age": 21, "grade": "A"}

# Remove with pop() - returns the removed value
age = student.pop("age")
print(age)     # Output: 21
print(student) # {'name': 'Alice', 'grade': 'A'}

# Remove with del - no return value
del student["grade"]
print(student) # {'name': 'Alice'}

# Update with another dictionary
student.update({"age": 22, "major": "CS"})
print(student) # {'name': 'Alice', 'age': 22, 'major': 'CS'}

Use pop() when you need the removed value. Use del when you just want to remove. The update() method overwrites existing keys and adds new ones.
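Like get(), pop() accepts an optional second argument, so a missing key returns that default instead of raising KeyError:

```python
student = {"name": "Alice"}

# Safe pop: no KeyError even though "age" is missing
age = student.pop("age", None)
print(age)      # None
print(student)  # {'name': 'Alice'}
```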

The setdefault() Method

The setdefault() method returns the value of a key if it exists. If not, it inserts the key with a specified default value and returns that value. This is useful for building dictionaries incrementally.

# Count word occurrences using setdefault
text = "apple banana apple cherry banana apple"
word_count = {}

for word in text.split():
    word_count.setdefault(word, 0)
    word_count[word] += 1

print(word_count)
# {'apple': 3, 'banana': 2, 'cherry': 1}

setdefault() eliminates the need to check if a key exists before using it. For counting, consider using collections.Counter which is even more convenient.
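For comparison, here is the same word count using the collections.Counter class mentioned above:

```python
from collections import Counter

text = "apple banana apple cherry banana apple"
word_count = Counter(text.split())

print(word_count)                 # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
print(word_count.most_common(1))  # [('apple', 3)]
```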

Practice: Dictionary Methods

Task: Given a dictionary of country capitals, print all country names on one line and all capital cities on another line.

Solution:
capitals = {"USA": "Washington", "UK": "London", "Japan": "Tokyo"}

# Print all keys (countries)
print("Countries:", list(capitals.keys()))
# Output: Countries: ['USA', 'UK', 'Japan']

# Print all values (capitals)
print("Capitals:", list(capitals.values()))
# Output: Capitals: ['Washington', 'London', 'Tokyo']

Task: Create two price dictionaries for different stores. Merge them so that if a product exists in both, you keep the lower price. Print the final merged dictionary.

Solution:
store_a = {"apple": 1.50, "banana": 0.75, "milk": 3.00}
store_b = {"apple": 1.25, "bread": 2.50, "milk": 3.25}

# Start with store_a prices
best_prices = store_a.copy()

# Update with store_b, keeping lower prices
for item, price in store_b.items():
    if item not in best_prices or price < best_prices[item]:
        best_prices[item] = price

print(best_prices)
# {'apple': 1.25, 'banana': 0.75, 'milk': 3.00, 'bread': 2.50}

Task: Given a list of fruits, create a dictionary that groups them by their first letter. Use setdefault() to build the groups. Each key should map to a list of fruits starting with that letter.

Solution:
fruits = ["apple", "apricot", "banana", "blueberry", "cherry", "coconut"]
grouped = {}

for fruit in fruits:
    first_letter = fruit[0].upper()
    grouped.setdefault(first_letter, []).append(fruit)

print(grouped)
# {'A': ['apple', 'apricot'], 'B': ['banana', 'blueberry'], 
#  'C': ['cherry', 'coconut']}
03

Hash Tables Explained

Behind every Python dictionary is a hash table - the "magic" that makes dictionaries so fast. A hash table is like an automated filing system that calculates exactly where to store and find each item. When you give it a key like "name", it runs a mathematical function (called a hash function) that converts "name" into a number, then uses that number to find the exact storage location instantly. Understanding this concept helps you write more efficient code and explains the important rule: dictionary keys must be immutable (unchangeable) because changing a key would change its hash, making it impossible to find!

Simplified Explanation: Imagine a library where books are organized not by shelves you browse, but by a magical system. Tell the librarian the book title, and they instantly calculate which drawer it's in (using math), open that exact drawer, and hand you the book. That calculation is "hashing" - it converts your request into a specific location. This is why you can't change a book's title after filing it - the system wouldn't know where to find it anymore!
How Hash Tables Work (Simplified)

1. Key input:      "name"
2. Hash function:  hash("name") = 7263891245
3. Array index:    7263891245 % 8 = slot 5

Slots: [0: empty] [1: empty] [2: empty] [3: empty] [4: empty] [5: "Alice"] [6: empty] [7: empty]

The hash function converts keys into array indices for O(1) direct access. No searching needed!
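You can peek at the first two steps with the built-in hash() function. This is only a sketch - CPython's real table sizing and collision handling are more involved, and the 8-slot table is just for illustration:

```python
# Small non-negative integers hash to themselves, so the slot math is easy to follow
print(hash(42))      # 42
print(hash(42) % 8)  # 2 -> would land in slot 2 of an 8-slot table

# Tuples are hashable too (their hash combines the elements' hashes)
print(hash((1, 2)) % 8)

# String hashes are randomized per interpreter run (a security feature),
# so this prints a different slot each time you restart Python
print(hash("name") % 8)
```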

O(1) Lookup Time

Hash tables provide constant-time access. Looking up a key in a dictionary with 1 million items takes the same time as one with 10 items.

Immutable Keys Only

Keys must be hashable (immutable): strings, numbers, tuples. Lists and dicts cannot be keys because their hash would change if modified.

What Can Be a Dictionary Key?

Only hashable (immutable) objects can be dictionary keys. This includes strings, numbers, tuples, and frozensets. Lists and dictionaries cannot be keys.

# Valid keys (immutable/hashable)
valid = {
    "string_key": 1,      # String - OK
    42: "number key",     # Integer - OK
    3.14: "float key",    # Float - OK
    (1, 2): "tuple key",  # Tuple - OK (if contents hashable)
    True: "bool key"      # Boolean - OK
}

# Invalid keys (mutable)
# invalid = {[1, 2]: "list key"}  # TypeError!
# invalid = {{"a": 1}: "dict key"}  # TypeError!

If you need a list-like key, convert it to a tuple first. Tuples are immutable and hashable as long as all their elements are hashable.

Pro Tip: Use hash(obj) to check if an object is hashable. If it raises TypeError, it cannot be a dictionary key.
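A practical use of tuple keys is a sparse grid - for example, this tiny board sketch (the variable names are illustrative):

```python
# (row, col) tuples as keys: only occupied squares are stored
board = {}
board[(0, 0)] = "X"
board[(1, 2)] = "O"

print(board.get((0, 0)))  # X
print(board.get((2, 2)))  # None - empty square

# A list key would fail:
# board[[0, 0]] = "X"  # TypeError: unhashable type: 'list'
```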
04

Sets and Operations

Sets are unordered collections of unique elements - no duplicates allowed! Think of a set as a bag where you can only put one copy of each item. If you try to add something that's already there, the set ignores it. Sets use the same lightning-fast hash table mechanism as dictionaries, but simpler: they only store values (no key-value pairs). This makes sets perfect for three main tasks: (1) removing duplicates from data, (2) checking if something exists (super fast!), and (3) performing mathematical operations like finding what's common between two groups or what's unique to each.

Real-World Analogy: Think of a set like a club membership list. Each person can only be on the list once (no duplicates). You can quickly check if someone is a member (membership test), combine two club lists and remove duplicates (union), or find who's in both clubs (intersection). That's exactly what Python sets do with data!

Creating Sets

Create sets using curly braces with just values (no colons) or the set() constructor. Note that empty curly braces create a dict, not a set.

# Create a set with curly braces
fruits = {"apple", "banana", "cherry"}

# Create from a list (removes duplicates)
numbers = set([1, 2, 2, 3, 3, 3])
print(numbers)  # {1, 2, 3}

# Empty set (must use set(), not {})
empty = set()

# Convert string to set of characters
chars = set("hello")
print(chars)  # e.g. {'h', 'e', 'l', 'o'} - order may vary

Sets automatically remove duplicates. Use set() to quickly deduplicate any iterable like lists or strings.

Set Operations

Sets support mathematical operations like union, intersection, difference, and symmetric difference. These operations are useful for comparing collections and finding commonalities.

Set Operations Comparison
  • Union (a | b): all unique elements from both sets
  • Intersection (a & b): only the common elements
  • Difference (a - b): elements in a but not in b
  • Symmetric difference (a ^ b): elements in either set, but not both
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Union: all unique elements from both
print(a | b)        # {1, 2, 3, 4, 5, 6}
print(a.union(b))   # Same result

# Intersection: elements in both
print(a & b)             # {3, 4}
print(a.intersection(b)) # Same result

# Difference: elements in a but not b
print(a - b)            # {1, 2}
print(a.difference(b))  # Same result

# Symmetric difference: in either, not both
print(a ^ b)                      # {1, 2, 5, 6}
print(a.symmetric_difference(b))  # Same result

Operator syntax (|, &, -, ^) is concise. Method syntax (.union(), etc.) accepts any iterable, not just sets.
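The difference is easy to demonstrate - methods happily take a list, operators do not:

```python
a = {1, 2, 3, 4}

# Methods accept any iterable, such as a list
print(a.union([4, 5, 6]))         # {1, 2, 3, 4, 5, 6}
print(a.intersection([3, 4, 9]))  # {3, 4}

# Operators require both operands to be sets:
# a | [4, 5, 6]  # TypeError: unsupported operand type(s) for |
```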

Set Methods

Sets provide methods for adding, removing, and testing elements. Like dictionary keys, set elements must be hashable.

colors = {"red", "green", "blue"}

# Add single element
colors.add("yellow")

# Add multiple elements
colors.update(["purple", "orange"])

# Remove element (raises error if missing)
colors.remove("red")

# Remove element (no error if missing)
colors.discard("pink")  # No error

# Check membership (O(1) fast!)
print("green" in colors)  # True

Use discard() instead of remove() when you are not sure if the element exists. Membership testing with "in" is O(1) for sets versus O(n) for lists.

Practice: Sets

Task: Given a list with duplicate values, convert it to a set to remove duplicates, then convert back to a sorted list.

Solution:
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Remove duplicates using set, then sort
unique = sorted(set(numbers))
print(unique)  # [1, 2, 3, 4, 5, 6, 9]

Task: Create two sets of friends for two users. Find common friends, friends unique to user1, and all friends combined.

Solution:
user1_friends = {"Alice", "Bob", "Charlie", "Diana"}
user2_friends = {"Bob", "Diana", "Eve", "Frank"}

# Common friends
common = user1_friends & user2_friends
print(f"Common: {common}")  # {'Bob', 'Diana'} (order may vary)

# Friends unique to user1
only_user1 = user1_friends - user2_friends
print(f"Only user1: {only_user1}")  # {'Alice', 'Charlie'} (order may vary)

# All friends combined
all_friends = user1_friends | user2_friends
print(f"All: {all_friends}")  # {'Alice', 'Bob', ...}

Task: Given three document strings, find words that appear in exactly one document (not shared with any other). Convert to lowercase and split by spaces.

Solution:
doc1 = "python is great for data science"
doc2 = "java is popular for enterprise"
doc3 = "python and java are programming languages"

# Convert to sets of words
words1 = set(doc1.lower().split())
words2 = set(doc2.lower().split())
words3 = set(doc3.lower().split())

# Find words unique to each document
unique1 = words1 - words2 - words3
unique2 = words2 - words1 - words3
unique3 = words3 - words1 - words2

# All unique words (in exactly one doc)
all_unique = unique1 | unique2 | unique3
print(all_unique)
# {'great', 'data', 'science', 'popular', 'enterprise', ...}
05

Dictionary and Set Comprehensions

Comprehensions provide a concise, one-line way to create dictionaries and sets from existing data. Instead of writing 4-5 lines with a loop, if-statement, and append, you write one elegant line that does everything! They follow the same pattern as list comprehensions but use curly braces { } instead of square brackets. For dictionaries, you include a colon to separate keys and values: {key: value for item in data}. For sets, just values: {value for item in data}. Think of comprehensions as "recipe cards" - a compact formula for transforming data.

Why Use Comprehensions?
  • Shorter: 1 line instead of 4-5 lines with loops
  • Faster: usually quicker than the equivalent hand-written loop
  • Clearer: Express "what you want" not "how to build it"
  • Pythonic: This is the Python way - other Python developers expect and prefer it!

Beginner Tip: Start by writing a regular loop, then convert it to a comprehension once you understand the pattern. It's like learning to tie shoes - start slow, then it becomes second nature!
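Here is what that conversion looks like in practice - the loop and the comprehension below build the same dictionary:

```python
words = ["apple", "fig", "banana"]

# Loop version: create an empty dict, then fill it step by step
lengths = {}
for w in words:
    lengths[w] = len(w)
print(lengths)  # {'apple': 5, 'fig': 3, 'banana': 6}

# Comprehension version: the same mapping in one line
lengths = {w: len(w) for w in words}
print(lengths)  # {'apple': 5, 'fig': 3, 'banana': 6}
```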

Dictionary Comprehensions

Create dictionaries in one line using the pattern {key: value for item in iterable}. You can also add conditions to filter which items to include.

# Basic dictionary comprehension
squares = {x: x**2 for x in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# With condition (filter)
even_squares = {x: x**2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)  # {2: 4, 4: 16, 6: 36, 8: 64, 10: 100}

# Transform existing dictionary
prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}
discounted = {k: v * 0.9 for k, v in prices.items()}
print(discounted)  # 10% off all prices

Dictionary comprehensions are more readable and faster than building dictionaries with loops. Use them whenever you are transforming or filtering data into a dictionary.

Set Comprehensions

Set comprehensions work like list comprehensions but produce a set (no duplicates). Use curly braces without colons.

# Basic set comprehension
squares_set = {x**2 for x in range(-3, 4)}
print(squares_set)  # {0, 1, 4, 9} - duplicates removed!

# Extract unique first letters
names = ["Alice", "Bob", "Anna", "Charlie", "Carol"]
initials = {name[0] for name in names}
print(initials)  # {'A', 'B', 'C'}

# Filter with condition
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

Set comprehensions automatically deduplicate results. They are useful for extracting unique values from data while applying transformations.

Practical Examples

Here are real-world examples showing how comprehensions make code cleaner and more expressive.

# Swap keys and values
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Count character frequencies
text = "mississippi"
freq = {char: text.count(char) for char in set(text)}
print(freq)  # {'m': 1, 'i': 4, 's': 4, 'p': 2} (order may vary)

# Filter dictionary by value
scores = {"Alice": 85, "Bob": 72, "Carol": 90, "Dave": 65}
passed = {k: v for k, v in scores.items() if v >= 75}
print(passed)  # {'Alice': 85, 'Carol': 90}

The swapped dictionary trick only works if all values are unique. Character frequency counting uses set() to avoid recounting the same character.
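To see why duplicate values break the swap, watch one key disappear:

```python
# Two students share the grade "A" - after swapping, only one survives
grades = {"Alice": "A", "Bob": "B", "Carol": "A"}
swapped = {v: k for k, v in grades.items()}
print(swapped)  # {'A': 'Carol', 'B': 'Bob'} - Alice was silently overwritten!
```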

Practice: Comprehensions

Task: Using a dictionary comprehension, create a dictionary where keys are numbers 1-5 and values are their cubes.

Solution:
# Dictionary of cubes
cubes = {n: n**3 for n in range(1, 6)}
print(cubes)  # {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}

Task: Given a dictionary of category counts, use a comprehension to convert values to percentages of the total. Round to 1 decimal place.

Solution:
counts = {"python": 150, "java": 100, "js": 75, "cpp": 25}
total = sum(counts.values())

# Convert to percentages
percentages = {k: round(v/total * 100, 1) for k, v in counts.items()}
print(percentages)
# {'python': 42.9, 'java': 28.6, 'js': 21.4, 'cpp': 7.1}

Task: Given a sentence, create a dictionary where keys are unique word lengths and values are sets of words with that length. Use comprehensions for both the outer dict and inner sets.

Solution:
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()

# Get unique lengths first
lengths = {len(w) for w in words}

# Build the index
length_index = {
    length: {w for w in words if len(w) == length}
    for length in lengths
}

print(length_index)
# {3: {'the', 'fox', 'dog'}, 5: {'quick', 'brown', 'jumps'}, 
#  4: {'over', 'lazy'}}
06

Advanced Dictionary Patterns

Master advanced dictionary techniques that professional Python developers use every day. This section covers nested dictionaries (dictionaries inside dictionaries - like folders within folders), defaultdict (automatically creates missing keys so you never get KeyError), Counter (counts how many times each item appears), and powerful real-world patterns for processing data, speeding up your code with caching, and managing application settings. These aren't just "fancy tricks" - they're essential tools that will make you a more productive programmer!

From Beginner to Professional: The difference between beginner and professional Python code is often in these techniques. Instead of writing 50 lines with loops and if-statements, professionals write 5 lines using defaultdict or Counter. This section bridges that gap!
What You'll Learn:
  • Nested Dicts: Store complex, hierarchical data (like JSON from web APIs)
  • defaultdict: Never worry about "does this key exist?" again
  • Counter: Count anything in one line (word frequency, vote tallies, etc.)
  • Real Patterns: Industry-proven techniques used in production code

Nested Dictionaries

Dictionaries can contain other dictionaries as values, creating hierarchical (multi-level) data structures. Think of it like folders within folders on your computer - the main dictionary is the outer folder, and each value can be another folder (dictionary) with its own contents. This pattern is perfect for storing complex, organized data like user profiles, configuration settings, or data from web APIs (JSON format).

Real-World Analogy:

Imagine a filing cabinet (main dictionary) where each drawer is labeled with a person's name. Inside each drawer is a folder (nested dictionary) with their personal information. Inside that folder are more folders (deeper nesting) for specific categories like "settings" or "preferences". You navigate: Cabinet → Person's Drawer → Information Folder → Specific Detail.

Creating Nested Dictionaries: Build multi-level structures by using dictionaries as values within a parent dictionary.

users = {
    "alice": {
        "age": 28,
        "email": "alice@example.com",
        "settings": {
            "theme": "dark",
            "notifications": True
        }
    },
    "bob": {
        "age": 35,
        "email": "bob@example.com",
        "settings": {
            "theme": "light",
            "notifications": False
        }
    }
}

print(users)

This creates a 3-level hierarchy: Level 1 - usernames ("alice", "bob"), Level 2 - user properties ("age", "email", "settings"), Level 3 - settings details ("theme", "notifications"). Each username maps to a dictionary, and the "settings" key maps to yet another dictionary. It's like JSON data you'd get from a web API!

Key Insight: Values in dictionaries can be ANY type - including other dictionaries. You can nest as deep as needed (dictionaries inside dictionaries inside dictionaries...).

Accessing Nested Values: Use multiple square brackets to "drill down" through the levels.

print(users["alice"]["email"])
# Output: alice@example.com

print(users["alice"]["settings"]["theme"])
# Output: dark

To access nested data, chain square brackets together. users["alice"] gets Alice's dictionary, then ["email"] gets her email from that dictionary. For deeper nesting like users["alice"]["settings"]["theme"], Python first gets users["alice"] (returns a dict), then gets ["settings"] from that dict (returns another dict), then finally gets ["theme"] from that innermost dict.

Think Step-by-Step: Each [key] returns a value, which becomes the object for the next [key]. It's like following a path: Start at users → Go to "alice" → Go to "settings" → Get "theme".

Modifying Nested Values: Change values at any level using the same chained square bracket syntax.

users["bob"]["settings"]["notifications"] = True

print(users["bob"]["settings"]["notifications"])
# Output: True

You can update values at any nesting level using assignment. Here we navigate to Bob's settings dictionary and change the "notifications" value from False to True. The syntax is the same as accessing - just add = new_value at the end.

Important: Each level must exist or you'll get a KeyError. You can't do users["charlie"]["email"] = "new" if "charlie" doesn't exist yet - create the outer level first!
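One safe way to create the outer level first is setdefault, sketched here with a made-up user "charlie":

```python
users = {}

# Direct nested assignment users["charlie"]["email"] = ... would raise KeyError
# because "charlie" doesn't exist yet. setdefault returns the existing inner
# dict, or inserts an empty one and returns it.
profile = users.setdefault("charlie", {})
profile["email"] = "charlie@example.com"

print(users)
# Output: {'charlie': {'email': 'charlie@example.com'}}
```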

Iterating Through Nested Structures: Loop through the outer dictionary and access nested values inside the loop.

for username, profile in users.items():
    email = profile['email']
    theme = profile['settings']['theme']
    print(f"{username}: {email}, Theme: {theme}")

# Output:
# alice: alice@example.com, Theme: dark
# bob: bob@example.com, Theme: light

When iterating with .items(), you get the key (username) and value (the entire profile dictionary). Inside the loop, profile is a dictionary, so you can access its keys normally. This pattern is common for processing JSON API responses or configuration files where you need to examine each record.

Common Use Cases: Configuration files (dev/staging/production settings), API responses (user data from servers), database records (customer profiles), game data (player stats and inventory), organizational charts (departments and employees).

Safe Access with Nested get(): Prevent crashes when navigating uncertain nested structures.

# Safe nested access
theme = users.get("alice", {}).get("settings", {}).get("theme", "default")
print(theme)  # Output: dark

# If user doesn't exist
theme = users.get("charlie", {}).get("settings", {}).get("theme", "default")
print(theme)  # Output: default (no crash!)

Chaining .get() methods is the safe way to access nested dictionaries when you're not sure all keys exist. Each .get(key, {}) returns an empty dictionary if the key is missing, so the next .get() has something to work with. The final .get() returns your default value if the key isn't found. This prevents KeyError crashes when working with incomplete or varying data structures.

Professional Pattern: Always use chained .get() when processing external data (APIs, user input, config files) where structure isn't guaranteed. It's defensive programming - expect the unexpected!

defaultdict for Auto-Initialization

The collections.defaultdict is a special type of dictionary that automatically creates default values for keys that don't exist yet. With regular dictionaries, accessing or modifying a non-existent key raises a KeyError and crashes your program. With defaultdict, Python automatically creates the missing key with a default value you specify. This eliminates the need for tedious if key in dict checks before every operation!

The Problem defaultdict Solves:

Regular dict (manual checking required):

counts = {}
if 'apple' not in counts:
    counts['apple'] = 0
counts['apple'] += 1  # Too much code!

With defaultdict (automatic!):

counts = defaultdict(int)
counts['apple'] += 1  # Just works!

Basic Setup - Import and Create: Import from collections module and specify what default value to use.

from collections import defaultdict

freq = defaultdict(int)
print(freq)
# Output: defaultdict(<class 'int'>, {})

You must import defaultdict from Python's built-in collections module. When creating a defaultdict, you pass a "factory function" that tells Python what default value to create for missing keys. Here we use int, which creates 0 as the default. Other common factories: list (creates []), set (creates set()), dict (creates {}).

Key Concept: The argument (int, list, etc.) is a function that gets called whenever you access a missing key. It's not the value itself - it's a recipe for creating the value!
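To see that the factory is a callable rather than a value, here is a small sketch using a lambda to supply a custom default (the "unknown" status string is just an illustration):

```python
from collections import defaultdict

# The lambda is called with no arguments each time a missing key is accessed.
statuses = defaultdict(lambda: "unknown")
statuses["job-1"] = "done"

print(statuses["job-1"])  # existing key, prints: done
print(statuses["job-2"])  # missing key - factory runs, prints: unknown
```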

Counting with defaultdict(int): Count occurrences without checking if keys exist first.

from collections import defaultdict

text = "apple banana apple cherry banana apple"
freq = defaultdict(int)

for word in text.split():
    freq[word] += 1

print(dict(freq))
# Output: {'apple': 3, 'banana': 2, 'cherry': 1}

This is the classic counting use case. defaultdict(int) means missing keys automatically get the value 0. When we do freq[word] += 1, if the word hasn't been seen before, Python creates freq[word] = 0 automatically, then adds 1 to make it 1. If the word exists, it just increments normally. The loop processes: "apple" (0+1=1), "banana" (0+1=1), "apple" (1+1=2), "cherry" (0+1=1), "banana" (1+1=2), "apple" (2+1=3).

Why This Matters: Without defaultdict, you'd need to check if word in freq before every increment. With 1000 words, that's 1000 unnecessary checks! defaultdict makes code cleaner and faster.

Grouping with defaultdict(list): Collect items into categories without initializing empty lists first.

from collections import defaultdict

items = [
    ("apple", "fruit"),
    ("carrot", "vegetable"),
    ("banana", "fruit"),
    ("broccoli", "vegetable")
]

groups = defaultdict(list)

for item, category in items:
    groups[category].append(item)

print(dict(groups))
# Output: {'fruit': ['apple', 'banana'], 'vegetable': ['carrot', 'broccoli']}

defaultdict(list) automatically creates an empty list [] for any missing key. When we encounter "fruit" for the first time with "apple", Python creates groups["fruit"] = [], then appends "apple" to that list. The second fruit "banana" gets appended to the existing list. Same process happens independently for "vegetable". This pattern is extremely common for organizing data by category, grouping database results, or processing logs.

Real-World Use: Group students by grade level, organize products by department, categorize emails by sender, collect errors by type, aggregate sales by region - any time you're building lists grouped by some key.

Advanced: Nested defaultdict: Create multi-dimensional structures that auto-initialize at every level.

from collections import defaultdict

matrix = defaultdict(lambda: defaultdict(int))

matrix[0][0] = 1
matrix[0][1] = 2
matrix[5][5] = 99

print(matrix[5][5])  # Output: 99
print(matrix[3][3])  # Output: 0
print(matrix[10][20])  # Output: 0 (auto-created!)

This creates a 2D grid (matrix) that automatically initializes any coordinate you access. defaultdict(lambda: defaultdict(int)) means: "When a key is missing, create a new defaultdict(int)". So matrix[5] doesn't exist initially - Python creates it as defaultdict(int). Then [5] doesn't exist in that nested dict - Python creates it with default value 0. When we assign 99, it overwrites that 0. Accessing matrix[3][3] returns 0 because both levels auto-initialize.

When to Use: Sparse matrices (mostly zeros), grid-based games (Minecraft-like blocks), graph adjacency lists, multi-level categorization (state → city → neighborhood), or any scenario where you don't want to pre-populate all possible keys.

Common Factory Functions:
  • defaultdict(int) - Creates 0 for missing keys (counting, tallying)
  • defaultdict(list) - Creates [] for missing keys (grouping, collecting)
  • defaultdict(set) - Creates set() for missing keys (unique items per category)
  • defaultdict(dict) - Creates {} for missing keys (nested structures)
  • defaultdict(lambda: default_value) - Creates custom default (any value you want)
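As a quick sketch of the set factory from the list above (the user/tag events are invented sample data):

```python
from collections import defaultdict

# defaultdict(set) creates an empty set for each new key,
# so duplicate tags are discarded automatically.
tags_by_user = defaultdict(set)

events = [("alice", "python"), ("alice", "python"), ("alice", "sql"), ("bob", "go")]
for user, tag in events:
    tags_by_user[user].add(tag)

# Sort for a stable display (set iteration order varies between runs).
print({user: sorted(tags) for user, tags in tags_by_user.items()})
# Output: {'alice': ['python', 'sql'], 'bob': ['go']}
```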

Counter for Frequency Counting

Counter is a specialized dictionary subclass designed specifically for counting hashable objects (strings, numbers, tuples). While you could use defaultdict(int) for counting, Counter provides powerful built-in methods for tallying, ranking, and arithmetic operations on counts. It's the professional tool for frequency analysis!

What Makes Counter Special:

Regular dict or defaultdict can count, but Counter adds superpowers:

  • most_common(n) - Instantly find top N items
  • Arithmetic operations - Add/subtract counters like numbers
  • Missing keys return 0 (not KeyError) - safe to use anywhere
  • Built-in methods optimized for frequency analysis
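A quick illustration of that last safety property - missing keys report 0 without raising, and (unlike defaultdict) without creating the key:

```python
from collections import Counter

counts = Counter("mississippi")

# A plain dict would raise KeyError here; Counter just reports zero.
print(counts["z"])  # Output: 0
print(counts["i"])  # Output: 4

# The lookup did NOT add 'z' to the counter.
print("z" in counts)  # Output: False
```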

Basic Letter Frequency Counting: Count how many times each character appears in text.

from collections import Counter

text = "mississippi"
counts = Counter(text)

print(counts)
# Output: Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

Import Counter from the collections module. When you pass a string to Counter(), it iterates through each character and counts occurrences automatically. The output shows a dictionary-like object where keys are characters and values are their counts. Notice the printed representation is sorted by frequency (highest first: 'i' and 's' both appear 4 times, then 'p' with 2, finally 'm' with 1) - the display is sorted, though Counter itself preserves insertion order like any dict.

Use Cases: Analyze password strength (character distribution), detect repeated letters in words, cryptography frequency analysis, text compression preparation.

Finding Most Common Items: Get the top N most frequently occurring items with counts.

from collections import Counter

text = "mississippi"
counts = Counter(text)

top_2 = counts.most_common(2)
print(top_2)
# Output: [('i', 4), ('s', 4)]

all_items = counts.most_common()
print(all_items)
# Output: [('i', 4), ('s', 4), ('p', 2), ('m', 1)]

The most_common(n) method returns a list of tuples showing the top N items and their counts, sorted from highest to lowest frequency. Each tuple is (item, count). If you call most_common() without an argument, it returns ALL items sorted by frequency. This is incredibly useful for leaderboards, finding trending topics, identifying common errors in logs, or analyzing survey results.

Real Example: In a voting system with thousands of votes, vote_count.most_common(1)[0] instantly gives you the winner without manually sorting or comparing counts!

Counter Arithmetic - Adding: Combine multiple counters by adding their frequencies together.

from collections import Counter

counter1 = Counter("aabbcc")
counter2 = Counter("abc")

combined = counter1 + counter2

print(combined)
# Output: Counter({'a': 3, 'b': 3, 'c': 3})

counter1 has: a=2, b=2, c=2. counter2 has: a=1, b=1, c=1. Adding them together merges the counts element-wise: a becomes 2+1=3, b becomes 2+1=3, c becomes 2+1=3. This is perfect for merging statistics from multiple sources - like combining word counts from different documents, aggregating sales from multiple stores, or merging traffic data from different time periods.

Production Use: Merge hourly website analytics into daily totals, combine error counts from multiple servers, aggregate user activity across different platforms.

Counter Arithmetic - Subtracting: Find the difference between two counters (what changed?).

from collections import Counter

counter1 = Counter("aabbcc")
counter2 = Counter("abc")

difference = counter1 - counter2

print(difference)
# Output: Counter({'a': 1, 'b': 1, 'c': 1})

Subtraction removes counts: counter1 has a=2, counter2 has a=1, so result is a=1 (2-1). Same for b and c. Negative or zero counts are automatically excluded from the result. This is powerful for change detection - comparing inventory before/after sales, finding new issues in latest logs vs previous logs, detecting what items were removed from a collection.

Business Example: Compare this week's product sales vs last week's to see which items decreased in popularity. Subtract last week from this week to identify declining products.
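That week-over-week comparison can be sketched like this (the product names and counts are made up):

```python
from collections import Counter

last_week = Counter({"widget": 10, "gadget": 7})
this_week = Counter({"widget": 6, "gadget": 9})

# Subtraction keeps only positive differences, so this shows
# exactly which items sold fewer units than last week.
declined = last_week - this_week

print(declined)
# Output: Counter({'widget': 4})
```

"gadget" sold more this week (9 vs 7), so its negative difference is dropped from the result automatically.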

Real-World Voting System: Count votes and determine the winner automatically.

from collections import Counter

votes = ["alice", "bob", "alice", "charlie", "alice", "bob"]
vote_count = Counter(votes)

print(vote_count)
# Output: Counter({'alice': 3, 'bob': 2, 'charlie': 1})

winner = vote_count.most_common(1)[0]
print(f"Winner: {winner[0]} with {winner[1]} votes")
# Output: Winner: alice with 3 votes

Pass a list of votes (could be names, candidate IDs, poll options, etc.) directly to Counter(). It automatically tallies each unique value. most_common(1) returns a list with one tuple containing the top item. [0] gets that first (and only) tuple. winner[0] is the candidate name, winner[1] is their vote count. This pattern works for any ranking scenario: most popular product, most active user, most common error code, most visited page.

Scaling Up: Works with millions of votes - Counter is optimized for performance. You can also use vote_count.most_common(3) for top 3 podium positions!

Counter vs defaultdict(int) - When to Use Each:

Use Counter when: You need most_common(), arithmetic operations, or just counting frequency

Use defaultdict(int) when: You need more control over counting logic or combining with other operations

Performance: Both are equally fast for basic counting. Counter adds minimal overhead but huge convenience!

Dictionary Merging Patterns

Merging dictionaries is essential when combining configuration settings, processing user preferences, or aggregating data from multiple sources. Python offers multiple ways to merge dictionaries, from modern operators (Python 3.9+) to traditional methods that work in older versions. When keys overlap, the rightmost dictionary's values take precedence.

Why Merge Dictionaries:

Common scenarios where you need to combine multiple dictionaries:

  • Combining default settings with user preferences
  • Merging configuration files from multiple sources
  • Aggregating data from different API responses
  • Building complete objects from partial data

Modern Merge Operator (Python 3.9+): Use the pipe symbol | to combine dictionaries cleanly.

defaults = {"theme": "light", "language": "en", "notifications": True}
user_prefs = {"theme": "dark", "font_size": 14}

config = defaults | user_prefs

print(config)
# Output: {'theme': 'dark', 'language': 'en', 'notifications': True, 'font_size': 14}

The | operator creates a new dictionary by merging two dictionaries. When the same key appears in both (like "theme"), the second dictionary's value wins ("dark" overrides "light"). Keys that only exist in one dictionary are simply included. The original dictionaries defaults and user_prefs remain unchanged. This is the cleanest, most readable way to merge dicts if you're using Python 3.9 or later.

Real-World Use: User customizes their app theme to "dark" while keeping default language "en" and notifications True. New setting "font_size" gets added without touching defaults.

In-Place Update Operator (Python 3.9+): Modify an existing dictionary by merging another into it.

settings = {"volume": 50}
settings |= {"volume": 75, "mute": False}

print(settings)
# Output: {'volume': 75, 'mute': False}

The |= operator updates the left dictionary in place (like += for numbers). It doesn't create a new dict - it modifies settings directly. The key "volume" gets updated from 50 to 75, and the new key "mute" is added. This is more memory-efficient when you don't need to preserve the original dictionary, perfect for accumulating updates or applying patches to configurations.

Use Case: Applying user actions to settings (user increases volume to 75 and enables mute) without creating a new settings object each time.

Dictionary Unpacking (Python 3.5+): Merge using ** unpacking syntax for older Python versions.

defaults = {"theme": "light", "language": "en", "notifications": True}
user_prefs = {"theme": "dark", "font_size": 14}

config = {**defaults, **user_prefs}

print(config)
# Output: {'theme': 'dark', 'language': 'en', 'notifications': True, 'font_size': 14}

The {**dict1, **dict2} syntax unpacks both dictionaries into a new one. **defaults unpacks all key-value pairs from defaults first, then **user_prefs unpacks its pairs second, overwriting any duplicate keys. This creates a new dictionary just like |, but works in Python 3.5-3.8. It's the second-best option if you can't use Python 3.9+.

Technical Note: The double asterisk ** is the "dictionary unpacking operator" - think of it as "spread all these key-value pairs here".

Traditional update() Method: Classic approach that modifies a dictionary in place.

defaults = {"theme": "light", "language": "en", "notifications": True}
user_prefs = {"theme": "dark", "font_size": 14}

config = defaults.copy()
config.update(user_prefs)

print(config)
# Output: {'theme': 'dark', 'language': 'en', 'notifications': True, 'font_size': 14}

First, we use defaults.copy() to create a shallow copy of the defaults dictionary (if we skip this, we'd modify the original defaults). Then .update(user_prefs) adds all key-value pairs from user_prefs to config, overwriting any duplicates. This is the oldest method, works in all Python versions, but requires two lines and is less readable than modern alternatives. Still useful when you need to update with conditions or in loops.

When to Use: Legacy codebases, Python 2 compatibility, or when you need to conditionally update keys in a loop.
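For example, a conditional merge in a loop where only missing keys are filled in - something the one-shot merge operators can't express (the settings are invented):

```python
defaults = {"theme": "light", "language": "en", "notifications": True}
config = {"theme": "dark"}

# Unlike update(), this loop only copies keys the user hasn't already set,
# so the user's "theme" choice is never overwritten.
for key, value in defaults.items():
    if key not in config:
        config[key] = value

print(config)
# Output: {'theme': 'dark', 'language': 'en', 'notifications': True}
```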

Merging Multiple Dictionaries: Chain operators or unpacking to combine 3+ dictionaries.

dict1 = {"a": 1}
dict2 = {"b": 2}
dict3 = {"c": 3}

# Python 3.9+ chaining
merged = dict1 | dict2 | dict3

# Python 3.5+ unpacking
merged = {**dict1, **dict2, **dict3}

print(merged)
# Output: {'a': 1, 'b': 2, 'c': 3}

Both methods chain the merge operation left-to-right. dict1 | dict2 merges first (getting {a:1, b:2}), then that result merges with dict3 (adding c:3). The unpacking version does the same: unpack dict1, then dict2, then dict3 into one new dictionary. If any keys overlap across the three dicts, the rightmost value wins (dict3 beats dict2 beats dict1).

Production Example: Merge base config + environment-specific config + user overrides: config = base | production_settings | user_prefs. User preferences always win!
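That layering can be sketched as a runnable example (all three dictionaries are invented, and the | operator requires Python 3.9+):

```python
base = {"debug": True, "host": "localhost", "timeout": 30}
production_settings = {"debug": False, "host": "prod.example.com"}
user_prefs = {"timeout": 60}

# Rightmost wins: user_prefs overrides production_settings overrides base.
config = base | production_settings | user_prefs

print(config)
# Output: {'debug': False, 'host': 'prod.example.com', 'timeout': 60}
```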

Choosing the Right Merge Method:

Python 3.9+: Use | for new dict, |= for in-place update (cleanest!)

Python 3.5-3.8: Use {**dict1, **dict2} unpacking (modern feel)

All Python versions: Use dict.copy() + update() (verbose but works everywhere)

Performance: All methods are equally fast. Choose based on Python version and readability!

Dictionary Views and Iteration Patterns

Dictionary views are live, dynamic objects that provide a window into dictionary keys, values, and key-value pairs. Unlike converting to a list (which creates a static copy), views automatically reflect any changes made to the original dictionary after creation. This makes them memory-efficient for large dictionaries and enables powerful set-like operations for comparing dictionaries.

What Are Dictionary Views:

Three types of views returned by dictionary methods:

  • dict.keys() - Returns view of all dictionary keys
  • dict.values() - Returns view of all dictionary values
  • dict.items() - Returns view of (key, value) tuple pairs
  • All views update automatically when the dictionary changes

Creating Dictionary Views: Get live references to keys, values, and items.

prices = {"apple": 1.50, "banana": 0.75}

keys_view = prices.keys()
values_view = prices.values()
items_view = prices.items()

print(keys_view)
# Output: dict_keys(['apple', 'banana'])

When you call .keys(), .values(), or .items() on a dictionary, Python doesn't create a new list - it returns a special view object. These view objects are lightweight pointers to the dictionary's internal data. dict_keys, dict_values, and dict_items are the actual types returned. You can iterate over them, check membership, or use them in set operations, all without copying data.

Memory Advantage: For a dictionary with 1 million entries, list(dict.keys()) builds a new list holding 1 million references, while dict.keys() creates just a tiny view object!

Views Reflect Changes Dynamically: Views automatically update when the dictionary changes.

prices = {"apple": 1.50, "banana": 0.75}
keys_view = prices.keys()

print(keys_view)
# Output: dict_keys(['apple', 'banana'])

prices["cherry"] = 2.00

print(keys_view)
# Output: dict_keys(['apple', 'banana', 'cherry'])

We create keys_view when the dictionary only has 2 items. Later, we add "cherry" to the dictionary. When we print keys_view again, it shows all 3 keys including the newly added "cherry". The view wasn't updated explicitly - it dynamically reflects the current state of the dictionary. This happens because views don't store data themselves, they just provide a live window into the dictionary.

Practical Use: Create a view once at the start of a function, modify the dictionary throughout the function, and the view always shows current data without re-querying!

Set Operations on Keys - Finding Common Keys: Use intersection (&) to find keys present in both dictionaries.

dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 5, "c": 6, "d": 7}

common = dict1.keys() & dict2.keys()

print(common)
# Output: {'b', 'c'}

The & operator performs set intersection on the two key views. It finds keys that exist in both dictionaries. Here, "b" and "c" appear in both dict1 and dict2, while "a" is only in dict1 and "d" is only in dict2. The result is a set (note the curly braces) containing shared keys. This is incredibly useful for finding overlapping configuration options, common attributes, or matching database fields.

Business Example: Compare available products in two warehouses - warehouse1.keys() & warehouse2.keys() shows items available in both locations.
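A small sketch of that warehouse comparison (the stock data is invented):

```python
warehouse1 = {"hammer": 12, "drill": 3, "saw": 7}
warehouse2 = {"drill": 5, "saw": 2, "wrench": 9}

# Key views support set intersection directly - no list conversion needed.
in_both = warehouse1.keys() & warehouse2.keys()

print(sorted(in_both))
# Output: ['drill', 'saw']
```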

Set Operations on Keys - Finding Unique Keys: Use subtraction (-) to find keys in first dict but not in second.

dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 5, "c": 6, "d": 7}

only_dict1 = dict1.keys() - dict2.keys()

print(only_dict1)
# Output: {'a'}

The - operator performs set difference - it removes all keys from dict1 that also appear in dict2. dict1 has keys "a", "b", "c". dict2 has "b", "c", "d". Subtracting dict2's keys from dict1's keys leaves only "a" because "b" and "c" are removed (they exist in both). This helps identify unique settings, find missing fields, or detect what changed between configurations.

Use Case: Find optional fields user filled out: user_data.keys() - required_fields.keys() shows extra fields beyond required ones.

Set Operations on Keys - Finding All Keys: Use union (|) to combine all keys from both dictionaries.

dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 5, "c": 6, "d": 7}

all_keys = dict1.keys() | dict2.keys()

print(all_keys)
# Output: {'a', 'b', 'c', 'd'}

The | operator performs set union - it combines all unique keys from both dictionaries. dict1 contributes "a", "b", "c" and dict2 contributes "b", "c", "d". The union eliminates duplicates ("b" and "c" appear in both but only counted once), resulting in {"a", "b", "c", "d"}. Perfect for discovering all possible fields across multiple data sources or building a comprehensive schema.

Data Analysis: Combine column names from multiple CSV files to build a master schema: csv1.keys() | csv2.keys() | csv3.keys().

Iterating Over Keys (Default): Loop through dictionary keys with simple for loop.

prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}

for key in prices:
    print(key)
    
# Output:
# apple
# banana
# cherry

When you iterate over a dictionary directly (for key in prices), Python automatically iterates over the keys. This is the default behavior - you don't need to write for key in prices.keys(). Each iteration gives you one key, and you can use it to access the value with prices[key] if needed. This is the simplest iteration pattern when you only need to check keys or use keys for lookups.

Common Pattern: Validate required fields exist: for field in required_fields: assert field in user_data.
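That validation pattern, sketched as runnable code (the field names are invented) - here using set difference to report every missing field at once instead of asserting one at a time:

```python
required_fields = {"name", "email", "age"}
user_data = {"name": "Alice", "email": "alice@example.com"}

# Key views interoperate with plain sets, so we can subtract directly.
missing = required_fields - user_data.keys()

if missing:
    print(f"Missing fields: {sorted(missing)}")
# Output: Missing fields: ['age']
```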

Iterating Over Values Only: Loop through values when you don't need the keys.

prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}

for price in prices.values():
    print(price)
    
# Output:
# 1.5
# 0.75
# 2.0

Use .values() when you only care about the values and don't need to know which key each value came from. This gives you direct access to values without looking them up. Perfect for calculations like summing, averaging, finding min/max, or filtering values. You can't access the corresponding keys inside the loop unless you store them separately.

Calculation Example: Calculate total inventory value: total = sum(prices.values()) - gives 4.25 in one line!

Iterating Over Key-Value Pairs (Most Common): Loop through both keys and values simultaneously.

prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}

for item, price in prices.items():
    print(f"{item}: ${price}")
    
# Output:
# apple: $1.50
# banana: $0.75
# cherry: $2.00

.items() returns tuples of (key, value) pairs. The for item, price in ... syntax automatically unpacks each tuple into two variables. This is the most common iteration pattern because you usually need both the key and value together - for display, comparison, filtering, or transformation. Each iteration gives you complete information about one dictionary entry.

Pro Tip: This is THE standard way to iterate dictionaries in Python. Use it whenever you need both key and value!

Reverse Iteration (Python 3.8+): Iterate keys in reverse insertion order.

prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}

for key in reversed(prices):
    print(key)
    
# Output:
# cherry
# banana
# apple

Python 3.7+ guarantees dictionaries maintain insertion order (the order you added keys), and Python 3.8 added reversed() support for dicts, letting you iterate in reverse insertion order - last added items first. Here, "cherry" was added last, so it appears first when reversed. This is useful for processing recent items first, undoing operations in reverse order, or displaying newest entries at the top.

Real-World Use: Show recent user activity feed in reverse chronological order (newest first) from a dictionary tracking timestamps.

Dictionary Iteration Best Practices:

Use .items(): When you need both key and value (most common case)

Use direct iteration: When you only need keys: for key in dict

Use .values(): When you only need values for calculations

Set operations: Use key views with &, -, | to compare dictionaries

Memory: Views are free - use them instead of converting to lists!

Caching with Dictionaries

Caching (also called memoization) is a technique where you store the results of expensive function calls and return the cached result when the same inputs occur again. This avoids recalculating the same thing multiple times. Dictionaries are perfect for caching because they provide O(1) lookup speed - checking if a result exists and retrieving it happens instantly, no matter how many cached items you have!

Why Caching Matters:

Without caching, repeating the same calculation wastes time and resources:

  • Fibonacci(100) without cache: millions of recursive calls
  • Fibonacci(100) with cache: just 100 calculations (one per number)
  • API call without cache: slow network request every time
  • API call with cache: instant response after first call

Manual Caching with Dictionary: Create a global cache dictionary to store computed results.

cache = {}

def fibonacci(n):
    if n in cache:
        return cache[n]
    
    if n <= 1:
        return n
    
    result = fibonacci(n-1) + fibonacci(n-2)
    cache[n] = result
    return result

print(fibonacci(10))
# Output: 55

We create an empty dictionary cache = {} outside the function to persist across calls. Inside fibonacci(n), the first thing we do is check if n in cache - if we've already calculated this number, return the stored result immediately. If not in cache, we calculate using the standard Fibonacci logic: fibonacci(n-1) + fibonacci(n-2). Before returning, we store the result in the cache: cache[n] = result. Now future calls with the same n will hit the cache!

Performance Impact: Without cache, fibonacci(35) takes ~5 seconds. With cache, it's instant! The improvement grows exponentially with larger numbers.

Cache Benefits Demonstration: See how cache dramatically speeds up repeated calls.

cache = {}

def fibonacci(n):
    if n in cache:
        return cache[n]
    if n <= 1:
        return n
    result = fibonacci(n-1) + fibonacci(n-2)
    cache[n] = result
    return result

print(fibonacci(100))  # First call - builds cache
# Output: 354224848179261915075

print(fibonacci(100))  # Second call - instant!
# Output: 354224848179261915075 (from cache)

The first call to fibonacci(100) recursively calculates all values from 0 to 100, storing each in the cache. This takes time but only happens once. The second call with the same argument (100) checks the cache first, finds the result already stored, and returns it immediately without any calculation. The cache is persistent across function calls because it's defined in the global scope.

Key Insight: First call = slow (builds cache), all subsequent calls = instant (use cache). This pattern is gold for frequently called functions with repeated inputs!

Built-in Caching with @lru_cache: Python provides automatic caching with a decorator.

from functools import lru_cache

@lru_cache(maxsize=128)
def fib_cached(n):
    if n <= 1:
        return n
    return fib_cached(n-1) + fib_cached(n-2)

print(fib_cached(100))
# Output: 354224848179261915075

The @lru_cache decorator (LRU = Least Recently Used) automatically caches function results without manual dictionary management. maxsize=128 means it keeps up to 128 results - when you exceed this limit, the least recently used entry gets evicted to make room. Just add one line (@lru_cache) above your function and caching happens automatically! Python handles the dictionary, cache hits, and cache eviction for you.

When to Use: lru_cache is perfect for production code - it's battle-tested, thread-safe, and handles memory limits automatically. Use manual cache only when you need custom eviction logic.
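lru_cache also lets you inspect and reset the cache - cache_info() and cache_clear() are part of the real functools API (the square function is just a toy example):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def square(n):
    return n * n

square(4)           # first call: cache miss, computes the result
square(4)           # second call: cache hit, no computation

info = square.cache_info()
print(info.hits, info.misses)
# Output: 1 1

square.cache_clear()  # empty the cache when underlying data changes
print(square.cache_info().currsize)
# Output: 0
```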

Caching API Responses: Avoid repeated network calls by caching external data.

api_cache = {}

def fetch_user(user_id):
    if user_id in api_cache:
        print("Cache hit!")
        return api_cache[user_id]
    
    print("Fetching from API...")
    user_data = {"id": user_id, "name": f"User{user_id}"}
    api_cache[user_id] = user_data
    return user_data

user = fetch_user(123)  # Output: Fetching from API...
user = fetch_user(123)  # Output: Cache hit!

This pattern is crucial for web applications. Calling an external API (database, REST endpoint, file system) is slow - often 100-1000ms per request. The first call to fetch_user(123) checks the cache (empty), makes the real API call (simulated here with print + dictionary creation), stores the result, and returns it. The second call finds user_id 123 already in the cache and returns instantly without touching the network.

Production Use: Cache user profiles, product details, configuration settings, geolocation data - anything that doesn't change often but gets requested frequently. One API call saved = 100ms+ saved!

Cache Expiration Pattern: Add timestamps to invalidate old cached data.

import time

cache_with_time = {}

def fetch_data(key, ttl=60):
    now = time.time()
    
    if key in cache_with_time:
        data, timestamp = cache_with_time[key]
        if now - timestamp < ttl:
            print("Fresh cache hit!")
            return data
    
    print("Fetching fresh data...")
    fresh_data = f"Data for {key}"
    cache_with_time[key] = (fresh_data, now)
    return fresh_data

result = fetch_data("config")  # Output: Fetching fresh data...
result = fetch_data("config")  # Output: Fresh cache hit!

This advanced pattern stores tuples of (data, timestamp) instead of just data. ttl=60 means "time to live" - cached data expires after 60 seconds. When checking cache, we compare current time now = time.time() with the stored timestamp. If now - timestamp < ttl (less than 60 seconds old), the cache is still fresh and we return it. If older, we fetch new data and update the cache with a new timestamp. This prevents serving stale data!

Real-World Example: Cache stock prices for 5 minutes, weather data for 1 hour, or database queries for 10 seconds. Balance freshness vs performance based on how fast your data changes!

Caching Best Practices:

Use @lru_cache: For pure functions (same input = same output) in production code

Manual cache: When you need custom eviction, TTL, or cache inspection

Cache invalidation: Clear cache when underlying data changes (hardest problem in CS!)

Memory limits: Always set maxsize to prevent unbounded memory growth

Don't cache: Functions with side effects, random outputs, or time-sensitive data

Dictionary Inverse Mapping

Inverse mapping means swapping keys and values to create a reverse lookup dictionary. If you have a dictionary mapping names to IDs, an inverse mapping creates a dictionary mapping IDs back to names. This bidirectional lookup is essential for databases, translation systems, and any scenario where you need to search in both directions without scanning the entire dictionary.

When You Need Inverse Mapping:

Common scenarios requiring reverse lookups:

  • Database: Look up user by ID or find ID by username
  • Translation: Convert English→French and French→English
  • Encoding: Map symbols to codes and codes back to symbols
  • Configuration: Find setting by name or identify setting by value

Basic Inverse Mapping: Swap keys and values using dictionary comprehension.

student_ids = {"Alice": 101, "Bob": 102, "Charlie": 103}

id_to_name = {v: k for k, v in student_ids.items()}

print(id_to_name)
# Output: {101: 'Alice', 102: 'Bob', 103: 'Charlie'}

The dictionary comprehension {v: k for k, v in student_ids.items()} iterates through all key-value pairs in the original dictionary. For each pair, it swaps them: the original value v becomes the new key, and the original key k becomes the new value. The original dictionary maps names→IDs ("Alice"→101), and the inverted dictionary maps IDs→names (101→"Alice"). Now you can search both ways!

Requirement: This only works if all values in the original dictionary are unique. If two keys have the same value (like two students with ID 101), one will overwrite the other in the inverse mapping!

Bidirectional Lookup Class: Create a class that maintains both forward and reverse mappings.

class BiDict:
    def __init__(self, mapping):
        self.forward = mapping
        self.reverse = {v: k for k, v in mapping.items()}
    
    def get_name(self, id):
        return self.reverse.get(id)
    
    def get_id(self, name):
        return self.forward.get(name)

student_ids = {"Alice": 101, "Bob": 102, "Charlie": 103}
lookup = BiDict(student_ids)

print(lookup.get_name(101))
# Output: Alice

print(lookup.get_id("Bob"))
# Output: 102

The BiDict class wraps bidirectional lookup functionality. In __init__, we store the original mapping in self.forward and create the inverse in self.reverse. The get_name(id) method searches the reverse mapping (IDs→names), while get_id(name) searches the forward mapping (names→IDs). Both use .get() which returns None if the key doesn't exist, preventing errors.

Production Benefit: Encapsulating both directions in one object ensures they stay synchronized. If you update one, you can update the other. This pattern is used in ORMs (database models) and translation libraries.

Handling Duplicate Values with Grouping: When multiple keys share the same value, group them in lists.

from collections import defaultdict

data = {"a": 1, "b": 2, "c": 1, "d": 2}
inverted = defaultdict(list)

for key, value in data.items():
    inverted[value].append(key)

print(dict(inverted))
# Output: {1: ['a', 'c'], 2: ['b', 'd']}

The original dictionary has duplicate values: both "a" and "c" map to 1, and both "b" and "d" map to 2. A simple inverse mapping would lose data (only keeping one key per value). Instead, we use defaultdict(list) which automatically creates an empty list for each new value. As we iterate through data.items(), we append each key to the list associated with its value. The result groups all keys by their value: value 1 has keys ['a', 'c'], value 2 has keys ['b', 'd'].

Real-World Use: Group products by price, categorize users by subscription tier, find all files with the same size, or identify students with the same grade. Any "group by value" operation!

Inverse Mapping with Transformation: Create reverse lookup with modified values.

code_to_symbol = {
    "alpha": "α",
    "beta": "β",
    "gamma": "γ"
}

# Reverse: symbol to code (uppercase)
symbol_to_code = {v: k.upper() for k, v in code_to_symbol.items()}

print(symbol_to_code)
# Output: {'α': 'ALPHA', 'β': 'BETA', 'γ': 'GAMMA'}

During inversion, you can transform the values. Here we swap keys and values, but also convert the original keys to uppercase using k.upper(). The original maps code→symbol ("alpha"→"α"), and the inverse maps symbol→uppercase code ("α"→"ALPHA"). This pattern is useful when you need different representations in each direction - like storing lowercase in the database but displaying uppercase to users.

Advanced Pattern: Apply any transformation during inversion - lowercase, strip whitespace, convert types, format strings. The comprehension gives you full control!

Inverse Mapping Considerations:

Uniqueness: Only works if values are unique (or use defaultdict(list) for duplicates)

Hashability: Values must be hashable (immutable) - no lists or dicts as values!

Memory: Storing both forward and reverse doubles memory usage

Synchronization: If original dict changes, remember to rebuild the inverse

Performance: Creating inverse is O(n), but lookups in both directions become O(1)!

Performance Optimization Tips

Understanding dictionary internals helps you write faster code. Follow these guidelines for optimal performance.

Do This
  • Use key in dict for membership testing
  • Pre-size dictionaries if you know size: dict.fromkeys(range(1000))
  • Use dict.get(key, default) to avoid KeyError
  • Use comprehensions for transformation
  • Use collections.Counter for counting
Avoid This
  • Don't use try/except KeyError for normal flow
  • Don't modify dict while iterating over it
  • Don't use lists when you need fast lookups
  • Don't forget keys must be immutable (hashable)
  • Don't use dicts for ordered data (use OrderedDict)
Memory consideration: Dictionaries use more memory than lists due to hash table overhead. For large datasets with sequential integer keys, consider using lists or arrays instead.

Practice: Advanced Patterns

Task: Create a dictionary catalog with product IDs as keys. Each product should have name, price, and category fields. Add at least 3 products, then print all products in the "electronics" category.

Show Solution
catalog = {
    "P001": {"name": "Laptop", "price": 999, "category": "electronics"},
    "P002": {"name": "Mouse", "price": 25, "category": "electronics"},
    "P003": {"name": "Desk", "price": 299, "category": "furniture"}
}

# Find electronics
electronics = {pid: info for pid, info in catalog.items() 
               if info["category"] == "electronics"}

for pid, info in electronics.items():
    print(f"{pid}: {info['name']} - ${info['price']}")

Task: Given students = [("Alice", "A"), ("Bob", "B"), ("Charlie", "A"), ("Diana", "B")], use defaultdict to group students by their grade. Print the result.

Show Solution
from collections import defaultdict

students = [("Alice", "A"), ("Bob", "B"), ("Charlie", "A"), ("Diana", "B")]

groups = defaultdict(list)
for name, grade in students:
    groups[grade].append(name)

print(dict(groups))
# {'A': ['Alice', 'Charlie'], 'B': ['Bob', 'Diana']}

Task: Create an LRU (Least Recently Used) cache class with get(key) and put(key, value) methods. When capacity is exceeded, remove the least recently used item.

Show Solution
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
    
    def get(self, key):
        if key not in self.cache:
            return None
        # Move to end (most recent)
        self.cache.move_to_end(key)
        return self.cache[key]
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove first (least recent)
            self.cache.popitem(last=False)

# Test
cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
print(cache.get("a"))  # 1
cache.put("c", 3)  # Evicts "b"
print(cache.get("b"))  # None

Real-World Applications

Dictionaries and sets aren't just academic concepts - they're essential tools in production code that powers real applications millions of people use every day. When you fetch data from web APIs (like weather apps, social media, or payment systems), it comes as JSON - which maps directly to Python dictionaries! When you need to store application settings for different environments (development, testing, production), dictionaries organize them perfectly. When analyzing business data or building reports, dictionaries aggregate and structure information. This section shows you exactly how professionals use these tools in real-world scenarios.

Why This Matters - Career Perspective: Every Python job interview includes questions about dictionaries. Every professional project uses them extensively. Understanding these real-world patterns isn't optional - it's what separates "I learned Python" from "I can build production applications with Python."
What You'll Build Skills For:
  • Web Development: Process API responses, handle user data, manage sessions
  • Data Analysis: Aggregate metrics, generate reports, track statistics
  • DevOps/Config: Manage settings across environments, handle deployment configs
  • Performance: Implement caching to make apps 100x faster
  • Data Cleaning: Remove duplicates, validate inputs, normalize datasets

1. JSON API Processing

Web APIs (like GitHub, Twitter, Google Maps) return data in JSON format, which Python automatically converts to dictionaries. This makes dictionaries the primary data structure for web development and API integration. Understanding how to navigate, extract, validate, and transform API responses is essential for modern Python development.

Why JSON Maps to Dictionaries:

JSON and Python dictionaries are nearly identical in structure:

  • JSON objects {} become Python dictionaries
  • JSON arrays [] become Python lists
  • JSON supports nesting, just like dictionaries can contain dictionaries
  • Python's json.loads() converts JSON strings to dicts automatically

Parsing JSON Response: Convert JSON string to Python dictionary.

import json

api_response = '''
{
  "user": {
    "id": 12345,
    "username": "alice_dev",
    "email": "alice@example.com"
  }
}
'''

data = json.loads(api_response)

print(data["user"]["username"])
# Output: alice_dev

The json.loads() function (loads = "load string") parses a JSON-formatted string and returns a Python dictionary. The triple quotes ''' allow multi-line strings. Once parsed into data, you can access nested values using bracket notation: data["user"]["username"] navigates to the user object, then gets the username field. This is how you handle responses from requests.get(), API calls, or any JSON data source.

Real-World Use: Every time you call a REST API (weather API, payment gateway, social media API), you receive JSON. Parse it with json.loads() and work with familiar dictionary operations!

Navigating Deeply Nested Data: Access data buried multiple levels deep.

import json

api_response = '''
{
  "user": {
    "username": "alice_dev",
    "profile": {
      "bio": "Python developer",
      "location": "San Francisco",
      "followers": 1523
    }
  }
}
'''

data = json.loads(api_response)

username = data["user"]["username"]
location = data["user"]["profile"]["location"]

print(f"{username} from {location}")
# Output: alice_dev from San Francisco

APIs often return deeply nested structures with multiple levels of objects. To access location, we navigate: data (root) → ["user"] (first level) → ["profile"] (second level) → ["location"] (third level). Each bracket accesses one level deeper. Chain as many brackets as needed to reach your target data. The f-string combines both values for clean output.

Beginner Tip: If you get lost in nested data, print intermediate steps: print(data["user"]), then print(data["user"]["profile"]) to verify each level exists.

Filtering Data with List Comprehension: Extract specific items from API arrays.

import json

api_response = '''
{
  "user": {
    "repositories": [
      {"name": "awesome-project", "stars": 245, "language": "Python"},
      {"name": "cool-tool", "stars": 89, "language": "JavaScript"},
      {"name": "data-analyzer", "stars": 156, "language": "Python"}
    ]
  }
}
'''

data = json.loads(api_response)

python_repos = [
    repo for repo in data["user"]["repositories"] 
    if repo["language"] == "Python"
]

print([r['name'] for r in python_repos])
# Output: ['awesome-project', 'data-analyzer']

The repositories field is a list of dictionaries (JSON array of objects). We use list comprehension to filter: iterate through each repo dictionary in the list, check if repo["language"] == "Python", and keep only matches. The second comprehension extracts just the names from filtered repos. This pattern is extremely common for processing API results - filtering products by category, finding active users, selecting high-rated items, etc.

Production Pattern: GitHub API returns all your repositories - use this filtering to show only Python projects, or only repos with 100+ stars, or repos updated in the last month.

Safe Navigation with Chained .get(): Prevent errors when API data is missing or incomplete.

import json

api_response = '''
{
  "user": {
    "username": "bob_dev"
  }
}
'''

data = json.loads(api_response)

# Dangerous: KeyError if profile or followers missing
# followers = data["user"]["profile"]["followers"]

# Safe: Returns 0 if any key is missing
followers = data.get("user", {}).get("profile", {}).get("followers", 0)

print(f"Followers: {followers}")
# Output: Followers: 0

APIs can be unreliable - fields might be missing, null, or have different structures than documented. Using brackets like data["user"]["profile"]["followers"] will crash with KeyError if "profile" doesn't exist. Chained .get() is the safe alternative: data.get("user", {}) returns the user dict if it exists, or empty dict {} if missing. Then .get("profile", {}) on that result, and finally .get("followers", 0) returning 0 as the ultimate default. No crashes!

Critical for Production: Third-party APIs change without notice. Optional fields appear and disappear. Always use .get() with sensible defaults to prevent your app from crashing when API structure changes.

Modifying and Converting Back to JSON: Update dictionary data and serialize back to JSON string.

import json

api_response = '''{"user": {"profile": {"followers": 100}}}'''

data = json.loads(api_response)

# Modify data
data["user"]["profile"]["followers"] += 1

# Convert back to JSON
json_string = json.dumps(data, indent=2)

print(json_string)
# Output:
# {
#   "user": {
#     "profile": {
#       "followers": 101
#     }
#   }
# }

After parsing JSON to a dictionary, you can modify it like any dict: data["user"]["profile"]["followers"] += 1 increments the follower count. To send this updated data back to an API or save to a file, convert it back to JSON using json.dumps() (dumps = "dump string"). The indent=2 parameter formats the output with 2-space indentation, making it human-readable. Without indent, it's a single compact line.

Common Workflow: Fetch JSON from API → parse to dict → modify data → convert back to JSON → send PATCH/PUT request to update the server. This is how you update user profiles, edit documents, or modify any API resource!

JSON API Processing Best Practices:

Always use .get(): Prevent KeyError crashes when API fields are missing or optional

Validate structure: Check expected keys exist before deep navigation

Handle errors: Wrap API calls in try/except to catch network and parsing errors

Use indent: json.dumps(data, indent=2) for debugging and logging

Type checking: Verify data types (isinstance) before operations to avoid type errors

2. Configuration Management

Modern applications need different settings for different environments (development on your laptop, production on servers). Dictionaries are perfect for managing these configurations because they organize settings hierarchically and allow easy environment switching. This pattern is used in Django, Flask, FastAPI, and virtually every professional Python application.

Why Dictionaries for Configuration:

Advantages over hardcoding or separate files:

  • Centralized: All settings in one place
  • Hierarchical: Group related settings (database, cache, logging)
  • Environment-aware: Switch between dev/prod with one variable
  • Easy to override: Merge defaults with environment-specific settings

Multi-Environment Configuration Structure: Define settings for each environment.

config = {
    "development": {
        "debug": True,
        "database": {
            "host": "localhost",
            "port": 5432,
            "name": "dev_db"
        },
        "cache": {"enabled": False},
        "log_level": "DEBUG"
    },
    "production": {
        "debug": False,
        "database": {
            "host": "prod-db.example.com",
            "port": 5432,
            "name": "prod_db"
        },
        "cache": {"enabled": True, "ttl": 300},
        "log_level": "ERROR"
    }
}

Create a master config dictionary with top-level keys for each environment ("development", "production", "staging", etc.). Each environment has its own nested dictionary containing all necessary settings. Development uses localhost database with debug mode on and verbose DEBUG logging for troubleshooting. Production uses remote database server, disables debug mode (for security and performance), enables caching with 300-second TTL, and only logs ERROR level messages to reduce log volume.

Best Practice: Keep structure identical across environments (same keys) but different values. This prevents bugs where code expects a setting that only exists in one environment.

Environment Selection: Choose which environment to use at runtime.

ENV = "development"
current_config = config[ENV]

print(f"Running in {ENV} mode")
# Output: Running in development mode

Set the ENV variable to choose which environment configuration to use. In production, this would typically come from an environment variable: ENV = os.getenv('APP_ENV', 'development'). Selecting config[ENV] extracts just that environment's settings. Now current_config contains only development settings. Change ENV to "production" and the entire app uses production settings - no code changes needed!

Deployment Pattern: Set APP_ENV=production on server, APP_ENV=development on local machine. Same codebase, different behavior based on environment.

Accessing Nested Settings: Extract specific configuration values.

DEBUG = current_config["debug"]
DB_HOST = current_config["database"]["host"]
DB_PORT = current_config["database"]["port"]

print(f"Database: {DB_HOST}:{DB_PORT}")
print(f"Debug: {DEBUG}")
# Output:
# Database: localhost:5432
# Debug: True

Extract configuration values by navigating the nested dictionary structure. current_config["debug"] gets the top-level debug flag. current_config["database"]["host"] drills two levels deep: first into the database object, then gets the host property. These values are then used throughout your application to connect to the database, enable/disable debug features, etc. No hardcoded connection strings!

Real Example: Pass these to database connector: connect(host=DB_HOST, port=DB_PORT). Switch environments and it automatically connects to different servers.

Merging with Defaults: Combine base defaults with environment-specific overrides.

DEFAULT_CONFIG = {
    "timeout": 30,
    "retry_count": 3,
    "log_level": "INFO"
}

app_config = {**DEFAULT_CONFIG, **current_config}

print(f"Log level: {app_config['log_level']}")
# Output: Log level: DEBUG

Define DEFAULT_CONFIG with sensible defaults for all settings. Then merge it with environment-specific config using unpacking: {**DEFAULT_CONFIG, **current_config}. This creates a new dictionary starting with defaults, then overlaying environment settings. Keys that exist in both (like "log_level") get overridden by the environment value ("DEBUG" beats "INFO"). Keys only in defaults ("timeout", "retry_count") are included automatically. Result: complete configuration with no missing keys!

Why This Matters: You don't need to define every setting in every environment. Define common defaults once, override only what changes. Reduces duplication and maintenance.

Configuration Best Practices:

Never commit secrets: Use environment variables for passwords, API keys, tokens

Load from files: Store config in JSON/YAML files, load with json.load()

Validate on startup: Check required keys exist before app runs

Use .get() with defaults: config.get('timeout', 30) prevents crashes

Document settings: Add comments explaining what each setting controls

3. Data Aggregation and Reporting

Data aggregation means summarizing large datasets by grouping, counting, and calculating totals or averages. This is the foundation of business intelligence, analytics dashboards, and data science. Dictionaries make aggregation fast and intuitive - you can group by any field, count frequencies, and compute statistics without databases or complex libraries.

Common Aggregation Tasks:

What you can do with dictionary-based aggregation:

  • Group sales by product/region/date and calculate totals
  • Count frequency of events (page views, errors, purchases)
  • Find top N items (bestsellers, most active users, common errors)
  • Calculate averages, min/max, percentages across categories

Sample Dataset: List of transaction dictionaries representing sales data.

transactions = [
    {"date": "2026-01-20", "product": "Laptop", "amount": 999, "region": "West"},
    {"date": "2026-01-20", "product": "Mouse", "amount": 25, "region": "East"},
    {"date": "2026-01-21", "product": "Laptop", "amount": 999, "region": "West"},
    {"date": "2026-01-21", "product": "Keyboard", "amount": 75, "region": "East"},
    {"date": "2026-01-21", "product": "Mouse", "amount": 25, "region": "West"},
    {"date": "2026-01-22", "product": "Laptop", "amount": 999, "region": "East"},
]

This list contains 6 transactions, where each transaction is a dictionary with date, product name, sale amount, and region. This structure mirrors what you'd get from a database query, CSV file, or API response. Each dictionary represents one row of data. We'll use this dataset to demonstrate various aggregation patterns - summing revenue, counting sales, finding top products, etc. Real datasets have thousands or millions of rows, but the patterns are identical.

Real-World Source: SQL query like SELECT * FROM transactions converted to dictionaries, or reading CSV with csv.DictReader().

Revenue by Product (Grouping and Summing): Calculate total revenue for each product.

from collections import defaultdict

revenue_by_product = defaultdict(int)

for t in transactions:
    revenue_by_product[t["product"]] += t["amount"]

print("Revenue by product:")
for product, revenue in sorted(revenue_by_product.items(), key=lambda x: x[1], reverse=True):
    print(f"  {product}: ${revenue}")
# Output:
# Revenue by product:
#   Laptop: $2997
#   Keyboard: $75
#   Mouse: $50

We use defaultdict(int) which creates 0 for missing keys automatically. Loop through each transaction, extract the product name, and add that transaction's amount to the product's total. Laptop appears 3 times (999+999+999=2997), Mouse appears 2 times (25+25=50), Keyboard once (75). The sorted() line sorts products by revenue descending (highest first) using key=lambda x: x[1] (sort by value, not key) with reverse=True.

Business Impact: This one loop answers "What products generate most revenue?" - critical for inventory planning, marketing focus, and business strategy.

Sales Count by Product (Frequency Analysis): Count how many times each product was sold.

from collections import Counter

sales_count = Counter(t["product"] for t in transactions)

print(f"Top seller: {sales_count.most_common(1)[0]}")
# Output: Top seller: ('Laptop', 3)

Counter is perfect for counting occurrences. The generator expression t["product"] for t in transactions extracts just the product names from all transactions. Counter tallies them: Laptop=3, Mouse=2, Keyboard=1. most_common(1) returns the single most frequent item as a list with one tuple: [('Laptop', 3)]. [0] extracts that tuple. This tells you bestseller by quantity sold (not revenue).

Key Difference: Revenue = dollars generated (Laptop wins: $2997). Sales count = number of transactions (Laptop still wins: 3 sales). Different metrics, different insights!

Average Transaction Value: Calculate mean purchase amount across all transactions.

total_amount = sum(t["amount"] for t in transactions)
avg_transaction = total_amount / len(transactions)

print(f"Average transaction: ${avg_transaction:.2f}")
# Output: Average transaction: $520.33

Use generator expression to extract all amounts, then sum() adds them up: 999+25+999+75+25+999=3122. Divide by total number of transactions (6) to get average: 3122/6≈520.33. The :.2f formats to 2 decimal places. Average transaction value is a key metric for understanding customer spending behavior - helps set pricing, identify high-value vs low-value purchases, and forecast revenue.

Business Insight: If average is $520 but most individual items cost $25-$75, it means a few high-ticket items (laptops) are pulling the average up - you have diverse customer segments!

Daily Revenue Tracking: Sum sales grouped by date to see trends over time.

from collections import defaultdict

daily_revenue = defaultdict(int)

for t in transactions:
    daily_revenue[t["date"]] += t["amount"]

for date in sorted(daily_revenue.keys()):
    print(f"{date}: ${daily_revenue[date]}")
# Output:
# 2026-01-20: $1024
# 2026-01-21: $1099
# 2026-01-22: $999

Same aggregation pattern but grouping by date instead of product. Loop through transactions, use date as key, accumulate amounts. Jan 20: 999+25=1024. Jan 21: 999+75+25=1099. Jan 22: 999. Sorting by date (sorted(daily_revenue.keys())) ensures chronological order for time series analysis. This creates a simple time series showing revenue trend day by day.

Analytics Use: Identify peak sales days, spot trends (growing/declining), detect anomalies, forecast future revenue, plan inventory restocking. The foundation of dashboards and reports!

Aggregation Patterns Summary:

Summing values: Use defaultdict(int) and += to accumulate totals

Counting frequency: Use Counter() for automatic tallying

Grouping items: Use defaultdict(list) and .append() to collect by category

Finding top N: Use Counter.most_common(n) or sort by value

Performance: These patterns handle millions of rows efficiently - O(n) time complexity!

4. Caching and Performance Optimization

Caching stores results of expensive operations (database queries, API calls, complex computations) so repeated requests return instantly without redoing the work. This decorator pattern is a production-grade technique used by professional developers to dramatically improve application performance. Understanding decorators with caching teaches you advanced Python patterns used everywhere.

When Caching Provides Massive Speedups:

Operations worth caching (100x-1000x faster with cache):

  • Database queries (10-100ms → 0.001ms)
  • External API calls (100-1000ms → 0.001ms)
  • Complex calculations (seconds/minutes → instant)
  • File I/O operations (1-10ms → 0.001ms)

Caching Decorator Function: Create reusable decorator that caches any function's results.

from functools import wraps

def memoize(func):
    cache = {}
    
    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            print(f"Computing {func.__name__}{args}...")
            cache[args] = func(*args)
        else:
            print(f"Cache hit for {func.__name__}{args}")
        return cache[args]
    
    return wrapper

A decorator is a function that wraps another function to add functionality. memoize creates a cache dictionary that persists across calls. The inner wrapper function receives the arguments (*args), checks if args exists as a key in cache. If not found, it calls the original function func(*args), stores result in cache[args], and returns it. If found, skip the function call and return cached result directly. @wraps(func) preserves the original function's name and docstring for debugging.

How Decorators Work: @memoize is syntactic sugar for expensive_operation = memoize(expensive_operation). The decorator wraps the original function with caching logic transparently!

Applying Cache Decorator: Add caching to expensive function with one line.

import time

@memoize
def expensive_operation(n):
    time.sleep(1)  # Simulate slow computation
    return n * n

result = expensive_operation(5)
print(f"Result: {result}")
# Output: Computing expensive_operation(5)...
#         Result: 25

result = expensive_operation(5)
print(f"Result: {result}")
# Output: Cache hit for expensive_operation(5)
#         Result: 25

@memoize decorator is applied above the function definition, which wraps expensive_operation with caching. The function simulates a slow operation with time.sleep(1) (imagine a database query or API call). First call with argument 5: cache is empty, so it executes the function (takes 1 second), stores result 25 in cache, and returns 25. Second call with same argument 5: cache contains (5) → 25, so it returns 25 instantly (0.001 seconds) without calling the function!

Real-World Impact: Imagine a website where 1000 users request the same product page. Without cache: 1000 database queries (10 seconds total). With cache: 1 query + 999 instant hits (0.01 seconds total). 1000x speedup!

Time-Based Cache Class: Implement cache with automatic expiration to prevent stale data.

import time

class TimedCache:
    def __init__(self, ttl=60):
        self.cache = {}
        self.ttl = ttl
    
    def get(self, key):
        if key in self.cache:
            value, timestamp = self.cache[key]
            if time.time() - timestamp < self.ttl:
                return value
            else:
                del self.cache[key]
        return None
    
    def set(self, key, value):
        self.cache[key] = (value, time.time())

This class implements a cache with TTL (Time To Live) expiration. __init__ creates empty cache dict and stores TTL in seconds (default 60). set(key, value) stores a tuple of (value, current_timestamp) - both the data and when it was cached. get(key) checks if key exists, unpacks the value and timestamp, calculates age (time.time() - timestamp), and returns value only if age < TTL. If expired, deletes the entry and returns None. This prevents serving stale data!

When TTL is Critical: Stock prices (5 min), weather (1 hour), news headlines (15 min), user profiles (5 min). Balance freshness vs performance based on how fast your data changes.

Using TimedCache: Store and retrieve data with automatic expiration.

cache = TimedCache(ttl=5)

cache.set("user:123", {"name": "Alice", "email": "alice@example.com"})

user = cache.get("user:123")
print(user)
# Output: {'name': 'Alice', 'email': 'alice@example.com'}

time.sleep(6)

user = cache.get("user:123")
print(user)
# Output: None

Create a cache with a 5-second TTL. Store the user data with set() - internally it saves (user_dict, current_time). Calling get() immediately returns the user data because fewer than 5 seconds have elapsed. Wait 6 seconds with time.sleep(6) to exceed the TTL. Calling get() again checks the timestamp, sees 6 seconds > the 5-second TTL, deletes the expired entry, and returns None. This forces a fresh data fetch.

Production Pattern: Set TTL based on data volatility: user preferences (5 min), product inventory (30 sec), exchange rates (1 min), static content (1 hour). Shorter TTL = fresher data but more database hits.

Caching Best Practices:

Cache what's expensive: Database queries, API calls, file I/O, heavy computations

Set appropriate TTL: Balance data freshness vs cache hit rate

Cache invalidation: Clear cache when underlying data changes (hardest problem in CS!)

Memory limits: Use LRU cache with maxsize to prevent unbounded growth

Don't cache: User-specific data (unless keyed by user ID), random values, time-sensitive operations
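For the memory-limits point above, the standard library already ships a bounded LRU cache, so you rarely need to write one yourself; a minimal example:

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # least-recently-used entries are evicted past 128
def fib(n):
    """Naive recursion made fast by memoization."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040, computed without redundant recursive calls
print(fib.cache_info())  # hit/miss statistics for tuning maxsize
```

cache_info() reports hits, misses, and current size, which helps you pick a maxsize that balances memory against hit rate.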

5. Data Deduplication

Deduplication is the process of identifying and removing duplicate entries from datasets. This is critical for data quality - duplicate user accounts, repeated transactions, or redundant log entries corrupt analytics and waste storage. Sets provide O(1) membership testing (instant lookup regardless of size), making them perfect for efficient deduplication even with millions of records.

Why Deduplication Matters:

Real-world scenarios requiring duplicate removal:

  • User registrations: Prevent multiple accounts with same email
  • Database imports: Merge data from multiple sources without duplicates
  • Log analysis: Count unique visitors, not total page views
  • E-commerce: Prevent duplicate orders, identify repeat customers

Sample Data with Duplicates: List of user registrations containing duplicate emails.

registrations = [
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
    {"email": "alice@example.com", "name": "Alice"},  # Duplicate
    {"email": "charlie@example.com", "name": "Charlie"},
    {"email": "bob@example.com", "name": "Robert"},  # Duplicate
]

We have 5 registrations, but alice@example.com appears twice (positions 0 and 2) and bob@example.com appears twice (positions 1 and 4). These duplicates might come from users registering multiple times, data import errors, or system glitches. We need to keep only the first occurrence of each email and discard the rest. Notice the second Bob entry has a different name ("Robert" vs "Bob") - duplicates aren't always identical, they just share a key field!

Real Example: Newsletter signups where same person subscribes multiple times, or e-commerce checkout where user clicks submit button twice creating duplicate orders.

Removing Duplicates with Set Tracking: Keep only first occurrence of each unique email.

seen_emails = set()
unique_users = []

for user in registrations:
    email = user["email"]
    if email not in seen_emails:
        seen_emails.add(email)
        unique_users.append(user)

print(f"Original: {len(registrations)} users")
print(f"Unique: {len(unique_users)} users")
# Output:
# Original: 5 users
# Unique: 3 users

Create an empty set seen_emails to track emails we've already processed. Loop through registrations: extract the email and check if email not in seen_emails (set membership is O(1) - instant!). If the email is new, add it to seen_emails and append the user to unique_users. If the email is already in the set, skip this user (it's a duplicate). Result: unique_users contains only the first occurrence of each email - 3 users instead of 5.

Performance Magic: For 1 million registrations with 100K duplicates, set-based deduplication takes ~0.1 seconds. List-based checking (if email in emails_list) would take 10+ minutes because list search is O(n)!

Identifying Duplicate Emails: Find which emails appear multiple times and how many times.

from collections import Counter

email_counts = Counter(u["email"] for u in registrations)
duplicates = {email: count for email, count in email_counts.items() if count > 1}

print(f"Duplicate emails: {duplicates}")
# Output: Duplicate emails: {'alice@example.com': 2, 'bob@example.com': 2}

Use Counter to count how many times each email appears. The generator extracts all emails from registrations. Counter tallies: alice@example.com=2, bob@example.com=2, charlie@example.com=1. Dictionary comprehension filters only entries where count > 1, creating a dict of duplicates. This report shows which emails are duplicated and how many times - useful for auditing data quality issues or contacting users who registered multiple times.

Data Quality Use: Generate report for admins showing duplicate accounts, then send automated emails asking users to consolidate accounts, or flag suspicious activity (bots, fraud).

Merging Multiple Data Sources (Union): Combine unique IDs from different sources.

source1_ids = {101, 102, 103, 104}
source2_ids = {103, 104, 105, 106}
source3_ids = {105, 106, 107}

all_ids = source1_ids | source2_ids | source3_ids

print(f"Total unique IDs: {len(all_ids)}")
# Output: Total unique IDs: 7

Three data sources each contain sets of user IDs. Some IDs appear in multiple sources (103, 104 in both source1 and source2). Use set union operator | to combine all three sets, automatically eliminating duplicates. Source1 has 4 IDs, source2 has 4, source3 has 3 = 11 total, but only 7 unique IDs after deduplication: {101, 102, 103, 104, 105, 106, 107}. Order doesn't matter in sets.

Real Scenario: Merge customer lists from acquisition, website signups, and mobile app registrations. Union gives total unique customers across all channels without double-counting!

Finding Common IDs (Intersection): Identify IDs present in all sources.

common_ids = source1_ids & source2_ids & source3_ids

print(f"Common IDs: {common_ids}")
# Output: Common IDs: set()

Set intersection & finds elements present in all sets simultaneously. Source1={101,102,103,104}, source2={103,104,105,106}, source3={105,106,107}. No ID appears in all three! 103,104 are in source1 and source2 but not source3. 105,106 are in source2 and source3 but not source1. Result is empty set. If we had {103} in all three, output would be {103}. Useful for finding customers active across all platforms, or items in stock at all warehouses.

Marketing Use: Find users who engaged with email campaign AND visited website AND made purchase - your most valuable segment for retargeting!
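To see a non-empty intersection, here is a hypothetical variant of the three sources where ID 103 appears in every one:

```python
a = {101, 102, 103}
b = {103, 104}
c = {103, 107}

# 103 is the only ID present in all three sets
print(a & b & c)  # {103}
```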

Finding Exclusive IDs (Difference): Identify IDs unique to one source.

exclusive_to_source1 = source1_ids - source2_ids - source3_ids

print(f"Exclusive to source1: {exclusive_to_source1}")
# Output: Exclusive to source1: {101, 102}

The set difference operator - removes elements that exist in the other set. Start with source1={101,102,103,104}. Subtract source2={103,104,105,106} → removes 103,104 → leaves {101,102}. Subtract source3={105,106,107} → removes nothing (105,106,107 weren't in the remaining set) → final result {101,102}. These IDs exist ONLY in source1, not in source2 or source3. Perfect for finding customers who only use one channel, or products sold in one region only.

Business Intelligence: Identify customers who abandoned mobile app (in source1) but never migrated to web (not in source2/3). Target them with "try our website" campaign!

Deduplication Performance:

Set membership: O(1) constant time - instant for any dataset size

List membership: O(n) linear time - slower as dataset grows

Real impact: 1M records: set=0.1s, list=10min (6000x faster!)

Memory tradeoff: Sets use more memory than lists, but speed gain is worth it

Always use sets: For deduplication, uniqueness checking, or membership testing with large datasets
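You can verify the O(1)-versus-O(n) gap yourself with the timeit module; the exact numbers vary by machine, but the set lookup should win by orders of magnitude:

```python
import timeit

data = list(range(100_000))
data_set = set(data)
target = 99_999  # worst case for the list: the last element

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: target in data, number=100)
set_time = timeit.timeit(lambda: target in data_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The list scans every element before finding the target, while the set hashes the target and jumps straight to it.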

Key Takeaways

Key-Value Mapping

Dictionaries store data as key-value pairs, enabling instant lookups like a phonebook instead of sequential list scanning

O(1) Constant Time

Hash tables provide constant-time lookups, insertions, and deletions regardless of dictionary size

Immutable Keys

Only hashable objects (strings, numbers, tuples) can be dictionary keys since their hash value must remain constant

Sets for Uniqueness

Sets automatically remove duplicates and provide fast membership testing with O(1) "in" operations

Comprehensions

Dict and set comprehensions create collections in one line: {k: v for ...} for dicts, {x for ...} for sets

Safe Access Patterns

Use get(key, default) for safe access and setdefault() when building dictionaries incrementally
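As a quick refresher, those two safe-access patterns look like this:

```python
user = {"name": "Alice"}

# get() returns a default instead of raising KeyError
email = user.get("email", "not provided")
print(email)  # not provided

# setdefault() inserts the key with a default only if it is missing,
# then returns the (possibly new) value - handy for building dicts of lists
groups = {}
groups.setdefault("admins", []).append("Alice")
groups.setdefault("admins", []).append("Bob")
print(groups)  # {'admins': ['Alice', 'Bob']}
```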

Knowledge Check

1 What data structure underlies Python dictionaries?
2 Which of these CANNOT be used as a dictionary key?
3 What does dict.get("key", "default") return if "key" does not exist?
4 What is the result of {1, 2, 3} & {2, 3, 4}?
5 How do you create an empty set in Python?
6 What is the time complexity for looking up a key in a dictionary?