Module 2.2

Python Data Structures

Learn how to organize and store data efficiently using Python's powerful built-in data structures. Master lists, tuples, dictionaries, sets, and the elegant list comprehensions that make Python code shine!

40 min read
Beginner
Hands-on Examples
What You'll Learn
  • Lists - ordered, mutable collections
  • Tuples - immutable sequences
  • Dictionaries - key-value pairs
  • Sets - unique elements
  • List comprehensions - elegant data transformation
01

Lists

Lists are Python's most versatile data structure. Think of them as ordered containers that can hold any type of data and can be modified after creation.

What is a List?

Imagine you're making a shopping list. You write items one after another, you can add new items, cross some off, or rearrange them. A Python list works exactly the same way - it's an ordered collection of items that you can modify anytime.

Data Structure

List

A list is an ordered, mutable (changeable) collection of items enclosed in square brackets []. Think of it as a numbered container where each slot can hold any type of data - numbers, text, even other lists!

Lists maintain the order you put items in, so the first item stays first, the second stays second, and so on. This makes lists perfect for sequences of data where position matters - like daily temperatures, a playlist of songs, or steps in a process.

Key characteristics: Ordered (items have positions), Mutable (can be modified), Allow duplicates (same value can appear multiple times), Heterogeneous (can mix different data types).

Creating Your First List

Creating a list is straightforward - you use square brackets [] and separate items with commas. Unlike some programming languages that require all items to be the same type, Python lists are flexible and can hold a mix of strings, numbers, booleans, and even other lists. This flexibility makes Python particularly powerful for data science where you often work with varied data.

# Creating lists - use square brackets []
shopping_list = ["milk", "eggs", "bread", "butter"]
print(shopping_list)  # ['milk', 'eggs', 'bread', 'butter']

# Lists can hold different data types
mixed_list = ["hello", 42, 3.14, True, None]
print(mixed_list)  # ['hello', 42, 3.14, True, None]

# Empty list - two ways to create
empty1 = []
empty2 = list()
print(empty1, empty2)  # [] []

# List of numbers (very common in data science!)
temperatures = [72, 75, 79, 81, 78, 74, 71]
scores = [95.5, 87.0, 92.3, 88.7, 91.2]

Accessing List Elements: Indexing

Every item in a list has a position number called an index. Here's the crucial thing to remember: Python starts counting from 0, not 1! This is called "zero-based indexing" and it's one of the most common sources of confusion for beginners.

Why does Python start at 0? It's a convention from computer science - the index represents the "offset" from the beginning. The first item is 0 positions away from the start, the second is 1 position away, and so on. Once you get used to it, you'll find it quite intuitive.

Python also supports negative indexing, which lets you count from the end of the list. This is incredibly useful when you want the last item (index -1) without knowing the list's length.

Item             "milk"   "eggs"   "bread"   "butter"
Index            0        1        2         3
Negative Index   -4       -3       -2        -1

# Indexing - accessing individual elements
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

# Positive indexing (from the start)
print(fruits[0])   # 'apple'  - first element
print(fruits[1])   # 'banana' - second element
print(fruits[4])   # 'elderberry' - fifth element

# Negative indexing (from the end) - very useful!
print(fruits[-1])  # 'elderberry' - last element
print(fruits[-2])  # 'date' - second to last
print(fruits[-5])  # 'apple' - same as fruits[0]

# Common error: index out of range
# print(fruits[10])  # IndexError: list index out of range
Why negative indexing? It's incredibly useful when you want the last few items without knowing the list's length. my_list[-1] always gives you the last item!
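
One way to see how negative indices line up with positive ones: index -k refers to the same slot as len(list) - k. A small sketch of that relationship, reusing the fruits list from above:

```python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]
n = len(fruits)  # 5

# Each negative index -k maps to the positive index n - k
for k in range(1, n + 1):
    assert fruits[-k] == fruits[n - k]

print(fruits[-1] == fruits[n - 1])  # True - both are 'elderberry'

# Negative indices can go out of range too
try:
    fruits[-6]
except IndexError as err:
    print(err)  # list index out of range
```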

Slicing: Getting Multiple Elements

While indexing gets you a single element, slicing lets you extract a portion (a "slice") of a list. This is one of Python's most powerful features and something you'll use constantly in data science when working with datasets.

The slicing syntax is list[start:stop:step]. Think of it as instructions: "start at this position, stop before this position, and move by this many steps." All three parts are optional - if you omit them, Python uses sensible defaults (start from beginning, go to end, step by 1).

The most important thing to remember: the stop index is exclusive - it tells Python where to stop, but that position itself is not included. So list[0:3] gives you elements at positions 0, 1, and 2 - not 3. This might seem odd at first, but it has a nice property: the number of elements you get equals stop - start.

# Slicing syntax: list[start:stop:step]
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Basic slicing - start:stop (stop is NOT included!)
print(numbers[2:5])    # [2, 3, 4] - elements at index 2, 3, 4
print(numbers[0:3])    # [0, 1, 2] - first three elements
print(numbers[:4])     # [0, 1, 2, 3] - from start to index 3
print(numbers[6:])     # [6, 7, 8, 9] - from index 6 to end
print(numbers[:])      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] - copy entire list

Use step to skip elements:

# Using step
print(numbers[::2])    # [0, 2, 4, 6, 8] - every 2nd element
print(numbers[1::2])   # [1, 3, 5, 7, 9] - every 2nd, starting at index 1
print(numbers[::3])    # [0, 3, 6, 9] - every 3rd element

Use negative step to reverse:

# Negative step - reverse!
print(numbers[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] - reversed list
print(numbers[8:2:-1]) # [8, 7, 6, 5, 4, 3] - backward from 8 to 3

Negative indices work in slicing too:

# Negative indices in slicing
print(numbers[-3:])    # [7, 8, 9] - last 3 elements
print(numbers[:-2])    # [0, 1, 2, 3, 4, 5, 6, 7] - all except last 2
Remember: The stop index is exclusive - it's not included in the result. numbers[2:5] gives elements at indices 2, 3, 4 - not 5!
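
Two properties of slicing follow from the rules above and are worth checking for yourself: a simple slice has stop - start elements (when both positions lie inside the list), and out-of-range bounds in a slice are clamped rather than raising an error. A short sketch:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Length of a simple slice equals stop - start
start, stop = 2, 5
print(len(numbers[start:stop]))  # 3

# Unlike indexing, slicing does not raise IndexError for out-of-range bounds
print(numbers[7:100])   # [7, 8, 9] - stop is clamped to the end
print(numbers[50:])     # [] - nothing there, so an empty list

# A slice is a new list, so changing it leaves the original alone
first_three = numbers[:3]
first_three[0] = 99
print(numbers[0])       # 0
```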

Modifying Lists

Lists are mutable, meaning you can change them after creation. You can modify individual elements, add new ones, or remove existing ones.

# Changing individual elements
colors = ["red", "green", "blue"]
colors[1] = "yellow"  # Replace "green" with "yellow"
print(colors)  # ['red', 'yellow', 'blue']

You can replace multiple elements at once using slicing:

# Changing multiple elements with slicing
numbers = [1, 2, 3, 4, 5]
numbers[1:4] = [20, 30, 40]  # Replace indices 1, 2, 3
print(numbers)  # [1, 20, 30, 40, 5]

The replacement doesn't have to be the same length as the slice:

# You can even replace with different number of elements!
letters = ['a', 'b', 'c', 'd', 'e']
letters[1:4] = ['X']  # Replace 3 elements with 1
print(letters) # ['a', 'X', 'e']

Essential List Methods

Lists come with powerful built-in methods. Methods are functions that "belong" to a particular data type and help you manipulate it. You call them using dot notation: list.method().

Understanding when to use each method is crucial. Want to add one item to the end? Use append(). Need to add at a specific position? Use insert(). Want to combine two lists? Use extend(). Each method has a specific purpose, and choosing the right one makes your code cleaner and more efficient.

Adding Elements

There are three main ways to add elements to a list, each with different behavior.

append() adds ONE element to the end of the list:

fruits = ["apple", "banana"]
fruits.append("cherry")
print(fruits)  # ['apple', 'banana', 'cherry']

insert() adds an element at a specific position:

fruits.insert(1, "blueberry")  # Insert at index 1
print(fruits)  # ['apple', 'blueberry', 'banana', 'cherry']

extend() adds MULTIPLE elements from another list:

fruits.extend(["date", "elderberry"])
print(fruits)  # ['apple', 'blueberry', 'banana', 'cherry', 'date', 'elderberry']
Common Mistake: Don't confuse append() with extend()!
list1 = [1, 2, 3]
list1.append([4, 5])   # Adds the list as ONE element
print(list1)           # [1, 2, 3, [4, 5]]

list2 = [1, 2, 3]
list2.extend([4, 5])   # Adds each element individually
print(list2)           # [1, 2, 3, 4, 5]
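
extend() accepts any iterable, not just a list, which leads to a related surprise with strings. A quick sketch (the tags list is an illustrative name, not part of the lesson's data):

```python
tags = ["python"]
tags.extend("data")      # A string is iterable, so each character is added
print(tags)              # ['python', 'd', 'a', 't', 'a']

tags = ["python"]
tags.append("data")      # append() adds the string as one element
print(tags)              # ['python', 'data']

tags.extend(("science", "lists"))  # A tuple works as well
print(tags)              # ['python', 'data', 'science', 'lists']
```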

Removing Elements

Python gives you several ways to remove elements, each suited for different situations.

remove() removes by VALUE (the first occurrence):

colors = ["red", "green", "blue", "green"]
colors.remove("green")  # Removes first "green"
print(colors)           # ['red', 'blue', 'green']

pop() removes by INDEX and returns the removed value:

fruits = ["apple", "banana", "cherry"]
removed = fruits.pop(1)  # Remove index 1
print(removed)           # 'banana'
print(fruits)            # ['apple', 'cherry']

Without an argument, pop() removes the last element:

last = fruits.pop()
print(last)    # 'cherry'
print(fruits)  # ['apple']

del deletes by index or slice (doesn't return the value):

numbers = [0, 1, 2, 3, 4, 5]
del numbers[0]     # Delete first element
print(numbers)     # [1, 2, 3, 4, 5]

del numbers[1:3]   # Delete a slice
print(numbers)     # [1, 4, 5]

clear() removes all elements:

numbers.clear()
print(numbers)  # []
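
remove() and pop() both raise an error when there is nothing to remove. Below is a hedged sketch of two simple ways to guard against that; the cart list is made up for illustration:

```python
cart = ["milk", "eggs"]

# remove() raises ValueError if the value is missing, so check first
if "bread" in cart:
    cart.remove("bread")
print(cart)  # ['milk', 'eggs'] - unchanged, no error

# pop() raises IndexError on an empty list, so check that it has items
cart.clear()
if cart:
    last = cart.pop()
else:
    last = None
print(last)  # None
```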

Searching and Counting

Finding items in a list is a common operation.

index() finds the position of the first occurrence:

fruits = ["apple", "banana", "cherry", "banana"]
print(fruits.index("banana"))  # 1 (first occurrence)
print(fruits.index("cherry"))  # 2

count() counts how many times an element appears:

print(fruits.count("banana"))  # 2
print(fruits.count("grape"))   # 0

in checks if an element exists (returns True/False):

print("banana" in fruits)      # True
print("grape" in fruits)       # False
print("grape" not in fruits)   # True
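
index() raises a ValueError when the item is not in the list, so it is often paired with an in check. A minimal sketch; the helper name find_position is hypothetical, not a built-in:

```python
def find_position(items, target):
    """Return the index of target in items, or -1 if it is absent."""
    if target in items:
        return items.index(target)
    return -1

fruits = ["apple", "banana", "cherry", "banana"]
print(find_position(fruits, "banana"))  # 1
print(find_position(fruits, "grape"))   # -1

# index() also accepts a start position to find later occurrences
print(fruits.index("banana", 2))        # 3
```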

Sorting and Reversing

Python offers two approaches to sorting: sort() modifies the original list (in-place), while sorted() creates a new sorted list leaving the original unchanged.

sort() sorts in place:

numbers = [3, 1, 4, 1, 5, 9, 2, 6]
numbers.sort()
print(numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]

numbers.sort(reverse=True)  # Descending order
print(numbers)  # [9, 6, 5, 4, 3, 2, 1, 1]

sorted() returns a NEW list, keeping the original:

original = [3, 1, 4, 1, 5]
sorted_list = sorted(original)
print(original)     # [3, 1, 4, 1, 5] - unchanged!
print(sorted_list)  # [1, 1, 3, 4, 5]

reverse() reverses the list in place:

letters = ['a', 'b', 'c', 'd']
letters.reverse()
print(letters)  # ['d', 'c', 'b', 'a']

You can sort by custom criteria using the key parameter:

words = ["elephant", "cat", "dog", "butterfly"]
words.sort(key=len)  # Sort by string length
print(words)  # ['cat', 'dog', 'elephant', 'butterfly']
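
A frequent slip with these two approaches: sort() returns None, so assigning its result loses the list. sorted() also accepts the same key and reverse parameters. A short sketch:

```python
numbers = [3, 1, 2]

result = numbers.sort()   # sort() works in place and returns None
print(result)             # None
print(numbers)            # [1, 2, 3]

# sorted() takes key and reverse too, and leaves the input untouched
words = ["elephant", "cat", "dog", "butterfly"]
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)      # ['butterfly', 'elephant', 'cat', 'dog']
print(words)              # ['elephant', 'cat', 'dog', 'butterfly']
```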

List Operations

Beyond methods, Python provides operators that work with lists. You can combine lists with +, repeat them with *, and use built-in functions like len(), min(), max(), and sum() to analyze them.

# Concatenation with +
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list1 + list2
print(combined)  # [1, 2, 3, 4, 5, 6]

# Repetition with *
repeated = [0] * 5
print(repeated)  # [0, 0, 0, 0, 0]

pattern = [1, 2] * 3
print(pattern)  # [1, 2, 1, 2, 1, 2]

# Length
my_list = [10, 20, 30, 40, 50]
print(len(my_list))  # 5

# Min, Max, Sum (for numeric lists)
numbers = [4, 2, 8, 1, 9, 3]
print(min(numbers))  # 1
print(max(numbers))  # 9
print(sum(numbers))  # 27

Copying Lists: A Common Pitfall

This is one of the trickiest concepts for beginners. When you assign a list to a new variable, you're not copying it - you're creating a reference to the same list. Both variables point to the exact same data in memory.

The Reference Problem

Watch what happens when we think we're making a copy but we're actually not:

original = [1, 2, 3]
not_a_copy = original  # Both point to the SAME list!

Now if we modify not_a_copy, we're actually modifying the original too:

not_a_copy.append(4)
print(original)     # [1, 2, 3, 4] - Original changed too!
print(not_a_copy)   # [1, 2, 3, 4]
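
You can confirm that two names refer to one list with the is operator, which tests whether both sides are the same object (== only compares contents). A small sketch:

```python
original = [1, 2, 3]
alias = original           # Same list, second name
separate = [1, 2, 3]       # Equal contents, different list

print(alias is original)      # True - one object, two names
print(separate is original)   # False - a distinct object
print(separate == original)   # True - but the contents are equal

alias.append(4)
print(original)   # [1, 2, 3, 4]
print(separate)   # [1, 2, 3]
```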

Shallow Copy Methods

To create an actual copy, use one of these three methods. All three create a shallow copy - a new list object with the same elements:

original = [1, 2, 3]

copy1 = original[:]       # Method 1: Slice notation
copy2 = list(original)    # Method 2: list() constructor  
copy3 = original.copy()   # Method 3: copy() method

Now modifying the copy doesn't affect the original:

copy1.append(4)
print(original)  # [1, 2, 3] - unchanged!
print(copy1)     # [1, 2, 3, 4]

Deep Copy for Nested Lists

Shallow copies have a limitation: if your list contains other lists (nested lists), the inner lists are still shared references. This is because a shallow copy copies only the outer list, not the objects inside it.

matrix = [[1, 2], [3, 4]]
shallow = matrix.copy()

shallow[0][0] = 99
print(matrix)   # [[99, 2], [3, 4]] - Original changed!

To copy everything including nested objects, use deepcopy from the copy module:

import copy

matrix = [[1, 2], [3, 4]]
deep = copy.deepcopy(matrix)

deep[0][0] = 99
print(matrix)  # [[1, 2], [3, 4]] - Original unchanged!
print(deep)    # [[99, 2], [3, 4]]
When to use which: Use shallow copy (.copy()) for simple lists with numbers, strings, or other immutable items. Use deepcopy() when your list contains other lists, dictionaries, or any mutable objects.
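
The same shared-reference behavior shows up when the * repetition operator is applied to a nested list: the inner list is repeated by reference, not copied. A hedged sketch of the pitfall and one simple workaround using a loop:

```python
# Pitfall: all three rows are the SAME inner list
grid = [[0, 0]] * 3
grid[0][0] = 1
print(grid)   # [[1, 0], [1, 0], [1, 0]]

# Workaround: build each row separately so the rows are independent
grid = []
for _ in range(3):
    grid.append([0, 0])
grid[0][0] = 1
print(grid)   # [[1, 0], [0, 0], [0, 0]]
```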

Real-World Data Science Example

Let's put everything together with a practical example. Imagine you're analyzing weather data - a common data science task. We'll build this step by step.

Step 1: Set Up the Data

First, we create two parallel lists - one for temperatures and one for day names. The index connects them: index 0 is Monday's temperature, index 1 is Tuesday's, and so on.

temperatures = [72, 75, 79, 81, 78, 74, 71]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

Step 2: Calculate Basic Statistics

Using built-in functions, we can quickly compute the average temperature:

avg_temp = sum(temperatures) / len(temperatures)
print(f"Average temperature: {avg_temp:.1f}F")  # 75.7F

Step 3: Find Extremes

To find the hottest and coldest days, we use max() and min() to get the temperatures, then .index() to find the position of each one, which we use to look up the matching day:

max_temp = max(temperatures)
min_temp = min(temperatures)

hottest_day = days[temperatures.index(max_temp)]
coldest_day = days[temperatures.index(min_temp)]

print(f"Hottest: {hottest_day} at {max_temp}F")  # Thu at 81F
print(f"Coldest: {coldest_day} at {min_temp}F")  # Sun at 71F

Step 4: Filter Data

Finally, we filter to find days above average. We use enumerate() to get both the index and value while looping:

above_avg = []
for i, temp in enumerate(temperatures):
    if temp > avg_temp:
        above_avg.append(f"{days[i]}: {temp}F")
        
print("Days above average:", above_avg)
# ['Wed: 79F', 'Thu: 81F', 'Fri: 78F']
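
Parallel lists rely on matching indices staying in sync. As an alternative sketch, the built-in zip() pairs the two lists item by item, which avoids indexing by hand; for this data it gives the same result as Step 4:

```python
temperatures = [72, 75, 79, 81, 78, 74, 71]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
avg_temp = sum(temperatures) / len(temperatures)

above_avg = []
for day, temp in zip(days, temperatures):  # Pairs up items by position
    if temp > avg_temp:
        above_avg.append(f"{day}: {temp}F")

print(above_avg)  # ['Wed: 79F', 'Thu: 81F', 'Fri: 78F']
```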

Practice Questions: Lists

Given:

prices = [29.99, 15.50, 42.00, 8.75, 31.25]

Task: Find the highest price in the list without using the max() function. Use a loop to iterate through the list.

Expected output: 42.0

Show Solution
prices = [29.99, 15.50, 42.00, 8.75, 31.25]

highest = prices[0]  # Start with first element
for price in prices:
    if price > highest:
        highest = price

print(highest)  # 42.0

Given:

colors = ["red", "green", "blue", "yellow"]

Task: Reverse the list without using .reverse() or reversed(). Output should be ["yellow", "blue", "green", "red"].

Hint: Use slice notation with a step.

Show Solution
colors = ["red", "green", "blue", "yellow"]

reversed_colors = colors[::-1]

print(reversed_colors)  # ['yellow', 'blue', 'green', 'red']

Given:

nums = [0, 1, 0, 3, 12, 0, 5]

Task: Move all zeros to the end while maintaining the order of non-zero elements. Output should be [1, 3, 12, 5, 0, 0, 0].

Hint: Create two separate lists using loops, then combine them.

Show Solution
nums = [0, 1, 0, 3, 12, 0, 5]

non_zeros = []
zeros = []

for x in nums:
    if x != 0:
        non_zeros.append(x)
    else:
        zeros.append(x)

result = non_zeros + zeros
print(result)  # [1, 3, 12, 5, 0, 0, 0]

Given:

nums = [3, 7, 1, 2, 8, 4, 5]  # One number from 1-8 is missing

Task: Find the missing number. The list contains numbers 1-8 but one is missing. Output should be 6.

Hint: Sum of 1 to n = n*(n+1)/2

Show Solution
nums = [3, 7, 1, 2, 8, 4, 5]

n = len(nums) + 1  # 8
expected_sum = n * (n + 1) // 2  # 36
actual_sum = sum(nums)  # 30
missing = expected_sum - actual_sum

print(missing)  # 6

Given:

items = [1, 2, 3, 4, 5, 6, 7]
k = 3  # Rotate right by 3 positions

Task: Rotate the list to the right by k positions. Output should be [5, 6, 7, 1, 2, 3, 4].

The last 3 elements move to the front.

Show Solution
items = [1, 2, 3, 4, 5, 6, 7]
k = 3

k = k % len(items)  # Handle k > len(items)
rotated = items[-k:] + items[:-k]

print(rotated)  # [5, 6, 7, 1, 2, 3, 4]
02

Tuples

Tuples are like lists, but with one crucial difference: they cannot be changed after creation. This immutability makes them perfect for data that should stay constant.

What is a Tuple?

Think of a tuple as a "frozen list." Once you create it, you can't add, remove, or change its elements. This might seem like a limitation, but it's actually a powerful feature. In programming, there are many situations where you want data to be unchangeable - like GPS coordinates, RGB color values, days of the week, or database records that should remain intact.

The word "tuple" comes from mathematics, where it describes a finite ordered sequence. You might have heard of "triple" (3 items) or "quadruple" (4 items) - "tuple" is the generic term for any number of items.

Data Structure

Tuple

A tuple is an ordered, immutable (unchangeable) collection of items enclosed in parentheses (). Once created, a tuple cannot be modified - you cannot add, remove, or change its elements.

This immutability makes tuples "hashable" (as long as every item inside them is hashable too), meaning they can be used as dictionary keys or stored in sets - something lists cannot do. Tuples are also slightly faster than lists and use a little less memory, because their size is fixed once they are created.

When to use tuples: Use tuples for fixed collections that shouldn't change - coordinates (x, y), RGB colors (r, g, b), database rows, function return values with multiple items, or any data you want to protect from accidental modification.

Creating Tuples

Creating a tuple is similar to creating a list, but you use parentheses () instead of square brackets. Interestingly, the parentheses are actually optional in most cases - it's the commas that define a tuple. However, using parentheses makes your code clearer and is recommended practice.

There's one important gotcha: to create a tuple with a single element, you must include a trailing comma. Without it, Python interprets the parentheses as just grouping, not as tuple creation.

# Creating tuples - use parentheses ()
coordinates = (10, 20)
rgb_color = (255, 128, 0)  # Orange color
person = ("Priya", 30, "Engineer")

print(coordinates)  # (10, 20)
print(rgb_color)    # (255, 128, 0)

# Actually, parentheses are optional!
another_tuple = 1, 2, 3
print(another_tuple)  # (1, 2, 3)
print(type(another_tuple))  # <class 'tuple'>

# Empty tuple
empty = ()
empty2 = tuple()

# IMPORTANT: Single-element tuple needs a comma!
not_a_tuple = (42)    # This is just the integer 42
print(type(not_a_tuple))  # <class 'int'>

actual_tuple = (42,)  # Note the comma!
print(type(actual_tuple))  # <class 'tuple'>

# Without parentheses, still need comma
also_tuple = 42,
print(type(also_tuple))  # <class 'tuple'>
Common Mistake: (42) is NOT a tuple - it's just 42 in parentheses. For a single-element tuple, you must include a trailing comma: (42,)

Accessing Tuple Elements

Good news! Accessing elements in a tuple works exactly like lists - you use the same indexing and slicing syntax you already learned. The zero-based indexing rules apply, negative indices work from the end, and slicing extracts portions of the tuple.

The only difference is that when you slice a tuple, you get back another tuple (not a list). And since tuples are immutable, any operation that would modify them will raise an error.

# Indexing - same as lists
fruits = ("apple", "banana", "cherry", "date")

print(fruits[0])   # 'apple'
print(fruits[-1])  # 'date'
print(fruits[1:3]) # ('banana', 'cherry') - returns a tuple!

# Length
print(len(fruits))  # 4

# Check membership
print("banana" in fruits)  # True
print("grape" in fruits)   # False

# Count and index
numbers = (1, 2, 2, 3, 2, 4)
print(numbers.count(2))  # 3
print(numbers.index(3))  # 3 (first occurrence at index 3)

Why Use Immutable Tuples?

You might wonder: "Why would I want a list I can't change?" This is a great question, and the answer reveals some important programming concepts. Immutability has several powerful advantages:

Data Protection

Prevents accidental changes to important data like configuration settings or constants.

Performance

Tuples are slightly faster than lists and use less memory due to their fixed size.

Dictionary Keys

Tuples can be used as dictionary keys (lists cannot) because they're "hashable."

Multiple Return Values

Functions can return multiple values as tuples, which is a common Python pattern.

# Attempting to modify a tuple raises an error
coordinates = (10, 20)
# coordinates[0] = 15  # TypeError: 'tuple' object does not support item assignment

# Tuples as dictionary keys (lists can't do this!)
locations = {
    (40.7128, -74.0060): "New York City",
    (34.0522, -118.2437): "Los Angeles",
    (51.5074, -0.1278): "London"
}
print(locations[(40.7128, -74.0060)])  # 'New York City'

# This would fail with a list:
# locations[[40.7128, -74.0060]] = "NYC"  # TypeError: unhashable type: 'list'
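
One caveat worth knowing: a tuple is only hashable if everything inside it is hashable, and immutability applies to the tuple itself, not to mutable objects it holds. A short sketch with made-up values:

```python
# A tuple that contains a list cannot be used as a dictionary key
bad_key = (1, [2, 3])
try:
    lookup = {bad_key: "value"}
except TypeError:
    print("bad_key cannot be a dictionary key")

# The tuple's slots are fixed, but a list stored in a slot can still change
record = ("Priya", [85, 90])
record[1].append(78)
print(record)   # ('Priya', [85, 90, 78])
```

If a tuple needs to serve as a key, keeping only immutable items inside it (numbers, strings, other such tuples) avoids this problem.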

Tuple Unpacking

One of Python's most elegant features is tuple unpacking - extracting multiple values from a tuple into separate variables in one line.

# Basic unpacking
coordinates = (10, 20, 30)
x, y, z = coordinates
print(x)  # 10
print(y)  # 20
print(z)  # 30

Unpacking works beautifully in loops:

# Unpacking in a loop
points = [(1, 2), (3, 4), (5, 6)]
for x, y in points:
    print(f"x={x}, y={y}")
# x=1, y=2
# x=3, y=4
# x=5, y=6

The classic Python trick for swapping variables:

# Swapping variables (classic Python trick!)
a = 10
b = 20
a, b = b, a  # Swap in one line!
print(a, b)  # 20 10

Use * for extended unpacking:

# Extended unpacking with * (Python 3+)
numbers = (1, 2, 3, 4, 5)
first, *middle, last = numbers
print(first)   # 1
print(middle)  # [2, 3, 4] - note: it's a list!
print(last)    # 5

Use underscore _ to ignore values:

# Ignore values with underscore
data = ("Priya", 30, "Engineer", "India")
name, age, _, country = data  # Ignore profession
print(f"{name}, {age}, from {country}")  # Priya, 30, from India
Pro Tip: Use underscore _ as a variable name when you want to ignore a value during unpacking. It signals to other programmers that this value is intentionally unused.
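
Unpacking requires the number of names to match the number of values, unless a starred name absorbs the rest. A brief sketch of both cases:

```python
point = (10, 20, 30)

# Too few names on the left raises ValueError
try:
    x, y = point
except ValueError as err:
    print("Unpacking failed:", err)

# A starred name collects whatever is left over (possibly nothing)
head, *rest = point
print(head)   # 10
print(rest)   # [20, 30]

only, *nothing = (1,)
print(nothing)  # []
```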

Named Tuples: Tuples with Field Names

Named tuples add field names to tuples, so you can refer to each element by name instead of by position.

With regular tuples, you access elements by index - which can be confusing:

# Regular tuple - hard to remember what each position means
person = ("Priya", 30, "Engineer")
print(person[0])  # What is index 0? Name? Age?

Named tuples let you access elements by name, making your code self-documenting:

# Named tuple - self-documenting!
from collections import namedtuple

# Define a named tuple type
Person = namedtuple('Person', ['name', 'age', 'job'])

# Create instances
priya = Person("Priya", 30, "Engineer")
rahul = Person(name="Rahul", age=25, job="Designer")

Access values by name (much clearer!), but they still work like regular tuples:

# Access by name (much clearer!)
print(priya.name)  # 'Priya'
print(priya.age)   # 30
print(rahul.job)   # 'Designer'

# Still works like a regular tuple
print(priya[0])    # 'Priya'
print(len(priya))  # 3

Named tuples are perfect for representing "records" or structured data:

# Useful for data that represents a "record"
Point = namedtuple('Point', ['x', 'y'])
p1 = Point(10, 20)
print(f"Point at ({p1.x}, {p1.y})")  # Point at (10, 20)
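
Named tuples stay immutable like regular tuples. The sketch below uses two helpers that namedtuple provides, _replace() and _asdict(); the leading underscore is part of their documented names and is there to avoid clashing with field names. The printed dictionary assumes Python 3.8 or later, where _asdict() returns a regular dict:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p1 = Point(10, 20)

# Assigning to a field fails, just like assigning to a tuple index
try:
    p1.x = 99
except AttributeError:
    print("Point fields cannot be reassigned")

# _replace() returns a NEW named tuple with some fields changed
p2 = p1._replace(x=99)
print(p1)   # Point(x=10, y=20)
print(p2)   # Point(x=99, y=20)

# _asdict() converts the fields to a dictionary (Python 3.8+)
print(p2._asdict())   # {'x': 99, 'y': 20}
```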

When to Use Tuples vs Lists

Use Case                            List       Tuple
Collection that changes             Yes        No
Fixed configuration/constants       No         Yes
Dictionary keys                     No         Yes
Function return values              Possible   Preferred
Homogeneous data (same type)        Common     Less common
Heterogeneous data (mixed types)    Possible   Common

# Example: Function returning multiple values
def get_statistics(numbers):
    """Calculate and return multiple statistics as a tuple."""
    total = sum(numbers)
    count = len(numbers)
    average = total / count
    minimum = min(numbers)
    maximum = max(numbers)
    return minimum, maximum, average  # Returns a tuple

# Unpack the returned tuple
data = [10, 20, 30, 40, 50]
min_val, max_val, avg_val = get_statistics(data)
print(f"Min: {min_val}, Max: {max_val}, Avg: {avg_val}")
# Min: 10, Max: 50, Avg: 30.0

Practice Questions: Tuples

Test your understanding with these specific challenges.

Given:

coordinates = (40.7128, -74.0060)

Task: Unpack the tuple into latitude and longitude variables, then print them.

Expected output:

Latitude: 40.7128
Longitude: -74.006
View Solution
coordinates = (40.7128, -74.0060)

latitude, longitude = coordinates

print(f"Latitude: {latitude}")
print(f"Longitude: {longitude}")

Given:

grades = (85, 90, 78, 90, 92, 88, 90)

Task: Find:

  • The index of the first occurrence of 90
  • How many times 90 appears in the tuple

Expected outputs: 1 and 3

View Solution
grades = (85, 90, 78, 90, 92, 88, 90)

first_90_index = grades.index(90)
count_90 = grades.count(90)

print(first_90_index)  # 1
print(count_90)        # 3

Task: Write a function get_stats(numbers) that takes a tuple of numbers and returns a tuple containing (minimum, maximum, average).

# Test:
data = (10, 25, 3, 47, 18)
result = get_stats(data)
print(result)  # Expected: (3, 47, 20.6)
View Solution
def get_stats(numbers):
    minimum = min(numbers)
    maximum = max(numbers)
    average = sum(numbers) / len(numbers)
    return (minimum, maximum, average)

data = (10, 25, 3, 47, 18)
result = get_stats(data)
print(result)  # (3, 47, 20.6)

# Unpack the result
min_val, max_val, avg_val = get_stats(data)
print(f"Min: {min_val}, Max: {max_val}, Avg: {avg_val}")

Given:

a = 100
b = 200

Task: Swap the values of a and b using tuple unpacking (in one line).

Expected: After swap, a = 200 and b = 100

View Solution
a = 100
b = 200

a, b = b, a

print(a)  # 200
print(b)  # 100

Task: Given a list of coordinate tuples, write a function that finds and returns the tuple with the maximum distance from the origin (0, 0). Use the distance formula: √(x² + y²)

# Given:
points = [(3, 4), (1, 1), (5, 12), (2, 3)]

# Expected result: (5, 12) because √(5² + 12²) = √169 = 13
View Solution
import math

def furthest_from_origin(points):
    max_distance = -1  # Lower than any real distance, so even (0, 0) can be chosen
    furthest_point = None
    
    for point in points:
        x, y = point  # Tuple unpacking
        distance = math.sqrt(x**2 + y**2)
        
        if distance > max_distance:
            max_distance = distance
            furthest_point = point
    
    return furthest_point

points = [(3, 4), (1, 1), (5, 12), (2, 3)]
result = furthest_from_origin(points)
print(result)  # (5, 12)

# Alternative using max() with key function
furthest = max(points, key=lambda p: p[0]**2 + p[1]**2)
print(furthest)  # (5, 12)
03

Dictionaries

Dictionaries store data as key-value pairs, like a real dictionary where you look up a word (key) to find its definition (value). They're incredibly fast for lookups!

What is a Dictionary?

Imagine a phone book: you look up a person's name (the key) to find their phone number (the value). Python dictionaries work exactly the same way - they map unique keys to values, allowing lightning-fast lookups regardless of how much data you have.

Unlike lists where you access items by their position (index 0, 1, 2...), dictionaries let you access items by meaningful names. Instead of remembering that the user's age is at index 2, you simply ask for user["age"]. This makes your code more readable and less prone to errors.

Dictionaries are one of Python's most used data structures, especially in data science. JSON data from APIs, configuration files, database records - they all map naturally to dictionaries. Master dictionaries and you'll handle real-world data with ease.

Data Structure

Dictionary

A dictionary is a collection of key-value pairs enclosed in curly braces {}. Each key maps to exactly one value, and you use the key to look up its associated value. As of Python 3.7+, dictionaries maintain the order in which items were added.

Keys must be unique (no duplicates allowed) and hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable items - but not lists. Values can be anything - strings, numbers, lists, other dictionaries, or any Python object.

Performance: Dictionary lookups are O(1) on average - meaning they're just as fast whether your dictionary has 10 items or 10 million. This makes dictionaries ideal for building fast lookup tables and caches.

Creating Dictionaries

You can create dictionaries using curly braces {} with colon-separated key-value pairs, or using the dict() constructor. The curly brace syntax is more common and often more readable.

# Creating dictionaries - use curly braces {}
person = {
    "name": "Priya",
    "age": 30,
    "city": "Mumbai"
}
print(person)  # {'name': 'Priya', 'age': 30, 'city': 'Mumbai'}

Keys can be any immutable type - strings, numbers, or tuples (but not lists):

# Keys can be strings, numbers, or tuples
mixed_keys = {
    "name": "Product A",
    123: "ID number",
    (1, 2): "coordinate key"
}

# Empty dictionary - two ways
empty1 = {}
empty2 = dict()

Alternative ways to create dictionaries:

# Using dict() constructor
person2 = dict(name="Rahul", age=25, city="Delhi")
print(person2)  # {'name': 'Rahul', 'age': 25, 'city': 'Delhi'}

# From list of tuples
pairs = [("a", 1), ("b", 2), ("c", 3)]
from_pairs = dict(pairs)
print(from_pairs)  # {'a': 1, 'b': 2, 'c': 3}
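
Because keys must be unique, repeating a key does not create a second entry; the later value replaces the earlier one. A small sketch of that rule, which also builds a dictionary from two lists with the built-in zip() (the stock, names, and ages values are illustrative):

```python
# A repeated key keeps only the last value
stock = {"apples": 3, "pears": 5, "apples": 10}
print(stock)        # {'apples': 10, 'pears': 5}
print(len(stock))   # 2

# Pair up two lists into key-value pairs
names = ["Priya", "Rahul"]
ages = [30, 25]
age_by_name = dict(zip(names, ages))
print(age_by_name)  # {'Priya': 30, 'Rahul': 25}
```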

Accessing Values

Unlike lists where you use numeric indices, dictionaries use keys to access values. There are two main ways to get values, and understanding when to use each will save you from debugging headaches.

The first method uses square brackets: dict[key]. This is direct and clean, but it raises a KeyError if the key doesn't exist. The second method uses dict.get(key), which returns None (or a default value you specify) if the key is missing - no error, no crash.

person = {"name": "Priya", "age": 30, "city": "Mumbai"}

# Method 1: Square brackets - raises error if key doesn't exist
print(person["name"])  # 'Priya'
print(person["age"])   # 30
# print(person["job"])  # KeyError: 'job'

The safer approach - get() returns None or a default value instead of crashing:

# Method 2: get() - returns None (or default) if key doesn't exist
print(person.get("name"))      # 'Priya'
print(person.get("job"))       # None (no error!)
print(person.get("job", "N/A"))  # 'N/A' (custom default)

Checking keys and viewing contents:

# Check if key exists
print("name" in person)  # True
print("job" in person)   # False

# Get all keys, values, or both
print(person.keys())    # dict_keys(['name', 'age', 'city'])
print(person.values())  # dict_values(['Priya', 30, 'Mumbai'])
print(person.items())   # dict_items([('name', 'Priya'), ('age', 30), ('city', 'Mumbai')])
Best Practice: Use dict.get(key) when the key might not exist. Use dict[key] when you're certain the key exists or want an error if it doesn't.
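
A common use of get() with a default is counting: start from 0 when a key has not been seen yet. A minimal sketch, with a made-up words list:

```python
words = ["red", "blue", "red", "green", "blue", "red"]

counts = {}
for word in words:
    # get() returns 0 the first time a word appears
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'red': 3, 'blue': 2, 'green': 1}
```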

Modifying Dictionaries

Dictionaries are mutable - you can add new key-value pairs, update existing values, or remove entries. The syntax is intuitive: just assign to a key (existing or new) using square brackets.

person = {"name": "Priya", "age": 30, "city": "Mumbai"}

# Adding a new key-value pair
person["job"] = "Engineer"
print(person)
# {'name': 'Priya', 'age': 30, 'city': 'Mumbai', 'job': 'Engineer'}

# Updating existing values
person["age"] = 31  # Happy birthday!
print(person["age"])  # 31

update() adds or updates multiple key-value pairs at once:

# update() - add/update multiple pairs at once
person.update({"age": 32, "salary": 100000, "hobby": "hiking"})
print(person)
# {'name': 'Priya', 'age': 32, 'city': 'Mumbai', 'job': 'Engineer', 
#  'salary': 100000, 'hobby': 'hiking'}

setdefault() sets a value only if the key doesn't already exist:

# setdefault() - set only if key doesn't exist
person.setdefault("country", "India")  # Added (key didn't exist)
person.setdefault("name", "Rahul")     # Not changed (key exists)
print(person["country"])  # 'India'
print(person["name"])     # 'Priya' (unchanged)

Removing Items

Python offers several ways to remove dictionary entries. pop() removes a specific key and returns its value:

person = {"name": "Priya", "age": 30, "city": "Mumbai", "job": "Engineer"}

# pop() - remove by key and return value
job = person.pop("job")
print(job)     # 'Engineer'
print(person)  # {'name': 'Priya', 'age': 30, 'city': 'Mumbai'}

# pop() with default (no error if key missing)
country = person.pop("country", "Unknown")
print(country)  # 'Unknown'

Use del to remove by key without returning the value:

# del - remove by key
del person["city"]
print(person)  # {'name': 'Priya', 'age': 30}

popitem() removes and returns the last inserted pair, and clear() empties the entire dictionary:

# popitem() - remove and return last inserted pair
person["job"] = "Data Scientist"
person["salary"] = 120000
last_item = person.popitem()
print(last_item)  # ('salary', 120000)

# clear() - remove all items
person.clear()
print(person)  # {}

Iterating Over Dictionaries

Looping through dictionaries is slightly different from lists. By default, iterating over a dictionary gives you its keys. To access values or key-value pairs, use the .values() and .items() methods respectively.

student = {"name": "Priya", "grade": "A", "score": 95}

# Loop through keys (default behavior)
for key in student:
    print(key)
# name
# grade
# score

Use .values() to iterate over values only:

# Loop through values
for value in student.values():
    print(value)
# Priya
# A
# 95

Use .items() for key-value pairs (most common):

# Loop through key-value pairs (most common)
for key, value in student.items():
    print(f"{key}: {value}")
# name: Priya
# grade: A
# score: 95

Practical example: formatting output:

# Practical example: formatting output
person = {"name": "Rahul", "age": 25, "city": "Delhi", "job": "Designer"}
print("Person Details:")
print("--------------------")
for key, value in person.items():
    print(f"{key.capitalize():10} : {value}")
# Person Details:
# --------------------
# Name       : Rahul
# Age        : 25
# City       : Delhi
# Job        : Designer

Nested Dictionaries

Dictionaries can contain other dictionaries as values, creating nested structures. This is extremely common when working with JSON data from APIs.

# Nested dictionary - like JSON data
company = {
    "name": "TechCorp",
    "founded": 2010,
    "employees": {
        "priya": {
            "title": "Engineer",
            "salary": 100000
        },
        "rahul": {
            "title": "Designer", 
            "salary": 90000
        }
    },
    "locations": ["Mumbai", "Bangalore", "Delhi"]
}

Accessing nested values using chained bracket notation:

# Accessing nested values
print(company["name"])                          # 'TechCorp'
print(company["employees"]["priya"]["title"])   # 'Engineer'
print(company["locations"][0])                  # 'Mumbai'

Modifying and adding to nested dictionaries:

# Modifying nested values
company["employees"]["priya"]["salary"] = 110000

# Adding to nested dict
company["employees"]["ankit"] = {"title": "Manager", "salary": 120000}

Safe access with chained .get() to avoid errors:

# Safe access with get() for nested dicts
# This avoids errors if keys don't exist
salary = company.get("employees", {}).get("vikram", {}).get("salary", "Not found")
print(salary)  # 'Not found'
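
To connect this with JSON, here is a small sketch of how JSON text turns into nested dictionaries and lists using the standard library's json module. The raw string is a made-up payload shaped like the company example, not data from a real API.

```python
import json

# Hypothetical JSON text, shaped like the company example above
raw = (
    '{"name": "TechCorp", '
    '"employees": {"priya": {"title": "Engineer", "salary": 100000}}, '
    '"locations": ["Mumbai", "Delhi"]}'
)

company_data = json.loads(raw)  # JSON objects become dicts, arrays become lists

print(type(company_data))                           # <class 'dict'>
print(company_data["employees"]["priya"]["title"])  # Engineer
print(company_data["locations"][1])                 # Delhi

# Chained get() works the same way on parsed JSON
missing = company_data.get("employees", {}).get("vikram", {}).get("salary", "Not found")
print(missing)  # Not found
```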

Quick Preview: Dictionary Comprehensions

Just like list comprehensions (covered in Section 05), you can create dictionaries in one line:

# Create dict from two lists
names = ["Priya", "Rahul", "Ankit"]
scores = [95, 87, 92]

# Traditional way
student_scores = {}
for i in range(len(names)):
    student_scores[names[i]] = scores[i]

# Dictionary comprehension (cleaner!)
student_scores = {name: score for name, score in zip(names, scores)}
print(student_scores)  # {'Priya': 95, 'Rahul': 87, 'Ankit': 92}

# With condition
passing = {name: score for name, score in zip(names, scores) if score >= 90}
print(passing)  # {'Priya': 95, 'Ankit': 92}

# Transform values
squared = {x: x**2 for x in range(1, 6)}
print(squared)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Real-World Data Science Example

Let's work with product data, which is common in e-commerce applications:

# Product data (list of dictionaries)
products = [
    {"id": 1, "name": "Laptop", "price": 999, "category": "Electronics", "stock": 50},
    {"id": 2, "name": "Phone", "price": 699, "category": "Electronics", "stock": 100},
    {"id": 3, "name": "Desk", "price": 299, "category": "Furniture", "stock": 30},
    {"id": 4, "name": "Chair", "price": 199, "category": "Furniture", "stock": 45},
    {"id": 5, "name": "Headphones", "price": 149, "category": "Electronics", "stock": 200}
]

Calculate total inventory value by looping through all products:

# Calculate total inventory value
total_value = 0
for product in products:
    total_value += product["price"] * product["stock"]
print(f"Total inventory value: ${total_value:,}")  # $167,575

Group products by category using dictionary building:

# Group products by category
by_category = {}
for product in products:
    category = product["category"]
    if category not in by_category:
        by_category[category] = []
    by_category[category].append(product["name"])

print(by_category)
# {'Electronics': ['Laptop', 'Phone', 'Headphones'], 'Furniture': ['Desk', 'Chair']}
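
The same grouping can be written more compactly with setdefault(), which was introduced earlier in this section. This is a sketch on a shortened copy of the data; the names sample_products and grouped are placeholders chosen so they don't overwrite the variables used above.

```python
# Shortened, illustrative copy of the product data
sample_products = [
    {"name": "Laptop", "category": "Electronics"},
    {"name": "Phone", "category": "Electronics"},
    {"name": "Desk", "category": "Furniture"},
]

grouped = {}
for product in sample_products:
    # setdefault() returns the existing list, or stores and returns a new empty one
    grouped.setdefault(product["category"], []).append(product["name"])

print(grouped)
# {'Electronics': ['Laptop', 'Phone'], 'Furniture': ['Desk']}
```

Both versions do the same work; setdefault() simply folds the "if key not in dict" check into one call.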

Find the most expensive product by iterating through the list:

# Find most expensive product
most_expensive = products[0]
for product in products:
    if product["price"] > most_expensive["price"]:
        most_expensive = product
        
print(f"Most expensive: {most_expensive['name']} (${most_expensive['price']})")
# Most expensive: Laptop ($999)
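
As an alternative to the manual loop, the built-in max() and min() functions accept a key argument that says which value to compare. The sketch below uses a shortened copy of the data (sample_products is a placeholder name) and a lambda, which is just a small unnamed function.

```python
# Shortened, illustrative copy of the product data
sample_products = [
    {"name": "Laptop", "price": 999},
    {"name": "Phone", "price": 699},
    {"name": "Desk", "price": 299},
]

# key tells max()/min() which value to compare for each dictionary
priciest = max(sample_products, key=lambda p: p["price"])
cheapest = min(sample_products, key=lambda p: p["price"])

print(priciest["name"])  # Laptop
print(cheapest["name"])  # Desk
```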

Create a price lookup dictionary for instant access:

# Create price lookup dict
price_lookup = {}
for product in products:
    price_lookup[product["name"]] = product["price"]
    
print(price_lookup["Phone"])  # 699 - instant lookup!

Practice Questions: Dictionaries

Test your understanding with these specific challenges.

Given:

student = {"name": "Priya", "age": 22, "grade": "A"}

Task: Print the student's name and grade.

Expected output:

Name: Priya
Grade: A
Solution:
student = {"name": "Priya", "age": 22, "grade": "A"}

print(f"Name: {student['name']}")
print(f"Grade: {student['grade']}")

Given:

product = {"name": "Laptop", "price": 999}

Task: Add a "stock" key with value 50, then update the price to 899.

Expected result:

{"name": "Laptop", "price": 899, "stock": 50}
Solution:
product = {"name": "Laptop", "price": 999}

product["stock"] = 50
product["price"] = 899

print(product)  # {'name': 'Laptop', 'price': 899, 'stock': 50}

Given:

user = {"name": "Rahul", "email": "rahul@example.com"}

Task: Use .get() to retrieve the "phone" key. If it doesn't exist, return "Not provided".

Expected output: Not provided

Solution:
user = {"name": "Rahul", "email": "rahul@example.com"}

phone = user.get("phone", "Not provided")
print(phone)  # Not provided

Given:

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

Task: Create a dictionary that counts how many times each word appears.

Expected result: {"apple": 3, "banana": 2, "cherry": 1}

Solution:
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'apple': 3, 'banana': 2, 'cherry': 1}
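
Counting items is common enough that the standard library includes a ready-made tool for it. As an optional alternative to the loop above, this sketch uses collections.Counter, a dict subclass built for tallying.

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

counts = Counter(words)        # counts each distinct item
print(counts["apple"])         # 3
print(counts.most_common(1))   # [('apple', 3)]
print(dict(counts))            # {'apple': 3, 'banana': 2, 'cherry': 1}
```

The manual get() loop is still worth knowing, since the same pattern works for any kind of running total, not just counts.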

Given:

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

Task: Merge both dictionaries. If a key exists in both, use the value from dict2.

Expected result: {"a": 1, "b": 3, "c": 4}

Solution:
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# Method 1: Using dictionary unpacking with ** (Python 3.5+)
merged = {**dict1, **dict2}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Method 2: Using update()
merged2 = dict1.copy()
merged2.update(dict2)
print(merged2)  # {'a': 1, 'b': 3, 'c': 4}

# Method 3: Using | operator (Python 3.9+)
merged3 = dict1 | dict2
print(merged3)  # {'a': 1, 'b': 3, 'c': 4}
04

Sets

Sets are unordered collections of unique elements. They're perfect for removing duplicates and performing mathematical set operations like union and intersection.

What is a Set?

Think of a set like a bag of unique marbles - you can't have duplicates, and the marbles have no particular order. If you try to add a marble you already have, the bag stays the same. Sets are inspired by mathematical sets and support powerful operations like union, intersection, and difference.

Sets are incredibly useful in data science for several reasons: removing duplicates from data (just convert to a set!), finding common elements between datasets, checking membership extremely fast, and performing operations like "customers who bought both products" or "users on both platforms."

One important distinction: sets are unordered, meaning items have no position. You can't access the "first" element because there is no first - items are stored based on their hash value for fast lookup. This also means sets don't support indexing.

Data Structure

Set

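A set is an unordered collection of unique elements enclosed in curly braces {}. The set itself is mutable, but every element must be hashable (in practice, immutable values such as numbers, strings, and tuples). Sets automatically remove duplicates and provide O(1) average-case membership testing - checking if an item is in a set is nearly instant regardless of size.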

Unlike lists and tuples, sets don't maintain order and don't support indexing. You can't do my_set[0]. However, sets excel at answering questions like "is this item in my collection?" and "what items do these collections have in common?"

When to use sets: Removing duplicates, fast membership testing, finding common/different elements between groups, ensuring uniqueness in data, and any situation where you care about "what" items exist rather than "where" they are or "how many" of each.
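
To make the "no indexing" point concrete, here is a short sketch of what happens when you try to index a set, along with one common workaround. The colors set is an illustrative example.

```python
colors = {"red", "green", "blue"}

try:
    colors[0]  # sets have no positions, so indexing fails
except TypeError as err:
    print(err)  # 'set' object is not subscriptable

# If you need positions, convert to a sorted list first
ordered = sorted(colors)
print(ordered[0])  # blue
```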

Creating Sets

You create sets using curly braces {} (same as dictionaries, but without the key:value pairs), or by using the set() constructor. One tricky thing: an empty {} creates a dictionary, not a set. For an empty set, you must use set().

# Creating sets - use curly braces {}
fruits = {"apple", "banana", "cherry"}
print(fruits)  # {'cherry', 'banana', 'apple'} - order may vary!

# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 3, 3, 4}
print(numbers)  # {1, 2, 3, 4}

Great for removing duplicates from a list:

# Create set from a list (great for removing duplicates!)
names_list = ["Priya", "Rahul", "Priya", "Ankit", "Rahul"]
unique_names = set(names_list)
print(unique_names)  # {'Priya', 'Rahul', 'Ankit'}
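
Converting to a set discards the original order. If the order of first appearance matters, one common alternative is dict.fromkeys(), which relies on dictionaries keeping insertion order (Python 3.7+). A minimal sketch:

```python
names_list = ["Priya", "Rahul", "Priya", "Ankit", "Rahul"]

# dict keys are unique and keep insertion order
unique_in_order = list(dict.fromkeys(names_list))
print(unique_in_order)  # ['Priya', 'Rahul', 'Ankit']
```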

Important: Empty set must use set(), not {}:

# Empty set - MUST use set(), not {}
empty_set = set()  # Correct
empty_dict = {}    # This is an empty dictionary, not a set!

print(type(empty_set))   # <class 'set'>
print(type(empty_dict))  # <class 'dict'>

Sets can only contain immutable (hashable) elements:

# Sets can only contain immutable (hashable) elements
valid_set = {1, "hello", (1, 2, 3)}  # OK: int, str, tuple
# invalid_set = {1, [2, 3]}  # TypeError: unhashable type: 'list' (lists are mutable)
Common Mistake: {} creates an empty dictionary, NOT an empty set. Use set() for an empty set.

Adding and Removing Elements

Sets are mutable - you can add and remove elements after creation. The methods are slightly different from lists because sets don't have positions. You add() a single item, update() with multiple items, and remove() or discard() to delete.

fruits = {"apple", "banana"}

# add() - add single element
fruits.add("cherry")
print(fruits)  # {'apple', 'banana', 'cherry'}

# Adding duplicate has no effect
fruits.add("apple")
print(fruits)  # {'apple', 'banana', 'cherry'} - still 3 items

Use update() to add multiple elements:

# update() - add multiple elements
fruits.update(["date", "elderberry"])
print(fruits)  # {'apple', 'banana', 'cherry', 'date', 'elderberry'}

Removing elements - two options with different behaviors:

# remove() - remove element (error if not found)
fruits.remove("date")
print(fruits)  # {'apple', 'banana', 'cherry', 'elderberry'}
# fruits.remove("grape")  # KeyError: 'grape'

# discard() - remove element (no error if not found)
fruits.discard("elderberry")
print(fruits)  # {'apple', 'banana', 'cherry'}
fruits.discard("grape")  # No error, nothing happens

Other removal methods:

# pop() - remove and return arbitrary element
popped = fruits.pop()
print(f"Removed: {popped}")  # Removed: (an arbitrary element - you can't predict which)

# clear() - remove all elements
fruits.clear()
print(fruits)  # set()

Mathematical Set Operations

This is where sets really shine! You can perform mathematical operations to compare and combine sets. These operations are fundamental in data analysis - finding customers in both groups, products unique to one category, or combining datasets while eliminating duplicates.

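Each operation has both a method form (set1.union(set2)) and an operator form (set1 | set2). With two sets they produce the same result, so use whichever reads more naturally. One difference: the methods accept any iterable (such as a list), while the operators require a set on both sides.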

  • Union - all elements from both sets: A | B or A.union(B)
  • Intersection - elements in BOTH sets: A & B or A.intersection(B)
  • Difference - elements in A but NOT in B: A - B or A.difference(B)
  • Symmetric Difference - elements in A OR B, but not both: A ^ B or A.symmetric_difference(B)

python_devs = {"Priya", "Rahul", "Ankit", "Neha"}
js_devs = {"Rahul", "Neha", "Vikram", "Sneha"}

Union - everyone who knows Python OR JavaScript (or both):

# Union
all_devs = python_devs | js_devs
print(all_devs)  # {'Priya', 'Rahul', 'Ankit', 'Neha', 'Vikram', 'Sneha'}

# Same thing with method
all_devs = python_devs.union(js_devs)

Intersection - developers who know BOTH Python AND JavaScript:

# Intersection
full_stack = python_devs & js_devs
print(full_stack)  # {'Rahul', 'Neha'}

Difference - Python devs who DON'T know JavaScript:

# Difference
python_only = python_devs - js_devs
print(python_only)  # {'Priya', 'Ankit'}

js_only = js_devs - python_devs
print(js_only)  # {'Vikram', 'Sneha'}

Symmetric difference - devs who know ONLY ONE language:

# Symmetric difference
one_language = python_devs ^ js_devs
print(one_language)  # {'Priya', 'Ankit', 'Vikram', 'Sneha'}
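
The method and operator forms differ in one practical way: the methods accept any iterable, while the operators need a set on both sides. The sketch below illustrates this with a list named js_list (a placeholder for data that has not been converted to a set yet).

```python
python_devs = {"Priya", "Rahul", "Ankit", "Neha"}
js_list = ["Rahul", "Neha", "Vikram", "Sneha"]  # a list, not a set

# Method form accepts any iterable
both = python_devs.intersection(js_list)
print(both == {"Rahul", "Neha"})  # True

# Operator form requires a set on both sides
try:
    python_devs & js_list
except TypeError as err:
    print("TypeError:", err)

# Convert explicitly if you prefer operators
print((python_devs & set(js_list)) == both)  # True
```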

Set Comparisons

Beyond combining sets, you can compare them using subset and superset operations. A set A is a subset of B if every element in A is also in B. Conversely, B is a superset of A. These comparisons are useful for checking containment relationships - like "do all required skills exist in a candidate's skillset?"

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {1, 2, 3}
D = {6, 7, 8}

# Subset - is A contained within B?
print(A.issubset(B))    # True - all of A's elements are in B
print(A <= B)           # True (operator version)
print(A < B)            # True (proper subset - A is subset but not equal)

# Superset - does B contain A?
print(B.issuperset(A))  # True - B contains all elements of A
print(B >= A)           # True
print(B > A)            # True (proper superset)

# Equality
print(A == C)           # True - same elements

# Disjoint - no common elements?
print(A.isdisjoint(D))  # True - A and D share no elements
print(A.isdisjoint(B))  # False - they share {1, 2, 3}
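
The "required skills" question mentioned above maps directly onto a subset test. A minimal sketch with made-up skill sets:

```python
required_skills = {"Python", "SQL"}
candidate_skills = {"Python", "SQL", "Git", "Statistics"}

# Subset test: does the candidate cover every required skill?
qualifies = required_skills <= candidate_skills
if qualifies:
    print("Candidate qualifies")
else:
    print("Missing:", required_skills - candidate_skills)
# Candidate qualifies
```

If the check fails, the set difference in the else branch shows exactly which skills are missing.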

Practical Uses of Sets

# 1. Remove duplicates from a list
emails = ["a@test.com", "b@test.com", "a@test.com", "c@test.com", "b@test.com"]
unique_emails = list(set(emails))
print(unique_emails)  # ['a@test.com', 'b@test.com', 'c@test.com'] - order may vary!

# 2. Fast membership testing (much faster than lists!)
valid_users = {"priya", "rahul", "ankit", "neha"}

username = "rahul"
if username in valid_users:  # O(1) average time!
    print("Access granted")
else:
    print("Access denied")

# 3. Find common elements between lists
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
common = set(list1) & set(list2)
print(common)  # {4, 5}

# 4. Combine all distinct elements from both lists
all_elements = set(list1) | set(list2)
print(all_elements)  # {1, 2, 3, 4, 5, 6, 7, 8}

# 5. Check for duplicates
def has_duplicates(lst):
    return len(lst) != len(set(lst))

print(has_duplicates([1, 2, 3, 4]))     # False
print(has_duplicates([1, 2, 2, 3]))     # True
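
To see the membership-testing difference for yourself, you can time both containers with the standard timeit module. This is a rough sketch, not a rigorous benchmark: the collection size and repeat count are arbitrary, and the exact numbers depend on your machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Run each membership test 100 times and compare the total time
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```

The list has to scan its elements one by one, while the set jumps to the right slot using the element's hash, so the gap typically widens as the collection grows.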

Frozen Sets: Immutable Sets

Just as tuples are an immutable counterpart to lists, frozen sets are an immutable counterpart to sets. Because a frozen set never changes, it is hashable, so it can be used as a dictionary key or as an element of another set.

# Creating a frozen set
frozen = frozenset([1, 2, 3, 4])
print(frozen)  # frozenset({1, 2, 3, 4})

# Can't modify frozen sets
# frozen.add(5)  # AttributeError: 'frozenset' object has no attribute 'add'

# But can do set operations (returns new frozenset)
other = frozenset([3, 4, 5, 6])
print(frozen | other)  # frozenset({1, 2, 3, 4, 5, 6})
print(frozen & other)  # frozenset({3, 4})

# Use as dictionary key
cache = {
    frozenset(["read", "write"]): "full_access",
    frozenset(["read"]): "read_only"
}
permissions = frozenset(["read", "write"])
print(cache[permissions])  # 'full_access'
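
The other use mentioned above, frozen sets as elements of another set, can be sketched like this. The teams data is illustrative.

```python
# A set of teams, where each team is itself an immutable set of members
teams = {
    frozenset(["priya", "rahul"]),
    frozenset(["ankit", "neha"]),
}

# Element order inside a frozenset does not matter for equality
print(frozenset(["rahul", "priya"]) in teams)  # True

# A regular set cannot be an element because it is mutable (unhashable)
try:
    {{"priya", "rahul"}}
except TypeError as err:
    print(err)  # unhashable type: 'set'
```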

Real-World Data Science Example

# Analyzing user engagement across platforms
instagram_users = {"priya", "rahul", "ankit", "neha", "vikram"}
twitter_users = {"rahul", "ankit", "sneha", "meera"}
linkedin_users = {"priya", "ankit", "neha", "amit", "karan"}

# Users on ALL platforms
power_users = instagram_users & twitter_users & linkedin_users
print(f"Power users (all platforms): {power_users}")  # {'ankit'}

# Users on ANY platform
all_users = instagram_users | twitter_users | linkedin_users
print(f"Total unique users: {len(all_users)}")  # 9

# Users ONLY on Instagram
instagram_exclusive = instagram_users - twitter_users - linkedin_users
print(f"Instagram only: {instagram_exclusive}")  # {'vikram'}

# Users on exactly 2 platforms
on_insta_twitter = instagram_users & twitter_users
on_insta_linkedin = instagram_users & linkedin_users
on_twitter_linkedin = twitter_users & linkedin_users
two_platforms = (on_insta_twitter | on_insta_linkedin | on_twitter_linkedin) - power_users
print(f"Users on exactly 2 platforms: {two_platforms}")  # {'rahul', 'priya', 'neha'} (order may vary)

# Calculate platform overlap percentages
def overlap_percentage(set1, set2):
    if not set1 or not set2:
        return 0
    overlap = len(set1 & set2)
    return (overlap / min(len(set1), len(set2))) * 100

print(f"Instagram-Twitter overlap: {overlap_percentage(instagram_users, twitter_users):.1f}%")  # Instagram-Twitter overlap: 50.0%

Practice Questions: Sets

Test your understanding of sets with these specific problems:

Problem: Given a list with duplicate values, return a sorted list of unique values using sets.

Given: numbers = [4, 2, 7, 2, 1, 4, 7, 8, 1, 3]

Expected Output: [1, 2, 3, 4, 7, 8]

Solution:
numbers = [4, 2, 7, 2, 1, 4, 7, 8, 1, 3]

# Convert to set (removes duplicates), then to sorted list
unique_sorted = sorted(set(numbers))

print(unique_sorted)  # [1, 2, 3, 4, 7, 8]

Problem: Given two sets, calculate union, intersection, and difference.

Given:

A = {1, 2, 3, 4, 5}

B = {4, 5, 6, 7, 8}

Expected Outputs:

  • Union (A | B): {1, 2, 3, 4, 5, 6, 7, 8}
  • Intersection (A & B): {4, 5}
  • Difference (A - B): {1, 2, 3}
  • Symmetric Difference (A ^ B): {1, 2, 3, 6, 7, 8}
Solution:
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# Union - all elements from both sets
print(A | B)  # {1, 2, 3, 4, 5, 6, 7, 8}

# Intersection - elements in both sets
print(A & B)  # {4, 5}

# Difference - elements in A but not in B
print(A - B)  # {1, 2, 3}

# Symmetric Difference - elements in either, but not both
print(A ^ B)  # {1, 2, 3, 6, 7, 8}

Problem: Find elements that appear in ALL three lists.

Given:

list1 = [1, 2, 3, 4, 5]

list2 = [3, 4, 5, 6, 7]

list3 = [4, 5, 6, 8, 9]

Expected Output: {4, 5}

Solution:
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
list3 = [4, 5, 6, 8, 9]

# Convert to sets and find intersection
common = set(list1) & set(list2) & set(list3)

print(common)  # {4, 5}

Problem: Determine subset/superset relationships between sets.

Given:

small = {1, 2, 3}

medium = {1, 2, 3, 4, 5}

other = {1, 2, 6}

Expected Outputs:

  • Is small subset of medium? True
  • Is medium superset of small? True
  • Is small subset of other? False
  • Are small and other disjoint? False (they share 1, 2)
Solution:
small = {1, 2, 3}
medium = {1, 2, 3, 4, 5}
other = {1, 2, 6}

# Subset check (all elements in small are in medium)
print(small.issubset(medium))  # True
print(small <= medium)  # True (alternative syntax)

# Superset check (medium contains all of small)
print(medium.issuperset(small))  # True
print(medium >= small)  # True (alternative syntax)

# Is small a subset of other?
print(small.issubset(other))  # False (3 is not in other)

# Are sets disjoint (no common elements)?
print(small.isdisjoint(other))  # False (1, 2 are common)

Problem: Given a required set and an actual set, find what's missing and what's extra.

Given:

required = {"Python", "SQL", "Statistics", "ML", "Visualization"}

actual = {"Python", "SQL", "JavaScript", "Git", "ML"}

Expected Outputs:

  • Missing (in required but not actual): {"Statistics", "Visualization"}
  • Extra (in actual but not required): {"JavaScript", "Git"}
  • Match count: 3 out of 5
Solution:
required = {"Python", "SQL", "Statistics", "ML", "Visualization"}
actual = {"Python", "SQL", "JavaScript", "Git", "ML"}

# Missing elements (in required but not in actual)
missing = required - actual
print(f"Missing: {missing}")  # {'Statistics', 'Visualization'}

# Extra elements (in actual but not in required)
extra = actual - required
print(f"Extra: {extra}")  # {'JavaScript', 'Git'}

# Matching elements
matching = required & actual
print(f"Matching: {matching}")  # {'Python', 'SQL', 'ML'}
print(f"Match count: {len(matching)} out of {len(required)}")  # Match count: 3 out of 5
05

Comprehensions

Comprehensions are Python's elegant way to create and transform data structures in a single, readable line. Master these and your code will become both shorter and clearer!

What are Comprehensions?

Imagine you have a list of temperatures in Celsius and need to convert them all to Fahrenheit. The traditional approach requires creating an empty list, writing a loop, doing the calculation, and appending each result. That's 4 lines of code for something conceptually simple.

Comprehensions let you express that same logic in a single, readable line. They're one of Python's most beloved features and are considered "Pythonic" - meaning they represent the elegant, concise style that experienced Python developers prefer.

At first, comprehensions might look strange, but once you get used to reading them, they're actually clearer than loops because the intent is immediately visible: you're creating a new collection by transforming each item from another collection.

Python Feature

Comprehension

A comprehension is a compact syntax for creating new collections by applying an expression to each item in an iterable, optionally filtering with conditions. Think of it as a one-line loop that builds a list, dictionary, or set.

The general pattern is: [what_to_do for item in collection if condition]. Read it as "for each item in the collection, if the condition is true, do this to it."

Why comprehensions matter: They're not just about writing less code. A list comprehension is often somewhat faster than the equivalent for loop with append(), because the interpreter builds the list directly instead of looking up and calling a method on every iteration. They're also the idiomatic way to transform data in Python - you'll see them everywhere in professional code.

Traditional Loop vs Comprehension

Let's see the same task done both ways. The comprehension version is not only shorter but also makes the intent clearer - "I'm creating a list of squares."

# Traditional way - 4 lines
squares = []
for x in range(5):
    squares.append(x ** 2)
print(squares)  # [0, 1, 4, 9, 16]

# List comprehension - 1 line!
squares = [x ** 2 for x in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# Much cleaner and more "Pythonic"!

List Comprehensions

The basic syntax is: [expression for item in iterable]. Read this as "create a list containing expression for each item in the iterable." The expression can be any valid Python expression - a calculation, a method call, or even just the item itself.

# Basic syntax: [expression for item in iterable]

# Example 1: Square numbers
squares = [x ** 2 for x in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Example 2: Convert temperatures
celsius = [0, 10, 20, 30, 40]
fahrenheit = [(c * 9/5) + 32 for c in celsius]
print(fahrenheit)  # [32.0, 50.0, 68.0, 86.0, 104.0]

You can also call methods on each item. Here we're calling .upper() on each name:

# Example 3: String operations
names = ["priya", "rahul", "ankit"]
capitalized = [name.upper() for name in names]
print(capitalized)  # ['PRIYA', 'RAHUL', 'ANKIT']

Comprehensions work great with dictionaries too - extract specific fields from a list of dicts:

# Example 4: Extract data from dicts
people = [
    {"name": "Priya", "age": 30},
    {"name": "Rahul", "age": 25},
    {"name": "Ankit", "age": 35}
]
names = [person["name"] for person in people]
print(names)  # ['Priya', 'Rahul', 'Ankit']

Adding Conditions (Filtering)

You can add an if clause to filter which items get included. The syntax becomes: [expression for item in iterable if condition]. Only items where the condition is True will appear in the result.

This is incredibly powerful for data cleaning and filtering - extracting only the rows you want, removing invalid data, or selecting items that meet certain criteria.

# Syntax: [expression for item in iterable if condition]

# Example 1: Even numbers only
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# Example 2: Filter by length
words = ["apple", "hi", "banana", "go", "cherry"]
long_words = [w for w in words if len(w) > 3]
print(long_words)  # ['apple', 'banana', 'cherry']

A filter can test a single condition, or combine several using and and or:

# Example 3: Filter positive numbers
numbers = [-5, 3, -2, 8, -1, 7, -4]
positive = [n for n in numbers if n > 0]
print(positive)  # [3, 8, 7]

# Example 4: Multiple conditions
numbers = range(100)
# Divisible by both 3 AND 5
div_3_and_5 = [n for n in numbers if n % 3 == 0 and n % 5 == 0]
print(div_3_and_5)  # [0, 15, 30, 45, 60, 75, 90]

This is especially powerful when filtering data structures like lists of dictionaries:

# Example 5: Filter from list of dicts
products = [
    {"name": "Laptop", "price": 999, "in_stock": True},
    {"name": "Phone", "price": 699, "in_stock": False},
    {"name": "Tablet", "price": 399, "in_stock": True}
]
available = [p["name"] for p in products if p["in_stock"]]
print(available)  # ['Laptop', 'Tablet']

If-Else in Comprehensions

When you want to transform values differently based on a condition, put the if-else before the for: [expr1 if condition else expr2 for item in iterable]

# Syntax: [value_if_true if condition else value_if_false for item in iterable]

# Example 1: Classify as even/odd
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']

# Example 2: Cap values at maximum
numbers = [5, 15, 25, 35, 45]
capped = [n if n <= 20 else 20 for n in numbers]
print(capped)  # [5, 15, 20, 20, 20]

Replace negative values with zero:

# Example 3: Replace negative with zero
values = [10, -5, 20, -15, 30]
non_negative = [v if v >= 0 else 0 for v in values]
print(non_negative)  # [10, 0, 20, 0, 30]

Classify values into categories:

# Example 4: Grade classification
scores = [45, 72, 88, 55, 95, 60]
grades = ["Pass" if s >= 60 else "Fail" for s in scores]
print(grades)  # ['Fail', 'Pass', 'Pass', 'Fail', 'Pass', 'Pass']

Handle missing data by providing defaults:

# Example 5: Handle missing data
data = [10, None, 20, None, 30]
cleaned = [x if x is not None else 0 for x in data]
print(cleaned)  # [10, 0, 20, 0, 30]
Remember the difference:
- [x for x in items if condition] - filters items (if at END)
- [a if condition else b for x in items] - transforms items (if-else at START)
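
Both forms can appear in the same comprehension: the trailing if decides which items are kept, and the leading if-else decides what each kept item becomes. A minimal sketch with made-up values:

```python
values = [10, -5, None, 20, -15, None, 30]

# Filter (if at the END) drops None entries first;
# if-else (at the START) then replaces negatives with 0
cleaned = [v if v >= 0 else 0 for v in values if v is not None]
print(cleaned)  # [10, 0, 20, 0, 30]
```

The filter runs before the expression, so the comparison v >= 0 is never attempted on None.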

Nested Comprehensions

You can have multiple for clauses to work with nested data or create combinations.

Flattening a 2D list into a single list:

# Flatten a 2D list
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
flat = [num for row in matrix for num in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Equivalent traditional loop:
flat = []
for row in matrix:
    for num in row:
        flat.append(num)

Creating combinations of multiple lists:

# Create all combinations
colors = ["red", "blue"]
sizes = ["S", "M", "L"]
combinations = [(color, size) for color in colors for size in sizes]
print(combinations)
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

Creating nested structures like a multiplication table:

# Create multiplication table
table = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(table)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

Transposing a matrix (swap rows and columns):

# Transpose a matrix (swap rows and columns)
matrix = [
    [1, 2, 3],
    [4, 5, 6]
]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
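
Transposing can also be written with the built-in zip() and argument unpacking, which some readers find easier to follow than index arithmetic. This is an alternative sketch that produces the same result as the nested comprehension above:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6]
]

# zip(*matrix) groups the i-th items of every row into tuples
transposed = [list(column) for column in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```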

Dictionary Comprehensions

Same concept, but creates dictionaries: {key: value for item in iterable}

# Basic syntax: {key_expr: value_expr for item in iterable}

# Example 1: Square mapping
squares = {x: x**2 for x in range(6)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Create a dictionary from two lists using zip():

# Example 2: From two lists
names = ["Priya", "Rahul", "Ankit"]
scores = [95, 87, 92]
student_scores = {name: score for name, score in zip(names, scores)}
print(student_scores)  # {'Priya': 95, 'Rahul': 87, 'Ankit': 92}

A common use case is swapping keys and values:

# Example 3: Swap keys and values
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}
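
One caveat when swapping: if two keys share the same value, the later one silently overwrites the earlier one, because dictionary keys must be unique. The sketch below shows the problem and one possible way around it; the grades data is illustrative.

```python
grades = {"Priya": "A", "Rahul": "B", "Ankit": "A"}

# Naive swap: later keys overwrite earlier ones that share a value
swapped = {grade: name for name, grade in grades.items()}
print(swapped)  # {'A': 'Ankit', 'B': 'Rahul'}

# Keep every name by collecting them into lists
by_grade = {}
for name, grade in grades.items():
    by_grade.setdefault(grade, []).append(name)
print(by_grade)  # {'A': ['Priya', 'Ankit'], 'B': ['Rahul']}
```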

Filtering while creating a dictionary:

# Example 4: With filtering
scores = {"Priya": 95, "Rahul": 67, "Ankit": 82, "Neha": 45}
passing = {name: score for name, score in scores.items() if score >= 70}
print(passing)  # {'Priya': 95, 'Ankit': 82}

Transform values while keeping keys:

# Example 5: Transform values
prices = {"apple": 1.0, "banana": 0.5, "orange": 0.75}
discounted = {item: price * 0.9 for item, price in prices.items()}
print(discounted)  # {'apple': 0.9, 'banana': 0.45, 'orange': 0.675}

Count character frequency in a string:

# Example 6: Count character frequency
text = "hello world"
char_count = {char: text.count(char) for char in set(text)}
print(char_count)  # {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1} - key order may vary!

Set Comprehensions

Creates a set instead of a list: {expression for item in iterable}

# Basic syntax: {expression for item in iterable}

# Example 1: Unique squares
squares = {x**2 for x in range(-3, 4)}
print(squares)  # {0, 1, 4, 9} - duplicates removed!

# Example 2: Unique first letters
names = ["Priya", "Rahul", "Pooja", "Ankit", "Amit"]
first_letters = {name[0] for name in names}
print(first_letters)  # {'P', 'R', 'A'}

# Example 3: With filtering
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
even_unique = {n for n in numbers if n % 2 == 0}
print(even_unique)  # {2, 4}

# Example 4: Extract unique categories
products = [
    {"name": "Laptop", "category": "Electronics"},
    {"name": "Phone", "category": "Electronics"},
    {"name": "Desk", "category": "Furniture"},
    {"name": "Chair", "category": "Furniture"}
]
categories = {p["category"] for p in products}
print(categories)  # {'Electronics', 'Furniture'} (order may vary)
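
Sets have no guaranteed order, so printing one can show its elements in a different sequence from run to run. When a predictable order matters (for display or for comparing results), wrapping the set in sorted() is one simple option. A short sketch reusing the first-letters example:

```python
names = ["Priya", "Rahul", "Pooja", "Ankit", "Amit"]
first_letters = {name[0] for name in names}

# sorted() returns a list, so the order is predictable
print(sorted(first_letters))  # ['A', 'P', 'R']
```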

When NOT to Use Comprehensions

Comprehensions are powerful, but they can make code harder to read when overused. Here are guidelines:

Use Comprehensions When:
  • The logic is simple and fits on one line
  • You're creating a new collection from existing data
  • The transformation is straightforward
  • Readability is not sacrificed
Avoid Comprehensions When:
  • The logic requires multiple statements
  • You need complex nested conditions
  • The line becomes too long (>80 chars)
  • Side effects are needed (printing, etc.)
# GOOD - simple and readable
squares = [x**2 for x in range(10)]

# BAD - too complex, use a regular loop instead
# result = [x.strip().lower().replace(' ', '_') for x in data if x and len(x.strip()) > 3 and not x.startswith('#')]

# BETTER - break it down
data = ["  First Name ", "# comment", "", "Age"]  # sample input for illustration
result = []
for x in data:
    if x and len(x.strip()) > 3 and not x.startswith('#'):
        cleaned = x.strip().lower().replace(' ', '_')
        result.append(cleaned)
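
A middle ground is to move the complicated logic into small named functions and keep the comprehension itself short. The sketch below is one way to do that; is_valid, normalize, and the sample data are hypothetical names chosen for illustration.

```python
def is_valid(line):
    """Keep non-empty lines longer than 3 characters (after trimming) that are not comments."""
    return bool(line) and len(line.strip()) > 3 and not line.startswith("#")


def normalize(line):
    """Trim, lowercase, and replace spaces with underscores."""
    return line.strip().lower().replace(" ", "_")


data = ["  First Name ", "# comment", "", "Age", "Home City"]  # hypothetical sample input

# The comprehension now reads almost like a sentence
result = [normalize(x) for x in data if is_valid(x)]
print(result)  # ['first_name', 'home_city']
```

The named functions can also be tested on their own, which is harder to do with conditions buried inside a long one-liner.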

Real-World Data Science Examples

# Data cleaning with comprehensions
raw_data = ["  Priya  ", "RAHUL", " ankit ", "", "  Neha", None]

# Clean and normalize names
clean_names = [name.strip().title() for name in raw_data if name and name.strip()]
print(clean_names)  # ['Priya', 'Rahul', 'Ankit', 'Neha']

# Convert list of dicts to specific format
api_response = [
    {"userId": 1, "name": "Priya", "email": "priya@test.com", "active": True},
    {"userId": 2, "name": "Rahul", "email": "rahul@test.com", "active": False},
    {"userId": 3, "name": "Ankit", "email": "ankit@test.com", "active": True}
]

# Extract active user emails
active_emails = [user["email"] for user in api_response if user["active"]]
print(active_emails)  # ['priya@test.com', 'ankit@test.com']

# Create lookup dictionary
user_lookup = {user["userId"]: user["name"] for user in api_response}
print(user_lookup)  # {1: 'Priya', 2: 'Rahul', 3: 'Ankit'}

# Calculate statistics from data
sales = [
    {"product": "A", "amount": 100},
    {"product": "B", "amount": 200},
    {"product": "A", "amount": 150},
    {"product": "B", "amount": 300},
    {"product": "C", "amount": 250}
]

# Total by product (single pass with defaultdict)
from collections import defaultdict
totals = defaultdict(int)
for sale in sales:
    totals[sale["product"]] += sale["amount"]

# Or with a comprehension (advanced) - shorter, but rescans sales once per product
products = {s["product"] for s in sales}
totals = {p: sum(s["amount"] for s in sales if s["product"] == p) for p in products}
print(totals)  # {'A': 250, 'B': 500, 'C': 250} (key order may vary)

Practice Questions: Comprehensions

Test your understanding of comprehensions with these specific problems:

Problem: Create a list comprehension that generates squares of even numbers from 1 to 10.

Given: Numbers 1 through 10

Expected Output: [4, 16, 36, 64, 100]

Solution:
# Square only the even numbers from 1-10
result = [x**2 for x in range(1, 11) if x % 2 == 0]

print(result)  # [4, 16, 36, 64, 100]

# Breaking it down:
# Even numbers: 2, 4, 6, 8, 10
# Their squares: 4, 16, 36, 64, 100
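
An alternative sketch: instead of filtering with an if clause, range() can generate only the even numbers by using its step argument, which skips the odd values entirely.

```python
# range(start, stop, step): start at 2, stop before 11, step by 2
result = [x**2 for x in range(2, 11, 2)]
print(result)  # [4, 16, 36, 64, 100]
```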

Problem: Extract the first character from each word and combine into a string (acronym).

Given: words = ["Application", "Programming", "Interface"]

Expected Output: "API"

Solution:
words = ["Application", "Programming", "Interface"]

# Get first character of each word
first_chars = [word[0] for word in words]
print(first_chars)  # ['A', 'P', 'I']

# Join into acronym
acronym = "".join(first_chars)
print(acronym)  # "API"

# Or in one line
acronym = "".join([word[0] for word in words])
print(acronym)  # "API"
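
When the result is passed straight to a function such as join(), the square brackets can be dropped. Python then treats the expression as a generator expression, which feeds items to the function one at a time instead of building a temporary list first. A minimal sketch with the same words:

```python
words = ["Application", "Programming", "Interface"]

# No square brackets: join() consumes the generator expression directly
acronym = "".join(word[0] for word in words)
print(acronym)  # API

# The same pattern works with other functions that accept an iterable
total_length = sum(len(word) for word in words)
print(total_length)  # 31
```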

Problem: From a list of words, extract only words longer than 4 characters and convert them to uppercase.

Given: words = ["cat", "elephant", "dog", "butterfly", "ant", "tiger"]

Expected Output: ["ELEPHANT", "BUTTERFLY", "TIGER"]

Solution:
words = ["cat", "elephant", "dog", "butterfly", "ant", "tiger"]

# Filter (len > 4) and transform (uppercase)
result = [word.upper() for word in words if len(word) > 4]

print(result)  # ['ELEPHANT', 'BUTTERFLY', 'TIGER']

# Long form equivalent:
result = []
for word in words:
    if len(word) > 4:
        result.append(word.upper())

Problem: Swap keys and values in a dictionary using a dictionary comprehension.

Given: original = {"a": 1, "b": 2, "c": 3}

Expected Output: {1: "a", 2: "b", 3: "c"}

Solution:
original = {"a": 1, "b": 2, "c": 3}

# Swap keys and values
swapped = {value: key for key, value in original.items()}

print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Note: values must be hashable to become keys; if two keys share a value, only the last pair survives

Problem: Flatten a 2D matrix into a 1D list using a nested list comprehension.

Given:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

Expected Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Solution:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# Nested comprehension to flatten
flattened = [num for row in matrix for num in row]

print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Read as: "for each row in matrix, for each num in row, take num"

# Equivalent loop:
flattened = []
for row in matrix:
    for num in row:
        flattened.append(num)
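
For comparison, the standard library offers itertools.chain.from_iterable, which flattens one level of nesting without a comprehension. This is an alternative sketch, not a replacement for the exercise's intended solution:

```python
from itertools import chain

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

# chain.from_iterable yields the items of each row in turn
flattened = list(chain.from_iterable(matrix))
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```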

Key Takeaways

Lists are Mutable

Use lists when you need an ordered collection that can be modified - add, remove, or change elements freely.

Tuples are Immutable

Use tuples for fixed data like coordinates, RGB colors, or database records that shouldn't change.

Dictionaries for Lookup

Use dictionaries when you need fast access to data by a unique key - O(1) average lookup time!

Sets for Uniqueness

Use sets when you need to eliminate duplicates or perform set operations like union and intersection.

Comprehensions are Powerful

List, dict, and set comprehensions build new collections in a single expression, which is often more readable and usually a little faster than an equivalent loop.

Choose Wisely

Each data structure has strengths. Lists for order, dicts for mapping, sets for uniqueness, tuples for immutability.

Knowledge Check

Quick Quiz

Test what you've learned about Python data structures

1. Which data structure would you use to store a collection of unique email addresses?
2. What is the output of len({'a': 1, 'b': 2, 'a': 3})?
3. Which of the following creates a tuple with one element?
4. What does [x**2 for x in range(3)] produce?
5. How do you access the value associated with key 'name' in a dictionary person?
6. What is the result of {1, 2, 3} & {2, 3, 4}?