Subqueries: Queries Within Queries
Subqueries are powerful SQL constructs that allow you to nest one query inside another, enabling complex filtering, calculations, and data comparisons. They're the foundation for sophisticated data analysis and dynamic reporting.
Understanding Subqueries
A subquery (also called an inner query or nested query) is a SELECT statement embedded within another SQL statement. Subqueries can appear in WHERE clauses, FROM clauses, SELECT lists, and even HAVING clauses. They execute first, and their results are used by the outer query.
Subqueries enable you to answer questions like: "Show customers who spent more than the average," "Find products that have never been ordered," or "List employees who earn more than anyone in the sales department."
Subquery Types
SQL supports three main types of subqueries: Scalar subqueries return a single value (one row, one column), Row subqueries return a single row with multiple columns, and Table subqueries return multiple rows and columns. Each type serves different purposes in query construction.
Why it matters: Choosing the right subquery type is crucial for query correctness. Using a multi-row subquery with = (expects single value) causes errors. Understanding these types helps you write efficient, error-free SQL for complex analytical tasks.
Scalar Subqueries (Single Value)
Scalar subqueries return exactly one value—a single row with a single column. They can be used anywhere a single value is expected: in WHERE clauses, SELECT lists, or with comparison operators.
-- Find products priced above average
SELECT
product_id,
product_name,
price,
(SELECT AVG(price) FROM products) AS avg_price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price DESC;
-- Find customers who spent more than average
SELECT
customer_id,
name,
total_spent
FROM (
SELECT
c.customer_id,
c.name,
SUM(o.order_total) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
) AS customer_totals
WHERE total_spent > (
SELECT AVG(customer_total)
FROM (
SELECT SUM(order_total) AS customer_total
FROM orders
GROUP BY customer_id
) AS averages
);
-- Compare employee salary to department average
SELECT
employee_id,
name,
department,
salary,
(SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg_salary,
salary - (SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department) AS diff_from_avg
FROM employees e1
ORDER BY department, salary DESC;
Table Subqueries (Multiple Rows)
Table subqueries return multiple rows and can be used with operators like IN, NOT IN, EXISTS, NOT EXISTS, ANY, or ALL. They're perfect for filtering based on membership in a set of values.
-- Find products that have never been ordered
SELECT product_id, product_name, price
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
);
-- Find customers who ordered in the last 30 days
SELECT customer_id, name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
);
-- Find employees earning more than ANY sales rep
SELECT name, department, salary
FROM employees
WHERE salary > ANY (
SELECT salary
FROM employees
WHERE department = 'Sales'
)
AND department != 'Sales';
-- Find products more expensive than ALL products in 'Books' category
SELECT product_name, category, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Books'
);
Correlated Subqueries
Correlated subqueries reference columns from the outer query, executing once for each row in the outer query. They're powerful but can be performance-intensive on large datasets:
-- Find products with above-average price in their category
SELECT
p1.product_id,
p1.product_name,
p1.category,
p1.price,
(SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category) AS category_avg
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category
);
-- Find customers with orders above their own average
SELECT DISTINCT
c.customer_id,
c.name,
o.order_id,
o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > (
SELECT AVG(order_total)
FROM orders o2
WHERE o2.customer_id = c.customer_id
);
-- EXISTS: Check if customer has any orders
SELECT customer_id, name, email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2025-01-01'
);
Subqueries in FROM Clause (Derived Tables)
Using subqueries in the FROM clause creates temporary result sets (derived tables) that you can query like regular tables. This pattern is useful for multi-step calculations:
-- Calculate percentage of total sales by category
SELECT
category,
category_revenue,
total_revenue,
ROUND(category_revenue * 100.0 / total_revenue, 2) AS percent_of_total
FROM (
SELECT
p.category,
SUM(oi.price * oi.quantity) AS category_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.category
) AS category_sales
CROSS JOIN (
SELECT SUM(price * quantity) AS total_revenue
FROM order_items
) AS total_sales
ORDER BY percent_of_total DESC;
-- Find top 5 customers per region
SELECT *
FROM (
SELECT
c.region,
c.customer_id,
c.name,
SUM(o.order_total) AS total_spent,
ROW_NUMBER() OVER (PARTITION BY c.region ORDER BY SUM(o.order_total) DESC) AS rank_in_region
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.region, c.customer_id, c.name
) AS ranked_customers
WHERE rank_in_region <= 5
ORDER BY region, rank_in_region;
Practice Questions: Subqueries
Test your subquery skills with these challenges.
Task: Write a query to show products where price exceeds the average price in their category.
Show Solution
SELECT
p1.product_id,
p1.product_name,
p1.category,
p1.price,
(SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category) AS category_avg
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category
)
ORDER BY p1.category, p1.price DESC;
Task: List customers who have NOT placed any orders in the last 90 days. Use NOT EXISTS or NOT IN.
Show Solution
-- Solution 1: Using NOT EXISTS (more efficient)
SELECT customer_id, name, email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
);
-- Solution 2: Using NOT IN
SELECT customer_id, name, email
FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
);
Task: Show monthly revenue for 2025 with comparison to previous month. Calculate absolute and percentage growth.
Show Solution
SELECT
year,
month,
monthly_revenue,
prev_month_revenue,
monthly_revenue - prev_month_revenue AS revenue_growth,
ROUND(
(monthly_revenue - prev_month_revenue) * 100.0 / prev_month_revenue,
2
) AS growth_percent
FROM (
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
SUM(order_total) AS monthly_revenue,
(SELECT SUM(order_total)
FROM orders o2
WHERE YEAR(o2.order_date) = YEAR(o1.order_date)
AND MONTH(o2.order_date) = MONTH(o1.order_date) - 1
) AS prev_month_revenue
FROM orders o1
WHERE YEAR(order_date) = 2025
GROUP BY YEAR(order_date), MONTH(order_date)
) AS monthly_comparison
ORDER BY year, month;
Common Table Expressions (CTEs)
Common Table Expressions (CTEs) provide a cleaner, more readable alternative to subqueries. Using the WITH clause, CTEs let you define temporary named result sets that exist only for the duration of a query, making complex SQL much easier to write, understand, and maintain.
Understanding CTEs
A CTE is like a temporary view that exists only within the execution scope of a single query. Think of it as giving a name to a subquery result, which you can then reference multiple times in your main query. CTEs are especially valuable for breaking down complex queries into logical, readable steps.
-- Basic CTE syntax
WITH cte_name AS (
SELECT column1, column2, ...
FROM table
WHERE conditions
)
SELECT *
FROM cte_name
WHERE additional_conditions;
-- Real example: Calculate customer lifetime value
WITH customer_totals AS (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS lifetime_value,
AVG(order_total) AS avg_order_value,
MIN(order_date) AS first_order_date,
MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id
)
SELECT
c.customer_id,
c.name,
c.email,
ct.order_count,
ct.lifetime_value,
ct.avg_order_value,
DATEDIFF(ct.last_order_date, ct.first_order_date) AS customer_lifespan_days
FROM customers c
JOIN customer_totals ct ON c.customer_id = ct.customer_id
WHERE ct.order_count >= 5
ORDER BY ct.lifetime_value DESC;
CTE Benefits
CTEs make complex queries readable by breaking them into named, logical steps. Unlike subqueries that must be repeated for each use, a CTE is defined once and can be referenced multiple times. They support recursion for hierarchical data and improve query maintainability significantly.
Why it matters: Professional SQL code prioritizes readability and maintainability. CTEs transform tangled nested subqueries into clear, step-by-step logic that colleagues can understand and modify. This is crucial for collaborative data teams and long-term code maintenance.
Multiple CTEs in One Query
You can define multiple CTEs in a single query, each building on previous ones or working independently. This creates a pipeline of data transformations:
-- Multi-step customer segmentation analysis
WITH
-- Step 1: Calculate customer metrics
customer_metrics AS (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent,
AVG(order_total) AS avg_order_value,
MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id
),
-- Step 2: Assign customer segments
customer_segments AS (
SELECT
customer_id,
order_count,
total_spent,
avg_order_value,
last_order_date,
CASE
WHEN total_spent >= 5000 THEN 'VIP'
WHEN total_spent >= 1000 THEN 'Gold'
WHEN total_spent >= 500 THEN 'Silver'
ELSE 'Bronze'
END AS segment,
CASE
WHEN last_order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY) THEN 'Active'
WHEN last_order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY) THEN 'At Risk'
ELSE 'Inactive'
END AS activity_status
FROM customer_metrics
),
-- Step 3: Calculate segment statistics
segment_stats AS (
SELECT
segment,
activity_status,
COUNT(*) AS customer_count,
AVG(total_spent) AS avg_lifetime_value,
AVG(avg_order_value) AS avg_order_size
FROM customer_segments
GROUP BY segment, activity_status
)
-- Final: Show comprehensive segment analysis
SELECT
segment,
activity_status,
customer_count,
ROUND(avg_lifetime_value, 2) AS avg_lifetime_value,
ROUND(avg_order_size, 2) AS avg_order_size,
ROUND(customer_count * 100.0 / SUM(customer_count) OVER (), 2) AS percent_of_total
FROM segment_stats
ORDER BY
FIELD(segment, 'VIP', 'Gold', 'Silver', 'Bronze'),
FIELD(activity_status, 'Active', 'At Risk', 'Inactive');
CTEs vs Subqueries: When to Use Each
While CTEs and subqueries are functionally similar, each has optimal use cases:
| Scenario | Use CTEs When... | Use Subqueries When... |
|---|---|---|
| Readability | Query has multiple steps or complex logic that benefits from named intermediate results | Query is simple with a single nested operation |
| Reusability | Need to reference the same result set multiple times in one query | Result is only needed once in one specific place |
| Recursion | Working with hierarchical data (org charts, categories, etc.) | Recursion is not needed—CTEs are the only SQL recursion option |
| Debugging | Need to test/verify intermediate results step-by-step | Logic is straightforward and doesn't need stepwise verification |
Recursive CTEs for Hierarchical Data
Recursive CTEs can reference themselves, making them perfect for traversing hierarchical structures like organizational charts, product categories, or threaded comments:
-- Recursive CTE: Employee hierarchy (org chart)
WITH RECURSIVE employee_hierarchy AS (
-- Base case: Top-level managers (no supervisor)
SELECT
employee_id,
name,
manager_id,
title,
1 AS level,
CAST(name AS CHAR(1000)) AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Employees reporting to previous level
SELECT
e.employee_id,
e.name,
e.manager_id,
e.title,
eh.level + 1,
CONCAT(eh.path, ' > ', e.name)
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
CONCAT(REPEAT(' ', level - 1), name) AS name_indented,
title,
level,
path
FROM employee_hierarchy
ORDER BY path;
-- Recursive CTE: Category tree with parent-child relationships
WITH RECURSIVE category_tree AS (
-- Base: Root categories
SELECT
category_id,
category_name,
parent_category_id,
1 AS depth,
category_name AS full_path
FROM categories
WHERE parent_category_id IS NULL
UNION ALL
-- Recursive: Child categories
SELECT
c.category_id,
c.category_name,
c.parent_category_id,
ct.depth + 1,
CONCAT(ct.full_path, ' > ', c.category_name)
FROM categories c
INNER JOIN category_tree ct ON c.parent_category_id = ct.category_id
)
SELECT * FROM category_tree
ORDER BY full_path;
Practical Example: Sales Performance Dashboard
Let's build a comprehensive sales report using multiple CTEs to calculate various metrics:
WITH
-- CTE 1: Monthly sales by product
monthly_product_sales AS (
SELECT
p.product_id,
p.product_name,
p.category,
YEAR(o.order_date) AS year,
MONTH(o.order_date) AS month,
SUM(oi.quantity) AS units_sold,
SUM(oi.price * oi.quantity) AS revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= '2025-01-01'
GROUP BY p.product_id, p.product_name, p.category, YEAR(o.order_date), MONTH(o.order_date)
),
-- CTE 2: Calculate product rankings
product_rankings AS (
SELECT
product_id,
product_name,
category,
year,
month,
revenue,
RANK() OVER (PARTITION BY category, year, month ORDER BY revenue DESC) AS rank_in_category
FROM monthly_product_sales
),
-- CTE 3: Identify top performers
top_products AS (
SELECT *
FROM product_rankings
WHERE rank_in_category <= 3
)
-- Final query: Show top 3 products per category per month
SELECT
year,
month,
category,
rank_in_category,
product_name,
revenue,
ROUND(revenue * 100.0 / SUM(revenue) OVER (PARTITION BY category, year, month), 2) AS percent_of_category
FROM top_products
ORDER BY year, month, category, rank_in_category;
Practice Questions: CTEs
Master CTEs with these practical exercises.
Task: Rewrite this subquery-based query using a CTE:
SELECT * FROM customers WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_date >= '2025-01-01');
Show Solution
WITH recent_customers AS (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2025-01-01'
)
SELECT c.*
FROM customers c
INNER JOIN recent_customers rc ON c.customer_id = rc.customer_id;
Task: Create a CTE-based query to calculate RFM (Recency, Frequency, Monetary) scores for customers.
Show Solution
WITH customer_rfm AS (
SELECT
customer_id,
DATEDIFF(CURDATE(), MAX(order_date)) AS recency_days,
COUNT(*) AS frequency,
SUM(order_total) AS monetary_value
FROM orders
GROUP BY customer_id
),
rfm_scores AS (
SELECT
customer_id,
recency_days,
frequency,
monetary_value,
CASE
WHEN recency_days <= 30 THEN 5
WHEN recency_days <= 60 THEN 4
WHEN recency_days <= 90 THEN 3
WHEN recency_days <= 180 THEN 2
ELSE 1
END AS recency_score,
CASE
WHEN frequency >= 20 THEN 5
WHEN frequency >= 10 THEN 4
WHEN frequency >= 5 THEN 3
WHEN frequency >= 2 THEN 2
ELSE 1
END AS frequency_score,
CASE
WHEN monetary_value >= 5000 THEN 5
WHEN monetary_value >= 2000 THEN 4
WHEN monetary_value >= 1000 THEN 3
WHEN monetary_value >= 500 THEN 2
ELSE 1
END AS monetary_score
FROM customer_rfm
)
SELECT
c.customer_id,
c.name,
rs.recency_days,
rs.frequency,
rs.monetary_value,
rs.recency_score,
rs.frequency_score,
rs.monetary_score,
(rs.recency_score + rs.frequency_score + rs.monetary_score) AS total_rfm_score
FROM customers c
JOIN rfm_scores rs ON c.customer_id = rs.customer_id
ORDER BY total_rfm_score DESC;
Task: Write a recursive CTE to show complete category hierarchy with product counts at each level.
Show Solution
WITH RECURSIVE category_hierarchy AS (
-- Base: Root categories
SELECT
category_id,
category_name,
parent_category_id,
1 AS level,
CAST(category_name AS CHAR(500)) AS full_path
FROM categories
WHERE parent_category_id IS NULL
UNION ALL
-- Recursive: Child categories
SELECT
c.category_id,
c.category_name,
c.parent_category_id,
ch.level + 1,
CONCAT(ch.full_path, ' > ', c.category_name)
FROM categories c
INNER JOIN category_hierarchy ch ON c.parent_category_id = ch.category_id
)
SELECT
ch.category_id,
CONCAT(REPEAT(' ', ch.level - 1), ch.category_name) AS category_display,
ch.level,
ch.full_path,
COUNT(p.product_id) AS product_count,
ROUND(AVG(p.price), 2) AS avg_price
FROM category_hierarchy ch
LEFT JOIN products p ON ch.category_id = p.category_id
GROUP BY ch.category_id, ch.category_name, ch.level, ch.full_path
ORDER BY ch.full_path;
Window Functions
Window functions are game-changers for analytics. Unlike aggregate functions that collapse rows into groups, window functions perform calculations across sets of rows while preserving individual row identity. This enables rankings, running totals, moving averages, and comparisons between current and previous rows without complex self-joins.
Window Function Basics
A window function performs a calculation across a "window" of rows that are related to the current row. The syntax includes the function name followed by OVER(), which defines the window specification. You can partition data into groups (PARTITION BY) and order rows within partitions (ORDER BY).
Window Function Syntax
Window functions follow the pattern: function_name() OVER (PARTITION BY column1 ORDER BY column2). The PARTITION BY clause divides rows into groups (similar to GROUP BY), and ORDER BY determines the sequence within each partition. The key difference from GROUP BY is that window functions don't collapse rows.
Key Insight: Window functions let you compare each row to aggregate values without losing row-level detail. This is perfect for calculating percentages of totals, ranking within categories, or finding differences from group averages.
-- Basic window function structure
SELECT
column1,
column2,
aggregate_function() OVER (
PARTITION BY partition_column
ORDER BY order_column
ROWS BETWEEN start AND end -- Optional frame clause
) AS window_result
FROM table_name;
ROW_NUMBER() - Sequential Numbering
ROW_NUMBER() assigns a unique sequential integer to each row within a partition, starting from 1. It's the most straightforward ranking function and perfect when you need a simple ordinal position or want to select the top N rows per group.
-- Number orders for each customer chronologically
SELECT
customer_id,
order_id,
order_date,
order_total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS order_number
FROM orders;
-- Result shows 1st, 2nd, 3rd order for each customer
-- Get first order for each customer
SELECT * FROM (
SELECT
customer_id,
order_id,
order_date,
order_total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS rn
FROM orders
) AS ranked
WHERE rn = 1;
-- Top 3 products by sales in each category
SELECT * FROM (
SELECT
category_id,
product_id,
product_name,
total_sales,
ROW_NUMBER() OVER (
PARTITION BY category_id
ORDER BY total_sales DESC
) AS sales_rank
FROM product_sales
) AS ranked
WHERE sales_rank <= 3;
RANK() and DENSE_RANK() - Handling Ties
RANK() and DENSE_RANK() both handle tied values, but differently. RANK() leaves gaps in numbering after ties, while DENSE_RANK() doesn't skip numbers. Choose RANK() when gaps matter (like sports rankings) and DENSE_RANK() when you want consecutive numbers despite ties.
-- Compare RANK vs DENSE_RANK vs ROW_NUMBER
SELECT
employee_id,
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- If two people tie at #3, RANK shows 3,3,5 while DENSE_RANK shows 3,3,4
-- Rank products by profit margin within each category
SELECT
category_name,
product_name,
price,
cost,
(price - cost) / price * 100 AS profit_margin_pct,
RANK() OVER (
PARTITION BY category_name
ORDER BY (price - cost) / price DESC
) AS profitability_rank
FROM products p
JOIN categories c ON p.category_id = c.category_id
WHERE price > 0;
-- Find departments where employee ranks in top 3 by salary
SELECT
department,
employee_id,
name,
salary,
salary_rank
FROM (
SELECT
department,
employee_id,
name,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
) AS ranked
WHERE salary_rank <= 3
ORDER BY department, salary_rank;
LAG() and LEAD() - Accessing Adjacent Rows
LAG() accesses data from a previous row, while LEAD() accesses data from a following row within the same partition. These functions are invaluable for time-series analysis, calculating differences between consecutive records, and comparing current values to historical data.
-- Calculate month-over-month revenue growth
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(order_total) AS monthly_revenue,
LAG(SUM(order_total)) OVER (ORDER BY
EXTRACT(YEAR FROM order_date),
EXTRACT(MONTH FROM order_date)
) AS prev_month_revenue,
SUM(order_total) - LAG(SUM(order_total)) OVER (ORDER BY
EXTRACT(YEAR FROM order_date),
EXTRACT(MONTH FROM order_date)
) AS revenue_change,
ROUND(
(SUM(order_total) - LAG(SUM(order_total)) OVER (ORDER BY
EXTRACT(YEAR FROM order_date),
EXTRACT(MONTH FROM order_date)
)) / LAG(SUM(order_total)) OVER (ORDER BY
EXTRACT(YEAR FROM order_date),
EXTRACT(MONTH FROM order_date)
) * 100, 2
) AS growth_pct
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY year, month;
-- Track customer purchase patterns
SELECT
customer_id,
order_id,
order_date,
order_total,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS previous_order_date,
order_date - LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS days_since_last_order,
LEAD(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS next_order_date
FROM orders
ORDER BY customer_id, order_date;
-- Compare current price to previous and next price changes
SELECT
product_id,
effective_date,
price,
LAG(price, 1, 0) OVER (
PARTITION BY product_id
ORDER BY effective_date
) AS previous_price,
LEAD(price, 1, price) OVER (
PARTITION BY product_id
ORDER BY effective_date
) AS next_price,
price - LAG(price, 1, 0) OVER (
PARTITION BY product_id
ORDER BY effective_date
) AS price_increase
FROM price_history;
Aggregate Window Functions - Running Totals & Moving Averages
Standard aggregate functions (SUM, AVG, COUNT, MIN, MAX) become window functions when paired with OVER(). The ROWS or RANGE clause defines the window frame - which rows to include in the calculation relative to the current row.
-- Running total of sales
SELECT
order_date,
order_id,
order_total,
SUM(order_total) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
AVG(order_total) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg
FROM orders
ORDER BY order_date;
-- Calculate percentage of total sales by product
SELECT
product_id,
product_name,
category,
sales_amount,
SUM(sales_amount) OVER () AS total_sales,
ROUND(sales_amount * 100.0 / SUM(sales_amount) OVER (), 2) AS pct_of_total,
SUM(sales_amount) OVER (PARTITION BY category) AS category_total,
ROUND(sales_amount * 100.0 / SUM(sales_amount) OVER (PARTITION BY category), 2) AS pct_of_category
FROM product_sales
ORDER BY category, sales_amount DESC;
-- Moving average for trend analysis
SELECT
sale_date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_moving_avg,
AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS thirty_day_moving_avg
FROM daily_sales
ORDER BY sale_date;
Frame Clauses
| Frame Type | Meaning |
|---|---|
ROWS BETWEEN |
Physical row positions |
RANGE BETWEEN |
Logical value ranges |
UNBOUNDED PRECEDING |
Start of partition |
CURRENT ROW |
Current row position |
UNBOUNDED FOLLOWING |
End of partition |
N PRECEDING |
N rows before current |
N FOLLOWING |
N rows after current |
Common Window Functions
| Function | Purpose |
|---|---|
ROW_NUMBER() |
Unique sequential numbers |
RANK() |
Ranking with gaps |
DENSE_RANK() |
Ranking without gaps |
LAG() |
Access previous row |
LEAD() |
Access next row |
NTILE(n) |
Divide into n groups |
FIRST_VALUE() |
First value in window |
LAST_VALUE() |
Last value in window |
Practice: Window Functions
Task: Write a query to rank employees by salary (highest to lowest) within each department using DENSE_RANK().
Show Solution
SELECT
department,
employee_id,
employee_name,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
ORDER BY department, salary_rank;
Task: For the daily_sales table, calculate the running total and 7-day moving average of revenue.
Show Solution
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_moving_avg,
ROUND(
daily_revenue * 100.0 / SUM(daily_revenue) OVER (),
2
) AS pct_of_total_sales
FROM daily_sales
ORDER BY sale_date;
Task: Calculate monthly revenue and year-over-year growth percentage using LAG().
Show Solution
WITH monthly_revenue AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(order_total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
)
SELECT
year,
month,
revenue AS current_revenue,
LAG(revenue, 12) OVER (
PARTITION BY month
ORDER BY year
) AS prev_year_revenue,
ROUND(
(revenue - LAG(revenue, 12) OVER (
PARTITION BY month
ORDER BY year
)) * 100.0 / LAG(revenue, 12) OVER (
PARTITION BY month
ORDER BY year
),
2
) AS yoy_growth_pct
FROM monthly_revenue
ORDER BY year, month;
Date and String Manipulation
Clean, transform, and analyze text and temporal data with SQL's powerful date and string functions. Master techniques to extract date components, perform date arithmetic, format timestamps, manipulate text, and clean messy string data for robust analytics.
Date Functions - Extracting Components
Working with dates requires extracting specific components like year, month, day, or quarter. The EXTRACT() function is SQL standard and works across most databases. You can pull out year, month, day, hour, minute, second, day of week, and more.
-- Extract date parts
SELECT
order_date,
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
EXTRACT(DAY FROM order_date) AS day,
EXTRACT(QUARTER FROM order_date) AS quarter,
EXTRACT(DOW FROM order_date) AS day_of_week, -- 0=Sunday
EXTRACT(WEEK FROM order_date) AS week_number
FROM orders;
-- Aggregate sales by year and quarter
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(QUARTER FROM order_date) AS quarter,
COUNT(*) AS order_count,
SUM(order_total) AS total_revenue,
AVG(order_total) AS avg_order_value
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(QUARTER FROM order_date)
ORDER BY year, quarter;
-- Find orders placed on weekends
SELECT
order_id,
order_date,
CASE EXTRACT(DOW FROM order_date)
WHEN 0 THEN 'Sunday'
WHEN 6 THEN 'Saturday'
END AS weekend_day,
order_total
FROM orders
WHERE EXTRACT(DOW FROM order_date) IN (0, 6)
ORDER BY order_date;
Date Arithmetic and Intervals
SQL allows you to add or subtract time intervals from dates. This is essential for calculating deadlines, finding records within date ranges, or generating time-based reports. Use INTERVAL for standard date math operations.
-- Add/subtract time intervals
SELECT
order_date,
order_date + INTERVAL '30 days' AS due_date,
order_date - INTERVAL '7 days' AS week_ago,
order_date + INTERVAL '3 months' AS three_months_later,
CURRENT_DATE - order_date AS days_since_order
FROM orders;
-- Find customers who haven't ordered in 90 days
SELECT
c.customer_id,
c.name,
MAX(o.order_date) AS last_order_date,
CURRENT_DATE - MAX(o.order_date) AS days_since_last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING CURRENT_DATE - MAX(o.order_date) > 90
OR MAX(o.order_date) IS NULL
ORDER BY days_since_last_order DESC;
-- Calculate age from birthdate
SELECT
employee_id,
name,
birth_date,
EXTRACT(YEAR FROM AGE(CURRENT_DATE, birth_date)) AS age,
AGE(CURRENT_DATE, birth_date) AS exact_age
FROM employees;
Common Date Operations
DATE_TRUNC truncates a timestamp to a specified precision (year, month, day, hour). AGE calculates the interval between two dates. NOW() returns the current timestamp, while CURRENT_DATE returns just the date without time.
Pro Tip: DATE_TRUNC is perfect for grouping by time periods: DATE_TRUNC('month', timestamp) groups all timestamps in the same month together, ideal for time-series analysis and reporting.
Date Formatting and Conversion
Format dates for display or convert strings to dates using TO_CHAR() and TO_DATE(). TO_CHAR() converts dates to formatted strings, while TO_DATE() parses string dates into proper date types using format patterns.
-- Format dates for display
SELECT
order_date,
TO_CHAR(order_date, 'YYYY-MM-DD') AS iso_format,
TO_CHAR(order_date, 'Month DD, YYYY') AS long_format,
TO_CHAR(order_date, 'MM/DD/YY') AS short_format,
TO_CHAR(order_date, 'Day, Mon DD YYYY') AS full_format,
TO_CHAR(order_date, 'HH24:MI:SS') AS time_only
FROM orders;
-- Convert strings to dates
SELECT
TO_DATE('2026-01-15', 'YYYY-MM-DD') AS date1,
TO_DATE('01/15/2026', 'MM/DD/YYYY') AS date2,
TO_DATE('15-Jan-2026', 'DD-Mon-YYYY') AS date3;
-- Parse date strings in queries
SELECT *
FROM orders
WHERE order_date >= TO_DATE('2025-01-01', 'YYYY-MM-DD')
AND order_date < TO_DATE('2026-01-01', 'YYYY-MM-DD');
String Functions - Concatenation and Transformation
String manipulation is crucial for data cleaning and formatting. CONCAT() joins strings, UPPER() and LOWER() change case, TRIM() removes whitespace, and SUBSTRING() extracts portions of text.
-- Concatenate strings
SELECT
first_name,
last_name,
CONCAT(first_name, ' ', last_name) AS full_name,
CONCAT(last_name, ', ', first_name) AS formal_name,
CONCAT(email, ' (', phone, ')') AS contact_info
FROM customers;
-- Change case
SELECT
product_name,
UPPER(product_name) AS uppercase,
LOWER(product_name) AS lowercase,
INITCAP(product_name) AS title_case
FROM products;
-- Clean whitespace
SELECT
customer_name,
TRIM(customer_name) AS trimmed,
TRIM(LEADING ' ' FROM customer_name) AS left_trim,
TRIM(TRAILING ' ' FROM customer_name) AS right_trim,
LENGTH(customer_name) AS original_length,
LENGTH(TRIM(customer_name)) AS trimmed_length
FROM customers;
-- Extract substrings
SELECT
email,
SUBSTRING(email FROM 1 FOR POSITION('@' IN email) - 1) AS username,
SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain,
LEFT(phone, 3) AS area_code,
RIGHT(phone, 4) AS last_four_digits
FROM customers;
String Searching and Replacement
Search within strings using POSITION() or LIKE patterns, and replace text with REPLACE(). These functions are essential for data cleaning, validation, and transformation tasks.
-- Search for substrings
SELECT
product_name,
description
FROM products
WHERE POSITION('premium' IN LOWER(description)) > 0
OR description LIKE '%premium%';
-- Replace text
SELECT
product_name,
REPLACE(product_name, 'Jr.', 'Junior') AS formatted_name,
REPLACE(phone, '-', '') AS phone_no_dashes,
REPLACE(REPLACE(email, '.', ''), '@', ' at ') AS obfuscated_email
FROM products;
-- Clean and standardize data
SELECT
customer_id,
UPPER(TRIM(REPLACE(name, ' ', ' '))) AS cleaned_name,
REPLACE(REPLACE(REPLACE(phone, '-', ''), '(', ''), ')', '') AS cleaned_phone,
LOWER(TRIM(email)) AS cleaned_email
FROM customers;
-- Pattern matching with SIMILAR TO (regex-like)
SELECT
email
FROM customers
WHERE email SIMILAR TO '%[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}%';
Advanced String Operations
Split strings, pad text, reverse strings, and perform other advanced manipulations. SPLIT_PART() divides strings by delimiters, LPAD/RPAD add padding, and string aggregation functions combine multiple rows into one string.
-- Split delimited strings
SELECT
full_address,
SPLIT_PART(full_address, ',', 1) AS street,
SPLIT_PART(full_address, ',', 2) AS city,
SPLIT_PART(full_address, ',', 3) AS state
FROM addresses;
-- Pad strings to fixed width
SELECT
order_id,
LPAD(order_id::TEXT, 10, '0') AS padded_order_id,
RPAD(product_code, 15, '-') AS padded_code
FROM orders;
-- Aggregate strings from multiple rows
SELECT
category_id,
category_name,
STRING_AGG(product_name, ', ' ORDER BY product_name) AS products_list,
COUNT(*) AS product_count
FROM products
GROUP BY category_id, category_name;
-- Reverse strings (useful for certain parsing tasks)
SELECT
email,
REVERSE(email) AS reversed,
SUBSTRING(REVERSE(email) FROM 1 FOR POSITION('.' IN REVERSE(email)) - 1) AS tld_reversed
FROM customers;
Key Date Functions
EXTRACT(part FROM date) | Get date component |
DATE_TRUNC('unit', date) | Truncate to precision |
AGE(date1, date2) | Calculate interval |
NOW(), CURRENT_DATE | Current timestamp/date |
INTERVAL '1 day' | Time duration literal |
TO_CHAR(date, format) | Format date as string |
TO_DATE(string, format) | Parse string to date |
Essential String Functions
CONCAT(str1, str2, ...) | Join strings |
UPPER(), LOWER() | Change case |
TRIM(), LTRIM(), RTRIM() | Remove whitespace |
SUBSTRING(str FROM pos) | Extract portion |
REPLACE(str, from, to) | Substitute text |
LENGTH(), CHAR_LENGTH() | String length |
POSITION(sub IN str) | Find substring |
Practice: Date & String Manipulation
Task: Write a query to show order year, month name, and total revenue grouped by year-month.
Show Solution
SELECT
EXTRACT(YEAR FROM order_date) AS year,
TO_CHAR(order_date, 'Month') AS month_name,
EXTRACT(MONTH FROM order_date) AS month_num,
COUNT(*) AS order_count,
SUM(order_total) AS total_revenue,
ROUND(AVG(order_total), 2) AS avg_order_value
FROM orders
GROUP BY
EXTRACT(YEAR FROM order_date),
TO_CHAR(order_date, 'Month'),
EXTRACT(MONTH FROM order_date)
ORDER BY year, month_num;
Task: Clean customer data: trim whitespace, title case names, lowercase emails, extract domain from email.
Show Solution
SELECT
customer_id,
INITCAP(TRIM(REGEXP_REPLACE(name, '\s+', ' '))) AS cleaned_name,
LOWER(TRIM(email)) AS cleaned_email,
SUBSTRING(
LOWER(TRIM(email))
FROM POSITION('@' IN LOWER(TRIM(email))) + 1
) AS email_domain,
CASE
WHEN SUBSTRING(LOWER(TRIM(email)) FROM POSITION('@' IN LOWER(TRIM(email))) + 1)
LIKE '%.com' THEN 'Commercial'
WHEN SUBSTRING(LOWER(TRIM(email)) FROM POSITION('@' IN LOWER(TRIM(email))) + 1)
LIKE '%.org' THEN 'Organization'
WHEN SUBSTRING(LOWER(TRIM(email)) FROM POSITION('@' IN LOWER(TRIM(email))) + 1)
LIKE '%.edu' THEN 'Education'
ELSE 'Other'
END AS email_type
FROM customers;
Task: Calculate days as customer, cohort month, and revenue by cohort using date functions.
Show Solution
WITH customer_cohorts AS (
SELECT
c.customer_id,
c.name,
MIN(o.order_date) AS first_order_date,
MAX(o.order_date) AS last_order_date,
DATE_TRUNC('month', MIN(o.order_date)) AS cohort_month,
TO_CHAR(MIN(o.order_date), 'YYYY-MM') AS cohort_label,
COUNT(DISTINCT o.order_id) AS total_orders,
SUM(o.order_total) AS lifetime_value,
CURRENT_DATE - MIN(o.order_date) AS days_as_customer,
MAX(o.order_date) - MIN(o.order_date) AS days_active
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
)
SELECT
cohort_label,
COUNT(*) AS customers_in_cohort,
ROUND(AVG(lifetime_value), 2) AS avg_ltv,
ROUND(AVG(days_as_customer), 0) AS avg_customer_age_days,
ROUND(AVG(total_orders), 1) AS avg_orders_per_customer,
SUM(lifetime_value) AS cohort_total_revenue
FROM customer_cohorts
WHERE first_order_date IS NOT NULL
GROUP BY cohort_label, cohort_month
ORDER BY cohort_month DESC;
UNION and UNION ALL
Set operations combine results from multiple SELECT statements into a single result set. UNION merges queries while removing duplicates, UNION ALL preserves all rows including duplicates. These operations are essential for consolidating data from multiple sources or tables with similar structures.
Understanding UNION
UNION combines the results of two or more SELECT statements and returns only distinct rows. Both queries must have the same number of columns with compatible data types in corresponding positions. The column names from the first SELECT are used in the result.
UNION vs UNION ALL
UNION performs a distinct operation, removing duplicate rows from the combined result set. UNION ALL includes all rows from all queries, even if they're duplicates. UNION ALL is faster because it skips the deduplication step.
Performance Tip: Use UNION ALL when you know there are no duplicates or when duplicates are acceptable. It's significantly faster than UNION, especially with large datasets, because it avoids the sorting and comparison required for deduplication.
-- Basic UNION syntax (removes duplicates)
SELECT customer_id, name, 'Active' AS status
FROM active_customers
UNION
SELECT customer_id, name, 'Inactive' AS status
FROM inactive_customers
ORDER BY name;
-- UNION ALL (keeps all rows including duplicates)
SELECT product_id, product_name, 'Online' AS channel
FROM online_sales
UNION ALL
SELECT product_id, product_name, 'Store' AS channel
FROM store_sales
ORDER BY product_name;
Practical UNION Examples
UNION is perfect for consolidating similar data from different tables, creating unified reports from segmented data, or combining historical and current records. The key requirement is that all SELECT statements must return the same number of columns with compatible types.
-- Combine data from different time periods
SELECT
order_id,
customer_id,
order_date,
order_total,
'Current' AS data_source
FROM orders_2026
UNION ALL
SELECT
order_id,
customer_id,
order_date,
order_total,
'Archive' AS data_source
FROM orders_archive
ORDER BY order_date DESC;
-- Create unified contact list from multiple tables
SELECT
employee_id AS contact_id,
first_name,
last_name,
email,
phone,
'Employee' AS contact_type
FROM employees
UNION
SELECT
customer_id AS contact_id,
first_name,
last_name,
email,
phone,
'Customer' AS contact_type
FROM customers
UNION
SELECT
supplier_id AS contact_id,
contact_first_name,
contact_last_name,
email,
phone,
'Supplier' AS contact_type
FROM suppliers
ORDER BY last_name, first_name;
-- Combine different product categories
SELECT
product_id,
product_name,
price,
'Electronics' AS category
FROM electronics
WHERE price > 100
UNION ALL
SELECT
product_id,
product_name,
price,
'Appliances' AS category
FROM appliances
WHERE price > 100
ORDER BY price DESC;
UNION with Aggregations
You can use UNION to combine aggregated results from different tables or time periods, creating comprehensive summary reports. This is useful for comparing metrics across different segments or building dashboard queries.
-- Compare sales by channel
SELECT
'Online' AS channel,
COUNT(*) AS order_count,
SUM(order_total) AS total_revenue,
AVG(order_total) AS avg_order_value
FROM online_orders
UNION ALL
SELECT
'Store' AS channel,
COUNT(*) AS order_count,
SUM(order_total) AS total_revenue,
AVG(order_total) AS avg_order_value
FROM store_orders
UNION ALL
SELECT
'Phone' AS channel,
COUNT(*) AS order_count,
SUM(order_total) AS total_revenue,
AVG(order_total) AS avg_order_value
FROM phone_orders;
-- Monthly summary with year-to-date totals
WITH monthly_sales AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
'Monthly' AS period_type,
SUM(order_total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
),
ytd_sales AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
NULL AS month,
'YTD Total' AS period_type,
SUM(order_total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date)
)
SELECT * FROM monthly_sales
UNION ALL
SELECT * FROM ytd_sales
ORDER BY year, month NULLS LAST;
-- Product performance across regions
SELECT
'North' AS region,
category,
COUNT(*) AS units_sold,
SUM(sale_amount) AS revenue
FROM north_sales
GROUP BY category
UNION ALL
SELECT
'South' AS region,
category,
COUNT(*) AS units_sold,
SUM(sale_amount) AS revenue
FROM south_sales
GROUP BY category
UNION ALL
SELECT
'East' AS region,
category,
COUNT(*) AS units_sold,
SUM(sale_amount) AS revenue
FROM east_sales
GROUP BY category
UNION ALL
SELECT
'West' AS region,
category,
COUNT(*) AS units_sold,
SUM(sale_amount) AS revenue
FROM west_sales
GROUP BY category
ORDER BY region, revenue DESC;
UNION with CTEs for Complex Reports
Combining CTEs with UNION creates powerful reporting queries. Build separate CTEs for different data segments, then union them together for comprehensive analysis. This pattern is common in business intelligence and dashboard queries.
-- Comprehensive sales report with multiple dimensions
WITH product_sales AS (
SELECT
'Product' AS dimension,
product_name AS segment,
SUM(quantity * price) AS revenue,
COUNT(DISTINCT order_id) AS transaction_count
FROM order_details od
JOIN products p ON od.product_id = p.product_id
GROUP BY product_name
),
category_sales AS (
SELECT
'Category' AS dimension,
c.category_name AS segment,
SUM(od.quantity * od.price) AS revenue,
COUNT(DISTINCT od.order_id) AS transaction_count
FROM order_details od
JOIN products p ON od.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
GROUP BY c.category_name
),
region_sales AS (
SELECT
'Region' AS dimension,
cu.region AS segment,
SUM(od.quantity * od.price) AS revenue,
COUNT(DISTINCT o.order_id) AS transaction_count
FROM orders o
JOIN customers cu ON o.customer_id = cu.customer_id
JOIN order_details od ON o.order_id = od.order_id
GROUP BY cu.region
)
SELECT * FROM product_sales
UNION ALL
SELECT * FROM category_sales
UNION ALL
SELECT * FROM region_sales
ORDER BY dimension, revenue DESC;
Practice: UNION Operations
Task: Write a query to combine orders from current_orders and archived_orders tables, adding a source indicator.
Show Solution
SELECT
order_id,
customer_id,
order_date,
order_total,
'Current' AS source
FROM current_orders
UNION ALL
SELECT
order_id,
customer_id,
order_date,
order_total,
'Archived' AS source
FROM archived_orders
ORDER BY order_date DESC;
Task: Combine employees, customers, and vendors into a single contact list with type indicator.
Show Solution
SELECT
employee_id AS contact_id,
CONCAT(first_name, ' ', last_name) AS full_name,
email,
phone,
department AS affiliation,
'Employee' AS contact_type
FROM employees
UNION
SELECT
customer_id AS contact_id,
CONCAT(first_name, ' ', last_name) AS full_name,
email,
phone,
company AS affiliation,
'Customer' AS contact_type
FROM customers
UNION
SELECT
vendor_id AS contact_id,
contact_name AS full_name,
contact_email AS email,
contact_phone AS phone,
company_name AS affiliation,
'Vendor' AS contact_type
FROM vendors
ORDER BY contact_type, full_name;
Task: Create a hierarchical sales report showing totals by product, category, and grand total using UNION.
Show Solution
WITH product_level AS (
SELECT
1 AS level_order,
'Product' AS level_type,
c.category_name AS group_name,
p.product_name AS detail,
COUNT(*) AS order_count,
SUM(od.quantity) AS units_sold,
SUM(od.quantity * od.unit_price) AS revenue
FROM order_details od
JOIN products p ON od.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
GROUP BY c.category_name, p.product_name
),
category_level AS (
SELECT
2 AS level_order,
'Category Total' AS level_type,
c.category_name AS group_name,
'' AS detail,
COUNT(*) AS order_count,
SUM(od.quantity) AS units_sold,
SUM(od.quantity * od.unit_price) AS revenue
FROM order_details od
JOIN products p ON od.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
GROUP BY c.category_name
),
grand_total AS (
SELECT
3 AS level_order,
'Grand Total' AS level_type,
'ALL CATEGORIES' AS group_name,
'' AS detail,
COUNT(*) AS order_count,
SUM(od.quantity) AS units_sold,
SUM(od.quantity * od.unit_price) AS revenue
FROM order_details od
)
SELECT
level_type,
group_name,
detail,
order_count,
units_sold,
ROUND(revenue, 2) AS revenue,
ROUND(revenue * 100.0 / (SELECT revenue FROM grand_total), 2) AS pct_of_total
FROM product_level
UNION ALL
SELECT
level_type,
group_name,
detail,
order_count,
units_sold,
ROUND(revenue, 2) AS revenue,
ROUND(revenue * 100.0 / (SELECT revenue FROM grand_total), 2) AS pct_of_total
FROM category_level
UNION ALL
SELECT
level_type,
group_name,
detail,
order_count,
units_sold,
ROUND(revenue, 2) AS revenue,
100.00 AS pct_of_total
FROM grand_total
ORDER BY level_order, group_name, revenue DESC;
Key Takeaways
Subqueries
Nested queries enable complex filtering, calculations, and data retrieval in SELECT, FROM, and WHERE clauses for dynamic analysis
CTEs
Common Table Expressions improve query readability and enable recursive queries for hierarchical data analysis
Window Functions
Perform calculations across row sets without collapsing results, perfect for rankings, running totals, and row comparisons
Date Manipulation
Extract, format, and calculate with dates using functions like EXTRACT, DATE_TRUNC, DATE_ADD, and AGE for time analysis
String Operations
Clean and transform text data with CONCAT, SUBSTRING, REPLACE, UPPER, LOWER, and TRIM for data quality
Set Operations
UNION combines distinct results from multiple queries, while UNION ALL preserves all rows including duplicates efficiently
Knowledge Check
Test your understanding of advanced SQL techniques:
What is the main difference between a scalar subquery and a table subquery?
What keyword do you use to define a Common Table Expression (CTE)?
Which window function would you use to assign a sequential number to each row within a partition, resetting for each partition?
What is the key difference between LAG() and LEAD() window functions?
Which SQL function would you use to extract just the year from a date column?
What happens when you use UNION instead of UNION ALL to combine two result sets?