SQL Aggregations | Data Analytics Course

Introduction to SQL Aggregations

SQL aggregate functions are the powerhouse of data analysis, transforming millions of individual records into meaningful insights. Whether you're calculating total sales, finding average customer ratings, or counting active users, aggregate functions make complex calculations simple and efficient.

Why Aggregations Matter

In the real world, raw data tells only part of the story. A table with a million transactions is overwhelming, but aggregate functions can instantly answer critical business questions: What's our total revenue? How many customers bought products this month? What's the average order value? These insights drive strategic decisions across every industry.

SQL provides five fundamental aggregate functions—COUNT, SUM, AVG, MIN, and MAX—that work on sets of rows to produce single summary values. Combined with GROUP BY and HAVING clauses, these functions become incredibly powerful tools for segmentation, trend analysis, and business intelligence reporting.

Key Concept

Aggregate Functions

Aggregate functions perform calculations on multiple rows of data and return a single result. Unlike regular functions that operate row-by-row, aggregates summarize entire datasets or groups of rows into meaningful metrics like totals, averages, counts, minimums, and maximums.

Why it matters: Every business report—from sales dashboards to customer analytics—relies on aggregations. Understanding these functions is essential for data analysts, business intelligence professionals, and anyone working with databases to extract insights from raw data.

The Five Core Aggregate Functions

SQL's aggregate functions are designed to answer specific types of questions about your data. Let's explore what each function does:

Function	Purpose	Example Use Case
COUNT()	Counts the number of rows or non-NULL values	How many customers placed orders this month? How many products are in each category?
SUM()	Calculates the total sum of numeric values	What's the total revenue for Q4? What's the sum of all inventory quantities?
AVG()	Computes the arithmetic mean of numeric values	What's the average order value? What's the mean customer satisfaction score?
MIN()	Finds the smallest value in a set	What's the lowest price in our catalog? When was the first order placed?
MAX()	Finds the largest value in a set	What's the highest salary in the company? What's the most recent transaction date?

Real-World Applications

Aggregate functions appear everywhere in data-driven organizations. Here are common scenarios you'll encounter:

Sales Analytics

Calculate total revenue, average order values, and sales counts by region, product, or time period to track performance and identify trends.

Customer Insights

Count active users, calculate average lifetime value, find top spenders, and segment customers based on aggregated behavior patterns.

Inventory Management

Sum total inventory quantities, find minimum stock levels, calculate average turnover rates, and monitor reorder thresholds.

Basic Aggregation Example

Let's start with a simple example using a sales table. Suppose we have this data:

-- Sample sales table
CREATE TABLE sales (
  sale_id INT,
  product_name VARCHAR(100),
  category VARCHAR(50),
  price DECIMAL(10,2),
  quantity INT,
  sale_date DATE
);

-- View some sample data
SELECT * FROM sales LIMIT 5;

Now let's use aggregate functions to get insights from this data:

-- Total number of sales transactions
SELECT COUNT(*) AS total_transactions
FROM sales;
-- Result: 1247

-- Total revenue across all sales
SELECT SUM(price * quantity) AS total_revenue
FROM sales;
-- Result: 284,592.50

-- Average price per product
SELECT AVG(price) AS average_price
FROM sales;
-- Result: 45.67

-- Lowest and highest prices
SELECT 
  MIN(price) AS lowest_price,
  MAX(price) AS highest_price
FROM sales;
-- Result: lowest_price = 5.99, highest_price = 299.99

Pro Tip: Always use meaningful aliases (AS keyword) for aggregate columns. Instead of "COUNT(*)" in your results, use "total_customers" or "order_count"—this makes reports much more readable and professional.

Remember: Aggregate functions operate on the entire result set unless you use GROUP BY. Without grouping, you get one summary row for all data. We'll explore GROUP BY in detail in Section 4.

Key Insight: While aggregate functions are powerful alone, their true potential emerges when combined with GROUP BY to analyze data by categories—sales by region, customers by country, products by category, etc.

COUNT & SUM Functions

COUNT and SUM are the workhorses of data analysis—COUNT tells you how many records exist, while SUM calculates totals. Mastering their variations and nuances is essential for accurate reporting and data validation.

Understanding COUNT()

The COUNT function has three important variations, and understanding the differences is critical for accurate analysis:

Syntax	Behavior
*COUNT()**	Counts all rows in the result set, including rows with NULL values. This is the most common form.
COUNT(column_name)	Counts only non-NULL values in the specified column. NULLs are ignored completely.
COUNT(DISTINCT column)	Counts only unique non-NULL values. Duplicates and NULLs are excluded from the count.

Let's see these variations in action with a customers table:

-- Sample customers table
CREATE TABLE customers (
  customer_id INT,
  name VARCHAR(100),
  email VARCHAR(100),
  phone VARCHAR(20),
  city VARCHAR(50)
);

-- Count all customers (includes everyone)
SELECT COUNT(*) AS total_customers
FROM customers;
-- Result: 500

-- Count customers with email addresses (excludes NULLs)
SELECT COUNT(email) AS customers_with_email
FROM customers;
-- Result: 487 (13 customers have NULL emails)

-- Count unique cities (no duplicates)
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;
-- Result: 42

Important: COUNT(*) and COUNT(column) can return different results! Always use COUNT(*) when you want to count all rows, regardless of NULL values. Use COUNT(column) only when you specifically want to exclude NULLs.

The SUM() Function

SUM calculates the total of numeric values. It's commonly used for revenue calculations, quantity totals, and accumulating numeric metrics. Like COUNT(column), SUM automatically ignores NULL values.

-- Calculate total revenue from orders
SELECT SUM(order_total) AS total_revenue
FROM orders
WHERE order_date >= '2025-01-01';
-- Result: 1,247,850.00

-- Calculate total quantity sold across all products
SELECT SUM(quantity) AS total_units_sold
FROM order_items;
-- Result: 28,459

-- Sum with calculation: total revenue from price * quantity
SELECT SUM(price * quantity) AS calculated_revenue
FROM order_items;
-- Result: 892,340.50

Key Concept

NULL Handling in Aggregates

Most aggregate functions (SUM, AVG, MIN, MAX, COUNT(column)) automatically ignore NULL values. Only COUNT(*) includes rows with NULLs. This behavior is crucial for accurate calculations—a NULL doesn't mean zero, it means "unknown" or "missing data."

Why it matters: If you have 100 orders and 5 have NULL totals, SUM(order_total) calculates the sum of the 95 known values. If you need to treat NULLs as zeros, use COALESCE: SUM(COALESCE(order_total, 0)).

Combining COUNT and SUM

Often you'll use multiple aggregate functions together to get comprehensive insights:

-- Sales summary: count and total
SELECT 
  COUNT(*) AS total_orders,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(order_total) AS total_revenue,
  SUM(quantity) AS total_items_sold
FROM orders
WHERE order_date >= '2025-01-01'
  AND order_date < '2026-01-01';

-- Result:
-- total_orders: 3,847
-- unique_customers: 1,205
-- total_revenue: 487,920.00
-- total_items_sold: 12,489

Pro Tip: When calculating revenue, always verify your logic. If you have separate price and quantity columns, use SUM(price * quantity) rather than SUM(price) * SUM(quantity)—the latter gives incorrect results when rows have different quantities.

Conditional Counting with CASE

You can combine COUNT with CASE statements to count rows that meet specific conditions—a powerful technique for creating summary reports:

-- Count orders by status in a single query
SELECT 
  COUNT(*) AS total_orders,
  COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_orders,
  COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_orders
FROM orders;

-- Result:
-- total_orders: 5,200
-- completed_orders: 4,850
-- pending_orders: 280
-- cancelled_orders: 70

Remember: COUNT(expression) only counts rows where the expression is NOT NULL. In CASE statements, the ELSE clause defaults to NULL, so unmatched rows aren't counted—exactly what we want for conditional counting.

Practice Questions: COUNT & SUM

Test your understanding with these coding challenges.

Task: Write a query to count the total number of products and the number of unique categories in a products table.

Expected columns: total_products, unique_categories

Show Solution

SELECT 
  COUNT(*) AS total_products,
  COUNT(DISTINCT category) AS unique_categories
FROM products;

-- Example output:
-- total_products: 1,247
-- unique_categories: 12

Scenario: You have an order_items table with columns: order_id, product_id, quantity, price.

Task: Calculate the total revenue (price × quantity for all items) and count the total number of items sold.

Show Solution

SELECT 
  SUM(price * quantity) AS total_revenue,
  SUM(quantity) AS total_items_sold,
  COUNT(*) AS total_line_items,
  COUNT(DISTINCT order_id) AS total_orders
FROM order_items;

-- Example output:
-- total_revenue: 284,592.50
-- total_items_sold: 8,459
-- total_line_items: 5,234
-- total_orders: 1,847

Task: Write a single query that counts orders in three status categories (pending, completed, cancelled) plus total revenue from completed orders only.

Show Solution

SELECT 
  COUNT(*) AS total_orders,
  COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_count,
  COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_count,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_count,
  SUM(CASE WHEN status = 'completed' THEN order_total ELSE 0 END) AS completed_revenue
FROM orders;

-- Example output:
-- total_orders: 5,200
-- pending_count: 280
-- completed_count: 4,850
-- cancelled_count: 70
-- completed_revenue: 487,920.00

AVG, MIN & MAX Functions

While COUNT and SUM tell you quantities and totals, AVG, MIN, and MAX reveal patterns, extremes, and central tendencies in your data. These statistical functions are essential for pricing analysis, performance metrics, and identifying outliers.

The AVG() Function

AVG calculates the arithmetic mean by summing values and dividing by the count of non-NULL values. It's one of the most commonly used statistics in business reporting:

-- Calculate average order value
SELECT AVG(order_total) AS average_order_value
FROM orders
WHERE order_date >= '2025-01-01';
-- Result: 127.45

-- Average product price by category
SELECT AVG(price) AS avg_price
FROM products;
-- Result: 45.67

-- Average with DISTINCT (unique values only)
SELECT AVG(DISTINCT price) AS avg_unique_price
FROM products;
-- Result: 52.30 (excludes duplicate prices)

Important: AVG automatically excludes NULL values. If you have 100 rows with 10 NULLs, AVG calculates the mean of the 90 non-NULL values. This is usually what you want, but be aware of how NULLs affect your results.

MIN() and MAX() Functions

MIN finds the smallest value, MAX finds the largest. They work with numbers, dates, and even text (alphabetically). These functions are perfect for finding extremes, ranges, and boundaries in your data:

-- Find price range in products
SELECT 
  MIN(price) AS lowest_price,
  MAX(price) AS highest_price,
  MAX(price) - MIN(price) AS price_range
FROM products;
-- Result: lowest=5.99, highest=299.99, range=294.00

-- Find date range of orders
SELECT 
  MIN(order_date) AS first_order,
  MAX(order_date) AS latest_order
FROM orders;
-- Result: first_order=2024-01-15, latest_order=2025-12-28

-- Alphabetically first and last customer names
SELECT 
  MIN(name) AS first_alphabetically,
  MAX(name) AS last_alphabetically
FROM customers;
-- Result: first=Aaron Adams, last=Zoe Zhang

Key Concept

Statistical Measures

AVG gives you the central tendency (mean), while MIN and MAX show the range extremes. Together, they provide a complete picture: AVG tells you the typical value, MIN/MAX show the spread. A small range suggests consistency; a large range indicates high variability.

Why it matters: In pricing analysis, knowing the average price ($50) plus the range ($10-$200) helps you understand your product mix better than average alone. High variability might indicate diverse product lines or pricing inconsistencies.

Combining All Statistical Aggregates

For comprehensive analysis, use multiple aggregate functions together to get the complete statistical picture:

-- Complete statistical summary of order values
SELECT 
  COUNT(*) AS total_orders,
  SUM(order_total) AS total_revenue,
  AVG(order_total) AS average_order,
  MIN(order_total) AS smallest_order,
  MAX(order_total) AS largest_order,
  MAX(order_total) - MIN(order_total) AS order_range
FROM orders
WHERE status = 'completed';

-- Result:
-- total_orders: 4,850
-- total_revenue: 487,920.00
-- average_order: 100.60
-- smallest_order: 5.99
-- largest_order: 1,299.00
-- order_range: 1,293.01

Practical Use Case: Product Performance Analysis

Let's analyze product performance using all five aggregate functions together:

-- Product sales performance summary
SELECT 
  COUNT(DISTINCT product_id) AS total_products_sold,
  SUM(quantity) AS total_units_sold,
  SUM(price * quantity) AS total_revenue,
  AVG(price) AS avg_selling_price,
  MIN(price) AS lowest_price_sold,
  MAX(price) AS highest_price_sold,
  AVG(quantity) AS avg_quantity_per_order,
  MAX(quantity) AS largest_single_order
FROM order_items
WHERE order_date >= '2025-01-01';

-- Result shows:
-- - How many different products were sold
-- - Total volume and revenue
-- - Pricing statistics
-- - Order size patterns

Pro Tip: When analyzing prices or financial data, consider using ROUND() with AVG to control decimal places: ROUND(AVG(price), 2) gives you a clean currency format like 45.67 instead of 45.6734523.

Finding Records with MIN/MAX Values

A common challenge: MIN/MAX give you the value, but how do you find the actual record? Use a subquery:

-- Find the most expensive product (name and price)
SELECT product_name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);
-- Result: Laptop Pro X1, 1299.99

-- Find customer with the most orders
SELECT 
  customer_id,
  name,
  COUNT(*) AS order_count
FROM orders
JOIN customers USING (customer_id)
GROUP BY customer_id, name
HAVING COUNT(*) = (
  SELECT MAX(order_count)
  FROM (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
  ) AS counts
);

Remember: MIN and MAX work with dates and strings too! MIN(date) gives the earliest date, MAX(date) gives the latest. MIN(name) gives the alphabetically first name. This flexibility makes them incredibly useful beyond just numbers.

Practice Questions: AVG, MIN & MAX

Test your understanding with these statistical analysis challenges.

Task: Write a query to find the average price, minimum price, maximum price, and price range for all products.

Show Solution

SELECT 
  AVG(price) AS average_price,
  MIN(price) AS lowest_price,
  MAX(price) AS highest_price,
  MAX(price) - MIN(price) AS price_range
FROM products;

-- Example output:
-- average_price: 67.89
-- lowest_price: 5.99
-- highest_price: 299.99
-- price_range: 294.00

Task: Identify which customer has spent the most money total across all their orders. Show customer name and total spent.

Show Solution

-- Step 1: Calculate total per customer with subquery
SELECT 
  c.customer_id,
  c.name,
  SUM(o.order_total) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING SUM(o.order_total) = (
  SELECT MAX(customer_total)
  FROM (
    SELECT customer_id, SUM(order_total) AS customer_total
    FROM orders
    GROUP BY customer_id
  ) AS totals
);

-- Example output:
-- customer_id: 1847
-- name: Jennifer Martinez
-- total_spent: 12,487.50

Task: Show product name, price, category average price, and calculate how much above/below the category average each product is.

Show Solution

SELECT 
  p.product_name,
  p.price,
  p.category,
  AVG(p2.price) AS category_avg_price,
  ROUND(p.price - AVG(p2.price), 2) AS diff_from_avg,
  ROUND((p.price - AVG(p2.price)) / AVG(p2.price) * 100, 1) AS percent_diff
FROM products p
JOIN products p2 ON p.category = p2.category
GROUP BY p.product_id, p.product_name, p.price, p.category
ORDER BY percent_diff DESC
LIMIT 10;

-- Shows which products are priced highest above their category average

GROUP BY Clause

GROUP BY is where aggregate functions truly shine. Instead of summarizing all data into one row, GROUP BY segments your data into categories and calculates aggregates for each group separately. This is the foundation of dimensional analysis and business reporting.

Understanding GROUP BY

Think of GROUP BY as creating buckets: it divides rows into groups based on column values, then applies aggregate functions to each group independently. Each unique value (or combination of values) becomes a separate group with its own aggregate result.

-- Without GROUP BY: one total for everything
SELECT SUM(order_total) AS total_revenue
FROM orders;
-- Result: 487,920.00 (single row)

-- With GROUP BY: totals per category
SELECT 
  category,
  SUM(order_total) AS total_revenue
FROM orders
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY category;

-- Results: multiple rows, one per category
-- Electronics  | 245,890.00
-- Clothing     | 128,450.00
-- Home & Garden| 87,320.00
-- Books        | 26,260.00

Key Concept

GROUP BY Rule

Every column in the SELECT clause must either be in the GROUP BY clause OR be inside an aggregate function. This rule ensures each group produces exactly one output row. Non-aggregated columns must be used for grouping, otherwise SQL doesn't know which value to display.

Why it matters: Violating this rule causes errors in most databases. If you SELECT category and product_name but only GROUP BY category, SQL doesn't know which product_name to show for each category (there are many). Solution: either add product_name to GROUP BY, or aggregate it (e.g., COUNT(DISTINCT product_name)).

Common GROUP BY Patterns

Let's explore the most common grouping scenarios you'll encounter in business analytics:

-- 1. Group by single column: Sales by category
SELECT 
  category,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price,
  SUM(quantity_sold) AS total_units
FROM products
GROUP BY category
ORDER BY total_units DESC;

-- 2. Group by date: Daily sales trend
SELECT 
  order_date,
  COUNT(*) AS daily_orders,
  SUM(order_total) AS daily_revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY order_date
ORDER BY order_date;

-- 3. Group by multiple columns: Sales by region and category
SELECT 
  region,
  category,
  COUNT(*) AS order_count,
  SUM(order_total) AS revenue
FROM orders
JOIN customers USING (customer_id)
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY region, category
ORDER BY region, revenue DESC;

Important: The order of columns in GROUP BY doesn't affect the results, but it can impact performance. Generally, list more selective columns (more unique values) first for better optimization. However, always verify with your database's query planner.

Grouping by Multiple Columns

When you GROUP BY multiple columns, each unique combination becomes a separate group. This is powerful for multi-dimensional analysis:

-- Sales by year and month
SELECT 
  YEAR(order_date) AS year,
  MONTH(order_date) AS month,
  COUNT(*) AS orders,
  SUM(order_total) AS revenue,
  AVG(order_total) AS avg_order_value
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

-- Result shows trends over time:
-- 2024 | 1  | 245 | 24,567.00 | 100.27
-- 2024 | 2  | 287 | 28,940.00 | 100.84
-- 2024 | 3  | 312 | 31,450.00 | 100.80
-- ... and so on

Practical Use Case: Customer Segmentation

Let's use GROUP BY to segment customers based on their purchasing behavior—a common analytics task:

-- Customer segmentation by order frequency and value
SELECT 
  CASE 
    WHEN order_count >= 10 THEN 'Frequent Buyer'
    WHEN order_count >= 5 THEN 'Regular Customer'
    WHEN order_count >= 2 THEN 'Occasional Buyer'
    ELSE 'One-Time Customer'
  END AS customer_segment,
  COUNT(*) AS customers_in_segment,
  AVG(total_spent) AS avg_lifetime_value,
  SUM(total_spent) AS segment_revenue
FROM (
  SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(order_total) AS total_spent
  FROM orders
  GROUP BY customer_id
) AS customer_stats
GROUP BY 
  CASE 
    WHEN order_count >= 10 THEN 'Frequent Buyer'
    WHEN order_count >= 5 THEN 'Regular Customer'
    WHEN order_count >= 2 THEN 'Occasional Buyer'
    ELSE 'One-Time Customer'
  END
ORDER BY avg_lifetime_value DESC;

-- Result shows value of each customer segment
-- Frequent Buyer    | 487  | 2,847.50 | 1,386,742.50
-- Regular Customer  | 1,205 | 892.30  | 1,075,221.50
-- Occasional Buyer  | 2,847 | 284.60  | 810,170.20
-- One-Time Customer | 5,461 | 67.80   | 370,257.80

Pro Tip: When grouping by expressions (like YEAR(date) or CASE statements), you must repeat the entire expression in GROUP BY. Some databases allow GROUP BY 1, 2 to reference SELECT columns by position, but explicit expressions are more portable and readable.

WHERE vs HAVING: Filtering Before vs After Grouping

A critical distinction: WHERE filters rows BEFORE grouping, HAVING filters groups AFTER aggregation. Understanding this difference is essential for writing correct queries:

Clause	When It Executes	What It Filters
WHERE	Before grouping and aggregation	Filters individual rows. Cannot use aggregate functions (no SUM, AVG, COUNT in WHERE).
HAVING	After grouping and aggregation	Filters groups based on aggregate results. Can use aggregate functions (SUM, AVG, COUNT, etc.).

-- WHERE filters rows before grouping
SELECT category, COUNT(*) AS product_count
FROM products
WHERE price > 50  -- Filter: only expensive products
GROUP BY category;
-- Result: count of expensive products per category

-- HAVING filters groups after aggregation
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 10;  -- Filter: only categories with many products
-- Result: only categories that have more than 10 products

-- Using both together
SELECT 
  category,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price
FROM products
WHERE in_stock = true  -- First: filter to in-stock products
GROUP BY category
HAVING COUNT(*) >= 5   -- Then: only categories with 5+ in-stock items
ORDER BY avg_price DESC;

Remember: Use WHERE to filter data early (more efficient). Use HAVING only when you need to filter based on aggregate results. If possible, filter with WHERE first to reduce the data volume before grouping.

Practice Questions: GROUP BY

Test your grouping and segmentation skills.

Task: Group products by category and show: category name, product count, average price, and lowest/highest prices.

Show Solution

SELECT 
  category,
  COUNT(*) AS product_count,
  ROUND(AVG(price), 2) AS avg_price,
  MIN(price) AS lowest_price,
  MAX(price) AS highest_price
FROM products
GROUP BY category
ORDER BY product_count DESC;

-- Example output:
-- Electronics   | 487 | 125.67 | 9.99  | 1299.99
-- Clothing      | 892 | 45.30  | 5.99  | 199.99
-- Home & Garden | 245 | 67.89  | 12.50 | 499.00

Task: Group orders by month and calculate: month number, order count, total revenue, and average order value for 2025.

Show Solution

SELECT 
  MONTH(order_date) AS month,
  COUNT(*) AS order_count,
  SUM(order_total) AS monthly_revenue,
  ROUND(AVG(order_total), 2) AS avg_order_value
FROM orders
WHERE YEAR(order_date) = 2025
GROUP BY MONTH(order_date)
ORDER BY month;

-- Shows monthly trends:
-- 1 | 245 | 24,567.89 | 100.28
-- 2 | 287 | 28,940.50 | 100.84
-- 3 | 312 | 31,450.00 | 100.80

Task: For each region, show the top 5 customers by total spending. Include region, customer name, order count, and total spent.

Show Solution

-- Using window functions (modern SQL)
WITH customer_spending AS (
  SELECT 
    c.region,
    c.customer_id,
    c.name,
    COUNT(*) AS order_count,
    SUM(o.order_total) AS total_spent,
    ROW_NUMBER() OVER (PARTITION BY c.region ORDER BY SUM(o.order_total) DESC) AS rank
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  GROUP BY c.region, c.customer_id, c.name
)
SELECT 
  region,
  name,
  order_count,
  total_spent
FROM customer_spending
WHERE rank <= 5
ORDER BY region, rank;

-- Shows top 5 spenders per region

HAVING Clause

While WHERE filters individual rows before grouping, HAVING filters entire groups after aggregation. This powerful clause lets you answer questions like "Which categories have more than 100 products?" or "Show only customers who've spent over $1,000 total."

Understanding HAVING

HAVING works exclusively with GROUP BY to filter groups based on aggregate conditions. Think of it as WHERE for aggregated data—but with a crucial difference in timing and capability:

-- Show categories with more than 50 products
SELECT 
  category,
  COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 50
ORDER BY product_count DESC;

-- Result: Only categories with 50+ products
-- Electronics   | 487
-- Clothing      | 892
-- Home & Garden | 245
-- (categories with <50 products are excluded)

Key Concept

WHERE vs HAVING

WHERE filters rows BEFORE grouping (can't use aggregates). HAVING filters groups AFTER aggregation (can use SUM, AVG, COUNT, etc.). WHERE reduces data volume early; HAVING filters final results. Use both together for maximum efficiency and flexibility.

Why it matters: Understanding this distinction prevents errors and optimizes performance. Filter with WHERE when possible (more efficient). Use HAVING only when filtering requires aggregate calculations that don't exist until after grouping.

Common HAVING Patterns

Let's explore typical HAVING use cases in business analytics:

-- 1. Filter by COUNT: High-volume customers
SELECT 
  customer_id,
  COUNT(*) AS order_count,
  SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5  -- Only customers with 5+ orders
ORDER BY total_spent DESC;

-- 2. Filter by SUM: High-revenue products
SELECT 
  product_id,
  product_name,
  SUM(quantity) AS total_sold,
  SUM(price * quantity) AS total_revenue
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, product_name
HAVING SUM(price * quantity) > 10000  -- Only products with $10k+ revenue
ORDER BY total_revenue DESC;

-- 3. Filter by AVG: Above-average performers
SELECT 
  region,
  COUNT(*) AS customer_count,
  AVG(order_total) AS avg_order_value
FROM orders
JOIN customers USING (customer_id)
GROUP BY region
HAVING AVG(order_total) > 100  -- Only regions with $100+ average order
ORDER BY avg_order_value DESC;

Important: HAVING is evaluated AFTER GROUP BY, so it can't improve performance by reducing rows to group. Always use WHERE to filter data early when possible. Reserve HAVING for conditions that genuinely require aggregate calculations.

Combining WHERE and HAVING

The most powerful queries use both WHERE and HAVING together—WHERE filters rows early, HAVING filters aggregated results:

-- Find active categories with high average prices
SELECT 
  category,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price,
  SUM(quantity_in_stock) AS total_inventory
FROM products
WHERE 
  status = 'active'           -- WHERE: Filter rows first
  AND quantity_in_stock > 0   -- (more efficient)
GROUP BY category
HAVING 
  COUNT(*) >= 10              -- HAVING: Filter groups
  AND AVG(price) > 50         -- (after aggregation)
ORDER BY avg_price DESC;

-- Execution order:
-- 1. WHERE filters to active, in-stock products
-- 2. GROUP BY creates category groups
-- 3. Aggregates (COUNT, AVG, SUM) are calculated
-- 4. HAVING filters to categories meeting conditions
-- 5. ORDER BY sorts final results

Multiple Conditions in HAVING

Just like WHERE, you can combine multiple conditions in HAVING using AND/OR:

-- Find valuable customer segments: frequent AND high-spending
SELECT 
  customer_id,
  customer_name,
  COUNT(*) AS order_count,
  SUM(order_total) AS total_spent,
  AVG(order_total) AS avg_order_value
FROM orders
JOIN customers USING (customer_id)
WHERE order_date >= '2025-01-01'  -- This year only
GROUP BY customer_id, customer_name
HAVING 
  COUNT(*) >= 3              -- At least 3 orders
  AND SUM(order_total) > 500 -- Total spending over $500
ORDER BY total_spent DESC;

-- Find problem categories: many products but low sales
SELECT 
  category,
  COUNT(*) AS product_count,
  SUM(quantity_sold) AS total_sold,
  AVG(quantity_sold) AS avg_sold_per_product
FROM products
GROUP BY category
HAVING 
  COUNT(*) >= 20                      -- Many products
  AND AVG(quantity_sold) < 10         -- But low average sales
ORDER BY avg_sold_per_product;
-- Identifies categories that may need marketing attention

Pro Tip: When debugging queries, remove HAVING temporarily to see all groups, then add it back to verify filtering logic. This helps identify whether issues are in grouping or filtering.

HAVING with Subqueries

Advanced technique: Use subqueries in HAVING to compare groups against overall statistics:

-- Find categories performing above average
SELECT 
  category,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > (
  SELECT AVG(price) FROM products  -- Overall average
)
ORDER BY avg_price DESC;

-- Find high-performing sales regions
SELECT 
  region,
  SUM(order_total) AS region_revenue
FROM orders
JOIN customers USING (customer_id)
GROUP BY region
HAVING SUM(order_total) > (
  SELECT AVG(region_total)
  FROM (
    SELECT region, SUM(order_total) AS region_total
    FROM orders
    JOIN customers USING (customer_id)
    GROUP BY region
  ) AS region_stats
)
ORDER BY region_revenue DESC;

Remember: HAVING can reference any aggregate function in the query, even if it's not in the SELECT clause. However, for readability and debugging, it's often better to include aggregates in SELECT when used in HAVING.

Practice Questions: HAVING Clause

Test your group filtering skills.

Task: Show categories that have more than 100 products, ordered by product count descending.

Show Solution

SELECT 
  category,
  COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 100
ORDER BY product_count DESC;

-- Example output:
-- Clothing      | 892
-- Electronics   | 487
-- Home & Garden | 245

Task: Find customers who have placed at least 5 orders AND spent at least $1,000 total. Show customer name, order count, and total spent.

Show Solution

SELECT 
  c.customer_id,
  c.name,
  COUNT(*) AS order_count,
  SUM(o.order_total) AS total_spent,
  ROUND(AVG(o.order_total), 2) AS avg_order_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING 
  COUNT(*) >= 5
  AND SUM(o.order_total) >= 1000
ORDER BY total_spent DESC;

-- Shows VIP customers qualifying for both criteria

Task: Find products that are currently in stock (quantity > 0) but have sold fewer than 10 units total. Use WHERE for stock filter, HAVING for sales filter.

Show Solution

SELECT 
  p.product_id,
  p.product_name,
  p.category,
  p.quantity_in_stock,
  COALESCE(SUM(oi.quantity), 0) AS total_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE p.quantity_in_stock > 0  -- In stock
GROUP BY p.product_id, p.product_name, p.category, p.quantity_in_stock
HAVING COALESCE(SUM(oi.quantity), 0) < 10  -- Low sales
ORDER BY total_sold, quantity_in_stock DESC;

-- Identifies products that may need price adjustment or promotion

Advanced Aggregations

Master-level aggregation techniques combine multiple concepts—nested queries, conditional logic, joins with aggregates, and window functions. These advanced patterns solve complex business problems that require sophisticated data analysis.

Nested Aggregations with Subqueries

Sometimes you need to aggregate aggregated data—like finding the average of group totals. This requires subqueries to create intermediate results:

-- Average revenue per category (aggregate of aggregates)
SELECT AVG(category_revenue) AS avg_category_revenue
FROM (
  SELECT 
    category,
    SUM(order_total) AS category_revenue
  FROM orders
  JOIN order_items USING (order_id)
  JOIN products USING (product_id)
  GROUP BY category
) AS category_totals;
-- Result: 121,980.00 (average across all category totals)

-- Find customers who spent more than their region's average
SELECT 
  c.customer_id,
  c.name,
  c.region,
  SUM(o.order_total) AS total_spent,
  (SELECT AVG(customer_total)
   FROM (
     SELECT customer_id, SUM(order_total) AS customer_total
     FROM orders o2
     JOIN customers c2 ON o2.customer_id = c2.customer_id
     WHERE c2.region = c.region
     GROUP BY customer_id
   ) AS region_totals
  ) AS region_avg
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.region
HAVING SUM(o.order_total) > (
  SELECT AVG(customer_total)
  FROM (
    SELECT customer_id, SUM(order_total) AS customer_total
    FROM orders o2
    JOIN customers c2 ON o2.customer_id = c2.customer_id
    WHERE c2.region = c.region
    GROUP BY customer_id
  ) AS region_totals
)
ORDER BY region, total_spent DESC;

Key Concept

Correlated Subqueries

A correlated subquery references columns from the outer query, executing once per row. While powerful for row-by-row comparisons against aggregates, they can be slow on large datasets. Modern alternatives include window functions or CTEs (Common Table Expressions).

Why it matters: Correlated subqueries enable sophisticated comparisons—like "customers above their region's average" or "products priced higher than their category median"—but optimization is crucial. Consider window functions for better performance on large tables.

Conditional Aggregation with CASE

CASE statements inside aggregates enable pivot-like reporting and conditional calculations in a single query:

-- Revenue breakdown by product category in columns
SELECT 
  YEAR(order_date) AS year,
  MONTH(order_date) AS month,
  SUM(CASE WHEN category = 'Electronics' THEN order_total ELSE 0 END) AS electronics_revenue,
  SUM(CASE WHEN category = 'Clothing' THEN order_total ELSE 0 END) AS clothing_revenue,
  SUM(CASE WHEN category = 'Books' THEN order_total ELSE 0 END) AS books_revenue,
  SUM(order_total) AS total_revenue
FROM orders
JOIN order_items USING (order_id)
JOIN products USING (product_id)
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

-- Customer purchase frequency segmentation
SELECT 
  COUNT(*) AS total_customers,
  SUM(CASE WHEN order_count = 1 THEN 1 ELSE 0 END) AS one_time_buyers,
  SUM(CASE WHEN order_count BETWEEN 2 AND 4 THEN 1 ELSE 0 END) AS occasional,
  SUM(CASE WHEN order_count BETWEEN 5 AND 9 THEN 1 ELSE 0 END) AS regular,
  SUM(CASE WHEN order_count >= 10 THEN 1 ELSE 0 END) AS frequent,
  ROUND(AVG(total_spent), 2) AS avg_lifetime_value
FROM (
  SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(order_total) AS total_spent
  FROM orders
  GROUP BY customer_id
) AS customer_stats;

Combining Joins with Aggregations

Real-world analytics often requires joining multiple tables before aggregating. Understanding join order and filtering is critical for correct results:

-- Sales performance by employee and region
SELECT 
  e.employee_id,
  e.name AS employee_name,
  e.region,
  COUNT(DISTINCT o.order_id) AS orders_processed,
  COUNT(DISTINCT o.customer_id) AS unique_customers,
  SUM(oi.quantity) AS total_items_sold,
  SUM(oi.price * oi.quantity) AS total_revenue,
  ROUND(AVG(o.order_total), 2) AS avg_order_value
FROM employees e
LEFT JOIN orders o ON e.employee_id = o.sales_rep_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
WHERE e.department = 'Sales'
  AND o.order_date >= '2025-01-01'
GROUP BY e.employee_id, e.name, e.region
HAVING COUNT(DISTINCT o.order_id) > 0  -- Only employees with orders
ORDER BY total_revenue DESC;

-- Product popularity by customer segment
SELECT 
  p.product_name,
  p.category,
  COUNT(DISTINCT CASE WHEN c.age < 25 THEN o.customer_id END) AS young_buyers,
  COUNT(DISTINCT CASE WHEN c.age BETWEEN 25 AND 45 THEN o.customer_id END) AS middle_buyers,
  COUNT(DISTINCT CASE WHEN c.age > 45 THEN o.customer_id END) AS senior_buyers,
  SUM(oi.quantity) AS total_sold,
  ROUND(AVG(oi.price), 2) AS avg_selling_price
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY p.product_id, p.product_name, p.category
HAVING SUM(oi.quantity) >= 100  -- Popular products only
ORDER BY total_sold DESC;

Important: When joining before aggregating, be careful of duplicate row multiplication. If one order has multiple items, joining orders to order_items multiplies rows. Use COUNT(DISTINCT order_id) instead of COUNT(*) to get accurate order counts.

Window Functions for Advanced Analytics

Window functions (also called analytic functions) perform calculations across row sets without collapsing them into groups. They're perfect for rankings, running totals, and comparisons:

-- Running total of daily revenue
SELECT 
  order_date,
  SUM(order_total) AS daily_revenue,
  SUM(SUM(order_total)) OVER (ORDER BY order_date) AS running_total,
  AVG(SUM(order_total)) OVER (
    ORDER BY order_date 
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7day
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY order_date
ORDER BY order_date;

-- Rank products by revenue within each category
SELECT 
  product_id,
  product_name,
  category,
  SUM(price * quantity) AS total_revenue,
  RANK() OVER (
    PARTITION BY category 
    ORDER BY SUM(price * quantity) DESC
  ) AS category_rank,
  SUM(SUM(price * quantity)) OVER (
    PARTITION BY category
  ) AS category_total_revenue,
  ROUND(
    SUM(price * quantity) * 100.0 / SUM(SUM(price * quantity)) OVER (PARTITION BY category),
    2
  ) AS percent_of_category
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, product_name, category
ORDER BY category, category_rank;

Pro Tip: Use Common Table Expressions (CTEs) with WITH clauses to break complex queries into readable steps. CTEs make nested aggregations much easier to understand and debug than deeply nested subqueries.

Common Table Expressions (CTEs)

CTEs provide a cleaner, more maintainable way to write complex aggregation queries:

-- Multi-step analysis using CTEs
WITH 
-- Step 1: Calculate monthly totals per customer
monthly_customer_sales AS (
  SELECT 
    customer_id,
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS order_count,
    SUM(order_total) AS monthly_total
  FROM orders
  GROUP BY customer_id, YEAR(order_date), MONTH(order_date)
),
-- Step 2: Find customer lifetime totals
customer_lifetime AS (
  SELECT 
    customer_id,
    SUM(monthly_total) AS lifetime_value,
    COUNT(DISTINCT CONCAT(year, '-', month)) AS active_months
  FROM monthly_customer_sales
  GROUP BY customer_id
),
-- Step 3: Segment customers
customer_segments AS (
  SELECT 
    customer_id,
    lifetime_value,
    active_months,
    CASE 
      WHEN lifetime_value >= 5000 THEN 'Premium'
      WHEN lifetime_value >= 1000 THEN 'Gold'
      WHEN lifetime_value >= 500 THEN 'Silver'
      ELSE 'Bronze'
    END AS segment
  FROM customer_lifetime
)
-- Final: Analyze segments
SELECT 
  segment,
  COUNT(*) AS customer_count,
  ROUND(AVG(lifetime_value), 2) AS avg_lifetime_value,
  ROUND(SUM(lifetime_value), 2) AS segment_total_revenue,
  ROUND(AVG(active_months), 1) AS avg_active_months
FROM customer_segments
GROUP BY segment
ORDER BY avg_lifetime_value DESC;

Remember: Advanced aggregations often combine multiple techniques: joins for data enrichment, WHERE for early filtering, GROUP BY for segmentation, HAVING for group filtering, and subqueries or CTEs for multi-level calculations. Master each individually, then combine them strategically.

Practice Questions: Advanced Aggregations

Test your mastery of complex aggregation techniques.

Task: Show monthly revenue with separate columns for each product category using conditional aggregation.

Show Solution

SELECT 
  YEAR(o.order_date) AS year,
  MONTH(o.order_date) AS month,
  SUM(CASE WHEN p.category = 'Electronics' 
      THEN oi.price * oi.quantity ELSE 0 END) AS electronics,
  SUM(CASE WHEN p.category = 'Clothing' 
      THEN oi.price * oi.quantity ELSE 0 END) AS clothing,
  SUM(CASE WHEN p.category = 'Books' 
      THEN oi.price * oi.quantity ELSE 0 END) AS books,
  SUM(oi.price * oi.quantity) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2025-01-01'
GROUP BY YEAR(o.order_date), MONTH(o.order_date)
ORDER BY year, month;

Task: Show products where total sales exceed their category's average sales, including both the product and category averages.

Show Solution

WITH product_sales AS (
  SELECT 
    p.product_id,
    p.product_name,
    p.category,
    SUM(oi.quantity) AS total_sold,
    SUM(oi.price * oi.quantity) AS total_revenue
  FROM products p
  LEFT JOIN order_items oi ON p.product_id = oi.product_id
  GROUP BY p.product_id, p.product_name, p.category
),
category_averages AS (
  SELECT 
    category,
    AVG(total_revenue) AS avg_category_revenue
  FROM product_sales
  GROUP BY category
)
SELECT 
  ps.product_name,
  ps.category,
  ps.total_revenue,
  ROUND(ca.avg_category_revenue, 2) AS category_avg,
  ROUND(ps.total_revenue - ca.avg_category_revenue, 2) AS diff_from_avg
FROM product_sales ps
JOIN category_averages ca ON ps.category = ca.category
WHERE ps.total_revenue > ca.avg_category_revenue
ORDER BY diff_from_avg DESC;

Task: Calculate what percentage of customers who made their first purchase each month came back to purchase again.

Show Solution

WITH first_purchase AS (
  SELECT 
    customer_id,
    MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id
),
first_purchase_cohort AS (
  SELECT 
    DATE_FORMAT(first_order_date, '%Y-%m') AS cohort_month,
    customer_id
  FROM first_purchase
),
repeat_customers AS (
  SELECT 
    fp.customer_id,
    fpc.cohort_month
  FROM first_purchase fp
  JOIN first_purchase_cohort fpc ON fp.customer_id = fpc.customer_id
  JOIN orders o ON fp.customer_id = o.customer_id
  WHERE o.order_date > fp.first_order_date
  GROUP BY fp.customer_id, fpc.cohort_month
)
SELECT 
  fpc.cohort_month,
  COUNT(DISTINCT fpc.customer_id) AS new_customers,
  COUNT(DISTINCT rc.customer_id) AS repeat_customers,
  ROUND(
    COUNT(DISTINCT rc.customer_id) * 100.0 / COUNT(DISTINCT fpc.customer_id),
    2
  ) AS repeat_rate_percent
FROM first_purchase_cohort fpc
LEFT JOIN repeat_customers rc ON fpc.customer_id = rc.customer_id
GROUP BY fpc.cohort_month
ORDER BY fpc.cohort_month;

Key Takeaways

Aggregate Functions

COUNT, SUM, AVG, MIN, MAX perform calculations on sets of rows, returning single values for analysis and reporting.

GROUP BY Power

GROUP BY divides data into groups, applying aggregate functions to each group separately for category analysis and segmentation.

HAVING vs WHERE

WHERE filters individual rows before grouping; HAVING filters groups after aggregation is complete. Know when to use each.

Business Insights

Aggregations transform raw data into actionable metrics: sales totals, customer counts, average values, and trends over time.

Joins + Aggregations

Combining joins with aggregations enables complex multi-table analytics for comprehensive business reporting and insights.

NULL Handling

Most aggregate functions ignore NULLs; COUNT(*) counts all rows, COUNT(column) excludes NULLs. This matters for accuracy.

SQL Aggregations & GROUP BY

What You'll Learn

Contents

Introduction to SQL Aggregations

Why Aggregations Matter

Aggregate Functions

The Five Core Aggregate Functions

Real-World Applications

Sales Analytics

Customer Insights

Inventory Management

Basic Aggregation Example

COUNT & SUM Functions

Understanding COUNT()

The SUM() Function

NULL Handling in Aggregates

Combining COUNT and SUM

Conditional Counting with CASE

Practice Questions: COUNT & SUM

Easy Count total products and unique categories

Medium Calculate total revenue and average order size

Hard Count orders by status using conditional aggregation

AVG, MIN & MAX Functions

The AVG() Function

MIN() and MAX() Functions

Statistical Measures

Combining All Statistical Aggregates

Practical Use Case: Product Performance Analysis

Finding Records with MIN/MAX Values

Practice Questions: AVG, MIN & MAX

Easy Calculate average product price and price range

Medium Find customer with highest total spending

Hard Compare each product price to category average

GROUP BY Clause

Understanding GROUP BY

GROUP BY Rule

Common GROUP BY Patterns

Grouping by Multiple Columns

Practical Use Case: Customer Segmentation

WHERE vs HAVING: Filtering Before vs After Grouping

Practice Questions: GROUP BY

Easy Count products and calculate average price by category

Medium Calculate monthly revenue trend for 2025

Hard Find top 5 customers by spending in each region

HAVING Clause

Understanding HAVING

WHERE vs HAVING

Common HAVING Patterns

Combining WHERE and HAVING

Multiple Conditions in HAVING

HAVING with Subqueries

Practice Questions: HAVING Clause

Easy Find categories with more than 100 products

Medium Identify VIP customers: 5+ orders and $1000+ total

Hard Find underperforming products: in stock but low sales

Advanced Aggregations

Nested Aggregations with Subqueries

Correlated Subqueries

Conditional Aggregation with CASE

Combining Joins with Aggregations

Window Functions for Advanced Analytics

Common Table Expressions (CTEs)

Practice Questions: Advanced Aggregations

Medium Create a pivot table: revenue by category and month

Hard Find products selling better than category average

Hard Calculate customer retention: repeat purchase rate

Key Takeaways

Aggregate Functions

GROUP BY Power

HAVING vs WHERE

Business Insights

Joins + Aggregations

NULL Handling

Knowledge Check

1 What is the key difference between COUNT(*) and COUNT(column_name)?

2 Which clause filters groups AFTER aggregation is performed?

3 What will SELECT category, COUNT(*) FROM products GROUP BY category HAVING AVG(price) > 50 return?

4 How does SUM() handle NULL values?

5 What's required when using GROUP BY in a query?

6 Which query correctly finds categories with more than 10 products priced over $100?

3 What will `SELECT category, COUNT(*) FROM products GROUP BY category HAVING AVG(price) > 50` return?