Assignment 3-A

SQL Queries: E-Commerce Analysis

Master advanced SQL techniques by analyzing a real e-commerce database. Write complex queries using subqueries, CTEs, window functions, and UNION operations to extract actionable business insights.

8-10 hours
Challenging
175 Points
Submit Assignment
What You'll Practice
  • Subqueries (scalar, row, table)
  • CTEs and recursive queries
  • Window functions & rankings
  • Date/string manipulation
  • UNION operations
Contents
01

Assignment Overview

In this assignment, you will analyze a real-world E-Commerce Database using advanced SQL techniques. This comprehensive project requires you to apply ALL concepts from Module 3: subqueries, Common Table Expressions (CTEs), window functions, date/string manipulation, and UNION operations to extract meaningful business insights.

PostgreSQL Only: You must write all queries in PostgreSQL syntax. All queries must be tested against the provided sample database and must execute without errors.
Skills Applied: This assignment tests your understanding of SQL Basics (Topic 3.1), Joins & Relationships (Topic 3.2), Aggregations & Grouping (Topic 3.3), and Advanced SQL Techniques (Topic 3.4) from Module 3.
Subqueries & CTEs (3.4)

Scalar, row, and table subqueries; WITH clauses for complex query organization

Window Functions (3.4)

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and running aggregations

Data Manipulation (3.4)

Date arithmetic, EXTRACT, string functions, and UNION operations

02

The Scenario

ShopSphere E-Commerce Company

You are a Data Analyst at ShopSphere, a fast-growing e-commerce company. The business leadership team has assigned you this project:

"We need comprehensive data analysis to understand which products are driving revenue, which customers are our most valuable, and how our sales trends are evolving over time. The insights you provide will directly inform our Q2 marketing strategy and inventory planning. Use advanced SQL techniques to extract actionable insights from our database."

Your Task

Create a SQL script file called ecommerce_analysis.sql that contains all required queries to analyze ShopSphere's database. Each query must be clearly commented with its purpose and should demonstrate advanced SQL techniques covered in Module 3.

03

The Dataset

The ShopSphere database contains six interconnected tables with real-world e-commerce data spanning from January 2023 to December 2024. The database schema is normalized to 3NF and includes referential integrity constraints.

Database Schema

Study the following schema carefully. Understanding the relationships between tables is crucial for writing effective queries.

-- CUSTOMERS TABLE
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL,
    country VARCHAR(50) NOT NULL,
    registration_date DATE NOT NULL,
    loyalty_tier VARCHAR(20) CHECK (loyalty_tier IN ('Bronze', 'Silver', 'Gold', 'Platinum'))
);

-- CATEGORIES TABLE
CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) UNIQUE NOT NULL,
    description TEXT
);

-- PRODUCTS TABLE
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL,
    category_id INT REFERENCES categories(category_id),
    unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price >= 0),
    stock_quantity INT NOT NULL CHECK (stock_quantity >= 0),
    supplier VARCHAR(100),
    is_active BOOLEAN DEFAULT TRUE
);

-- ORDERS TABLE
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date TIMESTAMP NOT NULL,
    ship_date TIMESTAMP,
    delivery_date TIMESTAMP,
    order_status VARCHAR(20) CHECK (order_status IN ('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled')),
    total_amount DECIMAL(10, 2) NOT NULL CHECK (total_amount >= 0)
);

-- ORDER_ITEMS TABLE
CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price >= 0),
    discount_percent DECIMAL(5, 2) DEFAULT 0 CHECK (discount_percent >= 0 AND discount_percent <= 100)
);

-- PAYMENTS TABLE
CREATE TABLE payments (
    payment_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    payment_date TIMESTAMP NOT NULL,
    payment_method VARCHAR(50) CHECK (payment_method IN ('Credit Card', 'Debit Card', 'PayPal', 'Bank Transfer', 'Cash on Delivery')),
    payment_amount DECIMAL(10, 2) NOT NULL CHECK (payment_amount >= 0),
    payment_status VARCHAR(20) CHECK (payment_status IN ('Pending', 'Completed', 'Failed', 'Refunded'))
);

Dataset Statistics

5,230 Customers

Across 45 states

1,840 Products

In 24 categories

18,942 Orders

From Jan 2023 to Dec 2024

Download Sample Data

A SQL dump file with sample data is available for testing your queries. Import this file into your PostgreSQL database to begin analysis.

Database Setup Instructions
# Create database
createdb shopsphere_db

# Import sample data
psql shopsphere_db < shopsphere_sample_data.sql

# Connect to database
psql shopsphere_db

# Verify tables
\dt
Download Sample Data (shopsphere_sample_data.sql)
04

Requirements

Your ecommerce_analysis.sql file must contain ALL of the following queries. Each query must be properly commented and demonstrate the specified SQL technique.

Part 1: Subqueries (30 points)

1
Top Spending Customers (Scalar Subquery)

Write a query to find all customers whose total spending exceeds the average total spending per customer. Use a scalar subquery to calculate the average.

  • Calculate total spending per customer from the orders table
  • Compare each customer's total to the average using a scalar subquery
  • Display customer name, email, total spending, and average spending
  • Order by total spending in descending order
-- Expected columns: customer_name, email, total_spent, avg_spending
-- Must use: Scalar subquery for calculating average
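One possible shape for this query, shown only as a sketch (aliases follow the expected columns; this assumes "average" means the average of per-customer totals, which is one reading of the spec). Note that because the outer query aggregates, the comparison lands in HAVING rather than WHERE:

```sql
-- Sketch: customers spending above the per-customer average (scalar subquery)
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       c.email,
       SUM(o.total_amount) AS total_spent,
       (SELECT AVG(t.spent)
        FROM (SELECT SUM(total_amount) AS spent
              FROM orders GROUP BY customer_id) t) AS avg_spending
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
HAVING SUM(o.total_amount) > (SELECT AVG(t.spent)
                              FROM (SELECT SUM(total_amount) AS spent
                                    FROM orders GROUP BY customer_id) t)
ORDER BY total_spent DESC;
```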
2
Products Never Ordered (Table Subquery)

Find all products that have never been ordered using a table subquery with NOT IN or NOT EXISTS.

  • Use a subquery to get all product_id values from order_items
  • Return products NOT IN that list
  • Display product name, category name, unit price, and stock quantity
  • Include a LEFT JOIN with categories table
-- Expected columns: product_name, category_name, unit_price, stock_quantity
-- Must use: NOT IN with subquery or NOT EXISTS
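A sketch of the NOT EXISTS variant (generally preferable to NOT IN, which returns no rows if the subquery yields any NULL product_id):

```sql
-- Sketch: products with no matching rows in order_items
SELECT p.product_name, c.category_name, p.unit_price, p.stock_quantity
FROM products p
LEFT JOIN categories c ON c.category_id = p.category_id
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.product_id = p.product_id
);
```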
3
Most Recent Order per Customer (Correlated Subquery)

For each customer, find their most recent order details using a correlated subquery.

  • Use a correlated subquery in the WHERE clause with MAX(order_date)
  • Display customer name, order date, order status, and total amount
  • Only include customers who have placed at least one order
  • Order by order date descending
-- Expected columns: customer_name, order_date, order_status, total_amount
-- Must use: Correlated subquery comparing order_date to MAX(order_date)
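The correlated-subquery pattern might look like the sketch below: the inner query re-evaluates MAX(order_date) per customer of the outer row, and the inner join naturally restricts results to customers with at least one order.

```sql
-- Sketch: each customer's most recent order via a correlated subquery
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       o.order_date, o.order_status, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date = (SELECT MAX(o2.order_date)
                      FROM orders o2
                      WHERE o2.customer_id = o.customer_id)
ORDER BY o.order_date DESC;
```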

Part 2: Common Table Expressions - CTEs (30 points)

4
Monthly Revenue Analysis (Single CTE)

Calculate monthly revenue trends for 2024 using a CTE to organize the query logic.

  • Create a CTE called monthly_sales that aggregates order totals by month
  • Use EXTRACT function to get month from order_date
  • Calculate total revenue, number of orders, and average order value per month
  • In main query, display month name, revenue, order count, and avg order value
-- Expected columns: month_name, total_revenue, order_count, avg_order_value
-- Must use: WITH clause for CTE, EXTRACT function, aggregate functions
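One way to organize this (a sketch; TO_CHAR with the 'Month' pattern is one option for the month name, and the numeric month is kept only for chronological ordering):

```sql
-- Sketch: 2024 monthly revenue via a single CTE
WITH monthly_sales AS (
    SELECT EXTRACT(MONTH FROM order_date)  AS month_num,
           TO_CHAR(order_date, 'Month')    AS month_name,
           SUM(total_amount)               AS total_revenue,
           COUNT(*)                        AS order_count,
           AVG(total_amount)               AS avg_order_value
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024
    GROUP BY 1, 2
)
SELECT month_name, total_revenue, order_count, avg_order_value
FROM monthly_sales
ORDER BY month_num;
```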
5
Customer Segmentation (Multiple CTEs)

Segment customers into spending tiers (High, Medium, Low) using multiple CTEs.

  • CTE 1: customer_spending calculates total spent per customer
  • CTE 2: spending_thresholds calculates 33rd and 66th percentiles
  • Main query: Use CASE to classify customers based on thresholds
  • Display customer name, total spent, and spending tier
-- Expected columns: customer_name, total_spent, spending_tier
-- Must use: Multiple CTEs with WITH clause, PERCENTILE_CONT, CASE statement
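A possible skeleton (sketch only; the CROSS JOIN attaches the single-row thresholds CTE to every customer row, and the tier boundaries shown are one reasonable choice):

```sql
-- Sketch: spending tiers from 33rd/66th percentiles
WITH customer_spending AS (
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
spending_thresholds AS (
    SELECT PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY total_spent) AS p33,
           PERCENTILE_CONT(0.66) WITHIN GROUP (ORDER BY total_spent) AS p66
    FROM customer_spending
)
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       cs.total_spent,
       CASE WHEN cs.total_spent >= t.p66 THEN 'High'
            WHEN cs.total_spent >= t.p33 THEN 'Medium'
            ELSE 'Low'
       END AS spending_tier
FROM customer_spending cs
CROSS JOIN spending_thresholds t
JOIN customers c ON c.customer_id = cs.customer_id
ORDER BY cs.total_spent DESC;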
6
Product Performance Hierarchy (Nested CTEs)

Analyze product performance within each category using nested CTEs.

  • CTE 1: product_sales calculates total revenue per product
  • CTE 2: category_totals calculates total revenue per category
  • CTE 3: product_contribution calculates each product's contribution percentage
  • Display category, product name, product revenue, and contribution percentage
  • Only show products contributing more than 10% to their category
-- Expected columns: category_name, product_name, product_revenue, contribution_pct
-- Must use: 3 nested CTEs, percentage calculation, filtering

Part 3: Window Functions (40 points)

7
Top 5 Products per Category (ROW_NUMBER)

Rank products within each category by total revenue and return the top 5 per category.

  • Calculate total revenue for each product: quantity * unit_price * (1 - discount_percent / 100)
  • Use ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY revenue DESC)
  • Filter to rows where the rank is less than or equal to 5 (wrap the ranked query in a CTE or subquery, since window functions cannot appear directly in WHERE)
  • Display category name, product name, revenue, and rank
-- Expected columns: category_name, product_name, total_revenue, product_rank
-- Must use: ROW_NUMBER window function with PARTITION BY
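The ranking-then-filtering pattern could be sketched as follows (window functions evaluate after GROUP BY, so ROW_NUMBER can order by the aggregated revenue; the CTE exists only so the rank can be filtered):

```sql
-- Sketch: top 5 products per category by net revenue
WITH ranked AS (
    SELECT c.category_name, p.product_name,
           SUM(oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100)) AS total_revenue,
           ROW_NUMBER() OVER (
               PARTITION BY p.category_id
               ORDER BY SUM(oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100)) DESC
           ) AS product_rank
    FROM order_items oi
    JOIN products p   ON p.product_id = oi.product_id
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY c.category_name, p.product_name, p.category_id
)
SELECT category_name, product_name, total_revenue, product_rank
FROM ranked
WHERE product_rank <= 5;
```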
8
Customer Order Frequency Ranking (RANK and DENSE_RANK)

Rank customers by number of orders placed, showing both RANK and DENSE_RANK for comparison.

  • Count total orders per customer
  • Apply RANK() and DENSE_RANK() ordered by order count descending
  • Display customer name, order count, rank, and dense rank
  • Show top 20 customers only
-- Expected columns: customer_name, order_count, rank, dense_rank
-- Must use: Both RANK() and DENSE_RANK() window functions
9
Running Total Revenue (Running Sum)

Calculate a running total of revenue by order date for the year 2024.

  • Group orders by calendar date (cast the timestamp to a date, e.g. order_date::date)
  • Calculate daily revenue totals
  • Use SUM() OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  • Display date, daily revenue, and running total
-- Expected columns: order_date, daily_revenue, running_total
-- Must use: Window function with ROWS frame for running total
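A sketch of the two-step pattern — aggregate per day first, then accumulate with a window frame:

```sql
-- Sketch: daily revenue with a cumulative running total for 2024
WITH daily AS (
    SELECT order_date::date AS order_day,
           SUM(total_amount) AS daily_revenue
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024
    GROUP BY order_date::date
)
SELECT order_day AS order_date,
       daily_revenue,
       SUM(daily_revenue) OVER (
           ORDER BY order_day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM daily
ORDER BY order_day;
```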
10
Customer Order Patterns (LAG and LEAD)

Analyze customer purchasing patterns by comparing current order date with previous and next order dates.

  • For each customer's orders, get previous order date using LAG()
  • Get next order date using LEAD()
  • Calculate days since last order and days until next order
  • Partition by customer_id and order by order_date
  • Display customer name, order date, previous date, next date, days gaps
-- Expected columns: customer_name, order_date, prev_order_date, next_order_date, days_since_last, days_until_next
-- Must use: LAG() and LEAD() window functions with PARTITION BY
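One way to shape this (a sketch; the named WINDOW clause avoids repeating the partition, and casting to date before subtracting yields whole-day gaps as integers rather than intervals):

```sql
-- Sketch: gap between consecutive orders per customer
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       o.order_date,
       LAG(o.order_date)  OVER w AS prev_order_date,
       LEAD(o.order_date) OVER w AS next_order_date,
       o.order_date::date - LAG(o.order_date::date)  OVER w AS days_since_last,
       LEAD(o.order_date::date) OVER w - o.order_date::date AS days_until_next
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WINDOW w AS (PARTITION BY o.customer_id ORDER BY o.order_date)
ORDER BY customer_name, o.order_date;
```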

Part 4: Date and String Manipulation (25 points)

11
Order Processing Time Analysis (Date Arithmetic)

Calculate average processing time (ship_date - order_date) for each order status.

  • Use DATE_PART or EXTRACT to calculate days between order_date and ship_date
  • Calculate AVG processing time for each order_status
  • Only include orders where ship_date is NOT NULL
  • Display order status, average processing days, and order count
-- Expected columns: order_status, avg_processing_days, order_count
-- Must use: Date arithmetic functions (DATE_PART or EXTRACT)
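A sketch of one approach: subtracting two timestamps yields an interval, so EXTRACT(EPOCH ...) / 86400 converts it to fractional days before averaging (DATE_PART('day', ...) would work too, but truncates partial days).

```sql
-- Sketch: average order-to-ship time per status, in days
SELECT order_status,
       ROUND(AVG(EXTRACT(EPOCH FROM (ship_date - order_date)) / 86400)::numeric, 1)
           AS avg_processing_days,
       COUNT(*) AS order_count
FROM orders
WHERE ship_date IS NOT NULL
GROUP BY order_status
ORDER BY avg_processing_days;
```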
12
Seasonal Sales Analysis (Date Extraction)

Analyze sales by season (Winter, Spring, Summer, Fall) for 2024.

  • Use EXTRACT(MONTH FROM order_date) to get month
  • Use CASE to map months to seasons (Dec-Feb: Winter, Mar-May: Spring, Jun-Aug: Summer, Sep-Nov: Fall)
  • Calculate total revenue and order count per season
  • Display season, total revenue, order count, and average order value
-- Expected columns: season, total_revenue, order_count, avg_order_value
-- Must use: EXTRACT function, CASE statement for season mapping
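The month-to-season mapping could be sketched like this (GROUP BY 1 groups by the CASE expression in the first select position):

```sql
-- Sketch: 2024 revenue by season
SELECT CASE
         WHEN EXTRACT(MONTH FROM order_date) IN (12, 1, 2) THEN 'Winter'
         WHEN EXTRACT(MONTH FROM order_date) IN (3, 4, 5)  THEN 'Spring'
         WHEN EXTRACT(MONTH FROM order_date) IN (6, 7, 8)  THEN 'Summer'
         ELSE 'Fall'
       END AS season,
       SUM(total_amount) AS total_revenue,
       COUNT(*)          AS order_count,
       AVG(total_amount) AS avg_order_value
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY 1;
```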
13
Customer Name Formatting (String Functions)

Format customer names for email campaigns using string manipulation functions.

  • Create full_name as "Last Name, First Name" using CONCAT
  • Create email_greeting as "Dear FirstName" using CONCAT and INITCAP
  • Extract domain from email using SUBSTRING and POSITION
  • Display formatted name, email greeting, email, and email domain
-- Expected columns: full_name, email_greeting, email, email_domain
-- Must use: CONCAT, INITCAP, SUBSTRING, POSITION functions
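A sketch combining the four required string functions (POSITION locates the '@', and SUBSTRING ... FROM takes everything after it):

```sql
-- Sketch: formatted names and email domains for a campaign list
SELECT CONCAT(last_name, ', ', first_name)          AS full_name,
       CONCAT('Dear ', INITCAP(first_name))         AS email_greeting,
       email,
       SUBSTRING(email FROM POSITION('@' IN email) + 1) AS email_domain
FROM customers;
```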

Part 5: UNION Operations (25 points)

14
High-Value Transactions Report (UNION ALL)

Combine high-value orders and high-value payments into a single report using UNION ALL.

  • Query 1: Select orders with total_amount greater than $1000
  • Query 2: Select payments with payment_amount greater than $1000
  • Use UNION ALL to combine both result sets
  • Display transaction_type (Order or Payment), transaction_id, amount, date
-- Expected columns: transaction_type, transaction_id, amount, transaction_date
-- Must use: UNION ALL to combine two SELECT statements
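A sketch of the combined report: both branches must produce the same number of columns with compatible types, column names come from the first SELECT, and a trailing ORDER BY applies to the whole union.

```sql
-- Sketch: high-value orders and payments in one result set
SELECT 'Order'  AS transaction_type, order_id   AS transaction_id,
       total_amount   AS amount, order_date   AS transaction_date
FROM orders
WHERE total_amount > 1000
UNION ALL
SELECT 'Payment', payment_id, payment_amount, payment_date
FROM payments
WHERE payment_amount > 1000
ORDER BY amount DESC;
```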
15
Customer Contact List (UNION with Deduplication)

Create a master contact list combining customers from different regions, removing duplicates.

  • Query 1: Select customers from California with email
  • Query 2: Select customers from Texas with email
  • Query 3: Select customers from New York with email
  • Use UNION (not UNION ALL) to automatically deduplicate
  • Display customer name, email, state, and registration date
  • Order by registration date descending
-- Expected columns: customer_name, email, state, registration_date
-- Must use: UNION (without ALL) for automatic deduplication
Pro Tip: Test each query individually before adding it to your final SQL file. Use EXPLAIN ANALYZE to check query performance and optimize as needed.
05

Deliverables

Create a GitHub repository containing all required files and submit the repository URL via the form below.

1
Create SQL Script File

Create ecommerce_analysis.sql containing:

  • All 15 required queries organized by part (Subqueries, CTEs, Window Functions, etc.)
  • Clear comments above each query explaining its purpose
  • Section headers separating different parts
  • Proper SQL formatting and indentation
-- ================================================================
-- PART 1: SUBQUERIES (30 POINTS)
-- ================================================================

-- Query 1: Top Spending Customers
-- Purpose: Find customers who spent more than average
SELECT ...

-- Query 2: Products Never Ordered
-- Purpose: Identify products with zero orders
SELECT ...
2
Create README File

Create README.md that includes:

  • Your name and assignment title
  • Brief description of the ShopSphere database
  • Instructions to set up the database and run queries
  • List of all 15 queries with brief descriptions
  • Key insights discovered from your analysis
  • Any challenges faced and how you solved them
3
Export Query Results

Create a folder called results/ containing:

  • CSV exports of at least 5 key query results
  • Name files descriptively (e.g., top_spending_customers.csv)
  • Include column headers in CSV files
# Export query results to CSV (\copy runs client-side, so it needs no server superuser rights)
psql shopsphere_db -c "\copy (SELECT ...) TO '/path/to/results/query1.csv' CSV HEADER"
4
Create Documentation File

Create INSIGHTS.md that documents:

  • Top 3 business insights discovered from your analysis
  • Actionable recommendations for ShopSphere based on data
  • Interesting patterns or trends you identified
  • Any data quality issues you noticed
5
Repository Structure

Your GitHub repository should look like this:

shopsphere-sql-analysis/
├── README.md
├── ecommerce_analysis.sql    # All 15 queries with comments
├── INSIGHTS.md               # Business insights and recommendations
└── results/
    ├── top_spending_customers.csv
    ├── monthly_revenue.csv
    ├── product_rankings.csv
    ├── customer_patterns.csv
    └── seasonal_sales.csv
6
Submit via Form

Once your repository is ready:

  • Make sure your repository is public or shared with your instructor
  • Verify all queries execute without errors in PostgreSQL
  • Click the "Submit Assignment" button below
  • Fill in the submission form with your GitHub repository URL
Important: Test all queries against the sample database before submitting! Queries with syntax errors will receive reduced credit.
06

Grading Rubric

Your assignment will be graded on the following criteria:

Criteria                            Points   Description
Subqueries (Part 1)                   30     Correct use of scalar, table, and correlated subqueries; accurate results
CTEs (Part 2)                         30     Proper WITH clause syntax, single and multiple CTEs, logical query organization
Window Functions (Part 3)             40     ROW_NUMBER, RANK, DENSE_RANK, running totals, LAG/LEAD with proper partitioning
Date/String Manipulation (Part 4)     25     Date arithmetic, EXTRACT, string functions (CONCAT, SUBSTRING, POSITION)
UNION Operations (Part 5)             25     Correct use of UNION and UNION ALL, proper column alignment
Code Quality                          15     Comments, formatting, readability, proper SQL conventions
Documentation                         10     README completeness, insights quality, repository organization
Total                                175
Bonus Points (Up to +25)
  • +10 points: Create an additional complex query using recursive CTEs
  • +10 points: Include query execution plans and optimization notes in INSIGHTS.md
  • +5 points: Create visualizations of key query results using any tool

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Assignment
07

What You Will Practice

Subqueries (3.4)

Master scalar, row, and table subqueries for nested data retrieval and filtering based on aggregated results

Common Table Expressions (3.4)

Use WITH clauses to create temporary named result sets, improving query readability and maintainability

Window Functions (3.4)

Apply ROW_NUMBER, RANK, DENSE_RANK, LAG, and LEAD for advanced ranking and row comparison operations

Date & String Functions (3.4)

Manipulate dates with EXTRACT and date arithmetic, format strings with CONCAT, SUBSTRING, and POSITION

UNION Operations (3.4)

Combine multiple result sets using UNION (with deduplication) and UNION ALL (without deduplication)

Real-World Analysis

Apply SQL techniques to solve authentic e-commerce business problems and generate actionable insights

08

Pro Tips

Query Development
  • Start with simple queries, then add complexity
  • Test subqueries independently before nesting
  • Use LIMIT to test queries on small result sets first
  • Comment out sections to debug step by step
Performance Optimization
  • Use EXPLAIN ANALYZE to check query execution plans
  • Prefer CTEs over nested subqueries for readability
  • Use appropriate indexes for JOIN and WHERE clauses
  • Avoid SELECT * - specify only needed columns
Time Management
  • Allocate 2 hours per major section (Parts 1-5)
  • Complete Part 1 (Subqueries) first as foundation
  • Take breaks between sections to maintain focus
  • Save time at end for documentation and testing
Common Mistakes
  • Forgetting to JOIN tables before using their columns
  • Mixing aggregate and non-aggregate columns without GROUP BY
  • Confusing UNION (deduplicates) with UNION ALL (keeps all)
  • Not handling NULL values in calculations and comparisons
09

Pre-Submission Checklist

SQL Script Requirements
Repository Requirements