Assignment 3-A

SQL Queries: E-Commerce Analysis

Master advanced SQL techniques by analyzing a real e-commerce database. Write complex queries using subqueries, CTEs, window functions, and UNION operations to extract actionable business insights.

8-10 hours
Challenging
175 Points
Submit Assignment
What You'll Practice
  • Subqueries (scalar, row, table)
  • CTEs and recursive queries
  • Window functions & rankings
  • Date/string manipulation
  • UNION operations
Contents
01

Assignment Overview

In this assignment, you will analyze a real-world E-Commerce Database using advanced SQL techniques. This comprehensive project requires you to apply ALL concepts from Module 3: subqueries, Common Table Expressions (CTEs), window functions, date/string manipulation, and UNION operations to extract meaningful business insights.

PostgreSQL Only: You must write all queries in PostgreSQL syntax. All queries must be tested against the provided sample database and must execute without errors.
Skills Applied: This assignment tests your understanding of SQL Basics (Topic 3.1), Joins & Relationships (Topic 3.2), Aggregations & Grouping (Topic 3.3), and Advanced SQL Techniques (Topic 3.4) from Module 3.
Subqueries & CTEs (3.4)

Scalar, row, and table subqueries; WITH clauses for complex query organization

Window Functions (3.4)

ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and running aggregations

Data Manipulation (3.4)

Date arithmetic, EXTRACT, string functions, and UNION operations

02

The Scenario

ShopSphere E-Commerce Company

You are a Data Analyst at ShopSphere, a fast-growing e-commerce company. The business leadership team has assigned you this project:

"We need comprehensive data analysis to understand which products are driving revenue, which customers are our most valuable, and how our sales trends are evolving over time. The insights you provide will directly inform our Q2 marketing strategy and inventory planning. Use advanced SQL techniques to extract actionable insights from our database."

Your Task

Create a SQL script file called ecommerce_analysis.sql that contains all required queries to analyze ShopSphere's database. Each query must be clearly commented with its purpose and should demonstrate advanced SQL techniques covered in Module 3.

03

The Dataset

The ShopSphere database contains six interconnected tables with real-world e-commerce data spanning from January 2023 to December 2024. The database schema is normalized to 3NF and includes referential integrity constraints.

Database Schema

Study the following schema carefully. Understanding the relationships between tables is crucial for writing effective queries.

-- CUSTOMERS TABLE
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL,
    country VARCHAR(50) NOT NULL,
    registration_date DATE NOT NULL,
    loyalty_tier VARCHAR(20) CHECK (loyalty_tier IN ('Bronze', 'Silver', 'Gold', 'Platinum'))
);

-- CATEGORIES TABLE
CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) UNIQUE NOT NULL,
    description TEXT
);

-- PRODUCTS TABLE
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL,
    category_id INT REFERENCES categories(category_id),
    unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price >= 0),
    stock_quantity INT NOT NULL CHECK (stock_quantity >= 0),
    supplier VARCHAR(100),
    is_active BOOLEAN DEFAULT TRUE
);

-- ORDERS TABLE
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date TIMESTAMP NOT NULL,
    ship_date TIMESTAMP,
    delivery_date TIMESTAMP,
    order_status VARCHAR(20) CHECK (order_status IN ('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled')),
    total_amount DECIMAL(10, 2) NOT NULL CHECK (total_amount >= 0)
);

-- ORDER_ITEMS TABLE
CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10, 2) NOT NULL CHECK (unit_price >= 0),
    discount_percent DECIMAL(5, 2) DEFAULT 0 CHECK (discount_percent >= 0 AND discount_percent <= 100)
);

-- PAYMENTS TABLE
CREATE TABLE payments (
    payment_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    payment_date TIMESTAMP NOT NULL,
    payment_method VARCHAR(50) CHECK (payment_method IN ('Credit Card', 'Debit Card', 'PayPal', 'Bank Transfer', 'Cash on Delivery')),
    payment_amount DECIMAL(10, 2) NOT NULL CHECK (payment_amount >= 0),
    payment_status VARCHAR(20) CHECK (payment_status IN ('Pending', 'Completed', 'Failed', 'Refunded'))
);

Dataset Statistics

5,230 Customers

Across 45 states

1,840 Products

In 24 categories

18,942 Orders

From Jan 2023 to Dec 2024

Download Sample Data

A SQL dump file with sample data is available for testing your queries. Import this file into your PostgreSQL database to begin analysis.

Database Setup Instructions
# Create database
createdb shopsphere_db

# Import sample data
psql shopsphere_db < shopsphere_sample_data.sql

# Connect to database
psql shopsphere_db

# Verify tables
\dt
Download Sample Data (shopsphere_sample_data.sql)
04

Requirements

Your ecommerce_analysis.sql file must contain ALL of the following queries. Each query must be properly commented and demonstrate the specified SQL technique.

Part 1: Subqueries (30 points)

1
Top Spending Customers (Scalar Subquery)

Write a query to find all customers whose total spending exceeds the average total spending per customer. Use a scalar subquery to calculate the average.

  • Calculate total spending per customer from the orders table
  • Compare each customer's total to the average using a scalar subquery
  • Display customer name, email, total spending, and average spending
  • Order by total spending in descending order
-- Expected columns: customer_name, email, total_spent, avg_spending
-- Must use: Scalar subquery for calculating average
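One possible shape for this query, shown only as a sketch (aliases follow the expected columns; this assumes "average" means the average of per-customer totals, which is one reading of the spec). Note that because the outer query aggregates, the comparison lands in HAVING rather than WHERE:

```sql
-- Sketch: customers spending above the per-customer average (scalar subquery)
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       c.email,
       SUM(o.total_amount) AS total_spent,
       (SELECT AVG(t.spent)
        FROM (SELECT SUM(total_amount) AS spent
              FROM orders GROUP BY customer_id) t) AS avg_spending
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
HAVING SUM(o.total_amount) > (SELECT AVG(t.spent)
                              FROM (SELECT SUM(total_amount) AS spent
                                    FROM orders GROUP BY customer_id) t)
ORDER BY total_spent DESC;
```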
2
Products Never Ordered (Table Subquery)

Find all products that have never been ordered using a table subquery with NOT IN or NOT EXISTS.

  • Use a subquery to get all product_id values from order_items
  • Return products NOT IN that list
  • Display product name, category name, unit price, and stock quantity
  • Include a LEFT JOIN with categories table
-- Expected columns: product_name, category_name, unit_price, stock_quantity
-- Must use: NOT IN with subquery or NOT EXISTS
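A sketch of the NOT EXISTS variant (generally preferable to NOT IN, which returns no rows if the subquery yields any NULL product_id):

```sql
-- Sketch: products with no matching rows in order_items
SELECT p.product_name, c.category_name, p.unit_price, p.stock_quantity
FROM products p
LEFT JOIN categories c ON c.category_id = p.category_id
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.product_id = p.product_id
);
```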
3
Most Recent Order per Customer (Correlated Subquery)

For each customer, find their most recent order details using a correlated subquery.

  • Use a correlated subquery in the WHERE clause with MAX(order_date)
  • Display customer name, order date, order status, and total amount
  • Only include customers who have placed at least one order
  • Order by order date descending
-- Expected columns: customer_name, order_date, order_status, total_amount
-- Must use: Correlated subquery comparing order_date to MAX(order_date)
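The correlated-subquery pattern might look like the sketch below: the inner query re-evaluates MAX(order_date) per customer of the outer row, and the inner join naturally restricts results to customers with at least one order.

```sql
-- Sketch: each customer's most recent order via a correlated subquery
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       o.order_date, o.order_status, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date = (SELECT MAX(o2.order_date)
                      FROM orders o2
                      WHERE o2.customer_id = o.customer_id)
ORDER BY o.order_date DESC;
```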

Part 2: Common Table Expressions - CTEs (30 points)

4
Monthly Revenue Analysis (Single CTE)

Calculate monthly revenue trends for 2024 using a CTE to organize the query logic.

  • Create a CTE called monthly_sales that aggregates order totals by month
  • Use EXTRACT function to get month from order_date
  • Calculate total revenue, number of orders, and average order value per month
  • In main query, display month name, revenue, order count, and avg order value
-- Expected columns: month_name, total_revenue, order_count, avg_order_value
-- Must use: WITH clause for CTE, EXTRACT function, aggregate functions
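One way to organize this (a sketch; TO_CHAR with the 'Month' pattern is one option for the month name, and the numeric month is kept only for chronological ordering):

```sql
-- Sketch: 2024 monthly revenue via a single CTE
WITH monthly_sales AS (
    SELECT EXTRACT(MONTH FROM order_date)  AS month_num,
           TO_CHAR(order_date, 'Month')    AS month_name,
           SUM(total_amount)               AS total_revenue,
           COUNT(*)                        AS order_count,
           AVG(total_amount)               AS avg_order_value
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024
    GROUP BY 1, 2
)
SELECT month_name, total_revenue, order_count, avg_order_value
FROM monthly_sales
ORDER BY month_num;
```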
5
Customer Segmentation (Multiple CTEs)

Segment customers into spending tiers (High, Medium, Low) using multiple CTEs.

  • CTE 1: customer_spending calculates total spent per customer
  • CTE 2: spending_thresholds calculates 33rd and 66th percentiles
  • Main query: Use CASE to classify customers based on thresholds
  • Display customer name, total spent, and spending tier
-- Expected columns: customer_name, total_spent, spending_tier
-- Must use: Multiple CTEs with WITH clause, PERCENTILE_CONT, CASE statement
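A possible skeleton (sketch only; the CROSS JOIN attaches the single-row thresholds CTE to every customer row, and the tier boundaries shown are one reasonable choice):

```sql
-- Sketch: spending tiers from 33rd/66th percentiles
WITH customer_spending AS (
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
spending_thresholds AS (
    SELECT PERCENTILE_CONT(0.33) WITHIN GROUP (ORDER BY total_spent) AS p33,
           PERCENTILE_CONT(0.66) WITHIN GROUP (ORDER BY total_spent) AS p66
    FROM customer_spending
)
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       cs.total_spent,
       CASE WHEN cs.total_spent >= t.p66 THEN 'High'
            WHEN cs.total_spent >= t.p33 THEN 'Medium'
            ELSE 'Low'
       END AS spending_tier
FROM customer_spending cs
CROSS JOIN spending_thresholds t
JOIN customers c ON c.customer_id = cs.customer_id
ORDER BY cs.total_spent DESC;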
6
Product Performance Hierarchy (Nested CTEs)

Analyze product performance within each category using nested CTEs.

  • CTE 1: product_sales calculates total revenue per product
  • CTE 2: category_totals calculates total revenue per category
  • CTE 3: product_contribution calculates each product's contribution percentage
  • Display category, product name, product revenue, and contribution percentage
  • Only show products contributing more than 10% to their category
-- Expected columns: category_name, product_name, product_revenue, contribution_pct
-- Must use: 3 nested CTEs, percentage calculation, filtering

Part 3: Window Functions (40 points)

7
Top 5 Products per Category (ROW_NUMBER)

Rank products within each category by total revenue and return the top 5 per category.

  • Calculate total revenue for each product: quantity * unit_price * (1 - discount_percent / 100)
  • Use ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY revenue DESC)
  • Filter to rows where the rank is less than or equal to 5 (wrap the ranked query in a CTE or subquery, since window functions cannot appear directly in WHERE)
  • Display category name, product name, revenue, and rank
-- Expected columns: category_name, product_name, total_revenue, product_rank
-- Must use: ROW_NUMBER window function with PARTITION BY
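The ranking-then-filtering pattern could be sketched as follows (window functions evaluate after GROUP BY, so ROW_NUMBER can order by the aggregated revenue; the CTE exists only so the rank can be filtered):

```sql
-- Sketch: top 5 products per category by net revenue
WITH ranked AS (
    SELECT c.category_name, p.product_name,
           SUM(oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100)) AS total_revenue,
           ROW_NUMBER() OVER (
               PARTITION BY p.category_id
               ORDER BY SUM(oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100)) DESC
           ) AS product_rank
    FROM order_items oi
    JOIN products p   ON p.product_id = oi.product_id
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY c.category_name, p.product_name, p.category_id
)
SELECT category_name, product_name, total_revenue, product_rank
FROM ranked
WHERE product_rank <= 5;
```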
8
Customer Order Frequency Ranking (RANK and DENSE_RANK)

Rank customers by number of orders placed, showing both RANK and DENSE_RANK for comparison.

  • Count total orders per customer
  • Apply RANK() and DENSE_RANK() ordered by order count descending
  • Display customer name, order count, rank, and dense rank
  • Show top 20 customers only
-- Expected columns: customer_name, order_count, rank, dense_rank
-- Must use: Both RANK() and DENSE_RANK() window functions
9
Running Total Revenue (Running Sum)

Calculate a running total of revenue by order date for the year 2024.

  • Group orders by calendar date (cast the timestamp to a date, e.g. order_date::date)
  • Calculate daily revenue totals
  • Use SUM() OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  • Display date, daily revenue, and running total
-- Expected columns: order_date, daily_revenue, running_total
-- Must use: Window function with ROWS frame for running total
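A sketch of the two-step pattern — aggregate per day first, then accumulate with a window frame:

```sql
-- Sketch: daily revenue with a cumulative running total for 2024
WITH daily AS (
    SELECT order_date::date AS order_day,
           SUM(total_amount) AS daily_revenue
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024
    GROUP BY order_date::date
)
SELECT order_day AS order_date,
       daily_revenue,
       SUM(daily_revenue) OVER (
           ORDER BY order_day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM daily
ORDER BY order_day;
```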
10
Customer Order Patterns (LAG and LEAD)

Analyze customer purchasing patterns by comparing current order date with previous and next order dates.

  • For each customer's orders, get previous order date using LAG()
  • Get next order date using LEAD()
  • Calculate days since last order and days until next order
  • Partition by customer_id and order by order_date
  • Display customer name, order date, previous date, next date, days gaps
-- Expected columns: customer_name, order_date, prev_order_date, next_order_date, days_since_last, days_until_next
-- Must use: LAG() and LEAD() window functions with PARTITION BY
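One way to shape this (a sketch; the named WINDOW clause avoids repeating the partition, and casting to date before subtracting yields whole-day gaps as integers rather than intervals):

```sql
-- Sketch: gap between consecutive orders per customer
SELECT c.first_name || ' ' || c.last_name AS customer_name,
       o.order_date,
       LAG(o.order_date)  OVER w AS prev_order_date,
       LEAD(o.order_date) OVER w AS next_order_date,
       o.order_date::date - LAG(o.order_date::date)  OVER w AS days_since_last,
       LEAD(o.order_date::date) OVER w - o.order_date::date AS days_until_next
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WINDOW w AS (PARTITION BY o.customer_id ORDER BY o.order_date)
ORDER BY customer_name, o.order_date;
```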

Part 4: Date and String Manipulation (25 points)

11
Order Processing Time Analysis (Date Arithmetic)

Calculate average processing time (ship_date - order_date) for each order status.

  • Use DATE_PART or EXTRACT to calculate days between order_date and ship_date
  • Calculate AVG processing time for each order_status
  • Only include orders where ship_date is NOT NULL
  • Display order status, average processing days, and order count
-- Expected columns: order_status, avg_processing_days, order_count
-- Must use: Date arithmetic functions (DATE_PART or EXTRACT)
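A sketch of one approach: subtracting two timestamps yields an interval, so EXTRACT(EPOCH ...) / 86400 converts it to fractional days before averaging (DATE_PART('day', ...) would work too, but truncates partial days).

```sql
-- Sketch: average order-to-ship time per status, in days
SELECT order_status,
       ROUND(AVG(EXTRACT(EPOCH FROM (ship_date - order_date)) / 86400)::numeric, 1)
           AS avg_processing_days,
       COUNT(*) AS order_count
FROM orders
WHERE ship_date IS NOT NULL
GROUP BY order_status
ORDER BY avg_processing_days;
```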
12
Seasonal Sales Analysis (Date Extraction)

Analyze sales by season (Winter, Spring, Summer, Fall) for 2024.

  • Use EXTRACT(MONTH FROM order_date) to get month
  • Use CASE to map months to seasons (Dec-Feb: Winter, Mar-May: Spring, Jun-Aug: Summer, Sep-Nov: Fall)
  • Calculate total revenue and order count per season
  • Display season, total revenue, order count, and average order value
-- Expected columns: season, total_revenue, order_count, avg_order_value
-- Must use: EXTRACT function, CASE statement for season mapping
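The month-to-season mapping could be sketched like this (GROUP BY 1 groups by the CASE expression in the first select position):

```sql
-- Sketch: 2024 revenue by season
SELECT CASE
         WHEN EXTRACT(MONTH FROM order_date) IN (12, 1, 2) THEN 'Winter'
         WHEN EXTRACT(MONTH FROM order_date) IN (3, 4, 5)  THEN 'Spring'
         WHEN EXTRACT(MONTH FROM order_date) IN (6, 7, 8)  THEN 'Summer'
         ELSE 'Fall'
       END AS season,
       SUM(total_amount) AS total_revenue,
       COUNT(*)          AS order_count,
       AVG(total_amount) AS avg_order_value
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY 1;
```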
13
Customer Name Formatting (String Functions)

Format customer names for email campaigns using string manipulation functions.

  • Create full_name as "Last Name, First Name" using CONCAT
  • Create email_greeting as "Dear FirstName" using CONCAT and INITCAP
  • Extract domain from email using SUBSTRING and POSITION
  • Display formatted name, email greeting, email, and email domain
-- Expected columns: full_name, email_greeting, email, email_domain
-- Must use: CONCAT, INITCAP, SUBSTRING, POSITION functions
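A sketch combining the four required string functions (POSITION locates the '@', and SUBSTRING ... FROM takes everything after it):

```sql
-- Sketch: formatted names and email domains for a campaign list
SELECT CONCAT(last_name, ', ', first_name)          AS full_name,
       CONCAT('Dear ', INITCAP(first_name))         AS email_greeting,
       email,
       SUBSTRING(email FROM POSITION('@' IN email) + 1) AS email_domain
FROM customers;
```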

Part 5: UNION Operations (25 points)

14
High-Value Transactions Report (UNION ALL)

Combine high-value orders and high-value payments into a single report using UNION ALL.

  • Query 1: Select orders with total_amount greater than $1000
  • Query 2: Select payments with payment_amount greater than $1000
  • Use UNION ALL to combine both result sets
  • Display transaction_type (Order or Payment), transaction_id, amount, date
-- Expected columns: transaction_type, transaction_id, amount, transaction_date
-- Must use: UNION ALL to combine two SELECT statements
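A sketch of the combined report: both branches must produce the same number of columns with compatible types, column names come from the first SELECT, and a trailing ORDER BY applies to the whole union.

```sql
-- Sketch: high-value orders and payments in one result set
SELECT 'Order'  AS transaction_type, order_id   AS transaction_id,
       total_amount   AS amount, order_date   AS transaction_date
FROM orders
WHERE total_amount > 1000
UNION ALL
SELECT 'Payment', payment_id, payment_amount, payment_date
FROM payments
WHERE payment_amount > 1000
ORDER BY amount DESC;
```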
15
Customer Contact List (UNION with Deduplication)

Create a master contact list combining customers from different regions, removing duplicates.

  • Query 1: Select customers from California with email
  • Query 2: Select customers from Texas with email
  • Query 3: Select customers from New York with email
  • Use UNION (not UNION ALL) to automatically deduplicate
  • Display customer name, email, state, and registration date
  • Order by registration date descending
-- Expected columns: customer_name, email, state, registration_date
-- Must use: UNION (without ALL) for automatic deduplication
Pro Tip: Test each query individually before adding it to your final SQL file. Use EXPLAIN ANALYZE to check query performance and optimize as needed.
05

Deliverables

Create a GitHub repository containing all required files and submit the repository URL via the form below.

1
Create SQL Script File

Create ecommerce_analysis.sql containing:

  • All 15 required queries organized by part (Subqueries, CTEs, Window Functions, etc.)
  • Clear comments above each query explaining its purpose
  • Section headers separating different parts
  • Proper SQL formatting and indentation
-- ================================================================
-- PART 1: SUBQUERIES (30 POINTS)
-- ================================================================

-- Query 1: Top Spending Customers
-- Purpose: Find customers who spent more than average
SELECT ...

-- Query 2: Products Never Ordered
-- Purpose: Identify products with zero orders
SELECT ...
2
Create README File

Create README.md that includes:

  • Your name and assignment title
  • Brief description of the ShopSphere database
  • Instructions to set up the database and run queries
  • List of all 15 queries with brief descriptions
  • Key insights discovered from your analysis
  • Any challenges faced and how you solved them
3
Export Query Results

Create a folder called results/ containing:

  • CSV exports of at least 5 key query results
  • Name files descriptively (e.g., top_spending_customers.csv)
  • Include column headers in CSV files
# Export query results to CSV (\copy runs client-side, so it needs no server superuser rights)
psql shopsphere_db -c "\copy (SELECT ...) TO '/path/to/results/query1.csv' CSV HEADER"
4
Create Documentation File

Create INSIGHTS.md that documents:

  • Top 3 business insights discovered from your analysis
  • Actionable recommendations for ShopSphere based on data
  • Interesting patterns or trends you identified
  • Any data quality issues you noticed
5
Repository Structure

Your GitHub repository should look like this:

shopsphere-sql-analysis/
├── README.md
├── ecommerce_analysis.sql    # All 15 queries with comments
├── INSIGHTS.md               # Business insights and recommendations
└── results/
    ├── top_spending_customers.csv
    ├── monthly_revenue.csv
    ├── product_rankings.csv
    ├── customer_patterns.csv
    └── seasonal_sales.csv
6
Submit via Form

Once your repository is ready:

  • Make sure your repository is public or shared with your instructor
  • Verify all queries execute without errors in PostgreSQL
  • Click the "Submit Assignment" button below
  • Fill in the submission form with your GitHub repository URL
Important: Test all queries against the sample database before submitting! Queries with syntax errors will receive reduced credit.
06

Grading Rubric

Your assignment will be graded on the following criteria:

Criteria                            Points   Description
Subqueries (Part 1)                   30     Correct use of scalar, table, and correlated subqueries; accurate results
CTEs (Part 2)                         30     Proper WITH clause syntax, single and multiple CTEs, logical query organization
Window Functions (Part 3)             40     ROW_NUMBER, RANK, DENSE_RANK, running totals, LAG/LEAD with proper partitioning
Date/String Manipulation (Part 4)     25     Date arithmetic, EXTRACT, string functions (CONCAT, SUBSTRING, POSITION)
UNION Operations (Part 5)             25     Correct use of UNION and UNION ALL, proper column alignment
Code Quality                          15     Comments, formatting, readability, proper SQL conventions
Documentation                         10     README completeness, insights quality, repository organization
Total                                175
Bonus Points (Up to +25)
  • +10 points: Create an additional complex query using recursive CTEs
  • +10 points: Include query execution plans and optimization notes in INSIGHTS.md
  • +5 points: Create visualizations of key query results using any tool

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Assignment
07

What You Will Practice

Subqueries (3.4)

Master scalar, row, and table subqueries for nested data retrieval and filtering based on aggregated results

Common Table Expressions (3.4)

Use WITH clauses to create temporary named result sets, improving query readability and maintainability

Window Functions (3.4)

Apply ROW_NUMBER, RANK, DENSE_RANK, LAG, and LEAD for advanced ranking and row comparison operations

Date & String Functions (3.4)

Manipulate dates with EXTRACT and date arithmetic, format strings with CONCAT, SUBSTRING, and POSITION

UNION Operations (3.4)

Combine multiple result sets using UNION (with deduplication) and UNION ALL (without deduplication)

Real-World Analysis

Apply SQL techniques to solve authentic e-commerce business problems and generate actionable insights

08

Pro Tips

Query Development
  • Start with simple queries, then add complexity
  • Test subqueries independently before nesting
  • Use LIMIT to test queries on small result sets first
  • Comment out sections to debug step by step
Performance Optimization
  • Use EXPLAIN ANALYZE to check query execution plans
  • Prefer CTEs over nested subqueries for readability
  • Use appropriate indexes for JOIN and WHERE clauses
  • Avoid SELECT * - specify only needed columns
Time Management
  • Allocate 2 hours per major section (Parts 1-5)
  • Complete Part 1 (Subqueries) first as foundation
  • Take breaks between sections to maintain focus
  • Save time at end for documentation and testing
Common Mistakes
  • Forgetting to JOIN tables before using their columns
  • Mixing aggregate and non-aggregate columns without GROUP BY
  • Confusing UNION (deduplicates) with UNION ALL (keeps all)
  • Not handling NULL values in calculations and comparisons
09

Pre-Submission Checklist

SQL Script Requirements
Repository Requirements