Capstone Project 1

Sales Performance Analytics

Build a comprehensive sales analytics solution using Excel, SQL, and Power BI. You will clean retail sales data, calculate key performance indicators, analyze regional and product performance, and create an interactive executive dashboard.

10-15 hours
Intermediate
500 Points
What You Will Build
  • Excel data cleaning workbook
  • SQL analysis queries
  • KPI calculation formulas
  • Power BI data model
  • Interactive executive dashboard
Contents
01

Project Overview

This capstone project brings together everything you have learned in the Data Analytics course. You will work with the famous Kaggle Superstore Dataset containing 9,993 real transactions spanning 4 years (2014-2017), covering 1,894 products across 3 categories (Furniture, Office Supplies, Technology) with 17 sub-categories, 4,910 customers in 3 segments (Consumer, Corporate, Home Office), and 4 US regions (Central, East, South, West). Total sales: $2.3M with $286K profit. Your goal is to build a professional analytics solution that transforms this real-world data into actionable business insights using Excel for data preparation, SQL for analysis, and Power BI for visualization.

Skills Applied: This project tests your proficiency in Excel (data cleaning, pivot tables, formulas), SQL (queries, joins, aggregations), and Power BI (data modeling, DAX measures, interactive dashboards).
Excel Prep

Clean data, validate entries, and create pivot summaries

SQL Analysis

Write queries for aggregations, joins, and insights

KPI Metrics

Calculate revenue, growth rates, and performance indicators

Power BI

Build interactive dashboards with DAX measures

Learning Objectives

Technical Skills
  • Master Excel data cleaning and validation techniques
  • Write complex SQL queries with joins and aggregations
  • Create Power BI data models with relationships
  • Build DAX measures for dynamic calculations
  • Design interactive dashboard visualizations
Business Skills
  • Translate business questions into analytical queries
  • Calculate and interpret key performance indicators
  • Identify trends, patterns, and anomalies in sales data
  • Present findings to non-technical stakeholders
  • Make data-driven recommendations for growth
Ready to submit? Already completed the project? Submit your work now!
Submit Now
02

Business Scenario

RetailMax Corporation

You have been hired as a Business Intelligence Analyst at RetailMax, a growing retail chain with stores across four US regions. The company operates both physical stores and an online channel, selling products across Electronics, Clothing, Home and Garden, Sports, and Toys categories. The leadership team is preparing for their annual strategic planning meeting and needs a comprehensive sales performance analysis.

"We need to understand our sales performance across all regions and product categories. Which stores are outperforming? What products should we invest in? Are there seasonal patterns we should prepare for? Can you build us an executive dashboard that answers these questions at a glance?"

Michael Torres, Chief Revenue Officer

Business Questions to Answer

Revenue Analysis
  • What is the total revenue by month and quarter?
  • Which region generates the highest revenue?
  • What is our year-over-year growth rate?
  • How does online vs in-store performance compare?
Product Performance
  • What are the top 10 best-selling products?
  • Which category has the highest profit margin?
  • What products have declining sales trends?
  • How do promotional discounts affect revenue?
Store Performance
  • Which stores are top and bottom performers?
  • What is the average transaction value by store?
  • How does store size correlate with revenue?
  • Which stores have the highest customer retention?
Trends and Patterns
  • Are there seasonal patterns in sales?
  • Which day of the week has highest sales?
  • How do holidays impact revenue?
  • What is the monthly sales growth trend?
Pro Tip: Think like a business analyst! Your dashboard should tell a story that helps executives make data-driven decisions about inventory, marketing spend, and store operations.
03

The Dataset

You will work with a comprehensive retail sales dataset. Download the Excel file containing multiple sheets for transactions, products, stores, and customers:

Dataset Download

Download the retail sales dataset files (based on Kaggle's Superstore Dataset) and save them to your project folder. The CSV files contain all necessary data for sales performance analysis.

Original Data Source

This project uses the Superstore Dataset from Kaggle - one of the most popular datasets for learning data analytics and business intelligence. The dataset contains real-world retail sales data with 9,993 transactions, 4,910 customers, and 1,894 products across the United States.

Dataset Info: 9,993 rows × 20 columns | Date Range: Jan 2014 - Dec 2017 | Total Sales: $2,296,919 | Total Profit: $286,409 | 4,910 Customers | 1,894 Products | Categories: Furniture, Office Supplies, Technology | Segments: Consumer, Corporate, Home Office
Dataset Schema

ColumnTypeDescription
Row_IDIntegerUnique row identifier (1-9993)
Order_IDStringUnique order identifier (e.g., CA-2016-152156)
Order_DateDateDate of order (YYYY-MM-DD, 2016-2019)
Ship_DateDateDate of shipment (YYYY-MM-DD)
Ship_ModeStringStandard Class, Second Class, First Class, Same Day
Customer_IDStringCustomer identifier (e.g., CG-12520)
Customer_NameStringCustomer full name
SegmentStringConsumer, Corporate, Home Office
CityStringCity name (USA cities)
StateStringUS state name
Postal_CodeInteger5-digit postal code
RegionStringEast, West, Central, South
Product_IDStringProduct identifier (e.g., FUR-BO-10001798)
CategoryStringFurniture, Office Supplies, Technology
Sub_CategoryString17 subcategories (Chairs, Phones, Binders, etc.)
Product_NameStringFull product name
SalesDecimalTotal sales amount ($)
QuantityIntegerNumber of units purchased
DiscountDecimalDiscount applied (0.00-0.80)
ProfitDecimalProfit/Loss amount (can be negative)

ColumnTypeDescription
Product_IDStringUnique product identifier (e.g., FUR-BO-10001798)
Product_NameStringFull product name
CategoryStringFurniture, Office Supplies, Technology
Sub_CategoryStringProduct subcategory (17 types: Bookcases, Chairs, Phones, etc.)

ColumnTypeDescription
RegionStringRegion name (Central, East, South, West)
Total_SalesDecimalTotal sales revenue ($)
Total_ProfitDecimalTotal profit ($)
Total_QuantityIntegerTotal units sold
Total_OrdersIntegerUnique order count
Total_CustomersIntegerUnique customer count

ColumnTypeDescription
Customer_IDStringUnique customer identifier (e.g., CG-12520)
Customer_NameStringCustomer full name
SegmentStringConsumer, Corporate, Home Office
CityStringCity name
StateStringUS state name
Postal_CodeInteger5-digit postal code
RegionStringCentral, East, South, West
Dataset Stats: 100 sales transactions, 49 customers, 100 products, 4 regions (Kaggle Superstore format)
Analysis Scope: Orders from 2023-2024 with Sales, Profit, Quantity, Discount data
Sample Data Preview

Here is what a typical sales record looks like from retail_sales.csv:

Order_IDOrder_DateCustomerProductCategoryQtySalesProfit
CA-2024-1521562024-11-08Claire GuteBush Somerset BookcaseFurniture2$261.96$41.91
CA-2024-1158122024-06-09Brosina HoffmanMitel 5320 IP PhoneTechnology6$907.15$90.72
CA-2024-1198232024-03-14Anna ReevesCanon imageCLASS CopierTechnology1$3,399.99$1,529.99
Data Quality Note: The dataset contains some intentional data quality issues for you to identify and clean: missing values in certain columns, duplicate transaction IDs, and inconsistent date formats. Part of your task is to document and handle these issues.
04

Project Requirements

Your project must include all of the following components using Excel, SQL, and Power BI. Structure your deliverables with clear organization and documentation.

1
Excel Data Preparation

Data Quality Assessment:

  • Import all 4 sheets and inspect data structure
  • Identify missing values, duplicates, and inconsistencies
  • Document all data quality issues found in a summary sheet
  • Use conditional formatting to highlight problematic cells

Data Cleaning:

  • Remove or handle duplicate transaction records
  • Fill or flag missing values with appropriate methods
  • Standardize date formats across all sheets
  • Validate foreign key relationships between sheets

Feature Engineering:

  • Add calculated columns: profit, profit margin, discount amount
  • Extract year, month, quarter, and day of week from dates
  • Create fiscal year and fiscal quarter columns
  • Add a customer segment column based on purchase behavior
Deliverable: Cleaned Excel workbook with original data, cleaned data, and a Data Quality Report sheet documenting all issues and resolutions.
2
SQL Analysis Queries

Write SQL queries to answer the following business questions:

  • Revenue Analysis: Total revenue by month, quarter, region, and category
  • Product Performance: Top 10 products by revenue and quantity sold
  • Store Comparison: Revenue per store with ranking
  • Customer Analysis: Customer segments by total lifetime value
  • Trend Analysis: Month-over-month growth rate calculation
  • Advanced: Running totals, moving averages, or year-over-year comparisons

Query Requirements:

  • Use proper JOINs to combine transaction data with dimension tables
  • Include GROUP BY with aggregate functions (SUM, AVG, COUNT)
  • Use window functions for rankings and running calculations
  • Add comments explaining each query's purpose
Deliverable: SQL file (.sql) containing at least 10 well-documented queries with comments explaining business purpose and expected output.
3
Power BI Dashboard

Data Model:

  • Import cleaned data from Excel
  • Create proper relationships between fact and dimension tables
  • Configure star schema with Transactions as fact table
  • Create a Date dimension table for time intelligence

DAX Measures (minimum 10):

  • Total Revenue, Total Cost, Total Profit, Profit Margin %
  • YTD Revenue, Previous Year Revenue, YoY Growth %
  • Average Order Value, Customer Count, Repeat Customer Rate
  • MoM Growth %, Revenue per Square Foot

Dashboard Pages (minimum 3):

  • Executive Summary: KPI cards, revenue trend, regional breakdown
  • Product Analysis: Category performance, top products, margins
  • Store Performance: Store rankings, regional maps, comparisons
Deliverable: Power BI file (.pbix) with data model, DAX measures, and at least 3 interactive dashboard pages.
4
Analysis Report

Executive Summary (1 page):

  • High-level business overview and key findings
  • Top 5 insights with supporting metrics
  • Strategic recommendations prioritized by impact

Detailed Findings (2-3 pages):

  • Revenue performance analysis with trends and patterns
  • Product and category performance insights
  • Regional and store-level analysis
  • Customer behavior patterns

Recommendations (1 page):

  • 3-5 actionable recommendations based on findings
  • Expected impact and implementation priority
  • Next steps and future analysis opportunities
Deliverable: PDF report (4-5 pages) with visualizations, insights, and business recommendations.
05

KPI Specifications

Calculate and display the following Key Performance Indicators in your Power BI dashboard. Create DAX measures for dynamic calculations and organize them in display folders.

Revenue Metrics
  • Total Revenue: Sum of all total_amount
  • Total Cost: Sum of (quantity * cost_price)
  • Gross Profit: Revenue - Cost
  • Profit Margin %: (Profit / Revenue) * 100
  • YTD Revenue: Year-to-date total using TOTALYTD
  • MoM Growth %: Monthly growth rate comparison
Product Metrics
  • Total Units Sold: Sum of all quantity
  • Unique Products Sold: DISTINCTCOUNT of product_id
  • Top 10 Products: By revenue and by units
  • Category Revenue %: Revenue share per category
  • Average Unit Price: Mean selling price
  • Discount Impact: Revenue lost to discounts
Store Metrics
  • Revenue by Region: Northeast, Southeast, Midwest, West
  • Revenue per Store: Average and by individual store
  • Revenue per Sq Ft: Efficiency metric by store size
  • Store Ranking: RANKX by revenue
  • Online vs In-Store %: Channel distribution
  • Top/Bottom Performers: Best and worst 5 stores
Customer Metrics
  • Total Customers: DISTINCTCOUNT of customer_id
  • Average Order Value: Revenue / Transaction count
  • Orders per Customer: Total orders / Unique customers
  • Customer Lifetime Value: Avg total spend per customer
  • Tier Distribution: Bronze, Silver, Gold, Platinum %
  • Repeat Customer Rate: Customers with 2+ orders
Sample DAX Measures
-- Total Revenue
Total Revenue = SUM(Transactions[total_amount])

-- Profit Margin
Profit Margin % = 
DIVIDE([Gross Profit], [Total Revenue], 0) * 100

-- YTD Revenue
YTD Revenue = 
TOTALYTD([Total Revenue], 'Date'[Date])
-- MoM Growth
MoM Growth % = 
VAR CurrentMonth = [Total Revenue]
VAR PreviousMonth = CALCULATE(
    [Total Revenue],
    DATEADD('Date'[Date], -1, MONTH))
RETURN
DIVIDE(CurrentMonth - PreviousMonth, PreviousMonth, 0) * 100
Best Practice: Organize your DAX measures into display folders (Revenue, Products, Stores, Customers) in the model view for better maintenance.
06

Required Visualizations

Create at least 12 visualizations across your 3 dashboard pages. Each visualization should be interactive with proper filtering, tooltips, and drill-through capabilities.

Page 1

Executive Summary Dashboard

High-level overview for executive leadership

KPI Cards 4-6 Cards
  • Total Revenue with YoY comparison
  • Gross Profit with margin %
  • Total Orders with growth indicator
  • Average Order Value
  • Customer Count
  • Top Performing Region
Revenue Trend Line
  • Monthly revenue over 4-year period (2014-2017)
  • Include trend line and forecast
  • Add annotations for key events
  • Enable drill-down to weekly view
Regional Map
  • US map with revenue by region (Central, East, South, West)
  • Color saturation by performance
  • Tooltip showing key metrics
  • Click to filter other visuals
Category Breakdown
  • Donut chart: Furniture, Office Supplies, Technology
  • Show percentages and values
  • Highlight top category
  • Click to cross-filter dashboard
Page 2

Product Analysis Dashboard

Deep dive into product and category performance

Top 10 Products Bar Chart
  • Horizontal bars sorted by revenue
  • Color by category
  • Data labels showing values
  • Toggle between revenue and units
Category Matrix Matrix
  • 17 Sub-categories within 3 Categories
  • Show revenue, units, and margin %
  • Conditional formatting for performance
  • Expandable row hierarchy
Category Trend Stacked Area
  • Monthly trend by category (2014-2017)
  • Stacked area chart
  • Show category contribution over time
  • Identify growing/declining categories
Price vs Profit Scatter
  • Scatter plot of 1,894 products
  • X: Sales amount | Y: Profit
  • Size: Quantity sold
  • Color: Category (identify loss-makers)
Page 3

Regional Performance Dashboard

Region and state-level performance analysis

State Ranking Table
  • All states ranked by revenue
  • Columns: Rank, State, Region, Sales, Profit
  • Conditional formatting bars
  • Sparklines for trend
Regional Comparison Clustered Bar
  • 4 Regions: Central, East, South, West
  • Compare revenue, orders, profit
  • Color code by performance tier
  • Include regional targets
Segment Analysis Donut + Bar
  • Consumer vs Corporate vs Home Office
  • Revenue and profit by segment
  • Average order value comparison
  • Customer count per segment
Ship Mode Analysis Combo Chart
  • Standard, Second, First Class, Same Day
  • Order count by ship mode
  • Average delivery time
  • Profit margin by shipping
Extra Credit

Bonus Visualizations

Optional enhancements for extra points

Customer Segmentation

RFM analysis: 4,910 customers segmented by Recency, Frequency, Monetary

Time Intelligence

Same period last year comparison, YTD, QTD calculations

Decomposition Tree

Revenue breakdown by Region → Segment → Category

Q&A Visual

Natural language query: "Show sales by region for 2017"

Bookmarks

Pre-configured views: "Top Products", "Loss Analysis", "Regional Focus"

Mobile Layout

Responsive design optimized for tablet and phone viewing

Design Best Practices
Consistency
  • Consistent color palette
  • Same fonts throughout
  • Aligned visual elements
  • Consistent number formatting
Interactivity
  • Cross-filtering enabled
  • Slicers for date and region
  • Drill-through to details
  • Tooltips with context
Clarity
  • Clear, descriptive titles
  • Labeled axes and legends
  • Appropriate chart types
  • No chart junk
07

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
sales-performance-analytics
github.com/<your-username>/sales-performance-analytics
Required Project Structure
sales-performance-analytics/
├── data/
│   ├── retail_sales.xlsx           # Original dataset
│   └── retail_sales_cleaned.xlsx   # Your cleaned dataset
├── sql/
│   └── analysis_queries.sql        # All SQL queries with comments
├── powerbi/
│   └── sales_dashboard.pbix        # Power BI dashboard file
├── reports/
│   └── analysis_report.pdf         # Final analysis report
├── screenshots/
│   ├── dashboard_executive.png     # Executive summary page
│   ├── dashboard_products.png      # Product analysis page
│   └── dashboard_stores.png        # Store performance page
└── README.md                       # Project documentation
README.md Required Sections
1. Project Header
  • Project title and description
  • Your full name and submission date
  • Course and project number
2. Business Context
  • RetailMax scenario overview
  • Project objectives
  • Dataset summary
3. Tools and Technologies
  • Microsoft Excel
  • SQL (specify dialect)
  • Power BI Desktop
4. Key Findings
  • 5-7 data-driven insights
  • Supporting metrics for each
  • Business implications
5. Dashboard Screenshots
  • 3 dashboard page screenshots
  • Brief caption for each
  • Use markdown image syntax
6. Data Model
  • Describe star schema design
  • List key DAX measures
  • Explain relationships
7. How to Use
  • Instructions for opening PBIX file
  • How to refresh data
  • Navigation guide
8. Contact
  • GitHub profile link
  • LinkedIn (optional)
  • Email (optional)
Do Include
  • All required files in correct folders
  • Cleaned Excel file with documentation
  • At least 10 commented SQL queries
  • PBIX file with 3+ dashboard pages
  • PDF report with visualizations
  • Dashboard screenshots in README
Do Not Include
  • Temporary or backup files (~$*.xlsx)
  • Power BI auto-recovery files
  • Sensitive or personal information
  • Extremely large image files (> 2MB each)
  • Incomplete or broken files
Important: Before submitting, verify that your Power BI file opens without errors and all visuals load correctly. Test the PBIX file on a different computer if possible.
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 500 points.

Criteria Points Description
Excel Data Preparation 75 Data cleaning, validation, documentation of issues, calculated columns
SQL Analysis 100 At least 10 queries with proper joins, aggregations, and window functions
Power BI Data Model 75 Proper relationships, date table, organized measures (10+ DAX)
Dashboard Visualizations 100 At least 12 visuals across 3 pages with proper interactivity
Analysis Report 75 Clear insights, supporting metrics, actionable recommendations
Documentation 50 README quality, SQL comments, folder structure, screenshots
Design and Usability 25 Visual consistency, clarity, user experience, professional appearance
Total 500
Grading Levels
Excellent
450-500

Exceeds all requirements with exceptional quality

Good
375-449

Meets all requirements with good quality

Satisfactory
300-374

Meets minimum requirements

Needs Work
< 300

Missing key requirements

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Project
09

Pre-Submission Checklist

Use this checklist to verify you have completed all requirements before submitting your project.

Excel Requirements
SQL Requirements
Power BI Requirements
Repository Requirements
Final Check: Open your Power BI file on a fresh install or different computer to ensure all data connections and visuals work correctly.
10

Common Issues and Solutions

Encountering problems? Don't worry! Here are the most common issues students face and how to resolve them quickly.

Power BI Data Refresh Errors
Problem

Data source path errors when opening PBIX file on different computer

Solution

Use relative file paths or place data files in the same folder as PBIX file. In Power Query:

= Excel.Workbook(File.Contents("data\retail_sales.csv"))
Tip: Embed data in PBIX file by importing rather than connecting
DAX Time Intelligence Not Working
Problem

YTD, MoM, and other time functions return errors or blanks

Solution

Create a proper Date dimension table with continuous dates:

Date = CALENDAR(DATE(2014,1,1), DATE(2017,12,31))
Important: Mark Date table as "Date Table" in model view
SQL Queries Returning Wrong Results
Problem

Aggregations return unexpected values or duplicate records

Solution

Check JOIN conditions carefully. Use DISTINCT when counting unique values. Verify foreign keys match correctly.

Debug: Run SELECT * FROM table LIMIT 10 to inspect intermediate results
Excel Dates Displaying as Numbers
Problem

Dates appear as serial numbers (e.g., 45678 instead of 2024-11-15)

Solution

Select date column → Right-click → Format Cells → Choose Date format. In Power Query, change data type to Date.

Prevention: Always verify data types after importing to Power BI
GitHub File Size Limits
Problem

Cannot push large PBIX or Excel files to GitHub (100MB limit)

Solution

Compress images, remove unused data, or use Git LFS for files over 50MB. Remove unused columns from Power BI model.

Alternative: Embed data rather than linking to reduce file size
Profit Margin Calculations Off
Problem

Profit margins showing incorrect percentages or negative values

Solution

Use this DAX formula for accurate profit margin calculation:

Profit Margin = DIVIDE(SUM(Sales[Profit]), SUM(Sales[Sales]), 0)
Note: The dataset has negative profits - this is intentional real data
Still Having Issues?

Check the course discussion forum or reach out for help