Project Overview
This capstone project brings together everything you have learned in the Data Analytics course. You will work with the famous Kaggle Superstore Dataset containing 9,993 real transactions spanning 4 years (2014-2017), covering 1,894 products across 3 categories (Furniture, Office Supplies, Technology) with 17 sub-categories, 4,910 customers in 3 segments (Consumer, Corporate, Home Office), and 4 US regions (Central, East, South, West). Total sales: $2.3M with $286K profit. Your goal is to build a professional analytics solution that transforms this real-world data into actionable business insights using Excel for data preparation, SQL for analysis, and Power BI for visualization.
Excel Prep
Clean data, validate entries, and create pivot summaries
SQL Analysis
Write queries for aggregations, joins, and insights
KPI Metrics
Calculate revenue, growth rates, and performance indicators
Power BI
Build interactive dashboards with DAX measures
Learning Objectives
Technical Skills
- Master Excel data cleaning and validation techniques
- Write complex SQL queries with joins and aggregations
- Create Power BI data models with relationships
- Build DAX measures for dynamic calculations
- Design interactive dashboard visualizations
Business Skills
- Translate business questions into analytical queries
- Calculate and interpret key performance indicators
- Identify trends, patterns, and anomalies in sales data
- Present findings to non-technical stakeholders
- Make data-driven recommendations for growth
Business Scenario
RetailMax Corporation
You have been hired as a Business Intelligence Analyst at RetailMax, a growing retail chain with stores across four US regions. The company operates both physical stores and an online channel, selling products across Electronics, Clothing, Home and Garden, Sports, and Toys categories. The leadership team is preparing for their annual strategic planning meeting and needs a comprehensive sales performance analysis.
"We need to understand our sales performance across all regions and product categories. Which stores are outperforming? What products should we invest in? Are there seasonal patterns we should prepare for? Can you build us an executive dashboard that answers these questions at a glance?"
Business Questions to Answer
- What is the total revenue by month and quarter?
- Which region generates the highest revenue?
- What is our year-over-year growth rate?
- How does online vs in-store performance compare?
- What are the top 10 best-selling products?
- Which category has the highest profit margin?
- What products have declining sales trends?
- How do promotional discounts affect revenue?
- Which stores are top and bottom performers?
- What is the average transaction value by store?
- How does store size correlate with revenue?
- Which stores have the highest customer retention?
- Are there seasonal patterns in sales?
- Which day of the week has highest sales?
- How do holidays impact revenue?
- What is the monthly sales growth trend?
The Dataset
You will work with a comprehensive retail sales dataset. Download the Excel file containing multiple sheets for transactions, products, stores, and customers:
Dataset Download
Download the retail sales dataset files (based on Kaggle's Superstore Dataset) and save them to your project folder. The CSV files contain all necessary data for sales performance analysis.
Original Data Source
This project uses the Superstore Dataset from Kaggle - one of the most popular datasets for learning data analytics and business intelligence. The dataset contains real-world retail sales data with 9,993 transactions, 4,910 customers, and 1,894 products across the United States.
Dataset Schema
| Column | Type | Description |
|---|---|---|
Row_ID | Integer | Unique row identifier (1-9993) |
Order_ID | String | Unique order identifier (e.g., CA-2016-152156) |
Order_Date | Date | Date of order (YYYY-MM-DD, 2016-2019) |
Ship_Date | Date | Date of shipment (YYYY-MM-DD) |
Ship_Mode | String | Standard Class, Second Class, First Class, Same Day |
Customer_ID | String | Customer identifier (e.g., CG-12520) |
Customer_Name | String | Customer full name |
Segment | String | Consumer, Corporate, Home Office |
City | String | City name (USA cities) |
State | String | US state name |
Postal_Code | Integer | 5-digit postal code |
Region | String | East, West, Central, South |
Product_ID | String | Product identifier (e.g., FUR-BO-10001798) |
Category | String | Furniture, Office Supplies, Technology |
Sub_Category | String | 17 subcategories (Chairs, Phones, Binders, etc.) |
Product_Name | String | Full product name |
Sales | Decimal | Total sales amount ($) |
Quantity | Integer | Number of units purchased |
Discount | Decimal | Discount applied (0.00-0.80) |
Profit | Decimal | Profit/Loss amount (can be negative) |
| Column | Type | Description |
|---|---|---|
Product_ID | String | Unique product identifier (e.g., FUR-BO-10001798) |
Product_Name | String | Full product name |
Category | String | Furniture, Office Supplies, Technology |
Sub_Category | String | Product subcategory (17 types: Bookcases, Chairs, Phones, etc.) |
| Column | Type | Description |
|---|---|---|
Region | String | Region name (Central, East, South, West) |
Total_Sales | Decimal | Total sales revenue ($) |
Total_Profit | Decimal | Total profit ($) |
Total_Quantity | Integer | Total units sold |
Total_Orders | Integer | Unique order count |
Total_Customers | Integer | Unique customer count |
| Column | Type | Description |
|---|---|---|
Customer_ID | String | Unique customer identifier (e.g., CG-12520) |
Customer_Name | String | Customer full name |
Segment | String | Consumer, Corporate, Home Office |
City | String | City name |
State | String | US state name |
Postal_Code | Integer | 5-digit postal code |
Region | String | Central, East, South, West |
Sample Data Preview
Here is what a typical sales record looks like from retail_sales.csv:
| Order_ID | Order_Date | Customer | Product | Category | Qty | Sales | Profit |
|---|---|---|---|---|---|---|---|
| CA-2024-152156 | 2024-11-08 | Claire Gute | Bush Somerset Bookcase | Furniture | 2 | $261.96 | $41.91 |
| CA-2024-115812 | 2024-06-09 | Brosina Hoffman | Mitel 5320 IP Phone | Technology | 6 | $907.15 | $90.72 |
| CA-2024-119823 | 2024-03-14 | Anna Reeves | Canon imageCLASS Copier | Technology | 1 | $3,399.99 | $1,529.99 |
Project Requirements
Your project must include all of the following components using Excel, SQL, and Power BI. Structure your deliverables with clear organization and documentation.
Excel Data Preparation
Data Quality Assessment:
- Import all 4 sheets and inspect data structure
- Identify missing values, duplicates, and inconsistencies
- Document all data quality issues found in a summary sheet
- Use conditional formatting to highlight problematic cells
Data Cleaning:
- Remove or handle duplicate transaction records
- Fill or flag missing values with appropriate methods
- Standardize date formats across all sheets
- Validate foreign key relationships between sheets
Feature Engineering:
- Add calculated columns: profit, profit margin, discount amount
- Extract year, month, quarter, and day of week from dates
- Create fiscal year and fiscal quarter columns
- Add a customer segment column based on purchase behavior
SQL Analysis Queries
Write SQL queries to answer the following business questions:
- Revenue Analysis: Total revenue by month, quarter, region, and category
- Product Performance: Top 10 products by revenue and quantity sold
- Store Comparison: Revenue per store with ranking
- Customer Analysis: Customer segments by total lifetime value
- Trend Analysis: Month-over-month growth rate calculation
- Advanced: Running totals, moving averages, or year-over-year comparisons
Query Requirements:
- Use proper JOINs to combine transaction data with dimension tables
- Include GROUP BY with aggregate functions (SUM, AVG, COUNT)
- Use window functions for rankings and running calculations
- Add comments explaining each query's purpose
Power BI Dashboard
Data Model:
- Import cleaned data from Excel
- Create proper relationships between fact and dimension tables
- Configure star schema with Transactions as fact table
- Create a Date dimension table for time intelligence
DAX Measures (minimum 10):
- Total Revenue, Total Cost, Total Profit, Profit Margin %
- YTD Revenue, Previous Year Revenue, YoY Growth %
- Average Order Value, Customer Count, Repeat Customer Rate
- MoM Growth %, Revenue per Square Foot
Dashboard Pages (minimum 3):
- Executive Summary: KPI cards, revenue trend, regional breakdown
- Product Analysis: Category performance, top products, margins
- Store Performance: Store rankings, regional maps, comparisons
Analysis Report
Executive Summary (1 page):
- High-level business overview and key findings
- Top 5 insights with supporting metrics
- Strategic recommendations prioritized by impact
Detailed Findings (2-3 pages):
- Revenue performance analysis with trends and patterns
- Product and category performance insights
- Regional and store-level analysis
- Customer behavior patterns
Recommendations (1 page):
- 3-5 actionable recommendations based on findings
- Expected impact and implementation priority
- Next steps and future analysis opportunities
KPI Specifications
Calculate and display the following Key Performance Indicators in your Power BI dashboard. Create DAX measures for dynamic calculations and organize them in display folders.
- Total Revenue: Sum of all
total_amount - Total Cost: Sum of (quantity * cost_price)
- Gross Profit: Revenue - Cost
- Profit Margin %: (Profit / Revenue) * 100
- YTD Revenue: Year-to-date total using TOTALYTD
- MoM Growth %: Monthly growth rate comparison
- Total Units Sold: Sum of all
quantity - Unique Products Sold: DISTINCTCOUNT of product_id
- Top 10 Products: By revenue and by units
- Category Revenue %: Revenue share per category
- Average Unit Price: Mean selling price
- Discount Impact: Revenue lost to discounts
- Revenue by Region: Northeast, Southeast, Midwest, West
- Revenue per Store: Average and by individual store
- Revenue per Sq Ft: Efficiency metric by store size
- Store Ranking: RANKX by revenue
- Online vs In-Store %: Channel distribution
- Top/Bottom Performers: Best and worst 5 stores
- Total Customers: DISTINCTCOUNT of customer_id
- Average Order Value: Revenue / Transaction count
- Orders per Customer: Total orders / Unique customers
- Customer Lifetime Value: Avg total spend per customer
- Tier Distribution: Bronze, Silver, Gold, Platinum %
- Repeat Customer Rate: Customers with 2+ orders
Sample DAX Measures
-- Total Revenue
Total Revenue = SUM(Transactions[total_amount])
-- Profit Margin
Profit Margin % =
DIVIDE([Gross Profit], [Total Revenue], 0) * 100
-- YTD Revenue
YTD Revenue =
TOTALYTD([Total Revenue], 'Date'[Date])
-- MoM Growth
MoM Growth % =
VAR CurrentMonth = [Total Revenue]
VAR PreviousMonth = CALCULATE(
[Total Revenue],
DATEADD('Date'[Date], -1, MONTH))
RETURN
DIVIDE(CurrentMonth - PreviousMonth, PreviousMonth, 0) * 100
Required Visualizations
Create at least 12 visualizations across your 3 dashboard pages. Each visualization should be interactive with proper filtering, tooltips, and drill-through capabilities.
Executive Summary Dashboard
High-level overview for executive leadership
KPI Cards 4-6 Cards
- Total Revenue with YoY comparison
- Gross Profit with margin %
- Total Orders with growth indicator
- Average Order Value
- Customer Count
- Top Performing Region
Revenue Trend Line
- Monthly revenue over 4-year period (2014-2017)
- Include trend line and forecast
- Add annotations for key events
- Enable drill-down to weekly view
Regional Map
- US map with revenue by region (Central, East, South, West)
- Color saturation by performance
- Tooltip showing key metrics
- Click to filter other visuals
Category Breakdown
- Donut chart: Furniture, Office Supplies, Technology
- Show percentages and values
- Highlight top category
- Click to cross-filter dashboard
Product Analysis Dashboard
Deep dive into product and category performance
Top 10 Products Bar Chart
- Horizontal bars sorted by revenue
- Color by category
- Data labels showing values
- Toggle between revenue and units
Category Matrix Matrix
- 17 Sub-categories within 3 Categories
- Show revenue, units, and margin %
- Conditional formatting for performance
- Expandable row hierarchy
Category Trend Stacked Area
- Monthly trend by category (2014-2017)
- Stacked area chart
- Show category contribution over time
- Identify growing/declining categories
Price vs Profit Scatter
- Scatter plot of 1,894 products
- X: Sales amount | Y: Profit
- Size: Quantity sold
- Color: Category (identify loss-makers)
Regional Performance Dashboard
Region and state-level performance analysis
State Ranking Table
- All states ranked by revenue
- Columns: Rank, State, Region, Sales, Profit
- Conditional formatting bars
- Sparklines for trend
Regional Comparison Clustered Bar
- 4 Regions: Central, East, South, West
- Compare revenue, orders, profit
- Color code by performance tier
- Include regional targets
Segment Analysis Donut + Bar
- Consumer vs Corporate vs Home Office
- Revenue and profit by segment
- Average order value comparison
- Customer count per segment
Ship Mode Analysis Combo Chart
- Standard, Second, First Class, Same Day
- Order count by ship mode
- Average delivery time
- Profit margin by shipping
Bonus Visualizations
Optional enhancements for extra points
Customer Segmentation
RFM analysis: 4,910 customers segmented by Recency, Frequency, Monetary
Time Intelligence
Same period last year comparison, YTD, QTD calculations
Decomposition Tree
Revenue breakdown by Region → Segment → Category
Q&A Visual
Natural language query: "Show sales by region for 2017"
Bookmarks
Pre-configured views: "Top Products", "Loss Analysis", "Regional Focus"
Mobile Layout
Responsive design optimized for tablet and phone viewing
Design Best Practices
Consistency
- Consistent color palette
- Same fonts throughout
- Aligned visual elements
- Consistent number formatting
Interactivity
- Cross-filtering enabled
- Slicers for date and region
- Drill-through to details
- Tooltips with context
Clarity
- Clear, descriptive titles
- Labeled axes and legends
- Appropriate chart types
- No chart junk
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
sales-performance-analytics
Required Project Structure
sales-performance-analytics/
├── data/
│ ├── retail_sales.xlsx # Original dataset
│ └── retail_sales_cleaned.xlsx # Your cleaned dataset
├── sql/
│ └── analysis_queries.sql # All SQL queries with comments
├── powerbi/
│ └── sales_dashboard.pbix # Power BI dashboard file
├── reports/
│ └── analysis_report.pdf # Final analysis report
├── screenshots/
│ ├── dashboard_executive.png # Executive summary page
│ ├── dashboard_products.png # Product analysis page
│ └── dashboard_stores.png # Store performance page
└── README.md # Project documentation
README.md Required Sections
1. Project Header
- Project title and description
- Your full name and submission date
- Course and project number
2. Business Context
- RetailMax scenario overview
- Project objectives
- Dataset summary
3. Tools and Technologies
- Microsoft Excel
- SQL (specify dialect)
- Power BI Desktop
4. Key Findings
- 5-7 data-driven insights
- Supporting metrics for each
- Business implications
5. Dashboard Screenshots
- 3 dashboard page screenshots
- Brief caption for each
- Use markdown image syntax
6. Data Model
- Describe star schema design
- List key DAX measures
- Explain relationships
7. How to Use
- Instructions for opening PBIX file
- How to refresh data
- Navigation guide
8. Contact
- GitHub profile link
- LinkedIn (optional)
- Email (optional)
Do Include
- All required files in correct folders
- Cleaned Excel file with documentation
- At least 10 commented SQL queries
- PBIX file with 3+ dashboard pages
- PDF report with visualizations
- Dashboard screenshots in README
Do Not Include
- Temporary or backup files (~$*.xlsx)
- Power BI auto-recovery files
- Sensitive or personal information
- Extremely large image files (> 2MB each)
- Incomplete or broken files
Enter your GitHub username - we will verify your repository automatically
Grading Rubric
Your project will be graded on the following criteria. Total: 500 points.
| Criteria | Points | Description |
|---|---|---|
| Excel Data Preparation | 75 | Data cleaning, validation, documentation of issues, calculated columns |
| SQL Analysis | 100 | At least 10 queries with proper joins, aggregations, and window functions |
| Power BI Data Model | 75 | Proper relationships, date table, organized measures (10+ DAX) |
| Dashboard Visualizations | 100 | At least 12 visuals across 3 pages with proper interactivity |
| Analysis Report | 75 | Clear insights, supporting metrics, actionable recommendations |
| Documentation | 50 | README quality, SQL comments, folder structure, screenshots |
| Design and Usability | 25 | Visual consistency, clarity, user experience, professional appearance |
| Total | 500 |
Grading Levels
Excellent
Exceeds all requirements with exceptional quality
Good
Meets all requirements with good quality
Satisfactory
Meets minimum requirements
Needs Work
Missing key requirements
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Submit Your ProjectPre-Submission Checklist
Use this checklist to verify you have completed all requirements before submitting your project.
Excel Requirements
SQL Requirements
Power BI Requirements
Repository Requirements
Common Issues and Solutions
Encountering problems? Don't worry! Here are the most common issues students face and how to resolve them quickly.
Power BI Data Refresh Errors
Data source path errors when opening PBIX file on different computer
Use relative file paths or place data files in the same folder as PBIX file. In Power Query:
= Excel.Workbook(File.Contents("data\retail_sales.csv"))
DAX Time Intelligence Not Working
YTD, MoM, and other time functions return errors or blanks
Create a proper Date dimension table with continuous dates:
Date = CALENDAR(DATE(2014,1,1), DATE(2017,12,31))
SQL Queries Returning Wrong Results
Aggregations return unexpected values or duplicate records
Check JOIN conditions carefully. Use DISTINCT when counting unique values. Verify foreign keys match correctly.
SELECT * FROM table LIMIT 10 to inspect intermediate results
Excel Dates Displaying as Numbers
Dates appear as serial numbers (e.g., 45678 instead of 2024-11-15)
Select date column → Right-click → Format Cells → Choose Date format. In Power Query, change data type to Date.
GitHub File Size Limits
Cannot push large PBIX or Excel files to GitHub (100MB limit)
Compress images, remove unused data, or use Git LFS for files over 50MB. Remove unused columns from Power BI model.
Profit Margin Calculations Off
Profit margins showing incorrect percentages or negative values
Use this DAX formula for accurate profit margin calculation:
Profit Margin = DIVIDE(SUM(Sales[Profit]), SUM(Sales[Sales]), 0)
Still Having Issues?
Check the course discussion forum or reach out for help