Capstone Project 1

Sales Analysis Dashboard

Build a comprehensive interactive sales dashboard using real e-commerce data. You will clean and prepare data, calculate key performance indicators (KPIs), analyze trends, and create stunning visualizations with Plotly.

8-12 hours
Intermediate
500 Points
What You Will Build
  • Data cleaning pipeline
  • KPI calculation engine
  • Time series trend analysis
  • Regional performance charts
  • Interactive Plotly dashboard
Contents
01

Project Overview

This capstone project brings together everything you have learned in the Data Science course. You will work with a realistic e-commerce sales dataset containing 2000 orders spanning an entire year, with 100 products across 3 categories (Electronics, Furniture, Office Supplies), 300 customers, and 4 geographic regions. Your goal is to build a professional-grade analytics dashboard that transforms raw data into actionable business insights through interactive visualizations and comprehensive KPI analysis.

Skills Applied: This project tests your proficiency in Pandas (data cleaning, transformation), statistical analysis, Matplotlib/Seaborn basics, and Plotly for interactive visualizations.
Data Cleaning

Handle missing values, data types, and outliers

KPI Calculation

Compute revenue, AOV, growth rates, and more

Trend Analysis

Identify patterns and seasonality in sales

Dashboard

Build interactive Plotly visualizations

Learning Objectives

Technical Skills
  • Master pandas for data cleaning and transformation
  • Calculate and interpret business KPIs
  • Create interactive visualizations with Plotly
  • Perform time-series and trend analysis
  • Handle large datasets efficiently
Business Skills
  • Translate business questions into analyses
  • Derive actionable insights from data
  • Present findings to non-technical stakeholders
  • Make data-driven recommendations
  • Build executive-ready dashboards
Ready to submit? Already completed the project? Submit your work now!
Submit Now
02

Business Scenario

TechMart E-Commerce

You have been hired as a Data Analyst at TechMart, a growing e-commerce company specializing in electronics, furniture, and office supplies. The executive team is preparing for their quarterly board meeting and needs a comprehensive sales dashboard.

"We need to understand our sales performance across regions, identify our top products and customers, and spot any trends that could inform our Q2 strategy. Can you build us an interactive dashboard with the key metrics and visualizations?"

Sarah Chen, VP of Sales

Business Questions to Answer

Revenue
  • What is the total revenue for each month?
  • Which region generates the most revenue?
  • What is our month-over-month growth rate?
Products
  • What are the top 10 best-selling products?
  • Which category has the highest sales volume?
  • What is the average order value by category?
Customers
  • Who are our top 5 customers by revenue?
  • What is the customer repeat purchase rate?
  • How does payment method affect order value?
Trends
  • Are there any weekly patterns in orders?
  • How do discounts impact total revenue?
  • What is the regional distribution of orders?
Pro Tip: Think like a business analyst! Your dashboard should tell a story that helps executives make data-driven decisions.
03

The Dataset

You will work with a realistic e-commerce sales dataset. Download the CSV file and place it in your project's data/ folder:

Dataset Download

Download the e-commerce sales dataset and save it to your project's data/ folder. The CSV file contains 2000 real transaction records with all necessary columns for analysis.

Download ecommerce_sales.csv 96 KB • 2001 rows × 16 columns
Dataset Schema
Column Type Description
order_idIntegerUnique order identifier
order_dateDateDate of the order (YYYY-MM-DD)
customer_idStringUnique customer identifier
customer_nameStringCustomer full name
regionStringGeographic region (North, South, East, West)
cityStringCity name
product_idStringUnique product identifier
product_nameStringProduct name
categoryStringProduct category (Electronics, Furniture, Office Supplies)
subcategoryStringProduct subcategory
quantityIntegerNumber of units ordered
unit_priceFloatPrice per unit ($)
discountFloatDiscount applied (0.0 to 0.20)
total_amountFloatTotal order amount after discount
payment_methodStringPayment type (Credit Card, Debit Card, PayPal)
order_statusStringOrder status (Delivered)
Dataset Stats: 2000 orders, 300 unique customers, 100 unique products, 12 months of data (2024)
Analysis Scope: Total revenue: $1.1M+ across 3 categories and 4 US regions
Sample Data Preview

Understanding the data structure is crucial. Here's what a typical row looks like:

order_idorder_datecustomer_nameregionproduct_namecategoryquantitytotal_amount
10012024-04-12William CoxWestGaming Desktop RGBElectronics1$1,709.99
10022024-12-14Jeffrey HowardWestKeyboard Mouse ComboElectronics1$38.97
10032024-09-27Richard GrayNorthMonitor 27 inchElectronics1$337.49
Data Quality Note: The dataset is clean with no missing values. All orders have "Delivered" status. Focus on analysis rather than extensive data cleaning.
04

Project Requirements

Your Jupyter Notebook must include all of the following components. Structure your notebook with clear markdown headers and explanations for each section.

1
Project Setup and Introduction

Begin your notebook with a professional introduction including project title, your full name, submission date, and comprehensive project overview. Clearly state the business context and objectives. Import all required libraries including pandas for data manipulation, numpy for calculations, and plotly for interactive visualizations.

Expected Sections: Title, Author Info, Executive Summary, Business Context, Objectives, and Library Imports
2
Data Cleaning and Preparation

Data Quality Assessment:

  • Inspect the dataset structure using info(), describe(), and head()
  • Identify and document any missing values, duplicates, or inconsistencies
  • Verify data types for all 16 columns match the schema
  • Check for logical errors (e.g., negative quantities, future dates)

Feature Engineering:

  • Convert order_date from string to datetime format for time-series analysis
  • Extract temporal features: month, month name, week number, day of week, quarter
  • Create derived metrics: revenue per item, discount amount, profit margin indicators
  • Add categorical flags: is_discounted, is_weekend, season, high_value_order

Data Validation:

  • Verify calculations: total_amount = unit_price * quantity * (1 - discount)
  • Check value ranges are realistic (prices, quantities, discounts)
  • Ensure referential integrity between customer_id, product_id, and names
3
Exploratory Data Analysis (EDA)

Univariate Analysis:

  • Statistical summary of numerical columns (revenue, quantity, discount, unit_price)
  • Frequency distributions for categorical variables (category, region, payment_method)
  • Identify outliers using box plots and statistical methods (IQR, z-scores)
  • Analyze value ranges, means, medians, and standard deviations

Bivariate Analysis:

  • Correlation matrix between numerical features (quantity, price, discount, revenue)
  • Revenue distribution across categories, regions, and payment methods
  • Relationship between discount levels and order values
  • Customer purchase patterns and repeat behavior analysis

Pattern Discovery:

  • Temporal patterns: sales trends by month, week, day of week
  • Geographic patterns: regional performance differences
  • Product patterns: category mix, best/worst performers
  • Customer patterns: segmentation by purchase frequency and value
4
KPI Calculations

Calculate all required KPIs organized into four categories (see detailed specifications below):

  • Revenue Metrics: Total revenue, AOV, monthly trends, MoM growth, regional breakdown
  • Product Metrics: Units sold, top performers, category analysis, discount impact
  • Customer Metrics: Total customers, retention rate, CLV, top spenders
  • Operational Metrics: Order volume, payment distribution, regional counts, patterns

Create a comprehensive KPI summary section that displays all metrics in a clear, organized format. Consider using pandas DataFrames, formatted tables, or creating custom display functions. Include percentage changes, comparisons, and context for each metric.

Create reusable functions like calculate_revenue_kpis(df), calculate_product_kpis(df) to organize your code and enable testing.
5
Visualizations with Plotly

Create at least 8 professional interactive visualizations using Plotly Express and Graph Objects. Each visualization should be business-ready with:

  • Clear titles that explain what the chart shows
  • Axis labels with proper units (e.g., "Revenue ($)", "Number of Orders")
  • Color schemes that enhance readability and highlight key information
  • Interactivity: hover details, zoom, pan, and filtering capabilities
  • Annotations for important data points, trends, or insights
  • Consistent styling using themes (plotly_dark, plotly_white)

See the Visualizations section below for specific chart requirements and recommendations. Group related charts together and provide context before each visualization explaining what business question it answers.

6
Insights and Recommendations

Executive Summary: Create a comprehensive analysis summary with clear, actionable insights.

Key Findings (5-7 insights):

  • Data-driven insights backed by specific numbers from your analysis
  • Trend identification (growth patterns, seasonality, anomalies)
  • Performance highlights (top products, regions, customer segments)
  • Problem areas or concerns (declining metrics, underperformers)
  • Opportunity identification (untapped markets, product potential)

Business Recommendations (3-5 actionable items):

  • Specific actions the business should take based on your findings
  • Prioritized by potential impact (high/medium/low)
  • Include expected outcomes or benefits for each recommendation
  • Consider operational feasibility and resource requirements

Future Analysis: Suggest additional analyses or data that would provide deeper insights.

04B

Recommended Workflow

Follow this structured approach to complete your project efficiently and systematically. Each phase builds upon the previous one to create a comprehensive analytics dashboard.

Total Estimated Time: 8-12 hours Recommended: Spread across 2-3 days
1
Setup & Initial Exploration
1-2 hours
Setup
  • Set up project folder structure
  • Load data and initial inspection
  • Create data quality report
  • Document observations
Tip: Start with df.info(), df.describe(), and df.head()
2
Data Preparation
2-3 hours
Cleaning
  • Convert dates & extract features
  • Create derived columns
  • Add categorical flags
  • Validate transformations
  • Save cleaned dataset
Critical: Verify date parsing produces 12 months of 2024 data
3
KPI Calculation
1-2 hours
Metrics
  • Create modular KPI functions
  • Calculate all metrics
  • Build KPI summary
  • Verify with spot checks
Verify: Total Revenue ≈ $1.1M, AOV ≈ $560
4
Visualization Development
2-3 hours
Charts
  • Create 8+ Plotly charts
  • Customize styling & themes
  • Add interactivity
  • Organize by theme
Style: Use consistent colors and plotly_dark theme
5
Analysis & Insights
1-2 hours
Insights
  • Analyze patterns & trends
  • Document 5-7 key findings
  • Develop recommendations
  • Create executive summary
Focus: Answer the business questions from section 02
6
Documentation & Submission
1 hour
Finalize
  • Write comprehensive README
  • Add visualization screenshots
  • Create requirements.txt
  • Test end-to-end execution
  • Push to GitHub & submit
Final Step: Run "Restart & Run All" before submitting
Completion Milestone

Once you complete all 6 phases, you'll have a professional portfolio piece showcasing your data science skills!

6
Phases
05

KPI Specifications

Calculate and display the following Key Performance Indicators. Create functions for reusable calculations.

Revenue Metrics
  • Total Revenue: Sum of all total_amount
  • Average Order Value (AOV): Total Revenue / Number of Orders
  • Revenue by Month: Monthly revenue totals
  • MoM Growth Rate: (Current - Previous) / Previous * 100
  • Revenue by Region: Regional breakdown
Product Metrics
  • Total Units Sold: Sum of all quantity
  • Top 10 Products: By revenue and by quantity
  • Category Revenue: Revenue per category
  • Average Discount: Mean discount rate
  • Discount Impact: Revenue with vs without discount
Customer Metrics
  • Total Customers: Unique customer_id count
  • Repeat Customer Rate: Customers with 2+ orders / Total
  • Top 5 Customers: By total revenue
  • Avg Orders per Customer: Total orders / Unique customers
  • Customer Lifetime Value: Avg revenue per customer
Operational Metrics
  • Orders per Day: Average daily order count
  • Payment Method Split: Distribution by payment type
  • Regional Order Count: Orders per region
  • Busiest Day: Day of week with most orders
  • Category Mix: Percentage by category
KPI Implementation Tips
Organization: Create separate functions for each KPI category to keep code modular and maintainable. Return results as dictionaries or DataFrames for easy display.
Validation: Cross-check calculated KPIs with manual calculations on sample data. Ensure percentage changes and ratios are computed correctly.
06

Required Visualizations

Create at least 8 of the following visualizations using Plotly. All charts should be interactive and professional-quality.

Business Question: How has revenue changed over the year? Are there growth patterns or seasonal trends?

Chart Type: Line chart with markers

Key Features:

  • X-axis: Months (Jan-Dec 2024)
  • Y-axis: Total Revenue ($)
  • Add markers at each data point
  • Include MoM growth rate annotations
  • Use gradient colors to highlight growth

Insight to Find: Identify peak sales months, growth trends, potential seasonality

Business Question: Which products are driving the most revenue?

Chart Type: Horizontal bar chart

Key Features:

  • Sort from highest to lowest revenue
  • Color bars by product category
  • Show exact revenue values on bars
  • Include percentage of total revenue

Insight to Find: Product concentration (are sales dependent on few products?), category of winners

Business Question: What is the revenue mix across product categories?

Chart Type: Donut chart (preferred) or pie chart

Key Features:

  • Show percentages for each category
  • Use distinct colors for each segment
  • Display actual dollar amounts in hover
  • Pull out largest segment slightly

Insight to Find: Category dominance, diversification level, potential gaps

Business Question: How do different regions compare in revenue and order volume?

Chart Type: Grouped bar chart

Key Features:

  • X-axis: Regions (North, South, East, West)
  • Y-axis: Both revenue and order count (dual axis or grouped)
  • Color-code revenue vs orders differently
  • Include average order value per region

Insight to Find: Regional disparities, high-value vs high-volume regions, expansion opportunities

Business Question: When do customers prefer to shop? Are there weekly patterns?

Chart Type: Heatmap

Key Features:

  • Rows: Days of week (Monday-Sunday)
  • Columns: Weeks or months
  • Color intensity: Order count or revenue
  • Use diverging color scale

Insight to Find: Busiest days, weekly patterns, potential for promotions

Business Question: What is the typical order size? Are there outliers?

Chart Type: Box plot (or violin plot for bonus points)

Key Features:

  • Show distribution by category or region
  • Display median, quartiles, and outliers
  • Use horizontal orientation for better readability
  • Include mean line for comparison

Insight to Find: Order value consistency, outlier orders, category pricing differences

Business Question: Is there a relationship between quantity sold and revenue?

Chart Type: Scatter plot

Key Features:

  • X-axis: Quantity, Y-axis: Total Amount
  • Color points by category
  • Size points by discount level (optional)
  • Add trend line if correlation exists

Insight to Find: High-value low-quantity products, bulk purchase patterns, category behaviors

Business Question: How does revenue distribute across category/subcategory/product levels?

Chart Type: Treemap

Key Features:

  • Hierarchy: Category → Subcategory → Product
  • Size: Revenue or units sold
  • Color: Profit margin or category
  • Interactive drill-down capability

Insight to Find: Category composition, subcategory winners, product concentration

Business Question: What are the key performance metrics at a glance?

Chart Type: Dashboard with indicator cards using Plotly

Key Features:

  • Create 6-8 KPI cards showing: Total Revenue, AOV, Orders, Customers, Growth Rate, Top Category
  • Use color coding: green for positive, red for negative trends
  • Include delta indicators (arrows, percentages)
  • Arrange in logical grid layout

Insight to Find: Overall business health snapshot, key metrics summary

Visualization Best Practices
Chart Selection
  • Line charts for time trends
  • Bar charts for comparisons
  • Pie/Donut for composition
  • Heatmaps for patterns
Interactivity
  • Enable hover tooltips
  • Add zoom and pan
  • Use consistent colors
  • Include legends
Clarity
  • Descriptive titles
  • Clear axis labels
  • Proper formatting
  • Annotations for insights
Reference: Consult the Plotly Python documentation for specific chart types and customization options.
07

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
sales-dashboard-project
github.com/<your-username>/sales-dashboard-project
Required Project Structure
sales-dashboard-project/
├── data/
│   └── ecommerce_sales.csv       # The dataset (download from above)
├── notebooks/
│   └── sales_analysis.ipynb      # Your main analysis notebook
├── requirements.txt              # Python dependencies
└── README.md                     # REQUIRED - see contents below
README.md Template Structure

Your README should follow this professional structure:

1. Project Header
  • Project title
  • Your full name
  • Submission date
  • Course name and project number
2. Business Context
  • Brief description of TechMart scenario
  • Project objectives
  • Dataset overview (2000 orders, 12 months, etc.)
3. Technologies Used
  • Python 3.x
  • Pandas (version)
  • Plotly (version)
  • Jupyter Notebook
  • Other libraries used
4. Key Findings
  • 5-7 data-driven insights
  • Each finding with supporting metric
  • Business implications
5. Visualizations
  • 3-4 screenshots of your best charts
  • Brief caption for each
  • Use markdown image syntax
6. Installation & Usage
  • Clone repository command
  • Install dependencies: pip install -r requirements.txt
  • How to run notebook
  • Expected runtime
7. Project Structure
  • Directory tree showing file organization
  • Brief description of each major file
8. Contact Information
  • Your email (optional)
  • GitHub profile link
  • LinkedIn (optional)
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
plotly>=5.18.0
jupyter>=1.0.0
nbformat>=5.9.0
Do Include
  • Clear markdown sections with headers
  • All code cells executed with outputs
  • At least 8 interactive Plotly charts
  • KPI summary table or cards
  • Business insights and recommendations
  • README with screenshots
Do Not Include
  • Virtual environment folders (venv, .env)
  • Any .pyc or __pycache__ files
  • Unexecuted notebooks
  • Hardcoded absolute file paths
  • API keys or credentials
Important: Before submitting, run Kernel > Restart and Run All to ensure your notebook executes from top to bottom without errors!
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 500 points.

Criteria Points Description
Data Cleaning 75 Proper handling of data types, missing values, feature engineering
KPI Calculations 100 All required metrics calculated correctly with clear code
Visualizations 125 At least 8 interactive, professional Plotly charts
Analysis Quality 75 Insightful observations and business-relevant findings
Code Quality 50 Clean, well-organized, reusable code with comments
Documentation 50 Clear markdown, README with screenshots, requirements.txt
Recommendations 25 Actionable business recommendations based on analysis
Total 500

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Project
08B

Common Issues & Solutions

Problem: pd.to_datetime() fails or produces unexpected results

Solution: Specify the date format explicitly: pd.to_datetime(df['order_date'], format='%Y-%m-%d')

Verify: Check that all dates fall within 2024 and month extraction works correctly

Problem: Charts show blank output or only display in separate browser window

Solution 1: In Jupyter Lab, install the extension: !pip install jupyterlab-plotly

Solution 2: Use fig.show(renderer='notebook') for classic Jupyter

Solution 3: Try import plotly.io as pio; pio.renderers.default='iframe'

Problem: KPIs don't match expected values or seem incorrect

Solution: Double-check your groupby operations and aggregation functions

Tips:

  • Verify: Total revenue should be around $1.1M
  • Check: AOV should be ~$560
  • Validate: 300 unique customers, 2000 total orders
  • Test calculations on small subset first

Problem: Months appear in alphabetical order (Apr, Aug, Dec...) instead of chronological

Solution: Convert month names to Categorical with proper ordering

Alternative: Use month numbers for sorting, then display month names

Best Practice: Sort by original date column, then aggregate by month

Problem: FileNotFoundError when loading CSV

Solution: Use relative paths from notebook location: '../data/ecommerce_sales.csv'

Check: Verify your folder structure matches the required layout

Avoid: Never use absolute paths like C:/Users/YourName/...

09

Pre-Submission Checklist

Notebook Requirements
Repository Requirements