Capstone Project 5

Customer Analytics

Master the art of customer analytics by building segmentation models, RFM analysis frameworks, churn prediction systems, and customer lifetime value calculations using Python and Power BI.

12-18 hours
Intermediate
650 Points
What You Will Build
  • Customer segmentation model
  • RFM analysis framework
  • Churn prediction dashboard
  • Customer lifetime value model
  • Interactive Power BI dashboard
Contents
01

Project Overview

This capstone project focuses on customer analytics techniques used by marketing and CRM teams worldwide. You will work with realistic e-commerce customer data spanning 2 years, build customer segmentation models, perform RFM analysis, predict customer churn, and calculate customer lifetime value. Your goal is to deliver actionable insights that help the business retain valuable customers and optimize marketing spend.

Skills Applied: This project tests your proficiency in Python (pandas, scikit-learn), Power BI (DAX, visualizations), customer segmentation, RFM modeling, and predictive analytics.
Segmentation

Group customers by behavior and value patterns

RFM Analysis

Score customers by Recency, Frequency, Monetary

Churn Prediction

Identify at-risk customers before they leave

Customer LTV

Calculate lifetime value for each segment

Learning Objectives

Technical Skills
  • Perform RFM analysis and customer scoring
  • Apply K-means clustering for segmentation
  • Calculate customer lifetime value (CLV)
  • Build churn prediction indicators
  • Create interactive Power BI dashboards
Business Skills
  • Interpret customer behavior patterns
  • Develop retention strategies by segment
  • Optimize marketing spend allocation
  • Present insights to stakeholders
  • Make data-driven CRM recommendations
Ready to submit? Already completed the project? Submit your work now!
Submit Now
02

Business Scenario

ShopSmart E-Commerce

You have been hired as a Customer Analytics Specialist at ShopSmart, a rapidly growing e-commerce company with over 50,000 customers across multiple product categories. The CMO is concerned about increasing customer churn and wants to develop a data-driven approach to customer retention.

"We are spending millions on customer acquisition, but we do not really understand who our best customers are or why some customers stop buying. I need a comprehensive customer analytics solution that helps us segment our customer base, identify at-risk customers, and prioritize our retention efforts on high-value segments."

Sarah Chen, Chief Marketing Officer

Business Questions to Answer

Customer Segmentation
  • How can we group customers by behavior?
  • What defines our best vs worst customers?
  • How many distinct customer segments exist?
RFM Analysis
  • Who are our Champions and Loyal customers?
  • Which customers are at risk of churning?
  • How do RFM scores correlate with value?
Customer Value
  • What is the lifetime value by segment?
  • How much should we spend to retain customers?
  • Which segments drive the most revenue?
Churn & Retention
  • Which customers are likely to churn?
  • What patterns indicate customer disengagement?
  • How can we proactively retain at-risk customers?
Pro Tip: Think like a CMO! Your analysis should provide clear, actionable segments with specific retention strategies that marketing teams can immediately implement.
03

The Dataset

You will work with a comprehensive e-commerce customer dataset containing transaction history, customer demographics, and behavioral data spanning 2 years with over 50,000 customers.

Kaggle Datasets (Recommended)

Download real customer analytics datasets from Kaggle for authentic analysis experience. These datasets require a free Kaggle account to download.

Primary Datasets (Use These)
E-Commerce Data (UCI) 541K rows Online Retail transactions with Customer ID, perfect for RFM
Brazilian E-Commerce (Olist) 100K orders Complete e-commerce with customers, orders, reviews
Telco Customer Churn 7K rows Customer churn data with demographics and services
Coffee Shop Data 149K rows Transactions with customer and product details
Note: You need a free Kaggle account to download datasets.
Dataset Structure
File Name Records Description Key Columns
customers.csv 50,000 Customer master data with demographics CustomerID, Name, Email, JoinDate, Country, Segment
transactions.csv 500,000 Transaction history over 2 years TransactionID, CustomerID, Date, Amount, ProductCategory
products.csv 2,500 Product catalog with categories ProductID, Name, Category, Price, Cost
Transaction Data Schema
Column Type Description
InvoiceNoStringUnique transaction identifier
StockCodeStringProduct code
DescriptionStringProduct description
QuantityIntegerNumber of units purchased
InvoiceDateDateTimeTransaction date and time
UnitPriceDecimalPrice per unit
CustomerIDIntegerUnique customer identifier
CountryStringCustomer country
Sample Data Preview

Understanding the data structure is crucial. Here is what a typical transaction record looks like:

InvoiceNoCustomerIDInvoiceDateQuantityUnitPriceCountry
536365178502024-12-01 08:266$2.55United Kingdom
536366178502024-12-01 08:288$4.25United Kingdom
536367130472024-12-01 08:3412$1.65United Kingdom
Data Scope: 2 years of transactions, 50K+ customers, 500K+ transactions across multiple countries
Data Quality: Clean dataset with minimal missing values, ready for RFM and segmentation analysis
Data Cleaning Note: Remove cancelled transactions (InvoiceNo starting with 'C'), handle missing CustomerIDs, and filter out negative quantities for accurate analysis.
04

Project Requirements

Your deliverables must include Python notebooks with analysis, a Power BI dashboard, and comprehensive documentation with actionable recommendations.

1
Data Exploration & Cleaning

Start with comprehensive exploratory data analysis:

  • Load and validate all data files
  • Handle missing CustomerIDs and cancelled orders
  • Remove duplicates and negative quantities
  • Create derived columns (TotalAmount = Quantity * UnitPrice)
  • Generate summary statistics and distributions
Notebook: 01_data_exploration.ipynb
2
RFM Analysis

Calculate RFM Metrics:

  • Recency: Days since last purchase (lower is better)
  • Frequency: Total number of transactions
  • Monetary: Total spend amount

RFM Scoring:

  • Score each metric from 1-5 using quintiles
  • Create combined RFM score (e.g., 555, 111)
  • Map scores to customer segments (Champions, Loyal, At Risk, etc.)
Use pandas qcut() for quintile-based scoring
3
Customer Segmentation (K-Means)

Clustering Analysis:

  • Standardize features using StandardScaler
  • Determine optimal clusters using Elbow Method
  • Validate with Silhouette Score
  • Apply K-Means clustering (typically 4-6 clusters)

Segment Profiling:

  • Name each segment based on characteristics
  • Calculate segment statistics (avg spend, frequency)
  • Visualize clusters with scatter plots
4
Customer Lifetime Value (CLV)

CLV Calculation Methods:

  • Historical CLV: Sum of all past transactions
  • Simple CLV: Avg Order Value * Purchase Frequency * Lifespan
  • Predictive CLV: Using BG/NBD or Gamma-Gamma models (optional)

Analysis Requirements:

  • Calculate CLV by customer segment
  • Identify highest and lowest value customers
  • Determine customer acquisition cost thresholds
5
Power BI Dashboard

Create an interactive customer analytics dashboard with the following pages:

  • Customer Overview: Total customers, revenue, KPI cards
  • RFM Analysis: Segment distribution, RFM matrix heatmap
  • Segmentation: Cluster visualization, segment profiles
  • Customer Value: CLV distribution, top customers
  • Churn Risk: At-risk customer identification

Dashboard Requirements:

  • Interactive slicers (date range, country, segment)
  • Drill-through for customer details
  • Conditional formatting for risk indicators
  • Professional color scheme
6
Insights and Recommendations

Key Deliverables:

  • 5-7 data-driven insights about customer behavior
  • Segment-specific retention strategies
  • Marketing budget allocation recommendations
  • Churn prevention action plan
  • Executive summary document (1-2 pages)

Recommended Workflow

Total Estimated Time: 12-18 hours Recommended: Spread across 4-5 days
1
Data Prep
3-4 hours
  • Download dataset
  • EDA and cleaning
  • Feature engineering
2
Analysis
5-6 hours
  • RFM scoring
  • K-Means clustering
  • CLV calculation
3
Dashboard
4-6 hours
  • Power BI visuals
  • Documentation
  • Submit project
05

Customer KPI Specifications

Calculate and display the following Key Performance Indicators for customer analytics. Implement these in both Python (for analysis) and Power BI (for visualization).

RFM Metrics
  • Recency Score: 1-5 based on days since last purchase
  • Frequency Score: 1-5 based on transaction count
  • Monetary Score: 1-5 based on total spend
  • RFM Combined: Concatenated score (e.g., 555)
  • RFM Segment: Champion, Loyal, At Risk, etc.
Value Metrics
  • Customer Lifetime Value: Total expected revenue per customer
  • Average Order Value: Total Revenue / Total Orders
  • Purchase Frequency: Orders / Unique Customers
  • Customer Value: AOV * Purchase Frequency
  • Revenue per Customer: Total Revenue / Customer Count
Customer Metrics
  • Total Customers: Count of unique CustomerIDs
  • Active Customers: Purchased in last 90 days
  • New Customers: First purchase in period
  • Repeat Customer Rate: Customers with 2+ orders
  • Customer Retention Rate: Returning / Total
Churn Metrics
  • Churn Rate: Lost Customers / Total Customers
  • At-Risk Customers: No purchase in 60-90 days
  • Dormant Customers: No purchase in 90+ days
  • Customer Lifespan: Avg days between first and last purchase
  • Days Since Last Purchase: Recency indicator
RFM Segment Definitions
Segment RFM Score Range Description Recommended Action
Champions R: 5, F: 5, M: 5 Best customers, recent, frequent, high value Reward loyalty, early access, referral program
Loyal Customers R: 3-5, F: 4-5, M: 4-5 Buy regularly, responsive to promotions Upsell, recommend premium products
Potential Loyalists R: 4-5, F: 2-3, M: 2-3 Recent customers with average frequency Offer membership, personalized recommendations
At Risk R: 2-3, F: 3-4, M: 3-4 Previously active, declining engagement Win-back campaigns, special offers
Lost R: 1-2, F: 1-2, M: 1-2 Lowest recency, frequency, and value Reactivation campaign or deprioritize
Python: Use pandas for calculations, matplotlib/seaborn for visualizations, and scikit-learn for clustering analysis.
Power BI: Create DAX measures for dynamic calculations. Use CALCULATE with filters for segment-specific metrics.
06

Required Visualizations

Create the following visualizations in your Power BI dashboard. All charts should be interactive, professionally formatted, and tell a clear customer analytics story.

1 Customer Segment Distribution Donut Chart

Business Question: How are customers distributed across RFM segments?

  • Segments as slices with percentages
  • Total customer count in center
  • Color code by segment health (green to red)
2 RFM Heatmap Matrix Matrix

Business Question: How do R, F, M scores correlate?

  • Recency on Y-axis, Frequency on X-axis
  • Color intensity = customer count or avg monetary
  • Interactive drill-through to customer list
3 Customer Value Distribution Histogram

Business Question: What is the CLV distribution?

  • X-axis: CLV ranges, Y-axis: Customer count
  • Highlight top 20% high-value customers
  • Add median and average reference lines
4 Revenue by Segment Bar Chart

Business Question: Which segments drive the most revenue?

  • Segments on Y-axis, Revenue on X-axis
  • Sort descending by revenue
  • Add customer count as secondary metric
5 Customer Trend Analysis Line Chart

Business Question: How are customer metrics trending?

  • X-axis: Month, Y-axis: Customer count
  • Lines for new, active, and churned customers
  • Add trend lines for forecasting
6 Cluster Scatter Plot Scatter

Business Question: How are K-means clusters distributed?

  • X: Frequency, Y: Monetary Value
  • Color by cluster assignment
  • Size by recency score
7 Top Customers Table Table

Business Question: Who are our most valuable customers?

  • Show top 20 by CLV or total spend
  • Include RFM segment and scores
  • Add conditional formatting for values
8 KPI Cards Card

Business Question: What are the key customer metrics?

  • Total Customers, Active Customers, CLV
  • Avg Order Value, Repeat Rate, Churn Rate
  • YoY change with up/down indicators
Dashboard Best Practices
Color Strategy
  • Green for healthy segments
  • Yellow for at-risk customers
  • Red for churned or lost
  • Consistent palette across pages
Interactivity
  • Cross-filtering between visuals
  • Date range slicer
  • Segment and country filters
  • Drill-through to customer details
Executive Focus
  • Lead with KPIs and trends
  • Highlight actionable segments
  • Clear segment recommendations
  • Revenue impact visibility
07

Submission Requirements

Create a Google Drive folder with the exact name shown below and share the link with view access:

Required Folder Name
Customer-Analytics-Project-[YourName]
Example: Customer-Analytics-Project-JohnSmith
Required Project Structure
Customer-Analytics-Project-[YourName]/
├── data/
│   └── customer_data.csv             # Original dataset (download from above)
├── notebooks/
│   ├── 01_data_exploration.ipynb     # EDA and data cleaning
│   ├── 02_rfm_analysis.ipynb         # RFM segmentation
│   ├── 03_customer_segmentation.ipynb # K-means clustering
│   └── 04_clv_calculation.ipynb      # Customer lifetime value
├── powerbi/
│   └── customer_dashboard.pbix       # Your Power BI dashboard file
├── screenshots/
│   ├── dashboard_overview.png        # Overview page screenshot
│   ├── dashboard_segmentation.png    # Segmentation page screenshot
│   └── dashboard_rfm.png             # RFM analysis page screenshot
├── docs/
│   └── executive_summary.pdf         # Executive summary document (1-2 pages)
└── README.md                         # REQUIRED - see contents below
Do Include
  • All Python notebooks with outputs
  • Power BI file (.pbix) with all pages
  • Clear documentation in README
  • Dashboard screenshots
  • Executive summary PDF
  • Original dataset in data folder
Do Not Include
  • Virtual environment folders (.venv)
  • Checkpoint files (.ipynb_checkpoints)
  • Personal or sensitive information
  • Broken file links or formulas
  • Unfinished or draft versions
Submit Your Project

Enter your Google Drive folder link - we will verify your files automatically

08

Grading Rubric

Your project will be graded on the following criteria. Total: 650 points.

Criteria Points Description
Data Exploration & Cleaning 75 Thorough EDA, data quality checks, and proper data preparation
RFM Analysis 100 Correct RFM calculations, proper scoring, and meaningful segments
Customer Segmentation 125 K-means clustering, optimal cluster selection, segment profiling
CLV Calculation 100 Accurate lifetime value modeling with proper methodology
Power BI Dashboard 125 Professional visualizations, interactive features, clean design
Documentation 50 Complete README, executive summary, clear explanations
Recommendations 75 Actionable, data-driven marketing recommendations
Total 650
Grading Scale
Excellent
585-650

90-100%

Good
520-584

80-89%

Satisfactory
455-519

70-79%

Needs Work
<455

<70%

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

Submit Your Project
09

Pre-Submission Checklist

Python Notebook Requirements
Power BI Dashboard Requirements
Documentation Requirements
Submission Requirements