Project Overview
This capstone project focuses on customer analytics techniques used by marketing and CRM teams worldwide. You will work with realistic e-commerce customer data spanning 2 years, build customer segmentation models, perform RFM analysis, predict customer churn, and calculate customer lifetime value. Your goal is to deliver actionable insights that help the business retain valuable customers and optimize marketing spend.
Segmentation
Group customers by behavior and value patterns
RFM Analysis
Score customers by Recency, Frequency, Monetary
Churn Prediction
Identify at-risk customers before they leave
Customer LTV
Calculate lifetime value for each segment
Learning Objectives
Technical Skills
- Perform RFM analysis and customer scoring
- Apply K-means clustering for segmentation
- Calculate customer lifetime value (CLV)
- Build churn prediction indicators
- Create interactive Power BI dashboards
Business Skills
- Interpret customer behavior patterns
- Develop retention strategies by segment
- Optimize marketing spend allocation
- Present insights to stakeholders
- Make data-driven CRM recommendations
Business Scenario
ShopSmart E-Commerce
You have been hired as a Customer Analytics Specialist at ShopSmart, a rapidly growing e-commerce company with over 50,000 customers across multiple product categories. The CMO is concerned about increasing customer churn and wants to develop a data-driven approach to customer retention.
"We are spending millions on customer acquisition, but we do not really understand who our best customers are or why some customers stop buying. I need a comprehensive customer analytics solution that helps us segment our customer base, identify at-risk customers, and prioritize our retention efforts on high-value segments."
Business Questions to Answer
- How can we group customers by behavior?
- What defines our best vs worst customers?
- How many distinct customer segments exist?
- Who are our Champions and Loyal customers?
- Which customers are at risk of churning?
- How do RFM scores correlate with value?
- What is the lifetime value by segment?
- How much should we spend to retain customers?
- Which segments drive the most revenue?
- Which customers are likely to churn?
- What patterns indicate customer disengagement?
- How can we proactively retain at-risk customers?
The Dataset
You will work with a comprehensive e-commerce customer dataset containing transaction history, customer demographics, and behavioral data spanning 2 years with over 50,000 customers.
Kaggle Datasets (Recommended)
Download real customer analytics datasets from Kaggle for authentic analysis experience. These datasets require a free Kaggle account to download.
Primary Datasets (Use These)
Dataset Structure
| File Name | Records | Description | Key Columns |
|---|---|---|---|
customers.csv |
50,000 | Customer master data with demographics | CustomerID, Name, Email, JoinDate, Country, Segment |
transactions.csv |
500,000 | Transaction history over 2 years | TransactionID, CustomerID, Date, Amount, ProductCategory |
products.csv |
2,500 | Product catalog with categories | ProductID, Name, Category, Price, Cost |
Transaction Data Schema
| Column | Type | Description |
|---|---|---|
InvoiceNo | String | Unique transaction identifier |
StockCode | String | Product code |
Description | String | Product description |
Quantity | Integer | Number of units purchased |
InvoiceDate | DateTime | Transaction date and time |
UnitPrice | Decimal | Price per unit |
CustomerID | Integer | Unique customer identifier |
Country | String | Customer country |
Sample Data Preview
Understanding the data structure is crucial. Here is what a typical transaction record looks like:
| InvoiceNo | CustomerID | InvoiceDate | Quantity | UnitPrice | Country |
|---|---|---|---|---|---|
| 536365 | 17850 | 2024-12-01 08:26 | 6 | $2.55 | United Kingdom |
| 536366 | 17850 | 2024-12-01 08:28 | 8 | $4.25 | United Kingdom |
| 536367 | 13047 | 2024-12-01 08:34 | 12 | $1.65 | United Kingdom |
Project Requirements
Your deliverables must include Python notebooks with analysis, a Power BI dashboard, and comprehensive documentation with actionable recommendations.
Data Exploration & Cleaning
Start with comprehensive exploratory data analysis:
- Load and validate all data files
- Handle missing CustomerIDs and cancelled orders
- Remove duplicates and negative quantities
- Create derived columns (TotalAmount = Quantity * UnitPrice)
- Generate summary statistics and distributions
RFM Analysis
Calculate RFM Metrics:
- Recency: Days since last purchase (lower is better)
- Frequency: Total number of transactions
- Monetary: Total spend amount
RFM Scoring:
- Score each metric from 1-5 using quintiles
- Create combined RFM score (e.g., 555, 111)
- Map scores to customer segments (Champions, Loyal, At Risk, etc.)
Customer Segmentation (K-Means)
Clustering Analysis:
- Standardize features using StandardScaler
- Determine optimal clusters using Elbow Method
- Validate with Silhouette Score
- Apply K-Means clustering (typically 4-6 clusters)
Segment Profiling:
- Name each segment based on characteristics
- Calculate segment statistics (avg spend, frequency)
- Visualize clusters with scatter plots
Customer Lifetime Value (CLV)
CLV Calculation Methods:
- Historical CLV: Sum of all past transactions
- Simple CLV: Avg Order Value * Purchase Frequency * Lifespan
- Predictive CLV: Using BG/NBD or Gamma-Gamma models (optional)
Analysis Requirements:
- Calculate CLV by customer segment
- Identify highest and lowest value customers
- Determine customer acquisition cost thresholds
Power BI Dashboard
Create an interactive customer analytics dashboard with the following pages:
- Customer Overview: Total customers, revenue, KPI cards
- RFM Analysis: Segment distribution, RFM matrix heatmap
- Segmentation: Cluster visualization, segment profiles
- Customer Value: CLV distribution, top customers
- Churn Risk: At-risk customer identification
Dashboard Requirements:
- Interactive slicers (date range, country, segment)
- Drill-through for customer details
- Conditional formatting for risk indicators
- Professional color scheme
Insights and Recommendations
Key Deliverables:
- 5-7 data-driven insights about customer behavior
- Segment-specific retention strategies
- Marketing budget allocation recommendations
- Churn prevention action plan
- Executive summary document (1-2 pages)
Recommended Workflow
Data Prep
3-4 hours- Download dataset
- EDA and cleaning
- Feature engineering
Analysis
5-6 hours- RFM scoring
- K-Means clustering
- CLV calculation
Dashboard
4-6 hours- Power BI visuals
- Documentation
- Submit project
Customer KPI Specifications
Calculate and display the following Key Performance Indicators for customer analytics. Implement these in both Python (for analysis) and Power BI (for visualization).
- Recency Score: 1-5 based on days since last purchase
- Frequency Score: 1-5 based on transaction count
- Monetary Score: 1-5 based on total spend
- RFM Combined: Concatenated score (e.g., 555)
- RFM Segment: Champion, Loyal, At Risk, etc.
- Customer Lifetime Value: Total expected revenue per customer
- Average Order Value: Total Revenue / Total Orders
- Purchase Frequency: Orders / Unique Customers
- Customer Value: AOV * Purchase Frequency
- Revenue per Customer: Total Revenue / Customer Count
- Total Customers: Count of unique CustomerIDs
- Active Customers: Purchased in last 90 days
- New Customers: First purchase in period
- Repeat Customer Rate: Customers with 2+ orders
- Customer Retention Rate: Returning / Total
- Churn Rate: Lost Customers / Total Customers
- At-Risk Customers: No purchase in 60-90 days
- Dormant Customers: No purchase in 90+ days
- Customer Lifespan: Avg days between first and last purchase
- Days Since Last Purchase: Recency indicator
RFM Segment Definitions
| Segment | RFM Score Range | Description | Recommended Action |
|---|---|---|---|
| Champions | R: 5, F: 5, M: 5 | Best customers, recent, frequent, high value | Reward loyalty, early access, referral program |
| Loyal Customers | R: 3-5, F: 4-5, M: 4-5 | Buy regularly, responsive to promotions | Upsell, recommend premium products |
| Potential Loyalists | R: 4-5, F: 2-3, M: 2-3 | Recent customers with average frequency | Offer membership, personalized recommendations |
| At Risk | R: 2-3, F: 3-4, M: 3-4 | Previously active, declining engagement | Win-back campaigns, special offers |
| Lost | R: 1-2, F: 1-2, M: 1-2 | Lowest recency, frequency, and value | Reactivation campaign or deprioritize |
Required Visualizations
Create the following visualizations in your Power BI dashboard. All charts should be interactive, professionally formatted, and tell a clear customer analytics story.
Business Question: How are customers distributed across RFM segments?
- Segments as slices with percentages
- Total customer count in center
- Color code by segment health (green to red)
Business Question: How do R, F, M scores correlate?
- Recency on Y-axis, Frequency on X-axis
- Color intensity = customer count or avg monetary
- Interactive drill-through to customer list
Business Question: What is the CLV distribution?
- X-axis: CLV ranges, Y-axis: Customer count
- Highlight top 20% high-value customers
- Add median and average reference lines
Business Question: Which segments drive the most revenue?
- Segments on Y-axis, Revenue on X-axis
- Sort descending by revenue
- Add customer count as secondary metric
Business Question: How are customer metrics trending?
- X-axis: Month, Y-axis: Customer count
- Lines for new, active, and churned customers
- Add trend lines for forecasting
Business Question: How are K-means clusters distributed?
- X: Frequency, Y: Monetary Value
- Color by cluster assignment
- Size by recency score
Business Question: Who are our most valuable customers?
- Show top 20 by CLV or total spend
- Include RFM segment and scores
- Add conditional formatting for values
Business Question: What are the key customer metrics?
- Total Customers, Active Customers, CLV
- Avg Order Value, Repeat Rate, Churn Rate
- YoY change with up/down indicators
Dashboard Best Practices
Color Strategy
- Green for healthy segments
- Yellow for at-risk customers
- Red for churned or lost
- Consistent palette across pages
Interactivity
- Cross-filtering between visuals
- Date range slicer
- Segment and country filters
- Drill-through to customer details
Executive Focus
- Lead with KPIs and trends
- Highlight actionable segments
- Clear segment recommendations
- Revenue impact visibility
Submission Requirements
Create a Google Drive folder with the exact name shown below and share the link with view access:
Required Folder Name
Customer-Analytics-Project-[YourName]
Required Project Structure
Customer-Analytics-Project-[YourName]/
├── data/
│ └── customer_data.csv # Original dataset (download from above)
├── notebooks/
│ ├── 01_data_exploration.ipynb # EDA and data cleaning
│ ├── 02_rfm_analysis.ipynb # RFM segmentation
│ ├── 03_customer_segmentation.ipynb # K-means clustering
│ └── 04_clv_calculation.ipynb # Customer lifetime value
├── powerbi/
│ └── customer_dashboard.pbix # Your Power BI dashboard file
├── screenshots/
│ ├── dashboard_overview.png # Overview page screenshot
│ ├── dashboard_segmentation.png # Segmentation page screenshot
│ └── dashboard_rfm.png # RFM analysis page screenshot
├── docs/
│ └── executive_summary.pdf # Executive summary document (1-2 pages)
└── README.md # REQUIRED - see contents below
Do Include
- All Python notebooks with outputs
- Power BI file (.pbix) with all pages
- Clear documentation in README
- Dashboard screenshots
- Executive summary PDF
- Original dataset in data folder
Do Not Include
- Virtual environment folders (.venv)
- Checkpoint files (.ipynb_checkpoints)
- Personal or sensitive information
- Broken file links or formulas
- Unfinished or draft versions
Enter your Google Drive folder link - we will verify your files automatically
Grading Rubric
Your project will be graded on the following criteria. Total: 650 points.
| Criteria | Points | Description |
|---|---|---|
| Data Exploration & Cleaning | 75 | Thorough EDA, data quality checks, and proper data preparation |
| RFM Analysis | 100 | Correct RFM calculations, proper scoring, and meaningful segments |
| Customer Segmentation | 125 | K-means clustering, optimal cluster selection, segment profiling |
| CLV Calculation | 100 | Accurate lifetime value modeling with proper methodology |
| Power BI Dashboard | 125 | Professional visualizations, interactive features, clean design |
| Documentation | 50 | Complete README, executive summary, clear explanations |
| Recommendations | 75 | Actionable, data-driven marketing recommendations |
| Total | 650 |
Grading Scale
Excellent
585-65090-100%
Good
520-58480-89%
Satisfactory
455-51970-79%
Needs Work
<455<70%
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Submit Your Project