Assignment 7-A

Exploratory Data Analysis Report

Apply your EDA skills to analyze real-world e-commerce data, uncover patterns and insights using univariate, bivariate, and multivariate techniques, and create compelling visualizations that tell a data story.

Estimated time: 5-7 hours
Difficulty: Intermediate
Total: 100 Points
What You'll Practice
  • Univariate distribution analysis
  • Bivariate relationship exploration
  • Correlation matrix analysis
  • Data visualization best practices
  • Insight generation and reporting

Assignment Overview

In this comprehensive assignment, you will work as a Data Analyst at TechMetrics Inc., a growing e-commerce analytics company. Your manager needs you to perform exploratory data analysis on customer, sales, and website data to understand business performance and identify opportunities for growth.

Objectives
  • Understand data distributions and outliers
  • Identify relationships between variables
  • Discover customer segments
  • Analyze sales patterns and trends
  • Create professional visualizations
  • Generate actionable business insights
Skills Tested
  • Pandas for data manipulation
  • Matplotlib and Seaborn for visualization
  • Statistical measures and interpretation
  • Correlation analysis techniques
  • Data storytelling skills
  • Report writing and presentation
Deliverables
  • eda_analysis.ipynb (notebook)
  • eda_report.pdf
  • visualizations/ folder (6 charts)
  • README.md

The Scenario

Email from Jennifer Martinez, VP of Analytics

"Welcome to TechMetrics Inc.! We have been collecting data from our e-commerce platform for the past year, and our executive team is preparing for the quarterly business review. We need a comprehensive EDA report to understand our business better.

I need you to investigate the following areas:

  1. Customer Demographics - Who are our customers? What are their purchasing patterns?
  2. Sales Performance - How are sales distributed across products, regions, and time periods?
  3. Website Behavior - How do visitors interact with our site? What drives conversions?
  4. Key Relationships - What factors correlate with higher sales and customer satisfaction?

The board presentation is in two weeks. I need you to create a thorough EDA report with clear visualizations and actionable insights. Focus on telling the data story - what patterns emerge, what surprises you, and what should we do about it.

Looking forward to your analysis!"

- Jennifer Martinez, VP of Analytics

The Datasets

Download the three datasets below. These contain real business data from TechMetrics Inc.

techmetrics_customers.csv

Customer profiles including demographics, account details, and purchase history metrics.

500 customers | Customer data | Clean dataset
Columns:
  • customer_id, signup_date, age, gender, region
  • membership_type, total_purchases, total_spent, avg_order_value
  • days_since_last_purchase, satisfaction_score, referral_count
techmetrics_sales.csv

Transaction records with order details, products, quantities, and revenue data.

2000 transactions | Sales data | Linked to customers
Columns:
  • order_id, customer_id, order_date, product_category
  • product_name, quantity, unit_price, total_amount
  • discount_applied, payment_method, shipping_region
techmetrics_website.csv

Website analytics including page views, session duration, bounce rates, and conversion data.

1500 sessions | Web analytics | Behavioral data
Columns:
  • session_id, customer_id, visit_date, traffic_source
  • device_type, pages_viewed, session_duration, bounce
  • converted, cart_value, product_views

Requirements

Task 1 Data Overview and Quality Assessment (10 points)

In your eda_analysis.ipynb notebook:
  1. Load and inspect all three datasets:
    • Display shape, data types, and first few rows
    • Use df.info() and df.describe()
    • Document column meanings and relationships
  2. Assess data quality:
    • Check for missing values (count and percentage)
    • Identify duplicate records
    • Validate data types are appropriate
    • Document any data quality issues found
  3. Merge datasets where appropriate:
    • Join sales with customers using customer_id
    • Join website data where relevant
    • Create a master analysis dataset
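
The quality checks and merge above can be sketched as follows. This is a minimal illustration, not a reference solution: the two small DataFrames are made-up stand-ins for the real CSVs, and `quality_summary` is a hypothetical helper name.

```python
import pandas as pd

def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, and missing percentage."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_count": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })

# Tiny synthetic stand-ins for the real CSVs (values are made up).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", None],
    "age": [34, 45, 29],
})
sales = pd.DataFrame({
    "order_id": [10, 11, 12, 12],
    "customer_id": [1, 1, 2, 2],
    "total_amount": [20.0, 35.5, 12.0, 12.0],
})

print(quality_summary(customers))

# duplicated() flags rows that repeat an earlier row in every column.
n_duplicates = int(sales.duplicated().sum())
sales_clean = sales.drop_duplicates()

# Many sales rows map to one customer. validate= raises an error if
# customer_id is not unique in customers, and indicator= adds a _merge
# column that flags orders with no matching customer.
master = sales_clean.merge(customers, on="customer_id", how="left",
                           validate="many_to_one", indicator=True)
print(master["_merge"].value_counts())
```

With the real files, the stand-ins would presumably be replaced by something like `pd.read_csv("data/techmetrics_sales.csv", parse_dates=["order_date"])`. The `data/` path is an assumption based on the required repository structure.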

Task 2 Univariate Analysis (20 points)

Analyze individual variable distributions:
  1. Numerical variable analysis:
    • Calculate mean, median, mode, std, skewness, kurtosis for key variables
    • Create histograms with KDE for: age, total_spent, session_duration, cart_value
    • Identify and visualize outliers using box plots
    • Describe distribution shapes (normal, skewed, bimodal)
  2. Categorical variable analysis:
    • Frequency counts and percentages for: gender, region, membership_type, product_category
    • Create bar charts for top 10 products by sales count
    • Pie chart for traffic source distribution
    • Analyze device type usage patterns
  3. Create visualization:
    • Multi-panel figure with 4 distributions
    • Export as univariate_distributions.png
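
One possible way to compute the summary statistics and flag outliers is shown below. It is a sketch under stated assumptions: the values are invented, `describe_numeric` and `iqr_outliers` are hypothetical helper names, and the 1.5 × IQR rule is a common convention for box-plot whiskers rather than something the assignment prescribes.

```python
import pandas as pd

def describe_numeric(s: pd.Series) -> pd.Series:
    """Central tendency, spread, and shape statistics for one column."""
    return pd.Series({
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().iloc[0],   # first mode if there are ties
        "std": s.std(),             # sample standard deviation (ddof=1)
        "skewness": s.skew(),       # > 0 means a longer right tail
        "kurtosis": s.kurt(),       # excess kurtosis (normal is about 0)
    })

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

# Made-up right-skewed values standing in for total_spent.
total_spent = pd.Series([120, 135, 150, 150, 160, 175, 190, 210, 950],
                        name="total_spent")

stats = describe_numeric(total_spent)
outliers = iqr_outliers(total_spent)
print(stats.round(2))
print(outliers)
```

In this toy series the mean sits well above the median because of the single large value, which is the kind of gap worth noting when describing a distribution as skewed.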

Task 3 Bivariate Analysis (25 points)

Explore relationships between variable pairs:
  1. Numerical vs Numerical:
    • Scatter plot: age vs total_spent with trend line
    • Scatter plot: session_duration vs cart_value
    • Calculate Pearson and Spearman correlation coefficients
    • Interpret relationship strength and direction
  2. Numerical vs Categorical:
    • Box plots: total_spent by membership_type
    • Violin plots: satisfaction_score by region
    • Group statistics (mean, median) by category
    • Identify significant differences between groups
  3. Categorical vs Categorical:
    • Cross-tabulation: device_type vs converted
    • Stacked bar chart: product_category by region
    • Chi-square test of independence, with interpretation (reference only)
  4. Create visualizations:
    • Scatter plot matrix for key numerical variables
    • Export as bivariate_relationships.png
    • Grouped comparison chart
    • Export as categorical_comparisons.png
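
The correlation and cross-tabulation steps above might look like the sketch below. The session rows are invented, and the chi-square statistic is computed by hand only to show where the number comes from.

```python
import pandas as pd

# Made-up sessions standing in for techmetrics_website.csv.
web = pd.DataFrame({
    "device_type": ["mobile"] * 6 + ["desktop"] * 4,
    "converted": [0, 0, 0, 0, 1, 0, 1, 1, 0, 1],
    "session_duration": [30, 45, 20, 60, 300, 50, 280, 350, 90, 400],
    "cart_value": [0, 0, 0, 0, 80, 0, 75, 120, 0, 200],
})

# Pearson measures linear association; Spearman works on ranks, so it
# captures monotonic trends and is less sensitive to outliers.
num = web[["session_duration", "cart_value"]]
pearson_r = num.corr(method="pearson").loc["session_duration", "cart_value"]
spearman_r = num.corr(method="spearman").loc["session_duration", "cart_value"]

# Observed counts and per-device conversion rates.
observed = pd.crosstab(web["device_type"], web["converted"])
conv_rate = web.groupby("device_type")["converted"].mean()

# Chi-square statistic: expected count = row total * column total / n.
obs = observed.to_numpy()
expected = (obs.sum(axis=1, keepdims=True)
            * obs.sum(axis=0, keepdims=True) / obs.sum())
chi2 = ((obs - expected) ** 2 / expected).sum()

print(f"Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_r:.2f}")
print(observed)
print(f"chi-square = {chi2:.3f}")
```

For a p-value, `scipy.stats.chi2_contingency(observed)` is one option. Note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from the hand calculation unless `correction=False` is passed.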

Task 4 Correlation Matrix Analysis (15 points)

Comprehensive correlation exploration:
  1. Create correlation matrix:
    • Include all numerical variables from merged dataset
    • Calculate both Pearson and Spearman correlations
    • Identify pairs with |r| > 0.5
  2. Visualize with heatmap:
    • Annotated heatmap with correlation values
    • Use appropriate color palette (diverging)
    • Mask the redundant upper triangle so that only the lower triangle is shown
  3. Interpret key correlations:
    • Discuss strongest positive correlations
    • Discuss strongest negative correlations
    • Identify potential multicollinearity concerns
  4. Create visualization:
    • Professional correlation heatmap
    • Export as correlation_heatmap.png
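
A minimal sketch of the heatmap and the |r| > 0.5 filter follows, assuming Seaborn is installed. The data is randomly generated for illustration, and `strong_pairs` and `plot_corr_heatmap` are hypothetical helper names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Unique variable pairs whose |r| exceeds the threshold."""
    # Keep the strictly lower triangle so each pair appears only once.
    keep = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    pairs = corr.where(keep).stack()
    pairs = pairs[pairs.abs() > threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

def plot_corr_heatmap(corr: pd.DataFrame, title: str = "Pearson correlation"):
    """Annotated heatmap that hides the upper triangle and the diagonal."""
    mask = np.triu(np.ones_like(corr, dtype=bool))  # True = hidden cell
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, center=0, square=True, ax=ax)
    ax.set_title(title)
    fig.tight_layout()
    return fig

# Synthetic data: total_spent is built from total_purchases, age is noise.
rng = np.random.default_rng(0)
n = 200
purchases = rng.poisson(8, n)
df = pd.DataFrame({
    "total_purchases": purchases,
    "total_spent": purchases * 40 + rng.normal(0, 20, n),
    "age": rng.integers(18, 70, n),
})

corr = df.corr(numeric_only=True)   # pass method="spearman" for rank-based
strong = strong_pairs(corr)
fig = plot_corr_heatmap(corr)
print(strong)
```

Calling `fig.savefig("visualizations/correlation_heatmap.png", dpi=150)` afterwards would export the figure, provided the `visualizations/` folder already exists. A diverging palette centred on zero, such as `coolwarm`, keeps positive and negative correlations visually distinct.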

Task 5 Temporal Pattern Analysis (15 points)

Analyze time-based patterns:
  1. Sales trends over time:
    • Daily/weekly/monthly sales aggregations
    • Line chart showing sales trends
    • Identify peak and slow periods
  2. Customer signup patterns:
    • New customer acquisition over time
    • Cohort analysis by signup month
  3. Website traffic patterns:
    • Visit frequency by day of week
    • Conversion rate trends over time
  4. Create visualizations:
    • Time series line charts
    • Export as temporal_patterns.png
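
The time-based aggregations above can be sketched with `resample` and `groupby`. The rows below are invented stand-ins for the sales and website files; only the column names come from the dataset descriptions.

```python
import pandas as pd

# Made-up rows standing in for the sales and website CSVs.
sales = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2024-01-03", "2024-01-15", "2024-02-02",
        "2024-02-20", "2024-02-28", "2024-03-10",
    ]),
    "total_amount": [100.0, 50.0, 80.0, 20.0, 40.0, 60.0],
})
web = pd.DataFrame({
    "visit_date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-08"]),
    "converted": [1, 0, 0, 1],
})

# resample() needs a DatetimeIndex. "MS" buckets by calendar month
# (labelled with the month start) and "W" buckets by week.
revenue = sales.set_index("order_date")["total_amount"]
monthly = revenue.resample("MS").sum()
weekly = revenue.resample("W").sum()
peak_month = monthly.idxmax()

# Visit count and conversion rate by day of week.
by_day = (web.groupby(web["visit_date"].dt.day_name())["converted"]
             .agg(visits="count", conversion_rate="mean"))

print(monthly)
print("Peak month starts:", peak_month.date())
print(by_day)
```

The same pattern applies to signups: grouping customers by `signup_date.dt.to_period("M")` gives new customers per month, which is a starting point for a cohort view.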

Task 6 Business Insights Report (15 points)

Synthesize findings into actionable insights:
  1. Executive summary:
    • Key findings in bullet points
    • Most important metrics and trends
    • Data quality summary
  2. Customer insights:
    • Customer segment characteristics
    • High-value customer profile
    • Churn risk indicators
  3. Sales insights:
    • Top performing products/categories
    • Regional performance differences
    • Pricing and discount effectiveness
  4. Recommendations:
    • 3-5 actionable recommendations based on data
    • Prioritize by potential impact
    • Suggest further analysis areas
  5. Create summary visualization:
    • Key metrics dashboard figure
    • Export as key_insights_dashboard.png

Grading Rubric

Component | Points | Criteria
Data Overview | 10 | Complete data inspection, quality assessment, proper merging
Univariate Analysis | 20 | Thorough distribution analysis, appropriate statistics, quality visualizations
Bivariate Analysis | 25 | All three relationship types analyzed, proper techniques, clear interpretations
Correlation Analysis | 15 | Complete matrix, professional heatmap, meaningful interpretation
Temporal Analysis | 15 | Time series patterns identified, trends visualized, insights generated
Business Insights | 15 | Clear insights, actionable recommendations, well-written report
Total | 100 |
Deductions
  • -5 points: Missing or incomplete visualizations
  • -5 points: No interpretation of analysis results
  • -5 points: Code not documented with comments
  • -5 points: Visualizations not exported as required
  • -10 points: Major analysis tasks missing
Bonus Points (up to 10)
  • +3 points: Interactive visualizations using Plotly
  • +3 points: Advanced segmentation analysis (RFM, clustering)
  • +2 points: Automated EDA using libraries like ydata-profiling (formerly pandas-profiling)
  • +2 points: Exceptionally polished and presentation-ready report
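
For the RFM option in the bonus list, one simple approach is quartile scoring on three columns of the customers file: `days_since_last_purchase` (recency), `total_purchases` (frequency), and `total_spent` (monetary). The sketch below uses invented values, and both the 1-4 scale and the `rfm_scores` helper are illustrative assumptions.

```python
import pandas as pd

def rfm_scores(df: pd.DataFrame, bins: int = 4) -> pd.DataFrame:
    """Score each customer 1..bins on recency, frequency, and monetary value."""
    def quantile_code(col: str) -> pd.Series:
        # rank(method="first") breaks ties so qcut gets distinct bin edges.
        return pd.qcut(df[col].rank(method="first"), bins, labels=False)

    out = df[["customer_id"]].copy()
    # Fewer days since the last purchase is better, so recency is reversed.
    out["R"] = bins - quantile_code("days_since_last_purchase")
    out["F"] = quantile_code("total_purchases") + 1
    out["M"] = quantile_code("total_spent") + 1
    out["rfm_total"] = out[["R", "F", "M"]].sum(axis=1)
    return out

# Made-up customers (values are illustrative only).
customers = pd.DataFrame({
    "customer_id": range(1, 9),
    "days_since_last_purchase": [5, 200, 30, 90, 10, 365, 60, 15],
    "total_purchases": [20, 1, 8, 3, 15, 1, 5, 12],
    "total_spent": [2500, 40, 700, 150, 1800, 25, 400, 1300],
})

scores = rfm_scores(customers).set_index("customer_id")
print(scores.sort_values("rfm_total", ascending=False))
```

Customers with high totals are candidates for a high-value profile, while a low R score combined with a historically high F or M score is one possible churn-risk signal.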

Submission

Create a public GitHub repository with the exact name shown below, add all required files, and submit through the submission portal.

github.com/<your-username>/techmetrics-eda
Required Repository Structure:
techmetrics-eda/
├── eda_analysis.ipynb
├── eda_report.pdf
├── visualizations/
│   ├── univariate_distributions.png
│   ├── bivariate_relationships.png
│   ├── categorical_comparisons.png
│   ├── correlation_heatmap.png
│   ├── temporal_patterns.png
│   └── key_insights_dashboard.png
├── data/
│   └── (downloaded CSV files)
└── README.md
Required Files Checklist:
  • eda_analysis.ipynb
  • eda_report.pdf
  • univariate_distributions.png
  • bivariate_relationships.png
  • categorical_comparisons.png
  • correlation_heatmap.png
  • temporal_patterns.png
  • key_insights_dashboard.png
  • README.md

All files are required. Submission will fail if any file is missing.
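
Before submitting, a short script can confirm that nothing is missing. This is a hypothetical local helper, not part of the submission portal. It assumes the PNG files live under `visualizations/`, as in the required repository structure.

```python
from pathlib import Path

# Paths relative to the repository root, taken from the required structure.
REQUIRED = [
    "eda_analysis.ipynb",
    "eda_report.pdf",
    "README.md",
    "visualizations/univariate_distributions.png",
    "visualizations/bivariate_relationships.png",
    "visualizations/categorical_comparisons.png",
    "visualizations/correlation_heatmap.png",
    "visualizations/temporal_patterns.png",
    "visualizations/key_insights_dashboard.png",
]

def missing_files(repo_root: str = ".") -> list[str]:
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED if not (root / p).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  -", path)
    else:
        print("All required files are present.")
```

Running it from the root of `techmetrics-eda/` lists any file that still needs to be added.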

Pro Tips

EDA Best Practices
  • Start broad, then narrow down to interesting findings
  • Always check for outliers before interpreting statistics
  • Use appropriate chart types for different data types
  • Label axes and add titles to all visualizations
Code Organization
  • Use markdown cells to explain your thought process
  • Create reusable functions for repeated visualizations
  • Keep visualization code clean and well-commented
  • Use meaningful variable names throughout
Report Writing
  • Lead with insights, not methodology
  • Use visualizations to support your narrative
  • Write for a non-technical business audience
  • Prioritize findings by business impact
Common Mistakes
  • Showing analysis without interpretation
  • Ignoring data quality issues
  • Using correlation to imply causation
  • Creating cluttered or hard-to-read visualizations
