Apply statistical methods to analyze E-Commerce sales data, customer behavior patterns, and marketing campaign performance. Use descriptive statistics, probability theory, and hypothesis testing to drive data-informed business decisions.
In this comprehensive statistics assignment, you will work as a Business Analyst at DataMart E-Commerce. The company has collected vast amounts of sales, customer, and marketing data but lacks statistical insights to make informed business decisions. Your task is to apply statistical methods to uncover patterns, test hypotheses, and provide actionable recommendations.
"Welcome to the DataMart Analytics Team! I need your statistical expertise for several critical business questions we're facing this quarter.
We've been collecting data from our E-Commerce platform, but we need someone who can apply rigorous statistical methods to answer key questions:
I've attached four datasets covering sales transactions, customer profiles, marketing campaigns, and A/B test results. Your assignment is to conduct comprehensive statistical analysis and provide data-driven recommendations.
The executive team is making decisions next week, so accuracy and clear interpretation are critical. Show your work, explain your assumptions, and don't just give numbers--tell us what they mean for the business!"
-- Jennifer Martinez, VP of Analytics
Download all four datasets below. Each dataset contains real-world E-Commerce data that requires statistical analysis.
Sales transaction data including order values, dates, customer segments, product categories, and payment methods.
- order_id - Unique transaction identifier
- order_value - Total purchase amount in USD (numeric)
- order_date - Transaction timestamp
- customer_segment - Premium, Standard, or Basic (categorical)
- product_category - Electronics, Fashion, Home, Books (categorical)
- payment_method - Credit Card, PayPal, Debit Card (categorical)

Customer demographic and behavioral data including age, location, lifetime value, and satisfaction scores.
- customer_id - Unique customer identifier
- age - Customer age in years (numeric)
- gender - Male, Female, Other (categorical)
- city_tier - Tier 1, Tier 2, Tier 3 cities (categorical)
- lifetime_value - Total customer spend in USD (numeric)
- satisfaction_score - Rating from 1-10 (ordinal)

Marketing campaign performance metrics including impressions, clicks, conversions, and ROI data.
- campaign_id - Unique campaign identifier
- channel - Email, Social Media, Search Ads, Display (categorical)
- impressions - Number of ad views (numeric)
- clicks - Number of ad clicks (numeric)
- conversions - Number of purchases (numeric)
- spend - Campaign cost in USD (numeric)

A/B test data from a website redesign experiment comparing the old and new homepage designs on conversion rates.
- user_id - Unique visitor identifier
- group - Control (old design) or Treatment (new design)
- page_views - Number of pages visited (numeric)
- time_on_site - Session duration in seconds (numeric)
- converted - 1 if purchased, 0 if not (binary)
- device_type - Desktop, Mobile, Tablet (categorical)
Using the datamart_sales.csv dataset, conduct a comprehensive descriptive statistical
analysis of order values across different customer segments and product categories.
Columns referenced in the tasks: order_value, customer_segment (Premium, Standard, Basic), product_category, order_date.
Output: descriptive_stats_report.csv containing all calculated metrics organized by segments and categories.
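A minimal sketch of how a per-group summary of order_value could be assembled with pandas. The small DataFrame is made-up data standing in for datamart_sales.csv, and the helper name `describe_by` and the chosen set of metrics are assumptions for illustration, not requirements stated by the assignment.

```python
import pandas as pd

# Made-up rows standing in for datamart_sales.csv (illustrative values only).
sales = pd.DataFrame({
    "order_value": [120.0, 80.0, 200.0, 40.0, 60.0, 100.0],
    "customer_segment": ["Premium", "Standard", "Premium", "Basic", "Basic", "Standard"],
    "product_category": ["Electronics", "Fashion", "Electronics", "Books", "Home", "Fashion"],
})


def describe_by(df, group_col, value_col="order_value"):
    """Hypothetical helper: summary statistics of value_col for each level of group_col."""
    summary = (
        df.groupby(group_col)[value_col]
        .agg(["count", "mean", "median", "std", "min", "max"])  # std is the sample std (ddof=1)
        .reset_index()
        .rename(columns={group_col: "group"})
    )
    # Record which column the rows were grouped by, so both summaries can share one file.
    summary.insert(0, "grouped_by", group_col)
    return summary


report = pd.concat(
    [describe_by(sales, "customer_segment"), describe_by(sales, "product_category")],
    ignore_index=True,
)
print(report)
# To write the deliverable: report.to_csv("descriptive_stats_report.csv", index=False)
```

With the real file, `sales` would instead come from `pd.read_csv("datamart_sales.csv")`, assuming the column names match the dataset description.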
Using the datamart_campaigns.csv dataset, apply probability theory and distribution analysis
to understand campaign performance and predict future outcomes.
Use scipy.stats.norm.cdf() to calculate probabilities from a normal distribution.
For z-scores: z = (x - mean) / std_dev
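The z-score formula and scipy.stats.norm.cdf() fit together as in the sketch below. The mean, standard deviation, and threshold are invented numbers, and `prob_below` is a hypothetical helper name. Whether a normal model suits a given campaign metric is something to check against the data first.

```python
from scipy.stats import norm


def prob_below(x, mean, std_dev):
    """Return the z-score of x and P(X <= x) for X ~ Normal(mean, std_dev)."""
    z = (x - mean) / std_dev
    return z, norm.cdf(z)


# Invented example: a metric with mean 100 and standard deviation 20.
z, p_below = prob_below(x=130.0, mean=100.0, std_dev=20.0)
p_above = norm.sf(z)  # survival function, equal to 1 - cdf(z)

# The same probability without standardizing by hand.
p_direct = norm.cdf(130.0, loc=100.0, scale=20.0)

print(f"z = {z:.2f}, P(X <= 130) = {p_below:.4f}, P(X > 130) = {p_above:.4f}")
```

Standardizing first and calling `norm.cdf(z)` gives the same result as passing `loc` and `scale` directly. The explicit z-score is mainly useful for reporting.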
Conduct rigorous hypothesis tests to answer critical business questions using both the
datamart_sales.csv and datamart_customers.csv datasets.
Functions and data referenced in the tasks: scipy.stats.ttest_ind(); datamart_sales.csv with datamart_customers.csv on customer_id; scipy.stats.chi2_contingency(); datamart_customers.csv satisfaction scores.
Output: hypothesis_test_results.txt containing all test results with hypotheses, test statistics, p-values, decisions, and interpretations for each test.
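As a rough illustration of the two SciPy functions named for this exercise, the sketch below runs a two-sample t-test and a chi-square test of independence on invented numbers. The arrays, the contingency table, and the pairing of segment with payment method are assumptions made for the example. Welch's variant (`equal_var=False`) is one reasonable choice, not something the assignment prescribes.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level


def decide(p_value, alpha=ALPHA):
    """Turn a p-value into the usual decision wording."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Invented order values for two customer segments.
premium = np.array([210.0, 185.0, 240.0, 199.0, 225.0, 230.0])
standard = np.array([150.0, 160.0, 142.0, 171.0, 155.0, 149.0])

# H0: the two segments have equal mean order value.
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, t_p = stats.ttest_ind(premium, standard, equal_var=False)

# Invented 3x3 table of counts: rows = segments, columns = payment methods.
# With real data, pd.crosstab(df["customer_segment"], df["payment_method"]) builds this.
observed = np.array([
    [30, 20, 10],
    [25, 25, 20],
    [15, 25, 30],
])

# H0: the row and column variables are independent.
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)

print(f"t = {t_stat:.3f}, p = {t_p:.4f}: {decide(t_p)}")
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {chi_p:.4f}: {decide(chi_p)}")
```

Each printed line carries the statistic, the p-value, and the decision, which is most of what hypothesis_test_results.txt is asked to contain. The hypotheses and the business interpretation still need to be written out in words.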
DataMart recently ran an A/B test to evaluate whether a new homepage design improves conversion rates
compared to the old design. Using datamart_ab_test.csv, conduct a comprehensive A/B test analysis.
Tools and fields referenced in the tasks: statsmodels.stats.proportion.proportions_ztest(), statsmodels.stats.power, device_type (Desktop, Mobile, Tablet).
Output: statistical_insights.pdf.
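To make the arithmetic behind a two-proportion z-test visible, here is a hand-rolled sketch that uses only the Python standard library. The conversion counts are invented, and the function names are hypothetical. The result should be close to what statsmodels.stats.proportion.proportions_ztest() reports for the same counts in its default pooled form, but verify that against the library rather than taking it as given.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: p_a == p_b, using the pooled proportion for the standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for p_b - p_a, using the unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se


# Invented counts: Control converts 100 of 1000 visitors, Treatment 130 of 1000.
z, p_value = two_proportion_ztest(100, 1000, 130, 1000)
ci_low, ci_high = diff_confidence_interval(100, 1000, 130, 1000)

print(f"z = {z:.3f}, p = {p_value:.4f}")
print(f"95% CI for the lift: [{ci_low:.4f}, {ci_high:.4f}]")
```

The same two functions can be applied to each device_type subset separately. Segment-level samples are smaller, so expect wider intervals there.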
Submit all files in a single ZIP file named exactly as shown below:
YourName_Statistics_Assignment.zip
Statistics_Assignment/
├── statistics_analysis.ipynb # Jupyter Notebook with ALL exercises
├── descriptive_stats_report.csv # Descriptive statistics summary (Exercise 1)
├── hypothesis_test_results.txt # Hypothesis test results (Exercise 3)
├── statistical_insights.pdf # Professional report (2-3 pages)
└── README.md # REQUIRED - see contents below
| Category | Criteria | Points | Your Score |
|---|---|---|---|
| Exercise 1: Descriptive Statistics | | 25 | |
| Exercise 2: Probability Distributions | | 20 | |
| Exercise 3: Hypothesis Testing | | 25 | |
| Exercise 4: A/B Testing | | 30 | |
| Total Points | | 100 | |
Complete this checklist before submitting your assignment to ensure you haven't missed anything important.