Assignment 3-A

NumPy Data Analysis Challenge

Apply your NumPy skills to real-world scenarios: analyze multi-store sales data and optimize investment portfolios using array operations and linear algebra techniques.

6-8 hours
Challenging
200 Points
What You'll Practice
  • Array creation and manipulation
  • Vectorized operations & broadcasting
  • Statistical analysis with aggregations
  • Matrix operations & transformations
  • Linear algebra for real problems
01

Assignment Overview

In this assignment, you will build a complete Data Analytics System using NumPy. This comprehensive project requires you to apply ALL concepts from Module 3: array creation and manipulation, array operations and broadcasting, and linear algebra operations for real-world data analysis.

NumPy Only: You must use ONLY NumPy (import numpy as np). No pandas, matplotlib, or other libraries are allowed, except for data generation. This tests your understanding of pure NumPy operations.
Skills Applied: This assignment tests your understanding of NumPy Arrays (Topic 3.1), Array Operations (Topic 3.2), and Linear Algebra (Topic 3.3) from Module 3.
NumPy Arrays (3.1)

Array creation, indexing, slicing, reshaping, and manipulation

Array Operations (3.2)

Broadcasting, vectorized operations, aggregations, and statistical functions

Linear Algebra (3.3)

Matrix operations, dot products, solving systems, eigenvalues, and decompositions

02

The Scenario

DataCorp Analytics Firm

You have been hired as a Data Analyst at DataCorp Analytics, a firm that provides data analysis services to retail clients. Your manager has assigned you this project:

"We have multi-store sales data from a retail chain. They need insights about store performance, category trends, and sales patterns. Additionally, we need to analyze their investment portfolio and optimize asset allocation. Use NumPy to build an analysis system that can handle these tasks efficiently without pandas or other high-level libraries."

Your Task

Create a Jupyter notebook called numpy_analytics.ipynb that implements a complete data analytics system using ONLY NumPy. Your code must work with multi-dimensional arrays, perform statistical analysis, and apply linear algebra operations to solve real business problems.

03

The Dataset

You will work with a 3D sales array representing 12 months of sales data across 5 stores and 4 product categories. Use this code to generate the dataset:

Sales Data Generation

import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Generate sales data: 12 months × 5 stores × 4 categories
# Base sales + seasonal variation + random noise
base_sales = np.array([15000, 8000, 12000, 6000])  # Per category
seasonal_factor = np.array([
    1.0, 0.9, 0.95, 1.1, 1.15, 1.2,  # Jan-Jun
    1.3, 1.25, 1.1, 1.05, 1.15, 1.4  # Jul-Dec
]).reshape(12, 1, 1)

store_factor = np.array([1.2, 0.9, 1.0, 0.85, 1.1]).reshape(1, 5, 1)

sales_data = (base_sales * seasonal_factor * store_factor * 
              (1 + np.random.randn(12, 5, 4) * 0.1))

print("Sales data shape:", sales_data.shape)
print("Sample (Month 0, Store 0):", sales_data[0, 0, :])

# Labels for reference
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
stores = ['Store A', 'Store B', 'Store C', 'Store D', 'Store E']
categories = ['Electronics', 'Clothing', 'Food', 'Home']
Data Structure Explained
  • Shape: (12, 5, 4) = 12 months × 5 stores × 4 categories
  • Axis 0: Months (January through December)
  • Axis 1: Stores (A through E)
  • Axis 2: Categories (Electronics, Clothing, Food, Home)
  • Values: Sales amount in rupees (₹)
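To get a feel for the axis layout, it can help to index a small array that uses the same axis order. The sketch below uses a hypothetical toy array `toy` of shape (2, 3, 4) (2 months × 3 stores × 4 categories) built with `np.arange`, so every value is predictable; it only illustrates indexing and is not the real dataset.

```python
import numpy as np

# Toy array with the same axis order as sales_data: (months, stores, categories)
toy = np.arange(24).reshape(2, 3, 4)

# One month, one store -> all 4 categories (like sales_data[0, 0, :])
month0_store0 = toy[0, 0, :]    # [0, 1, 2, 3]

# One store, one category -> that value for every month
store2_cat0 = toy[:, 2, 0]      # [8, 20]

# One month -> a (stores, categories) table
month1_table = toy[1]           # shape (3, 4)

print(month0_store0, store2_cat0, month1_table.shape)
```

The same pattern applies to `sales_data`: for example, `sales_data[:, 2, 0]` would be Store C's Electronics sales for all 12 months.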

Portfolio Data (For Part 2)

Generate stock returns data for portfolio optimization:

# Generate daily returns for 5 stocks over 252 trading days
# (run after the sales-data code so that np.random.seed(42) has already been applied)
n_days = 252
n_stocks = 5

# Annualized mean returns and volatilities, converted to daily values (252 trading days)
mean_returns = np.array([0.12, 0.10, 0.08, 0.15, 0.07]) / 252
volatilities = np.array([0.20, 0.18, 0.15, 0.25, 0.12]) / np.sqrt(252)

# Correlation matrix
correlation_matrix = np.array([
    [1.00, 0.60, 0.40, 0.30, 0.20],
    [0.60, 1.00, 0.50, 0.35, 0.25],
    [0.40, 0.50, 1.00, 0.30, 0.30],
    [0.30, 0.35, 0.30, 1.00, 0.15],
    [0.20, 0.25, 0.30, 0.15, 1.00]
])

# Covariance matrix
cov_matrix = np.outer(volatilities, volatilities) * correlation_matrix

# Generate returns
returns = np.random.multivariate_normal(mean_returns, cov_matrix, n_days)

stock_names = ['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN']
print("Returns shape:", returns.shape)  # (252, 5)
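The line `cov_matrix = np.outer(volatilities, volatilities) * correlation_matrix` scales each correlation ρᵢⱼ by σᵢσⱼ. As an illustrative check with made-up two-asset values (`vol` and `corr` are hypothetical), this element-wise form matches the matrix form D R D, where D is the diagonal matrix of volatilities:

```python
import numpy as np

vol = np.array([0.2, 0.1])           # illustrative volatilities
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])        # illustrative correlation matrix

# Element-wise form used in the dataset code: cov[i, j] = vol[i] * vol[j] * corr[i, j]
cov_outer = np.outer(vol, vol) * corr

# Equivalent matrix form: D @ R @ D with D = diag(vol)
D = np.diag(vol)
cov_matrix_form = D @ corr @ D

print(cov_outer)
print(np.allclose(cov_outer, cov_matrix_form))   # True
```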
04

Requirements

Your numpy_analytics.ipynb must implement ALL of the following functions. Each function is mandatory and will be tested individually.

Part 1: Sales Data Analysis (90 points)

1
Calculate Total Sales

Create a function get_total_sales(sales_data) that:

  • Returns total sales across all months, stores, and categories using np.sum()
  • Must return a single number (scalar)
def get_total_sales(sales_data):
    """Calculate total sales across all dimensions."""
    # Must use: np.sum() with no axis parameter
    pass
2
Monthly Sales Totals

Create a function get_monthly_sales(sales_data) that:

  • Returns a 1D array of 12 elements (one per month)
  • Each element is the total sales for that month (summed across stores and categories)
  • Must use np.sum() with correct axis parameter
def get_monthly_sales(sales_data):
    """Return array of total sales per month (shape: 12,)"""
    # Must use: np.sum(sales_data, axis=(1, 2))
    pass
3
Store Performance Analysis

Create a function get_store_performance(sales_data) that:

  • Returns a 1D array of 5 elements (one per store)
  • Each element is the total annual sales for that store
  • Must sum across months and categories
def get_store_performance(sales_data):
    """Return array of total sales per store (shape: 5,)"""
    # Must use: np.sum(sales_data, axis=(0, 2))
    pass
4
Category Performance

Create a function get_category_performance(sales_data) that:

  • Returns a 1D array of 4 elements (one per category)
  • Each element is the total annual sales for that category
  • Must sum across months and stores
def get_category_performance(sales_data):
    """Return array of total sales per category (shape: 4,)"""
    # Must use: np.sum(sales_data, axis=(0, 1))
    pass
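Functions 1 to 4 all reduce the same 3D array; they differ only in which axes are collapsed. For illustration, here is the idea on a hypothetical toy array (not the assignment data) with the same (months, stores, categories) layout:

```python
import numpy as np

toy = np.arange(24).reshape(2, 3, 4)     # (months, stores, categories)

total = np.sum(toy)                      # no axis: a single scalar
per_month = np.sum(toy, axis=(1, 2))     # collapse stores & categories -> shape (2,)
per_store = np.sum(toy, axis=(0, 2))     # collapse months & categories -> shape (3,)
per_category = np.sum(toy, axis=(0, 1))  # collapse months & stores -> shape (4,)

print(total)          # 276
print(per_month)      # [ 66 210]
print(per_store)      # [ 60  92 124]
print(per_category)   # [60 66 72 78]
```

A useful sanity check: each of the three 1D results sums back to the grand total.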
5
Calculate Growth Rates

Create a function calculate_month_over_month_growth(monthly_sales) that:

  • Takes the monthly sales array from function 2
  • Calculates month-over-month growth percentage
  • Returns array of 11 elements (Feb growth compared to Jan, Mar to Feb, etc.)
  • Formula: ((current - previous) / previous) * 100
  • Must use array slicing: monthly_sales[1:] - monthly_sales[:-1]
def calculate_month_over_month_growth(monthly_sales):
    """Return array of growth percentages (shape: 11,)"""
    # Must use: slicing and element-wise operations
    pass
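The slicing trick pairs every month with the one before it: `monthly[1:]` holds months 2 onward and `monthly[:-1]` holds months 1 through n-1. A minimal sketch on made-up values (`monthly` is hypothetical):

```python
import numpy as np

monthly = np.array([100.0, 110.0, 99.0, 99.0])   # illustrative monthly totals

# Element i compares month i+1 with month i; the result has len(monthly) - 1 entries
growth = (monthly[1:] - monthly[:-1]) / monthly[:-1] * 100
print(growth)   # approximately [10, -10, 0]
```

If any previous month could be zero, this division would produce `inf` or `nan`, so it is worth checking the inputs first.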
6
Find Best and Worst Performers

Create a function find_extremes(sales_data) that:

  • Finds best and worst performing store indices using np.argmax() and np.argmin()
  • Finds best and worst performing category indices
  • Finds best and worst performing month indices
  • Returns a dictionary with keys: 'best_store', 'worst_store', 'best_category', 'worst_category', 'best_month', 'worst_month'
  • All values should be integer indices
def find_extremes(sales_data):
    """Return dict of best/worst indices for stores, categories, months."""
    # Must use: np.argmax(), np.argmin() on aggregated data
    pass
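`np.argmax()` and `np.argmin()` return the position of the largest or smallest value, not the value itself. A short sketch on hypothetical store totals; note that these functions return NumPy integer types, so wrapping them in `int()` gives plain Python integers for the dictionary:

```python
import numpy as np

store_totals = np.array([120.0, 90.0, 100.0, 85.0, 110.0])   # illustrative

best = np.argmax(store_totals)    # index of the largest value -> 0
worst = np.argmin(store_totals)   # index of the smallest value -> 3

# Convert NumPy integers to plain Python ints before storing them
extremes = {'best_store': int(best), 'worst_store': int(worst)}
print(extremes)   # {'best_store': 0, 'worst_store': 3}
```

If two entries tie, both functions return the first matching index.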
7
Calculate Statistics

Create a function calculate_statistics(sales_data) that:

  • Returns a dictionary with keys: 'mean', 'median', 'std', 'min', 'max'
  • All statistics calculated across entire array
  • Must use: np.mean(), np.median(), np.std(), np.min(), np.max()
def calculate_statistics(sales_data):
    """Return dict of statistical measures across all data."""
    pass
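Without an `axis` argument, each of these functions reduces over every element of the array. One detail worth knowing: `np.std()` computes the population standard deviation (`ddof=0`) by default. A sketch on a small illustrative array:

```python
import numpy as np

values = np.array([[2.0, 4.0],
                   [4.0, 6.0]])     # illustrative 2D data

stats = {
    'mean': np.mean(values),      # 4.0
    'median': np.median(values),  # 4.0
    'std': np.std(values),        # sqrt(2), population std (ddof=0)
    'min': np.min(values),        # 2.0
    'max': np.max(values),        # 6.0
}
print(stats)
```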
8
Apply Broadcasting

Create a function normalize_by_store(sales_data) that:

  • Normalizes each store's sales by dividing by that store's total sales
  • Returns array of same shape (12, 5, 4) with normalized values
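  • Must use broadcasting to divide the 3D array by the store totals; reshape the totals to (1, 5, 1) so they line up with axis 1 (a plain (5,) array would be matched against the last axis, which has size 4, and raise an error)
def normalize_by_store(sales_data):
    """Normalize sales by each store's total using broadcasting."""
    # Hint: store_totals = get_store_performance(sales_data)
    # Then reshape store_totals to (1, 5, 1) and divide using broadcasting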
    pass
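Broadcasting aligns shapes from the right, which is why the store totals need a reshape. A sketch on a hypothetical toy array with 3 stores instead of 5:

```python
import numpy as np

toy = np.arange(1, 25, dtype=float).reshape(2, 3, 4)   # (months, stores, categories)
store_totals = toy.sum(axis=(0, 2))                    # shape (3,)

# toy / store_totals would align (3,) with the LAST axis (size 4) and raise ValueError.
# Reshaping to (1, 3, 1) lines the totals up with axis 1 (stores).
normalized = toy / store_totals.reshape(1, 3, 1)       # same as store_totals[None, :, None]

print(normalized.shape)              # (2, 3, 4)
print(normalized.sum(axis=(0, 2)))   # each store now sums to 1
```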
9
Filter Data with Boolean Indexing

Create a function count_high_sales(sales_data, threshold) that:

  • Counts how many sales entries exceed the given threshold
  • Must use boolean indexing: sales_data > threshold
  • Returns an integer count
def count_high_sales(sales_data, threshold):
    """Count entries above threshold using boolean indexing."""
    # Must use: boolean mask and np.sum()
    pass
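A comparison such as `data > threshold` produces a boolean mask of the same shape; summing it counts the `True` entries, and using it as an index selects the matching values. A sketch with illustrative numbers:

```python
import numpy as np

data = np.array([[5, 12],
                 [20, 8]])        # illustrative values
threshold = 10

mask = data > threshold           # [[False, True], [True, False]]
count = int(np.sum(mask))         # True counts as 1 -> 2
high_values = data[mask]          # boolean indexing selects [12, 20]

print(count, high_values)
```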

Part 2: Linear Algebra & Portfolio Analysis (90 points)

10
Calculate Covariance Matrix

Create a function calculate_covariance(returns) that:

  • Takes the returns array (252, 5)
  • Returns the 5×5 covariance matrix using np.cov()
  • np.cov() treats each row as a variable by default, so either transpose the returns (returns.T) or pass rowvar=False
def calculate_covariance(returns):
    """Calculate covariance matrix of stock returns."""
    # Must use: np.cov(returns.T) or np.cov(returns, rowvar=False)
    pass
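The orientation matters: with 252 rows of observations, calling `np.cov()` without a transpose would produce a 252×252 matrix. A sketch on randomly generated toy returns (`toy_returns` is hypothetical) showing that the two accepted forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
toy_returns = rng.normal(size=(10, 3))        # 10 days x 3 assets (rows = observations)

cov_t = np.cov(toy_returns.T)                 # transpose: rows become variables
cov_rv = np.cov(toy_returns, rowvar=False)    # or tell np.cov that columns are variables

print(cov_t.shape)                            # (3, 3)
print(np.allclose(cov_t, cov_rv))             # True
print(np.cov(toy_returns).shape)              # (10, 10): wrong orientation
```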
11
Calculate Correlation Matrix

Create a function calculate_correlation(returns) that:

  • Returns the 5×5 correlation matrix using np.corrcoef()
  • Like np.cov(), np.corrcoef() treats rows as variables by default, so either transpose the returns (returns.T) or pass rowvar=False
def calculate_correlation(returns):
    """Calculate correlation matrix of stock returns."""
    # Must use: np.corrcoef(returns.T) or np.corrcoef(returns, rowvar=False)
    pass
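Correlation is covariance rescaled by the standard deviations: ρᵢⱼ = Σᵢⱼ / (σᵢσⱼ). The sketch below, on hypothetical random toy returns, cross-checks `np.corrcoef()` against that formula:

```python
import numpy as np

rng = np.random.default_rng(1)
toy_returns = rng.normal(size=(50, 3))   # 50 days x 3 assets

corr = np.corrcoef(toy_returns, rowvar=False)

# Cross-check: correlation = covariance / (std_i * std_j)
cov = np.cov(toy_returns, rowvar=False)
std = np.sqrt(np.diag(cov))
corr_manual = cov / np.outer(std, std)

print(np.allclose(corr, corr_manual))    # True
print(np.diag(corr))                     # ones on the diagonal
```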
12
Portfolio Return Calculation

Create a function calculate_portfolio_return(returns, weights) that:

  • Takes returns array (252, 5) and weights array (5,)
  • Calculates daily portfolio returns using matrix multiplication: returns @ weights
  • Returns 1D array of 252 daily portfolio returns
def calculate_portfolio_return(returns, weights):
    """Calculate portfolio returns using matrix multiplication."""
    # Must use: @ operator or np.dot()
    pass
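Multiplying a (days × assets) matrix by an (assets,) weight vector computes a weighted sum across each row, giving one portfolio return per day. A sketch with illustrative numbers:

```python
import numpy as np

toy_returns = np.array([[0.01, 0.02],
                        [-0.01, 0.03],
                        [0.00, -0.02]])   # 3 days x 2 assets (illustrative)
weights = np.array([0.6, 0.4])            # one weight per asset column

daily = toy_returns @ weights             # (3, 2) @ (2,) -> (3,)
print(daily)                              # approximately [0.014, 0.006, -0.008]
```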
13
Portfolio Variance

Create a function calculate_portfolio_variance(weights, cov_matrix) that:

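  • Calculates portfolio variance using the quadratic form weights.T @ cov_matrix @ weights (for a 1D weights array, .T has no effect, so weights @ cov_matrix @ weights is equivalent)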

  • Returns a scalar (portfolio variance)
  • Must use matrix multiplication
def calculate_portfolio_variance(weights, cov_matrix):
    """Calculate portfolio variance using quadratic form."""
    # Must use: weights @ cov_matrix @ weights
    pass
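The quadratic form wᵀΣw gives the same number as the sample variance of the daily portfolio returns, as long as both use the same `ddof` (`np.cov()` uses `ddof=1` by default). A sketch on hypothetical random toy returns:

```python
import numpy as np

rng = np.random.default_rng(2)
toy_returns = rng.normal(size=(100, 3))
weights = np.array([0.5, 0.3, 0.2])

cov = np.cov(toy_returns, rowvar=False)

# Quadratic form w^T Σ w; for a 1D weights array, .T is a no-op
port_var = weights @ cov @ weights

# Cross-check: sample variance (ddof=1, matching np.cov) of the daily portfolio returns
check = np.var(toy_returns @ weights, ddof=1)
print(np.isclose(port_var, check))   # True
```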
14
Matrix Inversion

Create a function invert_matrix(matrix) that:

  • Takes a square matrix as input
  • Returns its inverse using np.linalg.inv()
  • If the matrix is singular (non-invertible), return None
  • Use try/except to catch np.linalg.LinAlgError, which np.linalg.inv() raises for singular matrices
def invert_matrix(matrix):
    """Invert matrix or return None if singular."""
    # Must use: np.linalg.inv() with try/except
    pass
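The try/except pattern can be seen on two small illustrative matrices, one with a row that is a multiple of another (singular) and one diagonal (invertible):

```python
import numpy as np

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])   # second row is 2 x the first row
regular = np.array([[2.0, 0.0],
                    [0.0, 4.0]])

results = {}
for name, m in [("singular", singular), ("regular", regular)]:
    try:
        results[name] = np.linalg.inv(m)
    except np.linalg.LinAlgError:
        # Raised when NumPy detects that the matrix is singular
        results[name] = None

print(results["singular"])   # None
print(results["regular"])    # diagonal inverse with entries 0.5 and 0.25
```

Matrices that are only nearly singular may not raise an error and can instead return an inverse with very large entries; `np.linalg.cond()` is one way to flag those.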
15
Solve Linear System

Create a function solve_system(A, b) that:

  • Solves the linear system Ax = b
  • Returns the solution vector x
  • Must use np.linalg.solve() (not matrix inversion)
def solve_system(A, b):
    """Solve linear system Ax = b."""
    # Must use: np.linalg.solve(A, b)
    pass
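`np.linalg.solve()` factorizes A directly instead of forming its inverse, which is generally faster and more numerically stable. A sketch on a small system with a known solution:

```python
import numpy as np

# 2x + y  = 5
#  x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)
print(x)                       # approximately [1, 3]
print(np.allclose(A @ x, b))   # True: the solution satisfies Ax = b
```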
16
Calculate Eigenvalues

Create a function get_eigenvalues(matrix) that:

  • Takes a square matrix as input
  • Returns the eigenvalues sorted in descending order
  • Must use np.linalg.eig() and np.sort(); np.sort() sorts in ascending order, so reverse the result (e.g., with [::-1])
def get_eigenvalues(matrix):
    """Return sorted eigenvalues (descending order)."""
    # Must use: np.linalg.eig(), then np.sort() and reverse
    pass
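`np.linalg.eig()` returns a tuple of (eigenvalues, eigenvectors) in no guaranteed order. A sketch on a small symmetric matrix (like a covariance matrix), whose eigenvalues are 3 and 1:

```python
import numpy as np

sym = np.array([[2.0, 1.0],
                [1.0, 2.0]])   # symmetric, like a covariance matrix

eigenvalues, eigenvectors = np.linalg.eig(sym)   # order is not guaranteed
descending = np.sort(eigenvalues)[::-1]          # np.sort is ascending, so reverse it
print(descending)                                # approximately [3, 1]
```

For non-symmetric matrices, `np.linalg.eig()` may return complex eigenvalues; covariance matrices are symmetric, so their eigenvalues are real.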
17
Matrix Determinant

Create a function calculate_determinant(matrix) that:

  • Returns the determinant of a square matrix
  • Must use np.linalg.det()
def calculate_determinant(matrix):
    """Calculate matrix determinant."""
    # Must use: np.linalg.det()
    pass
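`np.linalg.det()` is computed through an LU factorization, so results may carry tiny floating-point errors; compare with `np.isclose()` rather than `==`. A sketch on two illustrative 2×2 matrices:

```python
import numpy as np

M = np.array([[4.0, 7.0],
              [2.0, 6.0]])
det = np.linalg.det(M)
print(det)                    # close to 4*6 - 7*2 = 10
print(np.isclose(det, 10.0))  # True

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
print(np.isclose(np.linalg.det(singular), 0.0))   # True: singular matrices have determinant 0
```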
18
Matrix Transpose and Dot Product

Create a function transpose_and_multiply(A, B) that:

  • Computes A.T @ B (transpose of A multiplied by B)
  • Must use .T attribute and @ operator
  • Returns the result matrix
def transpose_and_multiply(A, B):
    """Compute A.T @ B."""
    # Must use: A.T @ B
    pass
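Transposing A often makes otherwise incompatible shapes line up: two (3, 2) matrices cannot be multiplied directly, but A.T @ B is (2, 3) @ (3, 2). A sketch with illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])   # shape (3, 2)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])   # shape (3, 2)

result = A.T @ B         # (2, 3) @ (3, 2) -> (2, 2)
print(result)            # [[ 6  8]
                         #  [ 8 10]]
```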
05

Submission Instructions

Submit your completed assignment via GitHub following these instructions:

1
Create Jupyter Notebook

Create a single notebook called numpy_analytics.ipynb containing all 18 functions listed above.

  • Organize your notebook with clear markdown headers for each part
  • Each function must have a docstring explaining what it does
  • Include test cells that demonstrate each function working
  • Add markdown cells explaining your approach
2
Include Test Demonstrations

In your notebook, add cells that:

  • Import NumPy and define your functions
  • Generate the sales and portfolio datasets
  • Call each of your 18 functions
  • Print the results clearly with labels
3
Create README

Create README.md that includes:

  • Your name and assignment title
  • Instructions to run your code
  • List of all 18 functions with brief descriptions
  • Any challenges you faced and how you solved them
4
Create requirements.txt
numpy==1.24.0
5
Repository Structure

Your GitHub repository should look like this:

datacorp-numpy-analytics/
├── README.md
├── requirements.txt
└── numpy_analytics.ipynb    # All 18 functions with test demonstrations
6
Submit via Form

Once your repository is ready:

  • Make sure your repository is public or shared with your instructor
  • Click the "Submit Assignment" button below
  • Fill in the submission form with your GitHub repository URL
Important: Make sure all cells in your notebook run without errors before submitting!
06

Grading Rubric

Your assignment will be graded on the following criteria:

Criteria | Points | Description
--- | --- | ---
Array Creation & Manipulation | 35 | Correct use of array creation functions, reshaping, and indexing operations
Aggregation & Statistics | 30 | Proper use of np.sum(), np.mean(), the axis parameter, and boolean indexing
Broadcasting & Vectorization | 25 | Efficient use of broadcasting rules and vectorized operations
Linear Algebra Operations | 50 | Matrix multiplication, inversion, solving systems, eigenvalues, determinants
Advanced Techniques | 40 | Covariance, correlation matrices, and complex multi-step calculations
Code Quality | 20 | Docstrings, comments, proper variable names, clean code organization
Total | 200 |

Ready to Submit?

Make sure you have completed all requirements and reviewed the grading rubric above.

07

What You Will Practice

NumPy Arrays (3.1)

Array creation, multi-dimensional indexing, slicing, reshaping, and understanding axis operations

Array Operations (3.2)

Broadcasting, vectorized operations, aggregations (sum, mean, std), boolean indexing, and statistical functions

Linear Algebra (3.3)

Matrix multiplication, transpose, inversion, solving linear systems, eigenvalues, determinants, and portfolio optimization

Real-World Applications

Multi-store sales analysis, statistical insights, correlation analysis, and financial portfolio optimization

08

Pro Tips

Array Manipulation
  • Use axis parameter to control operation direction
  • Understand broadcasting rules for element-wise ops
  • Use reshape(-1) to flatten arrays
  • Check array shapes with .shape frequently
Linear Algebra
  • Use @ operator for matrix multiplication
  • Verify matrix dimensions before operations
  • Check for singular matrices before inversion
  • Understand the difference between np.dot() and @ (they agree for 2D arrays but behave differently for 3D and higher)
Time Management
  • Start with Part 1 (Sales Analytics) first
  • Test each function with sample data
  • Build portfolio analysis incrementally
  • Print shapes and intermediate results as you go
Common Mistakes
  • Confusing axes: in sales_data, axis 0 is months, axis 1 is stores, and axis 2 is categories
  • Forgetting to set random seed for reproducibility
  • Using wrong matrix multiplication method
  • Not handling division by zero in statistics
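On the np.dot() versus @ tip: for 2D arrays they give the same result, but for stacks of matrices `@` multiplies matching pairs while `np.dot()` sums over the last axis of the first array and the second-to-last axis of the second. A sketch with arrays of ones, where only the shapes matter:

```python
import numpy as np

a = np.ones((2, 3, 4))   # a stack of two 3x4 matrices
b = np.ones((2, 4, 5))   # a stack of two 4x5 matrices

print((a @ b).shape)       # (2, 3, 5): matmul multiplies matching pairs in the stack
print(np.dot(a, b).shape)  # (2, 3, 2, 5): dot pairs every matrix in a with every matrix in b

m1, m2 = np.ones((3, 4)), np.ones((4, 5))
print(np.allclose(m1 @ m2, np.dot(m1, m2)))   # True: for 2D arrays they agree
```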
09

Pre-Submission Checklist

Analysis Requirements
Repository Requirements