Assignment Overview
In this assignment, you will build a complete Data Analytics System using NumPy. This comprehensive project requires you to apply ALL concepts from Module 3: array creation and manipulation, array operations and broadcasting, and linear algebra operations for real-world data analysis.
You may use ONLY NumPy (`import numpy as np`). No pandas, matplotlib, or any other libraries are allowed except for data generation. This tests your understanding of pure NumPy operations.
NumPy Arrays (3.1)
Array creation, indexing, slicing, reshaping, and manipulation
Array Operations (3.2)
Broadcasting, vectorized operations, aggregations, and statistical functions
Linear Algebra (3.3)
Matrix operations, dot products, solving systems, eigenvalues, and decompositions
The Scenario
DataCorp Analytics Firm
You have been hired as a Data Analyst at DataCorp Analytics, a firm that provides data analysis services to retail clients. Your manager has assigned you this project:
"We have multi-store sales data from a retail chain. They need insights about store performance, category trends, and sales patterns. Additionally, we need to analyze their investment portfolio and optimize asset allocation. Use NumPy to build an analysis system that can handle these tasks efficiently without pandas or other high-level libraries."
Your Task
Create a Jupyter notebook called numpy_analytics.ipynb that implements a complete
data analytics system using ONLY NumPy. Your code must work with multi-dimensional arrays,
perform statistical analysis, and apply linear algebra operations to solve real business problems.
The Dataset
You will work with a 3D sales array representing 12 months of sales data across 5 stores and 4 product categories. Use this code to generate the dataset:
Sales Data Generation
```python
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Generate sales data: 12 months × 5 stores × 4 categories
# Base sales + seasonal variation + random noise
base_sales = np.array([15000, 8000, 12000, 6000])  # Per category

seasonal_factor = np.array([
    1.0, 0.9, 0.95, 1.1, 1.15, 1.2,   # Jan-Jun
    1.3, 1.25, 1.1, 1.05, 1.15, 1.4   # Jul-Dec
]).reshape(12, 1, 1)

store_factor = np.array([1.2, 0.9, 1.0, 0.85, 1.1]).reshape(1, 5, 1)

sales_data = (base_sales * seasonal_factor * store_factor *
              (1 + np.random.randn(12, 5, 4) * 0.1))

print("Sales data shape:", sales_data.shape)
print("Sample (Month 0, Store 0):", sales_data[0, 0, :])

# Labels for reference
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
stores = ['Store A', 'Store B', 'Store C', 'Store D', 'Store E']
categories = ['Electronics', 'Clothing', 'Food', 'Home']
```
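Before implementing the required functions, it may help to confirm how axis reductions behave on this shape. A minimal sketch (using a stand-in array of the same shape, not the generated dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
demo = rng.random((12, 5, 4))  # stand-in with the assignment's (months, stores, categories) shape

# Summing over an axis (or axes) removes those dimensions:
assert demo.sum(axis=0).shape == (5, 4)       # collapse months
assert demo.sum(axis=(1, 2)).shape == (12,)   # collapse stores and categories
assert demo.sum(axis=(0, 2)).shape == (5,)    # collapse months and categories
assert np.isscalar(demo.sum().item())         # full reduction gives a scalar
```

The axes you pass to `axis=` are the ones that disappear; the remaining axes keep their order.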
Data Structure Explained
- Shape: (12, 5, 4) = 12 months × 5 stores × 4 categories
- Axis 0: Months (January through December)
- Axis 1: Stores (A through E)
- Axis 2: Categories (Electronics, Clothing, Food, Home)
- Values: Sales amount in rupees (₹)
Portfolio Data (For Part 2)
Generate stock returns data for portfolio optimization:
```python
# Generate daily returns for 5 stocks over 252 trading days
n_days = 252
n_stocks = 5

# Mean returns and covariance matrix
mean_returns = np.array([0.12, 0.10, 0.08, 0.15, 0.07]) / 252
volatilities = np.array([0.20, 0.18, 0.15, 0.25, 0.12]) / np.sqrt(252)

# Correlation matrix
correlation_matrix = np.array([
    [1.00, 0.60, 0.40, 0.30, 0.20],
    [0.60, 1.00, 0.50, 0.35, 0.25],
    [0.40, 0.50, 1.00, 0.30, 0.30],
    [0.30, 0.35, 0.30, 1.00, 0.15],
    [0.20, 0.25, 0.30, 0.15, 1.00]
])

# Covariance matrix
cov_matrix = np.outer(volatilities, volatilities) * correlation_matrix

# Generate returns
returns = np.random.multivariate_normal(mean_returns, cov_matrix, n_days)

stock_names = ['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN']
print("Returns shape:", returns.shape)  # (252, 5)
```
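As a hedged sanity check (a sketch with illustrative inputs, not one of the required functions): annualizing the sample statistics of simulated daily returns should land roughly near the annual figures fed into the generator, since means scale by 252 and volatilities by √252.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks = 252, 5
daily_mean = np.full(n_stocks, 0.10) / 252                 # illustrative annual mean 10%
daily_cov = np.eye(n_stocks) * (0.20 / np.sqrt(252)) ** 2  # illustrative annual vol 20%
sim_returns = rng.multivariate_normal(daily_mean, daily_cov, n_days)

ann_mean = sim_returns.mean(axis=0) * 252          # annualized mean return
ann_vol = sim_returns.std(axis=0) * np.sqrt(252)   # annualized volatility
assert ann_mean.shape == (n_stocks,) and ann_vol.shape == (n_stocks,)
```

With only 252 samples the estimates are noisy, so expect values near (not exactly at) the inputs.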
Requirements
Your numpy_analytics.ipynb must implement ALL of the following functions.
Each function is mandatory and will be tested individually.
Part 1: Sales Data Analysis (90 points)
Calculate Total Sales
Create a function get_total_sales(sales_data) that:
- Returns total sales across all months, stores, and categories using `np.sum()`
- Must return a single number (scalar)

```python
def get_total_sales(sales_data):
    """Calculate total sales across all dimensions."""
    # Must use: np.sum() with no axis parameter
    pass
```
Monthly Sales Totals
Create a function get_monthly_sales(sales_data) that:
- Returns a 1D array of 12 elements (one per month)
- Each element is the total sales for that month (summed across stores and categories)
- Must use `np.sum()` with the correct axis parameter

```python
def get_monthly_sales(sales_data):
    """Return array of total sales per month (shape: 12,)"""
    # Must use: np.sum(sales_data, axis=(1, 2))
    pass
```
Store Performance Analysis
Create a function get_store_performance(sales_data) that:
- Returns a 1D array of 5 elements (one per store)
- Each element is the total annual sales for that store
- Must sum across months and categories
```python
def get_store_performance(sales_data):
    """Return array of total sales per store (shape: 5,)"""
    # Must use: np.sum(sales_data, axis=(0, 2))
    pass
```
Category Performance
Create a function get_category_performance(sales_data) that:
- Returns a 1D array of 4 elements (one per category)
- Each element is the total annual sales for that category
- Must sum across months and stores
```python
def get_category_performance(sales_data):
    """Return array of total sales per category (shape: 4,)"""
    # Must use: np.sum(sales_data, axis=(0, 1))
    pass
```
Calculate Growth Rates
Create a function calculate_month_over_month_growth(monthly_sales) that:
- Takes the monthly sales array produced by `get_monthly_sales()`
- Calculates month-over-month growth percentage
- Returns an array of 11 elements (Feb vs. Jan, Mar vs. Feb, etc.)
- Formula: `((current - previous) / previous) * 100`
- Must use array slicing: `monthly_sales[1:] - monthly_sales[:-1]`

```python
def calculate_month_over_month_growth(monthly_sales):
    """Return array of growth percentages (shape: 11,)"""
    # Must use: slicing and element-wise operations
    pass
```
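The slicing formula can be checked on a tiny hand-computable array (a sketch, independent of the real dataset):

```python
import numpy as np

monthly = np.array([100.0, 110.0, 99.0])
# Offset slices pair each month with its predecessor:
growth = (monthly[1:] - monthly[:-1]) / monthly[:-1] * 100
print(growth)  # [ 10. -10.]
```

100 → 110 is +10%, and 110 → 99 is -10%, so the output has one fewer element than the input.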
Find Best and Worst Performers
Create a function find_extremes(sales_data) that:
- Finds best and worst performing store indices using `np.argmax()` and `np.argmin()`
- Finds best and worst performing category indices
- Finds best and worst performing month indices
- Returns a dictionary with keys: 'best_store', 'worst_store', 'best_category', 'worst_category', 'best_month', 'worst_month'
- All values should be integer indices

```python
def find_extremes(sales_data):
    """Return dict of best/worst indices for stores, categories, months."""
    # Must use: np.argmax(), np.argmin() on aggregated data
    pass
```
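One hedged sketch of the aggregate-then-locate pattern (with a stand-in array; only the store keys are shown, the category and month keys follow the same recipe):

```python
import numpy as np

rng = np.random.default_rng(2)
demo = rng.random((12, 5, 4))

store_totals = demo.sum(axis=(0, 2))   # aggregate BEFORE taking argmax/argmin
extremes = {
    'best_store': int(np.argmax(store_totals)),
    'worst_store': int(np.argmin(store_totals)),
}
assert 0 <= extremes['best_store'] < 5
assert 0 <= extremes['worst_store'] < 5
```

Calling `np.argmax()` on the raw 3D array would return a flat index into all 240 entries, which is not what the function should report.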
Calculate Statistics
Create a function calculate_statistics(sales_data) that:
- Returns a dictionary with keys: 'mean', 'median', 'std', 'min', 'max'
- All statistics calculated across the entire array
- Must use `np.mean()`, `np.median()`, `np.std()`, `np.min()`, `np.max()`

```python
def calculate_statistics(sales_data):
    """Return dict of statistical measures across all data."""
    pass
```
Apply Broadcasting
Create a function normalize_by_store(sales_data) that:
- Normalizes each store's sales by dividing by that store's total sales
- Returns array of same shape (12, 5, 4) with normalized values
- Must use broadcasting to divide 3D array by 1D store totals
```python
def normalize_by_store(sales_data):
    """Normalize sales by each store's total using broadcasting."""
    # Hint: store_totals = get_store_performance(sales_data)
    # Then use broadcasting to divide
    pass
```
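A 1D array of shape (5,) will not broadcast against (12, 5, 4) directly, because broadcasting aligns trailing axes (4 vs. 5 mismatch); inserting singleton dimensions first makes the division line up. A minimal sketch with a stand-in array:

```python
import numpy as np

rng = np.random.default_rng(1)
demo = rng.random((12, 5, 4))
store_totals = demo.sum(axis=(0, 2))              # shape (5,)

# Reshape (5,) -> (1, 5, 1) so it broadcasts across months and categories:
normalized = demo / store_totals.reshape(1, 5, 1)

assert normalized.shape == (12, 5, 4)
assert np.allclose(normalized.sum(axis=(0, 2)), 1.0)  # each store now sums to 1
```

The final assertion is a useful self-test for your own implementation: after normalizing, every store's entries should sum to exactly 1.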
Filter Data with Boolean Indexing
Create a function count_high_sales(sales_data, threshold) that:
- Counts how many sales entries exceed the given threshold
- Must use boolean indexing: `sales_data > threshold`
- Returns an integer count

```python
def count_high_sales(sales_data, threshold):
    """Count entries above threshold using boolean indexing."""
    # Must use: boolean mask and np.sum()
    pass
```
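The counting trick rests on the fact that a boolean mask sums as 0s and 1s; a tiny hand-checkable sketch:

```python
import numpy as np

data = np.array([[120.0, 80.0],
                 [200.0, 95.0]])
mask = data > 100          # boolean array, same shape as data
count = int(np.sum(mask))  # True counts as 1, False as 0
print(count)  # 2
```

Here exactly two entries (120 and 200) exceed the threshold of 100.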
Part 2: Linear Algebra & Portfolio Analysis (90 points)
Calculate Covariance Matrix
Create a function calculate_covariance(returns) that:
- Takes the returns array (252, 5)
- Returns the 5×5 covariance matrix using `np.cov()`
- Must transpose returns before passing to `np.cov()`

```python
def calculate_covariance(returns):
    """Calculate covariance matrix of stock returns."""
    # Must use: np.cov(returns.T) or np.cov(returns, rowvar=False)
    pass
```
Calculate Correlation Matrix
Create a function calculate_correlation(returns) that:
- Returns the 5×5 correlation matrix using `np.corrcoef()`
- Must transpose returns before passing to `np.corrcoef()`

```python
def calculate_correlation(returns):
    """Calculate correlation matrix of stock returns."""
    # Must use: np.corrcoef(returns.T) or np.corrcoef(returns, rowvar=False)
    pass
```
Portfolio Return Calculation
Create a function calculate_portfolio_return(returns, weights) that:
- Takes returns array (252, 5) and weights array (5,)
- Calculates daily portfolio returns using matrix multiplication: `returns @ weights`
- Returns a 1D array of 252 daily portfolio returns

```python
def calculate_portfolio_return(returns, weights):
    """Calculate portfolio returns using matrix multiplication."""
    # Must use: @ operator or np.dot()
    pass
```
Portfolio Variance
Create a function calculate_portfolio_variance(weights, cov_matrix) that:
- Calculates portfolio variance using the formula: `weights.T @ cov_matrix @ weights`
- Returns a scalar (portfolio variance)
- Must use matrix multiplication

```python
def calculate_portfolio_variance(weights, cov_matrix):
    """Calculate portfolio variance using quadratic form."""
    # Must use: weights @ cov_matrix @ weights
    pass
```
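The quadratic form is easy to verify by hand on a 2-asset example (a sketch with made-up numbers):

```python
import numpy as np

weights = np.array([0.5, 0.5])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

# w.T @ C @ w = 0.25*0.04 + 2*(0.25*0.01) + 0.25*0.09 = 0.0375
variance = weights @ cov @ weights
print(variance)  # approximately 0.0375
```

For a 1D `weights` array, `weights.T` is the same array, so `weights @ cov @ weights` and `weights.T @ cov_matrix @ weights` are equivalent.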
Matrix Inversion
Create a function invert_matrix(matrix) that:
- Takes a square matrix as input
- Returns its inverse using `np.linalg.inv()`
- If the matrix is singular (non-invertible), returns None
- Use try/except to handle singular matrices

```python
def invert_matrix(matrix):
    """Invert matrix or return None if singular."""
    # Must use: np.linalg.inv() with try/except
    pass
```
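The try/except pattern can be sketched as follows (`safe_inverse` is an illustrative name, not the required signature; the rank-1 matrix below has determinant 0 and triggers the exception):

```python
import numpy as np

def safe_inverse(matrix):
    """Illustrative sketch: return the inverse, or None if singular."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        return None

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])   # second row is 2x the first: rank 1
assert safe_inverse(singular) is None
assert safe_inverse(np.eye(2)) is not None
```

Catch `np.linalg.LinAlgError` specifically rather than a bare `except`, so genuine bugs (e.g. a non-square input) are not silently swallowed as "singular".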
Solve Linear System
Create a function solve_system(A, b) that:
- Solves the linear system Ax = b
- Returns the solution vector x
- Must use `np.linalg.solve()` (not matrix inversion)

```python
def solve_system(A, b):
    """Solve linear system Ax = b."""
    # Must use: np.linalg.solve(A, b)
    pass
```
Calculate Eigenvalues
Create a function get_eigenvalues(matrix) that:
- Takes a square matrix as input
- Returns the eigenvalues sorted in descending order
- Must use `np.linalg.eig()` and `np.sort()`

```python
def get_eigenvalues(matrix):
    """Return sorted eigenvalues (descending order)."""
    # Must use: np.linalg.eig() then sort
    pass
```
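`np.linalg.eig()` makes no promise about ordering, so an explicit sort is needed; reversing an ascending sort gives descending order. A sketch on a small matrix whose eigenvalues are known by inspection:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 5.0]])           # diagonal, so eigenvalues are 2 and 5
eigvals, _ = np.linalg.eig(A)        # second return value is the eigenvectors
descending = np.sort(eigvals)[::-1]  # ascending sort, then reverse
print(descending)  # [5. 2.]
```

Be aware that for non-symmetric matrices the eigenvalues may be complex, in which case NumPy's sort orders by real part first, then imaginary part.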
Matrix Determinant
Create a function calculate_determinant(matrix) that:
- Returns the determinant of a square matrix
- Must use `np.linalg.det()`

```python
def calculate_determinant(matrix):
    """Calculate matrix determinant."""
    # Must use: np.linalg.det()
    pass
```
Matrix Transpose and Dot Product
Create a function transpose_and_multiply(A, B) that:
- Computes A.T @ B (transpose of A multiplied by B)
- Must use the `.T` attribute and the `@` operator
- Returns the result matrix

```python
def transpose_and_multiply(A, B):
    """Compute A.T @ B."""
    # Must use: A.T @ B
    pass
```
Submission Instructions
Submit your completed assignment via GitHub following these instructions:
Create Jupyter Notebook
Create a single notebook called numpy_analytics.ipynb containing all 18 functions listed above.
- Organize your notebook with clear markdown headers for each part
- Each function must have a docstring explaining what it does
- Include test cells that demonstrate each function working
- Add markdown cells explaining your approach
Include Test Demonstrations
In your notebook, add cells that:
- Define all 18 functions (run the definition cells first)
- Generate the sales and portfolio datasets
- Call each of your 18 functions
- Print the results clearly with labels
Create README
Create README.md that includes:
- Your name and assignment title
- Instructions to run your code
- List of all 18 functions with brief descriptions
- Any challenges you faced and how you solved them
Create requirements.txt
```
numpy==1.24.0
```
Repository Structure
Your GitHub repository should look like this:
```
datacorp-numpy-analytics/
├── README.md
├── requirements.txt
└── numpy_analytics.ipynb   # All 18 functions with test demonstrations
```
Submit via Form
Once your repository is ready:
- Make sure your repository is public or shared with your instructor
- Click the "Submit Assignment" button below
- Fill in the submission form with your GitHub repository URL
Grading Rubric
Your assignment will be graded on the following criteria:
| Criteria | Points | Description |
|---|---|---|
| Array Creation & Manipulation | 35 | Correct use of array creation functions, reshaping, and indexing operations |
| Aggregation & Statistics | 30 | Proper use of np.sum(), np.mean(), axis parameter, and boolean indexing |
| Broadcasting & Vectorization | 25 | Efficient use of broadcasting rules and vectorized operations |
| Linear Algebra Operations | 50 | Matrix multiplication, inversion, solving systems, eigenvalues, determinants |
| Advanced Techniques | 40 | Covariance, correlation matrices, and complex multi-step calculations |
| Code Quality | 20 | Docstrings, comments, proper variable names, clean code organization |
| Total | 200 | |
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
What You Will Practice
NumPy Arrays (3.1)
Array creation, multi-dimensional indexing, slicing, reshaping, and understanding axis operations
Array Operations (3.2)
Broadcasting, vectorized operations, aggregations (sum, mean, std), boolean indexing, and statistical functions
Linear Algebra (3.3)
Matrix multiplication, transpose, inversion, solving linear systems, eigenvalues, determinants, and portfolio optimization
Real-World Applications
Multi-store sales analysis, statistical insights, correlation analysis, and financial portfolio optimization
Pro Tips
Array Manipulation
- Use the `axis` parameter to control operation direction
- Understand broadcasting rules for element-wise ops
- Use `reshape(-1)` to flatten arrays
- Check array shapes with `.shape` frequently
Linear Algebra
- Use the `@` operator for matrix multiplication
- Verify matrix dimensions before operations
- Check for singular matrices before inversion
- Understand the difference between `np.dot()` and `@`
Time Management
- Start with Part 1 (Sales Analytics) first
- Test each function with sample data
- Build portfolio analysis incrementally
- Save your notebook and re-verify outputs as you go
Common Mistakes
- Confusing axis 0 (rows) and axis 1 (columns)
- Forgetting to set random seed for reproducibility
- Using wrong matrix multiplication method
- Not handling division by zero in statistics