Project Overview
In this capstone project, you will build an end-to-end time series forecasting system for energy consumption prediction. You will apply trend and seasonality decomposition, implement ARIMA and Prophet models, and evaluate forecast accuracy using industry-standard metrics.
- Decomposition: extract trend, seasonality, and residuals
- ARIMA/SARIMA: fit statistical time series models
- Prophet: build Facebook Prophet forecasts
- Evaluation: compare models with MAE, RMSE, and MAPE
Business Scenario
GridSmart Energy Solutions
You have been hired as a Senior Data Scientist at GridSmart Energy Solutions, a leading smart grid technology company serving metropolitan areas across India. The company manages energy distribution for over 2 million households and 50,000 commercial establishments.
"Welcome to the GridSmart analytics team! Our biggest challenge is predicting energy consumption accurately to optimize grid operations and prevent blackouts. We need you to build a forecasting system that captures daily patterns, weekly cycles, and seasonal trends. Your models will directly impact our capacity planning and help us reduce energy waste by 15-20%."
Business Questions to Answer
- What is the overall trend in energy consumption?
- Is demand increasing or stabilizing over time?
- What is the long-term growth rate?
- What are the daily consumption patterns?
- How does temperature affect energy usage?
- What are weekly and yearly cycles?
- When do peak consumption periods occur?
- Can we predict peak demand 24-48 hours ahead?
- How accurate are short-term forecasts?
- Which model performs best for short-term forecasts?
- Which model handles seasonality better?
- What are the trade-offs between models?
The Dataset
The dataset contains hourly energy consumption readings from GridSmart's smart meter network, spanning from January 2022 to January 2025. It includes weather data and temporal features essential for time series analysis.
Dataset Schema
| Column | Type | Description |
|---|---|---|
| date | Datetime | Date of the reading (YYYY-MM-DD) |
| consumption_kwh | Float | Energy consumption in kilowatt-hours |
| temperature_c | Float | Ambient temperature in Celsius |
| humidity_pct | Integer | Relative humidity percentage |
| is_weekend | Integer | Weekend indicator (0=Weekday, 1=Weekend) |
| is_holiday | Integer | Public holiday indicator (0=No, 1=Yes) |
| hour | Integer | Hour of the day (0-23) |
| day_of_week | Integer | Day of week (0=Monday, 6=Sunday) |
| month | Integer | Month of the year (1-12) |
| year | Integer | Year of the reading |
Project Requirements
Complete the following steps to build your time series forecasting system. Each step builds upon the previous one, culminating in a comprehensive forecasting solution.
Project Setup and Data Loading
Create project structure with data/, notebooks/, and reports/ folders. Load energy_consumption.csv with proper datetime parsing. Set datetime as index and verify data types.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from prophet import Prophet
# Load the data; combine the daily `date` column with `hour` to build a
# full hourly timestamp before setting the index
df = pd.read_csv('data/energy_consumption.csv', parse_dates=['date'])
df['datetime'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
df.set_index('datetime', inplace=True)
print(f"Dataset shape: {df.shape}")
df.head()
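Before moving on, it is worth sanity-checking the loaded frame. A minimal sketch, using a small synthetic frame in place of energy_consumption.csv:

```python
import numpy as np
import pandas as pd

# Synthetic hourly frame standing in for the loaded energy_consumption.csv
idx = pd.date_range('2022-01-01', periods=100, freq='h')
df = pd.DataFrame(
    {'consumption_kwh': np.random.default_rng(2).normal(100, 10, 100)},
    index=idx,
)

# Verify the index is a proper DatetimeIndex, then check frequency and gaps
assert isinstance(df.index, pd.DatetimeIndex)
inferred = pd.infer_freq(df.index)
print(f"Inferred frequency: {inferred}")
print(f"Missing values per column:\n{df.isna().sum()}")
```

If `pd.infer_freq` returns `None`, the series has gaps or duplicates that should be resolved before decomposition and modeling.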
Exploratory Time Series Analysis
- Plot the complete time series to visualize patterns
- Analyze hourly, daily, and monthly consumption patterns
- Identify correlation between consumption and temperature
- Compare weekday vs weekend consumption profiles
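To make the exploration concrete, here is a minimal sketch of the pattern and correlation analyses; the synthetic frame below stands in for the real dataset, with column names following the schema above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly dataset (column names match the schema)
rng = np.random.default_rng(42)
idx = pd.date_range('2022-01-01', periods=24 * 60, freq='h')
df = pd.DataFrame({
    'consumption_kwh': 100 + 30 * np.sin(2 * np.pi * idx.hour / 24)
                       + rng.normal(0, 5, len(idx)),
    'temperature_c': 25 + 8 * np.sin(2 * np.pi * idx.hour / 24)
                     + rng.normal(0, 2, len(idx)),
    'is_weekend': (idx.dayofweek >= 5).astype(int),
}, index=idx)

# Average consumption by hour of day
hourly_profile = df['consumption_kwh'].groupby(df.index.hour).mean()

# Consumption-temperature correlation
corr = df['consumption_kwh'].corr(df['temperature_c'])

# Weekday vs weekend mean consumption
profiles = df.groupby('is_weekend')['consumption_kwh'].mean()
print(f"Correlation with temperature: {corr:.2f}")
```

The same groupby patterns apply directly to the real data once it is loaded.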
Stationarity Testing
- Perform Augmented Dickey-Fuller (ADF) test
- Perform KPSS test for stationarity confirmation
- Apply differencing if series is non-stationary
- Document transformation steps needed
from statsmodels.tsa.stattools import adfuller, kpss
# Aggregate to daily before testing (daily_data is reused below)
daily_data = df['consumption_kwh'].resample('D').sum()
# ADF test (null: non-stationary) and KPSS test (null: stationary)
adf_result = adfuller(daily_data.dropna())
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value: {adf_result[1]:.4f}")
kpss_result = kpss(daily_data.dropna(), regression='c', nlags='auto')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"p-value: {kpss_result[1]:.4f}")
Time Series Decomposition
- Perform additive decomposition on daily aggregated data
- Perform multiplicative decomposition for comparison
- Extract and analyze trend, seasonal, and residual components
- Determine which decomposition type fits better
ARIMA/SARIMA Model Development
- Plot ACF and PACF to determine p, d, q parameters
- Fit ARIMA model with selected parameters
- Implement SARIMA to capture seasonal patterns
- Perform model diagnostics (residual analysis)
Prophet Model Development
- Prepare data in Prophet format (ds, y columns)
- Configure daily and weekly seasonality
- Add holiday effects for Indian holidays
- Tune changepoint parameters for trend flexibility
Model Evaluation and Comparison
- Split data into train (80%) and test (20%) sets
- Calculate MAE, RMSE, and MAPE for both models
- Perform time series cross-validation
- Compare model performance and select best model
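Because observations are ordered in time, the split must be chronological rather than random. A minimal sketch with a synthetic daily series:

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the aggregated consumption data
daily = pd.Series(
    np.linspace(900, 1100, 365),
    index=pd.date_range('2024-01-01', periods=365, freq='D'),
)

# Chronological 80/20 split: never shuffle time series data; the test
# set must come strictly after the training set
split_point = int(len(daily) * 0.8)
train_data = daily.iloc[:split_point]
test_data = daily.iloc[split_point:]

print(f"Train: {len(train_data)} days, "
      f"ending {train_data.index.max().date()}")
print(f"Test:  {len(test_data)} days, "
      f"starting {test_data.index.min().date()}")
```

Shuffled splits leak future information into training and inflate accuracy metrics.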
Forecasting and Business Insights
- Generate 7-day and 30-day forecasts with confidence intervals
- Visualize forecasts against actual values
- Identify peak consumption periods
- Provide actionable business recommendations
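Peak periods can be located by ranking the mean load per hour of day. A sketch on synthetic data with an artificial evening peak standing in for real readings:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with an artificial 18:00-20:00 peak
rng = np.random.default_rng(0)
idx = pd.date_range('2024-06-01', periods=24 * 30, freq='h')
load = 80 + 40 * np.isin(idx.hour, [18, 19, 20]) + rng.normal(0, 3, len(idx))
series = pd.Series(load, index=idx)

# Mean load per hour of day; the top-ranked hours are the peak periods
hourly_mean = series.groupby(series.index.hour).mean()
peak_hours = hourly_mean.nlargest(3).index.tolist()
print(f"Peak consumption hours: {sorted(peak_hours)}")
```

On the real data, the same ranking feeds directly into capacity-planning recommendations.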
Time Series Decomposition
Decomposition separates a time series into its fundamental components: trend, seasonality, and residuals. This helps understand underlying patterns and improves forecasting accuracy.
Additive Model
Use when seasonal variation is constant over time:
Y(t) = Trend + Seasonal + Residual
from statsmodels.tsa.seasonal import seasonal_decompose
# Aggregate to daily for decomposition
daily_data = df['consumption_kwh'].resample('D').sum()
# Additive decomposition
result_add = seasonal_decompose(
    daily_data,
    model='additive',
    period=7  # weekly seasonality
)
result_add.plot()
plt.tight_layout()
plt.show()
Multiplicative Model
Use when seasonal variation changes proportionally with the trend:
Y(t) = Trend × Seasonal × Residual
# Multiplicative decomposition
result_mult = seasonal_decompose(
    daily_data,
    model='multiplicative',
    period=7
)
result_mult.plot()
plt.tight_layout()
plt.show()
# Compare residuals (note: multiplicative residuals are ratios near 1,
# so normalize to a common scale before comparing raw variances)
add_var = result_add.resid.var()
mult_var = result_mult.resid.var()
print(f"Additive residual variance: {add_var:.2f}")
print(f"Multiplicative residual variance: {mult_var:.4f}")
Forecasting Models
Implement both ARIMA and Prophet models to compare their forecasting performance on energy consumption data. Each has strengths for different forecasting scenarios.
- AR (p): autoregressive order, the number of lag observations included
- I (d): integration order, the degree of differencing applied
- MA (q): moving average order, the size of the moving average window
ARIMA / SARIMA Model
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Plot ACF and PACF to determine p and q
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(daily_data.dropna(), lags=30, ax=axes[0])
plot_pacf(daily_data.dropna(), lags=30, ax=axes[1])
plt.tight_layout()
plt.show()
# Fit SARIMA model with seasonal component
model = SARIMAX(
    train_data,  # chronological 80% training split
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 7),  # weekly seasonality
)
sarima_result = model.fit(disp=False)
# Model diagnostics
sarima_result.plot_diagnostics(figsize=(12, 8))
plt.show()
# Forecast
forecast = sarima_result.forecast(steps=30)
conf_int = sarima_result.get_forecast(steps=30).conf_int()
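The forecast and its confidence interval are typically drawn together with `fill_between`. A plotting sketch using hypothetical values in place of the fitted model's `forecast` and `conf_int`:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical 30-step forecast and symmetric confidence band, standing in
# for the mean forecast and conf_int returned by the fitted SARIMA model
idx = pd.date_range('2025-01-01', periods=30, freq='D')
forecast_mean = pd.Series(1000 + 2.0 * np.arange(30), index=idx)
lower = forecast_mean - 50
upper = forecast_mean + 50

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(idx, forecast_mean.values, label='SARIMA forecast')
ax.fill_between(idx, lower, upper, alpha=0.3, label='95% confidence interval')
ax.set_xlabel('Date')
ax.set_ylabel('Consumption (kWh)')
ax.set_title('30-Day Energy Consumption Forecast')
ax.legend()
plt.tight_layout()
```

With the real model, `lower` and `upper` come from the two columns of `conf_int`, and the held-out actuals can be overlaid for comparison.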
Facebook Prophet Model
Prophet is designed for business forecasting with automatic handling of seasonality, holidays, and trend changes. It is particularly robust to missing data and outliers.
from prophet import Prophet
# Prepare data for Prophet (requires 'ds' and 'y' columns)
prophet_df = daily_data.reset_index()
prophet_df.columns = ['ds', 'y']
# Create and configure Prophet model
model = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=True,
    changepoint_prior_scale=0.05
)
# Add Indian holidays
model.add_country_holidays(country_name='IN')
model.fit(prophet_df)
# Create future dataframe and forecast
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
# Plot forecast and components
model.plot(forecast)
model.plot_components(forecast)
plt.show()
Model Comparison
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
def calculate_mape(actual, predicted):
    """Calculate Mean Absolute Percentage Error (assumes no zero actuals)."""
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def evaluate_model(actual, predicted, model_name):
    """Evaluate forecasting model performance."""
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mape = calculate_mape(actual, predicted)
    return {'Model': model_name, 'MAE': mae, 'RMSE': rmse, 'MAPE': mape}
# Compare models
comparison_df = pd.DataFrame([
evaluate_model(test_data, sarima_forecast, 'SARIMA'),
evaluate_model(test_data, prophet_forecast, 'Prophet')
])
print(comparison_df)
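For the cross-validation requirement, scikit-learn's `TimeSeriesSplit` provides expanding-window folds in which each test period strictly follows its training window. A sketch using a naive last-value forecast as a stand-in for the fitted models:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Synthetic daily series; a naive last-value forecast stands in for the
# fitted SARIMA/Prophet models in this sketch
rng = np.random.default_rng(1)
y = np.linspace(900, 1100, 200) + rng.normal(0, 10, 200)

tscv = TimeSeriesSplit(n_splits=5)
fold_mae = []
for train_idx, test_idx in tscv.split(y):
    # Each fold trains on an expanding window and tests on the period after it
    pred = np.full(len(test_idx), y[train_idx][-1])
    fold_mae.append(float(np.mean(np.abs(y[test_idx] - pred))))

print(f"MAE per fold: {np.round(fold_mae, 1)}")
print(f"Mean CV MAE: {np.mean(fold_mae):.1f}")
```

Averaging the metric across folds gives a more robust comparison than a single train/test split; Prophet also ships its own `cross_validation` utility for the same purpose.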
Required Visualizations
Create the following visualizations to demonstrate your time series analysis and forecasting capabilities. Each visualization should be properly labeled.
1. Complete Time Series: full consumption data over the entire period
2. Decomposition Plot: trend, seasonal, and residual components
3. ACF and PACF: autocorrelation and partial autocorrelation plots
4. Hourly Consumption Pattern: hour vs day-of-week heatmap
5. Monthly Distribution: consumption by month showing seasonality
6. Temperature vs Consumption: relationship between temperature and usage
7. SARIMA Residuals: 4-panel diagnostic plot for validation
8. SARIMA Forecast: forecast with confidence intervals
9. Prophet Forecast Plot: Prophet forecast with uncertainty intervals
10. Prophet Components: trend, weekly, and yearly seasonality
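The hourly consumption heatmap reduces to a pivot table before plotting. A sketch on synthetic data standing in for the real readings:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for the hour vs day-of-week heatmap pivot
rng = np.random.default_rng(3)
idx = pd.date_range('2024-01-01', periods=24 * 28, freq='h')
df = pd.DataFrame({'consumption_kwh': rng.normal(100, 10, len(idx))}, index=idx)

# Pivot: rows = hour of day, columns = day of week, values = mean consumption
heatmap_data = df.pivot_table(
    values='consumption_kwh',
    index=df.index.hour,
    columns=df.index.dayofweek,
    aggfunc='mean',
)
print(heatmap_data.shape)  # 24 hours x 7 days
```

The resulting 24x7 table can be passed straight to `seaborn.heatmap` for the required chart.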
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
time-series-forecasting
Required Project Structure
time-series-forecasting/
├── data/
│ └── energy_consumption.csv # The dataset
├── notebooks/
│ └── time_series_analysis.ipynb # Your main analysis notebook
├── reports/
│ └── (visualizations) # Saved plots
├── requirements.txt # Python dependencies
└── README.md # Project documentation
README.md Must Include:
- Your full name and submission date
- Project overview and business context
- Key findings (5-7 bullet points)
- Technologies used (Python, statsmodels, Prophet, etc.)
- Instructions to run the notebook
- Screenshots of at least 3 visualizations
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
statsmodels>=0.14.0
prophet>=1.1.0
jupyter>=1.0.0
Do Include
- Clear markdown sections with headers
- All code cells executed with outputs
- All 10 required visualizations
- Model comparison table with metrics
- Business insights and recommendations
- README with screenshots
Do Not Include
- Virtual environment folders (venv, .env)
- Any .pyc or __pycache__ files
- Unexecuted notebooks
- Hardcoded absolute file paths
- API keys or credentials
Submit your GitHub username when you are done; your repository will be verified automatically.
Grading Rubric
Your project will be graded on the following criteria. Total: 600 points.
| Criteria | Points | Description |
|---|---|---|
| Data Loading & EDA | 110 | Proper datetime parsing, missing values, pattern identification |
| Stationarity & Decomposition | 120 | ADF/KPSS tests, additive and multiplicative decomposition |
| ARIMA/SARIMA Model | 80 | Parameter selection, fitting, and diagnostics |
| Prophet Model | 80 | Configuration, seasonality, and holiday effects |
| Model Evaluation | 60 | MAE, RMSE, MAPE calculation and comparison |
| Visualizations | 70 | All 10 required charts with proper labels |
| Code & Documentation | 40 | Clean code, README, requirements.txt |
| Business Insights | 40 | Actionable recommendations based on analysis |
| Total | 600 | |
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.