Project Overview
In this capstone project, you will build an end-to-end time series forecasting system for energy consumption prediction. You will apply trend and seasonality decomposition, implement ARIMA and Prophet models, and evaluate forecast accuracy using industry-standard metrics.
- Decomposition: extract trend, seasonality, and residuals
- ARIMA/SARIMA: fit statistical time series models
- Prophet: build Facebook Prophet forecasts
- Evaluation: compare models with MAE, RMSE, and MAPE
Business Scenario
GridSmart Energy Solutions
You have been hired as a Senior Data Scientist at GridSmart Energy Solutions, a leading smart grid technology company serving metropolitan areas across India. The company manages energy distribution for over 2 million households and 50,000 commercial establishments.
"Welcome to the GridSmart analytics team! Our biggest challenge is predicting energy consumption accurately to optimize grid operations and prevent blackouts. We need you to build a forecasting system that captures daily patterns, weekly cycles, and seasonal trends. Your models will directly impact our capacity planning and help us reduce energy waste by 15-20%."
Business Questions to Answer
- What is the overall trend in energy consumption?
- Is demand increasing or stabilizing over time?
- What is the long-term growth rate?
- What are the daily consumption patterns?
- How does temperature affect energy usage?
- What are weekly and yearly cycles?
- When do peak consumption periods occur?
- Can we predict peak demand 24-48 hours ahead?
- How accurate are short-term forecasts?
- Which model performs best for short-term forecasts?
- Which model handles seasonality better?
- What are the trade-offs between models?
The Dataset
The dataset contains hourly energy consumption readings from GridSmart's smart meter network, spanning from January 2022 to January 2025. It includes weather data and temporal features essential for time series analysis.
Dataset Schema
| Column | Type | Description |
|---|---|---|
| date | Datetime | Date of the reading (YYYY-MM-DD) |
| consumption_kwh | Float | Energy consumption in kilowatt-hours |
| temperature_c | Float | Ambient temperature in Celsius |
| humidity_pct | Integer | Relative humidity percentage |
| is_weekend | Integer | Weekend indicator (0=Weekday, 1=Weekend) |
| is_holiday | Integer | Public holiday indicator (0=No, 1=Yes) |
| hour | Integer | Hour of the day (0-23) |
| day_of_week | Integer | Day of week (0=Monday, 6=Sunday) |
| month | Integer | Month of the year (1-12) |
| year | Integer | Year of the reading |
Project Requirements
Complete the following steps to build your time series forecasting system. Each step builds upon the previous one, culminating in a comprehensive forecasting solution.
Project Setup and Data Loading
Create project structure with data/, notebooks/, and reports/ folders. Load energy_consumption.csv with proper datetime parsing. Set datetime as index and verify data types.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from prophet import Prophet
# Load the data; combine the daily `date` column with `hour` to build a
# full hourly timestamp before setting the index
df = pd.read_csv('data/energy_consumption.csv', parse_dates=['date'])
df['datetime'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
df.set_index('datetime', inplace=True)
print(f"Dataset shape: {df.shape}")
df.head()
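Before moving on, it is worth sanity-checking the loaded frame. A minimal sketch, using a small synthetic frame in place of energy_consumption.csv:

```python
import numpy as np
import pandas as pd

# Synthetic hourly frame standing in for the loaded energy_consumption.csv
idx = pd.date_range('2022-01-01', periods=100, freq='h')
df = pd.DataFrame(
    {'consumption_kwh': np.random.default_rng(2).normal(100, 10, 100)},
    index=idx,
)

# Verify the index is a proper DatetimeIndex, then check frequency and gaps
assert isinstance(df.index, pd.DatetimeIndex)
inferred = pd.infer_freq(df.index)
print(f"Inferred frequency: {inferred}")
print(f"Missing values per column:\n{df.isna().sum()}")
```

If `pd.infer_freq` returns `None`, the series has gaps or duplicates that should be resolved before decomposition and modeling.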
Exploratory Time Series Analysis
- Plot the complete time series to visualize patterns
- Analyze hourly, daily, and monthly consumption patterns
- Identify correlation between consumption and temperature
- Compare weekday vs weekend consumption profiles
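To make the exploration concrete, here is a minimal sketch of the pattern and correlation analyses; the synthetic frame below stands in for the real dataset, with column names following the schema above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly dataset (column names match the schema)
rng = np.random.default_rng(42)
idx = pd.date_range('2022-01-01', periods=24 * 60, freq='h')
df = pd.DataFrame({
    'consumption_kwh': 100 + 30 * np.sin(2 * np.pi * idx.hour / 24)
                       + rng.normal(0, 5, len(idx)),
    'temperature_c': 25 + 8 * np.sin(2 * np.pi * idx.hour / 24)
                     + rng.normal(0, 2, len(idx)),
    'is_weekend': (idx.dayofweek >= 5).astype(int),
}, index=idx)

# Average consumption by hour of day
hourly_profile = df['consumption_kwh'].groupby(df.index.hour).mean()

# Consumption-temperature correlation
corr = df['consumption_kwh'].corr(df['temperature_c'])

# Weekday vs weekend mean consumption
profiles = df.groupby('is_weekend')['consumption_kwh'].mean()
print(f"Correlation with temperature: {corr:.2f}")
```

The same groupby patterns apply directly to the real data once it is loaded.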
Stationarity Testing
- Perform Augmented Dickey-Fuller (ADF) test
- Perform KPSS test for stationarity confirmation
- Apply differencing if series is non-stationary
- Document transformation steps needed
from statsmodels.tsa.stattools import adfuller, kpss
# Aggregate to daily before testing (daily_data is reused below)
daily_data = df['consumption_kwh'].resample('D').sum()
# ADF test (null: non-stationary) and KPSS test (null: stationary)
adf_result = adfuller(daily_data.dropna())
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value: {adf_result[1]:.4f}")
kpss_result = kpss(daily_data.dropna(), regression='c', nlags='auto')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"p-value: {kpss_result[1]:.4f}")
Time Series Decomposition
- Perform additive decomposition on daily aggregated data
- Perform multiplicative decomposition for comparison
- Extract and analyze trend, seasonal, and residual components
- Determine which decomposition type fits better
ARIMA/SARIMA Model Development
- Plot ACF and PACF to determine p, d, q parameters
- Fit ARIMA model with selected parameters
- Implement SARIMA to capture seasonal patterns
- Perform model diagnostics (residual analysis)
Prophet Model Development
- Prepare data in Prophet format (ds, y columns)
- Configure daily and weekly seasonality
- Add holiday effects for Indian holidays
- Tune changepoint parameters for trend flexibility
Model Evaluation and Comparison
- Split data into train (80%) and test (20%) sets
- Calculate MAE, RMSE, and MAPE for both models
- Perform time series cross-validation
- Compare model performance and select best model
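Because observations are ordered in time, the split must be chronological rather than random. A minimal sketch with a synthetic daily series:

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the aggregated consumption data
daily = pd.Series(
    np.linspace(900, 1100, 365),
    index=pd.date_range('2024-01-01', periods=365, freq='D'),
)

# Chronological 80/20 split: never shuffle time series data; the test
# set must come strictly after the training set
split_point = int(len(daily) * 0.8)
train_data = daily.iloc[:split_point]
test_data = daily.iloc[split_point:]

print(f"Train: {len(train_data)} days, "
      f"ending {train_data.index.max().date()}")
print(f"Test:  {len(test_data)} days, "
      f"starting {test_data.index.min().date()}")
```

Shuffled splits leak future information into training and inflate accuracy metrics.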
Forecasting and Business Insights
- Generate 7-day and 30-day forecasts with confidence intervals
- Visualize forecasts against actual values
- Identify peak consumption periods
- Provide actionable business recommendations
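Peak periods can be located by ranking the mean load per hour of day. A sketch on synthetic data with an artificial evening peak standing in for real readings:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with an artificial 18:00-20:00 peak
rng = np.random.default_rng(0)
idx = pd.date_range('2024-06-01', periods=24 * 30, freq='h')
load = 80 + 40 * np.isin(idx.hour, [18, 19, 20]) + rng.normal(0, 3, len(idx))
series = pd.Series(load, index=idx)

# Mean load per hour of day; the top-ranked hours are the peak periods
hourly_mean = series.groupby(series.index.hour).mean()
peak_hours = hourly_mean.nlargest(3).index.tolist()
print(f"Peak consumption hours: {sorted(peak_hours)}")
```

On the real data, the same ranking feeds directly into capacity-planning recommendations.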
Time Series Decomposition
Decomposition separates a time series into its fundamental components: trend, seasonality, and residuals. This helps understand underlying patterns and improves forecasting accuracy.
Additive Model
Use when seasonal variation is constant over time:
Y(t) = Trend + Seasonal + Residual
from statsmodels.tsa.seasonal import seasonal_decompose
# Aggregate to daily for decomposition
daily_data = df['consumption_kwh'].resample('D').sum()
# Additive decomposition
result_add = seasonal_decompose(
    daily_data,
    model='additive',
    period=7  # weekly seasonality
)
result_add.plot()
plt.tight_layout()
plt.show()
Multiplicative Model
Use when seasonal variation changes proportionally with the trend:
Y(t) = Trend × Seasonal × Residual
# Multiplicative decomposition
result_mult = seasonal_decompose(
    daily_data,
    model='multiplicative',
    period=7
)
result_mult.plot()
plt.tight_layout()
plt.show()
# Compare residuals (note: multiplicative residuals are ratios near 1,
# so normalize to a common scale before comparing raw variances)
add_var = result_add.resid.var()
mult_var = result_mult.resid.var()
print(f"Additive residual variance: {add_var:.2f}")
print(f"Multiplicative residual variance: {mult_var:.4f}")
Forecasting Models
Implement both ARIMA and Prophet models to compare their forecasting performance on energy consumption data. Each has strengths for different forecasting scenarios.
- AR (p): autoregressive order, the number of lag observations included
- I (d): integration order, the degree of differencing applied
- MA (q): moving average order, the size of the moving average window
ARIMA / SARIMA Model
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Plot ACF and PACF to determine p and q
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(daily_data.dropna(), lags=30, ax=axes[0])
plot_pacf(daily_data.dropna(), lags=30, ax=axes[1])
plt.tight_layout()
plt.show()
# Fit SARIMA model with seasonal component
model = SARIMAX(
    train_data,  # chronological 80% training split
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 7),  # weekly seasonality
)
sarima_result = model.fit(disp=False)
# Model diagnostics
sarima_result.plot_diagnostics(figsize=(12, 8))
plt.show()
# Forecast
forecast = sarima_result.forecast(steps=30)
conf_int = sarima_result.get_forecast(steps=30).conf_int()
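The forecast and its confidence interval are typically drawn together with `fill_between`. A plotting sketch using hypothetical values in place of the fitted model's `forecast` and `conf_int`:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical 30-step forecast and symmetric confidence band, standing in
# for the mean forecast and conf_int returned by the fitted SARIMA model
idx = pd.date_range('2025-01-01', periods=30, freq='D')
forecast_mean = pd.Series(1000 + 2.0 * np.arange(30), index=idx)
lower = forecast_mean - 50
upper = forecast_mean + 50

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(idx, forecast_mean.values, label='SARIMA forecast')
ax.fill_between(idx, lower, upper, alpha=0.3, label='95% confidence interval')
ax.set_xlabel('Date')
ax.set_ylabel('Consumption (kWh)')
ax.set_title('30-Day Energy Consumption Forecast')
ax.legend()
plt.tight_layout()
```

With the real model, `lower` and `upper` come from the two columns of `conf_int`, and the held-out actuals can be overlaid for comparison.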
Facebook Prophet Model
Prophet is designed for business forecasting with automatic handling of seasonality, holidays, and trend changes. It is particularly robust to missing data and outliers.
from prophet import Prophet
# Prepare data for Prophet (requires 'ds' and 'y' columns)
prophet_df = daily_data.reset_index()
prophet_df.columns = ['ds', 'y']
# Create and configure Prophet model
model = Prophet(
    weekly_seasonality=True,
    yearly_seasonality=True,
    changepoint_prior_scale=0.05
)
# Add Indian holidays
model.add_country_holidays(country_name='IN')
model.fit(prophet_df)
# Create future dataframe and forecast
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
# Plot forecast and components
model.plot(forecast)
model.plot_components(forecast)
plt.show()
Model Comparison
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
def calculate_mape(actual, predicted):
    """Calculate Mean Absolute Percentage Error (assumes no zero actuals)."""
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def evaluate_model(actual, predicted, model_name):
    """Evaluate forecasting model performance."""
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mape = calculate_mape(actual, predicted)
    return {'Model': model_name, 'MAE': mae, 'RMSE': rmse, 'MAPE': mape}
# Compare models
comparison_df = pd.DataFrame([
evaluate_model(test_data, sarima_forecast, 'SARIMA'),
evaluate_model(test_data, prophet_forecast, 'Prophet')
])
print(comparison_df)
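For the cross-validation requirement, scikit-learn's `TimeSeriesSplit` provides expanding-window folds in which each test period strictly follows its training window. A sketch using a naive last-value forecast as a stand-in for the fitted models:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Synthetic daily series; a naive last-value forecast stands in for the
# fitted SARIMA/Prophet models in this sketch
rng = np.random.default_rng(1)
y = np.linspace(900, 1100, 200) + rng.normal(0, 10, 200)

tscv = TimeSeriesSplit(n_splits=5)
fold_mae = []
for train_idx, test_idx in tscv.split(y):
    # Each fold trains on an expanding window and tests on the period after it
    pred = np.full(len(test_idx), y[train_idx][-1])
    fold_mae.append(float(np.mean(np.abs(y[test_idx] - pred))))

print(f"MAE per fold: {np.round(fold_mae, 1)}")
print(f"Mean CV MAE: {np.mean(fold_mae):.1f}")
```

Averaging the metric across folds gives a more robust comparison than a single train/test split; Prophet also ships its own `cross_validation` utility for the same purpose.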
Required Visualizations
Create the following visualizations to demonstrate your time series analysis and forecasting capabilities. Each visualization should be properly labeled.
1. Complete Time Series: full consumption data over the entire period
2. Decomposition Plot: trend, seasonal, and residual components
3. ACF and PACF: autocorrelation and partial autocorrelation plots
4. Hourly Consumption Pattern: hour vs day-of-week heatmap
5. Monthly Distribution: consumption by month showing seasonality
6. Temperature vs Consumption: relationship between temperature and usage
7. SARIMA Residuals: 4-panel diagnostic plot for validation
8. SARIMA Forecast: forecast with confidence intervals
9. Prophet Forecast Plot: Prophet forecast with uncertainty intervals
10. Prophet Components: trend, weekly, and yearly seasonality
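The hourly consumption heatmap reduces to a pivot table before plotting. A sketch on synthetic data standing in for the real readings:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for the hour vs day-of-week heatmap pivot
rng = np.random.default_rng(3)
idx = pd.date_range('2024-01-01', periods=24 * 28, freq='h')
df = pd.DataFrame({'consumption_kwh': rng.normal(100, 10, len(idx))}, index=idx)

# Pivot: rows = hour of day, columns = day of week, values = mean consumption
heatmap_data = df.pivot_table(
    values='consumption_kwh',
    index=df.index.hour,
    columns=df.index.dayofweek,
    aggfunc='mean',
)
print(heatmap_data.shape)  # 24 hours x 7 days
```

The resulting 24x7 table can be passed straight to `seaborn.heatmap` for the required chart.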
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
time-series-forecasting
Required Project Structure
time-series-forecasting/
├── data/
│ └── energy_consumption.csv # The dataset
├── notebooks/
│ └── time_series_analysis.ipynb # Your main analysis notebook
├── reports/
│ └── (visualizations) # Saved plots
├── requirements.txt # Python dependencies
└── README.md # Project documentation
README.md Must Include:
- Your full name and submission date
- Project overview and business context
- Key findings (5-7 bullet points)
- Technologies used (Python, statsmodels, Prophet, etc.)
- Instructions to run the notebook
- Screenshots of at least 3 visualizations
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
statsmodels>=0.14.0
prophet>=1.1.0
jupyter>=1.0.0
Do Include
- Clear markdown sections with headers
- All code cells executed with outputs
- All 10 required visualizations
- Model comparison table with metrics
- Business insights and recommendations
- README with screenshots
Do Not Include
- Virtual environment folders (venv, .env)
- Any .pyc or __pycache__ files
- Unexecuted notebooks
- Hardcoded absolute file paths
- API keys or credentials
Submit your GitHub username when you are done; your repository will be verified automatically.
Grading Rubric
Your project will be graded on the following criteria. Total: 600 points.
| Criteria | Points | Description |
|---|---|---|
| Data Loading & EDA | 110 | Proper datetime parsing, missing values, pattern identification |
| Stationarity & Decomposition | 120 | ADF/KPSS tests, additive and multiplicative decomposition |
| ARIMA/SARIMA Model | 80 | Parameter selection, fitting, and diagnostics |
| Prophet Model | 80 | Configuration, seasonality, and holiday effects |
| Model Evaluation | 60 | MAE, RMSE, MAPE calculation and comparison |
| Visualizations | 70 | All 10 required charts with proper labels |
| Code & Documentation | 40 | Clean code, README, requirements.txt |
| Business Insights | 40 | Actionable recommendations based on analysis |
| Total | 600 | |
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.