Capstone Project 5

RL Game Agent

Build intelligent game-playing agents using reinforcement learning. You will implement Q-learning from scratch, build a Deep Q-Network (DQN) with experience replay, train agents on Gymnasium (formerly OpenAI Gym) environments, and create visualizations of your agent's learning progress and gameplay.

15-20 hours
Advanced
750 Points
What You Will Build
  • Q-learning agent implementation
  • Deep Q-Network (DQN) model
  • Experience replay buffer
  • Training visualization dashboard
  • Agent gameplay recordings
Contents
01

Project Overview

Reinforcement Learning (RL) enables agents to learn optimal behaviors through trial and error. In this project, you will build agents that learn to play classic control games from the Gymnasium library (formerly OpenAI Gym). You will implement both tabular Q-learning for simple environments and Deep Q-Networks (DQN) for more complex games. Target: achieve an average reward of at least 195 on CartPole and solve FrozenLake with a success rate above 70%.

Skills Applied: This project tests your understanding of RL fundamentals (MDPs, Bellman equations, exploration vs exploitation), neural network implementation, and training optimization techniques.
Q-Learning

Tabular RL with Q-table updates and epsilon-greedy policy

Deep Q-Network

Neural network function approximation for large state spaces

Experience Replay

Memory buffer for stable training with mini-batch sampling

Training Analysis

Learning curves, reward plots, and agent visualization

Learning Objectives

Technical Skills
  • Implement Q-learning algorithm from scratch
  • Build DQN with PyTorch or TensorFlow
  • Create experience replay buffer
  • Implement target network for stable training
  • Record and visualize agent gameplay
RL Concepts
  • Understand Markov Decision Processes (MDPs)
  • Master the Bellman equation and temporal difference learning
  • Balance exploration vs exploitation with epsilon-greedy
  • Tune hyperparameters (learning rate, discount factor, epsilon decay)
  • Evaluate agent performance and convergence
02

Problem Scenario

GameMind AI

You have been hired as an AI Research Engineer at GameMind AI, a startup developing intelligent agents for game testing and autonomous systems. The company needs a proof-of-concept showing that RL agents can learn to solve control tasks. Your task is to build agents that can master classic control environments and demonstrate learning progress.

"We need agents that can learn to balance poles, navigate frozen lakes, and control pendulums. Start with tabular methods for simple environments, then scale up to deep learning for complex ones. Document the learning process with visualizations. Can you build this?"

Dr. Sarah Kim, Head of AI Research, GameMind AI

Technical Challenges to Solve

Algorithm Selection
  • When to use tabular Q-learning vs DQN?
  • How to handle continuous state spaces?
  • Trade-offs between sample efficiency and stability
  • Choosing appropriate neural network architecture
Training Stability
  • Why does vanilla DQN training diverge?
  • How does experience replay help?
  • Role of target network in stabilization
  • Epsilon decay scheduling strategies
Hyperparameter Tuning
  • Learning rate (alpha) selection
  • Discount factor (gamma) for long-term rewards
  • Epsilon schedule for exploration
  • Replay buffer size and batch size
Evaluation
  • Measuring learning progress
  • Defining "solved" for each environment
  • Averaging over multiple evaluation episodes
  • Recording and visualizing agent behavior
Pro Tip: Start with FrozenLake-v1 (discrete states) to validate your Q-learning implementation, then move to CartPole-v1 (continuous states) for DQN. Always plot learning curves to debug training issues.
03

Gymnasium Environments

You will work with Gymnasium (the maintained successor to OpenAI Gym) environments. These provide a standardized interface for training and evaluating RL agents.

Required Environments

Install Gymnasium and work with these classic control environments:

FrozenLake-v1 (Q-Learning)

Navigate a frozen lake without falling into holes. Discrete 4x4 grid with slippery ice.

  • State Space: 16 discrete states (grid positions)
  • Action Space: 4 discrete actions (up, down, left, right)
  • Reward: +1 for reaching goal, 0 otherwise
  • Solved: Average reward of at least 0.7 over 100 episodes (i.e., at least a 70% success rate)
  • Algorithm: Tabular Q-learning
CartPole-v1 (DQN)

Balance a pole on a cart by moving left or right. Classic control benchmark.

  • State Space: 4 continuous values (position, velocity, angle, angular velocity)
  • Action Space: 2 discrete actions (push left, push right)
  • Reward: +1 for each timestep pole is balanced
  • Solved: Average reward of at least 195 over 100 episodes
  • Algorithm: Deep Q-Network (DQN)
Bonus: Additional Environments

For extra credit, implement agents for these environments:

MountainCar-v0

Drive up a steep hill with momentum. Sparse reward challenge.

LunarLander-v2

Land a spacecraft on the moon. Continuous state, discrete action.

Acrobot-v1

Swing a 2-link robot above a line. Challenging control problem.

Note: Install with pip install "gymnasium[classic-control]". For video recording, also run pip install "gymnasium[other]" to pull in MoviePy support.
04

Project Requirements

Your project must include all of the following components. This is a comprehensive reinforcement learning project covering both tabular and deep RL methods.

1
Environment Setup

Set up Gymnasium environments:

  • Install Gymnasium and dependencies
  • Create environment wrappers
  • Understand state and action spaces
  • Test random agent baseline
  • Set up video recording for evaluation
Deliverable: Environment exploration notebook showing state/action spaces and random baseline performance.
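The random-agent baseline above can be sketched as a small evaluation loop. The helper below is environment-agnostic (anything exposing the Gymnasium reset/step API works); the function name evaluate_random is illustrative, and the FrozenLake usage in the comment assumes gymnasium is installed:

```python
def evaluate_random(env, episodes=100, seed=0):
    """Run a uniform-random policy and return the mean episode reward."""
    total = 0.0
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        done = False
        ep_reward = 0.0
        while not done:
            action = env.action_space.sample()   # random baseline action
            obs, reward, terminated, truncated, info = env.step(action)
            ep_reward += reward
            done = terminated or truncated       # Gymnasium splits the done flag
        total += ep_reward
    return total / episodes

# Usage (requires: pip install "gymnasium[classic-control]"):
#   import gymnasium as gym
#   env = gym.make("FrozenLake-v1")
#   print(f"Random baseline: {evaluate_random(env):.3f}")
```

On FrozenLake this baseline is typically close to zero, which makes the learning curve of a trained agent easy to appreciate.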
2
Q-Learning Implementation

Build tabular Q-learning agent:

  • Initialize Q-table with proper dimensions
  • Implement epsilon-greedy action selection
  • Apply Bellman equation for Q-value updates
  • Implement epsilon decay schedule
  • Train on FrozenLake-v1 environment

Target: Achieve a success rate above 70% on FrozenLake

Deliverable: Q-learning notebook with trained agent achieving target performance.
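The bullet points above map almost one-to-one onto a small NumPy class. This is a minimal sketch; the class and method names (QLearningAgent, act, update, end_episode) are illustrative, not prescribed by the project:

```python
import numpy as np

class QLearningAgent:
    """Tabular Q-learning with an epsilon-greedy policy (sketch)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99,
                 epsilon=1.0, epsilon_min=0.05, epsilon_decay=0.999):
        self.q = np.zeros((n_states, n_actions))  # Q-table, one row per state
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.epsilon_min = epsilon, epsilon_min
        self.epsilon_decay = epsilon_decay

    def act(self, state, rng):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next, done):
        # TD target: r + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
        target = r + (0.0 if done else self.gamma * np.max(self.q[s_next]))
        self.q[s, a] += self.alpha * (target - self.q[s, a])

    def end_episode(self):
        # Decay exploration once per episode, never below epsilon_min.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
```

The training loop then alternates act, env.step, and update for each timestep, calling end_episode after every episode.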
3
Deep Q-Network (DQN)

Build DQN with neural network:

  • Design Q-network architecture (MLP)
  • Implement experience replay buffer
  • Create target network for stability
  • Implement training loop with mini-batch sampling
  • Add soft or hard target network updates

Target: Solve CartPole-v1 (average reward of at least 195)

Deliverable: DQN notebook with trained agent solving CartPole.
4
Hyperparameter Tuning

Experiment with hyperparameters:

  • Test different learning rates (alpha)
  • Vary discount factor (gamma)
  • Compare epsilon decay schedules
  • Tune replay buffer and batch sizes
  • Document impact on learning curves
Deliverable: Hyperparameter analysis with comparison plots.
5
Training Visualization

Create comprehensive visualizations:

  • Plot episode rewards over training
  • Show moving average reward curves
  • Visualize epsilon decay
  • Plot loss curves for DQN
  • Compare Q-learning vs DQN performance
Deliverable: Training visualization dashboard with all plots.
6
Agent Visualization

Record and analyze agent gameplay:

  • Record videos of trained agents playing
  • Create before/after training comparisons
  • Visualize Q-values or policy for discrete envs
  • Document agent behavior and strategies
Deliverable: Video recordings and Q-value/policy visualizations.
05

Q-Learning Algorithm

Q-learning is a model-free, off-policy algorithm that learns the value of state-action pairs. It uses the Bellman equation to iteratively update Q-values toward optimal values.

Key Equations
Concept | Equation | Description
Q-Value Update | Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)] | Update Q-value using the temporal difference
Epsilon-Greedy | a = argmax Q(s,a) with prob (1−ε), random with prob ε | Balance exploration and exploitation
Epsilon Decay | ε = max(ε_min, ε × decay_rate) | Reduce exploration over time
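To make the Q-value update concrete, here is one hand-worked step with assumed values α = 0.5, γ = 0.9, Q(s,a) = 1.0, r = 1.0, and max Q(s',·) = 2.0:

```python
alpha, gamma = 0.5, 0.9
q_sa, reward, max_q_next = 1.0, 1.0, 2.0

td_target = reward + gamma * max_q_next   # 1.0 + 0.9 * 2.0 = 2.8
td_error = td_target - q_sa               # 2.8 - 1.0 = 1.8
q_sa_new = q_sa + alpha * td_error        # 1.0 + 0.5 * 1.8 = 1.9
print(q_sa_new)                           # 1.9
```

Note that with α = 0.5 the estimate moves only halfway toward the TD target, which is what keeps the updates stable on a stochastic environment like slippery FrozenLake.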
Recommended Hyperparameters
Parameter | Symbol | FrozenLake | Description
Learning Rate | α | 0.1 - 0.8 | How much to update Q-values
Discount Factor | γ | 0.95 - 0.99 | Importance of future rewards
Initial Epsilon | ε₀ | 1.0 | Start with full exploration
Min Epsilon | ε_min | 0.01 - 0.1 | Minimum exploration rate
Epsilon Decay | decay | 0.995 - 0.999 | Per-episode decay rate
Episodes | N | 10,000 - 50,000 | Training episodes
06

Deep Q-Network (DQN)

DQN extends Q-learning to continuous state spaces using neural networks as function approximators. Key innovations include experience replay and target networks for stable training.

DQN Architecture
Layer | Type | Size | Activation
Input | State | 4 (CartPole) | -
Hidden 1 | Dense | 64 - 128 | ReLU
Hidden 2 | Dense | 64 - 128 | ReLU
Output | Dense | 2 (actions) | Linear
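The architecture table translates into a few lines of PyTorch. The sketch below is one reasonable instantiation (hidden size 128 chosen from the 64-128 range), not a required design:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # linear output: raw Q-values
        )

    def forward(self, x):
        return self.net(x)

q_net = QNetwork()
states = torch.randn(32, 4)   # a mini-batch of CartPole states
q_values = q_net(states)      # shape (32, 2): one Q-value per action
```

The output layer stays linear because Q-values are unbounded regression targets, not probabilities.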
Key DQN Components
Experience Replay
  • Store transitions (s, a, r, s', done) in buffer
  • Sample random mini-batches for training
  • Breaks correlation between consecutive samples
  • Typical buffer size: 10,000 - 100,000
  • Batch size: 32 - 128
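A buffer with these properties can be as simple as a bounded deque plus uniform random sampling. The class below is a minimal sketch (the name ReplayBuffer and its method names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the left

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Training typically waits until len(buffer) exceeds the batch size (or some warm-up threshold) before the first gradient step.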
Target Network
  • Separate network for computing targets
  • Updated less frequently than online network
  • Prevents moving target problem
  • Hard update every N steps or soft update (τ)
  • τ = 0.001 - 0.01 for soft updates
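Both update styles fit in a few lines of PyTorch. The function names hard_update and soft_update are illustrative; the soft version follows the Polyak-averaging form described above, target ← (1 − τ)·target + τ·online:

```python
import copy
import torch
import torch.nn as nn

def hard_update(target, online):
    """Copy all online-network weights into the target network."""
    target.load_state_dict(online.state_dict())

def soft_update(target, online, tau=0.005):
    """Blend a small fraction tau of the online weights into the target."""
    with torch.no_grad():
        for t_p, o_p in zip(target.parameters(), online.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)

online = nn.Linear(4, 2)               # stand-in for the online Q-network
target = copy.deepcopy(online)         # target starts as an exact copy
soft_update(target, online, tau=0.01)  # nudge target toward online weights
```

Hard updates are called every N environment steps; soft updates are called every step with a small τ, which amounts to the same slow tracking behavior.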
DQN Hyperparameters
Parameter | CartPole Value | Description
Learning Rate | 0.001 | Adam optimizer learning rate
Discount Factor (γ) | 0.99 | Future reward discount
Replay Buffer Size | 10,000 | Maximum stored transitions
Batch Size | 64 | Mini-batch size for training
Target Update Freq | 100 steps | Steps between target network updates
Epsilon Start | 1.0 | Initial exploration rate
Epsilon End | 0.01 | Final exploration rate
Epsilon Decay | 0.995 | Per-episode decay
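Putting the components together, the DQN loss for one sampled mini-batch might look like the sketch below. The two nn.Linear layers stand in for real Q-networks, and smooth L1 (Huber) loss is a common but not mandated choice:

```python
import torch
import torch.nn.functional as F

def dqn_loss(online, target, batch, gamma=0.99):
    """TD loss between online Q(s,a) and the target-network bootstrap."""
    states, actions, rewards, next_states, dones = batch
    # Q-values of the actions that were actually taken.
    q_sa = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # no gradients flow through the target network
        max_next = target(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * max_next
    return F.smooth_l1_loss(q_sa, td_target)

online = torch.nn.Linear(4, 2)   # stand-in for the online Q-network
target = torch.nn.Linear(4, 2)   # stand-in for the target network
batch = (torch.randn(8, 4),                # states
         torch.randint(0, 2, (8,)),        # actions
         torch.ones(8),                    # rewards (+1 per CartPole step)
         torch.randn(8, 4),                # next states
         torch.zeros(8))                   # done flags (0.0 = not terminal)
loss = dqn_loss(online, target, batch)
loss.backward()                  # gradients reach only the online network
```

The (1.0 − dones) factor zeroes the bootstrap term at terminal states, matching the tabular update rule.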
07

Evaluation and Visualization

Proper evaluation and visualization are essential for understanding agent learning and debugging training issues.

Required Visualizations
Learning Curves

Episode rewards over training with moving average

Loss Curves

DQN training loss over time

Epsilon Decay

Exploration rate over episodes

Q-Table Heatmap

Visualize learned Q-values for FrozenLake

Agent Gameplay

Recorded videos of trained agents

Before/After

Compare random vs trained agent

Visualization Tips:
  • Use rolling averages (window=100) to smooth noisy reward curves
  • Record videos using Gymnasium's RecordVideo wrapper
  • For FrozenLake, create a grid showing optimal action per state
  • Include training time and hardware specs in documentation
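The rolling-average tip can be implemented with a single np.convolve call. The helper name moving_average is illustrative, and the matplotlib usage in the comment is an assumption about your plotting setup:

```python
import numpy as np

def moving_average(rewards, window=100):
    """Rolling mean over the last `window` episodes (valid region only)."""
    if len(rewards) < window:
        return np.asarray(rewards, dtype=float)  # not enough data to smooth
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Example plot (assumes matplotlib and a list `episode_rewards`):
#   import matplotlib.pyplot as plt
#   plt.plot(episode_rewards, alpha=0.3, label="raw")
#   plt.plot(moving_average(episode_rewards), label="moving avg (100)")
#   plt.legend(); plt.xlabel("episode"); plt.ylabel("reward")
```

Because mode="valid" shortens the series by window − 1 points, shift the smoothed curve accordingly if you need episode numbers to line up exactly.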
08

Submission Requirements

Create a public GitHub repository with the exact name shown below:

Required Repository Name
rl-game-agent
github.com/<your-username>/rl-game-agent
Required Project Structure
rl-game-agent/
├── notebooks/
│   ├── 01_environment_exploration.ipynb   # Env setup and baseline
│   ├── 02_q_learning.ipynb                # Q-learning implementation
│   ├── 03_dqn.ipynb                       # DQN implementation
│   └── 04_visualization.ipynb             # Training analysis
├── src/
│   ├── q_learning.py                      # Q-learning agent class
│   ├── dqn.py                             # DQN agent class
│   ├── replay_buffer.py                   # Experience replay
│   └── utils.py                           # Helper functions
├── models/
│   ├── q_table_frozenlake.npy             # Trained Q-table
│   └── dqn_cartpole.pt                    # Trained DQN weights
├── reports/
│   ├── learning_curves.png                # Reward plots
│   ├── q_table_heatmap.png                # Q-value visualization
│   └── hyperparameter_comparison.png      # Tuning results
├── videos/
│   ├── frozenlake_trained.mp4             # FrozenLake gameplay
│   └── cartpole_trained.mp4               # CartPole gameplay
├── requirements.txt                       # Python dependencies
└── README.md                              # Project documentation
README.md Required Sections
1. Project Header
  • Project title and description
  • Your full name and submission date
  • Final performance metrics
2. Environments
  • Environments used
  • State and action space descriptions
  • Solved criteria for each
3. Q-Learning Results
  • FrozenLake success rate
  • Hyperparameters used
  • Q-table visualization
4. DQN Results
  • CartPole average reward
  • Network architecture
  • Training configuration
5. Visualizations
  • Learning curves
  • GIF or video demos
  • Hyperparameter analysis
6. How to Run
  • Installation instructions
  • Training commands
  • Evaluation commands
Submit Your Project

Enter your GitHub username - we will verify your repository automatically

09

Grading Rubric

Your project will be graded on the following criteria. Total: 750 points.

Criteria | Points | Description
Environment Setup | 50 | Proper setup, baseline evaluation
Q-Learning Implementation | 150 | Correct algorithm, over 70% success on FrozenLake
DQN Implementation | 200 | Replay buffer, target network, solves CartPole
Hyperparameter Analysis | 100 | Systematic tuning with documented results
Visualizations | 125 | Learning curves, Q-table, agent videos
Documentation | 100 | README quality, code comments, reproducibility
Bonus: Extra Environment | 25 | Solve LunarLander or MountainCar
Total | 750 |
Grading Levels
Excellent
675-750

Solves all envs, excellent visualizations

Good
563-674

Meets all requirements, good docs

Satisfactory
450-562

Meets minimum requirements

Needs Work
< 450

Missing components or poor performance

Ready to Submit?

Make sure your agents are trained and videos are recorded.

Submit Your Project