Capstone Project 6

Task Scheduler System

Build a comprehensive task scheduling application using Java data structures and algorithms. You will implement priority-based scheduling using heaps, handle task dependencies with topological sorting, optimize resource allocation, and create efficient schedulers for real-time task management.

15-20 hours
Advanced
500 Points
What You Will Build
  • Priority Queue & Heap System
  • Job Scheduling Algorithms
  • Load Balancer Implementation
  • Resource Optimizer
  • Real-time Task Manager
Contents
01

Project Overview

This capstone project brings together everything you have learned in the Java DSA course. You will work with real task scheduling data containing 100 system tasks with priorities, dependencies, deadlines, and resource requirements. Your goal is to build a professional task scheduling system that applies data structures and algorithms to solve classic scheduling problems and optimize task execution order.

Skills Applied: This project tests your proficiency in Priority Queues & Heaps (min-heap, max-heap implementations), Job Scheduling Algorithms (SJF, Round Robin, Priority Scheduling), Load Balancing (resource distribution, optimization), and Graph Algorithms (topological sort for dependencies).
Priority Queue & Heap

Min/Max heap implementation from scratch

Job Scheduling

SJF, Round Robin, Priority algorithms

Load Balancing

Distribute tasks across resources efficiently

Optimization

Minimize wait time, maximize throughput

Learning Objectives

Technical Skills
  • Build min-heap and max-heap from scratch
  • Implement classic job scheduling algorithms (SJF, RR, Priority)
  • Design load balancing strategies for distributed systems
  • Optimize for throughput, latency, and resource utilization
  • Apply heaps to solve real scheduling problems
Problem-Solving Skills
  • Translate OS scheduling concepts to code
  • Analyze time and space complexity trade-offs
  • Compare scheduling algorithm performance metrics
  • Handle priority inversion and starvation
  • Test algorithms with real Google cluster data
02

Problem Scenario

CloudOps Systems

You have been hired as a Systems Engineer at CloudOps Systems, a cloud infrastructure company that manages thousands of automated tasks across distributed servers. The company runs critical operations including database backups, security scans, data processing pipelines, and deployment workflows that must be scheduled efficiently while respecting priorities, dependencies, and resource constraints.

"We process hundreds of tasks daily across multiple servers. We need a scheduling system that can handle task priorities, respect dependencies between tasks, meet strict deadlines, and allocate resources efficiently. Can you build a scheduler that solves these classic computer science problems?"

Alex Rivera, Chief Operations Officer

Problems to Solve

Priority Scheduling
  • Process tasks by priority (highest first)
  • Use min-heap for earliest deadline first
  • Implement max-heap for highest priority first
  • Support dynamic priority updates
Dependency Resolution
  • Build dependency graph from task data
  • Topological sort for valid execution order
  • Detect circular dependencies (cycles)
  • Find critical path for completion time
Deadline Scheduling
  • Schedule tasks to meet deadlines
  • Maximize number of tasks completed on time
  • Minimize total lateness
  • Handle impossible schedules gracefully
Resource Allocation
  • Assign tasks to servers with limited resources
  • Interval scheduling for non-overlapping tasks
  • Multi-resource constraint satisfaction
  • Load balancing across servers
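The "non-overlapping tasks" bullet above is the classic interval scheduling maximization problem, solved by an earliest-finish-time greedy. A minimal sketch using minute offsets for time; the `Interval` record and class name are illustrative, not part of the required API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntervalScheduler {

    public record Interval(String taskId, int start, int end) {}

    // Earliest-finish-time greedy: sort by end time, then take each
    // interval that starts at or after the end of the last one chosen.
    public static List<Interval> maxNonOverlapping(List<Interval> tasks) {
        List<Interval> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingInt(Interval::end));

        List<Interval> chosen = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Interval iv : sorted) {
            if (iv.start() >= lastEnd) {
                chosen.add(iv);
                lastEnd = iv.end();
            }
        }
        return chosen;
    }
}
```

Finishing earliest leaves the most room for remaining tasks, which is why this greedy choice is optimal for maximizing the count of non-overlapping intervals on a single resource.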
Pro Tip: Think in terms of data structure selection! Each scheduling problem maps to a specific tool: heaps for priority, graphs for dependencies, greedy algorithms for deadlines, and interval-based structures for resource allocation.
03

The Dataset

You will work with a comprehensive task scheduling dataset containing 100 system tasks. Download the CSV file containing task data for analysis:

Dataset Download

Download the task scheduler dataset. We provide a curated sample, or you can use the full Google Cluster Dataset from Kaggle for extended analysis.

Google cluster traces with real job scheduling data from production systems
Original Data Source

This project uses the Google Cluster Data from Kaggle - real traces from Google's production cluster containing millions of job scheduling events. It includes task priorities, resource requirements, scheduling constraints, and execution metrics from actual Google data centers.

Dataset Info: 29-day trace | 12,500+ machines | Millions of tasks | Includes: CPU/Memory requests, priorities, constraints, scheduling events
Dataset Schema

Each row represents a system task with the following columns:

Column | Type | Description
task_id | String | Unique task identifier (e.g., T001, T002)
task_name | String | Descriptive name of the task
priority | Integer | Priority level 1-5 (5 = highest priority)
duration_minutes | Integer | Time required to complete the task
deadline | DateTime | Task must complete by this time
arrival_time | DateTime | When the task becomes available
dependencies | String | Comma-separated task IDs that must complete first
category | String | Task category (System, Security, Analytics, etc.)
status | String | Current status (Pending, Running, Completed)
resource_required | String | Server/resource needed (Server A, B, C, GPU)
cpu_cores | Integer | CPU cores required (1-16)
memory_mb | Integer | Memory required in MB (64-16384)

Tasks are distributed across the following categories:

  • System (15 tasks)
  • Security (12 tasks)
  • Analytics (10 tasks)
  • Deployment (8 tasks)
  • Monitoring (8 tasks)
  • Database (7 tasks)
  • Network (6 tasks)
  • Testing (8 tasks)
  • Performance (6 tasks)
  • Communication (4 tasks)
  • Compliance (5 tasks)
  • Others (11 tasks)

For classic scheduling problems, focus on these key fields:

// Example: Task representation for scheduling
class Task {
    String taskId;       // Unique identifier
    int priority;        // 1-5 (use for heap ordering)
    int duration;        // Duration in minutes
    LocalDateTime deadline;    // Must complete by
    List<String> dependencies; // Tasks that must finish first
    String resource;     // Required server/resource
}

// This becomes input for:
// - Priority Queue Scheduling (max-heap by priority)
// - Earliest Deadline First (min-heap by deadline)
// - Topological Sort (dependency graph)
// - Interval Scheduling (resource allocation)
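Before any algorithm runs, the CSV rows have to become task objects. A minimal loading sketch, assuming the column order from the schema above; `TaskLoader` and `TaskRow` are illustrative names. Note that because the dependency field is itself comma-separated, the real file will quote it, so a production loader should use a quote-aware CSV library; this sketch assumes a simplified semicolon-delimited dependency list:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaskLoader {

    // Minimal row type mirroring the key scheduling fields
    public record TaskRow(String taskId, String name, int priority,
                          int durationMinutes, String deadline,
                          List<String> dependencies, String resource) {}

    // Parse one line using the schema's column order: task_id, task_name,
    // priority, duration_minutes, deadline, arrival_time, dependencies,
    // category, status, resource_required, cpu_cores, memory_mb
    public static TaskRow parseLine(String line) {
        String[] f = line.split(",", -1);
        List<String> deps = f[6].isBlank() || f[6].equals("-")
                ? List.of()
                : Arrays.asList(f[6].split(";"));
        return new TaskRow(f[0].trim(), f[1].trim(),
                Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()),
                f[4].trim(), deps, f[9].trim());
    }

    public static List<TaskRow> load(Path csv) throws IOException {
        List<TaskRow> tasks = new ArrayList<>();
        List<String> lines = Files.readAllLines(csv);
        for (int i = 1; i < lines.size(); i++) { // skip header row
            if (!lines.get(i).isBlank()) tasks.add(parseLine(lines.get(i)));
        }
        return tasks;
    }
}
```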
Sample Data Preview

Here is what the data looks like from task_scheduler.csv:

Task ID | Task Name | Priority | Duration | Deadline | Dependencies | Resource
T001 | Database Backup | 5 | 45 min | 2024-01-15 18:00 | - | Server A
T003 | Data Processing | 4 | 120 min | 2024-01-15 14:00 | T001 | Server A
T015 | ML Model Training | 5 | 240 min | 2024-01-16 08:00 | T010 | GPU Server
Dataset Stats: 100 tasks | 5 priority levels | 4 servers | 12 categories | Complex dependency chains
For Algorithms: Build dependency graph, use priority field for heaps, deadline for scheduling
Data Quality Note: The dataset contains realistic scheduling challenges: circular dependency detection needed, impossible deadline scenarios, resource conflicts, and varying task durations. Your algorithms should handle these gracefully.
04

Project Requirements

Your project must include all of the following components. Structure your code with clear organization, documentation, and comprehensive testing.

1
Scheduling Algorithms (5 Required)

Implement the following classic scheduling algorithms:

  • Priority Queue Scheduler: Process tasks by priority using max-heap
  • Earliest Deadline First (EDF): Schedule by deadline using min-heap
  • Topological Sort Scheduler: Respect task dependencies
  • Deadline Scheduling: Maximize tasks completed on time (greedy)
  • Task Scheduling with Cooldown: LeetCode 621 - CPU interval scheduling
Deliverable: Java classes implementing each algorithm with optimal time complexity.
2
Data Structures Implementation

Build the following custom data structures:

  • TaskPriorityQueue: Custom heap with update and remove operations
  • DependencyGraph: Directed graph for task dependencies
  • TaskRegistry: Hash map with O(1) task lookup by ID
  • ResourceManager: Track server availability and allocation
Deliverable: Custom data structure classes with proper encapsulation and documentation.
3
Testing and Validation

Comprehensive test coverage:

  • Unit Tests: JUnit tests for each algorithm and data structure
  • Edge Cases: Empty task list, single task, circular dependencies, impossible deadlines
  • Performance Tests: Verify time complexity with large inputs
  • Integration Tests: Test with actual CSV data
Deliverable: Test suite with 90%+ code coverage and documented test cases.
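The graded suite should use JUnit as required above; as a sanity check before wiring JUnit up, the same edge cases can be exercised with a dependency-free sketch (the `Task` record and `scheduleByPriority` stand-in here are illustrative, not the classes you will submit):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SchedulerEdgeCaseTests {

    record Task(String id, int priority) {}

    // Minimal scheduler under test (stand-in for the real class)
    static List<Task> scheduleByPriority(List<Task> tasks) {
        PriorityQueue<Task> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b.priority(), a.priority()));
        heap.addAll(tasks);
        List<Task> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(heap.poll());
        return out;
    }

    static void check(boolean cond, String msg) {
        if (!cond) throw new AssertionError(msg);
    }

    public static void main(String[] args) {
        // Edge case: empty task list
        check(scheduleByPriority(List.of()).isEmpty(), "empty list");
        // Edge case: single task
        check(scheduleByPriority(List.of(new Task("T1", 3))).size() == 1, "single task");
        // Ties: equal priorities must all appear, highest priority first
        List<Task> out = scheduleByPriority(
            List.of(new Task("T1", 2), new Task("T2", 2), new Task("T3", 5)));
        check(out.get(0).id().equals("T3"), "highest first");
        check(out.size() == 3, "no task lost");
        System.out.println("All edge-case checks passed");
    }
}
```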
4
Documentation and Analysis

Technical documentation:

  • Algorithm Explanations: Describe approach for each scheduling problem
  • Complexity Analysis: Time and space complexity for each solution
  • Trade-offs: Compare different scheduling strategies
  • Results: Scheduling outcomes on provided dataset
Deliverable: README.md with comprehensive documentation and analysis.
05

Algorithm Specifications

Implement the following algorithms with optimal time and space complexity. Each algorithm should be thoroughly tested and documented.

Priority Queue Scheduler (Medium)

Process tasks in order of priority using a max-heap.

// Example:
// tasks = [(T1, p=3), (T2, p=5), (T3, p=1)]
// Output: T2 -> T1 -> T3

public List<Task> scheduleByPriority(List<Task> tasks) {
    PriorityQueue<Task> maxHeap =
        new PriorityQueue<>((a, b) -> Integer.compare(b.priority, a.priority));

    maxHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();

    while (!maxHeap.isEmpty()) {
        schedule.add(maxHeap.poll());
    }
    return schedule;
}
Time: O(n log n) Space: O(n)
Topological Sort (Medium)

Order tasks respecting dependencies (Kahn's Algorithm).

// Example:
// T1 -> T2 -> T3 (T2 depends on T1)
// Output: [T1, T2, T3]

public List<Task> topologicalSort(List<Task> tasks) {
    Map<String, List<Task>> dependents = new HashMap<>();
    Map<String, Integer> inDegree = new HashMap<>();
    Queue<Task> queue = new LinkedList<>();

    // Initialize in-degrees and reverse (dependency -> dependent) edges
    for (Task t : tasks) {
        inDegree.put(t.taskId, t.dependencies.size());
        for (String dep : t.dependencies) {
            dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(t);
        }
        if (t.dependencies.isEmpty()) queue.offer(t);
    }

    // BFS: repeatedly emit a task whose dependencies are all met
    List<Task> order = new ArrayList<>();
    while (!queue.isEmpty()) {
        Task t = queue.poll();
        order.add(t);
        for (Task next : dependents.getOrDefault(t.taskId, List.of()))
            if (inDegree.merge(next.taskId, -1, Integer::sum) == 0)
                queue.offer(next);
    }
    return order; // order.size() < tasks.size() signals a cycle
}
Time: O(V + E) Space: O(V)
Earliest Deadline First (Medium)

Schedule tasks by earliest deadline using min-heap.

// Schedule tasks to minimize missed deadlines

public List<Task> scheduleByDeadline(List<Task> tasks) {
    PriorityQueue<Task> minHeap = 
        new PriorityQueue<>(
            Comparator.comparing(t -> t.deadline));
    
    minHeap.addAll(tasks);
    List<Task> schedule = new ArrayList<>();
    
    while (!minHeap.isEmpty()) {
        schedule.add(minHeap.poll());
    }
    return schedule;
}
Time: O(n log n) Space: O(n)
Task Scheduler (Hard - LeetCode 621)

Schedule tasks with cooldown constraint.

// Given tasks and cooldown n, find minimum time
// tasks = ["A","A","A","B","B","B"], n = 2
// Output: 8 (A -> B -> idle -> A -> B -> idle -> A -> B)

public int leastInterval(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char c : tasks) freq[c - 'A']++;
    
    Arrays.sort(freq);
    int maxFreq = freq[25];
    int idleSlots = (maxFreq - 1) * n;
    
    for (int i = 24; i >= 0; i--) {
        idleSlots -= Math.min(freq[i], maxFreq - 1);
    }
    return tasks.length + Math.max(0, idleSlots);
}
Time: O(n) Space: O(1)
Cycle Detection (Medium)

Detect circular dependencies in task graph.

// Return true if circular dependency exists

public boolean hasCycle(List<Task> tasks) {
    Map<String, Task> taskMap = new HashMap<>();
    for (Task t : tasks) taskMap.put(t.taskId, t);

    Map<String, Integer> state = new HashMap<>();
    // 0 = unvisited, 1 = visiting, 2 = visited
    for (Task t : tasks) {
        if (dfs(t, state, taskMap)) {
            return true; // Cycle found
        }
    }
    return false;
}

private boolean dfs(Task t, Map<String, Integer> state,
                    Map<String, Task> taskMap) {
    int s = state.getOrDefault(t.taskId, 0);
    if (s == 1) return true;  // Back edge: cycle
    if (s == 2) return false; // Already fully explored
    state.put(t.taskId, 1);   // Mark as visiting
    for (String depId : t.dependencies) {
        Task dep = taskMap.get(depId);
        if (dep != null && dfs(dep, state, taskMap)) return true;
    }
    state.put(t.taskId, 2);   // Mark as fully explored
    return false;
}
Time: O(V + E) Space: O(V)
Deadline Scheduling (Hard)

Maximize tasks completed before deadlines (greedy).

// Maximize number of tasks completed on time

// Moore-Hodgson greedy: maximize the number of on-time tasks.
// Assumes t.deadline here is an int offset in minutes from the
// schedule start (convert LocalDateTime deadlines before calling).

public int maxTasksOnTime(List<Task> tasks) {
    // Consider tasks in deadline order
    tasks.sort(Comparator.comparingInt(t -> t.deadline));

    // Max-heap by duration: when we run late, evict the longest task
    PriorityQueue<Task> scheduled =
        new PriorityQueue<>((a, b) -> Integer.compare(b.duration, a.duration));
    int currentTime = 0;

    for (Task t : tasks) {
        scheduled.offer(t);
        currentTime += t.duration;
        if (currentTime > t.deadline) {
            currentTime -= scheduled.poll().duration;
        }
    }
    return scheduled.size();
}
Time: O(n log n) Space: O(n)
06

Data Structure Specifications

Implement the following data structures to support efficient scheduling operations. Focus on optimal time complexity for the core operations.

TaskPriorityQueue

Custom priority queue with update and remove by ID.

  • insert(Task t) - O(log n)
  • extractMax() - O(log n)
  • updatePriority(id, newPriority) - O(log n)
  • remove(id) - O(log n)
  • peek() - O(1)
Hint: Use a HashMap to track positions in the heap array for O(1) lookup.
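Following that hint, one possible shape for the indexed heap: an array-backed max-heap plus a `HashMap` from task id to array index, kept in sync on every swap so update and remove can locate an entry in O(1) and restore the heap property in O(log n). A sketch over `(id, priority)` pairs rather than full `Task` objects:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskPriorityQueue {
    private record Entry(String id, int priority) {}

    private final List<Entry> heap = new ArrayList<>();
    private final Map<String, Integer> pos = new HashMap<>(); // id -> heap index

    public void insert(String id, int priority) {
        heap.add(new Entry(id, priority));
        pos.put(id, heap.size() - 1);
        siftUp(heap.size() - 1);
    }

    public String peek() { return heap.isEmpty() ? null : heap.get(0).id(); }

    public String extractMax() {
        String top = peek();
        if (top != null) remove(top);
        return top;
    }

    public void updatePriority(String id, int newPriority) {
        int i = pos.get(id);
        int old = heap.get(i).priority();
        heap.set(i, new Entry(id, newPriority));
        if (newPriority > old) siftUp(i); else siftDown(i);
    }

    public void remove(String id) {
        int i = pos.remove(id);
        Entry last = heap.remove(heap.size() - 1);
        if (i < heap.size()) {   // removed entry wasn't the last slot
            heap.set(i, last);
            pos.put(last.id(), i);
            siftUp(i);           // the moved entry may need to go either way
            siftDown(i);
        }
    }

    private void siftUp(int i) {
        while (i > 0) {
            int p = (i - 1) / 2;
            if (heap.get(i).priority() <= heap.get(p).priority()) break;
            swap(i, p);
            i = p;
        }
    }

    private void siftDown(int i) {
        for (;;) {
            int l = 2 * i + 1, r = l + 1, big = i;
            if (l < heap.size() && heap.get(l).priority() > heap.get(big).priority()) big = l;
            if (r < heap.size() && heap.get(r).priority() > heap.get(big).priority()) big = r;
            if (big == i) break;
            swap(i, big);
            i = big;
        }
    }

    // Every swap updates the position map, which is what keeps lookups O(1)
    private void swap(int a, int b) {
        Entry ea = heap.get(a), eb = heap.get(b);
        heap.set(a, eb);
        heap.set(b, ea);
        pos.put(eb.id(), a);
        pos.put(ea.id(), b);
    }
}
```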
DependencyGraph

Directed graph for task dependency management.

  • addTask(Task t) - O(1)
  • addDependency(from, to) - O(1)
  • getDependencies(id) - O(1)
  • getDependents(id) - O(1)
  • hasCycle() - O(V + E)
  • topologicalOrder() - O(V + E)
Hint: Use adjacency list with both forward and reverse edges.
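A sketch of that hint: adjacency sets in both directions, so "what does T depend on" and "what depends on T" are each a single map lookup, with Kahn's algorithm for ordering and cycle detection. Task ids stand in for full `Task` objects here:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyGraph {
    private final Map<String, Set<String>> dependsOn = new HashMap<>();  // task -> prerequisites
    private final Map<String, Set<String>> dependedBy = new HashMap<>(); // task -> dependents

    public void addTask(String id) {
        dependsOn.putIfAbsent(id, new HashSet<>());
        dependedBy.putIfAbsent(id, new HashSet<>());
    }

    // "from" must finish before "to" can start
    public void addDependency(String from, String to) {
        addTask(from);
        addTask(to);
        dependsOn.get(to).add(from);
        dependedBy.get(from).add(to);
    }

    public Set<String> getDependencies(String id) {
        return dependsOn.getOrDefault(id, Set.of());
    }

    public Set<String> getDependents(String id) {
        return dependedBy.getOrDefault(id, Set.of());
    }

    // Kahn's algorithm: O(V + E)
    public List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new HashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (String id : dependsOn.keySet()) {
            inDegree.put(id, dependsOn.get(id).size());
            if (dependsOn.get(id).isEmpty()) ready.add(id);
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.poll();
            order.add(id);
            for (String next : getDependents(id)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        return order;
    }

    // A topological order shorter than the task count means a cycle
    public boolean hasCycle() {
        return topologicalOrder().size() < dependsOn.size();
    }
}
```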
TaskRegistry

Fast lookup table for tasks by various keys.

  • register(Task t) - O(1)
  • getById(id) - O(1)
  • getByCategory(category) - O(1)
  • getByResource(resource) - O(1)
  • getByPriority(priority) - O(1)
  • updateStatus(id, status) - O(1)
Hint: Use multiple HashMaps for different index types.
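Per the hint, the registry can keep one primary map by id plus a secondary `HashMap` per index, all maintained at registration time so every lookup is a single hash probe. A sketch with an illustrative nested `Task` record; status tracking and `updateStatus` are omitted for brevity:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskRegistry {
    public record Task(String id, String category, String resource, int priority) {}

    private final Map<String, Task> byId = new HashMap<>();
    private final Map<String, List<Task>> byCategory = new HashMap<>();
    private final Map<String, List<Task>> byResource = new HashMap<>();
    private final Map<Integer, List<Task>> byPriority = new HashMap<>();

    // Each secondary index is updated once here, keeping lookups O(1)
    public void register(Task t) {
        byId.put(t.id(), t);
        byCategory.computeIfAbsent(t.category(), k -> new ArrayList<>()).add(t);
        byResource.computeIfAbsent(t.resource(), k -> new ArrayList<>()).add(t);
        byPriority.computeIfAbsent(t.priority(), k -> new ArrayList<>()).add(t);
    }

    public Task getById(String id) { return byId.get(id); }
    public List<Task> getByCategory(String c) { return byCategory.getOrDefault(c, List.of()); }
    public List<Task> getByResource(String r) { return byResource.getOrDefault(r, List.of()); }
    public List<Task> getByPriority(int p) { return byPriority.getOrDefault(p, List.of()); }
}
```

The trade-off is extra memory and write-time work per index in exchange for O(1) reads, which suits a scheduler that queries far more often than it registers.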
ResourceManager

Track and allocate server resources to tasks.

  • addResource(Resource r) - O(1)
  • allocate(task, resource) - O(1)
  • release(task) - O(1)
  • isAvailable(resource, time) - O(log n)
  • getAvailableSlot(resource, duration) - O(log n)
Hint: Use TreeMap for time-based availability tracking.
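One way to realize the TreeMap hint: keep a per-resource booking calendar keyed by start minute, so `floorEntry`/`ceilingEntry` answer the availability check in O(log n). The class name `ResourceCalendar` and the minute-offset time model are assumptions of this sketch, and `getAvailableSlot` additionally scans past however many bookings it must skip:

```java
import java.util.Map;
import java.util.TreeMap;

public class ResourceCalendar {
    // start minute -> end minute of each existing booking (non-overlapping)
    private final TreeMap<Integer, Integer> bookings = new TreeMap<>();

    // A [start, start+duration) slot is free if it overlaps no booking:
    // the nearest booking at or before must end by `start`, and the
    // nearest booking at or after must begin at or after the slot's end.
    public boolean isAvailable(int start, int duration) {
        int end = start + duration;
        Map.Entry<Integer, Integer> before = bookings.floorEntry(start);
        if (before != null && before.getValue() > start) return false;
        Map.Entry<Integer, Integer> after = bookings.ceilingEntry(start);
        return after == null || after.getKey() >= end;
    }

    public boolean allocate(int start, int duration) {
        if (!isAvailable(start, duration)) return false;
        bookings.put(start, start + duration);
        return true;
    }

    // Earliest start >= from with room for `duration` minutes
    public int getAvailableSlot(int from, int duration) {
        int start = from;
        Map.Entry<Integer, Integer> covering = bookings.floorEntry(start);
        if (covering != null && covering.getValue() > start) start = covering.getValue();
        for (Map.Entry<Integer, Integer> b : bookings.tailMap(start).entrySet()) {
            if (b.getKey() - start >= duration) break; // the gap before this booking fits
            start = b.getValue();                      // otherwise jump past it
        }
        return start;
    }
}
```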
07

Submission Guidelines

Submit your project via GitHub. Ensure your repository is public and follows the structure below.

Repository Structure
task-scheduler-system/
├── src/
│   ├── main/java/
│   │   ├── models/
│   │   │   └── Task.java
│   │   ├── datastructures/
│   │   │   ├── TaskPriorityQueue.java
│   │   │   ├── DependencyGraph.java
│   │   │   ├── TaskRegistry.java
│   │   │   └── ResourceManager.java
│   │   ├── algorithms/
│   │   │   ├── PriorityScheduler.java
│   │   │   ├── TopologicalScheduler.java
│   │   │   ├── DeadlineScheduler.java
│   │   │   ├── TaskSchedulerWithCooldown.java
│   │   │   └── CycleDetector.java
│   │   └── Main.java
│   └── test/java/
│       ├── datastructures/
│       └── algorithms/
├── data/
│   └── task_scheduler.csv
├── docs/
│   ├── algorithm-analysis.md
│   └── complexity-report.md
├── README.md
└── pom.xml (or build.gradle)
README Requirements
  • Project overview and objectives
  • Setup instructions (compilation, dependencies)
  • Algorithm descriptions with examples
  • Time and space complexity analysis
  • Usage examples with sample outputs
  • Test cases documentation and results
  • Performance benchmarks on the dataset
Submission Checklist
  • All 5 scheduling algorithms implemented
  • All 4 data structures implemented
  • JUnit test suite with 90%+ coverage
  • Edge cases handled (cycles, empty, impossible)
  • Code properly documented with Javadoc
  • README with all required sections
  • Complexity analysis document
08

Grading Rubric

Your project will be evaluated on the following criteria. Maximum score is 500 points.

Algorithms (150 points)
  • Priority Queue Scheduler (30 pts)
  • Topological Sort Scheduler (30 pts)
  • Earliest Deadline First (30 pts)
  • Task Scheduler with Cooldown (40 pts)
  • Cycle Detection (20 pts)
Data Structures (100 points)
  • TaskPriorityQueue with update (30 pts)
  • DependencyGraph (25 pts)
  • TaskRegistry (20 pts)
  • ResourceManager (25 pts)
Testing (100 points)
  • Unit test coverage (40 pts)
  • Edge case handling (30 pts)
  • Integration tests with CSV data (30 pts)
Documentation (100 points)
  • README completeness (25 pts)
  • Code documentation/Javadoc (25 pts)
  • Complexity analysis (25 pts)
  • Algorithm explanations (25 pts)
Code Quality (50 points)
  • Clean code principles (15 pts)
  • Proper OOP design (15 pts)
  • Error handling (10 pts)
  • Project structure (10 pts)
Total: 500 points
Excellent (450-500): Outstanding implementation
Good (350-449): Meets all requirements
Needs Work (below 350): Missing key requirements