Capstone Project 5

Text Editor with Undo/Redo

Build a fully functional text editor using stack-based command pattern. You will implement undo and redo functionality using dual stacks, apply memory management and optimization techniques, and create an efficient editing experience with proper state management.

20-25 hours
Advanced
500 Points
What You Will Build
  • Stack-based Command Pattern
  • Undo/Redo with Dual Stacks
  • Memory Management & Optimization
  • Operation History Tracking
  • State Snapshots & Recovery
Contents
01

Project Overview

This capstone project challenges you to build a text editor with complete undo/redo functionality. You will work with real editing operation data containing 100+ editing events including insertions, deletions, undo/redo operations, and cursor movements. Your goal is to implement stack-based command pattern and understand memory management techniques that power professional text editors.

Skills Applied: This project tests your proficiency in Stacks (dual-stack undo/redo), Command Pattern (encapsulating operations as objects), Memory Management (efficient allocation and cleanup), and Design Patterns (Command, Memento for state snapshots).
Stack-based Commands

Command pattern with dual stacks

Undo/Redo System

Full operation history management

Memory Management

Efficient memory allocation & cleanup

Optimization

Performance tuning & state compression

Learning Objectives

Technical Skills
  • Implement Rope data structure with balanced trees
  • Build Gap Buffer for efficient local editing
  • Create Piece Table for append-only editing
  • Implement undo/redo using Command and Memento patterns
  • Apply KMP/Rabin-Karp for efficient text search
Problem-Solving Skills
  • Understand why String is inefficient for editors
  • Analyze time complexity of edit operations
  • Compare trade-offs between different approaches
  • Handle edge cases in text manipulation
  • Test with real VS Code editing data
Ready to submit? Already completed the project? Submit your work now!
Submit Now
02

Problem Scenario

CodeCraft IDE

You have been hired as a Core Engineer at CodeCraft IDE, a startup building a next-generation code editor to compete with VS Code and Sublime Text. The team has discovered that using Java's String class for document storage causes severe performance issues when editing large files - every keystroke becomes slow because strings are immutable and require O(n) copy operations.

"Our prototype crashes when opening files over 1MB. Simple typing has noticeable lag. We need you to implement proper data structures like VS Code uses. Can you build a text buffer that handles millions of characters with O(log n) edits and efficient undo/redo?"

Emily Chen, Lead Architect

Problems to Solve

Text Storage
  • Efficient insert/delete at any position
  • Handle files with millions of characters
  • O(log n) operations instead of O(n)
  • Memory-efficient for large documents
Undo/Redo
  • Unlimited undo history
  • Group related operations together
  • Efficient memory usage for history
  • Support redo after undo
Find & Replace
  • Fast pattern matching in large files
  • Find all occurrences efficiently
  • Replace single or all matches
  • Support regex patterns (bonus)
Cursor & Selection
  • Track cursor position efficiently
  • Support text selection ranges
  • Handle multi-cursor editing
  • Line/column navigation
Pro Tip: VS Code uses a Piece Table for its text buffer, Vim uses a Gap Buffer, and many editors use Rope. Each has different trade-offs - your job is to implement at least one and understand when to use each!
03

The Dataset

You will work with a dataset of real text editor operations. Download the CSV file containing editing events to test your implementation:

Dataset Download

Download the text editor operations dataset. We provide a curated sample, or you can explore the full keystroke dynamics dataset on Kaggle for extended testing.

100 editing operations including INSERT, DELETE, UNDO, REDO, FIND, REPLACE across 10 documents
Original Data Source

This project uses editing patterns inspired by the Keystrokes Dataset from Kaggle. The dataset contains 136 million keystrokes capturing real typing sessions including press/release times, keystroke dynamics, and text entry behaviors from multiple participants.

Dataset Info: 136M keystrokes | Typing sessions | Press/Release timestamps | Keystroke dynamics for text editor analysis
Dataset Schema

Each row represents an editing operation with the following columns:

ColumnTypeDescription
operation_idStringUnique operation identifier (e.g., OP001)
timestampDateTimeWhen the operation occurred
operation_typeStringINSERT, DELETE, UNDO, REDO, FIND, REPLACE, etc.
positionIntegerCursor position (0-indexed)
lengthIntegerNumber of characters affected
contentStringText content (for INSERT/REPLACE operations)
document_idStringDocument being edited (e.g., DOC001)
user_idStringUser performing the operation
session_idStringEditing session identifier
undo_groupIntegerGroup ID for undo operations

Operations are distributed across the following types:

  • INSERT (45 operations)
  • DELETE (15 operations)
  • UNDO (12 operations)
  • REDO (5 operations)
  • FIND (8 operations)
  • REPLACE (6 operations)
  • CURSOR_MOVE (5 operations)
  • SELECT (2 operations)
  • CUT/COPY/PASTE (2 operations)

Parse operations and replay them to test your implementation:

  • Create an EditOperation class with type, position, length, and content fields
  • Load operations from CSV and iterate through them
  • Use switch statement to handle INSERT, DELETE, UNDO, REDO operations
  • Verify your text buffer produces correct output after replaying all operations
Sample Data Preview

Here is what the data looks like from text_editor_operations.csv:

Op IDTypePositionLengthContentDoc ID
OP001INSERT013Hello World!DOC001
OP004DELETE56-DOC001
OP006UNDO00-DOC001
Dataset Stats: 100 operations | 10 documents | 5 users | Operations: INSERT, DELETE, UNDO, REDO, FIND, REPLACE
For Testing: Replay operations in order to verify your text buffer produces correct output
Data Quality Note: The dataset contains realistic editing patterns including rapid typing, bulk deletions, frequent undo/redo, and find-replace operations. Your implementation should handle these efficiently.
04

Project Requirements

Your project must include all of the following components. Structure your code with clear organization, documentation, and comprehensive testing.

1
Text Buffer Implementation (Choose 1+)

Implement at least ONE of the following data structures:

  • Rope: Binary tree where leaves contain strings, O(log n) operations
  • Gap Buffer: Array with movable gap at cursor, O(1) local edits
  • Piece Table: Append-only with piece descriptors, used by VS Code
Deliverable: TextBuffer interface with insert(), delete(), charAt(), substring(), length() methods.
2
Undo/Redo System

Implement complete undo/redo functionality:

  • Command Pattern: Encapsulate operations as objects
  • Undo Stack: Track all operations for reversal
  • Redo Stack: Track undone operations for replay
  • Group Operations: Batch related edits (e.g., word completion)
Deliverable: UndoManager class with undo(), redo(), beginGroup(), endGroup() methods.
3
Find & Replace

Implement efficient text search:

  • KMP Algorithm: O(n+m) pattern matching
  • Find All: Return all occurrences of pattern
  • Replace: Replace single or all occurrences
  • Case Sensitivity: Support case-insensitive search
Deliverable: SearchEngine class with find(), findAll(), replace(), replaceAll() methods.
4
Documentation and Analysis

Technical documentation:

  • Data Structure Analysis: Why you chose Rope/Gap Buffer/Piece Table
  • Complexity Analysis: Time/space for each operation
  • Comparison: Trade-offs between different approaches
  • Performance Tests: Benchmarks with large files
Deliverable: README.md with comprehensive documentation and analysis.
05

Algorithm Specifications

Implement the following algorithms with optimal time and space complexity. Each algorithm should be thoroughly tested and documented.

Command Pattern (Core)

Encapsulate each edit operation as a command object.

  • Create Command interface with execute() and undo() methods
  • Implement InsertCommand, DeleteCommand classes
  • Store command state for reversal (position, text, length)
  • Each command knows how to undo itself
Time: O(1) execute/undo Space: O(n) history
Dual Stack Undo/Redo (Core)

Use two stacks to manage operation history.

  • Undo Stack: Push commands after execution
  • Redo Stack: Push commands after undo
  • Clear redo stack on new operation
  • Handle stack underflow gracefully
Time: O(1) undo/redo Space: O(ops)
Memory Management (Important)

Optimize memory usage for large documents.

  • Limit undo history size (e.g., max 100 operations)
  • Compress consecutive similar operations
  • Use StringBuilder for efficient string manipulation
  • Implement lazy evaluation where possible
Memory: Bounded Trade-off: History vs Memory
Operation Grouping (Advanced)

Group related operations for better UX.

  • Batch rapid keystrokes into single undo unit
  • Use timestamps to detect typing sessions
  • Implement beginGroup() / endGroup()
  • Undo entire group as single operation
UX: Better Complexity: Medium
06

Data Structure Specifications

Implement the following data structures to support efficient text editing operations. Choose at least ONE text buffer implementation.

Rope (Recommended)

Binary tree where leaves store string fragments.

  • insert(pos, str) - O(log n)
  • delete(pos, len) - O(log n)
  • charAt(index) - O(log n)
  • substring(start, end) - O(log n + k)
  • concat(rope1, rope2) - O(log n)
  • split(pos) - O(log n)
Best For: Frequent concatenation and splitting operations.
Gap Buffer

Array with movable gap at cursor position.

  • insert(char) - O(1) at cursor
  • delete() - O(1) at cursor
  • moveCursor(pos) - O(gap movement)
  • charAt(index) - O(1)
  • toString() - O(n)
Best For: Sequential typing at cursor (Vim/Emacs use this).
Piece Table

Append-only buffers with piece descriptors (VS Code).

  • insert(pos, str) - O(pieces)
  • delete(pos, len) - O(pieces)
  • charAt(index) - O(pieces)
  • Original buffer: immutable file content
  • Add buffer: append-only new text
  • Pieces: (buffer, start, length) descriptors
Best For: Efficient undo (just remove pieces), large files.
UndoManager

Stack-based history management.

  • execute(Command cmd) - O(1)
  • undo() - O(1)
  • redo() - O(1)
  • canUndo() - O(1)
  • canRedo() - O(1)
  • beginGroup() / endGroup() - O(1)
Pattern: Command Pattern with two stacks (undo/redo).
07

Submission Guidelines

Submit your project via GitHub. Ensure your repository is public and follows the structure below.

Repository Structure
Required Folders
  • src/ - Source code files
  • src/commands/ - Command pattern classes
  • src/editor/ - Main editor class
  • test/ - JUnit test files
  • data/ - CSV dataset
Required Files
  • Command.java - Command interface
  • InsertCommand.java - Insert operation
  • DeleteCommand.java - Delete operation
  • UndoManager.java - Stack management
  • README.md - Documentation
README Requirements
  • Project overview and objectives
  • Command Pattern implementation details
  • Undo/Redo stack logic explanation
  • Memory management strategies used
  • Usage examples with sample outputs
  • Time and space complexity analysis
Submission Checklist
  • Command interface with execute/undo
  • InsertCommand and DeleteCommand classes
  • UndoManager with dual stacks
  • Memory optimization (history limit)
  • JUnit test suite with edge cases
  • Code properly documented
08

Grading Rubric

Your project will be evaluated on the following criteria. Maximum score is 500 points.

Category Criteria Points
Text Buffer
  • Rope OR Gap Buffer OR Piece Table (100 pts)
  • Correct insert/delete operations (30 pts)
  • Optimal time complexity (20 pts)
150
Undo/Redo
  • Command Pattern implementation (40 pts)
  • Undo stack functionality (25 pts)
  • Redo stack functionality (20 pts)
  • Group operations support (15 pts)
100
Find & Replace
  • KMP or Rabin-Karp implementation (40 pts)
  • Find all occurrences (20 pts)
  • Replace single/all (20 pts)
  • Case-insensitive option (10 pts)
90
Testing
  • Unit test coverage (40 pts)
  • Edge case handling (30 pts)
  • Operation replay from CSV (20 pts)
90
Documentation
  • README completeness (20 pts)
  • Code documentation/Javadoc (15 pts)
  • Complexity analysis (20 pts)
  • Performance benchmarks (15 pts)
70
Total 500
Excellent
450-500

Outstanding implementation

Good
350-449

Meets all requirements

Needs Work
< 350

Missing key requirements