Project Overview
This capstone project focuses on cryptography fundamentals and file I/O operations. You will build a professional-grade file encryption tool that can encrypt and decrypt files using multiple algorithms. The tool will feature a command-line interface with proper argument parsing, support for both text and binary files, and robust error handling for various edge cases.
XOR Cipher
Implement XOR-based encryption with variable key length
Caesar Cipher
Classic shift cipher with configurable offset
File Handling
Read/write text and binary files securely
CLI Interface
Parse arguments and provide help documentation
Learning Objectives
Technical Skills
- Master binary file I/O with fread/fwrite
- Implement bitwise XOR operations
- Build modular alphabet shift functions
- Parse command-line arguments (argc/argv)
- Handle memory allocation correctly
Security Concepts
- Understand symmetric encryption principles
- Learn about key management best practices
- Recognize strengths and weaknesses of classic ciphers
- Handle sensitive data in memory securely
- Implement proper error messages (no info leakage)
Project Scenario
SecureData Solutions
You have been contracted by SecureData Solutions, a small cybersecurity consulting firm. They need a lightweight, portable file encryption utility for their clients who work in environments where installing large security suites is not practical. The tool must be small, fast, and work entirely from the command line.
"We need a simple but effective encryption tool that our field agents can use on any system with a C compiler. It should support multiple algorithms so users can choose their security level, and it must handle any file type - text documents, images, even executables. Can you build this for us?"
Requirements from the Client
- XOR cipher with user-provided key
- Caesar cipher with configurable shift (1-25)
- Decrypt mode for all algorithms
- Auto-detect original file from encrypted
- Read files of any size (streaming for large files)
- Write encrypted output to new file
- Preserve original file (no overwrite)
- Support both text and binary files
- Support short and long argument forms
- Display help message with --help
- Show version with --version
- Clear error messages for invalid input
- Vigenère cipher implementation (bonus)
- Progress bar for large files (bonus)
- File integrity check (checksum) (bonus)
- Batch encryption (multiple files) (bonus)
The Dataset
You will work with sample text files for testing your encryption algorithms. Download the files containing various content types and sizes for comprehensive testing:
Dataset Download
Download the sample text files and save them to your project folder. The files contain various content patterns for testing encryption algorithms.
Original Data Source
This project uses the Gutenberg Text Corpus from Kaggle - a collection of 18 classic literature .txt files (11.8 MB total). Includes works by Austen, Carroll, Melville, Milton, Shakespeare, and more - perfect for testing file encryption with real-world text data.
Dataset Schema
| Column | Type | Description |
|---|---|---|
| id | Integer | Row identifier (1-10) |
| text | String | Sample text patterns (Hello World, pangram, etc.) |
| category | String | Pattern type (greeting, alphabet, numbers, special) |
| length | Integer | Character count of text field |

| Column | Type | Description |
|---|---|---|
| section_id | Integer | Section identifier (1-20) |
| topic | String | Cryptography topic (symmetric, asymmetric, hash, etc.) |
| content | String | Educational content about encryption |
| example | String | Code or pattern example |
| difficulty | String | Beginner, Intermediate, Advanced |

| Column | Type | Description |
|---|---|---|
| row_id | Integer | Unique row identifier (1-500) |
| chapter | String | Chapter name (History, XOR, Caesar, Vigenere, etc.) |
| section | String | Section title within chapter |
| content | String | Detailed documentation text |
| code_sample | String | C code examples (may contain special chars) |
| ascii_value | Integer | ASCII reference values (0-255) |

| Column | Type | Description |
|---|---|---|
| record_id | Integer | Record identifier (1-25) |
| data_type | String | Type: PII, financial, credentials, network |
| field_name | String | Field label (name, ssn, card_number, etc.) |
| sample_value | String | Fictional sample data (NEVER real data) |
| sensitivity | String | Low, Medium, High, Critical |
Key Concepts
Before implementing the encryption algorithms, make sure you understand these fundamental concepts:
XOR Cipher
XOR (exclusive or) is a bitwise operation that returns 1 when inputs differ:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
Key property: A XOR B XOR B = A
(XORing twice with same key returns original)
Implementation: XOR each byte of the plaintext with the corresponding byte of the key. If the key is shorter than the plaintext, repeat the key from its first byte.
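The repeating-key rule above can be sketched as a single in-place function. This is a minimal illustration, not a required design: the name xor_buffer and its signature are assumptions made for this example.

```c
#include <stddef.h>

/* XOR each byte of data in place with the key, repeating the key as needed.
 * Calling it a second time with the same key restores the original bytes.
 * xor_buffer is a hypothetical helper name, not part of the assignment. */
void xor_buffer(unsigned char *data, size_t len,
                const unsigned char *key, size_t key_len)
{
    if (key_len == 0) {
        return; /* an empty key would divide by zero in the modulo below */
    }
    for (size_t i = 0; i < len; i++) {
        data[i] ^= key[i % key_len]; /* i % key_len wraps back to key[0] */
    }
}
```

Because A XOR B XOR B = A, the same function serves for both encryption and decryption. Using unsigned char avoids sign surprises when the data contains bytes above 127.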
Caesar Cipher
Each letter is shifted by a fixed number of positions in the alphabet:
Shift = 3:
A → D B → E C → F ...
X → A Y → B Z → C
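Encryption: E(x) = (x + shift) mod 26, where x is the letter's position in the alphabet (A = 0, ..., Z = 25)
Decryption: D(x) = (x - shift) mod 26, taking the non-negative remainder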
Important: Only shift letters (A-Z, a-z). Leave numbers, spaces, and punctuation unchanged. Preserve case.
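One possible way to express these rules in C is shown below. The function name caesar_shift is an assumption for illustration, and the code assumes an ASCII-compatible character set where A-Z and a-z are contiguous.

```c
/* Shift one character by `shift` positions, wrapping within its own case.
 * Non-letters are returned unchanged. caesar_shift is a hypothetical name.
 * Assumes an ASCII-compatible character set. */
int caesar_shift(int c, int shift)
{
    /* Normalize to 0..25 so negative shifts (decryption) also work;
     * in C, -3 % 26 evaluates to -3, not 23. */
    int s = ((shift % 26) + 26) % 26;

    if (c >= 'A' && c <= 'Z') {
        return 'A' + (c - 'A' + s) % 26;
    }
    if (c >= 'a' && c <= 'z') {
        return 'a' + (c - 'a' + s) % 26;
    }
    return c; /* digits, spaces, punctuation, and other bytes pass through */
}
```

With this shape, decryption is just caesar_shift(c, -shift), so a single function can cover both modes.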
Binary File I/O
Use binary mode for reliable file handling:
// Open for binary read
FILE *in = fopen("input.txt", "rb");
// Open for binary write
FILE *out = fopen("output.enc", "wb");
// Read/write binary data
size_t bytes = fread(buffer, 1, SIZE, in);
fwrite(buffer, 1, bytes, out);
Why binary mode? Text mode may translate newlines differently on Windows vs Linux. Binary mode preserves exact bytes.
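As a hedged illustration of the calls above with error checks added, the sketch below copies one file to another byte for byte. The name copy_file, the return codes, and the 4096-byte buffer are illustrative choices, not requirements of the project.

```c
#include <stdio.h>

/* Copy src to dst byte for byte. Returns 0 on success, -1 on any I/O error.
 * copy_file and the 4096-byte buffer size are illustrative choices. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    unsigned char buffer[4096];
    size_t n;
    int status = 0;

    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            status = -1; /* short write, for example a full disk */
            break;
        }
    }
    if (ferror(in)) {
        status = -1; /* the loop ended because of a read error, not EOF */
    }

    fclose(in);
    if (fclose(out) != 0) {
        status = -1; /* buffered data may fail to flush on close */
    }
    return status;
}
```

An encryption routine can follow the same structure and transform the buffer between the fread and fwrite calls.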
Command-Line Arguments
Parse argc/argv to get user options:
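#include <string.h>

int main(int argc, char *argv[]) {
    const char *key = NULL;
    const char *input_file = NULL;

    for (int i = 1; i < argc; i++) {
        /* Check i + 1 < argc so a trailing flag never reads past argv. */
        if (strcmp(argv[i], "-k") == 0 && i + 1 < argc) {
            key = argv[++i];
        } else if (strcmp(argv[i], "-i") == 0 && i + 1 < argc) {
            input_file = argv[++i];
        }
    }
    /* ... use key and input_file ... */
    return 0;
}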
Tip: Consider using getopt() for more robust argument parsing, or implement your own loop for learning purposes.
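If you take the getopt route, a sketch along the following lines handles both short and long forms. It uses getopt_long from <getopt.h>, which is a GNU extension also available on the BSDs and macOS but not part of ISO C. The struct cli_options and the function parse_cli are hypothetical names for this example, and --help and --version are left out to keep it short.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the parsed options. */
struct cli_options {
    const char *algorithm;
    const char *key;
    const char *shift;   /* kept as text; validate it separately */
    const char *input;
    const char *output;
    int decrypt;
};

/* Returns 0 on success, -1 on an unknown option or a missing argument.
 * The caller is expected to zero-initialize *opts first. */
int parse_cli(int argc, char *argv[], struct cli_options *opts)
{
    static const struct option long_opts[] = {
        {"algorithm", required_argument, NULL, 'a'},
        {"key",       required_argument, NULL, 'k'},
        {"shift",     required_argument, NULL, 's'},
        {"input",     required_argument, NULL, 'i'},
        {"output",    required_argument, NULL, 'o'},
        {"decrypt",   no_argument,       NULL, 'd'},
        {NULL, 0, NULL, 0}
    };
    int c;

    optind = 1; /* restart scanning so the function can be called again */
    while ((c = getopt_long(argc, argv, "a:k:s:i:o:d", long_opts, NULL)) != -1) {
        switch (c) {
        case 'a': opts->algorithm = optarg; break;
        case 'k': opts->key = optarg; break;
        case 's': opts->shift = optarg; break;
        case 'i': opts->input = optarg; break;
        case 'o': opts->output = optarg; break;
        case 'd': opts->decrypt = 1; break;
        default:  return -1; /* getopt_long already printed a message */
        }
    }
    return 0;
}
```

A colon after a letter in the option string means that option takes a value, so getopt_long itself reports a missing argument instead of reading past the end of argv.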
Project Requirements
Your project must implement the following features. Focus on correctness first, then add bonus features.
XOR Cipher Implementation (Required)
- Accept key of any length as string
- XOR each byte of input with corresponding key byte (repeating key as needed)
- Work with both text and binary files
- Same function should work for both encryption and decryption
- Handle files larger than available memory (streaming)
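One detail of the streaming requirement is easy to miss: when a file is processed in chunks, the key position has to carry over from one chunk to the next, otherwise every chunk restarts at the first key byte and the output differs from a one-pass encryption. A minimal sketch, assuming a small state struct (xor_stream and xor_stream_update are hypothetical names):

```c
#include <stddef.h>

/* Hypothetical streaming state: the key plus the position reached so far. */
struct xor_stream {
    const unsigned char *key;
    size_t key_len;
    size_t pos; /* index of the next key byte to use */
};

/* XOR one chunk in place and remember where in the key the next chunk starts. */
void xor_stream_update(struct xor_stream *s, unsigned char *chunk, size_t len)
{
    if (s->key_len == 0) {
        return; /* nothing sensible to do with an empty key */
    }
    for (size_t i = 0; i < len; i++) {
        chunk[i] ^= s->key[s->pos];
        s->pos = (s->pos + 1) % s->key_len;
    }
}
```

In a file loop, you would initialize the struct once with pos set to 0, then call xor_stream_update on each buffer returned by fread before passing it to fwrite.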
Caesar Cipher Implementation (Required)
- Accept shift value (1-25) as command-line argument
- Shift only alphabetic characters (A-Z, a-z)
- Preserve letter case (uppercase stays uppercase)
- Leave non-alphabetic characters unchanged
- Support both encrypt (positive shift) and decrypt (negative shift)
Command-Line Interface (Required)
- Support arguments: -a (algorithm), -k (key), -s (shift), -i (input), -o (output)
- Support -d or --decrypt flag for decryption mode
- Display help message with -h or --help
- Show version with --version
- Clear error messages for missing or invalid arguments
Error Handling (Required)
- File not found - clear error message
- Permission denied - handle gracefully
- Invalid algorithm selection
- Missing required arguments
- Invalid shift value (outside 1-25 range)
- Memory allocation failures
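Two of the cases above, an invalid shift value and a file that cannot be opened, could be handled roughly as follows. The names parse_shift and open_input are assumptions for this sketch. It also relies on fopen setting errno on failure, which POSIX specifies but ISO C does not guarantee.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a shift argument. Returns 0 and stores the value on success,
 * -1 if the text is not a whole number in the range 1..25.
 * parse_shift is a hypothetical helper name. */
int parse_shift(const char *text, int *shift_out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1; /* overflow, no digits, or trailing junk such as "5x" */
    }
    if (value < 1 || value > 25) {
        return -1;
    }
    *shift_out = (int)value;
    return 0;
}

/* Open a file for binary reading; on failure print a short reason to stderr.
 * open_input is a hypothetical helper name. */
FILE *open_input(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        /* strerror(errno) distinguishes "No such file" from "Permission denied" */
        fprintf(stderr, "Error: cannot open '%s': %s\n", path, strerror(errno));
    }
    return f;
}
```

strtol is used instead of atoi because atoi cannot tell the caller whether the input was actually a number.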
Bonus Features (Optional)
- Vigenère Cipher: Polyalphabetic cipher using keyword
- Progress Bar: Show encryption progress for large files
- Checksum: Add integrity verification (CRC32 or simple checksum)
- Batch Mode: Encrypt multiple files with wildcard pattern
- Key from File: Read encryption key from file instead of command line
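For the Vigenère bonus, the idea is a Caesar shift whose offset changes per letter according to a keyword. The sketch below is one way to write it for a NUL-terminated string. The name vigenere and its decrypt flag are assumptions, and a file-based version would need an explicit length instead of relying on the terminator.

```c
#include <ctype.h>
#include <stddef.h>

/* Encrypt (decrypt == 0) or decrypt (decrypt != 0) text in place.
 * Only letters are shifted, and the key advances only on letters.
 * Returns -1 if the key is empty or contains a non-letter.
 * vigenere is a hypothetical name; assumes ASCII and the "C" locale. */
int vigenere(char *text, const char *key, int decrypt)
{
    size_t key_len = 0;
    size_t k = 0; /* number of letters processed so far */

    for (; key[key_len] != '\0'; key_len++) {
        if (!isalpha((unsigned char)key[key_len])) {
            return -1;
        }
    }
    if (key_len == 0) {
        return -1;
    }

    for (size_t i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];
        if (!isalpha(c)) {
            continue; /* leave digits, spaces, and punctuation unchanged */
        }
        int shift = tolower((unsigned char)key[k % key_len]) - 'a';
        if (decrypt) {
            shift = 26 - shift; /* the inverse shift, kept non-negative */
        }
        int base = isupper(c) ? 'A' : 'a';
        text[i] = (char)(base + (c - base + shift) % 26);
        k++;
    }
    return 0;
}
```

With the keyword LEMON, the plaintext ATTACKATDAWN becomes LXFOPVEFRNHR, a commonly used test vector for this cipher.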
Example Usage
Here are example commands showing how your program should work:
XOR Encryption
# Encrypt a file with XOR cipher
$ ./encrypt -a xor -k "MySecretKey123" -i sample.txt -o sample.enc
Encrypting sample.txt with XOR cipher...
Done! Encrypted 1024 bytes to sample.enc
# Decrypt the file (same command, add -d flag)
$ ./encrypt -a xor -k "MySecretKey123" -d -i sample.enc -o sample_decrypted.txt
Decrypting sample.enc with XOR cipher...
Done! Decrypted 1024 bytes to sample_decrypted.txt
Caesar Cipher
# Encrypt with Caesar cipher (shift of 5)
$ ./encrypt -a caesar -s 5 -i message.txt -o message.enc
Encrypting message.txt with Caesar cipher (shift: 5)...
Done! Encrypted 256 bytes to message.enc
# Decrypt (use -d flag)
$ ./encrypt -a caesar -s 5 -d -i message.enc -o message_plain.txt
Decrypting message.enc with Caesar cipher (shift: 5)...
Done! Decrypted 256 bytes to message_plain.txt
Help and Errors
# Display help
$ ./encrypt --help
File Encryption Tool v1.0
Usage: encrypt [OPTIONS]
Options:
-a, --algorithm ALG Encryption algorithm (xor, caesar, vigenere)
-k, --key KEY Encryption key (for xor, vigenere)
-s, --shift N Shift value 1-25 (for caesar)
-i, --input FILE Input file path
-o, --output FILE Output file path
-d, --decrypt Decrypt mode (default: encrypt)
-h, --help Show this help message
--version Show version information
Examples:
encrypt -a xor -k "secret" -i input.txt -o output.enc
encrypt -a caesar -s 3 -d -i encrypted.txt -o plain.txt
# Error handling
$ ./encrypt -a xor -k "secret" -i nonexistent.txt -o output.enc
Error: File 'nonexistent.txt' not found.
$ ./encrypt -a caesar -s 30 -i input.txt -o output.enc
Error: Shift value must be between 1 and 25.
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
c-file-encryption
Required Project Structure
include/
- encryption.h
- decryption.h
- file_handler.h
- key_manager.h
- utils.h
src/
- main.c
- xor_cipher.c
- caesar_cipher.c
- vigenere_cipher.c (bonus)
- file_handler.c
- utils.c
Other
- data/ (test files)
- tests/ (test cases)
- Makefile
- README.md
README.md Required Sections
1. Project Header
- Project title and description
- Your full name and submission date
- Course and project number
2. Features
- List of implemented algorithms
- Supported file types
- CLI options available
3. Build Instructions
- Prerequisites (GCC, make)
- How to compile with make
- How to run the program
4. Usage Examples
- XOR encryption/decryption commands
- Caesar cipher examples
- Error handling examples
5. Algorithm Details
- Brief explanation of each cipher
- Security considerations
- Limitations of classic ciphers
6. Testing
- How to run tests
- Test files used
- Expected vs actual output verification
Grading Rubric
Your project will be evaluated on the following criteria (450 points total):
| Category | Criteria | Points |
|---|---|---|
| XOR Cipher | Correct encryption/decryption | 40 |
| | Variable key length support | 20 |
| | Binary file handling | 20 |
| Caesar Cipher | Correct shift implementation | 40 |
| | Case preservation | 15 |
| | Non-alpha character handling | 15 |
| CLI Interface | Argument parsing works correctly | 30 |
| | Help message complete | 15 |
| | Error messages are clear | 15 |
| File Handling | Correct file read/write | 30 |
| | Large file support (streaming) | 20 |
| | Error handling (file not found, etc.) | 20 |
| Code Quality | Modular structure (separate files) | 25 |
| | Clear comments and documentation | 20 |
| | No memory leaks | 20 |
| Documentation | README completeness | 40 |
| | Usage examples work correctly | 15 |
| Bonus | Vigenère cipher, progress bar, checksum, etc. | +50 |
| Total | | 450 (+50 bonus) |
Excellent
Exceeds all requirements with exceptional quality
Good
Meets all requirements with good quality
Satisfactory
Meets minimum requirements
Needs Work
Missing key requirements
Ready to Submit?
Make sure you have completed all requirements and reviewed the grading rubric above.
Pre-Submission Checklist
Use this checklist to verify you have completed all requirements before submitting your project.
XOR Cipher
Caesar Cipher
CLI Interface
Repository Requirements
Use the diff (Linux/macOS) or fc (Windows) command to compare each decrypted file with its original.
Common Issues and Solutions
Encountering problems? Don't worry! Here are the most common issues students face and how to resolve them quickly.
Encryption/Decryption Mismatch
Decrypted file doesn't match original - characters are garbled
Ensure you're using the exact same key. For XOR, check key length repeating logic. For Caesar, use negative shift for decryption.
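When tracking down a mismatch, it helps to know where the two files first diverge: a difference at offset 0 usually points to a wrong key or shift, while a difference that starts at a buffer boundary suggests the key position is being reset between chunks. The helper below is a small sketch for that purpose. first_mismatch and its return codes are assumptions for this example.

```c
#include <stdio.h>

/* Returns -1 if the files are identical, -2 if either cannot be opened,
 * otherwise the zero-based offset of the first differing byte. A file that
 * ends early counts as differing at the offset where it ends.
 * first_mismatch is a hypothetical helper name. */
long first_mismatch(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0;
    long result = -1;

    if (a == NULL || b == NULL) {
        result = -2;
    } else {
        for (;;) {
            int ca = fgetc(a);
            int cb = fgetc(b);
            if (ca != cb) {
                result = offset; /* covers one file hitting EOF first */
                break;
            }
            if (ca == EOF) {
                break; /* both ended together: identical */
            }
            offset++;
        }
    }
    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

Using long for the offset keeps the sketch simple; files beyond the range of long would need a wider type.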
Binary File Corruption
Binary files (images, executables) get corrupted after decryption
Always use binary mode ("rb", "wb") not text mode. Text mode may translate newlines:
FILE *f = fopen(filename, "rb"); // Not "r"
Segmentation Fault on Large Files
Program crashes when processing large files (> 1MB)
Use buffered reading instead of loading entire file. Process in chunks:
unsigned char buffer[4096];
size_t bytes;
while ((bytes = fread(buffer, 1, sizeof buffer, in)) > 0) {...}
Run valgrind ./encrypt ... to find memory issues.
Caesar Cipher Wrapping Errors
Letters near end of alphabet (X, Y, Z) don't wrap correctly
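Use modulo arithmetic, and remember that % in C can return a negative result when the left operand is negative (as happens with a negative decryption shift). Adding 26 and reducing again keeps the index in 0-25. For uppercase:
encrypted = ((c - 'A' + shift) % 26 + 26) % 26 + 'A';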
Argument Parsing Fails
Program crashes or ignores arguments when parsing CLI options
Check bounds before accessing argv[i+1]. Use strcmp() for string comparison:
if (i + 1 < argc && strcmp(argv[i], "-k") == 0)
Makefile Build Errors
"make: Nothing to be done" or linker errors
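Indent Makefile recipe lines with tabs, not spaces. List every .c file in the build, and run make clean before rebuilding. To check whether the problem lies in the Makefile, compile manually:
gcc -o encrypt src/*.c -I include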
Still Having Issues?
Check the course discussion forum or reach out for help