Capstone Project 4

Social Network Graph System

Build a complete social network platform using graph data structures. You will implement friend recommendations with BFS/DFS traversals, find shortest connection paths using Dijkstra's algorithm, detect communities, and analyze network influence metrics.

18-22 hours
Advanced
500 Points
What You Will Build
  • Graph-based User Network
  • Friend Recommendation Engine
  • Shortest Path Finder
  • Community Detection
  • Influence Analysis
Contents
01

Project Overview

This capstone project brings together everything you have learned in the Java DSA course. You will work with real social network data containing 50 users, 100 connections, and 50 posts with engagement metrics. Your goal is to implement graph algorithms that power features like friend suggestions, connection paths, and community detection - the same algorithms used by LinkedIn, Facebook, and Twitter.

Skills Applied: This project tests your proficiency in Graphs (adjacency list/matrix), BFS/DFS Traversals (friend recommendations), Dijkstra's Algorithm (shortest paths), Union-Find (communities), and HashMap/HashSet (efficient lookups).
Graph Structure

Adjacency list with weighted edges

Friend Suggestions

BFS/DFS based recommendations

Connection Paths

Dijkstra's shortest path algorithm

Communities

Union-Find for group detection

Learning Objectives

Technical Skills
  • Implement Graph using adjacency list representation
  • Build BFS traversal for friend-of-friend recommendations
  • Apply Dijkstra's algorithm for weighted shortest paths
  • Use Union-Find data structure for community detection
  • Calculate network metrics like degree centrality and influence
Problem-Solving Skills
  • Analyze real-world social network patterns
  • Understand graph algorithm time complexity
  • Handle large-scale graph operations efficiently
  • Design recommendation systems based on connections
  • Test with actual social network interaction data
Ready to submit? Already completed the project? Submit your work now!
Submit Now
02

Problem Scenario

ConnectHub

You have been hired as a Backend Engineer at ConnectHub, a growing social networking platform. The company is struggling with user engagement - their current system shows random friend suggestions and users can't discover how they're connected to others. The product team needs you to build an intelligent recommendation engine using graph algorithms.

"Our users want to see 'People You May Know' based on mutual friends, not random suggestions. They also want to know 'how am I connected to this person?' like LinkedIn's connection path feature. Can you build a graph-based system that handles these features efficiently?"

Marcus Thompson, VP of Product

Key Challenges to Solve

Friend Recommendations
  • Find friends-of-friends efficiently (2-hop connections)
  • Rank suggestions by mutual friend count
  • Filter already-connected users
  • Handle edge cases gracefully
Connection Paths
  • Find shortest path between two users
  • Consider connection strength as edge weight
  • Display the chain of connections
  • Handle disconnected users
Community Detection
  • Identify groups of closely connected users
  • Use Union-Find for efficient grouping
  • Detect isolated communities
  • Analyze community sizes and patterns
Influence Analysis
  • Calculate degree centrality (connections)
  • Identify most influential users
  • Analyze network topology metrics
  • Find bridge users between communities
Pro Tip: Think like a software engineer! Your graph implementation should be efficient, scalable, and handle real-world data patterns. Focus on algorithm optimization and clean code structure.
03

The Dataset

You will work with a comprehensive social network dataset containing real-world interaction patterns. Download the CSV files containing users, connections, and posts data for graph analysis:

Dataset Download

Download the social network dataset files and save them to your project folder. The CSV files contain all necessary data for graph algorithms and network analysis.

Original Data Source

This project uses a curated Social Network Graph Dataset inspired by popular Kaggle datasets and real-world platforms like LinkedIn, Twitter, and Facebook. The dataset contains realistic user profiles, friendship connections with varying strengths, and engagement metrics. Perfect for learning graph algorithms and network analysis techniques used by major tech companies.

Dataset Info: 50 Users | 100 Connections | 50 Posts | Connection Strength: 0.0-1.0 | Network Density: ~8% | Average Connections Per User: 4 | Date Range: 2023-2024 | Inspired by: LinkedIn, Twitter, Facebook graph datasets
Dataset Schema

ColumnTypeDescription
user_idStringUnique user identifier (U001-U050)
usernameStringUsername handle for the profile
full_nameStringUser's full name
emailStringUser's email address
join_dateDateAccount creation date (YYYY-MM-DD)
locationStringUser's city/region
followers_countIntegerNumber of followers
following_countIntegerNumber of users followed
posts_countIntegerTotal posts published
is_verifiedBooleanAccount verification status
profile_statusStringActive, Inactive, or Dormant

ColumnTypeDescription
connection_idStringUnique connection identifier (C001-C100)
user_idStringSource user (follower/connector)
friend_idStringTarget user (followed/connected)
connection_typeStringFriend, Follower, or Following
connected_dateDateDate connection was established
interaction_countIntegerTotal interactions between users
last_interactionDateDate of most recent interaction
strength_scoreDecimalConnection strength (0.0-1.0), edge weight

ColumnTypeDescription
post_idStringUnique post identifier (P001-P050)
user_idStringUser who created the post
contentStringPost text/content
post_dateDateDate post was published (YYYY-MM-DD)
likes_countIntegerNumber of likes received
comments_countIntegerNumber of comments
shares_countIntegerNumber of shares
post_typeStringText, Image, Video, or Article
visibilityStringPublic, Friends Only, or Private
engagement_scoreDecimalCalculated engagement metric (0.0-100.0)
04

Project Requirements

Your project must include all of the following components using Java, graph data structures, and algorithms. Structure your implementation with clear organization and comprehensive documentation.

1
Graph Data Structure & CSV Parsing

Data Loading:

  • Parse users.csv and populate user profiles (User objects)
  • Parse connections.csv and build adjacency list graph
  • Parse posts.csv for engagement metrics (optional for recommendation scoring)
  • Validate data integrity and handle malformed CSV rows

Graph Construction:

  • Implement adjacency list using HashMap<String, List<Edge>>
  • Store connection strength as edge weights (0.0-1.0)
  • Support bidirectional friendship connections
  • Efficient O(1) neighbor lookup and O(E) edge iteration

Data Classes:

  • User.java - Store user profile with followers, following, posts count
  • Connection.java - Edge representation with strength score
  • SocialGraph.java - Main graph class with core operations
Deliverable: Java classes for User, Connection, and SocialGraph with CSV parsing and proper error handling.
2
Friend Recommendations (BFS Algorithm)

Algorithm Implementation:

  • Implement BFS to explore 2-hop connections from source user
  • Level 1: Direct friends (visited, exclude from recommendations)
  • Level 2: Friends-of-friends (candidates for recommendation)
  • Skip self-loops and already-connected users

Ranking & Filtering:

  • For each Level 2 user, count number of mutual friends
  • Rank candidates by mutual friend count (descending)
  • Return top N recommendations with scores
  • Handle users with no friends (return empty list)

Performance Requirements:

  • Time Complexity: O(V + E) for BFS traversal
  • Space Complexity: O(V) for visited set and result storage
  • Optimize with HashMap for constant-time lookups
Deliverable: FriendRecommender.java class with recommendFriends() method returning top-N recommendations with mutual friend counts.
3
Shortest Connection Path (Dijkstra's Algorithm)

Algorithm Implementation:

  • Implement Dijkstra's algorithm with binary heap (PriorityQueue)
  • Use connection strength as edge weight: cost = 1 - strength_score
  • Stronger connections (higher strength) = shorter distances (preferred)
  • Find optimal path maximizing total connection strength

Path Reconstruction:

  • Track parent pointers during algorithm execution
  • Reconstruct full path from source to destination
  • Return path as list of User objects or IDs
  • Include total distance/cost in result

Edge Cases:

  • Handle disconnected users (no path exists)
  • Handle direct connections (source == destination)
  • Handle single-hop paths (direct friends)
Deliverable: PathFinder.java class with findShortestPath() method returning the path with minimum cost (maximum strength).
4
Community Detection (Union-Find Algorithm)

Union-Find Implementation:

  • Implement Disjoint Set Union (Union-Find) data structure
  • Use int[] parent array to track set membership
  • Use int[] rank array for union-by-rank optimization
  • Implement path compression in find() method

Community Grouping:

  • Union users connected by edges with strength_score ≥ 0.7 (strong connections)
  • Identify distinct connected components (communities)
  • Return mapping of users to community IDs
  • Report community sizes and member lists

Optimizations:

  • Path compression in find() - O(α(n)) amortized time
  • Union by rank - maintain balanced tree structure
  • Overall complexity: Near O(1) per operation
Deliverable: CommunityDetector.java class with detectCommunities() method returning community assignments for all users.

Bonus Features (+75 points)

Influence Analysis (+50 pts)

Calculate network influence metrics:

  • Degree centrality for each user (connection count)
  • Weighted degree (sum of connection strengths)
  • Identify top 10 most influential users
  • Find bridge users connecting multiple communities
  • Betweenness centrality (times in shortest paths)
Network Visualization (+25 pts)

Generate visual representation:

  • Export graph in DOT format (Graphviz)
  • Color-code nodes by community membership
  • Size nodes by degree centrality
  • Label edges with connection strength
  • Highlight bridge users and communities
05

Algorithm Complexity Reference

Understanding the time and space complexity of each algorithm is critical for performance and scalability. These are the expected complexities for your implementations.

BFS (Friend Recommendations)
  • Time: O(V + E)
  • Space: O(V)
  • Why: Visit each vertex and edge once

With 50 users and 100 connections: ~150 operations

Dijkstra's (Shortest Path)
  • Time: O((V+E) log V)
  • Space: O(V)
  • Why: Extract-min from heap V times

With 50 users and 100 connections: ~600 operations

Union-Find (Communities)
  • Time: O(α(n)) per op
  • Space: O(V)
  • Why: Path compression ≈ constant

With 50 users: ~50 near-constant ops

DFS (Graph Traversal)
  • Time: O(V + E)
  • Space: O(V)
  • Why: Stack depth + edges

With 50 users and 100 connections: ~150 operations

06

Data Structures

Choose the right data structure for each component. Efficiency matters when working with graphs!

Adjacency List

HashMap<String, List<Edge>>

  • Map user ID → list of neighbors
  • O(1) neighbor access
  • O(E) space for edges
Priority Queue

PriorityQueue<Vertex>

  • Extract min distance in O(log V)
  • Used for Dijkstra's algorithm
  • Custom Comparator by distance
Union-Find

int[] parent, int[] rank

  • Path compression in find()
  • Union by rank optimization
  • Nearly O(1) per operation
Hash Collections

HashMap, HashSet

  • O(1) average lookups
  • Visited tracking (BFS/DFS)
  • User/connection storage
07

Submission Requirements

GitHub Repository

Create a public GitHub repository named social-network-graph with complete source code, data files, test cases, and comprehensive documentation. The repository will be verified for required files and code structure.

Submit Your Project
Required Repository Structure
social-network-graph/
├── src/
│   ├── User.java
│   ├── Connection.java
│   ├── SocialGraph.java
│   ├── FriendRecommender.java
│   ├── PathFinder.java
│   └── CommunityDetector.java
├── test/
│   └── SocialGraphTest.java (or use JUnit)
├── data/
│   ├── social_network_users.csv
│   ├── social_network_connections.csv
│   └── social_network_posts.csv
├── README.md
└── .gitignore
README.md Must Include
Project Details
  • Your full name and submission date
  • Project overview and features
  • Core algorithms implemented
  • Bonus features (if any)
Technical Documentation
  • Time/space complexity analysis
  • Data structure explanations
  • Compilation & execution instructions
  • Sample output for each feature
Design & Implementation
  • Design decisions and trade-offs
  • Graph representation explanation
  • Algorithm choice justification
  • Edge case handling
Testing & Results
  • Test cases with expected results
  • Performance on dataset
  • Screenshots of output
  • Known limitations
08

Grading Rubric

Your project will be evaluated on correctness of implementation, code quality, and comprehensive documentation. Bonus features offer additional points for advanced work.

Component Points Criteria
Graph Construction 100 Correct CSV parsing, adjacency list implementation, bidirectional edges, weight handling
Friend Recommendations (BFS) 100 Correct BFS traversal, 2-hop detection, ranking by mutual friends, proper exclusions
Shortest Path (Dijkstra's) 100 Correct algorithm implementation, priority queue usage, path reconstruction, edge weighting
Community Detection (Union-Find) 100 Correct Union-Find implementation, path compression, union by rank, threshold-based grouping
Code Quality 50 Clean code structure, meaningful variable names, proper comments, efficient implementation
Documentation & Testing 50 Comprehensive README, complexity analysis, test cases, sample output, design explanation
Bonus: Influence Analysis +50 Degree centrality calculation, top influencers, bridge user detection
Bonus: Network Visualization +25 DOT format export, community coloring, node sizing by influence
Total 500 Base score + up to +75 bonus points
Tip: Bonus features can boost your score significantly. Start with core features, then add influence analysis and visualization if you have time. Both bonuses are worth pursuing!