Assignment 9-A

Module Integration

Build a comprehensive Data Analysis Toolkit called PyDataKit that combines Python's standard library with powerful third-party packages. Create a modular, well-structured package for data fetching, processing, and analysis.

Estimated time: 5-7 hours
Difficulty: Intermediate
Points: 175
What You'll Practice
  • datetime & time modules
  • collections & itertools
  • Create Python packages
  • NumPy & Pandas basics
  • HTTP requests with requests
01

Assignment Overview

In this assignment, you will build PyDataKit, a professional-grade data analysis toolkit that demonstrates mastery of Python's module system, standard library, and essential third-party packages.

Skills Applied: This assignment tests your understanding of Standard Library (Topic 9.1), Creating Packages (Topic 9.2), and Third-Party Libraries (Topic 9.3) from Module 9.
Standard Library (9.1)

datetime, time, math, random, collections, itertools

Package Structure (9.2)

Modules, __init__.py, imports, virtual environments

Third-Party (9.3)

NumPy arrays, Pandas DataFrames, Requests HTTP

02

The Scenario

DataPulse Analytics

You've been hired by DataPulse Analytics, a data consulting firm that needs a reusable Python toolkit for their analysts. The toolkit should handle common tasks like fetching data from web APIs, processing timestamps across timezones, performing statistical computations, and generating data reports.

"We need a well-structured Python package that our analysts can easily install and use. It must demonstrate proper module organization, leverage Python's powerful standard library, and integrate seamlessly with the data science ecosystem."

Your Task

Create a Python package called pydatakit that provides utilities for time handling, statistics, data processing, API interactions, NumPy operations, and Pandas DataFrame manipulation.

Required Package Structure
python-pydatakit/
├── pydatakit/
│   ├── __init__.py           # Package initialization with version
│   ├── time_utils.py         # datetime, time, timezone utilities
│   ├── math_utils.py         # math, random, statistics functions
│   ├── data_structures.py    # collections, itertools helpers
│   ├── api_client.py         # HTTP client using requests
│   ├── numpy_ops.py          # NumPy array operations
│   └── pandas_ops.py         # Pandas DataFrame operations
├── tests/
│   ├── __init__.py
│   ├── test_time_utils.py
│   ├── test_math_utils.py
│   └── test_api_client.py
├── examples/
│   └── demo.py               # Usage demonstration
├── main.py                   # Main entry point
├── requirements.txt          # Dependencies
├── output.txt                # Sample output
└── README.md                 # Documentation
03

Requirements

Your pydatakit package must implement ALL of the following modules and classes. Each requirement is mandatory and will be tested individually.

1
Package Initialization

Create a proper Python package with __init__.py that exposes the public API:

# pydatakit/__init__.py
"""PyDataKit - A comprehensive data analysis toolkit."""

__version__ = "1.0.0"
__author__ = "Your Name"

# Import main classes/functions for easy access
from .time_utils import TimeUtils, DateRange
from .math_utils import Statistics, RandomGenerator
from .data_structures import DataProcessor
from .api_client import APIClient
from .numpy_ops import ArrayOperations
from .pandas_ops import DataFrameHelper

# Define what's available with "from pydatakit import *"
__all__ = [
    'TimeUtils', 'DateRange', 'Statistics', 'RandomGenerator',
    'DataProcessor', 'APIClient', 'ArrayOperations', 'DataFrameHelper',
]
2
TimeUtils Class (datetime & time)

Implement comprehensive datetime utilities:

  • now(tz) - Get current datetime with optional timezone
  • parse_date(date_string, fmt) - Parse string to datetime
  • format_date(dt, fmt) - Format datetime to string
  • days_between(start, end) - Calculate days between dates
  • add_business_days(start, days) - Add business days
  • timestamp_to_datetime(ts) - Convert Unix timestamp
  • measure_execution(func) - Decorator for timing functions

Also create a DateRange class with iteration, weekdays, weekends, and monthly split.
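
As a starting point, the business-day helper, the timing decorator, and a small DateRange could be sketched as below. This is only one possible shape, not the required implementation: the functions are shown at module level rather than as TimeUtils methods, holidays are ignored, negative day counts are not handled, and treating the range end as inclusive is an assumption.

```python
import time
from datetime import date, timedelta
from functools import wraps
from typing import Iterator, List


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only).

    Assumption: public holidays are ignored and `days` is non-negative.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def measure_execution(func):
    """Decorator that prints how long the wrapped function took."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - started
            print(f"{func.__name__} took {elapsed:.6f}s")
    return wrapper


class DateRange:
    """Range of dates from `start` to `end`, both inclusive (an assumption)."""

    def __init__(self, start: date, end: date) -> None:
        if end < start:
            raise ValueError("end must not be before start")
        self.start = start
        self.end = end

    def __iter__(self) -> Iterator[date]:
        current = self.start
        while current <= self.end:
            yield current
            current += timedelta(days=1)

    def weekdays(self) -> List[date]:
        """Dates in the range that fall on Monday to Friday."""
        return [d for d in self if d.weekday() < 5]

    def weekends(self) -> List[date]:
        """Dates in the range that fall on Saturday or Sunday."""
        return [d for d in self if d.weekday() >= 5]
```

Using time.perf_counter rather than time.time is a design choice here: it is a monotonic clock intended for measuring short durations.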

3
Statistics & RandomGenerator Classes (math & random)

Implement math and random utilities:

  • Statistics.mean/median/std_dev/percentile/correlation
  • RandomGenerator.random_int/random_float/random_choice
  • RandomGenerator.random_sample/shuffle/generate_normal
  • RandomGenerator.generate_dataset(n, columns) - Generate test data
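
The two less obvious Statistics methods, percentile and correlation, might look roughly like the following sketch. The assignment does not say which percentile definition to use, so linear interpolation between the two nearest ranks is an assumption, as is raising ValueError for empty or constant input. Correlation here means the Pearson coefficient.

```python
import math
from typing import Sequence


def percentile(data: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    rank = (len(ordered) - 1) * p / 100  # fractional index into ordered
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return float(ordered[lower])
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)
```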
4
DataProcessor Class (collections & itertools)

Implement data processing utilities:

  • count_items(items) - Count occurrences using Counter
  • most_common(items, n) - Get n most common items
  • group_by(items, key) - Group list of dicts by key using defaultdict
  • flatten(nested) - Flatten nested lists using chain
  • window(items, size) - Sliding window using deque
  • batch(items, size) - Split into batches
  • unique_combinations(items, r) - Get combinations
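
To illustrate how the named standard-library tools fit these methods, here is a minimal sketch of group_by, flatten, window, and batch. They are written as plain functions rather than DataProcessor methods, flatten removes only one level of nesting, and the ValueError checks on size are assumptions rather than stated requirements.

```python
from collections import defaultdict, deque
from itertools import chain, islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def group_by(items: Iterable[dict], key: str) -> Dict[Any, List[dict]]:
    """Group dicts by the value stored under `key`."""
    groups: Dict[Any, List[dict]] = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return dict(groups)


def flatten(nested: Iterable[Iterable[Any]]) -> List[Any]:
    """Flatten one level of nesting."""
    return list(chain.from_iterable(nested))


def window(items: Iterable[Any], size: int) -> Iterator[Tuple[Any, ...]]:
    """Yield overlapping windows of `size` consecutive items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    buffer: deque = deque(maxlen=size)  # oldest item drops off automatically
    for item in items:
        buffer.append(item)
        if len(buffer) == size:
            yield tuple(buffer)


def batch(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive, non-overlapping chunks of up to `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

The difference between the last two is worth noting in your docstrings: window overlaps and drops incomplete windows, while batch never overlaps and keeps a short final chunk.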
5
APIClient Class (requests library)

Implement HTTP client using requests:

  • get(endpoint, params, headers) - Make GET request
  • post(endpoint, data, json_data, headers) - Make POST request
  • fetch_with_retry(endpoint, max_retries, backoff) - Retry with exponential backoff
  • download_file(url, filepath) - Download file
  • get_request_stats() - Get request statistics
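
The retry logic behind fetch_with_retry can be sketched independently of any network call. In the version below, the fetch callable and the injectable sleep parameter are hypothetical additions for illustration and testing; the assignment's signature takes an endpoint instead. A real APIClient would presumably call requests.get inside fetch and catch requests.exceptions.RequestException rather than the broad Exception used here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_retries: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on failure with exponential backoff.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts and
    re-raises the last error once `max_retries` retries are used up.
    """
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:  # assumption: narrow this in a real client
            if attempt >= max_retries:
                raise
            sleep(backoff * (2 ** attempt))
            attempt += 1
```

Passing sleep as a parameter lets unit tests record the delays without actually waiting, which keeps test_api_client.py fast.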
6
ArrayOperations Class (NumPy)

Implement NumPy array operations:

  • create_array/arange/linspace/zeros/ones/random_array
  • reshape(arr, new_shape) - Reshape array
  • statistics(arr) - Return dict with mean, std, min, max, median, sum
  • normalize/standardize - Scale data
  • dot_product/matrix_multiply - Linear algebra
  • filter_by_condition(arr, condition) - Boolean filtering
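
The scaling and filtering methods are short once expressed with NumPy. The sketch below assumes that normalize means min-max scaling to [0, 1], that standardize means z-scores using the population standard deviation (NumPy's default), and that constant input should map to zeros. None of these choices is fixed by the assignment text.

```python
import numpy as np


def normalize(arr) -> np.ndarray:
    """Min-max scale values into the range [0, 1]."""
    arr = np.asarray(arr, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        return np.zeros_like(arr)  # assumption: constant input maps to zeros
    return (arr - arr.min()) / span


def standardize(arr) -> np.ndarray:
    """Rescale values to zero mean and unit (population) standard deviation."""
    arr = np.asarray(arr, dtype=float)
    std = arr.std()
    if std == 0:
        return np.zeros_like(arr)
    return (arr - arr.mean()) / std


def filter_by_condition(arr, condition) -> np.ndarray:
    """Keep the elements for which `condition(arr)` is True.

    `condition` is a callable returning a boolean mask, e.g. lambda a: a > 2.
    """
    arr = np.asarray(arr)
    return arr[condition(arr)]
```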
7
DataFrameHelper Class (Pandas)

Implement Pandas DataFrame operations:

  • from_csv/from_json/to_csv/to_json - File I/O
  • info/describe/head - Data exploration
  • filter_rows(column, condition, value) - Filter by condition
  • select_columns/add_column/drop_columns/rename_columns
  • sort_by(columns, ascending) - Sort data
  • group_aggregate(group_by, agg_dict) - Group and aggregate
  • fill_missing/drop_duplicates/merge - Data cleaning
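
One way to approach filter_rows and group_aggregate is shown below. The sketch takes the DataFrame as an explicit first argument, whereas DataFrameHelper would presumably hold it as an attribute. The set of supported condition strings and the call to reset_index are assumptions.

```python
import operator

import pandas as pd

# Assumed mapping from condition strings to comparison functions.
_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def filter_rows(df: pd.DataFrame, column: str, condition: str, value) -> pd.DataFrame:
    """Return the rows where `df[column] <condition> value` holds."""
    if condition not in _OPERATORS:
        raise ValueError(f"unsupported condition: {condition!r}")
    mask = _OPERATORS[condition](df[column], value)
    return df[mask]


def group_aggregate(df: pd.DataFrame, group_by, agg_dict: dict) -> pd.DataFrame:
    """Group by one or more columns and aggregate per `agg_dict`."""
    return df.groupby(group_by).agg(agg_dict).reset_index()
```

For example, group_aggregate(df, "team", {"score": "sum"}) returns one row per team with the summed score.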
8
Requirements File

Create a requirements.txt that lists each dependency with a minimum version (note that >= sets a lower bound, whereas == would pin an exact version):

# requirements.txt
numpy>=1.24.0
pandas>=2.0.0
requests>=2.31.0
9
Unit Tests

Create tests in the tests/ directory:

  • test_time_utils.py - Test datetime functions
  • test_math_utils.py - Test statistics and random
  • test_api_client.py - Test HTTP client

Use Python's unittest module with at least 5 test cases per file.
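
A test file with five cases could be laid out as in the sketch below. To keep the example runnable on its own, it defines a hypothetical stand-in for TimeUtils.days_between; in your repository the test would import the real method from pydatakit instead.

```python
import unittest
from datetime import date


def days_between(start: date, end: date) -> int:
    """Hypothetical stand-in for TimeUtils.days_between."""
    return (end - start).days


class TestDaysBetween(unittest.TestCase):
    def test_same_day_is_zero(self):
        self.assertEqual(days_between(date(2024, 1, 1), date(2024, 1, 1)), 0)

    def test_forward_range(self):
        self.assertEqual(days_between(date(2024, 1, 1), date(2024, 1, 31)), 30)

    def test_reversed_range_is_negative(self):
        self.assertEqual(days_between(date(2024, 1, 31), date(2024, 1, 1)), -30)

    def test_leap_day_is_counted(self):
        # 2024 is a leap year, so Feb 28 -> Mar 1 spans two days.
        self.assertEqual(days_between(date(2024, 2, 28), date(2024, 3, 1)), 2)

    def test_non_date_input_raises(self):
        with self.assertRaises(TypeError):
            days_between("2024-01-01", date(2024, 1, 2))
```

From the repository root, the whole suite can then be run with python -m unittest discover -s tests.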

10
Main Entry Point

Create main.py that demonstrates all modules:

#!/usr/bin/env python3
"""PyDataKit - Main demonstration script."""

from pydatakit import (
    TimeUtils, DateRange, Statistics, RandomGenerator,
    DataProcessor, APIClient, ArrayOperations, DataFrameHelper
)

def main():
    print("=" * 60)
    print("PyDataKit - Data Analysis Toolkit Demo")
    print("=" * 60)
    
    # Demonstrate TimeUtils
    print("\n--- DateTime Utilities ---")
    now = TimeUtils.now()
    print(f"Current time: {TimeUtils.format_date(now)}")
    
    # Demonstrate Statistics
    print("\n--- Statistics ---")
    data = [10, 20, 30, 40, 50]
    print(f"Mean: {Statistics.mean(data)}")
    print(f"Std Dev: {Statistics.std_dev(data):.2f}")
    
    # Demonstrate NumPy operations
    print("\n--- NumPy Operations ---")
    arr = ArrayOperations.create_array([1, 2, 3, 4, 5])
    stats = ArrayOperations.statistics(arr)
    print(f"Array stats: {stats}")
    
    # Demonstrate Pandas operations
    print("\n--- Pandas Operations ---")
    df = DataFrameHelper({'name': ['Alice', 'Bob'], 'age': [25, 30]})
    print(df.head())
    
    print("\n" + "=" * 60)
    print("Demo completed!")
    print("=" * 60)

if __name__ == "__main__":
    main()
04

Submission

Create a public GitHub repository with the exact name shown below:

Required Repository Name
python-pydatakit
github.com/<your-username>/python-pydatakit
Required Files
python-pydatakit/
├── pydatakit/
│   ├── __init__.py           # Package init with version and exports
│   ├── time_utils.py         # TimeUtils and DateRange classes
│   ├── math_utils.py         # Statistics and RandomGenerator classes
│   ├── data_structures.py    # DataProcessor class
│   ├── api_client.py         # APIClient class
│   ├── numpy_ops.py          # ArrayOperations class
│   └── pandas_ops.py         # DataFrameHelper class
├── tests/
│   ├── __init__.py
│   ├── test_time_utils.py    # At least 5 test cases
│   ├── test_math_utils.py    # At least 5 test cases
│   └── test_api_client.py    # At least 5 test cases
├── examples/
│   └── demo.py               # Usage examples
├── main.py                   # Main entry point
├── requirements.txt          # Dependencies
├── output.txt                # Sample output from running main.py
└── README.md                 # Documentation
README.md Must Include:
  • Your full name and submission date
  • Installation instructions (pip install -r requirements.txt)
  • Usage examples for each module
  • API documentation for main classes
  • How to run tests
Do Include
  • All 7 package files (__init__.py plus the six modules) with complete classes
  • Docstrings for every class and method
  • Type hints for function parameters
  • At least 15 unit tests total
  • output.txt from running main.py
  • Comprehensive README.md
Do Not Include
  • __pycache__ folders
  • Virtual environment (venv/)
  • .pyc files
  • IDE config files (.idea, .vscode)
  • Code that doesn't run without errors
Important: Run python main.py and save the output to output.txt before submitting!
05

Grading Rubric

Your assignment will be graded on the following criteria:

Criteria                 Points   Description
Package Structure        20       Proper __init__.py, exports, module organization
TimeUtils & DateRange    20       datetime, time module usage, timezone handling
Statistics & Random      15       math, random modules, statistical calculations
DataProcessor            15       collections, itertools usage, data transformations
NumPy Operations         20       Array creation, statistics, transformations
Pandas Operations        25       DataFrame creation, filtering, grouping, aggregation
API Client               20       requests library, error handling, retry logic
Unit Tests               15       Test coverage for core functionality
Documentation            15       README, docstrings, code comments
Code Quality             10       Clean code, type hints, best practices
Total                    175
Bonus Points (up to 25)
  • +10 points - Create setup.py for pip-installable package
  • +5 points - Add data visualization methods using matplotlib
  • +5 points - Implement async API client with aiohttp
  • +5 points - Add CLI interface using argparse

06

What You Will Practice

Standard Library (9.1)

datetime, time, math, random, collections, itertools — Python's powerful built-in tools

Package Structure (9.2)

Creating proper Python packages with __init__.py, module exports, and virtual environments

Third-Party Libraries (9.3)

NumPy for numerical computing, Pandas for data manipulation, Requests for HTTP

Testing & Documentation

Writing unit tests with unittest, creating comprehensive documentation and docstrings

07

Pre-Submission Checklist

Code Requirements
Repository Requirements