Module 1.2

Getting Started with Data Science

Set up your complete data science environment with Python, VS Code, and essential libraries. Then master Jupyter Notebooks, the interactive tool data scientists use every day!

45 min read
Beginner
Installation Guide
What You'll Learn
  • Install Python 3.11+ properly
  • Set up virtual environments
  • Configure VS Code for Python
  • Master Jupyter Notebooks
  • Install essential DS libraries
01

Why Environment Setup Matters

Before you write a single line of code, you need a proper development environment. We'll use the standard Python installation with the pip package manager, a widely used approach that gives you full control over which packages each project uses.

Simple Setup: We'll install Python, set up a virtual environment, install essential libraries, and configure VS Code. It takes about 15-20 minutes!
Python 3.11+

A recent Python release with the pip package manager included

Virtual Environment

Isolated workspace for your data science projects

VS Code Editor

Professional code editor with Python and Jupyter support

Data Science Stack

NumPy, pandas, matplotlib, scikit-learn, Jupyter

02

What We'll Install

Here's everything you'll need for professional data science development:

Python 3.11+

The core programming language. We recommend Python 3.11 or 3.12 for best performance and latest features.

Includes: pip (package manager), venv (virtual environments)

VS Code

Free code editor with excellent Python and Jupyter support, widely used for professional development.

Features: IntelliSense, debugging, Git integration, extensions

Essential Data Science Libraries

NumPy - Numerical computing

pandas - Data manipulation

matplotlib - Visualization

seaborn - Statistical plots

scikit-learn - Machine learning

scipy - Scientific computing

jupyter - Interactive notebooks

ipykernel - Jupyter kernel

plotly - Interactive charts

Time Required: About 15-20 minutes for complete setup on a good internet connection.
03

Installing Python + pip

Let's get Python installed on your system. Follow the steps for your operating system:

Windows Installation

1
Download Python

Visit python.org/downloads and download Python 3.11 or later (look for the big yellow "Download Python 3.x" button)

2
Run the Installer

Double-click the downloaded .exe file

⚠️ Critical: On the first installer screen, check "Add Python to PATH" (shown as "Add python.exe to PATH" in recent installers) before clicking Install Now. This is essential.
3
Verify Installation

Open Command Prompt (search "cmd" in Start Menu) and run:

python --version
pip --version

You should see Python 3.11.x or later

macOS Installation

1
Download Python

Visit python.org/downloads and download the macOS installer (a single universal2 installer works on both Intel and Apple Silicon M1/M2/M3 Macs)

2
Run the Installer

Double-click the .pkg file and follow the installation wizard

3
Verify Installation

Open Terminal (Applications → Utilities → Terminal) and run:

python3 --version
pip3 --version

Note: On macOS, use python3 and pip3 instead of python and pip (once a virtual environment is activated, plain python and pip work as well).

Linux Installation

1
Install via Package Manager

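Most Linux distributions already include Python 3. Check the version with python3 --version; if it is older than 3.11, install a newer one (package names and availability vary by distribution release):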

Ubuntu/Debian:

sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip

Fedora:

sudo dnf install python3.11 python3-pip
2
Verify Installation
python3 --version    # or python3.11 --version if you installed it alongside the system Python
pip3 --version
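Whichever operating system you use, you can also confirm the version from inside Python itself. The sketch below compares sys.version_info against 3.11; the helper name meets_minimum is our own, not part of any library.

```python
import sys

def meets_minimum(version_info, minimum=(3, 11)):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    # Tuples compare element by element, so (3, 12) >= (3, 11) is True
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    current = sys.version_info
    print(f"Running Python {current.major}.{current.minor}.{current.micro}")
    if meets_minimum(current):
        print("OK: Python 3.11 or newer")
    else:
        print("Please install Python 3.11 or newer")
```

Save it as, for example, check_version.py and run it with python check_version.py (python3 on macOS/Linux).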

Creating a Virtual Environment

Virtual environments keep your projects isolated. Let's create one for data science:

1
Create Virtual Environment

Navigate to your projects folder and run:

Windows:

python -m venv ds_env

macOS/Linux:

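python3 -m venv ds_env

(On Linux, if you installed python3.11 alongside an older system Python, use python3.11 -m venv ds_env instead.)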
2
Activate Virtual Environment

Windows:

ds_env\Scripts\activate

macOS/Linux:

source ds_env/bin/activate

You should see (ds_env) appear in your terminal prompt
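If you're ever unsure whether the environment is active, Python can tell you: inside a venv, sys.prefix points at the environment folder while sys.base_prefix still points at the base installation. A minimal check, assuming a standard venv (the function name in_virtualenv is our own):

```python
import sys
from pathlib import Path

def in_virtualenv():
    """True when running inside a venv: sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    if in_virtualenv():
        # For a venv, sys.prefix is the environment folder, e.g. .../ds_env
        print(f"Active environment: {Path(sys.prefix).name}")
    else:
        print("No virtual environment active (using the base Python installation)")
```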

3
Install Data Science Libraries

With your virtual environment activated, install essential libraries:

pip install numpy pandas matplotlib seaborn
pip install scikit-learn scipy jupyter ipykernel
pip install plotly

This may take 2-3 minutes depending on your connection
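To confirm the installs succeeded without importing each library, you can read installed package metadata with importlib.metadata from the standard library. Note that it expects distribution names as used with pip (scikit-learn), which can differ from import names (sklearn). The list mirrors the pip install commands above; the helper name installed_versions is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip install above
PACKAGES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scikit-learn", "scipy", "jupyter", "ipykernel", "plotly",
]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Run it with ds_env activated; any NOT INSTALLED line means that pip install step needs to be repeated.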

4
Register Jupyter Kernel

So Jupyter can use your virtual environment:

python -m ipykernel install --user --name=ds_env --display-name "Python (DS)"
Pro Tip: Always activate your virtual environment before working on data science projects. To deactivate, just type deactivate.
04

Setting Up VS Code (Recommended Editor)

VS Code is one of the most popular code editors for Python and data science. It's free, powerful, and has excellent extensions.

Installation

1
Download VS Code

Visit code.visualstudio.com and download for your OS

2
Install Essential Extensions

Open VS Code, go to Extensions (Ctrl+Shift+X, or Cmd+Shift+X on Mac), and install:

  • Python (by Microsoft) - Python language support
  • Jupyter (by Microsoft) - Jupyter notebook support
  • Pylance - Fast Python language server (installed automatically with the Python extension)
  • Python Indent - Auto-indent correction
  • autoDocstring - Generate docstrings
3
Configure Python Interpreter

Press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type "Python: Select Interpreter", and choose the interpreter inside your ds_env virtual environment (e.g., ./ds_env/bin/python, or .\ds_env\Scripts\python.exe on Windows)

Alternative: PyCharm Community Edition is another excellent free IDE specifically designed for Python development.
05

Testing Your Setup

Let's verify everything is working correctly by running some test code:

Test 1: Python & Basic Libraries

Create a new file called test_setup.py and add:

import sys
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt  # confirms the plotting module loads

print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")

# Test NumPy
arr = np.array([1, 2, 3, 4, 5])
print(f"\nNumPy array: {arr}")
print(f"Mean: {arr.mean()}")

# Test Pandas
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
print(f"\nPandas DataFrame:\n{df}")

print("\nAll libraries working correctly!")

Run it from the terminal (with ds_env activated):

python test_setup.py

You should see output with version numbers and test data. If you see errors, revisit the installation steps.

Test 2: Jupyter Notebook

Launch Jupyter Notebook:

jupyter notebook

This should open your browser with the Jupyter interface. Create a new notebook with the "Python (DS)" kernel and try:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Test Plot: Sine Wave')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()

If you see a sine wave plot, congratulations! Everything is working.

06

Common Issues & Solutions

Problem: "python" not recognized

Solution: Python wasn't added to your system PATH during installation.

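  • Windows: Reinstall Python and check "Add Python to PATH" option, or manually add it:
    1. Search "Environment Variables" in the Start Menu
    2. Click the "Environment Variables" button
    3. Under "User variables", find "Path" and click "Edit" (the default installer puts Python in your user folder)
    4. Click "New" and add both C:\Users\YourName\AppData\Local\Programs\Python\Python311 and C:\Users\YourName\AppData\Local\Programs\Python\Python311\Scripts (adjust YourName and the version folder to match your installation)
    5. Open a new Command Prompt so the change takes effect
  • Mac/Linux: Add the directory that contains python3 (not the python3 file itself) to your ~/.bashrc or ~/.zshrc, then open a new terminal:
    export PATH="/usr/local/bin:$PATH"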
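PATH is a list of directories separated by os.pathsep (";" on Windows, ":" on macOS/Linux), and the shell searches each directory for the command you type. That is why PATH entries must be folders, not the python executable itself. This small diagnostic lists the PATH entries and shows where, if anywhere, each command is found; the function names are our own. If python itself isn't found, run the script with the full path to the interpreter, or on Windows with the py launcher that the python.org installer includes by default.

```python
import os
import shutil

def path_entries():
    """Split the PATH environment variable into its individual directories."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]

def locate(command):
    """Return the first match for `command` on PATH (via shutil.which), or None."""
    return shutil.which(command)

if __name__ == "__main__":
    for entry in path_entries():
        print(entry)
    for cmd in ("python", "python3", "pip", "pip3"):
        print(f"{cmd:<8} -> {locate(cmd)}")
```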
Problem: Virtual environment won't activate

Solution: Different activation methods depending on your shell:

  • Windows Command Prompt: ds_env\Scripts\activate.bat
  • Windows PowerShell: ds_env\Scripts\Activate.ps1 (may need to enable scripts first)
  • Mac/Linux: source ds_env/bin/activate

If PowerShell gives an error, run: Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Problem: "ModuleNotFoundError" when importing libraries

Solution: Either the library isn't installed, or you're not in your virtual environment:

# 1. Make sure venv is activated (you should see (ds_env) in prompt)
source ds_env/bin/activate  # or ds_env\Scripts\activate on Windows

# 2. Install the missing library
pip install library-name

# 3. Verify it's installed (pip show works on every OS)
pip show library-name
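A ModuleNotFoundError usually means the interpreter running your code isn't the one you installed into. This sketch prints which interpreter is running and whether given modules can be found, using importlib.util.find_spec, which locates a module without importing it. It takes import names (sklearn), not pip names (scikit-learn); the function name is_importable is our own.

```python
import importlib.util
import sys

def is_importable(module_name):
    """True if this interpreter can find top-level `module_name` (without importing it)."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    # Should point inside ds_env (ds_env/bin/python or ds_env\Scripts\python.exe)
    print(f"Interpreter: {sys.executable}")
    for name in ("numpy", "pandas", "matplotlib", "sklearn"):
        status = "found" if is_importable(name) else "MISSING"
        print(f"{name:<12} {status}")
```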
Problem: Jupyter Notebook won't start

Solution: Make sure Jupyter and ipykernel are installed in your virtual environment:

# Activate your virtual environment first
source ds_env/bin/activate  # or ds_env\Scripts\activate on Windows

# Reinstall Jupyter
pip install --upgrade jupyter ipykernel

# Register the kernel again
python -m ipykernel install --user --name=ds_env --display-name "Python (DS)"
Problem: VS Code can't find Python interpreter

Solution:

  1. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
  2. Type "Python: Select Interpreter"
  3. Look for your virtual environment path (e.g., ./ds_env/bin/python)
  4. If not listed, click "Enter interpreter path" and browse to ds_env/bin/python (or ds_env\Scripts\python.exe on Windows)
Problem: pip install is slow or failing

Solution: Try upgrading pip first:

python -m pip install --upgrade pip

# Then try installing again
pip install package-name
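Running pip as python -m pip (as in the upgrade command above) ties pip to the same interpreter that python refers to, so packages land in the environment you expect. From inside a script, the equivalent is to call pip through sys.executable. A minimal sketch; the helper names pip_command and run_pip are our own:

```python
import subprocess
import sys

def pip_command(*args):
    """Build a pip command bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", *args]

def run_pip(*args):
    """Run pip for this interpreter and return the completed process."""
    return subprocess.run(pip_command(*args), capture_output=True, text=True)

if __name__ == "__main__":
    result = run_pip("--version")
    print(result.stdout or result.stderr)
```

Inside a Jupyter notebook, the %pip install magic serves the same purpose, installing into the kernel's own environment.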
07

Introduction to Jupyter Notebooks

Jupyter Notebook

An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text.

Think of it as a digital notebook where code, results, and explanations live side by side.

Jupyter Notebooks are a standard tool for data science work. They let you write code in small chunks (cells), run them individually, see results instantly, and document everything with markdown. Perfect for exploration, analysis, and sharing insights!

Interactive Execution

Run code cells one at a time, see outputs immediately

Rich Output

Display plots, tables, images, HTML directly in notebook

Documentation

Mix code with markdown text, equations, and explanations
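Rich output works because Jupyter asks a cell's result for its richest available representation. An object that defines a _repr_html_ method is rendered as HTML when it is the last expression in a cell; pandas DataFrames use this mechanism to display as tables. A toy example (the ScoreTable class is hypothetical):

```python
from html import escape

class ScoreTable:
    """Hypothetical example object that knows how to render itself as an HTML table."""

    def __init__(self, scores):
        self.scores = scores  # dict of name -> score

    def __repr__(self):
        # Plain-text fallback, used by print() and the terminal
        return f"ScoreTable({self.scores!r})"

    def _repr_html_(self):
        # Jupyter calls this and renders the returned HTML string
        rows = "".join(
            f"<tr><td>{escape(str(name))}</td><td>{score}</td></tr>"
            for name, score in self.scores.items()
        )
        return f"<table><tr><th>Name</th><th>Score</th></tr>{rows}</table>"

# In a notebook, making this the last line of a cell displays an HTML table
ScoreTable({"Alice": 91, "Bob": 78})
```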

08

Launching Jupyter Notebook

Let's get Jupyter up and running. Make sure you've completed the environment setup above!

1
Activate Your Virtual Environment

Open your terminal and activate the virtual environment you created:

Windows:

ds_env\Scripts\activate

macOS/Linux:

source ds_env/bin/activate
2
Launch Jupyter Notebook

With your virtual environment activated, run:

jupyter notebook

This will start a local server and automatically open Jupyter in your default browser at http://localhost:8888

3
Create a New Notebook

In the Jupyter interface:

  • Click the "New" button in the top right
  • Select "Python (DS)" or "Python 3" from the dropdown
  • A new notebook tab will open
Alternative: You can also use Jupyter within VS Code! Just open a .ipynb file or create a new one, and VS Code's Jupyter extension will handle it.
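Once the notebook is open, it's worth confirming that it runs on your ds_env kernel rather than some other Python. Run this in the first cell: sys.executable is the interpreter behind the kernel, and for a venv sys.prefix is the environment folder. The expected name ds_env assumes you kept the name from the setup steps; kernel_environment is our own helper.

```python
import sys
from pathlib import Path

def kernel_environment():
    """Return (interpreter path, environment folder name) for the running kernel."""
    return sys.executable, Path(sys.prefix).name

exe, env_name = kernel_environment()
print(f"Interpreter: {exe}")
print(f"Environment: {env_name}")
if env_name != "ds_env":
    # Assumes the venv was created as ds_env, as in the setup steps
    print('Not ds_env: switch via Kernel -> Change Kernel -> "Python (DS)"')
```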
09

Working with Cells

Notebooks are made up of cells. Mastering cells is key to being productive in Jupyter.

Code Cells

Where you write and execute Python code

  • Has In [ ]: indicator on the left
  • Shows execution order number after running
  • Output appears directly below the cell
Markdown Cells

Where you write formatted text and documentation

  • No In [ ]: indicator
  • Supports headings, lists, links, images
  • Renders when you run the cell

Running Cells

1
Shift+Enter - Run and Move

Runs current cell and moves to the next one (creates new cell if at end)

2
Ctrl+Enter (or Cmd+Enter) - Run and Stay

Runs current cell but stays in the same cell

3
Alt+Enter (or Option+Enter) - Run and Insert

Runs current cell and inserts a new cell below

10

Essential Keyboard Shortcuts

Learning these shortcuts will make you much faster in Jupyter:

Navigation & Execution
  • Enter - Enter edit mode
  • Esc - Enter command mode
  • Shift+Enter - Run cell, select below
  • Ctrl+Enter - Run cell, stay selected
Cell Management (Command Mode)
  • A - Insert cell above
  • B - Insert cell below
  • D, D - Delete cell (press D twice)
  • M - Change to markdown
  • Y - Change to code
Pro Tip: Press H in command mode to see the complete list of keyboard shortcuts in Jupyter.
11

Jupyter Best Practices

Follow these guidelines to create clean, professional notebooks:

DO
  • Use descriptive markdown headings to organize sections
  • Import all libraries at the top in the first cell
  • Keep cells small and focused on one task
  • Run cells in order from top to bottom
  • Save frequently (Ctrl+S)
DON'T
  • Have giant cells with hundreds of lines
  • Run cells out of order (creates confusion)
  • Leave sensitive data (passwords, API keys) in notebooks
  • Forget to restart the kernel when things break
  • Share notebooks with 1000+ lines of output
Important: Always restart and run all cells (Kernel → Restart & Run All, called "Restart Kernel and Run All Cells" in newer Jupyter versions) before sharing your notebook to ensure it runs correctly from top to bottom!
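Why does running cells out of order cause confusion? The kernel keeps every variable in memory, so a cell's result depends on which cells ran before it, and how many times. The sketch below imitates a kernel with exec() and one shared dictionary; the three example cells are hypothetical.

```python
# Three hypothetical notebook cells, stored as source strings
cells = [
    "prices = [10, 20, 30]",             # cell 1: load data
    "prices = [p * 2 for p in prices]",  # cell 2: transform it (overwrites prices)
    "total = sum(prices)",               # cell 3: summarize
]

def run_cells(order):
    """Run cells (by index) in the given order in one shared namespace, like a kernel."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace

# Restart & Run All: a fresh namespace, top to bottom
print(run_cells([0, 1, 2])["total"])     # 120

# Re-running cell 2 before cell 3 silently doubles the data twice
print(run_cells([0, 1, 1, 2])["total"])  # 240
```

Restarting the kernel is the equivalent of starting from an empty namespace, which is why Restart & Run All is the reliable check.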

Key Takeaways

Modern Python Setup

Python 3.11+ with pip and virtual environments is the industry-standard approach for data science development

Virtual Environments

Always use venv to isolate project dependencies. Keeps your system clean and projects independent

Jupyter for Interactive Work

Jupyter Notebooks let you write code in cells, run them individually, and see results instantly

Master Shortcuts

Learn Shift+Enter (run), A/B (insert cells), M/Y (markdown/code). These shortcuts save you a lot of time

Command Line Basics

Master basic terminal commands for activating venv, installing packages with pip, and running Python scripts

VS Code Integration

VS Code with Python and Jupyter extensions gives you the best of both worlds: IDE power and notebook interactivity

Knowledge Check

Test your understanding of environment setup concepts:

  1. What is the purpose of using a virtual environment (venv) in Python?
  2. Which command is used to install a Python library using pip?
  3. What is the purpose of adding Python to your system PATH?
  4. Which of these is NOT one of the essential data science libraries we need to install?
  5. What keyboard shortcut runs the current cell in a Jupyter Notebook?
  6. What is the correct command to activate a virtual environment on Windows?
