Why Environment Setup Matters
Before you write a single line of code, you need a proper development environment. We'll use a standard Python installation with the pip package manager, a widely used approach that gives you full control and flexibility.
Python 3.11+
The latest Python with pip package manager included
Virtual Environment
Isolated workspace for your data science projects
VS Code Editor
Professional code editor with Python and Jupyter support
Data Science Stack
NumPy, pandas, matplotlib, scikit-learn, Jupyter
What We'll Install
Here's everything you'll need for professional data science development:
Python 3.11+
The core programming language. We recommend Python 3.11 or 3.12 for best performance and latest features.
Includes: pip (package manager), venv (virtual environments)
VS Code
Professional code editor with excellent Python and Jupyter support. Industry standard for development.
Features: IntelliSense, debugging, Git integration, extensions
Essential Data Science Libraries
NumPy - Numerical computing
pandas - Data manipulation
matplotlib - Visualization
seaborn - Statistical plots
scikit-learn - Machine learning
scipy - Scientific computing
jupyter - Interactive notebooks
ipykernel - Jupyter kernel
plotly - Interactive charts
Installing Python + pip
Let's get Python installed on your system. Follow the steps for your operating system:
Windows Installation
Download Python
Visit python.org/downloads and download Python 3.11 or later (look for the big yellow "Download Python 3.x" button)
Run the Installer
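Double-click the downloaded .exe file. On the first screen, check the "Add Python to PATH" option before starting the installation so the python command works in Command Prompt.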
Verify Installation
Open Command Prompt (search "cmd" in Start Menu) and run:
python --version
pip --version
You should see Python 3.11.x or later
macOS Installation
Download Python
Visit python.org/downloads and download the macOS installer (the universal2 installer runs on both Intel and Apple Silicon Macs)
Run the Installer
Double-click the .pkg file and follow the installation wizard
Verify Installation
Open Terminal (Applications → Utilities → Terminal) and run:
python3 --version
pip3 --version
Note: On macOS, use python3 instead of python
Linux Installation
Install via Package Manager
Most Linux distros come with Python. To install the latest version:
Ubuntu/Debian:
sudo apt update
sudo apt install python3.11 python3-pip python3-venv
Fedora:
sudo dnf install python3.11 python3-pip
Verify Installation
python3 --version
pip3 --version
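The version check can also be done from inside Python itself. The following is a minimal sketch, not part of the installer; the names MIN_VERSION and meets_minimum are made up for this illustration, and the 3.11 threshold simply mirrors the recommendation in this guide.

```python
import sys

MIN_VERSION = (3, 11)  # minimum version recommended in this guide


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of a version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this script
found = ".".join(str(part) for part in sys.version_info[:3])
if meets_minimum():
    print(f"Python {found} is new enough for this guide")
else:
    print(f"Python {found} is older than 3.11; consider upgrading")
```

Save it to a file and run it with python (Windows) or python3 (macOS/Linux) to see which interpreter your terminal actually starts.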
Creating a Virtual Environment
Virtual environments keep your projects isolated. Let's create one for data science:
Create Virtual Environment
Navigate to your projects folder and run:
Windows:
python -m venv ds_env
macOS/Linux:
python3 -m venv ds_env
Activate Virtual Environment
Windows:
ds_env\Scripts\activate
macOS/Linux:
source ds_env/bin/activate
You should see (ds_env) appear in your terminal prompt
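If the prompt prefix is hidden by your shell theme, you can ask Python directly whether it is running from a virtual environment. This is a small sketch using only the standard library; the function name in_virtual_environment is chosen for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the environment folder, while
    sys.base_prefix still points at the Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

With ds_env activated, the interpreter path should point inside the ds_env folder.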
Install Data Science Libraries
With your virtual environment activated, install essential libraries:
pip install numpy pandas matplotlib seaborn
pip install scikit-learn scipy jupyter ipykernel
pip install plotly
This may take 2-3 minutes depending on your connection
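Once pip finishes, one way to confirm what was installed is to read the package metadata from the active environment. The sketch below assumes the package names from the install commands above; installed_versions is a helper name invented for this example.

```python
from importlib import metadata

# Distribution names exactly as passed to pip install in the step above
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn",
            "scipy", "jupyter", "ipykernel", "plotly"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

Any line that reports NOT INSTALLED points to a package to install again with pip while the environment is active.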
Register Jupyter Kernel
So Jupyter can use your virtual environment:
python -m ipykernel install --user --name=ds_env --display-name "Python (DS)"
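To check that the registration worked, you can list the kernels Jupyter knows about. This is a hedged sketch: it relies on the jupyter_client package, which is normally installed alongside jupyter and ipykernel, and it returns an empty list if that package is missing. The function name registered_kernels is an assumption made for this example.

```python
def registered_kernels():
    """Return the sorted names of the Jupyter kernels visible to this interpreter."""
    try:
        # jupyter_client is normally pulled in as a dependency of jupyter/ipykernel
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return []  # Jupyter is not installed in this environment
    # find_kernel_specs() maps kernel names to their resource folders
    return sorted(KernelSpecManager().find_kernel_specs())


names = registered_kernels()
print("Registered kernels:", names)
print("ds_env registered:", "ds_env" in names)
```

Run it while ds_env is still active; the name to look for is the one passed to --name, not the display name.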
When you're finished working, type deactivate to leave the virtual environment.
Setting Up VS Code (Recommended Editor)
VS Code is one of the most popular code editors for Python and data science. It's free, powerful, and has excellent extensions.
Installation
Download VS Code
Visit code.visualstudio.com and download for your OS
Install Essential Extensions
Open VS Code, go to Extensions (Ctrl+Shift+X), and install:
- Python (by Microsoft) - Python language support
- Jupyter (by Microsoft) - Jupyter notebook support
- Pylance - Fast Python language server
- Python Indent - Auto-indent correction
- autoDocstring - Generate docstrings
Configure Python Interpreter
Press Ctrl+Shift+P (or Cmd+Shift+P on Mac), type "Python: Select Interpreter", and choose the ds_env virtual environment you created earlier
Testing Your Setup
Let's verify everything is working correctly by running some test code:
Test 1: Python & Basic Libraries
Create a new file called test_setup.py and add:
import sys
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
# Test NumPy
arr = np.array([1, 2, 3, 4, 5])
print(f"\nNumPy array: {arr}")
print(f"Mean: {arr.mean()}")
# Test Pandas
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})
print(f"\nPandas DataFrame:\n{df}")
print("\nAll libraries working correctly!")
Run it from terminal:
python test_setup.py
You should see output with version numbers and test data. If you see errors, revisit the installation steps.
Test 2: Jupyter Notebook
Launch Jupyter Notebook:
jupyter notebook
This should open your browser with the Jupyter interface. Create a new notebook and try:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Plot
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Test Plot: Sine Wave')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
If you see a sine wave plot, congratulations! Everything is working.
Common Issues & Solutions
Problem: "python" not recognized
Solution: Python wasn't added to your system PATH during installation.
- Windows: Reinstall Python and check "Add Python to PATH" option, or manually add it:
- Search "Environment Variables" in Start Menu
- Click "Environment Variables" button
- Under "System variables", find "Path" and click "Edit"
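- Add the folder that contains python.exe, for example C:\Users\YourName\AppData\Local\Programs\Python\Python311
- Mac/Linux: Add the folder that contains python3 (not the executable itself) to your ~/.bashrc or ~/.zshrc: export PATH="/usr/local/bin:$PATH"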
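If you can start any Python interpreter at all (for example by typing its full path), a short script can show what your shell would find on PATH. This is a sketch using the standard library; find_commands is a helper name made up for this example.

```python
import shutil
import sys


def find_commands(names):
    """Map each command name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Command names used in this guide: python/pip on Windows, python3/pip3 elsewhere
for command, location in find_commands(["python", "python3", "pip", "pip3"]).items():
    print(f"{command:<8} -> {location or 'not found on PATH'}")

print("Currently running:", sys.executable)
```

A command that shows as not found is one whose folder still needs to be added to PATH.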
Problem: Virtual environment won't activate
Solution: Different activation methods depending on your shell:
- Windows Command Prompt: ds_env\Scripts\activate.bat
- Windows PowerShell: ds_env\Scripts\Activate.ps1 (may need to enable scripts first)
- Mac/Linux: source ds_env/bin/activate
If PowerShell gives an error, run: Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Problem: "ModuleNotFoundError" when importing libraries
Solution: Either the library isn't installed, or you're not in your virtual environment:
# 1. Make sure venv is activated (you should see (ds_env) in prompt)
source ds_env/bin/activate # or ds_env\Scripts\activate on Windows
# 2. Install the missing library
pip install library-name
# 3. Verify it's installed
pip show library-name
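A common cause of this error is that the script runs under a different interpreter than the one pip installed into. The sketch below prints the interpreter in use and checks whether a few modules can be located without importing them; can_import is a name chosen for this example. Note that scikit-learn is imported as sklearn.

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


print("Interpreter in use:", sys.executable)
for module_name in ("numpy", "pandas", "sklearn"):
    status = "found" if can_import(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If the interpreter path does not point inside ds_env, activate the environment and run the script again.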
Problem: Jupyter Notebook won't start
Solution: Make sure Jupyter and ipykernel are installed in your virtual environment:
# Activate your virtual environment first
source ds_env/bin/activate
# Reinstall Jupyter
pip install --upgrade jupyter ipykernel
# Register the kernel again
python -m ipykernel install --user --name=ds_env --display-name "Python (DS)"
Problem: VS Code can't find Python interpreter
Solution:
- Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
- Type "Python: Select Interpreter"
- Look for your virtual environment path (e.g., ./ds_env/bin/python)
- If not listed, click "Enter interpreter path" and browse to ds_env/bin/python (or ds_env\Scripts\python.exe on Windows)
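To find the exact path to paste into "Enter interpreter path", you can build it for your platform. This is a minimal sketch assuming the environment folder is named ds_env and sits in the current directory; venv_python is a helper name invented for this example.

```python
import os
from pathlib import Path


def venv_python(env_dir, windows=(os.name == "nt")):
    """Return the path of the Python executable inside a venv folder."""
    env_dir = Path(env_dir)
    if windows:
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


interpreter = venv_python("ds_env")
print("Interpreter path:", interpreter.resolve())
print("Exists:", interpreter.exists())
```

Run it from the folder that contains ds_env. If Exists is False, the environment was probably created somewhere else.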
Problem: pip install is slow or failing
Solution: Try upgrading pip first:
python -m pip install --upgrade pip
# Then try installing again
pip install package-name
Introduction to Jupyter Notebooks
Jupyter Notebook
An open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text.
Think of it as a digital notebook where code, results, and explanations live together in perfect harmony.
Jupyter Notebooks are the industry standard for data science work. They let you write code in small chunks (cells), run them individually, see results instantly, and document everything with markdown. Perfect for exploration, analysis, and sharing insights!
Interactive Execution
Run code cells one at a time, see outputs immediately
Rich Output
Display plots, tables, images, HTML directly in notebook
Documentation
Mix code with markdown text, equations, and explanations
Launching Jupyter Notebook
Let's get Jupyter up and running. Make sure you've completed the environment setup above!
Activate Your Virtual Environment
Open your terminal and activate the virtual environment you created:
Windows:
ds_env\Scripts\activate
macOS/Linux:
source ds_env/bin/activate
Launch Jupyter Notebook
With your virtual environment activated, run:
jupyter notebook
This will start a local server and automatically open Jupyter in your default browser at http://localhost:8888
Create a New Notebook
In the Jupyter interface:
- Click the "New" button in the top right
- Select "Python (DS)" or "Python 3" from the dropdown
- A new notebook tab will open
Working with Cells
Notebooks are made up of cells. Mastering cells is key to being productive in Jupyter.
Code Cells
Where you write and execute Python code
- Has an In [ ]: indicator on the left
- Shows execution order number after running
- Output appears directly below the cell
Markdown Cells
Where you write formatted text and documentation
- No In [ ]: indicator
- Supports headings, lists, links, images
- Renders when you run the cell
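Behind the scenes, a notebook file (.ipynb) is a JSON document that stores these two cell types in a list. The sketch below builds a tiny notebook by hand to make that structure visible; the file name demo_notebook.ipynb and the cell contents are made up for illustration, and in practice Jupyter writes this file for you.

```python
import json

# A minimal notebook in the version 4 file format: one markdown cell, one code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My Analysis\n", "Notes rendered as formatted text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # stays empty until the cell is run
            "metadata": {},
            "outputs": [],            # filled in by Jupyter after execution
            "source": ["print('Hello, Jupyter!')"],
        },
    ],
}

# demo_notebook.ipynb is a hypothetical file name for this example
with open("demo_notebook.ipynb", "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)

print([cell["cell_type"] for cell in notebook["cells"]])
```

Opening the resulting file in Jupyter should show a rendered heading followed by an unexecuted code cell.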
Running Cells
Shift+Enter - Run and Move
Runs current cell and moves to the next one (creates new cell if at end)
Ctrl+Enter (or Cmd+Enter) - Run and Stay
Runs current cell but stays in the same cell
Alt+Enter (or Option+Enter) - Run and Insert
Runs current cell and inserts a new cell below
Essential Keyboard Shortcuts
Learning these shortcuts will noticeably speed up your work in Jupyter:
Navigation & Execution
| Enter | Enter edit mode |
| Esc | Enter command mode |
| Shift+Enter | Run cell, select below |
| Ctrl+Enter | Run cell, stay selected |
Cell Management (Command Mode)
| A | Insert cell above |
| B | Insert cell below |
| DD | Delete cell (press D twice) |
| M | Change to markdown |
| Y | Change to code |
Jupyter Best Practices
Follow these guidelines to create clean, professional notebooks:
DO
- Use descriptive markdown headings to organize sections
- Import all libraries at the top in the first cell
- Keep cells small and focused on one task
- Run cells in order from top to bottom
- Save frequently (Ctrl+S)
DON'T
- Have giant cells with hundreds of lines
- Run cells out of order (creates confusion)
- Leave sensitive data (passwords, API keys) in notebooks
- Forget to restart kernel when things break
- Share notebooks with 1000+ lines of output
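One way to follow the last point is to clear outputs before sharing a notebook. Jupyter has a menu option for this, but the idea can also be sketched in a few lines, since a notebook is a JSON document whose code cells carry an outputs list. The function name strip_outputs and the example notebook below are assumptions made for illustration.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and counters cleared."""
    cleaned = copy.deepcopy(notebook)  # leave the original untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook with one executed code cell
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
            "source": ["print(42)"],
        }
    ],
}

cleaned = strip_outputs(example)
print("Outputs before:", len(example["cells"][0]["outputs"]))
print("Outputs after:", len(cleaned["cells"][0]["outputs"]))
```

To apply it to a real file, load the .ipynb with json.load, pass the result through strip_outputs, and write it back with json.dump. Clearing outputs does not remove secrets typed into the code itself, so keep passwords and API keys out of cells entirely.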
Key Takeaways
Modern Python Setup
Python 3.11+ with pip and virtual environments is the industry-standard approach for data science development
Virtual Environments
Always use venv to isolate project dependencies. Keeps your system clean and projects independent
Jupyter for Interactive Work
Jupyter Notebooks let you write code in cells, run them individually, and see results instantly
Master Shortcuts
Learn Shift+Enter (run), A/B (insert cells), M/Y (markdown/code). These shortcuts noticeably speed up everyday notebook work
Command Line Basics
Master basic terminal commands for activating venv, installing packages with pip, and running Python scripts
VS Code Integration
VS Code with Python and Jupyter extensions gives you the best of both worlds: IDE power and notebook interactivity
Knowledge Check
Test your understanding of environment setup concepts:
What is the purpose of using a virtual environment (venv) in Python?
Which command is used to install a Python library using pip?
What is the purpose of adding Python to your system PATH?
Which of these is NOT one of the essential data science libraries we need to install?
What keyboard shortcut runs the current cell in a Jupyter Notebook?
What is the correct command to activate a virtual environment on Windows?