
Project 2 - Principal Component Analysis (PCA) Implementation



Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while retaining as much variance as possible. It is widely used in data preprocessing, visualization, and noise reduction.

Theory and Background

PCA works by performing the following steps (a short numeric sketch follows the list):

  1. Compute the Mean: Center the data by subtracting the mean of each feature.
  2. Calculate the Covariance Matrix: This matrix measures how pairs of features vary together.
  3. Compute Eigenvalues and Eigenvectors: The eigenvectors of the covariance matrix define the new axes (the principal components), and each eigenvalue gives the variance captured along its eigenvector.
  4. Select Top Principal Components: Choose the 'k' eigenvectors with the largest eigenvalues; these point in the directions of maximum variance.
  5. Transform the Data: Project the centered data onto the selected eigenvectors to reduce its dimensionality.
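
Before the full implementation, here is a minimal sketch of these steps on a tiny, made-up 2-feature dataset (the values are purely illustrative):


import numpy as np

# Tiny illustrative dataset: 5 samples, 2 features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Step 1: center the data
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix (rowvar=False because features are columns)
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigendecomposition (eigh handles symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# Steps 4-5: keep the eigenvector with the largest eigenvalue and project
top = eigvecs[:, np.argmax(eigvals)]
X_1d = X_centered @ top
print(X_1d)  # the 2-feature data compressed onto one principal component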

Implementation

Below is the Python code that implements PCA from scratch:


import numpy as np

def pca(X, n_components):
    # Center the data by subtracting the mean of each feature
    X_meaned = X - np.mean(X, axis=0)

    # Compute the covariance matrix
    cov_matrix = np.cov(X_meaned, rowvar=False)

    # Compute eigenvalues and eigenvectors (np.linalg.eigh is suited to
    # symmetric matrices such as the covariance matrix)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort eigenvectors by eigenvalues in descending order
    sorted_indices = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, sorted_indices]
    eigenvalues = eigenvalues[sorted_indices]

    # Select top n_components eigenvectors
    selected_eigenvectors = eigenvectors[:, :n_components]

    # Transform data by projecting it onto the new principal component axes
    X_reduced = np.dot(X_meaned, selected_eigenvectors)
    return X_reduced
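
As a quick sanity check, the output of this function can be compared against scikit-learn's PCA (assuming scikit-learn is available). Each component's sign is arbitrary, so magnitudes are compared rather than raw values:


from sklearn.decomposition import PCA

X_check = np.random.default_rng(0).normal(size=(100, 5))

ours = pca(X_check, 2)
theirs = PCA(n_components=2).fit_transform(X_check)

# Eigenvector signs are arbitrary, so compare column magnitudes
print(np.allclose(np.abs(ours), np.abs(theirs)))  # expected: True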

Visualization and Application

We can apply PCA to the Iris dataset and visualize the transformed data in a 2D space (this reuses the pca function defined above):


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Apply PCA to reduce the dataset to 2 principal components
X_pca = pca(X, 2)

# Plot the transformed data
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Iris Dataset')
plt.colorbar(label='Target Label')
plt.show()

Output Visualization

The resulting plot visualizes the Iris dataset in 2D after applying PCA. The colors represent the three target labels (species), and the points are positioned by their scores on the first two principal components. PCA has reduced the data from 4 dimensions to 2, making visualization easy while preserving as much variance as possible.
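
To quantify how much variance the two components retain, we can recompute the eigenvalues directly (a short sketch, reusing X from the code above; the percentage quoted is approximate):


# Eigenvalues of the covariance matrix, sorted in descending order
eigenvalues = np.linalg.eigh(np.cov(X - X.mean(axis=0), rowvar=False))[0][::-1]

explained = eigenvalues[:2].sum() / eigenvalues.sum()
print(f"Variance retained by 2 components: {explained:.1%}")  # roughly 97-98% for Iris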

