PCA DEMO FACES#
Unsupervised Learning - Dimensionality Reduction - Principal Component Analysis (PCA)#
Notebook by:
Royi Avital RoyiAvital@fixelalgorithms.com
Revision History#
| Version | Date | User | Content / Changes |
|---|---|---|---|
| 1.0.000 | 13/04/2024 | Royi Avital | First version |
# Import Packages
# General Tools
import numpy as np
import scipy as sp
import pandas as pd
# Machine Learning
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
# Miscellaneous
import math
import os
from platform import python_version
import random
import timeit
# Typing
from typing import Callable, Dict, List, Optional, Self, Set, Tuple, Union
# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter
from IPython import get_ipython
from IPython.display import Image
from IPython.display import display
from ipywidgets import Dropdown, FloatSlider, interact, IntSlider, Layout, SelectionSlider
Notations#
(?) Question to answer interactively.
(!) Simple task to add code for the notebook.
(@) Optional / Extra self practice.
(#) Note / Useful resource / Food for thought.
Code Notations:
someVar = 2; #<! Notation for a variable
vVector = np.random.rand(4) #<! Notation for 1D array
mMatrix = np.random.rand(4, 3) #<! Notation for 2D array
tTensor = np.random.rand(4, 3, 2, 3) #<! Notation for nD array (Tensor)
tuTuple = (1, 2, 3) #<! Notation for a tuple
lList = [1, 2, 3] #<! Notation for a list
dDict = {1: 3, 2: 2, 3: 1} #<! Notation for a dictionary
oObj = MyClass() #<! Notation for an object
dfData = pd.DataFrame() #<! Notation for a data frame
dsData = pd.Series() #<! Notation for a series
hObj = plt.Axes() #<! Notation for an object / handler / function handler
Code Exercise#
Single line fill
valToFill = ???
Multi Line to Fill (At least one)
# You need to start writing
????
Section to Fill
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???
???
#===============================================================#
# Configuration
# %matplotlib inline
seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)
# Matplotlib default color palette
lMatPltLibclr = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
# sns.set_theme() #>! Apply SeaBorn theme
runInGoogleColab = 'google.colab' in str(get_ipython())
# Constants
FIG_SIZE_DEF = (8, 8)
ELM_SIZE_DEF = 50
CLASS_COLOR = ('b', 'r')
EDGE_COLOR = 'k'
MARKER_SIZE_DEF = 10
LINE_WIDTH_DEF = 2
# Courses Packages
import sys
sys.path.append('../')
sys.path.append('../../')
sys.path.append('../../../')
from utils.DataVisualization import PlotMnistImages
# General Auxiliary Functions
hOrdinalNum = lambda n: '%d%s' % (n, 'tsnrhtdd'[(((math.floor(n / 10) %10) != 1) * ((n % 10) < 4) * (n % 10))::4])
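The one liner above maps an integer to its ordinal string ('1st', '2nd', ...). An equivalent, readable sketch (the name `OrdinalNum` is illustrative, not part of the notebook):

```python
def OrdinalNum(n: int) -> str:
    if (n // 10) % 10 == 1:  #<! 11th, 12th, 13th (and 111th, 112th, ...)
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

print([OrdinalNum(n) for n in (1, 2, 3, 4, 11, 21)]) #<! -> ['1st', '2nd', '3rd', '4th', '11th', '21st']
```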
def PlotPcaReconstruction( mX: np.ndarray, dataIdx: int, mU: np.ndarray, vMean: np.ndarray, numComp: int, vSize: np.ndarray, hA: Optional[plt.Axes] = None, figSize: Tuple[int, int] = FIG_SIZE_DEF ) -> plt.Axes:
    if hA is None:
        hF, hA = plt.subplots(nrows = 1, ncols = 3, figsize = figSize)
    else:
        hF = hA.get_figure()
    vX = mX[dataIdx, :]
    if numComp == 0:
        vZ    = [0]
        vHatX = vMean
    else:
        vZ    = mU[:numComp] @ (vX - vMean)    #<! Encode
        vHatX = (mU[:numComp].T @ vZ) + vMean  #<! Decode
    mI   = np.clip(np.reshape(vX, vSize), 0, 1)
    mRec = np.clip(np.reshape(vHatX, vSize), 0, 1)
    hA[0].imshow(mI, cmap = 'gray')
    hA[0].set_title('Original Image')
    hA[1].imshow(mRec, cmap = 'gray')
    hA[1].set_title(f'Reconstructed Image, # Components: {numComp}')
    hA[2].stem(vZ, markerfmt = 'b.', label = 'Coefficients')
    hA[2].set_xlabel('Principal Component')
    hA[2].set_ylabel('Coefficient Value')
    return hA
Dimensionality Reduction by PCA#
In this notebook we'll use PCA for dimensionality reduction.
This notebook introduces:
Showing the PCA spectrum.
Showing the PCA reconstruction (Eigen Faces).
Eigen Faces#
One of the first successful approaches to face recognition is the concept of Eigenface.
Given enough data (images) of each subject, we build the PCA of that subject's face images.
We use those components as a means to recognize the person.
(#) PCA is the most basic dimensionality reduction operator.
(#) The PCA output is a linear combination of the input.
(#) Conceptually we may think of Dimensionality Reduction as a soft feature selection / mixture.
# Parameters
# Data
tImgSize = (64, 64)
numRows = 3
numCols = 3
# Model
Generate / Load Data#
In this notebook we’ll use the Olivetti Faces Data Set from AT&T.
The data set is available on SciKit Learn using fetch_olivetti_faces().
The data set itself is built like the MNIST, each row is an image.
The size of the images is (64, 64) and there are 40 classes.
There are ten different images of each of 40 distinct subjects.
For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses).
All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
# Load Data
mX, vY = fetch_olivetti_faces(return_X_y = True)
print(f'The features data shape: {mX.shape}')
print(f'The features data type: {mX.dtype}')
(?) Do we need to scale the data?
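One way to reason about the question: all pixels share the same unit and range, so standardization is not required; PCA only centers the data internally. A minimal sketch on synthetic data in the same `[0, 1]` range (the names `mA` and `oPcaDemo` are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch: features sharing the same unit / range need no standardization.
# PCA centers the data internally and stores the mean.
rng = np.random.default_rng(0)
mA  = rng.random((100, 16)) #<! 100 samples, 16 features, all in [0, 1]

oPcaDemo = PCA(n_components = 4).fit(mA)
assert np.allclose(oPcaDemo.mean_, mA.mean(axis = 0)) #<! Stored mean = column mean
```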
Plot Data#
# Plot the Data
hF = PlotMnistImages(mX, vY, numRows, numCols, tuImgSize = tImgSize)
Applying Dimensionality Reduction - PCA#
The PCA method essentially models the data as a Gaussian distribution.
Hence it decomposes the covariance ellipsoid into its principal radii, each along its own direction, orthogonal to the others.
Those directions are sorted by the variance of the data along them.
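This view can be checked numerically: the principal directions of the centered data are its right singular vectors, and the variance along each one follows from the singular values. A minimal sketch on synthetic data (names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch: PCA directions are the right singular vectors of the centered data,
# with variance s_i^2 / (n - 1) along each orthogonal direction.
rng = np.random.default_rng(42)
mA  = rng.normal(size = (200, 5))

mC = mA - mA.mean(axis = 0) #<! Center the data
_, vS, mVt = np.linalg.svd(mC, full_matrices = False)

oPcaDemo = PCA().fit(mA)
# The explained variances match the (scaled) squared singular values
assert np.allclose(oPcaDemo.explained_variance_, np.square(vS) / (mA.shape[0] - 1))
```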
# Applying the PCA Model
numComp = min(mX.shape)
numSamples = mX.shape[0]
oPCA = PCA(n_components = numComp) #<! Calculate all the components of the data (The default)
oPCA = oPCA.fit(mX)
Plot the Mean Image#
PCA works on centered data.
Hence the mean image is kept aside for the reconstruction.
# Plot the Mean Image
hF = PlotMnistImages(np.atleast_2d(oPCA.mean_), np.array(['Mean Image']), 1, 1, tuImgSize = tImgSize)
# Plot the PCA Spectrum
vλ = oPCA.explained_variance_ratio_
hF, hA = plt.subplots(figsize = (12, 6))
hA.stem(np.sqrt(vλ[:200]), markerfmt = 'b.', label = r'$\sqrt{\lambda_i}$')
hA.set_title('Eigen Values')
hA.set_xlabel('$i$')
hA.legend()
plt.show()
# Plot the Energy Ratio
vλ = oPCA.explained_variance_ratio_
hF, hA = plt.subplots(figsize = (12, 6))
hA.stem(vλ, markerfmt = 'b.', label = 'Ratio')
hA.set_title('Variance Ratio')
hA.set_xlabel('Component Index')
hA.legend()
plt.show()
(#) Look at the rate at which the accumulated explained energy grows.
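The accumulated energy can be computed with `np.cumsum()`. A sketch with an illustrative spectrum (with the fitted model one would use `oPCA.explained_variance_ratio_` instead):

```python
import numpy as np

# Sketch: number of leading components needed to keep 90% of the energy
vRatio     = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625]) #<! Illustrative spectrum
vCumEnergy = np.cumsum(vRatio)                            #<! Accumulated explained energy

numComp90 = int(np.searchsorted(vCumEnergy, 0.90) + 1)
print(numComp90) #<! -> 4
```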
# Plot the Components
mU = oPCA.components_ #<! mU.shape = (n_components, n_features)
hF, hAs = plt.subplots(nrows = 2, ncols = 5, figsize = (12, 6))
vIdx = list(range(5)) + list(range(numComp - 5, numComp))
for kk, hA in zip(range(10), hAs.flat):
idx = vIdx[kk]
mI = np.reshape(mU[idx], tImgSize)
hA.imshow(mI)
hA.set_title(f'{hOrdinalNum(idx + 1)} Principal Component')
hF.tight_layout()
plt.show()
PCA Reconstruction#
Encode: $\boldsymbol{z}_{i} = \boldsymbol{U}_{d}^{T} \left( \boldsymbol{x}_{i} - \boldsymbol{\mu}_{x} \right)$
Decode: $\hat{\boldsymbol{x}}_{i} = \boldsymbol{U}_{d} \boldsymbol{z}_{i} + \boldsymbol{\mu}_{x}$
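The Encode / Decode pair can be verified numerically. A sketch on synthetic data (names are illustrative); using all components makes the reconstruction exact:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
mA  = rng.random((50, 8))

oPcaDemo = PCA().fit(mA)
mUd, vMu = oPcaDemo.components_, oPcaDemo.mean_ #<! Rows of mUd are principal directions

vX    = mA[0]
vZ    = mUd @ (vX - vMu)      #<! Encode: z = U_d^T (x - mu)
vHatX = (mUd.T @ vZ) + vMu    #<! Decode: x_hat = U_d z + mu
assert np.allclose(vX, vHatX) #<! Exact with all 8 components
```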
# Plotting Function Wrapper
hPlotPcaReconstruction = lambda dataIdx, numComponents: PlotPcaReconstruction(mX, dataIdx, mU, oPCA.mean_, numComponents, tImgSize, figSize = (14, 4))
# Interactive Visualization
dataIdxSlider = IntSlider(min = 0, max = numSamples - 1, step = 1, value = 0, layout = Layout(width = '30%'))
numComponentsSlider = IntSlider(min = 0, max = numComp, step = 1, value = 0, layout = Layout(width = '30%'))
interact(hPlotPcaReconstruction, dataIdx = dataIdxSlider, numComponents = numComponentsSlider)
plt.show()
(?) Describe how the actual recognition of a given face is done.
(@) Remove one image from each class. Then build a recognition system based on all other images. Show the success rate.
How Will That Help to Recognize a New Face?#
Each known face is encoded as $\boldsymbol{z}_{i} = \boldsymbol{U}_{d}^{T} \left( \boldsymbol{x}_{i} - \boldsymbol{\mu}_{x} \right)$.
Then, for a new face, we calculate its encoding and compare it to the existing ones to find the closest match.
We define some threshold ($\epsilon_{\text{rec}}$) on the distance to decide whether the face is known.
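The steps above can be sketched as a nearest neighbor search in PCA space (a hypothetical helper; `RecognizeFace`, `epsRec` and the toy data are illustrative, not part of the notebook):

```python
import numpy as np

def RecognizeFace(mZ: np.ndarray, vY: np.ndarray, vZNew: np.ndarray, epsRec: float) -> int:
    vDist  = np.linalg.norm(mZ - vZNew, axis = 1) #<! Distance to each known encoding
    idxMin = int(np.argmin(vDist))
    if vDist[idxMin] > epsRec:
        return -1 #<! Unknown face
    return int(vY[idxMin])

# Toy usage: 2 known faces encoded in 2D
mZ = np.array([[0.0, 0.0], [5.0, 5.0]])
vY = np.array([0, 1])
print(RecognizeFace(mZ, vY, np.array([0.1, 0.0]), epsRec = 1.0)) #<! -> 0
print(RecognizeFace(mZ, vY, np.array([9.0, 9.0]), epsRec = 1.0)) #<! -> -1 (unknown)
```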
