Virtual Expo 2026

Linear Algebraic Approaches to Image Classification

Envision CompSoc

Aim

To train a minimal convolutional neural network on MNIST and then fully deconstruct its learned weight matrices using Singular Value Decomposition, demonstrating that every convolutional layer is equivalently an SVD operation — and to make each component of that equivalence numerically verifiable and visually interpretable.

Introduction

Convolutional Neural Networks have achieved state-of-the-art performance on image classification tasks, yet the internal mathematical structure of a trained network remains opaque in most treatments. A conv layer is commonly described as a learnable bank of spatial filters — but what does gradient descent actually discover when it optimises those filters?

The central insight of this project is that the answer lies in linear algebra. When a conv layer's weight tensor of shape (out_ch, in_ch, kH, kW) is reshaped into a 2-D matrix W of shape (out_ch, in_ch·kH·kW), its full Singular Value Decomposition gives:

W  =  U · diag(σ) · Vᵀ

This is an identity — not an approximation. The forward pass through the layer for any input patch p can therefore be written in three explicit steps:

z  =  Vᵀ p          → project patch onto input eigenfilter basis
s  =  diag(σ) z     → scale each coordinate by its singular value
y  =  U s  =  W p   → rotate into output feature space
h  =  ReLU(y)       → carve piecewise-linear subspace

When gradient descent optimises W, it is implicitly discovering {U, σ, V} — the singular vectors and values. The singular values σᵢ encode how much discriminative energy the network places in each eigenfilter direction vᵢ, and their decay rate reveals the effective dimensionality of each layer. This project implements, verifies, and visualises every claim above across an 8-phase analysis pipeline.

Literature Survey and Technologies Used

Theoretical Background

  • Singular Value Decomposition (Golub & Van Loan, Matrix Computations) — foundational matrix factorisation underlying the entire analysis
  • im2col patch extraction — the standard implementation trick used in cuDNN and PyTorch that transforms convolution into matrix multiplication
  • Low-rank neural network compression (Denton et al., NIPS 2014; Jaderberg et al., ICLR 2014) — empirical observation that conv layer weight matrices are approximately low-rank, motivating the spectral energy analysis
  • Piecewise-linear network analysis (Montufar et al., NIPS 2014) — theoretical treatment of how ReLU networks partition input space into linear regions

Dataset

  • MNIST (LeCun et al., 1998) — 60,000 training and 10,000 test 28×28 greyscale handwritten digit images, 10 classes. Automatically downloaded via torchvision.datasets.MNIST on first run.

Technologies

  • Python 3.x — core language
  • PyTorch / torchvision — model definition, training, MNIST loading
  • NumPy + SciPy — SVD computation (numpy.linalg.svd), numerical verification
  • Matplotlib — all diagnostic visualisations

Methodology

The Network: TinyConvNet

To keep the analysis fully tractable, a deliberately small CNN is used. Bias terms are disabled in both conv layers so that the weight matrix alone captures the entire linear transformation.

Layer Operation Output Shape Weight Shape SVD Shapes
conv1 Conv2d(1, 8, 3×3, bias=False) (B, 8, 26, 26) (8, 9) U:(8,8), σ:(8,), Vᵀ:(8,9)
ReLU + MaxPool(2)   (B, 8, 13, 13)
conv2 Conv2d(8, 16, 3×3, bias=False) (B, 16, 11, 11) (16, 72) U:(16,16), σ:(16,), Vᵀ:(16,72)
ReLU + MaxPool(2)   (B, 16, 5, 5)
fc Linear(400, 10) (B, 10) (10, 400)

SVD Decomposition of Convolutional Layers

Each conv layer weight tensor is reshaped into a 2-D matrix W and the thin SVD is computed:

U, σ, Vᵀ = numpy.linalg.svd(W, full_matrices=False)
Component Shape Meaning
V (right singular vectors) (r, in_ch·kH·kW) Eigenfilters — orthogonal spatial patterns the layer is sensitive to, ranked by importance
σ (singular values) (r,) Importance weights — σᵢ measures how much energy the network places in eigenfilter direction vᵢ
U (left singular vectors) (out_ch, r) Feature directions — orthogonal basis for the output feature space

Low-Rank Approximation

A rank-k reconstruction is obtained by retaining only the top-k singular triplets:

W_k  =  (U[:, :k] * σ[:k])  @  Vᵀ[:k, :]

Sweeping k from 1 to r allows plotting reconstruction error and cumulative spectral energy vs rank, quantifying each layer's effective dimensionality.

Analytical Forward Pass

AnalyticalNet mirrors TinyConvNet but performs every computation in explicit SVD coordinates:

Layer 1:
  z1 = Vᵀ1 @ p        # project patch: (k,)
  s1 = σ1 * z1         # scale:         (k,)
  a1 = U1 @ s1 = W1@p  # lift:          (8,)
  h1 = ReLU(a1)

Layer 2 (on pooled patches f):
  z2 = Vᵀ2 @ f
  s2 = σ2 * z2
  a2 = U2 @ s2 = W2@f
  h2 = ReLU(a2)

output = W_fc @ flatten(MaxPool(h2)) + b_fc

ReLU Subspace Analysis

For each patch p, let M(p) = diag(1[Wp > 0]) be the binary activation mask. Then:

h(p)  =  ReLU(W p)  =  M(p) · W · p

Each unique mask defines a different effective linear map. Layer 1 has 8 output channels giving a theoretical maximum of 2⁸ = 256 distinct linear regions. The analysis counts unique masks across 500 MNIST patches and projects activations onto the U-basis to visualise the space contraction induced by ReLU.

The 8-Phase Pipeline

Phase What it does
1 — Train Train TinyConvNet on MNIST for N epochs; per-epoch loss and test accuracy logging
2 — SVD Decompose both conv layers; compute singular values, cumulative energy, and effective rank
3 — Analytical Build AnalyticalNet; verify W = U·Σ·Vᵀ holds to machine precision; print two-layer SVD composition
4 — Verify Confirm analytical im2col → W·p output matches PyTorch conv output numerically (max diff < 1e-4)
5 — Low-rank Sweep rank k = 1…r; compute relative reconstruction error and cumulative energy captured
6 — ReLU Analyse how ReLU partitions pre-activation space into piecewise-linear regions; count unique activation masks
7 — Composition Verify Layer 2 analytically; verify Parseval identity ‖W‖²_F = Σ σᵢ² for both layers
8 — Plots Generate all diagnostic figures to outputs/

Results

Metric Value Interpretation
W = U·Σ·Vᵀ identity (Layer 1) Max error ~8.9 × 10⁻⁸ Exact decomposition confirmed to machine epsilon
W = U·Σ·Vᵀ identity (Layer 2) Max error ~1.8 × 10⁻⁷ Exact decomposition confirmed to machine epsilon
Test accuracy (20 epochs) 98.71% Competitive with standard CNNs
Layer 1 singular values [2.170, 1.753, 1.496, 1.041, 0.787, 0.623, 0.534, 0.424] 6 of 8 components capture 95% energy
Layer 2 singular values [2.895, 2.480, 2.349, 2.070, …] (16 values) 13/16 for 95% energy; 15/16 for 99%
ReLU unique activation masks 40 out of theoretical max 256 (2⁸) Low effective dimensionality in practice
Frobenius identity (Layer 1) ‖W‖²_F = Σ σᵢ² = 12.5733 Parseval identity verified
Frobenius identity (Layer 2) ‖W‖²_F = Σ σᵢ² = 43.7439 Parseval identity verified
Analytical vs CNN forward pass Max diff < 1 × 10⁻⁶ (both layers) Forward pass equivalence confirmed

Conclusions / Future Scope

Conclusions

This project establishes, numerically and visually, that every convolutional layer is exactly an SVD. Training TinyConvNet on MNIST achieves 98.71% test accuracy, and post-training analysis confirms all theoretical claims to machine epsilon:

  • W = U·Σ·Vᵀ holds with max error below 10⁻⁷ for both layers
  • AnalyticalNet forward pass agrees with PyTorch conv to within 10⁻⁶
  • Parseval/Frobenius identity ‖W‖²_F = Σ σᵢ² verified for both layers
  • Gradient descent discovers eigenfilters with interpretable spatial structure (edges, blobs, diagonals)
  • ReLU activates only ~40 effective linear regions out of 256 theoretical maximum

These results provide a principled explanation for why low-rank compression of conv layers works: the bottom singular components carry negligible energy and removing them barely changes network output.

Future Scope

  • Deeper networks — extending SVD analysis to VGG or ResNet-style architectures with batch normalisation and residual connections
  • SVD-parameterised training — directly optimising {U, σ, V} as separate parameter groups to enforce orthogonality constraints during training
  • Structured pruning — using singular value spectra to guide layer-wise rank reduction with provable output bounds
  • Dataset generalisation — applying the pipeline to CIFAR-10 or ImageNet crops to study how spectrum complexity scales with input difficulty
  • Attention mechanisms — investigating analogous SVD decompositions in transformer attention weight matrices

https://meet.google.com/imi-iqzy-szv

Report Information

Explore More Projects

View All 2026 Projects