Linear Algebra for ML#

Vectors and Matrices#

A vector $\mathbf{x} \in \mathbb{R}^n$ is a column of $n$ real numbers. A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ maps $\mathbb{R}^n \to \mathbb{R}^m$.

Key operations:

  • Dot product: $\mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i$ — measures alignment
  • Matrix multiply: $(\mathbf{AB})_{ij} = \sum_k A_{ik} B_{kj}$ — $O(n^3)$ naive
  • Transpose: $(\mathbf{AB})^\top = \mathbf{B}^\top \mathbf{A}^\top$
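These identities are easy to verify numerically. A quick sketch with NumPy (the shapes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(2, 3))   # maps R^3 -> R^2
B = rng.normal(size=(3, 4))

# Dot product equals the sum of elementwise products
assert np.isclose(x @ y, np.sum(x * y))

# Matrix multiply: (AB)_{ij} = sum_k A_{ik} B_{kj}
AB = A @ B                    # shape (2, 4)

# Transpose of a product reverses the order
assert np.allclose(AB.T, B.T @ A.T)
```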

Eigendecomposition#

For square matrix $\mathbf{A}$: $\mathbf{Av} = \lambda \mathbf{v}$, where $\lambda$ is an eigenvalue and $\mathbf{v}$ a (right) eigenvector.

If $\mathbf{A}$ is symmetric ($A_{ij} = A_{ji}$):

  • All eigenvalues are real
  • Eigenvectors can be chosen orthonormal (those for distinct eigenvalues are automatically orthogonal)
  • $\mathbf{A} = \mathbf{Q} \Lambda \mathbf{Q}^\top$ (spectral theorem)

Positive semidefinite (PSD): all eigenvalues $\geq 0$, or equivalently $\mathbf{x}^\top \mathbf{A} \mathbf{x} \geq 0$ for all $\mathbf{x}$. Covariance matrices are always PSD.
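A minimal check of these properties with NumPy's symmetric eigensolver (`np.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                        # symmetrize

lam, Q = np.linalg.eigh(A)               # real eigenvalues, orthonormal Q

# Spectral theorem: A = Q diag(lam) Q^T
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)
assert np.allclose(Q.T @ Q, np.eye(4))   # columns are orthonormal

# A sample covariance matrix is PSD: all eigenvalues >= 0
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / len(X)
assert np.all(np.linalg.eigvalsh(cov) >= -1e-12)  # tolerance for float error
```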

Singular Value Decomposition#

Any matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ factors as

$$\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^\top$$

  • $\mathbf{U} \in \mathbb{R}^{m \times m}$ — left singular vectors (orthonormal)
  • $\Sigma \in \mathbb{R}^{m \times n}$ — diagonal, singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
  • $\mathbf{V} \in \mathbb{R}^{n \times n}$ — right singular vectors (orthonormal)

Uses in ML: PCA (truncated SVD), matrix factorization, pseudo-inverse, low-rank approximations.
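The decomposition, the ordering of singular values, and the SVD route to the pseudo-inverse can all be checked directly (using the thin SVD; the division by $\sigma_i$ assumes full rank, which holds almost surely for a random Gaussian matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Thin SVD: U is (5, 3), s has 3 entries, Vt is (3, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction, and singular values sorted in decreasing order
assert np.allclose(A, U @ np.diag(s) @ Vt)
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)

# Rank-1 truncation: keep only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0])

# Pseudo-inverse via SVD (full-rank case) matches np.linalg.pinv
pinv = Vt.T @ np.diag(1 / s) @ U.T
assert np.allclose(pinv, np.linalg.pinv(A))
```

Truncating to the top $k$ singular values gives the best rank-$k$ approximation in Frobenius norm, which is exactly what PCA exploits.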

Norms#

| Norm | Formula | Use |
|------|---------|-----|
| L1 | $\sum_i \lvert x_i \rvert$ | sparsity (Lasso) |
| L2 | $\sqrt{\sum_i x_i^2}$ | most common, smooth |
| L∞ | $\max_i \lvert x_i \rvert$ | worst-case error |
| Frobenius | $\sqrt{\sum_{ij} A_{ij}^2}$ | matrix L2 |
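All four norms are available through `np.linalg.norm`; a small example with hand-checkable values:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

assert np.linalg.norm(x, 1) == 7.0        # L1: |3| + |-4| + |0|
assert np.linalg.norm(x, 2) == 5.0        # L2: sqrt(9 + 16)
assert np.linalg.norm(x, np.inf) == 4.0   # L-infinity: max |x_i|

A = np.array([[1.0, 2.0], [2.0, 0.0]])
assert np.linalg.norm(A, 'fro') == 3.0    # Frobenius: sqrt(1 + 4 + 4 + 0)
```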

Gradient as a Vector#

If $f: \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(\mathbf{x}) \in \mathbb{R}^n$ points in the direction of steepest ascent.

The Jacobian $J \in \mathbb{R}^{m \times n}$ generalizes the gradient to $f: \mathbb{R}^n \to \mathbb{R}^m$, with $J_{ij} = \partial f_i / \partial x_j$. The Hessian $H \in \mathbb{R}^{n \times n}$ is the matrix of second partial derivatives, $H_{ij} = \partial^2 f / \partial x_i \partial x_j$; written $H = \nabla^2 f$.
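A finite-difference sanity check ties these together. For the quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{A} \mathbf{x}$ with symmetric $\mathbf{A}$, the gradient is $\mathbf{A}\mathbf{x}$ and the Hessian is $\mathbf{A}$ (a standard closed form; the matrix and test point below are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric
f = lambda x: 0.5 * x @ A @ x            # f: R^2 -> R
grad = lambda x: A @ x                   # analytic gradient

x0 = np.array([1.0, -1.0])
eps = 1e-6

# Central finite differences along each coordinate direction
num_grad = np.array([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(num_grad, grad(x0), atol=1e-6)

# Stepping along the gradient increases f (steepest ascent)
assert f(x0 + 1e-3 * grad(x0)) > f(x0)
```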