Linear Algebra for ML#

Vectors and Matrices#

A vector $\mathbf{x} \in \mathbb{R}^n$ is a column of $n$ real numbers. A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ maps $\mathbb{R}^n \to \mathbb{R}^m$.

Key operations:

  • Dot product: $\mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i$ — measures alignment
  • Matrix multiply: $(\mathbf{AB})_{ij} = \sum_k A_{ik} B_{kj}$ — $O(n^3)$ naive
  • Transpose: $(\mathbf{AB})^\top = \mathbf{B}^\top \mathbf{A}^\top$
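These identities are easy to verify numerically. A quick sketch with NumPy (the shapes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(2, 3))   # maps R^3 -> R^2
B = rng.normal(size=(3, 4))

# Dot product equals the sum of elementwise products
assert np.isclose(x @ y, np.sum(x * y))

# Matrix multiply: (AB)_{ij} = sum_k A_{ik} B_{kj}
AB = A @ B                    # shape (2, 4)

# Transpose of a product reverses the order
assert np.allclose(AB.T, B.T @ A.T)
```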

Eigendecomposition#

For square matrix $\mathbf{A}$: $\mathbf{Av} = \lambda \mathbf{v}$, where $\lambda$ is an eigenvalue and $\mathbf{v}$ a (right) eigenvector.

If $\mathbf{A}$ is symmetric ($A_{ij} = A_{ji}$):

  • All eigenvalues are real
  • Eigenvectors can be chosen orthonormal (those for distinct eigenvalues are automatically orthogonal)
  • $\mathbf{A} = \mathbf{Q} \Lambda \mathbf{Q}^\top$ (spectral theorem)

Positive semidefinite (PSD): all eigenvalues $\geq 0$, or equivalently $\mathbf{x}^\top \mathbf{A} \mathbf{x} \geq 0$ for all $\mathbf{x}$. Covariance matrices are always PSD.
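A minimal check of these properties with NumPy's symmetric eigensolver (`np.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                        # symmetrize

lam, Q = np.linalg.eigh(A)               # real eigenvalues, orthonormal Q

# Spectral theorem: A = Q diag(lam) Q^T
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)
assert np.allclose(Q.T @ Q, np.eye(4))   # columns are orthonormal

# A sample covariance matrix is PSD: all eigenvalues >= 0
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / len(X)
assert np.all(np.linalg.eigvalsh(cov) >= -1e-12)  # tolerance for float error
```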

Singular Value Decomposition#

Any matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ factors as

$$\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^\top$$

  • $\mathbf{U} \in \mathbb{R}^{m \times m}$ — left singular vectors (orthonormal)
  • $\Sigma \in \mathbb{R}^{m \times n}$ — diagonal, singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
  • $\mathbf{V} \in \mathbb{R}^{n \times n}$ — right singular vectors (orthonormal)

Uses in ML: PCA (truncated SVD), matrix factorization, pseudo-inverse, low-rank approximations.
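The decomposition, the ordering of singular values, and the SVD route to the pseudo-inverse can all be checked directly (using the thin SVD; the division by $\sigma_i$ assumes full rank, which holds almost surely for a random Gaussian matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Thin SVD: U is (5, 3), s has 3 entries, Vt is (3, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction, and singular values sorted in decreasing order
assert np.allclose(A, U @ np.diag(s) @ Vt)
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)

# Rank-1 truncation: keep only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0])

# Pseudo-inverse via SVD (full-rank case) matches np.linalg.pinv
pinv = Vt.T @ np.diag(1 / s) @ U.T
assert np.allclose(pinv, np.linalg.pinv(A))
```

Truncating to the top $k$ singular values gives the best rank-$k$ approximation in Frobenius norm, which is exactly what PCA exploits.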

Norms#

| Norm | Formula | Use |
|------|---------|-----|
| L1 | $\sum_i \lvert x_i \rvert$ | sparsity (Lasso) |
| L2 | $\sqrt{\sum_i x_i^2}$ | most common, smooth |
| L∞ | $\max_i \lvert x_i \rvert$ | worst-case error |
| Frobenius | $\sqrt{\sum_{ij} A_{ij}^2}$ | matrix L2 |
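All four norms are available through `np.linalg.norm`; a small example with hand-checkable values:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

assert np.linalg.norm(x, 1) == 7.0        # L1: |3| + |-4| + |0|
assert np.linalg.norm(x, 2) == 5.0        # L2: sqrt(9 + 16)
assert np.linalg.norm(x, np.inf) == 4.0   # L-infinity: max |x_i|

A = np.array([[1.0, 2.0], [2.0, 0.0]])
assert np.linalg.norm(A, 'fro') == 3.0    # Frobenius: sqrt(1 + 4 + 4 + 0)
```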

Gradient as a Vector#

If $f: \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(\mathbf{x}) \in \mathbb{R}^n$ points in the direction of steepest ascent.

The Jacobian $J \in \mathbb{R}^{m \times n}$ generalizes the gradient to $f: \mathbb{R}^n \to \mathbb{R}^m$, with $J_{ij} = \partial f_i / \partial x_j$. The Hessian $H \in \mathbb{R}^{n \times n}$ is the matrix of second partial derivatives, $H_{ij} = \partial^2 f / \partial x_i \partial x_j$; written $H = \nabla^2 f$.
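A finite-difference sanity check ties these together. For the quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{A} \mathbf{x}$ with symmetric $\mathbf{A}$, the gradient is $\mathbf{A}\mathbf{x}$ and the Hessian is $\mathbf{A}$ (a standard closed form; the matrix and test point below are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric
f = lambda x: 0.5 * x @ A @ x            # f: R^2 -> R
grad = lambda x: A @ x                   # analytic gradient

x0 = np.array([1.0, -1.0])
eps = 1e-6

# Central finite differences along each coordinate direction
num_grad = np.array([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(num_grad, grad(x0), atol=1e-6)

# Stepping along the gradient increases f (steepest ascent)
assert f(x0 + 1e-3 * grad(x0)) > f(x0)
```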