# Linear Algebra for ML

## Vectors and Matrices
A vector $\mathbf{x} \in \mathbb{R}^n$ is a column of $n$ real numbers. A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ maps $\mathbb{R}^n \to \mathbb{R}^m$.
Key operations:
- Dot product: $\mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i$ — measures alignment
- Matrix multiply: $(\mathbf{AB})_{ij} = \sum_k A_{ik} B_{kj}$ — $O(n^3)$ for square matrices with the naive algorithm
- Transpose: $(\mathbf{AB})^\top = \mathbf{B}^\top \mathbf{A}^\top$
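A quick NumPy sanity check of the identities above (an illustrative sketch; the arrays are arbitrary random examples, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# Dot product: x . y = sum_i x_i y_i
assert np.isclose(x @ y, np.sum(x * y))

# Matrix multiply entrywise: (AB)_{ij} = sum_k A_{ik} B_{kj}
AB = A @ B
assert np.isclose(AB[1, 0], np.sum(A[1, :] * B[:, 0]))

# Transpose reverses the order of a product: (AB)^T = B^T A^T
assert np.allclose(AB.T, B.T @ A.T)
```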
## Eigendecomposition
For square matrix $\mathbf{A}$: $\mathbf{Av} = \lambda \mathbf{v}$, where $\lambda$ is an eigenvalue and $\mathbf{v}$ a (right) eigenvector.
If $\mathbf{A}$ is symmetric ($A_{ij} = A_{ji}$):
- All eigenvalues are real
- Eigenvectors for distinct eigenvalues are orthogonal (and an orthonormal eigenbasis can always be chosen)
- $\mathbf{A} = \mathbf{Q} \Lambda \mathbf{Q}^\top$ (spectral theorem)
Positive semidefinite (PSD): all eigenvalues $\geq 0$, equivalently $\mathbf{v}^\top \mathbf{A} \mathbf{v} \geq 0$ for all $\mathbf{v}$. Covariance matrices are always PSD, since $\mathbf{v}^\top \operatorname{Cov}(\mathbf{x}) \mathbf{v} = \operatorname{Var}(\mathbf{v}^\top \mathbf{x}) \geq 0$.
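The symmetric-matrix properties can be verified numerically. A minimal sketch, using a random symmetric matrix built for illustration (not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # symmetric by construction

eigvals, Q = np.linalg.eigh(A)          # eigh: for symmetric/Hermitian input
assert np.all(np.isreal(eigvals))       # eigenvalues are real
assert np.allclose(Q.T @ Q, np.eye(4))  # eigenvectors are orthonormal
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, A)  # spectral theorem

# A sample covariance matrix is PSD: all its eigenvalues are >= 0.
X = rng.standard_normal((4, 100))       # 4 variables, 100 samples
cov = np.cov(X)
assert np.all(np.linalg.eigvalsh(cov) >= -1e-10)  # tolerance for round-off
```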
## Singular Value Decomposition
Any matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$:
$$\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^\top$$
- $\mathbf{U} \in \mathbb{R}^{m \times m}$ — left singular vectors (orthonormal)
- $\Sigma \in \mathbb{R}^{m \times n}$ — diagonal, singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
- $\mathbf{V} \in \mathbb{R}^{n \times n}$ — right singular vectors (orthonormal)
Uses in ML: PCA (truncated SVD), matrix factorization, pseudo-inverse, low-rank approximations.
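A sketch of the SVD and the truncated (rank-$k$) approximation used in PCA and matrix factorization, on an arbitrary random matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)            # s is sorted in descending order
assert np.all(np.diff(s) <= 0)

# Reconstruct A = U Σ Vᵀ with the rectangular diagonal Σ.
Sigma = np.zeros((6, 4))
np.fill_diagonal(Sigma, s)
assert np.allclose(U @ Sigma @ Vt, A)

# Truncated SVD: keep the top-k singular triplets for a rank-k approximation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k
```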
## Norms
| Norm | Formula | Use |
|---|---|---|
| L1 | $\sum \lvert x_i \rvert$ | sparsity (Lasso) |
| L2 | $\sqrt{\sum x_i^2}$ | most common, smooth |
| L∞ | $\max \lvert x_i \rvert$ | worst-case error |
| Frobenius | $\sqrt{\sum_{ij} A_{ij}^2}$ | matrix L2 |
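The table's norms map directly onto `np.linalg.norm` (a worked check with hand-picked values, not from the source):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

assert np.isclose(np.linalg.norm(x, 1), 7.0)       # L1: |3| + |-4| + |0|
assert np.isclose(np.linalg.norm(x, 2), 5.0)       # L2: sqrt(9 + 16)
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)  # L∞: max |x_i|
assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.sqrt(np.sum(A**2)))           # Frobenius = matrix L2
```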
## Gradient as a Vector
If $f: \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(\mathbf{x}) \in \mathbb{R}^n$ points in the direction of steepest ascent.
The Jacobian $J \in \mathbb{R}^{m \times n}$ generalizes the gradient for $f: \mathbb{R}^n \to \mathbb{R}^m$. The Hessian $H \in \mathbb{R}^{n \times n}$ is the matrix of second derivatives; $H = \nabla^2 f$.
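As a concrete example (my choice of test function, not from the source): for the quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{A} \mathbf{x}$ with symmetric $\mathbf{A}$, the gradient is $\mathbf{Ax}$ and the Hessian is $\mathbf{A}$, which a central finite difference confirms:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                      # symmetric, so grad f(x) = A x
f = lambda x: 0.5 * x @ A @ x

x0 = rng.standard_normal(3)
eps = 1e-6
# Central finite-difference gradient, one coordinate direction at a time.
grad_fd = np.array([
    (f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(grad_fd, A @ x0, atol=1e-5)  # matches the analytic Ax
```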