Matrix Calculus (for Machine Learning and Beyond)
Abstract

This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.14787
- Bibcode: 2025arXiv250114787B
- Keywords: Mathematics - History and Overview; Computer Science - Machine Learning; Mathematics - Numerical Analysis; Statistics - Machine Learning
- E-Print: Lecture notes for the MIT short course 18.063 "Matrix Calculus", based on the class as taught in January 2023 (also available on MIT OpenCourseWare)