\chapter{Linear systems}

A linear system:
\begin{equation*}
    \begin{cases}
        a_{1,1}x_1 + a_{1,2}x_2 + \dots + a_{1,n}x_n = b_1\\
        a_{2,1}x_1 + a_{2,2}x_2 + \dots + a_{2,n}x_n = b_2\\
        \hspace*{7em} \vdots \\
        a_{m,1}x_1 + a_{m,2}x_2 + \dots + a_{m,n}x_n = b_m\\
    \end{cases}
\end{equation*}
can be represented as:
\[ \matr{A}\vec{x} = \vec{b} \]
where:
\[
    \matr{A} =
    \begin{pmatrix}
        a_{1,1} & a_{1,2} & \hdots & a_{1,n} \\
        a_{2,1} & a_{2,2} & \hdots & a_{2,n} \\
        \vdots  & \vdots  & \ddots & \vdots  \\
        a_{m,1} & a_{m,2} & \hdots & a_{m,n}
    \end{pmatrix} \in \mathbb{R}^{m \times n}
    \hspace*{2em}
    %
    \vec{x} =
    \begin{pmatrix}
        x_1 \\ x_2 \\ \vdots \\ x_n
    \end{pmatrix} \in \mathbb{R}^n
    \hspace*{2em}
    %
    \vec{b} =
    \begin{pmatrix}
        b_1 \\ b_2 \\ \vdots \\ b_m
    \end{pmatrix} \in \mathbb{R}^m
\]

\section{Square linear systems}
\marginnote{Square linear system}
A square linear system $\matr{A}\vec{x} = \vec{b}$ with $\matr{A} \in \mathbb{R}^{n \times n}$ and $\vec{x}, \vec{b} \in \mathbb{R}^n$ has a unique solution iff one of the following equivalent conditions is satisfied:
\begin{enumerate}
    \item $\matr{A}$ is non-singular (invertible)
    \item $\text{rank}(\matr{A}) = n$ (full rank)
    \item $\matr{A}\vec{x} = \nullvec$ only admits the solution $\vec{x} = \nullvec$
\end{enumerate}

The solution can be algebraically determined as
\marginnote{Algebraic solution to linear systems}
\[ \matr{A}\vec{x} = \vec{b} \iff \vec{x} = \matr{A}^{-1}\vec{b} \]
However, this approach requires computing the inverse of a matrix, which has a time complexity of $O(n^3)$. Therefore, numerical methods are usually better suited. The two main families of methods are:
\begin{itemize}
    \item Direct methods.
    \item Iterative methods.
\end{itemize}

\section{Direct methods}
\marginnote{Direct methods}
Direct methods compute the solution of a linear system in a finite number of steps. Compared to iterative methods, they are more precise but more expensive. The most common approach is to factorize the matrix $\matr{A}$.

\subsection{Gaussian factorization}
\marginnote{Gaussian factorization\\(LU decomposition)}
Given a square linear system $\matr{A}\vec{x} = \vec{b}$, the matrix $\matr{A} \in \mathbb{R}^{n \times n}$ is factorized into $\matr{A} = \matr{L}\matr{U}$ such that:
\begin{itemize}
    \item $\matr{L} \in \mathbb{R}^{n \times n}$ is a lower triangular matrix.
    \item $\matr{U} \in \mathbb{R}^{n \times n}$ is an upper triangular matrix.
\end{itemize}
%
The system can be decomposed into:
\[
    \begin{split}
        \matr{A}\vec{x} = \vec{b}
        & \iff \matr{LU}\vec{x} = \vec{b} \\
        & \iff
        \begin{cases}
            \matr{L}\vec{y} = \vec{b} \\
            \vec{y} = \matr{U}\vec{x}
        \end{cases}
    \end{split}
\]
To find the solution, it is sufficient to solve in order:
\begin{enumerate}
    \item $\matr{L}\vec{y} = \vec{b}$ (solved w.r.t. $\vec{y}$)
    \item $\vec{y} = \matr{U}\vec{x}$ (solved w.r.t. $\vec{x}$)
\end{enumerate}
The overall complexity is $O(\frac{n^3}{3}) + 2 \cdot O(n^2) = O(\frac{n^3}{3})$, where $O(\frac{n^3}{3})$ is the time complexity of the LU factorization and $O(n^2)$ is the complexity of directly solving a system with a triangular matrix (forward or backward substitution).

\subsection{Gaussian factorization with pivoting}
\marginnote{Gaussian factorization with pivoting}
During the computation of $\matr{A} = \matr{L}\matr{U}$ (using Gaussian elimination\footnote{\url{https://en.wikipedia.org/wiki/LU\_decomposition\#Using\_Gaussian\_elimination}}), a division by 0 may occur. A method to prevent this problem (and to reduce the algorithmic error, e.g. overflow caused by divisions by very small pivots) is to change the order of the rows of $\matr{A}$ before decomposing it. This is achieved by using a permutation matrix $\matr{P}$, obtained by permuting the rows of the identity matrix. The permuted system becomes $\matr{P}\matr{A}\vec{x} = \matr{P}\vec{b}$ and the factorization is computed as $\matr{P}\matr{A} = \matr{L}\matr{U}$. The system can be decomposed into:
\[
    \begin{split}
        \matr{P}\matr{A}\vec{x} = \matr{P}\vec{b}
        & \iff \matr{L}\matr{U}\vec{x} = \matr{P}\vec{b} \\
        & \iff
        \begin{cases}
            \matr{L}\vec{y} = \matr{P}\vec{b} \\
            \vec{y} = \matr{U}\vec{x}
        \end{cases}
    \end{split}
\]
An alternative formulation (which is what \texttt{SciPy} uses) is defined as:
\[ \matr{A} = \matr{P}\matr{L}\matr{U} \iff \matr{P}^T\matr{A} = \matr{L}\matr{U} \]
It must be noted that $\matr{P}$ is orthogonal, so $\matr{P}^T = \matr{P}^{-1}$. The solution to the system ($\matr{P}^T\matr{A}\vec{x} = \matr{P}^T\vec{b}$) can be found as above.

\subsection{Cholesky factorization}
\marginnote{Cholesky factorization}
Given a symmetric positive definite matrix $\matr{A} \in \mathbb{R}^{n \times n}$, it is possible to decompose $\matr{A}$ as:
\[ \matr{A} = \matr{L}\matr{L}^T \]
where $\matr{L}$ is lower triangular. A square system where $\matr{A}$ is symmetric positive definite can be solved as above using the Cholesky factorization. This method has time complexity $O(\frac{n^3}{6})$.
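As an illustration, both factorizations are available as library routines. The following sketch solves a small system with the pivoted LU factorization and with the Cholesky factorization using \texttt{SciPy}; it is purely illustrative (the data is made up, and the matrix is chosen symmetric positive definite so that both factorizations apply).
\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve, cho_factor, cho_solve

# Illustrative data: A is symmetric positive definite.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Pivoted LU: factorize once (O(n^3/3)), then solve by forward and
# backward substitution (O(n^2)).
lu, piv = lu_factor(A)
x_lu = lu_solve((lu, piv), b)

# Cholesky: A = L L^T (only for symmetric positive definite A).
c, low = cho_factor(A)
x_chol = cho_solve((c, low), b)

print(np.allclose(A @ x_lu, b), np.allclose(A @ x_chol, b))
\end{verbatim}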
\section{Iterative methods}
\marginnote{Iterative methods}
Iterative methods solve a linear system by computing a sequence that converges to the exact solution. Compared to direct methods, they are less precise but computationally faster and better suited for large systems.

The overall idea is to build a sequence of vectors $\vec{x}_k$ that converges to the exact solution $\vec{x}^*$:
\[ \lim_{k \rightarrow \infty} \vec{x}_k = \vec{x}^* \]
Generally, the first vector $\vec{x}_0$ is given (or guessed). Subsequent vectors are computed w.r.t. the previous iteration as $\vec{x}_k = g(\vec{x}_{k-1})$.

The two most common families of iterative methods are:
\begin{descriptionlist}
    \item[Stationary methods] \marginnote{Stationary methods}
    compute the sequence as:
    \[ \vec{x}_k = \matr{B}\vec{x}_{k-1} + \vec{d} \]
    where $\matr{B}$ is called the iteration matrix and $\vec{d}$ is computed from the vector $\vec{b}$ of the system. The time complexity per iteration is $O(n^2)$.

    \item[Gradient-like methods] \marginnote{Gradient-like methods}
    have the form:
    \[ \vec{x}_k = \vec{x}_{k-1} + \alpha_{k-1}\vec{p}_{k-1} \]
    where $\alpha_{k-1} \in \mathbb{R}$ is the step size and the vector $\vec{p}_{k-1}$ is called the direction.
\end{descriptionlist}

\subsection{Stopping criteria}
\marginnote{Stopping criteria}
One or more stopping criteria are needed to determine when to truncate the sequence (as it is theoretically infinite). The most common approaches are:
\begin{descriptionlist}
    \item[Residual based] The algorithm is terminated when the current solution is close enough to the exact solution. The residual at iteration $k$ is computed as $\vec{r}_k = \vec{b} - \matr{A}\vec{x}_k$. Given a tolerance $\varepsilon$, the algorithm may stop when:
    \begin{itemize}
        \item $\Vert \vec{r}_k \Vert \leq \varepsilon$ (absolute)
        \item $\frac{\Vert \vec{r}_k \Vert}{\Vert \vec{b} \Vert} \leq \varepsilon$ (relative)
    \end{itemize}

    \item[Update based] The algorithm is terminated when the difference between consecutive iterates is sufficiently small. Given a tolerance $\tau$, the algorithm stops when:
    \[ \Vert \vec{x}_{k} - \vec{x}_{k-1} \Vert \leq \tau \]
\end{descriptionlist}
As the sequence is truncated, a truncation error is introduced when using iterative methods.
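To make the idea concrete, the following is a minimal sketch of a stationary method (the Jacobi iteration, based on splitting $\matr{A}$ into its diagonal and off-diagonal parts) combined with a relative residual stopping criterion. It is illustrative only: it assumes that $\matr{A}$ has nonzero diagonal entries and that the iteration converges for the given matrix (e.g. when $\matr{A}$ is strictly diagonally dominant).
\begin{verbatim}
import numpy as np

def jacobi(A, b, x0, eps=1e-8, max_iter=1000):
    # Jacobi iteration: x_k = D^{-1} (b - R x_{k-1}), where D is the
    # diagonal of A and R = A - D. Stops on the relative residual.
    D = np.diag(A)                      # diagonal entries of A
    R = A - np.diagflat(D)              # off-diagonal part of A
    x = x0.astype(float)
    for _ in range(max_iter):
        x = (b - R @ x) / D             # one O(n^2) iteration
        if np.linalg.norm(b - A @ x) <= eps * np.linalg.norm(b):
            break                       # ||r_k|| / ||b|| <= eps
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b, np.zeros(3))
print(np.allclose(A @ x, b, atol=1e-6))
\end{verbatim}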
\section{Condition number}
The inherent error causes inaccuracies during the resolution of a system. This problem is independent of the algorithm and is estimated assuming exact arithmetic.

Given a system $\matr{A}\vec{x} = \vec{b}$, we perturb $\matr{A}$ and/or $\vec{b}$ and study the error induced on the solution. For instance, if we perturb $\vec{b}$, we obtain the following system:
\[ \matr{A}\tilde{\vec{x}} = (\vec{b} + \Delta\vec{b}) \]
After finding $\tilde{\vec{x}}$, we can compute the inherent error as $\Delta\vec{x} = \tilde{\vec{x}} - \vec{x}$. By comparing the relative errors $\frac{\Vert \Delta\vec{x} \Vert}{\Vert \vec{x} \Vert}$ and $\frac{\Vert \Delta\vec{b} \Vert}{\Vert \vec{b} \Vert}$, we can measure the error introduced by the perturbation. It can be shown that:
\[
    \frac{\Vert \Delta\vec{x} \Vert}{\Vert \vec{x} \Vert}
    \leq
    \Vert \matr{A} \Vert \cdot \Vert \matr{A}^{-1} \Vert \cdot \frac{\Vert \Delta\vec{b} \Vert}{\Vert \vec{b} \Vert}
\]
Finally, we can define the \textbf{condition number} of a matrix $\matr{A}$ as:
\marginnote{Condition number}
\[ K(\matr{A}) = \Vert \matr{A} \Vert \cdot \Vert \matr{A}^{-1} \Vert \]
A system is \textbf{ill-conditioned} if $K(\matr{A})$ is large \marginnote{Ill-conditioned} (i.e. a small perturbation of the input causes a large change in the output). Otherwise, it is \textbf{well-conditioned}. \marginnote{Well-conditioned}

\section{Linear least squares problem}
A system $\matr{A}\vec{x} = \vec{b}$ with $\matr{A} \in \mathbb{R}^{m \times n} \text{, } m > n$ generally does not have an exact solution. \marginnote{Linear least squares}
Therefore, instead of finding the exact solution, it is possible to search for an $\tilde{\vec{x}}$ such that:
\[ \matr{A}\tilde{\vec{x}} - \vec{b} \approx \nullvec \]
In other words, we aim to find an $\tilde{\vec{x}}$ that comes as close as possible to solving the system. This problem is usually formulated as:
\[ \tilde{\vec{x}} = \arg\min_{\vec{x} \in \mathbb{R}^n} \Vert \matr{A}\vec{x} - \vec{b} \Vert_2^2 \]
It always admits a solution and, depending on $\text{rank}(\matr{A})$, there are two possible cases:
\begin{descriptionlist}
    \item[$\text{rank}(\matr{A}) = n$] The solution is unique for each $\vec{b} \in \mathbb{R}^m$.
    \marginnote{Normal equation}
    It is found by solving the normal equation:
    \[ \matr{A}^T\matr{A}\vec{x} = \matr{A}^T\vec{b} \]
    $\matr{A}^T\matr{A}$ is symmetric positive definite and the system can be solved using the Cholesky factorization (see the sketch after this list).

    \item[$\text{rank}(\matr{A}) < n$] The system admits infinitely many solutions. Of the set $S$ of all solutions, we are interested in the one with minimum norm:
    \[ \vec{x}^* = \arg\min_{\vec{x} \in S} \Vert \vec{x} \Vert_2 \]
\end{descriptionlist}
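As a minimal sketch of the full-rank case, the normal equation can be assembled and solved with the Cholesky factorization; the result is compared against \texttt{numpy.linalg.lstsq}, a dedicated least-squares routine. The data is made up for illustration and the approach assumes $\text{rank}(\matr{A}) = n$ (for ill-conditioned problems, dedicated routines are usually preferable to the normal equation).
\begin{verbatim}
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Illustrative overdetermined system (m > n) with full column rank.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equation: A^T A x = A^T b. When rank(A) = n, the matrix
# A^T A is symmetric positive definite, so Cholesky applies.
c, low = cho_factor(A.T @ A)
x_ne = cho_solve((c, low), A.T @ b)

# Reference solution from a dedicated least-squares routine.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ne, x_ls))
\end{verbatim}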