```python
def doolittle(A):
    """Performs Doolittle LU decomposition: A = LU, with L unit lower triangular and U upper triangular."""
    n = len(A)
    L = [[0.0]*n for _ in range(n)]
    U = [[0.0]*n for _ in range(n)]

    for i in range(n):
        # Upper Triangular
        for k in range(i, n):
            U[i][k] = A[i][k] - sum(L[i][j]*U[j][k] for j in range(i))
        # Lower Triangular
        L[i][i] = 1.0
        for k in range(i+1, n):
            if U[i][i] == 0:
                raise ZeroDivisionError("Zero pivot encountered.")
            L[k][i] = (A[k][i] - sum(L[k][j]*U[j][i] for j in range(i))) / U[i][i]
    return L, U

def print_matrix(M, name="Matrix"):
    print(f"{name} =")
    for row in M:
        print(" [" + " ".join(f"{val:8.3f}" for val in row) + "]")
    print()
```
The Moore-Penrose inverse and the Cholesky decomposition are two important tools in linear algebra for solving linear systems efficiently. They are widely used in applications including statistics, machine learning, and numerical analysis.
L.1 Properties of the Transpose
The transpose of a row vector is a column vector and vice versa. For matrices:
(kA)^\top = kA^\top \qquad \text{scalar multiplication} \tag{L.1}
(A^\top)^\top = A \qquad \text{involution} \tag{L.2}
(A+B)^\top = A^\top + B^\top \qquad \text{distributivity under addition} \tag{L.3}
(AB)^\top = B^\top A^\top \qquad \text{anti-distributivity (order reversal)} \tag{L.4}
Note that we swap the order of the matrices in the product when taking the transpose.
For any matrix A, both A^\top A and AA^\top are symmetric positive semidefinite; for square A the two products coincide exactly when A is normal (for example, when A is symmetric).
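These identities are easy to verify numerically; below is a minimal sketch using NumPy (the matrices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
k = 2.5

assert np.allclose((k * A).T, k * A.T)          # (L.1) scalar multiplication
assert np.allclose(A.T.T, A)                    # (L.2) involution
assert np.allclose((A + 2*A).T, A.T + 2*A.T)    # (L.3) distributivity
assert np.allclose((A @ B).T, B.T @ A.T)        # (L.4) order reversal
assert np.allclose((A.T @ A).T, A.T @ A)        # A^T A is symmetric
```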
L.2 Full Rank
A matrix is of full row rank if its rows are linearly independent, and of full column rank if its columns are linearly independent. A matrix is of full rank if its rank equals the smaller of its two dimensions, i.e., it has full row rank or full column rank.
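As a quick illustration (a sketch; `np.linalg.matrix_rank` is NumPy's standard rank routine):

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 1., 3.]])            # 2x3: rows independent
print(np.linalg.matrix_rank(A))         # 2 -> full (row) rank

B = np.array([[1., 2.],
              [2., 4.]])                # second row = 2 * first row
print(np.linalg.matrix_rank(B))         # 1 -> rank deficient
```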
L.3 Generalized (Moore-Penrose) Inverse
The Moore-Penrose pseudoinverse is a generalization of the matrix inverse that can be used to solve linear systems even when the matrix is rectangular or singular.
Definition L.1 (Definition of the Moore-Penrose Inverse 1) The Moore–Penrose inverse of the m × n matrix A is the n × m matrix, denoted by A^+, which satisfies the conditions
AA^+A = A \tag{L.5}
A^+AA^+ = A^+ \tag{L.6}
(AA^+)^\top = AA^+ \tag{L.7}
(A^+A)^\top = A^+A \tag{L.8}
An important feature of the Moore–Penrose inverse is that it is uniquely defined.
Corresponding to each m × n matrix A, one and only one n × m matrix A^+ exists satisfying conditions (Equation L.5)–(Equation L.8).
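NumPy computes the Moore–Penrose inverse via the SVD as `np.linalg.pinv`; here is a minimal sketch checking the four Penrose conditions on an arbitrary rectangular example:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 1.]])                # 3x2, rectangular
A_plus = np.linalg.pinv(A)              # 2x3 pseudoinverse

assert np.allclose(A @ A_plus @ A, A)              # (L.5)
assert np.allclose(A_plus @ A @ A_plus, A_plus)    # (L.6)
assert np.allclose((A @ A_plus).T, A @ A_plus)     # (L.7)
assert np.allclose((A_plus @ A).T, A_plus @ A)     # (L.8)
```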
Definition L.1 is the definition of a generalized inverse given by Penrose (1955).
The following alternative definition, which we will find useful on some occasions, utilizes properties of the Moore–Penrose inverse that were first illustrated by Moore (1935).
Definition L.2 (Definition of the Moore-Penrose Inverse 2) Let A be an m × n matrix. Then the Moore–Penrose inverse of A is the unique n × m matrix A^+ satisfying
AA^+ = P_{R(A)} \tag{L.9}
A^+ A = P_{R(A^+)} \tag{L.10}
where P_{R(A)} and P_{R(A^+)} are the projection matrices of the range spaces of A and A^+, respectively.
Theorem L.1 (Properties of the Moore-Penrose inverse) Let A be an m \times n matrix. Then:
- (\alpha A)^+ = \alpha^{-1} A^+ , \text{ if } \alpha \ne 0 \text{ is a scalar}
- (A^\top)^+ = (A^+)^\top
- (A^+)^+ = A
- A^+ = A^{-1} ,\text{if A is square and nonsingular}
- (A^\top A)^+ = A^+ (A^+)^\top and (AA^\top)^+ = (A^+)^\top A^+
- (AA^+)^+ = AA^+ and (A^+ A)^+ = A^+ A
- A^+ = (A^\top A)^+ A^\top = A^\top (AA^\top)^+
- A^+ = (A^\top A)^{-1} A^\top and A^+ A = I_n , \text{ if } rank(A) = n
- A^+ = A^\top (AA^\top)^{-1} and AA^+ = I_m , \text{ if } rank(A) = m
- A^+ = A^\top if the columns of A are orthonormal, that is, A^\top A = I_n
Theorem L.2 (Rank of Moore-Penrose Inverse) For any m \times n matrix A, \text{rank}(A) = \text{rank}(A^+) = \text{rank}(AA^+) = \text{rank}(A^+ A).
Theorem L.3 (Symmetric Moore-Penrose Inverse) Let A be an m × m symmetric matrix. Then: a. A^+ is also symmetric; b. AA^+ = A^+A; c. A^+ = A, if A is idempotent.
The Moore-Penrose inverse is particularly useful in maximum likelihood estimation (MLE) for linear models. In MLE, we often need to solve linear equations of the form Ax = b, where A is the design matrix and b is the response vector. If A is not full rank or is not square, we can use the Moore-Penrose inverse to find a solution that minimizes the residual sum of squares.
In the context of MLE, the Moore-Penrose inverse allows us to obtain parameter estimates even when the design matrix is singular or when there are more predictors than observations. This is achieved by projecting the response vector onto the column space of the design matrix, leading to a solution that is consistent and has desirable statistical properties.
We start with the linear model:
y = \mathbf{X} \boldsymbol{\beta} + \boldsymbol\varepsilon \qquad \boldsymbol\varepsilon \sim \mathcal{N} (0, v\mathbf{I})
We want the MLE of \boldsymbol{\beta}, which is given by: \hat{\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top y
In the Bayesian setting we can also show that the MLE coincides with the MAP estimate under a uniform prior on \boldsymbol{\beta}; since the noise is Gaussian, minimizing the negative log-likelihood is equivalent to minimizing the residual sum of squares.
If we carry out least squares, i.e. \ell_2-norm minimization, the Moore-Penrose inverse appears naturally in the solution.
We can write the objective explicitly as:
E_{\ell_2}(\boldsymbol{\beta}) = \frac{1}{2} \sum_i (y_i - \mathbf{x}_i^\top \boldsymbol{\beta})^2
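As a sketch of this in code (the dataset below is a synthetic example): when \mathbf{X} has full column rank, `np.linalg.pinv(X) @ y` agrees with the normal-equations estimate, and it keeps working (returning the minimum-norm least-squares solution) even when \mathbf{X}^\top\mathbf{X} is singular:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))            # design matrix, full column rank
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(20)

beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # normal-equations form
beta_pinv = np.linalg.pinv(X) @ y                # Moore-Penrose form

assert np.allclose(beta_normal, beta_pinv)
```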
L.4 Cholesky Decomposition
The Cholesky decomposition is a method for factorizing a positive definite matrix into the product of a lower triangular matrix and its transpose. It is particularly useful for solving systems of linear equations and for generating samples from multivariate normal distributions.
Definition L.3 (Definition of the Cholesky Decomposition) André-Louis Cholesky (1875–1918) was a cartographer in the French army, who introduced a method for decomposing a symmetric positive definite matrix into the product of a lower triangular matrix and its transpose. This decomposition is known as the Cholesky decomposition.
Let A be a symmetric positive definite matrix. The Cholesky decomposition of A is a factorization of the form: A = LL^\top \tag{L.11} where L is a lower triangular matrix with positive diagonal entries.
The Cholesky decomposition is unique for a given positive definite matrix, and it can be computed efficiently with algorithms closely related to the Doolittle and Crout LU algorithms (the Cholesky-Banachiewicz and Cholesky-Crout variants).
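For example (a sketch using NumPy's built-in `np.linalg.cholesky` on an arbitrary positive definite matrix):

```python
import numpy as np

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])          # symmetric positive definite

L = np.linalg.cholesky(A)             # lower triangular, positive diagonal
assert np.allclose(L @ L.T, A)        # A = L L^T (Equation L.11)
```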
L.4.1 Doolittle Algorithm
```python
A = [
    [2, 3, 1],
    [4, 7, 7],
    [6, 18, 22]
]

print_matrix(A, name="A")

L, U = doolittle(A)
print_matrix(L, name="L")
print_matrix(U, name="U")
```
```
A =
[ 2.000 3.000 1.000]
[ 4.000 7.000 7.000]
[ 6.000 18.000 22.000]
L =
[ 1.000 0.000 0.000]
[ 2.000 1.000 0.000]
[ 3.000 9.000 1.000]
U =
[ 2.000 3.000 1.000]
[ 0.000 1.000 5.000]
[ 0.000 0.000 -26.000]
```
L.4.2 Doolittle’s Algorithm with Partial Pivoting
When performing LU decomposition, it is often necessary to use partial pivoting to ensure numerical stability and to handle cases where the matrix may be singular or nearly singular. Partial pivoting involves swapping rows of the matrix to place the largest absolute value in the pivot position.
Adding partial pivoting is algebraically equivalent to multiplying the original matrix by a permutation matrix P, such that PA = LU, where P is a permutation matrix, L is a lower triangular matrix, and U is an upper triangular matrix.
```python
def doolittle_partial_pivoting(A):
    """Performs LU decomposition with partial pivoting: PA = LU."""
    n = len(A)
    # Deep copy of A
    A = [row[:] for row in A]
    P = list(range(n))
    L = [[0.0]*n for _ in range(n)]
    U = [[0.0]*n for _ in range(n)]

    for k in range(n):
        # Partial pivoting: find row with max abs value in column k
        pivot_row = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[pivot_row][k] == 0:
            raise ZeroDivisionError("Matrix is singular.")
        # Swap rows in A and record permutation
        A[k], A[pivot_row] = A[pivot_row], A[k]
        P[k], P[pivot_row] = P[pivot_row], P[k]
        for i in range(k):
            L[k][i], L[pivot_row][i] = L[pivot_row][i], L[k][i]
        # Compute U[k][k:] and L[k+1:][k]
        L[k][k] = 1.0
        for j in range(k, n):
            U[k][j] = A[k][j] - sum(L[k][s]*U[s][j] for s in range(k))
        for i in range(k+1, n):
            L[i][k] = (A[i][k] - sum(L[i][s]*U[s][k] for s in range(k))) / U[k][k]

    # Permutation matrix P as a 2D matrix
    P_matrix = [[1 if j == P[i] else 0 for j in range(n)] for i in range(n)]
    return P_matrix, L, U
```
Demo for Doolittle’s algorithm with partial pivoting:
```python
A = [
    [0, 3, 1],
    [4, 7, 7],
    [6, 18, 22]
]

print_matrix(A, name="A")

P, L, U = doolittle_partial_pivoting(A)
print_matrix(P, name="P")
print_matrix(L, name="L")
print_matrix(U, name="U")
```
```
A =
[ 0.000 3.000 1.000]
[ 4.000 7.000 7.000]
[ 6.000 18.000 22.000]
P =
[ 0.000 0.000 1.000]
[ 0.000 1.000 0.000]
[ 1.000 0.000 0.000]
L =
[ 1.000 0.000 0.000]
[ 0.667 1.000 0.000]
[ 0.000 -0.600 1.000]
U =
[ 6.000 18.000 22.000]
[ 0.000 -5.000 -7.667]
[ 0.000 0.000 -3.600]
```
```python
A = [
    [2, 1, 1, 3, 2],
    [1, 2, 2, 1, 1],
    [3, 2, 3, 2, 1],
    [2, 1, 2, 2, 1],
    [1, 1, 1, 1, 1]
]

P, L, U = doolittle_partial_pivoting(A)

print_matrix(P, "P")
print_matrix(L, "L")
print_matrix(U, "U")
```
```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def permute(A, P):
    """P is a permutation matrix; return PA."""
    return matmul(P, A)

PA = permute(A, P)
LU = matmul(L, U)

# Print comparison
print_matrix(PA, "PA")
print_matrix(LU, "LU")
```
```
P =
[ 0.000 0.000 1.000 0.000 0.000]
[ 0.000 1.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 1.000 0.000]
[ 1.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 1.000]
L =
[ 1.000 0.000 0.000 0.000 0.000]
[ 0.333 1.000 0.000 0.000 0.000]
[ 0.667 -0.250 1.000 0.000 0.000]
[ 0.667 -0.250 -3.000 1.000 0.000]
[ 0.333 0.250 -1.000 0.250 1.000]
U =
[ 3.000 2.000 3.000 2.000 1.000]
[ 0.000 1.333 1.000 0.333 0.667]
[ 0.000 0.000 0.250 0.750 0.500]
[ 0.000 0.000 0.000 4.000 3.000]
[ 0.000 0.000 0.000 0.000 0.250]
PA =
[ 3.000 2.000 3.000 2.000 1.000]
[ 1.000 2.000 2.000 1.000 1.000]
[ 2.000 1.000 2.000 2.000 1.000]
[ 2.000 1.000 1.000 3.000 2.000]
[ 1.000 1.000 1.000 1.000 1.000]
LU =
[ 3.000 2.000 3.000 2.000 1.000]
[ 1.000 2.000 2.000 1.000 1.000]
[ 2.000 1.000 2.000 2.000 1.000]
[ 2.000 1.000 1.000 3.000 2.000]
[ 1.000 1.000 1.000 1.000 1.000]
```
L.4.3 Vectorization of the Doolittle Algorithm
```python
import numpy as np

def doolittle_numpy(A):
    """LU decomposition with partial pivoting using NumPy. Returns P, L, U such that PA = LU."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P = np.eye(n)
    L = np.zeros((n, n))
    U = A.copy()

    for k in range(n):
        # Partial pivoting
        pivot = np.argmax(abs(U[k:, k])) + k
        if U[pivot, k] == 0:
            raise ZeroDivisionError("Matrix is singular.")
        if pivot != k:
            U[[k, pivot]] = U[[pivot, k]]
            P[[k, pivot]] = P[[pivot, k]]
            L[[k, pivot], :k] = L[[pivot, k], :k]

        L[k, k] = 1.0
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:] -= np.outer(L[k+1:, k], U[k])

    return P, L, U

def random_sign_matrix(n, seed=None):
    """Generate an n×n matrix with random entries in {-1, 0, 1}."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1, 0, 1], size=(n, n))
```
```python
A = [
    [2, 3, 1],
    [4, 7, 7],
    [6, 18, 22]
]
A = random_sign_matrix(16, seed=42)  # overwrites the small example with a random 16x16 test matrix

print_matrix(A, name="A")

P, L, U = doolittle_numpy(A)

print_matrix(P, name="P")
print_matrix(L, name="L")
print_matrix(U, name="U")
```
```
A =
[ -1.000 1.000 0.000 0.000 0.000 1.000 -1.000 1.000 -1.000 -1.000 0.000 1.000 1.000 1.000 1.000 1.000]
[ 0.000 -1.000 1.000 0.000 0.000 0.000 -1.000 1.000 1.000 0.000 0.000 1.000 0.000 0.000 0.000 -1.000]
[ -1.000 0.000 1.000 -1.000 1.000 1.000 -1.000 0.000 -1.000 1.000 1.000 0.000 -1.000 1.000 0.000 1.000]
[ 1.000 1.000 1.000 -1.000 0.000 0.000 0.000 -1.000 0.000 -1.000 1.000 1.000 1.000 1.000 0.000 1.000]
[ 0.000 -1.000 1.000 0.000 -1.000 0.000 1.000 -1.000 0.000 -1.000 1.000 0.000 -1.000 -1.000 0.000 1.000]
[ 1.000 0.000 -1.000 1.000 0.000 1.000 -1.000 -1.000 1.000 1.000 0.000 1.000 1.000 0.000 1.000 -1.000]
[ -1.000 1.000 0.000 -1.000 1.000 -1.000 1.000 -1.000 1.000 1.000 1.000 0.000 0.000 1.000 -1.000 1.000]
[ 0.000 0.000 0.000 0.000 -1.000 -1.000 -1.000 -1.000 0.000 1.000 0.000 0.000 1.000 0.000 -1.000 1.000]
[ 0.000 0.000 0.000 0.000 -1.000 0.000 1.000 -1.000 0.000 -1.000 0.000 0.000 1.000 -1.000 -1.000 0.000]
[ 1.000 1.000 -1.000 -1.000 1.000 -1.000 1.000 -1.000 1.000 -1.000 0.000 0.000 -1.000 0.000 0.000 1.000]
[ 1.000 0.000 0.000 0.000 0.000 1.000 -1.000 -1.000 0.000 -1.000 -1.000 -1.000 1.000 1.000 1.000 0.000]
[ 1.000 -1.000 1.000 0.000 1.000 -1.000 0.000 1.000 0.000 0.000 -1.000 0.000 -1.000 -1.000 1.000 0.000]
[ 0.000 0.000 1.000 -1.000 0.000 -1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 -1.000 1.000 1.000]
[ -1.000 1.000 -1.000 1.000 1.000 0.000 1.000 -1.000 -1.000 -1.000 0.000 1.000 -1.000 0.000 1.000 -1.000]
[ 1.000 -1.000 1.000 0.000 0.000 -1.000 0.000 1.000 -1.000 1.000 0.000 1.000 0.000 0.000 -1.000 0.000]
[ -1.000 0.000 -1.000 0.000 0.000 -1.000 1.000 0.000 1.000 -1.000 -1.000 0.000 -1.000 -1.000 0.000 -1.000]
P =
[ 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000]
L =
[ 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -1.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -1.000 1.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -1.000 0.500 0.750 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -1.000 0.000 -0.500 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 1.000 -0.500 -0.750 -1.000 0.667 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -0.000 0.000 -0.000 0.000 -0.667 -0.571 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 1.000 0.000 0.500 0.667 0.667 -0.714 -0.938 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ 1.000 0.000 -0.000 -0.667 0.333 -0.143 -0.187 0.537 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -0.000 -0.500 -0.750 -0.333 0.333 0.286 0.375 -0.232 0.670 1.000 0.000 0.000 0.000 0.000 0.000 0.000]
[ -1.000 0.500 0.250 0.333 0.000 0.429 0.125 -0.189 -0.393 0.619 1.000 0.000 0.000 0.000 0.000 0.000]
[ -0.000 0.000 -0.500 -0.667 0.000 0.000 0.438 -0.242 0.214 -0.636 -0.177 1.000 0.000 0.000 0.000 0.000]
[ -1.000 0.000 -0.500 0.000 0.333 -0.143 0.250 -0.211 -0.625 -0.996 -0.786 0.408 1.000 0.000 0.000 0.000]
[ -0.000 0.000 -0.000 0.000 -0.667 -0.143 -0.188 0.032 -0.161 0.360 0.480 0.771 0.733 1.000 0.000 0.000]
[ -0.000 -0.500 -0.750 -0.333 -0.333 0.143 -0.250 0.042 -0.045 0.933 0.070 0.108 0.143 0.431 1.000 0.000]
[ 1.000 -0.500 0.250 -0.333 -0.333 -0.286 0.062 0.158 0.554 0.895 0.568 0.802 0.025 0.140 -0.221 1.000]
U =
[ -1.000 1.000 0.000 0.000 0.000 1.000 -1.000 1.000 -1.000 -1.000 0.000 1.000 1.000 1.000 1.000 1.000]
[ 0.000 2.000 1.000 -1.000 0.000 1.000 -1.000 0.000 -1.000 -2.000 1.000 2.000 2.000 2.000 1.000 2.000]
[ 0.000 0.000 -2.000 0.000 1.000 -1.000 1.000 0.000 1.000 0.000 -1.000 -1.000 -2.000 -1.000 0.000 0.000]
[ 0.000 0.000 0.000 1.500 -0.750 2.250 -2.250 0.000 -0.250 1.000 0.250 1.750 2.500 0.750 1.500 -1.000]
[ 0.000 0.000 0.000 0.000 1.500 -0.500 -0.500 2.000 -0.500 -1.000 -1.500 0.500 -1.000 -0.500 2.000 1.000]
[ 0.000 0.000 0.000 0.000 0.000 2.333 -1.667 -2.333 0.333 2.667 2.000 0.667 0.667 1.333 -0.333 -0.667]
[ 0.000 0.000 0.000 0.000 0.000 0.000 -2.286 -1.000 -0.143 1.857 0.143 0.714 0.714 0.429 0.143 1.286]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.938 0.104 3.646 2.896 0.146 -0.854 0.688 -2.437 -1.271]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.965 1.772 0.425 0.151 1.688 0.568 -0.379 -0.172]
[ 0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 -2.134 0.095 1.141 -1.120 -0.096 0.064 -1.137]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -1.551 -2.328 1.486 0.788 0.474 0.854]
[ 0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 0.000 0.000 1.671 -0.663 -1.065 1.553 -1.072]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.449 2.208 -1.697 -0.149]
[ 0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -2.134 0.123 1.760]
[ 0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.765 2.843]
[ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.802]
```
L.4.4 Crout’s Algorithm
Crout’s algorithm is another method for performing LU decomposition, similar to Doolittle’s algorithm. Where Doolittle places the unit diagonal on L, Crout constructs a general lower triangular L directly and gives U a unit diagonal.
The key parts are the formula for computing the entries of L (for i \ge j):
L[i][j] = A[i][j] - \sum_{k=0}^{j-1} L[i][k] U[k][j]
and, using it, the formula for computing the entries of U (for i > j):
U[j][i] = \frac{A[j][i] - \sum_{k=0}^{j-1} L[j][k] U[k][i]}{L[j][j]}
```python
import numpy as np

def crout_lu(A, *, pivot=True, rtol=1e-9, atol=None):
    """
    Robust Crout LU with partial pivoting.

    Improvements
    ------------
    1. Pivot is selected from the *updated* column, eliminating the
       "false zero-pivot" issue.
    2. Test `abs(pivot) < max(atol, rtol*‖A‖∞)` so tolerance scales with data.

    Returns L, U, P such that P @ A = L @ U.
    """
    A = np.asarray_chkfinite(A, dtype=float)
    n = A.shape[0]
    if A.ndim != 2 or n != A.shape[1]:
        raise ValueError("square matrix required")

    L = np.zeros_like(A)
    U = np.eye(n, dtype=A.dtype)
    P = np.eye(n, dtype=A.dtype)
    rows = np.arange(n)

    if atol is None:
        atol = np.finfo(A.dtype).eps * np.linalg.norm(A, np.inf) * 10

    for k in range(n):
        # current residual column
        col = A[rows[k:], k] - L[k:, :k] @ U[:k, k]

        if pivot:
            j = k + np.argmax(np.abs(col))  # best row
            if np.abs(col[j - k]) < max(atol, rtol*np.abs(col).max()):
                raise np.linalg.LinAlgError("matrix is numerically singular")
            if j != k:  # swap logical rows
                rows[[k, j]] = rows[[j, k]]
                L[[k, j], :k] = L[[j, k], :k]
                P[[k, j]] = P[[j, k]]
                col[[0, j - k]] = col[[j - k, 0]]

        L[k:, k] = col
        if np.abs(L[k, k]) < max(atol, rtol*np.abs(col).max()):
            raise np.linalg.LinAlgError("zero pivot encountered")

        # row k of U (unit diagonal)
        U[k, k+1:] = (A[rows[k], k+1:] - L[k, :k] @ U[:k, k+1:]) / L[k, k]

    return L, U, P
```
Test the Crout LU decomposition with a random full rank sign matrix:
```python
def random_full_rank_sign_matrix(n, seed=42):
    rng = np.random.default_rng(seed)
    while True:
        A = rng.choice([-1, 0, 1], size=(n, n))
        if np.linalg.matrix_rank(A) == n:
            return A

A = random_full_rank_sign_matrix(12, seed=42)
print_matrix(A, name="A")
A_orig = A.copy()  # Keep original for verification
L, U, P = crout_lu(A)

# Check correctness
assert np.allclose(P @ A_orig, L @ U, atol=1e-8)
```
```
A =
[ -1.000 1.000 0.000 0.000 0.000 1.000 -1.000 1.000 -1.000 -1.000 0.000 1.000]
[ 1.000 1.000 1.000 1.000 0.000 -1.000 1.000 0.000 0.000 0.000 -1.000 1.000]
[ 1.000 0.000 0.000 1.000 0.000 0.000 0.000 -1.000 -1.000 0.000 1.000 -1.000]
[ 1.000 1.000 -1.000 0.000 -1.000 1.000 1.000 0.000 -1.000 1.000 0.000 1.000]
[ 1.000 1.000 1.000 -1.000 0.000 0.000 0.000 -1.000 0.000 -1.000 1.000 1.000]
[ 1.000 1.000 0.000 1.000 0.000 -1.000 1.000 0.000 -1.000 0.000 1.000 -1.000]
[ 0.000 -1.000 1.000 0.000 -1.000 -1.000 0.000 1.000 1.000 0.000 -1.000 1.000]
[ 0.000 1.000 -1.000 -1.000 1.000 1.000 0.000 1.000 1.000 0.000 1.000 -1.000]
[ -1.000 1.000 0.000 -1.000 1.000 -1.000 1.000 -1.000 1.000 1.000 1.000 0.000]
[ 0.000 1.000 -1.000 1.000 0.000 0.000 0.000 0.000 -1.000 -1.000 -1.000 -1.000]
[ 0.000 1.000 0.000 0.000 1.000 0.000 -1.000 1.000 0.000 0.000 0.000 0.000]
[ -1.000 0.000 1.000 -1.000 0.000 -1.000 0.000 0.000 1.000 -1.000 -1.000 0.000]
```
The Cholesky decomposition has several important properties:
- If A is positive definite, then L is unique and has positive diagonal entries.
- The Cholesky decomposition can be used to solve linear systems of equations of the form Ax = b by first solving Ly = b for y, and then solving L^\top x = y.
- The Cholesky decomposition can be used to generate samples from a multivariate normal distribution by transforming samples from a standard normal distribution using the Cholesky factor L (both uses are sketched below).
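A minimal sketch of both uses, assuming SciPy's `solve_triangular` for the triangular solves (the matrix, right-hand side, and mean are arbitrary examples):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4., 2.],
              [2., 3.]])              # symmetric positive definite
b = np.array([1., 2.])
L = np.linalg.cholesky(A)

# Solve Ax = b: forward-solve Ly = b, then back-solve L^T x = y
y = solve_triangular(L, b, lower=True)
x = solve_triangular(L.T, y, lower=False)
assert np.allclose(A @ x, b)

# Sample from N(mu, A): transform standard normal draws with L
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
z = rng.standard_normal((2, 10_000))
samples = mu[:, None] + L @ z         # each column is one draw from N(mu, A)
print(np.cov(samples))                # should be close to A
```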
The Cholesky decomposition is widely used in various applications, including numerical optimization, Bayesian inference, and machine learning. It is particularly useful for solving linear systems efficiently and for generating samples from multivariate normal distributions.
L.5 Kronecker Product and Hadamard Product
The Kronecker product and Hadamard product are two important operations in linear algebra that are used to manipulate matrices in various ways.
When it comes to programming, however, these operations give a compact, efficient way to formalize many matrix manipulations that would otherwise be awkward to express as a single algebraic operation.
They are particularly useful in applications such as signal processing, image processing, and machine learning, where they serve as shorthand notation for matrix operations that appear in many algorithms.
L.5.1 Kronecker Product
The Kronecker product is a matrix operation that takes two matrices and produces a block matrix by multiplying each element of the first matrix by the entire second matrix. A programmer might describe it as a “block-wise multiplication” of two matrices.
The Hadamard product, on the other hand, is an element-wise multiplication of two matrices of the same dimensions.
Definition L.4 (Definition of the Kronecker Product) The Kronecker product of two matrices A and B, denoted by A \otimes B, is defined as the block matrix formed by multiplying each element of A by the entire matrix B. If A is an m \times n matrix and B is a p \times q matrix, then the Kronecker product A \otimes B is an (mp) \times (nq) matrix given by:
A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \tag{L.12}
where a_{ij} are the elements of matrix A.
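In NumPy the Kronecker product is `np.kron`, while the Hadamard product is ordinary element-wise `*`; a small example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(np.kron(A, B))   # 4x4 block matrix: each a_ij scales a full copy of B
print(A * B)           # Hadamard product: element-wise, same shape as A and B
```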
Some properties of the Kronecker product include:
Theorem L.4 (Properties of the Kronecker Product) Let A be an m \times n matrix and B be a p \times q matrix. Then:
- a \otimes A = A\otimes a for any scalar a (scalar multiplication property)
- (aA) \otimes (bB) = ab(A \otimes B) for any scalars a and b (scalar multiplication property)
- (A \otimes B) \otimes C = A \otimes (B\otimes C) (associative property)
- (A+B) \otimes C = (A \otimes C) + (B \otimes C) when A and B are matrices of the same size (distributive property)
- A \otimes (B+C) = (A \otimes B) + (A \otimes C) when B and C are matrices of the same size (distributive property)
- (A \otimes B)^\top = A^\top \otimes B^\top (transpose property)
- (\mathbf{a} \otimes \mathbf{b})^\top = \mathbf{a}^\top \otimes \mathbf{b}^\top (transpose property for vectors)
- \text{det}(A \otimes B) = \text{det}(A)^{p} \cdot \text{det}(B)^{m} for square A (m \times m) and B (p \times p) (determinant property)
- \text{rank}(A \otimes B) = \text{rank}(A) \cdot \text{rank}(B) (rank property)
- (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}, if both A and B are invertible (inverse property)
- (A \otimes B)(C \otimes D) = (AC) \otimes (BD) (multiplication property)
- \text{tr}(A \otimes B) = \text{tr}(A) \cdot \text{tr}(B) (trace property)
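Several of these properties are easy to spot-check numerically; a sketch on random 2 × 2 matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

# multiplication (mixed-product) property
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# trace property
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
# transpose property
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# determinant property: here m = p = 2
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A)**2 * np.linalg.det(B)**2)
```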