Expressing LU factorization using rank one matrices
In the last article, we looked at expressing matrix multiplication using rank one matrices. In this article, we will study LU factorization using rank one matrices.
Prerequisites
We are assuming that you are familiar with linear algebra corresponding to a typical undergrad course. In particular, we are assuming that you are familiar with Gaussian elimination and LU factorization.
Running example
In LU factorization, we take a square matrix and turn it into an upper triangular matrix. As our running example, we will look at the following matrix. Let’s turn it into an upper triangular matrix using a sequence of row operations.
$$
A = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 3 & -3 & -2 & 4 \\ 2 & 0 & 2 & 0 \end{bmatrix}.
$$
The sequence of row operations along with the intermediate matrices are as follows.
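$$
\begin{aligned}
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 3 & -3 & -2 & 4 \\ 2 & 0 & 2 & 0 \end{bmatrix}
&\xrightarrow{\substack{\text{row}_2 \,\to\, \text{row}_2 + 0\cdot\text{row}_1 \\ \text{row}_3 \,\to\, \text{row}_3 + (-3)\cdot\text{row}_1 \\ \text{row}_4 \,\to\, \text{row}_4 + (-2)\cdot\text{row}_1}}
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 4 & -2 \end{bmatrix} \\[1ex]
&\xrightarrow{\substack{\text{row}_3 \,\to\, \text{row}_3 + 0\cdot\text{row}_2 \\ \text{row}_4 \,\to\, \text{row}_4 + 2\cdot\text{row}_2}}
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 2 & -2 \end{bmatrix} \\[1ex]
&\xrightarrow{\;\text{row}_4 \,\to\, \text{row}_4 + (-2)\cdot\text{row}_3\;}
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -4 \end{bmatrix}
\end{aligned}
$$

The last matrix is upper triangular.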
That was LU factorization using row operations. We will now look at an alternative way that uses rank one matrices.
Computing rank one components
When we say that we want to express LU factorization using rank one matrices, we are saying that we want to achieve
$$
A = LU = \sum_{k=1}^{n} l_k u_k,
$$

where each summation term, $l_k u_k$, is of rank one.
Our strategy is going to be to successively extract rank one components from a matrix and reduce its rank on each extraction until we are left with a rank zero matrix. For example, if we denote a rank $n$ matrix by $A_n$, where the subscript denotes the matrix rank, it would look like this.
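$$
A_n = l_1 u_1 + A_{n-1}, \qquad A_{n-1} = l_2 u_2 + A_{n-2}, \qquad \ldots, \qquad A_1 = l_n u_n + A_0,
$$

where $A_0$ is the zero matrix. Substituting each equation into the previous one gives back $A_n = l_1 u_1 + l_2 u_2 + \cdots + l_n u_n$.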
To achieve this, we will look at the row operations that we did above and reorganize them so that they extract rank one components. The row operations in the above example are these.
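Writing each one as adding a multiple of the pivot row to a row below it, and including the two operations whose multiplier happens to be zero, we have

$$
\begin{aligned}
&\text{row}_2 \to \text{row}_2 + 0\cdot\text{row}_1 \\
&\text{row}_3 \to \text{row}_3 + (-3)\cdot\text{row}_1 \\
&\text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_1 \\
&\text{row}_3 \to \text{row}_3 + 0\cdot\text{row}_2 \\
&\text{row}_4 \to \text{row}_4 + 2\cdot\text{row}_2 \\
&\text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_3
\end{aligned}
$$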
In each of these operations, we use a pivot row to manipulate the rows below it. For example, in the first three instructions the pivot row is $\text{row}_1$, in the next two instructions the pivot row is $\text{row}_2$, and in the last instruction the pivot row is $\text{row}_3$.
We can group these instructions by the pivot row involved. That is, we can group them by the second operand of addition. This results in the following groups.
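$$
\begin{aligned}
\text{group}_1 &: \quad \text{row}_2 \to \text{row}_2 + 0\cdot\text{row}_1, \quad \text{row}_3 \to \text{row}_3 + (-3)\cdot\text{row}_1, \quad \text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_1 \\
\text{group}_2 &: \quad \text{row}_3 \to \text{row}_3 + 0\cdot\text{row}_2, \quad \text{row}_4 \to \text{row}_4 + 2\cdot\text{row}_2 \\
\text{group}_3 &: \quad \text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_3
\end{aligned}
$$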
As we will see, each of these groups is going to lead to the extraction of a rank one component. Our other requirement was to reduce the rank of the matrix by one on each rank one extraction. To achieve that, we also turn the pivot row itself into zero. This results in the following four groups.
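$$
\begin{aligned}
\text{group}_1 &: \quad \text{row}_2 \to \text{row}_2 + 0\cdot\text{row}_1, \quad \text{row}_3 \to \text{row}_3 + (-3)\cdot\text{row}_1, \quad \text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_1, \quad \text{row}_1 \to \text{row}_1 + (-1)\cdot\text{row}_1 \\
\text{group}_2 &: \quad \text{row}_3 \to \text{row}_3 + 0\cdot\text{row}_2, \quad \text{row}_4 \to \text{row}_4 + 2\cdot\text{row}_2, \quad \text{row}_2 \to \text{row}_2 + (-1)\cdot\text{row}_2 \\
\text{group}_3 &: \quad \text{row}_4 \to \text{row}_4 + (-2)\cdot\text{row}_3, \quad \text{row}_3 \to \text{row}_3 + (-1)\cdot\text{row}_3 \\
\text{group}_4 &: \quad \text{row}_4 \to \text{row}_4 + (-1)\cdot\text{row}_4
\end{aligned}
$$

We will write $A_{\text{group}_k}$ for the matrix left after applying $\text{group}_k$ and $O_{\text{group}_k}$ for the piece that $\text{group}_k$ removes. Applying $\text{group}_1$ to $A$ gives $A_{\text{group}_1}$ with its first row and first column zeroed, applying $\text{group}_2$ to $A_{\text{group}_1}$ gives $A_{\text{group}_2}$, and so on.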
Applying $\text{group}_4$ to $A_{\text{group}_3}$ results in

$$
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -4 \end{bmatrix}
\xrightarrow{\;\text{group}_4\;}
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
$$

$$
\begin{aligned}
A_{\text{group}_3} &= O_{\text{group}_4} + A_{\text{group}_4} \\
\implies O_{\text{group}_4} &= A_{\text{group}_3} - A_{\text{group}_4} \\
\implies O_{\text{group}_4} &= \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -4 \end{bmatrix} \\
\implies O_{\text{group}_4} &= l_4 u_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & -4 \end{bmatrix}.
\end{aligned}
$$
In our above discussion, we reused the existing row operations to extract rank one components. Let's try to come up with the $l_k u_k$ components in a different way.
What happens when we extract $l_1 u_1$ from our matrix $A$? The original matrix was
$$
A = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 0 & -1 & -1 & 0 \\ 3 & -3 & -2 & 4 \\ 2 & 0 & 2 & 0 \end{bmatrix}.
$$
After subtracting $l_1 u_1$ from $A$, we need zeroes in the pivot row and in the pivot column of what remains. In LU factorization, this is done using the pivot row, which is currently the first row of $A$. What we want is
$$
A = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 & 1 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 4 & -2 \end{bmatrix},
$$
that is, we need the right multipliers $a$, $b$, $c$, and $d$. We can get these numbers by dividing the first column entries by the pivot. So that looks like
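$$
A = \begin{bmatrix} 1 \\ 0 \\ 3 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & -1 & -1 & 1 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 4 & -2 \end{bmatrix},
$$

since the pivot is $1$, so the multipliers are simply $a = 1$, $b = 0$, $c = 3$, and $d = 2$.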
The first term here is $l_1 u_1$. We now need to extract a rank one component from the second matrix so that what remains has zeroes in its second row and second column. We use the second pivot row to do it. It looks like
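$$
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 0 \\ -2 \end{bmatrix} \begin{bmatrix} 0 & -1 & -1 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 2 & -2 \end{bmatrix},
$$

where the first term is $l_2 u_2$: the column vector is the second column divided by the pivot $-1$, and the row vector is the second pivot row. Continuing in the same way extracts $l_3 u_3$ and $l_4 u_4$.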
The above procedure can be described recursively as follows:
Base case: The matrix is zero rank.
Recursive case: Extract a rank one component and reduce the rank of the matrix by one.
The $u_k$ factor of the rank one component is the current pivot row. We compute the $l_k$ factor by dividing all elements of the pivot column by the pivot.
Implementation
A Python implementation of the above ideas is this:
import copy


class Matrix:
    def __init__(self, body):
        self.body = body

    def get_number_of_rows(self):
        return len(self.body)

    def get_number_of_columns(self):
        return len(self.body[0])

    def get_element(self, row_idx, col_idx):
        return self.body[row_idx][col_idx]

    def get_row(self, row_idx):
        return Matrix([self.body[row_idx]])

    def get_column(self, col_idx):
        return Matrix([[row[col_idx]] for row in self.body])

    def __add__(self, other_matrix):
        new_matrix = copy.deepcopy(self)
        for row_idx, (row1, row2) in enumerate(zip(self.body, other_matrix.body)):
            new_matrix.body[row_idx] = [num1 + num2 for num1, num2 in zip(row1, row2)]
        return new_matrix

    def __mul__(self, other_object):
        if isinstance(other_object, int) or isinstance(other_object, float):
            # Scalar multiplication.
            new_matrix = copy.deepcopy(self)
            for row_idx, row in enumerate(self.body):
                for col_idx, element in enumerate(row):
                    new_matrix.body[row_idx][col_idx] = other_object * element
        else:
            # Matrix multiplication.
            new_matrix = Matrix([[0] * len(other_object.body[0]) for _ in range(len(self.body))])
            for row_idx in range(len(self.body)):
                for col_idx in range(len(other_object.body[0])):
                    row1_list = self.body[row_idx]
                    col2_list = [row[col_idx] for row in other_object.body]
                    new_matrix.body[row_idx][col_idx] = sum([num1 * num2 for num1, num2 in zip(row1_list, col2_list)])
        return new_matrix

    def __sub__(self, other_matrix):
        return self + (other_matrix * -1)

    def __truediv__(self, scalar):
        if isinstance(scalar, int) or isinstance(scalar, float):
            return self * (1 / scalar)
        raise ValueError("Cannot divide a matrix by anything other than an int or a float")

    def __repr__(self):
        result = []
        for row in self.body:
            row_string_list = [str(element) if element < 0 else f" {element}" for element in row]
            result.append(" ".join(row_string_list))
        return "\n".join(result)


def lu(matrix, current_pivot_idx=0):
    # Base case: every pivot has been processed and the matrix is rank zero.
    if current_pivot_idx == matrix.get_number_of_rows():
        return []
    pivot = matrix.get_element(current_pivot_idx, current_pivot_idx)
    # u_k is the pivot row, l_k is the pivot column divided by the pivot.
    current_u = matrix.get_row(current_pivot_idx)
    current_l = matrix.get_column(current_pivot_idx) / pivot
    result = [(current_l, current_u)]
    # Extract the rank one component and recurse on what is left.
    remaining_matrix = matrix - current_l * current_u
    result.extend(lu(remaining_matrix, current_pivot_idx + 1))
    return result
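To try it on the running example, we can build the matrix and print the extracted factors. The driver below is just an illustration; because each $l_k$ is a column divided by its pivot, the entries come out as floats.

A = Matrix([
    [1, -1, -1, 1],
    [0, -1, -1, 0],
    [3, -3, -2, 4],
    [2, 0, 2, 0],
])

# Each pair is (l_k, u_k); summing the products l_k * u_k gives back A.
for k, (l, u) in enumerate(lu(A), start=1):
    print(f"l{k}:")
    print(l)
    print(f"u{k}:")
    print(u)

The first pair printed should match the $l_1$ and $u_1$ we computed by hand above, up to the integer entries being shown as floats.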
Conclusion
The original idea behind expressing factorization using the rank one approach was to automatically infer parallelization opportunities. Indeed, in our recursive algorithm, each recursive step has some potential for parallelization.
What actually ended up happening was that viewing factorization through the rank one approach helped with algorithm design: the recursive structure of LU decomposition became more prominent for me in this framework.
It is worthwhile to go further in this direction. I am not sure where it will go or whether it will lead to anything. My strategy is not to go broad, in the sense of covering other factorizations like QR or SVD, but instead to go deep and implement this with all the automatic parallelization possible.
One thing I am unhappy about is that the algorithm is too detailed about where the data is coming from and going to. We need a higher-level language or notation for expressing data dependencies that can somehow be compiled down to actual operations. Maybe I will explore this in a future article.