3 Matrices, II
3.1 Row echelon form
We’ll now introduce an important way of calculating with matrices – an extension of the Gaussian elimination algorithm which you saw in the Algorithmics module – which is going to be the key to virtually all the computations we’ll do in linear algebra.
Definition
Let \(\mathbf{M}\) and \(\mathbf{N}\) be matrices in \(M_{m,n}(\mathbb{K}).\) We say \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent if there exists an invertible \(\mathbf{A}\in M_{m, m}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{M}= \mathbf{N}.\)
One can show that “left equivalence” is an equivalence relation in the sense of Algorithmics, Section 2; that is, we have the following properties:
(Reflexivity) Every matrix is left-equivalent to itself.
(Symmetry) If \(\mathbf{M}\) is left-equivalent to \(\mathbf{N},\) then \(\mathbf{N}\) is left-equivalent to \(\mathbf{M}.\)
(Transitivity) If \(\mathbf{M}\) is left-equivalent to \(\mathbf{N},\) and \(\mathbf{N}\) is left-equivalent to \(\mathbf{R},\) then \(\mathbf{M}\) is left-equivalent to \(\mathbf{R}.\)
The idea of this section is to show that among all the matrices that are left-equivalent to a given \(\mathbf{M},\) there is a unique “nicest” one.
For each row in a matrix, if the row does not consist of zeros only, then the leftmost nonzero entry is called the leading entry of that row.
A matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is said to be in row echelon form (REF) if
all rows consisting of only zeros are below all the non-zero rows;
the leading entry of a nonzero row is always strictly to the right of the leading entry of the row above it.
The matrix \(\mathbf{A}\) is said to be in reduced row echelon form (RREF) if furthermore
all of the leading entries are equal to \(1\);
in every column containing a leading entry, all of the other entries in that column are zero.
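To make these conditions concrete, here is a minimal Python sketch (my own aside, not part of the printed notes) that checks whether a matrix, given as a list of rows, is in REF or RREF; the helper names `leading_index`, `is_ref` and `is_rref` are just illustrative.

```python
def leading_index(row):
    """Index of the leading (leftmost nonzero) entry of a row, or None if the row is zero."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_ref(M):
    """Check the two row echelon form conditions."""
    seen_zero_row = False
    prev_lead = -1
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        else:
            if seen_zero_row:
                return False        # a nonzero row sits below a zero row
            if lead <= prev_lead:
                return False        # leading entry not strictly to the right of the one above
            prev_lead = lead
    return True

def is_rref(M):
    """Check the two extra reduced row echelon form conditions."""
    if not is_ref(M):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:
            return False            # leading entries must equal 1
        if any(M[k][lead] != 0 for k in range(len(M)) if k != i):
            return False            # other entries in a leading column must be zero
    return True
```

For example, `is_rref([[1, 0, 2], [0, 1, 3], [0, 0, 0]])` returns `True`, while moving the zero row to the top would make the matrix fail the first REF condition.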
The basic theorem about REF and RREF is the following one:
For any \(\mathbf{M}\in M_{m,n}(\mathbb{K}),\) there exists a unique matrix \(\mathbf{N}\in M_{m, n}(\mathbb{K})\) with the following two properties:
\(\mathbf{N}\) is in reduced row echelon form, and
\(\mathbf{N}\) is left-equivalent to \(\mathbf{M}.\)
We say \(\mathbf{N}\) is the reduced row echelon form of \(\mathbf{M}.\) Moreover, there is an explicit algorithm for computing \(\mathbf{N},\) given \(\mathbf{M}.\)
By the definition of left-equivalence, there must be some invertible \(\mathbf{A}\) that multiplies \(\mathbf{M}\) into its RREF; but this \(\mathbf{A}\) is not unique in general. It’s somehow a miracle that \(\mathbf{N}\) is unique, even though \(\mathbf{A}\) isn’t. The extreme case is when \(\mathbf{M}= \mathbf{0}_{m,n}\); then \(\mathbf{M}\) is already in RREF, but \(\mathbf{A}\mathbf{M}= \mathbf{M}\) for any matrix \(\mathbf{A},\) so we could take \(\mathbf{A}\) to be any invertible matrix we like.
Echelonizing a matrix via Gauss–Jordan
We’ll first prove the “existence” part of the theorem. You already saw how to convert a matrix into row echelon form (Gaussian elimination); to get reduced row echelon form, we’ll use a slight refinement, Gauss–Jordan elimination. This relies on the following tools:
Let \(\mathbf{M}\in M_{m,n}(\mathbb{K}).\) If \(\mathbf{M}'\) is obtained from \(\mathbf{M}\) by any one of the following operations, then \(\mathbf{M}'\) is left-equivalent to \(\mathbf{M}\):
Interchanging the \(i\)-th and \(j\)-th rows of \(\mathbf{M},\) for any \(i \ne j \in \{1, \dots, m\}\);
Multiplying all the entries of the \(i\)-th row by \(\lambda,\) for some \(\lambda \ne 0 \in \mathbb{K}\);
Adding \(\lambda\) times the \(j\)-th row of \(\mathbf{M}\) to the \(i\)-th row, for any \(i \ne j \in \{1, \dots, m\}\) and \(\lambda \in \mathbb{K}.\)
Proof. Each of these operations corresponds to left-multiplying \(\mathbf{M}\) by one of the three kinds of elementary matrices which you learned about in the M01 course. For instance, adding \(\lambda\) times the \(j\)-th row of \(\mathbf{M}\) to the \(i\)-th row corresponds to multiplying by the matrix with 1’s down the diagonal, \(\lambda\) in the \((i, j)\) position, and all other entries zero.
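If it helps to see the three kinds of elementary matrices written out explicitly, here is a small Python sketch (the function names are mine, and rows are 0-indexed here, unlike in the notes):

```python
def identity(m):
    """The m-by-m identity matrix, as a list of rows."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def swap_rows(m, i, j):
    """Elementary matrix interchanging rows i and j."""
    E = identity(m)
    E[i][i] = E[j][j] = 0
    E[i][j] = E[j][i] = 1
    return E

def scale_row(m, i, lam):
    """Elementary matrix multiplying row i by a nonzero scalar lam."""
    E = identity(m)
    E[i][i] = lam
    return E

def add_multiple(m, i, j, lam):
    """Elementary matrix adding lam times row j to row i (i != j):
    1's on the diagonal, lam in position (i, j), zeros elsewhere."""
    E = identity(m)
    E[i][j] = lam
    return E
```

Left-multiplying an \(m \times n\) matrix by one of these performs the corresponding row operation, which is exactly the content of the proof above.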
Proof of existence of RREF. Using the “transitivity” property above, if we apply any finite sequence of elementary row operations to \(\mathbf{M},\) we still get a matrix left-equivalent to \(\mathbf{M}.\) We use this to gradually transform \(\mathbf{M},\) one column at a time, to get it into RREF. More precisely, we’ll prove the following:
For any \(0 \leqslant r \leqslant n,\) we can apply a finite sequence of row operations to \(\mathbf{M}\) to obtain a matrix \(\mathbf{M}_r\) with the following property: the \(m \times r\) matrix given by the first \(r\) columns of \(\mathbf{M}_r\) is in reduced row echelon form.
We’ll prove this by induction on \(r.\) The statement is vacuous for \(r = 0\): a matrix without any columns is certainly in RREF, so \(\mathbf{M}_0 = \mathbf{M}\) will do. So let’s assume that we have already transformed \(\mathbf{M}\) into a matrix \(\mathbf{M}_r\) whose first \(r\) columns are in RREF, for some \(r < n,\) and try to deal with the \((r + 1)\)-st column.
Let \(h \leqslant m\) be the number of nonzero rows in the left-hand \(m \times r\) submatrix, so the first \(r\) columns are all zero below the \(h\)-th entry. Then we can visualise \(\mathbf{M}_r\) as follows: \[\mathbf{M}_r = \left( \begin{array}{rc|c} {\scriptstyle h} \big\{ &\overbrace{\left( \begin{array}{c} \text{RREF}, \\[-0.7ex] \text{no zero rows} \end{array}\right) }^r & \overbrace{\rule{0pt}{3ex}(?)}^{(n-r)} \\ \hline {\scriptstyle (m-h)} \big\{ & \text{($\mathbf{0}$)} & \text{($?$)} \end{array}\right)\]
If \(h = m\) (which is certainly possible), then the whole matrix \(\mathbf{M}_r\) is in RREF already, so we can set \(\mathbf{M}_{r+1} = \mathbf{M}_r\) and go on. If not, then we home in on the first column in the bottom right \((m - h) \times (n - r)\) submatrix of \(\mathbf{M}_r.\) Two things can now happen:
Case A: all these entries are zero. In this case, the first \((r + 1)\) columns of \(\mathbf{M}_r\) are already in RREF; so we can set \(\mathbf{M}_{r + 1} = \mathbf{M}_{r},\) and the induction step is complete.
Case B: at least one of these \((m-h)\) entries is non-zero (so in particular \(h < m\)).
Swapping the \((h + 1)\)-st row with one of the rows further down if necessary, we can assume that this nonzero entry is the top left corner entry of our submatrix (i.e. in position \((h + 1, r + 1)\) of \(\mathbf{M}_r\)). By multiplying the \((h + 1)\)-th row by a suitable scalar, we can make this nonzero entry equal to 1. Neither the row swap nor the scaling changes anything in the leftmost \(r\) columns – we’re just moving zeroes around – so the first \(r\) columns are still in RREF.
We now kill off all the other non-zero entries in the \((r + 1)\)-th column, by subtracting a suitable multiple of the \((h + 1)\)-th row. Again, this doesn’t change anything in the first \(r\) columns, because the \((h + 1)\)-th row has zeroes in these positions; so the resulting matrix \(\mathbf{M}_{r+ 1}\) has its first \((r + 1)\) columns in RREF.
After \(n\) steps of this process we end up with a matrix which is fully in RREF.
Notice that this proof doesn’t just abstractly show you that the RREF exists; it gives you a completely explicit recipe for finding it.
Since calculating the RREF of a matrix is such a basic tool in linear algebra calculations, any half-decent computer mathematics package will include an RREF routine – often several different ones, tailored to matrices of specific shapes or with entries in specific fields.
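For illustration, here is a bare-bones Python version of the recipe from the existence proof, working over the rationals via `fractions.Fraction`; it is only a sketch for following the algorithm step by step, not a substitute for a proper library routine.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows),
    following the column-by-column recipe from the existence proof."""
    M = [[Fraction(x) for x in row] for row in M]   # work over the rationals
    m, n = len(M), len(M[0])
    h = 0                                           # number of nonzero rows found so far
    for col in range(n):
        # look for a nonzero entry in this column, at or below row h (Case B)
        pivot = next((i for i in range(h, m) if M[i][col] != 0), None)
        if pivot is None:
            continue                                # Case A: nothing to do in this column
        M[h], M[pivot] = M[pivot], M[h]             # swap it up to row h
        M[h] = [x / M[h][col] for x in M[h]]        # scale so the leading entry is 1
        for i in range(m):                          # clear the rest of the column
            if i != h and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[h])]
        h += 1
    return M
```

Running `rref` on the augmented matrix of the worked example in 3.2 below reproduces the final echelonized matrix obtained there by hand.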
Keeping track of the transformation matrix
In principle, whenever you Gauss–Jordan eliminate a matrix, you can make a list of the elementary row operations you used, and the corresponding elementary matrices; this gives you a list of elementary matrices \((\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_k)\) such that \(\mathbf{A}_k\mathbf{A}_{k-1}\dots\mathbf{A}_1 \mathbf{M}= \mathbf{N}\) is in RREF. However, this quickly gets quite tedious to do, and there’s a much better method.
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{m, r}(\mathbb{K}),\) we write \((\mathbf{A}\ |\ \mathbf{B})\) for the \(m \times (n + r)\) matrix given by joining \(\mathbf{A}\) and \(\mathbf{B}\) together (so the \((i, j)\) entry is \(A_{ij}\) for \(1 \leqslant j \leqslant n,\) and \(B_{i,j-n}\) for \(n + 1 \leqslant j \leqslant n + r\)).
E.g. if we have \[\mathbf{A}= \begin{pmatrix} 1 & 3 & 2 \\ 2 & 0 & 1 \\ 5 & 2 & 2\end{pmatrix}\qquad \mathbf{B}= \begin{pmatrix} 4 \\ 3 \\ 1\end{pmatrix},\] then \[(\mathbf{A}\ |\ \mathbf{B}) = \left(\begin{array}{ccc|c} 1 & 3 & 2 & 4\\ 2 & 0 & 1 & 3\\ 5 & 2 & 2 & 1\end{array}\right).\] It’s common to write a line in between the entries, as above, to remind ourselves which entries came from where.
For any matrices \(\mathbf{A}\in M_{m, n}(\mathbb{K}),\) \(\mathbf{B}\in M_{n, r}(\mathbb{K}),\) and \(\mathbf{C}\in M_{n, s}(\mathbb{K}),\) we have \[\mathbf{A}\cdot (\mathbf{B}\ |\ \mathbf{C}) = (\mathbf{A}\mathbf{B}\ |\ \mathbf{A}\mathbf{C}).\]
Proof. The \(j\)-th column of \(\mathbf{A}\cdot (\mathbf{B}\ |\ \mathbf{C})\) is given by \(\mathbf{A}\cdot \vec{v}_j\) where \(\vec{v}_j\) is the \(j\)-th column of \((\mathbf{B}\ |\ \mathbf{C}).\) Considering the cases \(1 \leqslant j \leqslant r\) and \(r+1 \leqslant j \leqslant r + s\) separately, we obtain either a column of \(\mathbf{A}\mathbf{B}\) or a column of \(\mathbf{A}\mathbf{C}.\)
Given any \(\mathbf{M}\in M_{m,n}(\mathbb{K}),\) let \(\tilde{\mathbf{M}} = (\mathbf{M}\ |\ \mathbf{1}_{m}),\) and let \(\tilde{\mathbf{N}}\) be the RREF of \(\tilde{\mathbf{M}}.\) Then we have \(\tilde{\mathbf{N}} = (\mathbf{N}\ |\ \mathbf{A}),\) where \(\mathbf{N}\) is the RREF of \(\mathbf{M},\) and \(\mathbf{A}\) is a matrix such that \(\mathbf{A}\mathbf{M}= \mathbf{N}.\)
Proof. Let \(\mathbf{A}\) be any invertible matrix such that \(\tilde{\mathbf{N}} = \mathbf{A}\tilde{\mathbf{M}}\) is in RREF. From the proposition, we have \(\mathbf{A}\tilde{\mathbf{M}} = (\mathbf{A}\mathbf{M}\ |\ \mathbf{A}\mathbf{1}_{m}) = (\mathbf{A}\mathbf{M}\ |\ \mathbf{A}).\) However, if \(\tilde{\mathbf{N}}\) is an RREF matrix, then its first \(n\) columns are also an RREF matrix (deleting the last few columns can only turn some rows into zero rows, and since leading entries move strictly to the right as we go down, these rows sit at the bottom), so \(\mathbf{A}\mathbf{M}\) must be the unique RREF matrix left-equivalent to \(\mathbf{M}.\)
We can take a slight shortcut here: since Gauss–Jordan elimination transforms a matrix into RREF one column at a time, we can stop as soon as the first \(n\) columns of \(\tilde{\mathbf{M}}\) are in RREF – we don’t have to keep going all the way up to the \((m + n)\)-th column.
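In code, continuing the sketch from 3.1 (so this reuses `rref` and `Fraction`, and the function name is again just illustrative), the bookkeeping looks like this:

```python
from fractions import Fraction

def rref_with_transform(M):
    """Return (N, A) where N is the RREF of M and A is an invertible matrix
    with A M = N, found by echelonizing the augmented matrix (M | 1_m)."""
    m, n = len(M), len(M[0])
    identity = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    augmented = [list(row) + identity[i] for i, row in enumerate(M)]
    result = rref(augmented)          # rref is the earlier sketch
    N = [row[:n] for row in result]   # first n columns: the RREF of M
    A = [row[n:] for row in result]   # last m columns: a matrix with A M = N
    return N, A
```

(This sketch echelonizes all \(n + m\) columns rather than stopping after the first \(n,\) so it ignores the shortcut just mentioned; by the theorem the left half still comes out as the RREF of \(\mathbf{M},\) and the right half is still a valid transformation matrix, though possibly a different one.)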
Uniqueness
We’ll now prove the uniqueness property of reduced row echelon form. This proof is non-examinable (marked by a dark green line down the margin in the printed notes); but you should definitely be aware of the statement of the theorem!
We use the following easy remark:
Assume that \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent, and let \(j \in \{1, \dots, n\}.\) Then the matrices \(\mathbf{M}'\) and \(\mathbf{N}'\) given by deleting the \(j\)-th columns from \(\mathbf{M}\) and \(\mathbf{N}\) are also left-equivalent.
Proof. Exercise.
Proof of uniqueness of RREF. Now suppose \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent matrices, both in reduced row echelon form, with \(\mathbf{M}\ne \mathbf{N}.\) Let the \(t\)-th column, for \(1 \leqslant t \leqslant n,\) be the first (leftmost) column where the two matrices differ; and consider the new matrices \(\mathbf{M}', \mathbf{N}'\) given by deleting all columns strictly to the right of the \(t\)-th, and all columns to the left which don’t contain a leading entry. Then we have \(\mathbf{M}' \ne \mathbf{N}'\); both are still in RREF; and we have \[\mathbf{M}' = \begin{pmatrix} \mathbf{1}_{h} & \vec{r} \\ 0 & \vec{s} \end{pmatrix}, \qquad \mathbf{N}' = \begin{pmatrix} \mathbf{1}_{h} & \vec{r'} \\ 0 & \vec{s'} \end{pmatrix}\] for some \(h < t\) and some column vectors \(\vec{r}, \vec{s}, \vec{r'}, \vec{s'}.\)
Since \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent, so are \(\mathbf{M}'\) and \(\mathbf{N}'\) (by the last Proposition), so there exists an invertible \(\mathbf{A}\) with \(\mathbf{A}\mathbf{M}' = \mathbf{N}'.\) Let’s think about what \(\mathbf{A}\mathbf{M}'\) looks like. We can divide \(\mathbf{A}\) up into blocks \(\begin{pmatrix}\mathbf{R} & \mathbf{S} \\ \mathbf{T} & \mathbf{U} \end{pmatrix},\) with \(\mathbf{R}\in M_{h,h}(\mathbb{K})\) etc; and the product is then \[\begin{pmatrix}\mathbf{R} & \mathbf{S} \\ \mathbf{T} & \mathbf{U} \end{pmatrix} \cdot \begin{pmatrix} \mathbf{1}_{h} & \vec{r} \\ 0 & \vec{s} \end{pmatrix} = \begin{pmatrix} \mathbf{R} & \mathbf{R}\vec{r} + \mathbf{S}\vec{s} \\ \mathbf{T} & \mathbf{T}\vec{r} + \mathbf{U}\vec{s}\end{pmatrix}.\] Let’s suppose this is equal to \(\mathbf{N}'.\) Then, comparing the top-left and bottom-left blocks, we must have \(\mathbf{R} = \mathbf{1}_{h}\) and \(\mathbf{T} = \mathbf{0}_{m-h, h}\); so this becomes \[\begin{pmatrix} \mathbf{1}_{h} & \vec{r} + \mathbf{S} \vec{s} \\ 0 & \mathbf{U} \vec{s} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_{h} & \vec{r'} \\ 0 & \vec{s'} \end{pmatrix}.\]
If \(\vec{s}\ne 0,\) then \(\vec{s'} = \mathbf{U} \vec{s} \ne 0\) also, since the invertibility of \(\mathbf{A}\) implies that \(\mathbf{U}\) is also invertible (see Exercise below). So the last column of each of \(\mathbf{M}'\) and \(\mathbf{N}'\) has a nonzero entry below the \(h\)-th row, and hence contains a leading entry. Since \(\mathbf{M}'\) and \(\mathbf{N}'\) are in RREF, this forces \(\vec{r}\) and \(\vec{r'}\) to be zero, and \(\vec{s}\) and \(\vec{s'}\) to both equal the standard basis vector \(\vec{e}_1\); so in fact \(\mathbf{M}' = \mathbf{N}',\) contradicting our assumptions.
On the other hand, if \(\vec{s} = 0,\) then \(\vec{s}' = \mathbf{U} \vec{s} = 0\) as well; and substituting this into the top right block, we have \(\vec{r}' = \vec{r} + \mathbf{S} \vec{s} = \vec{r}.\) So again \(\mathbf{M}' = \mathbf{N}',\) contradiction. So we can conclude that it is impossible for two distinct RREF matrices to be left-equivalent.
3.2 Solving equations
You already saw in Algorithmics that matrices can be used to “package together” systems of linear equations. Suppose we have a system of simultaneous equations \[\begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1, \\ \vdots\qquad &= \quad\vdots\\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m, \end{aligned}\] with (known) values \(a_{ij} \in \mathbb{K}\) and \(b_i \in \mathbb{K},\) and we want to solve this for the unknowns \(x_j.\) Then we can write this simply as \(\mathbf{A}\vec{x} = \vec{b},\) with \(\mathbf{A}= (a_{ij}) \in M_{m,n}(\mathbb{K}),\) \(\vec{b} = (b_i)\) and \(\vec{x} = (x_j)\); and we write \(\operatorname{Sol}(\mathbf{A}, \vec{b})\) for the set of solutions to this equation, i.e. \[\operatorname{Sol}(\mathbf{A}, \vec{b}) = \{ \vec{x} \in \mathbb{K}^n : \mathbf{A}\vec{x} = \vec{b}\}.\]
If \(\mathbf{B}\) is an invertible square matrix, then we have \(\operatorname{Sol}(\mathbf{B}\mathbf{A}, \mathbf{B}\vec{b}) = \operatorname{Sol}(\mathbf{A}, \vec{b}).\) So, if we want to compute \(\operatorname{Sol}(\mathbf{A}, \vec{b}),\) it’s a good idea to choose some \(\mathbf{B}\) such that \(\mathbf{B}\mathbf{A}\) and \(\mathbf{B}\vec{b}\) are as simple as possible. So we form the augmented matrix \((\mathbf{A}\, | \vec{b})\) and put that into echelon form; this gives a new, echelonized system of equations where we can immediately read off the solutions.
Suppose \(\mathbf{A}\) is in REF (not necessarily RREF), and let \(r \leqslant m\) be the number of non-zero rows of \(\mathbf{A}.\) Then the equation \(\mathbf{A}\vec{x} = \vec{b}\) has a solution if, and only if, we have \(b_i = 0\) for all \(i\) with \(r < i \leqslant m.\)
(This condition is vacuously satisfied if \(r = m,\) since there are no such \(i,\) so solutions always exist in this case.) It’s clear that “\(b_i = 0\) for all \(i\) with \(r < i \leqslant m\)” is a necessary condition for solutions to exist, since for \(i\) in this range the \(i\)-th equation in our system is just \(0 = b_i.\) What’s less obvious is that it is a sufficient condition, which is shown in Chapter 9 of Algorithmics.
We say the system of equations is inconsistent if one of \(b_{r + 1}, \dots, b_m\) is nonzero, so the solution set is empty.
no solution
my mind is a matrix
that has been reduced
into row echelon form
and proven to be
— inconsistent
(from More Math Poems by Eileen Tupaz)
Of course, we don’t just want to know whether solutions exist: we want to find them! Having found a REF for \(\mathbf{A},\) we say \(x_j\) is a free variable if there is no row of the REF whose leading entry is in the \(j\)-th column. Then it’s easy to check that for any values of the free variables, there is a unique way to fill in the rest of the variables to get a solution \(\vec{x}.\)
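As a rough sketch of how one might read this off in code (reusing the `leading_index` and `Fraction` helpers from 3.1; the function name `solve_from_rref` is mine), given the RREF of the augmented matrix:

```python
def solve_from_rref(R, n):
    """R is the RREF of an augmented matrix (A | b), where A has n columns.
    Return None if the system is inconsistent; otherwise return a particular
    solution (all free variables set to 0) together with the free columns."""
    pivots = []
    for row in R:
        lead = leading_index(row)
        if lead is None:
            continue
        if lead == n:
            return None                  # a row reading 0 = 1: inconsistent
        pivots.append(lead)
    free = [j for j in range(n) if j not in pivots]
    x = [Fraction(0)] * n
    for row in R:
        lead = leading_index(row)
        if lead is not None and lead < n:
            x[lead] = row[n]             # with the free variables 0, x_lead = b-entry
    return x, free
```

Varying the free variables then sweeps out the whole solution set, as in the example below.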
“Find all real numbers \(x_1, \dots, x_4\) satisfying the system of linear equations \[\begin{aligned} -9x_2 + 3x_3 + 4x_4 &=9 \\ x_1 + 4x_2 - x_4 &= 5\\ 2x_1 + 6x_2 - x_3 + 5 x_4 &= -5 \text{ .''} \end{aligned}\]
The augmented matrix is \[\left(\begin{array}{cccc|c} 0 & -9 & 3 & 4 & 9 \\ 1 & 4 & 0 & -1 & 5 \\ 2 & 6 & -1 & 5 & -5 \end{array}\right)\]
Let’s walk through the steps of echelonizing this:
Swap rows 1 and 2 to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & -9 & 3 & 4 & 9 \\ 2 & 6 & -1 & 5 & -5 \end{array}\right)\)
Add \(-2\) times row 1 to row 3 to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & -9 & 3 & 4 & 9 \\ 0 & -2 & -1 & 7 & -15 \end{array}\right)\)
Multiply row 2 by \(-\tfrac{1}{9}\) to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & -2 & -1 & 7 & -15 \end{array}\right)\)
Add \(-4\) times row 2 to row 1, and \(2\) times row 2 to row 3, to get \(\left(\begin{array}{rrrr|r} 1 & 0 & \frac{4}{3} & \frac{7}{9} & 9 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & 0 & -\frac{5}{3} & \frac{55}{9} & -17 \end{array}\right)\)
Multiply row 3 by \(-\tfrac{3}{5}\) to get \(\left(\begin{array}{rrrr|r} 1 & 0 & \frac{4}{3} & \frac{7}{9} & 9 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & 0 & 1 & -\frac{11}{3} & \frac{51}{5} \end{array}\right)\)
Add \(-\tfrac{4}{3}\) of row 3 to row 1, and \(\tfrac{1}{3}\) of row 3 to row 2, to get \(\left(\begin{array}{rrrr|r} 1 & 0 & 0 & \frac{17}{3} & -\frac{23}{5} \\ 0 & 1 & 0 & -\frac{5}{3} & \frac{12}{5} \\ 0 & 0 & 1 & -\frac{11}{3} & \frac{51}{5} \end{array}\right)\)
So the solutions to the original system are the same as those of the echelonized system \[\begin{aligned} x_1 + \tfrac{17}{3} x_4 &= -\tfrac{23}{5} \\ x_2 - \tfrac{5}{3} x_4 &= \tfrac{12}{5}\\ x_3 - \tfrac{11}{3} x_4 &= \tfrac{51}{5} \end{aligned}\]
Clearly, we can choose any value of \(x_4\) we like (let’s call it \(\lambda\)), and then read off the values of \(x_1, x_2, x_3\) from that, so the solutions are given by \[\vec{x} = \begin{pmatrix} -\tfrac{23}{5} - \tfrac{17}{3} \lambda\\ \tfrac{12}{5} + \tfrac{5}{3} \lambda \\ \tfrac{51}{5} + \tfrac{11}{3} \lambda \\ \lambda \end{pmatrix}\] for any \(\lambda \in \mathbb{R}.\)
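As a sanity check on the arithmetic (using NumPy, which is of course not something the notes rely on), one can plug the parametric solution back into the original system for a few values of \(\lambda\):

```python
import numpy as np

A = np.array([[0., -9, 3, 4],
              [1, 4, 0, -1],
              [2, 6, -1, 5]])
b = np.array([9., 5, -5])

for lam in (0.0, 1.0, -2.5):               # a few arbitrary values of the parameter
    x = np.array([-23/5 - 17/3 * lam,
                  12/5 + 5/3 * lam,
                  51/5 + 11/3 * lam,
                  lam])
    assert np.allclose(A @ x, b)            # every choice of lambda gives a solution
```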
Here is a variation on the previous example, where the last column of the augmented matrix contains unknown parameters rather than numbers. (I am grateful to Sarah Zerbes for this example.)
“Consider the system of equations \[\begin{pmatrix}1 & -2 & 1 \\ 2 & 1 & 1 \\ 0 & 5 & -1\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\] where \(b_1, b_2, b_3 \in \mathbb{R}.\) For which values of the \(b_i\) is this solvable?”
We take the augmented matrix \[\left(\begin{array}{ccc|c} 1 & -2 & 1 & b_1 \\ 2 & 1 & 1 & b_2 \\ 0 & 5 & -1 & b_3 \end{array}\right)\] and apply Gauss–Jordan elimination to echelonize the first three columns. After clearing the first column we get \[\left(\begin{array}{ccc|c} 1 & -2 & 1 & b_1 \\ 0 & 5 & -1 & -2b_1 + b_2 \\ 0 & 5 & -1 & b_3 \end{array}\right)\] and continuing to clear the second column we get \[\left(\begin{array}{ccc|c} 1 & 0 & \frac{3}{5} & \frac{1}{5} b_{1} + \frac{2}{5} b_{2} \\ 0 & 1 & -\frac{1}{5} & -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} \\ 0 & 0 & 0 & 2 b_{1} - b_{2} + b_{3} \end{array}\right).\] So the original system has solutions if and only if \(2b_1 - b_2 + b_3 = 0.\)
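One way to double-check the condition \(2b_1 - b_2 + b_3 = 0\) is to repeat the two elimination steps symbolically, for instance with SymPy (again my own aside, not part of the notes; blindly calling a built-in symbolic `rref()` here can be misleading, since the routine has to guess whether the symbolic entry \(2b_1 - b_2 + b_3\) is zero, so we just redo the row operations from above):

```python
from sympy import Matrix, symbols

b1, b2, b3 = symbols("b1 b2 b3")
aug = Matrix([[1, -2, 1, b1],
              [2, 1, 1, b2],
              [0, 5, -1, b3]])

aug[1, :] = aug[1, :] - 2 * aug[0, :]   # clear the first column
aug[2, :] = aug[2, :] - aug[1, :]       # clear the second column below the pivot
print(aug[2, :])                        # Matrix([[0, 0, 0, 2*b1 - b2 + b3]])
```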
3.3 Inverting a matrix
Non-square matrices
We can now make good on a promise from the last chapter, by showing that a non-square matrix cannot be invertible. This follows from the following more specific theorem:
Suppose \(\mathbf{A}\in M_{m, n}(\mathbb{K}).\)
If \(n > m\) (so \(\mathbf{A}\) has strictly more columns than it has rows), then there does not exist a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n}.\)
If \(n < m,\) then there does not exist a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}.\)
Proof. First suppose \(n > m.\) Consider the system of equations \(\mathbf{A}\cdot \vec{x} = \mathbf{0}.\) This obviously has the solution \(\vec{x} = \mathbf{0}.\) But it must have other solutions too, since \(\mathbf{A}\) has only \(m\) rows, so at most \(m\) of the columns of the RREF can contain a leading entry (possibly fewer, if there are some zero rows). Thus the general solution has some free variables in it (at least \(n - m\) of them). So there exists a solution \(\vec{x} \ne \mathbf{0}.\)
But then we have a contradiction, since \[\begin{aligned} \vec{x} &= \mathbf{1}_n \vec{x} \\ &= (\mathbf{B}\cdot \mathbf{A}) \cdot \vec{x}\\ &= \mathbf{B}\cdot (\mathbf{A}\cdot \vec{x})\\ &= \mathbf{B}\cdot \mathbf{0}= \mathbf{0}. \end{aligned}\] This proves (i).
Now let’s consider \(m > n.\) If \(\mathbf{A}\mathbf{B}= \mathbf{1}_m,\) then \(\mathbf{B}^T \mathbf{A}^T = (\mathbf{1}_m)^T = \mathbf{1}_m,\) and we obtain a contradiction by applying (i) to \(\mathbf{A}^T \in M_{n, m}(\mathbb{K}).\)
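For a concrete instance of the argument in part (i): any matrix with more columns than rows has a nonzero vector in its kernel, which one can exhibit with SymPy (an aside of mine, with an arbitrarily chosen matrix):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 5, 6]])      # 2 x 3: more columns than rows
print(A.nullspace())          # [Matrix([[1], [-2], [1]])] -- a nonzero x with A*x = 0
```

If some \(\mathbf{B}\) satisfied \(\mathbf{B}\mathbf{A}= \mathbf{1}_{3},\) multiplying this kernel vector by \(\mathbf{B}\mathbf{A}\) would force it to be zero – exactly the contradiction in the proof.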
Square matrices
For square matrices, we can use Gauss–Jordan elimination to determine if a matrix is invertible, and to compute its inverse if so. This relies on the following fact:
Let \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) be a square matrix. Then the following statements are equivalent:
\(\mathbf{A}\) is invertible;
the RREF of \(\mathbf{A}\) is the identity.
Proof. Suppose \(\mathbf{A}\) is invertible. Then \(\mathbf{A}^{-1}\) is itself invertible (with inverse \(\mathbf{A}\)), so the equation \(\mathbf{A}^{-1} \mathbf{A}= \mathbf{1}_{n}\) shows that \(\mathbf{A}\) is left-equivalent to \(\mathbf{1}_{n}.\) As \(\mathbf{1}_{n}\) is obviously in RREF, this must be the (unique) RREF of \(\mathbf{A}.\)
Conversely, if the RREF of \(\mathbf{A}\) is \(\mathbf{1}_{n},\) then we know there is an invertible \(\mathbf{B}\) such that \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n}.\) However, since \(\mathbf{B}\) is invertible, this implies \(\mathbf{A}= \mathbf{B}^{-1}\) (using Exercise 2.4), so we have \(\mathbf{A}\mathbf{B}= \mathbf{B}^{-1} \mathbf{B}= \mathbf{1}_{n}\) as well. Thus \(\mathbf{A}\) is invertible.
Note that this also shows that when \(\mathbf{A}\) is invertible, the inverse of \(\mathbf{A}\) is the same as the matrix which puts it into RREF. So if we apply Gauss–Jordan elimination to the augmented matrix \((\mathbf{A}\ |\ \mathbf{1}_{n}),\) then the rightmost \(n\) columns of the echelonized matrix will be the inverse of \(\mathbf{A}.\)
We want to compute the inverse of \[\mathbf{A}=\begin{pmatrix} 1 & -2 \\ -3 & 4 \end{pmatrix}.\] Write \[\left(\begin{array}{cc} 1 & -2 \\ -3 & 4\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ 0 & 1\end{array}\right).\] Adding \(3\)-times the first row to the second row gives \[\left(\begin{array}{cc} 1 & -2 \\ 0 & -2\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ 3 & 1\end{array}\right).\] Dividing the second row by \(-2\) gives \[\left(\begin{array}{cc} 1 & -2 \\ 0 & 1\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ -\frac{3}{2} & -\frac{1}{2}\end{array}\right).\] Finally, adding the second row twice to the first row gives \[\left(\begin{array}{cc} 1 & 0 \\ 0 & 1\end{array}\right| \left.\begin{array}{cc} -2 & -1 \\ -\frac{3}{2} & -\frac{1}{2}\end{array}\right).\] The first two columns are now the identity, so \(\mathbf{A}\) is invertible; and the last two columns give us the inverse, so \[\mathbf{A}^{-1}=\begin{pmatrix} -2 & -1 \\ -\frac{3}{2} & -\frac{1}{2}\end{pmatrix}.\]
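If you want to check this against the earlier sketches (hypothetical helper code from 3.1, not part of the notes), echelonizing \((\mathbf{A}\ |\ \mathbf{1}_{2})\) gives back the same answer:

```python
# reuses rref_with_transform (and Fraction) from the sketches in section 3.1
N, A_inv = rref_with_transform([[1, -2], [-3, 4]])
print(N)       # the identity matrix (as Fractions), so A is invertible
print(A_inv)   # entries -2, -1, -3/2, -1/2, matching the hand computation
```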
Exercises
Check that left equivalence of matrices is an equivalence relation.
Solution
For this exercise we write \(\mathbf{M}\sim \mathbf{N}\) to mean “there exists an invertible \(\mathbf{A}\) with \(\mathbf{A}\mathbf{M}= \mathbf{N}\)”.
Reflexivity: taking \(\mathbf{A}= \mathbf{1}_{m},\) every \(\mathbf{M}\in M_{m, n}(\mathbb{K})\) satisfies \(\mathbf{M}\sim \mathbf{M}.\)
Symmetry: suppose \(\mathbf{M}\sim \mathbf{N},\) so \(\mathbf{A}\mathbf{M}= \mathbf{N}\) for some \(\mathbf{A}.\) By assumption \(\mathbf{A}\) is invertible; and \(\mathbf{A}^{-1} \mathbf{N}= \mathbf{A}^{-1} \mathbf{A}\mathbf{M}= \mathbf{M},\) so \(\mathbf{N}\sim \mathbf{M}.\)
Transitivity: if \(\mathbf{M}\sim \mathbf{N}\) and \(\mathbf{N}\sim \mathbf{R},\) we have \(\mathbf{A}\) and \(\mathbf{B}\) such that \(\mathbf{A}\mathbf{M}= \mathbf{N}\) and \(\mathbf{B}\mathbf{N}= \mathbf{R}.\) So \((\mathbf{B}\mathbf{A}) \mathbf{M}= \mathbf{B}(\mathbf{A}\mathbf{M}) = \mathbf{B}\mathbf{N}= \mathbf{R}\) (note the use of associativity!), and \(\mathbf{B}\mathbf{A}\) is invertible (with inverse \(\mathbf{A}^{-1} \mathbf{B}^{-1}\)), hence \(\mathbf{M}\sim \mathbf{R}.\)
Consider the system of equations from 3.16, and suppose \(b_3 = b_2 - 2b_1,\) so the system is consistent.
Find formulae (in terms of \(b_1\) and \(b_2\)) for vectors \(\vec{c}\) and \(\vec{d}\) such that the general solution of the above system of equations is given by \[\begin{pmatrix}x_1 \\ x_2 \\ x_3 \end{pmatrix} = \{ \vec{c} + \lambda \vec{d}\ |\ \lambda \in \mathbb{R}\}.\]
Solution
The only free variable is \(x_3,\) since there are leading entries in the first and second columns. Setting \(x_3 = \lambda,\) the first row of the RREF gives us \[x_1 + \tfrac{3}{5} \lambda = \tfrac{1}{5} b_1 + \tfrac{2}{5}b_2\] and the second row \[x_2 - \tfrac{1}{5} \lambda = -\tfrac{2}{5}b_1 + \tfrac{1}{5}b_2,\] so the general solution is given by \[\{ \begin{pmatrix} \frac{1}{5} b_{1} + \frac{2}{5} b_{2} - \tfrac{3}{5} \lambda \\ -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} + \tfrac{1}{5}\lambda \\ \lambda\end{pmatrix} : \lambda \in \mathbb{R}\}.\] Thus we can take \(\vec{c} = \begin{pmatrix}\frac{1}{5} b_{1} + \frac{2}{5} b_{2} \\ -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} \\ 0 \end{pmatrix}\) and \(\vec{d} = \begin{pmatrix} -\tfrac{3}{5} \\ \tfrac{1}{5} \\ 1 \end{pmatrix}.\) (This is not the only way of writing the solution, of course.)
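To see symbolically that this really does solve the system for every \(\lambda\) (a SymPy aside, not part of the official solution):

```python
from sympy import Matrix, Rational, symbols

b1, b2, lam = symbols("b1 b2 lam")
A = Matrix([[1, -2, 1], [2, 1, 1], [0, 5, -1]])
b = Matrix([b1, b2, b2 - 2 * b1])            # impose the consistency condition b3 = b2 - 2*b1
c = Matrix([b1 / 5 + 2 * b2 / 5, -2 * b1 / 5 + b2 / 5, 0])
d = Matrix([Rational(-3, 5), Rational(1, 5), 1])
print((A * (c + lam * d) - b).expand())      # the zero vector, whatever lam is
```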
Let \(\mathbf{A}\in M_{n, n}(\mathbb{K})\) be an upper-triangular matrix (so its entries satisfy \(A_{ij} = 0\) if \(i > j\)).
Show that \(\mathbf{A}\) is invertible if and only if all the diagonal entries \(A_{ii}\) are non-zero.
Show that if the condition of part (a) is satisfied, then \(\mathbf{A}^{-1}\) is also upper-triangular.
Solution
First, we observe that if \(\mathbf{A}\) is upper triangular, and the first \(r\) diagonal entries are nonzero, for some \(1 \leqslant r \leqslant n,\) then in order to put the first \(r\) columns into RREF, we only need to multiply the first \(r\) rows by nonzero constants (to make the diagonal entries 1) and add suitable multiples of the \(j\)-th row to the \(i\)-th row for each \(i < j \leqslant r\) (because the entries below the diagonal are already zero).
This process doesn’t change the rows after the \(r\)-th at all; and in the recipe for computing the inverse, where we apply Gauss–Jordan elimination to \((\mathbf{A}\, |\, \mathbf{1}_{n}),\) it won’t introduce any nonzero below-diagonal entries in the right half of the matrix.
So if \(A_{rr} \ne 0\) for all \(r,\) the RREF of \(\mathbf{A}\) will be the identity, and the transformation matrix putting it into RREF (which is \(\mathbf{A}^{-1}\)) will be upper-triangular.
On the other hand, if \(A_{rr} = 0\) for some \(r,\) then when we reach the \(r\)-th column in the RREF algorithm, all the rows from \(r, \dots, n\) will have zeroes in the \(r\)-th column, and hence the \(r\)-th row of the RREF won’t contain a leading entry; thus the RREF of \(\mathbf{A}\) is not the identity and \(\mathbf{A}\) is not invertible.
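A quick numerical illustration of part (b) (using NumPy and an arbitrarily chosen matrix, purely as an aside):

```python
import numpy as np

A = np.array([[2., 1, 4],
              [0, 3, 5],
              [0, 0, 7]])                   # upper-triangular, nonzero diagonal
A_inv = np.linalg.inv(A)
assert np.allclose(np.tril(A_inv, -1), 0)   # the inverse is again upper-triangular
```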
Justify the claim made in the proof of uniqueness of RREF that if \(\mathbf{A}= \begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}_{m-h, h} & \mathbf{U}\end{pmatrix}\) is invertible, then \(\mathbf{U}\) is itself invertible.
Solution
Suppose there exists an inverse \(\mathbf{B}\) of \(\mathbf{A}.\) We can break \(\mathbf{B}\) up into blocks of the same sizes as the blocks of \(\mathbf{A},\) so we write it as \(\mathbf{B}= \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{E}& \mathbf{F}\end{pmatrix}\) (with \(\mathbf{C}\in M_{h, h}(\mathbb{K}), \mathbf{D}\in M_{h, m-h}(\mathbb{K})\) etc), and we have \[\mathbf{B}\mathbf{A}= \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{E}& \mathbf{F}\end{pmatrix}\begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}& \mathbf{U}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix}.\] We can compute the products block-by-block, hence \[\begin{pmatrix} \mathbf{C}& \mathbf{C}\mathbf{S} + \mathbf{D}\mathbf{U} \\ \mathbf{E}& \mathbf{E}\mathbf{S} + \mathbf{F} \mathbf{U}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix}.\] Comparing the bottom left blocks we have \(\mathbf{E}= \mathbf{0},\) and now comparing the bottom right blocks we have \(\mathbf{F} \mathbf{U} = \mathbf{1}_{m-h}.\) Now (using the fact that \(\mathbf{E}= \mathbf{0}\)) we can compute the other product as \[\mathbf{A}\mathbf{B}= \begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}& \mathbf{U}\end{pmatrix} \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{0}& \mathbf{F}\end{pmatrix} = \begin{pmatrix}\mathbf{C}& \mathbf{D}+ \mathbf{S} \mathbf{F} \\ \mathbf{0}& \mathbf{U}\mathbf{F}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix},\] so we deduce \(\mathbf{U} \mathbf{F}= \mathbf{1}_{m-h}.\) Thus \(\mathbf{F}\) is both a left and a right inverse of \(\mathbf{U},\) so \(\mathbf{U}\) is invertible as required.
Show that if \(\mathbf{A}\) is an invertible square matrix, then there is a finite sequence of elementary matrices \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k\) whose product is \(\mathbf{A}.\) (Hint: what is the RREF of \(\mathbf{A}^{-1}\)?)
Solution
We know that \(\mathbf{A}^{-1}\) is itself invertible (with inverse \(\mathbf{A}\)) so its RREF is the identity. Hence we can find a sequence of elementary matrices \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k\) with \[\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k \mathbf{A}^{-1} = \mathbf{1}_{n}\] and right-multiplying both sides by \(\mathbf{A}\) we deduce \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k = \mathbf{A}.\)
If \(\mathbf{B}_1,\ldots,\mathbf{B}_N,\mathbf{C}_1,\ldots,\mathbf{C}_{\tilde{N}}\in M_{m,m}(\mathbb{K})\) are elementary matrices such that \(\mathbf{B}_N\mathbf{B}_{N-1}\cdots \mathbf{B}_1\mathbf{A} = \mathbf{C}_{\tilde{N}}\mathbf{C}_{\tilde{N}-1}\cdots \mathbf{C}_1\mathbf{A}\) is the reduced row echelon form of \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\tilde N=N\) and \(\mathbf{B}_k = \mathbf{C}_k\) for all \(k=1,\ldots, N.\)
- True
- False
\(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is invertible if and only if the reduced row echelon form of \(\mathbf{A}\) equals \(\mathbf{1}_{n}.\)
- True
- False
In general, a solution to the equation \(\mathbf{A}\vec x = \vec b\) is given by \(\vec x = \mathbf{A}^{-1}\vec b,\) for \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and \(\vec x,\vec b\in\mathbb{K}^n.\)
- True
- False
A system of \(m\) linear equations in \(n\) variables, \(\mathbf{A}\vec{x} = \vec{b},\) always has a solution if \(m < n.\)
- True
- False