3 Matrices, II
3.1 Row echelon form
We’ll now introduce an important way of calculating with matrices – an extension of the Gaussian elimination algorithm which you saw in the Algorithmics module – which is going to be the key to virtually all the computations we’ll do in linear algebra.
Definition
Let \(\mathbf{M}\) and \(\mathbf{N}\) be matrices in \(M_{m,n}(\mathbb{K}).\) We say \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent if there exists an invertible \(\mathbf{A}\in M_{m, m}(\mathbb{K})\) such that \(\mathbf{A}\mathbf{M}= \mathbf{N}.\)
One can show that “left equivalence” is an equivalence relation in the sense of Algorithmics, Section 2; that is, we have the following properties:
(Reflexivity) Every matrix is left-equivalent to itself.
(Symmetry) If \(\mathbf{M}\) is left-equivalent to \(\mathbf{N},\) then \(\mathbf{N}\) is left-equivalent to \(\mathbf{M}.\)
(Transitivity) If \(\mathbf{M}\) is left-equivalent to \(\mathbf{N},\) and \(\mathbf{N}\) is left-equivalent to \(\mathbf{R},\) then \(\mathbf{M}\) is left-equivalent to \(\mathbf{R}.\)
The idea of this section is to show that among all the matrices that are left-equivalent to a given \(\mathbf{M},\) there is a unique “nicest” one.
For each row in a matrix, if the row does not consist of zeros only, then the leftmost nonzero entry is called the leading entry of that row.
A matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is said to be in row echelon form (REF) if
all rows consisting of only zeros are below all the non-zero rows;
the leading entry of a nonzero row is always strictly to the right of the leading entry of the row above it.
The matrix \(\mathbf{A}\) is said to be in reduced row echelon form (RREF) if furthermore
all of the leading entries are equal to \(1\);
in every column containing a leading entry, all of the other entries in that column are zero.
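To make these conditions concrete, here is a minimal Python sketch (my own aside, not part of the printed notes) that checks whether a matrix, given as a list of rows, is in REF or RREF; the helper names `leading_index`, `is_ref` and `is_rref` are just illustrative.

```python
def leading_index(row):
    """Index of the leading (leftmost nonzero) entry of a row, or None if the row is zero."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_ref(M):
    """Check the two row echelon form conditions."""
    seen_zero_row = False
    prev_lead = -1
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        else:
            if seen_zero_row:
                return False        # a nonzero row sits below a zero row
            if lead <= prev_lead:
                return False        # leading entry not strictly to the right of the one above
            prev_lead = lead
    return True

def is_rref(M):
    """Check the two extra reduced row echelon form conditions."""
    if not is_ref(M):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:
            return False            # leading entries must equal 1
        if any(M[k][lead] != 0 for k in range(len(M)) if k != i):
            return False            # other entries in a leading column must be zero
    return True
```

For example, `is_rref([[1, 0, 2], [0, 1, 3], [0, 0, 0]])` returns `True`, while moving the zero row to the top would make the matrix fail the first REF condition.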
The basic theorem about REF and RREF is the following one:
For any \(\mathbf{M}\in M_{m,n}(\mathbb{K}),\) there exists a unique matrix \(\mathbf{N}\in M_{m, n}(\mathbb{K})\) with the following two properties:
\(\mathbf{N}\) is in reduced row echelon form, and
\(\mathbf{N}\) is left-equivalent to \(\mathbf{M}.\)
We say \(\mathbf{N}\) is the reduced row echelon form of \(\mathbf{M}.\) Moreover, there is an explicit algorithm for computing \(\mathbf{N},\) given \(\mathbf{M}.\)
By the definition of left-equivalence, there must be some invertible \(\mathbf{A}\) that multiplies \(\mathbf{M}\) into its RREF; but this \(\mathbf{A}\) is not unique in general. It’s somehow a miracle that \(\mathbf{N}\) is unique, even though \(\mathbf{A}\) isn’t. The extreme case is when \(\mathbf{M}= \mathbf{0}_{m,n}\); then \(\mathbf{M}\) is already in RREF, but \(\mathbf{A}\mathbf{M}= \mathbf{M}\) for any matrix \(\mathbf{A},\) so we could take \(\mathbf{A}\) to be any invertible matrix we like.
Echelonizing a matrix via Gauss–Jordan
We’ll first prove the “existence” part of the theorem. You already saw how to convert a matrix into row echelon form (Gaussian elimination); to get reduced row echelon form, we’ll use a slight refinement, Gauss–Jordan elimination. This relies on the following tools:
Let \(\mathbf{M}\in M_{m,n}(\mathbb{K}).\) If \(\mathbf{M}'\) is obtained from \(\mathbf{M}\) by any one of the following operations, then \(\mathbf{M}'\) is left-equivalent to \(\mathbf{M}\):
Interchanging the \(i\)-th and \(j\)-th rows of \(\mathbf{M},\) for any \(i \ne j \in \{1, \dots, m\}\);
Multiplying all the entries of the \(i\)-th row by \(\lambda,\) for some \(\lambda \ne 0 \in \mathbb{K}\);
Adding \(\lambda\) times the \(j\)-th row of \(\mathbf{M}\) to the \(i\)-th row, for any \(i \ne j \in \{1, \dots, m\}\) and \(\lambda \in \mathbb{K}.\)
Proof. Each of these operations corresponds to left-multiplying \(\mathbf{M}\) by one of the three kinds of elementary matrices which you learned about in the M01 course. For instance, adding \(\lambda\) times the \(j\)-th row of \(\mathbf{M}\) to the \(i\)-th row corresponds to multiplying by the matrix with 1’s down the diagonal, \(\lambda\) in the \((i, j)\) position, and all other entries zero.
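If it helps to see the three kinds of elementary matrices written out explicitly, here is a small Python sketch (the function names are mine, and rows are 0-indexed here, unlike in the notes):

```python
def identity(m):
    """The m-by-m identity matrix, as a list of rows."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def swap_rows(m, i, j):
    """Elementary matrix interchanging rows i and j."""
    E = identity(m)
    E[i][i] = E[j][j] = 0
    E[i][j] = E[j][i] = 1
    return E

def scale_row(m, i, lam):
    """Elementary matrix multiplying row i by a nonzero scalar lam."""
    E = identity(m)
    E[i][i] = lam
    return E

def add_multiple(m, i, j, lam):
    """Elementary matrix adding lam times row j to row i (i != j):
    1's on the diagonal, lam in position (i, j), zeros elsewhere."""
    E = identity(m)
    E[i][j] = lam
    return E
```

Left-multiplying an \(m \times n\) matrix by one of these performs the corresponding row operation, which is exactly the content of the proof above.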
Proof of existence of RREF. Using the “transitivity” property above, if we apply any finite sequence of elementary row operations to \(\mathbf{M},\) we still get a matrix left-equivalent to \(\mathbf{M}.\) We use this to gradually transform \(\mathbf{M},\) one column at a time, to get it into RREF. More precisely, we’ll prove the following:
For any \(0 \leqslant r \leqslant n,\) we can apply a finite sequence of row operations to \(\mathbf{M}\) to obtain a matrix \(\mathbf{M}_r\) with the following property: the \(m \times r\) matrix given by the first \(r\) columns of \(\mathbf{M}_r\) is in reduced row echelon form.
We’ll prove this by induction on \(r.\) The statement is vacuous for \(r = 0\): a matrix without any columns is certainly in RREF, so \(\mathbf{M}_0 = \mathbf{M}\) will do. So let’s assume that we have already transformed \(\mathbf{M}\) into a matrix \(\mathbf{M}_r\) whose first \(r\) columns are in RREF, for some \(r < n,\) and try to deal with the \((r + 1)\)-st column.
Let \(h \leqslant m\) be the number of nonzero rows in the left-hand \(m \times r\) submatrix, so the first \(r\) columns are all zero below the \(h\)-th entry. Then we can visualise \(\mathbf{M}_r\) as follows: \[\mathbf{M}_r = \left( \begin{array}{rc|c} {\scriptstyle h} \big\{ &\overbrace{\left( \begin{array}{c} \text{RREF}, \\[-0.7ex] \text{no zero rows} \end{array}\right) }^r & \overbrace{\rule{0pt}{3ex}(?)}^{(n-r)} \\ \hline {\scriptstyle (m-h)} \big\{ & \text{($\mathbf{0}$)} & \text{($?$)} \end{array}\right)\]
If \(h = m\) (which is certainly possible), then the whole matrix \(\mathbf{M}_r\) is in RREF already, so we can set \(\mathbf{M}_{r+1} = \mathbf{M}_r\) and go on. If not, then we home in on the first column in the bottom right \((m - h) \times (n - r)\) submatrix of \(\mathbf{M}_r.\) Two things can now happen:
Case A: all these entries are zero. In this case, the first \((r + 1)\) columns of \(\mathbf{M}_r\) are already in RREF; so we can set \(\mathbf{M}_{r + 1} = \mathbf{M}_{r},\) and the induction step is complete.
Case B: at least one of these \((m-h)\) entries is non-zero (so in particular \(h < m\)).
Swapping the \((h + 1)\)-st row with one of the rows further down if necessary, we can assume that this nonzero entry is the top left corner entry of our submatrix (i.e. in position \((h + 1, r + 1)\) of \(\mathbf{M}_r\)). By multiplying the \((h + 1)\)-th row by a suitable scalar, we can make this nonzero entry equal to 1. Neither the row swap nor the scaling changes anything in the leftmost \(r\) columns – we’re just moving zeroes around – so the first \(r\) columns are still in RREF.
We now kill off all the other non-zero entries in the \((r + 1)\)-th column, by subtracting a suitable multiple of the \((h + 1)\)-th row. Again, this doesn’t change anything in the first \(r\) columns, because the \((h + 1)\)-th row has zeroes in these positions; so the resulting matrix \(\mathbf{M}_{r+ 1}\) has its first \((r + 1)\) columns in RREF.
After \(n\) steps of this process we end up with a matrix which is fully in RREF.
Notice that this proof doesn’t just abstractly show you that the RREF exists; it gives you a completely explicit recipe for finding it.
Since calculating the RREF of a matrix is such a basic tool in linear algebra calculations, any half-decent computer mathematics package will include an RREF routine – often several different ones, tailored to matrices of specific shapes or with entries in specific fields.
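For illustration, here is a bare-bones Python version of the recipe from the existence proof, working over the rationals via `fractions.Fraction`; it is only a sketch for following the algorithm step by step, not a substitute for a proper library routine.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows),
    following the column-by-column recipe from the existence proof."""
    M = [[Fraction(x) for x in row] for row in M]   # work over the rationals
    m, n = len(M), len(M[0])
    h = 0                                           # number of nonzero rows found so far
    for col in range(n):
        # look for a nonzero entry in this column, at or below row h (Case B)
        pivot = next((i for i in range(h, m) if M[i][col] != 0), None)
        if pivot is None:
            continue                                # Case A: nothing to do in this column
        M[h], M[pivot] = M[pivot], M[h]             # swap it up to row h
        M[h] = [x / M[h][col] for x in M[h]]        # scale so the leading entry is 1
        for i in range(m):                          # clear the rest of the column
            if i != h and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[h])]
        h += 1
    return M
```

Running `rref` on the augmented matrix of the worked example in 3.2 below reproduces the final echelonized matrix obtained there by hand.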
Keeping track of the transformation matrix
In principle, whenever you Gauss–Jordan eliminate a matrix, you can make a list of the elementary row operations you used, and the corresponding elementary matrices; this gives you a list of elementary matrices \((\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_k)\) such that \(\mathbf{A}_k\mathbf{A}_{k-1}\dots\mathbf{A}_1 \mathbf{M}= \mathbf{N}\) is in RREF. However, this quickly gets quite tedious to do, and there’s a much better method.
For \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{m, r}(\mathbb{K}),\) we write \((\mathbf{A}\ |\ \mathbf{B})\) for the \(m \times (n + r)\) matrix given by joining \(\mathbf{A}\) and \(\mathbf{B}\) together (so the \((i, j)\) entry is \(A_{ij}\) for \(1 \leqslant j \leqslant n,\) and \(B_{i,j-n}\) for \(n + 1 \leqslant j \leqslant n + r\)).
E.g. if we have \[\mathbf{A}= \begin{pmatrix} 1 & 3 & 2 \\ 2 & 0 & 1 \\ 5 & 2 & 2\end{pmatrix}\qquad \mathbf{B}= \begin{pmatrix} 4 \\ 3 \\ 1\end{pmatrix},\] then \[(\mathbf{A}\ |\ \mathbf{B}) = \left(\begin{array}{ccc|c} 1 & 3 & 2 & 4\\ 2 & 0 & 1 & 3\\ 5 & 2 & 2 & 1\end{array}\right).\] It’s common to write a line in between the entries, as above, to remind ourselves which entries came from where.
For any matrices \(\mathbf{A}\in M_{m, n}(\mathbb{K}),\) \(\mathbf{B}\in M_{n, r}(\mathbb{K}),\) and \(\mathbf{C}\in M_{n, s}(\mathbb{K}),\) we have \[\mathbf{A}\cdot (\mathbf{B}\ |\ \mathbf{C}) = (\mathbf{A}\mathbf{B}\ |\ \mathbf{A}\mathbf{C}).\]
Proof. The \(j\)-th column of \(\mathbf{A}\cdot (\mathbf{B}\ |\ \mathbf{C})\) is given by \(\mathbf{A}\cdot \vec{v}_j\) where \(\vec{v}_j\) is the \(j\)-th column of \((\mathbf{B}\ |\ \mathbf{C}).\) Considering the cases \(1 \leqslant j \leqslant r\) and \(r+1 \leqslant j \leqslant r + s\) separately, we obtain either a column of \(\mathbf{A}\mathbf{B}\) or a column of \(\mathbf{A}\mathbf{C}.\)
Given any \(\mathbf{M}\in M_{m,n}(\mathbb{K}),\) let \(\tilde{\mathbf{M}} = (\mathbf{M}\ |\ \mathbf{1}_{m}),\) and let \(\tilde{\mathbf{N}}\) be the RREF of \(\tilde{\mathbf{M}}.\) Then we have \(\tilde{\mathbf{N}} = (\mathbf{N}\ |\ \mathbf{A}),\) where \(\mathbf{N}\) is the RREF of \(\mathbf{M},\) and \(\mathbf{A}\) is a matrix such that \(\mathbf{A}\mathbf{M}= \mathbf{N}.\)
Proof. Let \(\mathbf{A}\) be any invertible matrix such that \(\tilde{\mathbf{N}} = \mathbf{A}\tilde{\mathbf{M}}\) is in RREF. From the proposition, we have \(\mathbf{A}\tilde{\mathbf{M}} = (\mathbf{A}\mathbf{M}\ |\ \mathbf{A}\mathbf{1}_{m}) = (\mathbf{A}\mathbf{M}\ |\ \mathbf{A}).\) However, if \(\tilde{\mathbf{N}}\) is an RREF matrix, then its first \(n\) columns are also an RREF matrix (deleting the last few columns can only turn some rows into zero rows, and since leading entries move strictly to the right as we go down, these rows sit at the bottom), so \(\mathbf{A}\mathbf{M}\) must be the unique RREF matrix left-equivalent to \(\mathbf{M}.\)
We can take a slight shortcut here: since Gauss–Jordan elimination transforms a matrix into RREF one column at a time, we can stop as soon as the first \(n\) columns of \(\tilde{\mathbf{M}}\) are in RREF – we don’t have to keep going all the way up to the \((m + n)\)-th column.
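In code, continuing the sketch from 3.1 (so this reuses `rref` and `Fraction`, and the function name is again just illustrative), the bookkeeping looks like this:

```python
from fractions import Fraction

def rref_with_transform(M):
    """Return (N, A) where N is the RREF of M and A is an invertible matrix
    with A M = N, found by echelonizing the augmented matrix (M | 1_m)."""
    m, n = len(M), len(M[0])
    identity = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    augmented = [list(row) + identity[i] for i, row in enumerate(M)]
    result = rref(augmented)          # rref is the earlier sketch
    N = [row[:n] for row in result]   # first n columns: the RREF of M
    A = [row[n:] for row in result]   # last m columns: a matrix with A M = N
    return N, A
```

(This sketch echelonizes all \(n + m\) columns rather than stopping after the first \(n,\) so it ignores the shortcut just mentioned; by the theorem the left half still comes out as the RREF of \(\mathbf{M},\) and the right half is still a valid transformation matrix, though possibly a different one.)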
Uniqueness
We’ll now prove the uniqueness property of reduced row echelon form. This proof is non-examinable (marked by a dark green line down the margin in the printed notes); but you should definitely be aware of the statement of the theorem!
We use the following easy remark:
Assume that \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent, and let \(j \in \{1, \dots, n\}.\) Then the matrices \(\mathbf{M}'\) and \(\mathbf{N}'\) given by deleting the \(j\)-th columns from \(\mathbf{M}\) and \(\mathbf{N}\) are also left-equivalent.
Proof. Exercise.
Proof of uniqueness of RREF. Now suppose \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent matrices, both in reduced row echelon form, with \(\mathbf{M}\ne \mathbf{N}.\) Let the \(t\)-th column, for \(1 \leqslant t \leqslant n,\) be the first (leftmost) column where the two matrices differ; and consider the new matrices \(\mathbf{M}', \mathbf{N}'\) given by deleting all columns strictly to the right of the \(t\)-th, and all columns to the left which don’t contain a leading entry. Then we have \(\mathbf{M}' \ne \mathbf{N}'\); both are still in RREF; and we have \[\mathbf{M}' = \begin{pmatrix} \mathbf{1}_{h} & \vec{r} \\ 0 & \vec{s} \end{pmatrix}, \qquad \mathbf{N}' = \begin{pmatrix} \mathbf{1}_{h} & \vec{r'} \\ 0 & \vec{s'} \end{pmatrix}\] for some \(h < t\) and some column vectors \(\vec{r}, \vec{s}, \vec{r'}, \vec{s'}.\)
Since \(\mathbf{M}\) and \(\mathbf{N}\) are left-equivalent, so are \(\mathbf{M}'\) and \(\mathbf{N}'\) (by the last Proposition), so there exists an invertible \(\mathbf{A}\) with \(\mathbf{A}\mathbf{M}' = \mathbf{N}'.\) Let’s think about what \(\mathbf{A}\mathbf{M}'\) looks like. We can divide \(\mathbf{A}\) up into blocks \(\begin{pmatrix}\mathbf{R} & \mathbf{S} \\ \mathbf{T} & \mathbf{U} \end{pmatrix},\) with \(\mathbf{R}\in M_{h,h}(\mathbb{K})\) etc; and the product is then \[\begin{pmatrix}\mathbf{R} & \mathbf{S} \\ \mathbf{T} & \mathbf{U} \end{pmatrix} \cdot \begin{pmatrix} \mathbf{1}_{h} & \vec{r} \\ 0 & \vec{s} \end{pmatrix} = \begin{pmatrix} \mathbf{R} & \mathbf{R}\vec{r} + \mathbf{S}\vec{s} \\ \mathbf{T} & \mathbf{T}\vec{r} + \mathbf{U}\vec{s}\end{pmatrix}.\] Let’s suppose this is equal to \(\mathbf{N}'.\) Then, comparing the top-left and bottom-left blocks, we must have \(\mathbf{R} = \mathbf{1}_{h}\) and \(\mathbf{T} = \mathbf{0}_{m-h, h}\); so this becomes \[\begin{pmatrix} \mathbf{1}_{h} & \vec{r} + \mathbf{S} \vec{s} \\ 0 & \mathbf{U} \vec{s} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_{h} & \vec{r'} \\ 0 & \vec{s'} \end{pmatrix}.\]
If \(\vec{s}\ne 0,\) then \(\vec{s'} = \mathbf{U} \vec{s} \ne 0\) also, since the invertibility of \(\mathbf{A}\) implies that \(\mathbf{U}\) is also invertible (see Exercise below). So the last column of each of \(\mathbf{M}'\) and \(\mathbf{N}'\) has a nonzero entry below the \(h\)-th row, and hence contains a leading entry. Since \(\mathbf{M}'\) and \(\mathbf{N}'\) are in RREF, this forces \(\vec{r}\) and \(\vec{r'}\) to be zero, and \(\vec{s}\) and \(\vec{s'}\) to both equal the standard basis vector \(\vec{e}_1\); so in fact \(\mathbf{M}' = \mathbf{N}',\) contradicting our assumptions.
On the other hand, if \(\vec{s} = 0,\) then \(\vec{s}' = \mathbf{U} \vec{s} = 0\) as well; and substituting this into the top right block, we have \(\vec{r}' = \vec{r} + \mathbf{S} \vec{s} = \vec{r}.\) So again \(\mathbf{M}' = \mathbf{N}',\) contradiction. So we can conclude that it is impossible for two distinct RREF matrices to be left-equivalent.
3.2 Solving equations
You already saw in Algorithmics that matrices can be used to “package together” systems of linear equations. Suppose we have a system of simultaneous equations \[\begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1, \\ \vdots\qquad &= \quad\vdots\\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m, \end{aligned}\] with (known) values \(a_{ij} \in \mathbb{K}\) and \(b_i \in \mathbb{K},\) and we want to solve this for the unknowns \(x_j.\) Then we can write this simply as \(\mathbf{A}\vec{x} = \vec{b},\) with \(\mathbf{A}= (a_{ij}) \in M_{m,n}(\mathbb{K}),\) \(\vec{b} = (b_i)\) and \(\vec{x} = (x_j)\); and we write \(\operatorname{Sol}(\mathbf{A}, \vec{b})\) for the set of solutions to this equation, i.e. \[\operatorname{Sol}(\mathbf{A}, \vec{b}) = \{ \vec{x} \in \mathbb{K}^n : \mathbf{A}\vec{x} = \vec{b}\}.\]
If \(\mathbf{B}\) is an invertible square matrix, then we have \(\operatorname{Sol}(\mathbf{B}\mathbf{A}, \mathbf{B}\vec{b}) = \operatorname{Sol}(\mathbf{A}, \vec{b}).\) So, if we want to compute \(\operatorname{Sol}(\mathbf{A}, \vec{b}),\) it’s a good idea to choose some \(\mathbf{B}\) such that \(\mathbf{B}\mathbf{A}\) and \(\mathbf{B}\vec{b}\) are as simple as possible. So we form the augmented matrix \((\mathbf{A}\, | \vec{b})\) and put that into echelon form; this gives a new, echelonized system of equations where we can immediately read off the solutions.
Suppose \(\mathbf{A}\) is in REF (not necessarily RREF), and let \(r \leqslant m\) be the number of non-zero rows of \(\mathbf{A}.\) Then the equation \(\mathbf{A}\vec{x} = \vec{b}\) has a solution if, and only if, we have \(b_i = 0\) for all \(i\) with \(r < i \leqslant m.\)
(This condition is vacuously satisfied if \(r = m,\) since there are no such \(i,\) so solutions always exist in this case.) It’s clear that “\(b_i = 0\) for all \(i\) with \(r < i \leqslant m\)” is a necessary condition for solutions to exist, since for \(i\) in this range the \(i\)-th equation in our system is just \(0 = b_i.\) What’s less obvious is that it is a sufficient condition, which is shown in Chapter 9 of Algorithmics.
We say the system of equations is inconsistent if one of \(b_{r + 1}, \dots, b_m\) is nonzero, so the solution set is empty.
no solution
my mind is a matrix
that has been reduced
into row echelon form
and proven to be
— inconsistent
(from More Math Poems by Eileen Tupaz)
Of course, we don’t just want to know whether solutions exist: we want to find them! Having found a REF for \(\mathbf{A},\) we say \(x_j\) is a free variable if there is no row of the REF whose leading entry is in the \(j\)-th column. Then it’s easy to check that for any values of the free variables, there is a unique way to fill in the rest of the variables to get a solution \(\vec{x}.\)
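As a rough sketch of how one might read this off in code (reusing the `leading_index` and `Fraction` helpers from 3.1; the function name `solve_from_rref` is mine), given the RREF of the augmented matrix:

```python
def solve_from_rref(R, n):
    """R is the RREF of an augmented matrix (A | b), where A has n columns.
    Return None if the system is inconsistent; otherwise return a particular
    solution (all free variables set to 0) together with the free columns."""
    pivots = []
    for row in R:
        lead = leading_index(row)
        if lead is None:
            continue
        if lead == n:
            return None                  # a row reading 0 = 1: inconsistent
        pivots.append(lead)
    free = [j for j in range(n) if j not in pivots]
    x = [Fraction(0)] * n
    for row in R:
        lead = leading_index(row)
        if lead is not None and lead < n:
            x[lead] = row[n]             # with the free variables 0, x_lead = b-entry
    return x, free
```

Varying the free variables then sweeps out the whole solution set, as in the example below.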
“Find all real numbers \(x_1, \dots, x_4\) satisfying the system of linear equations \[\begin{aligned} -9x_2 + 3x_3 + 4x_4 &=9 \\ x_1 + 4x_2 - x_4 &= 5\\ 2x_1 + 6x_2 - x_3 + 5 x_4 &= -5 \text{ .''} \end{aligned}\]
The augmented matrix is \[\left(\begin{array}{cccc|c} 0 & -9 & 3 & 4 & 9 \\ 1 & 4 & 0 & -1 & 5 \\ 2 & 6 & -1 & 5 & -5 \end{array}\right)\]
Let’s walk through the steps of echelonizing this:
Swap rows 1 and 2 to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & -9 & 3 & 4 & 9 \\ 2 & 6 & -1 & 5 & -5 \end{array}\right)\)
Add \(-2\) times row 1 to row 3 to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & -9 & 3 & 4 & 9 \\ 0 & -2 & -1 & 7 & -15 \end{array}\right)\)
Multiply row 2 by \(-\tfrac{1}{9}\) to get \(\left(\begin{array}{rrrr|r} 1 & 4 & 0 & -1 & 5 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & -2 & -1 & 7 & -15 \end{array}\right)\)
Add \(-4\) times row 2 to row 1, and \(2\) times row 2 to row 3, to get \(\left(\begin{array}{rrrr|r} 1 & 0 & \frac{4}{3} & \frac{7}{9} & 9 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & 0 & -\frac{5}{3} & \frac{55}{9} & -17 \end{array}\right)\)
Multiply row 3 by \(-\tfrac{3}{5}\) to get \(\left(\begin{array}{rrrr|r} 1 & 0 & \frac{4}{3} & \frac{7}{9} & 9 \\ 0 & 1 & -\frac{1}{3} & -\frac{4}{9} & -1 \\ 0 & 0 & 1 & -\frac{11}{3} & \frac{51}{5} \end{array}\right)\)
Add \(-\tfrac{4}{3}\) of row 3 to row 1, and \(\tfrac{1}{3}\) of row 3 to row 2, to get \(\left(\begin{array}{rrrr|r} 1 & 0 & 0 & \frac{17}{3} & -\frac{23}{5} \\ 0 & 1 & 0 & -\frac{5}{3} & \frac{12}{5} \\ 0 & 0 & 1 & -\frac{11}{3} & \frac{51}{5} \end{array}\right)\)
So the solutions to the original system are the same as those of the echelonized system \[\begin{aligned} x_1 + \tfrac{17}{3} x_4 &= -\tfrac{23}{5} \\ x_2 - \tfrac{5}{3} x_4 &= \tfrac{12}{5}\\ x_3 - \tfrac{11}{3} x_4 &= \tfrac{51}{5} \end{aligned}\]
Clearly, we can choose any value of \(x_4\) we like (let’s call it \(\lambda\)), and then read off the values of \(x_1, x_2, x_3\) from that, so the solutions are given by \[\vec{x} = \begin{pmatrix} -\tfrac{23}{5} - \tfrac{17}{3} \lambda\\ \tfrac{12}{5} + \tfrac{5}{3} \lambda \\ \tfrac{51}{5} + \tfrac{11}{3} \lambda \\ \lambda \end{pmatrix}\] for any \(\lambda \in \mathbb{R}.\)
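As a sanity check on the arithmetic (using NumPy, which is of course not something the notes rely on), one can plug the parametric solution back into the original system for a few values of \(\lambda\):

```python
import numpy as np

A = np.array([[0., -9, 3, 4],
              [1, 4, 0, -1],
              [2, 6, -1, 5]])
b = np.array([9., 5, -5])

for lam in (0.0, 1.0, -2.5):               # a few arbitrary values of the parameter
    x = np.array([-23/5 - 17/3 * lam,
                  12/5 + 5/3 * lam,
                  51/5 + 11/3 * lam,
                  lam])
    assert np.allclose(A @ x, b)            # every choice of lambda gives a solution
```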
Here is a variation on the previous example, where the last column of the augmented matrix contains unknown parameters rather than numbers. (I am grateful to Sarah Zerbes for this example.)
“Consider the system of equations \[\begin{pmatrix}1 & -2 & 1 \\ 2 & 1 & 1 \\ 0 & 5 & -1\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\] where \(b_1, b_2, b_3 \in \mathbb{R}.\) For which values of the \(b_i\) is this solvable?”
We take the augmented matrix \[\left(\begin{array}{ccc|c} 1 & -2 & 1 & b_1 \\ 2 & 1 & 1 & b_2 \\ 0 & 5 & -1 & b_3 \end{array}\right)\] and apply Gauss–Jordan elimination to echelonize the first three columns. After clearing the first column we get \[\left(\begin{array}{ccc|c} 1 & -2 & 1 & b_1 \\ 0 & 5 & -1 & -2b_1 + b_2 \\ 0 & 5 & -1 & b_3 \end{array}\right)\] and continuing to clear the second column we get \[\left(\begin{array}{ccc|c} 1 & 0 & \frac{3}{5} & \frac{1}{5} b_{1} + \frac{2}{5} b_{2} \\ 0 & 1 & -\frac{1}{5} & -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} \\ 0 & 0 & 0 & 2 b_{1} - b_{2} + b_{3} \end{array}\right).\] So the original system has solutions if and only if \(2b_1 - b_2 + b_3 = 0.\)
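One way to double-check the condition \(2b_1 - b_2 + b_3 = 0\) is to repeat the two elimination steps symbolically, for instance with SymPy (again my own aside, not part of the notes; blindly calling a built-in symbolic `rref()` here can be misleading, since the routine has to guess whether the symbolic entry \(2b_1 - b_2 + b_3\) is zero, so we just redo the row operations from above):

```python
from sympy import Matrix, symbols

b1, b2, b3 = symbols("b1 b2 b3")
aug = Matrix([[1, -2, 1, b1],
              [2, 1, 1, b2],
              [0, 5, -1, b3]])

aug[1, :] = aug[1, :] - 2 * aug[0, :]   # clear the first column
aug[2, :] = aug[2, :] - aug[1, :]       # clear the second column below the pivot
print(aug[2, :])                        # Matrix([[0, 0, 0, 2*b1 - b2 + b3]])
```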
3.3 Inverting a matrix
Non-square matrices
We can now make good on a promise from the last chapter, by showing that a non-square matrix cannot be invertible. This follows from the following more specific theorem:
Suppose \(\mathbf{A}\in M_{m, n}(\mathbb{K}).\)
If \(n > m\) (so \(\mathbf{A}\) has strictly more columns than it has rows), then there does not exist a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n}.\)
If \(n < m,\) then there does not exist a matrix \(\mathbf{B}\in M_{n, m}(\mathbb{K})\) with \(\mathbf{A}\mathbf{B}= \mathbf{1}_{m}.\)
Proof. First suppose \(n > m.\) Consider the system of equations \(\mathbf{A}\cdot \vec{x} = \mathbf{0}.\) This obviously has the solution \(\vec{x} = \mathbf{0}.\) But it must have other solutions too, since \(\mathbf{A}\) has only \(m\) rows, so at most \(m\) of the columns of the RREF can contain a leading entry (possibly fewer, if there are some zero rows). Thus the general solution has some free variables in it (at least \(n - m\) of them). So there exists a solution \(\vec{x} \ne \mathbf{0}.\)
But then we have a contradiction, since \[\begin{aligned} \vec{x} &= \mathbf{1}_n \vec{x} \\ &= (\mathbf{B}\cdot \mathbf{A}) \cdot \vec{x}\\ &= \mathbf{B}\cdot (\mathbf{A}\cdot \vec{x})\\ &= \mathbf{B}\cdot \mathbf{0}= \mathbf{0}. \end{aligned}\] This proves (i).
Now let’s consider \(m > n.\) If \(\mathbf{A}\mathbf{B}= \mathbf{1}_m,\) then \(\mathbf{B}^T \mathbf{A}^T = (\mathbf{1}_m)^T = \mathbf{1}_m,\) and we obtain a contradiction by applying (i) to \(\mathbf{A}^T \in M_{n, m}(\mathbb{K}).\)
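For a concrete instance of the argument in part (i): any matrix with more columns than rows has a nonzero vector in its kernel, which one can exhibit with SymPy (an aside of mine, with an arbitrarily chosen matrix):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 5, 6]])      # 2 x 3: more columns than rows
print(A.nullspace())          # [Matrix([[1], [-2], [1]])] -- a nonzero x with A*x = 0
```

If some \(\mathbf{B}\) satisfied \(\mathbf{B}\mathbf{A}= \mathbf{1}_{3},\) multiplying this kernel vector by \(\mathbf{B}\mathbf{A}\) would force it to be zero – exactly the contradiction in the proof.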
Square matrices
For square matrices, we can use Gauss–Jordan elimination to determine if a matrix is invertible, and to compute its inverse if so. This relies on the following fact:
Let \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) be a square matrix. Then the following statements are equivalent:
\(\mathbf{A}\) is invertible;
the RREF of \(\mathbf{A}\) is the identity.
Proof. Suppose \(\mathbf{A}\) is invertible. Then \(\mathbf{A}^{-1}\) is itself invertible (with inverse \(\mathbf{A}\)), so the equation \(\mathbf{A}^{-1} \mathbf{A}= \mathbf{1}_{n}\) shows that \(\mathbf{A}\) is left-equivalent to \(\mathbf{1}_{n}.\) As \(\mathbf{1}_{n}\) is obviously in RREF, this must be the (unique) RREF of \(\mathbf{A}.\)
Conversely, if the RREF of \(\mathbf{A}\) is \(\mathbf{1}_{n},\) then we know there is an invertible \(\mathbf{B}\) such that \(\mathbf{B}\mathbf{A}= \mathbf{1}_{n}.\) However, since \(\mathbf{B}\) is invertible, this implies \(\mathbf{A}= \mathbf{B}^{-1}\) (using Exercise 2.4), so we have \(\mathbf{A}\mathbf{B}= \mathbf{B}^{-1} \mathbf{B}= \mathbf{1}_{n}\) as well. Thus \(\mathbf{A}\) is invertible.
Note that this also shows that when \(\mathbf{A}\) is invertible, the inverse of \(\mathbf{A}\) is the same as the matrix which puts it into RREF. So if we apply Gauss–Jordan elimination to the augmented matrix \((\mathbf{A}\ |\ \mathbf{1}_{n}),\) then the rightmost \(n\) columns of the echelonized matrix will be the inverse of \(\mathbf{A}.\)
We want to compute the inverse of \[\mathbf{A}=\begin{pmatrix} 1 & -2 \\ -3 & 4 \end{pmatrix}.\] Write \[\left(\begin{array}{cc} 1 & -2 \\ -3 & 4\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ 0 & 1\end{array}\right).\] Adding \(3\)-times the first row to the second row gives \[\left(\begin{array}{cc} 1 & -2 \\ 0 & -2\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ 3 & 1\end{array}\right).\] Dividing the second row by \(-2\) gives \[\left(\begin{array}{cc} 1 & -2 \\ 0 & 1\end{array}\right| \left.\begin{array}{cc} 1 & 0 \\ -\frac{3}{2} & -\frac{1}{2}\end{array}\right).\] Finally, adding the second row twice to the first row gives \[\left(\begin{array}{cc} 1 & 0 \\ 0 & 1\end{array}\right| \left.\begin{array}{cc} -2 & -1 \\ -\frac{3}{2} & -\frac{1}{2}\end{array}\right).\] The first two columns are now the identity, so \(\mathbf{A}\) is invertible; and the last two columns give us the inverse, so \[\mathbf{A}^{-1}=\begin{pmatrix} -2 & -1 \\ -\frac{3}{2} & -\frac{1}{2}\end{pmatrix}.\]
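If you want to check this against the earlier sketches (hypothetical helper code from 3.1, not part of the notes), echelonizing \((\mathbf{A}\ |\ \mathbf{1}_{2})\) gives back the same answer:

```python
# reuses rref_with_transform (and Fraction) from the sketches in section 3.1
N, A_inv = rref_with_transform([[1, -2], [-3, 4]])
print(N)       # the identity matrix (as Fractions), so A is invertible
print(A_inv)   # entries -2, -1, -3/2, -1/2, matching the hand computation
```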
Exercises
Check that left equivalence of matrices is an equivalence relation.
Solution
For this exercise we write \(\mathbf{M}\sim \mathbf{N}\) to mean “there exists an invertible \(\mathbf{A}\) with \(\mathbf{A}\mathbf{M}= \mathbf{N}\)”.
Reflexivity: taking \(\mathbf{A}= \mathbf{1}_{m},\) every \(\mathbf{M}\in M_{m, n}(\mathbb{K})\) satisfies \(\mathbf{M}\sim \mathbf{M}.\)
Symmetry: suppose \(\mathbf{M}\sim \mathbf{N},\) so \(\mathbf{A}\mathbf{M}= \mathbf{N}\) for some \(\mathbf{A}.\) By assumption \(\mathbf{A}\) is invertible; and \(\mathbf{A}^{-1} \mathbf{N}= \mathbf{A}^{-1} \mathbf{A}\mathbf{M}= \mathbf{M},\) so \(\mathbf{N}\sim \mathbf{M}.\)
Transitivity: if \(\mathbf{M}\sim \mathbf{N}\) and \(\mathbf{N}\sim \mathbf{R},\) we have \(\mathbf{A}\) and \(\mathbf{B}\) such that \(\mathbf{A}\mathbf{M}= \mathbf{N}\) and \(\mathbf{B}\mathbf{N}= \mathbf{R}.\) So \((\mathbf{B}\mathbf{A}) \mathbf{M}= \mathbf{B}(\mathbf{A}\mathbf{M}) = \mathbf{B}\mathbf{N}= \mathbf{R}\) (note the use of associativity!), and \(\mathbf{B}\mathbf{A}\) is invertible (with inverse \(\mathbf{A}^{-1} \mathbf{B}^{-1}\)), hence \(\mathbf{M}\sim \mathbf{R}.\)
Consider the system of equations from 3.16, and suppose \(b_3 = b_2 - 2b_1,\) so the system is consistent.
Find formulae (in terms of \(b_1\) and \(b_2\)) for vectors \(\vec{c}\) and \(\vec{d}\) such that the general solution of the above system of equations is given by \[\begin{pmatrix}x_1 \\ x_2 \\ x_3 \end{pmatrix} = \{ \vec{c} + \lambda \vec{d}\ |\ \lambda \in \mathbb{R}\}.\]
Solution
The only free variable is \(x_3,\) since there are leading entries in the first and second columns. Setting \(x_3 = \lambda,\) the first row of the RREF gives us \[x_1 + \tfrac{3}{5} \lambda = \tfrac{1}{5} b_1 + \tfrac{2}{5}b_2\] and the second row \[x_2 - \tfrac{1}{5} \lambda = -\tfrac{2}{5}b_1 + \tfrac{1}{5}b_2,\] so the general solution is given by \[\{ \begin{pmatrix} \frac{1}{5} b_{1} + \frac{2}{5} b_{2} - \tfrac{3}{5} \lambda \\ -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} + \tfrac{1}{5}\lambda \\ \lambda\end{pmatrix} : \lambda \in \mathbb{R}\}.\] Thus we can take \(\vec{c} = \begin{pmatrix}\frac{1}{5} b_{1} + \frac{2}{5} b_{2} \\ -\frac{2}{5} b_{1} + \frac{1}{5} b_{2} \\ 0 \end{pmatrix}\) and \(\vec{d} = \begin{pmatrix} -\tfrac{3}{5} \\ \tfrac{1}{5} \\ 1 \end{pmatrix}.\) (This is not the only way of writing the solution, of course.)
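To see symbolically that this really does solve the system for every \(\lambda\) (a SymPy aside, not part of the official solution):

```python
from sympy import Matrix, Rational, symbols

b1, b2, lam = symbols("b1 b2 lam")
A = Matrix([[1, -2, 1], [2, 1, 1], [0, 5, -1]])
b = Matrix([b1, b2, b2 - 2 * b1])            # impose the consistency condition b3 = b2 - 2*b1
c = Matrix([b1 / 5 + 2 * b2 / 5, -2 * b1 / 5 + b2 / 5, 0])
d = Matrix([Rational(-3, 5), Rational(1, 5), 1])
print((A * (c + lam * d) - b).expand())      # the zero vector, whatever lam is
```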
Let \(\mathbf{A}\in M_{n, n}(\mathbb{K})\) be an upper-triangular matrix (so its entries satisfy \(A_{ij} = 0\) if \(i > j\)).
Show that \(\mathbf{A}\) is invertible if and only if all the diagonal entries \(A_{ii}\) are non-zero.
Show that if the condition of part (a) is satisfied, then \(\mathbf{A}^{-1}\) is also upper-triangular.
Solution
First, we observe that if \(\mathbf{A}\) is upper triangular, and the first \(r\) diagonal entries are nonzero, for some \(1 \leqslant r \leqslant n,\) then in order to put the first \(r\) columns into RREF, we only need to multiply the first \(r\) rows by nonzero constants (to make the diagonal entries 1) and add suitable multiples of the \(j\)-th row to the \(i\)-th row for each \(i < j \leqslant r\) (because the entries below the diagonal are already zero).
This process doesn’t change the rows after the \(r\)-th at all; and in the recipe for computing the inverse, where we apply Gauss–Jordan elimination to \((\mathbf{A}\, |\, \mathbf{1}_{n}),\) it won’t introduce any nonzero below-diagonal entries in the right half of the matrix.
So if \(A_{rr} \ne 0\) for all \(r,\) the RREF of \(\mathbf{A}\) will be the identity, and the transformation matrix putting it into RREF (which is \(\mathbf{A}^{-1}\)) will be upper-triangular.
On the other hand, if \(A_{rr} = 0\) for some \(r,\) then when we reach the \(r\)-th column in the RREF algorithm, all the rows from \(r, \dots, n\) will have zeroes in the \(r\)-th column, and hence the \(r\)-th row of the RREF won’t contain a leading entry; thus the RREF of \(\mathbf{A}\) is not the identity and \(\mathbf{A}\) is not invertible.
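A quick numerical illustration of part (b) (using NumPy and an arbitrarily chosen matrix, purely as an aside):

```python
import numpy as np

A = np.array([[2., 1, 4],
              [0, 3, 5],
              [0, 0, 7]])                   # upper-triangular, nonzero diagonal
A_inv = np.linalg.inv(A)
assert np.allclose(np.tril(A_inv, -1), 0)   # the inverse is again upper-triangular
```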
Justify the claim made in the proof of uniqueness of RREF that if \(\mathbf{A}= \begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}_{m-h, h} & \mathbf{U}\end{pmatrix}\) is invertible, then \(\mathbf{U}\) is itself invertible.
Solution
Suppose there exists an inverse \(\mathbf{B}\) of \(\mathbf{A}.\) We can break \(\mathbf{B}\) up into blocks of the same sizes as the blocks of \(\mathbf{A},\) so we write it as \(\mathbf{B}= \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{E}& \mathbf{F}\end{pmatrix}\) (with \(\mathbf{C}\in M_{h, h}(\mathbb{K}), \mathbf{D}\in M_{h, m-h}(\mathbb{K})\) etc), and we have \[\mathbf{B}\mathbf{A}= \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{E}& \mathbf{F}\end{pmatrix}\begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}& \mathbf{U}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix}.\] We can compute the products block-by-block, hence \[\begin{pmatrix} \mathbf{C}& \mathbf{C}\mathbf{S} + \mathbf{D}\mathbf{U} \\ \mathbf{E}& \mathbf{E}\mathbf{S} + \mathbf{F} \mathbf{U}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix}.\] Comparing the bottom left blocks we have \(\mathbf{E}= \mathbf{0},\) and now comparing the bottom right blocks we have \(\mathbf{F} \mathbf{U} = \mathbf{1}_{m-h}.\) Now (using the fact that \(\mathbf{E}= \mathbf{0}\)) we can compute the other product as \[\mathbf{A}\mathbf{B}= \begin{pmatrix}\mathbf{1}_{h} & \mathbf{S} \\ \mathbf{0}& \mathbf{U}\end{pmatrix} \begin{pmatrix} \mathbf{C}& \mathbf{D}\\ \mathbf{0}& \mathbf{F}\end{pmatrix} = \begin{pmatrix}\mathbf{C}& \mathbf{D}+ \mathbf{S} \mathbf{F} \\ \mathbf{0}& \mathbf{U}\mathbf{F}\end{pmatrix} = \begin{pmatrix}\mathbf{1}_{h} & \mathbf{0}\\ \mathbf{0}& \mathbf{1}_{m-h}\end{pmatrix},\] so we deduce \(\mathbf{U} \mathbf{F}= \mathbf{1}_{m-h}.\) Thus \(\mathbf{F}\) is both a left and a right inverse of \(\mathbf{U},\) so \(\mathbf{U}\) is invertible as required.
Show that if \(\mathbf{A}\) is an invertible square matrix, then there is a finite sequence of elementary matrices \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k\) whose product is \(\mathbf{A}.\) (Hint: what is the RREF of \(\mathbf{A}^{-1}\)?)
Solution
We know that \(\mathbf{A}^{-1}\) is itself invertible (with inverse \(\mathbf{A}\)) so its RREF is the identity. Hence we can find a sequence of elementary matrices \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k\) with \[\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k \mathbf{A}^{-1} = \mathbf{1}_{n}\] and right-multiplying both sides by \(\mathbf{A}\) we deduce \(\mathbf{B}_1 \mathbf{B}_2 \dots \mathbf{B}_k = \mathbf{A}.\)
If \(\mathbf{B}_1,\ldots,\mathbf{B}_N,\mathbf{C}_1,\ldots,\mathbf{C}_{\tilde{N}}\in M_{m,m}(\mathbb{K})\) are elementary matrices such that \(\mathbf{B}_N\mathbf{B}_{N-1}\cdots \mathbf{B}_1\mathbf{A} = \mathbf{C}_{\tilde{N}}\mathbf{C}_{\tilde{N}-1}\cdots \mathbf{C}_1\mathbf{A}\) is the reduced row echelon form of \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\tilde N=N\) and \(\mathbf{B}_k = \mathbf{C}_k\) for all \(k=1,\ldots, N.\)
- True
- False
\(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is invertible if and only if the reduced row echelon form of \(\mathbf{A}\) equals \(\mathbf{1}_{n}.\)
- True
- False
In general, a solution to the equation \(\mathbf{A}\vec x = \vec b\) is given by \(\vec x = \mathbf{A}^{-1}\vec b,\) for \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and \(\vec x,\vec b\in\mathbb{K}^n.\)
- True
- False
A system of \(m\) linear equations in \(n\) variables, \(\mathbf{A}\vec{x} = \vec{b},\) always has a solution if \(m < n.\)
- True
- False