3 Vector spaces and linear maps

3.1 Vector spaces

We have seen that to every matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we can associate a mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) which is additive and \(1\)-homogeneous. Another example of a mapping which is additive and \(1\)-homogeneous is the derivative. Consider \(\mathsf{P}(\mathbb{R}),\) the set of polynomial functions in one real variable, which we denote by \(x,\) with real coefficients. That is, an element \(p \in \mathsf{P}(\mathbb{R})\) is a function \[p : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto a_n x^n+a_{n-1}x^{n-1}+\cdots + a_1 x+a_0=\sum_{k=0}^na_kx^k,\] where \(n \in \mathbb{N}\) and the coefficients \(a_k \in \mathbb{R}\) for \(k=0,1,\ldots,n.\) The largest \(m \in \mathbb{N}\cup \{0\}\) such that \(a_m \neq 0\) is called the degree of \(p.\) Notice that we consider polynomials of arbitrary, but finite degree. A power series \(x \mapsto \sum_{k=0}^{\infty} a_k x^k,\) that you encounter in the Analysis module, is not a polynomial, unless only finitely many of its coefficients are different from zero.
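
For instance, the polynomial \[p : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto 5x^3-2x+1\] has coefficients \(a_3=5,\) \(a_2=0,\) \(a_1=-2,\) \(a_0=1\) and degree \(3.\)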

Clearly, we can multiply \(p\) by a real number \(s\in \mathbb{R}\) to obtain a new polynomial \(s\cdot_{\mathsf{P}(\mathbb{R})} p\) \[\tag{3.1} s\cdot_{\mathsf{P}(\mathbb{R})} p : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto s\cdot p(x)\] so that \((s\cdot_{\mathsf{P}(\mathbb{R})}p)(x)=\sum_{k=0}^n sa_k x^k\) for all \(x \in \mathbb{R}.\) Here \(s\cdot p(x)\) is the usual multiplication of the real numbers \(s\) and \(p(x).\) If we consider another polynomial \[q : \mathbb{R}\to \mathbb{R}, \qquad x\mapsto \sum_{k=0}^n b_k x^k\] with \(b_k \in \mathbb{R}\) for \(k=0,1,\ldots,n,\) the sum of the polynomials \(p\) and \(q\) is the polynomial \[\tag{3.2} p+_{\mathsf{P}(\mathbb{R})}q : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto p(x)+q(x)\] so that \((p+_{\mathsf{P}(\mathbb{R})}q)(x)=\sum_{k=0}^n(a_k+b_k)x^k\) for all \(x \in \mathbb{R}.\) Here \(p(x)+q(x)\) is the usual addition of the real numbers \(p(x)\) and \(q(x).\) We will henceforth omit writing \(+_{\mathsf{P}(\mathbb{R})}\) and \(\cdot_{\mathsf{P}(\mathbb{R})}\) and simply write \(+\) and \(\cdot.\)
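
For instance, for \(p(x)=x^2+1\) and \(q(x)=2x-3\) we have \[(3\cdot p)(x)=3x^2+3 \qquad \text{and} \qquad (p+q)(x)=x^2+2x-2\] for all \(x \in \mathbb{R}.\)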

We may think of the derivative with respect to the variable \(x\) as a mapping \[\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R}).\] Now recall that the derivative satisfies \[\tag{3.3} \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}x}(p+q)&=\frac{\mathrm{d}}{\mathrm{d}x}(p)+\frac{\mathrm{d}}{\mathrm{d}x}(q) \qquad && (\text{additivity}),\\ \frac{\mathrm{d}}{\mathrm{d}x}(s\cdot p)&=s\cdot \frac{\mathrm{d}}{\mathrm{d}x}(p)\qquad && (\text{$1$-homogeneity}). \end{aligned}\]
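
As a concrete check of (3.3), take \(p(x)=x^2\) and \(q(x)=3x.\) Then \[\frac{\mathrm{d}}{\mathrm{d}x}(p+q) : x \mapsto 2x+3, \qquad \frac{\mathrm{d}}{\mathrm{d}x}(p) : x \mapsto 2x, \qquad \frac{\mathrm{d}}{\mathrm{d}x}(q) : x \mapsto 3,\] so indeed \(\frac{\mathrm{d}}{\mathrm{d}x}(p+q)=\frac{\mathrm{d}}{\mathrm{d}x}(p)+\frac{\mathrm{d}}{\mathrm{d}x}(q),\) and similarly \(\frac{\mathrm{d}}{\mathrm{d}x}(5\cdot p) : x \mapsto 10x\) equals \(5\cdot \frac{\mathrm{d}}{\mathrm{d}x}(p).\)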

Comparing (2.6) with (3.3) we notice that the polynomials \(p,q\) take the role of the vectors \(\vec{x},\vec{y}\) and the derivative takes the role of the mapping \(f_\mathbf{A}.\) This suggests that the mental image of a vector being an arrow in \(\mathbb{K}^n\) is too narrow and that we ought to come up with a generalisation of the space \(\mathbb{K}^n\) whose elements are abstract vectors.

In order to define the notion of a space of abstract vectors, we may ask what key structure the set of (column) vectors \(\mathbb{K}^n\) carries. On \(\mathbb{K}^n,\) we have two fundamental operations, \[\begin{aligned} + &: \mathbb{K}^n \times \mathbb{K}^n \to \mathbb{K}^n,& (\vec{x},\vec{y}) &\mapsto \vec{x}+ \vec{y},& &\text{(vector addition),}\\ \cdot &: \mathbb{K}\times \mathbb{K}^n \to \mathbb{K}^n,& (s,\vec{x}) &\mapsto s\cdot\vec{x},& &\text{(scalar multiplication).} \end{aligned}\] A vector space is, roughly speaking, a set on which these two operations are defined and obey the expected properties. More precisely:
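
In \(\mathbb{R}^2,\) for example, both operations are computed componentwise: \[\begin{pmatrix} 1 \\ 2 \end{pmatrix}+\begin{pmatrix} 3 \\ -1 \end{pmatrix}=\begin{pmatrix} 4 \\ 1 \end{pmatrix} \qquad \text{and} \qquad 2\cdot\begin{pmatrix} 1 \\ 2 \end{pmatrix}=\begin{pmatrix} 2 \\ 4 \end{pmatrix}.\]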

Definition 3.1 • Vector space

A \(\mathbb{K}\)-vector space, or vector space over \(\mathbb{K},\) is a set \(V\) with a distinguished element \(0_V\) (called the zero vector) and two operations \[\begin{aligned} +_V : V \times V \to V,& &(v_1,v_2) \mapsto v_1+_Vv_2& &(\text{vector addition}) \end{aligned}\] and \[\begin{aligned} \cdot_V : \mathbb{K}\times V \to V,& &(s,v) \mapsto s\cdot_V v& &(\text{scalar multiplication}), \end{aligned}\] so that the following properties hold:

  • Commutativity of vector addition \[v_1+_Vv_2=v_2+_Vv_1\quad (\text{for all}\; v_1,v_2 \in V);\]

  • Associativity of vector addition \[v_1+_V(v_2+_Vv_3)=(v_1+_Vv_2)+_Vv_3 \quad (\text{for all}\; v_1,v_2,v_3 \in V);\]

  • Identity element of vector addition \[\tag{3.4} 0_V+_Vv=v+_V0_V=v\quad (\text{for all}\; v \in V);\]

  • Identity element of scalar multiplication \[1\cdot_V v=v\quad (\text{for all}\; v \in V);\]

  • Scalar multiplication by zero \[\tag{3.5} 0\cdot_{V}v=0_V \quad (\text{for all}\; v \in V);\]

  • Compatibility of scalar multiplication with field multiplication \[(s_1s_2)\cdot_V v=s_1\cdot_V(s_2\cdot_V v) \quad (\text{for all}\; s_1,s_2 \in \mathbb{K}, v \in V);\]

  • Distributivity of scalar multiplication with respect to vector addition \[s\cdot_V(v_1+_Vv_2)=s\cdot_Vv_1+_Vs\cdot_V v_2\quad (\text{for all}\; s\in \mathbb{K}, v_1,v_2 \in V);\]

  • Distributivity of scalar multiplication with respect to field addition \[(s_1+s_2)\cdot_Vv=s_1\cdot_Vv+_Vs_2\cdot_Vv \quad (\text{for all}\; s_1,s_2 \in \mathbb{K}, v \in V).\] The elements of \(V\) are called vectors.

Example 3.2 • Field

A field \(\mathbb{K}\) is a \(\mathbb{K}\)-vector space. We may take \(V=\mathbb{K},\) \(0_V=0_{\mathbb{K}}\) and equip \(V\) with addition \(+_V=+_{\mathbb{K}}\) and scalar multiplication \(\cdot_V=\cdot_{\mathbb{K}}.\) Then the properties of a field imply that \(V=\mathbb{K}\) is a \(\mathbb{K}\)-vector space.

Example 3.3 • Vector space of matrices

Let \(V=M_{m,n}(\mathbb{K})\) denote the set of \(m\times n\)-matrices with entries in \(\mathbb{K}\) and \(0_V=\mathbf{0}_{m,n}\) denote the zero vector. It follows from Proposition 2.15 that \(V\) equipped with addition \(+_V : V \times V \to V\) defined by (2.4) and scalar multiplication \(\cdot_V : \mathbb{K}\times V \to V\) defined by (2.3) is a \(\mathbb{K}\)-vector space. In particular, the set of column vectors \(\mathbb{K}^n=M_{n,1}(\mathbb{K})\) is a \(\mathbb{K}\)-vector space as well.

Example 3.4 • Vector space of polynomials

The set \(\mathsf{P}(\mathbb{R})\) of polynomials in one real variable and with real coefficients is an \(\mathbb{R}\)-vector space, when equipped with scalar multiplication and addition as defined in (3.1) and (3.2) and when the zero vector \(0_{\mathsf{P}(\mathbb{R})}\) is defined to be the zero polynomial \(o : \mathbb{R}\to \mathbb{R},\) that is, the polynomial satisfying \(o(x)=0\) for all \(x \in \mathbb{R}.\)

More generally, functions form a vector space:

Example 3.5 • Vector space of functions

We follow the convention of calling a mapping with values in \(\mathbb{K}\) a function. Let \(I\subset \mathbb{R}\) be an interval and let \(o : I \to \mathbb{K}\) denote the zero function defined by \(o(x)=0\) for all \(x \in I.\) We consider \(V=\mathsf{F}(I,\mathbb{K}),\) the set of functions from \(I\) to \(\mathbb{K}\) with zero vector \(0_V=o\) given by the zero function and define addition \(+_V : V \times V \to V\) as in (3.2) and scalar multiplication \(\cdot_V : \mathbb{K}\times V \to V\) as in (3.1). It is now a consequence of the properties of addition and multiplication of scalars that \(\mathsf{F}(I,\mathbb{K})\) is a \(\mathbb{K}\)-vector space. (The reader is invited to check this assertion!)

Example 3.6 • Vector space of sequences

A mapping \(x : \mathbb{N} \to \mathbb{K}\) from the natural numbers into a field \(\mathbb{K}\) is called a sequence in \(\mathbb{K}\) (or simply a sequence, when \(\mathbb{K}\) is clear from the context). It is common to write \(x_n\) instead of \(x(n)\) for \(n \in \mathbb{N}\) and to denote a sequence by \((x_n)_{n \in \mathbb{N}}=(x_1,x_2,x_3,\ldots).\) We write \(\mathbb{K}^{\infty}\) for the set of sequences in \(\mathbb{K}.\) For instance, taking \(\mathbb{K}=\mathbb{R},\) we may consider the sequence \[\left(\frac{1}{n}\right)_{n \in \mathbb{N}}=\left(1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\ldots\right)\] or the sequence \[\left(\sqrt{n}\right)_{n \in \mathbb{N}}=\left(1,\sqrt{2},\sqrt{3},2,\sqrt{5},\ldots\right).\] If we equip \(\mathbb{K}^{\infty}\) with the zero vector given by the zero sequence \((0,0,0,0,0,\ldots),\) addition given by \((x_n)_{n \in \mathbb{N}}+(y_n)_{n\in \mathbb{N}}=(x_n+y_n)_{n \in \mathbb{N}}\) and scalar multiplication given by \(s\cdot(x_n)_{n \in \mathbb{N}}=(sx_n)_{n \in \mathbb{N}}\) for \(s\in \mathbb{K},\) then \(\mathbb{K}^{\infty}\) is a \(\mathbb{K}\)-vector space.
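
For instance, adding the two sequences above gives \[\left(\frac{1}{n}\right)_{n \in \mathbb{N}}+\left(\sqrt{n}\right)_{n \in \mathbb{N}}=\left(\frac{1}{n}+\sqrt{n}\right)_{n \in \mathbb{N}}=\left(2,\frac{1}{2}+\sqrt{2},\frac{1}{3}+\sqrt{3},\ldots\right).\]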

Example 3.7 • Zero vector space

Consider a set \(V=\{x\}\) consisting of a single element. We define \(0_V=x,\) addition by \(x+_Vx=x\) and scalar multiplication by \(s\cdot_V x=x.\) Then all the properties of Definition 3.1 are satisfied. We write \(V=\{0_V\}\) or simply \(V=\{0\}\) and call \(V\) the zero vector space (over \(\mathbb{K}\)).

The notion of a vector space is an example of an abstract space. Later in your studies you will encounter further examples, like topological spaces, metric spaces and manifolds.

Remark 3.8 • Notation & Definition

Let \(V\) be a \(\mathbb{K}\)-vector space.

  • For \(v \in V\) we write \(-v=(-1)\cdot_V v\) and for \(v_1,v_2 \in V\) we write \(v_1-v_2=v_1+_V(-v_2).\) In particular, using the properties from Definition 3.1 we have (check which properties we use!) \[v-v=v+_V(-v)=v+_V(-1)\cdot_V v=(1-1)\cdot_Vv=0\cdot_Vv=0_V.\] For this reason we call \(-v\) the additive inverse of \(v.\)

  • Again, it is too cumbersome to always write \(+_V;\) for this reason we often write \(v_1+v_2\) instead of \(v_1+_Vv_2.\)

  • Likewise, we will often write \(s\cdot v\) or \(sv\) instead of \(s\cdot_Vv.\)

  • It is also customary to write \(0\) instead of \(0_V.\)

Lemma 3.9 • Elementary properties of vector spaces

Let \(V\) be a \(\mathbb{K}\)-vector space. Then we have:

  1. The zero vector is unique, that is, if \(0_V^{\prime}\) is another vector such that \(0_V^{\prime}+v=v+0_V^{\prime}=v\) for all \(v \in V,\) then \(0_V^{\prime}=0_V.\)

  2. The additive inverse of every \(v \in V\) is unique, that is, if \(w \in V\) satisfies \(v+w=0_V,\) then \(w=-v.\)

  3. For all \(s\in \mathbb{K}\) we have \(s0_V=0_V.\)

  4. For \(s\in \mathbb{K}\) and \(v \in V\) we have \(sv=0_V\) if and only if either \(s=0\) or \(v=0_V.\)

Proof. (The reader is invited to check which property of Definition 3.1 is used in each of the equality signs below.)

  1. We have \(0_V^{\prime}=0_V^{\prime}+0_V=0_V.\)

  2. Since \(v+w=0_V,\) adding \(-v\) to both sides we obtain \[w=0_V+w=((-v)+v)+w=(-v)+(v+w)=(-v)+0_V=-v.\]

  3. We compute \(s0_V=s(0_V+0_V)=s0_V+s0_V;\) subtracting \(s0_V\) from both sides yields \(0_V=s0_V.\)

  4. \(\Leftarrow\) If \(v=0_V,\) then \(sv=0_V\) by 3. If \(s=0,\) then \(sv=0_V\) by (3.5).

     \(\Rightarrow\) Let \(s\in \mathbb{K}\) and \(v \in V\) such that \(sv=0_V.\) It is sufficient to show that if \(s\neq 0,\) then \(v=0_V.\) Since \(s\neq 0\) we can multiply \(sv=0_V\) by \(1/s\) so that \[v=\left(\frac{1}{s}s\right)v=\frac{1}{s}\left(sv\right)=\frac{1}{s}0_V=0_V,\] where the last equality uses 3.

3.2 Linear maps

Throughout this section, \(V,W\) denote \(\mathbb{K}\)-vector spaces.

Previously we saw that the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) associated to a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is additive and \(1\)-homogeneous. These notions also make sense for mappings between vector spaces.

Definition 3.10 • Linear map

A mapping \(f : V \to W\) is called linear if it is additive and \(1\)-homogeneous, that is, if it satisfies \[\tag{3.6} f(s_1v_1+s_2v_2)=s_1 f(v_1)+s_2 f(v_2)\] for all \(s_1,s_2 \in \mathbb{K}\) and for all \(v_1,v_2 \in V.\)

The reader is invited to check that the condition (3.6) is indeed equivalent to \(f\) being additive and \(1\)-homogeneous.
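
As a hint: choosing \(s_1=s_2=1\) in (3.6) yields additivity, and choosing \(s_2=0\) yields \(1\)-homogeneity, since \(s_1v_1+0\cdot v_2=s_1v_1\) by (3.5) and (3.4). Conversely, additivity followed by \(1\)-homogeneity gives \[f(s_1v_1+s_2v_2)=f(s_1v_1)+f(s_2v_2)=s_1f(v_1)+s_2f(v_2).\]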

Example 3.11

As we have seen in Remark 2.23, the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) associated to a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is linear. In Lemma 3.18 below we will see that in fact any linear map \(\mathbb{K}^n \to \mathbb{K}^m\) is of this form.

Animation: Additivity of a linear map means that the image of the sum of two vectors is the sum of the image vectors.
Example 3.12

The derivative \(\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) is linear, see (3.3).

Example 3.13

The matrix transpose is a map \(M_{m,n}(\mathbb{K}) \to M_{n,m}(\mathbb{K})\) and this map is linear. Indeed, for all \(s,t\in \mathbb{K}\) and \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K}),\) we have \[\begin{gathered} (s\mathbf{A}+t\mathbf{B})^T=(sA_{ji}+tB_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}=s(A_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}+\\t(B_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}=s\mathbf{A}^T+t\mathbf{B}^T. \end{gathered}\]

Example 3.14

If \(\mathcal{X}\) is a set, the mapping \(\mathrm{Id}_\mathcal{X} : \mathcal{X} \to \mathcal{X}\) which returns its input is called the identity mapping. Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathrm{Id}_V : V \to V\) the identity mapping so that \(\mathrm{Id}_V(v)=v\) for all \(v \in V.\) The identity mapping is linear since for all \(s_1,s_2 \in \mathbb{K}\) and \(v_1,v_2 \in V\) we have \[\mathrm{Id}_V(s_1v_1+s_2v_2)=s_1v_1+s_2v_2=s_1\mathrm{Id}_V(v_1)+s_2\mathrm{Id}_V(v_2).\]

A necessary condition for linearity of a mapping is that it maps the zero vector onto the zero vector:

Lemma 3.15

Let \(f : V \to W\) be a linear map, then \(f(0_V)=0_W.\)

Proof. Since \(f : V \to W\) is linear, we have \[f(0_V)=f(0\cdot 0_V)=0\cdot f(0_V)=0_W.\]
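
Lemma 3.15 gives a quick way to rule out linearity. For instance, for fixed \(w \in V\) with \(w\neq 0_V,\) the translation \[g : V \to V, \qquad v \mapsto v+w\] satisfies \(g(0_V)=w\neq 0_V\) and is therefore not linear. Note that the condition is necessary but not sufficient: the function \(x \mapsto x^2\) on \(\mathbb{R}\) maps \(0\) to \(0\) but is not linear.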

Proposition 3.16

Let \(V_1,V_2,V_3\) be \(\mathbb{K}\)-vector spaces and \(f : V_1 \to V_2\) and \(g: V_2 \to V_3\) be linear maps. Then the composition \(g \circ f : V_1 \to V_3\) is linear. Furthermore, if \(f : V_1 \to V_2\) is bijective, then the inverse function \(f^{-1} : V_2 \to V_1\) (satisfying \(f^{-1}\circ f=\mathrm{Id}_{V_1}\) and \(f\circ f^{-1}=\mathrm{Id}_{V_2}\)) is linear.

Proof. Let \(s,t\in \mathbb{K}\) and \(v,w \in V_1.\) Then \[\begin{aligned} \left(g\circ f\right)(sv+tw)&=g(f(sv+tw))=g(sf(v)+tf(w))\\&=sg(f(v))+tg(f(w))=s(g\circ f)(v)+t(g\circ f)(w), \end{aligned}\] where we first use the linearity of \(f\) and then the linearity of \(g.\) It follows that \(g\circ f\) is linear.

Now suppose \(f : V_1 \to V_2\) is bijective with inverse function \(f^{-1} : V_2 \to V_1.\) Let \(s,t\in \mathbb{K}\) and \(v,w \in V_2.\) Since \(f\) is bijective there exist unique vectors \(v^{\prime},w^{\prime} \in V_1\) with \(f(v^{\prime})=v\) and \(f(w^{\prime})=w.\) Hence we can write \[\begin{aligned} f^{-1}(sv+tw)&=f^{-1}(sf(v^{\prime})+tf(w^{\prime}))=f^{-1}\left(f(sv^{\prime}+tw^{\prime})\right)\\ &=(f^{-1}\circ f)(sv^{\prime}+tw^{\prime})=sv^{\prime}+tw^{\prime}, \end{aligned}\] where we use the linearity of \(f.\) Since we also have \(v^{\prime}=f^{-1}(v)\) and \(w^{\prime}=f^{-1}(w),\) we obtain \[f^{-1}(sv+tw)=sf^{-1}(v)+tf^{-1}(w),\] thus showing that \(f^{-1} : V_2 \to V_1\) is linear.
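
As an application of Proposition 3.16, the second derivative \[\frac{\mathrm{d}^2}{\mathrm{d}x^2}=\frac{\mathrm{d}}{\mathrm{d}x}\circ \frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\] is linear, being the composition of the linear map \(\frac{\mathrm{d}}{\mathrm{d}x}\) (Example 3.12) with itself; by induction, the same holds for every higher derivative.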

We also have:

Proposition 3.17

Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) the associated linear map. Then \(f_\mathbf{A}\) is bijective if and only if there exists a matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) satisfying \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}\) and \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}.\) In this case, the matrix \(\mathbf{B}\) is unique and will be denoted by \(\mathbf{A}^{-1}.\) We refer to \(\mathbf{A}^{-1}\) as the inverse of \(\mathbf{A}\) and call \(\mathbf{A}\) invertible.

In order to prove Proposition 3.17 we need the following lemma:

Lemma 3.18

A mapping \(g : \mathbb{K}^m \to \mathbb{K}^n\) is linear if and only if there exists a matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) so that \(g=f_\mathbf{B}.\)

Proof. Let \(\mathbf{B}\in M_{n,m}(\mathbb{K}),\) then \(f_\mathbf{B}\) is linear by Remark 2.23. Conversely, let \(g : \mathbb{K}^m \to \mathbb{K}^n\) be linear. Let \(\{\vec{e}_1,\ldots,\vec{e}_m\}\) denote the standard basis of \(\mathbb{K}^m.\) Write \[g(\vec{e}_i)=\begin{pmatrix} B_{1i} \\ \vdots \\ B_{ni}\end{pmatrix}\quad \text{for} \quad i=1,\ldots,m\] and consider the matrix \[\mathbf{B}=\begin{pmatrix} B_{11} & \cdots & B_{1m} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nm}\end{pmatrix} \in M_{n,m}(\mathbb{K}).\] For \(i=1,\ldots,m\) we obtain \[\tag{3.7} f_\mathbf{B}(\vec{e}_i)=\mathbf{B}\vec{e}_i=g(\vec{e}_i).\] Any vector \(\vec{v}=(v_i)_{1\leqslant i\leqslant m} \in\mathbb{K}^m\) can be written as \[\vec{v}=v_1\vec{e}_1+\cdots+v_m\vec{e}_m\] for (unique) scalars \(v_i,\) \(i=1,\ldots,m.\) Hence using the linearity of \(g\) and \(f_\mathbf{B},\) we compute \[\begin{aligned} g(\vec{v})-f_\mathbf{B}(\vec{v})&=g(v_1\vec{e}_1+\cdots+v_m\vec{e}_m)-f_\mathbf{B}(v_1\vec{e}_1+\cdots+v_m\vec{e}_m)\\ &=v_1\left(g(\vec{e}_1)-f_\mathbf{B}(\vec{e}_1)\right)+\cdots+v_m\left(g(\vec{e}_m)-f_\mathbf{B}(\vec{e}_m)\right)=0_{\mathbb{K}^n}, \end{aligned}\] where the last equality uses (3.7). Since the vector \(\vec{v}\) is arbitrary, it follows that \(g=f_\mathbf{B},\) as claimed.
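
To illustrate the construction, consider the linear map \(g : \mathbb{R}^2 \to \mathbb{R}^2\) defined by \[g\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}=\begin{pmatrix} v_1+2v_2 \\ 3v_1 \end{pmatrix}.\] Then \[g(\vec{e}_1)=\begin{pmatrix} 1 \\ 3 \end{pmatrix}, \qquad g(\vec{e}_2)=\begin{pmatrix} 2 \\ 0 \end{pmatrix},\] so the proof produces \[\mathbf{B}=\begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}\] and indeed \(g=f_\mathbf{B}.\)
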
Proof of Proposition 3.17. First, notice that the mapping \(f_{\mathbf{1}_{n}} : \mathbb{K}^n \to \mathbb{K}^n\) associated to the unit matrix is the identity mapping on \(\mathbb{K}^n,\) that is, for all \(n \in \mathbb{N},\) we have \(f_{\mathbf{1}_{n}}=\mathrm{Id}_{\mathbb{K}^n}.\) Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and suppose that \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) is bijective with inverse function \((f_\mathbf{A})^{-1} : \mathbb{K}^m \to \mathbb{K}^n.\) By Proposition 3.16, the mapping \((f_\mathbf{A})^{-1}\) is linear and hence of the form \((f_\mathbf{A})^{-1}=f_\mathbf{B}\) for some matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) by the previous Lemma 3.18. Using Theorem 2.21, we obtain \[(f_\mathbf{A})^{-1}\circ f_\mathbf{A}=\mathrm{Id}_{\mathbb{K}^n}=f_\mathbf{B}\circ f_\mathbf{A}=f_{\mathbf{B}\mathbf{A}}=f_{\mathbf{1}_{n}}\] hence Proposition 2.20 implies that \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}.\) Likewise we have \[f_\mathbf{A}\circ (f_\mathbf{A})^{-1}=\mathrm{Id}_{\mathbb{K}^m}=f_\mathbf{A}\circ f_\mathbf{B}=f_{\mathbf{A}\mathbf{B}}=f_{\mathbf{1}_{m}}\] so that \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}.\)

Conversely, let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and suppose the matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) satisfies \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}\) and \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}.\) Then, as before, we have \[f_{\mathbf{A}\mathbf{B}}=f_{\mathbf{1}_{m}}=\mathrm{Id}_{\mathbb{K}^m}=f_\mathbf{A}\circ f_\mathbf{B}\quad \text{and} \quad f_{\mathbf{B}\mathbf{A}}=f_{\mathbf{1}_{n}}=\mathrm{Id}_{\mathbb{K}^n}=f_\mathbf{B}\circ f_\mathbf{A}\] showing that \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) is bijective with inverse function \(f_\mathbf{B}: \mathbb{K}^m \to \mathbb{K}^n.\)

Finally, to verify the uniqueness of \(\mathbf{B},\) we assume that there exists \({\mathbf{B}^{\prime}} \in M_{n,m}(\mathbb{K})\) with \(\mathbf{A}{\mathbf{B}^{\prime}}=\mathbf{1}_{m}\) and \({\mathbf{B}^{\prime}}\mathbf{A}=\mathbf{1}_{n}.\) Then \[{\mathbf{B}^{\prime}}={\mathbf{B}^{\prime}}\mathbf{1}_{m}={\mathbf{B}^{\prime}}\mathbf{A}\mathbf{B}=({\mathbf{B}^{\prime}}\mathbf{A})\mathbf{B}=\mathbf{1}_{n}\mathbf{B}=\mathbf{B},\] showing that \({\mathbf{B}^{\prime}}=\mathbf{B},\) hence \(\mathbf{B}\) is unique.

Exercises

Exercise 3.19

Let \(f : V \to W\) be a linear map, \(k\geqslant 2\) a natural number, \(s_1,\ldots,s_k \in \mathbb{K}\) and \(v_1,\ldots,v_k \in V.\) Show that \(f\) satisfies \[f(s_1v_1+\cdots +s_kv_k)=s_1f(v_1)+\cdots+s_kf(v_k),\] or, written with the sum symbol, \[\boxed{f\left(\sum_{i=1}^ks_iv_i\right)=\sum_{i=1}^ks_if(v_i).}\] This identity is used frequently in Linear Algebra, so make sure you understand it.

Solution

We will show the claim by induction on \(k.\) Base case: if \(k=2,\) the claim is exactly the linearity condition (3.6), which anchors the induction.

Inductive step: Suppose the statement is true for some \(k\geqslant 2;\) we argue that it then holds for \(k+1.\) Let \(s_1,\ldots,s_{k+1}\in \mathbb{K}\) and \(v_1,\ldots,v_{k+1}\in V.\) Writing \(u = \sum_{i=1}^{k}s_iv_i,\) we have \[\begin{aligned} f\bigg(\sum_{i=1}^{k+1}s_iv_i\bigg) & = f(u+s_{k+1}v_{k+1}) \\ & = f(u) + s_{k+1}f(v_{k+1}) \\ & = \sum_{i=1}^{k}s_if(v_i) + s_{k+1}f(v_{k+1})\\ & = \sum_{i=1}^{k+1}s_if(v_i), \end{aligned}\] where the second equality follows by linearity of \(f\) and the third one by the induction hypothesis.

Exercise 3.20

Let \(a,b,c,d \in \mathbb{K}\) and \[\mathbf{A}=\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_{2,2}(\mathbb{K}).\] Show that \(\mathbf{A}\) has an inverse \(\mathbf{A}^{-1}\) if and only if \(ad-bc\neq 0.\) For \(ad-bc\neq 0,\) compute the inverse \(\mathbf{A}^{-1}.\)

Solution

Suppose first that \(\mathbf{A}\) has an inverse \[\mathbf{A}^{-1}=\begin{pmatrix} u & v \\ w & x \end{pmatrix}.\] Then \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{1}_{2},\) which amounts to the equations \[\begin{aligned} au+bw & = 1, \\ av+bx & = 0,\\ cu+dw & = 0,\\ cv+dx &= 1. \end{aligned}\] Multiplying the first equation by \(c\) and using the third equation leads to \(c=-(ad-bc)w.\) Similarly we find \[\begin{aligned} a& =(ad-bc)x,\\ b & =-(ad-bc)v,\\ c & =-(ad-bc)w,\\ d & =(ad-bc)u. \end{aligned}\] In particular, if \(ad-bc=0,\) these equations force \(a=b=c=d=0,\) so that \(\mathbf{A}=\mathbf{0}_{2,2}\) could not have an inverse; hence invertibility of \(\mathbf{A}\) requires \(ad-bc\neq 0.\) Conversely, a direct computation gives \[\begin{pmatrix}a & b \\ c & d\end{pmatrix}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix} = \begin{pmatrix}ad-bc & 0 \\ 0 & ad-bc\end{pmatrix},\] and likewise with the factors in the opposite order, so for \(ad-bc\neq 0\) the matrix \(\mathbf{A}\) is invertible with inverse \[\mathbf{A}^{-1}= \frac{1}{ad-bc}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix}.\]
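
As a concrete check, for \[\mathbf{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\] we have \(ad-bc=1\cdot 4-2\cdot 3=-2\neq 0,\) so \[\mathbf{A}^{-1}=-\frac{1}{2}\begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix}=\begin{pmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{pmatrix},\] and one verifies directly that \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{1}_{2}.\)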
