3 Vector spaces and linear maps
3.1 Vector spaces
We have seen that to every matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) we can associate a mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) which is additive and \(1\)-homogeneous. Another example of a mapping which is additive and \(1\)-homogeneous is the derivative. Consider \(\mathsf{P}(\mathbb{R}),\) the set of polynomial functions in one real variable, which we denote by \(x,\) with real coefficients. That is, an element \(p \in \mathsf{P}(\mathbb{R})\) is a function \[p : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto a_n x^n+a_{n-1}x^{n-1}+\cdots + a_1 x+a_0=\sum_{k=0}^na_kx^k,\] where \(n \in \mathbb{N}\) and the coefficients \(a_k \in \mathbb{R}\) for \(k=0,1,\ldots,n.\) The largest \(m \in \mathbb{N}\cup \{0\}\) such that \(a_m \neq 0\) is called the degree of \(p.\) Notice that we consider polynomials of arbitrary, but finite degree. A power series \(x \mapsto \sum_{k=0}^{\infty} a_k x^k,\) that you encounter in the Analysis module, is not a polynomial, unless only finitely many of its coefficients are different from zero.
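For instance, \(x \mapsto 3x^2-x+5\) defines a polynomial of degree \(2,\) with coefficients \(a_2=3,\) \(a_1=-1\) and \(a_0=5.\)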
Clearly, we can multiply \(p\) by a real number \(s\in \mathbb{R}\) to obtain a new polynomial \(s\cdot_{\mathsf{P}(\mathbb{R})} p\) \[\tag{3.1} s\cdot_{\mathsf{P}(\mathbb{R})} p : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto s\cdot p(x)\] so that \((s\cdot_{\mathsf{P}(\mathbb{R})}p)(x)=\sum_{k=0}^n sa_k x^k\) for all \(x \in \mathbb{R}.\) Here \(s\cdot p(x)\) is the usual multiplication of the real numbers \(s\) and \(p(x).\) If we consider another polynomial \[q : \mathbb{R}\to \mathbb{R}, \qquad x\mapsto \sum_{k=0}^n b_k x^k\] with \(b_k \in \mathbb{R}\) for \(k=0,1,\ldots,n,\) the sum of the polynomials \(p\) and \(q\) is the polynomial \[\tag{3.2} p+_{\mathsf{P}(\mathbb{R})}q : \mathbb{R}\to \mathbb{R}, \qquad x \mapsto p(x)+q(x)\] so that \((p+_{\mathsf{P}(\mathbb{R})}q)(x)=\sum_{k=0}^n(a_k+b_k)x^k\) for all \(x \in \mathbb{R}.\) Here \(p(x)+q(x)\) is the usual addition of the real numbers \(p(x)\) and \(q(x).\) We will henceforth omit writing \(+_{\mathsf{P}(\mathbb{R})}\) and \(\cdot_{\mathsf{P}(\mathbb{R})}\) and simply write \(+\) and \(\cdot.\)
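To illustrate these operations on a concrete pair of polynomials, take for instance \(p_1(x)=x^2+2x\) and \(p_2(x)=3x-1.\) Then \[(2\cdot p_1)(x)=2x^2+4x \qquad\text{and}\qquad (p_1+p_2)(x)=x^2+5x-1\] for all \(x \in \mathbb{R}.\)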
We may think of the derivative with respect to the variable \(x\) as a mapping \[\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R}).\] Now recall that the derivative satisfies \[\tag{3.3} \begin{aligned} \frac{\mathrm{d}}{\mathrm{d}x}(p+q)&=\frac{\mathrm{d}}{\mathrm{d}x}(p)+\frac{\mathrm{d}}{\mathrm{d}x}(q) \qquad && (\text{additivity}),\\ \frac{\mathrm{d}}{\mathrm{d}x}(s\cdot p)&=s\cdot \frac{\mathrm{d}}{\mathrm{d}x}(p)\qquad && (\text{$1$-homogeneity}). \end{aligned}\]
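As a quick check of (3.3) on a concrete example, we have \[\frac{\mathrm{d}}{\mathrm{d}x}\left(x^2+3x\right)=2x+3=\frac{\mathrm{d}}{\mathrm{d}x}\left(x^2\right)+\frac{\mathrm{d}}{\mathrm{d}x}\left(3x\right)\] and, for \(s=5,\) \[\frac{\mathrm{d}}{\mathrm{d}x}\left(5\cdot x^2\right)=10x=5\cdot \frac{\mathrm{d}}{\mathrm{d}x}\left(x^2\right).\]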
Comparing (2.6) with (3.3) we notice that the polynomials \(p,q\) take the role of the vectors \(\vec{x},\vec{y}\) and the derivative takes the role of the mapping \(f_\mathbf{A}.\) This suggests that the mental image of a vector being an arrow in \(\mathbb{K}^n\) is too narrow and that we ought to come up with a generalisation of the space \(\mathbb{K}^n\) whose elements are abstract vectors.
In order to define the notion of a space of abstract vectors, we may ask what key structure the set of (column) vectors \(\mathbb{K}^n\) carries. On \(\mathbb{K}^n,\) we have two fundamental operations, \[\begin{aligned} + &: \mathbb{K}^n \times \mathbb{K}^n \to \mathbb{K}^n& (\vec{x},\vec{y}) &\mapsto \vec{x}+ \vec{y},& &\text{(vector addition),}\\ \cdot &: \mathbb{K}\times \mathbb{K}^n \to \mathbb{K}^n,& (s,\vec{x}) &\mapsto s\cdot\vec{x},& &\text{(scalar multiplication).} \end{aligned}\] A vector space is roughly speaking a set where these two operations are defined and obey the expected properties. More precisely:
A \(\mathbb{K}\)-vector space, or vector space over \(\mathbb{K}\) is a set \(V\) with a distinguished element \(0_V\) (called the zero vector) and two operations \[\begin{aligned} +_V : V \times V \to V& &(v_1,v_2) \mapsto v_1+_Vv_2& &(\text{vector addition}) \end{aligned}\] and \[\begin{aligned} \cdot_V : \mathbb{K}\times V \to V& &(s,v) \mapsto s\cdot_V v& &(\text{scalar multiplication}), \end{aligned}\] so that the following properties hold:
Commutativity of vector addition \[v_1+_Vv_2=v_2+_Vv_1\quad (\text{for all}\; v_1,v_2 \in V);\]
Associativity of vector addition \[v_1+_V(v_2+_Vv_3)=(v_1+_Vv_2)+_Vv_3 \quad (\text{for all}\; v_1,v_2,v_3 \in V);\]
Identity element of vector addition \[\tag{3.4} 0_V+_Vv=v+_V0_V=v\quad (\text{for all}\; v \in V);\]
Identity element of scalar multiplication \[1\cdot_V v=v\quad (\text{for all}\; v \in V);\]
Scalar multiplication by zero \[\tag{3.5} 0\cdot_{V}v=0_V \quad (\text{for all}\; v \in V);\]
Compatibility of scalar multiplication with field multiplication \[(s_1s_2)\cdot_V v=s_1\cdot_V(s_2\cdot_V v) \quad (\text{for all}\; s_1,s_2 \in \mathbb{K}, v \in V);\]
Distributivity of scalar multiplication with respect to vector addition \[s\cdot_V(v_1+_Vv_2)=s\cdot_Vv_1+_Vs\cdot_V v_2\quad (\text{for all}\; s\in \mathbb{K}, v_1,v_2 \in V);\]
Distributivity of scalar multiplication with respect to field addition \[(s_1+s_2)\cdot_Vv=s_1\cdot_Vv+_Vs_2\cdot_Vv \quad (\text{for all}\; s_1,s_2 \in \mathbb{K}, v \in V).\] The elements of \(V\) are called vectors.
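The prototypical example is \(V=\mathbb{K}^n\) with componentwise vector addition and scalar multiplication and with the zero vector of \(\mathbb{K}^n\) (all entries equal to \(0\)) as \(0_V.\) As a sample verification, distributivity of scalar multiplication with respect to vector addition holds because in each component \(i=1,\ldots,n\) we have \(s(x_i+y_i)=sx_i+sy_i\) by distributivity in the field \(\mathbb{K};\) the remaining properties are checked in the same componentwise way.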
A field \(\mathbb{K}\) is a \(\mathbb{K}\)-vector space. We may take \(V=\mathbb{K},\) \(0_V=0_{\mathbb{K}}\) and equip \(V\) with addition \(+_V=+_{\mathbb{K}}\) and scalar multiplication \(\cdot_V=\cdot_{\mathbb{K}}.\) Then the properties of a field imply that \(V=\mathbb{K}\) is a \(\mathbb{K}\)-vector space.
Similarly, the set of matrices \(M_{m,n}(\mathbb{K}),\) equipped with matrix addition, scalar multiplication and the zero matrix \(\mathbf{0}_{m,n}\) as zero vector, is a \(\mathbb{K}\)-vector space. Recall the following properties of the matrix operations:
\(\mathbf{0}_{m,n}+\mathbf{A}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
\(\mathbf{1}_{m}\mathbf{A}=\mathbf{A}\) and \(\mathbf{A}\mathbf{1}_{n}=\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
\(\mathbf{0}_{{\tilde{m}},m}\mathbf{A}=\mathbf{0}_{{\tilde{m}},n}\) and \(\mathbf{A}\mathbf{0}_{n,{\tilde{m}}}=\mathbf{0}_{m,{\tilde{m}}}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
\(\mathbf{A}+\mathbf{B}=\mathbf{B}+\mathbf{A}\) and \((\mathbf{A}+\mathbf{B})+\mathbf{C}=\mathbf{A}+(\mathbf{B}+\mathbf{C})\) for all \(\mathbf{A},\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K});\)
\(0\cdot \mathbf{A}=\mathbf{0}_{m,n}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
\((s_1s_2)\mathbf{A}=s_1(s_2 \mathbf{A})\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(s_1,s_2 \in \mathbb{K}\);
\(\mathbf{A}(s\mathbf{B})=s(\mathbf{A}\mathbf{B})=(s\mathbf{A})\mathbf{B}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and all \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) and all \(s\in \mathbb{K}\);
\(s(\mathbf{A}+\mathbf{B})=s\mathbf{A}+s\mathbf{B}\) for all \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(s\in \mathbb{K}\);
\((s_1+s_2)\mathbf{A}=s_1\mathbf{A}+s_2\mathbf{A}\) for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and for all \(s_1,s_2 \in \mathbb{K}\);
\((\mathbf{B}+\mathbf{C})\mathbf{A}=\mathbf{B}\mathbf{A}+\mathbf{C}\mathbf{A}\) for all \(\mathbf{B},\mathbf{C}\in M_{{\tilde{m}},m}(\mathbb{K})\) and for all \(\mathbf{A}\in M_{m,n}(\mathbb{K})\);
\(\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}\) for all \(\mathbf{A}\in M_{{\tilde{m}},m}(\mathbb{K})\) and for all \(\mathbf{B},\mathbf{C}\in M_{m,n}(\mathbb{K}).\)
The set \(\mathsf{P}(\mathbb{R})\) of polynomials in one real variable and with real coefficients is an \(\mathbb{R}\)-vector space, when equipped with the scalar multiplication and addition defined in (3.1) and (3.2) and when the zero vector \(0_{\mathsf{P}(\mathbb{R})}\) is defined to be the zero polynomial \(o : \mathbb{R}\to \mathbb{R},\) that is, the polynomial satisfying \(o(x)=0\) for all \(x \in \mathbb{R}.\)
More generally, functions form a vector space:
We follow the convention of calling a mapping with values in \(\mathbb{K}\) a function. Let \(I\subset \mathbb{R}\) be an interval and let \(o : I \to \mathbb{K}\) denote the zero function defined by \(o(x)=0\) for all \(x \in I.\) We consider \(V=\mathsf{F}(I,\mathbb{K}),\) the set of functions from \(I\) to \(\mathbb{K}\) with zero vector \(0_V=o\) given by the zero function and define addition \(+_V : V \times V \to V\) as in (3.2) and scalar multiplication \(\cdot_V : \mathbb{K}\times V \to V\) as in (3.1). It is now a consequence of the properties of addition and multiplication of scalars that \(\mathsf{F}(I,\mathbb{K})\) is a \(\mathbb{K}\)-vector space. (The reader is invited to check this assertion!)
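As a sample of such a check, commutativity of addition in \(\mathsf{F}(I,\mathbb{K})\) follows from commutativity of addition in \(\mathbb{K}:\) for all \(f,g \in \mathsf{F}(I,\mathbb{K})\) and all \(x \in I\) we have \[(f+g)(x)=f(x)+g(x)=g(x)+f(x)=(g+f)(x),\] so that \(f+g=g+f.\) The remaining properties are verified in the same pointwise manner.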
A mapping \(x : \mathbb{N} \to \mathbb{K}\) from the natural numbers into a field \(\mathbb{K}\) is called a sequence in \(\mathbb{K}\) (or simply a sequence, when \(\mathbb{K}\) is clear from the context). It is common to write \(x_n\) instead of \(x(n)\) for \(n \in \mathbb{N}\) and to denote a sequence by \((x_n)_{n \in \mathbb{N}}=(x_1,x_2,x_3,\ldots).\) We write \(\mathbb{K}^{\infty}\) for the set of sequences in \(\mathbb{K}.\) For instance, taking \(\mathbb{K}=\mathbb{R},\) we may consider the sequence \[\left(\frac{1}{n}\right)_{n \in \mathbb{N}}=\left(1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\frac{1}{5},\ldots\right)\] or the sequence \[\left(\sqrt{n}\right)_{n \in \mathbb{N}}=\left(1,\sqrt{2},\sqrt{3},2,\sqrt{5},\ldots\right).\] If we equip \(\mathbb{K}^{\infty}\) with the zero vector given by the zero sequence \((0,0,0,0,0,\ldots),\) addition given by \((x_n)_{n \in \mathbb{N}}+(y_n)_{n\in \mathbb{N}}=(x_n+y_n)_{n \in \mathbb{N}}\) and scalar multiplication given by \(s\cdot(x_n)_{n \in \mathbb{N}}=(sx_n)_{n \in \mathbb{N}}\) for \(s\in \mathbb{K},\) then \(\mathbb{K}^{\infty}\) is a \(\mathbb{K}\)-vector space.
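For example, with the two sequences above we have \[3\cdot\left(\frac{1}{n}\right)_{n \in \mathbb{N}}=\left(\frac{3}{n}\right)_{n \in \mathbb{N}} \qquad\text{and}\qquad \left(\frac{1}{n}\right)_{n \in \mathbb{N}}+\left(\sqrt{n}\right)_{n \in \mathbb{N}}=\left(\frac{1}{n}+\sqrt{n}\right)_{n \in \mathbb{N}}.\]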
The notion of a vector space is an example of an abstract space. Later in your studies you will encounter further examples, like topological spaces, metric spaces and manifolds.
Let \(V\) be a \(\mathbb{K}\)-vector space.
For \(v \in V\) we write \(-v=(-1)\cdot_V v\) and for \(v_1,v_2 \in V\) we write \(v_1-v_2=v_1+_V(-v_2).\) In particular, using the properties from Definition 3.1 we have (check which properties are used!) \[v-v=v+_V(-v)=v+_V(-1)\cdot_V v=(1-1)\cdot_Vv=0\cdot_Vv=0_V.\] For this reason we call \(-v\) the additive inverse of \(v.\)
Again, it is too cumbersome to always write \(+_V;\) for this reason we often write \(v_1+v_2\) instead of \(v_1+_Vv_2.\)
Likewise, we will often write \(s\cdot v\) or \(sv\) instead of \(s\cdot_Vv.\)
It is also customary to write \(0\) instead of \(0_V.\)
Let \(V\) be a \(\mathbb{K}\)-vector space. Then we have:
The zero vector is unique, that is, if \(0_V^{\prime}\) is another vector such that \(0_V^{\prime}+v=v+0_V^{\prime}=v\) for all \(v \in V,\) then \(0_V^{\prime}=0_V.\)
The additive inverse of every \(v \in V\) is unique, that is, if \(w \in V\) satisfies \(v+w=0_V,\) then \(w=-v.\)
For all \(s\in \mathbb{K}\) we have \(s0_V=0_V.\)
For \(s\in \mathbb{K}\) and \(v \in V\) we have \(sv=0_V\) if and only if either \(s=0\) or \(v=0_V.\)
We have \(0_V^{\prime}=0_V^{\prime}+0_V=0_V,\) where the first equality uses (3.4) (with \(v=0_V^{\prime}\)) and the second uses the assumed property of \(0_V^{\prime}\) (with \(v=0_V\)).
Since \(v+w=0_V,\) we obtain \[w=0_V+w=\left((-v)+v\right)+w=(-v)+(v+w)=(-v)+0_V=-v,\] using that \((-v)+v=v+(-v)=0_V.\)
We compute \(s0_V=s(0_V+0_V)=s0_V+s0_V.\) Adding \(-(s0_V)\) to both sides yields \(0_V=s0_V.\)
\(\Leftarrow\) If \(v=0_V,\) then \(sv=0_V\) by (iii). If \(s=0,\) then \(sv=0_V\) by (3.5).
\(\Rightarrow\) Let \(s\in \mathbb{K}\) and \(v \in V\) such that \(sv=0_V.\) It suffices to show that if \(s\neq 0,\) then \(v=0_V.\) Since \(s\neq 0\) we can multiply \(sv=0_V\) by \(1/s\) to obtain \[v=1\cdot v=\left(\frac{1}{s}s\right)v=\frac{1}{s}\left(sv\right)=\frac{1}{s}0_V=0_V.\]
3.2 Linear maps
Throughout this section, \(V,W\) denote \(\mathbb{K}\)-vector spaces.
Previously we saw that the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) associated to a matrix \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) is additive and \(1\)-homogeneous. These notions also make sense for mappings between vector spaces.
A mapping \(f : V \to W\) is called linear if it is additive and \(1\)-homogeneous, that is, if it satisfies \[\tag{3.6} f(s_1v_1+s_2v_2)=s_1 f(v_1)+s_2 f(v_2)\] for all \(s_1,s_2 \in \mathbb{K}\) and for all \(v_1,v_2 \in V.\)
The reader is invited to check that the condition (3.6) is indeed equivalent to \(f\) being additive and \(1\)-homogeneous.
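Here is a sketch of one way to check this equivalence. If \(f\) satisfies (3.6), then choosing \(s_1=s_2=1\) gives additivity, while choosing \(s_2=0\) and \(v_2=0_V\) gives, using (3.4) and (3.5) in \(V\) and (3.5) in \(W,\) \[f(s_1v_1)=s_1f(v_1)+0\cdot f(0_V)=s_1f(v_1),\] that is, \(1\)-homogeneity. Conversely, if \(f\) is additive and \(1\)-homogeneous, then \[f(s_1v_1+s_2v_2)=f(s_1v_1)+f(s_2v_2)=s_1f(v_1)+s_2f(v_2).\]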
For all \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) the mapping \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) satisfies the following two very important properties \[\tag{2.6} \begin{aligned} f_\mathbf{A}(\vec{x}+\vec{y})&=f_\mathbf{A}(\vec{x})+f_\mathbf{A}(\vec{y}),\qquad &&(\text{additivity}),\\ f_\mathbf{A}(s\cdot \vec{x})&=s\cdot f_\mathbf{A}(\vec{x}),\qquad &&(\text{$1$-homogeneity}), \end{aligned}\] for all \(\vec{x},\vec{y} \in \mathbb{K}^{n}\) and \(s\in \mathbb{K}.\) Indeed, using Proposition 2.15 we have \[f_\mathbf{A}(\vec{x}+\vec{y})=\mathbf{A}(\vec{x}+\vec{y})=\mathbf{A}\vec{x}+\mathbf{A}\vec{y}=f_\mathbf{A}(\vec{x})+f_\mathbf{A}(\vec{y})\] and \[f_\mathbf{A}(s\cdot \vec{x})=\mathbf{A}(s\vec{x})=s\cdot (\mathbf{A}\vec{x})=s\cdot f_\mathbf{A}(\vec{x}).\] Thus (2.6) says precisely that \(f_\mathbf{A}\) is linear in the sense defined above, for every \(\mathbf{A}\in M_{m,n}(\mathbb{K}).\)
A mapping \(g : \mathbb{K}^m \to \mathbb{K}^n\) is linear if and only if there exists a matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) so that \(g=f_\mathbf{B}.\)
The derivative \(\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) is linear, see (3.3).
The matrix transpose is a map \(M_{m,n}(\mathbb{K}) \to M_{n,m}(\mathbb{K})\) and this map is linear. Indeed, for all \(s,t\in \mathbb{K}\) and \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K}),\) we have \[\begin{gathered} (s\mathbf{A}+t\mathbf{B})^T=(sA_{ji}+tB_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}=s(A_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}+\\t(B_{ji})_{1\leqslant j\leqslant n, 1\leqslant i \leqslant m}=s\mathbf{A}^T+t\mathbf{B}^T. \end{gathered}\]
If \(\mathcal{X}\) is a set, the mapping \(\mathrm{Id}_\mathcal{X} : \mathcal{X} \to \mathcal{X}\) which returns its input is called the identity mapping. Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathrm{Id}_V : V \to V\) the identity mapping so that \(\mathrm{Id}_V(v)=v\) for all \(v \in V.\) The identity mapping is linear since for all \(s_1,s_2 \in \mathbb{K}\) and \(v_1,v_2 \in V\) we have \[\mathrm{Id}_V(s_1v_1+s_2v_2)=s_1v_1+s_2v_2=s_1\mathrm{Id}_V(v_1)+s_2\mathrm{Id}_V(v_2).\]
A necessary condition for linearity of a mapping is that it maps the zero vector onto the zero vector:
Let \(f : V \to W\) be a linear map, then \(f(0_V)=0_W.\)
Proof. Since \(f : V \to W\) is linear, we have \[f(0_V)=f(0\cdot 0_V)=0\cdot f(0_V)=0_W.\]
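This gives a convenient way to see that a mapping is not linear. For instance, the mapping \(f : \mathbb{R}\to \mathbb{R},\) \(x \mapsto x+1\) is not linear, since \(f(0)=1\neq 0.\)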
Let \(V_1,V_2,V_3\) be \(\mathbb{K}\)-vector spaces and \(f : V_1 \to V_2\) and \(g: V_2 \to V_3\) be linear maps. Then the composition \(g \circ f : V_1 \to V_3\) is linear. Furthermore, if \(f : V_1 \to V_2\) is bijective, then the inverse function \(f^{-1} : V_2 \to V_1\) (satisfying \(f^{-1}\circ f=\mathrm{Id}_{V_1}\) and \(f\circ f^{-1}=\mathrm{Id}_{V_2}\)) is linear.
Proof. Let \(s,t\in \mathbb{K}\) and \(v,w \in V_1.\) Then \[\begin{aligned} \left(g\circ f\right)(sv+tw)&=g(f(sv+tw))=g(sf(v)+tf(w))\\&=sg(f(v))+tg(f(w))=s(g\circ f)(v)+t(g\circ f)(w), \end{aligned}\] where we first use the linearity of \(f\) and then the linearity of \(g.\) It follows that \(g\circ f\) is linear.
Now suppose \(f : V_1 \to V_2\) is bijective with inverse function \(f^{-1} : V_2 \to V_1.\) Let \(s,t\in \mathbb{K}\) and \(v,w \in V_2.\) Since \(f\) is bijective there exist unique vectors \(v^{\prime},w^{\prime} \in V_1\) with \(f(v^{\prime})=v\) and \(f(w^{\prime})=w.\) Hence we can write \[\begin{aligned} f^{-1}(sv+tw)&=f^{-1}(sf(v^{\prime})+tf(w^{\prime}))=f^{-1}\left(f(sv^{\prime}+tw^{\prime})\right)\\ &=(f^{-1}\circ f)(sv^{\prime}+tw^{\prime})=sv^{\prime}+tw^{\prime}, \end{aligned}\] where we use the linearity of \(f.\) Since we also have \(v^{\prime}=f^{-1}(v)\) and \(w^{\prime}=f^{-1}(w),\) we obtain \[f^{-1}(sv+tw)=sf^{-1}(v)+tf^{-1}(w),\] thus showing that \(f^{-1} : V_2 \to V_1\) is linear.
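For instance, the second derivative \(\frac{\mathrm{d}^2}{\mathrm{d}x^2}=\frac{\mathrm{d}}{\mathrm{d}x}\circ\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) is linear, being the composition of two linear maps.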
We also have:
Let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) the associated linear map. Then \(f_\mathbf{A}\) is bijective if and only if there exists a matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) satisfying \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}\) and \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}.\) In this case, the matrix \(\mathbf{B}\) is unique and will be denoted by \(\mathbf{A}^{-1}.\) We refer to \(\mathbf{A}^{-1}\) as the inverse of \(\mathbf{A}\) and call \(\mathbf{A}\) invertible.
Proof. We will use two facts about the correspondence \(\mathbf{A}\mapsto f_\mathbf{A}\) established earlier: for \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\in M_{n,{\tilde{m}}}(\mathbb{K})\) we have \(f_{\mathbf{A}\mathbf{B}}=f_\mathbf{A}\circ f_\mathbf{B},\) and for \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) we have \(f_\mathbf{A}=f_\mathbf{B}\) if and only if \(\mathbf{A}=\mathbf{B}.\)
Suppose first that \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) is bijective. By the previous proposition its inverse function \(f_\mathbf{A}^{-1} : \mathbb{K}^m \to \mathbb{K}^n\) is linear, and hence, being a linear map \(\mathbb{K}^m \to \mathbb{K}^n,\) it is of the form \(f_\mathbf{A}^{-1}=f_\mathbf{B}\) for some matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K}).\) Consequently \[f_{\mathbf{A}\mathbf{B}}=f_\mathbf{A}\circ f_\mathbf{B}=\mathrm{Id}_{\mathbb{K}^m}=f_{\mathbf{1}_{m}}\quad \text{and} \quad f_{\mathbf{B}\mathbf{A}}=f_\mathbf{B}\circ f_\mathbf{A}=\mathrm{Id}_{\mathbb{K}^n}=f_{\mathbf{1}_{n}},\] so that \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}\) and \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}.\)
Conversely, let \(\mathbf{A}\in M_{m,n}(\mathbb{K})\) and suppose the matrix \(\mathbf{B}\in M_{n,m}(\mathbb{K})\) satisfies \(\mathbf{A}\mathbf{B}=\mathbf{1}_{m}\) and \(\mathbf{B}\mathbf{A}=\mathbf{1}_{n}.\) Then, as before, we have \[f_{\mathbf{A}\mathbf{B}}=f_{\mathbf{1}_{m}}=\mathrm{Id}_{\mathbb{K}^m}=f_\mathbf{A}\circ f_\mathbf{B}\quad \text{and} \quad f_{\mathbf{B}\mathbf{A}}=f_{\mathbf{1}_{n}}=\mathrm{Id}_{\mathbb{K}^n}=f_\mathbf{B}\circ f_\mathbf{A},\] showing that \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^m\) is bijective with inverse function \(f_\mathbf{B}: \mathbb{K}^m \to \mathbb{K}^n.\)
Finally, to verify the uniqueness of \(\mathbf{B},\) we assume that there exists \({\mathbf{B}^{\prime}} \in M_{n,m}(\mathbb{K})\) with \(\mathbf{A}{\mathbf{B}^{\prime}}=\mathbf{1}_{m}\) and \({\mathbf{B}^{\prime}}\mathbf{A}=\mathbf{1}_{n}.\) Then \[{\mathbf{B}^{\prime}}={\mathbf{B}^{\prime}}\mathbf{1}_{m}={\mathbf{B}^{\prime}}\mathbf{A}\mathbf{B}=({\mathbf{B}^{\prime}}\mathbf{A})\mathbf{B}=\mathbf{1}_{n}\mathbf{B}=\mathbf{B},\] showing that \({\mathbf{B}^{\prime}}=\mathbf{B},\) hence \(\mathbf{B}\) is unique.
Exercises
Let \(f : V \to W\) be a linear map, \(k\geqslant 2\) a natural number and \(s_1,\ldots,s_k \in \mathbb{K}\) and \(v_1,\ldots,v_k \in V.\) Then \(f : V \to W\) satisfies \[f(s_1v_1+\cdots +s_kv_k)=s_1f(v_1)+\cdots+s_kf(v_k)\] or written with the sum symbol \[\boxed{f\left(\sum_{i=1}^ks_iv_i\right)=\sum_{i=1}^ks_if(v_i).}\] This identity is used frequently in Linear Algebra, so make sure you understand it.
Solution
We show the claim by induction on \(k.\) For the base case \(k=2,\) the claim is precisely the linearity condition (3.6).
Inductive step: suppose the statement is true for some \(k\geqslant 2;\) we show that it then holds for \(k+1.\) Let \(s_1,\ldots,s_{k+1}\in \mathbb{K}\) and \(v_1,\ldots,v_{k+1}\in V.\) Writing \(u = \sum_{i=1}^{k}s_iv_i,\) we have \[\begin{aligned} f\bigg(\sum_{i=1}^{k+1}s_iv_i\bigg) & = f(u+s_{k+1}v_{k+1}) \\ & = f(u) + s_{k+1}f(v_{k+1}) \\ & = \sum_{i=1}^{k}s_if(v_i) + s_{k+1}f(v_{k+1})\\ & = \sum_{i=1}^{k+1}s_if(v_i), \end{aligned}\] where the second equality follows by linearity of \(f\) and the third one by the induction hypothesis.
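As a concrete instance of the boxed identity, take \(f=\frac{\mathrm{d}}{\mathrm{d}x}\) on \(\mathsf{P}(\mathbb{R})\) with \(k=3:\) \[\frac{\mathrm{d}}{\mathrm{d}x}\left(2x^3+5x-7\cdot 1\right)=2\cdot\frac{\mathrm{d}}{\mathrm{d}x}\left(x^3\right)+5\cdot\frac{\mathrm{d}}{\mathrm{d}x}\left(x\right)-7\cdot\frac{\mathrm{d}}{\mathrm{d}x}\left(1\right)=6x^2+5.\]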
Let \(a,b,c,d \in \mathbb{K}\) and \[\mathbf{A}=\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_{2,2}(\mathbb{K}).\] Show that \(\mathbf{A}\) has an inverse \(\mathbf{A}^{-1}\) if and only if \(ad-bc\neq 0.\) For \(ad-bc\neq 0,\) compute the inverse \(\mathbf{A}^{-1}.\)
Solution
Suppose first that the matrix \(\mathbf{A}\) has an inverse \[\mathbf{A}^{-1}=\begin{pmatrix} u & v \\ w & x \end{pmatrix}.\] Then \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{1}_{2},\) which amounts to the equations \[\begin{aligned} au+bw & = 1, \\ av+bx & = 0,\\ cu+dw & = 0,\\ cv+dx &= 1. \end{aligned}\] Multiplying the first equation by \(c\) and using the third equation leads to \(c=-(ad-bc)w.\) Similarly we find \[\begin{aligned} a& =(ad-bc)x,\\ b & =-(ad-bc)v,\\ c & =-(ad-bc)w,\\ d & =(ad-bc)u. \end{aligned}\] If \(ad-bc=0,\) these equations force \(a=b=c=d=0,\) so that \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{0}_{2,2}\neq \mathbf{1}_{2},\) a contradiction. Hence an invertible \(\mathbf{A}\) must satisfy \(ad-bc\neq 0.\) Conversely, if \(ad-bc\neq 0,\) a direct computation gives \[\begin{pmatrix}a & b \\ c & d\end{pmatrix}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix} = \begin{pmatrix}d & -b \\ -c & a\end{pmatrix}\begin{pmatrix}a & b \\ c & d\end{pmatrix}= \begin{pmatrix}ad-bc & 0 \\ 0 & ad-bc\end{pmatrix},\] which shows that \(\mathbf{A}\) is invertible with inverse \[\mathbf{A}^{-1}= \frac{1}{ad-bc}\begin{pmatrix}d & -b \\ -c & a\end{pmatrix}.\]
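For a concrete numerical illustration of the formula, take \[\mathbf{A}=\begin{pmatrix}1 & 2 \\ 3 & 4\end{pmatrix}, \qquad ad-bc=1\cdot 4-2\cdot 3=-2\neq 0,\] so that \[\mathbf{A}^{-1}=\frac{1}{-2}\begin{pmatrix}4 & -2 \\ -3 & 1\end{pmatrix}=\begin{pmatrix}-2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2}\end{pmatrix},\] and indeed one checks that \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{1}_{2}.\)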