6 Endomorphisms
6.1 Sums, direct sums and complements
In this chapter we study linear mappings from a vector space to itself.
A linear map \(g : V \to V\) from a \(\mathbb{K}\)-vector space \(V\) to itself is called an endomorphism. An endomorphism that is also an isomorphism is called an automorphism.
Before we develop the theory of endomorphisms, we introduce some notions for subspaces.
Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The set \[\sum_{i=1}^n U_i=U_1+U_2+\cdots +U_n=\{v \in V \mid v=u_1+u_2+\cdots + u_n \text{ for some } u_i \in U_i,\ i=1,\ldots,n\}\] is called the sum of the subspaces \(U_i\).
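For instance, take \(V=\mathbb{R}^3\) with \(U_1=\operatorname{span}\{\vec{e}_1,\vec{e}_2\}\) and \(U_2=\operatorname{span}\{\vec{e}_2,\vec{e}_3\}.\) Then \[U_1+U_2=\{s_1\vec{e}_1+s_2\vec{e}_2+s_3\vec{e}_3 \mid s_1,s_2,s_3 \in \mathbb{R}\}=\mathbb{R}^3,\] since every standard basis vector already lies in \(U_1\) or \(U_2.\)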
Recall that by Proposition 3.27, the intersection of two subspaces is again a subspace, whereas the union of two subspaces fails to be a subspace in general. However, subspaces do behave well with respect to sums:
The sum of the subspaces \(U_i\subset V,\) \(i=1,\ldots,n,\) is again a vector subspace of \(V.\)
Proof. The sum \(\sum_{i=1}^n U_i\) is non-empty, since it contains the zero vector \(0_V.\) Let \(v\) and \(v^{\prime} \in \sum_{i=1}^n U_i\) so that \[v=v_1+v_2+\cdots +v_n \qquad \text{and} \qquad v^{\prime}=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\] for vectors \(v_i,v^{\prime}_i \in U_i,\) \(i=1,\ldots,n.\) Each \(U_i\) is a vector subspace of \(V.\) Therefore, for all scalars \(s,t\in \mathbb{K},\) the vector \(sv_i+tv^{\prime}_i\) is an element of \(U_i,\) \(i=1,\ldots,n.\) Thus \[sv+tv^{\prime}=sv_1+tv^{\prime}_1+\cdots+sv_n+tv^{\prime}_n\] is an element of \(U_1+\cdots +U_n.\) By Definition 3.21, it follows that \(U_1+\cdots +U_n\) is a vector subspace of \(V.\)
Notice that \(U_1+\cdots+ U_n\) is the smallest vector subspace of \(V\) containing all the subspaces \(U_i,\) \(i=1,\ldots,n:\) any subspace of \(V\) containing each \(U_i\) is closed under addition and therefore contains every sum \(u_1+\cdots+u_n\) with \(u_i \in U_i.\)
If each vector in the sum can be written in a unique way as a sum of vectors from the subspaces, we say the subspaces are in direct sum:
Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The subspaces \(U_1,\ldots,U_n\) are said to be in direct sum if each vector \(w \in W=U_1+\cdots+U_n\) is in a unique way the sum of vectors \(v_i \in U_i\) for \(1\leqslant i\leqslant n.\) That is, if \(w=v_1+v_2+\cdots+v_n=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\) for vectors \(v_i,v^{\prime}_i \in U_i,\) then \(v_i=v^{\prime}_i\) for all \(1\leqslant i \leqslant n.\) We write \[\bigoplus_{i=1}^n U_i\] in case the subspaces \(U_1,\ldots,U_n\) are in direct sum.
Let \(n \in \mathbb{N}\) and \(V=\mathbb{K}^n\) as well as \(U_i=\operatorname{span}\{\vec{e}_i\},\) where \(\{\vec{e}_1,\ldots,\vec{e}_n\}\) denotes the standard basis of \(\mathbb{K}^n.\) Then \(\mathbb{K}^n=\bigoplus_{i=1}^n U_i.\)
Two subspaces \(U_1,U_2\) of \(V\) are in direct sum if and only if \(U_1\cap U_2=\{0_V\}.\) Indeed, suppose \(U_1\cap U_2=\{0_V\}\) and consider \(w=v_1+v_2=v_1^{\prime}+v_2^{\prime}\) with \(v_i,v_i^{\prime} \in U_i\) for \(i=1,2.\) We then have \(v_1-v_1^{\prime}=v_2^{\prime}-v_2 \in U_2,\) since \(U_2\) is a subspace. Since \(U_1\) is a subspace as well, we also have \(v_1-v_1^{\prime} \in U_1.\) Since \(v_1-v_1^{\prime}\) lies both in \(U_1\) and \(U_2,\) we must have \(v_1-v_1^{\prime}=0_V=v_2^{\prime}-v_2.\) Conversely, suppose \(U_1,U_2\) are in direct sum and let \(w \in (U_1\cap U_2).\) We can write \(w=w+0_V=0_V+w,\) since \(w \in U_1\) and \(w \in U_2.\) Since \(U_1,U_2\) are in direct sum, we must have \(w=0_V,\) hence \(U_1\cap U_2=\{0_V\}.\)
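For instance, the planes \(U_1=\operatorname{span}\{\vec{e}_1,\vec{e}_2\}\) and \(U_2=\operatorname{span}\{\vec{e}_2,\vec{e}_3\}\) in \(\mathbb{R}^3\) are not in direct sum, since \(U_1\cap U_2=\operatorname{span}\{\vec{e}_2\}\neq\{0_V\}.\) Concretely, \[\vec{e}_2=\vec{e}_2+0_V=0_V+\vec{e}_2\] exhibits two different decompositions of \(\vec{e}_2\) as a sum of a vector in \(U_1\) and a vector in \(U_2.\)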
Observe that if the subspaces \(U_1,\ldots,U_n\) are in direct sum and \(v_i \in U_i\) with \(v_i \neq 0_V\) for \(1\leqslant i\leqslant n,\) then the vectors \(\{v_1,\ldots,v_n\}\) are linearly independent. Indeed, if \(s_1,\ldots,s_n\) are scalars such that \[s_1v_1+s_2v_2+\cdots+s_n v_n=0_V=0_V+0_V+\cdots+0_V,\] then \(s_iv_i=0_V\) for all \(1\leqslant i\leqslant n.\) By assumption \(v_i\neq 0_V\) and hence \(s_i=0\) for all \(1\leqslant i\leqslant n.\)
Let \(n \in \mathbb{N},\) \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,\ldots,U_n\) be subspaces of \(V.\) Let \(\mathbf{b}_i\) be an ordered basis of \(U_i\) for \(1\leqslant i\leqslant n.\) Then we have:
(i) The tuple of vectors obtained by listing all the vectors of the bases \(\mathbf{b}_1,\ldots,\mathbf{b}_n\) in order is a basis of \(V\) if and only if \(V=\bigoplus_{i=1}^n U_i.\)
(ii) \(\dim(U_1+\cdots+U_n)\leqslant \dim(U_1)+\cdots+\dim (U_n),\) with equality if and only if the subspaces \(U_1,\ldots,U_n\) are in direct sum.
Proof. This is part of an exercise.
Let \(V\) be a \(\mathbb{K}\)-vector space and \(U\subset V\) a subspace. A subspace \(U^{\prime}\) of \(V\) such that \(V=U\oplus U^{\prime}\) is called a complement to \(U\).
Notice that a complement need not be unique. Consider \(V=\mathbb{R}^2\) and \(U=\operatorname{span}\{\vec{e}_1\}.\) For any \(\vec{v} \in V\) such that \(\vec{e}_1,\vec{v}\) are linearly independent, the subspace \(U^{\prime}=\operatorname{span}\{\vec{v}\}\) is a complement to \(U.\) For instance, both \(\operatorname{span}\{\vec{e}_2\}\) and \(\operatorname{span}\{\vec{e}_1+\vec{e}_2\}\) are complements to \(U.\)
Let \(U\) be a subspace of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then there exists a subspace \(U^{\prime}\) so that \(V=U\oplus U^{\prime}.\)
Proof. Suppose \((v_1,\ldots,v_m)\) is an ordered basis of \(U.\) By Theorem 3.64, it can be extended to a basis \((v_1,\ldots,v_m,v_{m+1},\ldots,v_n)\) of \(V.\) Defining \(U^{\prime}=\operatorname{span}\{v_{m+1},\ldots,v_n\},\) Proposition 6.8 implies the claim.
The dimension of a sum of two subspaces equals the sum of the dimensions of the subspaces minus the dimension of the intersection:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,U_2\) subspaces of \(V.\) Then we have \[\dim(U_1+U_2)=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\]
Proof. Let \(m=\dim U_1,\) \(n=\dim U_2\) and \(r=\dim(U_1\cap U_2),\) and let \(\{u_1,\ldots,u_r\}\) be a basis of \(U_1\cap U_2.\) These vectors are linearly independent and elements of \(U_1,\) hence by Theorem 3.64 there exist vectors \(v_1,\ldots,v_{m-r}\) so that \(\mathcal{S}_1=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) is a basis of \(U_1.\) Likewise, there exist vectors \(w_1,\ldots,w_{n-r}\) such that \(\mathcal{S}_2=\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) is a basis of \(U_2.\)
Now consider the set \(\mathcal{S}=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r},w_1,\ldots,w_{n-r}\}\) consisting of \(r+m-r+n-r=n+m-r\) vectors. If this set is a basis of \(U_1+U_2,\) then the claim follows, since then \(\dim(U_1+U_2)=n+m-r=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\)
We first show that \(\mathcal{S}\) generates \(U_1+U_2.\) Let \(y \in U_1+U_2\) so that \(y=x_1+x_2\) for vectors \(x_1 \in U_1\) and \(x_2 \in U_2.\) Since \(\mathcal{S}_1\) is a basis of \(U_1,\) we can write \(x_1\) as a linear combination of elements of \(\mathcal{S}_1.\) Likewise we can write \(x_2\) as a linear combination of elements of \(\mathcal{S}_2.\) It follows that \(\mathcal{S}\) generates \(U_1+U_2.\)
We need to show that \(\mathcal{S}\) is linearly independent. So suppose we have scalars \(s_1,\ldots,s_r,\) \(t_1,\ldots,t_{m-r},\) and \(q_{1},\ldots,q_{n-r}\) so that \[\underbrace{s_1u_1+\cdots+s_r u_r}_{=u} +\underbrace{t_1v_1+\cdots+t_{m-r}v_{m-r}}_{=v}+\underbrace{q_1w_1+\cdots+q_{n-r}w_{n-r}}_{=w}=0_V.\] Equivalently, \(w=-u-v,\) so that \(w \in U_1.\) Since \(w\) is a linear combination of elements of \(\mathcal{S}_2,\) we also have \(w \in U_2.\) Therefore, \(w \in U_1\cap U_2\) and there exist scalars \(\hat{s}_1,\ldots,\hat{s}_r\) such that \[w=\underbrace{\hat{s}_1u_1+\cdots+\hat{s}_r u_r}_{=\hat{u}}.\] This is equivalent to \(w-\hat{u}=0_V,\) or written out, \[q_1w_1+\cdots+q_{n-r}w_{n-r}-\hat{s}_1u_1-\cdots-\hat{s}_r u_r=0_V.\] Since the vectors \(\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) are linearly independent, we conclude that \(q_1=\cdots=q_{n-r}=\hat{s}_1=\cdots=\hat{s}_r=0.\) It follows that \(w=0_V\) and hence \(u+v=0_V.\) Again, since the vectors \(\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) are linearly independent, we conclude that \(s_1=\cdots=s_r=t_1=\cdots=t_{m-r}=0\) and we are done.
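As a quick check of the formula, consider once more the planes \(U_1=\operatorname{span}\{\vec{e}_1,\vec{e}_2\}\) and \(U_2=\operatorname{span}\{\vec{e}_2,\vec{e}_3\}\) in \(\mathbb{R}^3.\) Here \(U_1\cap U_2=\operatorname{span}\{\vec{e}_2\},\) and therefore \[\dim(U_1+U_2)=2+2-1=3,\] consistent with \(U_1+U_2=\mathbb{R}^3.\)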
6.2 Invariants of endomorphisms
Let \(V\) be a finite dimensional vector space equipped with an ordered basis \(\mathbf{b}\) and \(g : V \to V\) an endomorphism. Recall from Theorem 3.119 that if we consider another ordered basis \(\mathbf{b}^{\prime}\) of \(V,\) then \[\mathbf{M}(g,\mathbf{b}^{\prime},\mathbf{b}^{\prime})=\mathbf{C}\,\mathbf{M}(g,\mathbf{b},\mathbf{b})\,\mathbf{C}^{-1},\] where we write \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime})\) for the change of basis matrix. This motivates the following definition:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{A}^{\prime} \in M_{n,n}(\mathbb{K}).\) The matrices \(\mathbf{A}\) and \(\mathbf{A}^{\prime}\) are called similar or conjugate over \(\mathbb{K}\) if there exists an invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K})\) such that \[\mathbf{A}^{\prime} =\mathbf{C}\mathbf{A}\mathbf{C}^{-1}.\]
Similarity of matrices over \(\mathbb{K}\) is an equivalence relation:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B},\mathbf{X}\in M_{n,n}(\mathbb{K}).\) Then we have
(i) \(\mathbf{A}\) is similar to itself;
(ii) If \(\mathbf{A}\) is similar to \(\mathbf{B},\) then \(\mathbf{B}\) is similar to \(\mathbf{A}\);
(iii) If \(\mathbf{A}\) is similar to \(\mathbf{B}\) and \(\mathbf{B}\) is similar to \(\mathbf{X},\) then \(\mathbf{A}\) is also similar to \(\mathbf{X}.\)
Proof. (i) We take \(\mathbf{C}=\mathbf{1}_{n}.\)
(ii) Suppose \(\mathbf{A}\) is similar to \(\mathbf{B}\) so that \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) for some invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) Multiplying by \(\mathbf{C}^{-1}\) from the left and by \(\mathbf{C}\) from the right, we get \[\mathbf{C}^{-1}\mathbf{B}\mathbf{C}=\mathbf{C}^{-1}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{C}=\mathbf{A},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{C}^{-1}.\)
(iii) We have \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) and \(\mathbf{X}=\mathbf{D}\mathbf{B}\mathbf{D}^{-1}\) for invertible matrices \(\mathbf{C},\mathbf{D}.\) Then we get \[\mathbf{X}=\mathbf{D}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{D}^{-1},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{D}\mathbf{C}.\)
Because of (ii) in particular, one can say without ambiguity that two matrices \(\mathbf{A}\) and \(\mathbf{B}\) are similar.
Theorem 3.119 shows that \(\mathbf{A}\) and \(\mathbf{B}\) are similar if and only if there exists an endomorphism \(g\) of \(\mathbb{K}^n\) such that \(\mathbf{A}\) and \(\mathbf{B}\) represent \(g\) with respect to two ordered bases of \(\mathbb{K}^n.\)
One might wonder whether there exist functions \(f : M_{n,n}(\mathbb{K})\to \mathbb{K}\) which are invariant under conjugation, that is, \(f\) satisfies \(f(\mathbf{C}\mathbf{A}\mathbf{C}^{-1})=f(\mathbf{A})\) for all \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and all invertible matrices \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) We have already seen an example of such a function, namely the determinant. Indeed, using the product rule (Proposition 5.21) and Corollary 5.22, we compute \[\tag{6.1} \begin{aligned} \det \left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)&=\det(\mathbf{C}\mathbf{A})\det\left(\mathbf{C}^{-1}\right)=\det(\mathbf{C})\det(\mathbf{A})\det\left(\mathbf{C}^{-1}\right)\\ &=\det(\mathbf{A}). \end{aligned}\] Because of this fact, the following definition makes sense:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\det(g) = \det\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 3.119 and (6.1), the scalar \(\det(g)\) is independent of the chosen ordered basis.
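For instance, consider the endomorphism \(g : \mathbb{R}^2 \to \mathbb{R}^2\) given by \(g(x,y)=(x+y,2y).\) With respect to the standard basis \(\mathbf{b}\) we have \[\mathbf{M}(g,\mathbf{b},\mathbf{b})=\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix},\] so \(\det(g)=2.\) Any other ordered basis yields a similar representing matrix and hence, by (6.1), the same determinant.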
Another example of a scalar that we can associate to an endomorphism is the so-called trace. As with the determinant, we first define the trace for matrices. Luckily, the trace is much simpler to define:
Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\) The sum \(\sum_{i=1}^n [\mathbf{A}]_{ii}\) of its diagonal entries is called the trace of \(\mathbf{A}\) and denoted by \(\operatorname{Tr}(\mathbf{A})\) or \(\operatorname{Tr}\mathbf{A}.\)
For all \(n \in \mathbb{N}\) we have \(\operatorname{Tr}(\mathbf{1}_{n})=n.\) For \[\mathbf{A}=\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\] we have \(\operatorname{Tr}(\mathbf{A})=2+2+3=7.\)
The trace of a product of square matrices is independent of the order of multiplication:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K}).\) Then we have \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]
Proof. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i,j\leqslant n}.\) Then \[[\mathbf{A}\mathbf{B}]_{ij}=\sum_{k=1}^n A_{ik}B_{kj} \qquad \text{and}\qquad [\mathbf{B}\mathbf{A}]_{kj}=\sum_{i=1}^n B_{ki}A_{ij},\] so that \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\sum_{i=1}^n\sum_{k=1}^n A_{ik}B_{ki}=\sum_{k=1}^n\sum_{i=1}^n B_{ki}A_{ik}=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]
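For instance, for \[\mathbf{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \qquad \text{and} \qquad \mathbf{B}=\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\] we compute \[\mathbf{A}\mathbf{B}=\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \qquad \text{and} \qquad \mathbf{B}\mathbf{A}=\begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix},\] so that \(\operatorname{Tr}(\mathbf{A}\mathbf{B})=5=\operatorname{Tr}(\mathbf{B}\mathbf{A}),\) even though \(\mathbf{A}\mathbf{B}\neq \mathbf{B}\mathbf{A}.\)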
Using the previous proposition, we obtain \[\tag{6.2} \operatorname{Tr}\left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)=\operatorname{Tr}\left(\mathbf{A}\mathbf{C}^{-1}\mathbf{C}\right)=\operatorname{Tr}(\mathbf{A}).\] As for the determinant, the following definition thus makes sense:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\operatorname{Tr}(g) = \operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 3.119 and (6.2), the scalar \(\operatorname{Tr}(g)\) is independent of the chosen ordered basis.
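For instance, for the endomorphism \(g(x,y)=(x+y,2y)\) of \(\mathbb{R}^2\) considered above, the representing matrix with respect to the standard basis gives \(\operatorname{Tr}(g)=1+2=3,\) and by (6.2) any other ordered basis yields the same value.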
The trace and determinant of endomorphisms behave nicely with respect to composition of maps:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space. Then, for all endomorphisms \(f,g :V \to V\) we have
(i) \(\operatorname{Tr}(f\circ g)=\operatorname{Tr}(g\circ f)\);
(ii) \(\det(f\circ g)=\det(f)\det(g).\)
Proof. (i) Fix an ordered basis \(\mathbf{b}\) of \(V.\) Then, using Corollary 3.115 and Proposition 6.19, we obtain \[\begin{aligned} \operatorname{Tr}(f\circ g)&=\operatorname{Tr}\left(\mathbf{M}(f\circ g,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(f,\mathbf{b},\mathbf{b})\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\\ &=\operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\mathbf{M}(f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(g\circ f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}(g\circ f). \end{aligned}\] The proof of (ii) is analogous, but we use Proposition 5.21 instead of Proposition 6.19.
We also have:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. Then the following statements are equivalent:
(i) \(g\) is injective;
(ii) \(g\) is surjective;
(iii) \(g\) is bijective;
(iv) \(\det(g) \neq 0.\)
Proof. The equivalence of the first three statements follows from Corollary 3.77. We fix an ordered basis \(\mathbf{b}\) of \(V.\) Suppose \(g\) is bijective with inverse \(g^{-1} : V \to V.\) Then we have \[\det (g\circ g^{-1})=\det(g)\det\left(g^{-1}\right)=\det\left(\mathrm{Id}_V\right)=\det\left(\mathbf{M}(\mathrm{Id}_V,\mathbf{b},\mathbf{b})\right)=\det\left(\mathbf{1}_{\dim V}\right)=1.\] It follows that \(\det(g)\neq 0\) and moreover that \[\det\left(g^{-1}\right)=\frac{1}{\det g}.\] Conversely, suppose that \(\det g\neq 0.\) Then \(\det \mathbf{M}(g,\mathbf{b},\mathbf{b}) \neq 0,\) so that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is invertible by Corollary 5.22, and Proposition 3.116 implies that \(g\) is bijective.
Notice that Proposition 6.22 fails for infinite dimensional vector spaces. Consider \(V=\mathbb{K}^{\infty},\) the \(\mathbb{K}\)-vector space of sequences from Example 3.6. The shift endomorphism \(g : V \to V\) defined by \((x_1,x_2,x_3,\ldots) \mapsto (0,x_1,x_2,x_3,\ldots)\) is injective but not surjective.