11 Endomorphisms, I
In this chapter we study linear mappings from a vector space to itself.
A linear map \(g : V \to V\) from a \(\mathbb{K}\)-vector space \(V\) to itself is called an endomorphism. An endomorphism that is also an isomorphism is called an automorphism.
Working with endomorphisms has a different flavour than working with linear maps between two different spaces. In particular, if we want to write \(g\) as a matrix, it clearly makes sense to work with just one coordinate system for \(V\) – that is, we want to consider the matrices \(M(g, \mathbf{b}, \mathbf{b})\) for \(\mathbf{b}\) an ordered basis of \(V.\)
11.1 Matrices of endomorphisms
Similarity
Let \(V\) be a finite dimensional vector space equipped with an ordered basis \(\mathbf{b}\) and \(g : V \to V\) an endomorphism. As a special case of Theorem 8.26, we see that if we consider another ordered basis \(\mathbf{b}^{\prime}\) of \(V,\) then \[\mathbf{M}(g,\mathbf{b}^{\prime},\mathbf{b}^{\prime})=\mathbf{C}\,\mathbf{M}(g,\mathbf{b},\mathbf{b})\,\mathbf{C}^{-1},\] where we write \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime})\) for the change of basis matrix. This motivates the following definition:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{A}^{\prime} \in M_{n,n}(\mathbb{K}).\) The matrices \(\mathbf{A}\) and \(\mathbf{A}^{\prime}\) are called similar or conjugate over \(\mathbb{K}\) if there exists an invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K})\) such that \[\mathbf{A}^{\prime} =\mathbf{C}\mathbf{A}\mathbf{C}^{-1}.\]
Similarity of matrices over \(\mathbb{K}\) is an equivalence relation:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B},\mathbf{X}\in M_{n,n}(\mathbb{K}).\) Then we have
\(\mathbf{A}\) is similar to itself;
If \(\mathbf{A}\) is similar to \(\mathbf{B},\) then \(\mathbf{B}\) is similar to \(\mathbf{A}\);
If \(\mathbf{A}\) is similar to \(\mathbf{B}\) and \(\mathbf{B}\) is similar to \(\mathbf{X},\) then \(\mathbf{A}\) is also similar to \(\mathbf{X}.\)
Proof. (i) We take \(\mathbf{C}=\mathbf{1}_{n}.\)
(ii) Suppose \(\mathbf{A}\) is similar to \(\mathbf{B}\) so that \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) for some invertible matrix \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) Multiplying with \(\mathbf{C}^{-1}\) from the left and \(\mathbf{C}\) from the right, we get \[\mathbf{C}^{-1}\mathbf{B}\mathbf{C}=\mathbf{C}^{-1}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{C}=\mathbf{A},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{C}^{-1}.\)
(iii) We have \(\mathbf{B}=\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\) and \(\mathbf{X}=\mathbf{D}\mathbf{B}\mathbf{D}^{-1}\) for invertible matrices \(\mathbf{C},\mathbf{D}.\) Then we get \[\mathbf{X}=\mathbf{D}\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\mathbf{D}^{-1},\] so that the similarity follows for the choice \(\hat{\mathbf{C}}=\mathbf{D}\mathbf{C}.\)
Because of (ii) in particular, one can say that two matrices \(\mathbf{A}\) and \(\mathbf{B}\) are similar without ambiguity.
Theorem 8.26 shows that \(\mathbf{A}\) and \(\mathbf{B}\) are similar if and only if there exists an endomorphism \(g\) of \(\mathbb{K}^n\) such that \(\mathbf{A}\) and \(\mathbf{B}\) represent \(g\) with respect to two ordered bases of \(\mathbb{K}^n.\)
The main goal of this chapter (and the next) is: given an endomorphism \(g,\) how can we choose a basis \(\mathbf{b}\) which makes the matrix of \(g\) as nice as possible? Equivalently, given a square matrix \(\mathbf{A},\) is there a “nicest” matrix among all the matrices similar to \(\mathbf{A}\)?
This should remind you a lot of row echelon form. The RREF of \(\mathbf{A}\) is a unique “best” representative among all the matrices which are left-equivalent to \(\mathbf{A}.\) We’re now looking for a unique “best” representative among all matrices similar to \(\mathbf{A}.\)
(This sort of classification problem – define some equivalence relation, and then look for a nicest representative of each equivalence class – comes up a great deal in many areas of mathematics.)
Invariants of similarity classes
As a first step, we want to study functions \(f : M_{n,n}(\mathbb{K})\to \mathbb{K}\) which are invariant under conjugation, that is, \(f\) satisfies \(f(\mathbf{C}\mathbf{A}\mathbf{C}^{-1})=f(\mathbf{A})\) for all \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) and all invertible matrices \(\mathbf{C}\in M_{n,n}(\mathbb{K}).\) We have already seen an example of such a function, namely the determinant. Indeed using the product rule Proposition 10.1 and Corollary 10.2, we compute \[\tag{11.1} \begin{aligned} \det \left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)&=\det(\mathbf{C}\mathbf{A})\det\left(\mathbf{C}^{-1}\right)=\det(\mathbf{C})\det(\mathbf{A})\det\left(\mathbf{C}^{-1}\right)\\ &=\det(\mathbf{A}). \end{aligned}\] Because of this fact, the following definition makes sense:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\det(g) = \det\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 8.26 and (11.1), the scalar \(\det(g)\) is independent of the chosen ordered basis.
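As an informal numerical check of (11.1) (not part of the formal development), one could conjugate an arbitrary matrix by an invertible matrix and compare determinants. The following NumPy sketch uses arbitrary matrices; the unit upper-triangular \(\mathbf{C}\) is chosen only so that it is certainly invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)             # arbitrary square matrix
C = np.eye(4) + np.triu(rng.integers(1, 3, size=(4, 4)), k=1)  # unit upper triangular, hence invertible

conjugated = C @ A @ np.linalg.inv(C)
print(np.isclose(np.linalg.det(conjugated), np.linalg.det(A)))  # True: det is a similarity invariant
```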
Another example of a scalar that we can associate to an endomorphism is the so-called trace. Like for the determinant, we first define the trace for matrices. Luckily, the trace is a lot simpler to define:
Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\) The sum \(\sum_{i=1}^n [\mathbf{A}]_{ii}\) of its diagonal entries is called the trace of \(\mathbf{A}\) and denoted by \(\operatorname{Tr}(\mathbf{A})\) or \(\operatorname{Tr}\mathbf{A}.\)
For all \(n \in \mathbb{N}\) we have \(\operatorname{Tr}(\mathbf{1}_{n})=n.\) For \[\mathbf{A}=\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}\] we have \(\operatorname{Tr}(\mathbf{A})=2+2+3=7.\)
The trace of a product of square matrices is independent of the order of multiplication:
Let \(n \in \mathbb{N}\) and \(\mathbf{A},\mathbf{B}\in M_{n,n}(\mathbb{K}).\) Then we have \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]
Proof. Let \(\mathbf{A}=(A_{ij})_{1\leqslant i,j\leqslant n}\) and \(\mathbf{B}=(B_{ij})_{1\leqslant i,j\leqslant n}.\) Then \[[\mathbf{A}\mathbf{B}]_{ij}=\sum_{k=1}^n A_{ik}B_{kj} \qquad \text{and}\qquad [\mathbf{B}\mathbf{A}]_{kj}=\sum_{i=1}^n B_{ki}A_{ij},\] so that \[\operatorname{Tr}(\mathbf{A}\mathbf{B})=\sum_{i=1}^n\sum_{k=1}^n A_{ik}B_{ki}=\sum_{k=1}^n\sum_{i=1}^n B_{ki}A_{ik}=\operatorname{Tr}(\mathbf{B}\mathbf{A}).\]
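Again, an informal NumPy check of the proposition on a pair of random matrices; note that \(\mathbf{A}\mathbf{B}\) and \(\mathbf{B}\mathbf{A}\) themselves differ in general, only their traces agree.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
print(np.allclose(A @ B, B @ A))                     # False (in general): the products themselves differ
```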
Using the previous proposition, we obtain \[\tag{11.2} \operatorname{Tr}\left(\mathbf{C}\mathbf{A}\mathbf{C}^{-1}\right)=\operatorname{Tr}\left(\mathbf{A}\mathbf{C}^{-1}\mathbf{C}\right)=\operatorname{Tr}(\mathbf{A}).\] As for the determinant, the following definition thus makes sense:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. We define \[\operatorname{Tr}(g) = \operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\] where \(\mathbf{b}\) is any ordered basis of \(V.\) By Theorem 8.26 and (11.2), the scalar \(\operatorname{Tr}(g)\) is independent of the chosen ordered basis.
The trace and determinant of endomorphisms behave nicely with respect to composition of maps:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space. Then, for all endomorphisms \(f,g :V \to V\) we have
\(\operatorname{Tr}(f\circ g)=\operatorname{Tr}(g\circ f)\);
\(\det(f\circ g)=\det(f)\det(g).\)
Proof. (i) Fix an ordered basis \(\mathbf{b}\) of \(V.\) Then, using Corollary 8.19 and Proposition 11.9, we obtain \[\begin{aligned} \operatorname{Tr}(f\circ g)&=\operatorname{Tr}\left(\mathbf{M}(f\circ g,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(f,\mathbf{b},\mathbf{b})\mathbf{M}(g,\mathbf{b},\mathbf{b})\right)\\ &=\operatorname{Tr}\left(\mathbf{M}(g,\mathbf{b},\mathbf{b})\mathbf{M}(f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}\left(\mathbf{M}(g\circ f,\mathbf{b},\mathbf{b})\right)=\operatorname{Tr}(g\circ f). \end{aligned}\] The proof of (ii) is analogous, but we use Proposition 10.1 instead of Proposition 11.9.
We also have:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. Then the following statements are equivalent:
\(g\) is injective;
\(g\) is surjective;
\(g\) is bijective;
\(\det(g) \neq 0.\)
Proof. The equivalence of the first three statements follows from Corollary 6.21. We fix an ordered basis \(\mathbf{b}\) of \(V.\) Suppose \(g\) is bijective with inverse \(g^{-1} : V \to V.\) Then we have \[\det (g\circ g^{-1})=\det(g)\det\left(g^{-1}\right)=\det\left(\mathrm{Id}_V\right)=\det\left(\mathbf{M}(\mathrm{Id}_V,\mathbf{b},\mathbf{b})\right)=\det\left(\mathbf{1}_{\dim V}\right)=1.\] It follows that \(\det(g)\neq 0\) and moreover that \[\det\left(g^{-1}\right)=\frac{1}{\det g}.\] Conversely, suppose that \(\det g\neq 0.\) Then \(\det \mathbf{M}(g,\mathbf{b},\mathbf{b}) \neq 0,\) so that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is invertible by Corollary 10.2, and Proposition 8.20 then implies that \(g\) is bijective.
Notice that assertions (i)–(iii) of Proposition 11.12 are not equivalent for infinite-dimensional vector spaces (where the determinant doesn’t make sense). For instance, consider \(V=\mathbb{K}^{\infty},\) the \(\mathbb{K}\)-vector space of sequences from Example 4.7; then the endomorphism \(g : V \to V\) defined by \((x_1,x_2,x_3,\ldots) \mapsto (0,x_1,x_2,x_3,\ldots)\) is injective but not surjective.
11.2 Detour: More on Subspaces
Before we develop the theory of endomorphisms further, we need to make a detour, by developing a bit more theory about subspaces of vector spaces.
Sums of subspaces
Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The set \[\sum_{i=1}^n U_i=U_1+U_2+\cdots +U_n=\{v \in V | v=u_1+u_2+\cdots + u_n \text{ for some } u_i \in U_i\}\] is called the sum of the subspaces \(U_i\).
Recall that by Proposition 4.21, the intersection of two subspaces is again a subspace, whereas the union of two subspaces fails to be a subspace in general. However, subspaces do behave nicely with regards to sums:
The sum of the subspaces \(U_i\subset V,\) \(i=1,\ldots,n\) is again a vector subspace.
Proof. The sum \(\sum_{i=1}^n U_i\) is non-empty, since it contains the zero vector \(0_V.\) Let \(v\) and \(v^{\prime} \in \sum_{i=1}^n U_i\) so that \[v=v_1+v_2+\cdots +v_n \qquad \text{and} \qquad v^{\prime}=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\] for vectors \(v_i,v^{\prime}_i \in U_i,\) \(i=1,\ldots,n.\) Each \(U_i\) is a vector subspace of \(V.\) Therefore, for all scalars \(s,t \in \mathbb{K},\) the vector \(s v_i+t v^{\prime}_i\) is an element of \(U_i,\) \(i=1,\ldots,n.\) Thus \[s v+t v^{\prime}=s v_1+t v^{\prime}_1+\cdots+s v_n+t v^{\prime}_n\] is an element of \(U_1+\cdots +U_n.\) By Definition 4.16, it follows that \(U_1+\cdots +U_n\) is a vector subspace of \(V.\)
Notice that \(U_1+\cdots+ U_n\) is the smallest vector subspace of \(V\) containing all vector subspaces \(U_i,\) \(i=1,\ldots,n.\)
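In coordinates, a sum of subspaces is easy to compute: if each \(U_i\) is given by column vectors spanning it, then \(U_1+\cdots+U_n\) is the column space of the matrix obtained by listing all these columns. A small SymPy sketch (the two subspaces of \(\mathbb{R}^3\) are arbitrary examples):

```python
import sympy as sp

U1 = sp.Matrix([[1, 1], [1, 2], [1, 2]])   # columns span U1
U2 = sp.Matrix([[1], [2], [3]])            # column spans U2

S = U1.row_join(U2)                        # generators of U1 + U2 listed as columns
basis = S.columnspace()                    # a basis of the sum
print(len(basis))                          # 3, so U1 + U2 = R^3 in this example
```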
Direct sums
If each vector in the sum is in a unique way the sum of vectors from the subspaces we say the subspaces are in direct sum:
Let \(V\) be a \(\mathbb{K}\)-vector space, \(n \in \mathbb{N}\) and \(U_1,\ldots,U_n\) be vector subspaces of \(V.\) The subspaces \(U_1,\ldots,U_n\) are said to be in direct sum if each vector \(w \in W=U_1+\cdots+U_n\) is in a unique way the sum of vectors \(v_i \in U_i\) for \(1\leqslant i\leqslant n.\) That is, if \(w=v_1+v_2+\cdots+v_n=v^{\prime}_1+v^{\prime}_2+\cdots+v^{\prime}_n\) for vectors \(v_i,v^{\prime}_i \in U_i,\) then \(v_i=v^{\prime}_i\) for all \(1\leqslant i \leqslant n.\) We write \[\bigoplus_{i=1}^n U_i\] in case the subspaces \(U_1,\ldots,U_n\) are in direct sum.
Let \(n \in \mathbb{N}\) and \(V=\mathbb{K}^n\) as well as \(U_i=\operatorname{span}\{\vec{e}_i\},\) where \(\{\vec{e}_1,\ldots,\vec{e}_n\}\) denotes the standard basis of \(\mathbb{K}^n.\) Then \(\mathbb{K}^n=\bigoplus_{i=1}^n U_i.\)
Two subspaces \(U_1,U_2\) of \(V\) are in direct sum if and only if \(U_1\cap U_2=\{0_V\}.\) Indeed, suppose \(U_1\cap U_2=\{0_V\}\) and consider \(w=v_1+v_2=v_1^{\prime}+v_2^{\prime}\) with \(v_i,v_i^{\prime} \in U_i\) for \(i=1,2.\) We then have \(v_1-v_1^{\prime}=v_2^{\prime}-v_2 \in U_2,\) since \(U_2\) is a subspace. Since \(U_1\) is a subspace as well, we also have \(v_1-v_1^{\prime} \in U_1.\) Since \(v_1-v_1^{\prime}\) lies both in \(U_1\) and \(U_2,\) we must have \(v_1-v_1^{\prime}=0_V=v_2^{\prime}-v_2.\) Conversely, suppose \(U_1,U_2\) are in direct sum and let \(w \in (U_1\cap U_2).\) We can write \(w=w+0_V=0_V+w,\) since \(w \in U_1\) and \(w \in U_2.\) Since \(U_1,U_2\) are in direct sum, we must have \(w=0_V,\) hence \(U_1\cap U_2=\{0_V\}.\)
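Computationally, the condition \(U_1\cap U_2=\{0_V\}\) can be tested by stacking bases: two subspaces of \(\mathbb{K}^n\) are in direct sum exactly when the combined list of basis vectors is still linearly independent (compare the proposition below). A SymPy sketch with arbitrary example subspaces:

```python
import sympy as sp

def in_direct_sum(B1, B2):
    """Columns of B1, B2 are bases of U1, U2; the sum is direct iff
    rank [B1 | B2] = dim U1 + dim U2."""
    return B1.row_join(B2).rank() == B1.shape[1] + B2.shape[1]

U1 = sp.Matrix([[1, 0], [0, 1], [0, 0]])   # the (x1, x2)-plane in K^3
U2 = sp.Matrix([[0], [0], [1]])            # the x3-axis
U3 = sp.Matrix([[1], [1], [0]])            # a line contained in U1

print(in_direct_sum(U1, U2))   # True
print(in_direct_sum(U1, U3))   # False, since U3 is contained in U1
```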
Observe that if the subspaces \(U_1,\ldots,U_n\) are in direct sum and \(v_i \in U_i\) with \(v_i \neq 0_V\) for \(1\leqslant i\leqslant n,\) then the vectors \(\{v_1,\ldots,v_n\}\) are linearly independent. Indeed, if \(s_1,\ldots,s_n\) are scalars such that \[s_1v_1+s_2v_2+\cdots+s_n v_n=0_V=0_V+0_V+\cdots+0_V,\] then \(s_iv_i=0_V\) for all \(1\leqslant i\leqslant n.\) By assumption \(v_i\neq 0_V\) and hence \(s_i=0\) for all \(1\leqslant i\leqslant n.\)
Let \(n \in \mathbb{N},\) \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,\ldots,U_n\) be subspaces of \(V.\) Let \(\mathbf{b}_i\) be an ordered basis of \(U_i\) for \(1\leqslant i\leqslant n.\) Then we have:
The tuple of vectors obtained by listing all the vectors of the bases \(\mathbf{b}_i\) is a basis of \(V\) if and only if \(V=\bigoplus_{i=1}^n U_i.\)
\(\dim(U_1+\cdots+U_n)\leqslant \dim(U_1)+\cdots+\dim (U_n)\) with equality if and only if the subspaces \(U_1,\ldots,U_n\) are in direct sum.
Proof. Part of an exercise.
Complements
Let \(V\) be a \(\mathbb{K}\)-vector space and \(U\subset V\) a subspace. A subspace \(U^{\prime}\) of \(V\) such that \(V=U\oplus U^{\prime}\) is called a complement to \(U\).
Notice that a complement need not be unique. Consider \(V=\mathbb{R}^2\) and \(U=\operatorname{span}\{\vec{e}_1\}.\) Then for any vector \(\vec{v} \in V\) such that \(\vec{e}_1,\vec{v}\) are linearly independent, the subspace \(U^{\prime}=\operatorname{span}\{\vec{v}\}\) is a complement to \(U.\)
Let \(U\) be a subspace of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Then there exists a subspace \(U^{\prime}\) so that \(V=U\oplus U^{\prime}.\)
Proof. Suppose \((v_1,\ldots,v_m)\) is an ordered basis of \(U.\) By Theorem 5.10, there exists a basis \(\{v_1,\ldots,v_m,v_{m+1},\ldots,v_n\}\) of \(V.\) Defining \(U^{\prime}=\operatorname{span}\{v_{m+1},\ldots,v_n\},\) Proposition 11.20 implies the claim.
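The proof is constructive: extend a basis of \(U\) to a basis of \(V\) and let \(U^{\prime}\) be the span of the added vectors. The following SymPy sketch carries this out in \(\mathbb{K}^n\) in one concrete way, by greedily appending standard basis vectors; the subspace \(U\) below is an arbitrary example.

```python
import sympy as sp

def complement(U_basis, n):
    """Columns of U_basis form a basis of U in K^n.  Append those standard
    basis vectors that enlarge the span; they span a complement U'."""
    M = U_basis
    added = []
    for i in range(n):
        e = sp.eye(n).col(i)
        if M.row_join(e).rank() > M.rank():
            M = M.row_join(e)
            added.append(e)
    return added

U = sp.Matrix([[1, 1], [1, 2], [1, 2]])    # basis of a plane U in R^3
print(complement(U, 3))                    # [Matrix([[0], [1], [0]])], i.e. U' = span{e_2}
```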
The dimension of a sum of two subspaces equals the sum of the dimensions of the subspaces minus the dimension of the intersection:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(U_1,U_2\) subspaces of \(V.\) Then we have \[\dim(U_1+U_2)=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\]
Proof. Let \(r=\dim(U_1\cap U_2)\) and let \(\{u_1,\ldots,u_r\}\) be a basis of \(U_1\cap U_2.\) These vectors are linearly independent and elements of \(U_1,\) hence by Theorem 5.10, there exist vectors \(v_1,\ldots,v_{m-r}\) so that \(\mathcal{S}_1=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) is a basis of \(U_1.\) Likewise there exist vectors \(w_1,\ldots,w_{n-r}\) such that \(\mathcal{S}_2=\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) is a basis of \(U_2.\) Here \(m=\dim U_1\) and \(n=\dim U_2.\)
Now consider the set \(\mathcal{S}=\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r},w_1,\ldots,w_{n-r}\}\) consisting of \(r+m-r+n-r=n+m-r\) vectors. If this set is a basis of \(U_1+U_2,\) then the claim follows, since then \(\dim(U_1+U_2)=n+m-r=\dim(U_1)+\dim(U_2)-\dim(U_1\cap U_2).\)
We first show that \(\mathcal{S}\) generates \(U_1+U_2.\) Let \(y \in U_1+U_2\) so that \(y=x_1+x_2\) for vectors \(x_1 \in U_1\) and \(x_2 \in U_2.\) Since \(\mathcal{S}_1\) is a basis of \(U_1,\) we can write \(x_1\) as a linear combination of elements of \(\mathcal{S}_1.\) Likewise we can write \(x_2\) as a linear combination of elements of \(\mathcal{S}_2.\) It follows that \(\mathcal{S}\) generates \(U_1+U_2.\)
We need to show that \(\mathcal{S}\) is linearly independent. So suppose we have scalars \(s_1,\ldots,s_r,\) \(t_1,\ldots,t_{m-r},\) and \(r_{1},\ldots,r_{n-r},\) so that \[\underbrace{s_1u_1+\cdots+s_r u_r}_{=u} +\underbrace{t_1v_1+\cdots+t_{m-r}v_{m-r}}_{=v}+\underbrace{r_1w_1+\cdots+r_{n-r}w_{n-r}}_{=w}=0_V.\] Equivalently, \(w=-u-v,\) so that \(w \in U_1.\) Since \(w\) is a linear combination of elements of \(\mathcal{S}_2,\) we also have \(w \in U_2.\) Therefore, \(w \in U_1\cap U_2\) and there exist scalars \(\hat{s}_1,\ldots,\hat{s}_r\) such that \[w=\underbrace{\hat{s}_1u_1+\cdots+\hat{s}_r u_r}_{=\hat{u}}.\] This is equivalent to \(w-\hat{u}=0_V,\) or written out \[r_1w_1+\cdots+r_{n-r}w_{n-r}-\hat{s}_1u_1-\cdots-\hat{s}_r u_r=0_V.\] Since the vectors \(\{u_1,\ldots,u_r,w_1,\ldots,w_{n-r}\}\) are linearly independent, we conclude that \(r_1=\cdots=r_{n-r}=\hat{s}_1=\cdots=\hat{s}_r=0.\) It follows that \(w=0_V\) and hence \(u+v=0_V.\) Again, since \(\{u_1,\ldots,u_r,v_1,\ldots,v_{m-r}\}\) are linearly independent, we conclude that \(s_1=\cdots=s_r=t_1=\cdots=t_{m-r}=0\) and we are done.
If you’ve seen the Inclusion-Exclusion Principle for counting the sizes of finite sets, it’s tempting to guess that the previous lemma generalises to three or more subspaces as follows: \[\begin{gathered} \dim(U_1 + U_2 + U_3)\, \stackrel{?}{=}\, \dim(U_1) + \dim(U_2) + \dim(U_3) \\ - \dim(U_1\cap U_2) - \dim(U_2\cap U_3) - \dim(U_3 \cap U_1) + \dim(U_1 \cap U_2 \cap U_3). \end{gathered}\] This is, surprisingly, false – taking \(U_i\) to be any three distinct lines through the origin in \(\mathbb{R}^2\) gives a counterexample.
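The counterexample is easy to check in coordinates: take the three lines in \(\mathbb{R}^2\) spanned by \((1,0),\) \((0,1)\) and \((1,1).\) A short SymPy sketch:

```python
import sympy as sp

L1 = sp.Matrix([1, 0])
L2 = sp.Matrix([0, 1])
L3 = sp.Matrix([1, 1])

print(L1.row_join(L2).row_join(L3).rank())   # 2 = dim(L1 + L2 + L3)
# All pairwise (and the triple) intersections are {0}, so the inclusion-exclusion
# guess would give 1 + 1 + 1 - 0 - 0 - 0 + 0 = 3, which is wrong.
```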
11.3 Eigenvectors and eigenvalues
Mappings \(g\) that have the same domain and codomain allow for the notion of a fixed point. Recall that an element \(x\) of a set \(\mathcal{X}\) is called a fixed point of a mapping \(g :\mathcal{X} \to \mathcal{X}\) if \(g(x)=x,\) that is, \(x\) agrees with its image under \(g.\) In Linear Algebra, a generalisation of the notion of a fixed point is that of an eigenvector. A vector \(v \in V\) is called an eigenvector of the linear map \(g : V \to V\) if \(v\) is merely scaled when applying \(g\) to \(v,\) that is, there exists a scalar \(\lambda \in \mathbb{K}\) – called an eigenvalue – such that \(g(v)=\lambda v.\) Clearly, the zero vector \(0_V\) will satisfy this condition for every choice of scalar \(\lambda.\) For this reason, eigenvectors are usually required to be different from the zero vector. In this terminology, fixed points \(v\) of \(g\) are simply eigenvectors with eigenvalue \(1,\) since they satisfy \(g(v)=v=1v.\)
It is natural to ask whether a linear map \(g : V \to V\) always admits an eigenvector. In the remaining part of this chapter we will answer this question and further develop our theory of linear maps, specifically endomorphisms. We start with some precise definitions.
Let \(g : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V.\)
An eigenvector with eigenvalue \(\lambda \in \mathbb{K}\) is a non-zero vector \(v \in V\) such that \(g(v)=\lambda v.\)
If \(\lambda\in \mathbb{K}\) is an eigenvalue of \(g,\) the \(\lambda\)-eigenspace \(\operatorname{Eig}_{g}(\lambda)\) is the subspace of vectors \(v\in V\) satisfying \(g(v)=\lambda v.\)
The dimension of \(\operatorname{Eig}_{g}(\lambda)\) is called the geometric multiplicity of the eigenvalue \(\lambda.\)
The set of all eigenvalues of \(g\) is called the spectrum of \(g\).
For \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) we speak of eigenvalues, eigenvectors, eigenspaces and spectrum to mean those of the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\)
By definition, the zero vector \(0_V\) is not an eigenvector; it is, however, an element of the eigenspace \(\operatorname{Eig}_{g}(\lambda)\) for every eigenvalue \(\lambda.\)
The scalar \(0\) is an eigenvalue of an endomorphism \(g : V \to V\) if and only if the kernel of \(g\) is different from \(\{0_V\}.\) In this case we have \(\operatorname{Ker}g=\operatorname{Eig}_{g}(0)\) and the geometric multiplicity of \(0\) is the nullity of \(g.\)
The endomorphism \(f_\mathbf{D}: \mathbb{K}^n \to \mathbb{K}^n\) associated to a diagonal matrix with distinct diagonal entries \[\mathbf{D}=\begin{pmatrix} \lambda_1 & && \\ & \lambda_2 && \\ &&\ddots & \\ &&& \lambda_n \end{pmatrix}\] has spectrum \(\{\lambda_1,\ldots,\lambda_n\}\) and corresponding eigenspaces \(\operatorname{Eig}_{f_{\mathbf{D}}}(\lambda_i)=\operatorname{span}\{\vec{e}_i\}.\)
Consider the \(\mathbb{R}\)-vector space \(\mathsf{P}(\mathbb{R})\) of polynomials and \(f=\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) the derivative by the variable \(x.\) The kernel of \(f\) consists of the constant polynomials and hence \(0\) is an eigenvalue of \(f.\) For any non-zero scalar \(\lambda\) there is no non-zero polynomial \(p\) satisfying \(\frac{\mathrm{d}}{\mathrm{d}x} p=\lambda p,\) as the left-hand side of this equation has strictly smaller degree than the right-hand side.
Previously we defined the trace and determinant for an endomorphism \(g : V \to V\) by observing that the trace and determinant of the matrix representation of \(g\) are independent of the chosen basis of \(V.\) Similarly, we can consider eigenvalues of \(g\) and eigenvalues of the matrix representation of \(g\) with respect to some ordered basis of \(V.\) Perhaps unsurprisingly, the eigenvalues are the same:
Let \(g : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) Let \(\mathbf{b}\) be an ordered basis of \(V\) with corresponding linear coordinate system \(\boldsymbol{\beta}.\) Then \(v \in V\) is an eigenvector of \(g\) with eigenvalue \(\lambda \in \mathbb{K}\) if and only if \(\boldsymbol{\beta}(v) \in \mathbb{K}^n\) is an eigenvector with eigenvalue \(\lambda\) of \(\mathbf{M}(g,\mathbf{b},\mathbf{b}).\) In particular, conjugate matrices have the same eigenvalues.
Proof. Write \(\mathbf{A}=\mathbf{M}(g,\mathbf{b},\mathbf{b}).\) Recall that by an eigenvector of \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) we mean an eigenvector of \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n.\) By Definition 8.10, we have \(f_\mathbf{A}=\boldsymbol{\beta}\circ g \circ \boldsymbol{\beta}^{-1}.\) Suppose \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) so that \(g(v)=\lambda v\) for some non-zero vector \(v\in V.\) Consider the vector \(\vec{x}=\boldsymbol{\beta}(v) \in \mathbb{K}^n\) which is non-zero, since \(\boldsymbol{\beta}: V \to \mathbb{K}^n\) is an isomorphism. Then \[f_\mathbf{A}(\vec{x})=\boldsymbol{\beta}(g(\boldsymbol{\beta}^{-1}(\vec{x})))=\boldsymbol{\beta}(g(v))=\boldsymbol{\beta}(\lambda v)=\lambda \boldsymbol{\beta}(v)=\lambda \vec{x},\] so that \(\vec{x}\) is an eigenvector of \(f_\mathbf{A}\) with eigenvalue \(\lambda.\)
Conversely, if \(\lambda\) is an eigenvalue of \(f_\mathbf{A}\) with non-zero eigenvector \(\vec{x},\) then it follows as above that \(v=\boldsymbol{\beta}^{-1}(\vec{x}) \in V\) is an eigenvector of \(g\) with eigenvalue \(\lambda.\)
By Remark 11.4, if the matrices \(\mathbf{A},\) \(\mathbf{B}\) are similar, then they represent the same endomorphism \(g : \mathbb{K}^n \to \mathbb{K}^n\) and hence have the same eigenvalues.
The “nicest” endomorphisms are those for which there exists an ordered basis consisting of eigenvectors:
An endomorphism \(g : V \to V\) is called diagonalisable if there exists an ordered basis \(\mathbf{b}\) of \(V\) such that each element of \(\mathbf{b}\) is an eigenvector of \(g.\)
For \(n \in \mathbb{N},\) a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is called diagonalisable over \(\mathbb{K}\) if the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n\) is diagonalisable.
We consider \(V=\mathsf{P}(\mathbb{R})\) and the endomorphism \(g : V \to V\) which replaces the variable \(x\) with \(2x.\) For instance, we have \[g(x^2-2x+3)=(2x)^2-2(2x)+3=4x^2-4x+3.\] Then \(g\) is diagonalisable. The vector space \(\mathsf{P}(\mathbb{R})\) has an ordered basis \(\mathbf{b}=(1,x,x^2,x^3,\ldots).\) Clearly, for all \(k \in \mathbb{N}\cup\{0\}\) we have \(g(x^k)=2^kx^k,\) so that \(x^k\) is an eigenvector of \(g\) with eigenvalue \(2^k.\)
For \(\alpha \in (0, \pi)\) consider \[\mathbf{R}_{\alpha}=\begin{pmatrix} \cos \alpha & -\sin \alpha \\ \sin\alpha & \cos \alpha \end{pmatrix}.\] Recall that the endomorphism \(f_{\mathbf{R}_{\alpha}} : \mathbb{R}^2 \to \mathbb{R}^2\) rotates vectors counter-clockwise around the origin \(0_{\mathbb{R}^2}\) by the angle \(\alpha.\) Since \(\alpha \in (0,\pi),\) the endomorphism \(f_{\mathbf{R}_{\alpha}}\) has no eigenvectors and hence is not diagonalisable.
Applying Proposition 11.29, we conclude that in the case of a finite dimensional \(\mathbb{K}\)-vector space \(V,\) an endomorphism \(g : V \to V\) is diagonalisable if and only if there exists an ordered basis \(\mathbf{b}\) of \(V\) such that \(\mathbf{M}(g,\mathbf{b},\mathbf{b})\) is a diagonal matrix. Moreover, \(\mathbf{A}\in M_{n,n}(\mathbb{K})\) is diagonalisable if and only if \(\mathbf{A}\) is similar over \(\mathbb{K}\) to a diagonal matrix.
Recall, if \(\mathcal{X},\mathcal{Y}\) are sets, \(f : \mathcal{X} \to \mathcal{Y}\) a mapping and \(\mathcal{Z}\subset \mathcal{X}\) a subset of \(\mathcal{X},\) we can consider the restriction of \(f\) to \(\mathcal{Z}\), usually denoted by \(f|_\mathcal{Z},\) which is the mapping \[f|_\mathcal{Z} : \mathcal{Z} \to \mathcal{Y}, \quad z \mapsto f(z).\] So we simply take the same mapping \(f,\) but apply it to the elements of the subset only.
Closely related to the notion of an eigenvector is that of a stable subspace. Let \(v \in V\) be an eigenvector with eigenvalue \(\lambda\) of the endomorphism \(g : V \to V.\) The \(1\)-dimensional subspace \(U=\operatorname{span}\{v\}\) is stable under \(g,\) that is, \(g(U)\subset U.\) Indeed, since \(g(v)=\lambda v\) and since every vector \(u \in U\) can be written as \(u=t v\) for some scalar \(t \in \mathbb{K},\) we have \(g(u)=g(t v)=t g(v)=t\lambda v \in U.\) This motivates the following definition:
A subspace \(U\subset V\) is called stable or invariant under the endomorphism \(g : V \to V\) if \(g(U) \subset U,\) that is \(g(u) \in U\) for all vectors \(u \in U.\) In this case, the restriction \(g|_{U}\) of \(g\) to \(U\) is an endomorphism of \(U.\)
Notice that a finite dimensional subspace \(U\subset V\) is stable under \(g\) if and only if \(g(v_i) \in U\) for \(1\leqslant i\leqslant m,\) where \(\{v_1,\ldots,v_m\}\) is a basis of \(U.\)
Every eigenspace of an endomorphism \(g : V \to V\) is a stable subspace. By definition \(g|_{\operatorname{Eig}_{g}(\lambda)} : \operatorname{Eig}_{g}(\lambda) \to \operatorname{Eig}_{g}(\lambda)\) is multiplication by the scalar \(\lambda \in \mathbb{K}.\)
We consider \(V=\mathbb{R}^3\) and \[\mathbf{R}_{\alpha}=\begin{pmatrix} \cos \alpha & -\sin \alpha & 0 \\ \sin \alpha & \cos \alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\] for \(\alpha \in (0,\pi).\) The endomorphism \(f_{\mathbf{R}_\alpha} : \mathbb{R}^3 \to \mathbb{R}^3\) is the rotation by the angle \(\alpha\) around the axis spanned by \(\vec{e}_3.\) Then the plane \(U=\{\vec{x}=(x_i)_{1\leqslant i\leqslant 3} \in \mathbb{R}^3 | x_3=0\}\) is stable under \(f=f_{\mathbf{R}_{\alpha}}.\) Here \(f|_{U} : U \to U\) is the rotation in the plane \(U\) around the origin by the angle \(\alpha.\)
Moreover, the vector \(\vec{e}_3\) is an eigenvector with eigenvalue \(1\) so that \[\operatorname{Eig}_{f}(1)=\operatorname{span}\{\vec{e}_3\}.\]
We consider again the \(\mathbb{R}\)-vector space \(\mathsf{P}(\mathbb{R})\) of polynomials and \(f=\frac{\mathrm{d}}{\mathrm{d}x} : \mathsf{P}(\mathbb{R}) \to \mathsf{P}(\mathbb{R})\) the derivative by the variable \(x.\) For \(n \in \mathbb{N}\) let \(U_n\) denote the subspace of polynomials of degree at most \(n.\) Since \(f(U_n)\subset U_{n-1}\subset U_n,\) the subspace \(U_n\) is stable under \(f.\)
Stable subspaces correspond to zero blocks in the matrix representation of linear maps. More precisely:
Let \(V\) be a \(\mathbb{K}\)-vector space of dimension \(n \in \mathbb{N}\) and \(g : V \to V\) an endomorphism. Furthermore, let \(U\subset V\) be a subspace of dimension \(1\leqslant m\leqslant n\) and \(\mathbf{b}\) an ordered basis of \(U\) and \(\mathbf{c}=(\mathbf{b},\mathbf{b}^{\prime})\) an ordered basis of \(V.\) Then \(U\) is stable under \(g\) if and only if the matrix \(\mathbf{A}=\mathbf{M}(g,\mathbf{c},\mathbf{c})\) has the form \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}\] for some matrix \(\hat{\mathbf{A}} \in M_{m,m}(\mathbb{K}).\) In the case where \(U\) is stable under \(g,\) we have \(\hat{\mathbf{A}}=\mathbf{M}(g|_U,\mathbf{b},\mathbf{b}) \in M_{m,m}(\mathbb{K}).\)
Proof. Write \(\mathbf{b}=(v_1,\ldots,v_m)\) for vectors \(v_i \in U\) and \(\mathbf{b}^{\prime}=(w_1,\ldots,w_{n-m})\) for vectors \(w_i \in V.\)
\(\Rightarrow\) Since \(U\) is stable under \(g,\) we have \(g(u) \in U\) for all vectors \(u \in U.\) Since \(\mathbf{b}\) is a basis of \(U,\) there exist scalars \(\hat{A}_{ij} \in \mathbb{K}\) with \(1\leqslant i,j\leqslant m\) such that \[g(v_j)=\sum_{i=1}^m \hat{A}_{ij}v_i\] for all \(1\leqslant j\leqslant m.\) By Proposition 8.11, the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}=(\mathbf{b},\mathbf{b}^{\prime})\) of \(V\) thus takes the form \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}\] where we write \(\hat{\mathbf{A}}=(\hat{A}_{ij})_{1\leqslant i,j\leqslant m}=\mathbf{M}(g|_U,\mathbf{b},\mathbf{b}).\)
\(\Leftarrow\) Suppose \[\mathbf{A}=\begin{pmatrix} \hat{\mathbf{A}} & \ast \\ \mathbf{0}_{n-m,m} & \ast \end{pmatrix}=\mathbf{M}(g,\mathbf{c},\mathbf{c})\] is the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}\) of \(V.\) Write \(\hat{\mathbf{A}}=(\hat{A}_{ij})_{1\leqslant i,j\leqslant m}.\) Then, by Proposition 8.11, \(g(v_j)=\sum_{i=1}^m \hat{A}_{ij}v_i \in U\) for all \(1\leqslant j\leqslant m,\) hence \(U\) is stable under \(g,\) by Remark 11.34.
From Proposition 11.36 we can conclude:
Suppose \(V\) is the direct sum of subspaces \(U_1,\) \(U_2,\ldots,U_m,\) all of which are stable under the endomorphism \(g : V \to V.\) If \(\mathbf{b}_i\) is an ordered basis of \(U_i\) for \(i=1,\ldots,m,\) then the matrix representation of \(g\) with respect to the ordered basis \(\mathbf{c}=(\mathbf{b}_1,\ldots,\mathbf{b}_m)\) takes the block form \[\mathbf{A}=\begin{pmatrix} \mathbf{A}_1 & && \\ & \mathbf{A}_2 && \\ &&\ddots & \\ &&& \mathbf{A}_m \end{pmatrix}\] where \(\mathbf{A}_i=\mathbf{M}(g|_{U_i},\mathbf{b}_i,\mathbf{b}_i)\) for \(i=1,\ldots,m.\)
11.4 The characteristic polynomial
The eigenvalues of an endomorphism are the solutions of a polynomial equation:
Let \(V\) be a finite dimensional \(\mathbb{K}\)-vector space and \(g : V \to V\) an endomorphism. Then \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if \[\det\left(\lambda \mathrm{Id}_{V}-g\right)=0.\] Moreover if \(\lambda\) is an eigenvalue of \(g,\) then \(\operatorname{Eig}_{g}(\lambda)=\operatorname{Ker}(\lambda \mathrm{Id}_V-g).\)
Proof. Let \(v \in V.\) We may write \(v=\mathrm{Id}_V(v).\) Hence \[g(v)=\lambda v \quad \iff \quad 0_V=(\lambda \mathrm{Id}_V-g)(v) \quad \iff \quad v \in \operatorname{Ker}(\lambda\mathrm{Id}_V-g).\] It follows that \(\operatorname{Eig}_{g}(\lambda)=\operatorname{Ker}(\lambda \mathrm{Id}_V-g).\) Moreover, \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if the kernel of \(\lambda\mathrm{Id}_V-g\) is different from \(\{0_V\},\) that is, if and only if \(\lambda\mathrm{Id}_V-g\) is not injective. Proposition 11.12 implies that \(\lambda \in \mathbb{K}\) is an eigenvalue of \(g\) if and only if \(\det\left(\lambda \mathrm{Id}_{V}-g\right)=0.\)
Let \(g : V \to V\) be an endomorphism of a finite dimensional \(\mathbb{K}\)-vector space \(V.\) The function \[\operatorname{char}_g : \mathbb{K}\to \mathbb{K}, \quad x\mapsto \det\left(x\mathrm{Id}_V -g\right)\] is called the characteristic polynomial of the endomorphism \(g\).
In practice, in order to compute the characteristic polynomial of an endomorphism \(g : V \to V,\) we choose an ordered basis \(\mathbf{b}\) of \(V\) and compute the matrix representation \(\mathbf{A}=\mathbf{M}(g,\mathbf{b},\mathbf{b})\) of \(g\) with respect to \(\mathbf{b}.\) We then have \[\operatorname{char}_g(x)=\det\left(x\mathbf{1}_{n}-\mathbf{A}\right).\] By the characteristic polynomial of a matrix \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) we mean the characteristic polynomial of the endomorphism \(f_\mathbf{A}: \mathbb{K}^n \to \mathbb{K}^n,\) that is, the function \(x\mapsto \det\left(x\mathbf{1}_{n}-\mathbf{A}\right).\)
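Characteristic polynomials of concrete matrices can also be computed symbolically. A SymPy sketch for the \(3\times 3\) matrix whose trace we computed earlier, using exactly the formula \(\det\left(x\mathbf{1}_{n}-\mathbf{A}\right)\):

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 3]])

char_poly = (x * sp.eye(3) - A).det().expand()
print(char_poly)              # x**3 - 7*x**2 + 13*x - 7
print(sp.factor(char_poly))   # (x - 1)*(x**2 - 6*x + 7)
```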
A zero of a polynomial \(f : \mathbb{K}\to \mathbb{K}\) is a scalar \(\lambda\in \mathbb{K}\) such that \(f(\lambda)=0.\) The multiplicity of a zero \(\lambda\) is the largest integer \(n\geqslant 1\) such that there exists a polynomial \(\hat{f} : \mathbb{K}\to \mathbb{K}\) so that \(f(x)=(x-\lambda)^n\hat{f}(x)\) for all \(x \in \mathbb{K}.\) Zeros are also known as roots.
The polynomial \(f(x)=x^3-x^2-8x+12\) can be factorised as \(f(x)=(x-2)^2(x+3)\) and hence has zero \(2\) with multiplicity \(2\) and \(-3\) with multiplicity \(1.\)
Let \(\lambda\) be an eigenvalue of the endomorphism \(g : V \to V.\) The multiplicity of the zero \(\lambda\) of \(\operatorname{char}_g\) is called the algebraic multiplicity of \(\lambda\).
We consider \[\mathbf{A}=\begin{pmatrix} 1 & 5 \\ 5 & 1 \end{pmatrix}.\] Then \[\begin{aligned} \operatorname{char}_{\mathbf{A}}(x)&=\operatorname{char}_{f_\mathbf{A}}(x)=\det\left(x\mathbf{1}_{2}-\mathbf{A}\right)=\det\begin{pmatrix} x-1 & -5 \\ -5 & x-1\end{pmatrix}\\ &=(x-1)^2-25=x^2-2x-24=(x+4)(x-6). \end{aligned}\] Hence we have eigenvalues \(\lambda_1=6\) and \(\lambda_2=-4,\) both with algebraic multiplicity \(1.\) By definition we have \[\operatorname{Eig}_{\mathbf{A}}(6)=\operatorname{Eig}_{f_\mathbf{A}}(6)=\left\{\vec{v} \in \mathbb{K}^2 : \mathbf{A}\vec{v}=6\vec{v}\right\}\] and we compute that \[\operatorname{Eig}_{\mathbf{A}}(6)=\operatorname{span}\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\}.\] Since \(\dim \operatorname{Eig}_\mathbf{A}(6)=1,\) the eigenvalue \(6\) has geometric multiplicity \(1.\) Likewise we compute \[\operatorname{Eig}_\mathbf{A}(-4)=\operatorname{span}\left\{\begin{pmatrix} -1 \\ 1 \end{pmatrix}\right\}\] so that the eigenvalue \(-4\) has geometric multiplicity \(1\) as well. Notice that we have an ordered basis of eigenvectors of \(\mathbf{A}\) and hence \(\mathbf{A}\) is diagonalisable, cf. Example 8.15.
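This example can be confirmed numerically. With NumPy, eig returns the eigenvalues together with a matrix \(\mathbf{P}\) whose columns are (normalised) eigenvectors, and \(\mathbf{C}=\mathbf{P}^{-1}\) conjugates \(\mathbf{A}\) into diagonal form, illustrating diagonalisability. A sketch:

```python
import numpy as np

A = np.array([[1., 5.],
              [5., 1.]])

eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors of A
print(eigenvalues)                  # 6 and -4 (possibly in the other order)

C = np.linalg.inv(P)                # change of basis into the eigenvector basis
print(np.round(C @ A @ P, 10))      # = C A C^{-1}: diagonal with the eigenvalues
```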
We consider \[\mathbf{A}=\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.\] Then \(\operatorname{char}_\mathbf{A}(x)=(x-2)^2\) so that we have a single eigenvalue \(2\) with algebraic multiplicity \(2.\) We compute \[\operatorname{Eig}_\mathbf{A}(2)=\operatorname{span}\left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right\}\] so that the eigenvalue \(2\) has geometric multiplicity \(1.\) Notice that we cannot find an ordered basis consisting of eigenvectors, hence \(\mathbf{A}\) is not diagonalisable.
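With SymPy both multiplicities can be read off at once: eigenvects returns, for each eigenvalue, its algebraic multiplicity together with a basis of its eigenspace. A quick check of this example:

```python
import sympy as sp

A = sp.Matrix([[2, 1],
               [0, 2]])
print(A.eigenvects())
# [(2, 2, [Matrix([[1], [0]])])]: algebraic multiplicity 2, geometric multiplicity 1
```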
The determinant and trace of an endomorphism appear as coefficients in its characteristic polynomial:
Let \(g : V \to V\) be an endomorphism of a \(\mathbb{K}\)-vector space \(V\) of dimension \(n.\) Then \(\operatorname{char}_g\) is a polynomial of degree \(n\) and \[\operatorname{char}_g(x)=x^n-\operatorname{Tr}(g)x^{n-1}+\cdots +(-1)^n\det(g).\]
Proof. Fix an ordered basis \(\mathbf{b}\) of \(V\) and let \(\mathbf{A}= \mathbf{M}(g,\mathbf{b},\mathbf{b})=(A_{ij})_{1\leqslant i,j\leqslant n},\) so we need to compute \(\det(\mathbf{B})\) where \(\mathbf{B}= x\mathbf{1}_{n} - \mathbf{A}.\) The result is obvious if \(n = 1,\) so by induction we may assume it holds for \(n - 1.\)
Performing a Laplace expansion on the first column of \(\mathbf{B},\) we get the following terms:
A term \((x - A_{11}) \det(\mathbf{B}^{(1, 1)}).\) Using the induction hypothesis we have \(\det \mathbf{B}^{(1, 1)} = \operatorname{char}_{\mathbf{A}^{(1, 1)}}(x) = x^{n-1} - x^{n-2} \operatorname{Tr}(\mathbf{A}^{(1, 1)}) + \dots,\) where the dots denote terms of degree \(\leqslant(n - 2).\) So we have \[\begin{gathered} (x - A_{11}) \det(\mathbf{B}^{(1, 1)}) = (x - A_{11})(x^{n-1} - x^{n-2} \operatorname{Tr}(\mathbf{A}^{(1, 1)}) + \dots)\\ = x^n - (A_{11} + \operatorname{Tr}(\mathbf{A}^{(1, 1)})) x^{n-1} + \dots. \end{gathered}\] Since the diagonal entries of \(\mathbf{A}^{(1, 1)}\) are precisely the diagonal entries of \(\mathbf{A}\) with \(A_{11}\) removed, we have \(A_{11} + \operatorname{Tr}(\mathbf{A}^{(1, 1)}) = \operatorname{Tr}(\mathbf{A}),\) so this is just \(x^n - \operatorname{Tr}(\mathbf{A}) x^{n-1} + \dots.\)
Another \((n - 1)\) terms of the form \((-1)^{k + 1} \cdot (-A_{k1}) \cdot \det(\mathbf{B}^{(k, 1)})\) for \(2 \leqslant k \leqslant n.\) We claim that each of these is a polynomial of degree \(\leqslant n - 2.\) This is because crossing out the \(k\)-th row and first column, for \(k \ne 1,\) removes 2 of the terms with \(x\) in them from the diagonal of \(\mathbf{B}.\) Hence \(\mathbf{B}^{(k, 1)}\) has \(n-2\) entries which are linear in \(x\) and all the rest are constant; so each of the terms in the Leibniz formula for \(\det \mathbf{B}^{(k, 1)}\) has degree at most \(n - 2.\)
So the Laplace-expansion terms with \(2 \leqslant k \leqslant n\) don’t contribute anything to the \(x^n\) and \(x^{n-1}\) coefficients of \(\det(\mathbf{B})\) and we conclude \[\operatorname{char}_g(x)=x^n-\operatorname{Tr}(g)x^{n-1}+c_{n-2}x^{n-2}+\cdots+c_1x+c_0\] for coefficients \(c_{n-2},\ldots,c_0 \in \mathbb{K}.\) It remains to show that \(c_0=(-1)^n\det(g).\) We have \(c_0=\operatorname{char}_g(0)=\det(-g)=\det(-\mathbf{A}).\) Since the determinant is linear in each row of \(\mathbf{A},\) this gives \(\det(-\mathbf{A})=(-1)^n\det(\mathbf{A}),\) as claimed.
In particular, for \(n=2\) we have \(\operatorname{char}_g(x)=x^2-\operatorname{Tr}(g)x+\det(g).\) Compare with Example 11.42.
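As a quick symbolic check of the \(n=2\) case on the \(2\times 2\) matrix considered above:

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 5], [5, 1]])

p = (x * sp.eye(2) - A).det().expand()
print(p)                     # x**2 - 2*x - 24
print(A.trace(), A.det())    # 2 -24: indeed p = x^2 - Tr(A) x + det(A)
```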
Exercises
Let \(U\) and \(U'\) be the subspaces of \(\mathbb{R}^3\) with bases \[\left\{ \begin{pmatrix} 1 \\ 1 \\ 1\end{pmatrix}, \begin{pmatrix}1 \\ 2 \\2 \end{pmatrix}\right\} \quad\text{and}\quad \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\right\}\] respectively. Show that \(U\) and \(U'\) are complementary subspaces of \(\mathbb{R}^3.\) Find another subspace \(U'',\) with \(U'' \ne U',\) such that \(U''\) is also complementary to \(U.\)
Solution
Since both basis vectors of \(U\) have their 2nd and 3rd entries equal, this is true of all vectors in \(U.\) Hence \(U'\) is not contained in \(U\); and as \(U'\) is one-dimensional, this implies that \(U \cap U'\) must be zero (so the sum is direct); and \(\dim (U + U') = \dim(U) + \dim(U') = 3\) (so the sum is all of \(\mathbb{R}^3\)). Hence \(\mathbb{R}^3 = U \oplus U'\) as required.
The same argument would apply with \(\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\) replaced by any vector whose second and third entries were distinct, so we can take \(U''\) to be the span of the standard basis vector \(\vec{e}_3.\)
Let \(\mathbf{A}\in M_{n,n}(\mathbb{K}).\) Show that the map \(t_{\mathbf{A}} : M_{n,n}(\mathbb{K}) \to \mathbb{K}\) defined by \(t_{\mathbf{A}}(\mathbf{B}) = \operatorname{Tr}(\mathbf{A}\mathbf{B})\) is linear. Show, conversely, that given any linear map \(t : M_{n,n}(\mathbb{K}) \to \mathbb{K},\) we can find a unique \(\mathbf{A}\) such that \(t = t_{\mathbf{A}}.\)
Compute the characteristic polynomials and eigenvalues of the following matrices, and bases of their eigenspaces: \[(i)\ \begin{pmatrix}7 & -4 \\ -8 & -7 \end{pmatrix},\quad (ii)\ \begin{pmatrix}5 & 2 & 3 \\ -13 & -6 & -11 \\ 4 & 2 & 4 \end{pmatrix}, \quad (iii)\ \begin{pmatrix}3 & 1 & 1 \\ -15 & -5 & -5 \\ 6 & 2 & 2 \end{pmatrix}.\]
Which of them are diagonalisable?
Solution
(i): we have \(\operatorname{char}_A(x) = (x - 7)(x + 7) - 32 = x^2 - 81 = (x - 9)(x + 9),\) so the eigenvalues are \(\pm 9.\)
The \(\lambda = 9\) eigenspace is the kernel of \(9 \cdot 1_2 - A = \begin{pmatrix} 2 & 4 \\ 8 & 16\end{pmatrix},\) which is the 1-dimensional space spanned by \(\begin{pmatrix} 2 \\ -1\end{pmatrix}.\) Similarly, the \(\lambda = -9\) eigenspace is the kernel of \(-9 \cdot 1_2 - A = \begin{pmatrix}-16 & 4 \\ 8 & -2 \end{pmatrix},\) which is 1-dimensional spanned by \(\begin{pmatrix} 1 \\ 4\end{pmatrix}.\)
(ii) We compute that the characteristic polynomial is \(x (x - 1) (x - 2),\) so the eigenvalues are \(\{0, 1, 2\}.\) The eigenspaces are all 1-dimensional, and bases of them are given by the following eigenvectors: \[\lambda = 0 : \begin{pmatrix} 1 \\ -4 \\ 1\end{pmatrix}, \lambda=1 : \begin{pmatrix}1 \\ -5 \\ 2\end{pmatrix}, \lambda=2 : \begin{pmatrix} 1 \\ -3 \\ 1 \end{pmatrix}.\]
(iii) The characteristic polynomial is \(x^3,\) so the only eigenvalue is 0. Hence the eigenspace is just \(\ker A,\) which we compute by taking row echelon form of the transpose (augmented by the identity), as usual:
\[\left(\begin{array}{rrr|rrr} 3 & -15 & 6 & 1 & 0 & 0 \\ 1 & -5 & 2 & 0 & 1 & 0 \\ 1 & -5 & 2 & 0 & 0 & 1 \end{array}\right)\stackrel{RREF}{\leadsto} \left(\begin{array}{rrr|rrr} 1 & -5 & 2 & 0 & 0 & 1 \\ \hline 0 & 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{array}\right)\] so a basis of the eigenspace is \(\left\{ \begin{pmatrix} 1 \\ 0 \\ -3\end{pmatrix}, \begin{pmatrix}0 \\ 1 \\ -1 \end{pmatrix}\right\}.\) (Of course, many other bases are possible.)
In (i) and (ii), the eigenvectors are linearly independent (this is easy to check by hand, and is also an instance of a general theorem from the next chapter) and there are enough of them to form a basis of \(\mathbb{R}^2\) resp. \(\mathbb{R}^3,\) so the matrices are diagonalisable. In (iii), the eigenvectors only span a 2-dimensional subspace of \(\mathbb{R}^3\) so the matrix is not diagonalisable.
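These computations are easy to double-check symbolically, for instance for (ii) and (iii); a short SymPy sketch:

```python
import sympy as sp

x = sp.symbols('x')
A2 = sp.Matrix([[5, 2, 3], [-13, -6, -11], [4, 2, 4]])
A3 = sp.Matrix([[3, 1, 1], [-15, -5, -5], [6, 2, 2]])

print(sp.factor((x * sp.eye(3) - A2).det()))   # x*(x - 1)*(x - 2), up to ordering of the factors
print(sp.expand((x * sp.eye(3) - A3).det()))   # x**3
print(A3.rank())                               # 1, so dim Eig(0) = 3 - 1 = 2
```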
(hard!)
Show that the coefficient of \(x^{n-2}\) in \(\operatorname{char}_{\mathbf{A}}(x),\) for \(\mathbf{A}\in M_{n,n}(\mathbb{K}),\) is given by the sum \[\sum_{1 \leqslant i < j \leqslant n} \det \begin{pmatrix} A_{ii} & A_{ij} \\ A_{ji} & A_{jj} \end{pmatrix}\] (the sum over all “second-order diagonal minors” of \(A\)). Can you spot a generalisation to the \(x^{n-r}\) coefficient for an arbitrary \(r\)?
Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathcal S,\mathcal T\subset V.\) If \(\mathcal S\cup \mathcal T\) is a basis of \(V,\) then \(V=\operatorname{span}(\mathcal S)\oplus\operatorname{span}(\mathcal T).\)
- True
- False
Let \(V\) be a \(\mathbb{K}\)-vector space and \(\mathcal S,\mathcal T\subset V.\) If \(\mathcal S\cup \mathcal T\) is a basis of \(V\) and \(\mathcal S\cap\mathcal T=\emptyset\) then \(V=\operatorname{span}(\mathcal S)\oplus\operatorname{span}(\mathcal T).\)
- True
- False
If \(U_1,U_2,U_3\subset V\) are subspaces of a \(\mathbb{K}\)-vector space, then \(U_1+U_2=U_1+U_3\) implies \(U_2=U_3.\)
- True
- False
If \(U_1,U_2,U_3\subset V\) are subspaces of a \(\mathbb{K}\)-vector space, then \(U_1+U_2=U_1+U_3\) implies \(\dim(U_2)=\dim(U_3).\)
- True
- False
If \(U_1,U_2,U_3\subset V\) are subspaces of a finite-dimensional \(\mathbb{K}\)-vector space, then \(U_1\oplus U_2=U_1\oplus U_3\) implies \(\dim(U_2)=\dim(U_3).\)
- True
- False
If \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A}^T\mathbf{A}) = \operatorname{Tr}(\mathbf{A}\mathbf{A}^T).\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A}+\mathbf{B})=\operatorname{Tr}(\mathbf{A})+\operatorname{Tr}(\mathbf{B}).\)
- True
- False
If \(\mathbf{A},\mathbf{B}\in M_{m,n}(\mathbb{K})\) and \(\mathbf{B}\) is invertible, then \(\operatorname{Tr}(\mathbf{B}\mathbf{A}\mathbf{B}^{-1})=\operatorname{Tr}(\mathbf{A}).\)
- True
- False
If \(\mathbf{A}\in M_{m,n}(\mathbb{K}),\) then \(\operatorname{Tr}(\mathbf{A})\ne\operatorname{Tr}(\mathbf{A}^T).\)
- True
- False
If \(1\) is the only eigenvalue of \(\mathbf{A},\) then \(\mathbf{A}=\mathbf{1}_{n}.\)
- True
- False
The eigenvalues of an anti-symmetric matrix \(\mathbf{A}\in M_{2,2}(\mathbb{R})\) are pure imaginary.
- True
- False
\(1\) is an eigenvalue of \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p\)
- True
- False
The algebraic multiplicity of the eigenvalue \(1\) of \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p\) equals its geometric multiplicity.
- True
- False
Let \(f:\mathsf P_2(\mathbb{R})\to \mathsf P_2(\mathbb{R}),\)\(p\mapsto f(p) = p+\frac{\mathrm d}{\mathrm dx}p.\) Then \(\dim(\operatorname{Eig}_f(1))=1.\)
- True
- False