10.6 The spectral theorem

We now come to one of the core results of the Linear Algebra II module:

Theorem 10.48 • The spectral theorem

Let \(f : V \to V\) be an endomorphism of the finite dimensional Euclidean space \((V,\langle\cdot{,}\cdot\rangle).\) Then there exists an orthonormal basis of \(V\) consisting of eigenvectors of \(f\) if and only if \(f\) is self-adjoint.

For the proof of this statement we need two lemmas.

Lemma 10.49

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space of dimension \(n\geqslant 1\) and \(f : V \to V\) a self-adjoint endomorphism. Then \(f\) admits an eigenvalue \(\lambda \in \mathbb{R}.\)

Proof. Let \(\mathbf{b}\) be an ordered orthonormal basis of \((V,\langle\cdot{,}\cdot\rangle)\) and \(\mathbf{A}=\mathbf{M}(f,\mathbf{b},\mathbf{b}).\) Since \(f\) is self-adjoint, we have that \(\mathbf{A}=\mathbf{A}^T.\) Recall that the characteristic polynomial \(\operatorname{char}_f : \mathbb{R}\to \mathbb{R}\) of \(f\) satisfies \(\operatorname{char}_f(x)=\det(x \mathbf{1}_{n}-\mathbf{A})\) for all \(x \in \mathbb{R}.\) We may interpret each entry of \(\mathbf{A}\) as a complex number and hence the characteristic polynomial as a function \(\operatorname{char}_f : \mathbb{C}\to \mathbb{C}.\) In doing so, we can apply the fundamental theorem of algebra and conclude that there exists a complex number \(w\) such that \(\operatorname{char}_f(w)=0.\) We next argue that \(w\) has vanishing imaginary part and hence is a real number. Since \(\det(w \mathbf{1}_{n}-\mathbf{A})=0\) we can find a non-zero vector \(\vec{z} \in \mathbb{C}^n\) such that \(\mathbf{A}\vec{z}=w\vec{z}.\) We write \(\vec{z}=\vec{x}+\mathrm{i}\vec{y}\) for vectors \(\vec{x},\vec{y} \in \mathbb{R}^n\) and \(w=s+\mathrm{i}t\) for real numbers \(s,t.\) Decomposing \(\mathbf{A}(\vec{x}+\mathrm{i}\vec{y})=(s+\mathrm{i}t)(\vec{x}+\mathrm{i}\vec{y})\) into real and imaginary parts, we obtain the equations \[\begin{aligned} \mathbf{A}\vec{x}&=s\vec{x}-t\vec{y},\\ \mathbf{A}\vec{y}&=s\vec{y}+t\vec{x}. \end{aligned}\] Using the symmetry of \(\mathbf{A},\) we compute \[\langle \mathbf{A}\vec{x},\vec{y}\rangle_{\mathbf{1}_{n}}=(\mathbf{A}\vec{x})^T\vec{y}=\vec{x}^T\mathbf{A}\vec{y}=\langle \vec{x},\mathbf{A}\vec{y}\rangle_{\mathbf{1}_{n}}.\] Using the above equations, we obtain \[\begin{aligned} \langle \mathbf{A}\vec{x},\vec{y}\rangle_{\mathbf{1}_{n}}&=\langle s\vec{x}-t\vec{y},\vec{y}\rangle_{\mathbf{1}_{n}}=s\langle \vec{x},\vec{y}\rangle_{\mathbf{1}_{n}}-t\Vert \vec{y}\Vert^2=\langle \vec{x},\mathbf{A}\vec{y}\rangle_{\mathbf{1}_{n}}=\langle\vec{x},s\vec{y}+t\vec{x}\rangle_{\mathbf{1}_{n}}\\ &=s\langle \vec{x},\vec{y}\rangle_{\mathbf{1}_{n}}+t\Vert \vec{x}\Vert^2, \end{aligned}\] where \(\Vert \cdot \Vert\) denotes the norm induced by the standard scalar product \(\langle\cdot{,}\cdot\rangle_{\mathbf{1}_{n}}\) on \(\mathbb{R}^n.\) The last equation is equivalent to \[0=t(\Vert \vec{x}\Vert^2 +\Vert \vec{y}\Vert^2).\] Since \(\vec{z}\neq 0_{\mathbb{C}^n},\) the properties of the norm \(\Vert \cdot \Vert\) – see Proposition 10.12 – imply that \((\Vert \vec{x}\Vert^2 +\Vert \vec{y}\Vert^2) > 0\) and hence we must have \(t=0.\) Therefore \(w=s\in\mathbb{R},\) so \(\operatorname{char}_f\) has the real root \(\lambda=s,\) that is, \(f\) admits a real eigenvalue.
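
As a numerical illustration of Lemma 10.49 (a sketch only, using NumPy and a randomly generated symmetric matrix), the roots of the characteristic polynomial of a symmetric matrix come out real up to rounding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric matrix A = A^T.
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# Roots of the characteristic polynomial, computed as complex numbers.
roots = np.linalg.eigvals(A.astype(complex))

# For symmetric A every root has (numerically) vanishing imaginary part.
print(np.max(np.abs(roots.imag)))            # zero up to rounding errors
assert np.allclose(roots.imag, 0.0, atol=1e-10)
```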

Recall that a subspace \(U\subset V\) is said to be stable under an endomorphism \(f : V \to V\) if \(f(u) \in U\) for all \(u \in U.\)

Lemma 10.50

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space, \(f : V \to V\) a self-adjoint endomorphism and \(\lambda\) an eigenvalue of \(f.\) Then \((\operatorname{Eig}_f(\lambda))^{\perp}\) is stable under \(f.\)

Proof. Write \(U=\operatorname{Eig}_f(\lambda)\) and let \(w \in U^{\perp}.\) Then, for all \(u \in U\) we obtain \[\langle u,f(w)\rangle=\langle u,f^*(w)\rangle=\langle f(u),w\rangle=\lambda \langle u,w\rangle,\] where we use the self-adjointness of \(f\) and that \(u\) is an eigenvector of \(f.\) Since \(w \in U^{\perp},\) we have \(\langle u,w\rangle=0\) and hence \(\langle u,f(w)\rangle=0\) for all \(u \in U.\) This shows that \(f(w) \in U^{\perp},\) hence \(U^{\perp}\) is stable under \(f.\)
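
Lemma 10.50 can also be checked numerically: for a symmetric matrix, a vector orthogonal to an eigenvector (here the eigenvalues are simple, so each eigenspace is a line) is mapped to a vector that is again orthogonal to it. The matrix and the vector below are chosen for illustration only:

```python
import numpy as np

# A symmetric matrix; its eigenspaces are those of the self-adjoint map f_A.
A = np.array([[ 2.0, -2.0, 0.0],
              [-2.0,  5.0, 0.0],
              [ 0.0,  0.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(A)   # columns of eigvecs are orthonormal
u = eigvecs[:, 0]                      # spans the eigenspace of eigvals[0]

# Take some w orthogonal to u and check that A @ w is again orthogonal to u.
w = np.array([1.0, 2.0, -1.0])
w = w - (w @ u) * u                    # project away the u-component
print(u @ w, u @ (A @ w))              # both approximately 0
```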

Proof of Theorem 10.48. We first show that if \(V\) admits an orthonormal basis consisting of eigenvectors of \(f,\) then \(f\) must be self-adjoint. Let \(\mathbf{b}=(u_1,\ldots,u_n)\) be such a basis. We need to show that for all \(v,w \in V\) we have \[\langle f(v),w\rangle=\langle v,f(w)\rangle.\] There exist unique scalars \(s_1,\ldots,s_n \in \mathbb{R}\) and \(t_1,\ldots,t_n\in \mathbb{R}\) such that \(v=\sum_{i=1}^n s_iu_i\) and \(w=\sum_{j=1}^n t_j u_j.\) From this we compute \[\begin{aligned} \langle f(v),w\rangle&=\left\langle f\left(\sum_{i=1}^n s_iu_i\right),\sum_{j=1}^n t_j u_j\right\rangle=\sum_{i=1}^n\sum_{j=1}^ns_i t_j\langle f(u_i),u_j\rangle \\ &=\sum_{i=1}^n\sum_{j=1}^ns_it_j\lambda_i\langle u_i,u_j\rangle=\sum_{i=1}^n\sum_{j=1}^ns_it_j\lambda_i\delta_{ij}=\sum_{i=1}^ns_it_i\lambda_i, \end{aligned}\] where \(\lambda_i \in \mathbb{R}\) denotes the eigenvalue of the eigenvector \(u_i\) for \(i=1,\ldots,n.\) Likewise we have \[\begin{aligned} \langle v,f(w)\rangle&=\sum_{i=1}^n\sum_{j=1}^ns_i t_j\langle u_i,f(u_j)\rangle =\sum_{i=1}^n\sum_{j=1}^ns_it_j\lambda_j\langle u_i,u_j\rangle=\sum_{i=1}^ns_it_i\lambda_i, \end{aligned}\] as claimed.

Conversely, assume that \(f\) is self-adjoint. We will use induction on the dimension \(n\) of \(V\) to show that \((V,\langle\cdot{,}\cdot\rangle)\) admits an orthonormal basis consisting of eigenvectors of \(f.\) Base case: for \(n=1\) every endomorphism is given by multiplication with a scalar, so any unit vector of \(V\) forms an orthonormal basis consisting of eigenvectors of \(f.\)

Inductive step: Assume \(n\geqslant 2\) and that the statement is true for all Euclidean spaces of dimension at most \(n-1.\) By Lemma 10.49 the self-adjoint endomorphism \(f : V \to V\) admits an eigenvalue \(\lambda \in \mathbb{R}.\) Write \(U=\operatorname{Eig}_f(\lambda).\) By Remark 10.16 we have \(V=U\oplus U^{\perp}\) and by Lemma 10.50 we have that \(U^{\perp}\) is stable under \(f.\) We thus obtain a linear map \(\hat{f}=f|_{U^{\perp}} : U^{\perp} \to U^{\perp}\) by restricting \(f\) to \(U^{\perp}.\) Recall that the restriction \(\langle\cdot{,}\cdot\rangle|_{U^{\perp}}\) of \(\langle\cdot{,}\cdot\rangle\) to \(U^{\perp}\) turns \((U^{\perp},\langle\cdot{,}\cdot\rangle|_{U^{\perp}})\) into another Euclidean space. Since \(\dim U\geqslant 1,\) the dimension of \(U^{\perp}\) is at most \(n-1.\) The self-adjointness condition \(f(v)=f^*(v)\) must hold for all vectors \(v \in V\) and hence in particular also for all vectors of \(U^{\perp}\subset V.\) It follows that \(\hat{f} : U^{\perp} \to U^{\perp}\) is self-adjoint with respect to \(\langle\cdot{,}\cdot\rangle|_{U^{\perp}}.\) Write \(k=\dim U^{\perp}.\) By the induction hypothesis there exists an orthonormal basis \(\{u_1,\ldots,u_k\}\) consisting of eigenvectors of \(\hat{f}.\) Since \(\hat{f}=f|_{U^\perp},\) the vectors \(\{u_1,\ldots,u_k\}\) are also eigenvectors of \(f\) and since the inner product of vectors in \(U^{\perp}\) is the same as the inner product computed in \(V,\) it follows that \(\{u_1,\ldots,u_k\}\) is orthonormal with respect to \(\langle\cdot{,}\cdot\rangle.\) Finally, using Gram-Schmidt orthonormalisation (Theorem 10.22), we can find an orthonormal basis \(\{v_1,\ldots,v_{n-k}\}\) of \(U=\operatorname{Eig}_f(\lambda)\) consisting of eigenvectors with eigenvalue \(\lambda.\) It follows that \(\{u_1,\ldots,u_k,v_1,\ldots,v_{n-k}\}\) is an orthonormal basis of \(V\) consisting of eigenvectors of \(f.\)
Again, there is a matrix version of Theorem 10.48:
Theorem 10.51 • Matrix version of the spectral theorem

Let \(n \in \mathbb{N}\) and \(\mathbf{A}\in M_{n,n}(\mathbb{R})\) be a matrix. Then there exists an orthogonal matrix \(\mathbf{R}\in M_{n,n}(\mathbb{R})\) such that \(\mathbf{R}\mathbf{A}\mathbf{R}^T\) is a diagonal matrix if and only if \(\mathbf{A}\) is symmetric.

Proof. We first show that if there exists an orthogonal matrix \(\mathbf{R}\in M_{n,n}(\mathbb{R})\) such that \(\mathbf{R}\mathbf{A}\mathbf{R}^T=\mathbf{D}\) for some diagonal matrix \(\mathbf{D}\in M_{n,n}(\mathbb{R}),\) then \(\mathbf{A}\) must be symmetric. Since \(\mathbf{A}=\mathbf{R}^T\mathbf{D}\mathbf{R}\) we obtain \[\mathbf{A}^T=(\mathbf{R}^T\mathbf{D}\mathbf{R})^T=\mathbf{R}^T\mathbf{D}^T\mathbf{R}=\mathbf{R}^T\mathbf{D}\mathbf{R}=\mathbf{A},\] where we use \(\mathbf{D}^T=\mathbf{D}\) and Remark 2.18. For the converse direction consider \(V=\mathbb{R}^n\) equipped with its standard scalar product \(\langle\cdot{,}\cdot\rangle.\) Since \(\mathbf{A}\) is symmetric, the endomorphism \(f_\mathbf{A}: \mathbb{R}^n \to \mathbb{R}^n\) is self-adjoint with respect to \(\langle\cdot{,}\cdot\rangle.\) Applying Theorem 10.48 we can thus find an ordered orthonormal basis \(\mathbf{b}\) of \(\mathbb{R}^n\) consisting of eigenvectors of \(f_\mathbf{A}.\) Denoting by \(\mathbf{e}\) the standard ordered basis of \(\mathbb{R}^n,\) we have by Theorem 3.106 \[\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})=\mathbf{C}(\mathbf{e},\mathbf{b})\mathbf{M}(f_\mathbf{A},\mathbf{e},\mathbf{e})\mathbf{C}(\mathbf{e},\mathbf{b})^{-1}.\] The basis \(\mathbf{b}\) consists of eigenvectors of \(f_\mathbf{A},\) hence \(\mathbf{M}(f_\mathbf{A},\mathbf{b},\mathbf{b})\) is a diagonal matrix by Remark 6.30. Now recall from Example 3.95 that \(\mathbf{M}(f_\mathbf{A},\mathbf{e},\mathbf{e})=\mathbf{A},\) thus writing \(\mathbf{R}=\mathbf{C}(\mathbf{e},\mathbf{b}),\) we conclude that \(\mathbf{R}\mathbf{A}\mathbf{R}^{-1}\) is diagonal. The standard ordered basis \(\mathbf{e}\) of \(\mathbb{R}^n\) is orthonormal with respect to the standard scalar product of \(\mathbb{R}^n,\) hence Corollary 10.37 implies that \(\mathbf{R}\) is orthogonal, \(\mathbf{R}^{-1}=\mathbf{R}^T.\) We have thus found an orthogonal matrix \(\mathbf{R}\) so that \(\mathbf{R}\mathbf{A}\mathbf{R}^T\) is diagonal.
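
A minimal numerical sketch of Theorem 10.51, assuming NumPy: `numpy.linalg.eigh` returns an orthonormal eigenbasis of a symmetric matrix, and the transpose of the matrix of eigenvectors plays the role of \(\mathbf{R}\):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                     # symmetric test matrix

eigvals, Q = np.linalg.eigh(A)        # A @ Q = Q @ diag(eigvals), Q orthogonal
R = Q.T                               # rows of R are the orthonormal eigenvectors

D = R @ A @ R.T
print(np.round(D, 10))                # diagonal; entries are the eigenvalues of A
assert np.allclose(R @ R.T, np.eye(4))       # R is orthogonal
assert np.allclose(D, np.diag(eigvals))      # R A R^T is diagonal
```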

10.6.1 Geometric description of self-adjoint endomorphisms

The spectral theorem tells us that self-adjoint endomorphisms can be diagonalised with an orthonormal basis. As a consequence one can give a precise geometric description of self-adjoint mappings. A first key observation towards this end is the following:

Lemma 10.52

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(f : V \to V\) a self-adjoint endomorphism. Then the eigenspaces of \(f\) are orthogonal. That is, for eigenvalues \(\lambda\neq \mu\) of \(f\) we have \(\langle u,v\rangle=0\) for all \(u \in \operatorname{Eig}_f({\lambda})\) and for all \(v \in \operatorname{Eig}_f({\mu}).\)

Proof. Let \(u \in \operatorname{Eig}_f({\lambda})\) and \(v \in \operatorname{Eig}_f({\mu}).\) Then \[\lambda\langle u,v\rangle=\langle f(u),v\rangle=\langle u,f(v)\rangle=\mu \langle u,v\rangle\] and hence \(0=(\lambda-\mu)\langle u,v\rangle.\) It follows that \(\langle u,v\rangle=0\) since \(\lambda-\mu \neq 0.\)
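
For a concrete check of Lemma 10.52 (illustration only), take the symmetric matrix with eigenvalues \(6\) and \(1\) that also appears in Example 10.59 below; eigenvectors for the two distinct eigenvalues are orthogonal with respect to the standard scalar product:

```python
import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  5.0]])          # symmetric, eigenvalues 6 and 1

u = np.array([1.0, -2.0])             # A @ u = 6 u, so u lies in Eig(6)
v = np.array([2.0,  1.0])             # A @ v = 1 v, so v lies in Eig(1)

print(A @ u, 6 * u)                   # equal: u is an eigenvector for 6
print(A @ v, 1 * v)                   # equal: v is an eigenvector for 1
print(u @ v)                          # 0: the eigenspaces are orthogonal
```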

Recall that a vector space \(V\) is the direct sum of vector subspaces \(U_1,\ldots ,U_k\) of \(V\) if every vector \(v \in V\) can be written uniquely as a sum \(v=u_1+u_2+\cdots+u_k\) with \(u_i \in U_i\) for \(1\leqslant i \leqslant k.\) In this case we write \(V=\bigoplus_{i=1}^k U_i.\) In the presence of an inner product on \(V,\) we may ask that the subspaces \(U_i\) are all orthogonal:

Definition 10.53 • Orthogonal direct sum

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(U_1,\ldots,U_k\) be subspaces of \(V\) such that \(V=\bigoplus_{i=1}^k U_i.\) We say \(V\) is the orthogonal direct sum of the subspaces \(U_1,\ldots ,U_k\) if for all \(i\neq j,\) we have \(\langle u_i,u_j\rangle=0\) for all \(u_i \in U_i\) and for all \(u_j \in U_j.\) In this case we write \[V=\bigoplus_{i=1}^k\!{}^{\perp}\, U_i.\]

Example 10.54

  1. Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(U\subset V\) a subspace. Then \(V\) is the orthogonal direct sum of \(U\) and \(U^{\perp}.\)

  2. Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space and \(\{u_1,\ldots,u_n\}\) an orthogonal basis of \(V.\) Then \(V\) is the orthogonal direct sum of the subspaces \(U_i=\operatorname{span}\{u_i\}\) for \(1\leqslant i\leqslant n.\)

Proposition 10.55

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space and \(f : V \to V\) a self-adjoint endomorphism. Let \(\{\lambda_1,\ldots,\lambda_k\}\) denote the eigenvalues of \(f.\) Then \[V=\bigoplus_{i=1}^k\!{}^{\perp}\, \mathrm{Eig}_f(\lambda_i).\]

Proof. By Proposition 6.46 the eigenspaces of \(f\) are in direct sum and by Lemma 10.52 this direct sum is orthogonal with respect to \(\langle\cdot{,}\cdot\rangle.\) By Theorem 10.48 \(f\) is diagonalisable, hence \[V=\bigoplus_{i=1}^k\!{}^{\perp}\, \mathrm{Eig}_f(\lambda_i).\]

We now obtain the aforementioned geometric description: a self-adjoint endomorphism of a finite dimensional Euclidean space is a linear combination of orthogonal projections onto its eigenspaces.

Proposition 10.56

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a finite dimensional Euclidean space and \(f : V \to V\) a self-adjoint endomorphism with eigenvalues \(\{\lambda_1,\ldots,\lambda_{k}\}.\) Then we have for all \(v \in V\) \[f(v)=\sum_{i=1}^k\lambda_i\Pi^{\perp}_{U_i}(v),\] where we write \(U_i=\operatorname{Eig}_f(\lambda_i).\)

Proof. Let \(g : V \to V\) be the endomorphism defined by the rule \(g(v)=\sum_{i=1}^k\lambda_i\Pi^{\perp}_{U_i}(v)\) for all \(v \in V.\) We want to show that \(f(v)=g(v)\) for all \(v\in V.\) Recall that for an orthogonal projection onto a subspace \(U\subset V\) we have \[\Pi_U^{\perp}(v)=\left\{\begin{array}{cc} v & v \in U, \\ 0_V & v \in U^{\perp}.\end{array}\right.\] Let \(j \in \{1,\ldots,k\}\) and \(v \in U_j=\operatorname{Eig}_f(\lambda_j).\) By Lemma 10.52 we have \(U_j\subset U_i^{\perp}\) for all \(i \in \{1,\ldots,k\}\) with \(j\neq i.\) Therefore, \[g(v)=\sum_{i=1}^k\lambda_i\Pi^{\perp}_{U_i}(v)=\lambda_j\Pi_{U_j}^{\perp}(v)=\lambda_j v=f(v)\] and the two mappings agree on all eigenspaces. Since \(V=\bigoplus_{i=1}^k \operatorname{Eig}_f(\lambda_i),\) the claim follows.
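
Proposition 10.56 can be verified numerically by reconstructing a symmetric matrix from the orthogonal projections onto its eigenspaces. The sketch below is an illustration only: it groups numerically equal eigenvalues in an ad hoc way and uses the fact that, with respect to the standard scalar product, the orthogonal projection onto \(U_i\) has matrix \(\sum_{\vec u} \vec u\,\vec u^T\) over an orthonormal basis of \(U_i\):

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])          # symmetric; eigenvalues 2 (twice) and 4

eigvals, Q = np.linalg.eigh(A)           # orthonormal eigenvectors as columns

reconstruction = np.zeros_like(A)
for lam in np.unique(np.round(eigvals, 8)):
    idx = np.isclose(eigvals, lam)
    U = Q[:, idx]                        # orthonormal basis of Eig(lam)
    P = U @ U.T                          # matrix of the orthogonal projection
    reconstruction += lam * P

assert np.allclose(reconstruction, A)    # A = sum over i of lambda_i * P_i
print(np.round(reconstruction, 10))
```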

10.7 Quadratic forms

Closely related to the notion of a symmetric bilinear form is that of a quadratic form.

Definition 10.57 • Quadratic form

A function \(q : V \to \mathbb{R}\) is called a quadratic form on \(V\) if there exists a symmetric bilinear form \(\langle\cdot{,}\cdot\rangle\) on \(V\) such that \[q(v)=\langle v,v\rangle\] for all \(v \in V.\)

Remark 10.58

  • The adjective quadratic is used since a quadratic form \(q : V\to \mathbb{R}\) is \(2\)-homogeneous, that is, it satisfies \[q(sv)=s^2 q(v)\] for all \(s\in \mathbb{R}\) and \(v \in V.\)

  • By definition, every symmetric bilinear form \(\langle\cdot{,}\cdot\rangle\) on \(V\) gives rise to a quadratic form \(q.\) The mapping \(\langle\cdot{,}\cdot\rangle\mapsto q\) from the set of symmetric bilinear forms into the set of quadratic forms is thus surjective. That this mapping is also injective is a consequence of the so-called polarisation identity \[4\langle v_1,v_2\rangle=\langle v_1+v_2,v_1+v_2\rangle-\langle v_1-v_2,v_1-v_2\rangle\] which holds for all \(v_1,v_2 \in V.\) Written in terms of the quadratic form associated to \(\langle\cdot{,}\cdot\rangle,\) it becomes \[4\langle v_1,v_2\rangle=q(v_1+v_2)-q(v_1-v_2).\] Therefore, if two symmetric bilinear forms define the same quadratic form, then they must agree.
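
The polarisation identity can be tested numerically for the symmetric bilinear form \(\langle\cdot{,}\cdot\rangle_\mathbf{A}\) of a symmetric matrix \(\mathbf{A}\); a minimal sketch with an arbitrarily chosen matrix and random vectors:

```python
import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  5.0]])                    # symmetric: <v, w>_A = v^T A w

def bil(v, w):
    return v @ A @ w                            # the symmetric bilinear form

def q(v):
    return bil(v, v)                            # the associated quadratic form

rng = np.random.default_rng(2)
v1, v2 = rng.standard_normal(2), rng.standard_normal(2)

# Polarisation identity: 4 <v1, v2> = q(v1 + v2) - q(v1 - v2)
print(4 * bil(v1, v2), q(v1 + v2) - q(v1 - v2))  # equal up to rounding
```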

Example 10.59

  1. Consider \(V=\mathbb{R}^2.\) The function \[q :\mathbb{R}^2 \to \mathbb{R},\qquad \vec{v}=\begin{pmatrix} x \\ y \end{pmatrix} \mapsto q(\vec{v})=2x^2-4xy+5y^2\] is a quadratic form. Indeed, we have \(q(\vec{v})=\langle \vec{v},\vec{v}\rangle_\mathbf{A},\) where \[\mathbf{A}=\begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}.\]

  2. Likewise, the function \[q : \mathbb{R}^3 \to \mathbb{R}, \qquad \vec{v}=\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto q(\vec{v})=4xy-6yz+z^2\] is a quadratic form. Indeed, we have \(q(\vec{v})=\langle \vec{v},\vec{v}\rangle_\mathbf{A},\) where \[\mathbf{A}=\begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & -3 \\ 0 & -3 & 1 \end{pmatrix}.\]
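
Both matrix representations above can be confirmed by comparing \(\vec{v}^T\mathbf{A}\vec{v}\) with the stated polynomials at a few sample points; a short numerical sketch (illustration only):

```python
import numpy as np

A2 = np.array([[ 2.0, -2.0],
               [-2.0,  5.0]])
A3 = np.array([[0.0,  2.0, 0.0],
               [2.0,  0.0, -3.0],
               [0.0, -3.0, 1.0]])

def q2(x, y):
    return 2 * x**2 - 4 * x * y + 5 * y**2

def q3(x, y, z):
    return 4 * x * y - 6 * y * z + z**2

rng = np.random.default_rng(3)
for _ in range(3):
    x, y, z = rng.standard_normal(3)
    v2, v3 = np.array([x, y]), np.array([x, y, z])
    print(np.isclose(v2 @ A2 @ v2, q2(x, y)),
          np.isclose(v3 @ A3 @ v3, q3(x, y, z)))   # True True
```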

Applying the spectral theorem (Theorem 10.48), we see that we can "diagonalise" quadratic forms.
Theorem 10.60 • Principal axes theorem

Let \((V,\langle\cdot{,}\cdot\rangle)\) be a Euclidean space of dimension \(n\in \mathbb{N}\) and \(q : V \to \mathbb{R}\) a quadratic form. Then there exists an orthonormal ordered basis \(\mathbf{b}=(v_1,\ldots,v_n)\) of \(V\) with corresponding linear coordinate system \(\boldsymbol{\beta}: V \to \mathbb{R}^n\) and a diagonal matrix \(\mathbf{D}\in M_{n,n}(\mathbb{R})\) such that for all \(v \in V\) \[q(v)=\boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v).\]

Remark 10.61

The lines spanned by the vectors \(v_i\) for \(1\leqslant i\leqslant n\) of the orthonormal basis are known as the principal axes of the quadratic form \(q\). We will explain this terminology below.

Proof of Theorem 10.60. Fix an orthonormal ordered basis \(\mathbf{b}^{\prime}\) of \((V,\langle\cdot{,}\cdot\rangle)\) and let \(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle\) denote the symmetric bilinear form on \(V\) such that \(q(v)=\langle\!\langle v,v\rangle\!\rangle\) for all \(v \in V.\) Let \(\mathbf{A}=\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b}^{\prime})\) and \(f : V \to V\) denote the endomorphism whose matrix representation is \(\mathbf{A}\) with respect to the ordered basis \(\mathbf{b}^{\prime}\) of \(V.\) Since \(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle\) is a symmetric bilinear form, the matrix \(\mathbf{A}\) is symmetric and hence \(f\) is self-adjoint with respect to \(\langle\cdot{,}\cdot\rangle\) by Example 10.46. Theorem 10.48 implies that there exists an orthonormal ordered basis \(\mathbf{b}\) of \((V,\langle\cdot{,}\cdot\rangle)\) consisting of eigenvectors of \(f.\) Let \(\mathbf{D}=\mathbf{M}(f,\mathbf{b},\mathbf{b})\) be the diagonal matrix representation of \(f\) with respect to \(\mathbf{b}.\) From Proposition 9.6 we have for all \(v \in V\) \[\tag{10.4} q(v)=\langle \!\langle v,v\rangle\!\rangle=\boldsymbol{\beta}(v)^T\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b})\boldsymbol{\beta}(v).\] By construction we have \(\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b}^{\prime})=\mathbf{M}(f,\mathbf{b}^{\prime},\mathbf{b}^{\prime}),\) hence Proposition 9.6 gives \[\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b})=\mathbf{C}^T\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b}^{\prime})\mathbf{C}=\mathbf{C}^T\mathbf{M}(f,\mathbf{b}^{\prime},\mathbf{b}^{\prime})\mathbf{C},\] where \(\mathbf{C}=\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime}).\) Since both \(\mathbf{b}^{\prime}\) and \(\mathbf{b}\) are ordered bases that are orthonormal with respect to \(\langle\cdot{,}\cdot\rangle,\) Proposition 10.36 implies that \(\mathbf{C}\) is orthogonal, \(\mathbf{C}^T=\mathbf{C}^{-1}.\) Finally, using Theorem 3.106, we thus obtain \[\tag{10.5} \mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b})=\mathbf{C}^{-1}\mathbf{M}(f,\mathbf{b}^{\prime},\mathbf{b}^{\prime})\mathbf{C}=\mathbf{M}(f,\mathbf{b},\mathbf{b})=\mathbf{D}.\] Combining (10.4) and (10.5), we get \[q(v)=\boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v),\] as claimed.
Example 10.62 (Example 10.59 (1) continued). Here we are in the case where \(V=\mathbb{R}^2\) and \(\langle\cdot{,}\cdot\rangle\) is the standard scalar product. We have \(q(\vec{v})=\langle\!\langle \vec{v},\vec{v}\rangle\!\rangle=\langle \vec{v},\vec{v}\rangle_\mathbf{A}.\) Taking \(\mathbf{b}^{\prime}=\mathbf{e}\) to be the orthonormal standard ordered basis of \(\mathbb{R}^2,\) we get \[\mathbf{M}(\langle\!\langle\cdot{,}\cdot\rangle\!\rangle,\mathbf{b}^{\prime})=\mathbf{A}=\begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}.\] Orthonormal eigenvectors of \(\mathbf{A}\) can be computed to be \[\mathbf{b}=(v_1,v_2)=\left(\begin{pmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}\end{pmatrix},-\begin{pmatrix} \frac{2}{\sqrt{5}}\\ \frac{1}{\sqrt{5}} \end{pmatrix}\right)\] so that \[\mathbf{C}(\mathbf{b},\mathbf{b}^{\prime})=\begin{pmatrix} -\frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix}\] and \[\mathbf{C}^T\mathbf{A}\mathbf{C}=\begin{pmatrix} 6 & 0 \\ 0 & 1 \end{pmatrix}=\mathbf{D}.\] Writing \[\vec{v}=\begin{pmatrix} x \\ y \end{pmatrix} \qquad \text{and} \qquad \boldsymbol{\beta}(\vec{v})=\begin{pmatrix} X(\vec{v}) \\ Y(\vec{v}) \end{pmatrix},\] we obtain \[X(\vec{v})=\begin{pmatrix} x \\ y \end{pmatrix}^T\begin{pmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}\end{pmatrix}=-\frac{x}{\sqrt{5}}+\frac{2y}{\sqrt{5}}\] and \[Y(\vec{v})=-\begin{pmatrix} x \\ y \end{pmatrix}^T\begin{pmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}}\end{pmatrix}=-\frac{2x}{\sqrt{5}}-\frac{y}{\sqrt{5}},\] so that \[q(\vec{v})=2x^2-4xy+5y^2=6X(\vec{v})^2+Y(\vec{v})^2.\]
Figure 10.6: The ellipse defined by the equation \(2x^2-4xy+5y^2=1\) and its principal axes spanned by the orthonormal vectors \(\vec{v}_1\) and \(\vec{v}_2.\)
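
The computation of Example 10.62 can be reproduced numerically. Note that `numpy.linalg.eigh` orders the eigenvalues increasingly and may pick different signs for the eigenvectors, which affects neither \(\mathbf{D}\) (up to reordering) nor the identity \(q=6X^2+Y^2\):

```python
import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  5.0]])

eigvals, Q = np.linalg.eigh(A)            # eigvals = [1, 6], columns of Q orthonormal
print(eigvals)                            # [1. 6.]

rng = np.random.default_rng(4)
x, y = rng.standard_normal(2)
v = np.array([x, y])

beta = Q.T @ v                            # coordinates of v in the eigenbasis
q_original = 2 * x**2 - 4 * x * y + 5 * y**2
q_diagonal = eigvals[0] * beta[0]**2 + eigvals[1] * beta[1]**2

print(q_original, q_diagonal)             # equal up to rounding
```
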
Remark 10.63

Especially in the physics literature it is customary to also use the letters \(x,y\) to denote functions \(\mathbb{R}^2 \to \mathbb{R}\) (and likewise for higher dimensions). The function \(x\) returns the first component of a vector \(\vec{v} \in \mathbb{R}^2\) and \(y\) returns the second component, so that for instance \[x\left(\begin{pmatrix} 2 \\ -4\end{pmatrix}\right)=2\qquad \text{and}\qquad y\left(\begin{pmatrix} 3 \\ 5\end{pmatrix}\right)=5.\] Thinking of \(x,y\) as functions – and doing the same for \(X,Y\) – the quadratic form from the previous example can then be written as (notice that we write \(q\) and not \(q(\vec{v})\)) \[q=2x^2-4xy+5y^2=6X^2+Y^2.\]

Definition 10.64 • Quadric

Let \(q :V \to \mathbb{R}\) be a quadratic form and \(c \in \mathbb{R}.\) A quadric \(Q\) in \(V\) is the set of solutions \(v\in V\) to an equation of the form \(q(v)=c.\)

Example 10.65

The set \[Q=\left\{(x,y) \in \mathbb{R}^2 \,|\, 2x^2-4xy+5y^2=1\right\}\] is a quadric in \(\mathbb{R}^2.\) Written this way it is not immediately clear what the set of solutions looks like. With respect to the new orthonormal basis \(\mathbf{b}=(v_1,v_2)\) provided by the example above, we can however write \(Q\) as \[Q=\left\{\vec{v} \in \mathbb{R}^2 \,|\, 6X(\vec{v})^2+Y(\vec{v})^2=1\right\}\] and we recognise \(Q\) as an ellipse. The \(X\)-axis spanned by \(v_1\) and the \(Y\)-axis spanned by \(v_2\) are symmetry axes of the ellipse and are known as its principal axes, see Figure 10.6.

Remark 10.66

(\(\heartsuit\) - not examinable). Quadratic forms also play an important role in calculus. Let \(f : \mathbb{R}^n \to \mathbb{R}\) be a twice continuously differentiable function. The Hessian matrix of \(f\) at \(\vec{x}=(x_i)_{1\leqslant i\leqslant n} \in \mathbb{R}^n\) is given by \[[\mathbf{H}_f(\vec{x})]_{ij}=\frac{\partial^2 f}{\partial x_i \partial x_j}\] where \(1\leqslant i,j\leqslant n.\) By Schwarz's theorem, this matrix is symmetric and hence for each \(\vec{x} \in \mathbb{R}^n\) we obtain a quadratic form on \(\mathbb{R}^n\) defined by the rule \[q(\vec{h})=\frac{1}{2}\vec{h}^T\mathbf{H}_f(\vec{x})\vec{h}=\frac{1}{2}\langle \vec{h},\vec{h}\rangle_{\mathbf{H}_f(\vec{x})}\] for all \(\vec{h} \in \mathbb{R}^n,\) where \(\langle\cdot{,}\cdot\rangle\) denotes the standard scalar product of \(\mathbb{R}^n.\) The significance of this quadratic form arises from the Taylor approximation of \(f.\) For vectors \(\vec{h} \in \mathbb{R}^n\) of small length we have the approximation \[f(\vec{x}+\vec{h})\approx f(\vec{x})+\langle \nabla f(\vec{x}),\vec{h}\rangle+\frac{1}{2}\langle \vec{h},\vec{h}\rangle_{\mathbf{H}_f(\vec{x})},\] where \(\nabla f(\vec{x})\) denotes the gradient of \(f\) at \(\vec{x}.\) Recall that at a critical point \(\vec{x}\) of \(f\) we have \(\nabla f(\vec{x})=0_{\mathbb{R}^n}\) and hence \[f(\vec{x}+\vec{h})\approx f(\vec{x})+q(\vec{h}).\] In order to decide whether \(f\) admits a local maximum or a local minimum at a critical point, one thus needs to investigate the sign of \(q(\vec{h})\) for all \(\vec{h}.\)
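
(Continuing the not-examinable remark.) As an illustration, consider the hypothetical function \(f(x_1,x_2)=x_1^2-3x_1x_2+4x_2^2,\) whose Hessian is constant; its quadratic form turns out to be positive definite, so the critical point at the origin is a strict local minimum:

```python
import numpy as np

# Hessian of f(x1, x2) = x1^2 - 3*x1*x2 + 4*x2^2 (a hypothetical example);
# since f is itself quadratic, the Hessian is the constant symmetric matrix below.
H = np.array([[ 2.0, -3.0],
              [-3.0,  8.0]])

def q(h):
    return 0.5 * h @ H @ h          # the quadratic form (1/2) <h, h>_H

eigvals = np.linalg.eigvalsh(H)
print(eigvals)                      # both positive, so q is positive definite

# Hence f(0 + h) ~ f(0) + q(h) > f(0) for all small h != 0:
rng = np.random.default_rng(5)
print(all(q(rng.standard_normal(2)) > 0 for _ in range(5)))   # True
```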

The previous remark is one motivation for the following definition:

Definition 10.67

Let \(q : V \to \mathbb{R}\) be a quadratic form on the \(\mathbb{R}\)-vector space \(V.\) Then \(q\) is called

  • positive or positive semi-definite if \(q(v)\geqslant 0\) for all \(v \in V\);

  • positive definite if \(q(v)\geqslant 0\) for all \(v \in V\) and \(q(v)=0\) if and only if \(v=0_V\);

  • negative or negative semi-definite if \(q(v)\leqslant 0\) for all \(v \in V\);

  • negative definite if \(q(v)\leqslant 0\) for all \(v \in V\) and \(q(v)=0\) if and only if \(v=0_V\);

  • indefinite if there exists \(v \in V\) and \(w \in V\) such that \(q(v)<0\) and \(q(w)>0.\)

By the principal axes theorem (Theorem 10.60), we can write a quadratic form \(q : V \to \mathbb{R}\) on a finite dimensional Euclidean space \((V,\langle\cdot{,}\cdot\rangle)\) as \(q(v)=\boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v)\) for all \(v \in V,\) where \(\boldsymbol{\beta}\) is the linear coordinate system of an ordered orthonormal basis \(\mathbf{b}\) of \(V\) and \(\mathbf{D}\) is a diagonal matrix.
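
Numerically, this reduces deciding definiteness to inspecting the signs of the diagonal entries of \(\mathbf{D},\) i.e. the eigenvalues of a representing symmetric matrix. A small sketch (the helper `classify` and its tolerance are ours, chosen for illustration):

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify q(v) = v^T A v for a symmetric matrix A by the signs of the
    diagonal entries of D, i.e. the eigenvalues of A."""
    d = np.linalg.eigvalsh(A)
    if np.all(d > tol):
        return "positive definite"
    if np.all(d < -tol):
        return "negative definite"
    if np.all(d >= -tol):
        return "positive semi-definite"
    if np.all(d <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[2.0, -2.0], [-2.0, 5.0]])))       # positive definite
print(classify(np.array([[0.0,  2.0,  0.0],
                         [2.0,  0.0, -3.0],
                         [0.0, -3.0,  1.0]])))               # indefinite
```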

Exercises

Exercise 10.68

Show the following characterisations:

  1. \(q\) is positive if and only if all diagonal entries of \(\mathbf{D}\) are greater than or equal to zero;

  2. \(q\) is positive definite if and only if all diagonal entries of \(\mathbf{D}\) are positive;

  3. \(q\) is negative if and only if all diagonal entries of \(\mathbf{D}\) are less than or equal to zero;

  4. \(q\) is negative definite if and only if all diagonal entries of \(\mathbf{D}\) are negative;

  5. \(q\) is indefinite if and only if \(\mathbf{D}\) has positive and negative diagonal entries.

Solution

Let \(\mathbf{b}=(v_1,\ldots, v_n)\) be an ordered orthonormal basis of \(V\) such that \(q(v) = \boldsymbol{\beta}(v)^T\mathbf{D}\boldsymbol{\beta}(v)\) for all \(v\in V,\) where \(\mathbf{D}= \operatorname{diag}(d_1,\ldots,d_n).\) Observe that \[q(v_i) = \boldsymbol{\beta}(v_i)^T\mathbf{D}\boldsymbol{\beta}(v_i) = \vec e_i ^T\mathbf{D}\vec e_i=d_i.\]

  1. Assume that \(q\) is positive, i.e. \(q(v)\geqslant 0\) for all \(v\in V.\) By choosing \(v=v_i,\) we obtain \(0\leqslant q(v_i) =d_i\) and hence \(d_i\geqslant 0\) for all \(i=1,\ldots,n.\) Conversely, if \(d_i\geqslant 0\) for all \(i=1,\ldots,n\) and \(v = s_1v_1+\ldots+s_nv_n,\) we compute \[\begin{aligned} q(v) & = \left(\sum_{i=1}^ns_i \vec e_i^T\right)\mathbf{D}\left(\sum_{j=1}^ns_j \vec e_j\right) = \sum_{i,j=1}^ns_is_j[\mathbf{D}]_{ij}=\sum_{i=1}^ns_i^2d_i \geqslant 0. \end{aligned}\]

  2. Assume that \(q\) is positive definite, then \(q(v_i)=d_i>0,\) since \(v_i\ne 0_V\) for all \(i=1,\ldots,n.\) Conversely, the last computation of the item above shows that \[q(v)=\sum_{i=1}^ns_i^2d_i\geqslant 0.\] Since \(d_i>0\) for all \(i\in\{1,\ldots,n\},\) we find \(q(v)=0\) if and only if \(s_i=0\) for all \(i\in\{1,\ldots,n\},\) but then \(v=0_V.\)

  3. Analogous to 1.

  4. Analogous to 2.

  5. Assume that \(q\) is indefinite, i.e. there exist vectors \[v=\sum_{i=1}^n s_i v_i \qquad \text{and} \qquad w=\sum_{j=1}^nt_j v_j\] such that \(q(v)<0\) and \(q(w)>0.\) Since \[q(v) = \sum_{i=1}^ns_i^2d_i<0,\] there must be at least one index \(k\) such that \(d_k<0.\) On the other hand, since \[q(w) = \sum_{j=1}^nt_j^2d_j>0,\] there must be at least one index \(\ell\) such that \(d_{\ell}>0.\) Conversely, if \(\mathbf{D}\) has positive and negative diagonal entries, let \(d_k<0\) and \(d_{\ell}>0\) for some indices \(1\leqslant k\ne \ell\leqslant n,\) then \(q(v_k)=d_k<0\) and \(q(v_{\ell})=d_{\ell} >0\) and hence \(q\) is indefinite.
