A First Course in Linear Algebra
by Robert A. Beezer
Department of Mathematics and Computer Science
University of Puget Sound
Version 2.02

Robert A. Beezer is a Professor of Mathematics at the University of Puget Sound, where he has been on the faculty since 1984. He received a B.S. in Mathematics (with an Emphasis in Computer Science) from the University of Santa Clara in 1978, an M.S. in Statistics from the University of Illinois at Urbana-Champaign in 1982 and a Ph.D. in Mathematics from the University of Illinois at Urbana-Champaign in 1984. He teaches calculus, linear algebra and abstract algebra regularly, while his research interests include the applications of linear algebra to graph theory. His professional website is at http://buzzard.ups.edu.

Edition
Version 2.02. November 19, 2008.

Publisher
Robert A. Beezer
Department of Mathematics and Computer Science
University of Puget Sound
1500 North Warner
Tacoma, Washington 98416-1043
USA

© 2004 by Robert A. Beezer. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the appendix entitled "GNU Free Documentation License". The most recent version of this work can always be found at http://linear.ups.edu.

To my wife, Pat.

Contents

Table of Contents
Contributors
Definitions
Theorems
Notation
Diagrams
Examples
Preface
Acknowledgements

Part C Core

Chapter SLE Systems of Linear Equations
WILA What is Linear Algebra?
  LA "Linear" + "Algebra"
  AA An Application
  READ Reading Questions
  EXC Exercises
  SOL Solutions
SSLE Solving Systems of Linear Equations
  SLE Systems of Linear Equations
  PSS Possibilities for Solution Sets
  ESEO Equivalent Systems and Equation Operations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
RREF Reduced Row-Echelon Form
  MVNSE Matrix and Vector Notation for Systems of Equations
  RO Row Operations
  RREF Reduced Row-Echelon Form
  READ Reading Questions
  EXC Exercises
  SOL Solutions
TSS Types of Solution Sets
  CS Consistent Systems
  FV Free Variables
  READ Reading Questions
  EXC Exercises
  SOL Solutions
HSE Homogeneous Systems of Equations
  SHS Solutions of Homogeneous Systems
  NSM Null Space of a Matrix
  READ Reading Questions
  EXC Exercises
  SOL Solutions
NM Nonsingular Matrices
  NM Nonsingular Matrices
  NSNM Null Space of a Nonsingular Matrix
  READ Reading Questions
  EXC Exercises
  SOL Solutions
SLE Systems of Linear Equations

Chapter V Vectors
VO Vector Operations
  VEASM Vector Equality, Addition, Scalar Multiplication
  VSP Vector Space Properties
  READ Reading Questions
  EXC Exercises
  SOL Solutions
LC Linear Combinations
  LC Linear Combinations
  VFSS Vector Form of Solution Sets
  PSHS Particular Solutions, Homogeneous Solutions
  READ Reading Questions
  EXC Exercises
  SOL Solutions
SS Spanning Sets
  SSV Span of a Set of Vectors
LI Linear Independence
  EXC Exercises
  SOL Solutions
LDS Linear Dependence and Spans
  LDSS Linearly Dependent Sets and Spans
  COV Casting Out Vectors
  READ Reading Questions
  EXC Exercises
  SOL Solutions
O Orthogonality
  CAV Complex Arithmetic and Vectors
  IP Inner Products
  N Norm
  OV Orthogonal Vectors
  GSP Gram-Schmidt Procedure
  READ Reading Questions
  EXC Exercises
  SOL Solutions
V Vectors

Chapter M Matrices
MO Matrix Operations
  MEASM Matrix Equality, Addition, Scalar Multiplication
  VSP Vector Space Properties
  TSM Transposes and Symmetric Matrices
  MCC Matrices and Complex Conjugation
  AM Adjoint of a Matrix
  READ Reading Questions
  EXC Exercises
  SOL Solutions
MM Matrix Multiplication
  MVP Matrix-Vector Product
  MM Matrix Multiplication
  MMEE Matrix Multiplication, Entry-by-Entry
  PMM Properties of Matrix Multiplication
  HM Hermitian Matrices
  READ Reading Questions
  EXC Exercises
  SOL Solutions
MISLE Matrix Inverses and Systems of Linear Equations
  IM Inverse of a Matrix
  CIM Computing the Inverse of a Matrix
MINM Matrix Inverses and Nonsingular Matrices
  SOL Solutions
CRS Column and Row Spaces
  CSSE Column Spaces and Systems of Equations
  CSSOC Column Space Spanned by Original Columns
  CSNM Column Space of a Nonsingular Matrix
  RSM Row Space of a Matrix
  READ Reading Questions
  EXC Exercises
  SOL Solutions
FS Four Subsets
  LNS Left Null Space
  CRS Computing Column Spaces
  EEF Extended Echelon Form
  FS Four Subsets
  READ Reading Questions
  EXC Exercises
  SOL Solutions
M Matrices

Chapter VS Vector Spaces
VS Vector Spaces
  VS Vector Spaces
  EVS Examples of Vector Spaces
  VSP Vector Space Properties
  RD Recycling Definitions
  READ Reading Questions
  EXC Exercises
  SOL Solutions
S Subspaces
  TS Testing Subspaces
  TSS The Span of a Set
  SC Subspace Constructions
  READ Reading Questions
  EXC Exercises
  SOL Solutions
LISS Linear Independence and Spanning Sets
  LI Linear Independence
  SS Spanning Sets
  VR Vector Representation
  READ Reading Questions
  EXC Exercises
B Bases
D Dimension
  D Dimension
  DVS Dimension of Vector Spaces
  RNM Rank and Nullity of a Matrix
  RNNM Rank and Nullity of a Nonsingular Matrix
  READ Reading Questions
  EXC Exercises
  SOL Solutions
PD Properties of Dimension
  GT Goldilocks' Theorem
  RT Ranks and Transposes
  DFS Dimension of Four Subspaces
  DS Direct Sums
  READ Reading Questions
  EXC Exercises
  SOL Solutions
VS Vector Spaces

Chapter D Determinants
DM Determinant of a Matrix
  EM Elementary Matrices
  DD Definition of the Determinant
  CD Computing Determinants
  READ Reading Questions
  EXC Exercises
  SOL Solutions
PDM Properties of Determinants of Matrices
  DRO Determinants and Row Operations
  DROEM Determinants, Row Operations, Elementary Matrices
  DNMMM Determinants, Nonsingular Matrices, Matrix Multiplication
  READ Reading Questions
  EXC Exercises
  SOL Solutions
D Determinants

Chapter E Eigenvalues
EE Eigenvalues and Eigenvectors
  EEM Eigenvalues and Eigenvectors of a Matrix
  PM Polynomials and Matrices
  EEE Existence of Eigenvalues and Eigenvectors
  CEE Computing Eigenvalues and Eigenvectors
  ECEE Examples of Computing Eigenvalues and Eigenvectors
  READ Reading Questions
  EXC Exercises
  SOL Solutions
PEE Properties of Eigenvalues and Eigenvectors
  ME Multiplicities of Eigenvalues
  EHM Eigenvalues of Hermitian Matrices
  READ Reading Questions
  EXC Exercises
  SOL Solutions
SD Similarity and Diagonalization
  SM Similar Matrices
  PSM Properties of Similar Matrices
  D Diagonalization
  FS Fibonacci Sequences
  READ Reading Questions
  EXC Exercises
  SOL Solutions
E Eigenvalues

Chapter LT Linear Transformations
LT Linear Transformations
  LT Linear Transformations
  LTC Linear Transformation Cartoons
  MLT Matrices and Linear Transformations
  LTLC Linear Transformations and Linear Combinations
  PI Pre-Images
  NLTFO New Linear Transformations From Old
  READ Reading Questions
  EXC Exercises
  SOL Solutions
ILT Injective Linear Transformations
  EILT Examples of Injective Linear Transformations
  KLT Kernel of a Linear Transformation
  ILTLI Injective Linear Transformations and Linear Independence
  ILTD Injective Linear Transformations and Dimension
  CILT Composition of Injective Linear Transformations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
SLT Surjective Linear Transformations
  ESLT Examples of Surjective Linear Transformations
  RLT Range of a Linear Transformation
  SSSLT Spanning Sets and Surjective Linear Transformations
  SLTD Surjective Linear Transformations and Dimension
  CSLT Composition of Surjective Linear Transformations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
IVLT Invertible Linear Transformations
  IVLT Invertible Linear Transformations
  IV Invertibility
  SI Structure and Isomorphism
  RNLT Rank and Nullity of a Linear Transformation
  SLELT Systems of Linear Equations and Linear Transformations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
LT Linear Transformations

Chapter R Representations
VR Vector Representations
  CVS Characterization of Vector Spaces
  CP Coordinatization Principle
  READ Reading Questions
  EXC Exercises
  SOL Solutions
MR Matrix Representations
  NRFO New Representations from Old
  PMR Properties of Matrix Representations
  IVLT Invertible Linear Transformations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
CB Change of Basis
  EELT Eigenvalues and Eigenvectors of Linear Transformations
  CBM Change-of-Basis Matrix
  MRS Matrix Representations and Similarity
  CELT Computing Eigenvectors of Linear Transformations
  READ Reading Questions
  EXC Exercises
  SOL Solutions
OD Orthonormal Diagonalization
  TM Triangular Matrices
  UTMR Upper Triangular Matrix Representation
  NM Normal Matrices
  OD Orthonormal Diagonalization
NLT Nilpotent Linear Transformations
  NLT Nilpotent Linear Transformations
  PNLT Properties of Nilpotent Linear Transformations
  CFNLT Canonical Form for Nilpotent Linear Transformations
IS Invariant Subspaces
  IS Invariant Subspaces
  GEE Generalized Eigenvectors and Eigenspaces
  RLT Restrictions of Linear Transformations
JCF Jordan Canonical Form
  GESD Generalized Eigenspace Decomposition
  JCF Jordan Canonical Form
  CHT Cayley-Hamilton Theorem
R Representations

Appendix CN Computation Notes
MMA Mathematica
  ME.MMA Matrix Entry
  RR.MMA Row Reduce
  LS.MMA Linear Solve
  VLC.MMA Vector Linear Combinations
  NS.MMA Null Space
  VFSS.MMA Vector Form of Solution Set
  GSP.MMA Gram-Schmidt Procedure
  TM.MMA Transpose of a Matrix
  MM.MMA Matrix Multiplication
  MI.MMA Matrix Inverse
TI86 Texas Instruments 86
  ME.TI86 Matrix Entry
  RR.TI86 Row Reduce
  VLC.TI86 Vector Linear Combinations
  TM.TI86 Transpose of a Matrix
TI83 Texas Instruments 83
  ME.TI83 Matrix Entry
  RR.TI83 Row Reduce
  VLC.TI83 Vector Linear Combinations
SAGE SAGE: Open Source Mathematics Software
  R.SAGE Rings
  ME.SAGE Matrix Entry
  RR.SAGE Row Reduce
  LS.SAGE Linear Solve
  VLC.SAGE Vector Linear Combinations
  MI.SAGE Matrix Inverse
  TM.SAGE Transpose of a Matrix
  E.SAGE Eigenspaces

Appendix P Preliminaries
CNO Complex Number Operations
  CNA Arithmetic with complex numbers
  CCN Conjugates of Complex Numbers
  MCN Modulus of a Complex Number
SET Sets
  SC Set Cardinality
  SO Set Operations
PT Proof Techniques
  D Definitions
  T Theorems
  L Language
  GS Getting Started
  C Constructive Proofs
  E Equivalences
  N Negation
  CP Contrapositives
  CV Converses
  CD Contradiction
  U Uniqueness
Appendix A Archetypes
  C
  D
  E
  F
  G
  H
  I
  J
  K
  L
  M
  N
  O
  P
  Q
  R
  S
  T
  U
  V
  W
  X

Appendix GFDL GNU Free Documentation License
  1. APPLICABILITY AND DEFINITIONS
  2. VERBATIM COPYING
  3. COPYING IN QUANTITY
  4. MODIFICATIONS
  5. COMBINING DOCUMENTS
  6. COLLECTIONS OF DOCUMENTS
  7. AGGREGATION WITH INDEPENDENT WORKS
  8. TRANSLATION
  9. TERMINATION
  10. FUTURE REVISIONS OF THIS LICENSE
  ADDENDUM: How to use this License for your documents

Part T Topics

F Fields
  F Fields
  FF Finite Fields
  EXC Exercises
  SOL Solutions
T Trace
  EXC Exercises
  SOL Solutions
HP Hadamard Product
  DMHP Diagonal Matrices and the Hadamard Product
  EXC Exercises
VM Vandermonde Matrix
PSM Positive Semi-definite Matrices
  PSM Positive Semi-Definite Matrices
  EXC Exercises

Chapter MD Matrix Decompositions
ROD Rank One Decomposition
TD Triangular Decomposition
  TD Triangular Decomposition
  TDSSE Triangular Decomposition and Solving Systems of Equations
  CTD Computing Triangular Decompositions
SVD Singular Value Decomposition
  MAP Matrix-Adjoint Product
  SVD Singular Value Decomposition
SR Square Roots
  SRM Square Root of a Matrix
POD Polar Decomposition

Part A Applications

CF Curve Fitting
  DF Data Fitting
  EXC Exercises
SAS Sharing A Secret

Contributors

Beezer, David. Bellarmine Preparatory School, Tacoma
Beezer, Robert. University of Puget Sound http://buzzard.ups.edu/
Braithwaite, David. Chicago, Illinois
Bucht, Sara. University of Puget Sound
Canfield, Steve. University of Puget Sound
Hubert, Dupont. Creteil, France
Fellez, Sarah. University of Puget Sound
Fickenscher, Eric. University of Puget Sound
Jackson, Martin. University of Puget Sound http://www.math.ups.edu/~martinj
Hamrick, Mark. St. Louis University
Linenthal, Jacob. University of Puget Sound
Million, Elizabeth. University of Puget Sound
Osborne, Travis. University of Puget Sound
Riegsecker, Joe. Middlebury, Indiana joepye (at) pobox (dot) com
Phelps, Douglas. University of Puget Sound
Shoemaker, Mark. University of Puget Sound
Zimmer, Andy. University of Puget Sound

Definitions

Section WILA
Section SSLE
SLE System of Linear Equations
ESYS Equivalent Systems
EO Equation Operations
Section RREF
M Matrix
CV Column Vector
ZCV Zero Column Vector
CM Coefficient Matrix
VOC Vector of Constants
SOLV Solution Vector
MRLS Matrix Representation of a Linear System
AM Augmented Matrix
RO Row Operations
REM Row-Equivalent Matrices
RREF Reduced Row-Echelon Form
RR Row-Reducing
Section TSS
CS Consistent System
IDV Independent and Dependent Variables
Section HSE
HS Homogeneous System
TSHSE Trivial Solution to Homogeneous Systems of Equations
NSM Null Space of a Matrix
Section NM
SQM Square Matrix
NM Nonsingular Matrix
IM Identity Matrix
Section VO
VSCV Vector Space of Column Vectors
CVE Column Vector Equality
CVA Column Vector Addition
CVSM Column Vector Scalar Multiplication
Section LC
LCCV Linear Combination of Column Vectors
Section SS
SSCV Span of a Set of Column Vectors
Section LI
RLDCV Relation of Linear Dependence for Column Vectors
LICV Linear Independence of Column Vectors
Section LDS
Section O
CCCV Complex Conjugate of a Column Vector
IP Inner Product
NV Norm of a Vector
OV Orthogonal Vectors
OSV Orthogonal Set of Vectors
SUV Standard Unit Vectors
ONS OrthoNormal Set
Section MO
VSM Vector Space of m x n Matrices
ME Matrix Equality
MA Matrix Addition
MSM Matrix Scalar Multiplication
ZM Zero Matrix
TM Transpose of a Matrix
SYM Symmetric Matrix
CCM Complex Conjugate of a Matrix
A Adjoint
Section MM
MVP Matrix-Vector Product
MM Matrix Multiplication
HM Hermitian Matrix
Section MISLE
MI Matrix Inverse
Section MINM
UM Unitary Matrices
Section CRS
CSM Column Space of a Matrix
RSM Row Space of a Matrix
Section FS
LNS Left Null Space
EEF Extended Echelon Form
Section VS
VS Vector Space
Section S
S Subspace
TS Trivial Subspaces
LC Linear Combination
SS Span of a Set
Section LISS
RLD Relation of Linear Dependence
LI Linear Independence
TSVS To Span a Vector Space
Section B
B Basis
Section D
D Dimension
NOM Nullity Of a Matrix
ROM Rank Of a Matrix
Section PD
DS Direct Sum
Section DM
ELEM Elementary Matrices
SM SubMatrix
DM Determinant of a Matrix
Section PDM
Section EE
EEM Eigenvalues and Eigenvectors of a Matrix
CP Characteristic Polynomial
EM Eigenspace of a Matrix
AME Algebraic Multiplicity of an Eigenvalue
GME Geometric Multiplicity of an Eigenvalue
Section PEE
Section SD
SIM Similar Matrices
DIM Diagonal Matrix
DZM Diagonalizable Matrix
Section LT
LT Linear Transformation
PI Pre-Image
LTA Linear Transformation Addition
LTSM Linear Transformation Scalar Multiplication
LTC Linear Transformation Composition
Section ILT
ILT Injective Linear Transformation
KLT Kernel of a Linear Transformation
Section SLT
SLT Surjective Linear Transformation
RLT Range of a Linear Transformation
Section IVLT
IDLT Identity Linear Transformation
IVLT Invertible Linear Transformations
IVS Isomorphic Vector Spaces
ROLT Rank Of a Linear Transformation
NOLT Nullity Of a Linear Transformation
Section VR
VR Vector Representation
Section MR
MR Matrix Representation
Section CB
EELT Eigenvalue and Eigenvector of a Linear Transformation
CBM Change-of-Basis Matrix
Section OD
UTM Upper Triangular Matrix
LTM Lower Triangular Matrix
NRML Normal Matrix
Section NLT
NLT Nilpotent Linear Transformation
JB Jordan Block
Section IS
IS Invariant Subspace
GEV Generalized Eigenvector
GES Generalized Eigenspace
LTR Linear Transformation Restriction
IE Index of an Eigenvalue
Section JCF
JCF Jordan Canonical Form
Section CNO
CNE Complex Number Equality
CNA Complex Number Addition
CNM Complex Number Multiplication
CCN Conjugate of a Complex Number
MCN Modulus of a Complex Number
Section SET
SET Set
SSET Subset
ES Empty Set
SE Set Equality
C Cardinality
SU Set Union
SI Set Intersection
SC Set Complement
Section PT
Section F
F Field
IMP Integers Modulo a Prime
Section T
T Trace
Section HP
HP Hadamard Product
HID Hadamard Identity
HI Hadamard Inverse
Section VM
VM Vandermonde Matrix
Section PSM
PSM Positive Semi-Definite Matrix
Section ROD
Section TD
Section SVD
SV Singular Values
Section SR
SRM Square Root of a Matrix
Section POD
Section CF
LSS Least Squares Solution
Section SAS

Theorems

Section WILA
Section SSLE
EOPSS Equation Operations Preserve Solution Sets
Section RREF
REMES Row-Equivalent Matrices represent Equivalent Systems
REMEF Row-Equivalent Matrix in Echelon Form
RREFU Reduced Row-Echelon Form is Unique
Section TSS
RCLS Recognizing Consistency of a Linear System
ISRN Inconsistent Systems, r and n
CSRN Consistent Systems, r and n
FVCS Free Variables for Consistent Systems
PSSLS Possible Solution Sets for Linear Systems
CMVEI Consistent, More Variables than Equations, Infinite solutions
Section HSE
HSC Homogeneous Systems are Consistent
HMVEI Homogeneous, More Variables than Equations, Infinite solutions
Section NM
NMRRI Nonsingular Matrices Row Reduce to the Identity matrix
NMTNS Nonsingular Matrices have Trivial Null Spaces
NMUS Nonsingular Matrices and Unique Solutions
NME1 Nonsingular Matrix Equivalences, Round 1
Section VO
VSPCV Vector Space Properties of Column Vectors
Section LC
SLSLC Solutions to Linear Systems are Linear Combinations
VFSLS Vector Form of Solutions to Linear Systems
PSPHS Particular Solution Plus Homogeneous Solutions
Section SS
SSNS Spanning Sets for Null Spaces
Section LI
LIVHS Linearly Independent Vectors and Homogeneous Systems
LIVRN Linearly Independent Vectors, r and n
MVSLD More Vectors than Size implies Linear Dependence
NMLIC Nonsingular Matrices have Linearly Independent Columns
NME2 Nonsingular Matrix Equivalences, Round 2
BNS Basis for Null Spaces
Section LDS
DLDS Dependency in Linearly Dependent Sets
BS Basis of a Span
Section O
CRVA Conjugation Respects Vector Addition
CRSM Conjugation Respects Vector Scalar Multiplication
IPVA Inner Product and Vector Addition
IPSM Inner Product and Scalar Multiplication
IPAC Inner Product is Anti-Commutative
IPN Inner Products and Norms
PIP Positive Inner Products
OSLI Orthogonal Sets are Linearly Independent
GSP Gram-Schmidt Procedure
Section MO
VSPM Vector Space Properties of Matrices
SMS Symmetric Matrices are Square
TMA Transpose and Matrix Addition
TMSM Transpose and Matrix Scalar Multiplication
TT Transpose of a Transpose
CRMA Conjugation Respects Matrix Addition
CRMSM Conjugation Respects Matrix Scalar Multiplication
CCM Conjugate of the Conjugate of a Matrix
MCT Matrix Conjugation and Transposes
AMA Adjoint and Matrix Addition
AMSM Adjoint and Matrix Scalar Multiplication
AA Adjoint of an Adjoint
Section MM
SLEMM Systems of Linear Equations as Matrix Multiplication
EMMVP Equal Matrices and Matrix-Vector Products
EMP Entries of Matrix Products
MMZM Matrix Multiplication and the Zero Matrix
MMIM Matrix Multiplication and Identity Matrix
MMDAA Matrix Multiplication Distributes Across Addition
MMSMM Matrix Multiplication and Scalar Matrix Multiplication
MMA Matrix Multiplication is Associative
MMIP Matrix Multiplication and Inner Products
MMCC Matrix Multiplication and Complex Conjugation
MMT Matrix Multiplication and Transposes
MMAD Matrix Multiplication and Adjoints
AIP Adjoint and Inner Product
HMIP Hermitian Matrices and Inner Products
Section MISLE
TTMI Two-by-Two Matrix Inverse
CINM Computing the Inverse of a Nonsingular Matrix
MIU Matrix Inverse is Unique
SS Socks and Shoes
MIMI Matrix Inverse of a Matrix Inverse
MIT Matrix Inverse of a Transpose
MISM Matrix Inverse of a Scalar Multiple
Section MINM
NPNT Nonsingular Product has Nonsingular Terms
OSIS One-Sided Inverse is Sufficient
NI Nonsingularity is Invertibility
NME3 Nonsingular Matrix Equivalences, Round 3
SNCM Solution with Nonsingular Coefficient Matrix
UMI Unitary Matrices are Invertible
CUMOS Columns of Unitary Matrices are Orthonormal Sets
UMPIP Unitary Matrices Preserve Inner Products
Section CRS
CSCS Column Spaces and Consistent Systems
BCS Basis of the Column Space
CSNM Column Space of a Nonsingular Matrix
NME4 Nonsingular Matrix Equivalences, Round 4
REMRS Row-Equivalent Matrices have equal Row Spaces
BRS Basis for the Row Space
CSRST Column Space, Row Space, Transpose
Section FS
PEEF Properties of Extended Echelon Form
FS Four Subsets
Section VS
ZVU Zero Vector is Unique
AIU Additive Inverses are Unique
ZSSM Zero Scalar in Scalar Multiplication
ZVSM Zero Vector in Scalar Multiplication
AISM Additive Inverses from Scalar Multiplication
SMEZV Scalar Multiplication Equals the Zero Vector
Section S
TSS Testing Subsets for Subspaces
NSMS Null Space of a Matrix is a Subspace
SSS Span of a Set is a Subspace
CSMS Column Space of a Matrix is a Subspace
RSMS Row Space of a Matrix is a Subspace
LNSMS Left Null Space of a Matrix is a Subspace
Section LISS
VRRB Vector Representation Relative to a Basis
Section B
SUVB Standard Unit Vectors are a Basis
CNMB Columns of Nonsingular Matrix are a Basis
NME5 Nonsingular Matrix Equivalences, Round 5
COB Coordinates and Orthonormal Bases
UMCOB Unitary Matrices Convert Orthonormal Bases
Section D
SSLD Spanning Sets and Linear Dependence
BIS Bases have Identical Sizes
DCM Dimension of Cm
DP Dimension of Pn
DM Dimension of Mmn
CRN Computing Rank and Nullity
RPNC Rank Plus Nullity is Columns
RNNM Rank and Nullity of a Nonsingular Matrix
NME6 Nonsingular Matrix Equivalences, Round 6
Section PD
ELIS Extending Linearly Independent Sets
G Goldilocks
PSSD Proper Subspaces have Smaller Dimension
EDYES Equal Dimensions Yields Equal Subspaces
RMRT Rank of a Matrix is the Rank of the Transpose
DFS Dimensions of Four Subspaces
DSFB Direct Sum From a Basis
DSFOS Direct Sum From One Subspace
DSZV Direct Sums and Zero Vectors
DSZI Direct Sums and Zero Intersection
DSLI Direct Sums and Linear Independence
DSD Direct Sums and Dimension
RDS Repeated Direct Sums
Section DM
EMDRO Elementary Matrices Do Row Operations
EMN Elementary Matrices are Nonsingular
NMPEM Nonsingular Matrices are Products of Elementary Matrices
DMST Determinant of Matrices of Size Two
DER Determinant Expansion about Rows
DT Determinant of the Transpose
DEC Determinant Expansion about Columns
Section PDM
DZRC Determinant with Zero Row or Column
DRCS Determinant for Row or Column Swap
DRCM Determinant for Row or Column Multiples
DERC Determinant with Equal Rows or Columns
DRCMA Determinant for Row or Column Multiples and Addition
DIM Determinant of the Identity Matrix
DEM Determinants of Elementary Matrices
DEMMM Determinants, Elementary Matrices, Matrix Multiplication
SMZD Singular Matrices have Zero Determinants
NME7 Nonsingular Matrix Equivalences, Round 7
DRMM Determinant Respects Matrix Multiplication
Section EE
EMHE Every Matrix Has an Eigenvalue
EMRCP Eigenvalues of a Matrix are Roots of Characteristic Polynomials
EMS Eigenspace for a Matrix is a Subspace
EMNS Eigenspace of a Matrix is a Null Space
Section PEE
EDELI Eigenvectors with Distinct Eigenvalues are Linearly Independent
SMZE Singular Matrices have Zero Eigenvalues
NME8 Nonsingular Matrix Equivalences, Round 8
ESMM Eigenvalues of a Scalar Multiple of a Matrix
EOMP Eigenvalues Of Matrix Powers
EPM Eigenvalues of the Polynomial of a Matrix
EIM Eigenvalues of the Inverse of a Matrix
ETM Eigenvalues of the Transpose of a Matrix
ERMCP Eigenvalues of Real Matrices come in Conjugate Pairs
DCP Degree of the Characteristic Polynomial
NEM Number of Eigenvalues of a Matrix
ME Multiplicities of an Eigenvalue
MNEM Maximum Number of Eigenvalues of a Matrix
HMRE Hermitian Matrices have Real Eigenvalues
HMOE Hermitian Matrices have Orthogonal Eigenvectors
Section SD
SER Similarity is an Equivalence Relation
SMEE Similar Matrices have Equal Eigenvalues
DC Diagonalization Characterization
DMFE Diagonalizable Matrices have Full Eigenspaces
DED Distinct Eigenvalues implies Diagonalizable
Section LT
LTTZZ Linear Transformations Take Zero to Zero
MBLT Matrices Build Linear Transformations
MLTCV Matrix of a Linear Transformation, Column Vectors
LTLC Linear Transformations and Linear Combinations
LTDB Linear Transformation Defined on a Basis
SLTLT Sum of Linear Transformations is a Linear Transformation
MLTLT Multiple of a Linear Transformation is a Linear Transformation
VSLT Vector Space of Linear Transformations
CLTLT Composition of Linear Transformations is a Linear Transformation
Section ILT
KLTS Kernel of a Linear Transformation is a Subspace
KPI Kernel and Pre-Image
KILT Kernel of an Injective Linear Transformation
ILTLI Injective Linear Transformations and Linear Independence
ILTB Injective Linear Transformations and Bases
ILTD Injective Linear Transformations and Dimension
CILTI Composition of Injective Linear Transformations is Injective
Section SLT
RLTS Range of a Linear Transformation is a Subspace
RSLT Range of a Surjective Linear Transformation
SSRLT Spanning Set for Range of a Linear Transformation
RPI Range and Pre-Image
SLTB Surjective Linear Transformations and Bases
SLTD Surjective Linear Transformations and Dimension
CSLTS Composition of Surjective Linear Transformations is Surjective
Section IVLT
ILTLT Inverse of a Linear Transformation is a Linear Transformation
IILT Inverse of an Invertible Linear Transformation
ILTIS Invertible Linear Transformations are Injective and Surjective
CIVLT Composition of Invertible Linear Transformations
ICLT Inverse of a Composition of Linear Transformations
IVSED Isomorphic Vector Spaces have Equal Dimension
ROSLT Rank Of a Surjective Linear Transformation
NOILT Nullity Of an Injective Linear Transformation
RPNDD Rank Plus Nullity is Domain Dimension
Section VR
VRLT Vector Representation is a Linear Transformation
VRI Vector Representation is Injective
VRS Vector Representation is Surjective
VRILT Vector Representation is an Invertible Linear Transformation
CFDVS Characterization of Finite Dimensional Vector Spaces
IFDVS Isomorphism of Finite Dimensional Vector Spaces
CLI Coordinatization and Linear Independence
CSS Coordinatization and Spanning Sets
Section MR
FTMR Fundamental Theorem of Matrix Representation
MRSLT Matrix Representation of a Sum of Linear Transformations
MRMLT Matrix Representation of a Multiple of a Linear Transformation
MRCLT Matrix Representation of a Composition of Linear Transformations
KNSI Kernel and Null Space Isomorphism
RCSI Range and Column Space Isomorphism
IMR Invertible Matrix Representations
IMILT Invertible Matrices, Invertible Linear Transformation
NME9 Nonsingular Matrix Equivalences, Round 9
Section CB
CB Change-of-Basis
ICBM Inverse of Change-of-Basis Matrix
MRCB Matrix Representation and Change of Basis
SCB Similarity and Change of Basis
EER Eigenvalues, Eigenvectors, Representations
Section OD
PTMT Product of Triangular Matrices is Triangular
ITMT Inverse of a Triangular Matrix is Triangular
UTMR Upper Triangular Matrix Representation
OBUTR Orthonormal Basis for Upper Triangular Representation
OD Orthonormal Diagonalization
OBNM Orthonormal Bases and Normal Matrices
Section NLT
NJB Nilpotent Jordan Blocks
ENLT Eigenvalues of Nilpotent Linear Transformations
DNLT Diagonalizable Nilpotent Linear Transformations
KPLT Kernels of Powers of Linear Transformations
KPNLT Kernels of Powers of Nilpotent Linear Transformations
CFNLT Canonical Form for Nilpotent Linear Transformations
Section IS
EIS Eigenspaces are Invariant Subspaces
KPIS Kernels of Powers are Invariant Subspaces
GESIS Generalized Eigenspace is an Invariant Subspace
GEK Generalized Eigenspace as a Kernel
RGEN Restriction to Generalized Eigenspace is Nilpotent
MRRGE Matrix Representation of a Restriction to a Generalized Eigenspace
Section JCF
GESD Generalized Eigenspace Decomposition
DGES Dimension of Generalized Eigenspaces
JCFLT Jordan Canonical Form for a Linear Transformation
CHT Cayley-Hamilton Theorem
Section CNO
PCNA Properties of Complex Number Arithmetic
CCRA Complex Conjugation Respects Addition
CCRM Complex Conjugation Respects Multiplication
CCT Complex Conjugation Twice
Section SET
Section PT
Section F
FIMP Field of Integers Modulo a Prime
Section T
TL Trace is Linear
TSRM Trace is Symmetric with Respect to Multiplication
TIST Trace is Invariant Under Similarity Transformations
TSE Trace is the Sum of the Eigenvalues
Section HP
HPC Hadamard Product is Commutative
HPHID Hadamard Product with the Hadamard Identity
HPHI Hadamard Product with Hadamard Inverses
HPDAA Hadamard Product Distributes Across Addition
HPSMM Hadamard Product and Scalar Matrix Multiplication
DMHP Diagonalizable Matrices and the Hadamard Product
DMMP Diagonal Matrices and Matrix Products
Section VM
DVM Determinant of a Vandermonde Matrix
NVM Nonsingular Vandermonde Matrix
Section PSM
CPSM Creating Positive Semi-Definite Matrices
EPSM Eigenvalues of Positive Semi-definite Matrices
Section ROD
ROD Rank One Decomposition
Section TD
TD Triangular Decomposition
TDEE Triangular Decomposition, Entry by Entry
Section SVD
EEMAP Eigenvalues and Eigenvectors of Matrix-Adjoint Product
SVD Singular Value Decomposition
Section SR
PSMSR Positive Semi-Definite Matrices and Square Roots
EESR Eigenvalues and Eigenspaces of a Square Root
USR Unique Square Root
Section POD
PDM Polar Decomposition of a Matrix
Section CF
IP Interpolating Polynomial
LSMR Least Squares Minimizes Residuals
Section SAS Version 2.02  Notation M A: Matrix.................................................... 24 MC [A]~: Matrix Components......................................... 24 CV v: Column Vector............................................... 25 CVC [v]i: Column Vector Components..................................... 25 ZCV 0: Zero Column Vector........................................... 25 MRLS [CS(A, b): Matrix Representation of a Linear System....................... 26 AM [A b] : Augmented Matrix......................................... 27 RO Ri HRR, oi, oi + R3: Row Operations............................... 28 RREFA r, D, F: Reduced Row-Echelon Form Analysis............................ 30 NSM P1(A): Null Space of a Matrix....................................... 64 IM Im: Identity Matrix.............................................. 72 VSCV Cm: Vector Space of Column Vectors.................................. 83 CVE u= v: Column Vector Equality...................................... 84 CVA u + v: Column Vector Addition...................................... 85 CVSM cau: Column Vector Scalar Multiplication............................... 85 SSV (S): Span of a Set of Vectors....................................... 112 CCCV ui: Complex Conjugate of a Column Vector.............................. 167 IP (u, v) : Inner Product............................................ 168 NV v: Norm of a Vector........................................... 171 SUV ei: Standard Unit Vectors......................................... 173 VSM Mmm: Vector Space of Matrices..................................... 182 ME A= B: Matrix Equality........................................... 182 MA A + B: Matrix Addition.......................................... 183 MSM a A: Matrix Scalar Multiplication.................................... 183 ZM 0: Zero Matrix................................................ 185 TM At: Transpose of a Matrix......................................... 185 CCM A: Complex Conjugate of a Matrix................................... 187 A A*: Adjoint................................................... 189 MVP Au: Matrix-Vector Product........................................ 194 MI A-1: Matrix Inverse............................................. 213 CSM C(A): Column Space of a Matrix..................................... 236 RSM 7Z(A) : Row Space of a Matrix...................................... 243 LN [(1 /A):rLeftTNl Sae...............................27 xxx  NOTATION xxxi DM AME GME LT KLT RLT ROLT NOLT VR MR JB GES LTR IE CNE CNA CNM CCN SETM SSET ES SE C SU SI SC T HP HID HI SRM det (A), |AI: Determinant of a Matrix. .... aA (A): Algebraic Multiplicity of an Eigenvalue . 7YA (A): Geometric Multiplicity of an Eigenvalue T: U i V: Linear Transformation.. . . C(T): Kernel of a Linear Transformation R(T): Range of a Linear Transformation r (T): Rank of a Linear Transformation n (T): Nullity of a Linear Transformation PB (w): Vector Representation .. ... MBc: Matrix Representation........ J (A): Jordan Block............. gT (A): Generalized Eigenspace.... . . T u: Linear Transformation Restriction . tT (A): Index of an Eigenvalue....... a =,3: Complex Number Equality. ... a + 3: Complex Number Addition. ... a,3: Complex Number Multiplication . c: Conjugate of a Complex Number . . x E S: Set Membership ..... .... S C T: Subset................. 0: Empty Set................ . . S = T: Set Equality ....... ... |S|: Cardinality................ S U T: Set Union............... S n T: Set Intersection............ 
S: Set Complement.............. t (A): Trace................. . . A o B: Hadamard Product.......... Jmn: Hadamard Identity........... A: Hadamard Inverse ...... ... A1/2: Square Root of a Matrix .. ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 406 406 452 481 496 517 517 530 542 612 631 635 641 680 680 680 681 683 683 683 684 684 685 685 685 802 808 809 809 843 Version 2.02  Diagrams DTSLS Decision Tree for Solving Linear Systems.... .. ...................56 CSRST Column Space and Row Space Techniques. ... .. ..................271 DLTA Definition of Linear Transformation, Additive. .. ...................453 DLTM Definition of Linear Transformation, Multiplicative. . .................. 453 GLT General Linear Transformation ....... ... .. ...................457 NILT Non-Injective Linear Transformation. .... ... .. ................. 478 ILT Injective Linear Transformation...... ... .. . .................480 FTMR Fundamental Theorem of Matrix Representations. .. ................. 545 FTMRA Fundamental Theorem of Matrix Representations (Alternate).. ............ 546 MRCLT Matrix Representation and Composition of Linear Transformations.. ......... 552 xxxii  Examples Section WILA TMP Trail Mix Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section SSLE STNE Solving two (nonlinear) equations . . . . . . . NSE Notation for a system of equations . . . . . . . TTS Three typical systems . . . . . . . . . . . . . . US Three equations, one solution . . . . . . . . . IS Three equations, infinitely many solutions Section RREF AM A matrix . . . . . . . . . . . . . . . . . . . . . NSLE Notation for systems of linear equations . . . . AMAA Augmented matrix for Archetype A . . . . . . TREM Two row-equivalent matrices . . . . . . . . . . USR Three equations, one solution, reprised . . . . RREF A matrix in reduced row-echelon form . . . . . NRREF A matrix not in reduced row-echelon form . . SAB Solutions for Archetype B . . . . . . . . . . . SAA Solutions for Archetype A . . . . . . . . . . . SAE Solutions for Archetype E . . . . . . . . . . . Section TSS RREFN Reduced row-echelon form notation . . . . . . ISSI Describing infinite solution sets, Archetype I FDV Free and dependent variables . . . . . . . . . . CFV Counting free variables . . . . . . . . . . . . . OSGMD One solution gives many, Archetype D . . . . Section HSE AHSAC Archetype C as a homogeneous system . . . . 
HUSAB Homogeneous, unique solution, Archetype B HISAA Homogeneous, infinite solutions, Archetype A HISAD Homogeneous, infinite solutions, Archetype D NSEAI Null space elements of Archetype I . . . . . . CNS1 Computing a null space, #1 . . . . . . . . . . CNS2 Computing a null space, #2 . . . . . . . . . . 3 9 10 10 14 15 24 26 27 28 29 30 30 36 37 38 50 51 52 55 56 62 63 63 63 65 65 66 Section NM xxxiii  EXAMPLES xxxiv S NM IM SRR NSR NSS NSNM Section VC VESE VA CVSM Section LC TLC ABLC AALC VFSAD VFS VFSAI VFSAL PSHS Section SS ABS SCAA SCAB SSNS NSDS SCAD Section LI LDS LIS LIHS LDHS LDRN LLDS LDCAA LICAB A singular matrix, Archetype A . . . . . . . . . . . . . A nonsingular matrix, Archetype B . . . . . . . . . . . An identity matrix . . . . . . . . . . . . . . . . . . . . Singular matrix, row-reduced . . . . . . . . . . . . . . . Nonsingular matrix, row-reduced . . . . . . . . . . . . . Null space of a singular matrix . . . . . . . . . . . . . . Null space of a nonsingular matrix . . . . . . . . . . . . Vector equality for a system of equations . . . . . . . . Addition of two vectors in C4 . . . . . . . . . . . . . . Scalar multiplication in C5 . . . . . . . . . . . . . . . . Two linear combinations in C6. . . . . . . . . . . . . . . Archetype B as a linear combination . . . . . . . . . . Archetype A as a linear combination . . . . . . . . . . Vector form of solutions for Archetype D . . . . . . . . Vector form of solutions . . . . . . . . . . . . . . . . . . Vector form of solutions for Archetype I . . . . . . . . . Vector form of solutions for Archetype L . . . . . . . . Particular solutions, homogeneous solutions, Archetype A basic span . . . . . . . . . . . . . . . . . . . . . . . . Span of the columns of Archetype A . . . . . . . . . . . Span of the columns of Archetype B . . . . . . . . . . . Spanning set of a null space . . . . . . . . . . . . . . . Null space directly as a span . . . . . . . . . . . . . . . Span of the columns of Archetype D . . . . . . . . . . . Linearly dependent set in C5 . . . . . . . . . . . . . . . Linearly independent set in C5 . . . . . . . . . . . . . . Linearly independent, homogeneous system . . . . . . . Linearly dependent, homogeneous system . . . . . . . . Linearly dependent, r < n . . . . . . . . . . . . . . . . Large linearly dependent set in C4 . . . . . . . . . . . . Linearly dependent columns in Archetype A . . . . . . Linearly independent columns in Archetype B. .. ... 71 72 72 73 73 73 74 84 85 86 D 90 91 92 95 96 102 103 106 112 114 116 118 119 120 132 133 134 135 136 136 137 137 138 140 153 154 159 159 LINSB Linear independence of null space basis . . . . . . . . . . . . NSLIL Null space spanned by linearly independent set, Archetype L Section LDS RSC5 Reducing a span in C5 . . . . . . . . . . . . . . . . . . . . . COV Casting out vectors . . . . . . . . . . . . . . . . . . . . . . . RSC4 Reducing a span in C4 . . . . . . . . . . . . . . . . . . . . . RES Reworking elements of a span . . . . . . . . . . . . . . . . . Section 0 Version 2.02  EXAMPLES xxxv CSIP CNSV TOV SUVOS AOS GSTV ONTV ONFV Computing some inner products . . . . . Computing the norm of some vectors . . Two orthogonal vectors . . . . . . . . . . Standard Unit Vectors are an Orthogonal An orthogonal set . . . . . . . . . . . . . Gram-Schmidt of three vectors . . . . . . Orthonormal set, three vectors . . . . . . Orthonormal set, four vectors . . . . . . Set Section MO MA Addition of two matrices in M23 . . . . . . . . . MSM Scalar multiplication in M32 . . . . . . . . . . 
. TM Transpose of a 3 x 4 matrix . . . . . . . . . . . SYM A symmetric 5 x 5 matrix . . . . . . . . . . . . CCM Complex conjugate of a matrix . . . . . . . . . . Section MM MTV A matrix times a vector . . . . . . . . . . . . . . MNSLE Matrix notation for systems of linear equations . MBC Money's best cities . . . . . . . . . . . . . . . . PTM Product of two matrices . . . . . . . . . . . . . MMNC Matrix multiplication is not commutative . . . . PTMEE Product of two matrices, entry-by-entry . . . . . Section MISLE SABMI Solutions to Archetype B with a matrix inverse MWIAA A matrix without an inverse, Archetype A . . . MI Matrix inverse . . . . . . . . . . . . . . . . . . . CMI Computing a matrix inverse . . . . . . . . . . . CMIAB Computing a matrix inverse, Archetype B . . . 168 171 172 173 173 176 177 178 183 183 185 186 187 194 195 195 197 198 199 212 213 214 216 218 229 229 231 236 237 239 240 241 241 243 245 246 247 257 Section UM3 UPM OSMC MINM Unitary matrix of size 3 . . . . . . . . . . . . . . Unitary permutation matrix . . . . . . . . . . . Orthonormal set from matrix columns . . . . . . Section CRS CSMCS Column space of a matrix and consistent systems MCSM Membership in the column space of a matrix . . . CSTW Column space, two ways . . . . . . . . . . . . . . CSOCD Column space, original columns, Archetype D . . CSAA Column space of Archetype A . . . . . . . . . . . CSAB Column space of Archetype B . . . . . . . . . . . RSAI Row space of Archetype I . . . . . . . . . . . . . . RSREM Row spaces of two row-equivalent matrices . . . . IAS Improving a span . . . . . . . . . . . . . . . . . . CSROI Column space from row operations, Archetype I Section FS LNS Left null space . . . . . . . . . . . . . . . . . . . . Version 2.02  EXAMPLES xxxvi CSANS SEEF FS1 FS2 FSAG Section VS VSCV VSM VSP VSIS VSF VSS CVS PCVS Section S SC3 SP4 NSC2Z NSC2A NSC2S RSNS LCM SSP SM32 Section LI LIP4 LIM32 LIC SSP4 SSM22 SSC AVR Section B BP BM BSP4 BSM22 BC RSB RS CABAK CROB4 CROB3 Section D LDP4 Column space as null space . . . . . . . . . Submatrices of extended echelon form . . . Four subsets, #1 . . . . . . . . . . . . . . . Four subsets, #2 . . . . . . . . . . . . . . . Four subsets, Archetype G . . . . . . . . . The vector space Cm. . . . . . . . . . . . . The vector space of matrices, Mmmn . . . . The vector space of polynomials, Pn . . . . The vector space of infinite sequences . . . The vector space of functions . . . . . . . . The singleton vector space . . . . . . . . . The crazy vector space . . . . . . . . . . . Properties for the Crazy Vector Space . . . A subspace of C3 . . . . . . . . . . . . . . A subspace of P4 . . . . . . . . . . . . . . A non-subspace in C2, zero vector . . . . . A non-subspace in C2, additive closure . . A non-subspace in C2, scalar multiplication Recasting a subspace as a null space . . . . A linear combination of matrices . . . . . . Span of a set of polynomials . . . . . . . . A subspace of M32 . . . . . . . . . . . . . . SS closure 258 261 267 268 269 281 281 281 282 282 283 283 288 292 294 295 295 296 297 297 299 300 308 310 312 313 314 315 316 326 326 326 327 328 328 329 330 332 333 344 Linear independence in P4 . . . . . . . . . . . . . . . Linear independence in M32 . . . . . . . . . . . . . . Linearly independent set in the crazy vector space . . Spanning set in P4 . . . . . . . . . . . . . . . . . . . . Spanning set in M22. . . . . . . . . . . . . . . . . . . Spanning set in the crazy vector space . . . . . . . . A vector representation . . . . . . . . . . . 
. . . . . . Bases for P . . . . . . . . . . . . . . . . . . . . . . . A basis for the vector space of matrices . . . . . . . . A basis for a subspace of P4 . . . . . . . . . . . . . . A basis for a subspace of M22 . . . . . . . . . . . . . Basis for the crazy vector space . . . . . . . . . . . . Row space basis . . . . . . . . . . . . . . . . . . . . . Reducing a span . . . . . . . . . . . . . . . . . . . . . Columns as Basis, Archetype K . . . . . . . . . . . . Coordinatization relative to an orthonormal basis, C4 Coordinatization relative to an orthonormal basis, C3 Linearly dependent set in P4 . . . . . . . . . . . . . . Version 2.02  EXAMPLES xxxvii DSM22 Dimension of a subspace of M2l22. DSP4 Dimension of a subspace of P4 . . . . . . . . . . . . DC Dimension of the crazy vector space . . . . . . . . . VSPUD Vector space of polynomials with unbounded degree RNM Rank and nullity of a matrix . . . . . . . . . . . . . . . . . . RNSM Rank and nullity of a square matrix . . . . . . . . . Section PD BPR Bases for Ps, reprised . . . . . . . . . . . . . . . . . BDM22 Basis by dimension in M22 . . . . . . . . . . . . . SVP4 Sets of vectors in P4 . . . . . . . . . . . . . . . . . . RRTI Rank, rank of transpose, Archetype I . . . . . . . . SDS Simple direct sum . . . . . . . . . . . . . . . . . . . Section DM EMRO Elementary matrices and row operations . . . . . . SS Some submatrices . . . . . . . . . . . . . . . . . . . D33M Determinant of a 3 x 3 matrix . . . . . . . . . . . . TCSD Two computations, same determinant . . . . . . . . DUTM Determinant of an upper triangular matrix . . . . . . Section PDM DRO Determinant by row operations . . . . . . . ZNDAB Zero and nonzero determinant, Archetypes A . . . . . . . . . . . . . . . . . . . . . . . and B . . . . . . . . . . . . . . . . . . . 345 346 346 346 347 348 356 357 357 359 361 371 375 375 379 379 386 390 396 398 401 403 404 405 406 407 408 409 411 422 432 433 435 435 437 440 440 441 Section EE SEE Some eigenvalues and eigenvectors . . . . . . . PM Polynomial of a matrix . . . . . . . . . . . . . CAEHW Computing an eigenvalue the hard way . . . . CPMS3 Characteristic polynomial of a matrix, size 3 EMS3 Eigenvalues of a matrix, size 3 . . . . . . . . . ESMS3 Eigenspaces of a matrix, size 3 . . . . . . . . . EMMS4 Eigenvalue multiplicities, matrix of size 4 . . . ESMS4 Eigenvalues, symmetric matrix of size 4 . . . . HMEM5 High multiplicity eigenvalues, matrix of size 5 CEMS6 Complex eigenvalues, matrix of size 6 . . . . . DEMS5 Distinct eigenvalues, matrix of size 5 . . . . . Section PEE BDE Building desired eigenvalues . . . . . . . . . . Section SD SMS5 Similar matrices of size 5 . . . . . . . . . . . . SMS3 Similar matrices of size 3 . . . . . . . . . . . . EENS Equal eigenvalues, not similar . . . . . . . . . DAB Diagonalization of Archetype B . . . . . . . . DMS3 Diagonalizing a matrix of size 3 . . . . . . . . NDMS4 A non-diagonalizable matrix of size 4 . . . . . DEHD Distinct eigenvalues, hence diagonalizable . . . HPDM High power of a diagonalizable matrix . . . . . Version 2.02  EXAMPLES xxxviii EXAMPLES xxxviii FSCF Fibonacci sequence, closed form . . . . . . . . . . . Section LT ALT A linear transformation . . . . . . . . . . . . . . . . NLT Not a linear transformation . . . . . . . . . . . . . . LTPM Linear transformation, polynomials to matrices . . LTPP Linear transformation, polynomials to polynomials LTM Linear transformation from a matrix . . . . . . . . MFLT Matrix from a linear transformation . . . . . . . . . 
MOLT Matrix of a linear transformation . . . . . . . . . . LTDB1 Linear transformation defined on a basis . . . . . . LTDB2 Linear transformation defined on a basis . . . . . . LTDB3 Linear transformation defined on a basis . . . . . . SPIAS Sample pre-images, Archetype S . . . . . . . . . . . STLT Sum of two linear transformations . . . . . . . . . . SMLT Scalar multiple of a linear transformation . . . . . . CTLT Composition of two linear transformations . . . . . Section ILT NIAQ Not injective, Archetype Q . . . . . . . . . . . . . . IAR Injective, Archetype R . . . . . . . . . . . . . . . . IAV Injective, Archetype V . . . . . . . . . . . . . . . . NKAO Nontrivial kernel, Archetype 0 . . . . . . . . . . . . TKAP Trivial kernel, Archetype P . . . . . . . . . . . . . . NIAQR Not injective, Archetype Q, revisited . . . . . . . . NIAO Not injective, Archetype 0 . . . . . . . . . . . . . . IAP Injective, Archetype P . . . . . . . . . . . . . . . . NIDAU Not injective by dimension, Archetype U . . . . . . 442 453 454 455 455 457 459 461 463 464 464 465 468 469 470 477 478 480 481 482 484 485 485 486 492 493 494 496 497 499 499 500 501 502 508 509 512 515 531 533 Section SL NSAQ SAR SAV RAO FRAN NSAQR NSAO SAN T Not surjective, Archetype Q Surjective, Archetype R . . . Surjective, Archetype V . . . Range, Archetype 0 . . . . . Full range, Archetype N . . Not surjective, Archetype Q, Not surjective, Archetype 0 Surjective, Archetype N . . . revisited BRLT A basis for the range of a linear transformation . . NSDAT Not surjective by dimension, Archetype T . . . . . Section IVLT AIVLT An invertible linear transformation . . . . . . . . . ANILT A non-invertible linear transformation . . . . . . . . CIVLT Computing the Inverse of a Linear Transformations IVSAV Isomorphic vector spaces, Archetype V . . . . . . . Section VR VRC4 Vector representation in C4 . . . . . . . . . . . . . . VRP2 Vector representations in P2 . . . . . . . . . . . . . Version 2.02  EXAMPLES xxxix TIVS CVSR ASC MIVS CP2 CM32 Two isomorphic vector spaces . . . . . . . . . . Crazy vector space revealed . . . . . . . . . . . A subspace characterized . . . . . . . . . . . . . Multiple isomorphic vector spaces . . . . . . . . Coordinatizing in P2 . . . . . . . . . . . . . . . Coordinatization in M32 . . . . . . . . . . . . . Section MI OLTTR ALTMM MPMR KVMR RVMR ILTVR R One linear transformation, three representations . . . A linear transformation as matrix multiplication . . . Matrix product of matrix representations . . . . . . . Kernel via matrix representation . . . . . . . . . . . . Range via matrix representation . . . . . . . . . . . . Inverse of a linear transformation via a representation Section CB ELTBM Eigenvectors of linear transformation between matrices . . . . . . . . . . . . . . . . . ELTBP Eigenvectors of linear transformation between polynomials . . . . . . . . . . . . . . . CBP Change of basis with polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CBCV Change of basis with column vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . MRCM Matrix representations and change-of-basis matrices . . . . . . . . . . . . . . . . . . . MRBE Matrix representation with basis of eigenvectors . . . . . . . . . . . . . . . . . . . . . ELTT Eigenvectors of a linear transformation, twice . . . . . . . . . . . . . . . . . . . . . . CELT Complex eigenvectors of a linear transformation . . . . . . . . . . . . . . . . . . . . . Section OD ANM A normal matrix . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . Section NLT NM64 Nilpotent matrix, size 6, index 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NM62 Nilpotent matrix, size 6, index 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . JB4 Jordan block, size 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NJB5 Nilpotent Jordan block, size 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NM83 Nilpotent matrix, size 8, index 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . KPNLT Kernels of powers of a nilpotent linear transformation . . . . . . . . . . . . . . . . . . CFNLT Canonical form for a nilpotent linear transformation . . . . . . . . . . . . . . . . . . . Section IS TIS Two invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EIS Eigenspaces as invariant subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ISJB Invariant subspaces and Jordan blocks . . . . . . . . . . . . . . . . . . . . . . . . . . GE4 Generalized eigenspaces, dimension 4 domain . . . . . . . . . . . . . . . . . . . . . . . GE6 Generalized eigenspaces, dimension 6 domain . . . . . . . . . . . . . . . . . . . . . . . LTRGE Linear transformation restriction on generalized eigenspace . . . . . . . . . . . . . . . ISMR4 Invariant subspaces, matrix representation, dimension 4 domain . . . . . . . . . . . . ISMR6 Invariant subspaces, matrix representation, dimension 6 domain . . . . . . . . . . . . GENR6 Generalized eigenspaces and nilpotent restrictions, dimension 6 domain . . . . . . . . Section JCF JCF10 Jordan canonical form, size 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536 536 536 536 537 538 542 546 549 553 556 559 574 575 576 579 581 584 587 592 606 610 611 612 613 614 618 623 627 629 630 632 633 635 638 639 641 652 Version 2.02  EXAMPLES xl Section ACN CSCN MSCN CNO Arithmetic of complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conjugate of some complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . Modulus of some complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section SET SETM Set membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SSE T Subset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CS Cardinality and Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SU Set union . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SI Set intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SC Set complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section PT Section F IM11 Integers mod 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VSIM5 Vector space over integers mod 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SM2Z7 Symmetric matrices of size 2 over Z7 . . . . . . . . . . . . . . . . . . . . . . . . . . . FF8 Finite field of size 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section T Section HP HP Hadamard Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section VM VM4 Vandermonde matrix of size 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Section PSM Section ROD ROD2 Rank one decomposition, size 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
ROD4 Rank one decomposition, size 4

Section TD
TD4 Triangular decomposition, size 4
TDSSE Triangular decomposition solves a system of equations
TDEE6 Triangular decomposition, entry by entry, size 6

Section SVD

Section SR

Section POD

Section CF
PTFP Polynomial through five points

Section SAS
SS6W Sharing a secret 6 ways

Preface

This textbook is designed to teach the university mathematics student the basics of linear algebra and the techniques of formal mathematics. There are no prerequisites other than ordinary algebra, but it is probably best used by a student who has the "mathematical maturity" of a sophomore or junior. The text has two goals: to teach the fundamental concepts and techniques of matrix algebra and abstract vector spaces, and to teach the techniques associated with understanding the definitions and theorems forming a coherent area of mathematics. So there is an emphasis on worked examples of nontrivial size and on proving theorems carefully.

This book is copyrighted. This means that governments have granted the author a monopoly: the exclusive right to control the making of copies and derivative works for many years (too many years in some cases). It also gives others limited rights, generally referred to as "fair use," such as the right to quote sections in a review without seeking permission. However, the author licenses this book to anyone under the terms of the GNU Free Documentation License (GFDL), which gives you more rights than most copyrights (see Appendix GFDL [786]). Loosely speaking, you may make as many copies as you like at no cost, and you may distribute these unmodified copies if you please. You may modify the book for your own use. The catch is that if you make modifications and you distribute the modified version, or make use of portions in excess of fair use in another work, then you must also license the new work with the GFDL. So the book has lots of inherent freedom, and no one is allowed to distribute a derivative work that restricts these freedoms. (See the license itself in the appendix for the exact details of the additional rights you have been given.)

Notice that initially most people are struck by the notion that this book is free (the French would say gratuit, at no cost). And it is. However, it is more important that the book has freedom (the French would say liberté, liberty). It will never go "out of print" nor will there ever be trivial updates designed only to frustrate the used book market. Those considering teaching a course with this book can examine it thoroughly in advance. Adding new exercises or new sections has been purposely made very easy, and the hope is that others will contribute these modifications back for incorporation into the book, for the benefit of all. Depending on how you received your copy, you may want to check for the latest version (and other news) at http://linear.ups.edu/.

Topics

The first half of this text (through Chapter M [182]) is basically a course in matrix algebra, though the foundation of some more advanced ideas is also being formed in these early sections.
Vectors are presented exclusively as column vectors (since we also have the typographic freedom to avoid writing a column vector inline as the transpose of a row vector), and linear combinations are presented very early. Spans, null spaces, column spaces and row spaces are also presented early, simply as sets, saving most of their vector space properties for later, so they are familiar objects before being scrutinized carefully. You cannot do everything early, so in particular matrix multiplication comes later than usual. However, with a definition built on linear combinations of column vectors, it should seem more natural than the more frequent definition using dot products of rows with columns. And this delay emphasizes that linear algebra is built upon vector addition and scalar multiplication, Of course, matrix inverses must wait for matrix multiplication, but this does not prevent nonsingular matrices from occurring sooner. Vector space xli  PREFACE xlii properties are hinted at when vector and matrix operations are first defined, but the notion of a vector space is saved for a more axiomatic treatment later (Chapter VS [279]). Once bases and dimension have been explored in the context of vector spaces, linear transformations and their matrix representations follow. The goal of the book is to go as far as Jordan canonical form in the Core (Part C [2]), with less central topics collected in the Topics (Part T [793]). A third part contains contributed applications (Part A [847]), with notation and theorems integrated with the earlier two parts. Linear algebra is an ideal subject for the novice mathematics student to learn how to develop a topic precisely, with all the rigor mathematics requires. Unfortunately, much of this rigor seems to have escaped the standard calculus curriculum, so for many university students this is their first exposure to careful definitions and theorems, and the expectation that they fully understand them, to say nothing of the expectation that they become proficient in formulating their own proofs. We have tried to make this text as helpful as possible with this transition. Every definition is stated carefully, set apart from the text. Likewise, every theorem is carefully stated, and almost every one has a complete proof. Theorems usually have just one conclusion, so they can be referenced precisely later. Definitions and theorems are cataloged in order of their appearance in the front of the book (Definitions [viii], Theorems [ix]), and alphabetical order in the index at the back. Along the way, there are discussions of some more important ideas relating to formulating proofs (Proof Techniques [??]), which is part advice and part logic. Origin and History This book is the result of the confluence of several related events and trends. " At the University of Puget Sound we teach a one-semester, post-calculus linear algebra course to students majoring in mathematics, computer science, physics, chemistry and economics. Between January 1986 and June 2002, I taught this course seventeen times. For the Spring 2003 semester, I elected to convert my course notes to an electronic form so that it would be easier to incorporate the inevitable and nearly-constant revisions. Central to my new notes was a collection of stock examples that would be used repeatedly to illustrate new concepts. (These would become the Archetypes, Appendix A [698].) 
It was only a short leap to then decide to distribute copies of these notes and examples to the students in the two sections of this course. As the semester wore on, the notes began to look less like notes and more like a textbook. " I used the notes again in the Fall 2003 semester for a single section of the course. Simultaneously, the textbook I was using came out in a fifth edition. A new chapter was added toward the start of the book, and a few additional exercises were added in other chapters. This demanded the annoyance of reworking my notes and list of suggested exercises to conform with the changed numbering of the chapters and exercises. I had an almost identical experience with the third course I was teaching that semester. I also learned that in the next academic year I would be teaching a course where my textbook of choice had gone out of print. I felt there had to be a better alternative to having the organization of my courses buffeted by the economics of traditional textbook publishing. * I had used TEX and the Internet for many years, so there was little to stand in the way of typesetting, distributing and "marketing" a free book. With recreational and professional interests in software development, I had long been fascinated by the open-source software movement, as exemplified by the success of GNU and Linux, though public-domain TEX might also deserve mention. Obviously, this book is an attempt to carry over that model of creative endeavor to textbook publishing. * As a sabbatical project during the Spring 2004 semester, I embarked on the current project of creating a freely-distributable linear algebra textbook. (Notice the implied financial support of the University of Puget Sound to this project.) Most of the material was written from scratch since changes in notation and approach made much of my notes of little use. By August 2004 I had written half the material necessary for our Math 232 course. The remaining half was written during the Fall 2004 semester as I taught another two sections of Math 232. Version 2.02  PREFACE xliii " While in early 2005 the book was complete enough to build a course around and Version 1.0 was released. Work has continued since, filling out the narrative, exercises and supplements. However, much of my motivation for writing this book is captured by the sentiments expressed by H.M. Cundy and A.P. Rollet in their Preface to the First Edition of Mathematical Models (1952), especially the final sentence, This book was born in the classroom, and arose from the spontaneous interest of a Mathematical Sixth in the construction of simple models. A desire to show that even in mathematics one could have fun led to an exhibition of the results and attracted considerable attention throughout the school. Since then the Sherborne collection has grown, ideas have come from many sources, and widespread interest has been shown. It seems therefore desirable to give permanent form to the lessons of experience so that others can benefit by them and be encouraged to undertake similar work. How To Use This Book Chapters, Theorems, etc. are not numbered in this book, but are instead referenced by acronyms. This means that Theorem XYZ will always be Theorem XYZ, no matter if new sections are added, or if an individual decides to remove certain other sections. Within sections, the subsections are acronyms that begin with the acronym of the section. So Subsection XYZ.AB is the subsection AB in Section XYZ. 
Acronyms are unique within their type, so for example there is just one Definition B [325], but there is also a Section B [325]. At first, all the letters flying around may be confusing, but with time, you will begin to recognize the more important ones on sight. Furthermore, there are lists of theorems, examples, etc. in the front of the book, and an index that contains every acronym. If you are reading this in an electronic version (PDF or XML), you will see that all of the cross-references are hyperlinks, allowing you to click to a definition or example, and then use the back button to return. In printed versions, you must rely on the page numbers. However, note that page numbers are not permanent! Different editions, different margins, or different sized paper will affect what content is on each page. And in time, the addition of new material will affect the page numbering. Chapter divisions are not critical to the organization of the book, as Sections are the main organizational unit. Sections are designed to be the subject of a single lecture or classroom session, though there is frequently more material than can be discussed and illustrated in a fifty-minute session. Consequently, the instructor will need to be selective about which topics to illustrate with other examples and which topics to leave to the student's reading. Many of the examples are meant to be large, such as using five or six variables in a system of equations, so the instructor may just want to "walk" a class through these examples. The book has been written with the idea that some may work through it independently, so the hope is that students can learn some of the more mechanical ideas on their own. The highest level division of the book is the three Parts: Core, Topics, Applications (Part C [2], Part T [793], Part A [847]). The Core is meant to carefully describe the basic ideas required of a first exposure to linear algebra. In the final sections of the Core, one should ask the question: which previous Sections could be removed without destroying the logical development of the subject? Hopefully, the answer is "none." The goal of the book is to finish the Core with a very general representation of a linear transformation (Jordan canonical form, Section JCF [644]). Of course, there will not be universal agreement on what should, or should not, constitute the Core, but the main idea is to limit it to about forty sections. Topics (Part T [793]) is meant to contain those subjects that are important in linear algebra, and which would make profitable detours from the Core for those interested in pursuing them. Applications (Part A [847]) should illustrate the power and widespread applicability of linear algebra to as many fields as possible. The Archetypes (Appendix A [698]) cover many of the computational aspects of systems of linear equations, matrices and linear transformations. The student should consult them often, and this is encouraged by exercises that simply suggest the right properties to examine at the right time. But what is more important, this a repository that contains enough variety to provide abundant examples of key theorems, while also providing counterexamples to hypotheses or converses of theorems. The summary table at the start of this appendix should be especially useful. Version 2.02  PREFACE xliv I require my students to read each Section prior to the day's discussion on that section. 
For some students this is a novel idea, but at the end of the semester a few always report on the benefits, both for this course and other courses where they have adopted the habit. To make good on this requirement, each section contains three Reading Questions. These sometimes only require parroting back a key definition or theorem, or they require performing a small example of a key computation, or they ask for musings on key ideas or new relationships between old ideas. Answers are emailed to me the evening before the lecture. Given the flavor and purpose of these questions, including solutions seems foolish. Every chapter of Part C [2] ends with "Annotated Acronyms", a short list of critical theorems or definitions from that chapter. There are a variety of reasons for any one of these to have been chosen, and reading the short paragraphs after some of these might provide insight into the possibilities. An end-of-chapter review might usefully incorporate a close reading of these lists. Formulating interesting and effective exercises is as difficult, or more so, than building a narrative. But it is the place where a student really learns the material. As such, for the student's benefit, complete solutions should be given. As the list of exercises expands, the amount with solutions should similarly expand. Exercises and their solutions are referenced with a section name, followed by a dot, then a letter (C,M, or T) and a number. The letter 'C' indicates a problem that is mostly computational in nature, while the letter 'T' indicates a problem that is more theoretical in nature. A problem with a letter 'M' is somewhere in between (middle, mid-level, median, middling), probably a mix of computation and applications of theorems. So Solution MO.T13 [193] is a solution to an exercise in Section MO [182] that is theoretical in nature. The number '13' has no intrinsic meaning. More on Freedom This book is freely-distributable under the terms of the GFDL, along with the underlying TEX code from which the book is built. This arrangement provides many benefits unavailable with traditional texts. " No cost, or low cost, to students. With no physical vessel (i.e. paper, binding), no transportation costs (Internet bandwidth being a negligible cost) and no marketing costs (evaluation and desk copies are free to all), anyone with an Internet connection can obtain it, and a teacher could make available paper copies in sufficient quantities for a class. The cost to print a copy is not insignificant, but is just a fraction of the cost of a traditional textbook when printing is handled by a print-on-demand service over the Internet. Students will not feel the need to sell back their book (nor should there be much of a market for used copies), and in future years can even pick up a newer edition freely. " Electronic versions of the book contain extensive hyperlinks. Specifically, most logical steps in proofs and examples include links back to the previous definitions or theorems that support that step. With whatever viewer you might be using (web browser, PDF reader) the "back" button can then return you to the middle of the proof you were studying. So even if you are reading a physical copy of this book, you can benefit from also working with an electronic version. A traditional book, which the publisher is unwilling to distribute in an easily-copied electronic form, cannot offer this very intuitive and flexible approach to learning mathematics. * The book will not go out of print. 
No matter what, a teacher can maintain their own copy and use the book for as many years as they desire. Further, the naming schemes for chapters, sections, theorems, etc. is designed so that the addition of new material will not break any course syllabi or assignment list. * With many eyes reading the book and with frequent postings of updates, the reliability should become very high. Please report any errors you find that persist into the latest version. " For those with a working installation of the popular typesetting program TEX, the book has been designed so that it can be customized. Page layouts, presence of exercises, solutions, sections or chap- ters can all be easily controlled. Furthermore, many variants of mathematical notation are achieved Version 2.02  PREFACE xlv via TEX macros. So by changing a single macro, one's favorite notation can be reflected throughout the text. For example, every transpose of a matrix is coded in the source as \transpose{A}, which when printed will yield At. However by changing the definition of \transpose{ }, any desired al- ternative notation (superscript t, superscript T, superscript prime) will then appear throughout the text instead. " The book has also been designed to make it easy for others to contribute material. Would you like to see a section on symmetric bilinear forms? Consider writing one and contributing it to one of the Topics chapters. Should there be more exercises about the null space of a matrix? Send me some. Historical Notes? Contact me, and we will see about adding those in also. " You have no legal obligation to pay for this book. It has been licensed with no expectation that you pay for it. You do not even have a moral obligation to pay for the book. Thomas Jefferson (1743 - 1826), the author of the United States Declaration of Independence, wrote, If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation. Letter to Isaac McPherson August 13, 1813 However, if you feel a royalty is due the author, or if you would like to encourage the author, or if you wish to show others that this approach to textbook publishing can also bring financial compensation, then donations are gratefully received. Moreover, non-financial forms of help can often be even more valuable. 
A simple note of encouragement, submitting a report of an error, or contributing some exercises or perhaps an entire section for the Topics or Applications are all important ways you can acknowledge the freedoms accorded to this work by the copyright holder and other contributors. Conclusion Foremost, I hope that students find their time spent with this book profitable. I hope that instructors find it flexible enough to fit the needs of their course. And I hope that everyone will send me their comments and suggestions, and also consider the myriad ways they can help (as listed on the book's website at ht t p: //line ar . ups . e du). Robert A. Beezer Tacoma, Washington July 2008 Version 2.02  Acknowledgements Many people have helped to make this book, and its freedoms, possible. First, the time to create, edit and distribute the book has been provided implicitly and explicitly by the University of Puget Sound. A sabbatical leave Spring 2004 and a course release in Spring 2007 are two obvious examples of explicit support. The latter was provided by support from the Lind-VanEnkevort Fund. The university has also provided clerical support, computer hardware, network servers and bandwidth. Thanks to Dean Kris Bartanen and the chair of the Mathematics and Computer Science Department, Professor Martin Jackson, for their support, encouragement and flexibility. My colleagues in the Mathematics and Computer Science Department have graciously taught our introductory linear algebra course using preliminary versions and have provided valuable suggestions that have improved the book immeasurably. Thanks to Professor Martin Jackson (v0.30), Professor David Scott (v0.70) and Professor Bryan Smith (v0.70, 0.80, v1.00). University of Puget Sound librarians Lori Ricigliano, Elizabeth Knight and Jeanne Kimura provided valuable advice on production, and interesting conversations about copyrights. Many aspects of the book have been influenced by insightful questions and creative suggestions from the students who have labored through the book in our courses. For example, the flashcards with theorems and definitions are a direct result of a student suggestion. I will single out a handful of students have been especially adept at finding and reporting mathematically significant typographical errors: Jake Linenthal, Christie Su, Kim Le, Sarah McQuate, Andy Zimmer, Travis Osborne, Andrew Tapay, Mark Shoemaker, Tasha Underhill, Tim Zitzer, Elizabeth Million, and Steve Canfield. I have tried to be as original as possible in the organization and presentation of this beautiful subject. However, I have been influenced by many years of teaching from another excellent textbook, Introduction to Linear Algebra by L.W. Johnson, R.D. Reiss and J.T. Arnold. When I have needed inspiration for the correct approach to particularly important proofs, I have learned to eventually consult two other textbooks. Sheldon Axler's Linear Algebra Done Right is a highly original exposition, while Ben Noble's Applied Linear Algebra frequently strikes just the right note between rigor and intuition. Noble's excellent book is highly recommended, even though its publication dates to 1969. Conversion to various electronic formats have greatly depended on assistance from: Eitan Gurari, author of the powerful LATEX translator, tex4ht; Davide Cervone, author of j sMath; and Carl Witty, who advised and tested the Sony Reader format. Thanks to these individuals for their critical assistance. 
General support and encouragement of free and affordable textbooks, in addition to specific promotion of this text, was provided by Nicole Allen, Textbook Advocate at Student Public Interest Research Groups. Nicole was an early consumer of this material, back when it looked more like lecture notes than a textbook.

Finally, in every possible case, the production and distribution of this book has been accomplished with open-source software. The range of individuals and projects is far too great to pretend to list them all. The book's web site will someday maintain pointers to as many of these projects as possible.

Part C
Core

Chapter SLE
Systems of Linear Equations

We will motivate our study of linear algebra by studying solutions to systems of linear equations. While the focus of this chapter is on the practical matter of how to find, and describe, these solutions, we will also be setting ourselves up for more theoretical ideas that will appear later.

Section WILA
What is Linear Algebra?

Subsection LA
"Linear" + "Algebra"

The subject of linear algebra can be partially explained by the meaning of the two terms comprising the title. "Linear" is a term you will appreciate better at the end of this course, and indeed, attaining this appreciation could be taken as one of the primary goals of this course. However, for now you can understand it to mean anything that is "straight" or "flat." For example, in the xy-plane you might be accustomed to describing straight lines (is there any other kind?) as the set of solutions to an equation of the form y = mx + b, where the slope m and the y-intercept b are constants that together describe the line. In multivariate calculus, you may have discussed planes. Living in three dimensions, with coordinates described by triples (x, y, z), they can be described as the set of solutions to equations of the form ax + by + cz = d, where a, b, c, d are constants that together determine the plane. While we might describe planes as "flat," lines in three dimensions might be described as "straight." From a multivariate calculus course you will recall that lines are sets of points described by equations such as x = 3t - 4, y = -7t + 2, z = 9t, where t is a parameter that can take on any value.

Another view of this notion of "flatness" is to recognize that the sets of points just described are solutions to equations of a relatively simple form. These equations involve addition and multiplication only. We will have a need for subtraction, and occasionally we will divide, but mostly you can describe "linear" equations as involving only addition and multiplication. Here are some examples of typical equations we will see in the next few sections:

    2x + 3y - 4z = 13        4x_1 + 5x_2 - x_3 + x_4 + x_5 = 0        9a - 2b + 7c + 2d = -7

What we will not see are equations like:

    xy + 5yz = 13        x_1 + x_2^3/x_4 - x_3 x_4 x_5^2 = 0        tan(ab) + log(c - d) = -7

The exception will be that we will on occasion need to take a square root.
And it will make learning your third and fourth algebras even easier. Perhaps you have heard of "groups" and "rings" (or maybe you have studied them already), which are excellent examples of other algebras with very interesting properties and applications. In any event, prepare yourself to learn a new algebra and realize that some of the old rules you used for the real numbers may no longer apply to this new algebra you will be learning!

The brief discussion above about lines and planes suggests that linear algebra has an inherently geometric nature, and this is true. Examples in two and three dimensions can be used to provide valuable insight into important concepts of this course. However, much of the power of linear algebra will be the ability to work with "flat" or "straight" objects in higher dimensions, without concerning ourselves with visualizing the situation. While much of our intuition will come from examples in two and three dimensions, we will maintain an algebraic approach to the subject, with the geometry being secondary. Others may wish to switch this emphasis around, and that can lead to a very fruitful and beneficial course, but here and now we are laying our bias bare.

Subsection AA
An Application

We conclude this section with a rather involved example that will highlight some of the power and techniques of linear algebra. Work through all of the details with pencil and paper, until you believe all the assertions made. However, in this introductory example, do not concern yourself with how some of the results are obtained or how you might be expected to solve a similar problem. We will come back to this example later and expose some of the techniques used and properties exploited. For now, use your background in mathematics to convince yourself that everything said here really is correct.

Example TMP
Trail Mix Packaging

Suppose you are the production manager at a food-packaging plant and one of your product lines is trail mix, a healthy snack popular with hikers and backpackers, containing raisins, peanuts and hard-shelled chocolate pieces. By adjusting the mix of these three ingredients, you are able to sell three varieties of this item. The fancy version is sold in half-kilogram packages at outdoor supply stores and has more chocolate and fewer raisins, thus commanding a higher price. The standard version is sold in one kilogram packages in grocery stores and gas station mini-markets. Since the standard version has roughly equal amounts of each ingredient, it is not as expensive as the fancy version. Finally, a bulk version is sold in bins at grocery stores for consumers to load into plastic bags in amounts of their choosing. To appeal to the shoppers that like bulk items for their economy and healthfulness, this mix has many more raisins (at the expense of chocolate) and therefore sells for less.

Your production facilities have limited storage space and early each morning you are able to receive and store 380 kilograms of raisins, 500 kilograms of peanuts and 620 kilograms of chocolate pieces. As production manager, one of your most important duties is to decide how much of each version of trail mix to make every day. Clearly, you can have up to 1500 kilograms of raw ingredients available each day, so to be the most productive you will likely produce 1500 kilograms of trail mix each day.
Also, you would prefer not to have any ingredients leftover each day, so that your final product is as fresh as possible and so that you can receive the maximum delivery the next morning. But how should these ingredients be allocated to the mixing of the bulk, standard and fancy versions?

First, we need a little more information about the mixes. Workers mix the ingredients in 15 kilogram batches, and each row of the table below gives a recipe for a 15 kilogram batch. There is some additional information on the costs of the ingredients and the price the manufacturer can charge for the different versions of the trail mix.

                     Raisins      Peanuts      Chocolate    Cost      Sale Price
                     (kg/batch)   (kg/batch)   (kg/batch)   ($/kg)    ($/kg)
    Bulk                 7            6            2         3.69       4.99
    Standard             6            4            5         3.86       5.50
    Fancy                2            5            8         4.45       6.50
    Storage (kg)       380          500          620
    Cost ($/kg)          2.55         4.65         4.80

As production manager, it is important to realize that you only have three decisions to make: the amount of bulk mix to make, the amount of standard mix to make and the amount of fancy mix to make. Everything else is beyond your control or is handled by another department within the company. Principally, you are also limited by the amount of raw ingredients you can store each day. Let us denote the amount of each mix to produce each day, measured in kilograms, by the variable quantities b, s and f. Your production schedule can be described as values of b, s and f that do several things. First, we cannot make negative quantities of each mix, so

    b ≥ 0        s ≥ 0        f ≥ 0

Second, if we want to consume all of our ingredients each day, the storage capacities lead to three (linear) equations, one for each ingredient,

    \frac{7}{15}b + \frac{6}{15}s + \frac{2}{15}f = 380    (raisins)
    \frac{6}{15}b + \frac{4}{15}s + \frac{5}{15}f = 500    (peanuts)
    \frac{2}{15}b + \frac{5}{15}s + \frac{8}{15}f = 620    (chocolate)

It happens that this system of three equations has just one solution. In other words, as production manager, your job is easy, since there is but one way to use up all of your raw ingredients making trail mix. This single solution is

    b = 300 kg        s = 300 kg        f = 900 kg.

We do not yet have the tools to explain why this solution is the only one, but it should be simple for you to verify that this is indeed a solution. (Go ahead, we will wait.) Determining solutions such as this, and establishing that they are unique, will be the main motivation for our initial study of linear algebra.

So we have solved the problem of making sure that we make the best use of our limited storage space, and each day use up all of the raw ingredients that are shipped to us. Additionally, as production manager, you must report weekly to the CEO of the company, and you know he will be more interested in the profit derived from your decisions than in the actual production levels. So you compute,

    300(4.99 - 3.69) + 300(5.50 - 3.86) + 900(6.50 - 4.45) = 2727.00

for a daily profit of $2,727 from this production schedule. The computation of the daily profit is also beyond our control, though it is definitely of interest, and it too looks like a "linear" computation.
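The book does not rely on any software for this example, but if you would like to check the claimed production schedule by machine, a minimal sketch along the following lines would do it (this assumes Python with the NumPy library is available; the variable names are only illustrative). It builds the coefficient matrix of the three storage equations and asks for the unique solution.

    import numpy as np

    # Coefficient matrix of the three storage equations.
    # Rows: raisins, peanuts, chocolate; columns: b, s, f (kilograms of
    # bulk, standard and fancy mix), each recipe scaled by the
    # 15-kilogram batch size.
    A = np.array([[7, 6, 2],
                  [6, 4, 5],
                  [2, 5, 8]]) / 15
    capacity = np.array([380, 500, 620])   # daily storage limits (kg)

    solution = np.linalg.solve(A, capacity)
    print(solution)   # [300. 300. 900.], up to floating-point rounding

Substituting b = 300, s = 300, f = 900 back into each equation gives 380, 500 and 620 exactly, as the example asks you to verify by hand.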
                          Bulk    Standard    Fancy    Storage (kg)    Cost ($/kg)
  Raisins (kg/batch)        7         6         2          380            2.55
  Peanuts (kg/batch)        5         5         5          500            4.65
  Chocolate (kg/batch)      3         4         8          620            4.80
  Cost ($/kg)             3.70      3.85      4.45
  Sale Price ($/kg)       4.99      5.50      6.50

In a similar fashion as before, we desire values of b, s and f so that

  b ≥ 0    s ≥ 0    f ≥ 0

and

  (7/15)b + (6/15)s + (2/15)f = 380   (raisins)
  (5/15)b + (5/15)s + (5/15)f = 500   (peanuts)
  (3/15)b + (4/15)s + (8/15)f = 620   (chocolate)

It now happens that this system of equations has infinitely many solutions, as we will now demonstrate. Let f remain a variable quantity. Then if we make f kilograms of the fancy mix, we will make 4f - 3300 kilograms of the bulk mix and -5f + 4800 kilograms of the standard mix. Let us now verify that, for any choice of f, the values of b = 4f - 3300 and s = -5f + 4800 will yield a production schedule that exhausts all of the day's supply of raw ingredients (right now, do not be concerned about how you might derive expressions like these for b and s). Grab your pencil and paper and play along.

  (7/15)(4f - 3300) + (6/15)(-5f + 4800) + (2/15)f = (0f + 5700)/15 = 380
  (5/15)(4f - 3300) + (5/15)(-5f + 4800) + (5/15)f = (0f + 7500)/15 = 500
  (3/15)(4f - 3300) + (4/15)(-5f + 4800) + (8/15)f = (0f + 9300)/15 = 620

Convince yourself that these expressions for b and s allow us to vary f and obtain an infinite number of possibilities for solutions to the three equations that describe our storage capacities. As a practical matter, there really are not an infinite number of solutions, since we are unlikely to want to end the day with a fractional number of bags of fancy mix, so our allowable values of f should probably be integers. More importantly, we need to remember that we cannot make negative amounts of each mix! Where does this lead us? Keeping the quantity of bulk mix nonnegative requires that

  b ≥ 0    4f - 3300 ≥ 0    f ≥ 825

Similarly for the standard mix,

  s ≥ 0    -5f + 4800 ≥ 0    f ≤ 960

So, as production manager, you really have to choose a value of f from the finite set {825, 826, ..., 960}, leaving you with 136 choices, each of which will exhaust the day's supply of raw ingredients. Pause now and think about which you would choose.

Recalling your weekly meeting with the CEO suggests that you might want to choose a production schedule that yields the biggest possible profit for the company. So you compute an expression for the profit based on your as yet undetermined decision for the value of f,

  (4f - 3300)(4.99 - 3.70) + (-5f + 4800)(5.50 - 3.85) + (f)(6.50 - 4.45) = -1.04f + 3663

Since f has a negative coefficient it would appear that mixing fancy mix is detrimental to your profit and should be avoided. So you will make the decision to set daily fancy mix production at f = 825. This has the effect of setting b = 4(825) - 3300 = 0 and we stop producing bulk mix entirely. So the remainder of your daily production is standard mix at the level of s = -5(825) + 4800 = 675 kilograms and the resulting daily profit is (-1.04)(825) + 3663 = 2805. It is a pleasant surprise that daily profit has risen to $2,805, but this is not the most important part of the story. What is important here is that there are a large number of ways to produce trail mix that use all of the day's worth of raw ingredients and you were able to easily choose the one that netted the largest profit. Notice too how all of the above computations look "linear."
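The profit comparison above is easy to automate. The book's computation notes point to Sage, Mathematica and the TI calculators for this sort of work; the short Python sketch below is one possible stand-in (it is an illustration, not part of the text). It evaluates the profit expression for each of the 136 feasible values of f under the revised recipes and the $5.50 price for standard mix.

    # A quick check of the profit maximization above (an illustration, not part of the text).
    # Profit per kilogram of each mix: sale price minus cost, read from the revised table.
    margin_bulk = 4.99 - 3.70
    margin_standard = 5.50 - 3.85
    margin_fancy = 6.50 - 4.45

    def daily_profit(f):
        b = 4 * f - 3300           # kilograms of bulk mix
        s = -5 * f + 4800          # kilograms of standard mix
        return margin_bulk * b + margin_standard * s + margin_fancy * f

    best = max((round(daily_profit(f), 2), f) for f in range(825, 961))
    print(best)                     # expect (2805.0, 825), matching the choice f = 825 above

Scanning all 136 values is trivial here, but the point of the discussion above is that the sign of the coefficient on f already tells you which endpoint of the feasible range to pick.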
In the food industry, things do not stay the same for long, and now the sales department says that increased competition has led to the decision to stay competitive and charge just $5.25 for a kilogram of the standard mix, rather than the previous $5.50 per kilogram. This decision has no effect on the possibilities for the production schedule, but will affect the decision based on profit considerations. So you revisit just the profit computation, suitably adjusted for the new selling price of standard mix,

  (4f - 3300)(4.99 - 3.70) + (-5f + 4800)(5.25 - 3.85) + (f)(6.50 - 4.45) = 0.21f + 2463

Now it would appear that fancy mix is beneficial to the company's profit since the value of f has a positive coefficient. So you take the decision to make as much fancy mix as possible, setting f = 960. This leads to s = -5(960) + 4800 = 0 and the increased competition has driven you out of the standard mix market altogether. The remainder of production is therefore bulk mix at a daily level of b = 4(960) - 3300 = 540 kilograms and the resulting daily profit is 0.21(960) + 2463 = 2664.60. A daily profit of $2,664.60 is less than it used to be, but as production manager, you have made the best of a difficult situation and shown the sales department that the best course is to pull out of the highly competitive standard mix market completely.

This example is taken from a field of mathematics variously known by names such as operations research, systems science, or management science. More specifically, this is a perfect example of problems that are solved by the techniques of "linear programming." There is a lot going on under the hood in this example. The heart of the matter is the solution to systems of linear equations, which is the topic of the next few sections, and a recurrent theme throughout this course. We will return to this example on several occasions to reveal some of the reasons for its behavior.

Subsection READ Reading Questions

1. Is the equation x^2 + xy + tan(y^3) = 0 linear or not? Why or why not?

2. Find all solutions to the system of two linear equations 2x + 3y = -8, x - y = 6.

3. Describe how the production manager might explain the importance of the procedures described in the trail mix application (Subsection WILA.AA [3]).

Subsection EXC Exercises

C10 In Example TMP [3] the first table lists the cost (per kilogram) to manufacture each of the three varieties of trail mix (bulk, standard, fancy). For example, it costs $3.69 to make one kilogram of the bulk variety. Re-compute each of these three costs and notice that the computations are linear in character.
Contributed by Robert Beezer

M70 In Example TMP [3] two different prices were considered for marketing standard mix with the revised recipes (one-third peanuts in each recipe). Selling standard mix at $5.50 resulted in selling the minimum amount of the fancy mix and no bulk mix. At $5.25 it was best for profits to sell the maximum amount of fancy mix and then sell no standard mix. Determine a selling price for standard mix that allows for maximum profits while still selling some of each type of mix.
Contributed by Robert Beezer Solution [8]

Subsection SOL Solutions

M70 Contributed by Robert Beezer Statement [7]
If the price of standard mix is set at $5.292, then the profit function has a zero coefficient on the variable quantity f. So, we can set f to be any integer quantity in {825, 826, ..., 960}.
All but the extreme values (f = 825, f = 960) will result in production levels where some of every mix is manufactured. No matter what value of f is chosen, the resulting profit will be the same, at $2,664.60.

Section SSLE Solving Systems of Linear Equations

We will motivate our study of linear algebra by considering the problem of solving several linear equations simultaneously. The word "solve" tends to get abused somewhat, as in "solve this problem." When talking about equations we understand a more precise meaning: find all of the values of some variable quantities that make an equation, or several equations, true.

Subsection SLE Systems of Linear Equations

Example STNE Solving two (nonlinear) equations
Suppose we desire the simultaneous solutions of the two equations,

  x^2 + y^2 = 1
  -x + √3 y = 0

You can easily check by substitution that x = √3/2, y = 1/2 and x = -√3/2, y = -1/2 are both solutions. We need to also convince ourselves that these are the only solutions. To see this, plot each equation on the xy-plane, which means to plot (x, y) pairs that make an individual equation true. In this case we get a circle centered at the origin with radius 1 and a straight line through the origin with slope 1/√3. The intersections of these two curves are our desired simultaneous solutions, and so we believe from our plot that the two solutions we know already are indeed the only ones. We like to write solutions as sets, so in this case we write the set of solutions as

  S = { (√3/2, 1/2), (-√3/2, -1/2) }

In order to discuss systems of linear equations carefully, we need a precise definition. And before we do that, we will introduce our periodic discussions about "Proof Techniques." Linear algebra is an excellent setting for learning how to read, understand and formulate proofs. But this is a difficult step in your development as a mathematician, so we have included a series of short essays containing advice and explanations to help you along. These can be found back in Section PT [687] of Appendix P [679], and we will reference them as they become appropriate. Be sure to head back to the appendix to read each of these essays as it is introduced.

With a definition next, now is the time for the first of our proof techniques. Head back to Section PT [687] of Appendix P [679] and study Technique D [687]. We'll be right here when you get back. See you in a bit.

Definition SLE System of Linear Equations
A system of linear equations is a collection of m equations in the variable quantities x1, x2, x3, ..., xn of the form,

  a11 x1 + a12 x2 + a13 x3 + ... + a1n xn = b1
  a21 x1 + a22 x2 + a23 x3 + ... + a2n xn = b2
  a31 x1 + a32 x2 + a33 x3 + ... + a3n xn = b3
  ...
  am1 x1 + am2 x2 + am3 x3 + ... + amn xn = bm

where the values of aij, bi and xj are from the set of complex numbers, C.

Don't let the mention of the complex numbers, C, rattle you. We will stick with real numbers exclusively for many more sections, and it will sometimes seem like we only work with integers! However, we want to leave the possibility of complex numbers open, and there will be occasions in subsequent sections where they are necessary. You can review the basic properties of complex numbers in Section CNO [679], but these facts will not be critical until we reach Section O [167]. For now, here is an example to illustrate using the notation introduced in Definition SLE [9].
Example NSE Notation for a system of equations
Given the system of linear equations,

  x1 + 2x2 + x4 = 7
  x1 + x2 + x3 - x4 = 3
  3x1 + x2 + 5x3 - 7x4 = 1

we have n = 4 variables and m = 3 equations. Also,

  a11 = 1   a12 = 2   a13 = 0   a14 = 1    b1 = 7
  a21 = 1   a22 = 1   a23 = 1   a24 = -1   b2 = 3
  a31 = 3   a32 = 1   a33 = 5   a34 = -7   b3 = 1

Additionally, convince yourself that x1 = -2, x2 = 4, x3 = 2, x4 = 1 is one solution (but it is not the only one!).

We will often shorten the term "system of linear equations" to "system of equations" leaving the linear aspect implied. After all, this is a book about linear algebra.

Subsection PSS Possibilities for Solution Sets

The next example illustrates the possibilities for the solution set of a system of linear equations. We will not be too formal here, and the necessary theorems to back up our claims will come in subsequent sections. So read for feeling and come back later to revisit this example.

Example TTS Three typical systems
Consider the system of two equations with two variables,

  2x1 + 3x2 = 3
  x1 - x2 = 4

If we plot the solutions to each of these equations separately on the x1x2-plane, we get two lines, one with negative slope, the other with positive slope. They have exactly one point in common, (x1, x2) = (3, -1), which is the solution x1 = 3, x2 = -1. From the geometry, we believe that this is the only solution to the system of equations, and so we say it is unique.

Now adjust the system with a different second equation,

  2x1 + 3x2 = 3
  4x1 + 6x2 = 6

A plot of the solutions to these equations individually results in two lines, one on top of the other! There are infinitely many pairs of points that make both equations true. We will learn shortly how to describe this infinite solution set precisely (see Example SAA [37], Theorem VFSLS [99]). Notice now how the second equation is just a multiple of the first.

One more minor adjustment provides a third system of linear equations,

  2x1 + 3x2 = 3
  4x1 + 6x2 = 10

A plot now reveals two lines with identical slopes, i.e. parallel lines. They have no points in common, and so the system has a solution set that is empty, S = ∅.

This example exhibits all of the typical behaviors of a system of equations. A subsequent theorem will tell us that every system of linear equations has a solution set that is empty, contains a single solution or contains infinitely many solutions (Theorem PSSLS [55]). Example STNE [9] yielded exactly two solutions, but this does not contradict the forthcoming theorem. The equations in Example STNE [9] are not linear because they do not match the form of Definition SLE [9], and so we cannot apply Theorem PSSLS [55] in this case.

Subsection ESEO Equivalent Systems and Equation Operations

With all this talk about finding solution sets for systems of linear equations, you might be ready to begin learning how to find these solution sets yourself. We begin with our first definition that takes a common word and gives it a very precise meaning in the context of systems of linear equations.

Definition ESYS Equivalent Systems
Two systems of linear equations are equivalent if their solution sets are equal.

Notice here that the two systems of equations could look very different (i.e. not be equal), but still have equal solution sets, and we would then call the systems equivalent. Two linear equations in two variables might be plotted as two lines that intersect in a single point.
A different system, with three equations in two variables might have a plot that is three lines, all intersecting at a common point, with this common point identical to the intersection point for the first system. By our definition, we could then say these two very different looking systems of equations are equivalent, since they have identical solution sets. It is really like a weaker form of equality, where we allow the systems to be different in some respects, but we use the term equivalent to highlight the situation when their solution sets are equal. With this definition, we can begin to describe our strategy for solving linear systems. Given a system of linear equations that looks difficult to solve, we would like to have an equivalent system that is easy to solve. Since the systems will have equal solution sets, we can solve the "easy" system and get the solution set to the "difficult" system. Here come the tools for making this strategy viable. Definition EO Equation Operations Given a system of linear equations, the following three operations will transform the system into a different one, and each operation is known as an equation operation. Version 2.02  Subsection SSLE.ESEO Equivalent Systems and Equation Operations 12 1. Swap the locations of two equations in the list of equations. 2. Multiply each term of an equation by a nonzero quantity. 3. Multiply each term of one equation by some quantity, and add these terms to a second equation, on both sides of the equality. Leave the first equation the same after this operation, but replace the second equation by the new one. A These descriptions might seem a bit vague, but the proof or the examples that follow should make it clear what is meant by each. We will shortly prove a key theorem about equation operations and solutions to linear systems of equations. We are about to give a rather involved proof, so a discussion about just what a theorem really is would be timely. Head back and read Technique T [688]. In the theorem we are about to prove, the conclusion is that two systems are equivalent. By Definition ESYS [11] this translates to requiring that solution sets be equal for the two systems. So we are being asked to show that two sets are equal. How do we do this? Well, there is a very standard technique, and we will use it repeatedly through the course. If you have not done so already, head to Section SET [683] and familiarize yourself with sets, their operations, and especially the notion of set equality, Definition SE [684] and the nearby discussion about its use. Theorem EOPSS Equation Operations Preserve Solution Sets If we apply one of the three equation operations of Definition EO [11] to a system of linear equations (Definition SLE [9]), then the original system and the transformed system are equivalent. Q Proof We take each equation operation in turn and show that the solution sets of the two systems are equal, using the definition of set equality (Definition SE [684]). 1. It will not be our habit in proofs to resort to saying statements are "obvious," but in this case, it should be. There is nothing about the order in which we write linear equations that affects their solutions, so the solution set will be equal if the systems only differ by a rearrangement of the order of the equations. 2. Suppose a + 0 is a number. Let's choose to multiply the terms of equation i by a to build the new system of equations, a11x1 + a12x2 + a13x3 + . + alnxn = a21x1 + a22x2 + a23x3 + '. + a2nxn = 2 a31x1 + a32x2 + a33x3 + ... 
+ a3nxn = b3 am1x1 +| am2x2 +| am3x3 +| '. ''-+ amnmnm bm Let S denote the solutions to the system in the statement of the theorem, and let T denote the solutions to the transformed system. (a) Show S c T. Suppose (X1, X2, X3, . .. , In) =(#31, /#2, /33, . .. , #3k) E S is a solution to the original system. Ignoring the i-th equation for a moment, we know it makes all the other equations of the transformed system true. We also know that ail1-3 + ai232 + ai33 + '+ + -inn = bi Version 2.02  Subsection SSLE.ESEO Equivalent Systems and Equation Operations 13 which we can multiply by a to get aasi/13 + aai2t32 + aai333 + ... + |ain3n = abi This says that the i-th equation of the transformed system is also true, so we have established that (31,, 32, /33, ..., /n) ET, and therefore S C T. (b) Now show T C S. Suppose (x1, x2, x3, ... ,mz) =(#1, 32, /33, .. . ,/3n) E T is a solution to the transformed system. Ignoring the i-th equation for a moment, we know it makes all the other equations of the original system true. We also know that aaii/3i + oaai2/32 + oai3/33 + ... + an/3cn = abi which we can multiply by , since c # 0, to get agi/31 + ai2/32 + ai3/33 + ... + ain/n = b This says that the i-th equation of the original system is also true, so we have established that (/13, /32, /33, ... , /3n) E S, and therefore T C S. Locate the key point where we required that a # 0, and consider what would happen if a = 0. 3. Suppose a is a number. Let's choose to multiply the terms of equation i by a and add them to equation j in order to build the new system of equations, a11xi + a12x2 + ... + ainx = bi a21x1 + a22x2 + -.- + a2nx = b2 a31x1 + a32x2 + ... + a3nxn = b3 (ca1i + aji)xi + (ceai2 + aj2)x2 + ... + (wi + ajn)xn = oi + b3 amlxl + am2x2 + ... + ammxm= bm Let S denote the solutions to the system in the statement of the theorem, and let T denote the solutions to the transformed system. (a) Show S C T. Suppose (xi, z2, x3, .. . , z) =(#1, /32, /3, - -. -,/) E S is a solution to the original system. Ignoring the j-th equation for a moment, we know this solution makes all the other equations of the transformed system true. Using the fact that the solution makes the i-th and j-th equations of the original system true, we find (oaasi + agi)/31 + (caai2 + aj2)/32 + - - + (oaajn + aga)#32 = (cai/i1 + aa2/32 + - - + aag/3n) + (ag1/31 + aj2/32 + - - + agn/3n) = aai/i1 + ai2/32 + - - + ain/3n) + (ag1/31 + aj2/32 + - - + agn/3n) = abi + b3. This says that the j-th equation of the transformed system is also true, so we have established that (#13, /32, /33, ...,3n) ET, and therefore S C T. Version 2.02  Subsection SSLE.ESEO Equivalent Systems and Equation Operations 14 (b) Now show T C S. Suppose (x1, x2, x3, ...,x)= (/31, /32, /33, .. . ,,n) E T is a solution to the transformed system. Ignoring the j-th equation for a moment, we know it makes all the other equations of the original system true. We then find aji/31 + aj2/32 + ... + ajn/n =aji/31 + aj2#2 + ... + ajn + abi - abi = aji13 + aj232 + ... + ajnn + (caii/31 + aai22 + ... + aain/3n) - abi =aji13i + rai3i + aj2/2 + aai2/2 + ... + ajn/ + n aa/3n - abi =(ai + aji)/31 + (a2 + aj2)/32 + ... + (gain + ajn)/ - abi = cbi + b3 - cbi = by This says that the j-th equation of the original system is also true, so we have established that (131, /32, /3, ...,/3n) E S, and therefore T C S. Why didn't we need to require that a -$ 0 for this row operation? In other words, how does the third statement of the theorem read when a = 0? 
Does our proof require some extra care when a = 0? Compare your answers with the similar situation for the second row operation. (See Exercise SSLE.T20 [20].) Theorem EOPSS [12] is the necessary tool to complete our strategy for solving systems of equations. We will use equation operations to move from one system to another, all the while keeping the solution set the same. With the right sequence of operations, we will arrive at a simpler equation to solve. The next two examples illustrate this idea, while saving some of the details for later. Example US Three equations, one solution We solve the following system by a sequence of equation operations. xi + 2x2 + 2x3 = 4 xi+ 3x2 + 3x3 =5 2x1 + 6x2 + 5x3 = 6 a = -1 times equation 1, add to equation 2: zi + 2x2 + 2x3 = 4 0xi + 1X2 + 1X3 =1 2xi + 6X2 + 5X3 =6 a=-2 times equation 1, add to equation 3: zi + 2x2 + 2x3 =4 0xi + 1x2 + 1x3 =1 0xi + 2x2 +1x3 =-2 -2 times equation 2, add to equation 3: xl + 2x2 + 2x3 4 Version 2.02  Subsection SSLE.ESEO Equivalent Systems and Equation Operations 15 Ox1 + 1x2 + 1x3= 1 Ox1+Ox2-1x3= -4 a = -1 times equation 3: xi + 2x2 + 2x3 = 4 Oxi + 1x2 + 1x3= 1 Ox1 + Ox2 + 1x3 = 4 which can be written more clearly as xi + 2x2 + 2x3 = 4 X2 + X3 =1 X3 = 4 This is now a very easy system of equations to solve. The third equation requires that x3 = 4 to be true. Making this substitution into equation 2 we arrive at x2 = -3, and finally, substituting these values of x2 and x3 into the first equation, we find that xi= 2. Note too that this is the only solution to this final system of equations, since we were forced to choose these values to make the equations true. Since we performed equation operations on each system to obtain the next one in the list, all of the systems listed here are all equivalent to each other by Theorem EOPSS [12]. Thus (xi, x2, X3)= (2, -3,4) is the unique solution to the original system of equations (and all of the other intermediate systems of equations listed as we transformed one into another). Example IS Three equations, infinitely many solutions The following system of equations made an appearance earlier in this section (Example NSE [10]), where we listed one of its solutions. Now, we will try to find all of the solutions to this system. Don't concern yourself too much about why we choose this particular sequence of equation operations, just believe that the work we do is all correct. xi + 2x2 + Ox3 + x4 = 7 xi+x2+x3-x4=3 3xi + x2 + 5x3 - 7x4 = 1 a = -1 times equation 1, add to equation 2: i + 2x2 + Ox3 -- x4 =7 Oxi - x2 + x3 - 2x4 =-4 3xi + x2 + 5x3 - 7x4 =1 a=-3 times equation 1, add to equation 3: xi + 2x2 + 0x3 +| x4 =7 Ox1 - x2 + x3 - 2x4 =-4 Ox1 - 5x2 + 5x3 - 10x4 =-20 -5 times equation 2, add to equation 3: xi + 2x2 + Ox3 + X4 = 7 Version 2.02  Subsection SSLE.ESEO Equivalent Systems and Equation Operations 16 O1 - z2 + z3 - 2x4 =-4 0x1+0x2 +0x3 +0x4=0 a = -1 times equation 2: xi + 2x2 + 03 +4 = 7 0x1+0x2+0x3+0x4=0 a = -2 times equation 2, add to equation 1: zi +0O2 + 2x3 - 3x4 =-1 0x1+0x2+0x3+0x4=0 which can be written more clearly as xi + 2x3 - 3x4 =-1 2- 33 + 2x4 =4 0=0 What does the equation 0 = 0 mean? We can choose any values for xzi, 2, 33, 34 and this equation will be true, so we only need to consider further the first two equations, since the third is true no matter what. We can analyze the second equation without consideration of the variable xi. It would appear that there is considerable latitude in how we can choose x2, 33, 34 and make this equation true. 
Let's choose x3 and x4 to be anything we please, say x3 = a and x4 = b. Now we can take these arbitrary values for x3 and x4, substitute them in equation 1, to obtain

  x1 + 2a - 3b = -1
  x1 = -1 - 2a + 3b

Similarly, equation 2 becomes

  x2 - a + 2b = 4
  x2 = 4 + a - 2b

So our arbitrary choices of values for x3 and x4 (a and b) translate into specific values of x1 and x2. The lone solution given in Example NSE [10] was obtained by choosing a = 2 and b = 1. Now we can easily and quickly find many more (infinitely more). Suppose we choose a = 5 and b = -2, then we compute

  x1 = -1 - 2(5) + 3(-2) = -17
  x2 = 4 + 5 - 2(-2) = 13

and you can verify that (x1, x2, x3, x4) = (-17, 13, 5, -2) makes all three equations true. The entire solution set is written as

  S = { (-1 - 2a + 3b, 4 + a - 2b, a, b) | a ∈ C, b ∈ C }

It would be instructive to finish off your study of this example by taking the general form of the solutions given in this set, substituting them into each of the three equations and verifying that they are true in each case (Exercise SSLE.M40 [19]).

In the next section we will describe how to use equation operations to systematically solve any system of linear equations. But first, read one of our more important pieces of advice about speaking and writing mathematics. See Technique L [688].

Before attacking the exercises in this section, it will be helpful to read some advice on getting started on the construction of a proof. See Technique GS [689].
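Exercise SSLE.M40 asks you to substitute the general solution of Example IS back into the original equations. The short Python/SymPy sketch below (an illustration, not part of the text) performs that symbolic substitution, and also checks the unique solution found in Example US.

    from sympy import symbols, simplify

    a, b = symbols('a b')

    # General solution claimed in Example IS
    x1, x2, x3, x4 = -1 - 2*a + 3*b, 4 + a - 2*b, a, b
    residuals = [x1 + 2*x2 + 0*x3 + x4 - 7,
                 x1 + x2 + x3 - x4 - 3,
                 3*x1 + x2 + 5*x3 - 7*x4 - 1]
    print([simplify(r) for r in residuals])   # expect [0, 0, 0] for every choice of a and b

    # Unique solution found in Example US
    u1, u2, u3 = 2, -3, 4
    print(u1 + 2*u2 + 2*u3, u1 + 3*u2 + 3*u3, 2*u1 + 6*u2 + 5*u3)   # expect 4 5 6

Setting a = 2, b = 1 in the same expressions recovers the particular solution listed in Example NSE.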
Subsection READ Reading Questions

1. How many solutions does the system of equations 3x + 2y = 4, 6x + 4y = 8 have? Explain your answer.

2. How many solutions does the system of equations 3x + 2y = 4, 6x + 4y = -2 have? Explain your answer.

3. What do we mean when we say mathematics is a language?

Subsection EXC Exercises

C10 Find a solution to the system in Example IS [15] where x3 = 6 and x4 = 2. Find two other solutions to the system. Find a solution where x1 = -17 and x2 = 14. How many possible answers are there to each of these questions?
Contributed by Robert Beezer

C20 Each archetype (Appendix A [698]) that is a system of equations begins by listing some specific solutions. Verify the specific solutions listed in the following archetypes by evaluating the system of equations with the solutions listed.
Archetype A [702], Archetype B [707], Archetype C [712], Archetype D [716], Archetype E [720], Archetype F [724], Archetype G [729], Archetype H [733], Archetype I [737], Archetype J [741]
Contributed by Robert Beezer

C50 A three-digit number has two properties. The tens-digit and the ones-digit add up to 5. If the number is written with the digits in the reverse order, and then subtracted from the original number, the result is 792. Use a system of equations to find all of the three-digit numbers with these properties.
Contributed by Robert Beezer Solution [21]

C51 Find all of the six-digit numbers in which the first digit is one less than the second, the third digit is half the second, the fourth digit is three times the third and the last two digits form a number that equals the sum of the fourth and fifth. The sum of all the digits is 24. (From The MENSA Puzzle Calendar for January 9, 2006.)
Contributed by Robert Beezer Solution [21]

C52 Driving along, Terry notices that the last four digits on his car's odometer are palindromic. A mile later, the last five digits are palindromic. After driving another mile, the middle four digits are palindromic. One more mile, and all six are palindromic. What was the odometer reading when Terry first looked at it? Form a linear system of equations that expresses the requirements of this puzzle. (Car Talk Puzzler, National Public Radio, Week of January 21, 2008) (A car odometer displays six digits and a sequence is a palindrome if it reads the same left-to-right as right-to-left.) A computational approach to this puzzle appears in the sketch following these exercises.
Contributed by Robert Beezer Solution [22]

M10 Each sentence below has at least two meanings. Identify the source of the double meaning, and rewrite the sentence (at least twice) to clearly convey each meaning.
1. They are baking potatoes.
2. He bought many ripe pears and apricots.
3. She likes his sculpture.
4. I decided on the bus.
Contributed by Robert Beezer Solution [22]

M11 Discuss the difference in meaning of each of the following three almost identical sentences, which all have the same grammatical structure. (These are due to Keith Devlin.)
1. She saw him in the park with a dog.
2. She saw him in the park with a fountain.
3. She saw him in the park with a telescope.
Contributed by Robert Beezer Solution [22]

M12 The following sentence, due to Noam Chomsky, has a correct grammatical structure, but is meaningless. Critique its faults. "Colorless green ideas sleep furiously." (Chomsky, Noam. Syntactic Structures, The Hague/Paris: Mouton, 1957. p. 15.)
Contributed by Robert Beezer Solution [22]

M13 Read the following sentence and form a mental picture of the situation. The baby cried and the mother picked it up. What assumptions did you make about the situation?
Contributed by Robert Beezer Solution [22]

M30 This problem appears in a middle-school mathematics textbook: Together Dan and Diane have $20. Together Diane and Donna have $15. How much do the three of them have in total? (Transition Mathematics, Second Edition, Scott Foresman Addison Wesley, 1998. Problem 5-1.19.)
Contributed by David Beezer Solution [22]

M40 Solutions to the system in Example IS [15] are given as (x1, x2, x3, x4) = (-1 - 2a + 3b, 4 + a - 2b, a, b). Evaluate the three equations of the original system with these expressions in a and b and verify that each equation is true, no matter what values are chosen for a and b.
Contributed by Robert Beezer

M70 We have seen in this section that systems of linear equations have limited possibilities for solution sets, and we will shortly prove Theorem PSSLS [55] that describes these possibilities exactly. This exercise will show that if we relax the requirement that our equations be linear, then the possibilities expand greatly. Consider a system of two equations in the two variables x and y, where the departure from linearity involves simply squaring the variables. After solving this system of non-linear equations, replace the second equation in turn by

  x^2 + 2x + y^2 = 3,    x^2 + y^2 = 1,    x^2 - x + y^2 = 0,    4x^2 + 4y^2 = 1

and solve each resulting system of two equations in two variables.
Contributed by Robert Beezer Solution [23]

T10 Technique D [687] asks you to formulate a definition of what it means for a whole number to be odd. What is your definition? (Don't say "the opposite of even.") Is 6 odd? Is 11 odd? Justify your answers by using your definition.
Contributed by Robert Beezer Solution [23]

T20 Explain why the second equation operation in Definition EO [11] requires that the scalar be nonzero, while in the third equation operation this restriction on the scalar is not present.
Contributed by Robert Beezer Solution [23]
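Exercise C52 asks for a linear system, but the puzzle is also small enough to attack by brute force. The Python sketch below is an illustration, not part of the text; it assumes a six-digit odometer that advances by one each mile, checks every possible reading against the four palindrome conditions, and should recover the readings reported in the solution that follows.

    def is_palindrome(s):
        return s == s[::-1]

    readings = []
    for n in range(999997):                        # leave room to drive three more miles
        now, one, two, three = (str(n + k).zfill(6) for k in range(4))
        if (is_palindrome(now[2:])                 # last four digits palindromic now
                and is_palindrome(one[1:])         # last five, a mile later
                and is_palindrome(two[1:5])        # middle four, after another mile
                and is_palindrome(three)):         # all six, one more mile on
            readings.append(now)

    print(readings)   # the readings 198888 and 199999 mentioned in the solution should be listed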
Subsection SOL Solutions

C50 Contributed by Robert Beezer Statement [18]
Let a be the hundreds digit, b the tens digit, and c the ones digit. Then the first condition says that b + c = 5. The original number is 100a + 10b + c, while the reversed number is 100c + 10b + a. So the second condition is

  792 = (100a + 10b + c) - (100c + 10b + a) = 99a - 99c

So we arrive at the system of equations

  b + c = 5
  99a - 99c = 792

Using equation operations, we arrive at the equivalent system

  a - c = 8
  b + c = 5

We can vary c and obtain infinitely many solutions. However, c must be a digit, restricting us to ten values (0-9). Furthermore, if c > 1, then the first equation forces a > 9, an impossibility. Setting c = 0 yields 850 as a solution, and setting c = 1 yields 941 as another solution.

C51 Contributed by Robert Beezer Statement [18]
Let abcdef denote any such six-digit number and convert each requirement in the problem statement into an equation.

  a = b - 1
  c = (1/2)b
  d = 3c
  10e + f = d + e
  24 = a + b + c + d + e + f

In a more standard form this becomes

  a - b = -1
  -b + 2c = 0
  -3c + d = 0
  -d + 9e + f = 0
  a + b + c + d + e + f = 24

Using equation operations (or the techniques of the upcoming Section RREF [24]), this system can be converted to the equivalent system

  a + (16/75)f = 5
  b + (16/75)f = 6
  c + (8/75)f = 3
  d + (8/25)f = 9
  e + (11/75)f = 1

Clearly, choosing f = 0 will yield the solution abcdef = 563910. Furthermore, to have the variables result in single-digit numbers, none of the other choices for f (1, 2, ..., 9) will yield a solution.

C52 Contributed by Robert Beezer Statement [18]
198888 is one solution, and David Braithwaite found 199999 as another.

M10 Contributed by Robert Beezer Statement [18]
1. Is "baking" a verb or an adjective? Potatoes are being baked. Those are baking potatoes.
2. Are the apricots ripe, or just the pears? Parentheses could indicate just what the adjective "ripe" is meant to modify. Were there many apricots as well, or just many pears? He bought many pears and many ripe apricots. He bought apricots and many ripe pears.
3. Is "sculpture" a single physical object, or the sculptor's style expressed over many pieces and many years? She likes his sculpture of the girl. She likes his sculptural style.
4. Was a decision made while in the bus, or was the outcome of the decision to choose the bus? Would the sentence "I decided on the car" have a similar double meaning? I made my decision while on the bus. I decided to ride the bus.

M11 Contributed by Robert Beezer Statement [19]
We know the dog belongs to the man, and the fountain belongs to the park. It is not clear if the telescope belongs to the man, the woman, or the park.

M12 Contributed by Robert Beezer Statement [19]
In adjacent pairs the words are contradictory or inappropriate. Something cannot be both green and colorless, ideas do not have color, ideas do not sleep, and it is hard to sleep furiously.

M13 Contributed by Robert Beezer Statement [19]
Did you assume that the baby and mother are human? Did you assume that the baby is the child of the mother? Did you assume that the mother picked up the baby as an attempt to stop the crying?

M30 Contributed by Robert Beezer Statement [19]
If x, y and z represent the money held by Dan, Diane and Donna, then y = 15 - z and x = 20 - y = 20 - (15 - z) = 5 + z. We can let z take on any value from 0 to 15 without any of the three amounts being negative, since presumably middle-schoolers are too young to assume debt.
Then the total capital held by the three is x + y + z = (5 + z) + (15 - z) + z = 20 + z. So their combined holdings can range anywhere from $20 (Donna is broke) to $35 (Donna is flush). We will have more to say about this situation in Section TSS [50], and specifically Theorem CMVEI [56].

M70 Contributed by Robert Beezer Statement [19]
The equation x^2 - y^2 = 1 has a solution set by itself that has the shape of a hyperbola when plotted. The five different second equations have solution sets that are circles when plotted individually. Where the hyperbola and circle intersect are the solutions to the system of two equations. As the size and location of the circle varies, the number of intersections varies from four to none (in the order given). Sketching the relevant equations would be instructive, as was discussed in Example STNE [9]. The exact solution sets are (according to the choice of the second equation),

  x^2 + 2x + y^2 = 3 :  { (1, 0), (-2, √3), (-2, -√3) }
  x^2 + y^2 = 1 :  { (1, 0), (-1, 0) }
  x^2 - x + y^2 = 0 :  { (1, 0) }
  4x^2 + 4y^2 = 1 :  { }

T10 Contributed by Robert Beezer Statement [19]
We can say that an integer is odd if when it is divided by 2 there is a remainder of 1. So 6 is not odd since 6 = 3 x 2 + 0, while 11 is odd since 11 = 5 x 2 + 1.

T20 Contributed by Robert Beezer Statement [20]
Definition EO [11] is engineered to make Theorem EOPSS [12] true. If we were to allow a zero scalar to multiply an equation then that equation would be transformed to the equation 0 = 0, which is true for any possible values of the variables. Any restrictions on the solution set imposed by the original equation would be lost. However, in the third operation, it is allowed to choose a zero scalar, multiply an equation by this scalar and add the transformed equation to a second equation (leaving the first unchanged). The result? Nothing. The second equation is the same as it was before. So the theorem is true in this case, the two systems are equivalent. But in practice, this would be a silly thing to actually ever do! We still allow it though, in order to keep our theorem as general as possible. Notice the location in the proof of Theorem EOPSS [12] where the expression 1/α appears; this explains the prohibition on α = 0 in the second equation operation.

Section RREF Reduced Row-Echelon Form

After solving a few systems of equations, you will recognize that it doesn't matter so much what we call our variables, as opposed to what numbers act as their coefficients. A system in the variables x1, x2, x3 would behave the same if we changed the names of the variables to a, b, c and kept all the constants the same and in the same places. In this section, we will isolate the key bits of information about a system of equations into something called a matrix, and then use this matrix to systematically solve the equations. Along the way we will obtain one of our most important and useful computational tools.

Subsection MVNSE Matrix and Vector Notation for Systems of Equations

Definition M Matrix
An m x n matrix is a rectangular layout of numbers from C having m rows and n columns. We will use upper-case Latin letters from the start of the alphabet (A, B, C, ...) to denote matrices and squared-off brackets to delimit the layout. Many use large parentheses instead of brackets; the distinction is not important. Rows of a matrix will be referenced starting at the top and working down (i.e.
row 1 is at the top) and columns will be referenced starting from the left (i.e. column 1 is at the left). For a matrix A, the notation [A]ij will refer to the complex number in row i and column j of A. (This definition contains Notation M.) (This definition contains Notation MC.) A Be careful with this notation for individual entries, since it is easy to think that [A]id refers to the whole matrix. It does not. It is just a number, but is a convenient way to talk about the individual entries simultaneously. This notation will get a heavy workout once we get to Chapter M [182]. Example AM A matrix -1 2 5 3 B41 0 -6 1 -4 2 2 -2 is a matrix with m =3 rows and nr= 4 columns. We can say that [B]2,3 =-6 while [B]3,4 =-2. Some mathematical software is very particular about which types of numbers (integers, rationals, reals, complexes) you wish to work with.See: Computation R.SAGE [674] . A calculator or computer language can be a convenient way to perform calculations with matrices. But first you have to enter the matrix.See: Computation ME.MMA [667] Computation ME.TI86 [672] Computation ME.TI83 [673] Computation ME.SAGE [675] . When we do equation operations on system of equations, the names of the variables really aren't very important. Xi, X2, X3, or a, b, c, or x, y, z, it really doesn't matter. In this subsection we will describe some notation that will make it easier to describe linear systems, solve the systems and describe the solution sets. Here is a list of definitions, laden with notation. Definition CV Column Vector A column vector of size m is an ordered list of m numbers, which is written in order vertically, starting at the top and proceeding to the bottom. At times, we will refer to a column vector as simply a vector. Version 2.02  Subsection RREF.MVNSE Matrix and Vector Notation for Systems of Equations 25 Column vectors will be written in bold, usually with lower case Latin letter from the end of the alphabet such as u, v, w, x, y, z. Some books like to write vectors with arrows, such as iu. Writing by hand, some like to put arrows on top of the symbol, or a tilde underneath the symbol, as in u. To refer to the entry or component that is number i in the list that is the vector v we write [v]i. (This definition contains Notation CV.) (This definition contains Notation CVC.) A Be careful with this notation. While the symbols [v]2 might look somewhat substantial, as an object this represents just one component of a vector, which is just a single complex number. Definition ZCV Zero Column Vector The zero vector of size m is the column vector of size m where each entry is the number zero, 0 0 0 = 0 0 or defined much more compactly, [0]2 = 0 for 1 < i < m. (This definition contains Notation ZCV.) A Definition CM Coefficient Matrix For a system of linear equations, a11x1 + a12x2 + a13x3 + ... + alnxn = a21x1 + a22x2 + a23x3 +- ' + a2nxn = b2 a31x1 + a32x2 + a33x3 + '.. + a3nxn = b3 amlxl + am2x2 + am3x3 + ... + amnxn = bm the coefficient matrix is the m x n matrix alla 12 a13 ... aln a21 a22 a23 ... a2n A = a31 a32 a33 ... a3n _ami am2 am3 . .. amn_ Definition VOC Vector of Constants For a system of linear equations, a11x1 +| a12x2 +| a13x3 +| '. + 'a-- lnn 6 a21x1 + a22x2 + a23x3 + . 
2n n = 2 a31x1 + a32x2 + a33x3 + ' 3n n = 3 Version 2.02  Subsection RREF.MVNSE Matrix and Vector Notation for Systems of Equations 26 Subsection RREF.MVNSE Matrix and Vector Notation for Systems of Equations 26 amlxl + am232 + am3-3 + '''- + amnin bm the vector of constants is the column vector of size m b1 b2 b = b3 bm A Definition SOLV Solution Vector For a system of linear equations, a11x1 + a12x2 + a13x3 + '. + ainxn a21x1 + a22x2 + a23x3 + ' . + a2n-In a31-11 + a32-12 + a33-13 + ' . + a3n-In amlxl + am2-T2 + am3-T3 + ' ' '-amnin bm the solution vector is the column vector of size n -x2 x = -x3 A The solution vector may do double-duty on occasion. It might refer to a list of variable quantities at one point, and subsequently refer to values of those variables that actually form a particular solution to that system. Definition MRLS Matrix Representation of a Linear System If A is the coefficient matrix of a system of linear equations and b is the vector of constants, then we will write [S(A, b) as a shorthand expression for the system of linear equations, which we will refer to as the matrix representation of the linear system. (This definition contains Notation MRLS.) A Example NSLE Notation for systems of linear equations The system of linear equations 23: + 4x2 - 3x3 + 5x4 +3:5 = 9 3xi:+32 + 4-3 3x5 = 0 -2x + 7-2 - 533 + 2:4 + 2x =-3 Version 2.02  Subsection RREF.RO Row Operations 27 has coefficient matrix 2 4 -3 5 1 A43 1 0 1 -3 -2 7 -5 2 2 and vector of constants 9 b40 .-3_ and so will be referenced as [S(A, b). Definition AM Augmented Matrix Suppose we have a system of m equations in n variables, with coefficient matrix A and vector of constants b. Then the augmented matrix of the system of equations is the m x (n + 1) matrix whose first n columns are the columns of A and whose last column (number n + 1) is the column vector b. This matrix will be written as [A b]. (This definition contains Notation AM.) A The augmented matrix represents all the important information in the system of equations, since the names of the variables have been ignored, and the only connection with the variables is the location of their coefficients in the matrix. It is important to realize that the augmented matrix is just that, a matrix, and not a system of equations. In particular, the augmented matrix does not have any "solutions," though it will be useful for finding solutions to the system of equations that it is associated with. (Think about your objects, and review Technique L [688].) However, notice that an augmented matrix always belongs to some system of equations, and vice versa, so it is tempting to try and blur the distinction between the two. Here's a quick example. Example AMAA Augmented matrix for Archetype A Archetype A [702] is the following system of 3 equations in 3 variables. X1 - z2 + 2x3 = 1 2x1 + x2 + x3 = 8 x1 + x2 = 5 Here is its augmented matrix. 1 -1 2 1 2 1 1 8 Subsection RO Row Operations An augmented matrix for a system of equations will save us the tedium of continually writing down the names of the variables as we solve the system. It will also release us from any dependence on the actual names of the variables. We have seen how certain operations we can perform on equations (Definition EO [11]) will preserve their solutions (Theorem EOPSS [12]). The next two definitions and the following theorem carry over these ideas to augmented matrices. 
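Assembling an augmented matrix as in Definition AM and Example AMAA is purely mechanical, so it is also easy to do by machine. The NumPy sketch below is an illustration, not part of the text; it glues the coefficient matrix of Archetype A to its vector of constants.

    import numpy as np

    # Coefficient matrix and vector of constants for Archetype A (Example AMAA)
    A = np.array([[1, -1, 2],
                  [2,  1, 1],
                  [1,  1, 0]])
    b = np.array([[1],
                  [8],
                  [5]])

    augmented = np.hstack([A, b])   # the m x (n + 1) matrix [A | b]
    print(augmented)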
Version 2.02  Subsection RREF.RO Row Operations 28 Definition RO Row Operations The following three operations will transform an m x n matrix into a different matrix of the same size, and each is known as a row operation. 1. Swap the locations of two rows. 2. Multiply each entry of a single row by a nonzero quantity. 3. Multiply each entry of one row by some quantity, and add these values to the entries in the same columns of a second row. Leave the first row the same after this operation, but replace the second row by the new values. We will use a symbolic shorthand to describe these row operations: 1. Ri - R3: Swap the location of rows i and j. 2. aR2: Multiply row i by the nonzero scalar a. 3. oRZ + R3: Multiply row i by the scalar a and add to row j. (This definition contains Notation RO.) A Definition REM Row-Equivalent Matrices Two matrices, A and B, are row-equivalent if one can be obtained from the other by a sequence of row operations. A Example TREM Two row-equivalent matrices The matrices 2 -1 3 4 1 1 0 6 A=5 2 -2 3 B=3 0 -2 -9 1 1 0 6_ 2 -1 3 4_ are row-equivalent as can be seen from 2 -1 3 4 1 1 0 6 5 2 -2 3]k i 5 2 -2 3 1I 1 0 6_ 2 -1 3 4_ 1 1 0 6 -2R1+R2[3 0 -2 -9 We can also say that any pair of these three matrices are row-equivalent. Notice that each of the three row operations is reversible (Exercise RREF.T10 [43]), so we do not have to be careful about the distinction between "A is row-equivalent to B" and "B is row-equivalent to A." (Exercise RREF.T11 [43]) The preceding definitions are designed to make the following theorem possible. It says that row-equivalent matrices represent systems of linear equations that have identical solution sets. Theorem REMES Row- Equivalent Matrices represent Equivalent Systems Suppose that A and B are row-equivalent augmented matrices. Then the systems of linear equations that they represent are equivalent systems. D Proof If we perform a single row operation on an augmented matrix, it will have the same effect as if we did the analogous equation operation on the corresponding system of equations. By exactly the same Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 29 methods as we used in the proof of Theorem EOPSS [12] we can see that each of these row operations will preserve the set of solutions for the corresponding system of equations. U So at this point, our strategy is to begin with a system of equations, represent it by an augmented matrix, perform row operations (which will preserve solutions for the corresponding systems) to get a "simpler" augmented matrix, convert back to a "simpler" system of equations and then solve that system, knowing that its solutions are those of the original system. Here's a rehash of Example US [14] as an exercise in using our new tools. Example USR Three equations, one solution, reprised We solve the following system using augmented matrices and row equations solved in Example US [14] using equation operations. operations. This is the same system of zi + 2X2 zi + 3X2 2x1 + 6x2 + 2x3= 4 + 3x3= 5 + 53=6 Form the augmented matrix, 1 A=1 2 2 3 6 2 3 5 4 5 6 and apply row operations, 1 -1R1+R2 0 2 1 -2R2+R3 0 0 So the matrix 2 1 6 2 1 0 2 1 5 2 1 4] 1 6] 1 -2R1+R3 0 0 1 0 0 2 1 2 2 1 0 2 1 1 2 1 1 4 1 -2 4 1 4_ 4] L1 1 -4_ 1 B= 0 0 is row equivalent to A and by Theorem REMES [28] set as the original system of equations. 
2 2 4 1 1 1 0 1 4 the system of equations below has the same solution zi +22x2 + 2x3 = 4 x2 +x:3 = 1 x3 = 4 Solving this "simpler" system is straightforward and is identical to the process in Example US [14]. Subsection RREF Reduced Row-Echelon Form 0 The preceding example amply illustrates the definitions and theorems we have seen so far. But it still leaves two questions unanswered. Exactly what is this "simpler" form for a matrix, and just how do we get it? Here's the answer to the first question, a definition of reduced row-echelon form. Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 30 Definition RREF Reduced Row-Echelon Form A matrix is in reduced row-echelon form if it meets all of the following conditions: 1. A row where every entry is zero lies below any row that contains a nonzero entry. 2. The leftmost nonzero entry of a row is equal to 1. 3. The leftmost nonzero entry of a row is the only nonzero entry in its column. 4. Consider any two different leftmost nonzero entries, one located in row i, column j and the other located in row s, column t. If s > i, then t > j. A row of only zero entries will be called a zero row and the leftmost nonzero entry of a nonzero row will be called a leading 1. The number of nonzero rows will be denoted by r. A column containing a leading 1 will be called a pivot column. The set of column indices for all of the pivot columns will be denoted by D = {di, d2, d3, ..., dr} where di < d2 < d3 < ... < dr, while the columns that are not pivot columns will be denoted as F = {fi, f2, f3, ..., f,-r} where fi < f2 < f3 < --- dl. By an entirely similar argument, reversing the roles of B and C, we could conclude that d1 < d'. Together this means that d1= d'. Second Step. Suppose that we have determined that d1= dl, d2 = d2, d3 = d',..., d = dl. Let's now show that dp+1 = d+1. Working towards a contradiction, suppose that dp+1 < d'+1. For 1 < £ < p, 0 = [B]p+lde Definition RREF [30] m SP+,k [C]kd k=1 mm op1, [C]gd/ + >3 +1±,k [C]kd Property CACN [680] k=1 m = y+,(1) + | P+l,k(0) Definition RREF [30] k=l k#e = by+1,e Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 34 Now, 1=[B]p~ldp±l m -E S p+1,k [C]kd+±, k~l p -E S p+1,k [C]kd+±, + k~l Definition RREF [30] m S ap+1,k [Clkd~±, k~p+1 Property AACN [680] p m - (o) [Clkd±+1 + 5E op±1,k [Clkd+1 k=1 k 1~ m -ES a~4,k [Clkd+±1 k~p+1 m k~p+1lAko This contradiction shows that dp+1 > p+ By an entirely dp±i d+1, and therefore dp+ ±i pl Third Step. Now we establish that r = r'. Suppose that l 2 =dd 3 . r l o 0 - [B] rd, m - 5 ark [C]kde k=1 _ Sark [Ckde + S rk [C>kd k=1 k '+1 _ Sark [Ckd + 5 ark (0) k=1 k~r'+l d,±i < di similar argument, we could conclude that Ir' < r. By the arguments above know that Definition RREF [30] Property AACN [680] Property AACN [680] 5 ark [C]kde k=1 5 ark [C]kie k=1 - arf [C].ed + 5 ark [C] kci k=1 - arf (1) + >3 ar k(0) k=1 Now examine the entries of row r of B, M [B] rj 5aE rk [C]kj k=1 Property CACN [680] Definition RREF [30] Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 35 r' m ork[C]kj+ S+rk[C]kj k=1 k=r'+1 r' m S ork[C]kj + 5 ork(0) k=1 k=r'+1 Property CACN [680] Definition RREF [30] = ork [C]k k=1 r' = E(0) [C]kg k=1 So row r is a totally zero row, contradicting that this should be the bottommost nonzero row of B. So r' > r. By an entirely similar argument, reversing the roles of B and C, we would conclude that r' < r and therefore r = r'. Thus, combining the first three steps we can say that D = D'. 
In other words, B and C have the same pivot columns, in the same locations. Fourth Step. In this final step, we will not argue by contradiction. Our intent is to determine the values of the og3. Notice that we can use the values of the d2 interchangeably for B and C. Here we go, 1 = [B]d. m = aik[C]kd, k=1 m = [C] 2d2 + 5 ok [C]kd, k=1 k#i m =22(1) + 5 ok(0) k=1 k#i Definition RREF [30] Property CACN [680] Definition RREF [30] and for £ # i 0 = [B]id, m = Sik [C]kd, k=1 m = ort [C] c, + 5 ik [C]kd, k=1 k#2P m = bee(1) + 5 a6k(0) k=1 k#2 Definition RREF [30] Property CACN [680] Definition RREF [30] Finally, having determined the values of the ojj, we can show that B = C. For 1 < i < m, 1 < j (0) [Clkj k=1 k#i Property CACN [680] So B and C have equal values in every entry, and so are the same matrix. 0 We will now run through some examples of using these definitions and theorems to solve some systems of equations. From now on, when we have a matrix in reduced row-echelon form, we will mark the leading l's with a small box. In your work, you can box 'em, circle 'em or write 'em in a different color just identify 'em somehow. This device will prove very useful later and is a very good habit to start developing right now. Example SAB Solutions for Archetype B Let's find the solutions to the following system of equations, -7 - 6x2 - 12x3 =-33 5xi + 5x2 + 7x3= 24 zi +4x3= 5 First, form the augmented matrix, -7 -6 5 5 1 0 and work to reduced row-echelon form, first with i -12 7 4 -33 24 5 1, 1 R1<-+R3 5 [-7 7R1+R3 0 0 0 5 -6 0 5 -6 4 7 -12 4 -13 16 5] 24 -33] 5 -1 2_ 1 -5R1+R2 0 [-7 0 5 -6 4 -13 -12 5 -1 -33_ Now, with i = 2, 11 1 R2: 0 _0 0 1 -6 4 -13 5 16 5] 2i 2]_ And finally, with i = 3, 10 4 5 6R2+R3 0 o ii -13 i 00 5 _ 0 0 j2 j4 13 j 0 4 5] 5R3+R2 - 0 0 5 [0 0 1 2 10 4 5] -R 2 3 0 -13 -1 0 0 1 2] 1 0 0 -3 -4R3+R1: 0W1 0 5 0 0 W 2] Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 37 This is now the augmented matrix of a very simple system of equations, namely x1 = -3, x2 = 5, x3 = 2, which has an obvious solution. Furthermore, we can see that this is the only solution to this system, so we have determined the entire solution set, --3 S = 5 You might compare this example with the procedure we used in Example US [14]. Archetypes A and B are meant to contrast each other in many respects. So let's solve Archetype A now. Example SAA Solutions for Archetype A Let's find the solutions to the following system of equations, x1 - z2 + 2x3 = 1 2x1 + x2 + x3 = 8 x1 + x2 = 5 First, form the augmented matrix, 1 -1 2 1 2 1 1 8 1 1 0 5 and work to reduced row-echelon form, first with i = 1, 1 -1 2 1 1 -1 2 1 -2R1+R2, 0 3 -3 6 -1R1+R3> 0 3 -3 6 1 1 0 5_ _ 0 2 -2 4 Now, with i = 2, 1 1 -1 2 1 0 1 3 - > 0 1 -1 2- > 0 1 -1 2 0 2 -2 4 0 2 -2 4] 1 0 1 3 -2R2+R3 0 -1 2 The system of equations represented by this augmented matrix needs to be considered a bit differently than that for Archetype B. First, the last row of the matrix is the equation 0 =0, which is always true, so it imposes no restrictions on our possible solutions and therefore we can safely ignore it as we analyze the other two equations. These equations are, X1 + X3 =3 z2 - X3 =2. While this system is fairly easy to solve, it also appears to have a multitude of solutions. For example, choose x3 =1 and see that then xi= 2 and x2 = 3 will together form a solution. Or choose x3 = 0, and then discover that xi = 3 and x2 = 2 lead to a solution. 
Try it yourself: pick any value of x3 you please, and figure out what xi and x2 should be to make the first and second equations (respectively) true. We'll Version 2.02  Subsection RREF.RREF Reduced Row-Echelon Form 38 wait while you do that. Because of this behavior, we say that x3 is a "free" or "independent" variable. But why do we vary x3 and not some other variable? For now, notice that the third column of the augmented matrix does not have any leading 1's in its column. With this idea, we can rearrange the two equations, solving each for the variable that corresponds to the leading 1 in that row. X2 = 2+x3 To write the set of solution vectors in set notation, we have S=[2+x3 33ECC We'll learn more in the next section about systems with infinitely many solutions and how to express their solution sets. Right now, you might look back at Example IS [15]. Example SAE Solutions for Archetype E Let's find the solutions to the following system of equations, 2x1 + 12 + 733 -3xi + 4x2 - 5z3 zi + 12 +413 - 7x4 =2 - 6x4 = 3 - 5x4 =2 First, form the augmented matrix, 2 1 -3 4 1 1 and work to reduced row-echelon form, first with i 7 -5 4 -7 -6 -5 2 3 2 1, 1 R1<-+R3>-3 2 -2R1 +R3 0 1 4 1 1 7 -1 4 -5 7 4 7 -1 Now, with i = 2, -5 2 -6 3 -7 2_ -5 2 -21 9 3 -2_ -5 2 3 -2 -21 9_ 0 2 9_ 1 3R1+R2 0 2 1 4 7 7 1 7 -5 -21 -7 2 9 2] [Eli R2-R3 0 0 -1R2+R1 0 0 1 -1 7 0 3 1 1 7 7 4 -1 7 S1 -1R2 ri - 2 0 1 0 7 1 0 -7R2+R3 0 1 0 0 4 1 7 -5 -3 21 -2 -3 0 2 2 9] 0 2 -5_ -2 -3 -21 3 1 0 And finally, with i = 3, 11 0 -1R 1 0 -R3A 0 R 0 0 3 1 0 -2 -3 0 01 2 1_ 1 0 3 -2R3+R2 0o 1 0 0 0 -2 -3 0 0 0 1 Version 2.02  Subsection RREF.READ Reading Questions 39 Let's analyze the equations in the system represented by this augmented matrix. The third equation will read 0 = 1. This is patently false, all the time. No choice of values for our variables will ever make it true. We're done. Since we cannot even make the last equation true, we have no hope of making all of the equations simultaneously true. So this system has no solutions, and its solution set is the empty set, 0 = { } (Definition ES [683]). Notice that we could have reached this conclusion sooner. After performing the row operation -7R2 + R3, we can see that the third equation reads 0 =-5, a false statement. Since the system represented by this matrix has no solutions, none of the systems represented has any solutions. However, for this example, we have chosen to bring the matrix fully to reduced row-echelon form for the practice. These three examples (Example SAB [36], Example SAA [37], Example SAE [38]) illustrate the full range of possibilities for a system of linear equations no solutions, one solution, or infinitely many solutions. In the next section we'll examine these three scenarios more closely. Definition RR Row-Reducing To row-reduce the matrix A means to apply row operations to A and arrive at a row-equivalent matrix B in reduced row-echelon form. A So the term row-reduce is used as a verb. Theorem REMEF [30] tells us that this process will always be successful and Theorem RREFU [32] tells us that the result will be unambiguous. Typically, the analysis of A will proceed by analyzing B and applying theorems whose hypotheses include the row-equivalence of A and B. After some practice by hand, you will want to use your favorite computing device to do the computations required to bring a matrix to reduced row-echelon form (Exercise RREF.C30 [42]).See: Computation RR.MMA [667] Computation RR.T186 [672] Computation RR.T183 [673] Computation RR.SAGE [675] . 
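As the paragraph above suggests, once you have row-reduced a few matrices by hand you will want a computing device to do the work. The book points to Mathematica, the TI calculators and Sage; the Python/SymPy sketch below is one possible alternative (an illustration, not part of the text). It row-reduces the three augmented matrices from Examples SAB, SAA and SAE and reports the pivot columns.

    from sympy import Matrix

    systems = {
        "Archetype B": Matrix([[-7, -6, -12, -33],
                               [5, 5, 7, 24],
                               [1, 0, 4, 5]]),
        "Archetype A": Matrix([[1, -1, 2, 1],
                               [2, 1, 1, 8],
                               [1, 1, 0, 5]]),
        "Archetype E": Matrix([[2, 1, 7, -7, 2],
                               [-3, 4, -5, -6, 3],
                               [1, 1, 4, -5, 2]]),
    }

    for name, M in systems.items():
        R, pivots = M.rref()       # reduced row-echelon form and pivot column indices
        print(name, "pivot columns:", pivots)
        print(R)

    # Archetype B: pivots in the first three columns, so a unique solution (-3, 5, 2).
    # Archetype A: only two pivot columns, so infinitely many solutions.
    # Archetype E: a pivot in the final column, so the system has no solutions.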
Subsection READ Reading Questions 1. Is the matrix below in reduced row-echelon form? Why or why not? 1 5 0 6 8 0 0 1 2 0 0 0 0 0 1 2. Use row operations to convert the matrix below to reduced row-echelon form and report the final matrix. 3. Find all the solutions to the system below by using an augmented matrix and row operations. Report your final matrix in reduced row-echelon form and the set of solutions. 2xi + 3x2 - z3=0 Xi + 2x2 + X3 =3 zi + 3x2 + 3x3 = 7 Version 2.02  Subsection RREF.EXC Exercises 40 Subsection EXC Exercises C05 Each archetype below is a system of equations. Form the augmented matrix of the system of equations, convert the matrix to reduced row-echelon form by using equation operations and then describe the solution set of the original system of equations. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] Archetype E [720] Archetype F [724] Archetype G [729] Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer For problems C10-C19, find all solutions to the system of linear equations. Use your favorite computing device to row-reduce the augmented matrices for the systems, and write the solutions as a set, using correct set notation. C10 2x1 - 3x2 + x3 + 7x4 =14 2xi + 8x2 - 4x3 + 5x4 = -1 xi + 3X2 - 3x3= 4 -5x1 + 2x2 + 3x3 + 4x4 = -19 Contributed by Robert Beezer Solution [44] C11 3xi + 4x2 - 33 + 2X4 = 6 xi-2x2+3x3+x4=2 10x2 - 10x3 - x4 =1 Contributed by Robert Beezer Solution [44] C12 237i + 42 + 5373 + 7374 =-26 z7i + 2xv2 +373 - 374 =-4 -23i - 472 +33+114 = 10 Contributed by Robert Beezer Solution [44] C13 zi + 2x2 + 8x3 - 7x4 = -2 Version 2.02  Subsection RREF.EXC Exercises 41 3xi + 2x2 + 12x3 - 5x4 = 6 -Xi + 12 + 3 - 514 = - 10 Contributed by Robert Beezer Solution [45] C14 2xi +3:2 + 733 - 2x4 = 4 3xi - 2x2 + 1134 =13 i +3:2 + 5x3 - 3x4 = 1 Contributed by Robert Beezer Solution [45] C15 231 + 3x2 - 33 - 934 = -16 zi + 2x2 +3:3 = 0 -zi + 2x2+3x3+4x4=8 Contributed by Robert Beezer Solution [45] C16 231 + 3X2 + 1933 - 4x4 = 2 xi + 2x2 + 12x3 - 334 = 1 -zi + 2x2 +8x3 - 534=1 Contributed by Robert Beezer Solution [46] C17 -zi + 52 = -8 -2xi + 532 + 533 + 2x4 = 9 -3xi - 32 + 3x3 +3:4 = 3 7zi + 6x2 + 533 +3:4 = 30 Contributed by Robert Beezer Solution [46] C18 zi + 2x2 - 4x3 - 34 = 32 zi + 3x2 - 733 - x5 = 33 xi+ 2x3-2x4 +3x5=22 Contributed by Robert Beezer Solution [46] Version 2.02  Subsection RREF.EXC Exercises 42 C19 2xi + x2 = 6 - 2 = -2 3xi + 4x2 = 4 3xi + 5x2 = 2 Contributed by Robert Beezer Solution [47] For problems C30-C33, row-reduce the matrix without the aid of a calculator, indicating the row operations you are using at each step using the notation of Definition RO [28]. C30 2 1 5 1 -3 -1 4 -2 6 10 -2 12 Contributed by Robert Beezer Solution [47] C31 [ 1 2 -3 -1 -2 1 -4 -3 -7 Contributed by Robert Beezer Solution [47] C32 1 1 -4 -3 3 2 1 -2 1 Contributed by Robert Beezer Solution [48] C33 1 2 -1 2 4 -2 -1 -1 3 -1 4 5_ Contributed by Robert Beezer Solution [48] M40 Consider the two 3 x 4 matrices below 1 B = -1 -1 3 -2 2] -2 -1 -1 -5 8 -3] 1 2 1 2 C= 1 1 4 0 -1 -1 -4 1_ (a) Row-reduce each matrix and determine that the reduced row-echelon forms of B and C are identical. From this argue that B and C are row-equivalent. (b) In the proof of Theorem RREFU [32], we begin by arguing that entries of row-equivalent matrices are related by way of certain scalars and sums. 
In this example, we would write that entries of B from row i that are in column j are linearly related to the entries of C in column j from all three rows [B]ig = Si1 [C]1 + ai2 [C]2j +ai3 [C]3 1 z 4 E C 7 - 4] Version 2.02  Subsection RREF.SOL Solutions 46 C16 Contributed by Robert Beezer Statement [41] The augmented matrix of the system of equations is 2 3 19 -4 2 1 2 12 -3 1 -1 2 8 -5 1 which row-reduces to 1 0 2 1 0 0 5 -2 0 0 0 0 0 1_ With a leading one in the last column Theorem RCLS [53] tells us the system of equations is inconsistent, so the solution set is the empty set, 0 = {}. C17 Contributed by Robert Beezer Statement [41] We row-reduce the augmented matrix of the system of equations, -1 5 0 0 -8 0 0 0 3 -2 5 5 2 9 RREF 0 FI0 0 -1 -3 -1 3 1 3 0 0 [ 0 2 7 6 5 1 30_0 0 0[ 5 The reduced row-echelon form of the matrix is the augmented matrix of the system xi= 3, x2 -1, x3 = 2, x4 = 5, which has a unique solution. As a set of column vectors, the solution set is S = C18 Contributed by Robert Beezer Statement [41] We row-reduce the augmented matrix of the system of equations, 1 2 -4 -1 0 32 [1 0 2 0 5 6 1 3 -7 0 -1 33 RREF: 0[ -3 0 -2 9 1 0 2 -2 3 220 0 0 0 1 -8] With no leading 1 in the final column, we recognize the system as consistent (Theorem RCLS [53]). Since the system is consistent, we compute the number of free variables as n - r = 5 - 3 = 2 (), and we see that columns 3 and 5 are not pivot columns, so x3 and 5 are free variables. We convert each row of the reduced row-echelon form of the matrix into an equation, and solve it for the lone dependent variable, as in expression in the two free variables. xi +2x3 + 5x5=6 -a x1 =6 -2x3- 5x5 x2 -3x3 -2x5 = 9 > x2-=9 +3x3+--2.x5 X4 + 5 =-8 -~ X4 =-8 - 5 These expressions give us a convenient way to describe the solution set, S. 6 - 2x3 - 5x5 9 + 3x3 + 2x5 S= X3 |x3,x5EC -8- x5 - 5 _ Version 2.02  Subsection RREF.SOL Solutions 47 C19 Contributed by Robert Beezer Statement [42] We form the augmented matrix of the system, 2 1 6 -1 -1 -2 3 4 4 3 5 2 which row-reduces to 0 4 0 -2 0 0 0 0 0 0_ With no leading 1 in the final column, this system is consistent (Theorem RCLS [53]). There are n = 2 variables in the system and r = 2 non-zero rows in the row-reduced matrix. By Theorem FVCS [55], there are n - r = 2 - 2 = 0 free variables and we therefore know the solution is unique. Forming the system of equations represented by the row-reduced matrix, we see that xi = 4 and x2 =-2. 
Written as set of column vectors, S [= ] { [4] C30 Contributed by Robert Beezer Statement [42] 2 1 4 1 -2R1+R2 4 1 0 -10R2+R3 _ 0 1 -3 -2 -3 7 -2 -3 5 -1 6 -1 7 6 -1 101 -2 12] -2] 14 12 -2] 1 R1<-R2 2 4 1 _4R1 +R3 0 0 1 3R2+R1 0 0 -3 -1 1 5 -2 6 -3 -1 7 7 10 10 0 2 1 1 10 10 -2 10 12 -2 14 20] 4 2 20_ 1 1 2 10 10 2 0 2 4 [ 1 2 0 0 0 0_ C31 Contributed by Robert Beezer Statement [42] 1 2 -4 -3 -1 -3 -2 1 -7_- 1 2 -4 2R1 +R3 II 10 5 -15 0 5 -15_ 1 0 2] -2R2+R1 0 1 -3 0 5 -15_ 1 2 -4 3R1+R2 0 5 -15 -21 -7] 11 2 -4 S0 1 -3 0 5 -15 1 0 2 -5R2+R3 0 []-3 0 0 0 Version 2.02  Subsection RREF.SOL Solutions 48 C32 Contributed by Robert Beezer Statement [42] Following the algorithm of Theorem REMEF [30], and working to create pivot columns from left to right, we have 1 1 1 - 4 - 3 - 2 4R1 +R2 3 2 1 1 0 -1 0 1 2 1R2+R3 0 -1 -2 ~1 1 1 0 1 2 l 3R1+R3 3 2 1 1 0 -1 0 L 2 0 0 0 1 1 1 0 1 2 -1R2+R1 0 -1 -2 C33 Contributed by Robert Beezer Statement [42] Following the algorithm of Theorem REMEF [30], and working to create pivot columns we have from left to right, 1 2 2 4 -1 -2 1 0 _0 -1 -1 3 2 0 0 1 0 2 -11 4 -2R1+R2 5] 5 6 -2R2+R3: 4 L 1 2 -1 -1 1 2 -1 -1 S 0 1 6 1R1+R3 0 0 1 6 1R2+R1 -1 -2 3 5 0 0 2 4 1 2 0 5 _1 1 2 0 5 0 0 1 6 Rs 0 0 ri6 6Rs+R2 0 0 0 -8_ 0 0 0 1_ 2 0 0 0 0 L 0 0 0 0 1 1 2 0 5] 00W ~ -5R3+R1 0 0 0 1] M40 Contributed by Robert Beezer Statement [42] (a) Let R be the common reduced row-echelon form of B and C. A sequence of row operations converts B to R and a second sequence of row operations converts C to R. If we "reverse" the second sequence's order, and reverse each individual row operation (see Exercise RREF.T10 [43]) then we can begin with B, convert to R with the first sequence, and then convert to C with the reversed sequence. Satisfying Definition REM [28] we can say B and C are row-equivalent matrices. (b) We will work this carefully for the first row of B and just give the solution for the next two rows. For row 1 of B take i = 1 and we have [B]1j = 611 [C]j + a12 [C]2j + a13 [C]3j 1 0 free variables, corresponding to columns of B without a leading 1, excepting the final column, which also does not contain a leading 1 by Theorem RCLS [53]. By varying the values of the free variables suitably, we can demonstrate infinitely many solutions. Subsection FV Free Variables The next theorem simply states a conclusion from the final paragraph of the previous proof, allowing us to state explicitly the number of free variables for a consistent system. Theorem FVCS Free Variables for Consistent Systems Suppose A is the augmented matrix of a consistent system of linear equations with n variables. Suppose also that B is a row-equivalent matrix in reduced row-echelon form with r rows that are not completely zeros. Then the solution set can be described with n - r free variables. D Proof See the proof of Theorem CSRN [54]. U Example CFV Counting free variables For each archetype that is a system of equations, the values of n and r are listed. Many also contain a few sample solutions. We can use this information profitably, as illustrated by four examples. 1. Archetype A [702] has n = 3 and r = 2. It can be seen to be consistent by the sample solutions given. Its solution set then has n - r =1 free variables, and therefore will be infinite. 2. Archetype B [707] has n = 3 and r = 3. It can be seen to be consistent by the single sample solution given. Its solution set can then be described with n - r = 0 free variables, and therefore will have just the single solution. 3. 
Archetype H [733] has n = 2 and r = 3. In this case, r = n + 1, so Theorem ISRN [54] says the system is inconsistent. We should not try to apply Theorem FVCS [55] to count free variables, since the theorem only applies to consistent systems. (What would happen if you did?) 4. Archetype E [720] has n = 4 and r = 3. However, by looking at the reduced row-echelon form of the augmented matrix, we find a leading 1 in row 3, column 4. By Theorem RCLS [53] we recognize the system as inconsistent. (Why doesn't this example contradict Theorem ISRN [54]?) We have accomplished a lot so far, but our main goal has been the following theorem, which is now very simple to prove. The proof is so simple that we ought to call it a corollary, but the result is important enough that it deserves to be called a theorem. (See Technique LC [696].) Notice that this theorem was presaged first by Example TTS [10] and further foreshadowed by other examples. Theorem PSSLS Possible Solution Sets for Linear Systems A system of linear equations has no solutions, a unique solution or infinitely many solutions. Q Proof By its definition, a system is either inconsistent or consistent (Definition CS [50]). The first case describes systems with no solutions. For consistent systems, we have the remaining two possibilities as Version 2.02  Subsection TSS.FV Free Variables 56 guaranteed by, and described in, Theorem CSRN [54]. U Here is a diagram that consolidates several of our theorems from this section, and which is of practical use when you analyze systems of equations. Theorem RCLS no leading 1 in a leading 1 in column n + 1 column n + 1 Consistent Inconsistent Theorem FVCS r m, then the system has infinitely many solutions. D Proof Suppose that the augmented matrix of the system of equations is row-equivalent to B, a matrix in reduced row-echelon form with r nonzero rows. Because B has m rows in total, the number that are nonzero rows is less. In other words, r < m. Follow this with the hypothesis that n> m and we find that the system has a solution set described by at least one free variable because n- r > n - m>0. A consistent system with free variables will have an infinite number of solutions, as given by Theorem CSRN [54]. Notice that to use this theorem we need only know that the system is consistent, together with the values of m and n. We do not necessarily have to compute a row-equivalent reduced row-echelon form matrix, even though we discussed such a matrix in the proof. This is the substance of the following example. Example OSGMD One solution gives many, Archetype D Archetype D is the system of m= 3 equations in nr= 4 variables, 2xi + z2 + 7X3 - 7X4 =8 -3xi + 4x2 - 5z3 - 6x4 =--12 zi + z2 + 4x3 - 5z4 = 4 and the solution zi1 0, X2 =1, X3 =2, z4 1 can be checked easily by substitution. Having been handed this solution, we know the system is consistent. This, together with n~ > m, allows us to apply Theorem CMVEI [56] and conclude that the system has infinitely many solutions. These theorems give us the procedures and implications that allow us to completely solve any system of linear equations. The main computational tool is using row operations to convert an augmented matrix Version 2.02  Subsection TSS.READ Reading Questions 57 into reduced row-echelon form. Here's a broad outline of how we would instruct a computer to solve a system of linear equations. 1. Represent a system of linear equations by an augmented matrix (an array is the appropriate data structure in most computer languages). 2. 
Convert the matrix to a row-equivalent matrix in reduced row-echelon form using the procedure from the proof of Theorem REMEF [30]. 3. Determine r and locate the leading 1 of row r. If it is in column n+ 1, output the statement that the system is inconsistent and halt. 4. With the leading 1 of row r not in column n + 1, there are two possibilities: (a) r = n and the solution is unique. It can be read off directly from the entries in rows 1 through n of column n + 1. (b) r n. If we try (incorrectly!) to apply Theorem FVCS [55] to such a system, how many free variables would we discover? Contributed by Robert Beezer Solution [60] T40 Suppose that the coefficient matrix of a consistent system of linear equations has two columns that are identical. Prove that the system has infinitely many solutions. Contributed by Robert Beezer Solution [60] T41 Consider the system of linear equations [S(A, b), and suppose that every element of the vector of Version 2.02  Subsection TSS.EXC Exercises 59 constants b is a common multiple of the corresponding element of a certain column of A. More precisely, there is a complex number a, and a column index j, such that [b] = a [A]ij for all i. Prove that the system is consistent. Contributed by Robert Beezer Solution [60] Version 2.02  Subsection TSS.SOL Solutions 60 Subsection SOL Solutions M45 Contributed by Robert Beezer Statement [58] Demonstrate that the system is consistent by verifying any one of the four sample solutions provided. Then because n = 9 > 6 = m, Theorem CMVEI [56] gives us the conclusion that the system has infinitely many solutions. Notice that we only know the system will have at least 9 - 6 = 3 free variables, but very well could have more. We do not know know that r = 6, only that r < 6. M51 Contributed by Robert Beezer Statement [58] Consistent means there is at least one solution (Definition CS [50]). It will have either a unique solution or infinitely many solutions (Theorem PSSLS [55]). M52 Contributed by Robert Beezer Statement [58] With 6 rows in the augmented matrix, the row-reduced version will have r < 6. Since the system is consistent, apply Theorem CSRN [54] to see that n - r > 2 implies infinitely many solutions. M53 Contributed by Robert Beezer Statement [58] The system could be inconsistent. If it is consistent, then because it has more variables than equations Theorem CMVEI [56] implies that there would be infinitely many solutions. So, of all the possibilities in Theorem PSSLS [55], only the case of a unique solution can be ruled out. M54 Contributed by Robert Beezer Statement [58] The system could be inconsistent. If it is consistent, then Theorem CMVEI [56] tells us the solution set will be infinite. So we can be certain that there is not a unique solution. M56 Contributed by Robert Beezer Statement [58] The system could be inconsistent. If it is consistent, and since 12 > 6, then Theorem CMVEI [56] says we will have infinitely many solutions. So there are two possibilities. Theorem PSSLS [55] allows to state equivalently that a unique solution is an impossibility. M57 Contributed by Robert Beezer Statement [58] 7 pivot columns implies that there are r = 7 nonzero rows (so row 8 is all zeros in the reduced row-echelon form). Then n + 1 = 6 + 1 = 7 = r and Theorem ISRN [54] allows to conclude that the system is inconsistent. T10 Contributed by Robert Beezer Statement [58] Theorem FVCS [55] will indicate a negative number of free variables, but we can say even more. 
If r > n, then the only possibility is that r =rn + 1, and then we compute n~ - r =n - (nt + 1) =-1 free variables. T40 Contributed by Robert Beezer Statement [58] Since the system is consistent, we know there is either a unique solution, or infinitely many solutions (Theorem PSSLS [55]). If we perform row operations (Definition RO [28]) on the augmented matrix of the system, the two equal columns of the coefficient matrix will suffer the same fate, and remain equal in the final reduced row-echelon form. Suppose both of these columns are pivot columns (Definition RREF [30]). Then there is single row containing the two leading 1's of the two pivot columns, a violation of reduced row-echelon form (Definition RREF [30]). So at least one of these columns is not a pivot column, and the column index indicates a free variable in the description of the solution set (Definition IDV [52]). With a free variable, we arrive at an infinite solution set (Theorem FVCS [55]). T41 Contributed by Robert Beezer Statement [58] The condition about the multiple of the column of constants will allow you to show that the following Version 2.02  Subsection TSS.SOL Solutions 61 values form a solution of the system IJS(A, b), x1 = 0 2 =0 ... zg_1 = 0 zy = a j+1 =0 ... zn_1 = 0 n= 0 With one solution of the system known, we can say the system is consistent (Definition CS [50]). A more involved proof can be built using Theorem RCLS [53]. Begin by proving that each of the three row operations (Definition RO [28]) will convert the augmented matrix of the system into another matrix where column j is a times the entry of the same row in the last column. In other words, the "column multiple property" is preserved under row operations. These proofs will get successively more involved as you work through the three operations. Now construct a proof by contradiction (Technique CD [692]), by supposing that the system is incon- sistent. Then the last column of the reduced row-echelon form of the augmented matrix is a pivot column (Theorem RCLS [53]). Then column j must have a zero in the same row as the leading 1 of the final column. But the "column multiple property" implies that there is an a in column j in the same row as the leading 1. So a = 0. By hypothesis, then the vector of constants is the zero vector. However, if we began with a final column of zeros, row operations would never have created a leading 1 in the final column. This contradicts the final column being a pivot column, and therefore the system cannot be inconsistent. Version 2.02  Section HSE Homogeneous Systems of Equations 62 Section HSE Homogeneous Systems of Equations In this section we specialize to systems of linear equations where every equation has a zero as its constant term. Along the way, we will begin to express more and more ideas in the language of matrices and begin a move away from writing out whole systems of equations. The ideas initiated in this section will carry through the remainder of the course. Subsection SHS Solutions of Homogeneous Systems As usual, we begin with a definition. Definition HS Homogeneous System A system of linear equations, IJS(A, b) is homogeneous if the vector of constants is the zero vector, in other words, b = 0. A Example AHSAC Archetype C as a homogeneous system For each archetype that is a system of equations, we have formulated a similar, yet different, homogeneous system of equations by replacing each equation's constant term with a zero. 
To wit, for Archetype C [712], we can convert the original system of equations into the homogeneous system, 2xi - 3X2 + X3 - 6x4 = 0 4x1 + x2 + 2x3 + 9X4 = 0 3x1+x2+x3+8X4 =0 Can you quickly find a solution to this system without row-reducing the augmented matrix? As you might have discovered by studying Example AHSAC [62], setting each variable to zero will always be a solution of a homogeneous system. This is the substance of the following theorem. Theorem HSC Homogeneous Systems are Consistent Suppose that a system of linear equations is homogeneous. Then the system is consistent. Q Proof Set each variable of the system to zero. When substituting these values into each equation, the left-hand side evaluates to zero, no matter what the coefficients are. Since a homogeneous system has zero on the right-hand side of each equation as the constant term, each equation is true. With one demonstrated solution, we can call the system consistent.U Since this solution is so obvious, we now define it as the trivial solution. Definition TSHSE Trivial Solution to Homogeneous Systems of Equations Suppose a homogeneous system of linear equations has n~ variables. The solution zi1 0, X2 =0,. . ., on=0 (i.e. x =0) is called the trivial solution. A Here are three typical examples, which we will reference throughout this section. Work through the row operations as we bring each to reduced row-echelon form. Also notice what is similar in each example, and what differs. Version 2.02  Subsection HSE.SHS Solutions of Homogeneous Systems 63 Example HUSAB Homogeneous, unique solution, Archetype B Archetype B can be converted to the homogeneous system, -11x1 + 2x2 - 14x3 = 0 23xi- 6x2 + 33x3 = 0 14xi - 22 + 173= 0 whose augmented matrix row-reduces to 1 0 0 0- 0 [-1 0 0 0 0 M1 0_ By Theorem HSC [62], the system is consistent, and so the computation n - r = 3 - 3 = 0 means the solution set contains just a single solution. Then, this lone solution must be the trivial solution. Example HISAA Homogeneous, infinite solutions, Archetype A Archetype A [702] can be converted to the homogeneous system, 1- x2 + 2x3 = 0 2x1+xz2+z3 = 0 23: + z2 + =: 0 3:1+3:2 =0 whose augmented matrix row-reduces to 1 0 1 0 0 [-1 -1 0 0 0 0 0] By Theorem HSC [62], the system is consistent, and so the computation n - r = 3 - 2 = 1 means the solution set contains one free variable by Theorem FVCS [55], and hence has infinitely many solutions. We can describe this solution set using the free variable 33, S = x2 |Xz = -X3, x2 = X3 = x3 |X3 E C Geometrically, these are points in three dimensions that lie on a line through the origin.17 Example HISAD Homogeneous, infinite solutions, Archetype D Archetype D [716] (and identically, Archetype E [720]) can be converted to the homogeneous system, 23:1 + z2 + 73:3 - 73:4 - 0 -3x1 +43:2- 533-63:4=0 zi +3:2 +43:3 - 53:4 =0 whose augmented matrix row-reduces to 1 0 3 -2 0 0 1 -3 0 0 0 0 0 0] Version 2.02  Subsection HSE.NSM Null Space of a Matrix 64 By Theorem HSC [62], the system is consistent, and so the computation n - r = 4 - 2 = 2 means the solution set contains two free variables by Theorem FVCS [55], and hence has infinitely many solutions. We can describe this solution set using the free variables x3 and x4, S |=3 +1 = -3x3 + 2X4,:42 = -z3 + 3X4 x3 _4_ -3x3 + 2X4 [ _ 3x~34j 3, x4 E C X3 After working through these examples, you might perform the same computations for the slightly larger example, Archetype J [741]. 
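As an aside, the computations of Example HISAD are easy to reproduce by machine. The sketch below (Python with sympy assumed, as before, purely for illustration) row-reduces just the coefficient matrix, which is legitimate for a homogeneous system for the reason explained in the paragraph that follows, and then checks one nontrivial solution built from the free variables x3 and x4.

```python
# Reproducing Example HISAD by machine.  For a homogeneous system we may
# row-reduce just the coefficient matrix (see the following paragraph).
from sympy import Matrix

D = Matrix([[ 2, 1,  7, -7],
            [-3, 4, -5, -6],
            [ 1, 1,  4, -5]])

R, pivots = D.rref()
print(R)        # Matrix([[1, 0, 3, -2], [0, 1, 1, -3], [0, 0, 0, 0]])
print(pivots)   # (0, 1): x1, x2 are dependent, x3 and x4 are free

# One nontrivial solution from the description x1 = -3*x3 + 2*x4, x2 = -x3 + 3*x4,
# choosing x3 = 1 and x4 = 1:
x = Matrix([-1, 2, 1, 1])
print(D * x)    # Matrix([[0], [0], [0]])
```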
Notice that when we do row operations on the augmented matrix of a homogeneous system of linear equations, the last column of the matrix is all zeros. Any one of the three allowable row operations will convert zeros to zeros, and thus the final column of the matrix in reduced row-echelon form will also be all zeros. So in this case we may just as well reference only the coefficient matrix, and simply remember that the final column begins as all zeros and remains all zeros after any number of row operations.

Example HISAD [63] suggests the following theorem.

Theorem HMVEI Homogeneous, More Variables than Equations, Infinite solutions
Suppose that a homogeneous system of linear equations has m equations and n variables with n > m. Then the system has infinitely many solutions.

Proof: We are assuming the system is homogeneous, so Theorem HSC [62] says it is consistent. Then the hypothesis that n > m, together with Theorem CMVEI [56], gives infinitely many solutions.

Example HUSAB [63] and Example HISAA [63] are concerned with homogeneous systems where n = m and expose a fundamental distinction between the two examples. One has a unique solution, while the other has infinitely many. These are exactly the only two possibilities for a homogeneous system, and each is possible (unlike the case when n > m, where Theorem HMVEI [64] tells us that there is only one possibility for a homogeneous system).

Subsection NSM Null Space of a Matrix

The set of solutions to a homogeneous system (which by Theorem HSC [62] is never empty) is of enough interest to warrant its own name. However, we define it as a property of the coefficient matrix, not as a property of some system of equations.

Definition NSM Null Space of a Matrix
The null space of a matrix A, denoted N(A), is the set of all the vectors that are solutions to the homogeneous system LS(A, 0).
So we row-reduce the augmented matrix to obtain [ 0 2 0 10 O 2 -3 0 40 The variables (of the homogeneous system) x3 and x5 are free (since columns 1, 2 and 4 are pivot columns), so we arrange the equations represented by the matrix in reduced row-echelon form to z1=-2x3 - z 2 = 3x3 - 4x5 X4 =-2x5 Version 2.02  Subsection HSE.READ Reading Questions 66 So we can write the infinite solution set as sets using column vectors, -2X3 - X5 3X3 - 4x5 N1(A) = z3|3, x5 E C -2x5 { ;5 } Example CNS2 Computing a null space, #2 Let's compute the null space of -4 6 1 -1 4 1 C 5 6 7 4 7 1 which we write as P1(C). Translating Definition NSM [64], we simply desire to solve the homogeneous system IJS(C, 0). So we row-reduce the augmented matrix to obtain 0 0 0 0- 0 I["1] 00 0000] 0 0 0 0_ There are no free variables in the homogeneous system represented by the row-reduced matrix, so there is only the trivial solution, the zero vector, 0. So we can write the (trivial) solution set as 0 N(C) = {o} = 0 -0 Subsection READ Reading Questions 1. What is always true of the solution set for a homogeneous system of equations? 2. Suppose a homogeneous system of equations has 13 variables and 8 equations. How many solutions will it have? Why? 3. Describe in words (not symbols) the null space of a matrix. Version 2.02  Subsection HSE.EXC Exercises 67 Subsection EXC Exercises C1O Each Archetype (Appendix A [698]) that is a system of equations has a corresponding homogeneous system with the same coefficient matrix. Compute the set of solutions for each. Notice that these solution sets are the null spaces of the coefficient matrices. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/ Archetype H [733] Archetype I [737] and Archetype J [741] Contributed by Robert Beezer C20 Archetype K [746] and Archetype L [750] are simply 5 x 5 matrices (i.e. they are not systems of equations). Compute the null space of each matrix. Contributed by Robert Beezer C30 Compute the null space of the matrix A, N(A). 2 4 1 3 8 A4 1 -2-1 -1 1 4 2 4 0 -3 4 2 4 -1 -7 4_ Contributed by Robert Beezer Solution [69] C31 Find the null space of the matrix B, N(B). -6 4 -36 6 B= 2 -1 10 -1 -3 2 -18 3] Contributed by Robert Beezer Solution [69] M45 Without doing any computations, and without examining any solutions, say as much as possible about the form of the solution set for corresponding homogeneous system of equations of each archetype that is a system of equations. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] /Archetype E [720] Archetype F [724] Archetype G [729] /Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer For Exercises M50-M52 say as much as possible about each system's solution set. Be sure to make it clear which theorems you are using to reach your conclusions. Version 2.02  Subsection HSE.EXC Exercises 68 M50 A homogeneous system of 8 equations in 8 variables. Contributed by Robert Beezer Solution [69] M51 A homogeneous system of 8 equations in 9 variables. Contributed by Robert Beezer Solution [70] M52 A homogeneous system of 8 equations in 7 variables. Contributed by Robert Beezer Solution [70] T10 Prove or disprove: A system of linear equations is homogeneous if and only if the system has the zero vector as a solution. 
Contributed by Martin Jackson Solution [70] a1 U2 T20 Consider the homogeneous system of linear equations IJS(A, 0), and suppose that u = u3 is one 4u1 4u2 solution to the system of equations. Prove that v = 4u3 is also a solution to [S(A, 0). 4un_ Contributed by Robert Beezer Solution [70] Version 2.02  Subsection HSE.SOL Solutions 69 Subsection SOL Solutions C30 Contributed by Robert Beezer Statement [67] Definition NSM [64] tells us that the null space of A is the solution set to the homogeneous system IJS(A, 0). The augmented matrix of this system is 2 4 1 3 8 0 -1 -2 -1 -1 1 0 2 4 0 -3 4 0 2 4 -1 -7 4 0_ To solve the system, we row-reduce the augmented matrix and obtain, -2 0 0 5 0 0 0 Q 0 -8 0 0 0 0 Q1 2 0 0 0 0 0 0 0_ This matrix represents a system with equations having three dependent variables (x1, x3, and x4) and two independent variables (x2 and x5). These equations rearrange to xz1= -2x2 - 5X5 s= 8x5 X4 =-2x5 So we can write the solution set (which is the requested null space) as -2x2 - 5X5 x2 Af(A) = 8x5 x2, x5 E C -2x5 { 5 C31 Contributed by Robert Beezer Statement [67] We form the augmented matrix of the homogeneous system [S(B, 0) and row-reduce the matrix, -6 4 -36 6 0 1 0 2 1 0 2 -1 0 -1 0 RREF:0 1 -6 3 0 [-3 2 -18 30] _0 0 0 0 0 We knew ahead of time that this system would be consistent (Theorem HSC [62]), but we can now see there are n~ - r =4 -2 =2 free variables, namely x3 and X4 (Theorem FVCS [55]). Based on this analysis, we can rearrange the equations associated with each nonzero row of the reduced row-echelon form into an expression for the lone dependent variable as a function of the free variables. We arrive at the solution set to the homogeneous system, which is the null space of the matrix by Definition NSM [64], PJ(B) { 6xX z 3, X4 E C} lL 3X4 J ) M50 Contributed by Robert Beezer Statement [68] Since the system is homogeneous, we know it has the trivial solution (Theorem HSC [62]). We cannot say Version 2.02  Subsection HSE.SOL Solutions 70 anymore based on the information provided, except to say that there is either a unique solution or infinitely many solutions (Theorem PSSLS [55]). See Archetype A [702] and Archetype B [707] to understand the possibilities. M51 Contributed by Robert Beezer Statement [68] Since there are more variables than equations, Theorem HMVEI [64] applies and tells us that the solution set is infinite. From the proof of Theorem HSC [62] we know that the zero vector is one solution. M52 Contributed by Robert Beezer Statement [68] By Theorem HSC [62], we know the system is consistent because the zero vector is always a solution of a homogeneous system. There is no more that we can say, since both a unique solution and infinitely many solutions are possibilities. T10 Contributed by Robert Beezer Statement [68] This is a true statement. A proof is: (-) Suppose we have a homogeneous system IJS(A, 0). Then by substituting the scalar zero for each variable, we arrive at true statements for each equation. So the zero vector is a solution. This is the content of Theorem HSC [62]. (<) Suppose now that we have a generic (i.e. not necessarily homogeneous) system of equations, [S(A, b) that has the zero vector as a solution. Upon substituting this solution into the system, we discover that each component of b must also be zero. So b = 0. T20 Contributed by Robert Beezer Statement [68] Suppose that a single equation from this system (the i-th one) has the form, az1x1 + ai2x2 + a -3x3 + ... 
+ ain xn = 0

Evaluate the left-hand side of this equation with the components of the proposed solution vector v,

ai1(4u1) + ai2(4u2) + ai3(4u3) + ... + ain(4un)
    = 4ai1u1 + 4ai2u2 + 4ai3u3 + ... + 4ainun          Commutativity
    = 4(ai1u1 + ai2u2 + ai3u3 + ... + ainun)            Distributivity
    = 4(0)                                              u solution to LS(A, 0)
    = 0

So v makes each equation true, and so is a solution to the system. Notice that this result is not true if we change LS(A, 0) from a homogeneous system to a non-homogeneous system. Can you create an example of a (non-homogeneous) system with a solution u such that v is not a solution?

Section NM Nonsingular Matrices

In this section we specialize and consider matrices with equal numbers of rows and columns, which when considered as coefficient matrices lead to systems with equal numbers of equations and variables. We will see in the second half of the course (Chapter D [370], Chapter E [396], Chapter LT [452], Chapter R [530]) that these matrices are especially important.

Subsection NM Nonsingular Matrices

Our theorems will now establish connections between systems of equations (homogeneous or otherwise), augmented matrices representing those systems, coefficient matrices, constant vectors, the reduced row-echelon form of matrices (augmented and coefficient) and solution sets. Be very careful in your reading, writing and speaking about systems of equations, matrices and sets of vectors. A system of equations is not a matrix, a matrix is not a solution set, and a solution set is not a system of equations. Now would be a great time to review the discussion about speaking and writing mathematics in Technique L [688].

Definition SQM Square Matrix
A matrix with m rows and n columns is square if m = n. In this case, we say the matrix has size n. To emphasize the situation when a matrix is not square, we will call it rectangular.

We can now present one of the central definitions of linear algebra.

Definition NM Nonsingular Matrix
Suppose A is a square matrix. Suppose further that the solution set to the homogeneous linear system of equations LS(A, 0) is {0}, i.e. the system has only the trivial solution. Then we say that A is a nonsingular matrix. Otherwise we say A is a singular matrix.

We can investigate whether any square matrix is nonsingular or not, no matter if the matrix is derived somehow from a system of equations or if it is simply given as a matrix. The definition says that to perform this investigation we must construct a very specific system of equations (homogeneous, with the matrix as the coefficient matrix) and look at its solution set. We will have theorems in this section that connect nonsingular matrices with systems of equations, creating more opportunities for confusion. Convince yourself now of two observations: (1) we can decide nonsingularity for any square matrix, and (2) the determination of nonsingularity involves the solution set of a certain homogeneous system of equations.

Notice that it makes no sense to call a system of equations nonsingular (the term does not apply to a system of equations), nor does it make any sense to call a 5 x 7 matrix singular (the matrix is not square).

Example S A singular matrix, Archetype A
Example HISAA [63] shows that the coefficient matrix derived from Archetype A [702], specifically the 3 x 3 matrix,

        [ 1  -1   2 ]
    A = [ 2   1   1 ]
        [ 1   1   0 ]

is a singular matrix since there are nontrivial solutions to the homogeneous system LS(A, 0).
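Definition NM is directly computable: hand the homogeneous system LS(A, 0) to a machine and see whether anything besides the trivial solution appears. A minimal sketch for the matrix of Example S, again assuming Python with sympy as the (unspecified) computing device:

```python
# Testing Definition NM for the matrix of Example S: solve LS(A, 0) and look
# for anything beyond the trivial solution.
from sympy import Matrix

A = Matrix([[1, -1, 2],
            [2,  1, 1],
            [1,  1, 0]])

print(A.nullspace())
# [Matrix([[-1], [1], [1]])] -- a nontrivial solution of LS(A, 0), so A is singular.
# An empty list would mean only the trivial solution, i.e. a nonsingular matrix.
```

The same test applied to the coefficient matrix of the next example returns an empty list, signalling a nonsingular matrix.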
Example NM A nonsingular matrix, Archetype B Example HUSAB [63] shows that the coefficient matrix derived from Archetype B [707], specifically the 3 x 3 matrix, --7 -6 -12 B= 5 5 7 1 0 4 is a nonsingular matrix since the homogeneous system, [S(B, 0), has only the trivial solution. Notice that we will not discuss Example HISAD [63] as being a singular or nonsingular coefficient matrix since the matrix is not square. The next theorem combines with our main computational technique (row-reducing a matrix) to make it easy to recognize a nonsingular matrix. But first a definition. Definition IM Identity Matrix The m x m identity matrix, Im, is defined by Ii i= j 1 [Im]e. . n), Theorem HMVEI [64] says that the system has infinitely many solutions. We will choose one of these solutions, any one of these solutions, so long as it is not the trivial solution. Write this solution as X1 C1 x2 C2 x3=C3 ... n Cn Y Cn+1 We know that at least one value of the ci is nonzero, but we will now show that in particular cn+1 $ 0. We do this using a proof by contradiction (Technique CD [692]). So suppose the ci form a solution as described, and in addition that cn+1 = 0. Then we can write the i-th equation of system (**) as, a ici + ai2C2 + af3c3+ --- +-aincn - bi(0) = 0 which becomes afici + ai2c2 + af3c3 + ... + ainc = 0 Since this is true for each i, we have that xl = Cl, X2 = C2, X3 = C3,..., n = cn is a solution to the homogeneous system [S(A, 0) formed with a nonsingular coefficient matrix. This means that the only possible solution is the trivial solution, so cl= 0, c2 = 0, C3 = 0, ..., cn = 0. So, assuming simply that cn+1 = 0, we conclude that all of the ci are zero. But this contradicts our choice of the ci as not being the trivial solution to the system (**). So cn+1 # 0. We now propose and verify a solution to the original system (*). Set Cl C2 C3 Cn 1X= 2=x3 =... Xn Ca+l Ch+l Ch+l Ch+l Notice how it was necessary that we know that cn+l $ 0 for this step to succeed. Now, evaluate the i-th equation of system (*) with this proposed solution, and recognize in the third line that cl through cn+1 appear as if they were substituted into the left-hand side of the i-th equation of system (**), Cl C2 C3 Cn Ca+l C+l Ch+l Ch+l 1 l1-- i i a2C2 +| ai3C3 +| '. + a |-GiCn) 1 S ailC1 +| ai2C2 +| ai3C3 +| '. + ' | i~ - biCn+1) +| bi Cmpi 1 - (0) + b Cmpi Since this equation is true for every i, we have found a solution to system (*). To finish, we still need to establish that this solution is unique. With one solution in hand, we will entertain the possibility of a second solution. So assume system (*) has two solutions, x1 = d1 x2 = d2 x3 = d3 -.-.- n = dn Version 2.02  Subsection NM.SOL Solutions 81 Xi = ei x2 = e2 x3 = e3 . . . z = en Then, (aii(di - el) + ai2(d2 - e2) + ai3(d3 - e3) + -.. + ain(d - en)) - (aiidi + ai2d2 + ai3d3 + ... + aindn) - (aiei + ai2e2 + ai3e3 + ... + amnen) = bi - bi = 0 This is the i-th equation of the homogeneous system [S(A, 0) evaluated with xz = d3 - eg, 1 < j < n. 
Since A is nonsingular, we must conclude that this solution is the trivial solution, and so 0 = d3 - ej, 1 < j z2 0 1 0 0 0 33 X= 34 = +32 +3:5 +3:6 +3:7 z5 0 0 1 0 0 x3 = 2+ 0:2 - X5+ 3:6 - 5x:7 = Version 2.02  Subsection LC.VFSS Vector Form of Solution Sets 103 zi 4 -4 -2 -1 3 X2 0 1 0 0 0 X3 2 0 -1 3 -5 X= :4 = +2 +x5 +x6 +x7 z5 0 0 1 0 0 z6 0 0 0 1 0 _ 7 0 0 0 0 1 X4 = 1+ 2 - 2x5+6x6 - 6X7 - zi 4 -4 -2 -1 3 X2 0 1 0 0 0 X3 2 0 -1 3 -5 X= :4 = 1 +3:2 0 +3:5 -2 +3:6 6 +3:7 -6 z5 0 0 1 0 0 z6 0 0 0 1 0 _ 7 0 0 0 0 1 We can now use this final expression to quickly build solutions to the system. You might try to recreate each of the solutions listed in the write-up for Archetype I [737]. (Hint: look at the values of the free variables in each solution, and notice that the vector c has 0's in these locations.) Even better, we have a description of the infinite solution set, based on just 5 vectors, which we combine in linear combinations to produce solutions. Whenever we discuss Archetype I [737] you know that's your cue to go work through Archetype J [741] by yourself. Remember to take note of the 0/1 pattern at the conclusion of Step 2. Have fun we won't go anywhere while you're away. This technique is so important, that we'll do one more example. However, an important distinction will be that this system is homogeneous. Example VFSAL Vector form of solutions for Archetype L Archetype L [750] is presented simply as the 5 x 5 matrix -2 -1 -2 -4 4 -6 -5 -4 -4 6 L 10 7 7 10 -13 -7 -5 -6 -9 10 _-4 -3 -4 -6 6 _ We'll interpret it here as the coefficient matrix of a homogeneous system and reference this matrix as L. So we are solving the homogeneous system IJS(L, 0) having m =5 equations in nr= 5 variables. If we built the augmented matrix, we would add a sixth column to L containing all zeros. As we did row operations, this sixth column would remain all zeros. So instead we will row-reduce the coefficient matrix, and mentally remember the missing sixth column of zeros. This row-reduced matrix is 1 0 0 1 -2 0 0 -2 2 0 0 2 -1 0 0 0 0 0 0 0 0 0 0 Version 2.02  Subsection LC.VFSS Vector Form of Solution Sets 104 and we see r = 3 nonzero rows. The columns with leading 1's are D = {1, 2, 3} so the r dependent variables are zi, z2, 33. The columns without leading 1's are F = {4, 5}, so the n - r = 2 free variables are x4, x5. Notice that if we had included the all-zero vector of constants to form the augmented matrix for the system, then the index 6 would have appeared in the set F, and subsequently would have been ignored when listing the free variables. Step 1. Write the vector of variables (x) as a fixed vector (c), plus a linear combination of n - r = 2 vectors (ui, u2), using the free variables as the scalars. x1 22 X= 33 = +x4 +x5 X4 Step 2. For each free variable, use 0's and l's to ensure equality for the corresponding entry of the the vectors. Take note of the pattern of 0's and l's at this stage, even if it is not as illuminating as in other examples. x1 22 X= 33 = +4 +3:5 34 0 1 0 _ _ 0 0 _1 Step 3. For each dependent variable, use the augmented matrix to formulate an equation expressing the dependent variable as a constant plus multiples of the free variables. Don't forget about the "missing" sixth column being full of zeros. Convert this equation into entries of the vectors that ensure equality for each dependent variable, one at a time. 
zi 0 -1 2 22 zi = 0 - lx4 + 2x5 X 3= 3 = + 34 --X5 34 0 1 0 0 _50 1 zi 0 -1 2 X2 0 2 -2 x2=0+2x4-2x5 -x = :3 +=--x4 - 34 0 1 0 0 _50 _1 zIr0 -1 2 z2 0 2 -2 3= 0- 2:4 + 1x:> = 33 0 +3:4 -2 +3:s 1 3:4 0 1 0 _ _ _0__ 0 _1 The vector c will always have 0's in the entries corresponding to free variables. However, since we are solving a homogeneous system, the row-reduced augmented matrix has zeros in column n~ + 1 =6, and hence all the entries of c are zero. So we can write X1 -1 2 -1 2 X2 2 -2 2 -2 x= z3 = : O+z4 -2 +3:5 1 = z4 -2 +35: 1 34 1 0 1 0 _ 0 _ 1 _ 0 1 Version 2.02  Subsection LC.PSHS Particular Solutions, Homogeneous Solutions 105 It will always happen that the solutions to a homogeneous system has c = 0 (even in the case of a unique solution?). So our expression for the solutions is a bit more pleasing. In this example it says that the -1 2 2 -2 solutions are all possible linear combinations of the two vectors ui= -2 and u2 = 1 , with no 1 0 0 1 mention of any fixed vector entering into the linear combination. This observation will motivate our next section and the main definition of that section, and after that we will conclude the section by formalizing this situation. Subsection PSHS Particular Solutions, Homogeneous Solutions 0 The next theorem tells us that in order to find all of the solutions to a linear system of equations, it is sufficient to find just one solution, and then find all of the solutions to the corresponding homogeneous system. This explains part of our interest in the null space, the set of all solutions to a homogeneous system. Theorem PSPHS Particular Solution Plus Homogeneous Solutions Suppose that w is one solution to the linear system of equations [S(A, b). Then y is a solution to [S(A, b) if and only if y = w + z for some vector z E P1(A). D Proof Let A1, A2, A3, ..., A, be the columns of the coefficient matrix A. (<) Suppose y = w + z and z E N1(A). Then b = [w]1 A1i+ [w]2 A2 + [w]3A3-+...+[w],, An = [w]1 A1 + [w]2 A2 + [w]3 A3 + -+ [w] A +0 = [w]1 A1 + [w]2 A2 + [w]3 A3 + ... + [w] An + [z]1 A1 + [z]2 A2 + [z]3A3 + .-- + [z]n An = ([w]1 + [zl I) A1 + ([w]2 + [z]2) A2 + -. -+ ([w]n + [z]n) An [w+z]A1+[w+z]2A2+[w+z]3A3+...+[w+z]nA [Y A1+ [y]2 A2 + [y]3 A3 + ... + [y]n An Theorem SLSLC [93] Property ZC [86] Theorem SLSLC [93] Theorem VSPCV [86] Definition CVA [84] Definition of y Applying Theorem SLSLC [93] we see that the vector y is a solution to [S(A, b). (-) Suppose y is a solution to [S(A, b). Then 0=b-b [yi A1 + [y]2A2 + [y]3A3 + -.-+ [y]n An - ([w]1 A1 + [w]2 A2 + [w]3 A3 + --+ [w]n An) y - [w] ) A1 + ([y]2 - [w]2) A2 + ... + ([y]n - [w]n) An [y-w]1A1+ [y - w]2A2 + [y - w]3A3-+ - - - -|y - w] A Theorem SLSLC [93] Theorem VSPCV [86] Definition CVA [84] By Theorem SLSLC [93] we see that the vector y - w is a solution to the homogeneous system [S(A, 0) and by Definition NSM [64], y - w E N(A). In other words, y - w = z for some vector z E N(A). Rewritten, this is y = w + z, as desired. U After proving Theorem NMUS [74] we commented (insufficiently) on the negation of one half of the the- orem. Nonsingular coefficient matrices lead to unique solutions for every choice of the vector of constants. Version 2.02  Subsection LC.PSHS Particular Solutions, Homogeneous Solutions 106 What does this say about singular matrices? A singular matrix A has a nontrivial null space (Theorem NMTNS [74]). For a given vector of constants, b, the system IJS(A, b) could be inconsistent, meaning there are no solutions. 
But if there is at least one solution (w), then Theorem PSPHS [105] tells us there will be infinitely many solutions because of the role of the infinite null space for a singular matrix. So a system of equations with a singular coefficient matrix never has a unique solution. Either there are no solutions, or infinitely many solutions, depending on the choice of the vector of constants (b). Example PSHS Particular solutions, homogeneous solutions, Archetype D Archetype D [716] is a consistent system of equations with a nontrivial null space. Let A denote the coefficient matrix of this system. The write-up for this system begins with three solutions, 0 4 7 1 0 8 Y1 = 2 Y 0 y3s= 1 1 0 3 We will choose to have yi play the role of w in the statement of Theorem PSPHS [105], any one of the three vectors listed here (or others) could have been chosen. To illustrate the theorem, we should be able to write each of these three solutions as the vector w plus a solution to the corresponding homogeneous system of equations. Since 0 is always a solution to a homogeneous system we can easily write y = w=w + 0. The vectors y2 and y3 will require a bit more effort. Solutions to the homogeneous system [S(A, 0) are exactly the elements of the null space of the coefficient matrix, which by an application of Theorem VFSLS [99] is -3 2 P1(A) {X3[1+X4[X--4]0| 3:3, 4 E C 0 1 Then 4 0 4 0 -3 2 = 0 = 1 -1 = 1 -1 3 = y2 0 + 0 + K2) +(-1) = w + z2 02 2+ -2 2 + -)1 +( 0) 0 1 -1 1 0 1 where z2 [ ] (-2) ['J + (-1)] is obviously a solution of the homogeneous system since it is written as a linear combination of the vectors describing the null space of the coefficient matrix (or as a check, you could just evaluate the equations in the homogeneous system with z2). Again 7 0 7 0 -3 2 8 1 7 1 -1 3 ya = 1 2 + - 2+ ( 1 + 2 = w + z3 3] 1 2 1 0 1i] Version 2.02  Subsection LC.READ Reading Questions 107 where 7 -3 2 7 - 1 3 Z3 [Jr] (-1) [']+2[] 2 0 1 is obviously a solution of the homogeneous system since it is written as a linear combination of the vectors describing the null space of the coefficient matrix (or as a check, you could just evaluate the equations in the homogeneous system with z2). Here's another view of this theorem, in the context of this example. Grab two new solutions of the original system of equations, say 11 -4 0 2 Y4 = -3 y5 = 4 L-12 and form their difference, 11 -4 15 0 2 -2 u [ ]=- 4 = [n . -3 4 -7 . 1. . 2 _ -3_ It is no accident that u is a solution to the homogeneous system (check this!). In other words, the difference between any two solutions to a linear system of equations is an element of the null space of the coefficient matrix. This is an equivalent way to state Theorem PSPHS [105]. (See Exercise MM.T50 [207]). 0 The ideas of this subsection will be appear again in Chapter LT [452] when we discuss pre-images of linear transformations (Definition PI [465]). Subsection READ Reading Questions 1. Earlier, a reading question asked you to solve the system of equations 2xi + 3X2 - X3 = 0 Xi + 2X2 + X3 =3 zi + 3x2 + 3x3 = 7 Use a linear combination to rewrite this system of equations as a vector equality. 2. Find a linear combination of the vectors 1 that equals the vector -9. 11 Version 2.02  Subsection LC.READ Reading Questions 108 3. The matrix below is the augmented matrix of a system of equations, row-reduced to reduced row- echelon form. Write the vector form of the solutions to the system. 
1 3 0 6 0 9 0 0 [ -2 0 -s 0 0 0 0 [] 3] Version 2.02  Subsection LC.EXC Exercises 109 Subsection EXC Exercises C21 Consider each archetype that is a system of equations. For individual solutions listed (both for the original system and the corresponding homogeneous system) express the vector of constants as a linear combination of the columns of the coefficient matrix, as guaranteed by Theorem SLSLC [93]. Verify this equality by computing the linear combination. For systems with no solutions, recognize that it is then impossible to write the vector of constants as a linear combination of the columns of the coefficient matrix. Note too, for homogeneous systems, that the solutions give rise to linear combinations that equal the zero vector. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] Archetype E [720] Archetype F [724] Archetype G [729] Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer Solution [110] C22 Consider each archetype that is a system of equations. Write elements of the solution set in vector form, as guaranteed by Theorem VFSLS [99]. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] Archetype E [720] Archetype F [724] Archetype G [729] Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer Solution [110] C40 Find the vector form of the solutions to the system of equations below. 2xi - 4x2 + 3x3 + 375 =6 xi -2x72 -233+14x74 -4x5=15 xi - 2x72 +373 + 2374 +3-75 =-1 -2xi + 4x2 - 12374 + z5 = -7 Contributed by Robert Beezer Solution [110] C41 Find the vector form of the solutions to the system of equations below. -2xi - 1x2 - 8x3 + 8x4 + 4x5 - 9x6 - 1x7 - 138 - 18xg = 3 Version 2.02  Subsection LC.EXC Exercises 110 31i - 232 + 5x3 + 2x4 - 2x5 - 5x6 + 137 + 2x8 + 153:9 41i - 2x2 + 833 + 215 - 1436 - 238 + 239 -1l1 + 2X2 + 1X3 - 6X4 + 7z: - 1X7 - 3xg 31i + 2x2 + 1313 - 14x4 - 135 + 516 - 138 + 123:9 -2x1 + 2x2-2x3-4x4 + 15 + 6x6-2x7-238 - 153:9 Contributed by Robert Beezer Solution [110] M10 Example TLC [90] asks if the vector 13 15 5 -17 2 25 _ can be written as a linear combination of the four vectors 10 36 -8 15 -7 ui1 2 4 -3 1 2 -9- U2 6 3 0 -2 1 4 U3 ~-5- 2 1 1 -3 0 U4 3 2 -5 7 1 _3_ Can it? Can any vector in C6 be written as a linear combination of the four vectors u1, u2, u3, u4? Contributed by Robert Beezer Solution [111] M11 At the end of Example VFS [96], the vector w is claimed to be a solution to the linear system under discussion. Verify that w really is a solution. Then determine the four scalars that express w as a linear combination of c, u1, u2, u3. Contributed by Robert Beezer Solution [111] Version 2.02  Subsection LC.SOL Solutions 111 Subsection SOL Solutions C21 Contributed by Robert Beezer Statement [108] Solutions for Archetype A [702] and Archetype B [707] are described carefully in Example AALC [92] and Example ABLC [91]. C22 Contributed by Robert Beezer Statement [108] Solutions for Archetype D [716] and Archetype I [737] are described carefully in Example VFSAD [95] and Example VFSAI [102]. The technique described in these examples is probably more useful than carefully deciphering the notation of Theorem VFSLS [99]. The solution for each archetype is contained in its description. So now you can check-off the box for that item. 
C40 Contributed by Robert Beezer Statement [108] Row-reduce the augmented matrix representing this system, to find 1 -2 0 0 0 0 0 0 0 0 0 6 -4 0 0 0 1 0 3 0 -5 0 0_ The system is consistent (no leading one in column 6, Theorem RCLS [53]). x2 and x4 are the free variables. Now apply Theorem VFSLS [99] directly, or follow the three-step process of Example VFS [96], Example VFSAD [95], Example VFSAI [102], or Example VFSAL [103] to obtain XI 1 X2 0 X3 = 3 X4 0 _z5 -5 2 -6 1 0 +X2 0 +X4 4 0 1 0 0 C41 Contributed by Robert Beezer Statement [108] Row-reduce the augmented matrix representing this system, to find 0 0 0 0 0 0 01 0 0 0 0 3 2 0 0 0 0 -2 -4 0 0 0 0 0 0 01 0 0 0 -1 3 -2 0 0 0 0 0 0 0 0 0 Ql 0 0 0l 0 0 3 2 -1 4 2 0 6 -1 3 0 -2 0 The system is consistent (no leading one in column 10, Theorem RCLS [53]). F = {3, 4, 6, 9, 10}, so the free variables are 33, 34, x6 and 39. Now apply Theorem VFSLS [99] directly, or follow the three-step process of Example VFS [96], Example VFSAD [95], Example VFSAI [102], or Example VFSAL [103] to Version 2.02  Subsection LC.SOL Solutions 112 obtain the solution set sH -1 0 0 3 0 0 -2 _0 . + 3 -3- -2 1 0 0 0 0 0 0 _ + X4 4 0 1 0 0 0 0 _0_ + z6 1- -3 0 0 2 1 0 0 0 + z9 -3- -2 0 0 1 0 -4 -2 1_ 33, 14, 16, 3:9 E C M10 Contributed by Robert Beezer Statement [109] No, it is not possible to create w as a linear combination of the four vectors ui, u2, u3, U4. By creating the desired linear combination with unknowns as scalars, Theorem SLSLC [93] provides a system of equations that has no solution. This one computation is enough to show us that it is not possible to create all the vectors of C6 through linear combinations of the four vectors ui, u2, u3, u4. M11 Contributed by Robert Beezer Statement [109] The coefficient of c is 1. The coefficients of ui, u2, u3 lie in the third, fourth and seventh entries of w. Can you see why? (Hint: F = {3, 4, 7, 8}, so the free variables are 33, 34 and x7.) Version 2.02  Section SS Spanning Sets 113 Section SS Spanning Sets In this section we will describe a compact way to indicate the elements of an infinite set of vectors, making use of linear combinations. This will give us a convenient way to describe the elements of a set of solutions to a linear system, or the elements of the null space of a matrix, or many other sets of vectors. Subsection SSV Span of a Set of Vectors In Example VFSAL [103] we saw the solution set of a homogeneous system described as all possible linear combinations of two particular vectors. This happens to be a useful way to construct or describe infinite sets of vectors, so we encapsulate this idea in a definition. Definition SSCV Span of a Set of Column Vectors Given a set of vectors S = {ui, u2, u3, ... , up}, their span, (S), is the set of all possible linear combina- tions of ui, u2, u3, ... , up. Symbolically, (S) = {aCui + a2u2 + asus + --. + apu |ai EC, 1 cauiui |aiEC, 1i p ( i=1 (This definition contains Notation SSV.) A The span is just a set of vectors, though in all but one situation it is an infinite set. (Just when is it not infinite?) So we start with a finite collection of vectors S (p of them to be precise), and use this finite set to describe an infinite set of vectors, (S). Confusing the finite set S with the infinite set (S) is one of the most pervasive problems in understanding introductory linear algebra. We will see this construction repeatedly, so let's work through some examples to get comfortable with it. 
The most obvious question about a set is if a particular item of the correct type is in the set, or not. Example ABS A basic span Consider the set of 5 vectors, 5, from C4 and consider the infinite set of vectors (S) formed from all possible linear combinations of the elements of S. Here are four vectors we definitely know are elements of (S), since we will construct them in accordance with Definition SSCV [112], 1 2 7 1 ~-1 ~-4 w = (2) 3 + (1) 2 + (-1) 5 + (2) _1i + (3) H [2 [i. [-1] [-5] [2] _ 0j10 Version 2.02  Subsection SS.SSV Span of a Set of Vectors 114 1 2 7 x =(5 3 + (-6) 2 + (-3) 53 .l.[--1..-5] 1 + (4) _1y 2 -1 -26 0 -6 +(2) -2 0 [34] K] y = (1) 3 z = (0) 3 + (0) K +(0) r . 2] 1 2 -1. 2 1 2 -1. +(1) r +(0) 7 1 -1 7 5 + (0) _1 + (1) 9 - 17 -5] _ 2_0_ _-4_ 7 1 -1 0 5 + (0) _1 + (0) 9 0 .-5_ 2 __0 _ [_0 The purpose of a set is to collect objects with some common property, and to exclude objects without that property. So the most fundamental question about a set is if a given object is an element of the set or not. Let's learn more about (S) by investigating which vectors are elements of the set, and which are not. [-15 First, is u = -6 an element of (S)? We are asking if there are scalars O, 2, as, a4, as such that 19 1 2 1 + 1 1 3 22 .l. -1. + a3 7] 3 5 -5_ + a4 1 -1 -15 1 0 -6 _-1 + a5 9 =u 19S 2 __0 __ 5 _ Applying Theorem SLSLC [93] we recognize the search for these scalars as a solution to a linear system of equations with augmented matrix 1 2 1 1 3 2 1 -1 7 3 5 -5 1 1 -1 2 1 0 0 0 -1 0 9 0 -15 -6 19 5 which row-reduces to 1 0 0 0 0 01 0 0 4 0 0 3 10 -1 -9 -2 -7 0 0_ At this point, we see that the system is consistent (Theorem RCLS [53]), so we know there is a solution for the five scalars a1, a2, a3, a4, a5. This is enough evidence for us to say that u E (S). If we wished further evidence, we could compute an actual solution, say a1 = 2 a2= 1 a3 --2 a4= -3 a5= 2 This particular solution allows us to write [1] (2) .l. . 2 + (1) 21 .-1. 7 + (-2) 53 .-5_ +(-3) r 11 1 2] 2 + L2) -11 0 9 0] ~-15 -6 u 4Y9] _ 5 _ Version 2.02  Subsection SS.SSV Span of a Set of Vectors 115 making it even more obvious that u E (S). 3 Lets do it again. Is v = 2 an element of (S)? We are asking if there are scalars ai, a2, a3, a4, a5 such that 1 2 7 1 -1 3 1 1 3 1 0 1 a +3+2 5]+a 3 5] +a4 [_] + [9 [= 2 .l. -1. -5_ _2 __0 _-1_ Applying Theorem SLSLC [93] we recognize the search for these scalars as a solution to a linear system of equations with augmented matrix 1 2 7 1 -1 3 1 1 3 1 0 1 3 2 5 -1 9 2 1 -1 -5 2 0 -1] which row-reduces to 1 0 -1 0 3 0 0 F 4 0 -1 0 0 0 0 F1 -2 0 0o000 0E At this point, we see that the system is inconsistent by Theorem RCLS [53], so we know there is not a solution for the five scalars ai, a2, a3, a4, a5. This is enough evidence for us to say that v 0 (S). End of story. Example SCAA Span of the columns of Archetype A Begin with the finite set of three vectors of size 3 1 -1 2 S = {ui,1u2, u3} = 2 , 1 1 1 1 -0 and consider the infinite set (S). The vectors of S could have been chosen to be anything, but for reasons that will become clear later, we have chosen the three columns of the coefficient matrix in Archetype A [702]. First, as an example, note that v[()2] + (-3) [1] + (7) [2 [22] is in (5), since it is a linear combination of u11, 112, 113. We write this succinctly as v E (S). There is nothing magical about the scalars ai6= 5, 062 =-3, as 7, they could have been chosen to be anything. So repeat this part of the example yourself, using different values of ai, 0a2, 063. 
What happens if you choose all three scalars to be zero? So we know how to quickly construct sample elements of the set (S). A slightly different question arises when you are handed a vector of the correct size and asked if it is an element of (S). For example, is 1 w = 8 in (S)? More succinctly, w E (S)? 5 Version 2.02  Subsection SS.SSV Span of a Set of Vectors 116 To answer this question, we will look for scalars oi, aC2, a3 so that aiUi + a2U2 + a3U3 =W. By Theorem SLSLC [93] solutions to this vector equation are solutions to the system of equations ai - a~2 + 2as = 1 2ci + (k2 + O3s= 8 ai + (2 = 5. Building the augmented matrix for this linear system, and row-reducing, gives 1 0 1 3 0 [-1 -1 2. _0 0 0 0_ This system has infinitely many solutions (there's a free variable in x3), but all we need is one solution vector. The solution, a1=2 °2=3 O3=1 tells us that (2)ui + (3)u2 + (1)u3 = w so we are convinced that w really is in (S). Notice that there are an infinite number of ways to answer this question affirmatively. We could choose a different solution, this time choosing the free variable to be zero, a1=3 O2=2 O3=0 shows us that (3)ui + (2)U2 + (0)U31=1W Verifying the arithmetic in this second solution maybe makes it seem obvious that w is in this span? And of course, we now realize that there are an infinite number of ways to realize w as element of (S). Let's 2 ask the same type of question again, but this time with y = 4 , i.e. is y E (S)? -3 So we'll look for scalars ai, o2, O3 so that aiU1 + a2U2 + as3Us y By Theorem SLSLC [93] solutions to this vector equation are the solutions to the system of equations ai - a2 + 2as = 2 2ai + a2 + as = 4 ai + a2 =3. Building the augmented matrix for this linear system, and row-reducing, gives 10 1 0 0 -1 0 0 0 0 R_] Version 2.02  Subsection SS.SSV Span of a Set of Vectors 117 This system is inconsistent (there's a leading 1 in the last column, Theorem RCLS [53]), so there are no scalars ai, a2, a3 that will create a linear combination of ui, u2, u3 that equals y. More precisely, y 0 (S). There are three things to observe in this example. (1) It is easy to construct vectors in (S). (2) It is possible that some vectors are in (S) (e.g. w), while others are not (e.g. y). (3) Deciding if a given vector is in (S) leads to solving a linear system of equations and asking if the system is consistent. With a computer program in hand to solve systems of linear equations, could you create a program to decide if a vector was, or wasn't, in the span of a given set of vectors? Is this art or science? This example was built on vectors from the columns of the coefficient matrix of Archetype A [702]. Study the determination that v E (S) and see if you can connect it with some of the other properties of Archetype A [702]. Having analyzed Archetype A [702] in Example SCAA [114], we will of course subject Archetype B [707] to a similar investigation. Example SCAB Span of the columns of Archetype B Begin with the finite set of three vectors of size 3 that are the columns of the coefficient matrix in Archetype B [707], _ 7- -6 -12 R ={vi, v2, v3}= 5 5] , 7 11 0 4 and consider the infinite set V = (R). First, as an example, note that _7- -6 -12 -2 x = (2) 5 +(4) 5 +(-3) 7] = 9 1 0 4 -10 is in (R), since it is a linear combination of vi, v2, v3. In other words, x E (R). Try some different values of ai, a2, a3 yourself, and see what vectors you can create as elements of (R). -33 Now ask if a given vector is an element of (R). For example, is z =[24 in (R)? Is z E (R)? 
5 To answer this question, we will look for scalars ai, a2, a3 so that aivi + a2v2 + a3v3 = Z. By Theorem SLSLC [93] solutions to this vector equation are the solutions to the system of equations -7c01- 602 -1203= -33 5c01 + 5c02 + 703 = 24 ai1 + 4as = 5. Building the augmented matrix for this linear system, and row-reducing, gives [ Wo -31 0 2 0 5 . 0 0 2 2] This system has a unique solution, ai=-3 a2=5 O= 2 Version 2.02  Subsection SS.SSNS Spanning Sets of Null Spaces 118 telling us that (-3)vi + (5)v2 + (2)v3 = z so we are convinced that z really is in (R). Notice that in this case we have only one way to answer the question affirmatively since the solution is unique. Let's ask about another vector, say is x = 8 in (R)? Is x E (R)? .-3_ We desire scalars ai, c2, O3 so that O1V1 + a2V2 + O3V3 =X. By Theorem SLSLC [93] solutions to this vector equation are the solutions to the system of equations -71-6c2-1203= -7 5o1 + 5o2 + 703 = 8 afi + 4a3 =-3. Building the augmented matrix for this linear system, and row-reducing, gives 1 0 0 1 0 I 0 2 0 0 F1 -1_ This system has a unique solution, a1=O12e=2 a3=-1 telling us that (1)vi + (2)v2 + (-1)v3 = x so we are convinced that x really is in (R). Notice that in this case we again have only one way to answer the question affirmatively since the solution is again unique. We could continue to test other vectors for membership in (R), but there is no point. A question about membership in (R) inevitably leads to a system of three equations in the three variables ai, a2, a3 with a coefficient matrix whose columns are the vectors vi, v2, v3. This particular coefficient matrix is nonsingular, so by Theorem NMUS [74], the system is guaranteed to have a solution. (This solution is unique, but that's not critical here.) So no matter which vector we might have chosen for z, we would have been certain to discover that it was an element of (R). Stated differently, every vector of size 3 is in (R), or (R) =C3. Compare this example with Example SCAA [114], and see if you can connect z with some aspects of the write-up for Archetype B [707]. Subsection SSNS Spanning Sets of Null Spaces We saw in Example VESAL [103] that when a system of equations is homogeneous the solution set can be expressed in the form described by Theorem VFSLS [99] where the vector c is the zero vector. We can essentially ignore this vector, so that the remainder of the typical expression for a solution looks like an arbi- trary linear combination, where the scalars are the free variables and the vectors are ui, u2, u3, ... , un-r Which sounds a lot like a span. This is the substance of the next theorem. Version 2.02  Subsection SS.SSNS Spanning Sets of Null Spaces 119 Theorem SSNS Spanning Sets for Null Spaces Suppose that A is an m x n matrix, and B is a row-equivalent matrix in reduced row-echelon form with r nonzero rows. Let D = {di, d2, d3, ..., dr } be the column indices where B has leading 1's (pivot columns) and F = {fi, f2, f3, ..., ,-r} be the set of column indices where B does not have leading 1's. Construct the n - r vectors z3, 1 < j < n - r of size n as 11 if i E F, i=f [z]=0if i E F, i - f {-[B]kfj iiDid - [B] if i & D, i =dk Then the null space of A is given by P1(A) = ({zi, z2, z3, ..., zn-r}) - Proof Consider the homogeneous system with A as a coefficient matrix, [S(A, 0). Its set of solutions, S, is by Definition NSM [64], the null space of A, P1(A). Let B' denote the result of row-reducing the augmented matrix of this homogeneous system. 
Since the system is homogeneous, the final column of the augmented matrix will be all zeros, and after any number of row operations (Definition RO [28]), the column will still be all zeros. So B' has a final column that is totally zeros. Now apply Theorem VFSLS [99] to B', after noting that our homogeneous system must be consistent (Theorem HSC [62]). The vector c has zeros for each entry that corresponds to an index in F. For entries that correspond to an index in D, the value is - [B']kfn+l, but for B' any entry in the final column (index n + 1) is zero. So c = 0. The vectors z3, 1 < j < n - r are identical to the vectorsu, 1 < j n - r described in Theorem VFSLS [99]. Putting it all together and applying Definition SSCV [112] in the final step, AN(A)=S = {C + a1u1 + ae2U2 + a3U3 + + -.-+|n-run-rC |1, a2, a3, .. . , C Cn-r E C} {1iUi + Oa2U2 + a3U3 + + --.-|-n-run-r ci, a2, 3, ... , CCn-r E C} = ({zi, z2, z3, ..., zn-r}) U Example SSNS Spanning set of a null space Find a set of vectors, S, so that the null space of the matrix A below is the span of S, that is, (S) - P1(A). The null space of A is the set of all solutions to the homogeneous system [S(A, 0). If we find the vector form of the solutions to this homogeneous system (Theorem VFSLS [99]) then the vectors u3, 1 < j We know that the null space of A is the solution set of the homogeneous system [S(A, 0), in this application of Theorem SSNS [118] have we found occasion to reference the variables of this system. These details are all buried in the proof of Theorem SSNS [118]. but nowhere or equations More advanced computational devices will compute the null space of a matrix.See: Computation NS.MMA [669] . Here's an example that will simultaneously exercise the span construction and Theorem SSNS [118], while also pointing the way to the next section. Example SCAD Span of the columns of Archetype D Begin with the set of four vectors of size 3 2 1 7 -7 T = {wi, w2, w3, w4} { -3 4 , [, -5 }-6 11 1 4 -5 and consider the infinite set W = (T). The vectors of T have been chosen as the four columns of the coefficient matrix in Archetype D [716]. Check that the vector 2 3 Z2 =0 1 Version 2.02  Subsection SS.SSNS Spanning Sets of Null Spaces 122 is a solution to the homogeneous system IJS(D, 0) (it is the vector z2 provided by the description of the null space of the coefficient matrix D from Theorem SSNS [118]). Applying Theorem SLSLC [93], we can write the linear combination, 2wi+3w2+Ow3+1w4=0 which we can solve for w4, W4 = (-2)wi + (-3)w2. This equation says that whenever we encounter the vector W4, we can replace it with a specific linear combination of the vectors wi and w2. So using W4 in the set T, along with wi and w2, is excessive. An example of what we mean here can be illustrated by the computation, 5wi + (-4)w2 + 6w3 + (-3)w4 = 5wi + (-4)w2 + 6w3 + (-3) ((-2)wi + (-3)w2) = 5wi + (-4)w2 + 6w3 + (6wi + 9w2) = 11w1 + 5w2 + 6w3. So what began as a linear combination of the vectors wi, w2, w3, w4 has been reduced to a linear combi- nation of the vectors wi, w2, w3. A careful proof using our definition of set equality (Definition SE [684]) would now allow us to conclude that this reduction is possible for any vector in W, so W ={wi, w2, w3}. So the span of our set of vectors, W, has not changed, but we have described it by the span of a set of three vectors, rather than four. Furthermore, we can achieve yet another, similar, reduction. 
Check that the vector -3 _-1 zi [ ] 0 is a solution to the homogeneous system [S(D, 0) (it is the vector zi provided by the description of the null space of the coefficient matrix D from Theorem SSNS [118]). Applying Theorem SLSLC [93], we can write the linear combination, (-3)wi + (-1)w2 + 1w3 = 0 which we can solve for w3, w =3w1 + 1w2. This equation says that whenever we encounter the vector w3, we can replace it with a specific linear combination of the vectors w1 and w2. So, as before, the vector w3 is not needed in the description of W, provided we have w i and w2 available. In particular, a careful proof (such as is done in Example RSC5 [153]) would show that W = ({wi, w2}) . So W began life as the span of a set of four vectors, and we have now shown (utilizing solutions to a homogeneous system) that W can also be described as the span of a set of just two vectors. Convince yourself that we cannot go any further. In other words, it is not possible to dismiss either wi or w2 in a similar fashion and winnow the set down to just one vector. What was it about the original set of four vectors that allowed us to declare certain vectors as surplus? And just which vectors were we able to dismiss? And why did we have to stop once we had two vectors remaining? The answers to these questions motivate "linear independence," our next section and next definition, and so are worth considering carefully now. It is possible to have your computational device crank out the vector form of the solution set to a linear system of equations.See: Computation VFSS.MMA [669] . Version 2.02  Subsection SS.READ Reading Questions 123 Subsection READ Reading Questions 1. Let S be the set of three vectors below. 1 _ .3 4 S = 2 , -4 , -2 -1 2 1 -_i Let W = (S) be the span of S. Is the vector 8 in W? Give an explanation of the reason for your -4 answer. .6 2. Use S and W from the previous question. Is the vector 5 in W? Give an explanation of the -1 reason for your answer. 3. For the matrix A below, find a set S so that (S) =Nf(A), where P1(A) is the null space of A. (See Theorem SSNS [118].) 1 3 1 9 A= 2 1 -3 8 1 1 -1 5 Version 2.02  Subsection SS.EXC Exercises 124 Subsection EXC Exercises C22 For each archetype that is a system of equations, consider the corresponding homogeneous system of equations. Write elements of the solution set to these homogeneous systems in vector form, as guaranteed by Theorem VFSLS [99]. Then write the null space of the coefficient matrix of each system as the span of a set of vectors, as described in Theorem SSNS [118]. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/ Archetype E [720] Archetype F [724] Archetype G [729]/ Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer Solution [126] C23 Archetype K [746] and Archetype L [750] are defined as matrices. Use Theorem SSNS [118] directly to find a set S so that (S) is the null space of the matrix. Do not make any reference to the associated homogeneous system of equations in your solution. Contributed by Robert Beezer Solution [126] 2 ] 3 C40 Suppose that S = 22 . Let W = .4 _ 1 _ an explicit linear combination that demonstrates this. Contributed by Robert Beezer Solution [126] 2 3 C41 Suppose that S = 3' [2] . Let W = .4 _ .1 _ explicit linear combination that demonstrates this. Contributed by Robert Beezer Solution [126] 2 1 3 -1 1 -1 C42 Suppose R { 3 , 0 Is y 4 2 3 _0 _ -1_ -2_ Contributed by Robert Beezer Solution [127] (S) and let x (S) and let y 5 8 Is x E W? 
If so, provide -12 [-5_j 5 31 Is .5_ y E W? If so, provide an 1 -1 -8 in (R)? -4 -3 2 1 3 1 -1 1 -1 1 C43 Suppose R = 3 , 2 , 0 . Is z = 5 in (R)? 4 2 3 3 0 _ -1_ _-2_ _11_ Contributed by Robert Beezer Solution [127] Version 2.02  Subsection SS.EXC Exercises 125 -1 3 1 -6 -5 C44 Suppose thatS {= 2] ,1] 5] [5 . Let W=(S) and lety= 3 .Is y EW? If 11 2 4- 1 0 so, provide an explicit linear combination that demonstrates this. Contributed by Robert Beezer Solution [128] .-1_ 3 1 -6 2 C45 Suppose that S {= 2] ] [1 ] 5 }. Let W = (S) and let w = 1 . Is w E W? If so, 11 2 4 1 -3 provide an explicit linear combination that demonstrates this. Contributed by Robert Beezer Solution [128] C50 Let A be the matrix below. (a) Find a set S so that N(A) = (S). 3 (b) If z [=[] ,then show directly that z E N(A). 2 (c) Write z as a linear combination of the vectors in S. 2 3 1 4 A= 1 2 1 3 -1 0 1 1 Contributed by Robert Beezer Solution [129] C60 For the matrix A below, find a set of vectors S so that the span of S equals the null space of A, (S) = N(A). 1 1 6 -8 A41 -20 1 -2 1 -6 7_ Contributed by Robert Beezer Solution [130] M20 In Example SCAD [120] we began with the four columns of the coefficient matrix of Archetype D [716], and used these columns in a span construction. Then we methodically argued that we could remove the last column, then the third column, and create the same set by just doing a span construction with the first two columns. We claimed we could not go any further, and had removed as many vectors as possible. Provide a convincing argument for why a third vector cannot be removed. Contributed by Robert Beezer M21 In the spirit of Example SCAD [120], begin with the four columns of the coefficient matrix of Archetype C [712], and use these columns in a span construction to build the set S. Argue that S can be expressed as the span of just three of the columns of the coefficient matrix (saying exactly which three) and in the spirit of Exercise SS.M20 [124] argue that no one of these three vectors can be removed and still have a span construction create S. Contributed by Robert Beezer Solution [130] T1O Suppose that vi, v2 E Ctm. Prove that ({vi, v2}) = ({vi, v2, 5v1 + 3v2}) Contributed by Robert Beezer Solution [130] Version 2.02  Subsection SS.EXC Exercises 126 T20 Suppose that S is a set of vectors from Cm. Prove that the zero vector, 0, is an element of (S). Contributed by Robert Beezer Solution [131] T21 Suppose that S is a set of vectors from Ctm and x, y E (8). Prove that x + y E (8). Contributed by Robert Beezer T22 Suppose that S is a set of vectors from Ctm, a E C, and x E (5). Prove that ax E (5). Contributed by Robert Beezer Version 2.02  Subsection SS.SOL Solutions 127 Subsection SOL Solutions C22 Contributed by Robert Beezer Statement [123] The vector form of the solutions obtained in this manner will involve precisely the vectors described in Theorem SSNS [118] as providing the null space of the coefficient matrix of the system as a span. These vectors occur in each archetype in a description of the null space. Studying Example VFSAL [103] may be of some help. C23 Contributed by Robert Beezer Statement [123] Study Example NSDS [119] to understand the correct approach to this question. The solution for each is listed in the Archetypes (Appendix A [698]) themselves. 
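Each of the solutions C40 through C45 below runs the same computation: row-reduce an augmented matrix and apply Theorem RCLS [53] to decide consistency. Here is a brief sketch of that membership test, assuming the Python library SymPy and using the data of Exercise SS.C40 above.

from sympy import Matrix

v1 = Matrix([2, -1, 3, 4])
v2 = Matrix([3, 2, -2, 1])
x  = Matrix([5, 8, -12, -5])

# x lies in the span of {v1, v2} exactly when the system with augmented
# matrix [v1 | v2 | x] is consistent, i.e. the last column is not a pivot column.
R, pivots = Matrix.hstack(v1, v2, x).rref()
print("x is in the span:", (R.cols - 1) not in pivots)

# The explicit linear combination found in Solution C40 checks out:
assert (-2) * v1 + 3 * v2 == x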
C40 Contributed by Robert Beezer Statement [123] Rephrasing the question, we want to know if there are scalars c1 and a2 such that 2 3 5 -1 2 8 Oi [3]+ -2 -12 .4 _1 _ -5_ Theorem SLSLC [93] allows us to rephrase the question again as a quest for solutions to the system of four equations in two unknowns with an augmented matrix given by 2 3 5 -1 2 8 3 -2 -12 4 1 -5_ This matrix row-reduces to I1 0 -2 0 23 0 0 0 0 0 0_ From the form of this matrix, we can see that ai= -2 and a2 = 3 is an affirmative answer to our question. More convincingly, (-2) [ + (3)[1 C41 Contributed by Robert Beezer Statement [123] Rephrasing the question, we want to know if there are scalars ai and ai2 such that 2 3 5 -1 2 1 ai 3 +a -2 3 4 1 5 Theorem SLSLC [93] allows us to rephrase the question again as a quest for solutions to the system of four Version 2.02  Subsection SS.SOL Solutions 128 equations in two unknowns with an augmented matrix given by 2 3 5 -1 2 1 3 -2 3 4 1 5 This matrix row-reduces to 0 0 1 0 0 L 0 0 0 With a leading 1 in the last column of this matrix (Theorem RCLS [53]) we can see that the system of equations has no solution, so there are no values for ci and a2 that will allow us to conclude that y is in W. So y g W. C42 Contributed by Robert Beezer Statement [123] Form a linear combination, with unknown scalars, of R that equals y, 2 1 3 1 -1 1 -1 -1 al 3 +a2 2 +a3 0 = -8 4 2 3 -4 0 -1 -2 _-3_ We want to know if there are values for the scalars that make the vector equation true since that is the definition of membership in (R). By Theorem SLSLC [93] any such values will also be solutions to the linear system represented by the augmented matrix, 2 1 3 1 -1 1 -1 -1 3 2 0 -8 4 2 3 -4 0 -1 -2 -3_ Row-reducing the matrix yields, 1 0 0 -2 0 0 0 -1 0 0 0 2 From this we see that the system of equations is consistent (Theorem RCLS [53]), and has a unique solution. This solution will provide a linear combination of the vectors in R that equals y. So y E R. C43 Contributed by Robert Beezer Statement [123] Form a linear combination, with unknown scalars, of R that equals z, 2 1 3 1 -1 1 -1 1 ai 3 +a2 2 +a3 0 = 5 4 2 3 3 0 _ -1 -2 1 Version 2.02  Subsection SS.SOL Solutions 129 We want to know if there are values for the scalars that make the vector equation true since that is the definition of membership in (R). By Theorem SLSLC [93] any such values will also be solutions to the linear system represented by the augmented matrix, 2 1 3 1 -1 1 -1 1 3 2 0 5 4 2 3 3 0 -1 -2 1 Row-reducing the matrix yields, 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 With a leading 1 in the last column, the system is inconsistent (Theorem RCLS [53]), so there are no scalars ai, a2, a3 that will create a linear combination of the vectors in R that equal z. So z 0 R. C44 Contributed by Robert Beezer Statement [124] Form a linear combination, with unknown scalars, of S that equals y, _- 3 1 -6 -5 ai 2 +a2 1 +a3 5 +a4 5 = 3 We want to know if there are values for the scalars that make the vector equation true since that is the definition of membership in (S). By Theorem SLSLC [93] any such values will also be solutions to the linear system represented by the augmented matrix, -1 3 1 -6 -5 2 1 5 5 3 1 2 4 1 0 Row-reducing the matrix yields, 1 0 2 3 2 0 1 -1 -1 0 0 0 0 0_ From this we see that the system of equations is consistent (Theorem RCLS [53]), and has a infinitely many solutions. Any solution will provide a linear combination of the vectors in R that equals y. 
So y E 5, for example, (-10) [2] + (-2) [1] + (3) [5 + (2) [56 [= 3 C45 Contributed by Robert Beezer Statement [124] Form a linear combination, with unknown scalars, of S that equals w, _ i- 3 1 -6 2 ai 2 + a2 1 + a3 5 +a4 5 = 1 1 2 4j 1 3 Version 2.02  Subsection SS.SOL Solutions 130 We want to know if there are values for the scalars that make the vector equation true since that is the definition of membership in (S). By Theorem SLSLC [93] any such values will also be solutions to the linear system represented by the augmented matrix, -1 3 1 -6 2 2 1 5 5 1 1 2 4 1 3 Row-reducing the matrix yields, 1 0 2 3 0 0 1 1 -1 0 0 0 0 0 [Wi With a leading 1 in the last column, the system is inconsistent (Theorem RCLS [53]), so there are no scalars al, a2, a3, a4 that will create a linear combination of the vectors in S that equal w. So w 0 (S). C50 Contributed by Robert Beezer Statement [124] (a) Theorem SSNS [118] provides formulas for a set S with this property, but first we must row-reduce A 1 0 -1 -1 ARREF 0 1 2 0 0 0 0 x3 and x4 would be the free variables in the homogeneous system IJS(A, 0) and Theorem SSNS [118] provides the set S = {zi, z2} where 1 1 -1 -2 z []Z2= 0 0 __1 _ (b) Simply employ the components of the vector z as the variables in the homogeneous system [S(A, 0). The three equations of this system evaluate as follows, 2(3) + 3(-5) + 1(1) + 4(2) = 0 1(3) + 2(-5) + 1(1) + 3(2) = 0 -1(3) + 0(-5) + 1(1) + 1(2) = 0 Since each result is zero, z qualifies for membership in P1(A). (c) By Theorem SSNS [118] we know this must be possible (that is the moral of this exercise). Find scalars ai1 and ca2 so that aizi + oa2z2 =ciq + ci [02 1[=5 z Theorem SLSLC [93] allows us to convert this question into a question about a system of four equations in two variables. The augmented matrix of this system row-reduces to 0 1 0 2 2 0 0 0 0 0 0_ Version 2.02  Subsection SS.SOL Solutions 131 A solution is ai= 1 and a2 = 2. (Notice too that this solution is unique!) C60 Contributed by Robert Beezer Statement [124] Theorem SSNS [118] says that if we find the vector form of the solutions to the homogeneous system IS(A, 0), then the fixed vectors (one per free variable) will have the desired property. Row-reduce A, viewing it as the augmented matrix of a homogeneous system with an invisible columns of zeros as the last column, 1 0 4 - 0 [T]2 0 0 0 0 Moving to the vector form of the solutions (Theorem VFSLS [99]), with free variables x3 and x4, solutions to the consistent system (it is homogeneous, Theorem HSC [62]) can be expressed as zi -4 5 k 3-2 3+4 X3 1 0 _z4__0 _ 1_ Then with S given by -4 5 S - 2 3 S 1 '0 . 0{[ 1_ Theorem SSNS [118] guarantees that -4 5 N(A)= (S) ={ 2 3 .1 _ _1 M21 Contributed by Robert Beezer Statement [124] If the columns of the coefficient matrix from Archetype C [712] are named ui, u2, u3, U4 then we can discover the equation (-2)ui + (-3)U2 + U3+U4 = 0 by building a homogeneous system of equations and viewing a solution to the system as scalars in a linear combination via Theorem SLSLC [93]. This particular vector equation can be rearranged to read 114 =(2)ui + (3)112 + (-1)113 This can be interpreted to mean that 114 is unnecessary in ({ui, 112, 113, U4}), so that ({ui, 112, 113, U4}) = {ui, 112, u3}) If we try to repeat this process and find a linear combination of 11i, 112, 113 that equals the zero vector, we will fail. 
The required homogeneous system of equations (via Theorem SLSLC [93]) has only a trivial solution, which will not provide the kind of equation we need to remove one of the three remaining vectors. T10 Contributed by Robert Beezer Statement [124] This is an equality of sets, so Definition SE [684] applies. First show that X = ({vi, v2}) C ({vi, v2, 5v1 + 3v2}) = Y. Choose x E X. Then x = aiv1 + a2v2 for some scalars ai and a2. Then, x = aiv1 + a2v2 = aiv1 + a2v2 + 0(5vi + 3v2) Version 2.02  Subsection SS.SOL Solutions 132 which qualifies x for membership in Y, as it is a linear combination of v1, v2, 5v1 + 3v2. Now show the opposite inclusion, Y = ({vi, v2, 5v1 + 3v2}) C ({vi, v2}) = X. Choose y E Y. Then there are scalars a1, a2, a3 such that y = aiv1 + a2v2 + a3(5vi + 3v2) Rearranging, we obtain, y = aivi + a2v2 + a3(5vi + 3V2) = aiv1+ a2v2 + 5a3v1+ 3a3v2 Property DVAC [87] = aiv1 + 5a3v1 + a2v2 + 3a3v2 Property CC [86] = (a1 + 5a3)v -+ (a2 + 3a3)v2 Property DSAC [87] This is an expression for y as a linear combination of v1 and v2, earning y membership in X. Since X is a subset of Y, and vice versa, we see that X = Y, as desired. T20 Contributed by Robert Beezer Statement [125] No matter what the elements of the set S are, we can choose the scalars in a linear combination to all be zero. Suppose that S = {vi, V2, v3, ..., vp}. Then compute OV+OV2+OV3+---+OVp=0+0+0+---+0 -0 But what if we choose S to be the empty set? The convention is that the empty sum in Definition SSCV [112] evaluates to "zero," in this case this is the zero vector. Version 2.02  Section LI Linear Independence 133 Section LI Linear Independence Subsection LISV Linearly Independent Sets of Vectors Theorem SLSLC [93] tells us that a solution to a homogeneous system of equations is a linear combination of the columns of the coefficient matrix that equals the zero vector. We used just this situation to our advantage (twice!) in Example SCAD [120] where we reduced the set of vectors used in a span construction from four down to two, by declaring certain vectors as surplus. The next two definitions will allow us to formalize this situation. Definition RLDCV Relation of Linear Dependence for Column Vectors Given a set of vectors S = {ui, u2, u3, ..., un}, a true statement of the form 1U+ 2u2 + 3u3 + --+anun =0 is a relation of linear dependence on S. If this statement is formed in a trivial fashion, i.e. ai = 0, 1 < i < n, then we say it is the trivial relation of linear dependence on S. A Definition LICV Linear Independence of Column Vectors The set of vectors S = {ui, u2, u3, ..., un} is linearly dependent if there is a relation of linear depen- dence on S that is not trivial. In the case where the only relation of linear dependence on S is the trivial one, then S is a linearly independent set of vectors. A Notice that a relation of linear dependence is an equation. Though most of it is a linear combination, it is not a linear combination (that would be a vector). Linear independence is a property of a set of vectors. It is easy to take a set of vectors, and an equal number of scalars, all zero, and form a linear combination that equals the zero vector. When the easy way is the only way, then we say the set is linearly independent. Here's a couple of examples. Example LDS Linearly dependent set in C5 Consider the set of n= 4 vectors from C5, To determine linear independence we first form a relation of linear dependence, 2 1 2 -6 -1 2 1 7 a1 3 +ca2 -1 +3 -3 + a4 -1 = 0. 
1 5 6 0 2 2 1 1 Version 2.02  Subsection LI.LISV Linearly Independent Sets of Vectors 134 We know that a1 = a2 = as = a4 = 0 is a solution to this equation, but that is of no interest whatsoever. That is always the case, no matter what four vectors we might have chosen. We are curious to know if there are other, nontrivial, solutions. Theorem SLSLC [93] tells us that we can find such solutions as solutions to the homogeneous system [S(A, 0) where the coefficient matrix has these four vectors as columns, 2 1 2 -6 -1 2 1 7 A 3 -1 -3 -1 1 5 6 0 2 2 1 1 Row-reducing this coefficient matrix yields, 1 0 0 -2 0 0 4 0 0 [ -3 0 0 0 0 0 0 0 0 We could solve this homogeneous system completely, but for this example all we need is one nontrivial solution. Setting the lone free variable to any nonzero value, such as x4 =1, yields the nontrivial solution 2 -4 x = 3 * 1 completing our application of Theorem SLSLC [93], we have 2 1 2 -6 -1 2 1 7 2 3 +(-4) -1 +3 -3 +1 -1 0. 1 5 6 0 2 2 1 1 This is a relation of linear dependence on S that is not trivial, so we conclude that S is linearly dependent. Example LIS Linearly independent set in C5 Consider the set of nr= 4 vectors from C5, T=j3 , -1 , -3 , -1 . To determine linear independence we first form a relation of linear dependence, 2 1 2 -6 -1 2 1 7 a1 3 +2 -1 +3 -3 + a4 -1 =0. 1 5 6 1 2 2 1 1 Version 2.02  Subsection LI.LISV Linearly Independent Sets of Vectors 135 We know that a1 = a2 = a3 = a4 = 0 is a solution to this equation, but that is of no interest whatsoever. That is always the case, no matter what four vectors we might have chosen. We are curious to know if there are other, nontrivial, solutions. Theorem SLSLC [93] tells us that we can find such solutions as solution to the homogeneous system [S(B, 0) where the coefficient matrix has these four vectors as columns, 2 1 2 -6 -1 2 1 7 B= 3 -1 -3 -1 1 5 6 1 2 2 1 1 Row-reducing this coefficient matrix yields, 1 0 0 0 0 [ 0 0 0 0 [ 0 0 0 0 0 0 0 0 0 From the form of this matrix, we see that there are no free variables, so the solution is unique, and because the system is homogeneous, this unique solution is the trivial solution. So we now know that there is but one way to combine the four vectors of T into a relation of linear dependence, and that one way is the easy and obvious way. In this situation we say that the set, T, is linearly independent. Example LDS [132] and Example LIS [133] relied on solving a homogeneous system of equations to determine linear independence. We can codify this process in a time-saving theorem. Theorem LIVHS Linearly Independent Vectors and Homogeneous Systems Suppose that A is an m x n matrix and S = {A1, A2, A3, ..., An} is the set of vectors in Cm that are the columns of A. Then S is a linearly independent set if and only if the homogeneous system [S(A, 0) has a unique solution. D Proof (<) Suppose that [S(A, 0) has a unique solution. Since it is a homogeneous system, this solution must be the trivial solution x = 0. By Theorem SLSLC [93], this means that the only relation of linear dependence on S is the trivial one. So S is linearly independent. (-) We will prove the contrapositive. Suppose that [S(A, 0) does not have a unique solution. Since it is a homogeneous system, it is consistent (Theorem HSC [62]), and so must have infinitely many solutions (Theorem PSSLS [55]). One of these infinitely many solutions must be nontrivial (in fact, almost all of them are), so choose one. 
By Theorem SLSLC [93] this nontrivial solution will give a nontrivial relation of linear dependence on 5, so we can conclude that S is a linearly dependent set.U Since Theorem LIVHS [134] is an equivalence, we can use it to determine the linear independence or dependence of any set of column vectors, just by creating a corresponding matrix and analyzing the row-reduced form. Let's illustrate this with two more examples. Example LIHS Linearly independent, homogeneous system Is the set of vectors 2 6 4 -1 2 3 S= 3 , -1, -4 4 3 5 2 4 1 Version 2.02  Subsection LI.LISV Linearly Independent Sets of Vectors 136 linearly independent or linearly dependent? Theorem LIVHS [134] suggests we study the matrix whose columns are the vectors in S, 2 6 4 -1 2 3 A= 3 -1 -4 4 3 5 2 4 1 Specifically, we are interested in the size of the solution set for the homogeneous system IJS(A, 0). Row- reducing A, we obtain 1 0 0 0 L1 0 0 0 L1 0 0 0 0 0 0 Now, r = 3, so there are n - r = 3 - 3 = 0 free variables and we see that [S(A, 0) has a unique solution (Theorem HSC [62], Theorem FVCS [55]). By Theorem LIVHS [134], the set S is linearly independent. Z Example LDHS Linearly dependent, homogeneous system Is the set of vectors 2 6 4 -1 2 3 S= 3 , -1 , -4 4 3 -1 2 _4 _ 2 linearly independent or linearly dependent? Theorem LIVHS [134] suggests we study the matrix whose columns are the vectors in S, 2 6 4 -1 2 3 A= 3 -1 -4 4 3 -1 2 4 2 Specifically, we are interested in the size of the solution set for the homogeneous system [S(A, 0). Row- reducing A, we obtain 0~l -1 Now, r =2, so there are n - r =3 - 2 =1 free variables and we see that [S(A, 0) has infinitely many solutions (Theorem HSC [62], Theorem FVCS [55]). By Theorem LIVHS [134], the set S is linearly dependent. As an equivalence, Theorem LIVHS [134] gives us a straightforward way to determine if a set of vectors is linearly independent or dependent. Review Example LIHS [134] and Example LDHS [135]. They are very similar, differing only in the last two slots of the third vector. This resulted in slightly different matrices when row-reduced, and Version 2.02  Subsection LI.LISV Linearly Independent Sets of Vectors 137 slightly different values of r, the number of nonzero rows. Notice, too, that we are less interested in the actual solution set, and more interested in its form or size. These observations allow us to make a slight improvement in Theorem LIVHS [134]. Theorem LIVRN Linearly Independent Vectors, r and n Suppose that A is an m x n matrix and S = {A1, A2, A3, ..., An} is the set of vectors in Cm that are the columns of A. Let B be a matrix in reduced row-echelon form that is row-equivalent to A and let r denote the number of non-zero rows in B. Then S is linearly independent if and only if n =r. Q Proof Theorem LIVHS [134] says the linear independence of S is equivalent to the homogeneous linear system IS(A, 0) having a unique solution. Since [S(A, 0) is consistent (Theorem HSC [62]) we can apply Theorem CSRN [54] to see that the solution is unique exactly when n =r. U So now here's an example of the most straightforward way to determine if a set of column vectors in linearly independent or linearly dependent. While this method can be quick and easy, don't forget the logical progression from the definition of linear independence through homogeneous system of equations which makes it possible. 
Example LDRN Linearly dependent, r < n Is the set of vectors 2 9 1 -3 6 -1 -6 1 1 -2 03 -2 1 4 1 3 1 3 0 2 04 0 2 0 1 3 3311212 000 linearly independent or linearly dependent? Theorem LIVHS [134] suggests matrix as columns and analyze the row-reduced version of the matrix, 2 9 1 -3 6 [1 0 0 0 -1 -6 1 1 -2 0 [1 0 0 3 -2 1 4 1 RREF, 0 0 T 0 1 3 0 2 4' 0 0 0 [1] 0 2 0 1 3 0 0 0 0 _3 1 1 2 2 _ 0 0 0 0 we place these vectors into a -1 1 2 1 0 0 Now we need only compute that r = 4 < 5 = dependent set. Boom! rn to recognize, via Theorem LIVHS [134] that S is a linearly Example LLDS Large linearly dependent set in C4 Consider the set of n = 9 vectors from C4, -1 7 1 0 5 2 3 1 ~-6 3 1 2 4 -2 1 0 1 -1 R1 ' -3 ' -1 ' 2 ' 4 ' -6 ' -3 ' 5 ' 1 * 2 _ 6 _ -2_ 9_ 3 _ 4 _ 1 _ _3_ 1 _ To employ Theorem LIVHS [134], we form a 4 x 9 coefficient matrix, C, -1 C 1 2 7 1 -3 6 1 2 -1 -2 0 5 2 4 -2 1 2 4 -6 9 3 4 3 1 -6 0 1 -1 -3 5 1 1 3 1 Version 2.02  Subsection LI.LINM Linear Independence and Nonsingular Matrices 138 To determine if the homogeneous system IJS(C, 0) has a unique solution or not, we would normally row- reduce this matrix. But in this particular example, we can do better. Theorem HMVEI [64] tells us that since the system is homogeneous with n = 9 variables in m = 4 equations, and n > m, there must be infinitely many solutions. Since there is not a unique solution, Theorem LIVHS [134] says the set is linearly dependent. The situation in Example LLDS [136] is slick enough to warrant formulating as a theorem. Theorem MVSLD More Vectors than Size implies Linear Dependence Suppose that S = {ui, u2, u3, ... , un} is the set of vectors in CCm, and that n > m. Then S is a linearly dependent set. D Proof Form the m x n coefficient matrix A that has the column vectors u2, 1 < i < n as its columns. Consider the homogeneous system [S(A, 0). By Theorem HMVEI [64] this system has infinitely many solutions. Since the system does not have a unique solution, Theorem LIVHS [134] says the columns of A form a linearly dependent set, which is the desired conclusion. U Subsection LINM Linear Independence and Nonsingular Matrices We will now specialize to sets of n vectors from C"m. This will put Theorem MVSLD [137] off-limits, while Theorem LIVHS [134] will involve square matrices. Let's begin by contrasting Archetype A [702] and Archetype B [707]. Example LDCAA Linearly dependent columns in Archetype A Archetype A [702] is a system of linear equations with coefficient matrix, 1 -1 2 A42 1 1]. 1 1 0 Do the columns of this matrix form a linearly independent or dependent set? By Example S [71] we know that A is singular. According to the definition of nonsingular matrices, Definition NM [71], the homogeneous system [S(A, 0) has infinitely many solutions. So by Theorem LIVHS [134], the columns of A form a linearly dependent set. Example LICAB Linearly independent columns in Archetype B Archetype B [707] is a system of linear equations with coefficient matrix, B47 -6 -12] Do the columns of this matrix form a linearly independent or dependent set? By Example NM [72] we know that B is nonsingular. According to the definition of nonsingular matrices, Definition NM [71], the homogeneous system [S(A, 0) has a unique solution. So by Theorem LIVHS [134], the columns of B form a linearly independent set. That Archetype A [702] and Archetype B [707] have opposite properties for the columns of their coefficient matrices is no accident. 
Here's the theorem, and then we will update our equivalences for nonsingular matrices, Theorem NME1 [75]. Version 2.02  Subsection LI.NSSLI Null Spaces, Spans, Linear Independence 139 Theorem NMLIC Nonsingular Matrices have Linearly Independent Columns Suppose that A is a square matrix. Then A is nonsingular if and only if the columns of A form a linearly independent set. D Proof This is a proof where we can chain together equivalences, rather than proving the two halves separately. A nonsingular <- IS(A, 0) has a unique solution Definition NM [71] < columns of A are linearly independent Theorem LIVHS [134] Here's an update to Theorem NME1 [75]. Theorem NME2 Nonsingular Matrix Equivalences, Round 2 Suppose that A is a square matrix. The following are equivalent. 1. A is nonsingular. 2. A row-reduces to the identity matrix. 3. The null space of A contains only the zero vector, P1(A) = {0}. 4. The linear system [S(A, b) has a unique solution for every possible choice of b. 5. The columns of A form a linearly independent set. Proof Theorem NMLIC [138] is yet another equivalence for a nonsingular matrix, so we can add it to the list in Theorem NME1 [75]. U Subsection NSSLI Null Spaces, Spans, Linear Independence In Subsection SS.SSNS [117] we proved Theorem SSNS [118] which provided n - r vectors that could be used with the span construction to build the entire null space of a matrix. As we have hinted in Example SCAD [120], and as we will see again going forward, linearly dependent sets carry redundant vectors with them when used in building a set as a span. Our aim now is to show that the vectors provided by Theorem SSNS [118] form a linearly independent set, so in one sense they are as efficient as possible a way to describe the null space. Notice that the vectors z3, 1 5 j n~ - r first appear in the vector form of solutions to arbitrary linear systems (Theorem VFSLS [99]). The exact same vectors appear again in the span construction in the conclusion of Theorem SSNS [118]. Since this second theorem specializes to homogeneous systems the only real difference is that the vector c in Theorem VFSLS [99] is the zero vector for a homogeneous system. Finally, Theorem BNS [139] will now show that these same vectors are a linearly independent set. We'll set the stage for the proof of this theorem with a moderately large example. Study the example carefully, as it will make it easier to understand the proof. Example LINSB Linear independence of null space basis Version 2.02  Subsection LI.NSSLI Null Spaces, Spans, Linear Independence 140 Suppose that we are interested in the null space of the a 3 x 7 matrix, A, which row-reduces to 1 0 -2 4 0 3 9 B 0 []5 6 0 7 1 0 0 0 0 [ 8 -5] The set F = {3, 4, 6, 7} is the set of indices for our four free variables that would be used in a description of the solution set for the homogeneous system IJS(A, 0). Applying Theorem SSNS [118] we can begin to construct a set of four vectors whose span is the null space of A, a set of vectors we will reference as T. 1 0 0 0 N(A)=(T)=({ziz2,z3,z4})= 0 , 1 , 0 , 0 0 0 1 0 0 0 0 _ 1 So far, we have constructed as much of these individual vectors as we can, based just on the knowledge of the contents of the set F. This has allowed us to determine the entries in slots 3, 4, 6 and 7, while we have left slots 1, 2 and 5 blank. Without doing any more, lets ask if T is linearly independent? 
Begin with a relation of linear dependence on T, and see what we can learn about the scalars, 0O = iZ1 + a2Z2 + a3Z3 + a4Z4 0 0 0 1 0 0 0 0 o=ai 0 +a2 1 +a3 0 +a4 0 0 0 0 0 1 0 0 0 0 0 1 1i 0 0 0 c 0 + a2 + 0 + 0 = a2 0 0 O3 0 O3 0 0 0 _ a4 _O4 Applying Definition CVE [84] to the two ends of this chain of equalities, we see that ai = 2 =Os a3 0. So the only relation of linear dependence on the set T is a trivial one. By Definition LICV [132] the set T is linearly independent. The important feature of this example is how the "pattern of zeros and ones"~ in the four vectors led to the conclusion of linear independence. The proof of Theorem BNS [139] is really quite straightforward, and relies on the "pattern of zeros and ones" that arise in the vectors zi, 1 i n~ - r in the entries that correspond to the free variables. Play along with Example LINSB [138] as you study the proof. Also, take a look at Example VESAD [95], Example VFSAI [102] and Example VFSAL [103], especially at the conclusion of Step 2 (temporarily ignore the construction of the constant vector, c). This proof is also a good first example of how to prove a conclusion that states a set is linearly independent. Theorem BNS Basis for Null Spaces Suppose that A is an m x n matrix, and B is a row-equivalent matrix in reduced row-echelon form with r Version 2.02  Subsection LI.NSSLI Null Spaces, Spans, Linear Independence 141 nonzero rows. Let D = {di, d2, d3, ..., dr} and F = {fi, f2, f3, -.., fn-r} be the sets of column indices where B does and does not (respectively) have leading 1's. Construct the n - r vectors z3, 1 < j rn - r of size n as 11 if i E F, if [z;] =0 if i E F, i # f 1-[B]k f if i ED, i=dk Define the set S = {zi, z2, z3, ..., znr}. Then 1. P(A) (S). 2. S is a linearly independent set. Proof Notice first that the vectors z3, 1 < j < n - r are exactly the same as the n - r vectors defined in Theorem SSNS [118]. Also, the hypotheses of Theorem SSNS [118] are the same as the hypotheses of the theorem we are currently proving. So it is then simply the conclusion of Theorem SSNS [118] that tells us that P1(A) = (S). That was the easy half, but the second part is not much harder. What is new here is the claim that S is a linearly independent set. To prove the linear independence of a set, we need to start with a relation of linear dependence and somehow conclude that the scalars involved must all be zero, i.e. that the relation of linear dependence only happens in the trivial fashion. So to establish the linear independence of S, we start with O1Z1 + O2Z2 + O3Z3 + ... + nrzn-r = 0. For each j, 1 < j < n - r, consider the equality of the individual entries of the vectors on both sides of this equality in position fj, 0 [0] . = [aizl + a2z2 + asz3 + ... + -n-rzn-r]f. Definition CVE [84] [aizi]f. + [a2Z2]f. + [cz3]f, + ... + [an-rzn-r]f, Definition CVA [84] = [zi] + a2 [z2]f + a3 [z3]f, + ... + aej_1 [zj-1]f + a% [zj]1, + aj+1 [zj+1]f + ... + an-r [z-r]1 Definition CVSM [85] = ai(0) + e2 (0) + as3(0) + -.-.-+ cg_1(0) + ac (1) + cj+1(0) + - - - + Can-r (0) Definition of zj So for all j, 1 j < n - r we have o= 0, which is the conclusion that tells us that the oniy relation of linear dependence on S ={zi, z2, z3, ..., za~r} is the trivial one. 
Hence, by Definition LICV [132] the set is linearly independent, as desired.U Example NSLIL Null space spanned by linearly independent set, Archetype L In Example VESAL [103] we previewed Theorem SSNS [118] by finding a set of two vectors such that their span was the null space for the matrix in Archetype L [750]. Writing the matrix as L, we have -1 2 2 -2 N(L) = -2 , 1 . 1 0 _0 _ _ 1 _ Version 2.02  Subsection LI.READ Reading Questions 142 Solving the homogeneous system IJS(L, 0) resulted in recognizing x4 and x5 as the free variables. So look in entries 4 and 5 of the two vectors above and notice the pattern of zeros and ones that provides the linear independence of the set. Subsection READ Reading Questions 1. Let S be the set of three vectors below. 1_ .3 4 S = 2 , -4 , -2 -1 2 1 Is S linearly independent or linearly dependent? Explain why. 2. Let S be the set of three vectors below. 11 3 4 S= -1 , 2 , 3 0 2 -4 Is S linearly independent or linearly dependent? Explain why. 3. Based on your answer to the previous question, is the matrix below singular or nonsingular? Explain. 1 3 4 -1 2 3 0 2 -4 Version 2.02  Subsection LI.EXC Exercises 143 Subsection EXC Exercises Determine if the sets of vectors in Exercises C20-C25 are linearly independent or linearly dependent. 1 _ 2 1 C20 -2] [-1] [5 .1 1 . .3 _ 0_ Contributed by Robert Beezer Solution [146] -1 3 7 C21 {Y [ [6 4 '-1 '-6 Contributed by Robert Beezer Solution [146] .1_ 6 9 2 3 C22 5 , 1 , -3 , 8 , -2 .1 .2_ 8 -1 0 _ Contributed by Robert Beezer Solution [146] 1 3 2 1 -2 3 1 0 C23 2 ,1 2 1 5 2 -1 2 I3 -4_ 1_ 2 Contributed by Robert Beezer Solution [146] 1 3 4 -1 2 2 4 2 C24 -1 , -1 ,-2, -1 0 2 2 -2 1 _ _2 _ _3 _ _0 _ Contributed by Robert Beezer Solution [146] 2 4 10 1 -2 -7 C25 3{j , , 0 Contributed by Robert Beezer Solution [147] C30 For the matrix B below, find a set S that is linearly independent and spans the null space of B, that is, Af(B) =(S). -3~ 1-22 7 Contributed by Robert Beezer Solution [147] C31 For the matrix A below, find a linearly independent set S so that the null space of A is spanned by Version 2.02  Subsection LI.EXC Exercises 144 S, that is, N(A) = (S). -1 -2 2 1 5 1 2 1 1 5 4 3 6 1 2 7 2 4 0 1 2 Contributed by Robert Beezer Solution [147] C32 Find a set of column vectors, T, such that (1) the span of T is the null space of B, (T) = N(B) and (2) T is a linearly independent set. 2 1 1 1 B = -4 -3 1 -7 1 1 -1 3 Contributed by Robert Beezer Solution [148] C33 Find a set S so that S is linearly independent and N(A) (S), where N(A) is the null space of the matrix A below. 2 3 3 1 4 A= 1 1 -1 -1 -3 3 2 -8 -1 1 Contributed by Robert Beezer Solution [148] C50 Consider each archetype that is a system of equations and consider the solutions listed for the homogeneous version of the archetype. (If only the trivial solution is listed, then assume this is the only solution to the system.) From the solution set, determine if the columns of the coefficient matrix form a linearly independent or linearly dependent set. In the case of a linearly dependent set, use one of the sample solutions to provide a nontrivial relation of linear dependence on the set of columns of the coefficient matrix (Definition RLD [308]). Indicate when Theorem MVSLD [137] applies and connect this with the number of variables and equations in the system of equations. 
Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer C51 For each archetype that is a system of equations consider the homogeneous version. Write elements of the solution set in vector form (Theorem VFSLS [99]) and from this extract the vectors z3 described in Theorem BNS [139]. These vectors are used in a span construction to describe the null space of the coefficient matrix for each archetype. What does it mean when we write a null space as ({ })? Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Version 2.02  Subsection LI.EXC Exercises 145 Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer C52 For each archetype that is a system of equations consider the homogeneous version. Sample solutions are given and a linearly independent spanning set is given for the null space of the coefficient matrix. Write each of the sample solutions individually as a linear combination of the vectors in the spanning set for the null space of the coefficient matrix. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer C60 For the matrix A below, find a set of vectors S so that (1) S is linearly independent, and (2) the span of S equals the null space of A, (S) = N(A). (See Exercise SS.C60 [124].) 1 1 6 -8 A41 -2 0 1 -2 1 -6 7_ Contributed by Robert Beezer Solution [149] M50 Consider the set of vectors from C3, W, given below. Find a set T that contains three vectors from W and such that W = (T). 2 -1 1 3 0 W = ({vi, v2, v3, v4, v5}) = 1, -1 ,2 ,1 ,1 1_ 1 _ 3_ 3_ -3_ Contributed by Robert Beezer Solution [149] T1O Prove that if a set of vectors contains the zero vector, then the set is linearly dependent. (Ed. "The zero vector is death to linearly independent sets.") Contributed by Martin Jackson T12 Suppose that S is a linearly independent set of vectors, and T is a subset of 5, T G S (Definition SSET [683]). Prove that T is linearly independent. Contributed by Robert Beezer T13 Suppose that T is a linearly dependent set of vectors, and T is a subset of 5, T G S (Definition SSET [683]). Prove that S is linearly dependent. Contributed by Robert Beezer T15 Suppose that {v1, v2, v3, ..., vn} is a set of vectors. Prove that {fvi - V2, V2 - V3, V3 - V4, ..., vn - V1} Version 2.02  Subsection LI.EXC Exercises 146 is a linearly dependent set. Contributed by Robert Beezer Solution [150] T20 Suppose that {v1, v2, v3, v4} is a linearly independent set in C35. Prove that {vi, Vi + V2, Vi + V2 + V3, Vi + V2 + V3 + V4} is a linearly independent set. Contributed by Robert Beezer Solution [150] T50 Suppose that A is an m x n matrix with linearly independent columns and the linear system IJS(A, b) is consistent. Show that this system has a unique solution. (Notice that we are not requiring A to be square.) Contributed by Robert Beezer Solution [151] Version 2.02  Subsection LI.SOL Solutions 147 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [142] With three vectors from C3, we can form a square matrix by making these three vectors the columns of a matrix. We do so, and row-reduce to obtain, 1 0 0 o0 o 0 0 the 3 x 3 identity matrix. 
So by Theorem NME2 [138] the original matrix is nonsingular and its columns are therefore a linearly independent set. C21 Contributed by Robert Beezer Statement [142] Theorem LIVRN [136] says we can answer this question by putting theses vectors into a matrix as columns and row-reducing. Doing this we obtain, 1001 0 i 0 0 0 0_ With n = 3 (3 vectors, 3 columns) and r = 3 (3 leading 1's) we have n = r and the theorem says the vectors are linearly independent. C22 Contributed by Robert Beezer Statement [142] Five vectors from C3. Theorem MVSLD [137] says the set is linearly dependent. Boom. C23 Contributed by Robert Beezer Statement [142] Theorem LIVRN [136] suggests we analyze a matrix whose columns are the vectors of S, 1 3 2 1 -2 3 1 0 A 2 1 2 1 5 2 -1 2 3 -4 1 2_ Row-reducing the matrix A yields, 0 0 0 0 0 0 0 0 0 0 We see that r =4 =rn, where r is the number of nonzero rows and n~ is the number of columns. By Theorem LIVRN [136], the set S is linearly independent. C24 Contributed by Robert Beezer Statement [142] Theorem LIVRN [136] suggests we analyze a matrix whose columns are the vectors from the set, 1 3 4 -1 2 2 4 2 A= -1 -1 -2 -1 0 2 2 -2 1 2 3 0 Version 2.02  Subsection LI.SOL Solutions 148 Row-reducing the matrix A yields, 1 0 1 2 0 [ 1 -1 0 0 0 0 0 0 0 0 0 0 0 0_ We see that r = 2 # 4 = n, where r is the number of nonzero rows and n is the number of columns. By Theorem LIVRN [136], the set S is linearly dependent. C25 Contributed by Robert Beezer Statement [142] Theorem LIVRN [136] suggests we analyze a matrix whose columns are the vectors from the set, 2 4 10 1 -2 -7 A= 3 1 0 -1 3 10 2 2 4 Row-reducing the matrix A yields, 0~l -1 0 2 3 0 0 0 0 0 0 0 0 0_ We see that r = 2 # 3 = n, where r is the number of nonzero rows and n is the number of columns. By Theorem LIVRN [136], the set S is linearly dependent. C30 Contributed by Robert Beezer Statement [142] The requested set is described by Theorem BNS [139]. It is easiest to find by using the procedure of Example VFSAL [103]. Begin by row-reducing the matrix, viewing it as the coefficient matrix of a homogeneous system of equations. We obtain, 1 0 1 -2 0 F1 1 1 0 0 0 0 Now build the vector form of the solutions to this homogeneous system (Theorem VFSLS [99]). The free variables are x3 and x4, corresponding to the columns without leading 1's, The desired set S is simply the constant vectors in this expression, and these are the vectors zi and z2 described by Theorem BNS [139]. .Lo] [1] C31 Contributed by Robert Beezer Statement [142] Theorem BNS [139] provides formulas for n - r vectors that will meet the requirements of this question. Version 2.02  Subsection LI.SOL Solutions 149 These vectors are the same ones listed in Theorem VFSLS [99] when we solve the homogeneous system IS(A, 0), whose solution set is the null space (Definition NSM [64]). To apply Theorem BNS [139] or Theorem VFSLS [99] we first row-reduce the matrix, resulting in B2 0 0 3 B= 0 0F1 0 6 0 0 0 W1-4 0 0 0 0 0 So we see that n - r = 5 - 3 = 2 and F = {2, 5}, so the vector form of a generic solution vector is Xi -2 -3 X2 1 0 X3 =2 0 +x5 -6 X4 0 4 _5_ _ 0 _ 1 So we have --2 -3 1 0 N(A) = 0 -6 0 4 C32 Contributed by Robert Beezer Statement [143] The conclusion of Theorem BNS [139] gives us everything this question asks for. We need the reduced row-echelon form of the matrix so we can determine the number of vectors in T, and their entries. 
2 1 1 1 0 2 -2 -4 -3 1 -7 RREF 0W1 -3 5 1 1 -1 37 _R 0 0 00_ We can build the set T in immediately via Theorem BNS [139], but we will illustrate its construction in two steps. Since F = {3, 4}, we will have two vectors and can distribute strategically placed ones, and many zeros. Then we distribute the negatives of the appropriate entries of the non-pivot columns of the reduced row-echelon matrix. (1 i' 0 0 1 2' C33 Contributed by Robert Beezer Statement [143] A direct application of Theorem BNS [139] will provide the desired set. We require the reduced row-echelon form of A. 2 3 3 1 4 i 0 -6 0 3 1 1 -1 -1 -3 RREF:0[- 5 0 -2 3 2 -8 -1 1_0 0 0[24 The non-pivot columns have indices F = {3, 5}. We build the desired set in two steps, first placing the requisite zeros and ones in locations based on F, then placing the negatives of the entries of columns 3 and Version 2.02  Subsection LI.SOL Solutions 150 5 in the proper locations. This is all specified in Theorem BNS [139]. 6 -3- -5 2 S= 1 , 0 = 1 , 0 0 -4 _0_Y _1_ 0 1 C60 Contributed by Robert Beezer Statement [144] Theorem BNS [139] says that if we find the vector form of the solutions to the homogeneous system IJS(A, 0), then the fixed vectors (one per free variable) will have the desired properties. Row-reduce A, viewing it as the augmented matrix of a homogeneous system with an invisible columns of zeros as the last column, 1 0 4 - 0 [-1 2 0 0 0 0 Moving to the vector form of the solutions (Theorem VFSLS [99]), with free variables x3 and x4, solutions to the consistent system (it is homogeneous, Theorem HSC [62]) can be expressed as zi -4 5 [27k 1[-2] +143 X33 1 40 _z4__0 _ 1_ Then with S given by -4 5 S = -2 3 1 ' 0 0 -1 Theorem BNS [139] guarantees the set has the desired properties. M50 Contributed by Robert Beezer Statement [144] We want to first find some relations of linear dependence on {vi, v2, v3, v4, v5} that will allow us to "kick out" some vectors, in the spirit of Example SCAD [120]. To find relations of linear dependence, we formulate a matrix A whose columns are v1, v2, v3, v4, v5. Then we consider the homogeneous system of equations [S(A, 0) by row-reducing its coefficient matrix (remember that if we formulated the augmented matrix we would just add a column of zeros). After row-reducing, we obtain 0 2 0 1 - From this we that solutions can be obtained employing the free variables 14 and z5. With appropriate choices we will be able to conclude that vectors v4 and v5 are unnecessary for creating W via a span. By Theorem SLSLC [93] the choice of free variables below lead to solutions and linear combinations, which are then rearranged. X4 = 1,z5 = 0 (-2)vi+(-1)v2+(0)v3+(1)v4+(0)v5=0 v4=2vi+ v2 X4 = 0,5= 1 (1)vi+ (2)v2 + (0)v3 + (0)v4+ (1)v5 = 0 v5=-vi - 2v2 Version 2.02  Subsection LI.SOL Solutions 151 Since v4 and v5 can be expressed as linear combinations of vi and v2 we can say that v4 and v5 are not needed for the linear combinations used to build W (a claim that we could establish carefully with a pair of set equality arguments). Thus 2 ~-1 1 W = ({vi, V2, v3}) = 1 ,-1 ,2 1 1 3 That the {v1, v2, v3} is linearly independent set can be established quickly with Theorem LIVRN [136]. There are other answers to this question, but notice that any nontrivial linear combination of vi, v2, v3, v4, v5 will have a zero coefficient on v3, so this vector can never be eliminated from the set used to build the span. 
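The row-reduction at the heart of this solution is routine to automate. The following sketch (Python with sympy, an assumption on our part and not something the text relies on) repeats the computation with the columns of W as they are printed in the statement of M50; the pivot columns identify {v1, v2, v3}, and the non-pivot columns of the reduced row-echelon form recover the two relations of linear dependence used above.

```python
# A sketch, not part of the text (sympy assumed): the "casting out"
# computation of Solution M50 done by machine.  The columns of A are the
# five vectors of W as printed in the statement of M50; the pivot columns
# of the reduced row-echelon form pick out a linearly independent subset
# with the same span.
from sympy import Matrix

A = Matrix([[2, -1, 1, 3,  0],
            [1, -1, 2, 1,  1],
            [1,  1, 3, 3, -3]])

R, pivots = A.rref()
print(R)
print("keep columns:", [p + 1 for p in pivots])   # expect columns 1, 2, 3

# The non-pivot columns of R record the dependencies: column 4 of R is
# (2, 1, 0) and column 5 is (-1, -2, 0), so v4 = 2 v1 + 1 v2 and
# v5 = (-1) v1 + (-2) v2, the two relations used in the solution above.
```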
T15 Contributed by Robert Beezer Statement [144]
Consider the following linear combination,

1(v_1 - v_2) + 1(v_2 - v_3) + 1(v_3 - v_4) + \cdots + 1(v_{n-1} - v_n) + 1(v_n - v_1) = 0

Each term cancels against the next, so the sum is the zero vector, yet none of the scalars is zero. This is a nontrivial relation of linear dependence (Definition RLDCV [132]), so by Definition LICV [132] the set is linearly dependent.

T20 Contributed by Robert Beezer Statement [145]
Our hypothesis and our conclusion use the term linear independence, so it will get a workout. To establish linear independence, we begin with the definition (Definition LICV [132]) and write a relation of linear dependence (Definition RLDCV [132]),

\alpha_1 (v_1) + \alpha_2 (v_1 + v_2) + \alpha_3 (v_1 + v_2 + v_3) + \alpha_4 (v_1 + v_2 + v_3 + v_4) = 0

Using the distributive and commutative properties of vector addition and scalar multiplication (Theorem VSPCV [86]) this equation can be rearranged as

(\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) v_1 + (\alpha_2 + \alpha_3 + \alpha_4) v_2 + (\alpha_3 + \alpha_4) v_3 + (\alpha_4) v_4 = 0

However, this is a relation of linear dependence (Definition RLDCV [132]) on a linearly independent set, {v_1, v_2, v_3, v_4} (this was our lone hypothesis). By the definition of linear independence (Definition LICV [132]) the scalars must all be zero. This is the homogeneous system of equations,

\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 0
\alpha_2 + \alpha_3 + \alpha_4 = 0
\alpha_3 + \alpha_4 = 0
\alpha_4 = 0

Row-reducing the coefficient matrix of this system (or backsolving) gives the conclusion

\alpha_1 = 0    \alpha_2 = 0    \alpha_3 = 0    \alpha_4 = 0

This means, by Definition LICV [132], that the original set {v_1, v_1 + v_2, v_1 + v_2 + v_3, v_1 + v_2 + v_3 + v_4} is linearly independent.

T50 Contributed by Robert Beezer Statement [145]
Let A = [A_1 | A_2 | A_3 | \cdots | A_n]. LS(A, b) is consistent, so we know the system has at least one solution (Definition CS [50]). We would like to show that there is no more than one solution to the system. Employing Technique U [693], suppose that x and y are two solution vectors for LS(A, b). By Theorem SLSLC [93] we know we can write,

b = [x]_1 A_1 + [x]_2 A_2 + [x]_3 A_3 + \cdots + [x]_n A_n
b = [y]_1 A_1 + [y]_2 A_2 + [y]_3 A_3 + \cdots + [y]_n A_n

Then

0 = b - b
  = ([x]_1 A_1 + [x]_2 A_2 + \cdots + [x]_n A_n) - ([y]_1 A_1 + [y]_2 A_2 + \cdots + [y]_n A_n)
  = ([x]_1 - [y]_1) A_1 + ([x]_2 - [y]_2) A_2 + \cdots + ([x]_n - [y]_n) A_n

This is a relation of linear dependence (Definition RLDCV [132]) on a linearly independent set (the columns of A). So the scalars must all be zero,

[x]_1 - [y]_1 = 0    [x]_2 - [y]_2 = 0    \ldots    [x]_n - [y]_n = 0

Rearranging these equations yields the statement that [x]_i = [y]_i for 1 ≤ i ≤ n. However, this is exactly how we define vector equality (Definition CVE [84]), so x = y and the system has only one solution.

Section LDS Linear Dependence and Spans

In any linearly dependent set there is always one vector that can be written as a linear combination of the others. This is the substance of the upcoming Theorem DLDS [152]. Perhaps this will explain the use of the word "dependent." In a linearly dependent set, at least one vector "depends" on the others (via a linear combination). Indeed, because Theorem DLDS [152] is an equivalence (Technique E [690]) some authors use this condition as a definition (Technique D [687]) of linear dependence. Then linear independence is defined as the logical opposite of linear dependence. Of course, we have chosen to take Definition LICV [132] as our definition, and then follow with Theorem DLDS [152] as a theorem.

Subsection LDSS Linearly Dependent Sets and Spans

If we use a linearly dependent set to construct a span, then we can always create the same infinite set with a starting set that is one vector smaller in size.
We will illustrate this behavior in Example RSC5 [153]. However, this will not be possible if we build a span from a linearly independent set. So in a certain sense, using a linearly independent set to formulate a span is the best possible way there aren't any extra vectors being used to build up all the necessary linear combinations. OK, here's the theorem, and then the example. Theorem DLDS Dependency in Linearly Dependent Sets Suppose that S = {ui, u2, u3, ..., un} is a set of vectors. Then S is a linearly dependent set if and only if there is an index t, 1 < t < n such that ut is a linear combination of the vectors ui, u2, u3, ... , ut-1, ut+1, ... , Us. Proof (-) Suppose that S is linearly dependent, so there exists a nontrivial relation of linear dependence by Definition LICV [132]. That is, there are scalars, aj, 1 < i < n, which are not all zero, such that a1U1 + a2U2 + a3U3 + --. + anun = 0. Since the ai cannot all be zero, choose one, say at, that is nonzero. Then, -1 ut = (-a~tut) Property MICN [681] = (ciui + - - + a_1ut-1 + at+1ut+1 + - - + ainus) Theorem VSPCV [86] -ai_ -at-1 -at+1 -an_ = ui + - - + ut-1 + ut+1 + - - + un Theorem VSPCV [86] Since the values of g are again scalars, we have expressed ut as a linear combination of the other elements of S. (<-) Assume that the vector ut is a linear combination of the other vectors in S. Write this linear combination, denoting the relevant scalars as #31, /#2, - - -, #t1 /3t+1, -. - - , as Ut /31u1 + /2u2 + - + /3t-1ut-1 + /3t+1ut1 + --- + /3nun Then we have /31ui + ---.+/3t-1ut- + (-1)ut +/3t+1ut+1 + ...+ nun Version 2.02  Subsection LDS.LDSS Linearly Dependent Sets and Spans 154 = ut + (-1)ut Theorem VSPCV [86] = (1 + (-1)) ut Property DSAC [87] = Out Property AICN [681] = 0 Definition CVSM [85] So the scalars /31, /32, /33, -...-, 3t-1, 3t = -1, ±it+1, ..., /n provide a nontrivial linear combination of the vectors in S, thus establishing that S is a linearly dependent set (Definition LICV [132]). This theorem can be used, sometimes repeatedly, to whittle down the size of a set of vectors used in a span construction. We have seen some of this already in Example SCAD [120], but in the next example we will detail some of the subtleties. Example RSC5 Reducing a span in C5 Consider the set of n = 4 vectors from C5, 1 2 0 4 2 1 -7 1 R {v1, v2, v3, v4} { -1 , 3 , 6 , 2 3 1 -11 1 _2 _ 2_ _-2 _ -6_ and define V = (R). To employ Theorem LIVHS [134], we form a 5 x 4 coefficient matrix, D, 1 2 0 4 2 1 -7 1 D -1 3 6 2 3 1 -11 1 2 2 -2 6 and row-reduce to understand solutions to the homogeneous system [S(D, 0), ft2 0 0 4 0 [ 0 0 0 0 [ 1 . 0 0 0 0 0 0 0 0 We can find infinitely many solutions to this system, most of them nontrivial, and we choose any one we like to build a relation of linear dependence on R. Let's begin with z4 1, to find the solution So we can write the relation of linear dependence, (-4)vi + 0v2 + (-1)v3 + 1v4 =0. Theorem DLDS [152] guarantees that we can solve this relation of linear dependence for some vector in R, but the choice of which one is up to us. Notice however that v2 has a zero coefficient. In this case, we cannot choose to solve for v2. Maybe some other relation of linear dependence would produce a nonzero Version 2.02  Subsection LDS.COV Casting Out Vectors 155 coefficient for v2 if we just had to solve for this vector. Unfortunately, this example has been engineered to always produce a zero coefficient here, as you can see from solving the homogeneous system. Every solution has x2 = 0! 
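If you would like to see this claim verified by machine, here is a brief sketch (Python with the sympy library is assumed; it is not part of this text). It row-reduces D as printed above and produces a basis for the null space; the single basis vector has a zero in its second entry, which is exactly the statement that x2 = 0 in every solution.

```python
# A sketch, not part of the text (sympy assumed), confirming the claim just
# made: every solution of the homogeneous system LS(D, 0) has x2 = 0, so no
# relation of linear dependence on R can be solved for v2.
from sympy import Matrix

D = Matrix([[ 1, 2,   0, 4],
            [ 2, 1,  -7, 1],
            [-1, 3,   6, 2],
            [ 3, 1, -11, 1],
            [ 2, 2,  -2, 6]])

print(D.rref()[0])        # the reduced row-echelon form displayed above
for z in D.nullspace():   # a basis for the null space of D
    print(z.T)            # the single basis vector is (-4, 0, -1, 1)
```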
OK, if we are convinced that we cannot solve for v2, let's instead solve for v3, V3 = (-4)vi + Ov2 + 1v4 = (-4)vi + 1v4. We now claim that this particular equation will allow us to write V = (R) =({vi, v2, V3, V4}) =({vi, V2, V4}) in essence declaring v3 as surplus for the task of building V as a span. This claim is an equality of two sets, so we will use Definition SE [684] to establish it carefully. Let R' =f{v1, v2, v4} and V' = (R'). We want to show that V = V'. First show that V' C V. Since every vector of R' is in R, any vector we can construct in V' as a linear combination of vectors from R' can also be constructed as a vector in V by the same linear combination of the same vectors in R. That was easy, now turn it around. Next show that V C V'. Choose any v from V. Then there are scalars 0i, a2, a3, a4 so that V = aivi + O2V2 + O3V3 + O4V4 = aiv1 + a2V2 + a3 ((-4)vi + 1v4) + a4V4 = a1v1 + a2v2 + ((-4c3)vi -+ ca3v4) + a4v4 =(ai - 4a3) vi + +a2V2 + (a3 + a4) v4. This equation says that v can then be written as a linear combination of the vectors in R' and hence qualifies for membership in V'. So V C V' and we have established that V = V'. If R' was also linearly dependent (it is not), we could reduce the set even further. Notice that we could have chosen to eliminate any one of vi, v3 or V4, but somehow v2 is essential to the creation of V since it cannot be replaced by any linear combination of vi, v3 or V4. Subsection COV Casting Out Vectors In Example RSC5 [153] we used four vectors to create a span. With a relation of linear dependence in hand, we were able to "toss-out" one of these four vectors and create the same span from a subset of just three vectors from the original set of four. We did have to take some care as to just which vector we tossed-out. In the next example, we will be more methodical about just how we choose to eliminate vectors from a linearly dependent set while preserving a span. Example COV Casting out vectors We begin with a set S containing seven vectors from C4, IL-'] [-4] [2] [4] [8] [-31] [37]) and define W = (S). The set S is obviously linearly dependent by Theorem MVSLD [137], since we have n = 7 vectors from C4. So we can slim down S some, and still create W as the span of a smaller set of Version 2.02  Subsection LDS.COV Casting Out Vectors 156 vectors. As a device for identifying relations of linear dependence among the vectors of S, we place the seven column vectors of S into a matrix as columns, 1 4 0 -1 0 7 -9 A=.. A71 2 8 -1 3 9 -13 7 [A1A2A0 0 2 -3 -4 12 -8 0 03-1 -4 2 4 8 -31 37_ By Theorem SLSLC [93] a nontrivial solution to IJS(A, 0) will give us a nontrivial relation of linear dependence (Definition RLDCV [132]) on the columns of A (which are the elements of the set S). The row-reduced form for A is the matrix 1 4 0 0 2 1 -3] B 0 0 2 0 1 -3 5 0 0 0 2 2 -6 6 0 0 0 0 0 0 0] so we can easily create solutions to the homogeneous system [S(A, 0) using the free variables x2, x5, x6, x7. Any such solution will correspond to a relation of linear dependence on the columns of B. These solutions will allow us to solve for one column vector as a linear combination of some others, in the spirit of Theorem DLDS [152], and remove that vector from the set. We'll set about forming these linear combinations methodically. Set the free variable x2 to one, and set the other free variables to zero. 
Then a solution to [S(A, 0) is -4- 1 0 X= 0 0 0 0 which can be used to create the linear combination (-4)A1+ 1A2 + 0A3 + 0A4 +0A5 +0A6 +0A7 = 0 This can then be arranged and solved for A2, resulting in A2 expressed as a linear combination of {A1, A3, A4}, A2 =4A1+0A3+0A4 This means that A2 is surplus, and we can create W just as well with a smaller set with this vector removed, W= ({A1, A3, A4, A5, A6, A7}) Technically, this set equality for W requires a proof, in the spirit of Example RSC5 [153], but we will bypass this requirement here, and in the next few paragraphs. Now, set the free variable x5 to one, and set the other free variables to zero. Then a solution to IJS(B, 0) is -2o~ x = -2 1 0 _0 _ Version 2.02  Subsection LDS.COV Casting Out Vectors 157 which can be used to create the linear combination (-2)A1 + 0A2 + (-1)A3 + (-2)A4 + 1A5 + 0A6 + 0A7 0 This can then be arranged and solved for A5, resulting in A5 expressed as a linear combination of {A1, A3, A4}, A5=2A1+1A3+2A4 This means that A5 is surplus, and we can create W just as well with a smaller set with this vector removed, W = ({A1, A3, A4, A6, A7}) Do it again, set the free variable x6 to one, and set the other free variables to zero. Then a solution to IJS(B, 0) is -1 0 3 x= 6 0 1 0 which can be used to create the linear combination (-1)A1+ 0A2 + 3A3 + 6A4 + 0A5 + 1A6 + 0A7=0 This can then be arranged and solved for A6, resulting in A6 expressed as a linear combination of {A1, A3, A4}, A6 =1A1 + (-3)A3 + (-6)A4 This means that A6 is surplus, and we can create W just as well with a smaller set with this vector removed, W = ({A1, A3, A4, A7}) Set the free variable x7 to one, and set the other free variables to zero. Then a solution to [S(B, 0) is 3 0 -5 x =-6 0 0 _ 1 _ which can be used to create the linear combination 3A1 + 0A2 + (-5)A3 + (-6)A4 + 0A5 + 0A6 + 1A7=0 This can then be arranged and solved for A7, resulting in A7 expressed as a linear combination of {A1, A3, A4}, A7 =(-3)A1 + 5A3 + 6A4 This means that A7 is surplus, and we can create W just as well with a smaller set with this vector removed, W = ({A1, A3, A4}) Version 2.02  Subsection LDS.COV Casting Out Vectors 158 You might think we could keep this up, but we have run out of free variables. And not coincidentally, the set {A1, A3, A4} is linearly independent (check this!). It should be clear how each free variable was used to eliminate the corresponding column from the set used to span the column space, as this will be the essence of the proof of the next theorem. The column vectors in S were not chosen entirely at random, they are the columns of Archetype I [737]. See if you can mimic this example using the columns of Archetype J [741]. Go ahead, we'll go grab a cup of coffee and be back before you finish up. For extra credit, notice that the vector 3 b = 1 4 is the vector of constants in the definition of Archetype I [737]. Since the system IJS(A, b) is consistent, we know by Theorem SLSLC [93] that b is a linear combination of the columns of A, or stated equivalently, b E W. This means that b must also be a linear combination of just the three columns A1, A3, A4. Can you find such a linear combination? Did you notice that there is just a single (unique) answer? Hmmmm. Example COV [154] deserves your careful attention, since this important example motivates the fol- lowing very fundamental theorem. Theorem BS Basis of a Span Suppose that S = {vi, v2, v3, ..., vn} is a set of column vectors. 
Define W = (S) and let A be the matrix whose columns are the vectors from S. Let B be the reduced row-echelon form of A, with D {di, d2, d3, ..., dr } the set of column indices corresponding to the pivot columns of B. Then 1. T = {vd1, vd2, vd3, ... vdr} is a linearly independent set. 2. W=(T). Proof To prove that T is linearly independent, begin with a relation of linear dependence on T, 0 = a1Vd1 + a2Vd2 + a3Vd3 + ... + arVdr and we will try to conclude that the only possibility for the scalars ai is that they are all zero. Denote the non-pivot columns of B by F = {fi, f2, f3, ..., fTr}. Then we can preserve the equality by adding a big fat zero to the linear combination, 0 = aivd1 + a2vd2 + a3vd3 +| . .. +| airVdr + Ovfi + Ovf2 + Ovf3 + . .. + OVf, By Theorem SLSLC [93], the scalars in this linear combination (suitably reordered) are a solution to the homogeneous system [S(A, 0). But notice that this is the solution obtained by setting each free variable to zero. If we consider the description of a solution vector in the conclusion of Theorem VFSLS [99], in the case of a homogeneous system, then we see that if all the free variables are set to zero the resulting solution vector is trivial (all zeros). So it must be that oi = 0, 1 < i < r. This implies by Definition LICV [132] that T is a linearly independent set. The second conclusion of this theorem is an equality of sets (Definition SE [684]). Since T is a subset of 5, any linear combination of elements of the set T can also be viewed as a linear combination of elements of the set S. So (T) C (S) = W. It remains to prove that W = (S) C (T). For each k, 1 < k 0 with equality if and only if u = 0. D Proof From the proof of Theorem IPN [171] we see that (u,u) =|[u]i2+|[u22+|[u]3 2+...+ [U]m2 Since each modulus is squared, every term is positive, and the sum must also be positive. (Notice that in general the inner product is a complex number and cannot be compared with zero, but in the special case of (u, u) the result is a real number.) The phrase, "with equality if and only if" means that we want to show that the statement (u, u) = 0 (i.e. with equality) is equivalent ("if and only if") to the statement u=0. If u = 0, then it is a straightforward computation to see that (u, u) = 0. In the other direction, assume that (u, u) = 0. As before, (u, u) is a sum of moduli. So we have 0 = (u, u)- =|[u]i2 +|[u]2 2 +|[u]3I2+...+ [u]m2 Now we have a sum of squares equaling zero, so each term must be zero. Then by similar logic, |[u]| = 0 will imply that [u]2 = 0, since 0 + 0i is the only complex number with zero modulus. Thus every entry of u is zero and so u =0, as desired. U Notice that Theorem PIP [172] contains three implications: uE Cm 4 (u, u) > 0 u 0 4 (u, u)= 0 (u, u) - 0 4 u = 0 The results contained in Theorem PIP [172] are summarized by saying "the inner product is positive definite." Subsection OV Orthogonal Vectors "Orthogonal" is a generalization of "perpendicular." You may have used mutually perpendicular vectors in a physics class, or you may recall from a calculus class that perpendicular vectors have a zero dot product. We will now extend these ideas into the realm of higher dimensions and complex scalars. Definition OV Orthogonal Vectors A pair of vectors, u and v, from Cm are orthogonal if their inner product is zero, that is, (u, v) = 0. 
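A quick machine check of this definition can be helpful. The sketch below is only an illustration and is not part of the text; it assumes plain Python, uses made-up vectors rather than any from the examples, and follows the text's convention of conjugating the entries of the second vector in the inner product.

```python
# A sketch, not part of the text, of the orthogonality test in Definition OV.
# Plain Python is assumed, the vectors are made-up illustrations, and the
# inner product follows the text's convention of conjugating the entries of
# the second vector.
def inner_product(u, v):
    """<u, v> = sum of [u]_k times the conjugate of [v]_k."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

u = [1 + 1j, 2, 1j]              # hypothetical vectors from C^3
v = [1j, 0, 1 - 1j]
w = [1, 1, 1]

print(inner_product(u, v))       # 0j: this particular pair is orthogonal
print(inner_product(u, w))       # (3+2j): u and w are not orthogonal
```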
A Example TOV Two orthogonal vectors Version 2.02  Subsection O.OV Orthogonal Vectors 174 The vectors 2+3i] 4-2i u 1+i 1+i] 1-i 2+3i v 4-6i _ 1 _ are orthogonal since (u, v) = (2 + 3i)(1 + i) + (4 - 2i)(2 - 3i) + (1 + i)(4 + 6i) + (1 + i)(1) = (-1 + 5i) + (2 - 16i) + (-2 + lOi) + (1 + i) =O+Oi. We extend this definition to whole sets by requiring vectors to be pairwise orthogonal. the same word, careful thought about what objects you are using will eliminate any source Despite using of confusion. Definition OSV Orthogonal Set of Vectors Suppose that S = {ui, u2, u3, ..., un} is a set of vectors from Cm. Then S is an orthogonal set if every pair of different vectors from S is orthogonal, that is (ui, u3) = 0 whenever i # j. A We now define the prototypical orthogonal set, which we will reference repeatedly. Definition SUV Standard Unit Vectors Let e3 E Ctm, 1 < j < m denote the column vectors defined by [ej]i if i7#j if i - j Then the set {ei, e2, e3, ..., em} {eg | 1 j m} is the set of standard unit vectors in Cm. (This definition contains Notation SUV.) A Notice that e3 is identical to column j of the m x m identity matrix Im (Definition IM [72]). This observation will often be useful. It is not hard to see that the set of standard unit vectors is an orthogonal set. We will reserve the notation e2 for these vectors. Example SUVOS Standard Unit Vectors are an Orthogonal Set Compute the inner product of two distinct vectors from the set of standard unit vectors (Definition SUV [173]), say e2, ej, where i # j, (ei, e) =0U+0U+"---+1U+"---+0U+"---+01+"---+0U+0U =0(0) +0(0)+---+1(0)+- --+0(1)+- --+0(0) +0(0) =o0 So the set {ei, e2, e3, ..., em} is an orthogonal set. Example AOS An orthogonal set Version 2.02  Subsection O.OV Orthogonal Vectors 175 The set 1+ i 1+5i -7+34i -2-4i 1 6 +5i -8-23i 6+i XiX2,X3,X4 = 1- i ' -7 - i ' -10+ 22i ' 4+3i _i_3_-L6i_]30+13i 6-i is an orthogonal set. Since the inner product is anti-commutative (Theorem IPAC [170]) we can test pairs of different vectors in any order. If the result is zero, then it will also be zero if the inner product is computed in the opposite order. This means there are six pairs of different vectors to use in an inner product computation. We'll do two and you can practice your inner products on the other four. (x1, x3) = (1 + i)(-7 - 34i) + (1)(-8 + 23i) + (1 - i)(-10 - 22i) + (i)(30 13i) (27 - 41i) + (-8 + 23i) + (-32 0 + 0i 12i) + (13 + 30i) and (x2, x4) = (1 +5i)(-2 +4i) +(6 +5i)(6 -i) +(-7 -i)(4 -3i) +(1 = (-22 - 6i) + (41 + 24i) + (-31 + 17i) + (12 - 35i) = 0 +0i 6i)(6 + i) So far, this section has seen lots of definitions, and lots of theorems establishing un-surprising conse- quences of those definitions. But here is our first theorem that suggests that inner products and orthogonal vectors have some utility. It is also one of our first illustrations of how to arrive at linear independence as the conclusion of a theorem. Theorem OSLI Orthogonal Sets are Linearly Independent Suppose that S is an orthogonal set of nonzero vectors. Then S is linearly independent. D Proof Let S = {ui, u2, u3, ..., un} be an orthogonal set of nonzero vectors. To prove the linear independence of S, we can appeal to the definition (Definition LICV [132]) and begin with an arbitrary relation of linear dependence (Definition RLDCV [132]), a1U1 + a2U2 + a3U3 + + --nun =0. Then, for every 1 < i < n, we have 1 -i = (ci (ui, ui)) (ui, u) 1 = ((a1(0) + a2(0) + -.-. + ai (ui, ui) + -.-. 
+ an (0) ) (ui, ui) 1 = (ali(ui, u) + -- + ai(ui, u) +- + an (un, ui)) (ui, u2) 1 - K, ((iui, u) + (2u2, u) + ... + (anun, ui)) (ui, ui ) 1 = (aiui +a2U2 +3++ au | ----aum, ui) (ui, ui ) 1 - Ku (0, ui) (ui, ui ) 1 - 0 (ui, ui) Theorem PIP [172] Property ZCN [681] Definition OSV [173] Theorem IPSM [170] Theorem IPVA [169] Definition RLDCV [132] Definition IP [168] Version 2.02  Subsection O.GSP Gram-Schmidt Procedure 176 = 0 Property ZCN [681] So we conclude that oj-= 0 for all 1 < i < n in any relation of linear dependence on S. But this says that S is a linearly independent set since the only way to form a relation of linear dependence is the trivial way (Definition LICV [132]). Boom! U Subsection GSP Gram-Schmidt Procedure The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a linearly independent set of p vectors, S, then we can do a number of calculations with these vectors and produce an orthogonal set of p vectors, T, so that (S) = (T). Given the large number of computations involved, it is indeed a procedure to do all the necessary computations, and it is best employed on a computer. However, it also has value in proofs where we may on occasion wish to replace a linearly independent set by an orthogonal set. This is our first occasion to use the technique of "mathematical induction" for a proof, a technique we will see again several times, especially in Chapter D [370]. So study the simple example described in Technique I [694] first. Theorem GSP Gram-Schmidt Procedure Suppose that S = {vi, V2, v3, ..., vp} is a linearly independent set of vectors in C". Define the vectors ui, 1 < i p by (vi, u1) (vi, u2) (vi, u3) (vi, ui_1) ui=1vi - u- u2- u3-.-ui_1 (Ui, Ui) (U2, U2) (u3, u3) (us_1, ui_1) Then if T = {ui, u2, u3, ..., up}, then T is an orthogonal set of non-zero vectors, and (T) = (S). Q Proof We will prove the result by using induction on p (Technique I [694]). To begin, we prove that T has the desired properties when p = 1. In this case ui = vi and T = {ui} = {v1} = S. Because S and T are equal, (S) = (T). Equally trivial, T is an orthogonal set. If u= 0, then S would be a linearly dependent set, a contradiction. Suppose that the theorem is true for any set of p-1 linearly independent vectors. Let S = {vi, v2, v3, ..., vp} be a linearly independent set of p vectors. Then S' = {vi, v2, v3, ..., vp_1} is also linearly independent. So we can apply the theorem to S' and construct the vectors T' {ui, u2, u3, ..., up_1}. T' is therefore an orthogonal set of nonzero vectors and (S') = (T'). Define 11 ~ (vy, 11i) (vp, 112) 11 (vp, 113) _(vp, up_1) Kui, 11i) (112, 112) (113, 113)11 (up-i, up_)1J~ and let T =T' u {up}. We need to now show that T has several properties by building on what we know about T'. But first notice that the above equation has no problems with the denominators ((ui, ui)) being zero, since the us are from T', which is composed of nonzero vectors. We show that (T) =(5), by first establishing that (T) C (5). Suppose x E (T), so x i1+au ss+---+au The term apup is a linear combination of vectors from T' and the vector vp, while the remaining terms are a linear combination of vectors from T'. Since (T') = (S'), any term that is a multiple of a vector from T' can be rewritten as a linear combination of vectors from S'. The remaining term apvv is a multiple of a vector in S. So we see that x can be rewritten as a linear combination of vectors from S, i.e. x E (5). 
Version 2.02  Subsection O.GSP Gram-Schmidt Procedure 177 To show that (S) C (T), begin with y E (S), so y=alvl+ a2v2 + a3v3 + ... + apvp Rearrange our defining equation for up by solving for vp. Then the term apv is a multiple of a linear combination of elements of T. The remaining terms are a linear combination of vi, v2, v3, ..., vp_1, hence an element of (S') = (T'). Thus these remaining terms can be written as a linear combination of the vectors in T'. So y is a linear combination of vectors from T, i.e. y E (T). The elements of T' are nonzero, but what about up? Suppose to the contrary that up = 0, (v, u)) 11 (v2 , u2)2 (v, u3) (v, up_1) 0 up- vp (ui, ui) (u2, u2) (u3, u3) u . (up_1, up_1) - (vy, ui) (vy, U2) (vy, U3) (vy, up_1) vp = u u i1+ 12 + u + u s -|- ----|-up Kui, ui1) (112, 112) (113, 113) (u _1,u _1 Since (S') = (T') we can write the vectors u1, u2, u3, ..., up_1 on the right side of this equation in terms of the vectors v1, v2, v3, ..., vp_1 and we then have the vector vp expressed as a linear combination of the other p - 1 vectors in S, implying that S is a linearly dependent set (Theorem DLDS [152]), contrary to our lone hypothesis about S. Finally, it is a simple matter to establish that T is an orthogonal set, though it will not appear so simple looking. Think about your objects as you work through the following what is a vector and what is a scalar. Since T' is an orthogonal set by induction, most pairs of elements in T are already known to be orthogonal. We just need to test "new" inner products, between up and ui, for 1 < i p - 1. Here we go, using summation notation, p-1 (vy, uk) ( up, ui ) = vp - uk, ui k =1 (uk, uk) (vs, ui) - K (vp, U/) uk, u k=1 (Uk, uk) (vp, ui)- ( , uk) (u , u) k=1 (uk, uk) (vy, u\ (vy, u (ku/ U) (u, u ( ,u -_ (vp, ui) (ui, ui) - >K (vy, u(p ) (0) (ui) (uk ( k, uk) Theorem IPVA [169] Theorem IPVA [169] Theorem IPSM [170] Induction Hypothesis = (vp, ui) - (vp, ui) - k0 k~i Example GSTV Gram-Schmidt of three vectors We will illustrate the Gram-Schmidt process with three vectors. Begin with the linearly independent (check this!) set S v1 -i 0 S = {VI, V2, V3} = 1+ i , 1 ,i 1 _ 1 +2i_ i Version 2.02  Subsection O.GSP Gram-Schmidt Procedure 178 Then 1 ui = vi = 1 + i 1 --2 -3i (V2,ui) 1 U2 =V2 - ui = - 1 - z L 2+5i (V3, u 1 u) _ (v3, u2) U 1I+ . U3 V3 (ui, ui) (U2, U2u2 - 11 _ _ and 1 1 -2-3i 1 -3-i T = {u, u2, u3} = 1 + i , [ 1 - i , 1 + 3i 1 4 2 +5i 1 -1 -i_ is an orthogonal set (which you can check) of nonzero vectors and (T) (S) (all by Theorem GSP [175]). Of course, as a by-product of orthogonality, the set T is also linearly independent (Theorem OSLI [174]). One final definition related to orthogonal vectors. Definition ONS OrthoNormal Set Suppose S = {u, u2, u3, ..., un} is an orthogonal set of vectors such that ||us|| = 1 for all 1 < i < n. Then S is an orthonormal set of vectors. A Once you have an orthogonal set, it is easy to convert it to an orthonormal set -multiply each vector by the reciprocal of its norm, and the resulting vector will have norm 1. This scaling of each vector will not affect the orthogonality properties (apply Theorem IPSM [170]). Example ONTV Orthonormal set, three vectors The set 1 1 1-2-3i 1 -3-i T = {u, u2, u3} = 1 + i , - 1 - i , - 1] 1+ 3i 1 1_4 2+5i_11-1-i_ from Example GSTV [176] is an orthogonal set. We compute the norm of each vector, 1 v/2 2 91 Converting each vector to a norm of 1, yields an orthonormal set, 1[1 wi=- i1+'ti 2[ ] 1 1 [2 -3i] 1 -2 -3i] W2 1/114 L2'i5i 2v [21+5i] 1 1 . 1i/55 . 
1 22z -1z - z_ Version 2.02  Subsection O.READ Reading Questions 179 Example ONFV Orthonormal set, four vectors As an exercise convert the linearly independent set 1+i i -1 - i S= 1 +i] -i i 1-i '-1 '-1+i ' 1 to an orthogonal set via the Gram-Schmidt Process (Theorem GSP [175]) and then scale the vectors to norm 1 to create an orthonormal set. You should get the same set you would if you scaled the orthogonal set of Example AOS [173] to become an orthonormal set. It is crazy to do all but the simplest and smallest instances of the Gram-Schmidt procedure by hand. Well, OK, maybe just once or twice to get a good understanding of Theorem GSP [175]. After that, let a machine do the work for you. That's what they are for.See: Computation GSP.MMA [670] . We will see orthonormal sets again in Subsection MINM.UM [229]. They are intimately related to unitary matrices (Definition UM [229]) through Theorem CUMOS [230]. Some of the utility of orthonormal sets is captured by Theorem COB [332] in Subsection B.OBC [331]. Orthonormal sets appear once again in Section OD [601] where they are key in orthonormal diagonalization. Subsection READ Reading Questions 1. Is the set 1 5 8 -1 ,3 ,4 2 -1 -2 an orthogonal set? Why? 2. What is the distinction between an orthogonal set and an orthonormal set? 3. What is nice about the output of the Gram-Schmidt process? Version 2.02  Subsection O.EXC Exercises 180 Subsection EXC Exercises C20 Complete Example AOS [173] by verifying that the four remaining inner products are zero. Contributed by Robert Beezer C21 Verify that the set T created in Example GSTV [176] by the Gram-Schmidt Procedure is an or- thogonal set. Contributed by Robert Beezer T10 Prove part 1 of the conclusion of Theorem IPVA [169]. Contributed by Robert Beezer T11 Prove part 1 of the conclusion of Theorem IPSM [170]. Contributed by Robert Beezer T20 Suppose that u, v, w E C"m, a, 3E C and u is orthogonal to both v and w. Prove that u is orthogonal to ov + 3w. Contributed by Robert Beezer Solution [180] T30 Suppose that the set S in the hypothesis of Theorem GSP [175] is not just linearly independent, but is also orthogonal. Prove that the set T created by the Gram-Schmidt procedure is equal to S. (Note that we are getting a stronger conclusion than (T)= (S) -the conclusion is that T = S.) In other words, it is pointless to apply the Gram-Schmidt procedure to a set that is already orthogonal. Contributed by Steve Canfield Version 2.02  Subsection O.SOL Solutions 181 Subsection SOL Solutions T20 Contributed by Robert Beezer Statement [179] Vectors are orthogonal if their inner product is zero (Definition OV [172]), so we compute, (av +3w, u) (av, u) + (3w, u) a (v, u) + 3 (w, u) =a (0) + 13 (0) =0 Theorem IPVA [169] Theorem IPSM [170] Definition OV [172] So by Definition OV [172], u and av +3w are an orthogonal pair of vectors. Version 2.02  Annotated Acronyms O.V Vectors 182 Annotated Acronyms V Vectors Theorem VSPCV [86] These are the fundamental rules for working with the addition, and scalar multiplication, of column vectors. We will see something very similar in the next chapter (Theorem VSPM [184]) and then this will be generalized into what is arguably our most important definition, Definition VS [279]. Theorem SLSLC [93] Vector addition and scalar multiplication are the two fundamental operations on vectors, and linear com- binations roll them both into one. Theorem SLSLC [93] connects linear combinations with systems of equations. This one we will see often enough that it is worth memorizing. 
Theorem PSPHS [105]
This theorem is interesting in its own right, and sometimes the vagueness surrounding the choice of z can seem mysterious. But we list it here because we will see an important theorem in Section ILT [477] which will generalize this result (Theorem KPI [483]).

Theorem LIVRN [136]
If you have a set of column vectors, this is the fastest computational approach to determine if the set is linearly independent. Make the vectors the columns of a matrix, row-reduce, compare r and n. That's it, and you always get an answer. Put this one in your toolkit.

Theorem BNS [139]
We will have several theorems (all listed in these "Annotated Acronyms" sections) whose conclusions will provide a linearly independent set of vectors whose span equals some set of interest (the null space here). While the notation in this theorem might appear a bit gruesome, in practice it can become very routine to apply. So practice this one; we'll be using it all through the book.

Theorem BS [157]
As promised, another theorem that provides a linearly independent set of vectors whose span equals some set of interest (a span now). You can use this one to clean up any span.

Chapter M Matrices

We have made frequent use of matrices for solving systems of equations, and we have begun to investigate a few of their properties, such as the null space and nonsingularity. In this chapter, we will take a more systematic approach to the study of matrices.

Section MO Matrix Operations

In this section we will back up and start simple. First a definition of a totally general set of matrices.

Definition VSM Vector Space of m x n Matrices
The vector space M_{mn} is the set of all m x n matrices with entries from the set of complex numbers. (This definition contains Notation VSM.)

Subsection MEASM Matrix Equality, Addition, Scalar Multiplication

Just as we made, and used, a careful definition of equality for column vectors, so too, we have precise definitions for matrices.

Definition ME Matrix Equality
The m x n matrices A and B are equal, written A = B, provided [A]_{ij} = [B]_{ij} for all 1 ≤ i ≤ m, 1 ≤ j ≤ n.
Now BA = (BA)I Theorem MMIM [200] = (BA)(BC) Theorem CINM [217] = B(AB)C Theorem MMA [202] Version 2.02  Subsection MINM.NMI Nonsingular Matrices are Invertible 229 = BInC Hypothesis = BC Theorem MMIM [200] = In Theorem CINM [217] which is the desired conclusion. U So Theorem OSIS [227] tells us that if A is nonsingular, then the matrix B guaranteed by Theorem CINM [217] will be both a "right-inverse" and a "left-inverse" for A, so A is invertible and A-'-= B. So if you have a nonsingular matrix, A, you can use the procedure described in Theorem CINM [217] to find an inverse for A. If A is singular, then the procedure in Theorem CINM [217] will fail as the first n columns of M will not row-reduce to the identity matrix. However, we can say a bit more. When A is singular, then A does not have an inverse (which is very different from saying that the procedure in Theorem CINM [217] fails to find an inverse). This may feel like we are splitting hairs, but its important that we do not make unfounded assumptions. These observations motivate the next theorem. Theorem NI Nonsingularity is Invertibility Suppose that A is a square matrix. Then A is nonsingular if and only if A is invertible. D Proof (<) Suppose A is invertible, and suppose that x is any solution to the homogeneous system [S(A, 0). Then x = Inx Theorem MMIM [200] = (A-1A) x Definition MI [213] = A-- (Ax) Theorem MMA [202] = A-10 Theorem SLEMM [195] = 0 Theorem MMZM [200] So the only solution to [S(A, 0) is the zero vector, so by Definition NM [71], A is nonsingular. (-) Suppose now that A is nonsingular. By Theorem CINM [217] we find B so that AB = In. Then Theorem OSIS [227] tells us that BA= I. So B is A's inverse, and by construction, A is invertible. So for a square matrix, the properties of having an inverse and of having a trivial null space are one and the same. Can't have one without the other. Theorem NME3 Nonsingular Matrix Equivalences, Round 3 Suppose that A is a square matrix of size n. The following are equivalent. 1. A is nonsingular. 2. A row-reduces to the identity matrix. 3. The null space of A contains only the zero vector, N(A) ={0}. 4. The linear system IJS(A, b) has a unique solution for every possible choice of b. 5. The columns of A are a linearly independent set. 6. A is invertible. Proof We can update our list of equivalences for nonsingular matrices (Theorem NME2 [138]) with the equivalent condition from Theorem NI [228]. U In the case that A is a nonsingular coefficient matrix of a system of equations, the inverse allows us to very quickly compute the unique solution, for any vector of constants. Version 2.02  Subsection MINM.UM Unitary Matrices 230 Theorem SNCM Solution with Nonsingular Coefficient Matrix Suppose that A is nonsingular. Then the unique solution to IJS(A, b) is A-lb. D Proof By Theorem NMUS [74] we know already that [S(A, b) has a unique solution for every choice of b. We need to show that the expression stated is indeed a solution (the solution). That's easy, just "plug it in" to the corresponding vector equation representation (Theorem SLEMM [195]), A (A--b) = (AA-1) b Theorem MMA [202] = Inb Definition MI [213] = b Theorem MMIM [200] Since Ax = b is true when we substitute A-lb for x, A-lb is a (the!) solution to [S(A, b). Subsection UM Unitary Matrices Recall that the adjoint of a matrix is A* = (A) (Definition A [189]). Definition UM Unitary Matrices Suppose that U is a square matrix of size n such that U*U = In. Then we say U is unitary. 
A This condition may seem rather far-fetched at first glance. Would there be any matrix that behaved this way? Well, yes, here's one. Example UM3 Unitary matrix of size 3 [1+i 3+2i 2+2i - ~ _ 55 22] The computations get a bit tiresome, but if you work your way through the computation of U*U, you will arrive at the 3 x 3 identity matrix 13. Unitary matrices do not have to look quite so gruesome. Here's a larger one that is a bit more pleasing. Example UPM Unitary permutation matrix The matrix P= 1 0 0 0 0 is unitary as can be easily checked. Notice that it is just a rearrangement of the columns of the 5 x 5 identity matrix, 15 (Definition IM [72]). An interesting exercise is to build another 5 x 5 unitary matrix, R, using a different rearrangement of the columns of I5. Then form the product PR. This will be another unitary matrix (Exercise MINM.T10 [234]). If you were to build all 5! = 5 x 4 x 3 x 2 x 1 = 120 matrices of this type you would have a set that remains closed under matrix multiplication. It is an example of another algebraic structure known as Version 2.02  Subsection MINM.UM Unitary Matrices 231 a group since together the set and the one operation (matrix multiplication here) is closed, associative, has an identity (15), and inverses (Theorem UMI [230]). Notice though that the operation in this group is not commutative! If a matrix A has only real number entries (we say it is a real matrix) then the defining property of being unitary simplifies to ALA = I. In this case we, and everybody else, calls the matrix orthogonal, so you may often encounter this term in your other reading when the complex numbers are not under consideration. Unitary matrices have easily computed inverses. They also have columns that form orthonormal sets. Here are the theorems that show us that unitary matrices are not as strange as they might initially appear. Theorem UMI Unitary Matrices are Invertible Suppose that U is a unitary matrix of size n. Then U is nonsingular, and U- = U*. Proof By Definition UM [229], we know that U*U = I. The matrix In is nonsingular (since it row- reduces easily to In, Theorem NMRRI [72]). So by Theorem NPNT [226], U and U* are both nonsingular matrices. The equation U*U = In gets us halfway to an inverse of U, and Theorem OSIS [227] tells us that then UU* = In also. So U and U* are inverses of each other (Definition MI [213]). U Theorem CUMOS Columns of Unitary Matrices are Orthonormal Sets Suppose that A is a square matrix of size n with columns S = {A1, A2, A3, ..., An}. Then A is a unitary matrix if and only if S is an orthonormal set. D Proof The proof revolves around recognizing that a typical entry of the product A*A is an inner product of columns of A. Here are the details to support this claim. n [A*A] t >3 [A*]ik [A]kJ Theorem EMP [198] k=1 [(t][A]k Theorem EMP [198] k=1 n = S [Ak [A]kJ Definition TM [185] k=1 n = S [A] [A]kJ Definition CCM [187] k=1 = S [A]1 [A]k Property CMCN [680] k=1 = S [A3]~ [Ai]k k=1 = (A3, Ag) Definition IP [168] We now employ this equality in a chain of equivalences, S= {A1, A2, A3, ..., An} is an orthonormal set (0 ifi -f < (Ai, Ai) = .. Definition ONS [177] 1 if zr=2. Version 2.02  Subsection MINM.UM Unitary Matrices 232 <-> [A*A] { ( ifi -fj 1 if i=j <>[A*A]i = [In]2j 1 < i < n, 1 < j < n < A* A = In < A is a unitary matrix Example OSMC Orthonormal set from matrix columns The matrix [- i 3+2i 2+2i U=1-i 2+2 i -3+i S E K 22] from Example UM3 [229] is a unitary matrix. 
By Theorem CUMOS Definition IM [72] Definition ME [182] Definition UM [229] 0 [230], its columns { [1+i -3+2i- 2+2i 15 i 2 22 1-i 2+2 i -3+i II, 55 'I22 55 [22] form an orthonormal set. You might find checking the six inner products of pairs of these vectors easier than doing the matrix product U*U. Or, because the inner product is anti-commutative (Theorem IPAC [170]) you only need check three inner products (see Exercise MINM.T12 [234]). When using vectors and matrices that only have real number entries, orthogonal matrices are those matrices with inverses that equal their transpose. Similarly, the inner product is the familiar dot product. Keep this special case in mind as you read the next theorem. Theorem UMPIP Unitary Matrices Preserve Inner Products Suppose that U is a unitary matrix of size n and u and v are two vectors from C". Then (Uu, Uv) = (u, v) and ||Uv|| = V El Proof (Uu, Uv) (Uu)tUv ut (U UV ut (U) Uv ut (U) tUv uLU*UV utV Theorem MMIP [202] Theorem MMT [203] Theorem MMCC [203] Theorem CCT [682] Theorem MCT [189] Theorem MMCC [203] Definition A [189] Definition UM [229] Definition IM [72] Theorem MMIM [200] Version 2.02  Subsection MINM.READ Reading Questions 233 (u, v) Theorem MMIP [202] The second conclusion is just a specialization of the first conclusion. ||Uv| = ||Uv 2 (Uv, Uv) Theorem IPN [171] (v, v) v 2 Theorem IPN [171] 1VI Aside from the inherent interest in this theorem, it makes a bigger statement about unitary matrices. When we view vectors geometrically as directions or forces, then the norm equates to a notion of length. If we transform a vector by multiplication with a unitary matrix, then the length (norm) of that vector stays the same. If we consider column vectors with two or three slots containing only real numbers, then the inner product of two such vectors is just the dot product, and this quantity can be used to compute the angle between two vectors. When two vectors are multiplied (transformed) by the same unitary matrix, their dot product is unchanged and their individual lengths are unchanged. The results in the angle between the two vectors remaining unchanged. A "unitary transformation" (matrix-vector products with unitary matrices) thus preserve geometrical relationships among vectors representing directions, forces, or other physical quantities. In the case of a two- slot vector with real entries, this is simply a rotation. These sorts of computations are exceedingly important in computer graphics such as games and real-time simulations, especially when increased realism is achieved by performing many such computations quickly. We will see unitary matrices again in subsequent sections (especially Theorem OD [607]) and in each instance, consider the interpretation of the unitary matrix as a sort of geometry-preserving transformation. Some authors use the term isometry to highlight this behavior. We will speak loosely of a unitary matrix as being a sort of generalized rotation. A final reminder: the terms "dot product," "symmetric matrix" and "orthogonal matrix" used in refer- ence to vectors or matrices with real number entries correspond to the terms "inner product," "Hermitian matrix" and "unitary matrix" when we generalize to include complex number entries, so keep that in mind as you read elsewhere. Subsection READ Reading Questions 1. Compute the inverse of the coefficient matrix of the system of equations below and use the inverse to solve the system. 4xi + 10X2 =12 2x1 + 6x2 =4 2. 
In the reading questions for Section MISLE [212] you were asked to find the inverse of the 3 x 3 matrix below. 2 3 1 1 -2 -3 -2 4 6 Version 2.02  Subsection MINM.READ Reading Questions 234 Because the matrix was not nonsingular, you had no theorems at that point that would allow you to compute the inverse. Explain why you now know that the inverse does not exist (which is different than not being able to compute it) by quoting the relevant theorem's acronym. 3. Is the matrix A unitary? Why? [-22(4+2i) 4(5+ 3i) 2(-1 -Zi) 3(12 + 14i) Version 2.02  Subsection MINM.EXC Exercises 235 Subsection EXC Exercises C40 Solve the system of equations below using the inverse of a matrix. i + x2 + 3x3 +4 =5 -2xi - z2 - 4x3 - x4 = -7 xzi +4x2 + 10x3 +2X4 = 9 -2xi1- 4x3 + 5X4 = 9 Contributed by Robert Beezer Solution [235] M20 Construct an example of a 4 x 4 unitary matrix. Contributed by Robert Beezer Solution [235] M80 Matrix multiplication interacts nicely with many operations. But not always with transforming a matrix to reduced row-echelon form. Suppose that A is an m x n matrix and B is an n x p matrix. Let P be a matrix that is row-equivalent to A and in reduced row-echelon form, Q be a matrix that is row-equivalent to B and in reduced row-echelon form, and let R be a matrix that is row-equivalent to AB and in reduced row-echelon form. Is PQ = R? (In other words, with nonstandard notation, is rref(A)rref(B) = rref(AB)?) Construct a counterexample to show that, in general, this statement is false. Then find a large class of matrices where if A and B are in the class, then the statement is true. Contributed by Mark Hamrick Solution [235] T10 Suppose that Q and P are unitary matrices of size n. Prove that QP is a unitary matrix. Contributed by Robert Beezer T11 Prove that Hermitian matrices (Definition HM [205]) have real entries on the diagonal. More precisely, suppose that A is a Hermitian matrix of size n. Then [A] E R, 1 < i . 0 9 -4 8 7 -13 12 -31 -9_ _7 _ -8_ _37_, Version 2.02  Subsection CRS.RSM Row Space of a Matrix 245 However, we can use Theorem BCS [239] to get a slightly better description. First, row-reduce It, 1 0 0 - o~ilo 7 0 0i1 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 Since there are leading 1's in columns with indices D = {1, 2, 3}, the column space of It can be spanned by just the first three columns of It, 1 2 0 4 8 0 0 -1 2 7(I) = C(It)= <-1 , 3 , -3 0 9 -4 7 -13 12 -9 7 -8 The row space would not be too interesting if it was simply the column space of the transpose. However, when we do row operations on a matrix we have no effect on the many linear combinations that can be formed with the rows of the matrix. This is stated more carefully in the following theorem. Theorem REMRS Row-Equivalent Matrices have equal Row Spaces Suppose A and B are row-equivalent matrices. Then 7Z(A) =R(B). D Proof Two matrices are row-equivalent (Definition REM [28]) if one can be obtained from another by a sequence of (possibly many) row operations. We will prove the theorem for two matrices that differ by a single row operation, and then this result can be applied repeatedly to get the full statement of the theorem. The row spaces of A and B are spans of the columns of their transposes. For each row operation we perform on a matrix, we can define an analogous operation on the columns. Perhaps we should call these column operations. Instead, we will still call them row operations, but we will apply them to the columns of the transposes. Refer to the columns of At and Bt as AZ and Bi, 1 < i < m. 
The row operation that switches rows will just switch columns of the transposed matrices. This will have no effect on the possible linear combinations formed by the columns. Suppose that BL is formed from At by multiplying column At by a~ # 0. In other words, Be ai and By As for all i # t. We need to establish that two sets are equal, C(AL) - C(Bt). We will take a generic element of one and show that it is contained in the other. !31B1+2B2 + +3B -|- - -3|-t + - -+ !mBm = #1A1i+ #22+ +3A -| + --- (a{At) + - -+ !3mAm = 1A1 + #22+ +3A +| | (a/t) A+ - -+ !mAm says that C (Bt) c C(At). Similarly, ~y1A1+,Y2A2 + yA+..+- ~+...+ -mAm +1A1+7 y2A2 - 3A3- +.-..+-|-7 At + .-..-+- 7mAm Version 2.02  Subsection CRS.RSM Row Space of a Matrix 246 =yA1i+72A2 +73A3 + ---+ (caAt) + . -+ymAm = y1B1i+72B2 +73B3 + ---t+ Bt + --.-+rymBm says that C(At) C C(Bt). So R(A) = C(At) = C(Bt) = R(B) when a single row operation of the second type is performed. Suppose now that Bt is formed from At by replacing At with aAs + At for some a E C and s # t. In other words, Bt = aAs + At, and B = AZ for i # t. 31B1+32B2 + /3B3 + -.- + #sBs + -. + i3tBt +-.- + !3mBm =3iAi+ 32A2 + #3A3 + -+3As + -+ 3t (aAs + At) + -.+ 3mAm 31A1 + #Q2A2 + /33A3 + -.- +3sAs + --. + (%o) A8 + !3tAt + -..-+ !3mAm = 31A1+ #2A2 + 33A3 + -+3As + (/ta) As + -+ tAt + -+ 3mAm = 31A1i+!2A2+ !3A3+|+(!3s+--ta)As +- -+!3tAt+---+!3mAm says that C (Bt) C(At). Similarly, yiAi+ y2A2 + y3A3 + --+ ysAs + --. + 7tAt + -. + ymAm = 71iA1 + 72A2 + 73A3 +.-.-.-+ 7sAs + -.-. + (-a-ytAs + aytAs) + 7tAt + -.-.-+ 7mAm =yiAi + y2A2 + y3A3 + ... + (-c +tA8) + y8As+ + ... + (aA8 + ytA) + ... +ymAm = 71B1 +72B2 +73B3 + - -+ (-a-yt+73s) Bs++ - -+tBt++ - -+mBm says that C(At) C C(Bt). So R(A) = C(At) = C(Bt) = R(B) when a single row operation of the third type is performed. So the row space of a matrix is preserved by each row operation, and hence row spaces of row-equivalent matrices are equal sets. U Example RSREM Row spaces of two row-equivalent matrices In Example TREM [28] we saw that the matrices 2 -1 3 4 1 1 0 6 A= 5 2 -2 3 B=3 0 -2 -9 1 1 0 6_ 2 -1 3 4_ are row-equivalent by demonstrating a sequence of two row operations that converted A into B. Applying Theorem REMRS [244] we can say r -1 2 1 1 1 0 2- r-1 Theorem REMRS [244] is at its best when one of the row-equivalent matrices is in reduced row-echelon form. The vectors that correspond to the zero rows can be ignored. (Who needs the zero vector when building a span? See Exercise LI.T10 [144].) The echelon pattern insures that the nonzero rows yield vectors that are linearly independent. Here's the theorem. Theorem BRS Basis for the Row Space Suppose that A is a matrix and B is a row-equivalent matrix in reduced row-echelon form. Let S be the set of nonzero columns of B'. Then Version 2.02  Subsection CRS.RSM Row Space of a Matrix 247 1. R(A) = (S). 2. S is a linearly independent set. Proof From Theorem REMRS [244] we know that R(A) = R(B). If B has any zero rows, these correspond to columns of Bt that are the zero vector. We can safely toss out the zero vector in the span construction, since it can be recreated from the nonzero vectors by a linear combination where all the scalars are zero. So 7Z(A) = (S). Suppose B has r nonzero rows and let D = {di, d2, d3, ..., dr} denote the column indices of B that have a leading one in them. Denote the r column vectors of Bt, the vectors in S, as B1, B2, B3, ... , Br. 
To show that S is linearly independent, start with a relation of linear dependence o1B1 +2B2+ aB3----|arBr = 0 Now consider this vector equality in location di. Since B is in reduced row-echelon form, the entries of column di of B are all zero, except for a (leading) 1 in row i. Thus, in Bt, row di is all zeros, excepting a 1 in column i. So, for 1 < i < r, 0 = [0]d Definition ZCV [25] = [a1B1 + a2B2 + a3B3 + - + -O-rBr]d Definition RLDCV [132] = [o1B1]di + [a2B2]d. + [a3B3]d. + - - - + [arBr]d + Definition MA [182] = ai [B1]d + 62 [B2]d. + a3 [B3]di + - - - + ar [Br]di + Definition MSM [183] = c1(0) + 62(0) + 63(0) + - - - + ac (1) + - - - + ar (0) Definition RREF [30] So we conclude that ai = 0 for all 1 < i < r, establishing the linear independence of S (Definition LICV [132]). U Example IAS Improving a span Suppose in the course of analyzing a matrix (its column space, its null space, its...) we encounter the following set of vectors, described by a span 1 3 1 -3 2 -1 -1 2 X 1 , 2 , 0 , -3 6 -1 -1 6 Let A be the matrix whose rows are the vectors in X, so by design X=R() Ar2[1 a 16 Row-reduce A to form a row-equivalent matrix in reduced row-echelon form, 110 0 2 -1 B_ 0 0 3 1 0 0 1 -2 5 0 0 0 0 0] Version 2.02  Subsection CRS.RSM Row Space of a Matrix 248 Then Theorem BRS [245] says we can grab the nonzero columns of Bt and write 1 0 0 0 1 0 X = R(A) = R(B) = 0 , 0 , 1 2 3 -2 -1_ 1_ _5 _ These three vectors provide a much-improved description of X. There are fewer vectors, and the pattern of zeros and ones in the first three entries makes it easier to determine membership in X. And all we had to do was row-reduce the right matrix and toss out a zero row. Next to row operations themselves, this is probably the most powerful computational technique at your disposal as it quickly provides a much improved description of a span, any span. Theorem BRS [245] and the techniques of Example IAS [246] will provide yet another description of the column space of a matrix. First we state a triviality as a theorem, so we can reference it later. Theorem CSRST Column Space, Row Space, Transpose Suppose A is a matrix. Then C(A) = R(At). D Proof C(A) = C ((At)t) Theorem TT [187] = 7Z(At) Definition RSM [243] So to find another expression for the column space of a matrix, build its transpose, row-reduce it, toss out the zero rows, and convert the nonzero rows to column vectors to yield an improved set for the span construction. We'll do Archetype I [737], then you do Archetype J [741]. Example CSROI Column space from row operations, Archetype I To find the column space of the coefficient matrix of Archetype I [737], we proceed as follows. The matrix is 1 4 0 -1 0 7 -9 2 8 -1 3 9 -13 7 I 0 0 2 -3 -4 12 -8* -1 -4 2 4 8 -31 37] The transpose is 1 2 0 -1 4 8 0 -4 o -1 2 2 -1 3 -3 4 . o 9 -4 8 7 -13 12 -31 -9 7 -8 37_ Row-reduced this becomes, 1 00 -1 0 0~I1 0 0 0 0. 0 0 0 0 0 0 0 0 0 0 0 0 Version 2.02  Subsection CRS.READ Reading Questions 249 Now, using Theorem CSRST [247] and Theorem BRS [245] 1 0 0 C I)=RIt_ 0 1 0 C() 2(*)=0 ' 0 ' 1 * 31 12 13 - 7 -J -7 - 7 - This is a very nice description of the column space. Fewer vectors than the 7 involved in the definition, and the pattern of the zeros and ones in the first 3 slots can be used to advantage. For example, Archetype I [737] is presented as a consistent system of equations with a vector of constants 3 b = . 4 Since IS(I, b) is consistent, Theorem CSCS [237] tells us that b E C(I). 
But we could see this quickly with the following computation, which really only involves any work in the 4th entry of the vectors as the scalars in the linear combination are dictated by the first three entries of b. 3 1 0 0 b [] 3 [% +9[ +1[] 1 0 + 0 + 1 4 31 12 13 - - - 7 - 7 - 7 - Can you now rapidly construct several vectors, b, so that IJS(I, b) is consistent, and several more so that the system is inconsistent? Subsection READ Reading Questions 1. Write the column space of the matrix below as the span of a set of three vectors and explain your choice of method. 1 3 1 3 2 0 1 1 -1 2 1 0 2. Suppose that A is an n x n nonsingular matrix. What can you say about its column space? 3. Is the vector [] in the row space of the following matrix? Why or why not? Version 2.02  Subsection CRS.EXC Exercises 250 Subsection EXC Exercises C30 Example CSOCD [240] expresses the column space of the coefficient matrix from Archetype D [716] (call the matrix A here) as the span of the first two columns of A. In Example CSMCS [236] we determined that the vector 2 c = 3 2 was not in the column space of A and that the vector .8 b [-12 4 was in the column space of A. Attempt to write c and b as linear combinations of the two vectors in the span construction for the column space in Example CSOCD [240] and record your observations. Contributed by Robert Beezer Solution [253] C31 For the matrix A below find a set of vectors T meeting the following requirements: (1) the span of T is the column space of A, that is, (T) = C(A), (2) T is linearly independent, and (3) the elements of T are columns of A. 2 1 4 -1 2 _ 1 -1 5 1 1 A -1 2 -7 0 1 2 -1 8 -1 2_ Contributed by Robert Beezer Solution [253] C32 In Example CSAA [241], verify that the vector b is not in the column space of the coefficient matrix. Contributed by Robert Beezer C33 Find a linearly independent set S so that the span of S, (S), is row space of the matrix B, and S is linearly independent. 2 3 1 1 B4 1 1 0 1 -1 2 3 -4_ Contributed by Robert Beezer Solution [253] C34 For the 3 x 4 matrix A and the column vector y E C4 given below, determine if y is in the row space of A. In other words, answer the question: y E 7Z(A)? (15 points) A=47 -3 0 -3y= Contributed by Robert Beezer Solution [253] C35 For the matrix A below, find two different linearly independent sets whose spans equal the column space of A, C(A), such that (a) the elements are each columns of A. Version 2.02  Subsection CRS.EXC Exercises 251 (b) the set is obtained by a procedure that is substantially different from the procedure you use in part (a). 3 5 1 -2 A= 1 2 3 3 -3 -4 7 13 Contributed by Robert Beezer Solution [254] C40 The following archetypes are systems of equations. For each system, write the vector of constants as a linear combination of the vectors in the span construction for the column space provided by Theorem BCS [239] (these vectors are listed for each of these archetypes). Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] Archetype E [720] Archetype F [724] Archetype G [729] Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer C42 The following archetypes are either matrices or systems of equations with coefficient matrices. For each matrix, compute a set of column vectors such that (1) the vectors are columns of the matrix, (2) the set is linearly independent, and (3) the span of the set is the column space of the matrix. See Theorem BCS [239]. 
Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Archetype K [746] Archetype L [750] Contributed by Robert Beezer C50 The following archetypes are either matrices or systems of equations with coefficient matrices. For each matrix, compute a set of column vectors such that (1) the set is linearly independent, and (2) the span of the set is the row space of the matrix. See Theorem BRS [245]. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Version 2.02  Subsection CRS.EXC Exercises 252 Archetype I [737] Archetype J [741] Archetype K [746] Archetype L [750] Contributed by Robert Beezer C51 The following archetypes are either matrices or systems of equations with coefficient matrices. For each matrix, compute the column space as the span of a linearly independent set as follows: transpose the matrix, row-reduce, toss out zero rows, convert rows into column vectors. See Example CSROI [247]. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Archetype K [746] Archetype L [750] Contributed by Robert Beezer C52 The following archetypes are systems of equations. For each different coefficient matrix build two new vectors of constants. The first should lead to a consistent system and the second should lead to an inconsistent system. Descriptions of the column space as spans of linearly independent sets of vectors with "nice patterns" of zeros and ones might be most useful and instructive in connection with this exercise. (See the end of Example CSROI [247].) Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer M1O For the matrix £ below, find vectors b and c so that the system IJS(E, b) is consistent and [S(E, c) is inconsistent. -23 1 10] Contributed by Robert Beezer Solution [254] M20 Usually the column space and null space of a matrix contain vectors of different sizes. For a square matrix, though, the vectors in these two sets are the same size. Usually the two sets will be different. Construct an example of a square matrix where the column space and null space are equal. Version 2.02  Subsection CRS.EXC Exercises 253 Contributed by Robert Beezer Solution [255] M21 We have a variety of theorems about how to create column spaces and row spaces and they frequently involve row-reducing a matrix. Here is a procedure that some try to use to get a column space. Begin with an m x n matrix A and row-reduce to a matrix B with columns B1, B2, B3, ..., Ba. Then form the column space of A as C(A) = ({B1, B2, B3, ... , Ba}) =C(B) This is not not a legitimate procedure, and therefore is not a theorem. Construct an example to show that the procedure will not in general create the column space of A. Contributed by Robert Beezer Solution [255] T40 Suppose that A is an m x n matrix and B is an n x p matrix. Prove that the column space of AB is a subset of the column space of A, that is C(AB) C C(A). Provide an example where the opposite is false, in other words give an example where C(A) g C(AB). (Compare with Exercise MM.T40 [207].) 
Contributed by Robert Beezer Solution [255] T41 Suppose that A is an m x n matrix and B is an n x n nonsingular matrix. Prove that the column space of A is equal to the column space of AB, that is C(A) =C(AB). (Compare with Exercise MM.T41 [207] and Exercise CRS.T40 [252].) Contributed by Robert Beezer Solution [255] T45 Suppose that A is an m x n matrix and B is an n x m matrix where AB is a nonsingular matrix. Prove that (1) N (B) = {0}f (2) C(B) nAf(A) ={o} Discuss the case when m = n in connection with Theorem NPNT [226]. Contributed by Robert Beezer Solution [255] Version 2.02  Subsection CRS.SOL Solutions 254 Subsection SOL Solutions C30 Contributed by Robert Beezer Statement [249] In each case, begin with a vector equation where one side contains a linear combination of the two vectors from the span construction that gives the column space of A with unknowns for scalars, and then use Theorem SLSLC [93] to set up a system of equations. For c, the corresponding system has no solution, as we would expect. For b there is a solution, as we would expect. What is interesting is that the solution is unique. This is a consequence of the linear independence of the set of two vectors in the span construction. If we wrote b as a linear combination of all four columns of A, then there would be infinitely many ways to do this. C31 Contributed by Robert Beezer Statement [249] Theorem BCS [239] is the right tool for this problem. Row-reduce this matrix, identify the pivot columns and then grab the corresponding columns of A for the set T. The matrix A row-reduces to W 0 3 0 0 0 -2 0 0 0 0 0 0 0 0 0 0 L So D = {1, 2, 4, 5} and then 2[k] 1 -1 2 T = {A1, A2, A4, A5} = -['2:1 ' 11 .2 _ -1 - 1_ _2_ has the requested properties. C33 Contributed by Robert Beezer Statement [249] Theorem BRS [245] is the most direct route to a set with these properties. Row-reduce, toss zero rows, keep the others. You could also transpose the matrix, then look for the column space by row-reducing the transpose and applying Theorem BCS [239]. We'll do the former, 1 0 -1 2 B RREF: 0 1 -1 1 0 01 C34 Contributed by Robert Beezer Statement [249] y E R(A) < y E C(AL) Definition RSM [243] - ES(At, y) is consistent Theorem CSCS [237] Version 2.02  Subsection CRS.SOL Solutions 255 The augmented matrix [At I y] row reduces to 1 0 0 0 0 0 0 0 0 0 Ft 0 and with a leading 1 in the final column Theorem RCLS [53] tells us the linear system is inconsistent and soy ¢ R(A). C35 Contributed by Robert Beezer Statement [249] (a) By Theorem BCS [239] we can row-reduce A, identify pivot columns with the set D, and "keep" those columns of A and we will have a set with the desired properties. 1 0 -13 -19 A RREF 0 8 11 So we have the set of pivot columns D = {1, 2} and we "keep" the first two columns of A, 3 5 1 , 2 [-3] -4_ (b) We can view the column space as the row space of the transpose (Theorem CSRST [247]). We can get a basis of the row space of a matrix quickly by bringing the matrix to reduced row-echelon form and keeping the nonzero rows as column vectors (Theorem BRS [245]). Here goes. 0 -2 At RREF, 00[]3 Taking the nonzero rows and tilting them up as columns gives us 1 0 0 , 1 -2] 3 An approach based on the matrix L from extended echelon form (Definition EEF [261]) and Theorem FS [263] will work as well as an alternative approach. M1O Contributed by Robert Beezer Statement [251] Any vector from C3 will lead to a consistent system, and therefore there is no vector that will lead to an inconsistent system. 
How do we convince ourselves of this? First, row-reduce B, 1001 0 o 2 1 If we augment E with any vector of constants, and row-reduce the augmented matrix, we will never find a leading 1 in the final column, so by Theorem RCLS [53] the system will always be consistent. Said another way, the column space of E is all of C3, C(E) = C3. So by Theorem CSCS [237] any vector of constants will create a consistent system (and none will create an inconsistent system). Version 2.02  Subsection CRS.SOL Solutions 256 M20 Contributed by Robert Beezer Statement [251] The 2 x 2 matrix 1[ 11] has C(A) = N(A) = _{ [1l }>. M21 Contributed by Robert Beezer Statement [252] Begin with a matrix A (of any size) that does not have any zero rows, but which when row-reduced to B yields at least one row of zeros. Such a matrix should be easy to construct (or find, like say from Archetype A [702]). C(A) will contain some vectors whose final slot (entry m) is non-zero, however, every column vector from the matrix B will have a zero in slot m and so every vector in C(B) will also contain a zero in the final slot. This means that C(A) # C(B), since we have vectors in C(A) that cannot be elements of C(B). T40 Contributed by Robert Beezer Statement [252] Choose x c C(AB). Then by Theorem CSCS [237] there is a vector w that is a solution to IS(AB, x). Define the vector y by y = Bw. We're set, Ay = A (Bw) Definition of y (AB) w Theorem MMA [202] =x w solution to [S(AB, x) This says that [S(A, x) is a consistent system, and by Theorem CSCS [237], we see that x E C(A) and therefore C(AB) C C(A). For an example where C(A) g C(AB) choose A to be any nonzero matrix and choose B to be a zero matrix. Then C(A) # {0} and C(AB) = C(O) =_{0}. T41 Contributed by Robert Beezer Statement [252] From the solution to Exercise CRS.T40 [252] we know that C(AB) C C(A). So to establish the set equality (Definition SE [684]) we need to show that C(A) C C(AB). Choose x E C(A). By Theorem CSCS [237] the linear system [S(A, x) is consistent, so let y be one such solution. Because B is nonsingular, and linear system using B as a coefficient matrix will have a solution (Theorem NMUS [74]). Let w be the unique solution to the linear system [S(B, y). All set, here we go, (AB) w = A (Bw) Theorem MMA [202] =Ay w solution to [S(B, y) =x y solution to [S(A, x) This says that the linear system IJS(AB, x) is consistent, so by Theorem CSCS [237], x C C(AB). So C( A) C C( AB). T45 Contributed by Robert Beezer Statement [252] First, 0 C P1(B) trivially. Now suppose that x C N1(B). Then ABx = A(Bx) Theorem MMA [202] A0 x E N(B) = 0 Theorem MMZM [200] Version 2.02  Subsection CRS.SOL Solutions 257 Since we have assumed AB is nonsingular, Definition NM [71] implies that x = 0. Second, 0 E C(B) and 0 E N(A) trivially, and so the zero vector is in the intersection as well (Definition SI [685]). Now suppose that y E C(B) fnN(A). Because y E C(B), Theorem CSCS [237] says the system [S(B, y) is consistent. Let x E C" be one solution to this system. Then ABx = A(Bx) Theorem MMA [202] = Ay x solution to [S(B, y) = 0 y&E N(A) Since we have assumed AB is nonsingular, Definition NM [71] implies that x = 0. Then y = Bx = BO= 0. When AB is nonsingular and m = n we know that the first condition, N(B) = {0}, means that B is nonsingular (Theorem NMTNS [74]). Because B is nonsingular Theorem CSNM [242] implies that C(B) =Ct". In order to have the second condition fulfilled, C(B) n N(A) = {0}, we must realize that N(A) {0}. 
Then a second application of Theorem NMTNS [74] shows that A must be nonsingular. This reproduces Theorem NPNT [226].

Section FS Four Subsets

There are four natural subsets associated with a matrix. We have met three already: the null space, the column space and the row space. In this section we will introduce a fourth, the left null space. The objective of this section is to describe one procedure that will allow us to find linearly independent sets that span each of these four sets of column vectors. Along the way, we will make a connection with the inverse of a matrix, so Theorem FS [263] will tie together most all of this chapter (and the entire course so far).

Subsection LNS Left Null Space

Definition LNS Left Null Space
Suppose $A$ is an $m\times n$ matrix. Then the left null space is defined as $\mathcal{L}(A)=\mathcal{N}(A^t)\subseteq\mathbb{C}^m$. (This definition contains Notation LNS.)

The left null space will not feature prominently in the sequel, but we can explain its name and connect it to row operations. Suppose $\mathbf{y}\in\mathcal{L}(A)$. Then by Definition LNS [257], $A^t\mathbf{y}=\mathbf{0}$. We can then write

$\mathbf{0}^t=(A^t\mathbf{y})^t$  (Definition LNS [257])
$=\mathbf{y}^t(A^t)^t$  (Theorem MMT [203])
$=\mathbf{y}^tA$  (Theorem TT [187])

The product $\mathbf{y}^tA$ can be viewed as the components of $\mathbf{y}$ acting as the scalars in a linear combination of the rows of $A$, and the result is a "row vector," $\mathbf{0}^t$, that is totally zeros. When we apply a sequence of row operations to a matrix, each row of the resulting matrix is some linear combination of the rows. These observations tell us that the vectors in the left null space are scalars that record a sequence of row operations that result in a row of zeros in the row-reduced version of the matrix. We will see this idea more explicitly in the course of proving Theorem FS [263].

Example LNS Left null space
We will find the left null space of
$A=\begin{bmatrix}1&-3&1\\-2&1&1\\1&5&1\\9&-4&0\end{bmatrix}$
We transpose $A$ and row-reduce,
$A^t=\begin{bmatrix}1&-2&1&9\\-3&1&5&-4\\1&1&1&0\end{bmatrix}\xrightarrow{\text{RREF}}\begin{bmatrix}1&0&0&2\\0&1&0&-3\\0&0&1&1\end{bmatrix}$
Applying Definition LNS [257] and Theorem BNS [139] we have
$\mathcal{L}(A)=\mathcal{N}(A^t)=\left\langle\left\{\begin{bmatrix}-2\\3\\-1\\1\end{bmatrix}\right\}\right\rangle$
If you row-reduce $A$ you will discover one zero row in the reduced row-echelon form. This zero row is created by a sequence of row operations, which in total amounts to a linear combination, with scalars $a_1=-2$, $a_2=3$, $a_3=-1$ and $a_4=1$, on the rows of $A$ and which results in the zero vector (check this!). So the components of the vector describing the left null space of $A$ provide a relation of linear dependence on the rows of $A$.

Subsection CRS Computing Column Spaces

We have three ways to build the column space of a matrix. First, we can use just the definition, Definition CSM [236], and express the column space as a span of the columns of the matrix. A second approach gives us the column space as the span of some of the columns of the matrix, but this set is linearly independent (Theorem BCS [239]). Finally, we can transpose the matrix, row-reduce the transpose, kick out zero rows, and transpose the remaining rows back into column vectors. Theorem CSRST [247] and Theorem BRS [245] tell us that the resulting vectors are linearly independent and their span is the column space of the original matrix. We will now demonstrate a fourth method by way of a rather complicated example. Study this example carefully, but realize that its main purpose is to motivate a theorem that simplifies much of the apparent complexity. (A brief computational check of Example LNS appears below, before we begin.)
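The following check is not part of the original exposition; it is a minimal sketch, assuming the Python library SymPy is available, of how Example LNS can be reproduced by machine. The names used are illustrative only.

```python
from sympy import Matrix

# The 4 x 3 matrix A of Example LNS.
A = Matrix([[ 1, -3, 1],
            [-2,  1, 1],
            [ 1,  5, 1],
            [ 9, -4, 0]])

# Definition LNS: the left null space is the null space of the transpose.
basis = A.T.nullspace()    # a single vector with entries -2, 3, -1, 1
y = basis[0]

# Its entries are the scalars in a relation of linear dependence on the rows of A.
print(y.T * A)             # a 1 x 3 row vector of zeros
```

The same call applied to any matrix yields a linearly independent set spanning its left null space, which is one way to double-check the left null spaces appearing later in this section.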
So other than an instructive exercise or two, the procedure we are about to describe will not be a usual approach to computing a column space. Example CSANS Column space as null space Lets find the column space of the matrix A below with a new approach. 10 0 3 8 7 -16 -1 -4 -10 -13 A= -6 1 -3 -6 -6 A 0 2 -2 -3 -2 3 0 1 2 3 -1 -1 1 1 0 _ By Theorem CSCS [237] we know that the column vector b is in the column space of A if and only if the linear system IJS(A, b) is consistent. So let's try to solve this system in full generality, using a vector of variables for the vector of constants. In other words, which vectors b lead to consistent systems? Begin by forming the augmented matrix [A |b] with a general version of b, 10 0 3 8 7 bi -16 -1 -4 -10 -13 b2 -6 1 -3 -6 -6 b3 0 2 -2 -3 -2 b4 3 0 1 2 3 b5 -1 -1 1 1 0 b6_ Version 2.02  Subsection FS.CRS Computing Column Spaces 260 To identify solutions we will row-reduce this matrix and bring it to reduced row-echelon form. Despite the presence of variables in the last column, there is nothing to stop us from doing this. Except our numerical routines on calculators can't be used, and even some of the symbolic algebra routines do some unexpected maneuvers with this computation. So do it by hand. Yes, it is a bit of work. But worth it. We'll still be here when you get back. Notice along the way that the row operations are exactly the same ones you would do if you were just row-reducing the coefficient matrix alone, say in connection with a homogeneous system of equations. The column with the bi acts as a sort of bookkeeping device. There are many different possibilities for the result, depending on what order you choose to perform the row operations, but shortly we'll all be on the same page. Here's one possibility (you can find this same result by doing additional row operations with the fifth and sixth rows to remove any occurrences of bi and b2 from the first four rows of your result): 1 0 0 0 2 b3-b4+2b5-b6 0 [ 0 0 -3 -2b3 + 3b4 - 3b5 + 3b6 0 0 0 1 b3+b4+3b5+3b6 0 0 0 [ -2 -2b3-+b4 - 4b5 0 0 0 0 0 b1+3b3-b4+3b5+b6 0 0 0 0 0 b2-2b3+b4+b5-b6 Our goal is to identify those vectors b which make [S(A, b) consistent. By Theorem RCLS [53] we know that the consistent systems are precisely those without a leading 1 in the last column. Are the expressions in the last column of rows 5 and 6 equal to zero, or are they leading 1's? The answer is: maybe. It depends on b. With a nonzero value for either of these expressions, we would scale the row and produce a leading 1. So we get a consistent system, and b is in the column space, if and only if these two expressions are both simultaneously zero. In other words, members of the column space of A are exactly those vectors b that satisfy b1+3b3-b4+3b5+b6= 0 b2 - 2b3 + b4 + b5 - b= 0 Hmmm. Looks suspiciously like a homogeneous system of two equations with six variables. If you've been playing along (and we hope you have) then you may have a slightly different system, but you should have just two equations. Form the coefficient matrix and row-reduce (notice that the system above has a coefficient matrix that is already in reduced row-echelon form). We should all be together now with the same matrix, L 1 0 3 -1 3 1 0[ -2 1 1 -1] So, C(A) - N~(L) and we can apply Theorem BNS [139] to obtain a linearly independent set to use in a span construction, Whw!A apotcrptt ti cnta eaml,3o ma wis tocnic-1 refthttefu etr above really are elements of the column space? Do they create consistent systems with A as coefficient matrix? 
Can you recognize the constant vector in your description of these solution sets? OK, that was so much fun, let's do it again. But simpler this time. And we'll all get the same results all the way through. Doing row operations by hand with variables can be a bit error prone, so let's see if Version 2.02  Subsection FS.CRS Computing Column Spaces 261 we can improve the process some. Rather than row-reduce a column vector b full of variables, let's write b = I6b and we will row-reduce the matrix I6 and when we finish row-reducing, then we will compute the matrix-vector product. You should first convince yourself that we can operate like this (this is the subject of a future homework exercise). Rather than augmenting A with b, we will instead augment it with I6 (does this feel familiar?), 10 0 3 8 7 1 0 0 0 0 0 -16 -1 -4 -10 -13 0 1 0 0 0 0 M= -6 1 -3 -6 -6 0 0 1 0 0 0 0 2 -2 -3 -2 0 0 0 1 0 0 3 0 1 2 3 0 0 0 0 1 0 -1 -1 1 1 0 0 0 0 0 0 1 We want to row-reduce the left-hand side of this matrix, but we will apply the same row operations to the right-hand side as well. And once we get the left-hand side in reduced row-echelon form, we will continue on to put leading 1's in the final two rows, as well as clearing out the columns containing those two additional leading 1's. It is these additional row operations that will ensure that we all get to the same place, since the reduced row-echelon form is unique (Theorem RREFU [32]), 1 0 0 0 2 0 0 1 -1 2 -1 0 1 0 0 -3 0 0 -2 3 -3 3 0 0 1 0 1 0 0 1 1 3 3 0 0 0 1 -2 0 0 -2 1 -4 0 0 0 0 0 0 1 0 3 -1 3 1 0 0 0 0 0 0 1 -2 1 1 -1_ We are after the final six columns of this matrix, which we will multiply by b 0 0 1 -1 2 -1 0 0 -2 3 -3 3 0 0 1 1 3 3 0 0 -2 1 -4 0 1 0 3 -1 3 1 0 1 -2 1 1 -1_ so 0 0 1 -1 2 -1 b1 b3-b4+2b5-b6 0 0 -2 3 -3 3 b2 -2b3 + 3b4 - 3b5 + 3b6 Jb _ 0 0 1 1 3 3 b3 _ b3-+b4 + 3b5 + 3b6 0 0 -2 1 -4 0 b4 -2b3-+b4 - 4b5 1 0 3 -1 3 1 b5 b1+3b3-b4+3b5+b6 _0 1 -2 1 1 -1_ _be_ b2 -2b3+--b4+--b5 - b6 So by applying the same row operations that row-reduce A to the identity matrix (which we could do with a calculator once 16 is placed alongside of A), we can then arrive at the result of row-reducing a column of symbols where the vector of constants usually resides. Since the row-reduced version of A has two zero rows, for a consistent system we require that bi+ 3b3 -b4 +3b5 + b6 0 b- 2b3 +| b4 +| b5 - b- 0 Now we are exactly back where we were on the first go-round. Notice that we obtain the matrix L as simply the last two rows and last six columns of N. This example motivates the remainder of this section, so it is worth careful study. You might attempt to mimic the second approach with the coefficient matrices of Archetype I [737] and Archetype J [741]. We will see shortly that the matrix L contains more information about A than just the column space. Version 2.02  Subsection FS.EEF Extended echelon form 262 Subsection EEF Extended echelon form The final matrix that we row-reduced in Example CSANS [258] should look familiar in most respects to the procedure we used to compute the inverse of a nonsingular matrix, Theorem CINM [217]. We will now generalize that procedure to matrices that are not necessarily nonsingular, or even square. First a definition. Definition EEF Extended Echelon Form Suppose A is an m x n matrix. Add m new columns to A that together equal an m x m identity matrix to form an m x (n + m) matrix M. Use row operations to bring M to reduced row-echelon form and call the result N. 
N is the extended reduced row-echelon form of A, and we will standardize on names for five submatrices (B, C, J, K, L) of N. Let B denote the m x n matrix formed from the first n columns of N and let J denote the m x m matrix formed from the last m columns of N. Suppose that B has r nonzero rows. Further partition N by letting C denote the r x n matrix formed from all of the non-zero rows of B. Let K be the r x m matrix formed from the first r rows of J, while L will be the (m - r) x m matrix formed from the bottom m - r rows of J. Pictorially, RRE C K M =[A|Im] N =[B|J]= A Example SEEF Submatrices of extended echelon form We illustrate Definition EEF [261] with the matrix A, 1 -1 -2 7 1 6 A = 6 2 -4 -18 -3 -26 4 -1 4 10 2 17 3 -1 2 9 1 12] Augmenting with the 4 x 4 identity matrix, M= 1 -1 -2 7 1 6 1 0 0 0 -6 2 -4 -18 -3 -26 0 1 0 0 4 -1 4 10 2 17 0 0 1 0 and row-reducing, we obtain 102 1 03 0 1 1 1] N_ 0 4 -6 0 -1 0 2 3 0 0 0 0 0 2 2 0 -1 2 72] 00 00 0 0W 221 So we then obtain 0 2 1 0 3 B- 0 4 -6 0 -1 0 0 0 0 2 0 0 0 0 0 0] Version 2.02  Subsection FS.EEF Extended echelon form 263 1 0 2 1 0 3 C= 0 W[]4 -6 0 -1 0 0 0 0 1 2] 0 1 1 1 _0 2 3 0 0 -1 0 -2 1 2 2 1 0 1 1 1 K= 0 2 3 0 0 -1 0 -2_ L = [1 2 2 1] You can observe (or verify) the properties of the following theorem with this example. Theorem PEEF Properties of Extended Echelon Form Suppose that A is an m x n matrix and that N is its extended echelon form. Then 1. J is nonsingular. 2. B = JA. 3. If x E C"m and y E Ctm, then Ax = y if and only if Bx = Jy. 4. C is in reduced row-echelon form, has no zero rows and has r pivot columns. 5. L is in reduced row-echelon form, has no zero rows and has m - r pivot columns. Proof J is the result of applying a sequence of row operations to Im, as such J and Im are row-equivalent. IJS(Im, 0) has only the zero solution, since Im is nonsingular (Theorem NMRRI [72]). Thus, [S(J, 0) also has only the zero solution (Theorem REMES [28], Definition ESYS [11]) and J is therefore nonsingular (Definition NSM [64]). To prove the second part of this conclusion, first convince yourself that row operations and the matrix- vector are commutative operations. By this we mean the following. Suppose that F is an m x n matrix that is row-equivalent to the matrix G. Apply to the column vector Fw the same sequence of row operations that converts F to G. Then the result is Gw. So we can do row operations on the matrix, then do a matrix-vector product, or do a matrix-vector product and then do row operations on a column vector, and the result will be the same either way. Since matrix multiplication is defined by a collection of matrix- vector products (Definition MM [197]), if we apply to the matrix product FH the same sequence of row operations that converts F to C then the result will equal GH. Now apply these observations to A. Write AI, ImA and apply the row operations that convert M to N. A is converted to B, while Im is converted to J, so we have BI, JA. Simplifying the left side gives the desired conclusion. For the third conclusion, we now establish the two equivalences Ax =y <>JAx =Jy <>Bx =Jy The forward direction of the first equivalence is accomplished by multiplying both sides of the matrix equality by J, while the backward direction is accomplished by multiplying by the inverse of J (which we know exists by Theorem NI [228] since J is nonsingular). The second equivalence is obtained simply by the substitutions given by JA = B. 
Version 2.02  Subsection FS.FS Four Subsets 264 The first r rows of N are in reduced row-echelon form, since any contiguous collection of rows taken from a matrix in reduced row-echelon form will form a matrix that is again in reduced row-echelon form. Since the matrix C is formed by removing the last n entries of each these rows, the remainder is still in reduced row-echelon form. By its construction, C has no zero rows. C has r rows and each contains a leading 1, so there are r pivot columns in C. The final m - r rows of N are in reduced row-echelon form, since any contiguous collection of rows taken from a matrix in reduced row-echelon form will form a matrix that is again in reduced row-echelon form. Since the matrix L is formed by removing the first n entries of each these rows, and these entries are all zero (they form the zero rows of B), the remainder is still in reduced row-echelon form. L is the final m - r rows of the nonsingular matrix J, so none of these rows can be totally zero, or J would not row-reduce to the identity matrix. L has m - r rows and each contains a leading 1, so there are m - r pivot columns in L. Notice that in the case where A is a nonsingular matrix we know that the reduced row-echelon form of A is the identity matrix (Theorem NMRRI [72]), so B = I. Then the second conclusion above says JA = B = In, so J is the inverse of A. Thus this theorem generalizes Theorem CINM [217], though the result is a "left-inverse" of A rather than a "right-inverse." The third conclusion of Theorem PEEF [262] is the most telling. It says that x is a solution to the linear system [S(A, y) if and only if x is a solution to the linear system [S(B, Jy). Or said differently, if we row-reduce the augmented matrix [A y] we will get the augmented matrix [B Jy]. The matrix J tracks the cumulative effect of the row operations that converts A to reduced row-echelon form, here effectively applying them to the vector of constants in a system of equations having A as a coefficient matrix. When A row-reduces to a matrix with zero rows, then Jy should also have zero entries in the same rows if the system is to be consistent. Subsection FS Four Subsets With all the preliminaries in place we can state our main result for this section. In essence this result will allow us to say that we can find linearly independent sets to use in span constructions for all four subsets (null space, column space, row space, left null space) by analyzing only the extended echelon form of the matrix, and specifically, just the two submatrices C and L, which will be ripe for analysis since they are already in reduced row-echelon form (Theorem PEEF [262]). Theorem FS Four Subsets Suppose A is an m x n~ matrix with extended echelon form N. Suppose the reduced row-echelon form of A has r nonzero rows. Then C is the submatrix of N formed from the first r rows and the first n~ columns and L is the submatrix of N formed from the last m columns and the last m - r rows. Then 1. The null space of A is the null space of C, P1(A) =P1(C). 2. The row space of A is the row space of C, 7Z(A) =- () 3. The column space of A is the null space of L, C(A) =P1(L). 4. The left null space of A is the row space of L, [(A) =R(L). Proof First, N(A) = N(B) since B is row-equivalent to A (Theorem REMES [28]). The zero rows of B represent equations that are always true in the homogeneous system IS(B, 0), so the removal of these equations will not change the solution set. Thus, in turn, N(B) = N(C). 
Version 2.02  Subsection FS.FS Four Subsets 265 Second, R(A) = R(B) since B is row-equivalent to A (Theorem REMRS [244]). The zero rows of B contribute nothing to the span that is the row space of B, so the removal of these rows will not change the row space. Thus, in turn, R(B) = R(C). Third, we prove the set equality C(A) = N(L) with Definition SE [684]. Begin by showing that C(A) C P1(L). Choose y E C(A) C Cm. Then there exists a vector x E C" such that Ax = y (Theorem CSCS [237]). Then for 1 < k < m - r, [Ly]k = [yy]r+k = [BX]r+k =[Ox]k = [0], k L a submatrix of J Theorem PEEF [262] Zero matrix a submatrix of B Theorem MMZM [200] So, for all 1 < k < m - r, [Ly]k _ [0]k. So by Definition CVE [84] we have Ly = 0 and thus y E N(L). Now, show that N(L) C C(A). Choose y E N(L) C Cm. Form the vector Ky E Cr. The linear system [S(C, Ky) is consistent since C is in reduced row-echelon form and has no zero rows (Theorem PEEF [262]). Let x E C" denote a solution to [S(C, Ky). Then for 1 < j < r, [Bx]3 = [Cx]3 = [Ky]3 = [Jy] C a submatrix of B x a solution to [S(C, Ky) K a submatrix of J And for r + 1 k 1[Bt] 1k [Z]k + >1[Bt]1 [W]i k=1 f=1 r m-m -E>1[Bt] jk [Xlk + >1 [Bt1jrft[x]r+t r m ES[Bt]jk[x]k+ SE [B]j 1[x]k k=1 k=r+1 m S=[B1 jk [Xlk k~1 Theorem EMP [198] Definition ZM [185] C, ( submatrices of B Definitions of z and w Re-index second sum Combine sums Theorem PEEF [262] k=1 [(JA)t ] [x]k Version 2.02  Subsection FS.FS Four Subsets 267 m S[J ljk[X~k k=1 m m E2 EAt] Jt ] IX k=1 f=1 m m E E [At] J ] l kIX f=1 k=1 m m E At ]e [Jt ] f=1 (k=1 m E [At ]e [Jx] f E [At ] [y], e=1 [Aty] [O], Theorem MMT [203] Theorem EMP [198] Property CACN [680] Property DCN [681] Theorem EMP [198] Definition of x Theorem EMP [198] y E [(A) So, by Definition CVE [84], Ctz = 0 and the vector z gives us a linear combination of the columns of Ct that equals the zero vector. In other words, z gives a relation of linear dependence on the the rows of C. However, the rows of C are a linearly independent set by Theorem BRS [245]. According to Definition LICV [132] we must conclude that the entries of z are all zero, i.e. z = 0. Now, for 1 < i 3 [jt][k II] k=1 k=r+1 r m S [jt] [Z k + S [jt] [W]k-r k=1 k=r+1 r m-r 0 + E Jt] j-ve [w]i k=1m- =1 m-r = 0 + Lt] [w], f=1 =[Ltw] Definition of x Theorem EMP [198] Break apart sum Definition of z and w z = 0, re-index L a submatrix of J Theorem EMP [198] So by Definition CVE [84], y = Ltw. The existence of w implies R(L). So by Definition SE [684] we have [(A) = R(L). that y E R(L), and therefore [(A) C The first two conclusions of this theorem are nearly trivial. But they set up a pattern of results for C that is reflected in the latter two conclusions about L. In total, they tell us that we can compute all four subsets just by finding null spaces and row spaces. This theorem does not tell us exactly how to compute these subsets, but instead simply expresses them as null spaces and row spaces of matrices in reduced row-echelon form without any zero rows (C and L). A linearly independent set that spans the null space Version 2.02  Subsection FS.FS Four Subsets 268 of a matrix in reduced row-echelon form can be found easily with Theorem BNS [139]. It is an even easier matter to find a linearly independent set that spans the row space of a matrix in reduced row-echelon form with Theorem BRS [245], especially when there are no zero rows present. So an application of Theorem FS [263] is typically followed by two applications each of Theorem BNS [139] and Theorem BRS [245]. 
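The recipe of Theorem FS [263] is mechanical enough to automate. Here is a brief sketch, assuming SymPy; the function name four_subsets and the return format are my own choices and not part of the text. It builds the extended echelon form of $[A\mid I_m]$, splits off the submatrices C and L, and then produces spanning sets in the spirit of Theorem BNS [139] and Theorem BRS [245].

```python
from sympy import Matrix, eye

def four_subsets(A):
    """Spanning sets for the null space, row space, column space and
    left null space of A, following Theorem FS (a sketch)."""
    m, n = A.shape
    N, _ = A.row_join(eye(m)).rref()      # extended echelon form of [A | I_m]
    r = A.rank()                          # number of nonzero rows in the RREF of A
    C = N[:r, :n]                         # first r rows, first n columns
    L = N[r:, n:]                         # last m - r rows, last m columns
    return {
        "null space":      C.nullspace(),                       # N(A) = N(C)
        "row space":       [C.row(i).T for i in range(r)],      # R(A) = R(C)
        "column space":    L.nullspace(),                       # C(A) = N(L)
        "left null space": [L.row(i).T for i in range(m - r)],  # L(A) = R(L)
    }
```

Applied to the coefficient matrices of the upcoming examples, this should reproduce the spanning sets listed there, possibly up to the order in which the vectors appear.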
The situation when $r = m$ deserves comment, since now the matrix $L$ has no rows. What is $\mathcal{C}(A)$ when we try to apply Theorem FS [263] and encounter $\mathcal{N}(L)$? One interpretation of this situation is that $L$ is the coefficient matrix of a homogeneous system that has no equations. How hard is it to find a solution vector to this system? Some thought will convince you that any proposed vector will qualify as a solution, since it makes all of the equations true. So every possible vector is in the null space of $L$ and therefore $\mathcal{C}(A)=\mathcal{N}(L)=\mathbb{C}^m$. OK, perhaps this sounds like some twisted argument from Alice in Wonderland. Let us try another argument that might solidly convince you of this logic. If $r = m$, when we row-reduce the augmented matrix of $\mathcal{LS}(A,\,\mathbf{b})$ the result will have no zero rows, and all the leading 1's will occur in the first $n$ columns, so by Theorem RCLS [53] the system will be consistent. By Theorem CSCS [237], $\mathbf{b}\in\mathcal{C}(A)$. Since $\mathbf{b}$ was arbitrary, every possible vector is in the column space of $A$, so we again have $\mathcal{C}(A)=\mathbb{C}^m$. The situation when a matrix has $r = m$ is known by the term full rank, and in the case of a square matrix coincides with nonsingularity (see Exercise FS.M50 [273]).

The properties of the matrix $L$ described by this theorem can be explained informally as follows. A column vector $\mathbf{y}\in\mathbb{C}^m$ is in the column space of $A$ if the linear system $\mathcal{LS}(A,\,\mathbf{y})$ is consistent (Theorem CSCS [237]). By Theorem RCLS [53], the reduced row-echelon form of the augmented matrix $[A\mid\mathbf{y}]$ of a consistent system will have zeros in the bottom $m-r$ locations of the last column. By Theorem PEEF [262] this final column is the vector $J\mathbf{y}$ and so should then have zeros in the final $m-r$ locations. But since $L$ comprises the final $m-r$ rows of $J$, this condition is expressed by saying $\mathbf{y}\in\mathcal{N}(L)$.

Additionally, the rows of $J$ are the scalars in linear combinations of the rows of $A$ that create the rows of $B$. That is, the rows of $J$ record the net effect of the sequence of row operations that takes $A$ to its reduced row-echelon form, $B$. This can be seen in the equation $JA=B$ (Theorem PEEF [262]). As such, the rows of $L$ are scalars for linear combinations of the rows of $A$ that yield zero rows. But such linear combinations are precisely the elements of the left null space. So any element of the row space of $L$ is also an element of the left null space of $A$.

We will now illustrate Theorem FS [263] with a few examples.

Example FS1 Four subsets, #1
In Example SEEF [261] we found the five relevant submatrices of the matrix
$A=\begin{bmatrix}1&-1&-2&7&1&6\\-6&2&-4&-18&-3&-26\\4&-1&4&10&2&17\\3&-1&2&9&1&12\end{bmatrix}$
To apply Theorem FS [263] we only need $C$ and $L$,
$C=\begin{bmatrix}1&0&2&1&0&3\\0&1&4&-6&0&-1\\0&0&0&0&1&2\end{bmatrix}\qquad L=\begin{bmatrix}1&2&2&1\end{bmatrix}$
Then we use Theorem FS [263] to obtain

$\mathcal{N}(A)=\mathcal{N}(C)=\left\langle\left\{\begin{bmatrix}-2\\-4\\1\\0\\0\\0\end{bmatrix},\ \begin{bmatrix}-1\\6\\0\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}-3\\1\\0\\0\\-2\\1\end{bmatrix}\right\}\right\rangle$  (Theorem BNS [139])

$\mathcal{R}(A)=\mathcal{R}(C)=\left\langle\left\{\begin{bmatrix}1\\0\\2\\1\\0\\3\end{bmatrix},\ \begin{bmatrix}0\\1\\4\\-6\\0\\-1\end{bmatrix},\ \begin{bmatrix}0\\0\\0\\0\\1\\2\end{bmatrix}\right\}\right\rangle$  (Theorem BRS [245])

$\mathcal{C}(A)=\mathcal{N}(L)=\left\langle\left\{\begin{bmatrix}-2\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}-2\\0\\1\\0\end{bmatrix},\ \begin{bmatrix}-1\\0\\0\\1\end{bmatrix}\right\}\right\rangle$  (Theorem BNS [139])

$\mathcal{L}(A)=\mathcal{R}(L)=\left\langle\left\{\begin{bmatrix}1\\2\\2\\1\end{bmatrix}\right\}\right\rangle$  (Theorem BRS [245])

Boom!
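As an aside (not part of the original example), the two key relationships exploited in Example FS1 are easy to confirm with SymPy, using the matrices just displayed: the single row of L is a relation of linear dependence on the rows of A, so every column of A satisfies the homogeneous equation defined by L, and the nonzero rows of the reduced row-echelon form of A are exactly the rows of C.

```python
from sympy import Matrix, zeros

# A, C and L from Example SEEF / Example FS1.
A = Matrix([[ 1, -1, -2,   7,  1,   6],
            [-6,  2, -4, -18, -3, -26],
            [ 4, -1,  4,  10,  2,  17],
            [ 3, -1,  2,   9,  1,  12]])
C = Matrix([[1, 0, 2,  1, 0,  3],
            [0, 1, 4, -6, 0, -1],
            [0, 0, 0,  0, 1,  2]])
L = Matrix([[1, 2, 2, 1]])

# The row of L annihilates the rows of A, which gives the containment C(A) <= N(L).
assert L * A == zeros(1, 6)

# The nonzero rows of the RREF of A are the rows of C (first two conclusions of Theorem FS).
B, pivots = A.rref()
assert B[:3, :] == C
print("checks pass; pivot columns (0-based):", pivots)   # (0, 1, 4)
```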
Example FS2 Four subsets, #2 Now lets return to the matrix A that we used to motivate this section in Example CSANS [258], 10 0 3 8 7 - -16 -1 -4 -10 -13 -6 1 -3 -6 -6 A 0 2 -2 -3 3 0 1 2 -1 -1 1 1 -2 3 0 We form the matrix M by adjoining the 6 x 6 identity matrix 16, 1 M i [0 0 3 8 7 1 0 0 0 0 0 -16 -1 -4 -10 -13 0 1 0 0 0 0 -6 1 -3 -6 -6 0 0 1 0 0 0 0 2 -2 -3 -2 0 0 0 1 0 0 3 0 1 2 3 0 0 0 0 1 0 -1 -1 1 1 0 0 0 0 0 0 1 and row-reduce to obtain N N [L 0 0 0 2 0 0 1 0 0 0 -3 0 0 -2 0 0 o 0 1 0 0 1 0 0 0 -2 0 0 -2 0 0 0 0 0 LE0 3 0 0 0 0 0 0 -2 -1 2 -1 3 -3 3 1 3 3 1 -4 0 -1 3 1 1 1 -1 To find the four subsets for A, we only need identify the 4 x 5 matrix C and the 2 x 6 matrix L, I 0 0 0 2 C_ 0W2 0 0 -3 0 0 LE 0 1 0 0 0 LE -2 L-L 0 3 -1 3 1 0 -2 1 1 -1_ Version 2.02  Subsection FS.FS Four Subsets 270 Then we apply Theorem FS [263], N(A) = N(C) R(A) = R(C) C(A) = N(L) [(A) = R(L) -2 3 -1 2 1 1 0 0 , 0 2_ -3 2 1 0 0 0 1 0 3 3 1 _ 0 0 1 0 0 ,1 , 0 0 0 1 -3 1_ -2 1 -3 -1- -1 -1 1 0 0 0 1 ' 0'0 0 1 0 0 0 1 Theorem BNS [139] Theorem BRS [245] I 0 1 -2 1 1 -1 Theorem BNS [139] Theorem BRS [245] I The next example is just a bit different since the matrix has more rows than columns, and a trivial null space. Example FSAG Four subsets, Archetype G Archetype G [729] and Archetype H [733] are both systems of m = 5 equations in n have identical coefficient matrices, which we will denote here as the matrix G, 2 3 -1 4 G = 3 10 3 -1 6 9 Adjoin the 5 x 5 identity matrix, I5, to form 2 variables. They 2 -1 M= 3 3 6 3 4 10 -1 9 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 This row-reduces to 1 0 0 0 0 0 0 N= 0 0 1 0 0 0 0 o 0 0 0 0 0 0 0 0 Q1 3 -n 2 0 1 1 1 1 1- 1 1 -1 Version 2.02  Subsection FS.FS Four Subsets 271 The first n = 2 columns contain r = 2 leading 1's, so we obtain C as the 2 x 2 identity matrix and extract L from the final m - r = 3 rows in the final m = 5 columns. FI] 0 1 L 0 0 0 -} L= 00 E01 -} 0 0 Fl- 1 -1_ Then we apply Theorem FS [263], P1(G) = (C) (0) ={o} R(G)R(C)V{[= 1C 0 -1 C(G) = (L) K{1 , 1 0 01 01 -1 1 = -1 ,3 1 0 0 3 - 1 0 0 1 ,C(G) = R(L) = 0 , 0 , 0 1 3 0 0 3 = 0 , 0 , 0 3 -1_ -1_ 2 Theorem BNS [139] Theorem BRS [245] Theorem BNS [139] 0 0 1 1 -1 0i Theorem BRS [245] As mentioned earlier, Archetype G [729] is consistent, while Archetype H [733] is inconsistent. See if you can write the two different vectors of constants from these two archetypes as linear combinations of the two vectors in C(G). How about the two columns of G, can you write each individually as a linear combination of the two vectors in C(G)? They must be in the column space of G also. Are your answers unique? Do you notice anything about the scalars that appear in the linear combinations you are forming? 0 Example COV [154] and Example CSROI [247] each describes the column space of the coefficient matrix from Archetype I [737] as the span of a set of r = 3 linearly independent vectors. It is no accident that these two different sets both have the same size. If we (you?) were to calculate the column space of this matrix using the null space of the matrix L from Theorem FS [263] then we would again find a set of 3 linearly independent vectors that span the range. More on this later. So we have three different methods to obtain a description of the column space of a matrix as the span of a linearly independent set. Theorem BCS [239] is sometimes useful since the vectors it specifies are equal to actual columns of the matrix. 
Theorem BRS [245] and Theorem CSRST [247] combine to create vectors with lots of zeros, and strategically placed l's near the top of the vector. Theorem FS [263] and the matrix L from the extended echelon form gives us a third method, which tends to create vectors with lots of zeros, and strategically placed l's near the bottom of the vector. If we don't care about linear Version 2.02  Subsection FS.READ Reading Questions 272 independence we can also appeal to Definition CSM [236] and simply express the column space as the span of all the columns of the matrix, giving us a fourth description. With Theorem CSRST [247] and Definition RSM [243], we can compute column spaces with theorems about row spaces, and we can compute row spaces with theorems about row spaces, but in each case we must transpose the matrix first. At this point you may be overwhelmed by all the possibilities for computing column and row spaces. Diagram CSRST [271] is meant to help. For both the column space and row space, it suggests four techniques. One is to appeal to the definition, another yields a span of a linearly independent set, and a third uses Theorem FS [263]. A fourth suggests transposing the matrix and the dashed line implies that then the companion set of techniques can be applied. This can lead to a bit of silliness, since if you were to follow the dashed lines twice you would transpose the matrix twice, and by Theorem TT [187] would accomplish nothing productive. Definition CSM C(A) Theorem BCS Theorem FS, NV(L) Theorem CSRST, Z(At). -------------- -------------- --------------------------- /' Definition RSM, C(A )-' Theorem FS, 7(C) Theorem BRS Definition RSM Diagram CSRST. Column Space and Row Space Techniques Although we have many ways to describe a column space, notice that one tempting strategy will usually fail. It is not possible to simply row-reduce a matrix directly and then use the columns of the row-reduced matrix as a set whose span equals the column space. In other words, row operations do not preserve column spaces (however row operations do preserve row spaces, Theorem REMRS [244]). See Exercise CRS.M21 [252]. Subsection READ Reading Questions 1. Find a nontrivial element of the left null space of A. A 4 2 15 -3 4 2. Find the matrices C and L in the extended echelon form of A. A=42 -1 1] 3. Why is Theorem FS [263] a great conclusion to Chapter M [182]? Version 2.02  Subsection FS.EXC Exercises 273 Subsection EXC Exercises C20 Example FSAG [269] concludes with several questions. Perform the analysis suggested by these questions. Contributed by Robert Beezer C25 Given the matrix A below, use the extended echelon form of A to answer each part of this problem. In each part, find a linearly independent set of vectors, S, so that the span of S, (S), equals the specified set of vectors. -5 3 -1 A=1 1 1 4 -8 5 -1 3 -2 0 (a) The row space of A, R(A). (b) The column space of A, C(A). (c) The null space of A, N(A). (d) The left null space of A, G(A). Contributed by Robert Beezer Solution [274] C26 For the matrix D below use the extended echelon form to find (a) a linearly independent set whose span is the column space of D. (b) a linearly independent set whose span is the left null space of D. -7 -11 -19 -15 6 10 18 14 D 3 5 9 7 -1 -2 -4 -3_ Contributed by Robert Beezer Solution [274] C41 The following archetypes are systems of equations. 
For each system, write the vector of constants as a linear combination of the vectors in the span construction for the column space provided by Theorem FS [263] and Theorem BNS [139] (these vectors are listed for each of these archetypes). Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716] Archetype E [720] Archetype F [724] Archetype G [729] Archetype H [733] Archetype I [737] Archetype J [741] Contributed by Robert Beezer C43 The following archetypes are either matrices or systems of equations with coefficient matrices. For each matrix, compute the extended echelon form N and identify the matrices C and L. Using Theorem Version 2.02  Subsection FS.EXC Exercises 274 FS [263], Theorem BNS [139] and Theorem BRS [245] express the null space, the row space, the column space and left null space of each coefficient matrix as a span of a linearly independent set. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Archetype K [746] Archetype L [750] Contributed by Robert Beezer C60 For the matrix B below, find sets of vectors whose span equals the column space of B (C(B)) and which individually meet the following extra requirements. (a) The set illustrates the definition of the column space. (b) The set is linearly independent and the members of the set are columns of B. (c) The set is linearly independent with a "nice pattern of zeros and ones" at the top of each vector. (d) The set is linearly independent with a "nice pattern of zeros and ones" at the bottom of each vector. 2 3 1 1 B4 1 1 0 1 -1 2 3 -4_ Contributed by Robert Beezer Solution [275] C61 Let A be the matrix below, and find the indicated sets with the requested properties. 2 -1 5 -3 A = -5 3 -12 7 1 1 4 -3- (a) A linearly independent set S so that C(A) (S) and S is composed of columns of A. (b) A linearly independent set S so that C(A) (S) and the vectors in S have a nice pattern of zeros and ones at the top of the vectors. (c) A linearly independent set S so that C(A) (S) and the vectors in S have a nice pattern of zeros and ones at the bottom of the vectors. (d) A linearly independent set S so that 7Z(A) =(S). Contributed by Robert Beezer Solution [276] M50 Suppose that A is a nonsingular matrix. Extend the four conclusions of Theorem FS [263] in this special case and discuss connections with previous results (such as Theorem NME4 [242]). Contributed by Robert Beezer M51 Suppose that A is a singular matrix. Extend the four conclusions of Theorem FS [263] in this special case and discuss connections with previous results (such as Theorem NME4 [242]). Contributed by Robert Beezer Version 2.02  Subsection FS.SOL Solutions 275 Subsection SOL Solutions C25 Contributed by Robert Beezer Statement [272] Add a 4 x 4 identity matrix to the right of A to form the matrix M and then row-reduce to the matrix N, -3 -5-8 5 3 -2 -1 1 0 0 1 0 1 0 -1 0 0 1 0 0 0 0 01 0 0 1_ 1 0 2 0 0 RREF 0 []3 0 0 0 0 0 [ 0 0 0 0 0 -2 -5 -3 -8 -1 -1 1 3 N To apply Theorem FS [263] in each of these four parts, we need the two matrices, 1c 0 2] 1 0 =OW -1 1 1 3_ (a) (b) (c) R(A) R(C) 1 0 = 0 , [1 .2 3_ C(A) = N(L) 11 .-0 13 N(A) = N(C) -2 -3 [(A) = R(L) 1 0 .-1. 3 Theorem FS [263] Theorem BRS [245] Theorem FS [263] Theorem BNS [139] (d) Theorem FS [263] Theorem BNS [139] Theorem FS [263] Theorem BRS [245] C26 Contributed by Robert Beezer Statement [272] For both parts, we need the extended echelon form of the matrix. 
-7 6 3 -1 -11 10 5 -2 -19 18 9 -4 -15 14 7 -3 1 0 0 0 000 100 010 0 0 1 1 0 RREF 0 '0 0 0 0 -2 3 0 0 -1 2 0 0 0 0 0 0 o 0 0 Ql 2 -1 3 -2 5 -3 2 0 Version 2.02  Subsection FS.SOL Solutions 276 From this matrix we extract the last two rows, in the last four columns to form the matrix L, L-1] 0 3 2- L 0A I EI-2 0 (a) By Theorem FS [263] and Theorem BNS [139] we have -3 -2 C(D) = N(L) = 1 '0 (b) By Theorem FS [263] and Theorem BRS [245] we have 10 C60 Contributed by Robert Beezer Statement [273] (a) The definition of the column space is the span of the set of columns (Definition CSM [236]). So the desired set is just the four columns of B, 2 3 1 1 S = 1 , 1 , 0 , 1 -1_ 2 3 -4 (b) Theorem BCS [239] suggests row-reducing the matrix and using the columns of B that correspond to the pivot columns. 0 -1 2 B RREF: 0 2 1 -1 _0 0 0 0_ So the pivot columns are numbered by elements of D = {1, 2}, so the requested set is 2 3i (c) We can find this set by row-reducing the transpose of B, deleting the zero rows, and using the nonzero rows as column vectors in the set. This is an application of Theorem CSRST [247] followed by Theorem BRS [245]. 103 BRREF, 0 2 -7 0 00 So the requested set is 11 0 S= 0 ,1 13_ -7_ Version 2.02  Subsection FS.SOL Solutions 277 (d) With the column space expressed as a null space, the vectors obtained via Theorem BNS [139] will be of the desired shape. So we first proceed with Theorem FS [263] and create the extended echelon form, 2 3 1 1 1 0 01 0-1 2 0 2 - [B 13 1 O 1 1 1 11 [B3I3]= 1 1 0 1 0 1 0 RE 0 1 -1 0 j j -1 2 3 -4 0 0 1_ _ 0 0 0 0 1 7 _ So, employing Theorem FS [263], we have C(B) =PN(L), where We can find the desired set of vectors from Theorem BNS [139] as 3 3 S= 1 , 0 .0_[ [1 C61 Contributed by Robert Beezer Statement [273] (a) First find a matrix B that is row-equivalent to A and in reduced row-echelon form 1 0 3 -2- B= 0 [1 1 -1 0 0 0 0_ By Theorem BCS [239] we can choose the columns of A that correspond to dependent variables (D = {1, 2}) as the elements of S and obtain the desired properties. So 2 -1 S_= -5 ,3 1 1 _ (b) We can write the column space of A as the row space of the transpose (Theorem CSRST [247]). So we row-reduce the transpose of A to obtain the row-equivalent matrix C in reduced row-echelon form 1 0 8 C= 0 1 3 0 0 0 _0 0 0 The nonzero rows (written as columns) will be a linearly independent set that spans the row space of At, by Theorem BRS [245], and the zeros and ones will be at the top of the vectors, (c) In preparation for Theorem ES [263], augment A with the 3 x 3 identity matrix 13 and row-reduce to obtain the extended echelon form, [1 0 3 -2 0 - 0O 1 1 -1 0 $ 8 Then since the first four columns of row 3 are all zeros, we extract I' L - Version 2.02  Subsection FS.SOL Solutions 278 Theorem FS [263] says that C(A) = N(L). We can then use Theorem BNS [139] to construct the desired set S, based on the free variables with indices in F = {2, 3} for the homogeneous system LS(L, 0), so 8 8 S= 1 , 0 L0 1 Notice that the zeros and ones are at the bottom of the vectors. (d) This is a straightforward application of Theorem BRS [245]. Use the row-reduced matrix B from part (a), grab the nonzero rows, and write them as column vectors, 1 0 0 S 1- Version 2.02  Annotated Acronyms FS.M Matrices 279 Annotated Acronyms M Matrices Theorem VSPM [184] These are the fundamental rules for working with the addition, and scalar multiplication, of matrices. We saw something very similar in the previous chapter (Theorem VSPCV [86]). 
Together, these two theorems will provide the model for the key definition, Definition VS [279].

Theorem SLEMM [195]
Theorem SLSLC [93] connected linear combinations with systems of equations. Theorem SLEMM [195] connects the matrix-vector product (Definition MVP [194]) and column vector equality (Definition CVE [84]) with systems of equations. We'll see this one regularly.

Theorem EMP [198]
This theorem is a workhorse in Section MM [194] and will continue to make regular appearances. If you want to get better at formulating proofs, the application of this theorem can be a key step in gaining that broader understanding. While it might be hard to imagine Theorem EMP [198] as a definition of matrix multiplication, we'll see in Exercise MR.T80 [564] that in theory it is actually a better definition of matrix multiplication long-term.

Theorem CINM [217]
The inverse of a matrix is key. Here's how you can get one if you know how to row-reduce.

Theorem NI [228]
"Nonsingularity" or "invertibility"? Pick your favorite, or show your versatility by using one or the other in the right context. They mean the same thing.

Theorem CSCS [237]
Given a coefficient matrix, which vectors of constants create consistent systems? This theorem tells us that the answer is exactly those column vectors in the column space. Conversely, we also use this theorem to test for membership in the column space by checking the consistency of the appropriate system of equations.

Theorem BCS [239]
Another theorem that provides a linearly independent set of vectors whose span equals some set of interest (a column space this time).

Theorem BRS [245]
Yet another theorem that provides a linearly independent set of vectors whose span equals some set of interest (a row space).

Theorem CSRST [247]
Column spaces, row spaces, transposes, rows, columns. Many of the connections between these objects are based on the simple observation captured in this theorem. This is not a deep result. We state it as a theorem for convenience, so we can refer to it as needed.

Theorem FS [263]
This theorem is inherently interesting, if not computationally satisfying. Null space, row space, column space, left null space: here they all are, simply by row-reducing the extended matrix and applying Theorem BNS [139] and Theorem BRS [245] twice (each). Nice.

Chapter VS Vector Spaces

We now have a computational toolkit in place and so we can begin our study of linear algebra in a more theoretical style. Linear algebra is the study of two fundamental objects, vector spaces and linear transformations (see Chapter LT [452]). This chapter will focus on the former. The power of mathematics is often derived from generalizing many different situations into one abstract formulation, and that is exactly what we will be doing throughout this chapter.

Section VS Vector Spaces

In this section we present a formal definition of a vector space, which will lead to an extra increment of abstraction. Once defined, we study its most basic properties.

Subsection VS Vector Spaces

Here is one of the two most important definitions in the entire course.

Definition VS Vector Space
Suppose that V is a set upon which we have defined two operations: (1) vector addition, which combines two elements of V and is denoted by "+", and (2) scalar multiplication, which combines a complex number with an element of V and is denoted by juxtaposition. Then V, along with the two operations, is a vector space if the following ten properties hold.
" AC Additive Closure If u, v E V, then u+v E V. " SC Scalar Closure If a E C and u C V, then au C V. * C Commutativity If u, v C V, then u +v v v+u. * AA Additive Associativity If u, v, w C V, then u +(v +w)= (u +v) +w. 280  Subsection VS.EVS Examples of Vector Spaces 281 " Z Zero Vector There is a vector, 0, called the zero vector, such that u + 0 = u for all u E V. " Al Additive Inverses If u E V, then there exists a vector -u E V so that u + (-u) = 0. " SMA Scalar Multiplication Associativity If a, 3 E C and u E V, then o(#3u) = (a3)u. " DVA Distributivity across Vector Addition If a E C and u, v E V, then a(u+ v) =au+ av. " DSA Distributivity across Scalar Addition If a, 3 E C and u E V, then (a+,3)u= au +,3u. " 0 One If u E V, then lu u. The objects in V are called vectors, no matter what else they might really be, simply by virtue of being elements of a vector space. A Now, there are several important observations to make. Many of these will be easier to understand on a second or third reading, and especially after carefully studying the examples in Subsection VS.EVS [280]. An axiom is often a "self-evident" truth. Something so fundamental that we all agree it is true and accept it without proof. Typically, it would be the logical underpinning that we would begin to build theorems upon. Some might refer to the ten properties of Definition VS [279] as axioms, implying that a vector space is a very natural object and the ten properties are the essence of a vector space. We will instead emphasize that we will begin with a definition of a vector space. After studying the remainder of this chapter, you might return here and remind yourself how all our forthcoming theorems and definitions rest on this foundation. As we will see shortly, the objects in V can be anything, even though we will call them vectors. We have been working with vectors frequently, but we should stress here that these have so far just been column vectors scalars arranged in a columnar list of fixed length. In a similar vein, you have used the symbol "+" for many years to represent the addition of numbers (scalars). We have extended its use to the addition of column vectors and to the addition of matrices, and now we are going to recycle it even further and let it denote vector addition in any possible vector space. So when describing a new vector space, we will have to define exactly what "+" is. Similar comments apply to scalar multiplication. Conversely, we can define our operations any way we like, so long as the ten properties are fulfilled (see Example CVS [283]). A vector space is composed of three objects, a set and two operations. However, we usually use the same symbol for both the set and the vector space itself. Do not let this convenience fool you into thinking the operations are secondary! This discussion has either convinced you that we are really embarking on a new level of abstraction, or they have seemed cryptic, mysterious or nonsensical. You might want to return to this section in a few days and give it another read then. In any case, let's look at some concrete examples now. Subsection EVS Examples of Vector Spaces Our aim in this subsection is to give you a storehouse of examples to work with, to become comfortable with the ten vector space properties and to convince you that the multitude of examples justifies (at least Version 2.02  Subsection VS.EVS Examples of Vector Spaces 282 initially) making such a broad definition as Definition VS [279]. 
Some of our claims will be justified by reference to previous theorems, we will prove some facts from scratch, and we will do one non-trivial example completely. In other places, our usual thoroughness will be neglected, so grab paper and pencil and play along.

Example VSCV The vector space Cm
Set: Cm, all column vectors of size m, Definition VSCV [83].
Equality: Entry-wise, Definition CVE [84].
Vector Addition: The "usual" addition, given in Definition CVA [84].
Scalar Multiplication: The "usual" scalar multiplication, given in Definition CVSM [85].
Does this set with these operations fulfill the ten properties? Yes. And by design all we need to do is quote Theorem VSPCV [86]. That was easy.

Example VSM The vector space of matrices, Mmn
Set: Mmn, the set of all matrices of size m x n and entries from C, Example VSM [281].
Equality: Entry-wise, Definition ME [182].
Vector Addition: The "usual" addition, given in Definition MA [182].
Scalar Multiplication: The "usual" scalar multiplication, given in Definition MSM [183].
Does this set with these operations fulfill the ten properties? Yes. And all we need to do is quote Theorem VSPM [184]. Another easy one (by design).

So the set of all matrices of a fixed size forms a vector space. That entitles us to call a matrix a vector, since a matrix is an element of a vector space. For example, if A, B ∈ M3,4 then we call A and B "vectors," and we even use our previous notation for column vectors to refer to A and B. So we could legitimately write expressions like

u + v = A + B = B + A = v + u

This could lead to some confusion, but it is not too great a danger. Still, it is worth a comment.

The previous two examples may be less than satisfying. We made all the relevant definitions long ago. And the required verifications were all handled by quoting old theorems. However, it is important to consider these two examples first. We have been studying vectors and matrices carefully (Chapter V [83], Chapter M [182]), and both objects, along with their operations, have certain properties in common, as you may have noticed in comparing Theorem VSPCV [86] with Theorem VSPM [184]. Indeed, it is these two theorems that motivate us to formulate the abstract definition of a vector space, Definition VS [279]. Now, should we prove some general theorems about vector spaces (as we will shortly in Subsection VS.VSP [285]), we can instantly apply the conclusions to both Cm and Mmn. Notice too how we have taken six definitions and two theorems and reduced them down to two examples. With greater generalization and abstraction our old ideas get downgraded in stature.
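In both of these examples the ten properties were verified simply by citing a theorem we had already proved. For readers who like to experiment, the properties can also be spot-checked on a computer. The sketch below is our own illustration and not part of the text: the function spot_check and the particular sample vectors and scalars are invented for the occasion, and a passing run is only evidence, never a proof, that a candidate set with its two operations is a vector space (though a failing run does settle the matter). Here it is aimed at C2 with the usual operations; the same harness could be pointed at the matrices of Example VSM [281] or at the stranger operations coming in Example CVS [283].

```python
# A rough numerical spot check of some of the ten properties of Definition VS.
# Sketch only: spot_check and the sample data below are our own inventions.

def spot_check(add, smult, zero, vecs, scalars, eq):
    """Spot-check a few of the ten properties on the supplied samples."""
    ok = True
    for u in vecs:
        ok = ok and eq(add(u, zero), u)            # Property Z
        ok = ok and eq(smult(1, u), u)             # Property O
        for v in vecs:
            ok = ok and eq(add(u, v), add(v, u))   # Property C
            for w in vecs:
                ok = ok and eq(add(u, add(v, w)), add(add(u, v), w))            # Property AA
        for a in scalars:
            for b in scalars:
                ok = ok and eq(smult(a, smult(b, u)), smult(a * b, u))          # Property SMA
                ok = ok and eq(smult(a + b, u), add(smult(a, u), smult(b, u)))  # Property DSA
    return ok

# C2 with the usual componentwise operations, stored as Python 2-tuples.
add = lambda u, v: (u[0] + v[0], u[1] + v[1])
smult = lambda a, u: (a * u[0], a * u[1])
vecs = [(1, 2), (3 - 1j, 0), (-2, 5j)]
scalars = [2, -1, 3 + 2j]
print(spot_check(add, smult, (0, 0), vecs, scalars, lambda u, v: u == v))   # True
```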
Let us look at some more examples, now considering some new vector spaces.

Example VSP The vector space of polynomials, Pn
Set: Pn, the set of all polynomials of degree n or less in the variable x with coefficients from C.
Equality: a_0 + a_1 x + a_2 x^2 + ... + a_n x^n = b_0 + b_1 x + b_2 x^2 + ... + b_n x^n if and only if a_i = b_i for 0 ≤ i ≤ n.
Vector Addition:
(a_0 + a_1 x + a_2 x^2 + ... + a_n x^n) + (b_0 + b_1 x + b_2 x^2 + ... + b_n x^n) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + ... + (a_n + b_n)x^n
Scalar Multiplication:
α(a_0 + a_1 x + a_2 x^2 + ... + a_n x^n) = (αa_0) + (αa_1)x + (αa_2)x^2 + ... + (αa_n)x^n

This set, with these operations, will fulfill the ten properties, though we will not work all the details here. However, we will make a few comments and prove one of the properties. First, the zero vector (Property Z [280]) is what you might expect, and you can check that it has the required property.

0 = 0 + 0x + 0x^2 + ... + 0x^n

The additive inverse (Property AI [280]) is also no surprise, though consider how we have chosen to write it.

−(a_0 + a_1 x + a_2 x^2 + ... + a_n x^n) = (−a_0) + (−a_1)x + (−a_2)x^2 + ... + (−a_n)x^n

Now let's prove the associativity of vector addition (Property AA [279]). This is a bit tedious, though necessary. Throughout, the plus sign ("+") does triple duty. You might ask yourself what each plus sign represents as you work through this proof.

u + (v + w) = (a_0 + a_1 x + ... + a_n x^n) + ((b_0 + b_1 x + ... + b_n x^n) + (c_0 + c_1 x + ... + c_n x^n))
= (a_0 + a_1 x + ... + a_n x^n) + ((b_0 + c_0) + (b_1 + c_1)x + ... + (b_n + c_n)x^n)
= (a_0 + (b_0 + c_0)) + (a_1 + (b_1 + c_1))x + ... + (a_n + (b_n + c_n))x^n
= ((a_0 + b_0) + c_0) + ((a_1 + b_1) + c_1)x + ... + ((a_n + b_n) + c_n)x^n
= ((a_0 + b_0) + (a_1 + b_1)x + ... + (a_n + b_n)x^n) + (c_0 + c_1 x + ... + c_n x^n)
= ((a_0 + a_1 x + ... + a_n x^n) + (b_0 + b_1 x + ... + b_n x^n)) + (c_0 + c_1 x + ... + c_n x^n)
= (u + v) + w

Notice how it is the application of the associativity of the (old) addition of complex numbers in the middle of this chain of equalities that makes the whole proof happen. The remainder is successive applications of our (new) definition of vector (polynomial) addition. Proving the remainder of the ten properties is similar in style and tedium. You might try proving the commutativity of vector addition (Property C [279]), or one of the distributivity properties (Property DVA [280], Property DSA [280]).

Example VSIS The vector space of infinite sequences
Set: C∞ = {(c_0, c_1, c_2, c_3, ...) | c_i ∈ C, i ∈ N}.
Equality: (c_0, c_1, c_2, ...) = (d_0, d_1, d_2, ...) if and only if c_i = d_i for all i ≥ 0.
Vector Addition: (c_0, c_1, c_2, ...) + (d_0, d_1, d_2, ...) = (c_0 + d_0, c_1 + d_1, c_2 + d_2, ...).
Scalar Multiplication: α(c_0, c_1, c_2, c_3, ...) = (αc_0, αc_1, αc_2, αc_3, ...).
This should remind you of the vector space Cm, though now our lists of scalars are written horizontally with commas as delimiters and they are allowed to be infinite in length. What does the zero vector look like (Property Z [280])? Additive inverses (Property AI [280])? Can you prove the associativity of vector addition (Property AA [279])?

Example VSF The vector space of functions
Set: F = {f | f : C → C}.
Equality: f = g if and only if f(x) = g(x) for all x ∈ C.
Vector Addition: f + g is the function with outputs defined by (f + g)(x) = f(x) + g(x).
Scalar Multiplication: αf is the function with outputs defined by (αf)(x) = αf(x).
So this is the set of all functions of one variable that take a complex number to a complex number. You might have studied functions of one variable that take a real number to a real number, and that might be a more natural set to study. But since we are allowing our scalars to be complex numbers, we need to expand the domain and range of our functions also. Study carefully how the definitions of the operations are made, and think about the different uses of "+" and juxtaposition. As an example of what is required when verifying that this is a vector space, consider that the zero vector (Property Z [280]) is the function z whose definition is z(x) = 0 for every input x. While vector spaces of functions are very important in mathematics and physics, we will not devote much more attention to them.

Here's a unique example.

Example VSS The singleton vector space
Set: Z = {z}.
Equality: Huh?
Vector Addition: z + z = z.
Scalar Multiplication: αz = z.
This should look pretty wild. First, just what is z? Column vector, matrix, polynomial, sequence, function? Mineral, plant, or animal? We aren't saying! z just is.
And we have definitions of vector addition and scalar multiplication that are sufficient for an occurrence of either that may come along. Our only concern is if this set, along with the definitions of two operations, fulfills the ten properties of Definition VS [279]. Let's check associativity of vector addition (Property AA [279]). For all u, v, w E Z, u+(v+w) =z+(z+z) = z + z = (z+z)+z (u+v)+w What is the zero vector in this vector space (Property Z [280])? With only one element in the set, we do not have much choice. Is z = 0? It appears that z behaves like the zero vector should, so it gets the title. Maybe now the definition of this vector space does not seem so bizarre. It is a set whose only element is the element that behaves like the zero vector, so that lone element is the zero vector. Perhaps some of the above definitions and verifications seem obvious or like splitting hairs, but the next example should convince you that they are necessary. We will study this one carefully. Ready? Check your preconceptions at the door. Example CVS The crazy vector space Set: C ={(zi, X2) | Xi 2 E C}. Vector Addition: (zi, X2) + (yi, y2) =(zi + yi + 1, X2 + y2 + 1). Scalar Multiplication: ajzi, z2) =(azi1 + a~ - 1, azx2 + a~ - 1). Now, the first thing I hear you say is "You can't do that!" And my response is, "Oh yes, I can!" I am free to define my set and my operations any way I please. They may not look natural, or even useful, but we will now verify that they provide us with another example of a vector space. And that is enough. If you are adventurous, you might try first checking some of the properties yourself. What is the zero vector? Additive inverses? Can you prove associativity? Ready, here we go. Version 2.02  Subsection VS.EVS Examples of Vector Spaces 285 Property AC [279], Property SC [279]: The result of each operation is a pair of complex numbers, so these two closure properties are fulfilled. Property C [279]: u + v = (x1, X2) + (Yi, Y2)= (x + yi + 1, x2 + Y2 + 1) =(yi + i + 1, y2 + x2 + 1) = (yi, y2) + (Xi, x2) =v+u Property AA [279]: u + (v + w) =_(x1, X2) + ((y1, y2) + (zi, z2)) = (XI, X2) + (y1 + zi + 1, y2 + z2 + 1) = (xi + (yi + zi + 1) + 1, 2 + (y2 + z2 + 1) + 1) = (xi+yi+zi+2, X2+Y2+z2+2) = ((i + Y1 + 1) + zi + 1, (x2 + Y2 + 1) + Z2 + 1) = (xi + yi + 1, X2 + Y2 + 1) + (z1, z2) = ((xi, X2) + (Yi, Y2)) + (zi, Z2) = (u+v)+w Property Z [280]: The zero vector is ...0 = (-1, -1). Now I hear you say, "No, no, that can't be, it must be (0, 0)!" Indulge me for a moment and let us check my proposal. u + 0 = (Xi, X2) + (-1, -1) =(x + (-1) + 1, X2 + (-1) + 1) (Xi, x2)= u Feeling better? Or worse? Property Al [280]: For each vector, u, we must locate an additive inverse, -u. Here it is, -(Xi, X2) (-xi - 2, -x2 - 2). As odd as it may look, I hope you are withholding judgment. Check: u + (-u) = (xi, X2) + (-x1 - 2, -x2 - 2) - (x1 + (-xi - 2) + 1, -x2 + (x2 - 2) + 1) = (-1, -1) = 0 Property SMA [280]: o(#3u) = a((i, X2)) = a(#3xi+ 3#- 1, /3X2 + / -1) S(9x1+3- 1)+c- 1, (Qx2+-1)+c- 1) = ((4xi + c - ) + - 1, (#x2 + c - ) + - 1) =(c43x1+a#3- 1, 03X2 +a#3- 1) - (ca3)(xi, X2) - (c43)u Property DVA [280]: If you have hung on so far, here's where it gets even wilder. In the next two properties we mix and mash the two operations. 
ju + v) =a ((zi, x2) + (yi, y2)) =ojxi + yi + 1, X2 + y2 + 1) - (c~zi + yi + 1) + ca - 1, ajx2 + y2 + 1) + a - 1) - (axi + ayi + a + a - 1, ax2 + ay2 + a + a - 1) =x(axi + a - 1 + aY1 + a - 1 + 1, ax2 + a - 1 + ay2 + a - 1+1) =((axi + a - 1) + (ay1 + a - 1) + 1, (ax2 + a - 1) + (aY2 + a - 1) + 1) Version 2.02  Subsection VS.VSP Vector Space Properties 286 =(oxi + a - 1, ax2 + a - 1) + (ay1 + a - 1, ay2 + a - 1) a(xi, X2) + o(yi, Y2) = au + OaTv Property DSA [280]: (o+f3)u= (ca+ 3) (1, X2) = ((ae+ #)X1+ (ae+ 1) - 1, (a +,)X2 +(a +,)-1 = (ai +3x1+ c+ 3-1, a2+#3x2+ c+(3-1) (axi+a-1+#xi+3-1+1, azx2+a-1+#3x2+#3-1+1) =((axi + a - 1) + (3x1 +3 - 1) + 1, (ax2 + a - 1) + (#x2 + - 1) + 1) = (azi+ a- 1, a2 + a- 1) + (#zi+# - 1,~x2 +1 - 1) =c(xi, X2)+3(x 1, x2) = au + ,3u Property 0 [280]: After all that, this one is easy, but no less pleasing. lu = 1(xi, x2) = (x1 + 1 - 1, x2 + 1 - 1) = (xi, 32) = u That's it, C is a vector space, as crazy as that may seem. Notice that in the case of the zero vector and additive inverses, we only had to propose possibilities and then verify that they were the correct choices. You might try to discover how you would arrive at these choices, though you should understand why the process of discovering them is not a necessary component of the proof itself. Subsection VSP Vector Space Properties Subsection VS.EVS [280] has provided us with an abundance of examples of vector spaces, most of them containing useful and interesting mathematical objects along with natural operations. In this subsection we will prove some general properties of vector spaces. Some of these results will again seem obvious, but it is important to understand why it is necessary to state and prove them. A typical hypothesis will be "Let V be a vector space." From this we may assume the ten properties of Definition VS [279], and nothing more. Its like starting over, as we learn about what can happen in this new algebra we are learning. But the power of this careful approach is that we can apply these theorems to any vector space we encounter -those in the previous examples, or new ones we have not yet contemplated. Or perhaps new ones that nobody has ever contemplated. We will illustrate some of these results with examples from the crazy vector space (Example CVS [283]), but mostly we are stating theorems and doing proofs. These proofs do not get too involved, but are not trivial either, so these are good theorems to try proving yourself before you study the proof given here. (See Technique P [695].) First we show that there is just one zero vector. Notice that the properties only require there to be at least one, and say nothing about there possibly being more. That is because we can use the ten properties of a vector space (Definition VS [279]) to learn that there can never be more than one. To require that this extra condition be stated as an eleventh property would make the definition of a vector space more complicated than it needs to be. Theorem ZVU Zero Vector is Unique Version 2.02  Subsection VS.VSP Vector Space Properties 287 Suppose that V is a vector space. The zero vector, 0, is unique. D Proof To prove uniqueness, a standard technique is to suppose the existence of two objects (Technique U [693]). So let O1 and 02 be two zero vectors in V. Then O1 = 01+02 Property Z [280] for 02 = 02 + O1 Property C [279] = 02 Property Z [280] for 01 This proves the uniqueness since the two zero vectors are really the same. U Theorem AIU Additive Inverses are Unique Suppose that V is a vector space. 
For each u E V, the additive inverse, -u, is unique. D Proof To prove uniqueness, a standard technique is to suppose the existence of two objects (Technique U [693]). So let -ui and -u2 be two additive inverses for u. Then -ui = -ui + 0 Property Z [280] =-ui + (u + -u2) Property Al [280] = (-ui + u) + -u2 Property AA [279] = 0 + -u2 Property Al [280] =-u2 Property Z [280] So the two additive inverses are really the same. U As obvious as the next three theorems appear, nowhere have we guaranteed that the zero scalar, scalar multiplication and the zero vector all interact this way. Until we have proved it, anyway. Theorem ZSSM Zero Scalar in Scalar Multiplication Suppose that V is a vector space and u E V. Then Ou = 0. D Proof Notice that 0 is a scalar, u is a vector, so Property SC [279] says 0u is again a vector. As such, Ou has an additive inverse, -(0u) by Property Al [280]. Ou = 0 + Ou Property Z [280] (-(0u) + Ou) + Ou Property Al [280] -(0u) + (0u + 0u) Property AA [279] =-(Ou)+ (0 +0)u Property DSA [280] =-(Ou)+0Ou Property ZCN [681] =0 Property Al [280] Here's another theorem that looks like it should be obvious, but is still in need of a proof. Theorem ZVSM Zero Vector in Scalar Multiplication Suppose that V is a vector space and ca C C. Then ca0 =0.D Proof Notice that a is a scalar, 0 is a vector, so Property SC [279] means a0 is again a vector. As such, a0 has an additive inverse, -(a0) by Property Al [280]. a0 = 0 + a0 Property Z [280] Version 2.02  Subsection VS.VSP Vector Space Properties 288 = (-(a0) + a0) + a0 Property Al [280] = -(a0) + (a0 + a0) Property AA [279] = -(a0) + a (0 + 0) Property DVA [280] = -(a0) + a0 Property Z [280] = 0 Property Al [280] Here's another one that sure looks obvious. But understand that we have chosen to use certain notation because it makes the theorem's conclusion look so nice. The theorem is not true because the notation looks so good, it still needs a proof. If we had really wanted to make this point, we might have defined the additive inverse of u as u . Then we would have written the defining property, Property Al [280], as u + u = 0. This theorem would become ua = (-1)u. Not really quite as pretty, is it? Theorem AISM Additive Inverses from Scalar Multiplication Suppose that V is a vector space and u E V. Then -u = (-1)u. D Proof -u = -u+0 Property Z [280] = -u+Ou Theorem ZSSM [286] -U+ (1+ (-1)) U - -u + (lu + (-1)u) Property DSA [280] -u + (u + (-1)u) Property 0 [280] (-u + u) + (-1)u Property AA [279] = 0 + (-1)u Property Al [280] (-1)uProperty Z [280] Because of this theorem, we can now write linear combinations like 6ui + (-4)u2 as 6ui - 4u2, even though we have not formally defined an operation called vector subtraction. Our next theorem is a bit different from several of the others in the list. Rather than making a declaration ("the zero vector is unique") it is an implication ("if...,then...") and so can be used in proofs to convert a vector equality into two possibilities, one a scalar equality and the other a vector equality. It should remind you of the situation for complex numbers. If a, 3 E C and a3= 0, then a = 0 or 3/= 0. This critical property is the driving force behind using a factorization to solve a polynomial equation. Theorem SMEZV Scalar Multiplication Equals the Zero Vector Suppose that V is a vector space and ca E C. If au =0, then either ac= 0 or u =0.D Proof We prove this theorem by breaking up the analysis into two cases. 
The first seems too trivial, and it is, but the logic of the argument is still legitimate. Case 1. Suppose ac= 0. In this case our conclusion is true (the first part of the either/or is true) and we are done. That was easy. Case 2. Suppose a~ # 0. u= lu Property 0 [280] (i-a)u a#0 Version 2.02  Subsection VS.RD Recycling Definitions 289 1 = - (ou) Property SMA [280] 1 = - (0) Hypothesis = 0 Theorem ZVSM [286] So in this case, the conclusion is true (the second part of the either/or is true) and we are done since the conclusion was true in each of the two cases. U Example PCVS Properties for the Crazy Vector Space Several of the above theorems have interesting demonstrations when applied to the crazy vector space, C (Example CVS [283]). We are not proving anything new here, or learning anything we did not know already about C. It is just plain fun to see how these general theorems apply in a specific instance. For most of our examples, the applications are obvious or trivial, but not with C. Suppose u E C. Then, as given by Theorem ZSSM [286], Ou = 0(zi, x2) = (Oxi + 0 - 1, OX2 + 0 - 1) (-1,-) =0 And as given by Theorem ZVSM [286], a0 = a(-1, -1) = (a(-1) + a - 1, a(-1) + c - 1) = (-a + a - 1, -a + a - 1) = (-1, -1) =0 Finally, as given by Theorem AISM [287], (-1)u = (-1)(zi, z2)= ((-1)xi + (-1) - 1, (-1)x2 + (-1) - 1) = (-xi - 2, -x2 - 2) - -u Subsection RD Recycling Definitions When we say that V is a vector space, we then know we have a set of objects (the "vectors"), but we also know we have been provided with two operations ("vector addition" and "scalar multiplication") and these operations behave with these objects according to the ten properties of Definition VS [279]. One combines two vectors and produces a vector, the other takes a scalar and a vector, producing a vector as the result. So if u11, 112, 113 E V then an expression like 5111 + 7112 - 13113 would be unambiguous in any of the vector spaces we have discussed in this section. And the resulting object would be another vector in the vector space. If you were tempted to call the above expression a linear combination, you would be right. Four of the definitions that were central to our discussions in Chapter V [83] were stated in the context of vectors being column vectors, but were purposely kept broad enough that they could be applied in the context of any vector space. They only rely on the presence of scalars, vectors, vector addition and scalar multiplication to make sense. We will restate them shortly, unchanged, except that their titles and acronyms no longer refer to column vectors, and the hypothesis of being in a vector space has been added. Take the time now to look forward and review each one, and begin Version 2.02  Subsection VS.READ Reading Questions 290 to form some connections to what we have done earlier and what we will be doing in subsequent sections and chapters. Specifically, compare the following pairs of definitions: Definition LCCV [90] and Definition LC [297] Definition SSCV [112] and Definition SS [298] Definition RLDCV [132] and Definition RLD [308] Definition LICV [132] and Definition LI [308] Subsection READ Reading Questions 1. Comment on how the vector space Cm went from a theorem (Theorem VSPCV [86]) to an example (Example VSCV [281]). 2. In the crazy vector space, C, (Example CVS [283]) compute the linear combination 2(3, 4) + (-6)(1, 2). 3. Suppose that a is a scalar and 0 is the zero vector. Why should we prove anything as obvious as a0 = 0 such as we did in Theorem ZVSM [286]? 
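Reading Question 2 asks for a computation in the crazy vector space, and Example PCVS shows how Theorem ZSSM [286] and Theorem AISM [287] play out there. A short script can serve as a check on hand computations with these unusual operations. This is our own sketch, not part of the text; the names cvs_add and cvs_smult are invented, and each 2-tuple simply holds the two components of a vector of C from Example CVS [283].

```python
# Hand-computation checker for the crazy vector space C of Example CVS.
# Sketch only: cvs_add and cvs_smult are our own names for the two operations
#   (x1, x2) + (y1, y2) = (x1 + y1 + 1, x2 + y2 + 1)
#   a(x1, x2)           = (a*x1 + a - 1, a*x2 + a - 1)

def cvs_add(u, v):
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cvs_smult(a, u):
    return (a * u[0] + a - 1, a * u[1] + a - 1)

zero = (-1, -1)   # the zero vector of C, from Example CVS

# Reading Question 2: the linear combination 2(3, 4) + (-6)(1, 2), computed in C.
print(cvs_add(cvs_smult(2, (3, 4)), cvs_smult(-6, (1, 2))))    # (-5, -9)

# Theorem ZSSM: 0u is the zero vector, even with these strange operations.
print(cvs_smult(0, (3, 4)) == zero)                             # True

# Theorem AISM: (-1)u acts as the additive inverse of u.
u = (3, 4)
print(cvs_add(u, cvs_smult(-1, u)) == zero)                     # True
```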
Version 2.02  Subsection VS.EXC Exercises 291 Subsection EXC Exercises M10 Define a possibly new vector space by beginning with the set and vector addition from C2 (Example VSCV [281]) but change the definition of scalar multiplication to ax=0= 0EC, xEC2 0 Prove that the first nine properties required for a vector space hold, but Property 0 [280] does not hold. This example shows us that we cannot expect to be able to derive Property 0 [280] as a consequence of assuming the first nine properties. In other words, we cannot slim down our list of properties by jettisoning the last one, and still have the same collection of objects qualify as vector spaces. Contributed by Robert Beezer T10 Prove each of the ten properties of Definition VS [279] for each of the following examples of a vector space: Example VSP [281] Example VSIS [282] Example VSF [282] Example VSS [283] Contributed by Robert Beezer The next three problems suggest that under the right situations we can "cancel." In practice, these techniques should be avoided in other proofs. Prove each of the following statements. T21 Suppose that V is a vector space, and u, v, w E V. If w + u = w + v, then u = v. Contributed by Robert Beezer Solution [291] T22 Suppose V is a vector space, u, v E V and a is a nonzero scalar from C. If au = av, then u = v. Contributed by Robert Beezer Solution [291] T23 Suppose V is a vector space, u # 0 is a vector in V and a, /3 E C. If au = 3u, then a =3. Contributed by Robert Beezer Solution [291] Version 2.02  Subsection VS.SOL Solutions 292 Subsection SOL Solutions T21 Contributed by Robert Beezer Statement [290] u0O+u (-w + w) + u -w + (w + u) -w + (w + v) (-w+w)+v 0O+v = v T22 Contributed by Robert Beezer Statement [290] Property Z [280] Property Al [280] Property AA [279] Hypothesis Property AA [279] Property Al [280] Property Z [280] U = lu = a u 1 = - (au) 1 = - (av) =-a V = lv =v Property 0 [280] a# 0 Property SMA [280] Hypothesis Property SMA [280] Property 0 [280] T23 Contributed by Robert Beezer Statement [290] O= au + - (au) =,3u + - (au) =,3u + (-1) (au) =,3u + ((-1)a) u =,3u + (-a) u = (3 - a) u Property Al [280] Hypothesis Theorem AISM [287] Property SMA [280] Property DSA [280] By hypothesis, u # 0, so Theorem SMEZV [287] implies 0=,(3-a Version 2.02  Section S Subspaces 293 Section S Subspaces A subspace is a vector space that is contained within another vector space. So every subspace is a vector space in its own right, but it is also defined relative to some other (larger) vector space. We will discover shortly that we are already familiar with a wide variety of subspaces from previous sections. Here's the definition. Definition S Subspace Suppose that V and W are two vector spaces that have identical definitions of vector addition and scalar multiplication, and that W is a subset of V, W C V. Then W is a subspace of V. A Lets look at an example of a vector space inside another vector space. Example SC3 A subspace of C3 We know that C3 is a vector space (Example VSCV [281]). Consider the subset, W = x2 | 2x1 - 5x2 + 7x3 = 0 . 3 _ It is clear that W C C3, since the objects in W are column vectors of size 3. But is W a vector space? Does it satisfy the ten properties of Definition VS [279] when we use the same operations? That is the main question. Suppose x =[x2 and y =[Y2 are vectors from W. Then we know that these vectors cannot be totally arbitrary, they must have gained membership in W by virtue of meeting the membership test. 
For example, we know that x must satisfy 2x1 - 5x2 + 7x3= 0 while y must satisfy 2y1 - 5Y2 + 7y3= 0. Our first property (Property AC [279]) asks the question, is x + y E W? When our set of vectors was C3, this was an easy question to answer. Now it is not so obvious. Notice first that xi Y1 zi + y1 x+y= [2 + Y2 = 2+y2 x3_ y3_ x3 -|-y3_ and we can test this vector for membership in W as follows, 2(zi + yi) - 5(z2 + y2) + 7(z3 +| ys) =2xi +| 2y1 - 5X2 - 5y2 + 7X3 +| 79I3 =(2zi - 5X2 + 7X3) +| (2y1 - 5y2 + 7y3) =0+0 xE W, yEW and by this computation we see that x + y E W. One property down, nine to go. If a~ is a scalar and x E W, is it always true that ax E W? This is what we need to establish Property SC [279]. Again, the answer is not as obvious as it was when our set of vectors was all of C3. Let's see. O X lO [x] cox = x2 = xe Version 2.02  Subsection S.TS Testing Subspaces 294 and we can test this vector for membership in W with 2(axi) - 5(ax2) + 7(ax3) = a(2xi - 5x2 + 7x3) =a0 xE W = 0 and we see that indeed ax E W. Always. If W has a zero vector, it will be unique (Theorem ZVU [285]). The zero vector for C3 should also perform the required duties when added to elements of W. So the likely candidate for a zero vector in .0 W is the same zero vector that we know C3 has. You can check that 0 = 0 is a zero vector in W too -0 (Property Z [280]). With a zero vector, we can now ask about additive inverses (Property Al [280]). As you might suspect, the natural candidate for an additive inverse in W is the same as the additive inverse from C3. However, we must insure that these additive inverses actually are elements of W. Given x E W, is -x E W? -X1 -x = -z2 and we can test this vector for membership in W with 2(-xi) - 5(-x2) + 7(-x3) -(2xi - 5x2 + 7x3) =-0 x E W 0 and we now believe that -x E W. Is the vector addition in W commutative (Property C [279])? Is x + y = y + x? Of course! Nothing about restricting the scope of our set of vectors will prevent the operation from still being commutative. Indeed, the remaining five properties are unaffected by the transition to a smaller set of vectors, and so remain true. That was convenient. So W satisfies all ten properties, is therefore a vector space, and thus earns the title of being a subspace of C3. Subsection TS Testing Subspaces In Example SC3 [292] we proceeded through all ten of the vector space properties before believing that a subset was a subspace. But six of the properties were easy to prove, and we can lean on some of the properties of the vector space (the superset) to make the other four easier. Here is a theorem that will make it easier to test if a subset is a vector space. A shortcut if there ever was one. Theorem TSS Testing Subsets for Subspaces Suppose that V is a vector space and W is a subset of V, W c V. Endow W with the same operations as V. Then W is a subspace if and only if three conditions are met 1. W is non-empty, W - 0. 2. If x E W andy E W, then x+y E W. Version 2.02  Subsection S.TS Testing Subspaces 295 3. If a E C and x E W, then ax E W. Proof (-) We have the hypothesis that W is a subspace, so by Definition VS [279] we know that W contains a zero vector. This is enough to show that W $ 0. Also, since W is a vector space it satisfies the additive and scalar multiplication closure properties, and so exactly meets the second and third conditions. If that was easy, the the other direction might require a bit more work. 
(<) We have three properties for our hypothesis, and from this we should conclude that W has the ten defining properties of a vector space. The second and third conditions of our hypothesis are exactly Property AC [279] and Property SC [279]. Our hypothesis that V is a vector space implies that Property C [279], Property AA [279], Property SMA [280], Property DVA [280], Property DSA [280] and Property O [280] all hold. They continue to be true for vectors from W since passing to a subset, and keeping the operation the same, leaves their statements unchanged. Eight down, two to go. Suppose x E W. Then by the third part of our hypothesis (scalar closure), we know that (-1)x E W. By Theorem AISM [287] (-1)x = -x, so together these statements show us that -x E W. -x is the additive inverse of x in V, but will continue in this role when viewed as element of the subset W. So every element of W has an additive inverse that is an element of W and Property Al [280] is completed. Just one property left. While we have implicitly discussed the zero vector in the previous paragraph, we need to be certain that the zero vector (of V) really lives in W. Since W is non-empty, we can choose some vector z E W. Then by the argument in the previous paragraph, we know -z E W. Now by Property Al [280] for V and then by the second part of our hypothesis (additive closure) we see that 0 = z + (-z) E W So W contain the zero vector from V. Since this vector performs the required duties of a zero vector in V, it will continue in that role as an element of W. This gives us, Property Z [280], the final property of the ten required. (Sarah Fellez contributed to this proof.) So just three conditions, plus being a subset of a known vector space, gets us all ten properties. Fabulous! This theorem can be paraphrased by saying that a subspace is "a non-empty subset (of a vector space) that is closed under vector addition and scalar multiplication." You might want to go back and rework Example SC3 [292] in light of this result, perhaps seeing where we can now economize or where the work done in the example mirrored the proof and where it did not. We will press on and apply this theorem in a slightly more abstract setting. Example SP4 A subspace of P4 P4 is the vector space of polynomials with degree at most 4 (Example VSP [281]). Define a subset W as W ={p(x) |p EP4, p(2) =O} so W is the collection of those polynomials (with degree 4 or less) whose graphs cross the x-axis at x 2. Whenever we encounter a new set it is a good idea to gain a better understanding of the set by finding a few elements in the set, and a few outside it. For example 92 - x - 2 C W, while zi -+ 9s - 7 g W. Is W nonempty? Yes, x - 2 C W. Additive closure? Suppose p C W and q C W. Is p + q C W? p and q are not totally arbitrary, we know that p(2) = 0 and q(2) = 0. Then we can check p + q for membership in W, (p + q)(2) = p(2) + q(2) Addition in P4 =0+0 pCE W, qE W Version 2.02  Subsection S.TS Testing Subspaces 296 =0 so we see that p + q qualifies for membership in W. Scalar multiplication closure? Suppose that a E C and p E W. Then we know that p(2) = 0. Testing ap for membership, (cep)(2) = ap(2) Scalar multiplication in P4 =a00 pEW =0 so op E W. We have shown that W meets the three conditions of Theorem TSS [293] and so qualifies as a subspace of P4. Notice that by Definition S [292] we now know that W is also a vector space. So all the properties of a vector space (Definition VS [279]) and the theorems of Section VS [279] apply in full. 
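Example SP4 establishes the two closure conditions of Theorem TSS [293] with a short algebraic argument. A computational companion can at least gather numerical evidence for the same two conditions. The sketch below is our own illustration and not part of the text: the helper names (in_W, random_member, and so on) are invented, a polynomial of P4 is stored as its list of coefficients, and rational scalars stand in for the complex scalars of the text so the arithmetic stays exact. Random spot checks like this can support, but never replace, the proof given above.

```python
# Numerical companion to Example SP4: W = { p in P4 : p(2) = 0 }.
# Sketch only; helper names are our own.
from fractions import Fraction
from random import randint

def evaluate(p, x):
    # p = (a0, a1, a2, a3, a4) represents a0 + a1*x + a2*x^2 + a3*x^3 + a4*x^4
    return sum(c * x**k for k, c in enumerate(p))

def in_W(p):
    return evaluate(p, 2) == 0          # the membership test defining W

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def smult(a, p):
    return tuple(a * c for c in p)

def random_member():
    # Choose a1..a4 freely, then solve for a0 so that p(2) = 0.
    a1, a2, a3, a4 = (randint(-9, 9) for _ in range(4))
    return (-(2*a1 + 4*a2 + 8*a3 + 16*a4), a1, a2, a3, a4)

# Spot-check the two closure conditions of Theorem TSS on random members of W.
for _ in range(1000):
    p, q = random_member(), random_member()
    alpha = Fraction(randint(-9, 9), randint(1, 9))
    assert in_W(add(p, q))              # additive closure
    assert in_W(smult(alpha, p))        # scalar multiplication closure
print("no counterexamples found")
```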
Much of the power of Theorem TSS [293] is that we can easily establish new vector spaces if we can locate them as subsets of other vector spaces, such as the ones presented in Subsection VS.EVS [280]. It can be as instructive to consider some subsets that are not subspaces. Since Theorem TSS [293] is an equivalence (see Technique E [690]) we can be assured that a subset is not a subspace if it violates one of the three conditions, and in any example of interest this will not be the "non-empty" condition. However, since a subspace has to be a vector space in its own right, we can also search for a violation of any one of the ten defining properties in Definition VS [279] or any inherent property of a vector space, such as those given by the basic theorems of Subsection VS.VSP [285]. Notice also that a violation need only be for a specific vector or pair of vectors. Example NSC2Z A non-subspace in C2, zero vector Consider the subset W below as a candidate for being a subspace of C2 W =xi1 3x1-5x2=12 z2_ The zero vector of C2, 0 = 0 will need to be the zero vector in W also. However, 0 g W since 3(0) - 5(0) = 0 f 12. So W has no zero vector and fails Property Z [280] of Definition VS [279]. This subspace also fails to be closed under addition and scalar multiplication. Can you find examples of this? Example NSC2A A non-subspace in C2, additive closure Consider the subset X below as a candidate for being a subspace of C2 You can check that 0 C X, so the approach of the last example will not get us anywhere. However, notice that x [0-]E X and y= C0E X. Yet x + y = -0- 1 1 Version 2.02  Subsection S.TS Testing Subspaces 297 So X fails the additive closure requirement of either Property AC [279] or Theorem TSS [293], and is therefore not a subspace. Example NSC2S A non-subspace in C2, scalar multiplication closure Consider the subset Y below as a candidate for being a subspace of C2 Y = | zi E Z,z2 E Z x2_ Z is the set of integers, so we are only allowing "whole numbers" as the constituents of our vectors. Now, O E Y, and additive closure also holds (can you prove these claims?). So we will have to try something different. Note that a = E C and 3 Y but Tx 1 2 1 ax- -3 0 2 3] [ So Y fails the scalar multiplication closure requirement of either Property SC [279] or Theorem TSS [293], and is therefore not a subspace. There are two examples of subspaces that are trivial. Suppose that V is any vector space. Then V is a subset of itself and is a vector space. By Definition S [292], V qualifies as a subspace of itself. The set containing just the zero vector Z = {O} is also a subspace as can be seen by applying Theorem TSS [293] or by simple modifications of the techniques hinted at in Example VSS [283]. Since these subspaces are so obvious (and therefore not too interesting) we will refer to them as being trivial. Definition TS Trivial Subspaces Given the vector space V, the subspaces V and {O} are each called a trivial subspace. A We can also use Theorem TSS [293] to prove more general statements about subspaces, as illustrated in the next theorem. Theorem NSMS Null Space of a Matrix is a Subspace Suppose that A is an m x n matrix. Then the null space of A, P1(A), is a subspace of C"m. D Proof We will examine the three requirements of Theorem TSS [293]. Recall that Nf(A) = {x E C" Ax = 0}. First, 0 E P1(A), which can be inferred as a consequence of Theorem HSC [62]. So P1(A) # 0. Second, check additive closure by supposing that x E N(A) and y C P1(A). 
So we know a little something about x and y: Ax =0 and Ay =0, and that is all we know. Question: Is x + y C P1(A)? Let's check. A(x + y) =Ax + Ay Theorem MMDAA [201] = 0+ 0 x EP1(A) ,y EP1(A) =0 Theorem VSPCV [86] So, yes, x + y qualifies for membership in P1(A). Third, check scalar multiplication closure by supposing that ca C C and x C P1(A). So we know a little something about x: Ax = 0, and that is all we know. Question: Is cx E N(A)? Let's check. A(ax) = a(Ax) Theorem MMSMM [201] = a0 x cN(A) Version 2.02  Subsection S.TSS The Span of a Set 298 = 0 Theorem ZVSM [286] So, yes, ax qualifies for membership in Af(A). Having met the three conditions in Theorem TSS [293] we can now say that the null space of a matrix is a subspace (and hence a vector space in its own right!). U Here is an example where we can exercise Theorem NSMS [296]. Example RSNS Recasting a subspace as a null space Consider the subset of C5 defined as x1 J 2 3xi + x2 - 5X3 + 7X4 + X5 =0, W = x3| 4xi + 6x2 + 3x3 - 6X4 - 5x5 =0, X4 -2xi + 4x2 + 7x4 + X5 = 0 _5_ It is possible to show that W is a subspace of C5 by checking the three conditions of Theorem TSS [293] directly, but it will get tedious rather quickly. Instead, give W a fresh look and notice that it is a set of solutions to a homogeneous system of equations. Define the matrix 3 1 -5 7 1 A= 4 6 3 -6 -5 -2 4 0 7 1 and then recognize that W = f(A). By Theorem NSMS [296] we can immediately see that W is a subspace. Boom! Subsection TSS The Span of a Set The span of a set of column vectors got a heavy workout in Chapter V [83] and Chapter M [182]. The definition of the span depended only on being able to formulate linear combinations. In any of our more general vector spaces we always have a definition of vector addition and of scalar multiplication. So we can build linear combinations and manufacture spans. This subsection contains two definitions that are just mild variants of definitions we have seen earlier for column vectors. If you haven't already, compare them with Definition LCCV [90] and Definition SSCV [112]. Definition LC Linear Combination Suppose that V is a vector space. Given n~ vectors ui1, u12, 113, ..., un and n~ scalars ai, a2, as3, ...,a, their linear combination is the vector ai11+ aJ2U2 + as{us +| -. + - -- aUn. Example LCM A linear combination of matrices In the vector space M23 of 2 x 3 matrices, we have the vectors 1 3 -2 3 -1 2 4 2 -4 X 2 0 7 ] 5 5 1_ z [ =1 Version 2.02  Subsection S.TSS The Span of a Set 299 and we can form linear combinations such as 2x+4y+ (-1)z 4x-2y+3z or, 2203 -2 [3 -1 242 - 2 21 -2 0 7] +4 [5 5 1 1 +( -1) L4 12 1 [2 6 -41 [12 -4 81+-4 -2 41 [4 0 141+ 20 20 4+[-1 -1 1] 10 0 8 23 19 17 4[1 3 -2]2[3 -1 24+ -4 4 12 -8 -6 2 -4112 6 -12 8 0 28 -10 -10 -2 + [3 3 3 ] 10 20 -24 1 -7 29 When we realize that we can form linear combinations in any vector space, then it is natural to revisit our definition of the span of a set, since it is the set of all possible linear combinations of a set of vectors. Definition SS Span of a Set Suppose that V is a vector space. Given a set of vectors S = {ui, u2, u3, ..., ut}, their span, (S), is the set of all possible linear combinations of u1, u2, u3, .., ut. Symbolically, (S) ={aiui + a2u2 + as3u3 + -.-. + ast | ai E CC, 1 < i < t} = aiuCa1 j. Let UT" be the set of all upper triangular matrices of size n. 
Prove that UT, is a subspace of the vector space of all square matrices of size n, Man Contributed by Robert Beezer Solution [306] Version 2.02  Subsection S.SOL Solutions 306 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [304] The question is if p can be written as a linear combination of the vectors in W. To check this, we set p equal to a linear combination and massage with the definitions of vector addition and scalar multiplication that we get with P3 (Example VSP [281]) p(x) = ai(3 + 2 + x) + a2(3 + 2x - 6) + a3(2 - 5) x3 + 6x + 4= (al + a2)x3 + (al + a3)x2 + (al + 2a2)x + (-6a2 - 5a3) Equating coefficients of equal powers of x, we get the system of equations, ai + a2= 1 a1 + a3 =0 ai + 2a2 = 6 -6a2 - 5a3 = 4 The augmented matrix of this system of equations row-reduces to 0 0 [1 0 0 0 0 [ There is a leading 1 in the last column, so Theorem RCLS [53] implies that the system is inconsistent. So there is no way for p to gain membership in W, so p 0 W. C21 Contributed by Robert Beezer Statement [304] In order to belong to W, we must be able to express C as a linear combination of the elements in the spanning set of W. So we begin with such an expression, using the unknowns a, b, c for the scalars in the linear combination. C=-3 3 a2 1 + 4 0 + -3 1 6 -4 3 -1 2 3 2 1 Massaging the right-hand side, according to the definition of the vector space operations in A22 (Example VSM [281]), we find the matrix equality, K3 -41 [3a+2b+2c -a+3b+c1 Matrix equality allows us to form a system of four equations in three variables, whose augmented matrix row-reduces as follows, 1 0 1 3 RREF 0 0 -1 [-13 1 -4] [00 0 0] Since this system of equations is consistent (Theorem RCLS [53]), a solution will provide values for a, b and c that allow us to recognize C as an element of W. Version 2.02  Subsection S.SOL Solutions 307 M20 Contributed by Robert Beezer Statement [304] The membership criteria for Z is a single linear equation, which comprises a homogeneous system of equations. As such, we can recognize Z as the solutions to this system, and therefore Z is a null space. Specifically, Z = N([4 -1 5]). Every null space is a subspace by Theorem NSMS [296]. A less direct solution appeals to Theorem TSS [293]. .0 First, we want to be certain Z is non-empty. The zero vector of C3, 0 = 0 , is a good candidate, 0 since if it fails to be in Z, we will know that Z is not a vector space. Check that 4(0) - (0) + 5(0) = 0 so that 0 E Z. Suppose x = z2 and y = Y2 are vectors from Z. Then we know that these vectors cannot be totally arbitrary, they must have gained membership in Z by virtue of meeting the membership test. For example, we know that x must satisfy 4x1 - 12 + 513 = 0 while y must satisfy 4y1 - Y2 + 5y3= 0. Our second criteria asks the question, is x + y E Z? Notice first that [1 +y i+ y1 X +y3'= 2 + y2 = 2 + y2 and we can test this vector for membership in Z as follows, 4(zi + yi) - 1(X2 + Y2) + 5(13 + Y3) = 4x1 + 4y1 - 12 - Y2 + 5x3 + 5ys3 = (4x1 - X2 + 5X3) + (4Y1 - Y2 + 5ys) =0+0 xEZ, yEZ = 0 and by this computation we see that x + y E Z. If a is a scalar and x E Z, is it always true that ax E Z? To check our third criteria, we examine x1 ali aX = a x2 =ale2 [3]_ a3 and we can test this vector for membership in Z with 4(azi1) - (0612) + 5(0613) =0a(41i - 12 + 513) =a60 xEZ and we see that indeed a6x C Z. With the three conditions of Theorem TSS [293] fulfilled, we can conclude that Z is a subspace of C3. T20 Contributed by Robert Beezer Statement [304] Apply Theorem TSS [293]. 
First, the zero vector of Man is the zero matrix, 0, whose entries are all zero (Definition ZM [185]). This matrix then meets the condition that [0] = 0 for i > j and so is an element of UTn. Version 2.02  Subsection S.SOL Solutions 308 Suppose A, B E UTn. Is A + B E UT,? We examine the entries of A + B "below" the diagonal. That is, in the following, assume that i > j. [A + B] = [A]j + [B] =0+0 Definition MA [182] A,B E UTn which qualifies A + B for membership in UT,. Suppose a E C and A E UTn. Is oA E UT,? We examine the entries of oA "below" the diagonal. That is, in the following, assume that i > j. [caA]ij a c~i Definition MSM [183] A E UTn which qualifies oA for membership in UT,. Having fulfilled the three conditions of Theorem TSS [293] we see that UT, is a subspace of Man. Version 2.02  Section LISS Linear Independence and Spanning Sets 309 Section LISS Linear Independence and Spanning Sets A vector space is defined as a set with two operations, meeting ten properties (Definition VS [279]). Just as the definition of span of a set of vectors only required knowing how to add vectors and how to multiply vectors by scalars, so it is with linear independence. A definition of a linear independent set of vectors in an arbitrary vector space only requires knowing how to form linear combinations and equating these with the zero vector. Since every vector space must have a zero vector (Property Z [280]), we always have a zero vector at our disposal. In this section we will also put a twist on the notion of the span of a set of vectors. Rather than beginning with a set of vectors and creating a subspace that is the span, we will instead begin with a subspace and look for a set of vectors whose span equals the subspace. The combination of linear independence and spanning will be very important going forward. Subsection LI Linear Independence Our previous definition of linear independence (Definition LI [308]) employed a relation of linear dependence that was a linear combination on one side of an equality and a zero vector on the other side. As a linear combination in a vector space (Definition LC [297]) depends only on vector addition and scalar multiplication, and every vector space must have a zero vector (Property Z [280]), we can extend our definition of linear independence from the setting of Cm to the setting of a general vector space V with almost no changes. Compare these next two definitions with Definition RLDCV [132] and Definition LICV [132]. Definition RLD Relation of Linear Dependence Suppose that V is a vector space. Given a set of vectors S = {ui, u2, u3, ..., un}, an equation of the form 1ui+ a2U2 + 3u3 + -+nun =0 is a relation of linear dependence on S. If this equation is formed in a trivial fashion, i.e. ai = 0, 1 < i <;n, then we say it is a trivial relation of linear dependence on S. A Definition LI Linear Independence Suppose that V is a vector space. The set of vectors S ={ui, 112, 113, ..., un} from V is linearly dependent if there is a relation of linear dependence on S that is not trivial. In the case where the only relation of linear dependence on S is the trivial one, then S is a linearly independent set of vectors. A Notice the emphasis on the word "only." This might remind you of the definition of a nonsingular matrix, where if the matrix is employed as the coefficient matrix of a homogeneous system then the only solution is the trivial one. 
Example LIP4 Linear independence in P4 In the vector space of polynomials with degree 4 or less, P4 (Example VSP [281]) consider the set S= {2x4+3x3+2x2-x+ 10, -x4-2x3+x2+5x-8, 2x4 +x3 +10x2+17x-2}. Version 2.02  Subsection LISS.LI Linear Independence 310 Is this set of vectors linearly independent or dependent? Consider that 3 (2x4 + 3x3 + 22 - x + 10) + 4 (-x4 - 2x3 + x2 + 5x - 8) + (-1) (2x4 + x3 + lOx2 + 17x - 2) Ox4+ Ox3 + Ox2 + Ox +0 =0 This is a nontrivial relation of linear dependence (Definition RLD [308]) on the set S and so convinces us that S is linearly dependent (Definition LI [308]). Now, I hear you say, "Where did those scalars come from?" Do not worry about that right now, just be sure you understand why the above explanation is sufficient to prove that S is linearly dependent. The remainder of the example will demonstrate how we might find these scalars if they had not been provided so readily. Let's look at another set of vectors (polynomials) from P4. Let T= {3x4 - 2x3 + 4x2 + 6x - 1, -3x4 + 1x3 + Ox2 + 4x + 2, 4x4 + 5x3 - 2x2 + 3x + 1, 2x4 - 7x3 + 4x2 + 2x + 1} Suppose we have a relation of linear dependence on this set, 0 = Ox4 + Ox3 + Ox2 + Ox + 0 = c (3x4 - 2x3 + 4x2 + 6x - 1) + a2 (-3x4 + 1x3 + Ox2 + 4x + 2) + a3 (4x4 + 5x3 - 22 + 3x + 1) + a4 (2x4 - 7x3 + 4x2 + 2x + 1) Using our definitions of vector addition and scalar multiplication in P4 (Example VSP [281]), we arrive at, Ox4 + Oz3 + O 2 + Ox + 0 = (3c1 - 3a2 + 4a3-+2a4)x4 +(-2ai-| a2 + 5cr3 - 704) x3 + (4ai +-2a3+-4a4)x2 + (6a1 + 4a2 + 3a3-+2a4) x + (-a1 + 2c2 + a3 + a4). Equating coefficients, we arrive at the homogeneous system of equations, 3°1-c3 2 + 4o3 + 2a4=0 -2a1+L2+563 -7c4= 0 4ci + -2a3 + 4a4 =0 6a1+4a2+3a3+2a4=0 -ai +2c2+ a3 + a4 = 0 We form the coefficient matrix of this homogeneous system of equations and row-reduce to find O 0 0 0 O 0 0 W We expected the system to be consistent (Theorem HSC [62]) and so can compute n~ - r =4 - 4 =0 and Theorem CSRN [54] tells us that the solution is unique. Since this is a homogeneous system, this unique solution is the trivial solution (Definition TSHSE [62]), ai = 0, ca2 =0, as = 0, a4 =0. So by Definition LI [308] the set T is linearly independent. A few observations. If we had discovered infinitely many solutions, then we could have used one of the non-trivial ones to provide a linear combination in the manner we used to show that S was linearly dependent. It is important to realize that it is not interesting that we can create a relation of linear dependence with zero scalars we can always do that but that for T, this is the only way to create a Version 2.02  Subsection LISS.LI Linear Independence 311 relation of linear dependence. It was no accident that we arrived at a homogeneous system of equations in this example, it is related to our use of the zero vector in defining a relation of linear dependence. It is easy to present a convincing statement that a set is linearly dependent (just exhibit a nontrivial relation of linear dependence) but a convincing statement of linear independence requires demonstrating that there is no relation of linear dependence other than the trivial one. Notice how we relied on theorems from Chapter SLE [2] to provide this demonstration. Whew! There's a lot going on in this example. Spend some time with it, we'll be waiting patiently right here when you get back. 
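The decisions in Example LIP4 come down to a homogeneous system: identify each polynomial with its column of coefficients, and the set is linearly independent exactly when the coefficient matrix of that system has rank equal to the number of vectors, so that only the trivial relation of linear dependence exists. The sketch below is our own check on this example, not part of the text. It uses a floating-point rank computation, which is dependable for small integer matrices like these; the row reduction carried out in the example is the exact method.

```python
# Checking Example LIP4 by computing ranks (our own sketch, not from the text).
import numpy as np

# Columns hold the coefficients (constant term first) of each polynomial.
S = np.array([[10, -8, -2],     # constant terms
              [-1,  5, 17],     # coefficients of x
              [ 2,  1, 10],     # coefficients of x^2
              [ 3, -2,  1],     # coefficients of x^3
              [ 2, -1,  2]])    # coefficients of x^4

T = np.array([[-1,  2,  1,  1],
              [ 6,  4,  3,  2],
              [ 4,  0, -2,  4],
              [-2,  1,  5, -7],
              [ 3, -3,  4,  2]])

print(np.linalg.matrix_rank(S))   # 2: fewer than 3 vectors, so S is linearly dependent
print(np.linalg.matrix_rank(T))   # 4: equal to 4 vectors, so T is linearly independent
```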
Example LIM32 Linear independence in M32 Consider the two sets of vectors R and S from the vector space of all 3 x 2 matrices, M32 (Example VSM [281]) 3 -1 -2 3 6 -6 7 9 R={ 1 I4 , 1 -3], -1 01, -4 -5 16-6 -2 -6 7 -9_ 2 5_ 12 0 -4 0 1 1 -5 3 S{= 1 -1 , -2 2 , -2 1 , -10 7 11 3 -2 -6 2 4 2 0 One set is linearly independent, the other is not. Which is which? Let's examine R first. Build a generic relation of linear dependence (Definition RLD [308]), 3 -1 -2 3 6 -6 7 9 cai 1 4 + a2 1 -3 + a3 -1 0 + a4 -4 -5 = 0 6 -6_ -2 -6_ 7 -9_ 2 5_ Massaging the left-hand side with our definitions of vector addition and scalar multiplication in M32 (Example VSM [281]) we obtain, 3a1 - 2a2 + 6a3+7a4 -101+ 3a2 - 6a3 + 9a4 0 0 1a1+1a2 -as-4a4 4a1-3a22+-5a4 = 0 0 6a1 - 2a2 + 73+2-2a4 -6a1 - 6a2 - 9a3 + 54_ 0 0 Using our definition of matrix equality (Definition ME [182]) and equating corresponding entries we get the homogeneous system of six equations in four variables, 3a i- 2a2 + 6a3+ 7a4= 0 -1o1 + 3a2 - 6a3 + 9a4=0 101+102 -a - 4a4= 0 4cai - 3c02 + -504 = 0 -6q1- 602 - 903+ 504=0 Form the coefficient matrix of this homogeneous system and row-reduce to obtain F2fL0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 00 Version 2.02  Subsection LISS.LI Linear Independence 312 Analyzing this matrix we are led to conclude that ai= 0, a2 = 0, a3 = 0, a4 = 0. This means there is only a trivial relation of linear dependence on the vectors of R and so we call R a linearly independent set (Definition LI [308]). So it must be that S is linearly dependent. Let's see if we can find a non-trivial relation of linear dependence on S. We will begin as with R, by constructing a relation of linear dependence (Definition RLD [308]) with unknown scalars, 2 0 ~-4 0~ 1 1~ -5 3 ai 1 -1 + O2 -2 2 + a3 -2 1 + a4 -10 7 = 0 1 3 -2 -6 2 4 2 0 Massaging the left-hand side with our definitions of vector addition and scalar multiplication in M1/l32 (Example VSM [281]) we obtain, 2c1-4o22+O3 -5o4 O3+3o4 0 0 ai - 2a2 - 2a3- 10oa4 -ai + 2a2 +a3+-7a4 = 0 0 ai - 22 + 2a3+2a4 31 - 6o2 + 4o3 j .0 0] Using our definition of matrix equality (Definition ME [182]) and equating corresponding entries we get the homogeneous system of six equations in four variables, 2ci - 4o2 + as - 5a4 = 0 +O3 + 3o4 = 0 ai-2a2-2a3-10a4=0 -ai +2c2+ a + 764 =0 ai,- 2a2+2a3 + 2a4= 0 3a1 - 6a2 + 43 = 0 Form the coefficient matrix of this homogeneous system and row-reduce to obtain 1 -2 0 -4 0 0 W 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0_ Analyzing this we see that the system is consistent (we expected this since the system is homogeneous, Theorem HSC [62]) and has n - r =4 - 2 =2 free variables, namely ca2 and a4. This means there are infinitely many solutions, and in particular, we can find a non-trivial solution, so long as we do not pick all of our free variables to be zero. The mere presence of a nontrivial solution for these scalars is enough to conclude that S is a linearly dependent set (Definition LI [308]). But let's go ahead and explicitly construct a non-trivial relation of linear dependence. Choose ca2 =1 and a4 -1. There is nothing special about this choice, there are infinitely many possibilities, some "easier" than this one, just avoid picking both variables to be zero. Then we find the corresponding dependent variables to be ai = -2 and as 3. 
So the relation of linear dependence, 2 0 -4 0 1 1 -5 3 0 0 (-2) 1 -1 + (1) -2 2 + (3) -2 1 + (-1) -10 7 = 0 0 1 3 -2 -6j L 2 4j L 2 0 0 0 Version 2.02  Subsection LISS.SS Spanning Sets 313 is an iron-clad demonstration that S is linearly dependent. Can you construct another such demonstration? Example LIC Linearly independent set in the crazy vector space Is the set R = {(1, 0), (6, 3)} linearly independent in the crazy vector space C (Example CVS [283])? We begin with an arbitrary relation of linear independence on R 0 a= 1(1, 0) + a2(6, 3) Definition RLD [308] and then massage it to a point where we can apply the definition of equality in C. Recall the definitions of vector addition and scalar multiplication in C are not what you would expect. (-1, -1) = 0 Example CVS [283] = al(1, 0) + a2(6, 3) Definition RLD [308] = (1a1 + ai - 1, 0a1 + ai - 1) + (6a2 + a2 - 1, 3a2 + a2 - 1) Example CVS [283] = (2ai - 1, a1 - 1) + (7a2 - 1, 4a2 - 1) = (2ai-1 + 7a2 -1 + 1, ai-1 + 4a2 -1 + 1) Example CVS [283] = (2ai + 7a2 - 1, a1 + 4a2 - 1) Equality in C (Example CVS [283]) then yields the two equations, 2a1 + 7a2 - 1 =-1 a1 + 4a2 - 1 =-1 which becomes the homogeneous system 2a1 + 7a2 = 0 a1 + 4a2 = 0 Since the coefficient matrix of this system is nonsingular (check this!) the system has only the trivial solution a1= a2 = 0. By Definition LI [308] the set R is linearly independent. Notice that even though the zero vector of C is not what we might first suspected, a question about linear independence still concludes with a question about a homogeneous system of equations. Hmmm. Subsection SS Spanning Sets In a vector space V, suppose we are given a set of vectors S C V. Then we can immediately construct a subspace, (S), using Definition SS [298] and then be assured by Theorem SSS [298] that the construction does provide a subspace. We now turn the situation upside-down. Suppose we are first given a subspace W C V. Can we find a set S so that (S)= W? Typically W is infinite and we are searching for a finite set of vectors S that we can combine in linear combinations and "build" all of W. I like to think of S as the raw materials that are sufficient for the construction of W. If you have nails, lumber, wire, copper pipe, drywall, plywood, carpet, shingles, paint (and a few other things), then you can combine them in many different ways to create a house (or infinitely many different houses for that matter). A fast-food restaurant may have beef, chicken, beans, cheese, tortillas, taco shells and hot sauce and from this small list of ingredients build a wide variety of items for sale. Or maybe a better Version 2.02  Subsection LISS.SS Spanning Sets 314 analogy comes from Ben Cordes the additive primary colors (red, green and blue) can be combined to create many different colors by varying the intensity of each. The intensity is like a scalar multiple, and the combination of the three intensities is like vector addition. The three individual colors, red, green and blue, are the elements of the spanning set. Because we will use terms like "spanned by" and "spanning set," there is the potential for confusion with "the span." Come back and reread the first paragraph of this subsection whenever you are uncertain about the difference. Here's the working definition. Definition TSVS To Span a Vector Space Suppose V is a vector space. A subset S of V is a spanning set for V if (S) = V. In this case, we also say S spans V. A The definition of a spanning set requires that two sets (subspaces actually) be equal. 
If S is a subset of V, then (S) C V, always. Thus it is usually only necessary to prove that V C (5). Now would be a good time to review Definition SE [684]. Example SSP4 Spanning set in P4 In Example SP4 [294] we showed that W = {p(x) p E P4, p(2) = O} is a subspace of P4, the vector space of polynomials with degree at most 4 (Example VSP [281]). In this example, we will show that the set S = {x - 2, x2 - 4x + 4, x3 - 6x2 +12x - 8, x4 - 8x3 + 242 - 32x + 16} is a spanning set for W. To do this, we require that W = (5). This is an equality of sets. We can check that every polynomial in S has x= 2 as a root and therefore S C W. Since W is closed under addition and scalar multiplication, (S) C W also. So it remains to show that W C (S) (Definition SE [684]). To do this, begin by choosing an arbitrary polynomial in W, say r(x) = ax4+ bx3 + cx2 + dx + e E W. This polynomial is not as arbitrary as it would appear, since we also know it must have x= 2 as a root. This translates to 0 = a(2)4 + b(2)3 +c(2)2 + d(2) + e =16a + 8b+ 4c+ 2d+ e as a condition on r. We wish to show that r is a polynomial in (5), that is, we want to show that r can be written as a linear combination of the vectors (polynomials) in S. So let's try. =cai (x - 2) + ca2 (x2 - 4x + 4) + as (x3 - 6x2 + 12x - 8) + Oa4 (x4 - 8x3 -+ 24x2 - 32x + 16) =oa4x4 + (as - 8ca4) x3 + (ca2 - 6c03 +| 2402) x2 + (i - 402+ 12a3 -32a4)xz+(-2i + 42 -83+ 164) Equating coefficients (vector equality in F4) gives the system of five equations in four variables, a{4 - a a3 - 8a4 = b a2 - 603 + 2402 =C a1- 402 + 1203 - 3204= d Version 2.02  Subsection LISS.SS Spanning Sets 315 -261 + 462 - 863 + 1664= e Any solution to this system of equations will provide the linear combination we need to determine if r E (S), but we need to be convinced there is a solution for any values of a, b, c, d, e that qualify r to be a member of W. So the question is: is this system of equations consistent? We will form the augmented matrix, and row-reduce. (We probably need to do this by hand, since the matrix is symbolic reversing the order of the first four rows is the best way to start). We obtain a matrix in reduced row-echelon form [ 0 0 0 32a+12b+4c+d [ 110 0 0 32a+12b+4c+d 0 [- 0 0 24a+6b+c 0 [1 0 0 24a+6b+c 0 0 [1 0 8a+b = 0 0 []0 8a+b 0 0 0 a 0 0 0 [1a 0 0 0 0 16a+8b+4c+2d+e 0 0 0 0 0 For your results to match our first matrix, you may find it necessary to multiply the final row of your row-reduced matrix by the appropriate scalar, and/or add multiples of this row to some of the other rows. To obtain the second version of the matrix, the last entry of the last column has been simplified to zero according to the one condition we were able to impose on an arbitrary polynomial from W. So with no leading 1's in the last column, Theorem RCLS [53] tells us this system is consistent. Therefore, any polynomial from W can be written as a linear combination of the polynomials in S, so W C (S). Therefore, W = (S) and S is a spanning set for W by Definition TSVS [313]. Notice that an alternative to row-reducing the augmented matrix by hand would be to appeal to Theorem FS [263] by expressing the column space of the coefficient matrix as a null space, and then verifying that the condition on r guarantees that r is in the column space, thus implying that the system is always consistent. Give it a try, we'll wait. This has been a complicated example, but worth studying carefully. 
Given a subspace and a set of vectors, as in Example SSP4 [313] it can take some work to determine that the set actually is a spanning set. An even harder problem is to be confronted with a subspace and required to construct a spanning set with no guidance. We will now work an example of this flavor, but some of the steps will be unmotivated. Fortunately, we will have some better tools for this type of problem later on. Example SSM22 Spanning set in l22 In the space of all 2 x 2 matrices, M22 consider the subspace Z =[ | a+3b-c -5d =0,-2a -6b+3c+14d=0} and find a spanning set for Z. We need to construct a limited number of matrices in Z so that every matrix in Z can be expressed as a linear combination of this limited number of matrices. Suppose that B [= $is a matrix in Z. Then we can form a column vector with the entries of B and write b EE [1 3 - 1 _-51\ c -2-6 31 14] Version 2.02  Subsection LISS.SS Spanning Sets 316 Row-reducing this matrix and applying Theorem REMES [28] we obtain the equivalent statement, b E3 0 0 14J _d_ We can then express the subspace Z in the following equal forms, Z |a Ja+3b-c-5d=0,-2a-6b+3c+14d 0} {Ka b1 a+3b-d=0,c+4d=0} (a b ({Kd |a-3b+dc-4d ( -3b-+Ad bl bddCC ([ -4d d _ { 0 0-]+[d ]bd cC -3 1_1 0 0{b- d0[' -4 1 b]/ So the set spans Z by Definition TSVS [313]. Example SSC Spanning set in the crazy vector space In Example LIC [312] we determined that the set R = {(1, 0), (6, 3)} is linearly independent in the crazy vector space C (Example CVS [283]). We now show that R is a spanning set for C. Given an arbitrary vector (x, y) E C we desire to show that it can be written as a linear combination of the elements of R. In other words, are there scalars ai and a2 so that (x, y) ai0(1, 0) + a2(6, 3) We will act as if this equation is true and try to determine just what ai and a2 would be (as functions of x and y). (x, y) ai0(1, 0) + a2(6, 3) =(11i+ ai - 1, 01 + ai - 1) +(6a2 +a2 - 1, 3a2 +a2 - 1) Scalar mult in C =(2ai - 1, a1 - 1) + (7a2 - 1, 4a2 - 1) =(2a- 1+ 7a2 - 1+1, ai- 1+ 4a2 - 1+1) Addition in C =(2ai + 7a2 - 1, ai + 4a2 - 1) Equality in C then yields the two equations, 2a1 + 7a2 - 1 =x Version 2.02  Subsection LISS.VR Vector Representation 317 a1+ 4a2 - 1 =y which becomes the linear system with a matrix representation [1 4] a2] y + l_ The coefficient matrix of this system is nonsingular, hence invertible (Theorem NI [228]), and we can employ its inverse to find a solution (Theorem TTMI [214], Theorem SNCM [229]), a2_ [1 4_[y +1J _ -1 2] [y+1_ -x+2y + l_ We could chase through the above implications backwards and take the existence of these solutions as sufficient evidence for R being a spanning set for C. Instead, let us view the above as simply scratchwork and now get serious with a simple direct proof that R is a spanning set. Ready? Suppose (x, y) is any vector from C, then compute the following linear combination using the definitions of the operations in C, (4x - 7y - 3)(1, 0) + (-x+ 2y + 1)(6, 3) = (1(4x - 7y- 3) + (4x - 7y- 3) - 1, 0(4x - 7y- 3) + (4x - 7y- 3) - 1) + (6(-x+ 2y + 1) + (-x+ 2y + 1) - 1, 3(-x+ 2y + 1) + (-x+ 2y + 1) - 1) = (8x - 14y - 7, 4x - 7y - 4) + (-7x + 14y + 6, -4x + 8y + 3) ((8x- 14y-7)+(-7x+14y+6)+1, (4x-7y-4)+(-4x+8y+3)+1) - (X, Y) This final sequence of computations in C is sufficient to demonstrate that any element of C can be written (or expressed) as a linear combination of the two vectors in R, so C C (R). Since the reverse inclusion (R) C C is trivially true, C = (R) and we say R spans C (Definition TSVS [313]). 
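The closing computation of Example SSC is easy to check mechanically. Below is a small sketch, assuming SymPy; the two helper functions encode the nonstandard operations of C from Example CVS [283], and their names are ours.

```python
import sympy as sp

# The nonstandard operations of the crazy vector space C (Example CVS)
def cvs_add(u, v):
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cvs_smult(alpha, u):
    return (alpha*u[0] + alpha - 1, alpha*u[1] + alpha - 1)

x, y = sp.symbols('x y')
a1 = 4*x - 7*y - 3        # the scalars proposed in Example SSC
a2 = -x + 2*y + 1

combo = cvs_add(cvs_smult(a1, (1, 0)), cvs_smult(a2, (6, 3)))
print(sp.expand(combo[0]), sp.expand(combo[1]))   # x y, so the combination recovers (x, y)
```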
Notice that this demonstration is no more or less valid if we hide from the reader our scratchwork that suggested ai 4x-7y-3 and a2= -x+2y+1. Subsection VR Vector Representation In Chapter R [530] we will take up the matter of representations fully, where Theorem VRRB [317] will be critical for Definition VR [530]. We will now motivate and prove a critical theorem that tells us how to "represent" a vector. This theorem could wait, but working with it now will provide some extra insight into the nature of linearly independent spanning sets. First an example, then the theorem. Example AVR A vector representation Consider the set from the vector space C3. Let A be the matrix whose columns are the set S, and verify that A is nonsingular. By Theorem NMLIC [138] the elements of S form a linearly independent set. Suppose that b E C3. Then [S(A, b) has a (unique) solution (Theorem NMUS [74]) and hence is consistent. By Theorem SLSLC [93], b E (S). Since b is arbitrary, this is enough to show that (S) = C3, and therefore S is a spanning set for Version 2.02  Subsection LISS.VR Vector Representation 318 C3 (Definition TSVS [313]). (This set comes from the columns of the coefficient matrix of Archetype B [707].) --33- Now examine the situation for a particular choice of b, say b =[24]. Because S is a spanning set 5 for C3, we know we can write b as a linear combination of the vectors in 5, --33 ~-7 ~-6 -12 24 = (-3) 5 + (5) 5 + (2) 7 . 5 1 0 4 The nonsingularity of the matrix A tells that the scalars in this linear combination are unique. More precisely, it is the linear independence of S that provides the uniqueness. We will refer to the scalars ai = -3, a2 = 5, a3= 2 as a "representation of b relative to S." In other words, once we settle on S as a linearly independent set that spans C3, the vector b is recoverable just by knowing the scalars ai1= -3, a2 = 5, a3= 2 (use these scalars in a linear combination of the vectors in S). This is all an illustration of the following important theorem, which we prove in the setting of a general vector space. Theorem VRRB Vector Representation Relative to a Basis Suppose that V is a vector space and B = {vi, v2, v3, ..., vm} is a linearly independent set that spans V. Let w be any vector in V. Then there exist unique scalars ai, a2, a3, ..., am such that w = aiv1 + a2v2 + a3v3+ -..-+ amvm. Proof That w can be written as a linear combination of the vectors in B follows from the spanning property of the set (Definition TSVS [313]). This is good, but not the meat of this theorem. We now know that for any choice of the vector w there exist some scalars that will create w as a linear combination of the basis vectors. The real question is: Is there more than one way to write w as a linear combination of {vi, v2, v3, ... , vm}? Are the scalars ai, a2, a3, ... , am unique? (Technique U [693]) Assume there are two ways to express w as a linear combination of {vi, v2, v3, ..., Vm}. In other words there exist scalars ai, a2, a3, ..., am and bi, b2, b3, ..., bm so that w = aiv1 +a2v2+a3v3+---+amvm W =bivi+b2V2+b3V3+---+bmvm. Then notice that 0 = w + (-w) Property Al [280] = w +(-1)wTheorem AISM [287] = (aivi + a2v2 + a3v3 -|- -. -+-- amvm)+ (-1)(bivi + b2v2 + b3v3 +| -. + | bmvm) - (aivi + a2v2 + a3v3 +| -. + | amvm)+ (-bivi - b2v2 - b3v3 - . .. - bmvm) Property DVA [280] =(ai - bi)v1 + (a2 - b2)v2 + (a3 - b)3| - - + (am - bm)vm Property C [279], Property DSA [280] But this is a relation of linear dependence on a linearly independent set of vectors (Definition RLD [308])! 
Now we are using the other assumption about B, that {vi, v2, v3, ... , vm } is a linearly independent set. So by Definition LI [308] it must happen that the scalars are all zero. That is, (ai -bi) =0 (a2 -b2) =0 (as3-b3) =0 . .. (am, - bm,) = 0 Version 2.02  Subsection LISS.READ Reading Questions 319 ai=bia2 = b2 as=b3 ... am=bm. And so we find that the scalars are unique. U This is a very typical use of the hypothesis that a set is linearly independent obtain a relation of linear dependence and then conclude that the scalars must all be zero. The result of this theorem tells us that we can write any vector in a vector space as a linear combination of the vectors in a linearly independent spanning set, but only just. There is only enough raw material in the spanning set to write each vector one way as a linear combination. So in this sense, we could call a linearly independent spanning set a "minimal spanning set." These sets are so important that we will give them a simpler name ("basis") and explore their properties further in the next section. Subsection READ Reading Questions 1. Is the set of matrices below linearly independent or linearly dependent in the vector space M22? Why or why not? { 1 3 -2 3 0 9 -2 4 ' 3 -5_ ' -1 3_- 2. Explain the difference between the following two uses of the term "span": (a) S is a subset of the vector space V and the span of S is a subspace of V. (b) W is subspace of the vector space Y and T spans W. 3. The set 6 4 5 S= 2, -3 ,8 11 1 2 -6 is linearly independent and spans C3. Write the vector x = 2 a linear combination of the elements 2 of S. How many ways are there to answer this question, and which theorem allows you to say so? Version 2.02  Subsection LISS.EXC Exercises 320 Subsection EXC Exercises C20 In the vector space of 2 x 2 matrices, M22, determine if the set S below is linearly independent. Contributed by Robert Beezer Solution [321] C21 In the crazy vector space C (Example CVS [283]), is the set S = {(0, 2), (2, 8)} linearly indepen- dent? Contributed by Robert Beezer Solution [321] C22 In the vector space of polynomials P3, determine if the set S is linearly independent or linearly dependent. S= {2+ x - 3x2 -8x3, 1+ + x2+5x3, 3 - 4x2 - 7x3} Contributed by Robert Beezer Solution [322] C23 Determine if the set S = {(3, 1), (7, 3)} is linearly independent in the crazy vector space C (Example CVS [283]). Contributed by Robert Beezer Solution [322] C30 In Example LIM32 [310], find another nontrivial relation of linear dependence on the linearly de- pendent set of 3 x 2 matrices, S. Contributed by Robert Beezer C40 Determine if the set T = {x2 - x + 5, 4x3 - 2 + 5x, 3x + 2} spans the vector space of polynomials with degree 4 or less, P4. Contributed by Robert Beezer Solution [322] C41 The set W is a subspace of M22, the vector space of all 2 x 2 matrices. Prove that S is a spanning set for W. W={|a ]J2a - 3b+4c - d = 0} S{ 01[0 ] [0 4]} Contributed by Robert Beezer Solution [322] C42 Determine if the set S ={(3, 1), (7, 3)} spans the crazy vector space C (Example CVS [283]). Contributed by Robert Beezer Solution [323] M1O Halfway through Example SSP4 [313], we need to show that the system of equations ~-2 4Z 8 16_ _e is consistent for every choice of the vector of constants satisfying 16a + 8b + 4c + 2d + e = 0. Express the column space of the coefficient matrix of this system as a null space, using Theorem FS [263]. From this use Theorem CSCS [237] to establish that the system is always consistent. 
Notice that Version 2.02  Subsection LISS.EXC Exercises 321 this approach removes from Example SSP4 [313] the need to row-reduce a symbolic matrix. Contributed by Robert Beezer Solution [323] T40 Prove the following variation of Theorem EMMVP [196]: Suppose that B = {u, u2, u3, ..., un} is a basis for C". Suppose also that A and B are m x n matrices such that Aui = But for every 1 < i < n. Then A = B. Can you modify the hypothesis further and obtain a generalization of Theorem EMMVP [196]? Contributed by Robert Beezer T50 Suppose that V is a vector space and u, v E V are two vectors in V. Use the definition of linear independence to prove that S = {u, v} is a linearly dependent set if and only if one of the two vectors is a scalar multiple of the other. Prove this directly in the context of an abstract vector space (V), without simply giving an upgraded version of Theorem DLDS [152] for the special case of just two vectors. Contributed by Robert Beezer Solution [323] Version 2.02  Subsection LISS.SOL Solutions 322 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [319] Begin with a relation of linear dependence on the vectors in S and massage it according to the definitions of vector addition and scalar multiplication in Ml22, a 2 -1 0 4 4 23 0 i1 3_ + a2 -1 2_ +a 1 3_ 0 0 2ai + 4a3 -ai + 4a2 + 2a3 0 0 al - a2+ a3 3a1+ 2a2+3a3_ By our definition of matrix equality (Definition ME [182]) we arrive at equations, a homogeneous system of linear 2a1 + 4a3 = 0 -ai + 4a2 + 2a3 = 0 ai - a2+ a3 =0 3a1 + 2a2 + 3a3 = 0 The coefficient matrix of this system row-reduces to the matrix, 1 0 0 0 1i 0 0 0 [-1 0 0 0 and from this we conclude that the only solution is ai= a2 = a3= 0. Since the relation of linear dependence (Definition RLD [308]) is trivial, the set S is linearly independent (Definition LI [308]). C21 Contributed by Robert Beezer Statement [319] We begin with a relation of linear dependence using unknown scalars a and b. We wish to know if these scalars must both be zero. Recall that the zero vector in C is (-1, -1) and that the definitions of vector addition and scalar multiplication are not what we might expect. o = (-1, -1) = a(0, 2) + b(2, 8) = (Oa+a-1, 2a+a-1)+(2b+b-1, 8b+b-1) = (a - 1, 3a - 1) + (3b - 1, 9b - 1) = (a- 1+3b- 1+1, 3a- 1+9b- 1+1) = (a + 3b - 1, 3a + 9b- 1) Definition RLD [308] Scalar mult., Example CVS [283] Vector addition, Example CVS [283] From this we obtain two equalities, which can be converted to a homogeneous system of equations, 1=a+3b-1 1=3a+9b-1 a+3b=0 3a+9b=0 This homogeneous system has a singular coefficient matrix (Theorem SMZD [389]), and so has more than just the trivial solution (Definition NM [71]). Any nontrivial solution will give us a nontrivial relation of linear dependence on S. So S is linearly dependent (Definition LI [308]). 
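Solution C21 leaves the choice of a nontrivial solution to the reader. As a hedged check, assuming SymPy, here is one such choice (a = 3, b = -1 is our pick; any nontrivial solution of the homogeneous system works), together with a verification inside C.

```python
import sympy as sp

# Coefficient matrix of the homogeneous system from Solution C21:  a + 3b = 0,  3a + 9b = 0
M = sp.Matrix([[1, 3], [3, 9]])
print(M.det())            # 0, so the matrix is singular and nontrivial solutions exist

# One nontrivial solution (our choice): a = 3, b = -1.  Check it in the crazy vector space C,
# using the same encoding of the operations as in the earlier sketch.
def cvs_add(u, v):
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cvs_smult(alpha, u):
    return (alpha*u[0] + alpha - 1, alpha*u[1] + alpha - 1)

relation = cvs_add(cvs_smult(3, (0, 2)), cvs_smult(-1, (2, 8)))
print(relation)           # (-1, -1), the zero vector of C, so S is linearly dependent
```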
C22 Contributed by Robert Beezer Statement [319]
Begin with a relation of linear dependence (Definition RLD [308]),

a1(2 + x - 3x^2 - 8x^3) + a2(1 + x + x^2 + 5x^3) + a3(3 - 4x^2 - 7x^3) = 0

Massage according to the definitions of scalar multiplication and vector addition in the definition of P3 (Example VSP [281]) and use the zero vector of this vector space,

(2a1 + a2 + 3a3) + (a1 + a2)x + (-3a1 + a2 - 4a3)x^2 + (-8a1 + 5a2 - 7a3)x^3 = 0 + 0x + 0x^2 + 0x^3

The definition of the equality of polynomials allows us to deduce the following four equations,

2a1 + a2 + 3a3 = 0
a1 + a2 = 0
-3a1 + a2 - 4a3 = 0
-8a1 + 5a2 - 7a3 = 0

Row-reducing the coefficient matrix of this homogeneous system leads to the unique solution a1 = a2 = a3 = 0. So the only relation of linear dependence on S is the trivial one, and this is linear independence for S (Definition LI [308]).

C23 Contributed by Robert Beezer Statement [319]
Notice, or discover, that the following gives a nontrivial relation of linear dependence on S in C, so by Definition LI [308], the set S is linearly dependent.

2(3, 1) + (-1)(7, 3) = (7, 3) + (-9, -5) = (-1, -1) = 0

C40 Contributed by Robert Beezer Statement [319]
The polynomial x^4 is an element of P4. Can we write this element as a linear combination of the elements of T? To wit, are there scalars a1, a2, a3 such that

x^4 = a1(x^2 - x + 5) + a2(4x^3 - x^2 + 5x) + a3(3x + 2)

Massaging the right side of this equation, according to the definitions of Example VSP [281], and then equating coefficients, leads to an inconsistent system of equations (check this!). As such, T is not a spanning set for P4.

C41 Contributed by Robert Beezer Statement [319]
We want to show that W = ⟨S⟩ (Definition TSVS [313]), which is an equality of sets (Definition SE [684]).
First, show that ⟨S⟩ ⊆ W. Begin by checking that each of the three matrices in S is a member of the set W. Then, since W is a vector space, the closure properties (Property AC [279], Property SC [279]) guarantee that every linear combination of elements of S remains in W.
Second, show that W ⊆ ⟨S⟩. We want to convince ourselves that an arbitrary element of W is a linear combination of elements of S. Choose

x = \begin{bmatrix} a & b \\ c & d \end{bmatrix} ∈ W

The values of a, b, c, d are not totally arbitrary, since membership in W requires that 2a - 3b + 4c - d = 0. Now, rewrite as follows,

x = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
  = \begin{bmatrix} a & b \\ c & 2a - 3b + 4c \end{bmatrix}     2a - 3b + 4c - d = 0
  = \begin{bmatrix} a & 0 \\ 0 & 2a \end{bmatrix} + \begin{bmatrix} 0 & b \\ 0 & -3b \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ c & 4c \end{bmatrix}     Definition MA [182]
  = a\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & -3 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 4 \end{bmatrix}     Definition MSM [183]
  ∈ ⟨S⟩     Definition SS [298]

C42 Contributed by Robert Beezer Statement [319]
We will try to show that S spans C. Let (x, y) be an arbitrary element of C and search for scalars a1 and a2 such that

(x, y) = a1(3, 1) + a2(7, 3)
       = (4a1 - 1, 2a1 - 1) + (8a2 - 1, 4a2 - 1)
       = (4a1 + 8a2 - 1, 2a1 + 4a2 - 1)

Equality in C leads to the system

4a1 + 8a2 = x + 1
2a1 + 4a2 = y + 1

This system has a singular coefficient matrix whose column space is simply ⟨{[2, 1]^t}⟩. So any choice of x and y that causes the column vector [x + 1, y + 1]^t to lie outside the column space will lead to an inconsistent system, and hence create an element (x, y) that is not in the span of S. So S does not span C. For example, choose x = 0 and y = 5, and then we can see that [1, 6]^t ∉ ⟨{[2, 1]^t}⟩ and we know that (0, 5) cannot be written as a linear combination of the vectors in S. A shorter solution might begin by asserting that (0, 5) is not in ⟨S⟩ and then establishing this claim alone.
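The inconsistency claimed in Solution C42 for the choice x = 0, y = 5 can be confirmed in one line; the sketch below assumes SymPy.

```python
import sympy as sp

a1, a2 = sp.symbols('a1 a2')

# The system from Solution C42 with the particular choice x = 0, y = 5
eqs = [sp.Eq(4*a1 + 8*a2, 0 + 1),
       sp.Eq(2*a1 + 4*a2, 5 + 1)]
print(sp.linsolve(eqs, a1, a2))    # EmptySet: no scalars exist, so (0, 5) is not in the span of S
```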
M10 Contributed by Robert Beezer Statement [319]
Theorem FS [263] provides the matrix L, and so if A denotes the coefficient matrix of the system, then C(A) = N(L). The single homogeneous equation in LS(L, 0) is equivalent to the condition on the vector of constants (use a, b, c, d, e as variables and then multiply by 16).

T50 Contributed by Robert Beezer Statement [320]
(⇒) If S is linearly dependent, then there are scalars α and β, not both zero, such that αu + βv = 0. Suppose that α ≠ 0; the proof proceeds similarly if β ≠ 0. Now,

u = 1u                            Property O [280]
  = ((1/α)α)u                     Property MICN [681]
  = (1/α)(αu)                     Property SMA [280]
  = (1/α)(αu + 0)                 Property Z [280]
  = (1/α)(αu + βv - βv)           Property AI [280]
  = (1/α)(0 - βv)                 Definition LI [308]
  = (1/α)(-βv)                    Property Z [280]
  = (-β/α)v                       Property SMA [280]

which shows that u is a scalar multiple of v.
(⇐) Suppose now that u is a scalar multiple of v. More precisely, suppose there is a scalar γ such that u = γv. Then

(-1)u + γv = (-1)u + u
           = (-1)u + (1)u         Property O [280]
           = ((-1) + 1)u          Property DSA [280]
           = 0u                   Property AICN [681]
           = 0                    Theorem ZSSM [286]

This is a relation of linear dependence on S (Definition RLD [308]), which is nontrivial since one of the scalars is -1. Therefore S is linearly dependent by Definition LI [308].
Be careful using this theorem. It is only applicable to sets of two vectors. In particular, linear dependence in a set of three or more vectors can be more complicated than just one vector being a scalar multiple of another.

Section B
Bases

A basis of a vector space is one of the most useful concepts in linear algebra. It often provides a concise, finite description of an infinite vector space.

Subsection B
Bases

We now have all the tools in place to define a basis of a vector space.

Definition B
Basis
Suppose V is a vector space. Then a subset S ⊆ V is a basis of V if it is linearly independent and spans V.

So, a basis is a linearly independent spanning set for a vector space. The requirement that the set spans V insures that S has enough raw material to build V, while the linear independence requirement insures that we do not have any more raw material than we need. As we shall see soon in Section D [341], a basis is a minimal spanning set.

You may have noticed that we used the term basis for some of the titles of previous theorems (e.g. Theorem BNS [139], Theorem BCS [239], Theorem BRS [245]) and if you review each of these theorems you will see that their conclusions provide linearly independent spanning sets for sets that we now recognize as subspaces of C^m. Examples associated with these theorems include Example NSLIL [140], Example CSOCD [240] and Example IAS [246]. As we will see, these three theorems will continue to be powerful tools, even in the setting of more general vector spaces.

Furthermore, the archetypes contain an abundance of bases. For each coefficient matrix of a system of equations, and for each archetype defined simply as a matrix, there is a basis for the null space, three bases for the column space, and a basis for the row space. For this reason, our subsequent examples will concentrate on bases for vector spaces other than C^m.

Notice that Definition B [325] does not preclude a vector space from having many bases, and this is the case, as hinted above by the statement that the archetypes contain three bases for the column space of a matrix.
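In the concrete setting of C^m, deciding whether a proposed set is a basis comes down to a single matrix computation, in the spirit of Example AVR and Theorem CNMB [330]. Here is a hedged NumPy sketch using the three vectors quoted in Example AVR; the variable names are ours.

```python
import numpy as np

# Columns are the three vectors of the set S from Example AVR
# (the columns of Archetype B's coefficient matrix, as quoted there)
A = np.array([[ -7.0,  -6.0, -12.0],
              [  5.0,   5.0,   7.0],
              [  1.0,   0.0,   4.0]])

# Full rank, hence nonsingular, so its columns are a basis of C^3 (Theorem CNMB)
print(np.linalg.matrix_rank(A))    # 3

# The representation of b relative to this basis is the unique solution of A x = b
b = np.array([-33.0, 24.0, 5.0])
print(np.linalg.solve(A, b))       # [-3.  5.  2.], matching Example AVR
```

The unique scalars returned by the solver are exactly the representation relative to a basis promised by Theorem VRRB [317].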
More generally, we can grab any basis for a vector space, multiply any one basis vector by a non-zero scalar and create a slightly different set that is still a basis. For "important" vector spaces, it will be convenient to have a collection of "nice" bases. When a vector space has a single particularly nice basis, it is sometimes called the standard basis though there is nothing precise enough about this term to allow us to define it formally -it is a question of style. Here are some nice bases for important vector spaces. Theorem SUVB Standard Unit Vectors are a Basis The set of standard unit vectors for Ctm (Definition SUV [173]), B ={ei, e2, es, -.-.-, em} ={ei 1 i m} is a basis for the vector space Ctm.D Proof We must show that the set B is both linearly independent and a spanning set for Cm. First, the vectors in B are, by Definition SUV [173], the columns of the identity matrix, which we know is nonsingular (since it row-reduces to the identity matrix, Theorem NMRRI [72]). And the columns of a nonsingular matrix are linearly independent by Theorem NMLIC [138]. Version 2.02  Subsection B.B Bases 327 Suppose we grab an arbitrary vector from Cm, say V1 V2 v = V3. vm Can we write v as a linear combination of the vectors in B? Yes, and quite simply. V1 1 0 0 0 V2 0 1 0 0 V3 =v1 0 +v2 0 +v3 1 +---+vm 0 Vm 0 0 0 1 v=viei+ v2e2 + v3e3 + ... + vmem this shows that Cm C (B), which is sufficient to show that B is a spanning set for Ctm. U Example BP Bases for Pn The vector space of polynomials with degree at most n, Pa, has the basis B ={1, x, x2, x3, ..., z}. Another nice basis for Pn is C= {1, 1+x, 1+x+x2, 1+x+x2+x3, ..., 1+x+x2+x3+...+x"}. Checking that each of B and C is a linearly independent spanning set are good exercises. Example BM A basis for the vector space of matrices In the vector space Mmn of matrices (Example VSM [281]) define the matrices BkU, 1 < k < m, 1 < < rn by 1i if k=if=j [BkfI ] [ 0 otherwise So these matrices have entries that are all zeros, with the exception of a lone entry that is one. The set of all mnt of them, B ={Br 1<5k 5m, 13( yk Bk k=1 £=1 Now consider the entry in row i and column j for these equal matrices, o [0] m n - >1>1 cekkBkAf k=1 P=1 . m n = E E [aeBkf Ii k=1 f=1 m n = ( ( ake [Bkf ] i k=1 k=1 = aig [Big] j = aig (1) Definition ZM [185] Definition ME [182] Definition MA [182] Definition MSM [183] [Bka] = 0 when (k,e) # (i,j) [Bi] = 1 Since i and j were arbitrary, we find that each scalar is zero and so B is linearly independent (Definition LI [308]). To establish the spanning property of B we need only show that an arbitrary matrix A can be written as a linear combination of the elements of B. So suppose that A is an arbitrary m x n matrix and consider the matrix C defined as a linear combination of the elements of B by m n C =Z(Z(E[A] Bkg k=1 f=1 Then, m n [C]2 =E(E [ AlggBye k=1 k=1 . m n k=1 =1 m n =E(E([A] kf [Bkfl] k=1 f=1 Definition ME [182] Definition MA [182] Definition MSM [183] Version 2.02  Subsection B.SOL Solutions 340 = [A] [Bi] [Bk] j = 0 when (k,f) - (i,j) = [A4] (1) [Big]i = 1 = [A] i So by Definition ME [182], A = C, and therefore A E (B). By Definition B [325], the set B is a basis of the vector space Mmn. C40 Contributed by Robert Beezer Statement [337] An arbitrary linear combination is 2 '1'7 '-7 '25 y= 3 -3 +(-2) 4 +1 [-5] +(-2) -6 = -10 1 1 4 -5 15 (You probably used a different collection of scalars.) 
We want to write y as a linear combination of B= 0 ,1 We could set this up as vector equation with variables as scalars in a linear combination of the vectors in B, but since the first two slots of B have such a nice pattern of zeros and ones, we can determine the necessary scalars easily and then double-check our answer with a computation in the third slot, 1 0 25 25 25 0 + (-10) 1 = -10 = -10 = y i_ i_ (25) + (-10) 1_ 15 __ Notice how the uniqueness of these scalars arises. They are forced to be 25 and -10. T50 Contributed by Robert Beezer Statement [337] Our first proof relies mostly on definitions of linear independence and spanning, which is a good exercise. The second proof is shorter and turns on a technical result from our work with matrix inverses, Theorem NPNT [226]. (-) Assume that A is nonsingular and prove that C is a basis of C". First show that C is linearly independent. Work on a relation of linear dependence on C, o0= a1Ax1 + a2Ax2 + a3Ax3 + . -+ anAxn Definition RLD [308] = Aaix1 + Aa2x2 + Aa3x3 + ."- + Aanxn Theorem MMSMM [201] =A (aix1 + a2x2 + a3x3 +| -. + | anxa) Theorem MMDAA [201] Since A is nonsingular, Definition NM [71] and Theorem SLEMM [195] allows us to conclude that alx1+ a2x2 +*- -+ anx, = But this is a relation of linear dependence of the linearly independent set B, so the scalars are trivial, ai1 a2 =as3 -=a 0. By Definition LI [308], the set C is linearly independent. Now prove that C spans C". Given an arbitrary vector y E C", can it be expressed as a linear combination of the vectors in C? Since A is a nonsingular matrix we can define the vector w to be the unique solution of the system [S(A, y) (Theorem NMUS [74]). Since w E Cn we can write w as a linear combination of the vectors in the basis B. So there are scalars, bi, b2, b3, ..., bn such that w = bix1+b2x2+b3x3----|bnxn Version 2.02  Subsection B.SOL Solutions 341 Then, y =Aw = A(bixi +b2x2 + b3x3+--- + box) = Abix1 + Ab2x2 + Ab3x3 + - - - + Abnx = biAx1 + b2Ax2 + b3Ax3 + ... + bnAx Theorem SLEMM [195] Definition TSVS [313] Theorem MMDAA [201] Theorem MMSMM [201] So we can write an arbitrary vector of C' as a linear combination of the elements of C. In other words, C spans CC (Definition TSVS [313]). By Definition B [325], the set C is a basis for C". (<) Assume that C is a basis and prove that A is nonsingular. Let x be a solution to the homogeneous system [S(A, 0). Since B is a basis of C there are scalars, ai, a2, a3, ..., an, such that x=alxl+ a2x2 + a3x3 + -+ anx Then 0=Ax = A (aix1 + a2x2 + a3x3 + ... + anxn) = Aaix1 + Aa2x2 + Aa3x3 + - - - + Aanx = a1Ax1 + a2Ax2 + a3Ax3 + ... + anAx Theorem SLEMM [195] Definition TSVS [313] Theorem MMDAA [201] Theorem MMSMM [201] This is a relation of linear dependence on the linearly independent set C, so the scalars must all be zero, ai=a2= as = an=0. Thus, x=aix1+ a2x2 + a3x3 + -+ anx=0x1+0x2 + Ox3 + -+ 0x=0. By Definition NM [71] we see that A is nonsingular. Now for a second proof. Take the vectors for B and use them as the columns of a matrix, G [x1 x2 x3 ... xn]. By Theorem CNMB [330], because we have the hypothesis that B is a basis of C"m, G is a nonsingular matrix. Notice that the columns of AG are exactly the vectors in the set C, by Definition MM [197]. A nonsingular < AG nonsingular - C basis for C" Theorem NPNT [226] Theorem CNMB [330] That was easy! T51 Contributed by Robert Beezer Statement [337] Choose B to be the set of standard unit vectors, a particularly nice basis of C" (Theorem SUVB [325]). 
For a vector e3 (Definition SUV [173]) from this basis, what is Ae3? Version 2.02  Section D Dimension 342 Section D Dimension Almost every vector space we have encountered has been infinite in size (an exception is Example VSS [283]). But some are bigger and richer than others. Dimension, once suitably defined, will be a measure of the size of a vector space, and a useful tool for studying its properties. You probably already have a rough notion of what a mathematical definition of dimension might be try to forget these imprecise ideas and go with the new ones given here. Subsection D Dimension Definition D Dimension Suppose that V is a vector space and {vi, v2, v3, ..., vt} is a basis of V. Then the dimension of V is defined by dim (V) = t. If V has no finite bases, we say V has infinite dimension. (This definition contains Notation D.) A This is a very simple definition, which belies its power. Grab a basis, any basis, and count up the number of vectors it contains. That's the dimension. However, this simplicity causes a problem. Given a vector space, you and I could each construct different bases remember that a vector space might have many bases. And what if your basis and my basis had different sizes? Applying Definition D [341] we would arrive at different numbers! With our current knowledge about vector spaces, we would have to say that dimension is not "well-defined." Fortunately, there is a theorem that will correct this problem. In a strictly logical progression, the next two theorems would precede the definition of dimension. Many subsequent theorems will trace their lineage back to the following fundamental result. Theorem SSLD Spanning Sets and Linear Dependence Suppose that S = {vi, v2, v3, ..., vt} is a finite set of vectors which spans the vector space V. Then any set of t + 1 or more vectors from V is linearly dependent. D Proof We want to prove that any set of t + 1 or more vectors from V is linearly dependent. So we will begin with a totally arbitrary set of vectors from V, R = {Ui, u2, u3, ..., um}, where m > t. We will now construct a nontrivial relation of linear dependence on R. Each vector ui1, 112, 113, ... um can be written as a linear combination of vi, v2, v3, ..., vt since S is a spanning set of V. This means there exist scalars agg 1 i t), so by Theorem HMVEI [64] there are infinitely many solutions. Choose a nontrivial solution and denote it by x1 Cl, x2 =c2, X3= C3, ... , Xm= Cm. As a solution to the homogeneous system, we then have a11c1 + a12c2 + a13c3 + ... + almcm a21C1 + a22C2 + a23C3 + ... + a2mCm a31 C1 + a32 C2 + a33 c3 + ... + aim cm 0 0 0 atl Cl + at2 c2 + at3 c3 + ... + atm Cm 0 As a collection of nontrivial scalars, cl, c2, c3, ... , cm will provide the dence we desire, Ciii + C2U2 + C3U3 + ... + CmUm = Cl (aiivi + a2iv2 + a3iv3 + . - + atl Vt) + c2 (ai2vi + a22v2 + a32v3 + ... + at2vt) + c3 (ai3vi + a23v2 + a33v3 + ... + at3vt) + Cm (aimvi + a2mv2 + a3mV3 + ..+ atmvt) - CiaiiVi + Cla2lv2 + Cia3iv3 + .*.* + CiatiVt + C2al2vl + C2a22v2 + C2a32v3 + ..+ C2at2vt + C3ai3vi + C3a23v2 + C3a33V3 + ..+ C3at3Vt + CmaimVi + Cma2mV2 + Cma3mV3 + . + CmatmVt - (Ciaii + C2al2 + C3ai3 + ... + Cmaim) Vi + (Cla2l + C2a22 + C3a23 + ... + Cma2m) V2 + (Cia3i + C2a32 + C3a33 + ... + Cma3m) V3 + (Ciati + C2at2 + C3at3 + ... 
+ cm atm) Vt _(aiiCi + al2C2 + ai3 C3 + .*.* + aimCm) Vi + (a2lCl + a22 C2 + a23 C3 + ..+ a2m cm) V2 + (a3iCi + a32 c2 + a33 c3 + ..+ aim cm) V3 + (atiCi + at2 C2 + at3 C3 + ..+ atm cm) Vt =OVl+OV2+OV3+..+OVt nontrivial relation of linear depen- Definition TSVS [313] Property DVA [280] Property DSA [280] Property CMCN [680] C3 as solution Version 2.02  Subsection D.D Dimension 344 Theorem ZSSM [286] Property Z [280] 0 That does it. R has been undeniably shown to be a linearly dependent set. 0 The proof just given has some monstrous expressions in it, mostly owing to the double subscripts present. Now is a great opportunity to show the value of a more compact notation. We will rewrite the key steps of the previous proof using summation notation, resulting in a more economical presentation, and even greater insight into the key aspects of the proof. So here is an alternate proof study it carefully. Proof (Alternate Proof of Theorem SSLD) We want to prove that any set of t + 1 or more vectors from V is linearly dependent. So we will begin with a totally arbitrary set of vectors from V, R = {u3 1 < j < m}, where m > t. We will now construct a nontrivial relation of linear dependence on R. Each vector uj, 1 < j < m can be written as a linear combination of v2, 1 < i < t since S is a spanning set of V. This means there are scalars azg, 1 < i < t, 1 < j < m, so that - Z=ai jvi i=1 1 j m Now we form, unmotivated, the homogeneous system of t equations in the m variables, xj, 1 < j < m, where the coefficients are the just-discovered scalars agg, m aijx = 0 j=1 1 t), so by Theorem HMVEI [64] there are infinitely many solutions. Choose one of these solutions that is not trivial and denote it by x = cj, 1 < j < m. As a solution to the homogeneous system, we then have =f 1 aijcj = 0 for 1 < i < t. As a collection of nontrivial scalars, cy, 1 < j m, will provide the nontrivial relation of linear dependence we desire, m CjUj j=1 m t Zcj aij vi j=1 i=1 m t Z Z cjaijvi j=1 i=1 t m Z Z c aijvi i=1 j=1 t m S v civi i=1 j=1 t m aij cj vi i=1 j=1 t ovi i=1 t 0 i=1 Definition TSVS [313] Property DVA [280] Property CMCN [680] Commutativity in C Property DSA [280] cj as solution Theorem ZSSM [286] Version 2.02  Subsection D.D Dimension 345 = 0 Property Z [280] That does it. R has been undeniably shown to be a linearly dependent set. U Notice how the swap of the two summations is so much easier in the third step above, as opposed to all the rearranging and regrouping that takes place in the previous proof. In about half the space. And there are no ellipses (...). Theorem SSLD [341] can be viewed as a generalization of Theorem MVSLD [137]. We know that Ctm has a basis with m vectors in it (Theorem SUVB [325]), so it is a set of m vectors that spans Ctm. By Theorem SSLD [341], any set of more than m vectors from Cm will be linearly dependent. But this is exactly the conclusion we have in Theorem MVSLD [137]. Maybe this is not a total shock, as the proofs of both theorems rely heavily on Theorem HMVEI [64]. The beauty of Theorem SSLD [341] is that it applies in any vector space. We illustrate the generality of this theorem, and hint at its power, in the next example. Example LDP4 Linearly dependent set in P4 In Example SSP4 [313] we showed that S ={x - 2, 2 - 4x + 4, x3 - 6x2 + 12x - 8, 34 - 8x3 + 242 - 32x + 16} is a spanning set for W = {p(x) p E P4, p(2) = 0}. So we can apply Theorem SSLD [341] to W with t = 4. 
Here is a set of five vectors from W, as you may check by verifying that each is a polynomial of degree 4 or less and has x= 2 as a root, T = {pi, p2, P3, P4, P5} c W Pi = x4 - 2x3 + 22 - 8x + 8 P2 = -x3 + 6x2 - 5x - 6 p3=2x4-5x3+ 5x2-7x+2 P4= -x+4x3-72+6x p = 4x3 - 92 + 5x - 6 By Theorem SSLD [341] we conclude that T is linearly dependent, with no further computations. Theorem SSLD [341] is indeed powerful, but our main purpose in proving it right now was to make sure that our definition of dimension (Definition D [341]) is well-defined. Here's the theorem. Theorem BIS Bases have Identical Sizes Suppose that V is a vector space with a finite basis B and a second basis C. Then B and C have the same size.D Proof Suppose that C has more vectors than B. (Allowing for the possibility that C is infinite, we can replace C by a subset that has more vectors than B.) As a basis, B is a spanning set for V (Definition B [325]), so Theorem SSLD [341] says that C is linearly dependent. However, this contradicts the fact that as a basis C is linearly independent (Definition B [325]). So C must also be a finite set, with size less than, or equal to, that of B. Suppose that B has more vectors than C. As a basis, C is a spanning set for V (Definition B [325]), so Theorem SSLD [341] says that B is linearly dependent. However, this contradicts the fact that as a basis B is linearly independent (Definition B [325]). So C cannot be strictly smaller than B. The only possibility left for the sizes of B and C is for them to be equal. U Theorem BIS [344] tells us that if we find one finite basis in a vector space, then they all have the same size. This (finally) makes Definition D [341] unambiguous. Version 2.02  Subsection D.DVS Dimension of Vector Spaces 346 Subsection DVS Dimension of Vector Spaces We can now collect the dimension of some common, and not so common, vector spaces. Theorem DCM Dimension of Cm The dimension of Ctm (Example VSCV [281]) is m. D Proof Theorem SUVB [325] provides a basis with m vectors. U Theorem DP Dimension of Pn The dimension of Pn (Example VSP [281]) is n + 1. D Proof Example BP [326] provides two bases with n + 1 vectors. Take your pick. U Theorem DM Dimension of Mmn The dimension of Mmn (Example VSM [281]) is mn. D Proof Example BM [326] provides a basis with mn vectors. U Example DSM22 Dimension of a subspace of M1/l22 It should now be plausible that Z = [ ab 2a+b+3c+4d = 0, -a+3b -5c - d = 0 is a subspace of the vector space M1/l22 (Example VSM [281]). (It is.) To find the dimension of Z we must first find a basis, though any old basis will do. First concentrate on the conditions relating a, b, c and d. They form a homogeneous system of two equations in four variables with coefficient matrix 2 1 3 4 -1 3 -5 -1 We can row-reduce this matrix to obtain Rewrite the two equations represented by each row of this matrix, expressing the dependent variables (a and b) in terms of the free variables (c and d), and we obtain, a = -2c -2d b =c We can now write a typical entry of Z strictly in terms of c and d, and we can decompose the result, a b] -2cc-2d c] [-2c c] + [-2d 0] [-2 1] +d[-2 0] +L=c + Version 2.02  Subsection D.DVS Dimension of Vector Spaces 347 this equation says that an arbitrary matrix in Z can be written as a linear combination of the two vectors in S = -2 1 -2 0 S {1 0]' 0 1 so we know that Z[-S)= 2 1 -2 0 1 0 ' 01 Are these two matrices (vectors) also linearly independent? 
Begin with a relation of linear dependence on S, ai-2 1 -2 00 ai 1 0_ +a2 0 1 -2a1- 2a2 ai 0 0 ai a2 0 0_ From the equality of the two entries in the last row, we conclude that ai1= 0, a2 = 0. Thus the only possible relation of linear dependence is the trivial one, and therefore S is linearly independent (Definition LI [308]). So S is a basis for V (Definition B [325]). Finally, we can conclude that dim (Z) = 2 (Definition D [341]) since S has two elements. Example DSP4 Dimension of a subspace of P4 In Example BSP4 [326] we showed that S ={x - 2, x2 - 4x + 4, x3 - 6x2 + 12x - 8, x4 - 8x3 + 242 - 32x + 16} is a basis for W = {p(x) p E P4, p(2) = 0}. Thus, the dimension of W is four, dim (W) = 4. Note that dim (P4) = 5 by Theorem DP [345], so W is a subspace of dimension 4 within the vector space P4 of dimension 5, illustrating the upcoming Theorem PSSD [358]. Example DC Dimension of the crazy vector space In Example BC [328] we determined that the set R = {(1, 0), (6, 3)} from the crazy vector space, C (Example CVS [283]), is a basis for C. By Definition D [341] we see that C has dimension 2, dim (C) = 2. It is possible for a vector space to have no finite bases, in which case we say it has infinite dimension. Many of the best examples of this are vector spaces of functions, which lead to constructions like Hilbert spaces. We will focus exclusively on finite-dimensional vector spaces. OK, one infinite-dimensional example, and then we will focus exclusively on finite-dimensional vector spaces. Example VSPUD Vector space of polynomials with unbounded degree Define the set P by P ={p |p(x) is a polynomial in 4} Our operations will be the same as those defined for Pa (Example VSP [281]). With no restrictions on the possible degrees of our polynomials, any finite set that is a candidate for spanning P will come up short. We will give a proof by contradiction (Technique CD [692]). To this end, suppose that the dimension of P is finite, say dim (F) =rn. The set T { {1, x, 92, . . ., xz"} is a linearly independent set (check this!) containing n~+1 polynomials from P. However, a basis of P will be a spanning set of P containing n vectors. This situation is a contradiction of Theorem SSLD [341], so our assumption that P has finite dimension is false. Thus, we say dim (P) = 00. Version 2.02  Subsection D.RNM Rank and Nullity of a Matrix 348 Subsection RNM Rank and Nullity of a Matrix For any matrix, we have seen that we can associate several subspaces the null space (Theorem NSMS [296]), the column space (Theorem CSMS [302]), row space (Theorem RSMS [303]) and the left null space (Theorem LNSMS [303]). As vector spaces, each of these has a dimension, and for the null space and column space, they are important enough to warrant names. Definition NOM Nullity Of a Matrix Suppose that A is an m x n matrix. Then the nullity of A is the dimension of the null space of A, n (A) = dim (N(A)). (This definition contains Notation NOM.) A Definition ROM Rank Of a Matrix Suppose that A is an m x n matrix. Then the rank of A is the dimension of the column space of A, r (A) = dim (C(A)). (This definition contains Notation ROM.) A Example RNM Rank and nullity of a matrix Let's compute the rank and nullity of 2 -4 -1 3 2 1 -4 1 -2 0 0 4 0 1 A=-2 4 1 0 -5 -4 -8 A 1 -2 1 1 6 1 -3 2 -4 -1 1 4 -2 -1 -1 2 3 -1 6 3 -1_ To do this, we will first row-reduce the matrix since that will help us determine bases for the null space and column space. 
1 -2 0 0 4 0 1 0 0 [ 0 3 0 -2 0 0 0 [ -1 0 -3 0 0 0 0 0 1 1 From this row-equivalent matrix in reduced row-echelon form we record D ={1, 3, 4, 6} and F ={2, 5, 7}. For each index in D, Theorem BCS [239] creates a single basis vector. In total the basis will have 4 vectors, so the column space of A will have dimension 4 and we write r (A) =4. For each index in F, Theorem BNS [139] creates a single basis vector. In total the basis will have 3 vectors, so the null space of A will have dimension 3 and we write n~ (A) =3. There were no accidents or coincidences in the previous example -with the row-reduced version of a matrix in hand, the rank and nullity are easy to compute. Theorem CRN Computing Rank and Nullity Suppose that A is an m x n matrix and B is a row-equivalent matrix in reduced row-echelon form with r Version 2.02  Subsection D.RNNM Rank and Nullity of a Nonsingular Matrix 349 nonzero rows. Then r (A) = r and n (A) = n - r. D Proof Theorem BCS [239] provides a basis for the column space by choosing columns of A that correspond to the dependent variables in a description of the solutions to IJS(A, 0). In the analysis of B, there is one dependent variable for each leading 1, one per nonzero row, or one per pivot column. So there are r column vectors in a basis for C(A). Theorem BNS [139] provide a basis for the null space by creating basis vectors of the null space of A from entries of B, one for each independent variable, one per column with out a leading 1. So there are n - r column vectors in a basis for n (A). Every archetype (Appendix A [698]) that involves a matrix lists its rank and nullity. You may have noticed as you studied the archetypes that the larger the column space is the smaller the null space is. A simple corollary states this trade-off succinctly. (See Technique LC [696].) Theorem RPNC Rank Plus Nullity is Columns Suppose that A is an m x n matrix. Then r (A) + n (A) = n. Proof Let r be the number of nonzero rows in a row-equivalent matrix in reduced row-echelon form. By Theorem CRN [347], r (A) + n (A) = r + (n - r) = n When we first introduced r as our standard notation for the number of nonzero rows in a matrix in reduced row-echelon form you might have thought r stood for "rows." Not really it stands for "rank"! Subsection RNNM Rank and Nullity of a Nonsingular Matrix Let's take a look at the rank and nullity of a square matrix. Example RNSM Rank and nullity of a square matrix The matrix 0 4 -1 2 2 3 1 2 -2 1 -1 0 -4 -3 -2 -3 9 -3 9 -1 9 E= -3 -4 9 4 -1 6 -2 -3 -4 6 -2 5 9 -4 9 -3 8 -2 -4 2 4 8 2 2 9 3 0 9_ is row-equivalent to the matrix in reduced row-echelon form, 0 0 0 0 0 0 0ooL0 oo0 0 l 0 o0 0Q1 0 0 0 0o0o 0 Q 0 0 0 0 0 0 0 Q 0 0 0 0 0 0 0 R-_ Version 2.02  Subsection D.RNNM Rank and Nullity of a Nonsingular Matrix 350 With n = 7 columns and r = 7 nonzero rows Theorem CRN [347] tells us the rank is r (E) = 7 and the nullity is n (E) = 7 - 7 = 0. The value of either the nullity or the rank are enough to characterize a nonsingular matrix. Theorem RNNM Rank and Nullity of a Nonsingular Matrix Suppose that A is a square matrix of size n. The following are equivalent. 1. A is nonsingular. 2. The rank of A is n, r (A) = n. 3. The nullity of A is zero, n (A) = 0. Proof (1 - 2) Theorem CSNM [242] says that if A is nonsingular then C(A) = Ct. If C(A) = C", then the column space has dimension n by Theorem DCM [345], so the rank of A is n. (2 - 3) Suppose r (A) = n. 
Then Theorem RPNC [348] gives n (A) = n - r (A) Theorem RPNC [348] = n - n Hypothesis =0 (3 - 1) Suppose n (A) = 0, so a basis for the null space of A is the empty set. This implies that Af(A) = {O} and Theorem NMTNS [74] says A is nonsingular. U With a new equivalence for a nonsingular matrix, we can update our list of equivalences (Theorem NME5 [331]) which now becomes a list requiring double digits to number. Theorem NME6 Nonsingular Matrix Equivalences, Round 6 Suppose that A is a square matrix of size n. The following are equivalent. 1. A is nonsingular. 2. A row-reduces to the identity matrix. 3. The null space of A contains only the zero vector, .N(A) = {0}. 4. The linear system IJS(A, b) has a unique solution for every possible choice of b. 5. The columns of A are a linearly independent set. 6. A is invertible. 7. The column space of A is C"m, C(A) =C"m. 8. The columns of A are a basis for C". 9. The rank of A is n, r (A) = n 10. The nullity of A is zero, n~ (A) =0. D- Proof Building on Theorem NME5 [331] we can add two of the statements from Theorem RNNM [349]. Version 2.02  Subsection D.READ Reading Questions 351 Subsection READ Reading Questions 1. What is the dimension of the vector space P6, the set of all polynomials of degree 6 or less? 2. How are the rank and nullity of a matrix related? 3. Explain why we might say that a nonsingular matrix has "full rank." Version 2.02  Subsection D.EXC Exercises 352 Subsection EXC Exercises C20 The archetypes listed below are matrices, or systems of equations with coefficient matrices. For each, compute the nullity and rank of the matrix. This information is listed for each archetype (along with the number of columns in the matrix, so as to illustrate Theorem RPNC [348]), and notice how it could have been computed immediately after the determination of the sets D and F associated with the reduced row-echelon form of the matrix. Archetype A [702] Archetype B [707] Archetype C [712] Archetype D [716]/Archetype E [720] Archetype F [724] Archetype G [729]/Archetype H [733] Archetype I [737] Archetype J [741] Archetype K [746] Archetype L [750] Contributed by Robert Beezer C30 For the matrix A below, compute the dimension of the null space of A, dim (N(A)). 2 -1 -3 11 9 A=1 2 1 -7 -3 4 3 1 -3 6 8 2 1 2 -5 -3] Contributed by Robert Beezer Solution [353] C31 The set W below is a subspace of C4. Find the dimension of W. 2 3 -4 W= 3 0 -3 W4 '1 '2 .l . -2_ 5 _ Contributed by Robert Beezer Solution [353] C40 In Example LDP4 [344] we determined that the set of five polynomials, T, is linearly dependent by a simple invocation of Theorem SSLD [341]. Prove that T is linearly dependent from scratch, beginning with Definition LI [308]. Contributed by Robert Beezer M20 M22 is the vector space of 2 x 2 matrices. Let S22 denote the set of all 2 x 2 symmetric matrices. That is S22 ={ AcEM22 |AK =A} (a) Show that S22 is a subspace of M22. (b) Exhibit a basis for S22 and prove that it has the required properties. (c) What is the dimension of S22? Contributed by Robert Beezer Solution [353] Version 2.02  Subsection D.EXC Exercises 353 M21 A 2 x 2 matrix B is upper triangular if [B]21 = 0. Let UT2 be the set of all 2 x 2 upper triangular matrices. Then UT2 is a subspace of the vector space of all 2 x 2 matrices, M22 (you may assume this). Determine the dimension of UT2 providing all of the necessary justifications for your answer. 
Contributed by Robert Beezer Solution [354] Version 2.02  Subsection D.SOL Solutions 354 Subsection SOL Solutions C30 Contributed by Robert Beezer Statement [351] Row reduce A, 0 0 1 1 A RREF,0 0 -3 -1 0 0 [ -2 -2 0 0 0 0 0] So r = 3 for this matrix. Then dim (Jf(A)) = n (A) Definition NOM [347] = (n (A)+ r (A)) - r (A) = 5 - r (A) Theorem RPNC [348] = 5 - 3 Theorem CRN [347] -2 We could also use Theorem BNS [139] and create a basis for N(A) with n - r = 5 - 3 = 2 vectors (because the solutions are described with 2 free variables) and arrive at the dimension as the size of this basis. C31 Contributed by Robert Beezer Statement [351] We will appeal to Theorem BS [157] (or you could consider this an appeal to Theorem BCS [239]). Put the three column vectors of this spanning set into a matrix as columns and row-reduce. 2 3 -4 [1 0 1 -3 0 -3 RREF, 0WT -2 4 1 2' 0 0 0 1 -2 5 0 0 _ The pivot columns are D = {1, 2} so we can "keep" the vectors corresponding to the pivot columns and set 2 3 T- 3 0 T 4 '1 . 1] [-2_ and conclude that W = (T) and T is linearly independent. In other words, T is a basis with two vectors, so W has dimension 2. M20 Contributed by Robert Beezer Statement [351] (a) We will use the three criteria of Theorem TSS [293]. The zero vector of M22 is the zero matrix, 0 (Definition ZM [185]), which is a symmetric matrix. So S22 is not empty, since 0 E S22. Suppose that A and B are two matrices in S22. Then we know that At - A and Bt B. We want to know if A + B E S22, so test A + B for membership, (A + B)t At + Bt Theorem TMA [186] = A +B A, Bc S22 So A + B is symmetric and qualifies for membership in S22. Suppose that A E S22 and a E C. Is aA E S22? We know that At = A. Now check that, aA = aAt Theorem TMSM [187] Version 2.02  Subsection D.SOL Solutions 355 aA AES22 So aA is also symmetric and qualifies for membership in S22. With the three criteria of Theorem TSS [293] fulfilled, we see that S22 is a subspace of M22. (b) An arbitrary matrix from S22 can be written as [b d1. We can express this matrix as [b d] 0 0] b 0] 0 d] = a 0 0] + b 1 0 + d-0 1o this equation says that the set spans 522. Is it also linearly independent? Write a relation of linear dependence on S, 0O= ai 0 0- +a2 1 0_ + a3 0 1_ 0 0 ai a2 0 0_ - a2 as_ The equality of these two matrices (Definition ME [182]) tells us that al= a2 = a3= 0, and the only relation of linear dependence on T is trivial. So T is linearly independent, and hence is a basis of S22. (c) The basis T found in part (b) has size 3. So by Definition D [341], dim (S22) = 3. M21 Contributed by Robert Beezer Statement [352] A typical matrix from UT2 looks like a b [ J where a, b, c E C are arbitrary scalars. Observing this we can then write 0 c] a [0 0] + b 0 0] + c 0 1f which says that R ={ 1 01 0 11 [0 01 is a spanning set for UT2 (Definition TSVS [313]). Is R is linearly independent? If so, it is a basis for UT2. So consider a relation of linear dependence on R From this equation, one rapidly arrives at the conclusion that ai = a2 =Oas3 0. So R is a linearly independent set (Definition LI [308]), and hence is a basis (Definition B [325]) for UT2. Now, we simply count up the size of the set R to see that the dimension of UT2 is dim (UT2) =3. Version 2.02  Section PD Properties of Dimension 356 Section PD Properties of Dimension Once the dimension of a vector space is known, then the determination of whether or not a set of vectors is linearly independent, or if it spans the vector space, can often be much easier. 
In this section we will state a workhorse theorem and then apply it to the column space and row space of a matrix. It will also help us describe a super-basis for Cm. Subsection GT Goldilocks' Theorem We begin with a useful theorem that we will need later, and in the proof of the main theorem in this subsection. This theorem says that we can extend linearly independent sets, one vector at a time, by adding vectors from outside the span of the linearly independent set, all the while preserving the linear independence of the set. Theorem ELIS Extending Linearly Independent Sets Suppose V is vector space and S is a linearly independent set of vectors from V. Suppose w is a vector such that w 0 (S). Then the set S' = S U {w} is linearly independent. D Proof Suppose S = {vi, v2, v3, ..., vm} and begin with a relation of linear dependence on S', aiv1 + a2v2 + a3v3 + ... + amvm + am+lw = 0. There are two cases to consider. First suppose that am+1 = 0. Then the relation of linear dependence on S' becomes aiv11+ a2v2 + a3v3+ -..-+ amvm = 0. and by the linear independence of the set S, we conclude that ai1= a2 = a3 = - - - = am = 0. So all of the scalars in the relation of linear dependence on S' are zero. In the second case, suppose that am+1 $ 0. Then the relation of linear dependence on S' becomes am+lw = -alvl - a2v2 - a3v3 - --. - amvm a1 a2 a3 am W=-- Vi- V2- V3- -.-- Vm am+1 am+1 am+1 am+1 This equation expresses w as a linear combination of the vectors in 5, contrary to the assumption that w g (5), so this case leads to a contradiction. The first case yielded only a trivial relation of linear dependence on 5' and the second case led to a contradiction. So 5' is a linearly independent set since any relation of linear dependence is trivial. U In the story Goldilocks and the Three Bears, the young girl Goldilocks visits the empty house of the three bears while out walking in the woods. One bowl of porridge is too hot, the other too cold, the third is just right. One chair is too hard, one too soft, the third is just right. So it is with sets of vectors -some are too big (linearly dependent), some are too small (they don't span), and some are just right (bases). Here's Goldilocks' Theorem. Theorem G Goldilocks Suppose that V is a vector space of dimension t. Let S = {vi, v2, v3, ..., vm} be a set of vectors from V. Then Version 2.02  Subsection PD.GT Goldilocks' Theorem 357 1. If m > t, then S is linearly dependent. 2. If m < t, then S does not span V. 3. If m = t and S is linearly independent, then S spans V. 4. If m = t and S spans V, then S is linearly independent. Proof Let B be a basis of V. Since dim (V) = t, Definition B [325] and Theorem BIS [344] imply that B is a linearly independent set of t vectors that spans V. 1. Suppose to the contrary that S is linearly independent. Then B is a smaller set of vectors that spans V. This contradicts Theorem SSLD [341]. 2. Suppose to the contrary that S does span V. Then B is a larger set of vectors that is linearly independent. This contradicts Theorem SSLD [341]. 3. Suppose to the contrary that S does not span V. Then we can choose a vector w such that w E V and w g (S). By Theorem ELIS [355], the set S' = S U {w} is again linearly independent. Then S' is a set of m + 1= t + 1 vectors that are linearly independent, while B is a set of t vectors that span V. This contradicts Theorem SSLD [341]. 4. Suppose to the contrary that S is linearly dependent. 
Then by Theorem DLDS [152] (which can be upgraded, with no changes in the proof, to the setting of a general vector space), there is a vector in S, say vk that is equal to a linear combination of the other vectors in S. Let S' = S \ {vk}, the set of "other" vectors in S. Then it is easy to show that V = (S) = (5'). So S' is a set of m - 1 = t - 1 vectors that spans V, while B is a set of t linearly independent vectors in V. This contradicts Theorem SSLD [341]. There is a tension in the construction of basis. Make a set too big and you will end up with relations of linear dependence among the vectors. Make a set too small and you will not have enough raw material to span the entire vector space. Make a set just the right size (the dimension) and you only need to have linear independence or spanning, and you get the other property for free. These roughly-stated ideas are made precise by Theorem G [355]. The structure and proof of this theorem also deserve comment. The hypotheses seem innocuous. We presume we know the dimension of the vector space in hand, then we mostly just look at the size of the set S. From this we get big conclusions about spanning and linear independence. Each of the four proofs relies on ultimately contradicting Theorem SSLD [341], so in a way we could think of this entire theorem as a corollary of Theorem SSLD [341]. (See Technique LC [696].) The proofs of the third and fourth parts parallel each other in style (add w, toss vk) and then turn on Theorem ELIS [355] before contradicting Theorem SSLD [341]. Theorem G [355] is useful in both concrete examples and as a tool in other proofs. We will use it often to bypass verifying linear independence or spanning. Example BPR Bases for Ps, reprised In Example BP [326] we claimed that B={1,xx2,x3,...,xn} C= {1, 1+x, 1+x+x2, 1+x+x2+x3, ..., 1+x+x2+x3+""+xn}. Version 2.02  Subsection PD.GT Goldilocks' Theorem 358 were both bases for Pn (Example VSP [281]). Suppose we had first verified that B was a basis, so we would then know that dim (Pa) = n + 1. The size of C is n + 1, the right size to be a basis. We could then verify that C is linearly independent. We would not have to make any special efforts to prove that C spans Pa, since Theorem G [355] would allow us to conclude this property of C directly. Then we would be able to say that C is a basis of Pn also. Example BDM22 Basis by dimension in M1/l22 In Example DSM22 [345] we showed that s m 2 1 -2 0 B 1 0' 0 1_ is a basis for the subspace Z of M122 (Example VSM [281]) given by Z= [a ]2a+b+3c+4d=O,-a+3b 5c-d=0} This tells us that dim (Z) = 2. In this example we will find another basis. matrices in Z by forming linear combinations of the matrices in B. -2 1 -2 0 2 2 2 1 0 + (- 3) 0 1 - 2 - 3 - 3L12 0J+1 0 O1- 3 1J We can construct two new Then the set has the right size to be a basis of Z. dependence s e 3e if i 3 ie1 Let's see if it is a linearly independent set. The relation of linear ai2 - 33 + a2 3 1+ 2a1-8a2 2a1+3a2 0 0 2ai+s32 -3a1+aa2 0 0 leads to the homogeneous system of equations whose coefficient matrix [ 2 -8 2 3 2 3 -3 1 row-reduces to So with ai = a2 = 0 as [355] to see that C also Example SVP4 Sets of vectors in P4 In Example BSP4 [326] U10 0 F- 0 0 the only solution, the set is linearly independent. Now we can apply Theorem G spans Z and therefore is a second basis for Z. we showed that B = {x - 2, x2 - 4x + 4, x3 - 6x2 + 12x - 8, x4 - 8x3 + 24x2 - 32x + 16} Version 2.02  Subsection PD.RT Ranks and Transposes 359 is a basis for W = {p(x) p E P4, p(2) = 0}. 
So $\dim(W) = 4$. The set
$$\{3x^2 - 5x - 2,\ 2x^2 - 7x + 6,\ x^3 - 2x^2 + x - 2\}$$
is a subset of $W$ (check this) and it happens to be linearly independent (check this, too). However, by Theorem G [355] it cannot span $W$. The set
$$\{3x^2 - 5x - 2,\ 2x^2 - 7x + 6,\ x^3 - 2x^2 + x - 2,\ -x^4 + 2x^3 + 5x^2 - 10x,\ x^4 - 16\}$$
is another subset of $W$ (check this) and Theorem G [355] tells us that it must be linearly dependent. The set
$$\{x - 2,\ x^2 - 2x,\ x^3 - 2x^2,\ x^4 - 2x^3\}$$
is a third subset of $W$ (check this) and is linearly independent (check this). Since it has the right size to be a basis, and is linearly independent, Theorem G [355] tells us that it also spans $W$, and therefore is a basis of $W$.

A simple consequence of Theorem G [355] is the observation that proper subspaces have strictly smaller dimensions. Hopefully this may seem intuitively obvious, but it still requires proof, and we will cite this result later.

Theorem PSSD
Proper Subspaces have Smaller Dimension
Suppose that $U$ and $V$ are subspaces of the vector space $W$, such that $U \subsetneq V$. Then $\dim(U) < \dim(V)$.

Proof   Suppose that $\dim(U) = m$ and $\dim(V) = t$. Then $U$ has a basis $B$ of size $m$. If $m > t$, then by Theorem G [355], $B$ is linearly dependent, which is a contradiction. If $m = t$, then by Theorem G [355], $B$ spans $V$. Then $U = \langle B \rangle = V$, also a contradiction. All that remains is that $m < t$, which is the desired conclusion.

The final theorem of this subsection is an extremely powerful tool for establishing the equality of two sets that are subspaces. Notice that the hypotheses include the equality of two integers (dimensions) while the conclusion is the equality of two sets (subspaces). It is the extra "structure" of a vector space and its dimension that makes possible this huge leap from an integer equality to a set equality.

Theorem EDYES
Equal Dimensions Yields Equal Subspaces
Suppose that $U$ and $V$ are subspaces of the vector space $W$, such that $U \subseteq V$ and $\dim(U) = \dim(V)$. Then $U = V$.

Proof   We give a proof by contradiction (Technique CD [692]). Suppose to the contrary that $U \neq V$. Since $U \subseteq V$, there must be a vector $v$ such that $v \in V$ and $v \notin U$. Let $B = \{u_1, u_2, u_3, \ldots, u_t\}$ be a basis for $U$. Then, by Theorem ELIS [355], the set $C = B \cup \{v\} = \{u_1, u_2, u_3, \ldots, u_t, v\}$ is a linearly independent set of $t + 1$ vectors in $V$. However, by hypothesis, $V$ has the same dimension as $U$ (namely $t$) and therefore Theorem G [355] says that $C$ is too big to be linearly independent. This contradiction shows that $U = V$.

Subsection RT
Ranks and Transposes

We now prove one of the most surprising theorems about matrices. Notice the paucity of hypotheses compared to the precision of the conclusion.

Theorem RMRT
Rank of a Matrix is the Rank of the Transpose
Suppose $A$ is an $m \times n$ matrix. Then $r(A) = r(A^t)$.

Proof   Suppose we row-reduce $A$ to the matrix $B$ in reduced row-echelon form, and $B$ has $r$ non-zero rows. The quantity $r$ tells us three things about $B$: the number of leading 1's, the number of non-zero rows and the number of pivot columns. For this proof we will be interested in the latter two. Theorem BRS [245] and Theorem BCS [239] each has a conclusion that provides a basis, for the row space and the column space, respectively. In each case, these bases contain $r$ vectors. This observation makes the following go.
$$\begin{aligned}
r(A) &= \dim(C(A)) && \text{Definition ROM [347]} \\
     &= r && \text{Theorem BCS [239]} \\
     &= \dim(R(A)) && \text{Theorem BRS [245]} \\
     &= \dim(C(A^t)) && \text{Theorem CSRST [247]} \\
     &= r(A^t) && \text{Definition ROM [347]}
\end{aligned}$$
Jacob Linenthal helped with this proof.
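The discussion that follows invites you to test Theorem RMRT [359] on any matrix you like. Here is one way to do the test by machine, a minimal sketch assuming Python with NumPy is available; the matrix A below is an arbitrary choice of ours, not a matrix from the text.

    import numpy as np

    # An arbitrary 4 x 3 matrix; any matrix will do.
    A = np.array([[1, -1,  2],
                  [2,  1,  1],
                  [3,  0,  3],
                  [0,  3, -3]])

    # Theorem RMRT: the rank of A equals the rank of its transpose.
    print(np.linalg.matrix_rank(A))    # rank of A
    print(np.linalg.matrix_rank(A.T))  # rank of A^t, the same number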
U This says that the row space and the column space of a matrix have the same dimension, which should be very surprising. It does not say that column space and the row space are identical. Indeed, if the matrix is not square, then the sizes (number of slots) of the vectors in each space are different, so the sets are not even comparable. It is not hard to construct by yourself examples of matrices that illustrate Theorem RMRT [359], since it applies equally well to any matrix. Grab a matrix, row-reduce it, count the nonzero rows or the leading l's. That's the rank. Transpose the matrix, row-reduce that, count the nonzero rows or the leading 1's. That's the rank of the transpose. The theorem says the two will be equal. Here's an example anyway. Example RRTI Rank, rank of transpose, Archetype I Archetype I [737] has a 4 x 7 coefficient matrix which row-reduces to 14 0 0 2 1 -3] 0 0 2 0 1 -3 5 0 0 0 2 2 -6 6 _000 00 0 0]_ so the rank is 3. Row-reducing the transpose yields 1 0 0 -~ 0 0W~ 00 0 . 0 00 0 0 00 0 [0 0 0 0] demonstrating that the rank of the transpose is also 3. Version 2.02  Subsection PD.DFS Dimension of Four Subspaces 361 Subsection DFS Dimension of Four Subspaces That the rank of a matrix equals the rank of its transpose is a fundamental and surprising result. However, applying Theorem FS [263] we can easily determine the dimension of all four fundamental subspaces associated with a matrix. Theorem DFS Dimensions of Four Subspaces Suppose that A is an m x n matrix, and B is a row-equivalent matrix in reduced row-echelon form with r nonzero rows. Then 1. dim (P1(A)) = n - r 2. dim (C(A)) = r 3. dim (R (A)) = r 4. dim ([(A)) = m - r Proof If A row-reduces to a matrix in reduced row-echelon form with r nonzero rows, then the matrix C of extended echelon form (Definition EEF [261]) will be an r x n matrix in reduced row-echelon form with no zero rows and r pivot columns (Theorem PEEF [262]). Similarly, the matrix L of extended echelon form (Definition EEF [261]) will be an m - r x m matrix in reduced row-echelon form with no zero rows and m - r pivot columns (Theorem PEEF [262]). dim (N(A)) dim (C(A)) dim (N(C)) n - r dim (N(L)) m-(m-r) r dim (R(C)) r dim (R(L)) m - r Theorem FS [263] Theorem BNS [139] Theorem FS [263] Theorem BNS [139] dim (R(A)) Theorem FS [263] Theorem BRS [245] dim (G(A)) Theorem Theorem FS [263] BRS [245] 0 There are many different ways to state and prove this result, and indeed, the equality of the dimensions of the column space and row space is just a slight expansion of Theorem RMRT [359]. However, we have restricted our techniques to applying Theorem FS [263] and then determining dimensions with bases provided by Theorem BNS [139] and Theorem BRS [245]. This provides an appealing symmetry to the results and the proof. Version 2.02  Subsection PD.DS Direct Sums 362 Subsection DS Direct Sums Some of the more advanced ideas in linear algebra are closely related to decomposing (Technique DC [694]) vector spaces into direct sums of subspaces. With our previous results about bases and dimension, now is the right time to state and collect a few results about direct sums, though we will only mention these results in passing until we get to Section NLT [610], where they will get a heavy workout. A direct sum is a short-hand way to describe the relationship between a vector space and two, or more, of its subspaces. As we will use it, it is not a way to construct new vector spaces from others. 
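Before taking up direct sums, note that the dimension counts of Theorem DFS are easy to check by machine. The sketch below assumes Python with NumPy; the matrix A is an arbitrary example of our own, and matrix_rank uses a floating-point tolerance in place of exact row reduction.

    import numpy as np

    A = np.array([[1, 2, 0, 1],
                  [2, 4, 1, 3],
                  [1, 2, 1, 2]])   # an arbitrary 3 x 4 example
    m, n = A.shape
    r = np.linalg.matrix_rank(A)   # r, the number of nonzero rows after row-reducing

    # The four dimensions predicted by Theorem DFS:
    print("dim N(A) =", n - r)     # null space
    print("dim C(A) =", r)         # column space
    print("dim R(A) =", r)         # row space
    print("dim L(A) =", m - r)     # left null space

For this particular A the rank is 2, so the script prints 2, 2, 2 and 1.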
Definition DS Direct Sum Suppose that V is a vector space with two subspaces U and W such that for every v E V, 1. There exists vectors u E U, w E W such that v = u+ w 2. If v = ui + wi and v = u2 + w2 where ui, u2 E U, w1, w2 E W then ui = u2 and wi1= w2. Then V is the direct sum of U and W and we write V = U ( W. (This definition contains Notation DS.) A Informally, when we say V is the direct sum of the subspaces U and W, we are saying that each vector of V can always be expressed as the sum of a vector from U and a vector from W, and this expression can only be accomplished in one way (i.e. uniquely). This statement should begin to feel something like our definitions of nonsingular matrices (Definition NM [71]) and linear independence (Definition LI [308]). It should not be hard to imagine the natural extension of this definition to the case of more than two subspaces. Could you provide a careful definition of V = U1 @ U2 @ U3 e... Um (Exercise PD.M50 [366])? Example SDS Simple direct sum In C3, define 3 -1 2 v1=2V2= 2 v3= 1 5 1 _-2- Then C3 = ({vi, v2}) ( ({v3}). This statement derives from the fact that B = {vi, v2, v3} is basis for C3. The spanning property of B yields the decomposition of any vector into a sum of vectors from the two subspaces, and the linear independence of B yields the uniqueness of the decomposition. We will illustrate these claims with a numerical example. Choose v =[1]. Then v =2vi + (-2)v2 + lv3 =(2vi + (-2)v2) + (lv3) where we have added parentheses for emphasis. Obviously 1v3 E ({v3}), while 2v1 + (-2)v2 E ({vi, v2}). Theorem VRRB [317] provides the uniqueness of the scalars in these linear combinations. Example SDS [361] is easy to generalize into a theorem. Theorem DSFB Direct Sum From a Basis Suppose that V is a vector space with a basis B = {vi, v2, v3, ..., vn}. Define U = ({vi, v2, v3, ..., Vm}) W = ({vm+i, Vm+2, Vm+3, ..., Vn}) Version 2.02  Subsection PD.DS Direct Sums 363 Then V= U(eW. D Proof Choose any vector v E V. Then by Theorem VRRB [317] there are unique scalars, al, a2, a3, ..., an such that v = aiv1 + a2v2 + asys + --. + anvn = (aivi + a2v2 + a3v3 + ... + amvm) + (am+ivm+1 + am+2vm+2 + am+3vm+3 + ... + anvn) = u+ w where we have implicitly defined u and w in the last line. It should be clear that u E U, and similarly, w E W (and not simply by the choice of their names). Suppose we had another decomposition of v, say v = u* + w*. Then we could write u* as a linear combination of vi through vm, say using scalars bi, b2, b3, ... , bm. And we could write w* as a linear combination of vm+1 through vn, say using scalars ci, c2, c3, ... , Cn-m. These two collections of scalars would then together give a linear combination of vi through vn that equals v. By the uniqueness of ai, a2, a3, ..., an, ai = bi for 1 < i < m and am+i = ci for 1 < i < n - m. From the equality of these scalars we conclude that u = u* and w = w*. So with both conditions of Definition DS [361] fulfilled we see that V= U W. U Given one subspace of a vector space, we can always find another subspace that will pair with the first to form a direct sum. The main idea of this theorem, and its proof, is the idea of extending a linearly independent subset into a basis with repeated applications of Theorem ELIS [355]. Theorem DSFOS Direct Sum From One Subspace Suppose that U is a subspace of the vector space V. Then there exists a subspace W of V such that V =UeW. D Proof If U = V, then choose W = {O}. Otherwise, choose a basis B = {vi, v2, v3, ..., vm} for U. 
Then since B is a linearly independent set, Theorem ELIS [355] tells us there is a vector vm+l in V, but not in U, such that B U {vm+1} is linearly independent. Define the subspace U1 = (B U {vm+i}). We can repeat this procedure, in the case were U1 # V, creating a new vector vm+2 in V, but not in U1, and a new subspace U2 = (B U {vm+1, vm+2 }). If we continue repeating this procedure, eventually, Uk = V for some k, and we can no longer apply Theorem ELIS [355]. No matter, in this case B U {vm+1, Vm+2, ..., Vm+k} is a linearly independent set that spans V, i.e. a basis for V. Define W = ({vm+i, vm+2, ... , Vm+k}). We now are exactly in position to apply Theorem DSFB [361] and see that V = U @ W. U There are several different ways to define a direct sum. Our next two theorems give equivalences (Technique E [690]) for direct sums, and therefore could have been employed as definitions. The first should further cement the notion that a direct sum has some connection with linear independence. Theorem DSZV Direct Sums and Zero Vectors Suppose U and W are subspaces of the vector space V. Then V= U e W if and only if 1. For every v C V, there exists vectors u C U, w C W such that v u u+ w. 2. Whenever 0 u u+ w with u C U, w C W then u= w =0. Proof The first condition is identical in the definition and the theorem, so we only need to establish the equivalence of the second conditions. Version 2.02  Subsection PD.DS Direct Sums 364 (-) Assume that V = U @ W, according to Definition DS [361]. By Property Z [280], 0 E V and 0 = 0 + 0. If we also assume that 0 = u + w, then the uniqueness of the decomposition gives u = 0 and w=0. ( ) Suppose that v c V, v = ui + w1 and v = u2 + w2 where ui, u2 E U, w1, w2 E W. Then 0 = v - v Property Al [280] (ui+wi) -((u2 +w2) = (ui - U2) + (wi - W2) Property AA [279] By Property AC [279], u1 - u2 E U and w1 - w2 E W. We can now apply our hypothesis, the second statement of the theorem, to conclude that u1-u2 =0 w1-w2=0 u11=1u2 Wi =W2 which establishes the uniqueness needed for the second condition of the definition. U Our second equivalence lends further credence to calling a direct sum a decomposition. The two subspaces of a direct sum have no (nontrivial) elements in common. Theorem DSZI Direct Sums and Zero Intersection Suppose U and W are subspaces of the vector space V. Then V = U ( W if and only if 1. For every v E V, there exists vectors u E U, w E W such that v = u+ w. 2. UnW = {o}. Proof The first condition is identical in the definition and the theorem, so we only need to establish the equivalence of the second conditions. (-) Assume that V = U @ W, according to Definition DS [361]. By Property Z [280] and Definition SI [685], {0} C U n W. To establish the opposite inclusion, suppose that x E U n W. Then, since x is an element of both U and W, we can write two decompositions of x as a vector from U plus a vector from W, x=x+0 x=0+x By the uniqueness of the decomposition, we see (twice) that x = 0 and U n W C {0}. Applying Definition SE [684], we have U nW = {O}. (e) Assume that U n W= {0}. And assume further that v C V is such that v =1ui + wi and V =1u2 + w2 where ui, 112 C U, wi, w2 C W. Define x =1ui - 112. then by Property AC [279], x C U. Also x = 11i - 112 - (v - wi) - (v - w2) =(v - v) - (wi - w2) So x c W by Property AC [279]. Thus, x c U 0 W = {0} (Definition SI [685]). So x = 0 and ui-u2=0 w2-w1=0 u = u2 w2 = wi Version 2.02  Subsection PD.DS Direct Sums 365 yielding the desired uniqueness of the second condition of the definition. 
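The uniqueness at the heart of Definition DS [361] can also be illustrated numerically, in the spirit of Example SDS [361]. The sketch below assumes Python with NumPy; the three vectors are illustrative choices of our own (with real entries for simplicity), not vectors taken from the text. Since they form a basis of C^3, Theorem DSFB [361] guarantees a direct sum, and the unique decomposition of a vector is found by solving a single linear system.

    import numpy as np

    # Hypothetical basis vectors of C^3: U = <{u1, u2}> and W = <{w1}>.
    u1 = np.array([1., 0., 1.])
    u2 = np.array([0., 1., 1.])
    w1 = np.array([1., 1., 0.])

    B = np.column_stack([u1, u2, w1])
    assert abs(np.linalg.det(B)) > 1e-10   # {u1, u2, w1} is a basis, so C^3 = U (+) W

    v = np.array([4., 1., 3.])
    a = np.linalg.solve(B, v)              # unique scalars, since B is nonsingular
    u = a[0] * u1 + a[1] * u2              # the piece of v lying in U
    w = a[2] * w1                          # the piece of v lying in W
    assert np.allclose(u + w, v)           # v = u + w, and this decomposition is unique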
U If the statement of Theorem DSZV [362] did not remind you of linear independence, the next theorem should establish the connection. Theorem DSLI Direct Sums and Linear Independence Suppose U and W are subspaces of the vector space V with V = U ( W. Suppose that R is a linearly independent subset of U and S is a linearly independent subset of W. Then RU S is a linearly independent subset of V. D Proof Let R = {ui, u2, u3, ..., uk} and S = {wi, w2, w3, ..., w}. Begin with a relation of linear dependence (Definition RLD [308]) on the set R U S using scalars al, a2, a3, ..., a and bi, b2, b3, ..., bf. Then, O=alul+a2u2+a3u3+---+akuk+biwi+b2w2+b3w3+---+bbwf = (aiui + a2u2 + a3u3 + - + akuk) + (biwi + b2w2 + b3w3 + ... + bews) = u+w where we have made an implicit definition of the vectors u E U, w E W. Applying Theorem DSZV [362] we conclude that u = aiu1 + a2u2 + a3u3s+ --. + akuk = 0 w = biwi + b2w2 + b3w3 + --. + bfwf = 0 Now the linear independence of R and S (individually) yields ai=a2=as=---= a=0 bi=b2=b3=---=b=0 Forced to acknowledge that only a trivial linear combination yields the zero vector, Definition LI [308] says the set R U S is linearly independent in V. U Our last theorem in this collection will go some ways towards explaining the word "sum" in the moniker "direct sum," while also partially explaining why these results appear in a section devoted to a discussion of dimension. Theorem DSD Direct Sums and Dimension Suppose U and W are subspaces of the vector space V with V = UeW. Then dim (V) = dim (U)+dim (W). Proof We will establish this equality of positive integers with two inequalities. We will need a basis of U (call it B) and a basis of W (call it C). First, note that B and C have sizes equal to the dimensions of the respective subspaces. The union of these two linearly independent sets, B U C will be linearly independent in V by Theorem DSLI [364]. Further, the two bases have no vectors in common by Theorem DSZI [363], since Bn0 C C {0} and the zero vector is never an element of a linearly independent set (Exercise LI.T10 [144]). So the size of the union is exactly the sum of the dimensions of U and W. By Theorem G [355] the size of B U C cannot exceed the dimension of V without being linearly dependent. These observations give us dim (U) +dim (W) <; dim (V). Grab any vector v E V. Then by Theorem DSZI [363] we can write v =u+w with u E U and w E W. Individually, we can write u as a linear combination of the basis elements in B, and similarly, we can write w as a linear combination of the basis elements in C, since the bases are spanning sets for their respective subspaces. These two sets of scalars will provide a linear combination of all of the vectors in B U C which Version 2.02  Subsection PD.READ Reading Questions 366 will equal v. The upshot of this is that B U C is a spanning set for V. By Theorem G [355], the size of B U C cannot be smaller than the dimension of V without failing to span V. These observations give us dim (U) + dim (W) > dim (V). U There is a certain appealling symmetry in the previous proof, where both linear independence and spanning properties of the bases are used, both of the first two conclusions of Theorem G [355] are employed, and we have quoted both of the two conditions of Theorem DSZI [363]. One final theorem tells us that we can successively decompose direct sums into sums of smaller and smaller subspaces. Theorem RDS Repeated Direct Sums Suppose V is a vector space with subspaces U and W with V = U ( W. 
Suppose that X and Y are subspaces of W with W = X e(Y. Then V = U e(X e(Y. D Proof Suppose that v E V. Then due to V = U ( W, there exist vectors u E U and w E W such that v = u+ w. Due to W = X e Y, there exist vectors x E X and y E Y such that w = x + y. All together, v =u+w=u+x+y which would be the first condition of a definition of a 3-way direct product. Now consider the uniqueness. Suppose that V=ui+Xi+y1 V=u2+X2+y2 Because x1 + yi E W, x2 + Y2 E W, and V = U ( W, we conclude that u1 = u2 X1+Y1= X2 + Y2 From the second equality, an application of W = X e Y yields the conclusions x1= x2 and y1 = y2. This establishes the uniqueness of the decomposition of v into a sum of vectors from U, X and Y. Remember that when we write V = UeW there always needs to be a "superspace," in this case V. The statement U ( W is meaningless. Writing V = U ( W is simply a shorthand for a somewhat complicated relationship between V, U and W, as described in the two conditions of Definition DS [361], or Theorem DSZV [362], or Theorem DSZI [363]. Theorem DSFB [361] and Theorem DSFOS [362] gives us sure-fire ways to build direct sums, while Theorem DSLI [364], Theorem DSD [364] and Theorem RDS [365] tell us interesting properties of direct sums. This subsection has been long on theorems and short on examples. If we were to use the term "lemma" we might have chosen to label some of these results as such, since they will be important tools in other proofs, but may not have much interest on their own (see Technique LC [696]). We will be referencing these results heavily in later sections, and will remind you then to come back for a second look. Subsection READ Reading Questions 1. Why does Theorem G [355] have the title it does? 2. What is so surprising about Theorem RMRT [359]? 3. Row-reduce the matrix A to reduced row-echelon form. Without any further computations, compute the dimensions of the four subspaces, Af(A), C(A), R(A) and [(A). ~1 -1 2 8 5 1 1 1 4 -1 4 0 2 -3 -8 -6 2 0 1 8 4 Version 2.02  Subsection PD.EXC Exercises 367 Subsection EXC Exercises C10 Example SVP4 [357] leaves several details for the reader to check. Verify these five claims. Contributed by Robert Beezer C40 Determine if the set T = {x2 - x + 5, 4x3 - 2 + 5x, 3x + 2} spans the vector space of polynomials with degree 4 or less, P4. (Compare the solution to this exercise with Solution LISS.C40 [322].) Contributed by Robert Beezer Solution [367] M50 Mimic Definition DS [361] and construct a reasonable definition of V = U1 ( U2 ( U3 e ... Urn. Contributed by Robert Beezer T05 Trivially, if U and V are two subspaces of W, then dim (U) = dim (V). Combine this fact, Theorem PSSD [358], and Theorem EDYES [358] all into one grand combined theorem. You might look to Theorem PIP [172] stylistic inspiration. (Notice this problem does not ask you to prove anything. It just asks you to roll up three theorems into one compact, logically equivalent statement.) Contributed by Robert Beezer T10 Prove the following theorem, which could be viewed as a reformulation of parts (3) and (4) of Theorem G [355], or more appropriately as a corollary of Theorem G [355] (Technique LC [696]). Suppose V is a vector space and S is a subset of V such that the number of vectors in S equals the dimension of V. Then S is linearly independent if and only if S spans V. Contributed by Robert Beezer T15 Suppose that A is an m x n matrix and let min(m, n) denote the minimum of m and n. Prove that r (A) < min(m, n). 
Contributed by Robert Beezer T20 Suppose that A is an m x n matrix and b E Ctm. Prove that the linear system [S(A, b) is consistent if and only if r (A) = r ([A b]). Contributed by Robert Beezer Solution [367] T25 Suppose that V is a vector space with finite dimension. Let W be any subspace of V. Prove that W has finite dimension. Contributed by Robert Beezer T33 Part of Exercise B.T50 [337] is the half of the proof where we assume the matrix A is nonsingular and prove that a set is basis. In Solution B.T50 [339] we proved directly that the set was both linearly independent and a spanning set. Shorten this part of the proof by applying Theorem G [355]. Be careful, there is one subtlety. Contributed by Robert Beezer Solution [367] T60 Suppose that W is a vector space with dimension 5, and U and V are subspaces of W, each of dimension 3. Prove that U n V contains a non-zero vector. State a more general result. Contributed by Joe Riegsecker Solution [367] Version 2.02  Subsection PD.SOL Solutions 368 Subsection SOL Solutions C40 Contributed by Robert Beezer Statement [366] The vector space P4 has dimension 5 by Theorem DP [345]. Since T contains only 3 vectors, and 3 < 5, Theorem G [355] tells us that T does not span P5. T20 Contributed by Robert Beezer Statement [366] (-) Suppose first that IS(A, b) is consistent. Then by Theorem CSCS [237], b E C(A). This means that C(A) = C([A | b]) and so it follows that r (A) = r ([A | b]). (<) Adding a column to a matrix will only increase the size of its column space, so in all cases, C(A) c C([A | b]). However, if we assume that r (A) = r ([A | b]), then by Theorem EDYES [358] we conclude that C(A) = C([A | b]). Then b E C([A | b]) = C(A) so by Theorem CSCS [237], [S(A, b) is consistent. T33 Contributed by Robert Beezer Statement [366] By Theorem DCM [345] we know that C" has dimension n. So by Theorem G [355] we need only establish that the set C is linearly independent or a spanning set. However, the hypotheses also require that C be of size n. We assumed that B = {x1, x2, x3, ..., xn} had size n, but there is no guarantee that C = {Axi, Ax2, Ax3, ..., Axn} will have size n. There could be some "collapsing" or "collisions." Suppose we establish that C is linearly independent. Then C must have n distinct elements or else we could fashion a nontrivial relation of linear dependence involving duplicate elements. If we instead to choose to prove that C is a spanning set, then we could establish the uniqueness of the elements of C quite easily. Suppose that Axe =Axe. Then A(x2 - x3) = Axe - Axe = 0 Since A is nonsingular, we conclude that x2 - x, = 0, or x = x3, contrary to our description of B. T60 Contributed by Robert Beezer Statement [366] Let {ui, u2, u3} and {vi, v2, v3} be bases for U and V (respectively). Then, the set {ui, u2, u3, vi, v2, v3} is linearly dependent, since Theorem G [355] says we cannot have 6 linearly independent vectors in a vector space of dimension 5. So we can assert that there is a non-trivial relation of linear dependence, aiui + a2u2 + asu3 + biv -+ b2v2 + b3v3 = 0 where ai, a2, a3 and bi, b2, b3 are not all zero. We can rearrange this equation as aiu1 + a2u2 + asus 3 bv - b2v2 - b3v3 This is an equality of two vectors, so we can give this common vector a name, say w, w =aiu1 + a2u2 + asus3 bv - b2v2 - b3v3 This is the desired non-zero vector, as we will now show. First, since w =aiu1 + a2u2 + asus, we can see that w C U. Similarly, w =-ii- b2v2 - b3v3, so w C V. This establishes that w C U n V (Definition SI [685]). 
Is w -f 0? Suppose not, in other words, suppose w =0. Then 0 =w =au1 + a2u2 + a3u3 Because {ui, u2, u3} is a basis for U, it is a linearly independent set and the relation of linear dependence above means we must conclude that ai1= a2 = a3= 0. By a similar process, we would conclude that Version 2.02  Subsection PD.SOL Solutions 369 bi = b2 = b3 = 0. But this is a contradiction since ai, a2, a3, b1, b2, b3 were chosen so that some were nonzero. So w # 0. How does this generalize? All we really needed was the original relation of linear dependence that resulted because we had "too many" vectors in W. A more general statement would be: Suppose that W is a vector space with dimension n, U is a subspace of dimension p and V is a subspace of dimension q. If p + q> n, then U n V contains a non-zero vector. Version 2.02  Annotated Acronyms PD.VS Vector Spaces 370 Annotated Acronyms VS Vector Spaces Definition VS [279] The most fundamental object in linear algebra is a vector space. Or else the most fundamental object is a vector, and a vector space is important because it is a collection of vectors. Either way, Definition VS [279] is critical. All of our remaining theorems that assume we are working with a vector space can trace their lineage back to this definition. Theorem TSS [293] Check all ten properties of a vector space (Definition VS [279]) can get tedious. But if you have a subset of a known vector space, then Theorem TSS [293] considerably shortens the verification. Also, proofs of closure (the last trwo conditions in Theorem TSS [293]) are a good way tp practice a common style of proof. Theorem VRRB [317] The proof of uniqueness in this theorem is a very typical employment of the hypothesis of linear inde- pendence. But that's not why we mention it here. This theorem is critical to our first section about representations, Section VR [530], via Definition VR [530]. Theorem CNMB [330] Having just defined a basis (Definition B [325]) we discover that the columns of a nonsingular matrix form a basis of C". Much of what we know about nonsingular matrices is either contained in this statement, or much more evident because of it. Theorem SSLD [341] This theorem is a key juncture in our development of linear algebra. You have probably already realized how useful Theorem G [355] is. All four parts of Theorem G [355] have proofs that finish with an application of Theorem SSLD [341]. Theorem RPNC [348] This simple relationship between the rank, nullity and number of columns of a matrix might be surprising. But in simplicity comes power, as this theorem can be very useful. It will be generalized in the very last theorem of Chapter LT [452], Theorem RPNDD [517]. Theorem G [355] A whimsical title, but the intent is to make sure you don't miss this one. Much of the interaction between bases, dimension, linear independence and spanning is captured in this theorem. Theorem RMRT [359] This one is a real surprise. Why should a matrix, and its transpose, both row-reduce to the same number of non-zero rows? Version 2.02  Chapter D Determinants 0 0 The determinant is a function that takes a square matrix as an input and produces a scalar as an output. So unlike a vector space, it is not an algebraic structure. However, it has many beneficial properties for studying vector spaces, matrices and systems of equations, so it is hard to ignore (though some have tried). While the properties of a determinant can be very useful, they are also complicated to prove. 
Section DM Determinant of a Matrix U.- 0 First, a slight detour, as we introduce elementary matrices, which will bring us back to the beginning of the course and our old friend, row operations. Subsection EM Elementary Matrices Elementary matrices are very simple, as you might have suspected from their name. Their purpose is to effect row operations (Definition RO [28]) on a matrix through matrix multiplication (Definition MM [197]). Their definitions look more complicated than they really are, so be sure to read ahead after you read the definition for some explanations and an example. Definition ELEM Elementary Matrices 1. For i # j, Ezj is the square matrix of size n with 0 1 [Ei,j]k 0= 1 0 1 k # ik :#j,2=k k: i k # j k k k i,.=ij j~f$i 371  Subsection DM.EM Elementary Matrices 372 2. For a # 0, E (a) is the square matrix of size n with 0 k #i,£# k [Ei(a)]g=1 k~ij =k a k=i,F=i 3. For i # j, Ei~J (a) is the square matrix of size n with 0 k# j, £ # k 1 k #j, E= k [Ei, (a)]kg=0 k = j, #i, E #j 1 k = j, E= j o k = j, E= i (This definition contains Notation ELEM.) A Again, these matrices are not as complicated as they appear, since they are mostly perturbations of the n x n identity matrix (Definition IM [72]). Eij is the identity matrix with rows (or columns) i and j trading places, Ei (a) is the identity matrix where the diagonal entry in row i and column i has been replaced by a, and Ei (a) is the identity matrix where the entry in row j and column i has been replaced by a. (Yes, those subscripts look backwards in the description of Ei (a)). Notice that our notation makes no reference to the size of the elementary matrix, since this will always be apparent from the context, or unimportant. The raison d'8tre for elementary matrices is to "do" row operations on matrices with matrix multi- plication. So here is an example where we will both see some elementary matrices and see how they can accomplish row operations. Example EMRO Elementary matrices and row operations We will perform a sequence of row operations (Definition RO [28]) on the 3 x 4 matrix A, while also multiplying the matrix on the left by the appropriate 3 x 3 elementary matrix. 2 1 3 1 A 1 3 2 4 5 0 3 1 R1-R3: 1[324 E1,3:0 0 1 3 2 4>=[ 2 4] 2R3+R1: 2648] E3,1(2): 0 2 The next three theorems establish that each elementary matrix effects a row operation via matrix multiplication. Version 2.02  Subsection DM.EM Elementary Matrices 373 Theorem EMDRO Elementary Matrices Do Row Operations Suppose that A is an m x n matrix, and B is a matrix of the same size that is obtained from A by a single row operation (Definition RO [28]). Then there is an elementary matrix of size m that will convert A to B via matrix multiplication on the left. More precisely, 1. If the row operation swaps rows i and j, then B = E A. 2. If the row operation multiplies row i by a, then B = E (a) A. 3. If the row operation multiplies row i by a and adds the result to row j, then B = Ez (a) A. Proof In each of the three conclusions, performing the row operation on A will create the matrix B where only one or two rows will have changed. So we will establish the equality of the matrix entries row by row, first for the unchanged rows, then for the changed rows, showing in each case that the result of the matrix product is the same as the result of the row operation. Here we go. 
Row k of the product Eij A, where k # i, k # j, is unchanged from A, n [Ei~jA] E [E2,jl] [A],g p=1 n [E],jlkk [A]k +( [E ,]k [A]pf p=1 pzk n = 1 [Alke + 0 [A]Pe p=1 p:Ak [A] W Row i of the product EijA is row j of A, Theorem EMP [198] Definition ELEM [370] n [E1, A] = E[E , ] p[A],g p=1 n = [E J] A[] jg +(E [Ei,49] pA]pf p=1 pzj n = 1 [A]jf + 50 [A]pp p=1 pzj Theorem EMP [198] Definition ELEM [370] Row j of the product Ei A is row i of A, n [Eij ]j A]E [Eijg] p[A],g p=1 n = [E,]1 A]iP+(E [Ei,4]j A]p p=1 Theorem EMP [198] Version 2.02  Subsection DM.EM Elementary Matrices 374 n 1 [A]ze + S0 [A]pe p=1 p#i Definition ELEM [370] So the matrix product E2, A is the same as the row operation that swaps rows i and j. Row k of the product E (a) A, where k # i, is unchanged from A, [EZ (ce) A] e n S [EZ (a)lkp [A]pe p=1 n [EZ (a)] kk [A + 5 [E (a)]kp [A]pe p=1 pzk n 1 [A] W + 5 0 [A]pe p=1 p:Ak Theorem EMP [198] Definition ELEM [370] = [A] W Row i of the product E (a) A is a times row i of A, [EZ (ce) Adze n S [EZ (az)]ip [A]Pe p=1 n [EZ (a)]2 [A]ig + 5 [ E (a)]jp [A]pe p=1 p#i n a [A] + 50 [A]pe p=1 p#i a [ A]Ze Theorem EMP [198] Definition ELEM [370] So the matrix product E (a) A is the same as the row operation that swaps multiplies row i by a. Row k of the product E2, (a) A, where k # j, is unchanged from A, [E1,j (a) A] e n 5 [E (a)] [A]pP p=1 n [E (a)]kk [A]k +( [E (a)]k [A]pP p=1 pzk n 1 [A]kg + 0 [A]pe p=1 p[Ak [ A] W Theorem EMP [198] Definition ELEM [370] Row j of the product EJ (a) A, is a times row i of A and then added to row j of A, [E2J (a) A]je p=1 Theorem EMP [198] Version 2.02  Subsection DM.DD Definition of the Determinant 375 [Ei, (o)]jj [A]gg + [Ei, (a)] i [A]ir + E [Ei,j (o)] , [A]p, p=1 p:Aj,i n =1 [A]gf + a [A]i + >3 0 [A]pf Definition ELEM [370] p=1 poj,i =[A]+ a [A] So the matrix product Ei, (a) A is the same as the row operation that multiplies row i by a and adds the result to row j. U Later in this section we will need two facts about elementary matrices. Theorem EMN Elementary Matrices are Nonsingular If E is an elementary matrix, then E is nonsingular. D Proof We show that we can row-reduce each elementary matrix to the identity matrix. Given an elementary matrix of the form Ei,j, perform the row operation that swaps row j with row i. Given an elementary matrix of the form Ei (a), with a # 0, perform the row operation that multiplies row i by 1/a. Given an elementary matrix of the form Ei, (a), with a # 0, perform the row operation that multiplies row i by -a and adds it to row j. In each case, the result of the single row operation is the identity matrix. So each elementary matrix is row-equivalent to the identity matrix, and by Theorem NMRRI [72] is nonsingular. Notice that we have now made use of the nonzero restriction on a in the definition of Ei (a). One more key property of elementary matrices. Theorem NMPEM Nonsingular Matrices are Products of Elementary Matrices Suppose that A is a nonsingular matrix. Then there exists elementary matrices ELi, E2, E3, ..., Et so that A = E1 E2 E3 . . . E. Proof Since A is nonsingular, it is row-equivalent to the identity matrix by Theorem NMRRI [72], so there is a sequence of t row operations that converts I to A. For each of these row operations, form the as- sociated elementary matrix from Theorem EMDRO [372] and denote these matrices by Eli, E2, E3, ..., Et. Applying the first row operation to I yields the matrix E1I. The second row operation yields E2(El1I), and the third row operation creates E3E2E1I. 
The result of the full sequence of t row operations will yield A, so Other than the cosmetic matter of re-indexing these elementary matrices in the opposite order, this is the desired result.U Subsection DD Definition of the Determinant We'll now turn to the definition of a determinant and do some sample computations. The definition of the determinant function is recursive, that is, the determinant of a large matrix is defined in terms of the determinant of smaller matrices. To this end, we will make a few definitions. Version 2.02  Subsection DM.DD Definition of the Determinant 376 Definition SM SubMatrix Suppose that A is an m x n matrix. Then the submatrix A (ilj) is the (m - 1) x (n - 1) matrix obtained from A by removing row i and column j. (This definition contains Notation SM.) A Example SS Some submatrices For the matrix 1 -2 3 9 A 4 -2 0 1 3 5 2 1 we have the submatrices 1-2 9(3 -2[ 3 Definition DM Determinant of a Matrix Suppose A is a square matrix. Then its determinant, det (A) = |A l,is an element of C defined recursively by: If A is a 1 x 1 matrix, then det (A) = [A]11. If A is a matrix of size n with n> 2, then det (A) = [A]11 det (A (1|1)) - [A]12 det (A (1|2)) + [A]13 det (A (1|3)) - [A]14det (A(1|4)) + - - - + (-1)n+1[A]1,det (A (1|n)) (This definition contains Notation DM.) A So to compute the determinant of a 5 x 5 matrix we must build 5 submatrices, each of size 4. To compute the determinants of each the 4 x 4 matrices we need to create 4 submatrices each, these now of size 3 and so on. To compute the determinant of a 10 x 10 matrix would require computing the determinant of 10! = 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2= 3, 628, 800 1 x 1 matrices. Fortunately there are better ways. However this does suggest an excellent computer programming exercise to write a recursive procedure to compute a determinant. Let's compute the determinant of a reasonable sized matrix by hand. Example D33M Determinant of a 3 x 3 matrix Suppose that we have the 3 x 3 matrix A4 21 -1 Then 3 2 -1 det(A)=|Al= 4 1 6 -3 -1 2 Version 2.02  Subsection DM.CD Computing Determinants 377 1 6 4 6 4 1 =3 -2 +(-1) -1 2 -3 2 -3 -1 = 3(1 2 -6 -1 ) -2(4 2 -6 -3) - (4 -1 -1 -3I) =3 (1(2) - 6(-1)) - 2 (4(2) - 6(-3)) - (4(-1) - 1(-3)) 24- 52+ 1 =-27 In practice it is a bit silly to decompose a 2 x 2 matrix down into a couple of 1 x 1 matrices and then compute the exceedingly easy determinant of these puny matrices. So here is a simple theorem. Theorem DMST Determinant of Matrices of Size Two Suppose that A =a . Then det (A) = ad -bc Proof Applying Definition DM [375], a b c d =a d -bc =ad - bc Do you recall seeing the expression ad - bc before? (Hint: Theorem TTMI [214]) Subsection CD Computing Determinants There are a variety of ways to compute the determinant. We will establish first that we can choose to mimic our definition of the determinant, but by using matrix entries and submatrices based on a row other than the first one. Theorem DER Determinant Expansion about Rows Suppose that A is a square matrix of size n. Then det (A) (-1)i+1 [A]1 det (A (il1)) + (-1)i+2 [A]i2 det (A (il2)) + (-1)i+3 [A]is det (A (il3)) + - - + (-1)i~n [A]g, det (A (ilnt)) 1 < i 1. Given the recursive definition of the determinant, it should be no surprise that we will use induction for this proof (Technique I [694]). When nr= 1, there is nothing to prove since there is but one row. 
When n = 2, we just examine expansion about the second row, (-1)2±1 [A]21 det (A (2|1)) + (-1)2±2 [A]22 det (A (2|2)) - [A]21 [A]12 + [A]22 [A]11 Definition DM [375] - [A]11 [A]22 - [A]12 [A]21 = det (A) Theorem DMST [376] Version 2.02  Subsection DM.CD Computing Determinants 378 So the theorem is true for matrices of size n =1 and n = 2. Now assume the result is true for all matrices of size n - 1 as we derive an expression for expansion about row i for a matrix of size n. We will abuse our notation for a submatrix slightly, so A (i1, i2|ji, j2) will denote the matrix formed by removing rows ii and i2, along with removing columns ji and j2. Also, as we take a determinant of a submatrix, we will need to "jump up" the index of summation partway through as we "skip over" a missing column. To do this smoothly we will set N0w< j Now, n det (A) Z (-1)1+j [A]1j det (A (1|j)) j=1 n > 3(-1)1+j [A](iA1i- -et [A] det (A (1, 1iijf))) j=1 13Edet (At) n i=1 Version 2.02  Subsection DM.CD Computing Determinants 379 1 n - 1 Z(-)i+j [At] j det (At (ilj)) i=1 j=1 1 n r - S 5(-1)i+j [A]1 det (At (ij))t i=1 j=1 1 n - (1)'+j [A] j det ((A (jli)) i=1 j=1 1 n 2+j - 2 (-1i~j[A] j det (A (jli)) 1 n - (-1)J+i [A]j det (A (jli)) - >det (A) j=1 det (A) Theorem DER [376] Definition TM [185] Definition TM [185] Induction Hypothesis Property CACN [680] Theorem DER [376] 0 Now we can easily get the result that a determinant can be computed by expansion about any column as well. Theorem DEC Determinant Expansion about Columns Suppose that A is a square matrix of size n. Then det (A) (-1)1+J [A]1, det (A (1|j)) + (-1)2+j [A]2j det (A (2|j)) + (-i)3+3 [A]3j det (A (3|j)) + ... + (-1)n3j [A] j det (A(rj)) 1 j n which is known as expansion about column j. Proof D- det (A) = det (At) = (-1)j+i [At] det (At (jli)) 21 = (-1)j+i [At]j, det (A (ilj)) i~1 n 21 = (-1)i+j [A] i det (A (ilj)) i=1 Theorem DT [377] Theorem DER [376] Definition TM [185] Theorem DT [377] Definition TM [185] 0 That the determinant of an n x n matrix can be computed in 2n different (albeit similar) ways is nothing short of remarkable. For the doubters among us, we will do an example, computing a 4 x 4 matrix in two different ways. Version 2.02  Subsection DM.CD Computing Determinants 380 Example TCSD Two computations, same determinant Let -2 3 9 -2 Tat1 3 4 1 Then expanding about the fourth row (Theorem DER 0 0 -2 2 [376] 1 1 -1 6] with i: = 4) yields, 3 |AI = (4)(-1)4+1 -2 0 0 1 1 + (1) (-1)4+2 -2 9 3 -2-1 1 -2 3 1- + (2)(-1)4+3 9 -2 1 + (6)(-1)4+4 1 3 -1 (-4)(10) + (1)(-22) + (-2)(61) + 6(46) = 92 0 0 -2 -2 9 1 1 1 -1 3 0 -2 0 3 -2 while expanding about column 3 (Theorem DEC [378] with j = 3) gives 9 -2 1 -2 3 |AI - (0)(-1)1±3 1 3 -1 + (0) (-1)2+3 1 3 4 1 6 4 1 -2 3 1 -2 (-2)(-1)3+3 9 -2 1 + (2)(-1)4+3 9 4 1 6 1 = 0 + 0 + (-2)(-107) + (-2)(61) = 92 1 -1+ 6 3 1 -2 1 3 -1 Notice how much easier the second computation was. By choosing to expand about the third column, we have two entries that are zero, so two 3 x 3 determinants need not be computed at all! When a matrix has all zeros above (or below) the diagonal, exploiting the zeros by expanding about the proper row or column makes computing a determinant insanely easy. Example DUTM Determinant of an upper triangular matrix Suppose that 2 3 -1 3 3 0 -1 5 2 -1 T 0 0 3 9 2 0 0 0 -1 3 0 0 0 0 5 We will compute the determinant of this 5 x 5 matrix by consistently expanding about the first column for each submatrix that arises and does not have a zero entry multiplying it. 2 0 det(T)= 0 0 0 3 -1 0 0 0 c- e. 
C C -1 5 3 0 0 -1 0 0 0 3 2 9 -1 0 5 3 0 0 3 -1 2 3 5 2 9 -1 0 2(-1)i±1 -1 2 3 5 Version 2.02  Subsection DM.READ Reading Questions 381 3 9 2 =2(-1) (-1)1+1 0 -1 3 0 0 5 =2(1) (3) (-1) ( -1 3 0 5 = 2(-1)(3)(-1)(5) = 30 If you consult other texts in your study of determinants, you may run into the terms "minor" and "cofactor," especially in a discussion centered on expansion about rows and columns. We've chosen not to make these definitions formally since we've been able to get along without them. However, informally, a minor is a determinant of a submatrix, specifically det (A (ilj)) and is usually referenced as the minor of [A] 3. A cofactor is a signed minor, specifically the cofactor of [A]2, is (-1)i+j det (A (ij)). Subsection READ Reading Questions 1. Construct the elementary matrix that will effect the row operation -6R2 + R3 on a 4 x 7 matrix. 2. Compute the determinant of the matrix 2 3 -1 3 8 2 4 -1 -3_ 3. Compute the determinant of the matrix 3 9 -2 4 2 0 1 4 -2 7 0 0 -2 5 2 0 0 0 -1 6 0 0 0 0 4 Version 2.02  Subsection DM.EXC Exercises 382 Subsection EXC Exercises C24 Doing the computations by hand, find the determinant of the matrix below. --2 3 -2 -4 -2 1 2 4 2 Contributed by Robert Beezer Solution [382] C25 Doing the computations by hand, find the determinant of the matrix below. 3 -1 4 2 5 1 2 0 6 Contributed by Robert Beezer Solution [382] C26 Doing the computations by hand, find the determinant of the matrix A. 2 0 3 2 A= 5 1 2 4 3 0 1 2 _5 3 2 1_ Contributed by Robert Beezer Solution [382] Version 2.02  Subsection DM.SOL Solutions 383 Subsection SOL Solutions C24 Contributed by Robert Beezer Statement [381] We'll expand about the first row since there are no zeros to exploit, -2 -4 2 3 -2 4 -2 -2 1 -4 1 - -2 1 =(-2) 4 2 +(-1)(3) 2 2 + (-2)2 4 2 = (-2)((-2)(2) - 1(4)) + (-3)((-4)(2) - 1(2)) + (-2)((-4)(4) - (-2)(2)) = (-2)(-8) + (-3)(-10) + (-2)(-12) = 70 C25 Contributed by Robert Beezer Statement [381] We can expand about any row or column, so the zero entry in the middle of the last row is attractive. Let's expand about column 2. By Theorem DER [376] and Theorem DEC [378] you will get the same result by expanding about a different row or column. We will use Theorem DMST [376] twice. 3 -1 4 2 5 1 (-1)(-1)1+2 + (5)(-1)2+2 26+ (0)(-1)332 2 0 6 = (1)(10) + (5)(10) + 0 = 60 C26 Contributed by Robert Beezer Statement [381] With two zeros in column 2, we choose to expand about that column (Theorem DEC [378]), 2 0 3 2 5 12 4 det (A)=3 0 1 2 5 3 2 1 5 2 4 2 3 2 2 3 2 2 3 2 = 0(-1) 3 1 2 + 1(1) 3 1 2 +0(-1) 5 2 4+ 3(1) 5 2 4 5 2 1 5 2 1 5 2 1 3 1 2 - (1) (2(1(1) - 2(2)) - 3(3(1) - 5(2)) + 2(3(2) - 5(1))) + (3) (2(2(2) - 4(1)) - 3(5(2) - 4(3)) + 2(5(1) - 3(2))) = (-6+21+2)+(3)(0+6-2) = 29 Version 2.02  Section PDM Properties of Determinants of Matrices 384 Section PDM Properties of Determinants of Matrices We have seen how to compute the determinant of a matrix, and the incredible fact that we can perform expansion about any row or column to make this computation. In this largely theoretical section, we will state and prove several more intriguing properties about determinants. Our main goal will be the two results in Theorem SMZD [389] and Theorem DRMM [391], but more specifically, we will see how the value of a determinant will allow us to gain insight into the various properties of a square matrix. Subsection DRO Determinants and Row Operations We start easy with a straightforward theorem whose proof presages the style of subsequent proofs in this subsection. 
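First, though, a brief aside. Definition DM [375] suggested that writing a recursive procedure to compute a determinant makes an excellent programming exercise. Here is one minimal sketch in Python; the function names are our own, the procedure expands about the first row exactly as in Definition DM [375], and it is meant only as an illustration, since the recursion is far too slow for matrices of any size.

    def submatrix(A, i, j):
        # A(i|j): remove row i and column j (Definition SM), rows and columns numbered from 0
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

    def det(A):
        # Expansion about the first row, as in Definition DM
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
                   for j in range(len(A)))

    # The matrix of Example D33M; this should print -27
    print(det([[ 3,  2, -1],
               [ 4,  1,  6],
               [-3, -1,  2]]))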
Theorem DZRC Determinant with Zero Row or Column Suppose that A is a square matrix with a row where every entry is zero, or a column where every entry is zero. Then det (A) = 0. D Proof Suppose that A is a square matrix of size n and row i has every entry equal to zero. We compute det (A) via expansion about row i. n det (A) =((1)i+j [A]j3 det (A (ilj)) Theorem DER [376] j=1 n = (-1)'+j 0 det (A (ilj)) Row i is zeros j=1 n =SE0= 0 j=1 The proof for the case of a zero column is entirely similar, or could be derived from an application of Theorem DT [377] employing the transpose of the matrix. U Theorem DRCS Determinant for Row or Column Swap Suppose that A is a square matrix. Let B be the square matrix obtained from A by interchanging the location of two rows, or interchanging the location of two columns. Then det (B) =- det (A). D Proof Begin with the special case where A is a square matrix of size n~ and we form B by swapping adjacent rows i and i + 1 for some 1 < i < n~ - 1. Notice that the assumption about swapping adjacent rows means that B (i + 1|j) = A (ilj) for all 1 j 1, det (In) = 1. D- Proof It may be overkill, but this is a good situation to run through a proof by induction on n (Technique I [694]). Is the result true when n = 1? Yes, det (Ii) [Ii]11 Definition DM [375] = 1 Definition IM [72] Now assume the theorem is true for the identity matrix of size n - 1 and investigate the determinant of the identity matrix of size n with expansion about row 1, det (In) = (-1)1+3 [Ih]l det (In (1|j)) j=1 =Z(-1)1)1[[I']]t det (In (1|1)) n + E (-1) 1+j [In] l det (In (1|j)) j=2 n = 1 det (In_1) + E(-1)1+j 0 det (In (1|j)) j=2 n = 1(1) +E 0 = 1 j=2 Definition DM [375] Definition IM [72] Induction Hypothesis 0 Theorem DEM Determinants of Elementary Matrices For the three possible versions of an elementary matrix (Definition ELEM [370]) we have the determinants, 1. det (E1,) 1 2. det (Ei (a)) = a 3. det (Eij (a)) = I D- Proof Swapping rows i and j of the identity matrix will create E2,j (Definition ELEM [370]), so det (Ei,j) - det (In) = -1 Theorem DRCS [383] Theorem DIM [387] Multiplying row i of the identity matrix by a will create E (a) (Definition ELEM [370]), so det (E1 (a)) = a det (In) Theorem DRCM [384] Theorem DIM [387] Version 2.02  Subsection PDM.DNMMM Determinants, Nonsingular Matrices, Matrix Multiplication 390 Multiplying row i of the identity matrix by a and adding to row j will create Ei (a) j (Definition ELEM [370]), so det (EZ (a) j) = det (In) =1 Theorem DRCMA [385] Theorem DIM [387] 0 Theorem DEMMM Determinants, Elementary Matrices, Matrix Multiplication Suppose that A is a square matrix of size n and E is any elementary matrix of size n. Then det (EA) =det (E) det (A) Proof The proof procedes in three parts, one for each type of elementary matrix, with each part very similar to the other two. First, let B be the matrix obtained from A by swapping rows i and j, det (EA) = det (B) = - det (A) = det (Ei,) det (A) Theorem EMDRO [372] Theorem DRCS [383] Theorem DEM [388] Second, let B be the matrix obtained from A by multiplying row i by a, det (Ei (a) A) det (AB) = a det (A) = det (EZ (a)) det (A) Theorem EMDRO [372] Theorem DRCM [384] Theorem DEM [388] Third, let B be the matrix obtained from A by multiplying row i by a and adding to row j, det (Ei,j (a) A) = det (B) = det (A) = det (Ei,j (a)) det (A) Theorem EMDRO [372] Theorem DRCMA [385] Theorem DEM [388] Since the desired result holds for each variety of elementary matrix individually, we are done. 
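The three determinant values of Theorem DEM [388], and the product formula of Theorem DEMMM [389], are easy to confirm numerically. The following sketch assumes Python with NumPy; the three constructor functions are our own shorthand for the elementary matrices of Definition ELEM [370] (with rows and columns numbered from 0, unlike the text), and the test matrix A is arbitrary.

    import numpy as np

    def E_swap(n, i, j):           # E_{i,j}: identity with rows i and j traded
        E = np.eye(n)
        E[[i, j]] = E[[j, i]]
        return E

    def E_scale(n, i, a):          # E_i(a): identity with the (i, i) entry replaced by a
        E = np.eye(n)
        E[i, i] = a
        return E

    def E_add(n, i, j, a):         # E_{i,j}(a): identity with the entry in row j, column i set to a
        E = np.eye(n)
        E[j, i] = a
        return E

    n = 4
    print(np.linalg.det(E_swap(n, 0, 2)))      # -1, per Theorem DEM
    print(np.linalg.det(E_scale(n, 1, 5.0)))   #  5, per Theorem DEM
    print(np.linalg.det(E_add(n, 0, 3, 7.0)))  #  1, per Theorem DEM

    A = np.random.rand(n, n)                   # an arbitrary square matrix
    E = E_swap(n, 1, 3)
    # Theorem DEMMM: det(EA) = det(E) det(A)
    print(np.isclose(np.linalg.det(E @ A), np.linalg.det(E) * np.linalg.det(A)))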
0 Subsection DNMMM Determinants, Nonsingular Matrices, Matrix Multiplication If you asked someone with substantial experience working with matrices about the value of the determinant, they'd be likely to quote the following theorem as the first thing to come to mind. Theorem SMZD Singular Matrices have Zero Determinants Let A be a square matrix. Then A is singular if and only if det (A) = 0. D Proof Rather than jumping into the two halves of the equivalence, we first establish a few items. Let B be the unique square matrix that is row-equivalent to A and in reduced row-echelon form (Theorem REMEF [30], Theorem RREFU [32]). For each of the row operations that converts B into A, there is an Version 2.02  Subsection PDM.DNMMM Determinants, Nonsingular Matrices, Matrix Multiplication 391 elementary matrix Ei which effects the row operation by matrix multiplication (Theorem EMDRO [372]). Repeated applications of Theorem EMDRO [372] allow us to write A = EsEs_1... E2E1B Then det (A) = det (E8E8_1 .. . E2E1B) = det (Es) det (E8_1) ... det (E2) det (E1) det (B) Theorem DEMMM [389] From Theorem DEM [388] we can infer that the determinant of an elementary matrix is never zero (note the ban on a = 0 for E (a) in Definition ELEM [370]). So the product on the right is composed of nonzero scalars, with the possible exception of det (B). More precisely, we can argue that det (A) = 0 if and only if det (B) = 0. With this established, we can take up the two halves of the equivalence. (-) If A is singular, then by Theorem NMRRI [72], B cannot be the identity matrix. Because (1) the number of pivot columns is equal to the number of nonzero rows, (2) not every column is a pivot column, and (3) B is square, we see that B must have a zero row. By Theorem DZRC [383] the determinant of B is zero, and by the above, we conclude that the determinant of A is zero. (<) We will prove the contrapositive (Technique CP [691]). So assume A is nonsingular, then by Theorem NMRRI [72], B is the identity matrix and Theorem DIM [387] tells us that det (B) = 1 # 0. With the argument above, we conclude that the determinant of A is nonzero as well. For the case of 2 x 2 matrices you might compare the application of Theorem SMZD [389] with the combination of the results stated in Theorem DMST [376] and Theorem TTMI [214]. Example ZNDAB Zero and nonzero determinant, Archetypes A and B The coefficient matrix in Archetype A [702] has a zero determinant (check this!) while the coefficient matrix Archetype B [707] has a nonzero determinant (check this, too). These matrices are singular and nonsingular, respectively. This is exactly what Theorem SMZD [389] says, and continues our list of contrasts between these two archetypes. Since Theorem SMZD [389] is an equivalence (Technique E [690]) we can expand on our growing list of equivalences about nonsingular matrices. The addition of the condition det (A) $ 0 is one of the best motivations for learning about determinants. Theorem NME7 Nonsingular Matrix Equivalences, Round 7 Suppose that A is a square matrix of size n. The following are equivalent. 1. A is nonsingular. 2. A row-reduces to the identity matrix. 3. The null space of A contains only the zero vector, P1(A) ={O}. 4. The linear system [S(A, b) has a unique solution for every possible choice of b. 5. The columns of A are a linearly independent set. 6. A is invertible. 7. The column space of A is C", C(A) = C'. 8. The columns of A are a basis for C". 9. The rank of A is n, r (A) = n. 
Version 2.02  Subsection PDM.DNMMM Determinants, Nonsingular Matrices, Matrix Multiplication 392 10. The nullity of A is zero, n (A) = 0. 11. The determinant of A is nonzero, det (A) $ 0. Proof Theorem SMZD [389] says A is singular if and only if det (A) = 0. If we negate each of these statements, we arrive at two contrapositives that we can combine as the equivalence, A is nonsingular if and only if det (A) $ 0. This allows us to add a new statement to the list found in Theorem NME6 [349]. Computationally, row-reducing a matrix is the most efficient way to determine if a matrix is nonsingular, though the effect of using division in a computer can lead to round-off errors that confuse small quantities with critical zero quantities. Conceptually, the determinant may seem the most efficient way to determine if a matrix is nonsingular. The definition of a determinant uses just addition, subtraction and multiplication, so division is never a problem. And the final test is easy: is the determinant zero or not? However, the number of operations involved in computing a determinant by the definition very quickly becomes so excessive as to be impractical. Now for the coup de grdce. We will generalize Theorem DEMMM [389] to the case of any two square matrices. You may recall thinking that matrix multiplication was defined in a needlessly complicated manner. For sure, the definition of a determinant seems even stranger. (Though Theorem SMZD [389] might be forcing you to reconsider.) Read the statement of the next theorem and contemplate how nicely matrix multiplication and determinants play with each other. Theorem DRMM Determinant Respects Matrix Multiplication Suppose that A and B are square matrices of the same size. Then det (AB) = det (A) det (B). Q Proof This proof is constructed in two cases. First, suppose that A is singular. Then det (A) = 0 by Theorem SMZD [389]. By the contrapositive of Theorem NPNT [226], AB is singular as well. So by a second application ofTheorem SMZD [389], det (AB) = 0. Putting it all together det (AB) = 0 = 0 det (B) = det (A) det (B) as desired. For the second case, suppose that A is nonsingular. By Theorem NMPEM [374] there are elementary matrices Li, £2, £3, . . ., Es such that A =E1E2E3 . .. Es. Then det (AB) =det (E1E2E3 . .. ESB) = det (L1) det (£2) det (£3) . .. det (Es) det (B) Theorem DEMMM [389] = det (£1£2£3 . .. £s) det (B) Theorem DEMMM [389] = det (A) det (B) It is amazing that matrix multiplication and the determinant interact this way. Might it also be true that det (A + B) = det (A) + det (B)? (See Exercise PDM.M30 [393].) Version 2.02  Subsection PDM.READ Reading Questions 393 Subsection READ Reading Questions 1. Consider the two matrices below, and suppose you already have computed det (A) det (B)? Why? -120. What is 0 4 -2 0 8 2 8 -4 3 -2 4 2 -4] 5 3 -3] 0 B= o -1 8 -4 8 2 3 2 4 -2 -4 -3 3 5_ 2. State the theorem that allows us to make yet another extension to our NMEx series of theorems. 3. What is amazing about the interaction between matrix multiplication and the determinant? Version 2.02  Subsection PDM.EXC Exercises 394 Subsection EXC Exercises C30 Each of the archetypes below is a system of equations with a square coefficient matrix, or is a square matrix itself. Compute the determinant of each matrix, noting how Theorem SMZD [389] indicates when the matrix is singular or nonsingular. 
Archetype A [702] Archetype B [707] Archetype F [724] Archetype K [746] Archetype L [750] Contributed by Robert Beezer M20 Construct a 3 x 3 nonsingular matrix and call it A. Then, for each entry of the matrix, compute the corresponding cofactor, and create a new 3 x 3 matrix full of these cofactors by placing the cofactor of an entry in the same location as the entry it was based on. Once complete, call this matrix C. Compute ACt. Any observations? Repeat with a new matrix, or perhaps with a 4 x 4 matrix. Contributed by Robert Beezer Solution [394] M30 Construct an example to show that the following statement is not true for all square matrices A and B of the same size: det (A + B) = det (A) + det (B). Contributed by Robert Beezer T10 Theorem NPNT [226] says that if the product of square matrices AB is nonsingular, then the individual matrices A and B are nonsingular also. Construct a new proof of this result making use of theorems about determinants of matrices. Contributed by Robert Beezer T15 Use Theorem DRCM [384] to prove Theorem DZRC [383] as a corollary. (See Technique LC [696].) Contributed by Robert Beezer T20 Suppose that A is a square matrix of size n and a E C is a scalar. Prove that det (oA) = an det (A). Contributed by Robert Beezer T25 Employ Theorem DT [377] to construct the second half of the proof of Theorem DRCM [384] (the portion about a multiple of a column). Contributed by Robert Beezer Version 2.02  Subsection PDM.SOL Solutions 395 Subsection SOL Solutions M20 Contributed by Robert Beezer Statement [393] The result of these computations should be a matrix with the value of det (A) in the diagonal entries and zeros elsewhere. The suggestion of using a nonsingular matrix was partially so that it was obvious that the value of the determinant appears on the diagonal. This result (which is true in general) provides a method for computing the inverse of a nonsingular matrix. Since ACt = det (A) In,, we can multiply by the reciprocal of the determinant (which is nonzero!) and the inverse of A (it exists!) to arrive at an expression for the matrix inverse: 1 A--1=d ACt det (A) Version 2.02  Annotated Acronyms PDM.D Determinants 396 Annotated Acronyms D Determinants Theorem EMDRO [372] The main purpose of elementary matrices is to provide a more formal foundation for row operations. With this theorem we can convert the notion of "doing a row operation" into the slightly more precise, and tractable, operation of matrix multiplication by an elementary matrix. The other big results in this chapter are made possible by this connection and our previous understanding of the behavior of matrix multiplication (such as results in Section MM [194]). Theorem DER [376] We define the determinant by expansion about the first row and then prove you can expand about any row (and with Theorem DEC [378], about any column). Amazing. If the determinant seems contrived, these results might begin to convince you that maybe something interesting is going on. Theorem DRMM [391] Theorem EMDRO [372] connects elementary matrices with matrix multiplication. Now we connect deter- minants with matrix multiplication. If you thought the definition of matrix multiplication (as exemplified by Theorem EMP [198]) was as outlandish as the definition of the determinant, then no more. They seem to play together quite nicely. Theorem SMZD [389] This theorem provides a simple test for nonsingularity, even though it is stated and titled as a theorem about singularity. 
It'll be helpful, especially in concert with Theorem DRMM [391], in establishing upcoming results about nonsingular matrices or creating alternative proofs of earlier results. You might even use this theorem as an indicator of how often a matrix is singular. Create a square matrix at random what are the odds it is singular? This theorem says the determinant has to be zero, which we might suspect is a rare occurrence. Of course, we have to be a lot more careful about words like "random," "odds," and "rare" if we want precise answers to this question. Version 2.02  Chapter E Eigenvalues When we have a square matrix of size n, A, and we multiply it by a vector x from C" to form the matrix- vector product (Definition MVP [194]), the result is another vector in C". So we can adopt a functional view of this computation -the act of multiplying by a square matrix is a function that converts one vector (x) into another one (Ax) of the same size. For some vectors, this seemingly complicated computation is really no more complicated than scalar multiplication. The vectors vary according to the choice of A, so the question is to determine, for an individual choice of A, if there are any such vectors, and if so, which ones. It happens in a variety of situations that these vectors (and the scalars that go along with them) are of special interest. We will be solving polynomial equations in this chapter, which raises the specter of roots that are complex numbers. This distinct possibility is our main reason for entertaining the complex numbers throughout the course. You might be moved to revisit Section CNO [679] and Section 0 [167]. Section EE Eigenvalues and Eigenvectors We start with the principal definition for this chapter. Subsection EEM Eigenvalues and Eigenvectors of a Matrix Definition EEM Eigenvalues and Eigenvectors of a Matrix Suppose that A is a square matrix of size n, x -f 0 is a vector in C", and A is a scalar in C. Then we say x is an eigenvector of A with eigenvalue A if Ax = Ax D Before going any further, perhaps we should convince you that such things ever happen at all. Un- derstand the next example, but do not concern yourself with where the pieces come from. We will have methods soon enough to be able to discover these eigenvectors ourselves. Example SEE Some eigenvalues and eigenvectors 397  Subsection EE.EEM Eigenvalues and Eigenvectors of a Matrix 398 Subsection EBEEM Bigenvalues and Bigenvectors of a Matrix 398 Consider the matrix 204 -280 716 [-472 98 -134 348 -232 -26 36 -90 60 -10 14 -36 28 ] and the vectors 11 -1 2 5] -3] 4 -10 4] -3] 71 Z 0 8] 1 _0 _ Then 204 98 -26 -280 -134 36 Ax = 716 348 -90 [-472 -232 60 so x is an eigenvector of A with eigenvalue A -10 1 41 14 -1 -4 -36 2 81 28 ][ 5j [20_] 4. Also, -10 -3 0 14 4 0 -36 -10 0 28 4_ 0 =4 0 1 2 5 -3 4 -10 4 4x 204 98 -280 -134 Ay I 716 348 -472 -232 -26 36 -90 60 : 0y so y is an eigenvector of A with eigenvalue A = 0. Also, Az = 204 -280 716 -472 98 -134 348 -232 -26 36 -90 60 -10 -3 -61 14 7 _ 14 -36 0 01 28 _ 8 16] so z is an eigenvector of A with eigenvalue A = 2. Also, 2 2 -3 7 0 8 1 -1 0 2z Aw = 204 -280 716 -472 98 -134 348 -232 -26 36 -90 60 -10 1] 14 -1 -36 4 28 _ 0] L 21 -2 8 0] 2w so w is an eigenvector of A with eigenvalue A 2. So we have demonstrated four eigenvectors of A. Are there an eigenvector is again an eigenvector. In this example, set u = more? Yes, any nonzero scalar multiple of : 30x. 
Then

Au = A(30x)
   = 30(Ax)        Theorem MMSMM [201]
   = 30(4x)        x an eigenvector of A
   = 4(30x)        Property SMAM [184]
   = 4u

so that u is also an eigenvector of A for the same eigenvalue, λ = 4.

The vectors z and w are both eigenvectors of A for the same eigenvalue λ = 2, yet this is not as simple as the two vectors just being scalar multiples of each other (they aren't). Look what happens when we add them together, to form v = z + w, and multiply by A,

Av = A(z + w)
   = Az + Aw       Theorem MMDAA [201]
   = 2z + 2w       z, w eigenvectors of A
   = 2(z + w)      Property DVAC [87]
   = 2v

so that v is also an eigenvector of A for the eigenvalue λ = 2. So it would appear that the set of eigenvectors that are associated with a fixed eigenvalue is closed under the vector space operations of C^n. Hmmm.

The vector y is an eigenvector of A for the eigenvalue λ = 0, so we can use Theorem ZSSM [286] to write Ay = 0y = 0. But this also means that y ∈ N(A). There would appear to be a connection here also.

Example SEE [396] hints at a number of intriguing properties, and there are many more. We will explore the general properties of eigenvalues and eigenvectors in Section PEE [419], but in this section we will concern ourselves with the question of actually computing eigenvalues and eigenvectors. First we need a bit of background material on polynomials and matrices.

Subsection PM
Polynomials and Matrices

A polynomial is a combination of powers, multiplication by scalar coefficients, and addition (with subtraction just being the inverse of addition). We never have occasion to divide when computing the value of a polynomial. So it is with matrices. We can add and subtract matrices, we can multiply matrices by scalars, and we can form powers of square matrices by repeated applications of matrix multiplication. We do not normally divide matrices (though sometimes we can multiply by an inverse). If a matrix is square, all the operations constituting a polynomial will preserve the size of the matrix. So it is natural to consider evaluating a polynomial with a matrix, effectively replacing the variable of the polynomial by a matrix. We will demonstrate with an example.

Example PM
Polynomial of a matrix
Let

p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4

D = [ -1  3  2 ]
    [  1  0 -2 ]
    [ -3  1  1 ]

and we will compute p(D). First, the necessary powers of D. Notice that D^0 is defined to be the multiplicative identity, I_3, as will be the case in general.
1 0 0 D0=I3= 0 1 0 0 0 1 -1 3 2 D[=D= 1 0 -2 -3 1 1 - 3 2 -1D 2 = D D 1 = 1 0 -2 1- 3- 2D 1 0 5 -1 3 21 2 D ~DD2~3 1 1] _L1 2 -2 -1 -6 -2 = 5 1 0 1 _ L1 -8 -7] -1 -6 19 -12 -8 1 0 = -4 15 8 -8 -7 12 -4 11_ Version 2.02  Subsection EE.EEE Existence of Eigenvalues and Eigenvectors 400 -1 3 2 19 -12 -8 -7 49 54 D4=DD3= 1 0 -2]-4 15 8 = -5 -4 -30 -3 1 1 12 -4 11 -49 47 43 Then p(D) =14 + 19D - 3D2 - 7D3 + D4 1 0 0 -1 3 2 -2 -1 -6 =14 0 1 0 +19 1 0 -2 -3 5 1 0 0 0 1 -3 1 1 1 -8 -7- 19 -12 -8 -7 49 54 -7 -4 15 8 + -5 -4 -30 12 -4 11 -49 47 43 -139 193 166 - 27 -98 -124 -193 118 20 Notice that p(x) factors as p(x) = 14 + 19x - 3x2 - 7x3-+x4 = (x - 2)(x - 7)(x + 1)2 Because D commutes with itself (DD = DD), we can use distributivity of matrix multiplication across matrix addition (Theorem MMDAA [201]) without being careful with any of the matrix products, and just as easily evaluate p(D) using the factored form of p(x), p(D) 14 + 19D - 3D2 - 7D3 + D4 = (D - 213)(D - 713)(D + 13)2 -3 3 2 -8 3 2 0 3 2 2 = 1 -2 -2 1 -7 -2 1 1 -2 -3 1 -1_ -3 1 -6_ -3 1 2_ -139 193 166 27 -98 -124 -193 118 20 This example is not meant to be too profound. It is meant to show you that it is natural to evaluate a polynomial with a matrix, and that the factored form of the polynomial is as good as (or maybe better than) the expanded form. And do not forget that constant terms in polynomials are really multiples of the identity matrix when we are evaluating the polynomial with a matrix. Subsection EEE Existence of Eigenvalues and Eigenvectors Before we embark on computing eigenvalues and eigenvectors, we will prove that every matrix has at least one eigenvalue (and an eigenvector to go with it). Later, in Theorem MNEM [427], we will determine the maximum number of eigenvalues a matrix may have. The determinant (Definition D [341]) will be a powerful tool in Subsection EE.CEE [403] when it comes time to compute eigenvalues. However, it is possible, with some more advanced machinery, to compute eigenvalues without ever making use of the determinant. Sheldon Axler does just that in his book, Linear Version 2.02  Subsection EE.EEE Existence of Eigenvalues and Eigenvectors 401 Algebra Done Right. Here and now, we give Axler's "determinant-free" proof that every matrix has an eigenvalue. The result is not too startling, but the proof is most enjoyable. Theorem EMHE Every Matrix Has an Eigenvalue Suppose A is a square matrix. Then A has at least one eigenvalue. D Proof Suppose that A has size n, and choose x as any nonzero vector from C". (Notice how much latitude we have in our choice of x. Only the zero vector is off-limits.) Consider the set S= {x, Ax, A2x, A3x, ..., Anx} This is a set of n + 1 vectors from C", so by Theorem MVSLD [137], S is linearly dependent. Let ao, ai, a2, ... , an be a collection of n + 1 scalars from C, not all zero, that provide a relation of linear dependence on S. In other words, aox + a1Ax + a2A2x + a3A3x + -"" + anAnx = 0 Some of the ai are nonzero. Suppose that just ao # 0, and ai = a2 = a3- -"- = an = 0. Then aox = 0 and by Theorem SMEZV [287], either ao = 0 or x = 0, which are both contradictions. So ai # 0 for some i > 1. Let m be the largest integer such that am $ 0. From this discussion we know that m > 1. We can also assume that am = 1, for if not, replace each ai by ai/am to obtain scalars that serve equally well in providing a relation of linear dependence on S. 
Define the polynomial p(x) = ao + aix + a2x2 + a3x3 -+ .--+-amxm Because we have consistently used C as our set of scalars (rather than R), we know that we can factor p(x) into linear factors of the form (x - bi), where bi E C. So there are scalars, b1, b2, b3, ..., bm, from C so that, p(x) (x - bm)(x - bm-1) ... (x - b3)( - b2)(x - bi) Put it all together and 0 = aox + a1Ax + a2A2x + a3A3x + - --+ anAx =aox+a1Ax+a2A2x+a3A3x+---+amAmx ai ==0 for i > m = (aoIn + a1A + a2A2 + a3A3 + - - - + amAm) x Theorem MMDAA [201] = p(A)x Definition of p(x) = (A - bmIn)(A - bm_1In) ... (A - b3In)(A - b2In)(A - biIn)x Let k be the smallest integer such that ( A - bkIn)( A - bk_1In) - - ( A - b3In)( A - b2In)( A - b1In)x =0. From the preceding equation, we know that k 1 such that p(A)x = 0. Now we need to factor p(x) over C. If you made your own choice of x at the start, this is where you might have a fifth degree polynomial, and where you might need to use a computational tool to find roots and factors. We have p(x) = 16 + 12x - 8x2 - 3x3 + x4 = (x - 4)(x + 2)(x - 2)(x + 1) So we know that 0 = p(A)x = (A - 415)(A + 215)(A - 215)(A + 115)x We apply one factor at a time, until we get the zero vector, so as to determine the value of k described in the proof of Theorem EMHE [400], -6 -1 11 0 -4 3 -1 4 2 0 2 0 0 2 (A+1I5)x -10 -1 15 0 -4 3 = -1 8 2 -15 0 5 -5 -1 -10 -1 16 0 -5_ 4 _ _-2 -9 -1 11 0 -4 -1 4 4 -1 0 2 0 2 -8 (A-2I5)(A+1I5)x = -10 -1 12 0 -4 -1 = 4 8 2 -15 -3 5 -1 4 -10 -1 16 0 -8 -2 _ 8 -5 -1 11 0 -4 4 0 4 3 0 2 0 -8 0 (A+2I5)(A-2I5)(A+1I5)x = -10 -1 16 0 -4 4 = 0 8 2 -15 1 5 4 0 -10 -1 16 0 -4 8 0 So k =3 and -4- -8 z =(A -2I5)(A +1I5)x = 4 4 _8 _ is an eigenvector of A for the eigenvalue A =-2, as you can check by doing the computation Az. If you work through this example with your own choice of the vector x (strongly recommended) then the eigenvalue you will find may be different, but will be in the set {3, 0, 1, -1, -2}. See Exercise EE.M60 [414] for a suggested starting vector. Version 2.02  Subsection EE.CEE Computing Eigenvalues and Eigenvectors 404 Subsection CEE Computing Eigenvalues and Eigenvectors Fortunately, we need not rely on the procedure of Theorem EMHE [400] each time we need an eigenvalue. It is the determinant, and specifically Theorem SMZD [389], that provides the main tool for computing eigenvalues. Here is an informal sequence of equivalences that is the key to determining the eigenvalues and eigenvectors of a matrix, Ax=Ax < Ax-AIxOnx = (A- AI)x=O So, for an eigenvalue A and associated eigenvector x - 0, the vector x will be a nonzero element of the null space of A - AIn, while the matrix A - AIn will be singular and therefore have zero determinant. These ideas are made precise in Theorem EMRCP [404] and Theorem EMNS [405], but for now this brief discussion should suffice as motivation for the following definition and example. Definition CP Characteristic Polynomial Suppose that A is a square matrix of size n. 
Then the characteristic polynomial of A is the polynomial PA (x) defined by PA (x) = det (A - zIn) Example CPMS3 Characteristic polynomial of a matrix, size 3 Consider -13 F= 12 24 Then -8 7 16 -4 4 7 PF (x) = det (F - 3I3) -13-x -8 = 12 7-x 24 16 -4 4 7-x Definition CP [403] Definition DM [375] (-13-x) 7 4 +(-8)(-1) 12 4 16 7-x 24 7-x 12 7-x + (-4) 24 16 (-13 - x)((7 - x)(7 - x) - 4(16)) + (-8)(-1)(12(7 - x) - 4(24)) + (-4)(12(16) - (7 - x)(24)) 3 + 5x + 2 -3 -(x - 3)(x + 1)2 Theorem DMST [376] The characteristic polynomial is our main computational tool for finding eigenvalues, and will sometimes be used to aid us in determining the properties of eigenvalues. Version 2.02  Subsection EE.CEE Computing Eigenvalues and Eigenvectors 405 Theorem EMRCP Eigenvalues of a Matrix are Roots of Characteristic Polynomials Suppose A is a square matrix. Then A is an eigenvalue of A if and only if PA (A) = 0. D Proof Suppose A has size n. A is an eigenvalue of A M there exists x # 0 so that Ax = Ax Definition EEM [396] M there exists x$#0 so that Ax - Ax = 0 m there exists x # 0 so that Ax - AInx = 0 Theorem MMIM [200] M there exists x # 0 so that (A - AIn)x = 0 Theorem MMDAA [201] S A - AIn is singular Definition NM [71] M det (A - AIn) = 0 Theorem SMZD [389] P PA (A) = 0 Definition CP [403] Example EMS3 Eigenvalues of a matrix, size 3 In Example CPMS3 [403] we found the characteristic polynomial of -13 -8 -4 F= 12 7 4 24 16 7 to be pF (x) = -(x-3)(x+ 1)2. Factored, we can find all of its roots easily, they are x = 3 and x = -1. By Theorem EMRCP [404], A = 3 and A = -1 are both eigenvalues of F, and these are the only eigenvalues of F. We've found them all. Let us now turn our attention to the computation of eigenvectors. Definition EM Eigenspace of a Matrix Suppose that A is a square matrix and A is an eigenvalue of A. Then the eigenspace of A for A, EA (A), is the set of all the eigenvectors of A for A, together with the inclusion of the zero vector. A Example SEE [396] hinted that the set of eigenvectors for a single eigenvalue might have some closure properties, and with the addition of the non-eigenvector, 0, we indeed get a whole subspace. Theorem EMS Eigenspace for a Matrix is a Subspace Suppose A is a square matrix of size n~ and A is an eigenvalue of A. Then the eigenspace EA (A) is a subspace of the vector space C"m.D Proof We will check the three conditions of Theorem TSS [293]. First, Definition EM [404] explicitly includes the zero vector in EA (A), so the set is non-empty. Suppose that x, y E SA (A), that is, x and y are two eigenvectors of A for A. Then A (x + y) =Ax + Ay Theorem MMDAA [201] =Ax + Ay x, y eigenvectors of A =A (x + y) Property DVAC [87] So either x + y = 0, or x + y is an eigenvector of A for A (Definition EEM [396]). So, in either event, x + y E EA (A), and we have additive closure. Version 2.02  Subsection EE.CEE Computing Eigenvalues and Eigenvectors 406 Suppose that a E C, and that x C SA (A), that is, x is an eigenvector of A for A. Then A (ox) = a (Ax) Theorem MMSMM [201] = aAx x an eigenvector of A = A (ax) Property SMAC [86] So either cx = 0, or cx is an eigenvector of A for A (Definition EEM [396]). So, in either event, ax E EA (A), and we have scalar closure. With the three conditions of Theorem TSS [293] met, we know EA (A) is a subspace. Theorem EMS [404] tells us that an eigenspace is a subspace (and hence a vector space in its own right). Our next theorem tells us how to quickly construct this subspace. 
Theorem EMNS Eigenspace of a Matrix is a Null Space Suppose A is a square matrix of size n and A is an eigenvalue of A. Then EA (A) = N(A - AIn) Proof The conclusion of this theorem is an equality of sets, so normally we would follow the advice of Definition SE [684]. However, in this case we can construct a sequence of equivalences which will together provide the two subset inclusions we need. First, notice that 0 E EA (A) by Definition EM [404] and 0 E N(A - AIn) by Theorem HSC [62]. Now consider any nonzero vector x E C", x E EA (A) M Ax = Ax Definition EM [404] M Ax-Ax=0 m Ax - AInx = 0 Theorem MMIM [200] M (A - AIn) x = 0 Theorem MMDAA [201] S x E N(A - AIn) Definition NSM [64] You might notice the close parallels (and differences) between the proofs of Theorem EMRCP [404] and Theorem EMNS [405]. Since Theorem EMNS [405] describes the set of all the eigenvectors of A as a null space we can use techniques such as Theorem BNS [139] to provide concise descriptions of eigenspaces. Theorem EMNS [405] also provides a trivial proof for Theorem EMS [404]. Example ESMS3 Eigenspaces of a matrix, size 3 Example CPMS3 [403] and Example EMS3 [404] describe the characteristic polynomial and eigenvalues of the 3 x 3 matrix -3-8 -4] F= 127 4 [24 16 7] We will now take the each eigenvalue in turn and compute its eigenspace. To do this, we row-reduce the matrix F - AI3 in order to determine solutions to the homogeneous system IJS(F - Al3, 0) and then express the eigenspace as the null space of F - AI3 (Theorem EMNS [405]). Theorem BNS [139] then tells us how to write the null space as the span of a basis. -16 -8 -4 iW0 1 A=3 F-3I3= 12 4 4 R :0 - 24 16 4 0 0 0 Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 407 EF(3) = V(F-3I3) == 1 1 2 -12 -8 -4 A=-1 F+1/3= 12 8 4 REF 0 0 0 24 16 8 0 0 0 3 3 EFI(-1) =R(F+1F3) = 1 0 = 3 ,0 0 1 0 3 Eigenspaces in hand, we can easily compute eigenvectors by forming nontrivial linear combinations of the basis vectors describing each eigenspace. In particular, notice that we can "pretty up" our basis vectors by using scalar multiples to clear out fractions. More powerful scientific calculators, and most every mathematical software package, will compute eigenvalues of a matrix along with basis vectors of the eigenspaces. Be sure to understand how your device outputs complex numbers, since they are likely to occur. Also, the basis vectors will not necessarily look like the results of an application of Theorem BNS [139]. Duplicating the results of the next section (Subsection EE.ECEE [406]) with your device would be very good practice.See: Computation E.SAGE [677]. Subsection ECEE Examples of Computing Eigenvalues and Eigenvectors No theorems in this section, just a selection of examples meant to illustrate the range of possibilities for the eigenvalues and eigenvectors of a matrix. These examples can all be done by hand, though the computation of the characteristic polynomial would be very time-consuming and error-prone. It can also be difficult to factor an arbitrary polynomial, though if we were to suggest that most of our eigenvalues are going to be integers, then it can be easier to hunt for roots. These examples are meant to look similar to a concatenation of Example CPMS3 [403], Example EMS3 [404] and Example ESMS3 [405]. First, we will sneak in a pair of definitions so we can illustrate them throughout this sequence of examples. 
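Before turning to those definitions, here is a minimal sketch of the kind of computation a software package performs, using the matrix F of Example CPMS3 [403] and Example ESMS3 [405]. The sketch is written in Python with NumPy purely for illustration (the text's own computational notes use Sage); it is not part of the examples that follow.

import numpy as np

# The matrix F from Example CPMS3 / EMS3 / ESMS3.
F = np.array([[-13.0, -8.0, -4.0],
              [ 12.0,  7.0,  4.0],
              [ 24.0, 16.0,  7.0]])

# Theorem EMRCP: an eigenvalue makes det(F - lambda*I) equal to zero.
for lam in (3.0, -1.0):
    print(lam, round(np.linalg.det(F - lam * np.eye(3)), 6))   # both determinants are (numerically) zero

# A software package returns eigenvalues and unit-length eigenvectors directly.
evals, evecs = np.linalg.eig(F)
print(np.round(evals, 6))                 # 3 once and -1 twice, in some order
for j in range(3):
    v, lam = evecs[:, j], evals[j]
    print(np.allclose(F @ v, lam * v))    # True: each column is an eigenvector

The two columns returned for λ = -1 span the same two-dimensional eigenspace found by row-reducing F + I_3, but they are scaled to unit length, so, as remarked above, they will not match the Theorem BNS [139] basis vectors entry for entry.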
Definition AME Algebraic Multiplicity of an Eigenvalue Suppose that A is a square matrix and A is an eigenvalue of A. Then the algebraic multiplicity of A, aA (A), is the highest power of (x - A) that divides the characteristic polynomial, PA (x). (This definition contains Notation AME.) A Since an eigenvalue A is a root of the characteristic polynomial, there is always a factor of (x - A), and the algebraic multiplicity is just the power of this factor in a factorization of PA (x). So in particular, aa (A) > 1. Compare the definition of algebraic multiplicity with the next definition. Definition GME Geometric Multiplicity of an Eigenvalue Suppose that A is a square matrix and A is an eigenvalue of A. Then the geometric multiplicity of A, 7YA (A), is the dimension of the eigenspace EA (A). (This definition contains Notation GME.) A Since every eigenvalue must have at least one eigenvector, the associated eigenspace cannot be trivial, and so -yA (A) > 1. Example EMMS4 Eigenvalue multiplicities, matrix of size 4 Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 408 Consider the matrix then -2 1 12 1 6 5 3 -4 -2 4 -2 5 -4 9 -4 10_ PB (x) = 8 - 20x + 18x2 - 7x3a+ x4 = (xz- 1)(x - 2)3 So the eigenvalues are A = 1, 2 with algebraic multiplicities aB (1) Computing eigenvectors, 1 and aB (2) 3. A=1 -3 B- 1I4= 12 6 3 1 0 5 -4 -2 4 -3 5 -4 9] -4 9 _ 1 RREF 0 0 0 ES (1) = N(B - 1I4) = 40 K A=2 -4 B-2I4= 12 6 _3 1 -1 5 -4 -2 4 -4 5 -4 9 -4 8 _ 01 3 0 0 0 0 -_1 _ 0i _ 00 0 0 1 0 0 -_1 2 .2_ 1 RREF 0 0 _0 1/21 -1 1/2 0_ S (2) = P(B - 214) K 1 .1_ K So each eigenspace has dimension 1 and so -yB (1) 1 and 7YB (2) 1. This example is of interest because of the discrepancy between the two multiplicities for A 2. In many of our geometric multiplicities will be equal for all of the eigenvalues (as it was for A this example in mind. We will have some explanations for this phenomenon [440]). examples the algebraic and = 1 in this example), so keep later (see Example NDMS4 Example ESMS4 Eigenvalues, symmetric matrix of size 4 Consider the matrix 1 0 1 1 C=0 1 1 1 C 1 1 1 0 1 1 0 1_ then pc (xV) =-3 + 4x + 2x2 - 4x3 + xz = (x - 3)(x - 1)2(x + 1) So the eigenvalues are A = 3, 1, -1 with algebraic multiplicities ac (3) Computing eigenvectors, :1, ac (1)= 2 and ac (-1) =1. A = 3 -2 -0 C -3I4 = 0 -2 1 1 1 1 1 1 -2 0 1 1 0 0 -11 1 RREF, 0 0 -1 0 0 0 -1 -2] 0 0 0 0] Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 409 Ec (3) = (C - 3I4) = A=1 C-1I4- 0 1 0 0 1 1 1 1 0 0 11 1 0 0] Ec (1) = N(C - 1I4) 1 1_ 11 1 RREF 0 0 0 0 _ 00 -1 RREF 0 ~ 0 -_[ - 0 0 - 1 71]} 0 0 0 0 0 0 1 A=-1 C+1I4_ 0 1 0 2 1 1 1 1 2 0 11 1 0 2] 1 1 -1 0_ Ec (-1) = N(C + 114) = So the eigenspace dimensions yield geometric multiplicities yc (3) = 1, yc (1) = 2 and yc (-1) = 1, the same as for the algebraic multiplicities. This example is of interest because A is a symmetric matrix, and will be the subject of Theorem HMRE [427]. Example HMEM5 High multiplicity eigenvalues, matrix of size 5 Consider the matrix 29 -47 E= 19 -19 7 14 2 -22 -1 10 5 -10 -3 4 3 6 -9 -11 13 4 -8 -2 8 1 -3 then PE (x) -16 + 16x + 8x2 - 163 + 74 -5 _x - 2)4(x + 1) So the eigenvalues are A = 2, Computing eigenvectors, -1 with algebraic multiplicities aE (2) = 4 and aE (-1) 1. 
A=2 27 -47 E-2I5= 19 -19 7 14 -24 10 -10 4 2 -1 3 -3 3 6 -9 -11 13 4 -8 -4 8 1 -5 10 0 1 00 0 -3 RREF 002 RE 0 0 F-] 0 0 0 0 0 0 0 0 0 -2 0 3 1 = 0 ,2 2 0 _0 _0[2_ 0 1 -1 0 0 -1 0 E E (2 )=N(E -2I5) = 0 ,1 10 01 Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 410 A = -1 30 14 2 -47 -21 -1 E+1I5= 19 10 6 -19 -10 -3 7 4 3 6 -9 1 0 0 2 0 -11 13 0 Q 0 -4 0 4 -8 RREF 0 0 1 1 0 -1 8 0 0 0 0 W 1 -2 0 0 0 0 0 -2 4 EE (-1) = N (E + 1-15) = 1 1 _0 _ So the eigenspace dimensions yield geometric multiplicities 7E (2) = 2 and 7E (-1) = 1. This example is of interest because A = 2 has such a large algebraic multiplicity, which is also not equal to its geometric multiplicity. Example CEMS6 Complex eigenvalues, matrix of size 6 Consider the matrix F -59 1 -233 157 -91 209 -34 41 12 25 30 7 -46 -36 -11 -29 -119 58 -35 75 54 81 -43 21 -51 -39 -48 32 -5 32 26 107 -55 28 -69 -50 then PF (x) = -50 + 55x + 13x2 - 50x3 + 32x4 - 9x5 + x6 - (x - 2)(x + 1)(x2 - 4x + 5)2 = (x- 2)(x+ 1)((x- (2 + i))(xz- (2 -i)))2 = (x - 2)(x + 1)(x - (2 + i))2(x - (2 -i))2 So the eigenvalues are A = 2, -1, 2 + i, 2 - i with algebraic multiplicities aF (2) aF (2+ i)=2 andaF(2-i)=2. Computing eigenvectors, A=2 1, aF (-1) 1, F - 216 -61 -34 41 12 25 30 1 5 -46 -36 -11 -29 -233 -119 56 -35 75 54 157 81 -43 19 -51 -39 -91 -48 32 -5 30 26 _ 209 107 -55 28 -69 -52 RREF 0 0 0 0 0 F 0 0 0 0 0 0 2 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 [-})1-1 0 0 F (2) = N (F - 2I16)= _ 1 __5 Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 411 A = -1 F+ 116 -58 1 -233 157 -91 209 -34 8 -119 81 -48 107 41 12 -46 -36 59 -35 -43 22 32 -5 -55 28 25 -11 75 -51 33 -69 30 -29 54 -39 26 -49 -1 3 -1 0 1 2 _ RREF w 0 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 0 2 0 0 0 0 0 0 - 0 0 0 0 0 0 EF(-1) = (F + I6) =K I _1 3 1 0 1 1 >=C I> F- A=2+i -61-i -34 41 12 25 1 5-i -46 -36 -11 - (2 + i)16 -233 -119 56 - i -35 75 157 81 -43 19 - i -51 -91 -48 32 -5 30 - i 209 107 -55 28 -69 1 0 0 0 0 5(7+i) 0 0 0 0 (-9-2i) RREF 0 0 0 0 1 0 0 0 2 0 -1 0 0 0 0 [1 1 0 0 0 0 0 0 -(7 +i;) 5(9 +2i) EF (2 + i) = N(F - (2 + i)-6) =1 -1 30 -29 54 -39 26 -52-i =C -7 - i- 9 + 2i -5 5 -5 -5- I> A=2-i F - (2 - i)I6 -61 + i 1 -233 157 -91 209 -34 41 12 25 5+i -46 -36 -11 -119 56 + i -35 75 81 -43 19+i -51 -48 32 -5 30 + i 107 -55 28 -69 30 -29 54 -39 26 -52+ i Version 2.02  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 412 RREF EF (2 - i) = N(F 1110 oWQ 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 Q1 0 0 0 5(7 - i) 0 5(-9+2i) 0 1 0 -1 1 0 0 -7 i) *(9 2i) 1 (2- i)I6) ( I K I -7 + i- 9 - 2i -5 5 -5 5 I So the eigenspace dimensions yield geometric multiplicities 7F (2) 1, 7F (-1) = 1, 7F (2 + i) = 1 and YF (2 - i) = 1. This example demonstrates some of the possibilities for the appearance of complex eigenvalues, even when all the entries of the matrix are real. Notice how all the numbers in the analysis of A = 2 - i are conjugates of the corresponding number in the analysis of A = 2 + i. This is the content of the upcoming Theorem ERMCP [423]. 
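A small numerical illustration of the same phenomenon may be helpful. The 2 x 2 real matrix below is invented for this purpose (it is not from the text), and Python with NumPy stands in for "your device"; its characteristic polynomial is x^2 - 4x + 5, with complex conjugate roots 2 + i and 2 - i.

import numpy as np

# A real matrix, made up for illustration, with complex eigenvalues.
M = np.array([[1.0, -2.0],
              [1.0,  3.0]])

evals, evecs = np.linalg.eig(M)
print(np.round(evals, 6))          # the conjugate pair 2+1j and 2-1j, in some order

# Conjugating an eigenvector for one eigenvalue produces an eigenvector for
# the conjugate eigenvalue, exactly the claim of the upcoming Theorem ERMCP.
lam, v = evals[0], evecs[:, 0]
print(np.allclose(M @ np.conj(v), np.conj(lam) * np.conj(v)))   # True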
Example DEMS5 Distinct eigenvalues, matrix of size 5 Consider the matrix 15 5 H= 0 -43 26 then 18 3 -4 -46 30 -8 6 1 -1 5 -4 17 -14 -12 8 -5 -3 -2 15 -10 PH (x)_ -6x + 92 + 7x3 - x4- x x(x - 2) ( - 1) ( + 1)>( + 3) So the eigenvalues are A = 2, 1, aH (-1)=1 and aH (-3)= 1- Computing eigenvectors, 0, -1, -3 with algebraic multiplicities aH (2) 1, cH (1) 1, aH (0) = 1, A=2 13 5 H-2I5= 0 -43 26 18 1 -4 -46 30 -8 6 1 -1 3 -4 17 -16 -12 8 -5 -3 -2 15 -12 1 0 RREF 0 0 0 0 0 0 0 0 0 -1 0 0 1 1 0 2 0 0 1 0 0 0 1 EH(2) = A (H - 2-15) -2 -1 1 A=1 14 5 H-1I5= 0 -43 26 18 2 -4 -46 30 -8 6 1 -1 4 -4 -5 -3 -2 15 -11 1 0 0 0 0 0 0 0 RREF: 0 0 0 1 0 0 0 Q 1 0 0 0 0 0 V_ Version 2.02 17 -12 -15 8  Subsection EE.ECEE Examples of Computing Eigenvalues and Eigenvectors 413 EH(1) = (H - 115) = I 11 0 0 - = -1 -1 -2 1 __2 _ I A=0 H-0I5= 15 5 0 -43 26 18 3 -4 -46 30 -8 6 1 -1 5 -4 17 -14 -12 8 -5 -3 -2 15 -10 1 0 RREF:0 0 0 0 01 0 0 0 0 0 1 0 0 -2 L 0 -2 0 L1 0 0 0 0_ EH(0) =N(H-0I5)= -1 2 2 0 1 16 5 A=-1 H+1I5= 0 -43 26 18 4 -4 -46 30 -8 6 1 -1 6 -4 17 -13 -12 8 -5 -3 -2 15 -9 I> 1 0 0 0 -1/2 F 0 0 0 0 RREF 0 0 0 0 0 0 0 W 1/2 _0 0 0 0 0 1 0 0 -1 _2 _ EH (-1) A=f(H + 1-15) K I 1 0 0 1 1 18 5 A=-3 H+3I5= 0 -43 26 18 6 -4 -46 30 1 -8 6 -5 1 -1 -3 8 -4 -2 7 -11 15 12 8 -7 1 -2 _ 1 _ 1 0 0 0 -1 0 0 0 2 RREF 0 0 1 0 1 0 0 0 12 0 0 0 0 0 EH(-3) = (H + 315) -2 1 2 4 _2_ I So the eigenspace dimensions yield geometric multiplicities 7H (2) = 1, 7H (1) = 1, 7H (0) = 1, 7H (-1) = 1 and 7H (-3) = 1, identical to the algebraic multiplicities. This example is of interest for two reasons. First, A = 0 is an eigenvalue, illustrating the upcoming Theorem SMZE [420]. Second, all the eigenvalues are distinct, yielding algebraic and geometric multiplicities of 1 for each eigenvalue, illustrating Theorem DED [440]. Version 2.02  Subsection EE.READ Reading Questions 414 Subsection READ Reading Questions Suppose A is the 2 x 2 matrix A =[5 8 -4 7- 1. Find the eigenvalues of A. 2. Find the eigenspaces of A. 3. For the polynomial p(x) = 3x2 - x + 2, compute p(A). Version 2.02  Subsection EE.EXC Exercises 415 Subsection EXC Exercises C19 Find the eigenvalues, eigenspaces, algebraic multiplicities and geometric multiplicities for the matrix below. It is possible to do all these computations by hand, and it would be instructive to do so. C -1 2 -6 6_ Contributed by Robert Beezer Solution [415] C20 Find the eigenvalues, eigenspaces, algebraic multiplicities and geometric multiplicities for the matrix below. It is possible to do all these computations by hand, and it would be instructive to do so. _-12 30 B - -5 13 Contributed by Robert Beezer Solution [415] C21 The matrix A below has A = 2 as an eigenvalue. Find the geometric multiplicity of A= 2 using your calculator only for row-reducing matrices. 18 -15 33 -15 A=-4 8 -6 6 A -9 9 -16 9 A 5 -6 9 -4] Contributed by Robert Beezer Solution [416] C22 Without using a calculator, find the eigenvalues of the matrix B. B 2 -1 1 1 Contributed by Robert Beezer Solution [416] 0 8 M60 Repeat Example CAEHW [401] by choosing x 2 and then arrive at an eigenvalue and eigen- 1 2 vector of the matrix A. The hard way. Contributed by Robert Beezer Solution [416] T1O A matrix A is idempotent if A2 =A. Show that the only possible eigenvalues of an idempotent matrix are A =0 and A =1. Then give an example of a matrix that is idempotent and has both of these two values as eigenvalues. Contributed by Robert Beezer Solution [417] T20 Suppose that A and p are two different eigenvalues of the square matrix A. 
Prove that the intersection of the eigenspaces for these two eigenvalues is trivial. That is, EA (A) n SA (p) = {0}. Contributed by Robert Beezer Solution [417] Version 2.02  Subsection EE.SOL Solutions 416 Subsection SOL Solutions C19 Contributed by Robert Beezer Statement [414] First compute the characteristic polynomial, pc (x) =det (C - xI2) Definition CP [403] -1-x 2 -6 6-x = (-1-x)(6-x)-(2)(-6) =x2 -5x+6 =(x - 3)(x - 2) So the eigenvalues of C are the solutions to pc (x) = 0, namely, A = 2 and A = 3. To obtain the eigenspaces, construct the appropriate singular matrices and find expressions for the null spaces of these matrices. A =2 21 [[l 3 2 C - (2)I2 =[22 RREF - EC (2) = N(C - (2)I-2)= A=3 Cr-2(3)I2 2 RREF 0- C ()2 -6 3- 0 '02 Ec (3) = N (C - (3)-2) = C20 Contributed by Robert Beezer Statement [414] The characteristic polynomial of B is PB (x) = det (B - x12) Definition CP [403] -12-x 30 -5 13- x (-12 - z)(13 - z) - (30)(-5) Theorem DMST [376] z2 - x- 6 ( - 3)(v+ 2) From this we find eigenvalues A =3, -2 with algebraic multiplicities asB (3) =1 and asB (-2) =1 For eigenvectors and geometric multiplicities, we study the null spaces of B - AI2 (Theorem EMNS [405]). B[- 3I2 = 15 30 RREF [ -2 B3V -35 10_ 0 0 Es (3) = N (B - 3I12) = Version 2.02  Subsection EE.SOL Solutions 417 A = -2 B + 212 [10 30] E I-3] EB(-2) = V(B + 212) = Each eigenspace has dimension one, so we have geometric multiplicities YB (3) = 1 and YB (-2) = 1. C21 Contributed by Robert Beezer Statement [414] If A = 2 is an eigenvalue of A, the matrix A - 2I4 will be singular, and its null space will be the eigenspace of A. So we form this matrix and row-reduce, 16 -15 33 -15 0 3 0 A -2I4= -4 6 -6 6 RREF, 0 3 1 1 4 -9 9 -18 9 0 0 0 0 5 -6 9 -6_ 0 0 0 0_ With two free variables, we know a basis of the null space (Theorem BNS [139]) will contain two vectors. Thus the null space of A - 2I4 has dimension two, and so the eigenspace of A = 2 has dimension two also (Theorem EMNS [405]), yA (2) = 2. C22 Contributed by Robert Beezer Statement [414] The characteristic polynomial (Definition CP [403]) is pB (x) = det (B - zI2) 2-x -1 1 1-3x (2 - x)(1 - x) - (1)(-1) Theorem DMST [376] =2-3x + 3 3+ 3i) ( - 3 i = x- 2 ) X_ 2 where the factorization can be obtained by finding the roots of PB (x) = 0 with the quadratic equation. By Theorem EMRCP [404] the eigenvalues of B are the complex numbers Al-=3+2i and A2 = 3-2i M60 Contributed by Robert Beezer Statement [414] Form the matrix C whose columns are x, Ax, A2x, A3x, A4x, A5x and row-reduce the matrix, 0 6 32 102 320 966 1 0 0 -3 -9 -30 8 10 24 58 168 490 0 2 0 1 0 1 2 12 50 156 482 1452 RREF: 0 0 2 3 10 30 1 -5 -47 -149 -479 -1445 0 0 0 0 0 0 _212 50 156 482 1452 __0 0 0 0 0 0 The simplest possible relation of linear dependence on the columns of C comes from using scalars a4= and as = as 0 for the free variables in a solution to IJS(C, 0). The remainder of this solution is ai = 3, Oa2 =-1, as3 -3. This solution gives rise to the polynomial which then has the property that p(A)x = 0. No matter how you choose to order the factors of p(x), the value of k (in the language of Theorem EMHE [400] and Example CAEHW [401]) is k = 2. 
For each of the three possibilities, we list the resulting Version 2.02  Subsection EE.SOL Solutions 418 eigenvector and the associated eigenvalue: (C - 315)(C - I5)z (C - 315)(C + I5)z (C + 15)(C - I5)z 8 8 8 -24 8 20 -20 20 -40 20 32 16 48 -48 48 A _-1 A=1 A=3 Note that each of these eigenvectors can be simplified by an appropriate scalar multiple, but we have shown here the actual vector obtained by the product specified in the theorem. T10 Contributed by Robert Beezer Statement [414] Suppose that A is an eigenvalue of A. Then there is an eigenvector x, such that Ax = Ax. We have, Ax = Ax = A2x = A(Ax) = A(Ax) = A(Ax) =A(Ax) x eigenvector of A A is idempotent x eigenvector of A Theorem MMSMM [201] x eigenvector of A From this we get 0 = A2x -.Ax (A2 - A)x Property DSAC [87] Since x is an eigenvector, it is nonzero, and Theorem SMEZV [287] leaves us with the conclusion that A2 - A = 0, and the solutions to this quadratic polynomial equation in A are A= 0 and A =1. The matrix 1 0 0 0 is idempotent (check this!) and since it is a diagonal matrix, its eigenvalues are the diagonal entries, A = 0 and A = 1, so each of these possible values for an eigenvalue of an idempotent matrix actually occurs as an eigenvalue of some idempotent matrix. So we cannot state any stronger conclusion about the eigenvalues of an idempotent matrix, and we can say that this theorem is the "best possible." T20 Contributed by Robert Beezer Statement [414] This problem asks you to prove that two sets are equal, so use Definition SE [684]. Version 2.02  Subsection EE.SOL Solutions 419 First show that {0} C E (A) n E (p). Choose x E {0}. Then x = 0. Eigenspaces are subspaces (Theorem EMS [404]), so both EA (A) and EA (p) contain the zero vector, and therefore x E SA (A) n EA (p) (Definition SI [685]). To show that EA (A) n SA (p) C {0}, suppose that x E SA (A) n SA (p). Then x is an eigenvector of A for both A and p (Definition SI [685]) and so x = 1x Property 0 [280] 1 1 (A -p) x A: p, A- p# 0 1 = (Ax - px) Property DSAC [87] 1 = (Ax - Ax) x eigenvector of A for A, p A -p 1 = (0) A -p = 0 Theorem ZVSM [286] So x = 0, and trivially, x E {0}. Version 2.02  Section PEE Properties of Eigenvalues and Eigenvectors 420 Section PEE Properties of Eigenvalues and Eigenvectors U.- --m The previous section introduced eigenvalues and eigenvectors, and concentrated on their existence and determination. This section will be more about theorems, and the various properties eigenvalues and eigenvectors enjoy. Like a good 4 x 100 meter relay, we will lead-off with one of our better theorems and save the very best for the anchor leg. Theorem EDELI Eigenvectors with Distinct Eigenvalues are Linearly Independent Suppose that A is an n x n square matrix and S = {xi, x2, x3, ..., xp} is a set of eigenvectors with eigenvalues A1, A2, A3, ..., AP such that A # A3 whenever i # j. Then S is a linearly independent set. Q Proof If p = 1, then the set S = {x1} is linearly independent since eigenvectors are nonzero (Definition EEM [396]), so assume for the remainder that p > 2. We will prove this result by contradiction (Technique CD [692]). Suppose to the contrary that S is a linearly dependent set. Define Si = {xi, x2, x3, ..., xi} and let k be an integer such that Sk_1 {x1, x2, x3, ..., xk-1} is linearly independent and Sk _ {x1, x2, x3, ..., xk} is linearly dependent. We have to ask if there is even such an integer k? First, since eigenvectors are nonzero, the set {x1} is linearly independent. 
Since we are assuming that S = S_p is linearly dependent, there must be an integer k, 2 ≤ k ≤ p, where the sets S_i transition from linear independence to linear dependence (and stay that way). In other words, x_k is the vector with the smallest index that is a linear combination of just vectors with smaller indices. Since {x_1, x_2, x_3, ..., x_k} is linearly dependent there are scalars, a_1, a_2, a_3, ..., a_k, some non-zero (Definition LI [308]), so that

0 = a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_k x_k

Then,

0 = (A - λ_k I_n) 0                                                                          Theorem ZVSM [286]
  = (A - λ_k I_n)(a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_k x_k)                               Definition RLD [308]
  = (A - λ_k I_n) a_1 x_1 + (A - λ_k I_n) a_2 x_2 + ... + (A - λ_k I_n) a_k x_k              Theorem MMDAA [201]
  = a_1 (A - λ_k I_n) x_1 + a_2 (A - λ_k I_n) x_2 + ... + a_k (A - λ_k I_n) x_k              Theorem MMSMM [201]
  = a_1 (A x_1 - λ_k I_n x_1) + a_2 (A x_2 - λ_k I_n x_2) + ... + a_k (A x_k - λ_k I_n x_k)  Theorem MMDAA [201]
  = a_1 (A x_1 - λ_k x_1) + a_2 (A x_2 - λ_k x_2) + ... + a_k (A x_k - λ_k x_k)              Theorem MMIM [200]
  = a_1 (λ_1 x_1 - λ_k x_1) + a_2 (λ_2 x_2 - λ_k x_2) + ... + a_k (λ_k x_k - λ_k x_k)        Definition EEM [396]
  = a_1 (λ_1 - λ_k) x_1 + a_2 (λ_2 - λ_k) x_2 + ... + a_k (λ_k - λ_k) x_k                    Theorem MMDAA [201]
  = a_1 (λ_1 - λ_k) x_1 + a_2 (λ_2 - λ_k) x_2 + ... + a_k (0) x_k                            Property AICN [681]
  = a_1 (λ_1 - λ_k) x_1 + a_2 (λ_2 - λ_k) x_2 + ... + a_{k-1} (λ_{k-1} - λ_k) x_{k-1} + 0    Theorem ZSSM [286]
  = a_1 (λ_1 - λ_k) x_1 + a_2 (λ_2 - λ_k) x_2 + ... + a_{k-1} (λ_{k-1} - λ_k) x_{k-1}        Property Z [280]

This is a relation of linear dependence on the linearly independent set {x_1, x_2, x_3, ..., x_{k-1}}, so the scalars must all be zero. That is, a_i (λ_i - λ_k) = 0 for 1 ≤ i ≤ k - 1. However, we have the hypothesis that the eigenvalues are distinct, so λ_i ≠ λ_k for 1 ≤ i ≤ k - 1. Thus a_i = 0 for 1 ≤ i ≤ k - 1.

This reduces the original relation of linear dependence on {x_1, x_2, x_3, ..., x_k} to the simpler equation a_k x_k = 0. By Theorem SMEZV [287] we conclude that a_k = 0 or x_k = 0. Eigenvectors are never the zero
Proof The equivalence of the first and last statements is the contrapositive of Theorem SMZE [420], so we are able to improve on Theorem NME7 [390]. U Certain changes to a matrix change its eigenvalues in a predictable way. Version 2.02  Section PEE Properties of Eigenvalues and Eigenvectors 422 Theorem ESMM Eigenvalues of a Scalar Multiple of a Matrix Suppose A is a square matrix and A is an eigenvalue of A. Then aA is an eigenvalue of oA. Q Proof Let x # 0 be one eigenvector of A for A. Then (oA) x = a (Ax) Theorem MMSMM [201] =a (Ax) x eigenvector of A = (a) x Property SMAC [86] So x # 0 is an eigenvector of oA for the eigenvalue &A. Unfortunately, there are not parallel theorems about the sum or product of arbitrary matrices. But we can prove a similar result for powers of a matrix. Theorem EOMP Eigenvalues Of Matrix Powers Suppose A is a square matrix, A is an eigenvalue of A, and s > 0 is an integer. Then As is an eigenvalue of As. Proof Let x # 0 be one eigenvector of A for A. Suppose A has size n. Then we proceed by induction on s (Technique I [694]). First, for s = 0, Asx = A0x = Inx = x Theorem MMIM [200] = 1x Property OC [87] = Aox Asx = A~x so As is an eigenvalue of As in this special case. If we assume the theorem is true for s, then we find As+lx = A8Ax = As (Ax) x eigenvector of A for A = A (Asx) Theorem MMSMM [201] = A (Asx) Induction hypothesis = (AA') x Property SMAC [86] = Asilx So x # 0 is an eigenvector of AS+1 for AS+1, and induction tells us the theorem is true for all s ;> 0. U While we cannot prove that the sum of two arbitrary matrices behaves in any reasonable way with regard to eigenvalues, we can work with the sum of dissimilar powers of the same matrix. We have already seen two connections between eigenvalues and polynomials, in the proof of Theorem EMHE [400] and the characteristic polynomial (Definition CP [403]). Our next theorem strengthens this connection. Theorem EPM Eigenvalues of the Polynomial of a Matrix Suppose A is a square matrix and A is an eigenvalue of A. Let q~x) be a polynomial in the variable z. Then q(A) is an eigenvalue of the matrix q(A). D Proof Let x # 0 be one eigenvector of A for A, and write q(x) = ao + a1x + a2x2 + - - - + amxm. Then q(A)x = (aoA0 + a1A1 + a2A2 +... + amAm) x Version 2.02  Section PEE Properties of Eigenvalues and Eigenvectors 423 = (aoA°)x + (a1A1)x + (a2A2)x + - - - + (amAm)x Theorem MMDAA [201] = ao(A0x) + ai(Alx) + a2(A2x) + - - - + am(Amx) Theorem MMSMM [201] = ao(A~x) + ai(Alx) + a2(A2x) + - - - + am(Amx) Theorem EOMP [421] = (aoA0)x + (a1Al)x + (a2A2)x + - - - + (amAm)x Property SMAC [86] = (aoA0 + a1A1 + a2A2 +... + amAm) x Property DSAC [87] = q(A)x So x -f 0 is an eigenvector of q(A) for the eigenvalue q(A). U Example BDE Building desired eigenvalues In Example ESMS4 [407] the 4 x 4 symmetric matrix 1 0 1 1 0 1 1 1 C 1 1 1 0 1 1 0 1_ is shown to have the three eigenvalues A = 3, 1, -1. Suppose we wanted a 4 x 4 matrix that has the three eigenvalues A = 4, 0, -2. We can employ Theorem EPM [421] by finding a polynomial that converts 3 to 4, 1 to 0, and -1to -2. Such a polynomial is called an interpolating polynomial, and in this example we can use 12 5 We will not discuss how to concoct this polynomial, but a text on numerical analysis should provide the details or see Section CF [847]. For now, simply verify that r(3) = 4, r(1) = 0 and r(-1) -2. 
Now compute r(C) =C2+C -5I4 4 4 3 2 2 2 1 0 1 1 1 0 0 0 1 2 3 2 2 0 1 1 1 5 0 1 0 0 42 2 3 2+ 1 1 1 0 4 0 0 1 0 _2 2 2 3 1 1 0 1_ 0 0 0 1_ ~1 1 3 3 1 1 1 3 3 2 3 3 1 1 Theorem EPM [421] tells us that if r(x) transforms the eigenvalues in the desired manner, then r(C) will have the desired eigenvalues. You can check this by computing the eigenvalues of r(C) directly. Furthermore, notice that the multiplicities are the same, and the eigenspaces of C and r(C) are identical. Inverses and transposes also behave predictably with regard to their eigenvalues. Theorem EIM Eigenvalues of the Inverse of a Matrix Suppose A is a square nonsingular matrix and A is an eigenvalue of A. Then jis an eigenvalue of the matrix A--1. Proof Notice that since A is assumed nonsingular, A-- exists by Theorem NI [228], but more importantly, does not involve division by zero since Theorem SMZE [420] prohibits this possibility. Version 2.02  Section PEE Properties of Eigenvalues and Eigenvectors 424 Let x # 0 be one eigenvector of A for A. Suppose A has size n. Then A-lx = A-1(1x) Property OC [87] = A-1( Ax) Property MICN [681] = -A-1(Ax) Theorem MMSMM [201] =-A-1(Ax) Definition EEM [396] = -(A-1A)x Theorem MMA [202] 1 = -Inx Definition MI [213] A x Theorem MMIM [200] So x # 0 is an eigenvector of A-1 for the eigenvalue A.I The theorems above have a similar style to them, a style you should consider using when confronted with a need to prove a theorem about eigenvalues and eigenvectors. So far we have been able to reserve the characteristic polynomial for strictly computational purposes. However, the next theorem, whose statement resembles the preceding theorems, has an easier proof if we employ the characteristic polynomial and results about determinants. Theorem ETM Eigenvalues of the Transpose of a Matrix Suppose A is a square matrix and A is an eigenvalue of A. Then A is an eigenvalue of the matrix At. D Proof Suppose A has size n. Then PA (x) = det (A - zIn) Definition CP [403] = det ((A - xIn)t) Theorem DT [377] = det (At - (xI)t) Theorem TMA [186] = det (At - xII) Theorem TMSM [187] = det (At - zln) Definition IM [72] = pAt (x) Definition CP [403] So A and At have the same characteristic polynomial, and by Theorem EMRCP [404], their eigenvalues are identical and have equal algebraic multiplicities. Notice that what we have proved here is a bit stronger than the stated conclusion in the theorem.U If a matrix has only real entries, then the computation of the characteristic polynomial (Definition CP [403]) will result in a polynomial with coefficients that are real numbers. Complex numbers could result as roots of this polynomial, but they are roots of quadratic factors with real coefficients, and as such, come in conjugate pairs. The next theorem proves this, and a bit more, without mentioning the characteristic polynomial. Theorem ERMCP Eigenvalues of Real Matrices come in Conjugate Pairs Suppose A is a square matrix with real entries and x is an eigenvector of A for the eigenvalue A. Then i is an eigenvector of A for the eigenvalue A. D Proof Ax = Ax A has real entries Version 2.02  Subsection PEE.ME Multiplicities of Eigenvalues 425 = Ax Theorem MMCC [203] Ax x eigenvector of A =Ax Theorem CRSM [167] So x is an eigenvector of A for the eigenvalue A. U This phenomenon is amply illustrated in Example CEMS6 [409], where the four complex eigenvalues come in two pairs, and the two basis vectors of the eigenspaces are complex conjugates of each other. 
Theorem ERMCP [423] can be a time-saver for computing eigenvalues and eigenvectors of real matrices with complex eigenvalues, since the conjugate eigenvalue and eigenspace can be inferred from the theorem rather than computed. Subsection ME Multiplicities of Eigenvalues A polynomial of degree n will have exactly n roots. From this fact about polynomial equations we can say more about the algebraic multiplicities of eigenvalues. Theorem DCP Degree of the Characteristic Polynomial Suppose that A is a square matrix of size n. Then the characteristic polynomial of A, PA (x), has degree n~. Q Proof We will prove a more general result by induction (Technique I [694]). Then the theorem will be true as a special case. We will carefully state this result as a proposition indexed by m, m > 1. P(m): Suppose that A is an m x m matrix whose entries are complex numbers or linear polynomials in the variable x of the form c - x, where c is a complex number. Suppose further that there are exactly k entries that contain x and that no row or column contains more than one such entry. Then, when k = m, det (A) is a polynomial in x of degree m, with leading coefficient +1, and when k < m, det (A) is a polynomial in x of degree k or less. Base Case: Suppose A is a 1 x 1 matrix. Then its determinant is equal to the lone entry (Definition DM [375]). When k = m = 1, the entry is of the form c - x, a polynomial in x of degree m = 1 with leading coefficient -1. When k < m, then k = 0 and the entry is simply a complex number, a polynomial of degree 0 < k. So P(1) is true. Induction Step: Assume P(m) is true, and that A is an (m + 1) x (m + 1) matrix with k entries of the form c - x. There are two cases to consider. Suppose k m m+1. Then every row and every column will contain an entry of the form c - z. Suppose that for the first row, this entry is in column t. Compute the determinant of A by an expansion about this first row (Definition DM [375]). The term associated with entry t of this row will be of the form (c - x)(-1)1+L det (A (1it)) The submatrix A (1|t) is an m x m matrix with k =m terms of the form c - x, no more than one per row or column. By the induction hypothesis, det (A (1|t)) will be a polynomial in x of degree m with coefficient t1. So this entire term is then a polynomial of degree m + 1 with leading coefficient t1. The remaining terms (which constitute the sum that is the determinant of A) are products of complex numbers from the first row with cofactors built from submatrices that lack the first row of A and lack some column of A, other than column t. As such, these submatrices are m x m matrices with k = m - 1 < m entries of the form c - x, no more than one per row or column. Applying the induction hypothesis, we see that these terms are polynomials in x of degree m - 1 or less. Adding the single term from the entry Version 2.02  Subsection PEE.ME Multiplicities of Eigenvalues 426 in column t with all these others, we see that det (A) is a polynomial in x of degree m + 1 and leading coefficient 1. The second case occurs when k < m + 1. Now there is a row of A that does not contain an entry of the form c - x. We consider the determinant of A by expanding about this row (Theorem DER [376]), whose entries are all complex numbers. The cofactors employed are built from submatrices that are m x m matrices with either k or k - 1 entries of the form c - x, no more than one per row or column. 
In either case, k < m, and we can apply the induction hypothesis to see that the determinants computed for the cofactors are all polynomials of degree k or less. Summing these contributions to the determinant of A yields a polynomial in x of degree k or less, as desired. Definition CP [403] tells us that the characteristic polynomial of an n x n matrix is the determinant of a matrix having exactly n entries of the form c - x, no more than one per row or column. As such we can apply P(n) to see that the characteristic polynomial has degree n. U Theorem NEM Number of Eigenvalues of a Matrix Suppose that A is a square matrix of size n with distinct eigenvalues Ai, A2, A3, ..., Ak. Then k AA (Ai) = n i=1 Proof By the definition of the algebraic multiplicity (Definition AME [406]), we can factor the charac- teristic polynomial as PA (x) = c(x - A1)aA(A1) ( - A2) (A2)(x - A3)a(A3) ... (x - Ak)aA(Ak) where c is a nonzero constant. (We could prove that c = (-1)", but we do not need that specificity right now. See Exercise PEE.T30 [429]) The left-hand side is a polynomial of degree n by Theorem DCP [424] and the right-hand side is a polynomial of degree zX1 a (Ai). So the equality of the polynomials' degrees gives the equality z_1 aA (Ai) = n. Theorem ME Multiplicities of an Eigenvalue Suppose that A is a square matrix of size n and A is an eigenvalue. Then 1 7A(A)< aA(A)< n Proof Since A is an eigenvalue of A, there is an eigenvector of A for A, x. Then x E SA (A), so 7yA (A) 1, since we can extend {x} into a basis of EA (A) (Theorem ELIS [355]). To show that 7yA (A) ca (A) is the most involved portion of this proof. To this end, let g = yA (A) and let x1, x2, x3, -.-., Xg be a basis for the eigenspace of A, EA (A). Construct another n~ - g vectors, yi, y2, yT3, ---, yn-g, so that {x1, x2, x3, -.-.-, xg, yi, y2, 3, - --, yn-g} is a basis of C". This can be done by repeated applications of Theorem ELIS [355]. Finally, define a matrix S by S = [x1x2x3 ... Xgly2y3 ... yn-y ]=[x1x2x3 ... xgR] Version 2.02  Subsection PEE.ME Multiplicities of Eigenvalues 427 where R is an n x (n - g) matrix whose columns are yi, y2, y3, --. , yn-g. The columns of S are linearly independent by design, so S is nonsingular (Theorem NMLIC [138]) and therefore invertible (Theorem NI [228]). Then, [e1|e2|e3| -..-en] = In = S--1S = S-1[x1|x2|x3 ... -|xgR] =[S--i1-x2|S x3| - - -S1xg|S1R] So S-1x2=e2 1 i g Preparations in place, we compute the characteristic polynomial of A, (*) PA (x) = det (A - xIn) = 1 det (A - zIn) = det (In) det (A - zln) = det (S-S) det (A - xIn) = det (S-1) det (S) det (A - xIn) = det (S-1) det (A - xIn) det (S) = det (S--1 (A - zln) S) Definition CP [403] Property OCN [681] Definition DM [375] Definition MI [213] Theorem DRMM [391] Property CMCN [680] Theorem DRMM [391] Theorem MMDAA [201] Theorem MMSMM [201] Theorem MMIM [200] Definition MI [213] Definition CP [403] det (S det (S- det (S- det (S- -1AS -1AS -1AS -1AS S-1xInS) xS--InS) xS-1S) zIn) PS-1As () What can we learn then about the matrix S-1AS? S-1AS = S-1A[xix2|x3| ... |xR] = S--[Ax1|Ax2|Ax3| ... Axg|AR] = S-1[Ax1IAx2|Ax3| ... AXg|AR] = [S - - A x i lS - - A x 2 S 1A x 3 | . .. S-A x S = [AS--i1AS--x2|AS x3| -. -|ASxg| S = [Ae1|Ae2|Ae3| -. -|Aeg|S1AR] 1AR] 1AR] Definition MM [197] Definition EEM [396] Definition MM [197] Theorem MMSMM [201] S--1S = In, ((*) above) Now imagine computing the characteristic polynomial of A by computing the characteristic polynomial of S-1AS using the form just obtained. 
The first g columns of S-1AS are all zero, save for a A on the diagonal. So if we compute the determinant by expanding about the first column, successively, we will get successive factors of (A - x). More precisely, let T be the square matrix of size n - g that is formed from the last n - g rows and last n - g columns of S-1AR. Then PA () = PS-1AS () = (A - x)oPT (X)- This says that (x-A) is a factor of the characteristic polynomial at least g times, so the algebraic multiplicity of A as an eigenvalue of A is greater than or equal to g (Definition AME [406]). In other words, /A (A) = 9 < o'A (A) Version 2.02  Subsection PEE.EHM Eigenvalues of Hermitian Matrices 428 as desired. Theorem NEM [425] says that the sum of the algebraic multiplicities for all the eigenvalues of A is equal to n. Since the algebraic multiplicity is a positive quantity, no single algebraic multiplicity can exceed n without the sum of all of the algebraic multiplicities doing the same. U Theorem MNEM Maximum Number of Eigenvalues of a Matrix Suppose that A is a square matrix of size n. Then A cannot have more than n distinct eigenvalues. D Proof Suppose that A has k distinct eigenvalues, A1, A2, A3, ..., Ak. Then k k= 1 i=1 1 Version 2.02  Subsection SD.FS Fibonacci Sequences 444 So the initial portion of the sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, .... In this subsection we will illustrate an application of eigenvalues and diagonalization through the determination of a closed-form expression for an arbitrary term of this sequence. To begin, verify that for any n > 1 the recursive statement above establishes the truth of the statement 1 [1 1 an _ 0 1 an_1 an+1_ 1 1_ .an _ Let A denote this 2 x 2 matrix. Through repeated applications of the statement above we have an =A n-1 A2 an-2 A3 an-3 n...A a0 an+1_ an _ an-_ an-2_ [a1 In preparation for working with this high power of A, not unlike in Example HPDM [441], we will diago- nalize A. The characteristic polynomial of A is PA (x) =9x2 - x - 1, with roots (the eigenvalues of A by Theorem EMRCP [404]) 1+ V5 1-V5 2 2 With two distinct eigenvalues, Theorem DED [440] implies that A is diagonalizable. It will be easier to compute with these eigenvalues once you confirm the following properties (all but the last can be derived from the fact that p and a are roots of the characteristic polynomial, in a factored or unfactored form) p+b=1 p =-1 1+p=p2 1+6= 62 p-6=v5 Then eigenvectors of A (for p and a, respectively) are 1 1 p -6 which can be easily confirmed, as we demonstrate for the eigenvector for p, 0 1 1_ p p 1 1 1 p 1+p_ p2 - From the proof of Theorem DC [436] we know A can be diagonalized by a matrix S with these eigenvectors as columns, giving D = S-1AS. We list S, S-1 and the diagonal matrix D, OK, we have everything in place now. The main step in the following is to replace A by SDS1. 
Here we Ean 1 -A2 [aol -(SDS-1)2 [ao] - SDS-1SDS-1SDS-1 -"-"- SDS_1 [ao] Dai_ =SDDD - - - DS_1[aol [ai_ Version 2.02  Subsection SD.FS Fibonacci Sequences 445 = SD"S_1 [ao 11 1 1 p 0 ) 1 - 1 a _p 6 0 060 p -0 p -1 a1 1 1 1 p" 0 -b 1 0 p -0 p 0 0 on p -1 1 1 1 1 p" 0 1 p -06 p 0 on -1_ - L_1 1 1 p " p -06 p -o" 1 p" - O = p - n+1 _6n+1 Performing the scalar multiplication and equating the first entries of the two vectors, we arrive at the closed form expression a = 1 -0( p" - ) p - b 11 (+ n 1 -V 14 =1+ - 1 - n 2ndV Notice that it does not matter whether we use the equality of the first or second entries of the vectors, we will arrive at the same formula, once in terms of n and again in terms of n +1. Also, our definition clearly describes a sequence that will only contain integers, yet tpco theo irrational number i might make us suspicious. But no, our expression for a will always yield an integer! The Fibonacci sequence, and generalizations of it, have been extensively studied (Fibonacci lived in the 12th and 13th centuries). There are many ways to derive the closed-form expression we just found, and our approach may not be the most efficient route. But it is a nice demonstration of how diagonalization can be used to solve a problem outside the field of linear algebra. We close this section with a commen t an important upcoming theorem that we prove in Chater theseinabl (Deighintn ZM[43]) stud the limlaritygebransformaie statmn acofmplishe diagonalization sl applies to a slightly broader class of matrices, known as "normal" matrices (Definition NRML [606]), which are matrices that commute with their adjoints. With this expanded category of matrices, the result becomes an equivalence (Technique E [690]). See Theorem OD [607] and Theorem OBNM [609] in Section OD [601] for all the details. Version 2.02  Subsection SD.READ Reading Questions 446 Subsection READ Reading Questions 1. What is an equivalence relation? 2. State a condition that is equivalent to a matrix being diagonalizable, but is not the definition. 3. Find a diagonal matrix similar to A =[5 8 -4 7 Version 2.02  Subsection SD.EXC Exercises 447 Subsection EXC Exercises C20 Consider the matrix A below. First, show that A is diagonalizable by computing the geometric multiplicities of the eigenvalues and quoting the relevant theorem. Second, find a diagonal matrix D and a nonsingular matrix S so that S-1AS = D. (See Exercise EE.C20 [414] for some of the necessary computations.) 18 -15 33 -15 A=-4 8 -6 6 A -9 9 -16 9 5 -6 9 -4] Contributed by Robert Beezer Solution [447] C21 Determine if the matrix A below is diagonalizable. If the matrix is diagonalizable, then find a diagonal matrix D that is similar to A, and provide the invertible matrix S that performs the similarity transformation. You should use your calculator to find the eigenvalues of the matrix, but try only using the row-reducing function of your calculator to assist with finding eigenvectors. 1 9 9 24 A_= -3 -27 -29 -68 1 11 13 26 1 7 7 18 Contributed by Robert Beezer Solution [447] C22 Consider the matrix A below. Find the eigenvalues of A using a calculator and use these to construct the characteristic polynomial of A, PA (x). State the algebraic multiplicity of each eigenvalue. Find all of the eigenspaces for A by computing expressions for null spaces, only using your calculator to row-reduce matrices. State the geometric multiplicity of each eigenvalue. Is A diagonalizable? If not, explain why. If so, find a diagonal matrix D that is similar to A. 
19 25 30 5 A -23-30 -35 -5 A 7 9 10 1 -3 -4 -5 -1_ Contributed by Robert Beezer Solution [448] T15 Suppose that A and B are similar matrices. Prove that As and B3 are similar matrices. Generalize. Contributed by Robert Beezer Solution [449] T16 Suppose that A and B are similar matrices, with A nonsingular. Prove that B is nonsingular, and that A-- is similar to B-1. Contributed by Robert Beezer Solution [449] T17 Suppose that B is a nonsingular matrix. Prove that AB is similar to BA. Contributed by Robert Beezer Solution [449] Version 2.02  Subsection SD.SOL Solutions 448 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [446] Using a calculator, we find that A has three distinct eigenvalues, A = 3, 2, -1, with A = 2 having algebraic multiplicity two, cA (2) = 2. The eigenvalues A = 3, -1 have algebraic multiplicity one, and so by Theorem ME [425] we can conclude that their geometric multiplicities are one as well. Together with the computation of the geometric multiplicity of A = 2 from Exercise EE.C20 [414], we know 7A(3)= aA (3)= 1 YA (2)= aA (2)= 2 7A (-I) = aA (-I) = I This satisfies the hypotheses of Theorem DMFE [438], and so we can conclude that A is diagonalizable. A calculator will give us four eigenvectors of A, the two for A = 2 being linearly independent presumably. Or, by hand, we could find basis vectors for the three eigenspaces. For A = 3, -1 the eigenspaces have dimension one, and so any eigenvector for these eigenvalues will be multiples of the ones we use below. For A = 2 there are many different bases for the eigenspace, so your answer could vary. Our eigenvectors are the basis vectors we would have obtained if we had actually constructed a basis in Exercise EE.C20 [414] rather than just computing the dimension. By the construction in the proof of Theorem DC [436], the required matrix S has columns that are four linearly independent eigenvectors of A and the diagonal matrix has the eigenvalues on the diagonal (in the same order as the eigenvectors in S). Here are the pieces, "doing" the diagonalization, -1 -2 0 [1 0 -3 6 - 18 -1 -1 0 -4 0 1 -3 -9 1 0 1_ 5 -15 8 9 -6 33 -15 -1 -6 6 -2 -16 9 0 9 -4] _[1 0 -3 6 3 0 0 0 -1 -1 0 0 2 0 0 0 1 -3 0 0 2 0 1 0 1 _ 0 0 0 -1_ C21 Contributed by Robert Beezer Statement [446] A calculator will provide the eigenvalues A = 2, 2, 1, 0, so we can reconstruct the characteristic polynomial as PA (x) =_(x - 2)2(x - 1)x so the algebraic multiplicities of the eigenvalues are aA (2) = 2 aA (1) = 1 aA (0) = 1 Now compute eigenspaces by hand, obtaining null spaces for each of the three eigenvalues by constructing the correct singular matrix (Theorem EMNS [405]), A -2I4- 1 1 9 -29 11 7 9 -29 11 7 EA (2) = (A - 2I4) K= { 24 -68 26 16 _ 24] -68 26] 17 _ [1 RREF 0 0 0 1 RREF 0 0 [0 0 1 0 0 K 0 1 0 0 { -3 2 2 0 0] 3 05 _2 _ - 3 13 0_ 0 -1 1 0 A -1I4 < 1 9 -28 11 7 9 -29 12 7 0 1 0 0 0 0 1 0 Version 2.02  Subsection SD.SOL Solutions 449 EA (1) = V(A - I4) = { [ A -0I4 1 -3 1 1 9 -27 11 7 9 -29 13 7 5 ( 5 3 13 _13 3 5 33 24 1 0 0 -3 -68 RREF 0 1 0 5 26 0 0 1 -2 18 0 0 0 0 3 -5 2 1] EA(O) = A (A - -14) = L From this we can compute the dimensions of the eigenspaces to obtain the geometric multiplicities, 7A (2) =2 7A (1)= 1 7A (0) =1 For each eigenvalue, the algebraic and geometric multiplicities are equal and so by Theorem DMFE [438] we now know that A is diagonalizable. 
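As a cross-check of the multiplicity computation just completed (this sketch is not part of the original solution), one can have a computer compare the algebraic and geometric multiplicities directly. It uses the matrix from Exercise C21 as printed above, and computes the geometric multiplicity of each eigenvalue λ as n minus the rank of A - λI.

```python
import numpy as np

# Matrix from Exercise C21, as printed above.
A = np.array([[ 1,   9,   9,  24],
              [-3, -27, -29, -68],
              [ 1,  11,  13,  26],
              [ 1,   7,   7,  18]], dtype=float)
n = A.shape[0]

eigenvalues = np.round(np.linalg.eigvals(A).real, 6)

for lam in sorted(set(eigenvalues), reverse=True):
    alpha = int(np.sum(np.isclose(eigenvalues, lam)))        # algebraic multiplicity
    gamma = n - np.linalg.matrix_rank(A - lam * np.eye(n))    # geometric multiplicity
    print(f"lambda = {lam}: alpha = {alpha}, gamma = {gamma}")

# Theorem DMFE: A is diagonalizable exactly when alpha equals gamma for
# every eigenvalue; per the solution above we expect equality for
# lambda = 2 (multiplicity 2), lambda = 1 and lambda = 0.
```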
The construction in Theorem DC [436] suggests we form a matrix whose columns are eigenvectors of A 3 0 S = 5-1 0 1 2 0 5 3 -13 -5 5 2 3 1_ Since det (S) = -1 -f 0, we know that S is nonsingular (Theorem SMZD [389]), so the columns of S are a set of 4 linearly independent eigenvectors of A. By the proof of Theorem SMZD [389] we know 2 0 0 0 S--'AS =0 2 0 0 0 1 0 _0 0 0 0_ a diagonal matrix with the eigenvalues of A along the diagonal, in the same order as the associated eigenvectors appear as columns of S. C22 Contributed by Robert Beezer Statement [446] A calculator will report A = 0 as an eigenvalue of algebraic multiplicity of 2, and A = -1 as an eigenvalue of algebraic multiplicity 2 as well. Since eigenvalues are roots of the characteristic polynomial (Theorem EMRCP [404]) we have the factored version PA (x) =_(x - 0)2(x - (-1))2 =9z2(2 + 2x + 1) = z4 + 2x3 + 2 The eigenspaces are then A=0 A - (0)I4 - 19 -23 7 -3 25 -30 9 -4 30 -35 10 -5 5 0 -i -5 RREF 0 1 0 0 -1] 0 0 5 5 -5 -4 .0 _1 -5 5 0 0 -5 4 0 0] SA (0) =Af(C- (0)14) = Version 2.02  Subsection SD.SOL Solutions 450 A = -1 A - (-1)I4 20 -23 7 -3 25 -29 9 -4 30 -35 11 -5 5] -5 1 0] Fi0 RREF 0 0L0 1 -4 2 3 _ _0 0j 1 -1 2 0 0 4 -3 0 0_ SA (-1) = f(C - (-1)14) = { L Each eigenspace above is described by a spanning set obtained through an application of Theorem BNS [139] and so is a basis for the eigenspace. In each case the dimension, and therefore the geometric multiplicity, is 2. For each of the two eigenvalues, the algebraic and geometric multiplicities are equal. Theorem DMFE [438] says that in this situation the matrix is diagonalizable. We know from Theorem DC [436] that when we diagonalize A the diagonal matrix will have the eigenvalues of A on the diagonal (in some order). So we can claim that 0 0 0 0 D=0 0 0 0 D 0 0 -1 0 _0 0 0 -1_ T15 Contributed by Robert Beezer Statement [446] By Definition SIM [432] we know that there is a nonsingular matrix S so that A = S-1BS. Then A3 (S1BS)3 (S- BS) (S-1BS) (S-1BS) S-1B(SS-1)B(SS--)BS S--1B(I3)B(I3)BS S-1BBBS S--1B3S Theorem MMA [202] Definition MI [213] Theorem MMIM [200] This equation says that A3 is similar to B3 (via the matrix S). More generally, if A is similar to B, and m is a non-negative integer, then Am is similar to Bm. This can be proved using induction (Technique I [694]). T16 Contributed by Steve Canfield Statement [446] A being similar to B means that there exists an S such that A = S-1BS. So, B = SAS-1 and because S, A, and S-i are nonsingular, by Theorem NPNT [226], B is nonsingular. (S--BS) S--B-1 (S--)i Definition SIM [432] Theorem SS [219] S-1B-1S Theorem MIMI [220] Then by Definition SIM [432], A-1 is similar to B-1. T17 Contributed by Robert Beezer Statement [446] The nonsingular (invertible) matrix B will provide the desired similarity transformation, B-1 (BA) B = (B1B) (AB) = InAB Theorem MMA [202] Definition MI [213] Version 2.02  Subsection SD.SOL Solutions 451 = AB Theorem MMIM [200] Version 2.02  Annotated Acronyms SD.E Eigenvalues 452 Annotated Acronyms E Eigenvalues Theorem EMRCP [404] Much of what we know about eigenvalues can be traced to analysis of the characteristic polynomial. When we first defined eigenvalues, you might have wondered if they were scarce, or abundant. The characteristic polynomial allows us to answer a question like this with a result like Theorem NEM [425] which tells us there are always a few eigenvalues, but never too many. 
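To see the point of Theorem EMRCP [404] computationally, here is a small sketch (not from the text, and using an arbitrary matrix chosen only for illustration) that compares the roots of the characteristic polynomial with the eigenvalues reported by a numerical routine; the two lists agree up to ordering and round-off, and there are never more than n of them.

```python
import numpy as np

# An arbitrary 3x3 matrix, chosen only to illustrate Theorem EMRCP.
A = np.array([[ 2, 1, 0],
              [ 0, 3, 1],
              [-1, 0, 1]], dtype=float)

# Coefficients of the characteristic polynomial, highest degree first.
char_poly = np.poly(A)
roots = np.roots(char_poly)

# Eigenvalues computed directly agree with the roots of the polynomial.
eigs = np.linalg.eigvals(A)
print(np.sort_complex(roots))
print(np.sort_complex(eigs))
```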
Theorem EMNS [405] If Theorem EMRCP [404] allows us to learn about eigenvalues through what we know about roots of polynomials, then Theorem EMNS [405] allows us to learn about eigenvectors, and eigenspaces, from what we already know about null spaces. These two theorems, along with Definition EEM [396], provide the starting points for discerning the properties of eigenvalues and eigenvectors (to say nothing of actually computing them). Theorem HMRE [427] As we have remarked before, we choose to include all of the complex numbers in our set of allowed scalars, whereas many introductory texts restrict their attention to just the real numbers. Here is one of the payoffs to this approach. Begin with a matrix, possibly containing complex entries, and require the matrix to be Hermitian (Definition HM [205]). In the case of only real entries, this boils down to just requiring the matrix to be symmetric (Definition SYM [186]). Generally, the roots of a characteristic polynomial, even with all real coefficients, can have complex numbers as roots. But for a Hermitian matrix, all of the eigenvalues are real numbers! When somebody tells you mathematics can be beautiful, this is an example of what they are talking about. Theorem DC [436] Diagonalizing a matrix, or the question of if a matrix is diagonalizable, could be viewed as one of a handful of central questions in linear algebra. Here we have an unequivocal answer to the question of "if," along with a proof containing a construction for the diagonalization. So this theorem is of theoretical and computational interest. This topic will be important again in Chapter R [530]. Theorem DMFE [438] Another unequivocal answer to the question of if a matrix is diagonalizable, with perhaps a simpler condi- tion to test. The proof also tells us how to construct the necessary set of n linearly independent eigenvectors -just round up bases for each eigenspace and join them together. No need to test the linear independence of the combined set. Version 2.02  Chapter LT Linear Transformations 0 -0 In the next linear algebra course you take, the first lecture might be a reminder about what a vector space is (Definition VS [279]), their ten properties, basic theorems and then some examples. The second lecture would likely be all about linear transformations. While it may seem we have waited a long time to present what must be a central topic, in truth we have already been working with linear transformations for some time. Functions are important objects in the study of calculus, but have been absent from this course until now (well, not really, it just seems that way). In your study of more advanced mathematics it is nearly impossible to escape the use of functions -they are as fundamental as sets are. Section LT Linear Transformations Early in Chapter VS [279] we prefaced the definition of a vector space with the comment that it was "one of the two most important definitions in the entire course." He comes the other. Any capsule summary of linear algebra would have to describe the subject as the interplay of linear transformations and vector spaces. Here we go. Subsection LT Linear Transformations Definition LT Linear Transformation A linear transformation, T: U H V, is a function that carries elements of the vector space U (called the domain) to the vector space V (called the codomain), and which has two additional properties 1. T (ui+ u2) = T (ui) + T (u2) for all ui, u2 E U 2. T (au) =aT (u) for all u E&U and all ca EC (This definition contains Notation LT.) 
The two defining conditions in the definition of a linear transformation should "feel linear," whatever that means. Conversely, these two conditions could be taken as exactly what it means to be linear. As every vector space property derives from vector addition and scalar multiplication, so too, every property of a linear transformation derives from these two defining properties. While these conditions may be reminiscent of how we test subspaces, they really are quite different, so do not confuse the two.

Here are two diagrams that convey the essence of the two defining properties of a linear transformation. In each case, begin in the upper left-hand corner, and follow the arrows around the rectangle to the lower-right hand corner, taking two different routes and doing the indicated operations labeled on the arrows. There are two results there. For a linear transformation these two expressions are always equal.

[Diagram: u_1, u_2 map under T to T(u_1), T(u_2), while u_1 + u_2 maps under T to T(u_1 + u_2) = T(u_1) + T(u_2)]
Diagram DLTA. Definition of Linear Transformation, Additive

[Diagram: u maps under T to T(u), while αu maps under T to T(αu) = αT(u)]
Diagram DLTM. Definition of Linear Transformation, Multiplicative

A couple of words about notation. T is the name of the linear transformation, and should be used when we want to discuss the function as a whole. T(u) is how we talk about the output of the function, it is a vector in the vector space V. When we write T(x + y) = T(x) + T(y), the plus sign on the left is the operation of vector addition in the vector space U, since x and y are elements of U. The plus sign on the right is the operation of vector addition in the vector space V, since T(x) and T(y) are elements of the vector space V. These two instances of vector addition might be wildly different.

Let's examine several examples and begin to form a catalog of known linear transformations to work with.

Example ALT  A linear transformation
Define T: \mathbb{C}^3 \to \mathbb{C}^2 by describing the output of the function for a generic input with the formula

T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix}

and check the two defining properties.

T(x + y) = T\left(\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}\right)
= \begin{bmatrix} 2(x_1 + y_1) + (x_3 + y_3) \\ -4(x_2 + y_2) \end{bmatrix}
= \begin{bmatrix} (2x_1 + x_3) + (2y_1 + y_3) \\ -4x_2 + (-4)y_2 \end{bmatrix}
= \begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix} + \begin{bmatrix} 2y_1 + y_3 \\ -4y_2 \end{bmatrix}
= T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) + T\left(\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}\right)
= T(x) + T(y)

and

T(\alpha x) = T\left(\alpha \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = T\left(\begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \alpha x_3 \end{bmatrix}\right)
= \begin{bmatrix} 2(\alpha x_1) + (\alpha x_3) \\ -4(\alpha x_2) \end{bmatrix}
= \begin{bmatrix} \alpha(2x_1 + x_3) \\ \alpha(-4x_2) \end{bmatrix}
= \alpha \begin{bmatrix} 2x_1 + x_3 \\ -4x_2 \end{bmatrix}
= \alpha T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)
= \alpha T(x)

So by Definition LT [452], T is a linear transformation.

It can be just as instructive to look at functions that are not linear transformations. Since the defining conditions must be true for all vectors and scalars, it is enough to find just one situation where the properties fail.

Example NLT  Not a linear transformation
Define S: \mathbb{C}^3 \to \mathbb{C}^3 by

S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 + 2x_2 \\ 0 \\ x_1 + 3x_3 - 2 \end{bmatrix}

This function "looks" linear, but consider

3\, S\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = 3 \begin{bmatrix} 8 \\ 0 \\ 8 \end{bmatrix} = \begin{bmatrix} 24 \\ 0 \\ 24 \end{bmatrix}

while

S\left(3 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = S\left(\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}\right) = \begin{bmatrix} 24 \\ 0 \\ 28 \end{bmatrix}

So the second required property fails for the choice of α = 3 and x = [1, 2, 3]^t, and by Definition LT [452], S is not a linear transformation. It is just about as easy to find an example where the first defining property fails (try it!). Notice that it is the "-2" in the third component of the definition of S that prevents the function from being a linear transformation.
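A quick numerical sanity check of these two examples can be helpful; the sketch below is illustrative only and not part of the text. It tests the two defining properties of Definition LT [452] on a few concrete vectors and scalars, and watches T from Example ALT pass while S from Example NLT fails.

```python
import numpy as np

def T(x):
    # Example ALT: T(x1, x2, x3) = (2*x1 + x3, -4*x2), a linear transformation.
    return np.array([2 * x[0] + x[2], -4 * x[1]])

def S(x):
    # Example NLT: the "- 2" in the third component spoils linearity.
    return np.array([4 * x[0] + 2 * x[1], 0, x[0] + 3 * x[2] - 2])

rng = np.random.default_rng(1)
x, y = rng.integers(-5, 6, size=3), rng.integers(-5, 6, size=3)
alpha = 3

# T satisfies both defining properties (for these, and in fact all, inputs).
print(np.allclose(T(x + y), T(x) + T(y)))        # True
print(np.allclose(T(alpha * x), alpha * T(x)))   # True

# S fails the scalar-multiplication property, e.g. with alpha = 3, x = (1, 2, 3).
x0 = np.array([1, 2, 3])
print(S(alpha * x0), alpha * S(x0))              # [24  0 28] vs [24  0 24]
```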
Example LTPM Linear transformation, polynomials to matrices Define a linear transformation T: P3 1 J-M22 by T (a+bx+cx2+dx3)_ajb a-2c We verify the two defining conditions of a linear transformations. T(x + y) = Tl((ai + bizx+ cix2 + dix3)+-(a2+b2x+-c2x2 +d2x3)) = T ((ai + a2) + (bi + b2)x + (ci + c2)x2 + (di + d2)x3) (ai+a2)+(bi+b2) (ai+a2) - 2(ci + c2) d1 + d2 (b1 + b2) - (d1+ d2) (ai,+ bi) + (a2 + b2) (ai- 2c1) + (a2 - 2c2) +d1+d2 (bi - di) + (b2 - d2) E ai+bi ai- 2c11 a2+b2 a2-2c2 d1 b1-d1_ d2 b2-d2_ = T (ail+ bizx+ ci2 + dix3)+-T (a2+-b2x+-c2x2 +d2x3) = T (x) + T (y) and T (ax) = T (a(a+ bz + cx2 +dx3)) = T ((aa) + (ab)x + (ac)x2 + (ad)x3) [(a) + (ab) (a) - 2(ac)1 [ ad (ab) - (ad) _aja +b) aja -2c)1 Eacd ajb -d) _ [a+b a-2c] =caT (a+b+cz9+dz3) = aT (x) So by Definition LT [452], T is a linear transformation. Example LTPP Linear transformation, polynomials to polynomials Define a function S: P4 H P5 by S(p(x)) = (x - 2)p(x) Version 2.02  Subsection LT.LTC Linear Transformation Cartoons 457 Then S (p(x) + q(x)) = (x - 2)(p(x) + q(x)) = (x - 2)p(x) + (x - 2)q(x) = S (p(x)) + S (q(x)) S (ap(x)) = (x - 2)(ap(x)) = (x - 2)cap(x) = a(x - 2)p(x) = aS (p(x)) So by Definition LT [452], S is a linear transformation. Linear transformations have many amazing properties, which we will investigate through the next few sections. However, as a taste of things to come, here is a theorem we can prove now and put to use immediately. Theorem LTTZZ Linear Transformations Take Zero to Zero Suppose T: U H V is a linear transformation. Then T (0) = 0. D Proof The two zero vectors in the conclusion of the theorem are different. The first is from U while the second is from V. We will subscript the zero vectors in this proof to highlight the distinction. Think about your objects. (This proof is contributed by Mark Shoemaker). T (Ou) = T (OOu) = OT (Oy) = ov Theorem ZSSM [286] in U Definition LT [452] Theorem ZSSM [286] in V 0 Return to Example NLT [454] and compute S 0 0 0 = 0 to quickly see again 0_ -2] that S is not 0 0 L0 0- as an a linear transformation, while in Example LTPM [455] compute S (0 + Ox + 0x2 + 0x3) example of Theorem LTTZZ [456] at work. Subsection LTC Linear Transformation Cartoons Throughout this chapter, and Chapter R [530], we will include drawings of linear transformations. We will call them "cartoons," not because they are humorous, but because they will only expose a portion of the truth. A Bugs Bunny cartoon might give us some insights on human nature, but the rules of physics and biology are routinely (and grossly) violated. So it will be with our linear transformation cartoons. Here is our first, followed by a guide to help you understand how these are meant to describe fundamental truths about linear transformations, while simultaneously violating other truths. Version 2.02  Subsection LT.MLT Matrices and Linear Transformations 458 T U V Diagram GLT. General Linear Transformation Here we picture a linear transformation T: U H V, where this information will be consistently displayed along the bottom edge. The ovals are meant to represent the vector spaces, in this case U, the domain, on the left and V, the codomain, on the right. Of course, vector spaces are typically infinite sets, so you'll have to imagine that characteristic of these sets. A small dot inside of an oval will represent a vector within that vector space, sometimes with a name, sometimes not (in this case every vector has a name). 
The sizes of the ovals are meant to be proportional to the dimensions of the vector spaces. However, when we make no assumptions about the dimensions, we will draw the ovals as the same size, as we have done here (which is not meant to suggest that the dimensions have to be equal). To convey that the linear transformation associates a certain input with a certain output, we will draw an arrow from the input to the output. So, for example, in this cartoon we suggest that T (x) = y. Nothing in the definition of a linear transformation prevents two different inputs being sent to the same output and we see this in T (u) = v = T (w). Similarly, an output may not have any input being sent its way, as illustrated by no arrow pointing at t. In this cartoon, we have captured the essence of our one general theorem about linear transformations, Theorem LTTZZ [456], T (OU) = Ov. On occasion we might include this basic fact when it is relevant, at other times maybe not. Note that the definition of a linear transformation requires that it be a function, so every element of the domain should be associated with some element of the codomain. This will be reflected by never having an element of the domain without an arrow originating there. These cartoons are of course no substitute for careful definitions and proofs, but they can be a handy way to think about the various properties we will be studying. Subsection MLT Matrices and Linear Transformations If you give me a matrix, then I can quickly build you a linear transformation. Always. First a motivating example and then the theorem. Example LTM Linear transformation from a matrix Let 3 -1 8 1 A = 2 0 5 -2 1 1 3 -7 Version 2.02  Subsection LT.MLT Matrices and Linear Transformations 459 and define a function P: C4 H C3 by P (x) =Ax So we are using an old friend, the matrix-vector product (Definition MVP [194]) as a way to convert a vector with 4 components into a vector with 3 components. Applying Definition MVP [194] allows us to write the defining formula for P in a slightly different form, 3 -1 8 P(x)=Ax = 2 0 5 1 1 3 1 [ 3] -2 2 -7 X3 1 _x4_ -1 8 + x2 0 + x3 5 1 3 1 + X4 - -7 So we recognize the action of the function P as using the components of the vector (z1, z2, x3, 34) as scalars to form the output of P as a linear combination of the four columns of the matrix A, which are all members of C3, so the result is a vector in C3. We can rearrange this expression further, using our definitions of operations in C3 (Section VO [83]). P (x) = Ax .3 -1 8 = xi 2 + x2 0 + x3 5 1 1 3 3xi -x2 8X3 = 2x1 + 0 + 5X3 + I] [ 2] _ 3x3_ 3xi - x2 + 8x3 +3:4 = 2x1+ 533-2x4 Lz1 + X2 + 3X3 - 7X4] +34 1 X4 -2X4 -74] 2] 7 Definition of P Definition MVP [194] Definition CVSM [85] Definition CVA [84] You might recognize this final expression as being similar in style to some previous examples (Example ALT [453]) and some linear transformations defined in the archetypes (Archetype M [754] through Archetype R [769]). But the expression that says the output of this linear transformation is a linear combination of the columns of A is probably the most powerful way of thinking about examples of this type. Almost forgot we should verify that P is indeed a linear transformation. This is easy with two matrix properties from Section MM [194]. and P (x + y)= A (x + y) = Ax + Ay =P(x) +P(y) P (ax) = A (ax) = a (Ax) = aP (x) Definition of P Theorem MMDAA [201] Definition of P Definition of P Theorem MMSMM [201] Definition of P So by Definition LT [452], P is a linear transformation. 
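The "output is a linear combination of the columns" viewpoint from Example LTM is easy to see numerically. The sketch below is an illustration (not from the text): it uses the matrix A of that example, checks that Ax equals the linear combination of the columns of A with the entries of x as scalars, and verifies the two linearity properties for P(x) = Ax.

```python
import numpy as np

# Matrix A from Example LTM; P(x) = A x defines P: C^4 -> C^3.
A = np.array([[3, -1, 8,  1],
              [2,  0, 5, -2],
              [1,  1, 3, -7]], dtype=float)

def P(x):
    return A @ x

x = np.array([2.0, -1.0, 3.0, 1.0])    # any vector of C^4 will do
y = np.array([1.0,  4.0, 0.0, -2.0])
alpha = 7.0

# The matrix-vector product is the linear combination of the columns of A
# with the entries of x as the scalars (Definition MVP).
as_combination = sum(x[j] * A[:, j] for j in range(4))
print(np.allclose(P(x), as_combination))           # True

# P is a linear transformation (Theorem MBLT).
print(np.allclose(P(x + y), P(x) + P(y)))          # True
print(np.allclose(P(alpha * x), alpha * P(x)))     # True
```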
So the multiplication of a vector by a matrix "transforms" the input vector into an output vector, possibly of a different size, by performing a linear combination. And this transformation happens in a "linear" fashion. This "functional" view of the matrix-vector product is the most important shift you can make right now in how you think about linear algebra. Here's the theorem, whose proof is very nearly an exact copy of the verification in the last example. Version 2.02  Subsection LT.MLT Matrices and Linear Transformations 460 Theorem MBLT Matrices Build Linear Transformations Suppose that A is an m x n matrix. Define a function T: transformation. Proof Cm a Ctm by T (x) = Ax. Then T is a linear D- T (x + y)= A (x + y) = Ax + Ay = T (x) + T (y) Definition of T Theorem MMDAA [201] Definition of T and T (ax) = A (ax) = a (Ax) = aT (x) Definition of T Theorem MMSMM [201] Definition of T So by Definition LT [452], T is a linear transformation. 0 So Theorem MBLT [459] gives us a rapid way to construct linear transformations. Grab an m x n matrix A, define T (x) = Ax and Theorem MBLT [459] tells us that T is a linear transformation from C" to Cm, without any further checking. We can turn Theorem MBLT [459] around. You give me a linear transformation and I will give you a matrix. Example MFLT Matrix from a linear transformation Define the function R: C3 - C4 by [ 2x1 - 3x2 + 4x3 X1 R x2 = _X 2+X -i + 52 -3x3 -43:3 jX You could verify that R is a linear transformation by applying the definition, but we will instead massage the expression defining a typical output until we recognize the form of a known class of linear transformations. -I R x2 x3] 2x1 - 3x2 + 433 X1 + X2 +3 -xi + 5x2- 3x3 2xi ~-3x2 + - ] + 0 _ X 2 _ _ 2 -3 1 1 XI -1+ 52 j+ 0 1 2 -3 4- 11 1 1 -1 5 -3 X 0 1 -4 za_ 4x3 X3 -3x3 -4X3_ 4 1 X3 3 L-4j Definition CVA [84] Definition CVSM [85] Definition MVP [194] Version 2.02  Subsection LT.MLT Matrices and Linear Transformations 461 So if we define the matrix 2 -3 4 1 1 1 B -1 5 -3 0 1 -4 then R (x) = Bx. By Theorem MBLT [459], we can easily recognize R as a linear transformation since it has the form described in the hypothesis of the theorem. Example MFLT [459] was not accident. Consider any one of the archetypes where both the domain and codomain are sets of column vectors (Archetype M [754] through Archetype R [769]) and you should be able to mimic the previous example. Here's the theorem, which is notable since it is our first occasion to use the full power of the defining properties of a linear transformation when our hypothesis includes a linear transformation. Theorem MLTCV Matrix of a Linear Transformation, Column Vectors Suppose that T: C" Cm is a linear transformation. Then there is an m x n matrix A such that T (x) =Ax. D Proof The conclusion says a certain matrix exists. What better way to prove something exists than to actually build it? So our proof will be constructive (Technique C [690]), and the procedure that we will use abstractly in the proof can be used concretely in specific examples. Let ei, e2, e3, ..., en be the columns of the identity matrix of size n, In (Definition SUV [173]). Evaluate the linear transformation T with each of these standard unit vectors as an input, and record the result. In other words, define n vectors in Cm, Ai, 1 < i n by Ai = T (eZ) Then package up these vectors as the columns of a matrix A = [A1|A2|A3| -..-An] Does A have the desired properties? First, A is clearly an m x n matrix. 
Then T (x) = T (Inx) Theorem MMIM [200] = T ([eie2e3| -. - den] x) Definition SUV [173] = T ([x]1 ei + [x]2 e2 + [x]3 e3 + -- --+ [x] en) Definition MVP [194] = T ([x]1 ei) + T ([x]2 e2) + T ([x]3 e3) + - --+ T ([x]n en) Definition LT [452] [x]1 T (ei) + [x]2 T (e2) + [x]3 T (e3) + -. --+ [x]n T (en) Definition LT [452] =[x]1 A1 + [x]2 A2 + [x]3 A3 +| -. + | [x], As Definition of Ai = Ax Definition MVP [194] as desired.U So if we were to restrict our study of linear transformations to those where the domain and codomain are both vector spaces of column vectors (Definition VSCV [83]), every matrix leads to a linear transformation of this type (Theorem MBLT [459]), while every such linear transformation leads to a matrix (Theorem MLTCV [460]). So matrices and linear transformations are fundamentally the same. We call the matrix A of Theorem MLTCV [460] the matrix representation of T. We have defined linear transformations for more general vector spaces than just Cm, can we extend this correspondence between linear transformations and matrices to more general linear transformations (more general domains and codomains)? Yes, and this is the main theme of Chapter R [530]. Stay tuned. For now, let's illustrate Theorem MLTCV [460] with an example. Version 2.02  Subsection LT.LTLC Linear Transformations and Linear Combinations 462 Example MOLT Matrix of a linear transformation Suppose S: C3 C4 is defined by 3xi - 2X2 + 5x3 x1 9x1i- 2x2 + 5x3 Then . -3 C1= S(e1) =S 0 = 0 1 so define shouldobtainthevectorS (z) K2 3-2 5 C = (es) = - S 0 2= - - 0 LnheaTransformations adnetatCombinations 322 Its thelinteatingbewerenlertransormaionand linuear c)ombitiontas.iesthetert oman od TheipormTCVan theoe guearageb. Thed xt theormistxills.the essce of thesro 27 should obtain the vector S (z) =2 not deep, the result is hardly startling, but it will be referenced frequently. We have already passed by one occasion to employ it, in the proof of Theorem MLTCV [460]. Paraphrasing, this theorem says that we can "push" linear transformations "down into" linear combinations, or "pull" linear transformations "up out" of linear combinations. We'll have opportunities to both push and pull. Version 2.02  Subsection LT.LTLC Linear Transformations and Linear Combinations 463 Theorem LTLC Linear Transformations and Linear Combinations Suppose that T: U H V is a linear transformation, u1, u2, u3, ... , ut are vectors from U and a1, a2, a3, ... , at are scalars from C. Then T (aiui + a2u2 +a3u3 + + atut) = a1T (ui) +a2T (u2) +a3T(u3) +-.- + atT (ut) Proof T (aiui + a2u2 + a3u3 + ... + atut) = T (aiui) + T (a2u2) + T (a3u3) + - --+ T (atut) Definition LT [452] = a1T (ui) + a2T (u2) + a3T (u3) + - --+ atT (ut) Definition LT [452] Some authors, especially in more advanced texts, take the conclusion of Theorem LTLC [462] as the defining condition of a linear transformation. This has the appeal of being a single condition, rather than the two-part condition of Definition LT [452]. (See Exercise LT.T20 [473]). Our next theorem says, informally, that it is enough to know how a linear transformation behaves for inputs from any basis of the domain, and all the other outputs are described by a linear combination of these few values. Again, the statement of the theorem, and its proof, are not remarkable, but the insight that goes along with it is very fundamental. Theorem LTDB Linear Transformation Defined on a Basis Suppose B = {Ui, u2, u3, ... , un} is a basis for the vector space U and v1, v2, v3, ... 
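The construction in the proof of Theorem MLTCV [460] is completely mechanical, and easy to carry out with software. The sketch below is an illustration, not part of the text: it feeds the standard unit vectors to the transformation R of Example MFLT [459], assembles the outputs as the columns of a matrix, and recovers the matrix B found in that example.

```python
import numpy as np

def R(x):
    # The linear transformation of Example MFLT, R: C^3 -> C^4.
    x1, x2, x3 = x
    return np.array([2*x1 - 3*x2 + 4*x3,
                     x1 + x2 + x3,
                     -x1 + 5*x2 - 3*x3,
                     x2 - 4*x3])

def matrix_of(T, n):
    # Theorem MLTCV: evaluate T at the standard unit vectors e_1, ..., e_n
    # and use the outputs as the columns of the matrix representation.
    columns = [T(np.eye(n)[:, i]) for i in range(n)]
    return np.column_stack(columns)

B = matrix_of(R, 3)
print(B)
# [[ 2. -3.  4.]
#  [ 1.  1.  1.]
#  [-1.  5. -3.]
#  [ 0.  1. -4.]]

# And, as Theorem MLTCV promises, R(x) = B x for every x.
x = np.array([3.0, -1.0, 2.0])
print(np.allclose(R(x), B @ x))   # True
```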
, vn is a list of vectors from the vector space V (which are not necessarily distinct). Then there is a unique linear transformation, T: UQ V, suchthatT(ui) = 1 i n Proof To prove the existence of T, we construct a function and show that it is a linear transformation (Technique C [690]). Suppose w E U is an arbitrary element of the domain. Then by Theorem VRRB [317] there are unique scalars a1, a2, a3, ... , an such that w = aiu1 + a2u2 + asus |---. --anun Then define T (w) = aivi+ a2v2 + a3v3 -| -. -+ | aava It should be clear that T behaves as required for n inputs from B. Since the scalars provided by Theorem VRRB [317] are unique, there is no ambiguity in this definition, and T qualifies as a function with domain U and codomain V (i.e. T is well-defined). But is T a linear transformation as well? Let x E U be a second element of the domain, and suppose the scalars provided by Theorem VRRB [317] (relative to B) are bi, b2, b3, . . ., be. Then = T ((ai + bi) ui1 + (a2 + b2) 112 + - - + (an + ba) un) Definition VS [279] = (ai+ bi) vi + (a2 + b2) v2 + - - - + (an + bn) vn Definition of T = aiv1 + a2v2 + ."- + anvn + bivi + b2v2 + ."- + bnvn Definition VS [279] = T (w) + T (x) Version 2.02  Subsection LT.LTLC Linear Transformations and Linear Combinations 464 Let a E C be any scalar. Then T (aw) = T (a (alul + a2u2 + asus -+ . + anun)) = T (aaiui + aa2u2 + aa3u3 + .. + aanu) = aaivi + aa2v2 + aa3v3 + .. + aanv = a (aivi + a2v2 + a3v3 + ... + anvn) = aT (w) Definition VS [279] Definition of T Definition VS [279] So by Definition LT [452], T is a linear transformation. Is T unique (among all linear transformations that take the ui to the vi)? Applying Technique U [693], we posit the existence of a second linear transformation, S: U H V such that S (ui) = vi, 1 < i < n. Again, let w E U represent an arbitrary element of U and let al, a2, a3, ..., an be the scalars provided by Theorem VRRB [317] (relative to B). We have, T (w) = T (aiui + a2u2 + a3u3 + ... + anun) = a1T (ui) + a2T (u2) + a3T (u3)-+...+ anT (un) = aiv1 + a2v2 + a3v3 -|-..-.--|-anva = aiS (ui) + a2S (u2) +a3S(u3) + ... +anS (un) = S (aiui + a2u2 + a3u3 +... + anun) = S (w) Theorem VRRB [317] Theorem LTLC [462] Definition of T Definition of S Theorem LTLC [462] Theorem VRRB [317] So the output of T and S agree on every input, which means they are equal as functions, T = S. So T is unique. U You might recall facts from analytic geometry, such as "any two points determine a line" and "any three non-collinear points determine a parabola." Theorem LTDB [462] has much of the same feel. By specifying the n outputs for inputs from a basis, an entire linear transformation is determined. The analogy is not perfect, but the style of these facts are not very dissimilar from Theorem LTDB [462]. Notice that the statement of Theorem LTDB [462] asserts the existence of a linear transformation with certain properties, while the proof shows us exactly how to define the desired linear transformation. The next examples how how to work with linear transformations that we find this way. Example LTDB1 Linear transformation defined on a basis Consider the linear transformation T: C3 H C2 that is required to have the following three values, 1 -2 T 0 = 1 [o1] 0 -2 B { 0 T 1 = 0 10 0 1 0T- T 0 = I 1 - Because } is a basis for C3 (Theorem SUVB [325]), Theorem LTDB [462] says there is a unique linear transformation T that behaves this way. How do we compute other values of T? 
Consider the input 2 1 0 0 w= -3= (2) 0 + (-3) 1 + (1) 0 1 0 0 1 Version 2.02  Subsection LT.LTLC Linear Transformations and Linear Combinations 465 Then T (w) (2) [1J + (-3) [1+ (1) 0 [ 1 [-10 Doing it again, 5 1 0 0 x = 2 = (5) 0 + (2) 1 + (-3) 0 -3 0 0 -1 so T (x) (5) [ + (2) 4 + (-3) [] 13] Any other value of T could be computed in a similar manner. So rather than being given a formula for the outputs of T, the requirement that T behave in a certain way for the inputs chosen from a basis of the domain, is as sufficient as a formula for computing any value of the function. You might notice some parallels between this example and Example MOLT [461] or Theorem MLTCV [460]. Example LTDB2 Linear transformation defined on a basis Consider the linear transformation R: C3 C2 with the three values, R 2 = ( ] 5lR 5 ])= R 1 = 1 -J1 - 4- You can check that 1 -1 3 D= 2, 5 , 1 11 1 4 is a basis for C3 (make the vectors the columns of a square matrix and check that the matrix is nonsingular, Theorem CNMB [330]). By Theorem LTDB [462] we know there is a unique linear transformation R with the three specified outputs. However, we have to work just a bit harder to take an input vector and express it as a linear combination of the vectors in D. For example, consider, 8 y-= -3 5 Then we must first write y as a linear combination of the vectors in D and solve for the unknown scalars, to arrive at r8 1 r1 -11 31 K] - = (3) [2] + (-2) [5] + (1) K] Then the proof of Theorem LTDB [462] gives us R (y) =-(3) [2k] + (-2) []+ (1) [3 -8l Any other value of R could be computed in a similar manner. Here is a third example of a linear transformation defined by its action on a basis, only with more abstract vector spaces involved. Example LTDB3 Linear transformation defined on a basis The set W = {p(x) E P3 | p(l) = 0,p(3) =0} C P3 is a subspace of the vector space of polynomials P3. Version 2.02  Subsection LT.PI Pre-Images 466 This subspace has C = {3 - 4x + 2, 12 - 13x + X3} as a basis (check this!). Suppose we consider the linear transformation S: P3 H M22 with values S (3 - 4x + x2) [2 0] S (12 - 13x + z3) 1 0 By Theorem LTDB [462] we know there is a unique linear transformation with these two values. To illustrate a sample computation of S, consider q(x) = 9 - 6x - 52+ 2x3. Verify that q(x) is an element of W (does it have roots at x= 1 and x= 3?), then find the scalars needed to write it as a linear combination of the basis vectors in C. Because q(x) = 9 - 6x - 52 + 2x3 = (-5)(3 - 4x + 2) + (2)(12 - 13x + x3) The proof of Theorem LTDB [462] gives us S q) 5)1 -3 +2 0 1 -5 17 S~q= (5)2 0 1+(2) 1 0 -8 01 And all the other outputs of S could be computed in the same manner. Every output of S will have a zero in the second row, second column. Can you see why this is so? Informally, we can describe Theorem LTDB [462] by saying "it is enough to know what a linear transformation does to a basis (of the domain)." Subsection PI Pre-Images The definition of a function requires that for each input in the domain there is exactly one output in the codomain. However, the correspondence does not have to behave the other way around. A member of the codomain might have many inputs from the domain that create it, or it may have none at all. To formalize our discussion of this aspect of linear transformations, we define the pre-image. Definition PI Pre-Image Suppose that T: U i V is a linear transformation. 
For each v, define the pre-image of v to be the subset of U given by T-1 (v)={u E U | T(u)=v} A In other words, T-1 (v) is the set of all those vectors in the domain U that get "sent" to the vector v. Example SPIAS Sample pre-images, Archetype S Archetype S [772] is the linear transformation defined by T:CF-~22, T([]) 3a+b+c -26b-2c;~ We could compute a pre-image for every element of the codomain Ml22. However, even in a free textbook, we do not have the room to do that, so we will compute just two. Choose v = E M22 Version 2.02  Subsection LT.PI Pre-Images 467 u1 for no particular reason. What is T-1 (v)? Suppose u = u2 E T-1 (v). The condition that T (u) = v _u3_ becomes 2 1= ui -u2 2a1 + 2a2 + u1 = v=T(u) =T([2) =-2ui-6a-23] Using matrix equality (Definition ME [182]), we arrive at a system of four equations in the three unknowns u, u2, u3 with an augmented matrix that we can row-reduce in the hunt for solutions, 1 -1 0 2 1 0 1 5 4 4 2 2 1 1 RREF 0 1-3 3 1 1 3 0 0 0 0 -2 -6 -2 2 0 0 0 0 We recognize this system as having infinitely many solutions described by the single free variable u3. Eventually obtaining the vector form of the solutions (Theorem VFSLS [99]), we can describe the preimage precisely as, T-1(v)={u E C3 T (u) = v} 1?i 5 1 3 1 = u2 | ni = us, u2 -4 g u3_ 5 1 - 4 4 u3 { -AU3 U3 C3 U3_ ={[-h+u['Z] |a3 E C3} 0 1 44 = 4 + u3 4 u3E3 10 1 This last line is merely a suggestive way of describing the set on the previous line. You might create three or four vectors in the preimage, and evaluate T with each. Was the result what you expected? For a hint of things to come, you might try evaluating T with just the lone vector in the spanning set above. What was the result? Now take a look back at Theorem PSPHS [105]. Hmmmm. OK, let's compute another preimage, but with a different outcome this time. Choose What is T-1 (v)? Suppose u =[2] ET-1 (v). That T (u) =v becomes 1 1 ui - u2 2u1 + 2u2 + u3 2 4uKT 2j} 3u1 +au2 +a U3-2u1 - 6u2 - 2u3 u3_ Version 2.02  Subsection LT.NLTFO New Linear Transformations From Old 468 Using matrix equality (Definition ME [182]), we arrive at a system of four equations in the three unknowns u1, u2, u3 with an augmented matrix that we can row-reduce in the hunt for solutions, 1 -1 0 1 1 0 4 0 2 2 1 1 RREF 0W0 4 0 3 1 1 2 0 0 0 W 000 -2 -6 -2 4_ 0 0 0 0_ By Theorem RCLS [53] we recognize this system as inconsistent. So no vector u is a member of T-1 (v) and so T-1 (v) =0 The preimage is just a set, it is almost never a subspace of U (you might think about just when T-1 (v) is a subspace, see Exercise ILT.T10 [488]). We will describe its properties going forward, and it will be central to the main ideas of this chapter. Subsection NLTFO New Linear Transformations From Old We can combine linear transformations in natural ways to create new linear transformations. So we will define these combinations and then prove that the results really are still linear transformations. First the sum of two linear transformations. Definition LTA Linear Transformation Addition Suppose that T: U H V and S: U H V are two linear transformations with the same domain and codomain. Then their sum is the function T + S: U H V whose outputs are defined by (T + S) (u) = T (u) + S (u) A Notice that the first plus sign in the definition is the operation being defined, while the second one is the vector addition in V. (Vector addition in U will appear just now in the proof that T + S is a linear transformation.) Definition LTA [467] only provides a function. 
It would be nice to know that when the constituents (T, 5) are linear transformations, then so too is T + S. Theorem SLTLT Sum of Linear Transformations is a Linear Transformation Suppose that T: U a V and 5: U - V are two linear transformations with the same domain and codomain. Then T + S: U a V is a linear transformation.D Proof We simply check the defining properties of a linear transformation (Definition LT [452]). This is a good place to consistently ask yourself which objects are being combined with which operations. (T +S) (x +y)=T (x +y) + S(x +y) Definition LTA [467] = T (x) + T (y) + S (x) + S (y) Definition LT [452] = T (x) + S (x) + T (y) + S (y) Property C [279] in V = (T + S) (x) + (T + S) (y) Definition LTA [467] Version 2.02  Subsection LT.NLTFO New Linear Transformations From Old 469 and (T+S)(x) T (ax) + S(ax) aT (x) + aS (x) a (T (x) + S (x)) a(T + S) (x) Definition LTA [467] Definition LT [452] Property DVA [280] in V Definition LTA [467] 0 Example STLT Sum of two linear transformations Suppose that T: C2 H C3 and S: C2 H C3 are defined by [ zi + 2X2 T([211 = 3x1-4x2 - 5xi + 2x2] -2] . 4x1 - X2 zi +3X2 -7xi + 5x2] Then by Definition LTA [467], we have -F 1 1 1\ [i +2x21 (T + S) =31K /T(\ )/+ S(I = 3x1-4x2 + I -J -J 5i5 + 2x2] L 4x1 - 2 z1 +3X2 -7xi + 5x2] K 5x1 + z2 4x1 - x2 -2x1 + 7x2] and by Theorem SLTLT [467] we know T + S is also a linear transformation from C2 to C3. Definition LTSM Linear Transformation Scalar Multiplication Suppose that T: U H V is a linear transformation and a E C. Then the scalar multiple is the function oT: U H V whose outputs are defined by (aT) (u) =oaT (u) A Given that T is a linear transformation, it would be nice to know that oT is also a linear transformation. Theorem MLTLT Multiple of a Linear Transformation is a Linear Transformation Suppose that T: U H V is a linear transformation and a E C. Then (aT): U H V is a linear transforma- tion. Q Proof We simply check the defining properties of a linear transformation (Definition LT [452]). This is another good place to consistently ask yourself which objects are being combined with which operations. (aT) (x + y) = a (T (x + y)) = a (T (x) +T (y)) = aT (x) + aT (y) (cT) (x) + (aT) (y) (aT) (3x) = aT (3x) Definition LTSM [468] Definition LT [452] Property DVA [280] in V Definition LTSM [468] Definition LTSM [468] and Version 2.02  Subsection LT.NLTFO New Linear Transformations From Old 470 = a (/3T (x)) Definition LT [452] = (o3) T (x) Property SMA [280] in V =(/3a) T (x) Commutativity in C =/3 (aT (x)) Property SMA [280] in V =/3 ((auT) (x)) Definition LTSM [468] Example SMLT Scalar multiple of a linear transformation Suppose that T: C4 H C3 is defined by xl x1+ 2X2 -x3 +2x4 T 2 = i +25z2 - 3x3 + 234 (x3K ) -2x1+33X2-4X3+232x4 _4_ For the sake of an example, choose a = 2, so by Definition LTSM [468], we have 2 3:1+5i + 2x2 - 3-32X4 231-+ 4x2 - 2x3 --4x4 aT(X2f= 2T 2] [=2 31+ 5X2 - 3X31-|-4 = 2x I+10x2 - 6x3 --2X4 3-21 + 3x2 - 4x3 + 2x4 -41 + 6x2 - 8x3 + 4x4 and by Theorem MLTLT [468] we know 2T is also a linear transformation from C4 to C3. Now, let's imagine we have two vector spaces, U and V, and we collect every possible linear transfor- mation from U to V into one big set, and call it lT (U, V). Definition LTA [467] and Definition LTSM [468] tell us how we can "add" and "scalar multiply" two elements of lT (U, V). Theorem SLTLT [467] and Theorem MLTLT [468] tell us that if we do these operations, then the resulting functions are linear transformations that are also in lT (U, V). 
Hmmmm, sounds like a vector space to me! A set of objects, an addition and a scalar multiplication. Why not? Theorem VSLT Vector Space of Linear Transformations Suppose that U and V are vector spaces. Then the set of all linear transformations from U to V, lT (U, V) is a vector space when the operations are those given in Definition LTA [467] and Definition LTSM [468]. Proof Theorem SLTLT [467] and Theorem MLTLT [468] provide two of the ten properties in Definition VS [279]. However, we still need to verify the remaining eight properties. By and large, the proofs are straightforward and rely on concocting the obvious object, or by reducing the question to the same vector space property in the vector space V. The zero vector is of some interest, though. What linear transformation would we add to any other linear transformation, so as to keep the second one unchanged? The answer is Z: U a V defined by Z (u) =0O for every u E U. Notice how we do not need to know any of the specifics about U and V to make this definition of Z.U Definition LTC Linear Transformation Composition Suppose that T: U H V and S: V H W are linear transformations. Then the composition of S and T is the function (S o T): U F W whose outputs are defined by (S o T) (u) = S (T (u)) Version 2.02  Subsection LT.NLTFO New Linear Transformations From Old 471 A Given that T and S are linear transformations, it would be nice to know that S o T is also a linear transformation. Theorem CLTLT Composition of Linear Transformations is a Linear Transformation Suppose that T: U H V and S: V H W are linear transformations. Then (S o T): U H W is a linear transformation. D Proof We simply check the defining properties of a linear transformation (Definition LT [452]). (S o T) (x+ y)= S(T(x+y)) S (T (x) + T(y)) S (T (x)) + S (T (y)) (S o T) (x) + (S o T) (y) and Definition LTC [469] Definition LT [452] for T Definition LT [452] for S Definition LTC [469] Definition LTC [469] Definition LT [452] for T Definition LT [452] for S Definition LTC [469] (SoT) (ax) S (T (ax)) S (aT (x)) aS (T (x)) a(S o T) (x) 0 Example CTLT Composition of two linear transformations Suppose that T: C2 H C4 and S: C4 H C3 are defined by T zi zi + 2X2 3xi - 4x2 5xi + 2x2 6x1 - 3x2] X1 X3 _[ 4 B 2xi1-x2+x3 - 5zi - 3x2 + 833 - -4xi + 3x2 - 4x3 34 - 2X4 + 53:4] Then by Definition LTC [469] (SoT)[1) T j xi + 2x2 S3x1 - 4x2 6x1 - 3x2 2(31 + 2X2) - (3x1 5(x1 + 2X2) - 3(3xi- -4(i + 2X2) + 3(3x1 -231 + 13X2 24x1 + 44x2 15x1 - 43X2] -4X2) + (5x1 + 2X2) - (6x1 - 3x2) 4X2) + 8(5x1 + 2X2) - 2(6x1 - 3x2) - 4X2) - 4(5x1 + 2X2) + 5(6x1 - 3x2)] and by Theorem CLTLT [470] S o T is a linear transformation from C2 to C3. Here is an interesting exercise that will presage an important result later. In Example STLT [468] compute (via Theorem MLTCV [460]) the matrix of T, S and T + S. Do you see a relationship between these three matrices? Version 2.02  Subsection LT.READ Reading Questions 472 In Example SMLT [469] compute (via Theorem MLTCV [460]) the matrix of T and 2T. Do you see a relationship between these two matrices? Here's the tough one. In Example CTLT [470] compute (via Theorem MLTCV [460]) the matrix of T, S and S o T. Do you see a relationship between these three matrices??? Subsection READ Reading Questions 1. Is the function below a linear transformation? Why or why not? T: C3- C2,T z2 =3xi 2+ z3 8x2 -6J .3_ 2. Determine the matrix representation of the linear transformation S below. S 3x1 + 5x2 S: C2 -HC3, ( 1 = 8x1-3x2 -X2- [-4x1 3. 
Theorem LTLC [462] has a fairly simple proof. Yet the result itself is very powerful. Comment on why we might say this. Version 2.02  Subsection LT.EXC Exercises 473 Subsection EXC Exercises C15 The archetypes below are all linear transformations whose domains and codomains are vector spaces of column vectors (Definition VSCV [83]). For each one, compute the matrix representation described in the proof of Theorem MLTCV [460]. Archetype M [754] Archetype N [757] Archetype 0 [760] Archetype P [763] Archetype Q [765] Archetype R [769] Contributed by Robert Beezer --3 C20 Let w = 1. Referring to Example MOLT [461], compute S (w) two different ways. First use L4 the definition of 5, then compute the matrix-vector product Cw (Definition MVP [194]). Contributed by Robert Beezer Solution [474] C25 Define the linear transformation X1- T: C3FH C2 T z2 = 2x1 - x2 + 5x3 -4x1 + 2x2 - 10x3 3_ Verify that T is a linear transformation. Contributed by Robert Beezer Solution [474] C26 Verify that the function below is a linear transformation. T: P2F- C2, T (a + bx + cz2) [ab b+ c_ Contributed by Robert Beezer Solution [474] C30 Define the linear transformation X1- T: C3F- C2 T(x2]= 2x1 - x2 + 5x3 -4x1+ 2x2 - 10x3] Compute the preimages, T-1 (3n and T-1 (K81) Contributed by Robert Beezer Solution [474] C31 For the linear transformation S compute the pre-images. aa-2b-c1 S:C-C3 S =3a - b+2c c_ a~b2c_ -2 -5 S-1 5 S-1 5 3 j)(L7 Version 2.02  Subsection LT.EXC Exercises 474 Contributed by Robert Beezer Solution [475] M10 Define two linear transformations, T: C4 [ C3 and S: C3 C2 by ~1z i - 2Xzi3xl--zi + 3X2 + X3 + 9X4 3i22 331 (1 S 2 TI = 2x1 + .3 + 7.4 (q ) 5x1 + 4x2 + 2x3 -343 -4_ Using the proof of Theorem MLTCV [460] compute the matrix representations of the three linear trans- formations T, S and S o T. Discover and comment on the relationship between these three matrices. Contributed by Robert Beezer Solution [476] T20 Use the conclusion of Theorem LTLC [462] to motivate a new definition of a linear transformation. Then prove that your new definition is equivalent to Definition LT [452]. (Technique D [687] and Technique E [690] might be helpful if you are not sure what you are being asked to prove here.) Contributed by Robert Beezer Version 2.02  Subsection LT.SOL Solutions 475 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [472] 9 In both cases the result will be S (w) =[_]. C25 Contributed by Robert Beezer Statement [472] We can rewrite T as follows: 2x1 - X2 + 5x3 1 2 -1 T (K1])=[-4x1+ 2x2 - 10x3] Xi [-4 +X2 [2] +X3 x3_ and Theorem MBLT [459] tell us that any function of this form is a C26 Contributed by Robert Beezer Statement [472] Check the two conditions of Definition LT [452]. [-10] - 2 4 -1 2 5 [i -10] X x3_ linear transformation. T (u+v) =T ((a + bx + cx2) + (d+ex+fx2)) =T ((a+d) + (b+e)x+ (c+f)x2) _ 2(a + d) - (b+ e) (b +e) + (c +f) [(2a b)+(2d-e)] (b +c) +(e +f) 2a-b 2d-el b+c_ [e+f = T (u) + T (v) and T (au) = T ((a + bx + cx2)) = T ((aa) + (ab) x + (ac) 2) 2(aa) - (ab) (ab) + (ac) a(2a - b) a(b + c) 2a - b a b+c_ = aT (u) So T is indeed a linear transformation. C30 Contributed by Robert Beezer Statement [472] For the first pre-image, we want x E C3 such that T (x) [ . This becomes, 2x - 2 + 5x3 _ 2 -4xi + 2X2 - 10x3_ 3_ Version 2.02  Subsection LT.SOL Solutions 476 Vector equality gives a system of two linear equations in three variables, represented by the augmented matrix 2 -1 5 2 20 [-4 2 -10 3 0 0 0 so the system is inconsistent and the pre-image is the empty set. 
For the second pre-image the same procedure leads to an augmented matrix with a different vector of constants 2 -1 5 4 RREF [W-2 21 [-4 2 -10 -8 [0 0 0 0] This system is consistent and has infinitely many solutions, as we can see from the presence of the two free variables (x2 and x3) both to zero. We apply Theorem VFSLS [99] to obtain 42 2 T-1 ([0 + 2[1] +3 [0 |2, 136 C} -[8- 0 0 1 C31 Contributed by Robert Beezer Statement [472] We work from the definition of the pre-image, Definition PI [465]. Setting a -2 S b = 5 c 3_ we arrive at a system of three equations in three variables, with an augmented matrix that we row-reduce in a search for solutions, 1 -2 -1 -2 [i 0 1 0 3 -1 2 5 RREF 0[ 1 0 1 1 2 3 0 0 0 [_ With a leading 1 in the last column, this system is inconsistent (Theorem RCLS [53]), and there are no values of a, b and c that will create an element of the pre-image. So the preimage is the empty set. We work from the definition of the pre-image, Definition PI [465]. Setting a -5 S b = 5 c_ 7 The solution set to this system, which is also the desired pre-image, can be expressed using the vector form of the solutions (Theorem VFSLS [99]) -5 3 -1 3 -1 S-1 5 = 4 +c -1 |cE C = 4 + -1 7 0 1 01 Does the final expression for this set remind you of Theorem KPI [483]? Version 2.02  Subsection LT.SOL Solutions 477 M10 Contributed by Robert Beezer Statement [473] 1 -2 3- 21 3 1 97 .7 9 2 1 [5 4 2- 4 2 1 2 -l1 19 11 77- Version 2.02  Section ILT Injective Linear Transformations 478 Section ILT Injective Linear Transformations 0 Some linear transformations possess one, or both, of two key properties, which go by the names injective and surjective. We will see that they are closely related to ideas like linear independence and spanning, and subspaces like the null space and the column space. In this section we will define an injective linear transformation and analyze the resulting consequences. The next section will do the same for the surjective property. In the final section of this chapter we will see what happens when we have the two properties simultaneously. As usual, we lead with a definition. Definition ILT Injective Linear Transformation Suppose T: U H V is a linear transformation. Then T is injective if whenever T (x) = T (y), then x = y. A Given an arbitrary function, it is possible for two different inputs to yield the same output (think about the function f(x) =9x2 and the inputs x= 3 and x= -3). For an injective function, this never happens. If we have equal outputs (T (x) = T (y)) then we must have achieved those equal outputs by employing equal inputs (x = y). Some authors prefer the term one-to-one where we use injective, and we will sometimes refer to an injective linear transformation as an injection. Subsection EILT Examples of Injective Linear Transformations It is perhaps most instructive to examine a linear transformation that is not injective first. Example NIAQ Not injective, Archetype Q Archetype Q [765] is the linear transformation T : C5 - C5, /XI \ 12 T x3 14 \15s / -2xi + 3x2 + 3x3 -16xi + 9x2 + 12x3 -19xi + 7x2 + 14x3 .21x1 + 9x2 + 15x3 -9xi + 5x2 + 733 - - 6x4 + 3x5 - 2834 + 28x5 - 32x4 + 37x5 - 3534 + 39x5 16x4 + 16x5 _ Notice that for we have X = - 1 3 -1 2 4 4 7 y= 0 5 7 /1\ 4 3 55 T -1 = 72 2 77 \ _ 31_ /4\ 4 7 55 T 0 = 72 5 77 \ _7 31_ Version 2.02  Subsection ILT.EILT Examples of Injective Linear Transformations 479 So we have two vectors from the domain, x - y, yet T (x) = T (y), in violation of Definition ILT [477]. 
This is another example where you should not concern yourself with how x and y were selected, as this will be explained shortly. However, do understand why these two vectors provide enough evidence to conclude that T is not injective. Here's a cartoon of a non-injective linear transformation. Notice that the central feature of this cartoon is that T (u) = v = T (w). Even though this happens again with some unnamed vectors, it only takes one occurrence to destroy the possibility of injectivity. Note also that the two vectors displayed in the bottom of V have no bearing, either way, on the injectivity of T. T UD V Diagram NILT. Non-Injective Linear Transformation To show that a linear transformation is not injective, it is enough to find a single pair of inputs that get sent to the identical output, as in Example NIAQ [477]. However, to show that a linear transformation is injective we must establish that this coincidence of outputs never occurs. Here is an example that shows how to establish this. Example IAR Injective, Archetype R Archetype R [769] is the linear transformation / 1\ 12 T x 3 14 \1_5 / -65xi+ 12812 + 1013 - 26214 + 40x5 36xi - 73x2 - 13 + 15114 - 1615 -441i + 8812 + 5x3 - 180X4 + 24x5 341i - 6812 - 313 + 14014 - 1815 12xi - 24x2 - 33 + 49x4 - 5x5 To establish that R is injective we must begin with the assumption that T (x) = T (y) and somehow arrive from this at the conclusion that x = y. Here we go, T (x) X2 T X3 X4 \xz5 / T (y) Y2 y4 \ Y-5- Version 2.02  Subsection ILT.EILT Examples of Injective Linear Transformations 480 -653:1 + 1283:2 + 103:3 - 2623:4 + 40x5j -65yi + 128Y2 + 10Y3 - 2632y4 + 40 36x:1 - 733:2 - 3:3 + 1513:4 - 1615~ 36y1 - 73Y2 - Y3 + l1l1/4 - 1/ -4411 + 883:2 + 53:3 - 1803:4 + 24x5 = -44yi + 88y2 + 5Y3 - l8OY4 + 24y 3411i - 683:2 -333 + 1403:4 - 18X5~ 34y' - 68Y2 - 3Y3 + NONY - 18y5 _ 123:1 - 243:2 - 3:3 + 49x~4 - 53:5 12y1 - 24Y2 - Y3 + 49Y4 - 5Y5 -653:1 + 1281:2 + 101:3 - 2623:4 + 40x5~ -65yi + 128Y2 + 10Y3 - 2632y4 + 40Y5 3631- 73:2-33+ 1513:4 -1615 36Y - 73Y2 -Y3 + 'SY4 -6Y -4411 + 883:2 + 53:3 - 1803:4 + 24x5~ - -44y1 + 88y2 + 5Y3 - 180Y4 + 24y5 34x1i - 683:2 -333 + 1403:4 - 18X5 34Y - 68Y2 - 3Y/3 + NONY - 18y5 123:1 - 243:2 - 33+ 493:4 - 5x5~ 12y1 - 24Y2 - Y3 + 49y4 - 5Y5 )y5 0 0 0 -65(x:1 - yi) + 128(3:2 - y2) + 10(3:3 - YI3) - 262(3:4 - yI4) + 40(X5 - 25 3631 - yi) - 73(3:2 - Y2) - (3:3 - Y3) + 15134 - YI4) - 16(X5 - Y5) -44(3:1 - yi) + 88(3:2 - Y2) + 5(3:3 - YI3) - 180(3:4 - yI4) + 24(x5 - Y5) 34(3:1 - Yi) - 68(3:2 - Y2) - 3(3:3 - Y3) + 140(3:4 - YI4) - 18(x5 - Y5) 12(3: - Yi) - 24(x:2- Y2) - (3:3y3)+49(:4-y4) - 5(X -Y5) -65 128 10 -262 40 xi - Yi 36 -73 -1 151 -16 2 -Y22 0 -44 88 5 -180 24 3:3- 23 =0 34 -68 -3 140 -18 34 - 4 0 12 -24 -1 49 -5 X5 - 2J5 0 0 0 0 0 0 Now we recognize that we have a homogeneous system of 5 equations in 5 variables (the terms x:2 - y2 are the variables), so we row-reduce the coefficient matrix to 1tl00 0 0 0 T] 0 0 0 0 0 1 0 0 0 0 0 [-1 0 0 0 0 0 [-1] So the only solution is the trivial solution X1i - Yi1- 2- Y2=-0 3:3-J3-0 3:4 -2-0 5-y5=-0 and we conclude that indeed x= y. By Definition ILT [477], T is injective. Here's the cartoon for an injective linear transformation. It is meant to suggest that we never have two inputs associated with a single output. Again, the two lonely vectors at the bottom of V have no bearing either way on the injectivity of T. Version 2.02  Subsection ILT.EILT Examples of Injective Linear Transformations 481 U >0 V Diagram ILT. 
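Theorem KILT [484] reduces the injectivity question for these examples to a null space computation, which is easy to automate. The sketch below is illustrative only (not from the text): it computes the nullity of the coefficient matrices of Archetype O [760] and Archetype P [763] as they appear in the examples above. A 5 × 3 matrix has a trivial null space exactly when its rank is 3, so Archetype P's transformation is injective while Archetype O's is not.

```python
import numpy as np

# Coefficient matrices of Archetype O and Archetype P (both map C^3 to C^5).
A_O = np.array([[-1, 1, -3],
                [-1, 2, -4],
                [ 1, 1,  1],
                [ 2, 3,  1],
                [ 1, 0,  2]], dtype=float)

A_P = np.array([[-1, 1, 1],
                [-1, 2, 2],
                [ 1, 1, 3],
                [ 2, 3, 1],
                [-2, 1, 3]], dtype=float)

for name, M in [("Archetype O", A_O), ("Archetype P", A_P)]:
    nullity = M.shape[1] - np.linalg.matrix_rank(M)
    injective = (nullity == 0)        # Theorem KILT: trivial kernel <=> injective
    print(f"{name}: dim K(T) = {nullity}, injective = {injective}")

# A nonzero kernel vector for Archetype O, as computed in Example NKAO.
z = np.array([-2, 1, 1], dtype=float)
print(np.allclose(A_O @ z, 0))        # True: z is sent to the zero vector
```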
Injective Linear Transformation Let's now examine an injective linear transformation between abstract vector spaces. Example IAV Injective, Archetype V Archetype V [779] is defined by T: P3 H M22, T (a + bz+ cz2+ dx3) [a+b a- 2c To establish that the linear transformation is injective, begin by supposing that two polynomial inputs yield the same output matrix, T (ai + bix + ci12 + dix3) T (a2 + b2x + c2x2 + d2x3) Then 0 [0 0 0 0 = T (ai + bix + cix2 + dix3) - T (a2 + b2x + c2x2 + d2x3) Hypothesis = T ((al + bix + cix2 + dix3) - (a2 + b2x + c2x2 + d2x3)) Definition LT [452] = T ((ai - a2) + (bi - b2)x + (ci - c2)2 + (di - d2)x3) Operations in P3 (ai - a2) + (b b2) (ai - a2) - 2(cl- c2)1 ( - 2) (i-2)- (d-d2) Definition of T This single matrix equality translates to the homogeneous system of equations in the variables ai -b, (ai - a2) + (bi - b2) - 0 (ai - a2) - 2(ci - c2) =0 (di - d2) =0 (i- b2) - (di - d2) =0 This system of equations can be rewritten as the matrix equation [1 1 0 01 (al- a2) 0 1 0 -2 0 (bi - b2) _ 0 0 0 0 1 (ci - c2) 0 0 1 0 -1 (d - d2)_ 01 Version 2.02  Subsection ILT.KLT Kernel of a Linear Transformation 482 Since the coefficient matrix is nonsingular (check this) the only solution is trivial, i.e. ai-a2=0 bi-b2=0 ci-c2=0 di-d2=0 so that ai=a2 bi=b2 ci=c2 di=d2 so the two inputs must be equal polynomials. By Definition ILT [477], T is injective. Subsection KLT Kernel of a Linear Transformation For a linear transformation T: U H V, the kernel is a subset of the domain U. Informally, it is the set of all inputs that the transformation sends to the zero vector of the codomain. It will have some natural connections with the null space of a matrix, so we will keep the same notation, and if you think about your objects, then there should be little confusion. Here's the careful definition. Definition KLT Kernel of a Linear Transformation Suppose T: U H V is a linear transformation. Then the kernel of T is the set 1C(T)={uEU T(u)=O} (This definition contains Notation KLT.) A Notice that the kernel of T is just the preimage of 0, T-1 (0) (Definition PI [465]). Here's an example. Example NKAO Nontrivial kernel, Archetype 0 Archetype 0 [760] is the linear transformation -Xi + x2 - 3x3 zi -xi + 2x2 - 4x3 T : C3F- C5, T x2 = Xl + x2 + x3 za_ 2xi + 3x2 + x3 xi + 2x3 To determine the elements of C3 in K(T), find those vectors u such that T (u) =0, that is, T (u) =0 -ni + 112 -3s0 -ai+2a243 0 Vector equality (Definition CVE [84]) leads us to a homogeneous system of 5 equations in the variables u2, -ni + u2 - 3u3 = 0 -ai + 2u2 - 4u3= 0 Version 2.02  Subsection ILT.KLT Kernel of a Linear Transformation 483 2u1 + 3u2 + U3 = 0 ui + 2u3= 0 Row-reducing the coefficient matrix gives I 0 2 0 [T -1 0 0 0 0 0 0 0 0 0_ The kernel of T is the set of solutions to this homogeneous system of equations, which by Theorem BNS [139] can be expressed as KC(T) = I 1 We know that the span of a set of vectors is always a subspace (Theorem SSS [298]), so the kernel com- puted in Example NKAO [481] is also a subspace. This is no accident, the kernel of a linear transformation is always a subspace. Theorem KLTS Kernel of a Linear Transformation is a Subspace Suppose that T: U H V is a linear transformation. Then the kernel of T, K(T), is a subspace of U. D Proof We can apply the three-part test of Theorem TSS [293]. First T (Ou) = Ov by Theorem LTTZZ [456], so Ou E K(T) and we know that the kernel is non-empty. Suppose we assume that x, y E C(T). Is x + y E C(T)? 
T (x + y) = T (x) + T (y) Definition LT [452] = 0 + 0 yx, y&C(T) = 0 Property Z [280] This qualifies x + y for membership in K(T). So we have additive closure. Suppose we assume that a E C and x E 1C(T). Is cox E K(T)? T (cox) = aT (x) Definition LT [452] = a0 xE CK(T) = 0 Theorem ZVSM [286] This qualifies cix for membership in /K(T). So we have scalar closure and Theorem TSS [293] tells us that K(T) is a subspace of U. Let's compute another kernel, now that we know in advance that it will be a subspace. Example TKAP Trivial kernel, Archetype P Archetype P [763] is the linear transformation -XI + x2 + x3 zi -xi + 2X2 + 2X3 T : C3F- C5, T X2 = Xi + x2 + 3x3 (3_ 2x1 + 3x2 + X3 -2xi + x2 + 3x3_ Version 2.02  Subsection ILT.KLT Kernel of a Linear Transformation 484 To determine the elements of C3 in C(T), find those vectors u such that T (u) = 0, that is, T(u) 0 -u1+u2+u3 0 -ui+2u2+2u3 0 U1+ u2 + 3u3 = 0 2u1 + 3u2 + u3 0 -2ui+ u2 + 3u3 0 Vector equality (Definition CVE [84]) leads us to a homogeneous system of 5 equations in the variables u2, -n + 112 + 113 = 0 -u1 + 2u2 + 2u3= 0 u1 + u2 + 3u3= 0 2u1 + 3u2 + U3=3 0 -2u1 + u2 + 3u3= 0 Row-reducing the coefficient matrix gives U 0 0 000 0 0 L1 0 0 0 0 0 0 The kernel of T is the set of solutions to this homogeneous system of equations, which is simply the trivial solution u = 0, so KC(T) = {0} = ({ }) Our next theorem says that if a preimage is a non-empty set then we can construct it by picking any one element and adding on elements of the kernel. Theorem KPI Kernel and Pre-Image Suppose T: U H V is a linear transformation and v E V. If the preimage T-1 (v) is non-empty, and u E T-1 (v) then T-1 (v)={u+z zE1C(T)}=u+1C(T) Proof Let M= {u + z |z E KC(T)}. First, we show that M c T-1 (v). Suppose that w E M, so w has the form w u u+ z, where z C K(T). Then T (w) = T(u +z) = T (u) + T (z) Definition LT [452] =-v+ 0 u&ET- (v) ,z&E1K(T) =-v Property Z [280] which qualifies w for membership in the preimage of v, w E T-1 (v). For the opposite inclusion, suppose x E T-1 (v). Then, T (x - u) = T (x) - T (u) Definition LT [452] Version 2.02  Subsection ILT.KLT Kernel of a Linear Transformation 485 v - v x, u E T- (v) =0 This qualifies x - u for membership in the kernel of T, C(T). So there is a vector z E K(T) such that x - u = z. Rearranging this equation gives x = u + z and so x E M. So T-1 (v) C M and we see that M = T-1 (v), as desired. U This theorem, and its proof, should remind you very much of Theorem PSPHS [105]. Additionally, you might go back and review Example SPIAS [465]. Can you tell now which is the only preimage to be a subspace? The next theorem is one we will cite frequently, as it characterizes injections by the size of the kernel. Theorem KILT Kernel of an Injective Linear Transformation Suppose that T: U i V is a linear transformation. Then T is injective if and only if the kernel of T is trivial, K(T) = {0}. D Proof (-) We assume T is injective and we need to establish that two sets are equal (Definition SE [684]). Since the kernel is a subspace (Theorem KLTS [482]), {0} C K(T). To establish the opposite inclusion, suppose x c C(T). T (x) = 0 Definition KLT [481] = T (0) Theorem LTTZZ [456] We can apply Definition ILT [477] to conclude that x = 0. Therefore C(T) C {0} and by Definition SE [684], K(T) =_{0}. (<) To establish that T is injective, appeal to Definition ILT [477] and begin with the assumption that T (x) = T (y). 
Then T (x - y) = T (x) - T (y) Definition LT [452] = 0 Hypothesis So x - y E K(T) by Definition KLT [481] and with the hypothesis that the kernel is trivial we conclude that x - y = 0. Then y = y + 0 = y + (x - y) x thus establishing that T is injective by Definition ILT [477]. U Example NIAQR Not injective, Archetype Q, revisited We are now in a position to revisit our first example in this section, Example NIAQ [477]. In that example, we showed that Archetype Q [765] is not injective by constructing two vectors, which when used to evaluate the linear transformation provided the same output, thus violating Definition ILT [477]. Just where did those two vectors come from? The key is the vector -3- 4 3 _3_ which you can check is an element of C(T) for Archetype Q [765]. Choose a vector x at random, and then compute y = x + z (verify this computation back in Example NIAQ [477]). Then T (y) = T (x + z) Version 2.02  Subsection ILT.ILTLI Injective Linear Transformations and Linear Independence 486 = T (x) + T (z) Definition LT [452] = T (x) + 0 z E 1C(T) = T (x) Property Z [280] Whenever the kernel of a linear transformation is non-trivial, we can employ this device and conclude that the linear transformation is not injective. This is another way of viewing Theorem KILT [484]. For an injective linear transformation, the kernel is trivial and our only choice for z is the zero vector, which will not help us create two different inputs for T that yield identical outputs. For every one of the archetypes that is not injective, there is an example presented of exactly this form. Example NIAO Not injective, Archetype 0 In Example NKAO [481] the kernel of Archetype 0 [760] was determined to be -2 1 a subspace of C3 with dimension 1. Since the kernel is not trivial, Theorem KILT [484] tells us that T is not injective. Example IAP Injective, Archetype P In Example TKAP [482] it was shown that the linear transformation in Archetype P [763] has a trivial kernel. So by Theorem KILT [484], T is injective. Subsection ILTLI Injective Linear Transformations and Linear Independence There is a connection between injective linear transformations and linearly independent sets that we will make precise in the next two theorems. However, more informally, we can get a feel for this connection when we think about how each property is defined. A set of vectors is linearly independent if the only relation of linear dependence is the trivial one. A linear transformation is injective if the only way two input vectors can produce the same output is if the trivial way, when both input vectors are equal. Theorem ILTLI Injective Linear Transformations and Linear Independence Suppose that T: U a V is an injective linear transformation and S ={ui, 112, 113, ..., Ut} is a linearly independent subset of U. Then R ={T (ui1), T (112) , T (113), . . ., T (Ut)} is a linearly independent subset of V.D Proof Begin with a relation of linear dependence on R (Definition RLD [308], Definition LI [308]), aiT (ui) +a2T (u2) + asT(us) +. . .+ aT(u) O T (aiui + a2u2 + asus + - - + atut) =0 Theorem LTL C [462] aiu1 + a2u2 + asus +| -. + | atut E K(T) Definition KLT [481] aiu1 + a2u2 + a3u3 + . + atut E {0} Theorem KILT [484] aiu1 + a2u2 + a3u3 + . + atut = 0 Definition SET [683] Version 2.02  Subsection ILT.ILTD Injective Linear Transformations and Dimension 487 Since this is a relation of linear dependence on the linearly independent set S, we can conclude that a1=0 a2=0 a3=0 ... at=0 and this establishes that R is a linearly independent set. 
U Theorem ILTB Injective Linear Transformations and Bases Suppose that T: U H V is a linear transformation and B = {ui, u2, u3, ..., um} is a basis of U. Then T is injective if and only if C = {T (ui), T (u2), T (u3), ..., T (um)} is a linearly independent subset of V. Proof (-) Assume T is injective. Since B is a basis, we know B is linearly independent (Definition B [325]). Then Theorem ILTLI [485] says that C is a linearly independent subset of V. (<) Assume that C is linearly independent. To establish that T is injective, we will show that the kernel of T is trivial (Theorem KILT [484]). Suppose that u E C(T). As an element of U, we can write u as a linear combination of the basis vectors in B (uniquely). So there are are scalars, a1, a2, a3, ... , am, such that u = aiu1 + a2u2 + a3u3 + --. + amum Then, 0 = T (u) Definition KLT [481] = T (aiui + a2u2 + a3u3 + - - - + amum) Definition TSVS [313] = a1T (ui) + a2T (u2) + a3T (u3) + - - - + amT (um) Theorem LTLC [462] This is a relation of linear dependence (Definition RLD [308]) on the linearly independent set C, so the scalars are all zero: al =a2= a3 =am =0. Then u=aiu1+ a2u2 + a3u3 + + amum = Oui + Ou2 +0Ou3 + - - - + Oum Theorem ZSSM [286] = 0 + 0 + 0 + - - - + 0 Theorem ZSSM [286] = 0 Property Z [280] Since u was chosen as an arbitrary vector from K(T), we have C(T) {0} and Theorem KILT [484] tells us that T is injective. U Subsection ILTD Injective Linear Transformations and Dimension Theorem ILTD Injective Linear Transformations and Dimension Suppose that T: U a V is an injective linear transformation. Then dim (U) dim (V).D Proof Suppose to the contrary that m =dim (U) > dim (V) =t. Let B be a basis of U, which will then contain m vectors. Apply T to each element of B to form a set C that is a subset of V. By Theorem ILTB [486], C is linearly independent and therefore must contain m distinct vectors. So we have found a set of m linearly independent vectors in V, a vector space of dimension t, with m > t. However, this contradicts Theorem G [355], so our assumption is false and dim (U) dim (V). U Example NIDAU Not injective by dimension, Archetype U Version 2.02  Subsection ILT.CILT Composition of Injective Linear Transformations 488 The linear transformation in Archetype U [777] is a+2b+12c-3d+e+6f T: M23 H(C4, T a b c 2a-b-c+d-11f d e f_ a+b+7c+2d+e-3f _a+2b+12c+5e-5f_ Since dim (M23) = 6 > 4 = dim (C4), T cannot be injective for then T would violate Theorem ILTD [486]. Notice that the previous example made no use of the actual formula defining the function. Merely a comparison of the dimensions of the domain and codomain are enough to conclude that the linear transformation is not injective. Archetype M [754] and Archetype N [757] are two more examples of linear transformations that have "big" domains and "small" codomains, resulting in "collisions" of outputs and thus are non-injective linear transformations. Subsection CILT Composition of Injective Linear Transformations In Subsection LT.NLTFO [467] we saw how to combine linear transformations to build new linear trans- formations, specifically, how to build the composition of two linear transformations (Definition LTC [469]). It will be useful later to know that the composition of injective linear transformations is again injective, so we prove that here. Theorem CILTI Composition of Injective Linear Transformations is Injective Suppose that T: U H V and S: V H W are injective linear transformations. Then (S o T): U H W is an injective linear transformation. 
D Proof That the composition is a linear transformation was established in Theorem CLTLT [470], so we need only establish that the composition is injective. Applying Definition ILT [477], choose x, y from U. Then if (S o T) (x) = (S o T) (y), S (T (x)) = S (T (y)) Definition LTC [469] T (x) = T (y) Definition ILT [477] for S x = y Definition ILT [477] for T Subsection READ Reading Questions 1. Suppose T: C8 m C5 is a linear transformation. Why can't T be injective? 2. Describe the kernel of an injective linear transformation. 3. Theorem KPI [483] should remind you of Theorem PSPHS [105]. Why do we say this? Version 2.02  Subsection ILT.EXC Exercises 489 Subsection EXC Exercises C1O Each archetype below is a linear transformation. Compute the kernel for each. Archetype M [754] Archetype N [757] Archetype 0 [760] Archetype P [763] Archetype Q [765] Archetype R [769] Archetype 5 [772] Archetype T [775] Archetype U [777] Archetype V [779] Archetype W [781] Archetype X [783] Contributed by Robert Beezer C20 The linear transformation T: C4 H C3 is not injective. Find two inputs x, y E C4 that yield the same output (that is T (x) = T (y)). 2x1 + x2 + x3 T = -xi+3x2 +zx3 - x4 x3 3x1 + x2+ 2x3 - 2X4j _z4_ Contributed by Robert Beezer Solution [490] C25 Define the linear transformation x1- T: C3 H C2 T z 2xi - x2 + 5x3 -4x1 + 2X2 - 10x3 3_ Find a basis for the kernel of T, KC(T). Is T injective? Contributed by Robert Beezer Solution [490] C40 Show that the linear transformation R is not injective by finding two different elements of the domain, x and y, such that R (x) =R (y). (S22 is the vector space of symmetric 2 x 2 matrices.) Contributed by Robert Beezer Solution [491] T1O Suppose T: U - V is a linear transformation. For which vectors v E V is T-1 (v) a subspace of U? Contributed by Robert Beezer T15 Suppose that that T: U H V and S: V H W are linear transformations. Prove the following relationship between null spaces. K(T) C K(S o T) Version 2.02  Subsection ILT.EXC Exercises 490 Contributed by Robert Beezer Solution [491] T20 Suppose that A is an m x n matrix. Define the linear transformation T by T: C"[-(Cr", T(x)=Ax Prove that the kernel of T equals the null space of A, C(T) = N(A). Contributed by Andy Zimmer Solution [491] Version 2.02  Subsection ILT.SOL Solutions 491 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [488] A linear transformation that is not injective will have a non-trivial kernel (Theorem KILT [484]), and this is the key to finding the desired inputs. We need one non-trivial element of the kernel, so suppose that z E C4 is an element of the kernel, 0[ 2zi + z2 + z3 0 =0=7T(z)=-zi+3z2+z3-z4 0__3zi + z2 + 2z3 - 2z4 Vector equality Definition CVE [84] leads to the homogeneous system of three equations in four variables, 2zi + z2 + z3= 0 -z1 + 3z2 + z3 - z4 = 0 3z1 + z2 + 2z3 - 2z4 = 0 The coefficient matrix of this system row-reduces as 2 1 1 0i1 0 0 1 -1 3 1 -1 RREF: 0 w 0 1 3 1 2 -2_ . 0 0 j-3_ From this we can find a solution (we only need one), that is an element of C(T), -_1i -1 1 Now, we choose a vector x at random and set y = x + z, 2 2 -1 1 3 3 -1 2 x= 4 y=x+z= 4 + 3 = 7 .-2_-2_ _ > -1 and you can check that T (x) [1 y A quicker solution is to take two elements of the kernel (in this case, scalar multiples of z) which both get sent to 0 by T. Quicker yet, take 0 and z as x and y, which also both get sent to 0 by T. C25 Contributed by Robert Beezer Statement [488] To find the kernel, we require all x E C3 such that T (x) =0. 
This condition is E2xi - x2 + 5z 1 01 -4i+ 2x2 - 10x3 [0J This leads to a homogeneous system of two linear equations in three variables, whose coefficient matrix row-reduces to 0E 0 0_ Version 2.02  Subsection ILT.SOL Solutions 492 With two free variables Theorem BNS [139] yields the basis for the null space 2 2 0 , 1 { 1] . .0_ With n (T) # 0, K(T) # {0}, so Theorem KILT [484] says T is not injective. C40 Contributed by Robert Beezer Statement [488] We choose x to be any vector we like. A particularly cocky choice would be to choose x = 0, but we will instead choose X - _i21 41 Then R (x) = 9 + 9x. Now compute the kernel of R, which by Theorem KILT [484] we expect to be nontrivial. Setting R ([b cj) equal to the zero vector, 0= 0 + Ox, and equating coefficients leads to a homogeneous system of equations. Row-reducing the coefficient matrix of this system will allow us to determine the values of a, b and c that create elements of the null space of R, 2-111 RREF [ O i We only need a single element of the null space of this coefficient matrix, so we will not compute a precise description of the whole null space. Instead, choose the free variable c = 2. Then z -=[-2 22_ is the corresponding element of the kernel. We compute the desired y as 2 -1 -2 -2 0 - y-x z= + [-2 2 [-3 6 Then check that R (y) = 9 + 9x. T15 Contributed by Robert Beezer Statement [488] We are asked to prove that K(T) is a subset of 1C(S o T). Employing Definition SSET [683], choose x E K(T). Then we know that T (x) = 0. So (S o T) (x) = S (T (x)) Definition LTC [469] = S(0) xE C(T) = 0 Theorem LTTZZ [456] This qualifies x for membership in K(S o T). T20 Contributed by Andy Zimmer Statement [489] This is an equality of sets, so we want to establish two subset conditions (Definition SE [684]). First, show P1(A) C KC(T). Choose x E N1(A). Check to see if x E K(T), T (x) =Ax Definition of T = 0 x& E (A) So by Definition KLT [481], x C K(T) and thus P1(A) G P1(T). Now, show K(T) C P1(A). Choose x C KC(T). Check to see if x C P1(A), Ax = T (x) Definition of T 0 x EK(T) So by Definition NSM [64], x E N(A) and thus N(T) C P1(A). Version 2.02  Section SLT Surjective Linear Transformations 493 Section SLT Surjective Linear Transformations --m The companion to an injection is a surjection. Surjective linear transformations are closely related to spanning sets and ranges. So as you read this section reflect back on Section ILT [477] and note the parallels and the contrasts. In the next section, Section IVLT [508], we will combine the two properties. As usual, we lead with a definition. Definition SLT Surjective Linear Transformation Suppose T: U H V is a linear transformation. Then T is surjective if for every v c V there exists a u EU so that T(u) =v. A Given an arbitrary function, it is possible for there to be an element of the codomain that is not an output of the function (think about the function y = f(x) 9=2 and the codomain element y = -3). For a surjective function, this never happens. If we choose any element of the codomain (v E V) then there must be an input from the domain (u E U) which will create the output when used to evaluate the linear transformation (T (u) = v). Some authors prefer the term onto where we use surjective, and we will sometimes refer to a surjective linear transformation as a surjection. Subsection ESLT Examples of Surjective Linear Transformations It is perhaps most instructive to examine a linear transformation that is not surjective first. 
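For a transformation defined by a matrix, Definition SLT [492] reduces to a question about linear systems: a codomain vector v has a preimage exactly when the system T(u) = v is consistent. Below is a minimal SymPy sketch of that test; the matrix and the two target vectors are made-up illustrations, not taken from the archetypes. The examples that follow carry out the same kind of check by hand.

```python
from sympy import Matrix

# A hypothetical transformation T: C^3 -> C^2 given by T(u) = A*u
# (illustrative numbers only, not one of the archetypes).
A = Matrix([[1, 2, 0],
            [2, 4, 0]])   # the second row is twice the first

def has_preimage(A, v):
    # v is an output of T exactly when the system A*u = v is consistent,
    # i.e. augmenting by v does not raise the rank of the matrix.
    return A.row_join(v).rank() == A.rank()

print(has_preimage(A, Matrix([3, 6])))  # True: (3, 6) = T((3, 0, 0))
print(has_preimage(A, Matrix([1, 0])))  # False: never an output, so T is not surjective
```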
Example NSAQ Not surjective, Archetype Q Archetype Q [765] is the linear transformation T : C5 - C5, /XzI\ 12 T x3 14 \15s / -2xi + 3x2 + 3x3 -16xi + 9x2 + 12x3 -19xi + 7x2 + 14x3 .21x1 + 9x2 + 15x3 -9xi + 5x2 + 73 - - 6x4 + 3x5 - 2834 + 28x5 - 32x4 + 37x5 - 3534 + 39x5 16x4 + 16x5 _ We will demonstrate that -- 2 v= 3 -1 4 is an unobtainable element of the codomain. Suppose to the contrary that u is an element of the domain such that T (u) = v. Then -1 2 3 =v=T(u) -1 4 U2 T Us U4 \ u5_ Version 2.02  Subsection SLT.ESLT Examples of Surjective Linear Transformations 494 -2u1 + 3u -16ui + 9u2 = -19u1 + 7u2 -21ui + 9u2 -9ui + 5u2 -2 3 3 -16 9 12 = -19 7 14 -21 9 15 -9 5 7 Now we recognize the appropriate input vector u as augmented matrix of the system, and row-reduce to t2 2+3u3-6U4+3u5 + 12u3 - 28U4 + 28u5 + 14u3 - 32u4 + 37U5 + 15a3 - 35U4 + 39U5 + 7U3 - 16u4 + 16U5 _ -6 3 ui -28 28 u2 -32 37 U3 -35 39 U4 -16 16 _U5 a solution to a linear system of equations. Form the 1 0 0 0 -1 0 0 L 0 0 -3 0 0 0 0 0 -} 0 0 0 0 R1-1 0 0 0 0 0 0 1 With a leading 1 in the last column, Theorem RCLS [53] tells us the system is inconsistent. From the absence of any solutions we conclude that no such vector u exists, and by Definition SLT [492], T is not surjective. Again, do not concern yourself with how v was selected, as this will be explained shortly. However, do understand why this vector provides enough evidence to conclude that T is not surjective. To show that a linear transformation is not surjective, it is enough to find a single element of the codomain that is never created by any input, as in Example NSAQ [492]. However, to show that a linear transformation is surjective we must establish that every element of the codomain occurs as an output of the linear transformation for some appropriate input. Example SAR Surjective, Archetype R Archetype R [769] is the linear transformation / 1 -65x1 + 128x2 +10x3 - 262x4 + 40x5 X2 36x1 - 73x2 - x3 + 151X4 - 16x5 T: C5K- C5, T z3 = -44x1 + 88x2 + 5x3 - 180x4 + 24x5 X4 34xi- 68x2 - 3x3 + 140x4 -18x5 \ 12x1 - 24x2 - x3 + 49X4 - 5X5 To establish that R is surjective we must begin with a totally arbitrary element of the codomain, v and somehow find an input vector u such that T (u) = v. We desire, T(u) v -65u1 + 128n2 + 10u3 - 262U4 + 40n5 v1 36u1 - 73n2 - u3 + 151n4 - 16n5 V2 -44u1 + 88n2 + 5u3 - 180U4 + 24n5 =V3 34u1 - 68a2 - 3a3 + 140U4 - 18n5 V4 12u1 - 24n2 - U3 + 49U4 - 5u5 _V5 -65 128 10 -262 40 u p vi 36 -73 -1 151 -16 u2 V2 -44 88 5 -180 24 u31=1 V3 34 -68 -3 140 -18 U4 V4 12 -24 -1 49 -5 _ _UV_ _v_ Version 2.02  Subsection SLT.ESLT Examples of Surjective Linear Transformations 495 We recognize this equation as a system of equations in the variables uW, but our vector of constants contains symbols. In general, we would have to row-reduce the augmented matrix by hand, due to the symbolic final column. However, in this particular example, the 5 x 5 coefficient matrix is nonsingular and so has an inverse (Theorem NI [228], Definition MI [213]). 
-65 128 10 -262 40 -47 92 1 -181 -14 36 -73 -1 151 -16 27 -55 2 11 -44 88 5 -180 24 = -32 64 -1 -126 -12 34 -68 -3 140 -18 25 -50 3 199 9 12 -24 -1 49 -5 9 -181 71 4 so we find that U1 -47 92 1 -181 -14 v1 U2 27 -55 22 11 v2 Us = -32 64 -1 -126 -12 v3 U4 25 -50 3 199 9 V4 U5 9 -18 1 21 4_V5 -47v1 + 92v2 + v3 - 181v4 - 14v5 27v1- 55V2 +2V3+ 221V4 + 11V - -32v1 + 64v2 - V3 - 126v4 - 12v5 25v1 - 50V2 +2V3+ 199v4 + 9V 9v1 - 18v2 + 2v3 + 7V4 + 4v5 _ This establishes that if we are given any output vector v, we can use its components in this final expression to formulate a vector u such that T (u) = v. So by Definition SLT [492] we now know that T is surjective. You might try to verify this condition in its full generality (i.e. evaluate T with this final expression and see if you get v as the result), or test it more specifically for some numerical vector v (see Exercise SLT.C20 [504]). Let's now examine a surjective linear transformation between abstract vector spaces. Example SAV Surjective, Archetype V Archetype V [779] is defined by T: P 3 HM22, T (a+bx +cx2 +dx3) - [a+b a-] d b - d To establish that the linear transformation is surjective, begin by choosing an arbitrary output. In this example, we need to choose an arbitrary 2 x 2 matrix, say and we would like to find an input polynomial u =a + bx + cx2 + dx3 so that T (u) =v. So we have, x Ty z w_ = T (u) Version 2.02  Subsection SLT.ESLT Examples of Surjective Linear Transformations 496 =T(a+bx+cz2+dx3) a+b a-2c d b- d Matrix equality leads us to the system of four equations in the four unknowns, x, y, z, w, a+b~x a - 2c = y d=z b - d w which can be rewritten as a matrix equation, 1 1 0 0 1 0 0 1 0 -2 0 0 0 a 0 b y 1 c z -1_ _d_ _w_ The coefficient matrix is nonsingular, hence it has an inverse, 1 1 0 0 1 0 1 0 -2 0 0 0 0 0 0 1 - 2 0 1 0-1 O0 0 -1 -1 1 1 1 - 1 0 so we have a 1 0 -1 -1 b 0 0 1 1 y c 2 -2 -2 -2 z d 0 0 1 0 _w_ (X1 y - z-W) z So the input polynomial u = (x - z - w) + (z + w)xz+ 2 (x - y - z - w)2+ zx3 will yield the output matrix v, no matter what form v takes. This means by Definition SLT [492] that T is surjective. All the same, let's do a concrete demonstration and evaluate T with u, T(u)= T (- z - w) + (z + w)x +2 - y - z - w)2+ z3 (X- z- w) +(z +w) ( - z- w) -2(j-( - y- z- w)) z (z+w)-z z w =V Version 2.02  Subsection SLT.RLT Range of a Linear Transformation 497 Subsection RLT Range of a Linear Transformation For a linear transformation T: U H V, the range is a subset of the codomain V. Informally, it is the set of all outputs that the transformation creates when fed every possible input from the domain. It will have some natural connections with the column space of a matrix, so we will keep the same notation, and if you think about your objects, then there should be little confusion. Here's the careful definition. Definition RLT Range of a Linear Transformation Suppose T: U H V is a linear transformation. Then the range of T is the set R(T)={T(u) uEU} (This definition contains Notation RLT.) 
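Both Example SAR and Exercise SLT.C20 [504] invite a check that the formula for u really does produce v. Here is a small SymPy sketch of that verification for Archetype R, with the coefficient matrix transcribed from the statement of the transformation in Example SAR above; the particular numerical vector v is an arbitrary choice for illustration.

```python
from sympy import Matrix

# Coefficient matrix of Archetype R, transcribed from the statement of the
# linear transformation in Example SAR above.
A = Matrix([[-65, 128, 10, -262,  40],
            [ 36, -73, -1,  151, -16],
            [-44,  88,  5, -180,  24],
            [ 34, -68, -3,  140, -18],
            [ 12, -24, -1,   49,  -5]])

print(A.det() != 0)           # nonsingular, so every v has a preimage (and the map is injective, Example IAR)

v = Matrix([3, -1, 2, 0, 4])  # an arbitrary numerical output vector, chosen for illustration
u = A.inv() * v               # the preimage produced by the method of Example SAR
print(A * u == v)             # True: evaluating T at u returns v
```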
A Example RAO Range, Archetype 0 Archetype 0 [760] is the linear transformation -xi + z2 - 3x3 Xi -xi + 2x2 - 4x3 T : C3F- C5, T x2 = xi + x2 + x3 x3_ 2x1 + 3x2 + x3 xi + 2x3 To determine the elements of C5 in R(T), find those vectors v such that T (u) = v for some u E C3 v = T (u) -ai + U2 - 3U3 -ui + 2U2 - 43 = t1 + u2 + u3 2u1 + 3u2 + U3 ai + 2u3 _ -ai u2 -3u3 -a1 2u2 -4u3 = ui + u2 + U3 21i 3q2 as .a1_ _ 0 _ _ 2a3 _ -1 1 -3 -1 2 -4 -a1 1 +a2 1 +as 1 2 3 1 1 0 2 This says that every output of T (v) can be written as a linear combination of the three vectors -1 1 -3 -1 2 -4 1 1 1 2 3 1 1 0 2 Version 2.02  Subsection SLT.RLT Range of a Linear Transformation 498 using the scalars a1, a2, u. Furthermore, since u can be any element of C3, every such linear combination is an output. This means that -1 1 -3- -1 2 -4 R(T) = 1 ,1 ,1 2 3 1 1 0_ 2 _ The three vectors in this spanning set for R(T) form a linearly dependent set (check this!). So we can find a more economical presentation by any of the various methods from Section CRS [236] and Section FS [257]. We will place the vectors into a matrix as rows, row-reduce, toss out zero rows and appeal to Theorem BRS [245], so we can describe the range of T with a basis, 1 0 0 1 R(T)= -3 ,2 -7 5 L-2_ 1_ We know that the span of a set of vectors is always a subspace (Theorem SSS [298]), so the range computed in Example RAO [496] is also a subspace. This is no accident, the range of a linear transformation is always a subspace. Theorem RLTS Range of a Linear Transformation is a Subspace Suppose that T: U H V is a linear transformation. Then the range of T, R(T), is a subspace of V. D Proof We can apply the three-part test of Theorem TSS [293]. First, Ou E U and T (Ou) = Ov by Theorem LTTZZ [456], sO Oy E R(T) and we know that the range is non-empty. Suppose we assume that x, y E R(T). Is x+y E R(T)? If x, y E R(T) then we know there are vectors w, z E U such that T (w) = x and T (z) = y. Because U is a vector space, additive closure (Property AC [279]) implies that w + z E U. Then T (w + z) = T (w) + T (z) Definition LT [452] = x + y Definition of w and z So we have found an input, w + z, which when fed into T creates x + y as an output. This qualifies x + y for membership in R(T). So we have additive closure. Suppose we assume that ca C C and x C 7Z(T). Is cax C 7Z(T)? If x C 7Z(T), then there is a vector w C U such that T (w) =x. Because U is a vector space, scalar closure implies that a~w C U. Then T (aw) =oaT (w) Definition LT [452] =ox Definition of w So we have found an input (aw) which when fed into T creates ax as an output. This qualifies ax for membership in 7Z(T). So we have scalar closure and Theorem TSS [293] tells us that 7Z(T) is a subspace of V. Let's compute another range, now that we know in advance that it will be a subspace. 
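First, a quick computational check of the basis found in Example RAO: the same basis for the range can be produced mechanically by placing the three spanning vectors as the rows of a matrix, row-reducing, and keeping the nonzero rows (Theorem BRS [245]). A short SymPy sketch, with the spanning vectors transcribed from the example above:

```python
from sympy import Matrix

# The three vectors spanning R(T) in Example RAO (the columns of Archetype O).
spanning = [[-1, -1, 1, 2, 1],
            [ 1,  2, 1, 3, 0],
            [-3, -4, 1, 1, 2]]

# Theorem BRS: put the spanning vectors in as rows, row-reduce, and the
# nonzero rows of the reduced matrix form a basis for the range.
rref, pivots = Matrix(spanning).rref()
basis = [list(rref.row(i)) for i in range(len(pivots))]
print(basis)   # [[1, 0, -3, -7, -2], [0, 1, 2, 5, 1]]
```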
Example FRAN Full range, Archetype N Version 2.02  Subsection SLT.RLT Range of a Linear Transformation 499 Archetype N [757] is the linear transformation 2 ~2xi + X2 + 3x3 - 4x4 + 5X5 T : C5K- C3, T z3 = zi - 2x2 + 3x3 - 9X4 + 3xj X4 _3x1 +4x3- 6X4 + 5x5 To determine the elements of C3 in R(T), find those vectors v such that T (u) = v for some u E C5 v T(u) 2ui + u2 + 3u3 - 4u4 + 5u1 = al-2U2+3u3-94+3u5 [3u1 + 4u3 - 6u4 + 5u5] 2u1 1 2 3u3 -4u4 5u5 = ui + -2u2 + 33 + -9u4 + 3u5 3u1 0 4u3_ -6u4_ 5u5 .2 1 3 -4 5 =ui 1 +u2 -2 +au33 +u4 -9 +u 3 3 0 4 -6 -5 This says that every output of T (v) can be written as a linear combination of the five vectors 2 1 3 -4 5 1 -2 3 -9 3 3 0 4 -6 5 using the scalars a1, a2, u, u4, u. Furthermore, since u can be any element of C5, every such linear combination is an output. This means that 2 1 3 -4 5 R(T) = 1, -2 ,3, -9 ,3 13 0 4 -6 5 The five vectors in this spanning set for R(T) form a linearly dependent set (Theorem MVSLD [137]). So we can find a more economical presentation by any of the various methods from Section CRS [236] and Section FS [257]. We will place the vectors into a matrix as rows, row-reduce, toss out zero rows and appeal to Theorem BRS [245], so we can describe the range of T with a (nice) basis, R(T = 1 , 0 =l C3 In contrast to injective linear transformations having small (trivial) kernels (Theorem KILT [484]), surjective linear transformations have large ranges, as indicated in the next theorem. Theorem RSLT Range of a Surjective Linear Transformation Suppose that T: U a V is a linear transformation. Then T is surjective if and only if the range of T equals the codomain, R(T) = V. D Proof (-) By Definition RLT [496], we know that R(T) C V. To establish the reverse inclusion, assume v E V. Then since T is surjective (Definition SLT [492]), there exists a vector u E U so that T (u) = v. However, the existence of u gains v membership in R(T), so V C R(T). Thus, R(T) = V. Version 2.02  Subsection SLT.RLT Range of a Linear Transformation 500 (<) To establish that T is surjective, choose v E V. Since we are assuming that R(T) = V, v E R(T). This says there is a vector u E U so that T (u) = v, i.e. T is surjective. U Example NSAQR Not surjective, Archetype Q, revisited We are now in a position to revisit our first example in this section, Example NSAQ [492]. In that example, we showed that Archetype Q [765] is not surjective by constructing a vector in the codomain where no element of the domain could be used to evaluate the linear transformation to create the output, thus violating Definition SLT [492]. Just where did this vector come from? The short answer is that the vector -- 2 v= 3 -1 4 was constructed to lie outside of the range of T. How was this accomplished? First, the range of T is given by 1 0 0 0 0 1 0 0 R(T) = 0 , 0 , 1 , 0 0 0 0 1 1 -1 -1 2_ Suppose an element of the range v* has its first 4 components equal to -1, 2, 3, -1, in that order. Then to be an element of R(T), we would have 1 0 0 0 -1 0 1 0 0 2 v* = (-1) 0 + (2) 0 + (3) 1 + (-1) 0 = 3 0 0 0 1 -1 1 -1 -1 2_ -8_ So the only vector in the range with these first four components specified, must have -8 in the fifth component. To set the fifth component to any other value (say, 4) will result in a vector (v in Example NSAQ [492]) outside of the range. Any attempt to find an input for T that will produce v as an output will be doomed to failure. 
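The conclusion of Example NSAQR can be confirmed by a rank computation: v lies in the range exactly when augmenting the coefficient matrix of Archetype Q by v does not raise the rank. A SymPy sketch, with the matrix entries transcribed from Example NSAQ above:

```python
from sympy import Matrix

# Coefficient matrix of Archetype Q, transcribed from Example NSAQ above.
A = Matrix([[ -2, 3,  3,  -6,  3],
            [-16, 9, 12, -28, 28],
            [-19, 7, 14, -32, 37],
            [-21, 9, 15, -35, 39],
            [ -9, 5,  7, -16, 16]])

v = Matrix([-1, 2, 3, -1, 4])   # the codomain vector constructed in Example NSAQ

print(A.rank())                 # 4: the range is a 4-dimensional subspace of C^5
print(A.row_join(v).rank())     # 5: the rank jumps, so A*u = v is inconsistent and
                                #    v has no preimage, exactly as the example claims
```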
Whenever the range of a linear transformation is not the whole codomain, we can employ this device and conclude that the linear transformation is not surjective. This is another way of viewing Theorem RSLT [498]. For a surjective linear transformation, the range is all of the codomain and there is no choice for a vector v that lies in V, yet not in the range. For every one of the archetypes that is not surjective, there is an example presented of exactly this form.

Example NSAO Not surjective, Archetype O
In Example RAO [496] the range of Archetype O [760] was determined to be
R(T) = <{ (1, 0, -3, -7, -2), (0, 1, 2, 5, 1) }>   (column vectors written here as rows for compactness)
a subspace of dimension 2 in C^5. Since R(T) != C^5, Theorem RSLT [498] says T is not surjective.

Example SAN Surjective, Archetype N
The range of Archetype N [757] was computed in Example FRAN [497] to be
R(T) = <{ (1, 0, 0), (0, 1, 0), (0, 0, 1) }>
Since the basis for this subspace is the set of standard unit vectors for C^3 (Theorem SUVB [325]), we have R(T) = C^3 and by Theorem RSLT [498], T is surjective.

Subsection SSSLT Spanning Sets and Surjective Linear Transformations
Just as injective linear transformations are allied with linear independence (Theorem ILTLI [485], Theorem ILTB [486]), surjective linear transformations are allied with spanning sets.

Theorem SSRLT Spanning Set for Range of a Linear Transformation
Suppose that T: U → V is a linear transformation and S = {u1, u2, u3, ..., ut} spans U. Then R = {T(u1), T(u2), T(u3), ..., T(ut)} spans R(T).

Proof We need to establish that R(T) = <R>, a set equality. First we establish that R(T) ⊆ <R>. To this end, choose v ∈ R(T). Then there exists a vector u ∈ U such that T(u) = v (Definition RLT [496]). Because S spans U there are scalars a1, a2, a3, ..., at such that
u = a1 u1 + a2 u2 + a3 u3 + ... + at ut
Then
v = T(u)                                                      Definition RLT [496]
  = T(a1 u1 + a2 u2 + a3 u3 + ... + at ut)                    Definition TSVS [313]
  = a1 T(u1) + a2 T(u2) + a3 T(u3) + ... + at T(ut)           Theorem LTLC [462]
which establishes that v ∈ <R> (Definition SS [298]). So R(T) ⊆ <R>.
To establish the opposite inclusion, choose an element of the span of R, say v ∈ <R>. Then there are scalars b1, b2, b3, ..., bt so that
v = b1 T(u1) + b2 T(u2) + b3 T(u3) + ... + bt T(ut)           Definition SS [298]
  = T(b1 u1 + b2 u2 + b3 u3 + ... + bt ut)                    Theorem LTLC [462]
This demonstrates that v is an output of the linear transformation T, so v ∈ R(T). Therefore <R> ⊆ R(T), and we have the set equality R(T) = <R> (Definition SE [684]). In other words, R spans R(T) (Definition TSVS [313]). ■

Theorem SSRLT [500] provides an easy way to begin the construction of a basis for the range of a linear transformation, since the construction of a spanning set requires simply evaluating the linear transformation on a spanning set of the domain. In practice the best choice for a spanning set of the domain would be as small as possible, in other words, a basis. The resulting spanning set for the codomain may not be linearly independent, so to find a basis for the range might require tossing out redundant vectors from the spanning set. Here's an example, preceded by a short computational sketch of the same idea.
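The sketch below applies Theorem SSRLT to Archetype V, whose formula appears in Example SAV: evaluating T on the basis {1, x, x^2, x^3} of P3 produces four matrices that span R(T). Flattening each output matrix into a list of its entries so that SymPy can row-reduce is bookkeeping adopted here for the computation, not notation from the text.

```python
from sympy import Matrix

# Archetype V: T(a + b*x + c*x^2 + d*x^3) = [[a + b, a - 2c], [d, b - d]]
def T(a, b, c, d):
    return Matrix([[a + b, a - 2*c],
                   [d,     b - d  ]])

# Theorem SSRLT: evaluate T on the basis {1, x, x^2, x^3} of P3; the four
# output matrices span R(T).
outputs = [T(*coeffs) for coeffs in [(1, 0, 0, 0), (0, 1, 0, 0),
                                     (0, 0, 1, 0), (0, 0, 0, 1)]]

# Flatten each 2x2 output into a row of four entries and count the
# independent ones by row-reducing.
rows = Matrix([list(M) for M in outputs])
print(rows.rank())   # 4 = dim(M22), so the outputs span M22 and T is surjective
```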
Example BRLT A basis for the range of a linear transformation Define the linear transformation T: M22 - P2 by T([a =1) I (a+2b+8c+d)+(-3a+2b+5d)x+(a+b+5c)x2 A convenient spanning set for M22 is the basis S ={1 0 1 [0 0 [0 0i 0[ 0_~ ' 0_'[ 0]_ '[0 'If So by Theorem SSRLT [500], a spanning set for R(T) is = {1-3x+z2, 2 + 2x +2, 8 + 5x2, 1+ 5x} The set R is not linearly independent, so if we desire a basis for R(T), we need to eliminate some redundant vectors. Two particular relations of linear dependence on R are (-2)(1 - 3x + x2) + (-3)(2+ 2x + x2) + (8 + 5x2) 0 + Ox + Ox2 0 (1 - 3x + x2) + (-1)(2 + 2x + x2) + (1+ 5x) 0 + Ox + Ox2 0 These, individually, allow us to remove 8 + 5x2 and 1 + 5x from R with out destroying the property that R spans R(T). The two remaining vectors are linearly independent (check this!), so we can write R(T) K{1 - 3x + X2, 2 + 2x + x2) and see that dim (R(T)) = 2. Elements of the range are precisely those elements of the codomain with non-empty preimages. Theorem RPI Range and Pre-Image Suppose that T: U H V is a linear transformation. Then v E R(T) if and only if T-1 (v) -f 0 Proof (->) If v E R(T), then there is a vector u E U such that T (u) =v. This qualifies u for membership in T-1 (v), and thus the preimage of v is not empty. (<-) Suppose the preimage of v is not empty, so we can choose a vector u C U such that T (u) =v. Then v C R(T). Theorem SLTB Surjective Linear Transformations and Bases Suppose that T: U a V is a linear transformation and B = {ui, 112, 113, ..., um} is a basis of U. Then T is surjective if and only if C = {T (ui) , T (U2) , T (u3), ..., T (um)} is a spanning set for V. Q Proof (-) Assume T is surjective. Since B is a basis, we know B is a spanning set of U (Definition B [325]). Then Theorem SSRLT [500] says that C spans R(T). But the hypothesis that T is surjective means V = R(T) (Theorem RSLT [498]), so C spans V. Version 2.02  Subsection SLT.SLTD Surjective Linear Transformations and Dimension 503 (<) Assume that C spans V. To establish that T is surjective, we will show that every element of V is an output of T for some input (Definition SLT [492]). Suppose that v c V. As an element of V, we can write v as a linear combination of the spanning set C. So there are are scalars, bi, b2, b3, ... , bm, such that V = biT (ui) + b2T (U2) + b3T (U3) + .---+ bmT(um) Now define the vector u E U by u= biui+b2u2+b3u3+---+bmum Then T (u) =T(biui+ b2u2 + b3u3s+-+ bmum) = biT(ui) + b2T(u2) + b3T(us) +-.- + bmT (um) = v So, given any choice of a vector v E V, we can design an input u E U Thus, by Definition SLT [492], T is surjective. Theorem LTLC [462] to produce v as an output of T. Subsection SLTD Surjective Linear Transformations and Dimension Theorem SLTD Surjective Linear Transformations and Dimension Suppose that T: U H V is a surjective linear transformation. Then dim (U) > dim (V). D- Proof Suppose to the contrary that m =dim (U) < dim (V) = t. Let B be a basis of U, which will then contain m vectors. Apply T to each element of B to form a set C that is a subset of V. By Theorem SLTB [501], C is spanning set of V with m or fewer vectors. So we have a set of m or fewer vectors that span V, a vector space of dimension t, with m < t. However, this contradicts Theorem G [355], so our assumption is false and dim (U) > dim (V). 
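Theorem SLTD can also be seen concretely for Archetype T, which appears in the next example: with respect to the monomial bases of P4 and P5, multiplication by (x - 2) is represented by a 6 x 5 matrix, and a matrix with only 5 columns can never have rank 6. The following SymPy sketch builds that matrix; the choice of bases is an assumption made here for illustration, not a computation carried out in the text.

```python
from sympy import zeros

# Matrix of T: P4 -> P5, T(p(x)) = (x - 2)*p(x), with respect to the monomial
# bases {1, x, ..., x^4} and {1, x, ..., x^5}: T(x^k) = x^(k+1) - 2*x^k.
M = zeros(6, 5)
for k in range(5):
    M[k, k] = -2       # coefficient of x^k in T(x^k)
    M[k + 1, k] = 1    # coefficient of x^(k+1) in T(x^k)

# A matrix with only 5 columns has rank at most 5, so its columns cannot
# span a 6-dimensional space: dim(P4) = 5 < 6 = dim(P5) forbids surjectivity.
print(M.rank())        # 5
print(M.rank() < 6)    # True
```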
Example NSDAT Not surjective by dimension, Archetype T
The linear transformation in Archetype T [775] is
T: P4 → P5,   T(p(x)) = (x - 2) p(x)
Since dim(P4) = 5 < 6 = dim(P5), T cannot be surjective, for then it would violate Theorem SLTD [502].

Notice that the previous example made no use of the actual formula defining the function. Merely a comparison of the dimensions of the domain and codomain is enough to conclude that the linear transformation is not surjective. Archetype O [760] and Archetype P [763] are two more examples of linear transformations that have "small" domains and "big" codomains, resulting in an inability to create all possible outputs, and thus they are non-surjective linear transformations.

Subsection CSLT Composition of Surjective Linear Transformations
In Subsection LT.NLTFO [467] we saw how to combine linear transformations to build new linear transformations, specifically, how to build the composition of two linear transformations (Definition LTC [469]). It will be useful later to know that the composition of surjective linear transformations is again surjective, so we prove that here.

Theorem CSLTS Composition of Surjective Linear Transformations is Surjective
Suppose that T: U → V and S: V → W are surjective linear transformations. Then (S ∘ T): U → W is a surjective linear transformation.
Contributed by Robert Beezer Solution [506] C40 Show that the linear transformation T is not surjective by finding an element of the codomain, v, such that there is no vector u with T (u) =v. (15 points) 7 a 2a+3b-c] T: C3- (C3, T b - 2b - 2c c_ a-b+2c_ Contributed by Robert Beezer Solution [507] Version 2.02  Subsection SLT.EXC Exercises 506 T15 Suppose that that T: U H V and S: V H W are linear transformations. Prove the following relationship between ranges. (15 points) R (S o T) C-R(S) Contributed by Robert Beezer Solution [507] T20 Suppose that A is an m x n matrix. Define the linear transformation T by T: C" H Cm, T(x)=Ax Prove that the range of T equals the column space of A, R(T) = C(A). Contributed by Andy Zimmer Solution [507] Version 2.02  Subsection SLT.SOL Solutions 507 Subsection SOL Solutions C22 Contributed by Robert Beezer Statement [504] To find an element of C3 with an empty pre-image, we will compute the range of the linear transformation R(S) and then find an element outside of this set. By Theorem SSRLT [500] we can evaluate S with the elements of a spanning set of the domain and create a spanning set for the range. 12 01 03 0-4 S 0 = 1 S 1 = 3 S 0 = 4 S 0 = 3 0 -10 21 107 So 2 1 3 -4 -R(S) = 1 ,3 ,4 ,3 -1 2 1 7 This spanning set is obviously linearly dependent, so we can reduce it to a basis for R(S) using Theorem BRS [245], where the elements of the spanning set are placed as the rows of a matrix. The result is that 1 0 R(S) = 0] 1 -1 1 Therefore, the unique vector in R(S) with a first slot equal to 6 and a second slot equal to 15 will be the linear combination 1 0 6 6 0 +15 1 = 15 -1_ 1_ -9_ So, any vector with first two components equal to 6 and 15, but with a third component different from 9, such as 6 w = 15 -63_ will not be an element of the range of S and will therefore have an empty pre-image. Another strategy on this problem is to guess. Almost any vector will lie outside the range of T, you have to be unlucky to randomly choose an element of the range. This is because the codomain has dimension 3, while the range is "much smaller" at a dimension of 2. You still need to check that your guess lies outside of the range, which generally will involve solving a system of equations that turns out to be inconsistent. C25 Contributed by Robert Beezer Statement [504] To find the range of T, apply T to the elements of a spanning set for C3 as suggested in Theorem SSRLT [500]. We will use the standard basis vectors (Theorem SUVB [325]). Each of these vectors is a scalar multiple of the others, so we can toss two of them in reducing the spanning set to a linearly independent set (or be more careful and apply Theorem BCS [239] on a matrix with these three vectors as columns). The result is the basis of the range, {[12] Version 2.02  Subsection SLT.SOL Solutions 508 With r (T) # 2, R(T) # C2, so Theorem RSLT [498] says T is not surjective. C40 Contributed by Robert Beezer Statement [504] We wish to find an output vector v that has no associated input. 
This is the same as requiring that there is no solution to the equality a ~2a +3b -c 2 3 ~-1 v=T b = 2b-2c =a 0 +b 2 +c -2 c_ _a - b+2c_ 1_ -1_ _2_ In other words, we would like to find an element of C3 not in the set .2 3 -1 Y = 0 ,2 ,-2 If we make these vectors the rows of a matrix, and row-reduce, Theorem BRS [245] provides an alternate description of Y, .2 0 If we add these vectors together, and then change the third component of the result, we will create a vector 2 that lies outside of Y, say v = 4 -9_ T15 Contributed by Robert Beezer Statement [505] This question asks us to establish that one set (7(S o T)) is a subset of another (R(S)). Choose an element in the "smaller" set, say w E R(S o T). Then we know that there is a vector u E U such that w = (S o T) (u) = S (T (u)) Now define v = T (u), so that then S (v) = S (T (u)) = w This statement is sufficient to show that w E R(S), so w is an element of the "larger" set, and R(S o T) C R(S). T20 Contributed by Andy Zimmer Statement [505] This is an equality of sets, so we want to establish two subset conditions (Definition SE [684]). First, show C(A) C R(T). Choose y E C(A). Then by Definition CSM [236] and Definition MVP [194] there is a vector x C C" such that Ax =y. Then T (x) =Ax Definition of T This statement qualifies y as a member of 7Z(T) (Definition RLT [496]), so C(A) G R(T). Now, show R(T) C C(A). Choose y C R(T). Then by Definition RLT [496], there is a vector x in C"m such that T (x) =y. Then Ax =T (x) Definition of T =-y So by Definition CSM [236] and Definition MVP [194], y qualifies for membership in C(A) and so R(T) C C(A). Version 2.02  Section IVLT Invertible Linear Transformations 509 Section IVLT Invertible Linear Transformations In this section we will conclude our introduction to linear transformations by bringing together the twin properties of injectivity and surjectivity and consider linear transformations with both of these proper- ties. Subsection IVLT Invertible Linear Transformations One preliminary definition, and then we will have our main definition for this section. Definition IDLT Identity Linear Transformation The identity linear transformation on the vector space W is defined as Iw : W HW, Iw (w) = w A Informally, Iw is the "do-nothing" function. You should check that Iw is really a linear transformation, as claimed, and then compute its kernel and range to see that it is both injective and surjective. All of these facts should be straightforward to verify (Exercise IVLT.T05 [523]). With this in hand we can make our main definition. Definition IVLT Invertible Linear Transformations Suppose that T: U H V is a linear transformation. If there is a function S: V H U such that So T=Iu To S=Iv then T is invertible. In this case, we call S the inverse of T and write S = T-1. A Informally, a linear transformation T is invertible if there is a companion linear transformation, S, which "undoes" the action of T. When the two linear transformations are applied consecutively (composition), in either order, the result is to have no real effect. It is entirely analogous to squaring a positive number and then taking its (positive) square root. Here is an example of a linear transformation that is invertible. As usual at the beginning of a section, do not be concerned with where S came from, just understand how it illustrates Definition IVLT [508]. 
Example AIVLT An invertible linear transformation Archetype V [779] is the linear transformation T :PF- M22, T (a+bx +cx2+dx3) -[abi] Define the function 5: M22 H P3 defined by s( La J=(a-c-d)+(c+d)x+(a-b-c-d)x2+cx3 Version 2.02  Subsection IVLT.IVLT Invertible Linear Transformations 510 Then (T oS)([ a T S[a - T ((a -c- d) + (c+d)x+ (a--b-c- d)x2 + cx3) (a-c-d)+(c+d) (a-c-d)-2('(a-b-c-d)) c (c+d)-c -a b = IM~r22\ c d And (SoT) (a+bx+cx2 +dx3) S (T (a+bx+cx2+dx3)) S([a+b a-2c]) ((a+b)-d-(b-d))+(d+(b + (I(a+b)-(a-2c) a+bx+cx2+dx3 Ip (a+bx+cx2 +dx3) d- - d))x (b -d)) lIx2 +(d)x3 For now, understand why these computations show that T is invertible, and that S amazed by how S works so perfectly in concert with T! We will see later just how form of S (when it is possible). = T-1. Maybe even be to arrive at the correct It can be as instructive to study a linear transformation that is not invertible. Example ANILT A non-invertible linear transformation Consider the linear transformation T: C3 JM22 defined by T - a-b 2a+2b+c T([b]) L3a+b+c -2a-6b-2c] c_ Suppose we were to search for an inverse function 5: M22 - C3. First verify that the 2 x 2 matrix A = 8 2_ is not in the range of T. This will amount to finding an input to T, b], such that c_ a-b=5 2a+2b+c= 3 3a + b+ c =8 -2a-6b-2c=2 Version 2.02  Subsection IVLT.IVLT Invertible Linear Transformations 511 As this system of equations is inconsistent, there is no input column vector, and A 0 R(T). How should we define S (A)? Note that T (S(A))=(T o S)(A)=IM22 (A)= A So any definition we would provide for S (A) must then be a column vector that T sends to A and we would have A E R(T), contrary to the definition of T. This is enough to see that there is no function S that will allow us to conclude that T is invertible, since we cannot provide a consistent definition for S (A) if we assume T is invertible. Even though we now know that T is not invertible, let's not leave this example just yet. Check that T [2) 5 2l [] 5 2 B 4 -Jg8 - How would we define S (B)? S(B)=S T -2 = (So T ) -2 = I c -2 = -2 4 4 4 -4 or 0 0 0 0 S(B)=S T -3 = (So T) -3) = I -3 = -3 8 8 8 8 Which definition should we provide for S (B)? Both are necessary. But then S is not a function. So we have a second reason to know that there is no function S that will allow us to conclude that T is invertible. It happens that there are infinitely many column vectors that S would have to take to B. Construct the kernel of T, IC (T) = 1 Now choose either of the two inputs used above for T and add to it a scalar multiple of the basis vector for the kernel of T. For example, 1 -1 3 x -[2 +(-2) [-1 0 then verify that T (x) =B. Practice creating a few more inputs for T that would be sent to B, and see why it is hopeless to think that we could ever provide a reasonable definition for S (B)! There is a "whole subspace's worth" of values that S5(B) would have to take on. In Example ANILT [509] you may have noticed that T is not surjective, since the matrix A was not in the range of T. And T is not injective since there are two different input column vectors that T sends to the matrix B. Linear transformations T that are not surjective lead to putative inverse functions S that are undefined on inputs outside of the range of T. Linear transformations T that are not injective lead to putative inverse functions S that are multiply-defined on each of their inputs. We will formalize these ideas in Theorem ILTIS [511]. 
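The two obstructions uncovered in Example ANILT can be double-checked numerically. Writing T as a 4 x 3 matrix (stacking the entries of the output 2 x 2 matrix into a column, a bookkeeping convention adopted here) shows both that the matrix A has no preimage and that the kernel of T is nontrivial. A SymPy sketch:

```python
from sympy import Matrix

# Example ANILT: T(a, b, c) = [[a - b, 2a + 2b + c], [3a + b + c, -2a - 6b - 2c]].
# Stack the four entries of the output (top-left, top-right, bottom-left,
# bottom-right) to write T as a 4 x 3 matrix acting on (a, b, c).
M = Matrix([[ 1, -1,  0],
            [ 2,  2,  1],
            [ 3,  1,  1],
            [-2, -6, -2]])

vA = Matrix([5, 3, 8, 2])                 # the matrix A from the example, stacked the same way
print(M.row_join(vA).rank() > M.rank())   # True: the system is inconsistent, so A is not an output

print(M.nullspace())                      # a single basis vector: the kernel is nontrivial,
                                          # so distinct inputs can share an output
```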
But first notice in Definition IVLT [508] that we only require the inverse (when it exists) to be a function. When it does exist, it too is a linear transformation.

Theorem ILTLT Inverse of a Linear Transformation is a Linear Transformation
Suppose that T: U → V is an invertible linear transformation. Then the function T^-1: V → U is a linear transformation.

Proof We work through verifying Definition LT [452] for T^-1, using the fact that T is a linear transformation to obtain the second equality in each half of the proof. To this end, suppose x, y ∈ V and α ∈ C.
T^-1(x + y) = T^-1( T(T^-1(x)) + T(T^-1(y)) )     Definition IVLT [508]
            = T^-1( T( T^-1(x) + T^-1(y) ) )      Definition LT [452]
            = T^-1(x) + T^-1(y)                   Definition IVLT [508]
Now check the second defining property of a linear transformation for T^-1,
T^-1(αx) = T^-1( αT(T^-1(x)) )                    Definition IVLT [508]
         = T^-1( T(αT^-1(x)) )                    Definition LT [452]
         = αT^-1(x)                               Definition IVLT [508]
So T^-1 fulfills the requirements of Definition LT [452] and is therefore a linear transformation. ■

So when T has an inverse, T^-1 is also a linear transformation. Additionally, T^-1 is invertible and its inverse is what you might expect.

Theorem IILT Inverse of an Invertible Linear Transformation
Suppose that T: U → V is an invertible linear transformation. Then T^-1 is an invertible linear transformation and (T^-1)^-1 = T.

Proof Because T is invertible, Definition IVLT [508] tells us there is a function T^-1: V → U such that
In other words, there exists u E T-1 (v). Since T-1 (v) is non-empty, Theorem KPI [483] then says that T-1 (v)={u+z | zEC(T)} However, because T is injective, by Theorem KILT [484] the kernel is trivial, C(T) {0}. So the pre-image is a set with just one element, T-1 (v) = {u}. Now we can define S by S (v) = u. This is the key to this half of this proof. Normally the preimage of a vector from the codomain might be an empty set, or an infinite set. But surjectivity requires that the preimage not be empty, and then injectivity limits the preimage to a singleton. Since our choice of v was arbitrary, we know that every pre-image for T is a set with a single element. This allows us to construct S as a function. Now that it is defined, verifying that it is the inverse of T will be easy. Here we go. Choose u E U. Define v = T (u). Then T-1 (v) = {u}, so that S (v) = u and, (S o T) (u) = S (T (u)) = S (v) = u = IU (u) and since our choice of u was arbitrary we have function equality, 5 o T=I. Now choose v E V. Define u to be the single vector in the set T-1 (v), in other words, u =S (v). Then T (u) =v, so (T o S) (v) = T (S(v)) = T (u) = v = Iyv(v) and since our choice of v was arbitrary we have function equality, T o 5 y When a linear transformation is both injective and surjective, the pre-image of any element of the codomain is a set of size one (a "singleton"). This fact allowed us to construct the inverse linear trans- formation in one half of the proof of Theorem ILTIS [511] (see Technique C [690]). We can follow this approach to construct the inverse of a specific linear transformation, as the next example shows. Example CIVLT Computing the Inverse of a Linear Transformations Version 2.02  Subsection IVLT.IV Invertibility 514 Consider the linear transformation T: S22 H P2 defined by T ([ = (a+b+c)+(-a+2c)x+(2a+3b+6c)x2 T is invertible, which you are able to verify, perhaps by determining that the kernel of T is empty and the range of T is all of P2. This will be easier once we have Theorem RPNDD [517], which appears later in this section. By Theorem ILTIS [511] we know T-1 exists, and it will be critical shortly to realize that T-1 is automatically known to be a linear transformation as well (Theorem ILTLT [511]). To determine the complete behavior of T-1: P2 H S22 we can simply determine its action on a basis for the domain, P2. This is the substance of Theorem LTDB [462], and an excellent example of its application. Choose any basis of P2, the simpler the better, such as B = {1, x, x2}. Values of T-1 for these three basis elements will be the single elements of their preimages. In turn, we have T-1 (1): T ([ bD 1 +Ox+Ox2 L 1 2 1 1 0 3 1 1 1 0 0 -6 2 0 RREF: 01 0 10 6 0 0 0 1 -3 T-1 (1) =[61 0 ZT1 (1) E-6 10 10 -3 (preimage) (function) T1 (x): T ([ bD 0+ 1x+Ox2 L 1 2 1 1 0 3 r~3 4i1 (preimage) (function) T-1 (x2): T ([ bD 0+Ox+ 1x2 L 1 2 1 11 0 1 0 0 2 0 2 0 RREF: 0 1 0 -3 3 6 1_ 0 0 1 1_ T1(X 2)-{[2 3 ] T - ( 2) 2 3 - 3 T-1 (2 -23 13 (preimage) (function) Theorem LTDB [462] says, informally, "it is enough to know what a linear transformation does to a basis." Formally, we have the outputs of T-1 for a basis, so by Theorem LTDB [462] there is a unique linear Version 2.02  Subsection IVLT.IV Invertibility 515 transformation with these outputs. So we put this information to work. The key step here is that we can convert any element of P2 into a linear combination of the elements of the basis B (Theorem VRRB [317]). 
We are after a "formula" for the value of T-1 on a generic element of P2, say p + qx + r2. T-1 (p+qx+rx2) T-1 (p(1) + q(x) + r(x2)) pT-1 (1) + qT-1 (x) + T-1 (X2) E-6 1OJ+ [-3 4 ] + 23 13 -6p-3q+2r 1Op+4q-3r 10p+4q-3r -3p-q+r Theorem VRRB [317] Theorem LTLC [462] Notice how a linear combination in the domain of T-1 has been translated into a linear combination in the codomain of T-1 since we know T-1 is a linear transformation by Theorem ILTLT [511]. Also, notice how the augmented matrices used to determine the three pre-images could be combined into one calculation of a matrix in extended echelon form, reminiscent of a procedure we know for computing the inverse of a matrix (see Example CMI [216]). Hmmmm. We will make frequent use of the characterization of invertible linear transformations provided by Theorem ILTIS [511]. The next theorem is a good example of this, and we will use it often, too. Theorem CIVLT Composition of Invertible Linear Transformations Suppose that T: U H V and S: V H W are invertible linear transformations. Then the composition, (S o T) : U H W is an invertible linear transformation. D Proof Since S and T are both linear transformations, S o T is also a linear transformation by Theorem CLTLT [470]. Since S and T are both invertible, Theorem ILTIS [511] says that S and T are both injective and surjective. Then Theorem CILTI [487] says S o T is injective, and Theorem CSLTS [503] says S o T is surjective. Now apply the "other half" of Theorem ILTIS [511] and conclude that S o T is invertible. U When a composition is invertible, the inverse is easy to construct. Theorem ICLT Inverse of a Composition of Linear Transformations Suppose that T: U H V and S: V H W are invertible linear transformations. Then S o T is invertible and (SoT)1 = T-1 o S-1. D Proof Compute, for all w E W ((So T)o (T-1 oS-)) (w)= S (T (T-1 (S-1 (w)))) S (Iv (S-1 (w))) S (S-1 (w)) w Iw (w) Definition IVLT [508] Definition IDLT Definition IVLT Definition IDLT [508] [508] [508] so (S o T) o (T-1 o S-1) = Iw and also ((T-1 oS-1)o (SoT)) (u) T-1 (s-' (S (T (u)))) T-1 (Iy (T (u))) T-1 (T (u)) u IU (u) Definition IVLT [508] Definition IDLT Definition IVLT Definition IDLT [508] [508] [508] Version 2.02  Subsection IVLT.SI Structure and Isomorphism 516 so (T-1 o S-1) o (S o T) = I. By Definition IVLT [508], SoT is invertible and (S o T)-1 = T-1 o S-1. * Notice that this theorem not only establishes what the inverse of SoT is, it also duplicates the conclusion of Theorem CIVLT [514] and also establishes the invertibility of SoT. But somehow, the proof of Theorem CIVLT [514] is nicer way to get this property. Does Theorem ICLT [514] remind you of the flavor of any theorem we have seen about matrices? (Hint: Think about getting dressed.) Hmmmm. Subsection SI Structure and Isomorphism A vector space is defined (Definition VS [279]) as a set of objects ("vectors") endowed with a definition of vector addition (+) and a definition of scalar multiplication (written with juxtaposition). Many of our definitions about vector spaces involve linear combinations (Definition LC [297]), such as the span of a set (Definition SS [298]) and linear independence (Definition LI [308]). Other definitions are built up from these ideas, such as bases (Definition B [325]) and dimension (Definition D [341]). The defining properties of a linear transformation require that a function "respect" the operations of the two vector spaces that are the domain and the codomain (Definition LT [452]). 
Finally, an invertible linear transformation is one that can be "undone": it has a companion that reverses its effect. In this subsection we are going to begin to roll all these ideas into one.

A vector space has "structure" derived from definitions of the two operations and the requirement that these operations interact in ways that satisfy the ten properties of Definition VS [279]. When two different vector spaces have an invertible linear transformation defined between them, then we can translate questions about linear combinations (spans, linear independence, bases, dimension) from the first vector space to the second. The answers obtained in the second vector space can then be translated back, via the inverse linear transformation, and interpreted in the setting of the first vector space. We say that these invertible linear transformations "preserve structure." And we say that the two vector spaces are "structurally the same." The precise term is "isomorphic," from Greek meaning "of the same form." Let's begin to try to understand this important concept.

Definition IVS
Isomorphic Vector Spaces
Two vector spaces U and V are isomorphic if there exists an invertible linear transformation T with domain U and codomain V, T: U → V. In this case, we write U ≅ V, and the linear transformation T is known as an isomorphism between U and V.

A few comments on this definition. First, be careful with your language (Technique L [688]). Two vector spaces are isomorphic, or not. It is a yes/no situation and the term only applies to a pair of vector spaces. Any invertible linear transformation can be called an isomorphism; it is a term that applies to functions. Second, for a given pair of vector spaces there might be several different isomorphisms between the two vector spaces. But it only takes the existence of one to call the pair isomorphic. Third, is U isomorphic to V, or is V isomorphic to U? It doesn't matter, since the inverse linear transformation will provide the needed isomorphism in the "opposite" direction. Being "isomorphic to" is an equivalence relation on the set of all vector spaces (see Theorem SER [433] for a reminder about equivalence relations).

Example IVSAV
Isomorphic vector spaces, Archetype V
Archetype V [779] is a linear transformation from P3 to M22,

  T: P3 → M22,   T(a + bx + cx² + dx³) = [ a+b   a-2c ]
                                         [  d     b-d ]
Suppose we want to compute the following linear combination of polynomials in P3, 5(2 + 3x - 4x2 + 5x3) + (-3)(3 - 5x + 3x2 + x3) Rather than doing it straight-away (which is very easy), we will apply the transformation T to convert into a linear combination of matrices, and then compute in M22 according to the definitions of the vector space operations there (Example VSM [281]), T (5(2 + 3x - 4x2 + 5x3) + (-3)(3 - 5x + 3X2 + X3)) = 5T (2 + 3x - 4x2 + 5X3) + (-3)T (3 - 5x + 3x2 + x3) Theorem LTLC [462] 5 5 10]+(-3) [2 _]JDefinition of T 31= 8 Operations in M22 Now we will translate our answer back to P3 by applying T1, which we found in Example AIVLT [508], T'-1:M22G Ps, T_1ILa - -c- (a- -Fd)+(c+d)x+-(a-b-c-d)2+cx3 We compute, T1([31 59 =1 + 30x - 292+ 22x3 which is, as expected, exactly what we would have computed for the original linear combination had we just used the definitions of the operations in P3 (Example VSP [281]). Notice this is meant only as an illustration and not a suggested route for doing this particular computation. Checking the dimensions of two vector spaces can be a quick way to establish that they are not isomorphic. Here's the theorem. Theorem IVSED Isomorphic Vector Spaces have Equal Dimension Suppose U and V are isomorphic vector spaces. Then dim (U) =dim (V).D Proof If U and V are isomorphic, there is an invertible linear transformation T: U a V (Definition IVS [515]). T is injective by Theorem ILTIS [511] and so by Theorem ILTD [486], dim (U) <; dim (V). Similarly, T is surjective by Theorem ILTIS [511] and so by Theorem SLTD [502], dim (U) dim (V). The net effect of these two inequalities is that dim (U) =dim (V).U The contrapositive of Theorem IVSED [516] says that if U and V have different dimensions, then they are not isomorphic. Dimension is the simplest "structural" characteristic that will allow you to distinguish non-isomorphic vector spaces. For example P6 is not isomorphic to M34 since their dimensions (7 and 12, respectively) are not equal. With tools developed in Section VR [530] we will be able to establish that the converse of Theorem IVSED [516] is true. Think about that one for a moment. Version 2.02  Subsection IVLT.RNLT Rank and Nullity of a Linear Transformation 518 Subsection RNLT Rank and Nullity of a Linear Transformation Just as a matrix has a rank and a nullity, so too do linear transformations. And just like the rank and nullity of a matrix are related (they sum to the number of columns, Theorem RPNC [348]) the rank and nullity of a linear transformation are related. Here are the definitions and theorems, see the Archetypes (Appendix A [698]) for loads of examples. Definition ROLT Rank Of a Linear Transformation Suppose that T: U H V is a linear transformation. Then the rank of T, r (T), is the dimension of the range of T, r (T) = dim (7(T)) (This definition contains Notation ROLT.) A Definition NOLT Nullity Of a Linear Transformation Suppose that T: U H V is a linear transformation. Then the nullity of T, n (T), is the dimension of the kernel of T, n (T) = dim (K(T)) (This definition contains Notation NOLT.) A Here are two quick theorems. Theorem ROSLT Rank Of a Surjective Linear Transformation Suppose that T: U H V is a linear transformation. Then the rank of T is the dimension of V, r (T) dim (V), if and only if T is surjective. D Proof By Theorem RSLT [498], T is surjective if and only if R(T) = V. Applying Definition ROLT [517], 7Z(T) = V if and only if r (T) = dim (7(T)) = dim (V). 
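When a linear transformation is given by a matrix, its range is the column space and its kernel is the null space of that matrix (Exercise ILT.T20 [489], Exercise SLT.T20 [505]), so the rank and nullity of the transformation are just the rank and nullity of the matrix. Here is a small sketch of the check suggested by Theorem ROSLT; the matrix is made up purely for illustration and is not from the text.

    import numpy as np

    # A hypothetical transformation T: C^4 -> C^3 given by T(x) = A x.
    A = np.array([[1.0, 2.0, 0.0, -1.0],
                  [0.0, 1.0, 1.0,  2.0],
                  [1.0, 3.0, 1.0,  1.0]])   # third row = first + second

    m, n = A.shape
    rank = np.linalg.matrix_rank(A)   # r(T) = dim R(T)
    nullity = n - rank                # n(T) = dim K(T), by Theorem RPNC [348]

    print("r(T) =", rank, "  n(T) =", nullity)   # 2 and 2
    print("surjective?", rank == m)              # Theorem ROSLT: needs r(T) = dim of codomain
    print("kernel trivial?", nullity == 0)       # Theorem KILT: injectivity

Changing the third row so that it is no longer the sum of the first two would raise the rank to 3 and make this hypothetical transformation surjective.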
U Theorem NOILT Nullity Of an Injective Linear Transformation Suppose that T: U H V is a linear transformation. Then the nullity of T is zero, n (T) = 0, if and only if T is injective. D Proof By Theorem KILT [484], T is injective if and only if K(T) ={O}. Applying Definition NOLT [517], K(T) ={O} if and only if n~ (T) =0.U Just as injectivity and surjectivity come together in invertible linear transformations, there is a clear relationship between rank and nullity of a linear transformation. If one is big, the other is small. Theorem RPNDD Rank Plus Nullity is Domain Dimension Suppose that T: U a V is a linear transformation. Then r (T) + n~ (T) =dim (U) Proof Let r = r (T) and s = n (T). Suppose that R = {vi, v2, v3, ..., vr} C V is a basis of the range of T, 7Z(T), and S = {ui, u2, u3, ..., u8} C U is a basis of the kernel of T, KC(T). Note that R and S are Version 2.02  Subsection IVLT.RNLT Rank and Nullity of a Linear Transformation 519 possibly empty, which means that some of the sums in this proof are "empty" and are equal to the zero vector. Because the elements of R are all in the range of T, each must have a non-empty pre-image by Theorem RPI [501]. Choose vectors w2 E U, 1 < i < r such that w2 E T-1 (vi). So T (wi) = v2, 1 < i < r. Consider the set B = {ui, u2, u3, ..., us, w1, w2, w3, ..., wr} We claim that B is a basis for U. To establish linear independence for B, begin with a relation of linear dependence on B. So suppose there are scalars ai, a2, a3, ..., as and bi, b2, b3, ..., br 0=alul+a2u2+a3u3+---+asus+blwl+b2w2+b3w3+---+brwr Then o T (o) = T (aiui + a2u2 + a3u3 + ... + asus+ b1w1 + b2w2 + b3w3 + - + brwr) = a1T' (ui) + a2T' (u2) + a37T (us) -|-..-.--|-as T (us) -| biT (wi) + b2T (w2) + b3T (w3) + . -+brT (wr) = a10+ a20 + a30 + -+ as0+ biT (wi) + b2T (w2) + b3T (w3) + - -+-brT(wr) biT (wi) + b2T (w2) + b3T (w3) + - -+-brT(wr) biT(wi) + b2T (w2) + b3T (w3) + --. + br T(wr) = blvl+ b2v2 + b3v3 + ... + brvr Theorem LTTZZ [456] Definition LI [308] Theorem LTLC [462] Definition KLT [481] Theorem ZVSM [286] Property Z [280] Definition PI [465] This is a relation of linear dependence on R (Definition RLD [308]), and since R is a linearly independent set (Definition LI [308]), we see that bi = b2 = b3 = ... = br = 0. Then the original relation of linear dependence on B becomes 0=alul+ a2u2 + a3u3 + -+ asus + Ow+ Ow2 + ... + Owr =aiu1+a2u2+a3u3+---+asus+0+0+...+0 = aiu1+ a2u2 + a3u3 + + asus Theorem ZSSM [286] Property Z [280] But this is again a relation of linear independence (Definition RLD [308]), now on the set S. Since S is linearly independent (Definition LI [308]), we have ai1= a2 = a3= ... = ar = 0. Since we now know that all the scalars in the relation of linear dependence on B must be zero, we have established the linear independence of S through Definition LI [308]. To now establish that B spans U, choose an arbitrary vector u E U. Then T (u) E R(T), so there are scalars ci, c2, c3, ..., cr such that T (u) = civi + c2v2 + c3v3 +. -+- crvr Use the scalars cl, c2, c3, ..., cr to define a vector y E U, y = cW+ c2w2 + c3W3+ . .. + crwr Then T (u - y) = T (u) - T (y) Theorem LTLC [462] Version 2.02  Subsection IVLT.RNLT Rank and Nullity of a Linear Transformation 520 = T (u) - T (ciwi + c2w2 + c3w3 + -- --+ crwr) Substitution = T (u) - (ciT (wi) + c2T (w2) + - - - + crT (wr)) Theorem LTLC [462] = T (u) - (clvi + c2v2 + c3v3 + ... 
+ crvr) w E T-' (vi) = T (u) - T (u) Substitution = 0 Property Al [280] So the vector u - y is sent to the zero vector by T and hence is an element of the kernel of T. As such it can be written as a linear combination of the basis vectors for 1C(T), the elements of the set S. So there are scalars di, d2, d3, - .-, d8 such that u-y=diui+ d2u2 + d3u3+ ... + deus Then u (u - y) + y = diui+ d2u2 + d3u3+ ...+ dus + ciwi+ c2w2 + c3w3+ ...+ crwr This says that for any vector, u, from U, there exist scalars (di, d2, d3, ..., d, ci, c2, c3, ..., cr) that form u as a linear combination of the vectors in the set B. In other words, B spans U (Definition SS [298]). So B is a basis (Definition B [325]) of U with s + r vectors, and thus dim (U)= s + r = n (T) + r(T) as desired. U Theorem RPNC [348] said that the rank and nullity of a matrix sum to the number of columns of the matrix. This result is now an easy consequence of Theorem RPNDD [517] when we consider the linear transformation T: C" Cm defined with the m x n matrix A by T (x) = Ax. The range and kernel of T are identical to the column space and null space of the matrix A (Exercise ILT.T20 [489], Exercise SLT.T20 [505]), so the rank and nullity of the matrix A are identical to the rank and nullity of the linear transformation T. The dimension of the domain of T is the dimension of C", exactly the number of columns for the matrix A. This theorem can be especially useful in determining basic properties of linear transformations. For example, suppose that T: C6 H C6 is a linear transformation and you are able to quickly establish that the kernel is trivial. Then n (T) = 0. First this means that T is injective by Theorem NOILT [517]. Also, Theorem RPNDD [517] becomes 6 dim (C) =r(T) + n (T) =r(T) +0 r(T) So the rank of T is equal to the rank of the codomain, and by Theorem ROSLT [517] we know T is surjective. Finally, we know T is invertible by Theorem ILTIS [511]. So from the determination that the kernel is trivial, and consideration of various dimensions, the theorems of this section allow us to conclude the existence of an inverse linear transformation for T. Similarly, Theorem RPNDD [517] can be used to provide alternative proofs for Theorem ILTD [486], Theorem SLTD [502] and Theorem IVSED [516]. It would be an interesting exercise to construct these proofs. It would be instructive to study the archetypes that are linear transformations and see how many of their properties can be deduced just from considering only the dimensions of the domain and codomain. Then add in just knowledge of either the nullity or rank, and so how much more you can learn about the linear transformation. The table preceding all of the archetypes (Appendix A [698]) could be a good place to start this analysis. Version 2.02  Subsection IVLT.SLELT Systems of Linear Equations and Linear Transformations 521 Subsection SLELT Systems of Linear Equations and Linear Transformations This subsection does not really belong in this section, or any other section, for that matter. It is just the right time to have a discussion about the connections between the central topic of linear algebra, linear transformations, and our motivating topic from Chapter SLE [2], systems of linear equations. We will discuss several theorems we have seen already, but we will also make some forward-looking statements that will be justified in Chapter R [530]. Archetype D [716] and Archetype E [720] are ideal examples to illustrate connections with linear transformations. 
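The chain of deductions just outlined (trivial kernel, hence injective, hence full rank by Theorem RPNDD [517], hence surjective, hence invertible) is easy to mimic numerically for a transformation defined by a square matrix. The sketch below uses a small made-up 3x3 matrix in place of the 6x6 transformation discussed above; it is an illustration only.

    import numpy as np

    # A made-up nonsingular matrix standing in for the square transformation
    # T: C^3 -> C^3, T(x) = A x.
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])
    n = A.shape[0]

    rank = np.linalg.matrix_rank(A)
    print("kernel trivial (injective)?", n - rank == 0)   # n(T) = 0
    print("full rank (surjective)?", rank == n)           # r(T) = dim of codomain

    # Injective and surjective together give invertibility (Theorem ILTIS);
    # the inverse transformation undoes T.
    x = np.array([1.0, -2.0, 3.0])
    print("T^-1(T(x)) == x ?", np.allclose(np.linalg.solve(A, A @ x), x))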
Both have the same coefficient matrix, 2 1 7 -7 D = -3 4 -5 -6 1 1 4 -5 To apply the theory of linear transformations to these two archetypes, employ matrix multiplication (Def- inition MM [197]) and define the linear transformation, 2 1 7 -7 T: C4F- C3, T (x) = Dx = [ -3 + x2 4 + x3 -5 + X4 -6 1 1 4 _-5- Theorem MBLT [459] tells us that T is indeed a linear transformation. Archetype D [716] asks for solutions 8 to [S(D, b), where b = -12 . In the language of linear transformations this is equivalent to asking for -4 T-1 (b). In the language of vectors and matrices it asks for a linear combination of the four columns of D 7 that will equal b. One solution listed is w =[8]. With a non-empty preimage, Theorem KPI [483] tells us that the complete solution set of the linear system is the preimage of b, w+1C(T)={w+z zEKC(T)} The kernel of the linear transformation T is exactly the null space of the matrix D (see Exercise ILT.T20 [489]), so this approach to the solution set should be reminiscent of Theorem PSPHS [105]. The kernel of the linear transformation is the preimage of the zero vector, exactly equal to the solution set of the homogeneous system [S(D, 0). Since D has a null space of dimension two, every preimage (and in particular the preimage of b) is as "big" as a subspace of dimension two (but is not a subspace). Archetype E [720] is identical to Archetype D [716] but with a different vector of constants, d =[3]. We can use the same linear transformation T to discuss this system of equations since the coefficient matrix is identical. Now the set of solutions to [S(D, d) is the pre-image of d, T-1 (d). However, the vector d is not in the range of the linear transformation (nor is it in the column space of the matrix, since these two sets are equal by Exercise SLT.T20 [505]). So the empty pre-image is equivalent to the inconsistency of the linear system. These two archetypes each have three equations in four variables, so either the resulting linear systems are inconsistent, or they are consistent and application of Theorem CMVEI [56] tells us that the system has Version 2.02  Subsection IVLT.READ Reading Questions 522 infinitely many solutions. Considering these same parameters for the linear transformation, the dimension of the domain, C4, is four, while the codomain, C3, has dimension three. Then n (T) = dim (C4) - r (T) Theorem RPNDD [517] = 4 - dim (7(T)) Definition ROLT [517] > 4 - 3 R(T) subspace of C3 = 1 So the kernel of T is nontrivial simply by considering the dimensions of the domain (number of variables) and the codomain (number of equations). Pre-images of elements of the codomain that are not in the range of T are empty (inconsistent systems). For elements of the codomain that are in the range of T (consistent systems), Theorem KPI [483] tells us that the pre-images are built from the kernel, and with a non-trivial kernel, these pre-images are infinite (infinitely many solutions). When do systems of equations have unique solutions? Consider the system of linear equations IJS(C, f) and the linear transformation S (x) = Cx. If S has a trivial kernel, then pre-images will either be empty or be finite sets with single elements. Correspondingly, the coefficient matrix C will have a trivial null space and solution sets will either be empty (inconsistent) or contain a single solution (unique solution). Should the matrix be square and have a trivial null space then we recognize the matrix as being nonsingular. 
A square matrix means that the corresponding linear transformation, T, has equal-sized domain and codomain. With a nullity of zero, T is injective, and also Theorem RPNDD [517] tells us that rank of T is equal to the dimension of the domain, which in turn is equal to the dimension of the codomain. In other words, T is surjective. Injective and surjective, and Theorem ILTIS [511] tells us that T is invertible. Just as we can use the inverse of the coefficient matrix to find the unique solution of any linear system with a nonsingular coefficient matrix (Theorem SNCM [229]), we can use the inverse of the linear transformation to construct the unique element of any pre-image (proof of Theorem ILTIS [511]). The executive summary of this discussion is that to every coefficient matrix of a system of linear equa- tions we can associate a natural linear transformation. Solution sets for systems with this coefficient matrix are preimages of elements of the codomain of the linear transformation. For every theorem about systems of linear equations there is an analogue about linear transformations. The theory of linear transformations provides all the tools to recreate the theory of solutions to linear systems of equations. We will continue this adventure in Chapter R [530]. Subsection READ Reading Questions 1. What conditions allow us to easily determine if a linear transformation is invertible? 2. What does it mean to say two vector spaces are isomorphic? Both technically, and informally? 3. How do linear transformations relate to systems of linear equations? Version 2.02  Subsection IVLT.EXC Exercises 523 Subsection EXC Exercises C10 The archetypes below are linear transformations of the form T: U H V that are invertible. For each, the inverse linear transformation is given explicitly as part of the archetype's description. Verify for each linear transformation that T-1 o T= Iu T o T-1 = Iv Archetype R [769], Archetype V [779], Archetype W [781] Contributed by Robert Beezer C20 Determine if the linear transformation T: P2 H M22 is (a) injective, (b) surjective, (c) invertible. Ta~b2~c - Ea+2b-2c 2a+2b -a+b-4c 3a+2b+2c Contributed by Robert Beezer Solution [524] C21 Determine if the linear transformation S: P3 1 M22 is (a) injective, (b) surjective, (c) invertible. S a+ x+ x2 +dx3) -a+4b+c+2d4aG-b+6c-d a+5b-2c+2d a+2c+5d Contributed by Robert Beezer Solution [524] C50 Consider the linear transformation S: M12 H Pi from the set of 1 x 2 matrices to the set of polynomials of degree at most 1, defined by S ([a b]) = (3a + b) + (5a + 2b)x Prove that S is invertible. Then show that the linear transformation R: P1 M12, R (r + sx) = [(2r - s) (-5r + 3s)] is the inverse of S, that is S-1 = R. Contributed by Robert Beezer Solution [525] M30 The linear transformation S below is invertible. Find a formula for the inverse linear transformation, S: P1F- M1,2, S (a +bx) =[3a +b 2a +b] Contributed by Robert Beezer Solution [525] M31 The linear transformation R: M12 - M21 is invertible. Determine a formula for the inverse linear transformation R-1: M21 s M12. ([a b]) a + 3b R ([a b} Lll= 4Ra + 11b Contributed by Robert Beezer Solution [526] Version 2.02  Subsection IVLT.EXC Exercises 524 M50 Rework Example CIVLT [512], only in place of the basis B for P2, choose instead to use the basis C = {1, 1 + x, 1 + x + 2}. 
This will complicate writing a generic element of the domain of T-1 as a linear combination of the basis elements, and the algebra will be a bit messier, but in the end you should obtain the same formula for T-1. The inverse linear transformation is what it is, and the choice of a particular basis should not influence the outcome. Contributed by Robert Beezer T05 Prove that the identity linear transformation (Definition IDLT [508]) is both injective and surjective, and hence invertible. Contributed by Robert Beezer T15 Suppose that T: U H V is a surjective linear transformation and dim (U) = dim (V). Prove that T is injective. Contributed by Robert Beezer Solution [526] T16 Suppose that T: U H V is an injective linear transformation and dim (U) = dim (V). Prove that T is surjective. Contributed by Robert Beezer T30 Suppose that U and V are isomorphic vector spaces. Prove that there are infinitely many isomor- phisms between U and V. Contributed by Robert Beezer Solution [527] Version 2.02  Subsection IVLT.SOL Solutions 525 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [522] (a) We will compute the kernel of T. Suppose that a + bx + cz2 E K(T). Then 0 01 2 a+2b-2c 2a+2b 0 0 -a +b4 c3a+2b+2c and matrix equality (Theorem ME [425]) yields the homogeneous system of four equations in three variables, a+2b-2c=0 2a + 2b = 0 -a+b-4c=0 3a+2b+2c= 0 The coefficient matrix of this system row-reduces as 1 2 -2 [1 0 2 2 2 0 RREF, 0 [T -2 -1 1 -4 0 0 0 3 2 2 0 0 0 From the existence of non-trivial solutions to this system, we can infer non-zero polynomials in C(T). By Theorem KILT [484] we then know that T is not injective. (b) Since 3 =dim (P2) < dim (M22) = 4, by Theorem SLTD [502] T is not surjective. (c) Since T is not surjective, it is not invertible by Theorem ILTIS [511]. C21 Contributed by Robert Beezer Statement [522] (a) To check injectivity, we compute the kernel of S. To this end, suppose that a + bx+ cz2+ dx3 E K(S), so 0 03S+b+2+d3)' -a+4b+c+2d 4a-b+6c-d 0a0_-a+5b-2c+2d a+2c+5d this creates the homogeneous system of four equations in four variables, -a+4b+c+2d= 0 4a - b +6c - d =0 a +5b - 2c +2d = 0 a +2c +5d = 0 The coefficient matrix of this system row-reduces as, We recognize the coefficient matrix as being nonsingular, so the only solution to the system is a = b = c = d = 0, and the kernel of S is trivial, IC(S) {0 + Ox + 0x2 + 0x3}. By Theorem KILT [484], we see that S is injective. Version 2.02  Subsection IVLT.SOL Solutions 526 (b) We can establish that S is surjective by considering the rank and nullity of S. r (S) dim (P3) - n (S) Theorem RPNDD [517] =4-0 dim (M22) So, R(S) is a subspace of M22 (Theorem RLTS [497]) whose dimension equals that of M22. By Theorem EDYES [358], we gain the set equality R(S)= M22. Theorem RSLT [498] then implies that S is surjective. (c) Since S is both injective and surjective, Theorem ILTIS [511] says S is invertible. C50 Contributed by Robert Beezer Statement [522] Determine the kernel of S first. The condition that S ([a b]) = 0 becomes (3a + b) + (5a + 2b)xz= 0+ Ox. Equating coefficients of these polynomials yields the system 3a + b =0 5a+ 2b = 0 This homogeneous system has a nonsingular coefficient matrix, so the only solution is a = 0, b = 0 and thus IC(S){ [0 0]} By Theorem KILT [484], we know S is injective. 
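Both kernel computations so far (Solution C20 and part (a) above) come down to the null space of a coefficient matrix, so they are quick to confirm numerically before moving on to part (b). The following sketch is not part of the original solutions; it simply repeats the two rank computations with NumPy.

    import numpy as np

    # Coefficient matrix from Solution C20 (kernel of T: P2 -> M22) ...
    T_coeffs = np.array([[ 1.0,  2.0, -2.0],
                         [ 2.0,  2.0,  0.0],
                         [-1.0,  1.0, -4.0],
                         [ 3.0,  2.0,  2.0]])
    # ... and from part (a) above (kernel of S: P3 -> M22).
    S_coeffs = np.array([[-1.0,  4.0,  1.0,  2.0],
                         [ 4.0, -1.0,  6.0, -1.0],
                         [ 1.0,  5.0, -2.0,  2.0],
                         [ 1.0,  0.0,  2.0,  5.0]])

    for name, M in (("T (C20)", T_coeffs), ("S (C21)", S_coeffs)):
        nullity = M.shape[1] - np.linalg.matrix_rank(M)
        print(name, ": nullity =", nullity, ", injective:", nullity == 0)

    # Expected: T has nullity 1 (not injective), S has nullity 0 (injective).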
With n (S) = 0 we employ Theorem RPNDD [517] to find r (5) r (S) +0 =r (S) + n (S) = dim (M12) = 2 =dim (P1) Since R(S) C P1 and dim (R(S)) =dim (P1), we can apply Theorem EDYES [358] to obtain the set equality R(S) = PI and therefore S is surjective. One of the two defining conditions of an invertible linear transformation is (Definition IVLT [508]) (S o R)(a+bx) = S(R (a+bx)) = S ([(2a - b)(-5a + 3b)]) = (3(2a - b) + (-5a + 3b)) + (5(2a - b) + 2(-5a + 3b)) x = ((6a - 3b) + (-5a + 3b)) + ((10a - 5b) + (-10a + 6b)) x =a+bx = IpA (a + bx) That (R o.S) ([a b]) = IM12 (a b) is similar. M30 Contributed by Robert Beezer Statement [522] (Another approach to this solution would follow Example CIVLT [512].) Suppose that S-: Ml1,2 a Pi has a form given by S-1(z w)=(rz+sw)+(pz+qw)x where r, s, p, q are unknown scalars. Then a +bx =S-1 (S (a +bx)) = S-1 ([3a + b 2a + b]) = (r(3a + b) + s(2a + b)) + (p(3a + b) + q(2a + b)) x = ((3r + 2s)a + (r + s)b) + ((3p + 2q)a + (p + q)b) x Version 2.02  Subsection IVLT.SOL Solutions 527 Equating coefficients of these two polynomials, and then equating coefficients on a and b, gives rise to 4 equations in 4 variables, 3r + 2s= 1 r+s0O 3p+ 2q 0 p+q=1 This system has a unique solution: r = 1, s = -1, p = -2, q = 3. So the desired inverse linear transformation is S-1 (z w) = (z - w) + (-2z + 3w) x Notice that the system of 4 equations in 4 variables could be split into two systems, each with two equations in two variables (and identical coefficient matrices). After making this split, the solution might feel like computing the inverse of a matrix (Theorem CINM [217]). Hmmmm. M31 Contributed by Robert Beezer Statement [522] (Another approach to this solution would follow Example CIVLT [512].) We are given that R is invertible. The inverse linear transformation can be formulated by considering the pre-image of a generic element of the codomain. With injectivity and surjectivity, we know that the pre-image of any element will be a set of size one it is this lone element that will be the output of the inverse linear transformation. Suppose that we set v=z as a generic element of the codomain, M21. Then if [r s] = w E R- (v), H =v=R(w) E r + 3s 4r + 11s So we obtain the system of two equations in the two variables r and s, r + 3s = x 4r + 11s = y With a nonsingular coefficient matrix, we can solve the system using the inverse of the coefficient matrix, r=-11x + 3y s =4x - y So we define, T15 Contributed by Robert Beezer Statement [523] If T is surjective, then Theorem RSLT [498] says 7Z(T) =V, so r (T) =dim (V). In turn, the hypothesis gives r (T) =dim (U). Then, using Theorem RPNDD [517], n (T) = (r (T) + n (T)) - r (T) =dim (U) - dim (U) = 0 With a null space of zero dimension, C(T) = {0}, and by Theorem KILT [484] we see that T is injective. T is both injective and surjective so by Theorem ILTIS [511], T is invertible. Version 2.02  Subsection IVLT.SOL Solutions 528 T30 Contributed by Robert Beezer Statement [523] Since U and V are isomorphic, there is at least one isomorphism between them (Definition IVS [515]), say T: U H V. As such, T is an invertible linear transformation. For a E C define the linear transformation S: V H V by S (v) = av. Convince yourself that when a# 0, S is an invertible linear transformation (Definition IVLT [508]). Then the composition, S o T: U H V, is an invertible linear transformation by Theorem CIVLT [514]. 
Once convinced that each non-zero value of a gives rise to a different functions for S o T, then we have constructed infinitely many isomorphisms from U to V. Version 2.02  Annotated Acronyms IVLT.LT Linear Transformations 529 Annotated Acronyms LT Linear Transformations Theorem MBLT [459] You give me an m x n matrix and I'll give you a linear transformation T: C" H Cm. This is our first hint that there is some relationship between linear transformations and matrices. Theorem MLTCV [460] You give me a linear transformation T: C" H Cm and I'll give you an m x n matrix. This is our second hint that there is some relationship between linear transformations and matrices. Generalizing this relationship to arbitrary vector spaces (i.e. not just C" and Cm) will be the most important idea of Chapter R [530]. Theorem LTLC [462] A simple idea, and as described in Exercise LT.T20 [473], equivalent to the Definition LT [452]. The statement is really just for convenience, as we'll quote this one often. Theorem LTDB [462] Another simple idea, but a powerful one. "It is enough to know what a linear transformation does to a basis." At the outset of Chapter R [530], Theorem VRRB [317] will help us define a very important function, and then Theorem LTDB [462] will allow us to understand that this function is also a linear transformation. Theorem KPI [483] The pre-image will be an important construction in this chapter, and this is one of the most important descriptions of the pre-image. It should remind you of Theorem PSPHS [105], which is described in Acronyms V [181]. See Theorem RPI [501], which is also described below. Theorem KILT [484] Kernels and injective linear transformations are intimately related. This result is the connection. Compare with Theorem RSLT [498] below. Theorem ILTB [486] Injective linear transformations and linear independence are intimately related. This result is the connec- tion. Compare with Theorem SLTB [501] below. Theorem RSLT [498] Ranges and surjective linear transformations are intimately related. This result is the connection. Compare with Theorem KILT [484] above. Theorem SSRLT [500] This theorem provides the most direct way of forming the range of a linear transformation. The resulting spanning set might well be linearly dependent, and beg for some clean-up, but that doesn't stop us from having very quickly formed a reasonable description of the range. If you find the determination of spanning sets or ranges difficult, this is one worth remembering. You can view this as the analogue of forming a column space by a direct application of Definition CSM [236]. Theorem SLTB [501] Version 2.02  Annotated Acronyms IVLT.LT Linear Transformations 530 Surjective linear transformations and spanning sets are intimately related. This result is the connection. Compare with Theorem ILTB [486] above. Theorem RPI [501] This is the analogue of Theorem KPI [483]. Membership in the range is equivalent to nonempty pre-images. Theorem ILTIS [511] Injectivity and surjectivity are independent concepts. You can have one without the other. But when you have both, you get invertibility, a linear transformation that can be run "backwards." This result might explain the entire structure of the four sections in this chapter. Theorem RPNDD [517] This is the promised generalization of Theorem RPNC [348] about matrices. So the number of columns of a matrix is the analogue of the dimension of the domain. This will become even more precise in Chapter R [530]. 
For now, this can be a powerful result for determining dimensions of kernels and ranges, and consequently, the injectivity or surjectivity of linear transformations. Never underestimate a theorem that counts something. Version 2.02  Chapter R Representations Previous work with linear transformations may have convinced you that we can convert most questions about linear transformations into questions about systems of equations or properties of subspaces of Cm. In this section we begin to make these vague notions precise. We have used the word "representation" prior, but it will get a heavy workout in this chapter. In many ways, everything we have studied so far was in preparation for this chapter. Section VR Vector Representations We begin by establishing an invertible linear transformation between any vector space V of dimension m and C". This will allow us to "go back and forth" between the two vector spaces, no matter how abstract the definition of V might be. Definition VR Vector Representation Suppose that V is a vector space with a basis B = {vi, v2, v3, ..., vn}. Define a function pB: V H C" as follows. For w E V define the column vector PB (w) E C" by w = [PB (w)]1 v1 + [PB (w)12 v2 + [PB (w)13 v3 + ... + [PB (w)]o vn (This definition contains Notation VR.) A This definition looks more complicated that it really is, though the form above will be useful in proofs. Simply stated, given w E V, we write w as a linear combination of the basis elements of B. It is key to realize that Theorem VRRB [317] guarantees that we can do this for every w, and furthermore this expression as a linear combination is unique. The resulting scalars are just the entries of the vector PB (w). This discussion should convince you that PB is "well-defined" as a function. We can determine a precise output for any input. Now we want to establish that PB is a function with additional properties - it is a linear transformation. Theorem VRLT Vector Representation is a Linear Transformation The function pB (Definition VR [530]) is a linear transformation.D Proof We will take a novel approach in this proof. We will construct another function, which we will easily determine is a linear transformation, and then show that this second function is really pB in disguise. Here we go. 531  Section VR Vector Representations 532 Since B is a basis, we can define T : V H C" to be the unique linear transformation such that T (vi) = ei, 1 < i < n, as guaranteed by Theorem LTDB [462], and where the ei are the standard unit vectors (Definition SUV [173]). Then suppose for an arbitrary w E V we have, n [T (w)] (T [PB (w)] v j n [PB(w)]j T (vJ) n PB (w)]j ej j=1 [[PB (w)]j ej] [PB (w)]j [ej]i Definition VR [530] Theorem LTLC [462] Definition CVA [84] Definition CVSM [85] Property CC [86] Definition SUV [173] n [PB (w)]; [ei]i -+ [PB (w)], [eg] j=1 12 n [PB (w)]i (1) -+ [PB (w)], (0) j=1 jP w [PB (W)]I As column vectors, Definition CVE [84] implies that T (w) = PB (w). Since w was an arbitrary element of V, as functions T = pB. Now, since T is known to be a linear transformation, it must follow that PB is also a linear transformation. U The proof of Theorem VRLT [530] provides an alternate definition of vector representation relative to a basis B that we could state as a corollary (Technique LC [696]): PB is the unique linear transformation that takes B to the standard unit basis. 
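Computationally, finding pB (w) for column vectors means solving the linear system whose coefficient matrix has the vectors of B as its columns (Theorem SLSLC [93]), as the next example will do by hand. Here is a quick sketch of that computation in Python; the basis and the vector are made up purely for illustration and are not taken from the example that follows.

    import numpy as np

    # Columns of B_mat form a (made-up) basis B of C^3.
    B_mat = np.array([[1.0, 0.0, 1.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0]])
    w = np.array([3.0, 5.0, 4.0])

    # rho_B(w) is the unique coordinate vector a with B_mat @ a = w
    # (uniqueness is Theorem VRRB [317]).
    a = np.linalg.solve(B_mat, w)
    print("rho_B(w) =", a)                               # [2. 3. 1.]

    # Un-coordinatizing recovers w, the round trip promised by Definition VR.
    print("recovered w?", np.allclose(B_mat @ a, w))     # True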
Example VRC4 Vector representation in C4 Consider the vector y E C4 6 14 7_ We will find several vector representations of y in representations of y do change. One basis for C4 is this example. Notice that y never changes, but the B = {Ui, U2, U3, U4} = { -2 3 1 4 1 -6 2 3 2 ' 2 0'1 .-3] [-4] [5] _6] I Version 2.02  Section VR Vector Representations 533 as can be seen by making these vectors the columns of a matrix, checking that the matrix is nonsingular and applying Theorem CNMB [330]. To find PB (y), we need to find scalars, a1, a2, a3, a4 such that y =au1 + a2u2 + a3u3 + a4u4 By Theorem SLSLC [93] the desired scalars are a solution to the linear system of equations with a coefficient matrix whose columns are the vectors in B and with a vector of constants y. With a nonsingular coefficient matrix, the solution is unique, but this is no surprise as this is the content of Theorem VRRB [317]. This unique solution is a1=2 a2=-1 a3 =-3 a4= 4 Then by Definition VR [530], we have 2 -1 PB (Y) [j 4 Suppose now that we construct a representation of y relative to another basis of C4, --15 16 -26 14 _ 9 -14 14 -13 C -4 ' 5 ' -6 ' 4 .-2 _ 2 _ _-3 _ _6 _ As with B, it is easy to check that C is a basis. Writing y as a linear combination of the vectors in C leads to solving a system of four equations in the four unknown scalars with a nonsingular coefficient matrix. The unique solution can be expressed as 6 -15 16 -26 14 14 9 + -)-14 + 14 +0-13 y ~ (-28) [ij+(-8) +11 [~]+0 73 6 -2) -4 5 -6 4 7_ -2_ _2 _-3_ _6 _ so that Definition VR [530] gives --28 pc (y) =[11 0 We often perform representations relative to standard bases, but for vectors in Cm its a little silly. Let's find the vector representation of y relative to the standard basis (Theorem SUVB [325]), D ={ei, e2, e3, e4} Then, without any computation, we can check that y ~ 6ei +14e2 +6e3 +7e4 so by Definition VR [530], 6 PD (Y) 4 7 Version 2.02  Section VR Vector Representations 534 which is not very exciting. Notice however that the order in which we place the vectors in the basis is critical to the representation. Let's keep the standard unit vectors as our basis, but rearrange the order we place them in the basis. So a fourth basis is E= {e3, e4, e2, ei} Then, 6 y [=6j =6e3-7e4+14e2+6ei -7 so by Definition VR [530], 6 7 PE (Y) K] 6 So for every possible basis of C4 we could construct a different representation of y. Vector representations are most interesting for vector spaces that are not C". Example VRP2 Vector representations in P2 Consider the vector u = 15 + 10x - 6x2 E P2 from the vector space of polynomials with degree at most 2 (Example VSP [281]). A nice basis for P2 is B = {1, x, x2} so that u =15 + 10x - 6x2 =15(1) + 10(x) + (-6)(x2) so by Definition VR [530] 15 PB (u) = 10 --6_ Another nice basis for P2 is B = {1, 1 + x, 1 + x + x2} so that now it takes a bit of computation to determine the scalars for the representation. 
We want a1, a2, a3 so that 15 + 10x - 6x2 ai0(1) + a2(1 + x) + as(1 + x + x2) Performing the operations in P2 on the right-hand side, and equating coefficients, gives the three equations in the three unknown scalars, 15 =ai + a2 + as 10 =a2 + a3 -6 =as The coefficient matrix of this sytem is nonsingular, leading to a unique solution (no surprise there, see Theorem VRRB [317]), a1=5 a2=16 a3=-6 Version 2.02  Section VR Vector Representations 535 so by Definition VR [530] 5 pc (u) [=16 .-6_ While we often form vector representations relative to "nice" bases, nothing prevents us from forming representations relative to "nasty" bases. For example, the set D= {-2 - x + 3x2, 1 - 2x2, 5+ 4x + x2} can be verified as a basis of P2 by checking linear independence with Definition LI [308] and then arguing that 3 vectors from P2, a vector space of dimension 3 (Theorem DP [345]), must also be a spanning set (Theorem G [355]). Now we desire scalars a1, a2, a3 so that 15 + 10x - 6x2 a=i(-2 - x + 3x2) + a2(1 - 2x2) + a3(5 + 4x + x2) Performing the operations in P2 on the right-hand side, and equating coefficients, gives the three equations in the three unknown scalars, 15 =-2a1 + a2 + 5a3 10 =-ai + 4a3 -6= 3a1 - 22 + a3 The coefficient matrix of this sytem is nonsingular, leading to a unique solution (no surprise there, see Theorem VRRB [317]), a1=-2 a2=1 a3=2 so by Definition VR [530] -2 pD (u) = 1 2 Theorem VRI Vector Representation is Injective The function PB (Definition VR [530]) is an injective linear transformation.D Proof We will appeal to Theorem KILT [484]. Suppose U is a vector space of dimension n, so vector representation is of the form PB: U a C". Let B ={ui, 112, 113, ..., un} be the basis of U used in the definition of pB. Suppose u E lC(pB). We write u as a linear combination of the vectors in the basis B where the scalars are the components of the vector representation, PB (u). u =[PB (u)] i1 ui-+ [PB (u)]2 112 +| [PB (u)] 3 + -|- + | [PB (u)], un Definition VR [530] =[O] 11ui + [O]2 112 + [O]3 113 + - - + [0], un Definition KLT [481] = Oui + Ou2 + Ou3s+ - - - +0Oun Definition ZCV [25] = 0 + 0 + 0 + - - - + 0 Theorem ZSSM [286] = 0 Property Z [280] Version 2.02  Subsection VR.CVS Characterization of Vector Spaces 536 Thus an arbitrary vector, u, from the kernel ,K(pB), must equal the zero vector of U. So C(pB) =_{0} and by Theorem KILT [484], PB is injective. U Theorem VRS Vector Representation is Surjective The function PB (Definition VR [530]) is a surjective linear transformation. D Proof We will appeal to Theorem RSLT [498]. Suppose U is a vector space of dimension n, so vector representation is of the form pB: U i C. Let B = {ui, u2, u3, ..., un} be the basis of U used in the definition of pB. Suppose v E C". Define the vector u by u = [v]1 ui + [v]2 u2 + [v]3 u3+...+ [v], un Then for 1 2 0 -3- -2 2 5- 16 -1 5 _ -9 -11 -76 7 -76\ 12 34 140 140 16 48 -6 -4 -44 -1 -44 ' 4 30 7 +2 -1 40 P 40 = 1 3 -2 16 20 )04 8 _-1_ 5 _ _\ _ u Version 2.02  Section MR Matrix Representations 543 Section MR Matrix Representations U._ We have seen that linear transformations whose domain and codomain are vector spaces of columns vec- tors have a close relationship with matrices (Theorem MBLT [459], Theorem MLTCV [460]). In this section, we will extend the relationship between matrices and linear transformations to the setting of linear transformations between abstract vector spaces. 
Definition MR Matrix Representation Suppose that T: U H V is a linear transformation, B ={ui, u2, u3, ..., un} is a basis for U of size n, and C is a basis for V of size m. Then the matrix representation of T relative to B and C is the m x n matrix, MgT,C = [ PC (T (ui))| pc (T (u2))| pc (T (u3))| -..- pc (T (un)) ] (This definition contains Notation MR.) A Example OLTTR One linear transformation, three representations Consider the linear transformation S: P3 M22,S(a + bx +c+ dx3) E3a+7b-2c-5d 8a+14b-2c-11d -4a-8b+2c+6d 12a+22b-4c-17d First, we build a representation relative to the bases, B={1+2x+x2-x3, 1+3x+x2+x3, -1-2x+2x3, 2+3x+2x2-5x3} C = 1 2 3 -1 -1 -1 -4 C 1 2-'-2 5-' 0 -2 ' -2 -4 J We evaluate S with each element of the basis for the domain, B, and coordinatize the result relative to the vectors in the basis for the codomain, C. pc (S(1+2x+x2_x3)V=pc([j'0 1) [-72] 1(( 2) 1 ( j -0 1-1 -4 29 = pc (-_72) J+29 [ ]+ (-34) _J+ 3 [2 - =) 3 -27- -58 PC (S (-1 -2x +2x3)) = pc (327 -90J Version 2.02  Section MR Matrix Representations 544 { 1 1 23 -11-1-1 - 4 ) -46 = Pc 11 1 2] +(-46) [ 5 +54 J-2 +(-5) [-2 -4) 54 21\1 i F 1\ [4 [-5] Pc (S (2+ 3x + 2x2 - 5x3)) = pc ( 58 109 -2201 = PC((-220) K1 + 91[ 2 5] +-96 [0 2J +10 [-2 -4J) -96 LK ] . 10 _ Thus, employing Definition MR [542] -90 -72 114 -220 SM37 29 -46 91 MB)c -40 -34 54 -96 _4 3 -5 10_ Often we use "nice" bases to build matrix representations and the work involved is much easier. Suppose we take bases The evaluation of S at the elements of D is easy and coordinatization relative to E can be done on sight, PE (S (1))VPE 3 =PE (3 0 0] + 8 0 0] + (-4) 1 0 + 12 -0 1Jf/ -4 .12]_ PE (S (X) =PE(L87 1 7 =PE (7 0 0 + 14 [0 0 + (-8) 1 0J+ 22 -0 1 - .22_1 { -2 - PE (S(220P -5 = PE ((5) I 0 J + (-11) - J + 6 I J + (-17) - Ol 1 11 L L - [-17] Version 2.02  Section MR Matrix Representations 545 So the matrix representation of S relative to D and E is 3 7 MSE [8 14 D,E -_4 _-8 12 22 -2 -5 -2 -11 2 6 -4 -17_ One more time, but now let's use bases F {1+ z - 2 + 2x3, -1+ 2x + 2x3, 2 + x - 29 + 3x3, 1+ + 2x3} and1 2' 0n 2 ' -2 3 t'0 2} and evaluate S with the elements of F, then coordinatize the results relative to G, PG (S(1+ -2 2 3)) PG (S (-1+ 2x + 2X3)) PG (S (2 + x -22 + 3x3)) 2 PG L2 4 P2 1 1 2) 0 PG 4PG - 2- . 01 0_ PG 2 - PG 23 0 [ PG (S (1 + x + 2x3)) PG([0 W) 0 PG (0[02) .0_ So we arrive at an especially economical matrix representation, 2 0 0 0 MS 0 -1 0 0 MF,G 0 0 1 0 _0 0 0 0_ We may choose to use whatever terms we want when we make a definition. Some are arbitrary, while others make sense, but only in light of subsequent theorems. Matrix representation is in the latter category. We begin with a linear transformation and produce a matrix. So what? Here's the theorem that justifies the term "matrix representation." Theorem FTMR Fundamental Theorem of Matrix Representation Suppose that T: U H V is a linear transformation, B is a basis for U, C is a basis for V and MB C is the matrix representation of T relative to B and C. Then, for any u E U, PC (T (u)) = MB,c (PB (u)) Version 2.02  Section MR Matrix Representations 546 or equivalently T (u) = pC1 (Ma,c (PB (u))) Proof Let B = {ui, u2, u3, ..., un} be the basis of U. Since u E U, there are scalars ai, a2, a3, ..., an such that u=alul+ a2u2 + a3u3 - -+ anu Then, Ma,cPB (u) [PC (T(ui))| pC (T(u2))| PC (T(u3))| ... PC (T (un))] pB (u) ai a2 = [pc(T(ui))| pC (T(u2))| PC (T (u3)) .. PC(T (un))] a3 = aipc (T (ui)) + a2Pc (T(u2)) + -..-+ anpco(T (un)) = PC (aiT (u1) + a2T (u2) + a3T (u3) + ... 
+ anT (un)) = PC (T (aiui + a2u2 + a3u3 + . + anun)) = PC (T (u)) Definition MR [542] Definition VR [530] Definition MVP [194] Theorem LTLC [462] Theorem LTLC [462] The alternative conclusion is obtained as T (u) = I( ((u)) = (p'° oPC) (T (u)) = PC' (Pc (T (u))) = phi (MBT,C (PB (u))) Definition IDLT [508] Definition IVLT [508] Definition LTC [469] 0 This theorem says that we can apply T to u and coordinatize the result relative to C in V, or we can first coordinatize u relative to B in U, then multiply by the matrix representation. Either way, the result is the same. So the effect of a linear transformation can always be accomplished by a matrix-vector product (Definition MVP [194]). That's important enough to say again. The effect of a linear transformation is a matrix-vector product. T U PB T (u) PC PB(U) )B,CpB (upc(T (u)) Diagram FTMR. Fundamental Theorem of Matrix Representations The alternative conclusion of this result might be even more striking. It says that to effect a linear trans- formation (T) of a vector (u), coordinatize the input (with PB), do a matrix-vector product (with MB c), and un-coordinatize the result (with p-1). So, absent some bookkeeping about vector representations, a Version 2.02  Section MR Matrix Representations 547 linear transformation is a matrix. To adjust the diagram, we "reverse" the arrow on the right, which means inverting the vector representation PC on V. Now we can go directly across the top of the diagram, computing the linear transformation between the abstract vector spaces. Or, we can around the other three sides, using vector representation, a matrix-vector product, followed by un-coordinatization. T u T(u) = p-1 (MBT,CpB (u)) PB Pc 1 PB ,U BT,cpB (u) PB (u)MBCMcB() Diagram FTMRA. Fundamental Theorem of Matrix Representations (Alternate) Here's an example to illustrate how the "action" of a linear transformation can be effected by matrix multiplication. Example ALTMM A linear transformation as matrix multiplication In Example OLTTR [542] we found three representations of the linear transformation S. In this example, we will compute a single output of S in four different ways. First "normally," then three times over using Theorem FTMR [544]. Choose p(x) = 3 - x + 2x2 - 5x3, for no particular reason. Then the straightforward application of S to p(x) yields S (p(x)) = S (3 - x + 2x2 - 5x3) 3(3) + 7(-1) + 2(2) - 5(-5) 8(3) + 14(-1) - 2(2) - 11(-5) -4(3) - 8(-1) + 2(2) + 6(-5) 12(3) + 22(-1) - 4(2) - 17(-5) 23 61 -30 91 Now use the representation of S relative to the bases B and C and Theorem FTMR [544]. will employ the following linear combination in moving from the second line to the third, 3 - x + 2x2 - 5x3 = 48(1 + 2x + x2 - x3) + (-20)(1 + 3x + x2 + x3)+ (-1)(-1 - 2x + 2x3) + (-13)(2 + 3x + 2X2 - 5X3) Note that we S (p(x)) = p (MBCPB (p(x))) Pcl (Ms,CPB (3 - x + 2x2 - 5z3) 48 -Pc MB)c -210 -13_ -90 -72 114 -2201 48 _1 37 29 -46 91 -20 -40-34 54 -96 -1 4 3 -5 10 _-13_ -134 _1l 59 - PC -46 7 Version 2.02  Section MR Matrix Representations 548 _ 34 1 1+5 2 3 + - 1 -1 + -1 -4 (-3)1 2+5 2 5 + -6 0 -2 + -2 -4- 23 61 -30 91 Again, but now with "nice" bases like D and E, and the computations are more transparent. S (p(x)) = p- (MPD (p(x))) = P- (MEPD (3 - x + 22 - 5x3)) = p- (MEPD (3(1) + (-1)(x) + 2(x2) + (-5)( 3 = p, MD)E21 .-5_ 3 7 -2 -5 3 8 14 -2 -11 -1 - ~l-4-8 2 6 2j 12 22 -4 -17] [-5_ 23 _-1 61 -ySE -30 91] =23 0 0 +61 0 0+(-30) 1 0J+91[0 23 61 -30 91 X3)) 0 1] OK, last time, now with the bases F and G. 
The coordinatizations will take some work this time, but the matrix-vector product (Definition MVP [194]) (which is the actual action of the linear transformation) will be especially easy, given the diagonal nature of the matrix representation, MFG. Here we go, S (p(x)) = p- (MGPF (p(x))) = p- (MGPF (3 - x + 22 - 5x3)) = pG (M GpF (32(1 + x - 2 + 2x3) - 7(-1 + 2x + 2x3) - 17(2 + 032 = p lMFG -7 010 .-2_ 2 0 0 0 32 _1 0 -1 0 0 -7 0 0 1 0 -17 0 0 0 0_-2 '64 _-1 7 = 64 1 2J + 0 l 2J + (-17) - 2 3 ]+ 0 0 2 2 + 3x3) - 2(1 + x + 2x3))) Version 2.02  Subsection MR.NRFO New Representations from Old 549 23 61 -30 91 This example is not meant to necessarily illustrate that any one of these four computations is simpler than the others. Instead, it is meant to illustrate the many different ways we can arrive at the same result, with the last three all employing a matrix representation to effect the linear transformation. We will use Theorem FTMR [544] frequently in the next few sections. A typical application will feel like the linear transformation T "commutes" with a vector representation, pc, and as it does the transformation morphs into a matrix, MBc, while the vector representation changes to a new basis, pB. Or vice-versa. Subsection NRFO New Representations from Old In Subsection LT.NLTFO [467] we built new linear transformations from other linear transformations. Sums, scalar multiples and compositions. These new linear transformations will have matrix representations as well. How do the new matrix representations relate to the old matrix representations? Here are the three theorems. Theorem MRSLT Matrix Representation of a Sum of Linear Transformations Suppose that T: U H V and S: U H V are linear transformations, B is a basis of U and C is a basis of V. Then MB+Cj= MB,c + MB ,C Proof Let x be any vector in C". Define u E U by u = pB (x), so x = PB (u). Then, M +Cx = MB+JpB (u) Substitution = PC ((T + S) (u)) Theorem FTMR [544] = PC (T (u) + S (u)) Definition LTA [467] = PC (T (u)) + PC (S (u)) Definition LT [452] = MB,C (PB (u)) + MB,C (PB (u)) Theorem FTMR [544] = (MT,C + MB ,C) PB (u) Theorem MMDAA [201] = (MT,C + MBC) x Substitution Since the matri csMT+S andBC MBC+ M s C hay equalmatrix-vector products for every vector in C" by Theorem EMMVP [196] they are equal matrices. (Now would be a good time to double-back and study the proof of Theorem EMMVP [196]. You did promise to come back to this theorem sometime, didn't you?)U Theorem MRMLT Matrix Representation of a Multiple of a Linear Transformation Suppose that T: U a V is a linear transformation, ca E C, B is a basis of U and C is a basis of V. Then Mj c MgT~ Proof Let x be any vector in C". Define u E U by u = pg (x), so x = PB (u). Then, M1jfx = MJfPB (u) Substitution Version 2.02  Subsection MR.NRFO New Representations from Old 550 PC ((aT) (u)) PC (aT (u)) apc (T (u)) a(MB)~CPB (u)) (cM4,c) PB (u) (oM,c) x Theorem FTMR [544] Definition LTSM [468] Definition LT [452] Theorem FTMR [544] Theorem MMSMM [201] Substitution Since the matrices M C and aMBT have equal matrix-vector products for every vector in C", by Theorem EMMVP [196] they are equal matrices. U The vector space of all linear transformations from U to V is now isomorphic to the vector space of all m x n matrices. Theorem MRCLT Matrix Representation of a Composition of Linear Transformations Suppose that T: U a V and S: V a W are linear transformations, B is a basis of U, C and D is a basis of W. Then M D= MS,DMB,C Proof Let x be any vector in C". 
Define u E U by u = pB (x), so x = PB (u). Then, is a basis of V, El MO x =MBDPB (u) = PD ((S o T) (u)) = PD (S (T (u))) = Mc,DPC (T (u)) = Mc,D (M,CPB (u)) = (MM ) PB (u) = (MM ) x Substitution Theorem FTMR [544] Definition LTC [469] Theorem FTMR [544] Theorem FTMR [544] Theorem MMA [202] Substitution Since the matrices MSOT and Ms MT e have equal matrix-vector products for every vector in C', by Theorem EMMVP [196] they are equal matrices. U This is the second great surprise of introductory linear algebra. Matrices are linear transformations (functions, really), and matrix multiplication is function composition! We can form the composition of two linear transformations, then form the matrix representation of the result. Or we can form the matrix representation of each linear transformation separately, then multiply the two representations together via Definition MM [197]. In either case, we arrive at the same result. Example MPMR Matrix product of matrix representations Consider the two linear transformations, T:(C2F- P2 T=([1) =(-a + 3b) + (2a + 4b)xz+ (a - 2b)2 2F a+b+2c a+4b-c1 S: P2 -HM22 S(a+bx+cx2) 2 -a+3c 3a+b+2c_ and bases for C2, P2 and M22 (respectively), B { ],[3 2 1 Version 2.02  Subsection MR.NRFO New Representations from Old 551 C ={1-2x+92, -1 + 3x, 2x + 3x2} (1-2 1 - l- 2 2 -3 D=[1 -1]'[1 2]' 0 0]'[2 2_ Begin by computing the new linear transformation that is the composition of T and S (Definition LTC [469], Theorem CLTLT [470]), (S o T) : C2 1 M22, (SoT) - S()T-b) = S ((-a + 3b) + (2a + 4b)z + (a - 2b)z2) 2(-a + 3b) + (2a + 4b) + 2(a- -(-a + 3b) + 3(a - 2b) 2b) (-a + 3b) + 4(2a + 4b) - (a - 3(-a + 3b) + (2a + 4b) + 2(a 2b) - 2b)_ 2a + 6b 6a + 21b 4a - 9b a + 9b Now compute the matrix representations (Definition MR [542]) for each of these three linear transformations (T, S, S o T), relative to the appropriate bases. First for T, P( T L1)) Pc (lOx + z2) PC (28(1 - 2x + x2) + 28(-1 + 3x) + (-9)(2x + 3x2)) 28 28 -9_ pC T PC (1 + 8x) pc (33(1 - 2x + x2) + 32(-1 + 3x) + (-11)(2x + 3X2)) 33 32 -11_ So we have the matrix representation of T, 28 33 Mc= 28 32 -9 -11] Now, a representation of S, PD (S (1 - 2x + x2))PD(2 = PD (-11 -_11 -21 0 PD (S (-1 + 3x)) = PD (26K0 = PD (26 [ 1 ) 2 K +1 -21) i 21J +0 I 1 J +(17) [ 2 2 ] ) +51 1 +0 0l +(-38) 2 -3 Version 2.02  Subsection MR.NRFO New Representations from Old 552 PD (S (2x + 3x2)) 26 51 0 -381 PD([9 g]) PD 34[1-2-]+67 [ 1 0[ l0] 34 67 1 -46] +(-46) 3]) L So we have the matrix representation of S, MaD[ Finally, a representation of S o T, -11 -21 0 17 26 51 0 -38 34 67 1 -46] PD ((SOT) PD ((SoT) (['1)) ([PD)/ PD([3 12]) PD( 1 -2 114 237 -9 -174 PD([iL 1 I] PD 95 [1 1'-] 95 202 -11 _-149 [1 -1 J +(-174) -2 - 23J / 1 +202 [1 1 21] + (-11) 1- 01 J + (-149) 2 23J / So we have the matrix representation of S o T, ~soT[ M,D 114 237 -9 -174 95 202 -11 -149] Version 2.02  Subsection MR.PMR Properties of Matrix Representations 553 Now, we are all set to verify the conclusion of Theorem MRCLT [549], -11 26 34 28 33 MS T =-21 51 67 28 3 MC,D MB,C= 0 0 1 28 3 17 -38 -46]-g - 114 95 237 202 -9 -11 L-174 -149 =MST We have intentionally used non-standard bases. If you were to choose "nice" bases for the three vector spaces, then the result of the theorem might be rather transparent. But this would still be a worthwhile exercise give it a go. A diagram, similar to ones we have seen earlier, might make the importance of this theorem clearer, Definition MR S, T >Mc, D, BC Definition LTC Definition MM S o T >MB D = MC,D B,C Definition MR Diagram MRCLT. 
Diagram MRCLT (Matrix Representation and Composition of Linear Transformations) summarizes the relationship we have just established.

One of our goals in the first part of this book is to make the definition of matrix multiplication (Definition MVP [194], Definition MM [197]) seem as natural as possible. However, many are brought up with an entry-by-entry description of matrix multiplication (Theorem EMP [198]) as the definition of matrix multiplication, and then theorems about columns of matrices and linear combinations follow from that definition. With this unmotivated definition, the realization that matrix multiplication is function composition is quite remarkable. It is an interesting exercise to begin with the question, "What is the matrix representation of the composition of two linear transformations?" and then, without using any theorems about matrix multiplication, finally arrive at the entry-by-entry description of matrix multiplication. Try it yourself (Exercise MR.T80 [564]).

Subsection PMR Properties of Matrix Representations

It will not be a surprise to discover that the kernel and range of a linear transformation are closely related to the null space and column space of the transformation's matrix representation. Perhaps this idea has been bouncing around in your head already, even before seeing the definition of a matrix representation. However, with a formal definition of a matrix representation (Definition MR [542]), and a fundamental theorem to go with it (Theorem FTMR [544]), we can be formal about the relationship, using the idea of isomorphic vector spaces (Definition IVS [515]). Here are the twin theorems.

Theorem KNSI Kernel and Null Space Isomorphism
Suppose that T: U → V is a linear transformation, B is a basis for U of size n, and C is a basis for V. Then the kernel of T is isomorphic to the null space of M^T_{B,C},

    K(T) ≅ N(M^T_{B,C})

Proof: To establish that two vector spaces are isomorphic, we must find an isomorphism between them, an invertible linear transformation (Definition IVS [515]). The kernel of the linear transformation T, K(T), is a subspace of U, while the null space of the matrix representation, N(M^T_{B,C}), is a subspace of C^n. The function ρ_B is defined as a function from U to C^n, but we can just as well employ the definition of ρ_B as a function from K(T) to N(M^T_{B,C}).

We must first ensure that if we choose an input for ρ_B from K(T), then the output will be an element of N(M^T_{B,C}). So suppose that u ∈ K(T). Then

    M^T_{B,C} ρ_B(u) = ρ_C(T(u))    Theorem FTMR [544]
                     = ρ_C(0)       Definition KLT [481]
                     = 0            Theorem LTTZZ [456]

This says that ρ_B(u) ∈ N(M^T_{B,C}), as desired. The restriction in the size of the domain and codomain of ρ_B will not affect the fact that ρ_B is a linear transformation (Theorem VRLT [530]), nor will it affect the fact that ρ_B is injective (Theorem VRI [534]). Something must be done though to verify that ρ_B is surjective. To this end, appeal to the definition of surjective (Definition SLT [492]), and suppose that we have an element of the codomain, x ∈ N(M^T_{B,C}) ⊆ C^n, and we wish to find an element of the domain with x as its image. We now show that the desired element of the domain is u = ρ_B^{-1}(x).
First, verify that u E K(T), T' (u) = T (pg (x)) = pel (Mac (PB (ph' (x)))) Theorem FTMR [544] = pc (Mgc (Ica (x))) Definition IVLT [508] = pc (Mgcx) Definition IDLT [508] = p-1 (Ocn) Definition KLT [481] = Ov Theorem LTTZZ [456] Second, verify that the proposed isomorphism, PB, takes u to x, pB (UV=pB(p-B(x)) Substitution =Icn (x) Definition IVLT [508] =x Definition IDLT [508] With PB demonstrated to be an injective and surjective linear transformation from K(T) toN MTc Theorem ILTIS [511] tells us PB 15 invertible, and so by Definition IVS [515], we say KC(T) andN Mg are isomorphic.U Example KVMR Kernel via matrix representation Consider the kernel of the linear transformation T : M22 H P2, T=([a $1)(2a - b + c - 5d) + (a + 4b + 5b + 2d)x + (3a - 2b + c - 8d)x2 Version 2.02  Subsection MR.PMR Properties of Matrix Representations 555 We will begin with a matrix representation of T relative to the bases for M22 and P2 (respectively), B 1{[ 1 -1'] [11 -4]' [ 22] ' [22 -4]} C= {1+x+x2, 2+3x, -1 -2x2} Then, Pc \T (1=J / pc (4 + 2x + 6x2) = Pc (2(1 + z + x2) + 0(2 + 3x) + (-2)(-1 - 2x2) - 2 = 0 Pc \T = pc (18 + 28x2 = pc ((-24)(1 + x + 2) + 8(2 + 3x) + (-26)(-1 - 2x2)) -24 =48 -26_ PC(T([0 -2]))=pc(10 + 5x+ 15x2) = pc (5(1 + x + x2) + 0(2 + 3x) + (-5)(-1 - 2x2) 5 = 0 --5- PcT(-2 24) = pc (17 + 4x + 26x2) = pc ((-8)(1 + x + x2) + (4)(2 + 3x) + (-17)(-1 - 2x2)) -8 = 4 -17_ So the matrix representation of T (relative to B and C) is We know from Theorem KNSI [5521 that the kernel of the linear transformation T is isomorphic to the null space of the matrix representation MI~c and by studying the proof of Theorem KNSI [552] we learn that PB is an isomorphism between these null spaces. Rather than trying to compute the kernel of T using definitions and techniques from Chapter LT [452] we will instead analyze the null space of MV4c using techniques from way back in Chapter V [83]. First row-reduce MTB, 2 -24 5 -81F1 0 21 0 8 0 4 RREF: [0 0 [-2 -26 -5 -17_ 0 0 0 0 Version 2.02  Subsection MR.PMR Properties of Matrix Representations 556 So, by Theorem BNS [139], a basis for N(M) )is -5 - - 2 0 -2 1 ' 0 We can now convert this basis of NM B C into a basis of C(T) by applying p-1 to each element of the basis, - 5- 1i ([] 5 [1 2] 1o' 341 1 [ 2] +0[2 54 p- =(-2) 112]+01[ 41+ [ 22 +1 224 B 1 2 [- -1 1 1-4 0 2 -2- .0_ 3 _- 2 2. p- 2 = (-2)+(- +0 + 1 B 01 - 1 1 ] 2 [-1 -4J 0 -2J -2 - So the set 2 23J. L2 . is a basis for K(T) Just for fun, you might evaluate T with each of these two basis vectors and verify that the output is the zero polynomial (Exercise MR.C10 [562]). An entirely similar result applies to the range of a linear transformation and the column space of a matrix representation of the linear transformation. Theorem RCSI Range and Column Space Isomorphism Suppose that T: U i V is a linear transformation, B is a basis for U of size n, and C is a basis for V of size m. Then the range of T is isomorphic to the column space of MT,c Proof To establish that two vector spaces are isomorphic, we must find an isomorphism between them, an invertible linear transformation (Definition IVS [515]). The range of the linear transformation T, 7Z(T), is a subspace of V, while the column space of the matrix representation, C (Mg,c) is a subspace of Cm. The function pc is defined as a function from V to Cm, but we can just as well employ the definition of pc as a function from 7Z(T) to C (MgT ). We must first insure that if we choose an input for pc from RZ(T) that then the output will be an element of C (MB c). 
So suppose that v E R(T). Then there is a vector u E U, such that T (u) = v. Consider MT,CPB (u) = pc (T (u)) Theorem FTMR [544] Version 2.02  Subsection MR.PMR Properties of Matrix Representations 557 = pc (v) Definition RLT [496] This says that pc (v) E C (MC), as desired. The restriction in the size of the domain and codomain will not affect the fact that pc is a linear transformation (Theorem VRLT [530]), nor will it affect the fact that pc is injective (Theorem VRI [534]). Something must be done though to verify that PC is surjective. This all gets a bit confusing, since the domain of our isomorphism is the range of the linear transformation, so think about your objects as you go. To establish that PC is surjective, appeal to the definition of a surjective linear transformation (Definition SLT [492]), and suppose that we have an element of the codomain, y E C (M,) C' and we wish to find an element of the domain with y as its image. Since y E C (M ,, there exists a vector, x E C" with MB Cx = y. We now show that the desired element of the domain is v = p-1 (y). First, verify that v E R(T) by applying T to u = pg (x), T (u) = T(pl(X)) = pol (MI,c (PB (p-1 (x)))) Theorem FTMR [544] = pl (M,c (Icn (x))) Definition IVLT [508] = pcl (MI,cx) Definition IDLT [508] = pl (y) Definition CSM [236] = v Substitution Second, verify that the proposed isomorphism, pc, takes v to y, PC (v) = pc (ps1(y)) Substitution = Icm (y) Definition IVLT [508] = y Definition IDLT [508] With pc demonstrated to be an injective and surjective linear transformation from R(T) to C(M, Theorem ILTIS [511] tells us pc is invertible, and so by Definition IVS [515], we say R(T) and C (MBC) are isomorphic. U Example RVMR Range via matrix representation In this example, we will recycle the linear transformation T and the bases B and C of Example KVMR [553] but now we will compute the range of T, T :M22F- P2, T([ =1)(2a -b+ c- 5d) + (a+4b+ 5b+ 2d)zx+ (3a -2b+ c -8d)z2 With bases B and C, B -{[ -1_ ]' -1 -4_]' 0 -22]' -22 -4]} c = {1+ x-+H2, 2+ 3x, -1 -2x2 we obtain the matrix representation 2 -24 5 -8 M,c2= 0 8 0 4 L-2 -26 -5 -17_ Version 2.02  Subsection MR.IVLT Invertible Linear Transformations 558 We know from Theorem RCSI [555] that the range of the linear transformation T is isomorphic to the column space of the matrix representation MB c and by studying the proof of Theorem RCSI [555] we learn that pC is an isomorphism between these subspaces. Notice that since the range is a subspace of the codomain, we will employ pc as the isomorphism, rather than PB, which was the correct choice for an isomorphism between the null spaces of Example KVMR [553]. Rather than trying to compute the range of T using definitions and techniques from Chapter LT [452] we will instead analyze the column space of Mk c using techniques from way back in Chapter M [182]. First row-reduce (MBC 2 0 -2 0 -1 -24 8 -26 RREF I0-2- 5 0 -5 0 0 0 -8 4 -17] 0 0 0 t_ Now employ Theorem CSRST [247] and Theorem BRS [245] (there are other methods we could choose here to compute the column space, such as Theorem BCS [239]) to obtain the basis for C(M), 0 , 1 _{ 54. We can now convert this basis of C (MBT) into a basis of 7Z(T) by applying p-1 to each element of the basis, p ( 0 = (1+ z +92) - (-1 - 2x2) 2+ +3x2 -1 0 25 33 31 pg 1 = (2+ 3x) - (-1 - 2x2) + 3x + 2 25 4 4 2 4- So the set {2+3x+3x 2, +3x+ z2 ,4 2 is a basis for R(T). 
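Both isomorphism theorems can be exercised on the transformation of Examples KVMR and RVMR by working with standard bases instead of the deliberately awkward ones above. The following SymPy sketch assumes the "5b" printed in the middle coefficient of T is a typo for "5c"; that reading is consistent with the two-dimensional kernel found in Example KVMR.

from sympy import Matrix

# Standard-coordinate matrix of T: M22 -> P2 from Examples KVMR and RVMR:
# (a, b, c, d) |-> coefficients of (2a-b+c-5d) + (a+4b+5c+2d)x + (3a-2b+c-8d)x^2.
A = Matrix([[2, -1, 1, -5],
            [1,  4, 5,  2],
            [3, -2, 1, -8]])

# Theorem KNSI: K(T) is isomorphic to N(A); un-coordinatizing a null space
# basis vector (a, b, c, d) gives a 2x2 matrix in the kernel of T.
for v in A.nullspace():
    a, b, c, d = v
    print(Matrix([[a, b], [c, d]]), "image:", (A * v).T)   # image is the zero vector

# Theorem RCSI: R(T) is isomorphic to the column space of A.  The basis below
# is legitimate even though it differs from the one produced in Example RVMR.
for w in A.columnspace():
    c0, c1, c2 = w
    print(f"range basis element: {c0} + ({c1})x + ({c2})x^2")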
Theorem KNSI [552] and Theorem RCSI [555] can be viewed as further formal evidence for the Coor- dinatization Principle [538], though they are not direct consequences. Subsection IVLT Invertible Linear Transformations We have seen, both in theorems and in examples, that questions about linear transformations are often equivalent to questions about matrices. It is the matrix representation of a linear transformation that makes this idea precise. Here's our final theorem that solidifies this connection. Theorem IMR Invertible Matrix Representations Suppose that T: U H V is a linear transformation, B is a basis for U and C is a basis for V. Then T is an Version 2.02  Subsection MR.IVLT Invertible Linear Transformations 559 invertible linear transformation if and only if the matrix representation of T relative to B and C, MgT c is an invertible matrix. When T is invertible, M, =-(MB,c) Proof ( ) Suppose T is invertible, so the inverse linear transformation T-1: V H U exists (Definition IVLT [508]). Both linear transformations have matrix representations relative to the bases of U and V, namely MBC and MCJ (Definition MR [542]). Then MT- MT C,B B,C MB B M'U MB B [PB (IU (ui))| ps3(IU (u2))| -.. -PB (IU (un))] [PB (ui)| PB (u2)| PB (u3) ... PB (un) ] [ei e2|e3 ... en] In Theorem MRCLT [549] Definition IVLT [508] Definition MR [542] Definition IDLT [508] Definition VR [530] Definition IM [72] and ToT1 Mcc M'v c,c [PC (IV (vi))| PC (IV (V2))| .. -PC (IV (va))] [PC (vi)| Pc(v2)| PC (v3) ... PC(va)] [eie2|e3 ... en] In Theorem MRCLT [549] Definition IVLT [508] Definition MR [542] Definition IDLT [508] Definition VR [530] Definition IM [72] These two equations show that MT C and MC a that when T is invertible, then MC=B(MTcj ( ) Suppose now that MBc is an invertible m compute the nullity of T, n (T) = dim (KC(T)) = dim (PJ(Mc)) = nT (MbC) = 0 re inverse matrices (Definition MI [213]) and establish atrix and hence nonsingular (Theorem NI [228]). We Definition KLT [481] Theorem KNSI [552] Definition NOM [347] Theorem RNNM [349] So the kernel of T is trivial, and by Theorem KILT [484], T is injective. We now compute the rank of T, r (T) = dim (R(T)) = dim (C(MT,c)) = (MBT,c) = dim (V) Definition RLT [496] Theorem RCSI [555] Definition ROM [347] Theorem RNNM [349] Since the dimension of the range of T equals the dimension of the codomain V, by Theorem EDYES [358], R(T) = V. Which says that T is surjective by Theorem RSLT [498]. Version 2.02  Subsection MR.IVLT Invertible Linear Transformations 560 Because T is both injective and surjective, by Theorem ILTIS [511], T is invertible. 0 By now, the connections between matrices and linear transformations should be starting to become more transparent, and you may have already recognized the invertibility of a matrix as being tantamount to the invertibility of the associated matrix representation. The next example shows how to apply this theorem to the problem of actually building a formula for the inverse of an invertible linear transformation. Example ILTVR Inverse of a linear transformation via a representation Consider the linear transformation R: P3 -M22, R(a+bx+cx2 +x3) a+b-c+2d 2a+3b-2c+3d a+b+2d -a+b+2c-5d_ If we wish to quickly find a formula for the inverse of R (presuming it exists), then choosing "nice" bases will work best. 
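Theorem IMR also lends itself to a quick numerical check before the hand computation of Example ILTVR resumes below. In this sketch the map T on C^3 and the bases B and C are made up; the only requirements are that T be invertible and that the basis matrices be nonsingular.

import numpy as np

# Made-up data: an invertible map T on C^3 (its standard-coordinate matrix)
# and two bases B and C, stored as the columns of nonsingular matrices.
T_std = np.array([[2., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 3.]])
B = np.array([[1., 1., 0.], [0., 1., 1.], [0., 0., 1.]])
C = np.array([[1., 0., 1.], [1., 1., 0.], [0., 1., 1.]])

def rep(L_std, dom, cod):
    # Column j of the representation is rho_cod(L(dom_j)).
    return np.linalg.solve(cod, L_std @ dom)

M_T    = rep(T_std, B, C)                    # M^T_{B,C}
M_Tinv = rep(np.linalg.inv(T_std), C, B)     # M^{T^{-1}}_{C,B}

# Theorem IMR: T is invertible exactly when M^T_{B,C} is, and the
# representation of T^{-1} is the inverse of the representation of T.
print(np.allclose(M_Tinv, np.linalg.inv(M_T)))   # expected: True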
So build a matrix representation of R relative to the bases B and C, B= {1, x, x2, x3 0 0 0I ' 0 '[01 } Then, pc (R (1)) 1 fC \ 1 2 1J2 pc1 1 Pc (R (x)) = PC 1 1J 1 L 1 pc (R (x2)) pc (R (x3)) -_i p(-1 -2]) [- 2 2 33 pc (2 -5J ) 2 L-5_ So a representation of R is MB,c r 1 2 1 1 3 1 -1 -2 0 2 3 2 -5] 1 1 2 The matrix MB c is invertible IMR [557]. Furthermore, (as you can check) so we know for sure that R is invertible by Theorem 1MR Mc B = (MB,) -1 1 2 1 -1 1 3 1 1 -1 -2 0 2 2 20 -7 -2 3 3 -31-2 2 -1 0 1 0 -5_ _-6 2 1 -1_ Version 2.02  Subsection MR.IVLT Invertible Linear Transformations 561 We can use this representation of the inverse linear transformation, in concert with Theorem FTMR [544], to determine an explicit formula for the inverse itself, R--1([a ]) pa Mc Bpc [ ] pa ((M c) pc \_ ) PB1 s [d] 20 -7 -2 3 a -8 3 1 -1 b B -1 0 1 0 c [-6 2 1 -1 [d] 20a - 7b - 2c + 3d -1 -8a+3b+c-d PB H -a + c -6a+2b+c-d _1 (20a - 7b - 2c + 3d) + (-8a + 3b + c - d)x + (-a + c)2 + (-6a + 2b + c - d)X3 Theorem FTMR [544] Theorem IMR [557] Definition VR [530] Definition MI [213] Definition MVP [194] Definition VR [530] You might look back at Example AIVLT [508], where we first witnessed the inverse of a linear trans- formation and recognize that the inverse (S) was built from using the method of Example ILTVR [559] with a matrix representation of T. Theorem IMILT Invertible Matrices, Invertible Linear Transformation Suppose that A is a square matrix of size n and T: C" H C" is the linear transformation defined by T (x) = Ax. Then A is invertible matrix if and only if T is an invertible linear transformation. Q Proof Choose bases B = C = {ei, e2, e3, ..., en} consisting of the standard unit vectors as a basis of C"m (Theorem SUVB [325]) and build a matrix representation of T relative to B and C. Then pc (T (ei)) = pc (Aei) = pc (Ai) = Ai So then the matrix representation of T, relative to B and C, is simply MBT = A. with this observation, the proof becomes a specialization of Theorem IMR [557], T is invertible < M ,c is invertible < A is invertible This theorem may seem gratuitous. Why state such a special case of Theorem IMR [557]? Because it adds another condition to our NMEx series of theorems, and in some ways it is the most fundamental expression of what it means for a matrix to be nonsingular the associated linear transformation is invertible. This is our final update. Theorem NME9 Nonsingular Matrix Equivalences, Round 9 Suppose that A is a square matrix of size n. The following are equivalent. Version 2.02  Subsection MR.READ Reading Questions 562 1. A is nonsingular. 2. A row-reduces to the identity matrix. 3. The null space of A contains only the zero vector, Nf(A) = {0}. 4. The linear system [S(A, b) has a unique solution for every possible choice of b. 5. The columns of A are a linearly independent set. 6. A is invertible. 7. The column space of A is C", C(A) = C'. 8. The columns of A are a basis for C. 9. The rank of A is n, r (A) = n. 10. The nullity of A is zero, n (A) = 0. 11. The determinant of A is nonzero, det (A) # 0. 12. A = 0 is not an eigenvalue of A. 13. The linear transformation T: CC H CC defined by T (x) = Ax is invertible. Proof By Theorem IMILT [560] the new addition to this list is equivalent to the statement that A is invertible so we can expand Theorem NME8 [420]. U Subsection READ Reading Questions 1. Why does Theorem FTMR [544] deserve the moniker "fundamental"? 2. 
Find the matrix representation, MBc of the linear transformation T: C2F- C2 T(Xl [2zi - X2 ' [x2_ 3xi + 2x2_ relative to the bases 3. What is the second "surprise," and why is it surprising? Version 2.02  Subsection MR.EXC Exercises 563 Subsection EXC Exercises C1O Example KVMR [553] concludes with a basis for the kernel of the linear transformation T. Compute the value of T for each of these two basis vectors. Did you get what you expected? Contributed by Robert Beezer C20 Compute the matrix representation of T relative to the bases B and C. 2a - 3b+4c - 2d T: P3 H C3, T -a -Ex2 +d3) _ a+b-c-d 3a + 2c - 3d_ B ={1, x, x2, x3} C = 0 , 1 , 1 0 0 1 Contributed by Robert Beezer Solution [565] C21 Find a matrix representation of the linear transformation T relative to the bases B and C. T: P2 F_ C2, T (p(x)) =P() Lp(3), B = {2 - 5z+x2, 1 +xz-x2, x2} C -= ~ ~ { 3 2 Contributed by Robert Beezer Solution [565] C22 Let S22 be the vector space of 2 x 2 symmetric matrices. Build the matrix representation of the linear transformation T: P2 H S22 relative to the bases B and C and then use this matrix representation to compute T (3 + 5x - 2x2). B={I, 1+X, 1+x+zx2} c - { Lo 1 01 ii 0 of ' L0 of ' Lo J T (a + bx + cx2) 2a-b+c a+3b-c aa+3b-c a-c] Contributed by Robert Beezer Solution [565] C25 Use a matrix representation to determine if the linear transformation T: P3 H M22 surjective. T(a+bx+cx2+dx3) -a+4b+c+2d 4a-b+6c-d a+5b-2c+2d a+2c+5d_ Contributed by Robert Beezer Solution [566] C30 Find bases for the kernel and range of the linear transformation S below. S: M22 H P2, S([a k=(a+ 2b+ 5c - 4d)+ (3a - b+8c+ 2d)x+ (a+b+4c - 2d)z2 Version 2.02  Subsection MR.EXC Exercises 564 Contributed by Robert Beezer Solution [567] C40 Let S22 be the set of 2 x 2 symmetric matrices. Verify that the linear transformation R is invertible and find R-1. R: S22 i- P2, R IIbb = (a -b) +(2a -3b -2c)z+ (a -b+c)z2 Contributed by Robert Beezer Solution [567] C41 Prove that the linear transformation S is invertible. Then find a formula for the inverse linear transformation, S-1, by employing a matrix inverse. (15 points) S: P1F-M1,2, S(a + bx) = [3a + b 2a + b] Contributed by Robert Beezer Solution [568] C42 The linear transformation R: M12 - M21 is invertible. Use a matrix representation to determine a formula for the inverse linear transformation R-1: M21 - M12. a + 3b ([a b= 4a+11b Contributed by Robert Beezer Solution [569] C50 Use a matrix representation to find a basis for the range of the linear transformation L. (15 points) L: M22F- P2, T([a D = (a+2b+4c+d)+(3a+c-2d)x+(-a+b+3c+3d)2 Contributed by Robert Beezer Solution [569] C51 Use a matrix representation to find a basis for the kernel of the linear transformation L. (15 points) L: M22 -HP2, T([ =D J (a+2b+4c+d)+(3a+c-2d)x+(-a+b+3c+3d)x2 Contributed by Robert Beezer C52 Find a basis for the kernel of the linear transformation T: P2 s Ml22. T~b2 - Ea+ 2b -2c 2a+ 2b1 ~a~xcx1-a+b -4c 3a+ 2b+ 2cJ Contributed by Robert Beezer Solution [570] M20 The linear transformation D performs differentiation on polynomials. Use a matrix representation of D to find the rank and nullity of D. D:PdR toP, D (p(x)) = p'(x) Contributed by Robert Beezer Solution [571] Version 2.02  Subsection MR.EXC Exercises 565 T20 Construct a new solution to Exercise B.T50 [337] along the following outline. From the n x n matrix A, construct the linear transformation T: C"m H C", T (x) = Ax. 
Use Theorem NI [228], Theorem IMILT [560] and Theorem ILTIS [511] to translate between the nonsingularity of A and the surjectivity/injectivity of T. Then apply Theorem ILTB [486] and Theorem SLTB [501] to connect these properties with bases. Contributed by Robert Beezer Solution [571] T60 Create an entirely different proof of Theorem IMILT [560] that relies on Definition IVLT [508] to establish the invertibility of T, and that relies on Definition MI [213] to establish the invertibility of A. Contributed by Robert Beezer T80 Suppose that T: U H V and S: V H W are linear transformations, and that B, C and D are bases for U, V, and W. Using only Definition MR [542] define matrix representations for T and S. Using these two definitions, and Definition MR [542], derive a matrix representation for the composition S o T in terms of the entries of the matrices MBc and MID. Explain how you would use this result to motivate a definition for matrix multiplication that is strikingly similar to Theorem EMP [198]. Contributed by Robert Beezer Solution [572] Version 2.02  Subsection MR.SOL Solutions 566 Subsection SOL Solutions C20 Contributed by Robert Beezer Statement [562] Apply Definition MR [542], PC (T (1)) Pc (T (x)) Pc (T (2)) Pc (T (x3)) p c 1 =C Pc 1 PC 1 .-3_ Pc 1 = PC = Pc = Pc 1] 0] _0_ (-4) .1 5 0 0 (-3) 1 +(-2) 1 -0- 1 1 0[ +1 0 0 + (-3) 1 _ [] 1 1 0 + 4 1 0 0 1 +3 1 K 01 = 1 ]+ [i])= 1 + 2 1 = +(-3) 1 1 1 -2 3] -4 1 0 5 -3 2 -3 =-4 -3_ These four vectors are the columns of the matrix representation, 1 MgT,c = -2 3 -4 5 1 -3 0 2 -3 4 -3] C21 Contributed by Robert Beezer Applying Definition MR [542], Pc (T (2 - 5x+x2)) Statement [562] Pc [] = Pc \2 4 + (-4) : Pc -5 [ ]1 = PC 13 IJ+ (-19) [ ) - 19 : PC ( ] - Pc (-15) 4 + 23 [] 235 pc (T (1 + zx X2)) Pc (T (x2)) So the resulting matrix representation is MB, C -4 13 -15 -19 23 C22 Contributed by Robert Beezer Statement [562] Input to T the vectors of the basis B and coordinatize the outputs relative to C, Pc (T(1)) Pc (T(1+x)) -( ]) ([ 0] [ 1 [0 J) 2 Pc Pc c 2 [ + 1 + 1 1 1 Pc(L41J = Pc (1 [ ]+ 4[10 + 1 [ ] Version 2.02  Subsection MR.SOL Solutions 567 PC (T (1+ xz+ x2)) =-pc ] pc 2 ]+ +0 Applying Definition MR [542] we have the matrix representation ~2 1 2 MgT,c = 1 4 3 _1 1 0_ To compute T (3 + 5x - 2x2) employ Theorem FTMR [544], T (3 + 5x - 2x2) p1 (4MT,CPB (3 + 5x - 2x2)) = p (MBT,CPB ((-2)(1) + 7(1 + x) + (-2)(1 + X + x2)) 2 1 2 -2 = p- 4 1 4 3 7 11 0_ -2 = p-C 20 5_ -120 20 5 You can, of course, check your answer by evaluating T (3 + 5x - 2x2) directly. C25 Contributed by Robert Beezer Statement [562] Choose bases B and C for the matrix representation, B = {1, X, 2, x3 r1 01 11 o 01 011 - { [ ]' [o o]' Li o]' [o 1] j Input to T the vectors of the basis B and coordinatize the outputs relative to C, pc (T (1)) = pc it41= pc L(-1) + 4[0 + 1 O0J + 1-0O/ \ J 4pc (T ( x)) = pc= P - J/ -Pc 4 [ ]+ (- 1) J+ 5 L + 0 - -] 4 1 0 -1 [12 62 -2 2 21 5 Version 2.02 PC (T (2)) Pc (T (x3)) Pc 12 26 ]) Pc (1 [0 0] +6 [0 0] +(-2) J +2 -0 0 OJ / 2 PC (-2 51J / Pc (2 [0 0] + (-1) L J +2 O 0 J +5 -0 0 0 J /  Subsection MR.SOL Solutions 568 Applying Definition MR [542] we have the matrix representation -1 4 1 2 MT - 4 -1 6 -1 MBc 1 5 -2 2 1 0 2 5 Properties of this matrix representation will translate to properties of the linear transformation The matrix representation is nonsingular since it row-reduces to the identity matrix (Theorem NMRRI [72]) and therefore has a column space equal to C4 (Theorem CNMB [330]). 
The column space of the matrix representation is isomorphic to the range of the linear transformation (Theorem RCSI [555]). So the range of T has dimension 4, equal to the dimension of the codomain M22. By Theorem ROSLT [517], T is surjective. C30 Contributed by Robert Beezer Statement [562] These subspaces will be easiest to construct by analyzing a matrix representation of S. Since we can use any matrix representation, we might as well use natural bases that allow us to construct the matrix representation quickly and easily, B=CO0]0 ' [0 0_ '[ 1 0_ ' [ 0 1 ]}c = {1,zx,z2} then we can practically build the matrix representation on sight, 1 2 5 -4 Me",C = 3 -1 8 2 1 1 4 -2- The first step is to find bases for the null space and column space of the matrix representation. Row- reducing the matrix representation we find, 1 0 3 0 0 [ 1 -2 0 0 0 0 So by Theorem BNS [139] and Theorem BCS [239], we have -3 0 - 2 N(MBc)= < 1 ' C(AM4,C) {= 3 , } Now, the proofs of Theorem KNSI [552] and Theorem RCSI [555] tell us that we can apply p1 and p (respectively) to "un-coordinatize" and get bases for the kernel and range of the linear transformation S itself, C40 Contributed by Robert Beezer Statement [563] The analysis of R will be easiest if we analyze a matrix representation of R. Since we can use any matrix representation, we might as well use natural bases that allow us to construct the matrix representation quickly and easily, { 1 0 0 1 0 0 C l)x2 B -= 0 0_ ' -1 0_- 0 1_ 1 ,z Version 2.02  Subsection MR.SOL Solutions 569 then we can practically build the matrix representation on sight, ~1 -1 0 MBRc= 2 -3 -2 _1 -1 1_ This matrix representation is invertible (it has a nonzero determinant of -1, Theorem SMZD [389], The- orem NI [228]) so Theorem IMR [557] tells us that the linear transformation S is also invertible. To find a formula for R-1 we compute, R-1 (a + bx + cx2) p-1 (Mepc (a + bx + cx2)) p-1 ((MRc)- pc (a + bx + cx2)) .a p-1 (MRc) b 5 -1 -2 a p-1 4 -1 -2 b \-1 0 1 _ c 5a - b - 2c p-- 4a - b - 2c -a+c 5a-b-2c 4a -b-2c 4a-b-2c -a+c Theorem FTMR [544] Theorem IMR [557] Definition VR [530] Definition MI [213] Definition MVP [194] Definition VR [530] C41 Contributed by Robert Beezer Statement [563] First, build a matrix representation of S (Definition MR [542]). We are free to choose whatever bases we wish, so we should choose ones that are easy to work with, such as B = {1, } C = { 1 0]1,[0 1]} The resulting matrix representation is then MB,c [ ] this matrix is invertible, since it has a nonzero determinant, so by Theorem IMR [557] the linear transfor- mation S is invertible. 
We can use the matrix inverse and Theorem IMR [557] to find a formula for the inverse linear transformation, S-i ([a b]) PBl (Mp C ([a b])) PBl ((Mc)'pc [a b] J ) PBI (MBSC) a Lbj 3 1- pBl \ L2 1])1[]) pBl \ 31J []) Theorem FTMR [544] Theorem IMR [557] Definition VR [530] Definition MI [213] Version 2.02  Subsection MR.SOL Solutions 570 p--< Definition MVP [194] - (a - b) + (-2a + 3b)x Definition VR [530] C42 Contributed by Robert Beezer Statement [563] Choose bases B and C for M12 and M21 (respectively), The resulting matrix representation is M~c 4 11 This matrix is invertible (its determinant is nonzero, Theorem SMZD [389]), so by Theorem IMR [557], we can compute the matrix representation of R-1 with a matrix inverse (Theorem TTMI [214]), M -cB - 4 11 - 4 1 1 To obtain a general formula for R-1, use Theorem FTMR [544], R-1 [ ) - l M e c P [ ) 1 -\L 1 1J 3 S(-[11x + 3y]) - B L 4x -_y J [-11x+3y 4x - y] C50 Contributed by Robert Beezer Statement [563] As usual, build any matrix representation of L, most likely using a "nice" bases, such as B=1 0[0 11 [0 01 [0 0f 0 0_ [' 0_ ' 0_]' [0 1_J C - {1, z,2 Then the matrix representation (Definition MR [542]) is, [9~c= 0 1 -2] Theorem RCSI [555] tells us that we can compute the column space of the matrix representation, then use the isomorphism pg5 to convert the column space of the matrix representation into the range of the linear transformation. So we first analyze the matrix representation, 1 24 10 0 -1] 11 32]3_ 0 0 lo=1j With three nonzero rows in the reduced row-echelon form of the matrix, we know the column space has dimension 3. Since P2 has dimension 3 (Theorem DP [345]), the range must be all of P2. So any basis of P2 would suffice as a basis for the range. For instance, C itself would be a correct answer. Version 2.02  Subsection MR.SOL Solutions 571 A more laborious approach would be to use Theorem BCS [239] and choose the first three columns of the matrix representation as a basis for the range of the matrix representation. These could then be "un-coordinatized" with p-1 to yield a ("not nice") basis for P2. C52 Contributed by Robert Beezer Statement [563] Choose bases B and C for the matrix representation, B {1, x, x2} c {[1 0][01] [0 0] [0 0]} Input to T the vectors of the basis B and coordinatize the outputs relative to C, 1 pc (T'(1)) = po (-13 ) Pc 0 0] +2 [0 0 + (-1) L1 0 +3 0 O1 -1 3 Pc(T(x))=pc([ l 2 =Pc(2[ 0] +2 [0 0] +1 [1 0] +2[ ]0 ) 11 ) L2] -2 Pc (T (23)= Pc \-4 O2 = Pc (-2) L0 0+0 [0 0]+(-4) -1 0J+2 -0 O1 f -4 Applying Definition MR [542] we have the matrix representation 1 2 -2: T 2 2 M 0 Mg,c = The null space of the matrix representation is isomorphic (via PB) to the kernel of the linear transformation (Theorem KNSI [552]). So we compute the null space of the matrix representation by first row-reducing the matrix to, S0 2 Employing Theorem BNS [139] we have We only need to uncoordinatize this one basis vector to get a basis for 1C (T), -2 1C(T) = pBl 2 = K{-2 + 2x + x2}) 1 Version 2.02  Subsection MR.SOL Solutions 572 M20 Contributed by Robert Beezer Statement [563] Build a matrix representation (Definition MR [542]) with the set Bpoea a= o1,a2,..., en employed as a basis of both the domain and codomain. 
Then PB (D (1)) = PB (0) 0 0 0 0 0 0 2 0 0 0 PB (D (x)) =PB (1) = PB (D (x3)) PB (3x2) 1~ 0 0 0 0 PB (D (x2)) PB (2x) 0 0 3 0 0 PB (D (x")) PB (nZ .0~ 0 0 0 and the resulting matrix representation is MB 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 3 0 0 0 0 0 0 This (n +1) x (n +1) matrix is very close to being in reduced row-echelon form. Multiply row i by }, for 1 < i < n, to convert it to reduced row-echelon form. From this we can see that matrix representation MB has rank n and nullity 1. Applying Theorem RCSI [555] and Theorem KNSI [552] tells us that the linear transformation D will have the same values for the rank and nullity, as well. T20 Contributed by Robert Beezer Statement [564] Given the nonsingular n x n matrix A, create the linear transformation T: C" C" defined by T (x) Then Ax. A nonsingular < A invertible < T invertible < T injective and surjective Theorem NI [228] Theorem IMILT [560] Theorem ILTIS [511] Version 2.02  Subsection MR.SOL Solutions 573 C linearly independent, and C spans C" - C basis for C" Theorem ILTB [486] Theorem SLTB [501] Definition B [325] T80 Contributed by Robert Beezer Statement [564] Suppose that B = {Ui, u2, u3, ..., um}, C = {vi, v2, v3, ..., vn} and D = {wi, w2, w3, ..., wp}. For convenience, set M = ,, mij= [M].., 1 < i < n, 1 < j < m, and similarly, set N = MSD, nrij=[N]w , 1 < i < p, 1 < j n. We want to learn about the matrix representation of S o T: V H W relative to B and D. We will examine a single (generic) entry of this representation. Mio [pD ((So ) (uj)) i [PD (S (T (uj)))] i PD (s ( Imkjvk) k=1. PD (mkjS(vk))] (k=1 . Ji n p~ [PD ( mkj ZnekWe) k=1 f=1 .2 np PD mkjiekw) k=1 k=1. [PD ( mkikw) (f=1 k=1. PD ( mkik) w)] (f=1 k=1 . n Z mkjnik k=1 n Z nikmkj k=1 n M [ ,D].ik B kj k=1 Definition MR [542] Definition LTC [469] Definition MR [542] Theorem LTLC [462] Definition MR [542] Property DVA [280] Property C [279] Property DSA [280] Definition VR [530] Property CMCN [680] Property CMCN [680] This formula for the entry of a matrix should remind you of Theorem EMP [198]. However, while the theorem presumed we knew how to multiply matrices, the solution before us never uses any understanding of matrix products. It uses the definitions of vector and matrix representations, properties of linear transformations and vector spaces. So if we began a course by first discussing vector space, and then linear transformations between vector spaces, we could carry matrix representations into a motivation for a definition of matrix multiplication that is grounded in function composition. That is worth saying again a definition of matrix representations of linear transformations results in a matrix product being the representation of a composition of linear transformations. This exercise is meant to explain why many authors take the formula in Theorem EMP [198] as their definition of matrix multiplication, and why it is a natural choice when the proper motivation is in place. If we first defined matrix multiplication in the style of Theorem EMP [198], then the above argument, Version 2.02  Subsection MR.SOL Solutions 574 followed by a simple application of the definition of matrix equality (Definition ME [182]), would yield Theorem MRCLT [549]. Version 2.02  Section CB Change of Basis 575 Section CB Change of Basis 0 We have seen in Section MR [542] that a linear transformation can be represented by a matrix, once we pick bases for the domain and codomain. How does the matrix representation change if we choose different bases? 
Which bases lead to especially nice representations? From the infinite possibilities, what is the best possible representation? This section will begin to answer these questions. But first we need to define eigenvalues for linear transformations and the change-of-basis matrix.

Subsection EELT Eigenvalues and Eigenvectors of Linear Transformations

We now define the notion of an eigenvalue and eigenvector of a linear transformation. It should not be too surprising, especially if you remind yourself of the close relationship between matrices and linear transformations.

Definition EELT Eigenvalue and Eigenvector of a Linear Transformation
Suppose that T: V → V is a linear transformation. Then a nonzero vector v ∈ V is an eigenvector of T for the eigenvalue λ if T(v) = λv.

We will see shortly the best method for computing the eigenvalues and eigenvectors of a linear transformation, but for now, here are some examples to verify that such things really do exist.

Example ELTBM Eigenvectors of linear transformation between matrices
Consider the linear transformation T: M22 → M22 defined by

    T([a b; c d]) = [-17a+11b+8c-11d    -57a+35b+24c-33d]
                    [-14a+10b+6c-10d    -41a+25b+16c-23d]

and the vectors

    x1 = [0 1; 0 1]    x2 = [1 1; 1 0]    x3 = [1 3; 2 3]    x4 = [2 6; 1 4]

Then compute

    T(x1) = T([0 1; 0 1]) = [0 2; 0 2] = 2 x1
    T(x2) = T([1 1; 1 0]) = [2 2; 2 0] = 2 x2
    T(x3) = T([1 3; 2 3]) = [-1 -3; -2 -3] = (-1) x3
    T(x4) = T([2 6; 1 4]) = [-4 -12; -2 -8] = (-2) x4

So x1, x2, x3, x4 are eigenvectors of T with eigenvalues (respectively) λ1 = 2, λ2 = 2, λ3 = -1, λ4 = -2.

Here's another.

Example ELTBP Eigenvectors of linear transformation between polynomials
Consider the linear transformation R: P2 → P2 defined by

    R(a + bx + cx^2) = (15a + 8b - 4c) + (-12a - 6b + 3c)x + (24a + 14b - 7c)x^2

and the vectors

    w1 = 1 - x + x^2    w2 = x + 2x^2    w3 = 1 + 4x^2

Then compute

    R(w1) = R(1 - x + x^2) = 3 - 3x + 3x^2 = 3 w1
    R(w2) = R(x + 2x^2) = 0 + 0x + 0x^2 = 0 w2
    R(w3) = R(1 + 4x^2) = -1 - 4x^2 = (-1) w3

So w1, w2, w3 are eigenvectors of R with eigenvalues (respectively) λ1 = 3, λ2 = 0, λ3 = -1. Notice how the eigenvalue λ2 = 0 indicates that the eigenvector w2 is a non-trivial element of the kernel of R, and therefore R is not injective (Exercise CB.T15 [596]).

Of course, these examples are meant only to illustrate the definition of eigenvectors and eigenvalues for linear transformations, and therefore beg the question, "How would I find eigenvectors?" We'll have an answer before we finish this section. We need one more construction first.

Subsection CBM Change-of-Basis Matrix

Given a vector space, we know we can usually find many different bases for the vector space, some nice, some nasty. If we choose a single vector from this vector space, we can build many different representations of the vector by constructing the representations relative to different bases. How are these different representations related to each other? A change-of-basis matrix answers this question.

Definition CBM Change-of-Basis Matrix
Suppose that V is a vector space, and I_V: V → V is the identity linear transformation on V. Let B = {v1, v2, v3, ..., vn} and C be two bases of V. Then the change-of-basis matrix from B to C is the matrix representation of I_V relative to B and C,

    C_{B,C} = M^{I_V}_{B,C} = [ ρ_C(I_V(v1)) | ρ_C(I_V(v2)) | ρ_C(I_V(v3)) | ... | ρ_C(I_V(vn)) ]
                            = [ ρ_C(v1) | ρ_C(v2) | ρ_C(v3) | ... | ρ_C(vn) ]

Notice that this definition is primarily about a single vector space (V) and two bases of V (B, C). The linear transformation (I_V) is necessary but not critical.
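For column vectors, the matrix of Definition CBM can be assembled in a single step, since coordinatizing relative to C just means solving a linear system whose coefficient matrix has the vectors of C as its columns. A minimal sketch, with two made-up bases of C^3:

import numpy as np

# Two made-up bases of C^3, stored as the columns of B_mat and C_mat.
B_mat = np.array([[1., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 1.]])
C_mat = np.array([[1., 1., 1.],
                  [0., 1., 1.],
                  [0., 0., 1.]])

# Definition CBM: column j of C_{B,C} is rho_C(v_j), the coordinate vector of
# the j-th vector of B relative to C.  For column vectors, rho_C(v) is the
# solution of C_mat @ x = v, so every column can be found with one solve.
C_BC = np.linalg.solve(C_mat, B_mat)

# Sanity check on the first column: the vectors of C, combined with the
# weights in that column, should rebuild the first vector of B.
print(np.allclose(C_mat @ C_BC[:, 0], B_mat[:, 0]))   # expected: True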
As you might expect, this matrix has something to do with changing bases. Here is the theorem that gives the matrix its name (not the other way around). Version 2.02  Subsection CB.CBM Change-of-Basis Matrix 577 Theorem CB Change-of-Basis Suppose that v is a vector in the vector space V and B and C are bases of V. Then pc (v) = CB,CPB (v) D- Proof pc (v) Pc (Iv (v)) = MajepB (v) = CB,CPB (v) Definition IDLT [508] Theorem FTMR [544] Definition CBM [575] 0 So the change-of-basis matrix can be used with matrix multiplication to convert a vector representation of a vector (v) relative to one basis (PB (v)) to a representation of the same vector relative to a second basis (pc (v)). Theorem ICBM Inverse of Change-of-Basis Matrix Suppose that V is a vector space, and B and C are bases of V. Then the change-of-basis matrix CB,c is nonsingular and Cn c = Cc,B Proof The linear transformation IV: V by Theorem IMR [557], the matrix MB C is nonsingular. Then C-1 -= (M e = MIV1 C, B = MCB = CC) B i V is invertible, and its inverse is itself, IV (check this!). So = CB,C is invertible. Theorem NI [228] says an invertible matrix Definition CBM [575] Theorem IMR [557] Definition IDLT [508] Definition CBM [575] 0 Example CBP Change of basis with polynomials The vector space P4 (Example VSP [281]) has two nice bases (Example BP [326]), B={1,x,x2,x3,x4} C {1,1+ x,1+ x + xx2 1+ + 2 + x3, 1+ x + x2 + x3 + x4} To build the change-of-basis matrix between B and C, we must first build a vector representation of each vector in B relative to C, 1 0 Pc (1) = Pc ((1) (1)) = 0 0 0 Version 2.02  Subsection CB.CBM Change-of-Basis Matrix 578 PC () PC ((-1) (1) + (1) (1+ x)) -1 1 0 0 0 _ Pc (X2) = Pc ((-1) (1 + X) + (1) (1 + X + X2)) 0 -1 1 0 0 _ PC (X3) =pc ((-1) (1 + X + x2) + (1) (1 + x + X2 +X3)) 0 0 -1 1 0 Pc(x4) =Pc((-1) (1+ +xz2 +x3)+(1) (1+X+x2+x3 +x4)) 0 0 0 -1 1 Then we package up these vectors as the columns of a matrix, 1 -1 0 0 0 0 1 -1 0 0 CB,C = 0 0 1 -1 0 0 0 0 1 -1 0 0 0 0 1 Now, to illustrate Theorem CB [576], consider the vector u = 5 - 3x + 2x2 + 8x3 representation of u relative to B easily, 3x4. We can build the PB (u) = PB (5 - 3x + 2x2 + 8x3 5 -3 3x4) = 2 8 -3 Applying Theorem CB [576], we obtain a second representation of u, but now relative to C, PC (u) = CB,CPB (u) 1 -1 0 0 1 -1 = 0 0 1 0 0 0 0 0 0 8 -5 =-6 11 -3 Theorem CB [576] 0 0 -1 1 0 0 0 0 -1 1 5 -3 2 8 -3 Definition MVP [194] Version 2.02  Subsection CB.CBM Change-of-Basis Matrix 579 We can check our work by unraveling this second representation, u = PC1 (Pc (u)) /8\ -5 = pci -6 11 \-3_ = 8(1) + (-5)(1 + x) + (-6)(1 + x + x2) +(11)(1+x+x2+x3)+(-3)(1+x+x2 +x3+x4) =5 - 3x + 2x2 +8x3 - 3x4 The change-of-basis matrix from C to B is actually easier to build. 
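Before building the change-of-basis matrix from C to B by hand below, here is a quick numerical check of the matrix C_{B,C} and the representation ρ_C(u) just computed. The sketch encodes polynomials in P4 by their coefficient vectors, constant term first; that encoding is a convenience of the sketch, not something used in the example.

import numpy as np

# Encode P4 by coefficient vectors, constant term first.  The columns of
# C_mat are then the basis C = {1, 1+x, 1+x+x^2, 1+x+x^2+x^3, 1+x+x^2+x^3+x^4},
# while the basis B = {1, x, x^2, x^3, x^4} is the identity in this encoding.
C_mat = np.triu(np.ones((5, 5)))
B_mat = np.eye(5)

# Definition CBM: C_{B,C} = [rho_C(1) | rho_C(x) | ... | rho_C(x^4)].
C_BC = np.linalg.solve(C_mat, B_mat)
print(C_BC)    # 1 on the diagonal, -1 just above it, as in Example CBP

# Theorem CB applied to u = 5 - 3x + 2x^2 + 8x^3 - 3x^4.
rho_B_u = np.array([5., -3., 2., 8., -3.])
print(C_BC @ rho_B_u)   # expected: [ 8. -5. -6. 11. -3.]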
form its representation relative to B Definition IVLT [508] PB PB (1 + x)= PB (1 +3 + z2) pB ((1)1- Definition VR [530] Grab each vector in the basis C and 1 0 (1) = PB ((1)1) = 0 0 0 1 1 B ((1)1 + (1)z) = 0 0 0 1 1 + (1)x + (1)x2) 0 0 1 1 (1)x2 + (1)x3)=1 1 0 1 1 (1)x3 + (1)x4) = 1 1 1 PB (1 +3 +3:2 +3:) PB ((1)1 + (1)3 + PB (1+3+3: +2-3 -+ 4) = ps((1)1 + (1)x + (1)x2 + Then we package up these vectors as the columns of a matrix, 1 1 1 1 1 0 1 1 1 1 CC,B = 0 0 1 1 1 0 0 0 1 1 _0 0 0 0 1_ Version 2.02  Subsection CB.CBM Change-of-Basis Matrix 580 We formed two representations of the vector u above, so we can again provide a check on our computations by converting from the representation of u relative to C to the representation of u relative to B, PB (u) = CC,BPC (u) 1 1 1 1 1 8 0 1 1 1 1 -5 = 0 0 1 1 1 -6 0 0 0 1 1 11 0 0 0 0 1_ -3_ 5 -3 = 2 8 -3_ Theorem CB [576] Definition MVP [194] One more computation that is either a check on our work, or an illustration of a theorem. The two change- of-basis matrices, CB,C and CC,B, should be inverses of each other, according to Theorem ICBM [576]. Here we go, CBCCC,B= 1 0 0 0 0 1 0 0 0 1 0 -1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 0 -1 0 0 0 1 1 1 0 1 -1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 The computations of the previous example are not meant to present any labor-saving devices, but instead are meant to illustrate the utility of the change-of-basis matrix. However, you might have noticed that CC,B was easier to compute than CB,C. If you needed CB,C, then you could first compute CC,B and then compute its inverse, which by Theorem ICBM [576], would equal CB,C. Here's another illustrative example. We have been concentrating on working with abstract vector spaces, but all of our theorems and techniques apply just as well to Cm, the vector space of column vectors. We only need to use more complicated bases than the standard unit vectors (Theorem SUVB [325]) to make things interesting. Example CBCV Change of basis with column vectors For the vector space C4 we have the two bases, B2 3 -3 [+]2[3] B1 ' 1 ' 3 ' 3 .-2_ [ _1 _ _-4_ _0 _J r1 -4 -5 -6 8 13 -7 c -4 -5 -2 3 -1 8 9 [-6] I The change-of-basis matrix from B to C requires writing each vector of B as a linear combination the vectors in C. 1 -2 P C 1 j .-2_ 1 ~-42 -5 [A Pc (1) _4 + (-2) -5 + (1) -2 + (-1) 37 _-1_ _8 __9 _ -6_ ii 1 -2 1 _-1_ Version 2.02  Subsection CB.CBM Change-of-Basis Matrix 581 PC PC PC -1 3M 2 -3 3 -4. -1 3 3 0 _ - PC (2) pc ~(2) [ 1 -6 -1. 1 -6 -4 -1. 1 -6 -1_ -4 + (-3) -5: .8 -4 + (-3) -5: . 8 -4 + (-2) -5 _8 _ -5 + (3) -2i 9 _ -5 + (1) -2i 9 _ -5 +4)13 +()-2 _9 _ 3 + (0) [iz i _-6_ 3 + (-2) 3f _-6_ 3 + (3) 37 _-6_ [ ii 2 -3 3 0_ 1 -3 1 _-2_ 2 -2 4 3_ _[ Then we package these vectors up as the change-of-basis matrix, 1 2 -2 -3 CB,C=1 3 -1 0 1 -3 1 -2 2 -2 4 3] 2 Now consider a single (arbitrary) vector y =[63 . First, build the vector representation of y relative to 4_ B. 
This will require writing y as a linear combination of the vectors in B, 2 6 PB (Y) = PB _3 4 PB (-21)[ 11 -2 1 -2] +(6) -11 3 1 1] + (11) 21 -3 3 -4] -1 3 _0 _ -21 6 11 _ -7] Now, applying Theorem CB [576] we can convert the representation of y relative to B into a representation relative to C, PC (y) = CB,CPB (y) 1 2 -2 -3- -1 0- --12 5 -20 -22_ Theorem CB [576] 1 2] -21 -3 -2 6 1 4 11 -2 3] _-7] Definition MVP [194] We could continue further with this example, perhaps by computing the representation of y relative to the basis C directly as a check on our work (Exercise CB.C20 [596]). Or we could choose another vector to Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 582 play the role of y and compute two different representations of this vector relative to the two bases B and C. Subsection MRS Matrix Representations and Similarity Here is the main theorem of this section. It looks a bit involved at first glance, but the proof should make you realize it is not all that complicated. In any event, we are more interested in a special case. Theorem MRCB Matrix Representation and Change of Basis Suppose that T: U H V is a linear transformation, B and C are bases for U, and D and E are bases for V. Then MB,D CE,DMCECB,C Proof CE,DMC,ECB, C MED MC,EMBC Definition CBM [575] =M'VMToIUTheorem MRCLT [549] E,D B,E =ME MT Definition IDLT [508] = MIoTTheorem MRCLT [549] = MBD Definition IDLT [508] We will be most interested in a special case of this theorem (Theorem SCB [583]), but here's an example that illustrates the full generality of Theorem MRCB [581]. Example MRCM Matrix representations and change-of-basis matrices Begin with two vector spaces, S2, the subspace of M22 containing all 2 x 2 symmetric matrices, and P3 (Example VSP [281]), the vector space of all polynomials of degree 3 or less. Then define the linear transformation Q: S2 H P3 by Q=(5a -2b+6c)+ (3a -b+ 2c) + (a +3b -c)z2+ (-4a +2b+ c)za Here are two bases for each vector space, one nice, one nasty. First for S2, [ 2 [3 i 2(101 0 B -[3 -2] '[-3 -3 ' 2]} [00,'1 0] ' 0 JJ1 and then for F3, D ={2 + x - 2x2 + 3x3, -1 - 2x2 + 3x3, -3 - x + z3, -z + x3} £ {1, x, z2, x3} We'll begin with a matrix representation of Q relative to C and B. We first find vector representations of the elements of C relative to E, 5 PE =Q(0 ) PE (5 + 3x + x2 - 43) _ Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 583 NF \(1 LL NF \(1 \ L0 1,J PE (-2 - x + 3x2 + 2X) -2 -1 2_ 6 2 PE (6 + 2x 2 +x3) L So M - r 5 3 1 -4 -2 -1 3 2 6 2 _1 1_ Now we construct two change-of-basis matrices. First, CB,c requires vector representations of the elements of B, relative to C. Since C is a nice basis, this is straightforward, 5 31001 0r 05 Pc (- 3 -2] = Pc (5) [0 0]+ (-3) 1 0 + (-2) I 0 1] 32 Pc (- 3 03J = -pc (2) [0 0]+ (-3) [1 0]+ (0) [0 ]1 = 3 Pc (2 4- c 1 0 0] + (2) 1 0] + (4) [0 10 -]-- -4_ So 5 2 1 CB,c = -3 - 2 -2 0 4_ The other change-of-basis matrix we'll compute is CE,D. 
However, since E is a nice basis (and D is not) we'll turn it around and instead compute CD,E and apply Theorem ICBM [576] to use an inverse to compute CE,D PE (2 + x - 2x2 +3X3) PE (-1 - 2x2 +3X3) PE ((2)1 + (1)x + (-2)x2 + (3)3)_ PE ((-1)1 + (0)x + (-2)x2 + (3)x3) 2 1 -2 [3] [0 .3 --3 _-1 PE (-3 - x + x3) PE ((-3)1 + (-1)x + (O)x2 + (1)x3) Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 584 0 PE (-x2 x3) =PE ((0)1+ (0)x + (-1)x2 + (1)x3) 0 So, we can package these column vectors up as a matrix to obtain CD,E and then, CED -(CD,E) 2 -1 -3 0 - 1 0 -1 0 -2 -2 0 -1 3 3 1 1 1 -2 1 1 -2 5 -1 -1 1 -3 1 1 2 -6 -1 0_ Theorem ICBM [576] We are now in a position to apply Theorem MRCB [581]. The matrix representation of Q relative to B and D can be obtained as follows, MBD =CE,DMc'ECB,c 1 -2 1 1 5 -2 6 5 -2 5 -1 -1 3 -1 2 -3 1 -3 1 1 1 3 -1 2 -6 -1 0 -4 2 1 _-2 1 -2 1 1 19 16 25 -2 5 -1 -1 14 9 9 1 -3 1 1 -2 -7 3 2 -6 -1 0 -28 -14 4_ -39 -23 14 62 34 -12 -53 -32 5 L-44 -15 -7_ Theorem MRCB [581] 2 1 -3 2 0 4 Now check our work by computing MB D directly (Exercise CB.C21 [596]). Here is a special case of the previous theorem, where we choose U and V to be the same vector space, so the matrix representations and the change-of-basis matrices are all square of the same size. Theorem SCB Similarity and Change of Basis Suppose that T: V H V is a linear transformation and B and C are bases of V. Then MB ,B = C§-- Mc ,CCB,C D- Proof In the conclusion of Theorem MRCB [581], replace D by B, and replace E by C, MBB = CC,BJMCCCB,C =- C1--MCCCB,C Theorem MRCB [581] Theorem ICBM [576] Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 585 This is the third surprise of this chapter. Theorem SCB [583] considers the special case where a linear transformation has the same vector space for the domain and codomain (V). We build a matrix representation of T using the basis B simultaneously for both the domain and codomain (MB B), and then we build a second matrix representation of T, now using the basis C for both the domain and codomain (Mc c). Then these two representations are related via a similarity transformation (Definition SIM [432]) using a change-of-basis matrix (CB,c)! Example MRBE Matrix representation with basis of eigenvectors We return to the linear transformation T: JM22 1 J-M22 of Example ELTBM [574] defined by a b -17a + 11b+ 8c - 11d -57a + 35b + 24c - 33d T(c d -14a+10b+6c - 10d -41a+25b+16c - 23d In Example ELTBM [574] we showcased four eigenvectors of T. We will now put these four vectors in a set, B{x1, x2, x3, x4}= 0 1 ' 1 0 ' 2 3_ ' [1 4_ J Check that B is a basis of M22 by first establishing the linear independence of B and then employing Theorem G [355] to get the spanning property easily. Here is a second set of 2 x 2 matrices, which also forms a basis of M22 (Example BM [326]), C = {Yi, Y2, Y3, y4} =10 0 [ 01' 1 [0 ' 0101 We can build two matrix representations of T, one relative to B and one relative to C. Each is easy, but for wildly different reasons. In our computation of the matrix representation relative to B we borrow some of our work in Example ELTBM [574]. Here are the representations, then the explanation. 
2 PB (T (xi)) = PB (2x1) = Ps (2x1 + 0x2 + 0x3 + 0x4) = 0 .0_ 0 PB (T (x2)) PB (2x2) PB (Oxi + 2x2 + Ox3 + 0x4) =[j 0 PB (T (x3)) =PB ((-1)x3) PB (xi + 0x2 + (-1)x3 + 0x4) PB (T (x4)) =pB ((-2)x4) =pB (Oxi + Ox2 + Ox3 + (-2)x4) = 0 So the resulting representation is 2 0 0 0 T 0 2 0 0 MB) B 0 0 -1 0 0 0 0 -2] Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 586 Very pretty. Now for the matrix representation relative to C first compute, PC (T (yi)) -17 -57 PC -1 -4 - PC(( -17) L0 J + (-57) - J + (-14) - J +(-41) [ ]) [ -17 -57 -14 -411 pc (T (Y2)) \L11 35 PC (10 25]L1 Pc 111 [0 0 + 35 [0 0] +10 1 0 + 25 _0 1 351 [25_ 8 24 Pc 6 16] f - 8 PC 8g 0 0] +24 [0 0]+6 1 0 l+16 L0 1Jf/ 6 [16_ pc (T (y3)) Pc (T (y4)) PC PC -11 -10 -33) -23]f -11) 0 ]+ 33[ ]+ io [ J + ( -23) 0 _ f [ -11 -33 -10 -23] So the resulting representation is Mc T ,c = -17 11 -57 35 -14 10 -41 25 8 24 6 16 -11 -33 -10 -23] Not quite as pretty. The purpose of this example is to illustrate Theorem SCB [583]. This theorem says that the two matrix representations, MBT and MCT, of the one linear transformation, T, are related by a similarity transformation using the change-of-basis matrix CB,C. Lets compute this change-of-basis matrix. Notice that since C is such a nice basis, this is fairly straightforward, 0 PC (X1) = PC([ ]) C (00[ ] 0 [0 ] + 1 ]0 0+010 +1 [0 1) ]0 1 PC (X2) = Pc (-1 0J c( 0 0] + 1 [0 0] + 1 1O 0-+ 0 0 O1 f 1 0o Version 2.02  Subsection CB.MRS Matrix Representations and Similarity 587 1 pc (x3) = Pc (2 3J c( 0 0] + 3 0 0] + 2 1 0-+ 3 -0 O1 2 [3 2 PC (X4) = pc (L14J = pc (2 0 0 + 6 0 0] + 1 1O 0-+ 4 _0 O1 1 [4 So we have, 0 1 CB,C 0 Now, according to Theorem SCB [583] we can write, MBB = CB C Mcc CB,C 2 0 0 0 0 1 1 2 0 2 0 0 1 1 3 6 0 0 -1 0 0 1 2 1 0 0 0 -2 1 0 3 4] 1 1 1 0 1 3 2 3 2 6 1 4 [ -17 -57 -14 -41 11 35 10 25 8 24 6 16 -111 -33 -10 -23] [ 0 1 1 1 0 1 1 0 1 3 2 3 2 6 1 4 This should look and feel exactly like the process for diagonalizing a SD [432]. And it is. matrix, as was described in Section We can now return to the question of computing an eigenvalue or eigenvector of a linear transformation. For a linear transformation of the form T: V H V, we know that representations relative to different bases are similar matrices. We also know that similar matrices have equal characteristic polynomials by Theorem SMEE [434]. We will now show that eigenvalues of a linear transformation T are precisely the eigenvalues of any matrix representation of T. Since the choice of a different matrix representation leads to a similar matrix, there will be no "new" eigenvalues obtained from this second representation. Similarly, the change-of-basis matrix can be used to show that eigenvectors obtained from one matrix representation will be precisely those obtained from any other representation. So we can determine the eigenvalues and eigenvectors of a linear transformation by forming one matrix representation, using any basis we please, and analyzing the matrix in the manner of Chapter E [396]. Theorem EER Eigenvalues, Eigenvectors, Representations Suppose that T: V H V is a linear transformation and B is a basis of V. Then v E V is an eigenvector of T for the eigenvalue A if and only if PB (v) is an eigenvector of MB for the eigenvalue A. E Proof (-) Assume that v E V is an eigenvector of T for the eigenvalue A. 
Then Mk',BPB (v) = PB (T (v)) = PB (Av) = ApB (v) Theorem FTMR [544] Definition EELT [574] Theorem VRLT [530] which by Definition EEM [396] says that PB (v) is an eigenvector of the matrix MB B for the eigenvalue A. (<) Assume that PB (v) is an eigenvector of MgB for the eigenvalue A. Then T v = PBl (pB(T(v))) = PBl (MBPB (v)) Definition IVLT [508] Theorem FTMR [544] Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 588 = pB1 (APB (v)) Definition EEM [396] = APB1 (PB (v)) Theorem ILTLT [511] = Av Definition IVLT [508] which by Definition EELT [574] says v is an eigenvector of T for the eigenvalue A. U Subsection CELT Computing Eigenvectors of Linear Transformations Knowing that the eigenvalues of a linear transformation are the eigenvalues of any representation, no matter what the choice of the basis B might be, we could now unambiguously define items such as the charac- teristic polynomial of a linear transformation, rather than a matrix. We'll say that again eigenvalues, eigenvectors, and characteristic polynomials are intrinsic properties of a linear transformation, independent of the choice of a basis used to construct a matrix representation. As a practical matter, how does one compute the eigenvalues and eigenvectors of a linear transformation of the form T: V H V? Choose a nice basis B for V, one where the vector representations of the values of the linear transformations necessary for the matrix representation are easy to compute. Construct the matrix representation relative to this basis, and find the eigenvalues and eigenvectors of this matrix using the techniques of Chapter E [396]. The resulting eigenvalues of the matrix are precisely the eigenvalues of the linear transformation. The eigenvectors of the matrix are column vectors that need to be converted to vectors in V through application of pi1. Now consider the case where the matrix representation of a linear transformation is diagonalizable. The n linearly independent eigenvectors that must exist for the matrix (Theorem DC [436]) can be converted (via pB1) into eigenvectors of the linear transformation. A matrix representation of the linear transformation relative to a basis of eigenvectors will be a diagonal matrix an especially nice representation! Though we did not know it at the time, the diagonalizations of Section SD [432] were really finding especially pleasing matrix representations of linear transformations. Here are some examples. Example ELTT Eigenvectors of a linear transformation, twice Consider the linear transformation S: M22 HM22 defined by S a b b-c-3d -14a-15b-13c+dl c d 18a + 21bC+ 19c + 3d -6a-7b-7c-3d To find the eigenvalues and eigenvectors of S we will build a matrix representation and analyze the matrix. Since Theorem EER [586] places no restriction on the choice of the basis B, we may as well use a basis that is easy to work with. 
So set {f i 1 0 01 00 00 Then to build the matrix representation of S relative to B compute, 0 0-14-1 PB (S (X1)) =P s E18 -6) =PB (0xi + (-14)x2 + 18x3 + (-6)x4) =[18 .[-6]_ Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 589 PB (S (x2)) PB (S (x3)) PB (S (x4)) -1 -15 PB (21 -7-) -1 -13 PB ([19 i7]) PB( 3 -3 1 = PB ((-1)xi + (-15)x2 + 21x3 + (-7)x4) =PB ((-1) xi + (-13) x2 + 19x3 -|- (-7) x4) = -3 pB ((-3)xi + 1x2 + 3x3 + (-3)x4) = 3 -3_ L L -1 -15 21 -7 -13 19 So by Definition MR [542] we have 0 = S -14 - B,B [18 -6 -1 -1 -15 -13 21 19 -7 -7 -3 1 3 -3] Now compute eigenvalues and eigenvectors of the matrix representation of M with the techniques of Section EE [396]. First the characteristic polynomial, pM (x) = det (M -xI4)V= x4 -x3 -10x2 + 4x + 24 = (x- 3)(x- 2)(x+ 2)2 We could now make statements about the eigenvalues of M, but in light of Theorem EER [586] we can refer to the eigenvalues of S and mildly abuse (or extend) our notation for multiplicities to write as (3) = 1 1 as (-2) 2 Now compute the eigenvectors of M, A=3 A=2 M -3 -1 -1 -3] --3I4 -14 -18 -13 1 RRE 18 21 16 3 -6 -7 -7 -6] -_ K - EM (3) = N(M - 3I4) = 3 1 F 0 0 0 0 0 0 0 0 0 0 0 01 0 0 0 Q1 0 1 -3 3 0] 2 -4 3 0_ -2 -14 M-214 18 [-6 -1 -17 21 -7 -1 -13 17 -7 -3] 1 R 3 -5] --2 -3 1 REF 0 0 _0 Em (2) = (M - 2I4) K= Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 590 A=-2 2 -1 -1 -3] 1 0 0 -1 -(-2)4 -14 -13 -13 1 RREF 0 W 1 1 M -(-) 18 21 21 3 0 0 0 0 [-6 -7 -7 -1]0 0 0 0] 0 1 Em (-2) = N(M - (-2)I14) = -1 1 0 . 0 1 According to Theorem EER [586] the eigenvectors just listed as basis vectors for the eigenspaces of M are vector representations (relative to B) of eigenvectors for S. So the application if the inverse function pil will convert these column vectors into elements of the vector space M22 (2 x 2 matrices) that are eigenvectors of S. Since PB is an isomorphism (Theorem VRILT [535]), so is pil. Applying the inverse function will then preserve linear independence and spanning properties, so with a sweeping application of the Coordinatization Principle [538] and some extensions of our previous notation for eigenspaces and geometric multiplicities, we can write, P1 4 pn pn pg pn _- 3M -31 -2 4 -3j =11 0 -1 0 1 -1 0 (-1)xi + 3x2 + (-3)x3 + 1x4 (-2)xi + 4x2 + (-3)x3 + 1x4 Ox1 + (-1)x2 + 1x3 + Ox4 = 1xi + (-1) x2 + Ox3 + 1x4 =- [- [I -1 -3 -2 -3 1] 4] 1J 0 -1 1 0 1 -1 0 1 So Es (3)VK f= 3 ]}> Es (2) =K[ } ES (-2) = 0 -11-1 1 0 '-0 1_J with geometric multiplicities given by 7S(3)= 1 ys(2)= 1 ys (-2)= 2 only now relative to a linearly Suppose we now decided to build another matrix representation of S, independent set of eigenvectors of S, such as c - {[--1 31 -2 3 1]' [-3 ]' LO 01i]' [0 11J Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 591 At this point you should have computed enough matrix representations to predict that the result of representing S relative to C will be a diagonal matrix. Computing this representation is an example of how Theorem SCB [583] generalizes the diagonalizations from Section SD [432]. For the record, here is the diagonal representation, 3 0 s0 2 MCS)C 0 0 _0 0 0 0 0 0 -2 0 0 -2_ Our interest in this example is not necessarily building nice representations, but instead we want to demon- strate how eigenvalues and eigenvectors are an intrinsic property of a linear transformation, independent of any particular representation. 
To this end, we will repeat the foregoing, but replace B by another basis. We will make this basis different, but not extremely so, 4> 1 0 1 1 1 1 1 1 ThentoDu h m x r tao of = 0 0_ ' 0 0_ 1 0_' 1 1_ Then to build the matrix representation of S relative to D compute, PD (S (Yi)) PD (S (Y2)) PD (S (Y3)) PD (S (Y4)) 0 -14 PD ( E -61) PD (39 I1]) -2 -42 PD (58 -20) -5 -41 PD (-61 -2_ - PD (14Y1 + (-32)Y2 + 24y3 + (-6)y4) PD (281 + (-68)Y2 + 52y3 + (-13)Y4) PD (40y1 + (-100)Y2 + 78y3 + (-20)y4) PD (36Y1 + (-102)Y2 + 84y3 + (-23)y4) 14 -32 24 -6 j 28 568 --13_ 40 -100 =178 -20 36 -102 -23] So by Definition MR [542] we have N= MD,D 14 -32 24 -6 28 -68 52 -13 40 -100 78 -20 36 -102 84 -23_ Now compute eigenvalues and eigenvectors of the matrix representation of N with the techniques of Section EE [396]. First the characteristic polynomial, pN (x) = det (N -xI4) = x4 - x3 - 10x2 + 4x + 24 = (x- 3)(x- 2)(x+ 2)2 Of course this is not news. We now know that M = M% B and N= MDj% are similar matrices (Theorem SCB [583]). But Theorem SMEE [434] told us long ago that similar matrices have identical characteristic Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 592 polynomials. Now compute eigenvectors for the matrix representation, which will be different than what we found for M, A=3 11 28 40 361 -32 -71 -100 -102 N-3I4= 24 52 75 84 [-6-13 -20 -26_ EN(3) = V(N - 3I4) = 4 12 28 40 36 1 N 32-70 -100 -102 N-24 24 52 76 84 [-6-13 -20 -25] -6 7 EN (2) = N (N - 2I14) = 1_ 1 RREF 0 0 [0 0 1 0 0 0 0 1 0 4 -6 4 0] A=2 1 0 0 6 RREF 0 1 0 -7 0 0 1 4 _0 0 0 0_ A = -2 16 N - (-2)I4 -32 24 L-6 28 -66 52 -13 40 -100 80 -20 36 1 -102 RREF 0 84 0 -21 0 1 3 -2 -3 .0 _ _1 0 1 0 0 -1 2 0 0 -3 3 0 0] EN (-2) A=f(N - (-2)14) = Employing Theorem EER [586] we can obtain eigenvectors for S that also form apply p-1 to each of the basis vectors of the eigenspaces of N to bases for eigenspaces of S, p1 4 p1 1 l P l -41 6 _4 1 -6 7 _4 1 1 -2] 0 3 -3 0 (-4)yi + 6Y2 + (-4)y3 + 1y4 (-6)y1 + 7Y2 + (-4)y3 +1'y4 lyi + (-2)Y2 + ly3 + OY4 3Yi + (-3)Y2 + Oy3 + ly4 = [[ [- -1 31 -3 1] -2 -3 41 1J 0 -1 1 0 1 -21 1 1J Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 593 The eigenspaces for the eigenvalues of algebraic multiplicity 1 are exactly as before, Es (3)VK f= 3 ]}> Es (2) =K{2]} -3 1_ However, the eigenspace for A - -2 would at first glance appear to be different. Here are the two eigenspaces for A = -2, first the eigenspace obtained from M = Ms B, then followed by the eigenspace obtained from M = M,D Es (-2) ={[ i-'1 ['- 11] Es(-2)={0 -1 [1 -2]} Subspaces generally have many bases, and that is the situation here. With a careful proof of set equality, you can show that these two eigenspaces are equal sets. The key observation to make such a proof go is that 1 1 0+0 11 which will establish that the second set is a subset of the first. With equal dimensions, Theorem EDYES [358] will finish the task. So the eigenvalues of a linear transformation are independent of the matrix representation employed to compute them! Another example, this time a bit larger and with complex eigenvalues. 
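Before moving on to that larger example, here is a quick machine check of the two claims just made: the representations M and N share a characteristic polynomial, and the two bases obtained for E_S(-2) span the same subspace of M_{22}. The sketch uses Python with sympy, as an aside; the 2 × 2 eigenvectors are flattened to columns of their entries so the spans can be compared by rank.

```python
import sympy as sp

M = sp.Matrix([[  0,  -1,   -1,   -3], [-14, -15,  -13,    1],
               [ 18,  21,   19,    3], [ -6,  -7,   -7,   -3]])
N = sp.Matrix([[ 14,  28,   40,   36], [-32, -68, -100, -102],
               [ 24,  52,   78,   84], [ -6, -13,  -20,  -23]])

x = sp.symbols('x')
# Similar matrices have identical characteristic polynomials (Theorem SMEE).
print(sp.expand(M.charpoly(x).as_expr() - N.charpoly(x).as_expr()) == 0)   # True

# The two bases found for E_S(-2), each 2 x 2 matrix written as a column of
# its entries: first from the representation relative to B, then relative to D.
from_B = sp.Matrix.hstack(sp.Matrix([0, -1, 1, 0]), sp.Matrix([1, -1, 0, 1]))
from_D = sp.Matrix.hstack(sp.Matrix([0, -1, 1, 0]), sp.Matrix([1, -2, 1, 1]))

# Equal spans: each spanning set has rank 2, and combining them adds nothing.
combined = sp.Matrix.hstack(from_B, from_D)
print(from_B.rank() == from_D.rank() == combined.rank() == 2)              # True
```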
Example CELT Complex eigenvectors of a linear transformation Consider the linear transformation Q: P4 H P4 defined by Q (a + bz + cx2 + dx3+ex4) (-46a - 22b + 13c + 5d + e) + (117a + 57b - 32c - 15d - 4e)x+ (-69a - 29b+ 21c - 7e)2 + (159a + 73b - 44c - 13d + 2e)x3+ (-195a - 87b + 55c + 10d - 13e)x4 Choose a simple basis to compute with, say B = {1, X, X2, x3, X4} Then it should be apparent that the matrix representation of Q relative to B is -46 -22 13 5 1 117 57 -32 -15 -4 M=Mg~B= -69 -29 21 0 -7 159 73 -44 -13 2 -195 -87 55 10 -13_ Compute the characteristic polynomial, eigenvalues and eigenvectors according to the techniques of Section EE [396], pQ (x) =-xa + 65' - z3- 88x2 + 252x - 208 =-z- 2)2(z + 4) (z2 - 6x + 13) Version 2.02  Subsection CB.CELT Computing Eigenvectors of Linear Transformations 594 ((x-2)2(x+4)(x-(3+2i))(x-(3-2i)) aQ (2) = 2 aq (-4) 1 aq (3 +2i) = 1 aq (3 - 2i) = 1 A=2 -48 -22 13 5 1 1 0 0 1 - 2 5 52 117 55 -32 -15 -4 0 1 0 _5 -5 M - (2)I5 = -69 -29 19 0 -7 RREF 0 0 1 -2 -6 159 73 -44 -15 2 0 0 0 0 0 -195 -87 55 10 -15 0 0 0 0 0 _ -1 -1 1 255 5 EM(2) = N(M - (2)I5) = 2 6 = 4 , 12 1 0 2 0 _0 _ L_ _]_0 _ _2 _ A =-4 -42 -22 13 5 1 1 0 0 0 1 117 61 -32 -15 -4 0 1 0 0 -3 M - (-4)I5= -69 -29 25 0 -7 RREF 0 0 1 0 -1 159 73 -44 -9 2 0 0 0 1 -2 -195 -87 55 10 -9_ 0 0 0 0 0 _ -1I 3 EM (-4) = N(M - (-4)I[5) = 1 2 _ 1 _ A =3 + 2i M - (3 + 2i)15 EM (3 + 2i) -49 - 2i 117 -69 159 -195 -22 54- 2i -29 73 -87 13 5 1 1 0 0 0 -4+ -32 -15 -4 0 1 0 0 - 18-2i 0 -7 RREF 0 0 1 0-2+ -44 -16-2i 2 0 0 0 1 - 55 10 -16-2i 0 0 0 0 0 -{ 3-i - = 2- 2i _ 1 __ 4 _ :A(M - (3 + 2i)15) : A=3-2i M - (3 - 2i)I5 -49 + 2i 117 -69 159 -195 -22 13 54 + 2i -32 -29 18 + 2i 73 -44 -87 55 5 -15 0 -16 + 2i 10 1 1 0 0 0 -3 - -4 0 1 0 0 + -7 RREF 0 0 1 0 -1- 2 .2 2 0 0 0 1 4+ -16+2i 0 0 0 0 0 Version 2.02  Subsection CB.CELT Computing Bigenvectors of Linear Transformations 595 aplyngth iomrpis-i1 -1 5 pB' 4 2 0 1 5 pB' 12 0 2 -1 3 pB' 1 2 1 3-i -7+i pB1 2-2i -7+i 4 3+i -7-i pB' 2+2i -7-i 4 -1 +5x +4x2 +2x3 1 + 5x + 12x2 + 2x4 -1+3x+x2+2x3+x4 (3-i)+ (-7+i)x+ (2-2i)x2+ (-7+i)x3+4x4 (3 +i) +(-7 -i)x +(2 +2i)x2 +(-7 -i)x3 +4x4 So we apply Theorem EER [586] and the Coordinatization Principle [538] to get the eigenspaces for Q, SQ (2) K{-1 + 5x + 4x2 + 2x3, 1 + 5x + 12x2 +2X £'Q (-4) K{-1 + 3x + x2 + 2x3 + x} EQ (3 +2i) _K{(3-i)+ (-7+i)x+(2-2i)x+(-7+i)x+4x4}) EQ (3 -2i) _K{(3+i)+(-7-i)x+(2+2i)x+(-7-i)x+4x4}) with geometric multiplicities y)Q (2) =2 y(-4) =1 ;/Q (3 + 2i) =1 -/Q (3 -2i)=1 Version 2.02  Subsection CB.READ Reading Questions 596 Subsection READ Reading Questions 1. The change-of-basis matrix is a matrix representation of which linear transformation? 2. Find the change-of-basis matrix, CB,C, for the two bases of C2 3 W-1htdinsg 3-' 2 0 ' 1 3. What is the third "surprise," and why is it surprising? Version 2.02  Subsection CB.EXC Exercises 597 Subsection EXC Exercises C20 In Example CBCV [579] we computed the vector representation of y relative to C, pc (y), as an example of Theorem CB [576]. Compute this same representation directly. In other words, apply Definition VR [530] rather than Theorem CB [576]. Contributed by Robert Beezer C21 Perform a check on Example MRCM [581] by computing MBD directly. In other words, apply Definition MR [542] rather than Theorem MRCB [581]. Contributed by Robert Beezer Solution [597] C30 Find a basis for the vector space P3 composed of eigenvectors of the linear transformation T. Then find a matrix representation of T relative to this basis. 
T:P3F P3, T(a+bx+cx2+dx3) _(a+c+d)+(b+c+d)x+(a+b+c)x2+(a+b+d)x3 Contributed by Robert Beezer Solution [597] C40 Let S22 be the vector space of 2 x 2 symmetric matrices. Find a basis B for S22 that yields a diagonal matrix representation of the linear transformation R. (15 points) a b -a+ 2b-3c -12a+5b-6c b c -12a+ 5b - 6c 6a - 2b+4c _ Contributed by Robert Beezer Solution [598] C41 Let S22 be the vector space of 2 x 2 symmetric matrices. Find a basis for S22 composed of eigenvectors of the linear transformation Q: S22 S22. (15 points) a b] 25a + 18b + 30c -16a - 11b - 20c Q b c -16a - 11 b- 20c -11a - 9b- 12c _ Contributed by Robert Beezer Solution [599] T10 Suppose that T: V H V is an invertible linear transformation with a nonzero eigenvalue A. Prove 1 that -is an eigenvalue of T-1. Contributed by Robert Beezer Solution [599] T15 Suppose that V is a vector space and T: V a V is a linear transformation. Prove that T is injective if and only if A =0 is not an eigenvalue of T. Contributed by Robert Beezer Version 2.02  Subsection CB.SOL Solutions 598 Subsection SOL Solutions C21 Contributed by Robert Beezer Statement [596] Apply Definition MR [542], PD Q([23 2 ) PD(19 + 14x - 2x28x3 = PD ((-39)(2 + x - 2x2 + 3x3) + 62(-1 - 2x2 + 3x3) + (-53)(-3 - x + z3) + (-44)( -2 + --39 _62 -53 -44 PD (Q(3 031)) PD(16 + 9x - 7x2314x3 PD ((-23)(2 + x - 22 + 33) +(34)(-1-22+ 3x3) + (-32)(-3 - x + 3) + (-15)(-X2 -23 34 -32 --15_ PD (Q([ 1)) PD (25 + 9x + 3x2 + 4x3 PD ((14)(2 + x - 222+ 3x3) + (-12)(-1 - 22 + 3x3) + 5(-3 - x + z3) + (-7)(-2 + x3)) 14 -12 5 -7_ 3)) + z3-) These three vectors are the columns of the matrix representation, -39 -23 14 MQ - 62 34 -12 B,D -53 -32 5 -44 -15 -7] which coincides with the result obtained in Example MRCM [581]. C30 Contributed by Robert Beezer Statement [596] With the domain and codomain being identical, we will build a matrix representation using the same basis for both the domain and codomain. The eigenvalues of the matrix representation will be the eigenvalues of the linear transformation, and we can obtain the eigenvectors of the linear transformation by un- coordinatizing (Theorem EER [586]). Since the method does not depend on which basis we choose, we can choose a natural basis for ease of computation, say, B = {1, c, x2, x3} Version 2.02  Subsection CB.SOL Solutions 599 The matrix representation is then, 1 0 1 1 MT 0 1 1 1 MB 1 1 1 0 1 1 0 1 The eigenvalues and eigenvectors of this matrix were computed in Example ESMS4 [407]. A basis for C4, composed of eigenvectors of the matrix representation is, 1 -1 0 -1 - 1 1 0 -1 C 1' 0 ' -1 '1 1 0 1 1 Applying p-1 to each vector of this set, yields a basis of P3 composed of eigenvectors of T, D ={ {+ z +9z2 + x3, -1+ x, -2 - x3, -1 - x +x 2 x3} The matrix representation of T relative to the basis D will be a diagonal matrix with the corresponding eigenvalues along the diagonal, so in this case we get 3 0 0 0 MT 0 1 0 0 MD'D 0 0 1 0 0 0 0 -1 C40 Contributed by Robert Beezer Statement [596] Begin with a matrix representation of R, any matrix representation, but use the same basis for both instances of S22. We'll choose a basis that makes it easy to compute vector representations in 522. 
B { 0 0 1 0 0 B 0 0_' 1 0_' 0 1_ Then the resulting matrix representation of R (Definition MR [542]) is -5 2 -3 MB, = -12 5 -6 6 -2 4 Now, compute the eigenvalues and eigenvectors of this matrix, with the goal of diagonalizing the matrix (Theorem DC [436]), A =2 EMR (2)~ = -2 The three vectors that occur as basis elements for these eigenspaces will together form a linearly inde- pendent set (check this!). So these column vectors may be employed in a matrix that will diagonalize the matrix representation. If we "un-coordinatize" these three column vectors relative to the basis B, we Version 2.02  Subsection CB.SOL Solutions 600 will find three linearly independent elements of S22 that are eigenvectors of the linear transformation R (Theorem EER [586]). A matrix representation relative to this basis of eigenvectors will be diagonal, with the eigenvalues (A = 2, 1) as the diagonal elements. Here we go, -1 1 r 1 r pBl -2 = (-1) L J +(-2) L J + 1 LO 01] - [-2 12J 1 -1 1 pBl 0 = (-1) J + 0 0 1 1 + 2 0 Ol _ o1 Ol 2 J Lo J L J 1 1 0 0 1 0 0 1 3 pBl = 1 [0 0] +3 [1 0] +0 [0 1] [3 0] La]) So the requested basis of 522, yielding a diagonal matrix representation of R, is (1 -1 -2 -10L 113 -2 1_ 0 2J '[3 0J C41 Contributed by Robert Beezer Statement [596] Use a single basis for both the domain and codomain, since they are equal. The matrix representation of Q relative to B is 25 18 30 M= M = -16 -11 -20 [-11 -9 -12] We can analyze this matrix with the techniques of Section EE [396] and then apply Theorem EER [586]. The eigenvalues of this matrix are A = -2, 1, 3 with eigenspaces Em (-2) - ( -6 4 3 EM (1) = {[2] } E (3) K {2 Y]J}> Because the three eigenvalues are distinct, the three basis vectors from the three eigenspaces for a linearly independent set (Theorem EDELI [419]). Theorem EER [586] says we can uncoordinatize these eigenvectors to obtain eigenvectors of Q. By Theorem ILTLI [485] the resulting set will remain linearly independent. Set -6 -2 -321 32 C -I 4 -- [-I 2- 64] -2 1 1 -3[2 3 _1 _1 _ Then C is a linearly independent set of size 3 in the vector space M22, which has dimension 3 as well. By Theorem G [355], C is a basis of M22. T10 Contributed by Robert Beezer Statement [596] Let v be an eigenvector of T for the eigenvalue A. Then, T-1 (v) = AT-1 (v) A-#0 Version 2.02  Subsection CB.SOL Solutions 601 T-1 (Av) 'T-1 (T (v)) -Iv (v) 1 -v Theorem ILTLT [511] v eigenvector of T Definition IVLT [508] Definition IDLT [508] 1 which says that - is an eigenvalue of T-1 with eigenvector v. Note that it is possible to prove that any A eigenvalue of an invertible linear transformation is never zero. So the hypothesis that A be nonzero is just a convenience for this problem. Version 2.02  Section OD Orthonormal Diagonalization 602 Section OD Orthonormal Diagonalization THIS SECTION IS IN DRAFT FORM THEOREMS & DEFINITIONS ARE COMPLETE, NEEDS EXAMPLES We have seen in Section SD [432] that under the right conditions a square matrix is similar to a diagonal matrix. We recognize now, via Theorem SCB [583], that a similarity transformation is a change of basis on a matrix representation. So we can now discuss the choice of a basis used to build a matrix representation, and decide if some bases are better than others for this purpose. This will be the tone of this section. We will also see that every matrix has a reasonably useful matrix representation, and we will discover a new class of diagonalizable linear transformations. First we need some basic facts about triangular matrices. 
Subsection TM Triangular Matrices An upper, or lower, triangular matrix is exactly what it sounds like it should be, but here are the two relevant definitions. Definition UTM Upper Triangular Matrix The n x n square matrix A is upper triangular if [A]zj = 0 whenever i > j. A Definition LTM Lower Triangular Matrix The n x n square matrix A is lower triangular if [A]zj = 0 whenever i < j. A Obviously, properties of a lower triangular matrices will have analogues for upper triangular matrices. Rather than stating two very similar theorems, we will say that matrices are "triangular of the same type" as a convenient shorthand to cover both possibilities and then give a proof for just one type. Theorem PTMT Product of Triangular Matrices is Triangular Suppose that A and B are square matrices of size n that are triangular of the same type. Then AB is also triangular of that type. D Proof We prove this for lower triangular matrices and leave the proof for upper triangular matrices to you. Suppose that A and B are both lower triangular. We need only establish that certain entries of the product AB are zero. Suppose that i < j, then [ AB]gg =>3 [ A]ik [BlkJ Theorem EMP [198] k=1 j-1 12 S [A]gg [B]kJ + S [A]gg [B]kJ Property AACN [680] k=1 k~j j-1 12 S [ A]; 0 +(5 [ A]ik [B]kJ k < j, Definition LT M [601] k=1 k=j j-1 n 5 [A]2k 0 + 50 [B]kJ i C The inverse of an upper triangular matrix is upper triangular (Theorem ITMT [602]), and the product of two upper triangular matrices is again upper triangular (Theorem PTMT [601]). So MCc is an upper triangular matrix. Version 2.02  Subsection OD.NM Normal Matrices 607 Now, multiply each vector of C by a nonzero scalar, so that the result has norm 1. In this way we create a new basis D which is an orthonormal set (Definition ONS [177]). Note that the change-of-basis matrix CC,D is a diagonal matrix with nonzero entries equal to the norms of the vectors in C. Now we can convert our results into the language of matrices. Let E be the basis of C" formed with the standard unit vectors (Definition SUV [173]). Then the matrix representation of S relative to E is simply A, A = ME s. The change-of-basis matrix CD,E has columns that are simply the vectors in D, the orthonormal basis. As such, Theorem CUMOS [230] tells us that CD,E is a unitary matrix, and by Definition UM [229] has an inverse equal to its adjoint. Write U = CD,E. We have U*AU = U-1AU Theorem UMI [230] = CDEME,ECD,E = M2,D Theorem SCB [583] = CC,DMNC,CC D Theorem SCB [583] The inverse of a diagonal matrix is also a diagonal matrix, and so this final expression is the product of three upper triangular matrices, and so is again upper triangular (Theorem PTMT [601]). Thus the desired upper triangular matrix, T, is the matrix representation of S relative to the orthonormal basis D, D,D' Subsection NM Normal Matrices Normal matrices comprise a broad class of interesting matrices, many of which we have met already. But they are most interesting since they define exactly which matrices we can diagonalize via a unitary matrix. This is the upcoming Theorem OD [607]. Here's the definition. Definition NRML Normal Matrix The square matrix A is normal if A*A = AA*. A So a normal matrix commutes with its adjoint. Part of the beauty of this definition is that it includes many other types of matrices. A diagonal matrix will commute with its adjoint, since the adjoint is again diagonal and the entries are just conjugates of the entries of the original diagonal matrix. 
A Hermitian (self-adjoint) matrix (Definition HM [205]) will trivially commute with its adjoint, since the two matrices are the same. A real, symmetric matrix is Hermitian, so these matrices are also normal. A unitary matrix (Definition UM [229]) has its adjoint as its inverse, and inverses commute (Theorem OSIS [227]), so unitary matrices are normal. Another class of normal matrices is the skew-symmetric matrices. However, these broad descriptions still do not capture all of the normal matrices, as the next example shows. Example ANM A normal matrix Let Then 1 1_J - 1 1_- 0 2] -11 1_ 1 1_J so we see by Definition NRML [606] that A is normal. However, A is not symmetric (hence, as a real matrix, not Hermitian), not unitary, and not skew-symmetric. Version 2.02  Subsection OD.OD Orthonormal Diagonalization 608 Subsection OD Orthonormal Diagonalization A diagonal matrix is very easy to work with in matrix multiplication (Example HPDM [441]) and an orthonormal basis also has many advantages (Theorem COB [332]). How about converting a matrix to a diagonal matrix through a similarity transformation using a unitary matrix (i.e. build a diagonal matrix representation with an orthonormal matrix)? That'd be fantastic! When can we do this? We can always accomplish this feat when the matrix is normal, and normal matrices are the only ones that behave this way. Here's the theorem. Theorem OD Orthonormal Diagonalization Suppose that A is a square matrix. Then there is a unitary matrix U and a diagonal matrix D, with diagonal entries equal to the eigenvalues of A, such that U*AU = D if and only if A is a normal matrix. D Proof (-) Suppose there is a unitary matrix U that diagonalizes A, resulting in D, i.e. U*AU = D. We check the normality of A, A*A = InA*InAIn = UU*A*UU*AUU* = UU*A*UDU* = UU*A* (U*)* DU* = U (U*AU)* DU* = UD*DU* = U (D)DU* = UDDU* = UDDU* = UD (D)tU* = UDD*U* = UD (U*AU)* U* = UDU*A* (U*)* U* = UDU*A*UU* = UU*AUU*A*UU* = InAInA*In = AA* Theorem MMIM [200] Definition UM [229] Theorem AA [190] Adjoint of a product Definition A [189] Diagonal matrix Property CMCN [680] Diagonal matrix Definition A [189] Adjoint of a product Theorem AA [190] Definition UM [229] Theorem MMIM [200] So by Definition NRML [606], A is a normal matrix. (<) For the converse, suppose that A is a normal matrix. Whether or not A is normal, Theorem OBUTR [605] provides a unitary matrix U and an upper triangular matrix T, whose diagonal entries are the eigenvalues of A, and such that U*AU = T. With the added condition that A is normal, we will determine that the entries of T above the diagonal must be all zero. Here we go. First we show that T is normal. T*T = (U*AU)* U*AU = U*A* (U*)* U*AU = U*A*UU*AU = U*A*InAU Adjoint of a product Theorem AA [190] Definition UM [229] Version 2.02  Subsection OD.OD Orthonormal Diagonalization 609 U*A*AU U*AA*U U*AInA*U U*AUU*A*U U*AUU*A* (U*)* U*AU (U*AU)* TT* Theorem MMIM [200] Definition NRML [606] Theorem MMIM [200] Definition UM [229] Theorem AA [190] Adjoint of a product So by Definition NRML [606], T is a normal matrix. We can translate the normality of T into the statement TT* we will use repeatedly. For 1 < i < n, - T*T = 0. 
We now establish an equality that we will use repeatedly: for each i with 1 ≤ i ≤ n,

0 = [O]_{ii}    Definition ZM [185]
= [TT* - T*T]_{ii}    Definition NRML [606]
= [TT*]_{ii} - [T*T]_{ii}    Definition MA [182]
= Σ_{k=1}^{n} [T]_{ik}[T*]_{ki} - Σ_{k=1}^{n} [T*]_{ik}[T]_{ki}    Theorem EMP [198]
= Σ_{k=1}^{n} [T]_{ik} conj([T]_{ik}) - Σ_{k=1}^{n} conj([T]_{ki}) [T]_{ki}    Definition A [189]
= Σ_{k=i}^{n} [T]_{ik} conj([T]_{ik}) - Σ_{k=1}^{i} conj([T]_{ki}) [T]_{ki}    Definition UTM [601]
= Σ_{k=i}^{n} |[T]_{ik}|^2 - Σ_{k=1}^{i} |[T]_{ki}|^2    Definition MCN [682]

To conclude, we use the above equality repeatedly, beginning with i = 1, and discover, row by row, that the entries above the diagonal of T are all zero. The key observation is that a sum of squares can only equal zero when each term of the sum is zero. For i = 1 we have

0 = Σ_{k=1}^{n} |[T]_{1k}|^2 - Σ_{k=1}^{1} |[T]_{k1}|^2 = Σ_{k=2}^{n} |[T]_{1k}|^2

which forces the conclusions

[T]_{12} = 0    [T]_{13} = 0    [T]_{14} = 0    ...    [T]_{1n} = 0

For i = 2 we use the same equality, but also incorporate the portion of the above conclusions that says [T]_{12} = 0,

0 = Σ_{k=2}^{n} |[T]_{2k}|^2 - Σ_{k=1}^{2} |[T]_{k2}|^2 = Σ_{k=2}^{n} |[T]_{2k}|^2 - |[T]_{22}|^2 = Σ_{k=3}^{n} |[T]_{2k}|^2

which forces the conclusions

[T]_{23} = 0    [T]_{24} = 0    [T]_{25} = 0    ...    [T]_{2n} = 0

We can repeat this process for the subsequent values of i = 3, 4, 5, ..., n - 1. Notice that it is critical we do this in order, since we need to employ portions of each of the previous conclusions about rows having zero entries in order to successfully get the same conclusion for later rows. Eventually, we conclude that all of the nondiagonal entries of T are zero, so the extra assumption of normality forces T to be diagonal.

We can rearrange the conclusion of this theorem to read A = UDU*. Recall that a unitary matrix can be viewed as a geometry-preserving transformation (isometry), or more loosely as a rotation of sorts. Then a matrix-vector product, Ax, can be viewed instead as a sequence of three transformations. U* is unitary, so is a rotation. Since D is diagonal, it just multiplies each entry of a vector by a scalar. Diagonal entries that are positive or negative, with absolute values bigger or smaller than 1, evoke descriptions like reflection, expansion and contraction. Generally we can say that D "stretches" a vector in each component. Final multiplication by U undoes (inverts) the rotation performed by U*. So a normal matrix is a rotation-stretch-rotation transformation.

The orthonormal basis formed from the columns of U can be viewed as a system of mutually perpendicular axes. The rotation by U* allows the transformation by A to be replaced by the simple transformation D along these axes, and then U brings the result back to the original coordinate system. For this reason Theorem OD [607] is known as the Principal Axis Theorem.

The columns of the unitary matrix in Theorem OD [607] create an especially nice basis for use with the normal matrix. We record this observation as a theorem.

Theorem OBNM
Orthonormal Bases and Normal Matrices
Suppose that A is a normal matrix of size n. Then there is an orthonormal basis of C^n composed of eigenvectors of A.

Proof   Let U be the unitary matrix promised by Theorem OD [607] and let D be the resulting diagonal matrix. The desired set of vectors is formed by collecting the columns of U into a set. Theorem CUMOS [230] says this set of columns is orthonormal. Since U is nonsingular (Theorem UMI [230]), Theorem CNMB [330] says the set is a basis. Since A is diagonalized by U, the diagonal entries of the matrix D are the eigenvalues of A. An argument exactly like the second half of the proof of Theorem DC [436] shows that each vector of the basis is an eigenvector of A.
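Theorem OD is also pleasant to test with software. The sketch below (Python with sympy, an aside) takes a small normal matrix that is neither Hermitian, unitary, nor skew-symmetric, in the spirit of Example ANM [606] (the particular entries here are simply an illustration), checks that it commutes with its adjoint, and then assembles a unitary U from normalized eigenvectors so that U*AU is diagonal.

```python
import sympy as sp

# A normal matrix that is not Hermitian, not unitary, and not skew-symmetric;
# the specific entries are only an illustration of Theorem OD.
A = sp.Matrix([[1, -1],
               [1,  1]])

# Normality: A commutes with its adjoint (conjugate-transpose), Definition NRML.
print(sp.simplify(A.H * A - A * A.H) == sp.zeros(2, 2))      # True

# The eigenvalues 1 + i and 1 - i are distinct, so normalized eigenvectors are
# automatically orthogonal; stacking them as columns gives a unitary matrix U.
columns = []
for eigenvalue, mult, vectors in A.eigenvects():
    for v in vectors:
        columns.append(v / v.norm())
U = sp.Matrix.hstack(*columns)

print(sp.simplify(U.H * U) == sp.eye(2))                      # U is unitary
print(sp.simplify(U.H * A * U).is_diagonal())                 # U* A U is diagonal
```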
U In a vague way Theorem OBNM [609] is an improvement on Theorem HMOE [428] which said that eigenvectors of a Hermitian matrix for different eigenvalues are always orthogonal. Hermitian matrices are normal and we see that we can find at least one basis where every pair of eigenvectors is orthogonal. Notice that this is not a generalization, since Theorem HMOE [428] states a weak result which applies to many (but not all) pairs of eigenvectors, while Theorem OBNM [609] is a seemingly stronger result, but only asserts that there is one collection of eigenvectors with the stronger property. Version 2.02  Section NLT Nilpotent Linear Transformations 611 Section NLT Nilpotent Linear Transformations THIS SECTION IS IN DRAFT FORM NEARLY COMPLETE We have seen that some matrices are diagonalizable and some are not. Some authors refer to a non- diagonalizable matrix as defective, but we will study them carefully anyway. Examples of such matrices include Example EMMS4 [406], Example HMEM5 [408], and Example CEMS6 [409]. Each of these matrices has at least one eigenvalue with geometric multiplicity strictly less than its algebraic multiplicity, and therefore Theorem DMFE [438] tells us these matrices are not diagonalizable. Given a square matrix A, it is likely similar to many, many other matrices. Of all these possibilities, which is the best? "Best" is a subjective term, but we might agree that a diagonal matrix is certainly a very nice choice. Unfortunately, as we have seen, this will not always be possible. What form of a matrix is "next-best"? Our goal, which will take us several sections to reach, is to show that every matrix is similar to a matrix that is "nearly-diagonal" (Section JCF [644]). More precisely, every matrix is similar to a matrix with elements on the diagonal, and zeros and ones on the diagonal just above the main diagonal (the "super diagonal"), with zeros everywhere else. In the language of equivalence relations (see Theorem SER [433]), we are determining a systematic representative for each equivalence class. Such a representative for a set of similar matrices is called a canonical form. We have just discussed the determination of a canonical form as a question about matrices. However, we know that every square matrix creates a natural linear transformation (Theorem MBLT [459]) and every linear transformation with identical domain and codomain has a square matrix representation for each choice of a basis, with a change of basis creating a similarity transformation (Theorem SCB [583]). So we will state, and prove, theorems using the language of linear transformations on abstract vector spaces, while most of our examples will work with square matrices. You can, and should, mentally translate between the two settings frequently and easily. Subsection NLT Nilpotent Linear Transformations We will discover that nilpotent linear transformations are the essential obstacle in a non-diagonalizable linear transformation. So we will study them carefully first, both as an object of inherent mathematical interest, but also as the object at the heart of the argument that leads to a pleasing canonical form for any linear transformation. Once we understand these linear transformations thoroughly, we will be able to easily analyze the structure of any linear transformation. Definition NLT Nilpotent Linear Transformation Suppose that T: V a V is a linear transformation such that there is an integer p > 0 such that TP (v) = 0 for every v E V. 
The smallest p for which this condition is met is called the index of T. A Of course, the linear transformation T defined by T (v) =0 will qualify as nilpotent of index 1. But are there others? Example NM64 Nilpotent matrix, size 6, index 4 Recall that our definitions and theorems are being stated for linear transformations on abstract vector spaces, while our examples will work with square matrices (and use the same terms interchangeably). In Version 2.02  Subsection NLT.NLT Nilpotent Linear Transformations 612 this case, to demonstrate the existence of nontrivial nilpotent linear transformations, we desire a matrix such that some power of the matrix is the zero matrix. Consider A -3 -3 -3 -3 -3 -2 3 5 4 3 3 3 -2 -3 -2 -2 -2 -2 5 4 6 5 4 2 0 3 -4 0 2 4 -5 -9 -3 -5 -6 -7 and compute powers of A, A2 A3 A4 1 1 1 0 1 1 1 0 0 0 0 0 0 -2 -2 0 -2 -2 1 -2 00 00 00 00 00 00 00 00 00 00 00 00 1 1 0 1 1 1 -1 -1 0 -1 -1 -1 0 1 -3 0 1 2 0 0 0 0 0 0 )0 0 0 0 0 0 -3 -3 0 -3 -3 -3 0 0 0 0 0 0 4- 4 0 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0 Thus we can say that A is nilpotent of index 4. Because it will presage some upcoming theorems, we will record some extra information about the eigenvalues and eigenvectors of A here. A has just one eigenvalue, A = 0, with algebraic multiplicity 6 and geometric multiplicity 2. The eigenspace for this eigenvalue is SA (0) = 2 5 2 1 0 -1 -1 -5 -1 0 1 If there were degrees of singularity, we might say this matrix was very singular, since zero is an eigenvalue with maximum algebraic multiplicity (Theorem SMZE [420], Theorem ME [425]). Notice too that A is "far" from being diagonalizable (Theorem DMFE [438]). Another example. Example NM62 Nilpotent matrix, size 6, index 2 Version 2.02  Subsection NLT.NLT Nilpotent Linear Transformations 613 Consider the matrix -1 1 -1 4 -3 -1 1 1 -1 2 -3 -1 B -9 10 -5 9 5 -15 B -1 1 -1 4 -3 -1 1 -1 0 2 -4 2 4 -3 1 -1 -5 5 and compute the second power of B, 0 0 0 0 0 0 0 0 0 0 0 0 B2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 _0 0 0 0 0 0_ So B is nilpotent of index 2. Again, the only eigenvalue of B is zero, with algebraic multiplicity 6. The geometric multiplicity of the eigenvalue is 3, as seen in the eigenspace, 1 0 0 3 -4 2 B(O 6 -7 1 EB()= 1 ' > 0 1 0 0 0 _ 1 Again, Theorem DMFE [438] tells us that B is far from being diagonalizable. On a first encounter with the definition of a nilpotent matrix, you might wonder if such a thing was possible at all. That a high power of a nonzero object could be zero is so very different from our experience with scalars that it seems very unnatural. Hopefully the two previous examples were somewhat surprising. But we have seen that matrix algebra does not always behave the way we expect (Example MMNC [198]), and we also now recognize matrix products not just as arithmetic, but as function composition (Theorem MRCLT [549]). We will now turn to some examples of nilpotent matrices which might be more transparent. Definition JB Jordan Block Given the scalar A E C, the Jordan block Jn (A) is the n x n matrix defined by [J(A] 1ji~+1 {0 oherwise (This definition contains Notation JB.) /A Example JB4 Jordan block, size 4 A simple example of a Jordan block, 5 1 0 0 J4 (5) 0 5 1 0 0 0 5 1 _0 0 0 5_ Version 2.02  Subsection NLT.NLT Nilpotent Linear Transformations 614 We will return to general Jordan blocks later, but in this section we are just interested in Jordan blocks where A = 0. Here's an example of why we are specializing in these matrices now. 
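Before the promised example, a short computational aside: Definition JB translates directly into code. The sketch below (Python with sympy, an illustration only, not part of the text) builds J_n(λ) entry by entry and reproduces the block J_4(5) of Example JB4.

```python
import sympy as sp

def jordan_block(n, lam):
    """Return the n x n Jordan block J_n(lam) of Definition JB: lam on the
    diagonal, ones on the superdiagonal, zeros everywhere else."""
    J = sp.zeros(n, n)
    for i in range(n):
        J[i, i] = lam
        if i + 1 < n:
            J[i, i + 1] = 1
    return J

print(jordan_block(4, 5))      # the matrix J_4(5) of Example JB4
print(jordan_block(5, 0)**5)   # J_5(0) raised to its size is the zero matrix
```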
Example NJB5 Nilpotent Jordan block, size 5 Consider 0 1 0 0 0 0 0 1 0 0 J5 (0) = 0 0 0 1 0 0 0 0 0 1 _0 0 0 0 0_ and compute powers, 0 0 1 0 0 0 0 0 1 0 (J5 (0))2 = 0 0 0 0 1 0 0 0 0 0 _0 0 0 0 0_ 0 0 0 1 0 0 0 0 0 1 (J5 (0))3 = 0 0 0 0 0 0 0 0 0 0 _0 0 0 0 0_ 0 0 0 0 1 0 0 0 0 0 (J5(0))4 = 0 0 0 0 0 0 0 0 0 0 _0 0 0 0 0_ 0 0 0 0 0 0 0 0 0 0 (J5(0))5 = 0 0 0 0 0 0 0 0 0 0 _0 0 0 0 0_ So J5 (0) is nilpotent of index 5. As before, we record some information about the eigenvalues and eigen- vectors of this matrix. The only eigenvalue is zero, with algebraic multiplicity 5, the maximum possible (Theorem ME [425]). The geometric multiplicity of this eigenvalue is just 1, the minimum possible (The- orem ME [425]), as seen in the eigenspace, 1 SJ5(o) (0) K 0 0 There should not be any real surprises in this example. We can watch the ones in the powers of J5 (0) slowly march off to the upper-right hand corner of the powers. In some vague way, the eigenvalues and Version 2.02  Subsection NLT.NLT Nilpotent Linear Transformations 615 eigenvectors of this matrix are equally extreme. We can form combinations of Jordan blocks to build a variety of nilpotent matrices. Simply place Jordan blocks on the diagonal of a matrix with zeros everywhere else, to create a block diagonal matrix. Example NM83 Nilpotent matrix, size 8, index 3 Consider the matrix 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 [J3 (0) 0 00 0 0 0 0 0 00 J3 (0) 0 = 0 0 0 0 1 0 0000 J2 (0)j 0 0 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 0 0 0 01 0 0 01 0 0 0 0 0 1 0 0 and compute powers, 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 _0 0 0 0 0 0 0 0_ 0 0 0 0 0 0 0 0 C-0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 So C is nilpotent of index 3. You should notice how block diagonal matrices behave in products (much like diagonal matrices) and that it was the largest Jordan block that determined the index of this combination. All eight eigenvalues are zero, and each of the three Jordan blocks contributes one eigenvector to a basis fo heegesaersutngi00r0avn00go0 ti0 mli0iiy f3 Itwud perthtnlotn ari00nl0a0zeoa0n0i0vlus0teagbri utilct wilb temxiu osil. oeer0ycraig0lc0daoalmticswthJranbokso0h dignl o hol eabet ati aydsre0emercmutpiit0o0ti0on0ievau.Lieie th ie ftelags Jra lokepoydwlldtr0n0h0idxo0temtix00niptn0mtie wihvriu omiaioso idxad emtrcmltpiite0rees0o0auacue0Tepedcal properis lotetof ne .Yuhudntc o block diagonal matrices behti rdutn ienetrcmains roct (much like willxb therm mamumi possible. Hoeeb ceatfing bxmlc diNJ5[1asfl cswtoran bockhs poof. Theorem NJB Nilpotent Jordan Blocks The Jordan block J, (0) is nilpotent of index n. D Proof While not phrased as an if-then statement, the statement in the theorem is understood to mean that if we have a specific matrix (J, (0)) then we need to establish it is nilpotent of a specified index. The Version 2.02  Subsection NLT.PNLT Properties of Nilpotent Linear Transformations 616 first column of Jn (0) is the zero vector, and the remaining n - 1 columns are the standard unit vectors e2, 1 < i dim (K(Tk)) + 1. Repeated application of this observation yields dim (K(Tn+1)) > dim (K(TTh)) + 1 Version 2.02  Subsection NLT.PNLT Properties of Nilpotent Linear Transformations 618 > dim (1C(T"-1)) + 2 > dim (K (T0)) + (n + 1) dim ({0}) +n+ 1 Thus, K(Tn+l) has a basis of size at least n+ 1, which is a linearly independent set of size greater than n in the vector space V of dimension n. This contradicts Theorem G [355]. 
This contradiction yields the existence of an integer k such that K(Tk) - / ('k+1), so we can define m to be smallest such integer with this property. From the argument above about dimensions resulting from a strictly increasing chain of subspaces, it should be clear that m < n. It remains to show that once two consecutive kernels are equal, then all of the remaining kernels are equal. More formally, if 1C(Tm) = K (Tm+1),then K(Tm) = K(Tm+j) for all j > 1. We will give a proof by induction on j (Technique I [694]). The base case (j = 1) is precisely our defining property for m. In the induction step, we assume that K(Tm) =1C(Tm+j) and endeavor to show that K(Tm) K(Tm+j+l). At the outset of this proof we established that C(Tm) Cc (Tm+j+1). So Definition SE [684] requires only that we establish the subset inclusion in the opposite direction. To wit, choose z E K(Tm+j+l). Then 0 = Tm+j+1 (z) Definition KLT [481] = Tm+j (T (z)) Definition LTC [469] = Tm (T (z)) Induction Hypothesis = Tm+1 (z) Definition LTC [469] = Tm (z) Base Case So by Definition KLT [481], z E 1C(Tm) as desired. U We now specialize Theorem KPLT [616] to the case of nilpotent linear transformations, which buys us just a bit more precision in the conclusion. Theorem KPNLT Kernels of Powers of Nilpotent Linear Transformations Suppose T: V H V is a nilpotent linear transformation with index p and dim (V) = n. Then 0 < p n and {0} = K(T°) C KC(T A) C T2) C .---C MTp) - MTv+1) _ -.._V Proof Since TP 0 it follows that Tvj 0 for all j > 0 and thus K(TP~i) =V for j > 0. So the value of m guaranteed by Theorem KPLT [616] is at most p. The only remaining aspect of our conclusion that does not follow from Theorem KPLT [616] is that m =p. To see this we must show that KC (Tk) Q /C'(Tk+l) for 0 < k p - 1. If K(Tk) - /C(Tk+l) for some k < p, then KC(Tk) - /C('P) =V. This implies that Tk = 0, violating the fact that T has index p. So the smallest value of m is indeed p, and we learn that The structure of the kernels of powers of nilpotent linear transformations will be crucial to what follows. But immediately we can see a practical benefit. Suppose we are confronted with the question of whether or not an n x n matrix, A, is nilpotent or not. If we don't quickly find a low power that equals the zero matrix, when do we stop trying higher and higher powers? Theorem KPNLT [617] gives us the answer: if we don't see a zero matrix by the time we finish computing An, then it is not going to ever happen. We'll now take a look at one example of Theorem KPNLT [617] in action. Version 2.02  Subsection NLT.PNLT Properties of Nilpotent Linear Transformations 619 Example KPNLT Kernels of powers of a nilpotent linear transformation We will recycle the nilpotent matrix A of index 4 from Example NM64 [610]. We now know that would have only needed to look at the first 6 powers of A if the matrix had not been nilpotent. We list bases for the null spaces of the powers of A. (Notice how we are using null spaces for matrices interchangeably with kernels of linear transformations, see Theorem KNSI [552] for justification.) N(A) N(A2) N(A3) AF AF AF /-3 -3 K3 -3 -3 -2 /1 0 3 1 0 -1 / 1 0 3 5 4 3 3 3 -2 -2 0 -2 -2 -2 0 0 0 0 0 0 0 0 0 0 0 0 -2 -3 -2 -2 -2 -2 1 1 0 1 1 1 -1 -1 0 -1 -1 -1 5 4 6 5 4 2 0 1 -3 0 1 2 0 0 0 0 0 0 0 3 -4 0 2 4 -5 -9 -3 -5 -6 -7 4- 4 0 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0 -3 -3 0 -3 -3 -3 0 01 01 01 0) 0_/ I I N(A4) =N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 for 1 i p. 
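Theorem KPNLT [617] is easy to watch in action with software. The sketch below (Python with sympy, an aside; the matrix is a stand-in built from Jordan blocks rather than the A of Example KPNLT, though it is chosen to have the same kernel dimensions 2, 4, 5, 6) computes the null space dimension of each power and shows the strictly increasing chain stabilizing once the power reaches the index.

```python
import sympy as sp

# A stand-in nilpotent matrix of size 6 and index 4: Jordan blocks J_4(0) and
# J_2(0) on the diagonal (ones on the superdiagonal at the positions below).
A = sp.zeros(6, 6)
for i, j in [(0, 1), (1, 2), (2, 3), (4, 5)]:
    A[i, j] = 1

# dim N(A^k) for k = 0, 1, ..., 6: strictly increasing until k reaches the
# index p = 4, and constant (equal to n = 6) from then on.
for k in range(7):
    print(k, len((A**k).nullspace()))
# Output: 0 0, 1 2, 2 4, 3 5, 4 6, 5 6, 6 6
```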
Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 621 We are going to build a set of vectors zi,i, 1 < i < p, 1 j si. Each zi,3 will be an element of C (Ti) and not an element of C (Ti-1). In total, we will obtain a linearly independent set of 1si = 1 ni -ni-1 = p - 0o= dim (V) vectors that form a basis of V. We construct this set in pieces, starting at the "wrong" end. Our procedure will build a series of subspaces, Zi, each lying in between C (Ti-1) and C(Ti), having bases zi,, 1 < j si, and which together equal V as a direct sum. Now would be a good time to review the results on direct sums collected in Subsection PD.DS [361]. OK, here we go. We build the subspace Z, first (this is what we meant by "starting at the wrong end"). K (TP-1) is a proper subspace of C(TP) = V (Theorem KPNLT [617]). Theorem DSFOS [362] says that there is a subspace of V that will pair with the subspace C (TP-) to form a direct sum of V. Call this subspace Z,, and choose vectors zyj, 1 < j < s, as a basis of Zp, which we will denote as Bp. Note that we have a fair amount of freedom in how to choose these first basis vectors. Several observations will be useful in the next step. First V =1CK(TP-1) Z. The basis Bp = {zy,1, zp,2, zp,3, ... , z,s,} is linearly independent. For 1 < j si, zp,3 E C(Tp) = V. Since the two subspaces of a direct sum have no nonzero vectors in common (Theorem DSZI [363]), for 1 < j < si, zy,3 0 C(TP-1). That was comparably easy. If obtaining Z, was easy, getting Z,_1 will be harder. We will repeat the next step p - 1 times, and so will do it carefully the first time. Eventually, Z,_1 will have dimension sp-1. However, the first sp vectors of a basis are straightforward. Define zp_1,j = T (zpj), 1 j < sp. Notice that we have no choice in creating these vectors, they are a consequence of our choices for zy,3. In retrospect (i.e. on a second reading of this proof), you will recognize this as the key step in realizing a matrix representation of a nilpotent linear transformation with Jordan blocks. We need to know that this set of vectors in linearly independent, so start with a relation of linear dependence (Definition RLD [308]), and massage it, o0= aizp-11 + a2zp-1,2 + a3zp-1,3 +- - - --aszp_1,s, = a1T (zy,1) + a2T (zp,2) + a3T (zp,3) + ... + asT (zy,8,) = T (aizp,i + a2zp,2 + a3zp,3 + - --+ aspzp,s,) Theorem LTLC [462] Define x = a1zy,1+a2zp,2+a3zp,3+ --aspzy,s,. The statement just above means that x E C(T) C E(TP-1) (Definition KLT [481], Theorem KPNLT [617]). As defined, x is a linear combination of the basis vectors Bp, and therefore x E Zp. Thus x E K(TP-1) n Zp (Definition SI [685]). Because V =A(TP-1) e@Z,, Theorem DSZI [363] tells us that x = 0. Now we recognize the definition of x as a relation of linear dependence on the linearly independent set Bp, and therefore a1= a2 = - - - = as, = 0 (Definition LI [308]). This establishes the linear independence of zp_ 1 < j < s, (Definition LI [308]). We also need to know where the vectors zp_1j, 1 < j sp live. First we demonstrate that they are members of C(Tp-1). Tp-1 (zp,) - Tp-1 (T (zy,3)) =TP (zy,5) So zpi~ C E (TP- ), 1 j sp. However, we now show that these vectors are not elements of KC (TP-2). Suppose to the contrary (Technique CD [692]) that zpi C AC (TP 2). Then o Tv-2 (zp_1,j) = p-1 . w Ty K(z) which contradicts the earlier statement that zp,3 0 AC (TP-l). So Zp 1 AC (TP-2), 1 G j < sp. 
Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 622 Now choose a basis Co-2 = {u1, u2, u3, ..., unp-2} for C(TP-2). We want to extend this basis by adding in the zp_1, to span a subspace of /C(TP-1). But first we want to know that this set is linearly independent. Let ak, 1 < k rno-2 and by, 1 < j < sp be the scalars in a relation of linear dependence, 0 = alul + a2u2 + ... + anp-2unp-2 + bizp-1,1 + b2zp-1,2 + ... + bspzp_1,s, Then, 0 = Tp-2 (0) = Tp-2 (aiui + a2u2 + - + anp-2unp-2 + bizp-1,1 + b2zp-1,2 + -.. + bspzp_1,s,) = a1Tp-2 (U1) + a2Tp-2 (U2) + ... + anp-2Tp-2 (unp-2) + b1Tp-2 (zp_1,1) + b2Tp-2 (zP-1,2) + ... + bsTp-2 (z_1,8,) = a10 + a20 + - - - + anp-20 + b1Tp-2 (z_1,1) + b2Tp-2 (z-1,2) + ... + bsTp-2 (z_1,,) = b1Tp-2 (zp_1,1) + b2Tp-2 (zp-1,2) + ... + bsTp-2 (z_1,8,) b1Tp-2 (T (zy,1)) + b2Tp-2 (T (zo,2)) + ... + bsTp-2 (T (zy,8,)) = b1Tp-1 (zyi) + b2Tp-1 (zp,2) + ... + bsTp-1 (zy,8,) = Tp-1 (bizp,1 + b2zp,2 +- - - + b8 zy,s,) Define y = bizp,1 + b2zp,2 + - -- + bspzy,s,. The statement just above means that y E 1C(TP-1) (Definition KLT [481]). As defined, y is a linear combination of the basis vectors Bp, and therefore y E Zp. Thus y E 1C (TP-1) nZp. Because V =1C(TP-1) eZp, Theorem DSZI [363] tells us that y = 0. Now we recognize the definition of y as a relation of linear dependence on the linearly independent set Bp, and therefore bi= b2 = - - - = bs, = 0 (Definition LI [308]). Return to the full relation of linear dependence with both sets of scalars (the ai and b3). Now that we know that b3 = 0 for 1 < j < sp, this relation of linear dependence simplifies to a relation of linear dependence on just the basis C,_1. Therefore, ai = 0, 1 a2 rn_1 and we have the desired linear independence. Define a new subspace of 1C(TP-1) as p-1 = ({ui, U)2, 113, .,up-1, zp1,1, Zp-1,2, Zp-1,3, ..., zp_1,s, By Theorem DSFOS [362] there exists a subspace of 1C(TP-1) which will pair with Qi to form a direct sum. Call this subspace R,_1, so by definition, C(T-1) = Q_1eR_1. We are interested in the dimension of R,_1. Note first, that since the spanning set of Q,_ is linearly independent, dim (Qpi) = np-2 + sp. Then dim (Rp_1) = dim (1C(TP1)) - dim (Qi) Theorem DSD [364] =-,_ - (ny~-2 + sp) =(n,_1 - nop-2) - s =3p_1 - Spj Notice that if s,_1 =sp, then R,_1 is trivial. Now choose a basis of R,_1, and denote these s,_1 - s vectors as zp-1,s19+1, zp-1,s,+2, zp-1,s,+3, ..., zp_,s_* This is another occassion to notice that we have some freedom in this choice. We now have KC(TP-l) = Q,_1 e R,_1, and we have bases for each of the two subspaces. The union of these two bases will therefore be a linearly independent set in KC(TP-l) with size (np-2 + sp) + (sp-1 - sp)= n-2 + Sp-1 = np-2 + np-1 - np-2 = np-1 =dim (K (TP-)) Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 623 So, by Theorem G [355], the following set is a basis of C(TP-l), {U1, U2, U3, ... , unp-2, Zp_1,1, Zp-1,2, ... , Zp_1,s,, z ,p-1,s+1, Zp-1,sp+2, -.-.-, zp_1,s,_1} We built up this basis in three parts, we will now split it in half. Define the subspace Z,_1 by Z,_1 = (Bp_1)=_K{zp_i,i, Zp-1,2, ... , Zp_1,s,_,1 where we have implicitly denoted the basis as Bp_1. Then Theorem DSFB [361] allows us to split up the basis for C(TP-1) as C,_1 U B,_1 and write 1C(TP-1) - Tp-2)e Zi Whew! This is a good place to recap what we have achieved. 
The vectors z2,3 form bases for the subspaces Zi and right now V =C(TP-') e CZ= K(Tp-2) e Z,1e(Z, The key feature of this decomposition of V is that the first sp vectors in the basis for Z,_1 are outputs of the linear transformation T using the basis vectors of Z, as inputs. Now we want to further decompose C (TP-2) (into C (TP-3) and Zp-2). The procedure is the same as above, so we will only sketch the key steps. Checking the details proceeds in the same manner as above. Technically, we could have set up the preceding as the induction step in a proof by induction (Technique I [694]), but this probably would make the proof harder to understand. Hit each element of Bp_1 with T, to create vectors Zp-2,j, 1 < j < sp-1. These vectors form a linearly independent set, and each is an element of C (TP-2), but not an element of C (TP-3). Grab a basis Co-3 of C (TTh3) and tack on the newly-created vectors Zp-2,j, 1 < j sp-1. This expanded set is linearly independent, and we can define a subspace Qp-2 using it as a basis. Theorem DSFOS [362] gives us a subspace Rp-2 such that C(TP-2) = Qp-2 Rp-2. Vectors zp-2j, so-1 + 1 < j < sp-2 are chosen as a basis for Rp-2 once the relevant dimensions have been verified. The union of Co-3 and zp-2j, 1 < j < sp-2 then form a basis of C (TP-2), which can be split into two parts to yield the decomposition C(Tp-2) = C(TP-3) e Zv-2 Here Zp-2 is the subspace of C(Tv-2) with basis Bp-2 = {zp-2,j 1 < j < sp-2}. Finally, V =K(TP1) e (TP2) e e (TP3) e Zp-2e Z eZ Again, the key feature of this decomposition is that the first vectors in the basis of Zp-2 are outputs of T using vectors from the basis Z,_1 as inputs (and in turn, some of these inputs are outputs of T derived from inputs in Zr). Now assume we repeat this procedure until we decompose KC(T2) into subspaces KC(T) and Z2. Finally, decompose K(T) into subspaces KC(TO) = K(In) ={O} and Zi, so that we recognize the vectors ziyj, 1 j si =rn as elements of K(T). The set is linearly independent by Theorem DSLI [364] and has size p p si >=n - n2_1 = np - =o dim (V) i=1 i=1 So by Theorem G [355], B is a basis of V. We desire a matrix representation of T relative to B (Definition MR [542]), but first we will reorder the elements of B. The following display lists the elements of B in Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 624 the desired order, when read across the rows right-to-left in the usual way. Notice that we arrived at these vectors column-by-column, beginning on the right. Zi,1 Z2,1 Z3,1 --.Zd,i Z1,2 Z2,2 Z3,2 --. Zd,2 Z1,sd Z2,sd Z3,sd.-- Zd,sd Z1,sd+1 Z2,sd+1 Z3,sd+1- -. Z1,s3 Z2,s3 Z3,s3 Z1,s2 Z2,s2 Z1,s1 It is difficult to layout this table with the notation we have been using, nor would it be especially useful to invent some notation to overcome the difficulty. (One approach would be to define something like the inverse of the nonincreasing function, i - si.) Do notice that there are si = n1 rows and d columns. Column i is the basis Bi. The vectors in the first column are elements of 1C(T). Each row is the same length, or shorter, than the one above it. If we apply T to any vector in the table, other than those in the first column, the output is the preceding vector in the row. Now contemplate the matrix representation of T relative to B as we read across the rows of the table above. In the first row, T (z1,1) = 0, so the first column of the representation is the zero column. 
Next, T (Z2,1) z=Zi,1, so the second column of the representation is a vector with a single one in the first entry, and zeros elsewhere. Next, T (z3,1) = z2,1, so column 3 of the representation is a zero, then a one, then all zeros. Continuing in this vein, we obtain the first d columns of the representation, which is the Jordan block Jd (0) followed by rows of zeros. When we apply T to the basis vectors of the second row, what happens? Applying T to the first vector, the result is the zero vector, so the representation gets a zero column. Applying T to the second vector in the row, the output is simply the first vector in that row, making the next column of the representation all zeros plus a lone one, sitting just above the diagonal. Continuing, we create a Jordan block, sitting on the diagonal of the matrix representation. It is not possible in general to state the size of this block, but since the second row is no longer than the first, it cannot have size larger than d. Since there are as many rows as the dimension of KC(T), the representation contains as many Jordan blocks as the nullity of T, n~ (T). Each successive block is smaller than the preceding one, with the first, and largest, having size d. The blocks are Jordan blocks since the basis vectors Zi,3 were often defined as the result of applying T to other elements of the basis already determined, and then we rearranged the basis into an order that placed outputs of T just before their inputs, excepting the start of each row, which was an element of/ K(T).U The proof of Theorem CFNLT [619] is constructive (Technique C [690]), so we can use it to create bases of nilpotent linear transformations with pleasing matrix representations. Recall that Theorem DNLT [616] told us that nilpotent linear transformations are almost never diagonalizable, so this is progress. As we have hinted before, with a nice representation of nilpotent matrices, it will not be difficult to build up representations of other non-diagonalizable matrices. Here is the promised example which illustrates the previous theorem. It is a useful companion to your study of the proof of Theorem CFNLT [619]. Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 625 Example CFNLT Canonical form for a nilpotent linear transformation The 6 x 6 matrix, A, of Example NM64 [610] is nilpotent of index p = 4. If we define the linear trans- formation T: C6 H C6 by T (x) = Ax, then T is nilpotent of index 4 and we can seek a basis of C6 that yields a matrix representation with Jordan blocks on the diagonal. The nullity of T is 2, so from Theorem CFNLT [619] we can expect the largest Jordan block to be J4 (0), and there will be just two blocks. This only leaves enough room for the second block to have size 2. We will recycle the bases for the null spaces of the powers of A from Example KPNLT [618] rather than recomputing them here. We will also use the same notation used in the proof of Theorem CFNLT [619]. To begin, 34 = n4 - ns = 6 - 5 = 1, so we need one vector of C(T4) = C6, that is not in K(T3), to be a basis for Z4. We have a lot of latitude in this choice, and we have not described any sure-fire method for constructing a vector outside of a subspace. Looking at the basis for C (T3) we see that if a vector is in this subspace, and has a nonzero value in the first entry, then it must also have a nonzero value in the fourth entry. 
So the vector 1 0 0 Z4,1 = 0 0 _0 will not be an element of C (T3) (notice that many other choices could be made here, so our basis will not be unique). This completes the determination of Z, = Z4. Next, s3 = n - n2 = 5 - 4 = 1, so we again need just a single basis vector for Z3. We start by evaluating T with each basis vector of Z4, --3 -3 -3 z3,1= T (z41) =Az4,1 -3 _-2_ Since 83 = s4, the subspace R3 is trivial, and there is nothing left to do, z3,1 is the lone basis vector of Z3. Now s2 = n2 - ni = 4 - 2 = 2, so the construction of Z2 will not be as simple as the construction of Z3. We first apply T to the basis vector of Z2, ~1 0 3 z21- T (za,1) - Aza,1= 0 -1_ The two basis vectors of 1C(T1), together with z2,1, form a basis for Q2. Because dim (K(T2)) -dim (Q2) = 4 - 3 = 1 we need only find a single basis vector for R2. This vector must be an element of C(T2), but not an element of Q2. Again, there is a variety of vectors that fit this description, and we have no precise algorithm for finding them. Since they are plentiful, they are not too hard to find. We add up the four basis vectors of K (T2), ensuring an element of K (T2). Then we check to see if the vector is a linear combination Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 626 of three vectors: the two basis vectors of C(T1) and z2,1. Having passed the tests, we have chosen 2 1 2 Z2,2 2 2 1 Thus, Z2 = ({z2,1, z2,2}). Lastly, si = ni - no = 2 - 0 = 2. Since s2 = si, we again have a trivial R1 and need only complete our basis by evaluating the basis vectors of Z2 with T, 1 1 0 zi,1 = T (z2,1) = Az2,1 1 1 1 -2 -2 zi,2 = T (z2,2) = Az2,2 -2 -1 0 _ Now we reorder these vectors as the desired basis, B = {z1,1, z2,1, z3,1, z4,1, z1,2, z2,2} We now apply Definition MR [542] to build a matrix representation of T relative to B, 0 0 PB (T (z1,1)) = PB (Azi,i) = PB (0) =0 0 0 ~1 0 0 PB (T (z2,1)) =PB (Az2,1) =PB (zi,1) =0 0 _0_ 0 1 0 PB (T (z3,1)) = PB (Az3,i) = PB (z2,1) 0 0 0 Version 2.02  Subsection NLT.CFNLT Canonical Form for Nilpotent Linear Transformations 627 PB (T (z4,1)) PB (T (zi,2)) PB (T (z2,2)) PB (Az4,i) = PB (z3,1) PB (Azi,2) = PB (0) = PB (Az2,2) = PB (zi,2) 0 0 1 0 0 0 0 0 0 0 0 _0_ 0 0 0 0 1 _0_ Installing these vectors as the columns of the matrix representation we have MT MBB 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 which is a block diagonal matrix with Jordan blocks J4 (0) and J2 (0). If we constructed the matrix S having the vectors of B as columns, then Theorem SCB [583] tells us that a similarity transformation with S relates the original matrix representation of T with the matrix representation consisting of Jordan blocks., i.e. S-1AS =MBTB. Notice that constructing interesting examples of matrix representations requires domains with dimen- sions bigger than just two or three. Going forward we will see several more big examples. Version 2.02  Section IS Invariant Subspaces 628 Section IS Invariant Subspaces . - THIS SECTION IS IN DRAFT FORM NEARLY COMPLETE We have seen in Section NLT [610] that nilpotent linear transformations are almost never diagonalizable (Theorem DNLT [616]), yet have matrix representations that are very nearly diagonal (Theorem CFNLT [619]). Our goal in this section, and the next (Section JCF [644]), is to obtain a matrix representation of any linear transformation that is very nearly diagonal. 
A key step in reaching this goal is an understanding of invariant subspaces, and a particular type of invariant subspace that contains vectors known as "generalized eigenvectors." Subsection IS Invariant Subspaces As is often the case, we start with a definition. Definition IS Invariant Subspace Suppose that T: V H V is a linear transformation and W is a subspace of V. Suppose further that T (w) E W for every w E W. Then W is an invariant subspace of V relative to T. A We do not have any special notation for an invariant subspace, so it is important to recognize that an invariant subspace is always relative to both a superspace (V) and a linear transformation (T), which will sometimes not be mentioned, yet will be clear from the context. Note also that the linear transformation involved must have an equal domain and codomain the definition would not make much sense if our outputs were not of the same type as our inputs. As usual, we begin with an example that demonstrates the existence of invariant subspaces. We will return later to understand how this example was constructed, but for now, just understand how we check the existence of the invariant subspaces. Example TIS Two invariant subspaces Consider the linear transformation T: C4 H C4 defined by T (x) = Ax where A is given by K -8 2 411 Define (with zero motivation), W -2 -2]W2 [~ [0] [1] and set W = ({wi, w2}). We verify that W is an invariant subspace of C4 with respect to T. By the definition of W, any vector chosen from W can be written as a linear combination of wi and w2. Suppose Version 2.02  Subsection IS.IS Invariant Subspaces 629 that w E W, and then check the details of the following verification, T (w) = T (aiwi + a2w2) = a1T (wi) + a2T(w2) -1 5 -2 -2 =ai [0 + a2 -3 1 2 = aiw2 + a2 ((-1)wi + 2w2) = (-a2)wi + (ai + 2a2)w2 Definition SS [298] Theorem LTLC [462] E W Definition SS [298] So, by Definition IS [627], W is an invariant subspace of C4 relative to T. In an entirely similar manner we construct another invariant subspace of T. With zero motivation, define -3 -1 X1= 1 0 0 -1 1 and set X = ({x1, x2}). We verify that X is an invariant subspace of C4 with respect to T. By the definition of X, any vector chosen from X can be written as a linear combination of xl and x2. Suppose that x E X, and then check the details of the following verification, T (x) = T (bix1 + b2x2) = b1T (xi) + b2T (x2) 3 3 = b1 0. Then x is a generalized eigenvector of T with eigenvalue A. AIy)k (x) =0 A Definition GES Generalized Eigenspace Suppose that T: V H V is a linear transformation. Define the generalized eigenspace of T for A as gT (A) =_{x |I(T - AIv)k (x) = 0 for some k> 0} (This definition contains Notation GES.) A So the generalized eigenspace is composed of generalized eigenvectors, plus the zero vector. As the name implies, the generalized eigenspace is a subspace of V. But more topically, it is an invariant subspace of V relative to T. Theorem GESIS Generalized Eigenspace is an Invariant Subspace Suppose that T: V H V is a linear transformation. Then the generalized eigenspace gT (A) is an invariant subspace of V relative to T. D Proof First we establish that 9T (A) is a subspace of V. First (T [456], sO 0T ET(A). Suppose that x, y E gT (A). Then there are integers k, f such that 0. 
Set m = k+ f, (T - AIV)r (x + y) (T - AIV)m (x) + (T - AIV)m (y) (T - AIv)k±e (x) + (T - AIv)k+ (Y) =(T - Xly) (T - X-1y)k X)+ - AIV)1 (0) = 0 by Theorem LTTZZ (T - AIv)k (x) = 0 and (T - AIv)< (y) Definition LT [452] (T - AIv)k ((T AIv) (y)) (T - AIv) (0) + (T - AIv)k (0) 0+0 0 Definition LTC [469] Definition GES [631] Theorem LTTZZ [456] Property Z [280] So x+y EgT(A). Suppose that x E gT (A) and a E C. Then there is an integer k such that (T - AIv)k (x) = 0. (T - AIv)k (ax) = a (T - AIv)k (x) = a0 = 0 Definition LT [452] Definition GES [631] Theorem ZVSM [286] So ax E T (A). By Theorem TSS [293], 9T (A) is a subspace of V. Now we show that QT (A) is invariant relative to T. Suppose that x E gT (A). Then there is an integer k such that (T - AIv)k (x) = 0. Recognize also that (T - AIv)k is a polynomial in T, and therefore commutes with T (that is, T o p(T) = p(T) o T for any polynomial p(x)). Now, (T - AIv)k (T (x)) = T ((T = T (0) AI ) Definition GES [631] Version 2.02  Subsection IS.GEE Generalized Eigenvectors and Eigenspaces 633 = 0 Theorem LTTZZ [456] This qualifies T (x) for membership in gT (A), so by Definition GES [631], 9T (A) is invariant relative to T. Before we compute some generalized eigenspaces, we state and prove one theorem that will make it much easier to create a generalized eigenspace, since it will allow us to use tools we already know well, and will remove some the ambiguity of the clause "for some k" in the definition. Theorem GEK Generalized Eigenspace as a Kernel Suppose that T: V H V is a linear transformation, dim (V) = n, and A is an eigenvalue of T. Then gr (A) =K((T - AIv)"). D Proof The conclusion of this theorem is a set equality, so we will apply Definition SE [684] by establishing two set inclusions. First, suppose that x E gT (A). Then there is an integer k such that (T - AIv)k (x) = 0. This is equivalent to the statement that x E ((T - AIv)k). No matter what the value of k is, Theorem KPLT [616] gives x E K (T - AIv)k) C 1C((T - AIv)") So, 9T (A) C K((T - AIV)"). For the opposite inclusion, suppose y E K((T - AIv)"). Then (T - AIV)" (y) 0, so y E gT (A) and thus C((T - AIv)") Cg (A). By Definition SE [684] we have the desired equality of sets. Theorem GEK [632] allows us to compute generalized eigenspaces as a single kernel (or null space of a matrix representation) with tools like Theorem KNSI [552] and Theorem BNS [139]. Also, we do not need to consider all possible powers k and can simply consider the case where k = n. It is worth noting that the "regular" eigenspace is a subspace of the generalized eigenspace since ET(A) =K((T-AI)1) C((T- AI)"n) =gT(A) where the subset inclusion is a consequence of Theorem KPLT [616]. Also, there is no such thing as a "generalized eigenvalue." If A is not an eigenvalue of T, then the kernel of T - AIV is trivial and therefore subsequent powers of T- AIv also have trivial kernels (Theorem KPLT [616]). So the generalized eigenspace of a scalar that is not already an eigenvalue would be trivial. Alright, we know enough now to compute some generalized eigenspaces. We will record some information about algebraic and geometric multiplicities of eigenvalues (Definition AME [406], Definition GME [406]) as we go, since these observations will be of interest in light of some future theorems. Example GE4 Generalized eigenspaces, dimension 4 domain In Example TIS [627] we presented two invariant subspaces of C4. 
There was some mystery about just how these were constructed, but we can now reveal that they are generalized eigenspaces. Example TIS [627] featured T: C4 - C4 defined by T (x) =Ax with A given by K -8 2 -11_ A matrix representation of T relative to the standard basis (Definition SUV [173]) will equal A. So we can analyze A with the techniques of Chapter E [396]. Doing so, we find two eigenvalues, A = 1, -2, with multiplicities, O'T (1) = 2 'YT (1) = 1 Version 2.02  Subsection IS.GEE Generalized Eigenvectors and Eigenspaces 634 aT (-2) = 2 yT (-2) = 1 To apply Theorem GEK to the power dim (C4) = [632] we subtract each eigenvalue from the diagonal entries of A, raise the result 4, and compute a basis for the null space. A= -2 =1 648 (A - (-2)I4)4 = -324 405 [297 -1215 486 729 -486 729 -486 -486 405 -3 0 9T (- 2) = 1 1 0 . 0 _1 _ 81 -405 -81 (A - (1)I4)4 - -108 -189 -378 -27 135 27 135 54 351 -1215 1 0 3 0 486 RREF 0 1 1 1 729 0 0 0 -486 0 0 0 0 -729 1 0 1I -486 RREF 0 1 2 243 0 0 0 0 243 [0 0 0 0] -7 -1 9T (1) = 32 - In Example TIS [627] we concluded that these two invariant subspaces formed a direct sum of C4, only at that time, they were called X and W. Now we can write C4 = 9T (1) e(Dr9(-2) This is no accident. Notice that the dimension of each of these invariant subspaces is equal to the algebraic multiplicity of the associated eigenvalue. Not an accident either. (See the upcoming Theorem GESD [644].) Example GE6 Generalized eigenspaces, dimension 6 domain Define the linear transformation S: C6 H C6 by S (x) = Bx where 2 2 2 10 8 5 -4 -3 -3 -18 -14 -7 25 4 4 6 0 -6 -54 -16 -15 -36 -21 -7 90 26 24 51 28 8 -37- -8 -7 -2 4 7 Then B will be the matrix representation of S relative to the standard basis (Definition SUV [173]) and we can use the techniques of Chapter E [396] applied to B in order to find the eigenvalues of S. as (3) - 2 as (-1)= 4 ys (3) - 1 ys (-1)= 2 Version 2.02  Subsection IS.GEE Generalized Eigenvectors and Eigenspaces 635 To find the generalized eigenspaces of S we need to subtract an eigenvalue from the diagonal elements of B, raise the result to the power dim (C6) = 6 and compute the null space. Here are the results for the two eigenvalues of S, A=3 (B- 64000 - 15872 -36 6 _ 12032 ) -1536 -9728 -7936 1 0 0 0 0 1 0 0 RREF 0 0 1 0 0 0 0 1 0 0 0 0 _0 0 0 0 4- 1 9s(3)K 0_ 6144 - 4096 - 1)I[6)6 _ 4096 - 18432 - 14336 - 10240 - 1 0 -5 0 1 -3 RREF 0 0 0 0 0 0 0 0 0 0 0 0 -152576 -39936 -30208 11264 27648 17920 -4 5- -1 1 -1 1 -2 1 0 0 0 0_ -5- -1 0 -16384 1 -8192 -8192 -32768 -24576 -16384 - -59904 26112 -11776 8704 -9984 6400 -23040 17920 -6656 9728 5888 1792 -95744 -29184 -20736 -17920 -1536 4352 133632 36352 26368 -1536 -17920 -14080 A = -1 (- 18432 4096 4096 6144 2048 -2048 -36864 57344 -18432~ -16384 24576 -4096 -16384 24576 -4096 -61440 90112 -6144 -45056 65536 -2048 -28672 40960 2048 2 -4 5- 3 -5 3 0 0 0 0 0 0 0 0 0 0 0 0_ 5 -2 4 -5 3 -3 5 -3 1 0 0 0 s 0 1 0 0 0 0 1 0 _0_ _0 _ 0_ _1 _ If we take the union of the two bases for these two invariant subspaces we obtain the set C = {vi, v2, v3, v4, v5, v6} 4 -5 5 -2 4 -5 1 -1 3 -3 5 -3 _ 1 -1 1 0 0 0 2'-1'0 1 '0'0 1 0 0 0 1 0 _0_ 1 _ 0_ 0 _ 0_ 1 _ Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 636 You can check that this set is linearly independent (right now we have no guarantee this will happen). Once this is verified, we have a linearly independent set of size 6 inside a vector space of dimension 6, so by Theorem G [355], the set C is a basis for C6. 
This is enough to apply Theorem DSFB [361] and conclude that C6 =gs(3) egQs(-1) This is no accident. Notice that the dimension of each of these invariant subspaces is equal to the algebraic multiplicity of the associated eigenvalue. Not an accident either. (See the upcoming Theorem GESD [644].) Subsection RLT Restrictions of Linear Transformations Generalized eigenspaces will prove to be an important type of invariant subspace. A second reason for our interest in invariant subspaces is they provide us with another method for creating new linear transforma- tions from old ones. Definition LTR Linear Transformation Restriction Suppose that T: V H V is a linear transformation, and U is an invariant subspace of V relative to T. Define the restriction of T to U by TU:UH-aU Tu(u)=T(u) (This definition contains Notation LTR.) A It might appear that this definition has not accomplished anything, as T U would appear to take on exactly the same values as T. And this is true. However, T U differs from T in the choice of domain and codomain. We tend to give little attention to the domain and codomain of functions, while their defining rules get the spotlight. But the restriction of a linear transformation is all about the choice of domain and codomain. We are restricting the rule of the function to a smaller subspace. Notice the importance of only using this construction with an invariant subspace, since otherwise we cannot be assured that the outputs of the function are even contained in the codomain. Maybe this observation should be the key step in the proof of a theorem saying that T u is also a linear transformation, but we won't bother. Example LTRGE Linear transformation restriction on generalized eigenspace In order to gain some experience with restrictions of linear transformations, we construct one and then also construct a matrix representation for the restriction. Furthermore, we will use a generalized eigenspace as the invariant subspace for the construction of the restriction. Consider the linear transformation T: C5 - C5 defined by T (x) =Ax, where -22 -24 -24 -24 -46 3 2 6 0 11 A= -12 -16 -6 -14 -17 6 8 4 10 8 11 14 8 13 18 Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 637 One of the eigenvalues of A is A = 2, with geometric multiplicity yT (2) = 1, and algebraic multiplicity 0T (2) = 3. We get the generalized eigenspace in the usual manner, W =gT(2) =C(T-2Ic) ) -2 0 -4 1 -1 2 1 , 0 , 0 0 1 0 0 _ 0 _ 1 _ ({wi, w2, w3}) By Theorem GESIS [631], we know W is invariant relative to T, so we can employ Definition LTR [635] to form the restriction, T w: W H W. To better understand exactly what a restriction is (and isn't), we'll form a matrix representation of T w. This will also be a skill we will use in subsequent examples. For a basis of W we will use C = {wi, w2, w3}. Notice that dim (W) = 3, so our matrix representation will be a square matrix of size 3. Applying Definition MR [542], we compute Pc (T (wi)) Pc (T (w2)) Pc (T (ws)) Pc (Awi) PC (Aw2) PC (Aw3) /-_4-\ 2 pc 2 0 /0 \ -2 pc 2 2 \-1 / 3 pc -1 0 \2 / Pc 2 \C pc \ K -2 1 1 0 0 -2 1 1 0 0 1) 0 -4 -1 21 +0 0 +0 01=: 1 01 0 1 / 0 -4 -1 2 +2 0 + (-1) 0 1 0 0 1 -2 0 -4 1 -1 2 1 +0 0 +2 0 0 1 0 0 0 1 2 0 0_ 2 = 2 - - = 0 2 1 1 So the matrix representation of T w relative to C is 2 MTow =0 0 The question arises: how do we use a 3 x 3 matrix question, consider the randomly chosen vector 2 -1 2 0 -1 2] to compute with vectors from C5? To answer this -4 4 W= 4 -2 -1 First check that w E ET (2). 
There are two ways to do this, first verify that (T - 2Ic5)5 (w) = (A - 2I5)5 w = 0 Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 638 meeting Definition GES [631] (with k = 5). Or, express w as a linear combination of the basis C for W, to wit, w =4wi - 2w2 - w3. Now compute T w (w) directly using Definition LTR [635], -10 9 Tw(w)=T(w)=Aw= 5 -4 0 _ It was necessary to verify that w E ET (2), and if we trust our work so far, then this output will also be an element of W, but it would be wise to check this anyway (using either of the methods we used for w). We'll wait. Now we will repeat this sample computation, but instead using the matrix representation of T w relative to C. T w (w) = pc- (MC w pc (w)) = P (Miviwjpc (4w1 - 2w2 - w3)) 2 2 -1 4 = pl 0 2 0 -2 1 -1 2 -1 5 = p 1-4 0 = 5wi - 4w2 + Ow3 -2 0 -4 1 -1 2 = 5 1 +(-4) 0 +0 0 0 1 0 0 0 1 -10 9 = 5 -4 _ 0 Theorem FTMR [544] Definition VR [530] Definition MVP [194] Definition VR [530] which matches the previous computation. Notice how the "action" of T w is accomplished by a 3 x 3 matrix multiplying a column vector of size 3. If you would like more practice with these sorts of computations, mimic the above using the other eigenvalue of T, which is A = -2. The generalized eigenspace has dimension 2, so the matrix representation of the restriction to the generalized eigenspace will be a 2 x 2 matrix. Suppose that T: V H V is a linear transformation and we can find a decomposition of V as a direct sum, say V = U1 U2 ( U3 ( ... - Um where each Uz is an invariant subspace of V relative to T. Then, for any v E V there is a unique decomposition v =ui +u2 + u3+ - - - +umwith u2 E U, 1 < i < m and furthermore T(V) =T(ui+ u2 + u3+ + um) (ui) + T (u2) + T (u3) + ---+ T (urn) = Tlu1 (ui) +TlU2 (u2) +Tlu3 (us3)+-...+'T Um (urn) Definition DS [361] Theorem LTLC [462] Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 639 So in a very real sense, we obtain a decomposition of the linear transformation T into the restrictions T u2, 1 < i < m. If we wanted to be more careful, we could extend each restriction to a linear transformation defined on V by setting the output of T u, to be the zero vector for inputs outside of Us. Then T would be exactly equal to the sum (Definition LTA [467]) of these extended restrictions. However, the irony of extending our restrictions is more than we could handle right now. Our real interest is in the matrix representation of a linear transformation when the domain decomposes as a direct sum of invariant subspaces. Consider forming a basis B of V as the union of bases Bi from the individual U, i.e. B = U'i Bi. Now form the matrix representation of T relative to B. The result will be block diagonal, where each block is the matrix representation of a restriction T u, relative to a basis Bi, MB U. Though we did not have the definitions to describe it then, this is exactly what was going on in the latter portion of the proof of Theorem CFNLT [619]. Two examples should help to clarify these ideas. Example ISMR4 Invariant subspaces, matrix representation, dimension 4 domain Example TIS [627] and Example GE4 [632] describe a basis of C4 which is derived from bases for two invariant subspaces (both generalized eigenspaces). In this example we will construct a matrix representa- tion of the linear transformation T relative to this basis. 
Recycling the notation from Example TIS [627], we work with the basis, -7 -1 -3 0 -2 -2 -1 -1 B= {wi, w2, x1, x2} - 3 [ 0 [ ' 0 0 1 0 1 Now we compute the matrix representation of T relative to B, borrowing some computations from Example TIS [627], -1 0 PB (T (wi)) =PB 0 [= pB ((0)wi + (1)w2) =K] 1 0 5 -1 -w=2 =2 PB (T(x2)) PB - = j PB ((-1)wi + ()x2) 3 0 0 0 3 0 PB (T(x2)) =PB _ [] j= PB ((-1)xi + (-3)x2) =[_] Applying Definition MR [542], we have o -1 0 0 T 1 2 0 0 MB) B 0 0 -1 -1 0 0 1 -3 Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 640 The interesting feature of this representation is the two 2 x 2 blocks on the diagonal that arise from the decomposition of C4 into a direct sum (of generalized eigenspaces). Or maybe the interesting feature of this matrix is the two 2 x 2 submatrices in the "other" corners that are all zero. You decide. Example ISMR6 Invariant subspaces, matrix representation, dimension 6 domain In Example GE6 [633] we computed the generalized eigenspaces of the linear transformation S: C6 H C6 by S (x) = Bx where 2 2 2 10 8 5 -4 -3 -3 -18 -14 -7 25 4 4 6 0 -6 -54 -16 -15 -36 -21 -7 90 26 24 51 28 8 -37- -8 -7 -2 4 7 _ From this we found the basis C = {vi, V2, V3, V4, V5, V6} 4 -5 5 -2 4 -5 1 -1 3 -3 5 -3 1 -1 1 0 0 0 2 ' -1 ' 0 ' 1 ' 0 ' 0 1 0 0 0 1 0 0 1 _ 0_ _0 _ 0_ 1 I of C6 where {vi, v2} is a basis of gs (3) and {v3, v4, v5, v6} is the construction of a matrix representation of S (Definition MR a basis of GS (-1). We can employ C in [542]). Here are the computations, PC (S (vi)) PC (S (v2)) PC PC K 11\ 3 31 7 = 41 -14- -3 -3 -4 -1 -2 /23 \ 5j 54 2 -2 \-2_/ /-46 \ -11 -10 -2 5 \ 4 _) pc (4vi + 1v2) 4- 1 0 0 0 0 PC ((-1)vi + 2v2) -1 2 0 0 0 0 Pc (S (v3)) = Pc pc (5v3 + 2v4 + (-2)v5 + (-2)v6) r = pc ((-10)v3 + (-2)V4 + 5v5 + 4v6) 0 0 5 2 -2 -2 0 0 -10 -2 5 4 PC (S (v4)) PC Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 641 /78 \0 19 0 17 I=17 PC (S (v5)) = PC 1 Pc (17v3 + 1v4 + (-10)v5 + (-7)v6) = -10 -10 -7/ _-7_ /_-35-\ 0 -9 0 -8 -8 PC (S (v6)) = Pc 2 pc ((-8)v3 + 2v4 + 6v5 + 3v6)= 2 6 6 33 _ These column vectors are the columns of the matrix representation, so we obtain 4 -1 0 0 0 0 1 2 0 0 0 0 MS _0 0 5 -10 17 -8 CC 0 0 2 -2 1 2 0 0 -2 5 -10 6 0 0 -2 4 -7 3 _ As before, the key feature of this representation is the 2 x 2 and 4 x 4 blocks on the diagonal. We will discover in the final theorem of this section (Theorem RGEN [640]) that we already understand these blocks fairly well. For now, we recognize them as arising from generalized eigenspaces and suspect that their sizes are equal to the algebraic multiplicities of the eigenvalues. The paragraph prior to these last two examples is worth repeating. A basis derived from a direct sum decomposition into invariant subspaces will provide a matrix representation of a linear transformation with a block diagonal form. Diagonalizing a linear transformation is the most extreme example of decomposing a vector space into invariant subspaces. When a linear transformation is diagonalizable, then there is a basis composed of eigenvectors (Theorem DC [436]). Each of these basis vectors can be used individually as the lone element of a spanning set for an invariant subspace (Theorem EIS [629]). So the domain decomposes into a direct sum of one-dimensional invariant subspaces (Theorem DSFB [361]). The corresponding matrix representation is then block diagonal with all the blocks of size 1, i.e. the matrix is diagonal. 
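The construction just described is easy to experiment with in a computer algebra system. What follows is a small computational aside, not part of the text's examples: it uses an invented 3 x 3 matrix (not one of the matrices from Example ISMR4 [638] or Example ISMR6 [639]) and SymPy's exact arithmetic to collect a basis for each generalized eigenspace as a kernel (Theorem GEK [632]), assemble the union of those bases into a single basis, and confirm that the resulting matrix representation is block diagonal.

```python
import sympy as sp

# Invented 3x3 matrix: eigenvalue 2 with algebraic multiplicity 2 but geometric
# multiplicity 1, and eigenvalue 1 with algebraic multiplicity 1.
A = sp.Matrix([[1, 1, 1],
               [0, 2, 1],
               [0, 0, 2]])
n = A.rows

basis = []
for lam, alg_mult in A.eigenvals().items():
    # Theorem GEK: the generalized eigenspace is the kernel of (A - lam*I)^n.
    gen_espace = ((A - lam * sp.eye(n)) ** n).nullspace()
    # dimensions match the algebraic multiplicities (see the upcoming Theorem DGES [650])
    assert len(gen_espace) == alg_mult
    basis.extend(gen_espace)

S = sp.Matrix.hstack(*basis)    # columns of S are the union of the two bases
rep = S.inv() * A * S           # matrix representation relative to this basis
sp.pprint(rep)                  # block diagonal: one block per eigenvalue
```

For this particular matrix the output has a 1 x 1 block for the eigenvalue 1 and a 2 x 2 block for the eigenvalue 2; the order of the blocks simply follows the order in which eigenvals() happens to report the eigenvalues.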
Section NLT [610], Section IS [627] and Section JCF [644] are all devoted to generalizing this extreme situation when there are not enough eigenvectors available to make such a complete decomposition and arrive at such an elegant matrix representation. One last theorem will roll up much of this section and Section NLT [610] into one nice, neat package. Theorem RGEN Restriction to Generalized Eigenspace is Nilpotent Suppose T: V a V is a linear transformation with eigenvalue A. Then the linear transformation T gT(A) - AIg,(A) is nilpotent.D Proof Notice first that every subspace of V is invariant with respect to Iy, so IgT(A) =Iv gTAx) Let n=dim (V) and choose v E gr (A). Then (T gT(A) - AIgT(A))" (v) =(T - AIy)"h (v) Definition LTR [635] =0 Theorem GEK [632] So by Definition NLT [610], TIgT(A) - AIg,(A) is nilpotent. U The proof of Theorem RGEN [640] indicates that the index of the nilpotent linear transformation is less than or equal to the dimension of V. In practice, it will be less than or equal to the dimension of the Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 642 domain of the linear transformation, CT (A). In any event, the exact value of this index will be of some interest, so we define it now. Notice that this is a property of the eigenvalue A, similar to the algebraic and geometric multiplicities (Definition AME [406], Definition GME [406]). Definition IE Index of an Eigenvalue Suppose T: V H V is a linear transformation with eigenvalue A. Then the index of A, tT (A), is the index of the nilpotent linear transformation T gT(A) - AIg9(A). (This definition contains Notation IE.) A Example GENR6 Generalized eigenspaces and nilpotent restrictions, dimension 6 domain In Example GE6 [633] we computed the generalized eigenspaces of the linear transformation S: C6 H C6 defined by S (x) = Bx where 2 2 2 10 8 5 -4 -3 -3 -18 -14 -7 25 4 4 6 0 -6 -54 -16 -15 -36 -21 -7 90 26 24 51 28 8 -37- -8 -7 -2 4 7 The generalized eigenspace, gs (3), has dimension 2, while GS (-1), has dimension 4. We'll investigate each thoroughly in turn, with the intent being to illustrate Theorem RGEN [640]. Much of our computations will be repeats of those done in Example ISMR6 [639]. For U = GS (3) we compute a matrix representation of SIU using the basis found in Example GE6 [633], 4 -5 1-1 1-1 B = u1u2} V '-1 r (0 Since B has size 2, we obtain a 2 x 2 matrix representation (Definition MR [542]) from PB (S u (1)) PB (S U (u2)) PB 1 r PB 1 \ 3 3 4 1 -14 -3 -3 _4 -1 2 _ PB (4u1 + u2) [41 Lii PB ((-1)ui + 2U2) - -11 Thus M MU'lU - - 21J Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 643 Now we can illustrate Theorem RGEN [640] with powers of the matrix representation (rather than the restriction itself), M-3I2 = 1 1 ( -1)2 0 0 0 0- So M - 3I2 is a nilpotent matrix of index 2 (meaning that Slu - 3Iu is a nilpotent linear transformation of index 2) and according to Definition IE [641] we say ts (3) = 2. 
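Before turning to the second eigenvalue, here is a brief computational aside, separate from Example GENR6 (the matrix below is invented, not the matrix B above). It shows how Definition IE [641] can be computed mechanically: the kernels of successive powers of A - λI grow until they fill the generalized eigenspace and then stabilize (compare Theorem KPLT [616] and Theorem GEK [632]), and the index ι_T(λ) is the first power at which they stabilize.

```python
import sympy as sp

def eigenvalue_index(A, lam):
    """Smallest k with ker((A - lam*I)^k) = ker((A - lam*I)^(k+1)); for an
    eigenvalue lam this is the index of Definition IE."""
    n = A.rows
    N = A - lam * sp.eye(n)
    k, prev = 1, -1
    while True:
        nullity = n - (N ** k).rank()
        if nullity == prev:          # kernel chain has stabilized
            return k - 1
        prev, k = nullity, k + 1

# Invented example built from Jordan blocks J_3(2) and J_1(2): the eigenvalue 2
# has algebraic multiplicity 4, geometric multiplicity 2, and index 3.
A = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 1, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 2]])
print(eigenvalue_index(A, 2))        # 3
```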
For W = S (-1) we compute a matrix representation of S w using the basis found in Example GE6 [633], C = {wi, w2, w3, w4} = 5; 3 1 0 0 0 -2 -3 0 1 0 0 -4 5 0 0 1 0 -5 -3 0 0 0 1 I Since C has size 4, we obtain a 4 x 4 matrix representation (Definition MR [542]) from PC (S w (wi)) P=pc PC (Sw (w2)) PC (Sw (w3)) PC (S w (w4)) PC PC Pc [ 23 51 5 2 -2 \-2_ [ -46 -11 -10 -2 5 4, [ 78 \ 19 17 1 -10 -35 _9 -8 2 6 \ _3 _/ Pc (5wi + 2w2 + (-2)w3 + (-2)w4) = = pc ((-10)wi + (-2)w2 + 5w3 + 4w4) = Pc (17wi + W2 + (-10)w3 + (-7)W4) -- ~K -10 -2 5 4 17 1 -10 -7_ [ 5 2 -2 -2 Pc ((-8)wi + 2w2 + 6w3 + 3w4) L -8 2 6 3 Thus N=MSW w,w 5 2 -2 -2 -10 -2 5 4 17 1 -10 -7 -8 2 6 3_ Version 2.02  Subsection IS.RLT Restrictions of Linear Transformations 644 Now we can illustrate Theorem RGEN [640] with powers of the matrix representation (rather than the restriction itself), 6 -10 17 -8 2 -1 1 2 N -(-)14=[2 5 -9 6 -2 4 -7 4- -2 3 -5 2 (N (l)I4)2 4 -6 10 -4 1000010 - 2 -3 5 -2_ 0 0 0 0 (N- (-1)I4) 0 0 0 0 0 0 0 0 So N - (-1)14 is a nilpotent matrix of index 3 (meaning that S w - (-1)'w is a nilpotent linear transfor- mation of index 3) and according to Definition IE [641] we say is (-1) = 3. Notice that if we were to take the union of the two bases of the generalized eigenspaces, we would have a basis for C6. Then a matrix representation of S relative to this basis would be the same block diagonal matrix we found in Example ISMR6 [639], only we now understand each of these blocks as being very close to being a nilpotent matrix. Invariant subspaces, and restrictions of linear transformations, are topics you will see again and again if you continue with further study of linear algebra. Our reasons for discussing them now is to arrive at a nice matrix representation of the restriction of a linear transformation to one of its generalized eigenspaces. Here's the theorem. Theorem MRRGE Matrix Representation of a Restriction to a Generalized Eigenspace Suppose that T: V H V is a linear transformation with eigenvalue A. Then there is a basis of the the generalized eigenspace gT (A) such that the restriction T gT(A): gT (A) g (A) has a matrix representation that is block diagonal where each block is a Jordan block of the form J> (A). D Proof Theorem RGEN [640] tells us that T gT(A) - AIg(A) is a nilpotent linear transformation. Theorem CFNLT [619] tells us that a nilpotent linear transformation has a basis for its domain that yields a matrix representation that is block diagonal where the blocks are Jordan blocks of the form Jn (0). Let B be a basis of gT (A) that yields such a matrix representation for T gT(A) - AIg(A). By Definition LTA [467], we can write T gT(A) - (T gT(A) - AIgT(A)) + Afg,(A) The matrix representation of AMg,(A) relative to the basis B is then simply the diagonal matrix AIm, where m =dim (gT (A)). By Theorem MRSLT [548] we have the rather unwieldy expression, MT QT( - M(T QTAAkT()+AkT( B,B BB MB,B + B,B The first of these matrix representations has Jordan blocks with zero in every diagonal entry, while the second matrix representation has A in every diagonal entry. The result of adding the two representations is to convert the Jordan blocks from the form Ja (0) to the form Ja (A).U Of course, Theorem CFNLT [619] provides some extra information on the sizes of the Jordan blocks in a representation and we could carry over this information to Theorem MRRGE [643], but will save that for a subsequent application of this result. 
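Theorem RGEN [640] and Theorem MRRGE [643] can also be explored computationally. The sketch below is an aside using an invented 3 x 3 matrix (not the matrix B of Example GENR6): it builds the matrix representation of the restriction to a generalized eigenspace by coordinatizing each output relative to a basis of that eigenspace (Definition MR [542]), checks that subtracting the eigenvalue leaves a nilpotent matrix, and then asks SymPy for a Jordan form of that representation, which is made of Jordan blocks with the eigenvalue on the diagonal, as Theorem MRRGE [643] promises.

```python
import sympy as sp

A = sp.Matrix([[1, 1, 1],
               [0, 2, 1],
               [0, 0, 2]])     # invented matrix; lam = 2 has algebraic multiplicity 2
lam = 2
n = A.rows

# Basis of the generalized eigenspace, computed as a kernel (Theorem GEK [632]).
W = ((A - lam * sp.eye(n)) ** n).nullspace()
S = sp.Matrix.hstack(*W)

# Coordinatize A*w relative to the basis W (Definition MR [542] applied to the
# restriction).  Because the subspace is invariant, A*w lies in the column space
# of S, so the pseudoinverse returns the exact coordinate vector.
Spinv = S.pinv()
M = sp.Matrix.hstack(*[Spinv * (A * w) for w in W])

N = M - lam * sp.eye(M.rows)
print((N ** M.rows) == sp.zeros(*N.shape))   # True: Theorem RGEN, N is nilpotent
P, J = M.jordan_form()
sp.pprint(J)                                 # Jordan blocks with lam on the diagonal
```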
Version 2.02  Section JCF Jordan Canonical Form 645 Section JCF Jordan Canonical Form THIS SECTION IS IN DRAFT FORM NEEDS EXAMPLES NEAR BEGINNING We have seen in Section IS [627] that generalized eigenspaces are invariant subspaces that in every instance have led to a direct sum decomposition of the domain of the associated linear transformation. This allows us to create a block diagonal matrix representation (Example ISMR4 [638], Example ISMR6 [639]). We also know from Theorem RGEN [640] that the restriction of a linear transformation to a generalized eigenspace is almost a nilpotent linear transformation. Of course, we understand nilpotent linear transformations very well from Section NLT [610] and we have carefully determined a nice matrix representation for them. So here is the game plan for the final push. Prove that the domain of a linear transformation always decomposes into a direct sum of generalized eigenspaces. We have unravelled Theorem RGEN [640] at Theorem MRRGE [643] so that we can formulate the matrix representations of the restrictions on the generalized eigenspaces using our storehouse of results about nilpotent linear transformations. Arrive at a matrix representation of any linear transformation that is block diagonal with each block being a Jordan block. Subsection GESD Generalized Eigenspace Decomposition In Theorem UTMR [602] we were able to show that any linear transformation from V to V has an upper triangular matrix representation (Definition UTM [601]). We will now show that we can improve on the basis yielding this representation by massaging the basis so that the matrix representation is also block diagonal. The subspaces associated with each block will be generalized eigenspaces, so the most general result will be a decomposition of the domain of a linear transformation into a direct sum of generalized eigenspaces. Theorem GESD Generalized Eigenspace Decomposition Suppose that T (V) V is a linear transformation with distinct eigenvalues Ai, A2, A3, ..., Am. Then V =gT(A1) e gT(A2)eD gT(A3)e ...-egT(Am) Proof Suppose that dim (V) =n and the n~ (not necessarily distinct) eigenvalues of T are scalarlistpnt. We begin with a basis of V that yields an upper triangular matrix representation, as guaranteed by Theorem UTMR [602], B ={x1, x2, x3, -.-., xn}. Since the matrix representation is upper triangular, and the eigenvalues of the linear transformation are the diagonal elements we can choose this basis so that there are then scalars agg, 1 5j 5rn 1<5i 5j -1 such that j1-1 T (x) = aijxi+ pjxj i=1 We now define a new basis for V which is just a slight variation in the basis B. Choose any k and f such that 1 < k < < r k, the coefficient of yj is agg, as in the representation relative to B. It is a different story for i < k, where the coefficients of yi may be very different. We are especially interested in the coefficient of yk. In fact, this whole first part Version 2.02  Subsection JCF.GESD Generalized Eigenspace Decomposition 647 of this proof is about this particular entry of the matrix representation. The coefficient of Yk is akl-+a((pk- pg)aak -(Pk|--P) pl - Pk = ali + (-l)aki = 0 If the definition of a was a mystery, then no more. In the matrix representation of T relative to C, the entry in column £, row k is a zero. Nice. The only price we pay is that other entries in column £, specifically rows 1 through k - 1, may also change in a way we can't control. One more case to consider. Assume j > £. 
Then T (yj) = T (xj) j-1 = S aijxi + pjxj i=1 j-1 = aijxi +afjxf +akjxk-+pjxj i=1 iff,k j-1 5 aijxi + afjxf + Oeajxk - oeajxk + akjxk + pjxj i=1 iff,k j-1 5 aijxi + af (x +Oaxk) + (ak] - oaj)xk +P3x3 i=1 iff,k j-1 5 aijyi + afjy + (ak] - aafj) Yk + PjYj i=1 iff,k As before, we ask: how different are the matrix representations relative to B and C in column j? Only Yk has a coefficient different from the corresponding coefficient when the basis is B. So in the matrix representations, the only entries to change are in row k, for columns £ + 1 through n. What have we accomplished? With a change of basis, we can place a zero in a desired entry (row k, column £) of the matrix representation, leaving most of the entries untouched. The only entries to possibly change are above the new zero entry, or to the right of the new zero entry. S Suppose we repeat this procedure, starting by "zeroing out" the entry above the diagonal in the second column and first wow. Then we move right to the third column, and zero out the element just above the diagonal in the second row. Next we zero out the element in the third column and first row. Then tackle the fourth column, work upwards from the diagonal, zeroing out elements as we go. Entries above, and to the right will repeatedly change, but newly created zeros will never get wrecked, since they are below, or just to the left of the entry we are working on. Similarly the values on the diagonal do not change either. This entire argument can be retooled in the language of change-of-basis matrices and similarity transformations, and this is the approach taken by Noble in his Applied Linear Algebra. It is interesting to concoct the change-of-basis matrix between the matrices B and C and compute the inverse. Perhaps you have noticed that we have to be just a bit more careful than the previous paragraph suggests. The definition of a~ has a denominator that cannot be zero, which restricts our maneuvers to zeroing out entries in row k and column £ only when pk # pr. So we do not necessarily arrive at a diagonal matrix. More carefully we can write j-1 T (y5) = bigyi + pjyj i=1 *s Pi=Pj Version 2.02  Subsection JCF.GESD Generalized Eigenspace Decomposition 648 where the big are our new coefficients after repeated changes, the yj are the new basis vectors, and the condition "i : pi = Pi" means that we only have terms in the sum involving vectors whose final coefficients are identical diagonal values (the eigenvalues). Now reorder the basis vectors carefully. Group together vectors that have equal diagonal entries in the matrix representation, but within each group preserve the order of the precursor basis. This grouping will create a block diagonal structure for the matrix representation, while otherwise preserving the order of the basis will retain the upper triangular form of the representation. So we can arrive at a basis that yields a matrix representation that is upper triangular and block diagonal, with the diagonal entries of each block all equal to a common eigenvalue of the linear transformation. More carefully, employing the distinct eigenvalues of T, Xi, 1 < i < m, we can assert there is a set of basis vectors for V, uij, 1 < i < m, 1 < j ar (Ai), such that j-1 T (uij) = b3iju1 -+ A ui k=1 So the subspace U= ({ui 1 < j ar (Ai)}), 1 < i < m is an invariant subspace of V relative to T and the restriction T uz has an upper triangular matrix representation relative to the basis {u3| 1 < j a (Ai) } where the diagonal entries are all equal to Ai. 
Notice too that with this definition, V=U1ie U2 e(U3 (D-.-.-DeUm Whew. This is a good place to take a break, grab a cup of coffee, use the toilet, or go for a short stroll, before we show that Ui is a subspace of the generalized eigenspace CT (Ai). This will follow if we can prove that each of the basis vectors for Ui is a generalized eigenvector of T for Ai (Definition GEV [631]). We need some power of T - AjIy that takes uij to the zero vector. We prove by induction on j (Technique I [694]) the claim that (T - AgIV)3 (uij) = 0. For j =1 we have, (T - AiIV) (ui)= T (usi) - AiIV (usi) = T (usi) - Aguil = Aiugi - Aiuil = 0 For the induction step, assume that if k < j, then (T - ANIv)k takes uik to the zero vector. Then (T - AgIV)3 (uij) = (T - AiIV)-1 ((T - Ai IV) (uij)) = (T - AiIV)-1 (T (uij) - AiIV (uij)) = (T - AIV)j-1 (T (ui.) - Aiui) = (T - AiIy) - (i bigkuik +| Aiui3 - Aiuii) i T-AIy)~ (jbijkuik) j1-1 = S b Jk (T - AgIy)~ (uk) k=1 j-1 =E b sk (T - AZIy --k(T - AiIy)k (uik) k=1 j-1 => b Jk (T - AiIv)J k (0) k=1 Version 2.02  Subsection JCF.GESD Generalized Eigenspace Decomposition 649 j-1 =( bijk0 k=1 =0 This completes the induction step. Since every vector of the spanning set for Ui is an element of the subspace gT (Xi), Property AC [279] and Property SC [279] allow us to conclude that Ui c gr (Ai). Then by Definition S [292], Ui is a subspace of gT (Ai). Notice that this inductive proof could be interpreted to say that every element of Ui is a generalized eigenvector of T for Xi, and the algebraic multiplicity of As is a sufficiently high power to demonstrate this via the definition for each vector. We are now prepared for our final argument in this long proof. We wish to establish that the dimension of the subspace gT (Al) is the algebraic multiplicity of Ai. This will be enough to show that Ui and gr (Ai) are equal, and will finally provide the desired direct sum decomposition. We will prove by induction (Technique I [694]) the following claim. Suppose that T: V H V is a linear transformation and B is a basis for V that provides an upper triangular matrix representation of T. The number of times any eigenvalue A occurs on the diagonal of the representation is greater than or equal to the dimension of the generalized eigenspace CT (A). We will use the symbol m for the dimension of V so as to avoid confusion with our notation for the nullity. So dim V = m and our proof will proced by induction on m. Use the notation #T(A) to count the number of times A occurs on the diagonal of a matrix representation of T. We want to show that #T(A) > dim (gT (A)) = dim (1C((T - A)m)) Theorem GEK [632] = n ((T - A)m) Definition NOLT [517] For the base case, dim V = 1. Every matrix representation of T is an upper triangular matrix with the lone eigenvalue of T, A, as the diagonal entry. So #T(A) =1. The generalized eigenspace of A is not trivial (since by Theorem GEK [632] it equals the regular eigenspace), and is a subspace of V. With Theorem PSSD [358] we see that dim (gT (A)) =1. Now for the induction step, assume the claim is true for any linear transformation defined on a vector space with dimension m - 1 or less. Suppose that B = {vi, v2, v3, ..., vm} is a basis for V that yields a diagonal matrix representation for T with diagonal entries A1, A2, A3, ..., Am. Then U ({Vi, V2, v3, ... , Vm-1}) is a subspace of V that is invariant relative to T. The restriction Tu u: U 1 U is then a linear transformation defined on U, a vector space of dimension m - 1. 
A matrix representation of T u relative to the basis C = {vi, v2, v3, ..., Vm-1} will be an upper triangular matrix with diagonal en- tries Ai, A2, A3, ..., Am-1. We can therefore apply the induction hypothesis to T u and its representation relative to C. Suppose that A is any eigenvalue of T. Then suppose that v E K((T - Ay)m). As an element of V, we can write v as a linear combination of the basis elements of B, or more compactly, there is a vector u E U and a scalar a~ such that V =U + avmn. Then, = a(T -AI)m (vm) Theorem EOMP [421] = 0+ a (T -AI)m (vm) Property Z [280] =- (T -AI)m (u) + (T -AIy)m (u) + a (T-AI)m (vm) Property AI [280] =- (T - AIy)m (u) + (T - AIy)m (u + avm) Theorem LTL C [462] - (T - AIv)m (u) + (T - AIv)m (v) Theorem LTLC [462] - (T - AIv)m (u) + 0 Definition KLT [481] -(T - AI)m (u) Property Z [280] Version 2.02  Subsection JCF.GESD Generalized Eigenspace Decomposition 650 The final expression in this string of equalities is an element of U since U is invariant relative to both T and Iy. The expression at the beginning is a scalar multiple of vm, and as such cannot be a nonzero element of U without violating the linear independence of B. So a i(Am- A)m Vm = 0 The vector vm is nonzero since B is linearly independent, so Theorem SMEZV [287] tells us that a (Am - A)tm 0. From the properties of scalar multiplication, we are confronted with two possibilities. Our first case is that A - Am. Notice then that A occurs the same number of times along the diagonal in the representations of T U and T. Now a = 0 and v = u+0vm = u. Since v was chosen as an arbitrary element of C((T - AIv)m), Definition SSET [683] says that C((T - AIv)m) C U. It is always the case that K((Tlu - AIu)m) C ((T - AIv)tm). However, we can also see that in this case, the opposite set inclusion is true as well. By Definition SE [684] we have K((T u - AIU)m) = K((T - AIv)m). Then #T(A)= #TlU(A) > dim (gTri (A)) = dim (((Tu - AIU)m-1) = dim (K((Tu - AIU)m)) =dim (K((T - AIV)m)) =dim (gT (A)) The second case is that A =Am. Notice then that A occurs one representation of T compared to the representation of T u. Then (Tu - AI)m (u) =_(T - AIv)m (u) = (T - AIv)m (u) + 0 Induction Hypothesis Theorem GEK [632] Theorem KPLT [616] Theorem GEK [632] more time along the diagonal in the Property Z [280] Theorem ZSSM [286] Theorem EOMP [421] Theorem LTLC [462] Definition KLT [481] (T - AIV)m (u) + a(Am (T - AIv)m (u) + a (T - (T - AIV)m (u + avm) (T - AIv)m (v) 0 - A)mVm AIV)m (vm) So u E C(T u - AIu). The vector v is an arbitrary member of K((T - AIv)"') and is also equal to an element of C(T u - AIu) (u) plus a scalar multiple of the vector vm. This observation yields dim (K((T - AIv)tm)) < dim (K(T u - AIu)) + 1 Now count eigenvalues on the diagonal, #T(A)= #TlU(A) +1 dim (gTri (A)) + 1 = dim (Kc((T u - AIU)m-1 + 1 =dim (K((Tu - AIu)m)) + 1 dim (K((T - AIv)tm)) = dim (gT (A)) Induction Hypothesis Theorem GEK [632] Theorem KPLT [616] Theorem GEK [632] Version 2.02  Subsection JCF.JCF Jordan Canonical Form 651 In Theorem UTMR [602] we constructed an upper triangular matrix representation of T where each eigenvalue occurred aT (A) times on the diagonal. So aT (A2) = #T(Ai) Theorem UTMR [602] > dim (gT (A2)) > dim (UZ) Theorem PSSD [358] = aT (A2) Theorem PSSD [358] Thus, dim (gT (A2)) = aT (AZ) and by Theorem EDYES [358], U = gT (A2) and we can write V = Ui1e U2 e U3 e -.-.-E)Um = gT(Al)(E)gT(A2) (E)T (A3)e(D- -'-(e gT(Am) Besides a nice decomposition into invariant subspaces, this proof has a bonus for us. 
Theorem DGES
Dimension of Generalized Eigenspaces
Suppose T: V → V is a linear transformation with eigenvalue λ. Then the dimension of the generalized eigenspace for λ is the algebraic multiplicity of λ, dim(G_T(λ)) = α_T(λ).

Proof   At the very end of the proof of Theorem GESD [644] we obtain the inequalities

α_T(λ) ≤ dim(G_T(λ)) ≤ α_T(λ)

which establish the desired equality.

Subsection JCF
Jordan Canonical Form

Now we are in a position to define what we (and others) regard as an especially nice matrix representation. The word "canonical" has at its root the word "canon," which has various meanings. One is the set of laws established by a church council. Another is a set of writings that are authentic, important or representative. Here we take it to mean the accepted, or best, representative among a variety of choices. Every linear transformation admits a variety of representations, and we will declare one as the best. Hopefully you will agree.

Definition JCF
Jordan Canonical Form
A square matrix is in Jordan canonical form if it meets the following requirements:

1. The matrix is block diagonal.

2. Each block is a Jordan block.

3. If ρ < λ, then the block J_k(ρ) occupies rows with indices greater than the indices of the rows occupied by J_ℓ(λ).

4. If ρ = λ and ℓ < k, then the block J_ℓ(λ) occupies rows with indices greater than the indices of the rows occupied by J_k(λ).
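Definition JCF is mechanical enough to check by machine. The following sketch is an illustration only, not code from the text; the helper names jordan_block, parse_blocks and is_jordan_canonical are invented, and the ordering test assumes the eigenvalues are real so that "larger" is unambiguous. It scans the diagonal and superdiagonal for Jordan blocks, rebuilds the matrix from the blocks it found (requirements 1 and 2), and then checks the two ordering requirements (3 and 4).

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(lam, k):
    """The k x k Jordan block J_k(lam): lam on the diagonal, ones on the superdiagonal."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

def parse_blocks(J):
    """Greedy scan of the diagonal for Jordan blocks; returns (eigenvalue, size) pairs."""
    n = len(J)
    blocks, i = [], 0
    while i < n:
        lam, k = J[i, i], 1
        while i + k < n and J[i + k - 1, i + k] == 1 and J[i + k, i + k] == lam:
            k += 1
        blocks.append((lam, k))
        i += k
    return blocks

def is_jordan_canonical(J):
    J = np.asarray(J, dtype=float)
    blocks = parse_blocks(J)
    # Requirements 1 and 2: rebuilding the matrix from the parsed blocks must
    # reproduce J exactly (block diagonal, every block a Jordan block).
    rebuilt = block_diag(*[jordan_block(lam, k) for lam, k in blocks])
    if not np.array_equal(rebuilt, J):
        return False
    # Requirements 3 and 4: eigenvalues in decreasing order, and for a repeated
    # eigenvalue the larger blocks come first.
    return blocks == sorted(blocks, key=lambda b: (-b[0], -b[1]))

J = block_diag(jordan_block(2, 2), jordan_block(2, 1), jordan_block(-1, 3))
print(is_jordan_canonical(J))                                            # True
print(is_jordan_canonical(block_diag(jordan_block(-1, 3), jordan_block(2, 2))))
# False: the eigenvalue order violates requirement 3
```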
The largest of the Jordan blocks will have size equal to the index of the nilpotent linear transformation T gT(A) - AIg,(A) which is exactly the definition of the index of the eigenvalue A (Definition IE [641]). U Before we do some examples of this result, notice how close Jordan canonical form is to a diagonal matrix. Or, equivalently, notice how close we have come to diagonalizing a matrix (Definition DZM [435]). We have a matrix representation which has diagonal entries that are the eigenvalues of a matrix. Each occurs on the diagonal as many times as the algebraic multiplicity. However, when the geometric multiplicity is strictly less than the algebraic multiplicity, we have some entries in the representation just above the diagonal (the "superdiagonal"). Furthermore, we have some idea how often this happens if we know the geometric multiplicity and the index of the eigenvalue. We now recognize just how simple a diagonalizable linear transformation really is. For each eigenvalue, the generalized eigenspace is just the regular eigenspace, and it decomposes into a direct sum of one- dimensional subspaces, each spanned by a different eigenvector chosen from a basis of eigenvectors for the eigenspace. Some authors create matrix representations of nilpotent linear transformations where the Jordan block has the ones just below the diagonal (the "subdiagonal"). No matter, it is really the same, just different. We have also defined Jordan canonical form to place blocks for the larger eigenvalues earlier, and for blocks with the same eigenvalue, we place the bigger ones earlier. This is fairly standard, but there is no reason we Version 2.02  Subsection JCF.JCF Jordan Canonical Form 653 couldn't order the blocks differently. It'd be the same, just different. The reason for choosing some ordering is to be assured that there is just one canonical matrix representation for each linear transformation. Example JCF1O Jordan canonical form, size 10 Suppose that T: C10 H C10 is the linear transformation defined by T (x) = Ax where A -6 -3 8 -7 0 3 -1 3 0 -4 9 5 -9 9 -1 2 3 -4 2 4 -7 -3 8 -7 0 1 -3 3 0 -5 -5 -1 6 -5 -1 2 -2 2 0 -4 5 2 0 0 -3 9 4 1 2 -1 12 7 -14 13 -2 -1 3 -5 2 6 -22 -12 25 -23 3 1 -6 9 -4 -11 14 9 -13 13 -4 5 4 -5 4 4 8 1 -4 2 -2 5 4 1 2 1 21 - 12 -26 24 -3 -5 3 -9 4 10 We'll find a basis for C10 that will yield a matrix representation of T in Jordan canonical form. First we find the eigenvalues, and their multiplicities, with the techniques of Chapter E [396]. A=2 A=0 A=-1 aT (2) = 2 aT (0) = 3 aT (-1) = 5 YT (2) = 2 'YT (-1) = 2 'yT (-1) = 2 For each eigenvalue, we can compute a generalized eigenspace. By Theorem GESD [644] we know that C10 will decompose into a direct sum of these eigenspaces, and we can restrict T to each part of this decomposition. At this stage we know that the Jordan canonical form will be block diagonal with blocks of size 2, 3 and 5, since the dimensions of the generalized eigenspaces are equal to the algebraic multiplicities of the eigenvalues (Theorem DGES [650]). The geometric multiplicities tell us how many Jordan blocks occupy each of the three larger blocks, but we will discuss this as we analyze each eigenvalue. We do not yet know the index of each eigenvalue (though we can easily infer it for A = 2) and even if we did have this information, it only determines the size of the largest Jordan block (per eigenvalue). We will press ahead, considering each eigenvalue one at a time. 
The eigenvalue A = 2 has "full" geometric multiplicity, and is not an impediment to diagonalizing T. We will treat it in full generality anyway. First we compute the generalized eigenspace. Since Theorem GEK [632] says that CT (2) = C((T - 2Ic1)10 we can compute this generalized eigenspace as a null space derived from the matrix A, (A - 21o)10 RREF Ti 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 -2 -1 -1 -1 1 2 -1 -2 1 0 -2 1 -1 0 0 1 0 0 0 0 Version 2.02  Subsection JCF.JCF Jordan Canonical Form 654 gT(2) =ICQ((A - 2Iio)1O) 2 1 -1 1 21 2 1 0 1 _0_ 1 1 -2 2 0 -1 0 -1 0 _1_ The restriction of T to GT (2) relative to the two basis vectors above has a matrix representation that is a 2 x 2 diagonal matrix with the eigenvalue A = 2 as the diagonal entries. So these two vectors will be the first two vectors in our basis for C10, vi 2 1 -1 1 -1 2 1 0 1 0. V2 1- 1 -2 2 0 -1 0 -1 0 1. Notice that it was not strictly necessary to compute the 10-th power of A - 2I1o. With oT (2) = yT (2) the null space of the matrix A - 2I10 contains all of the generalized eigenvectors of T for the eigenvalue A = 2. But there was no harm in computing the 10-th power either. This discussion is equivalent to the observation that the linear transformation Tjg(2): CT (2) H gT (2) is nilpotent of index 1. In other words, t (2) =1. The eigenvalue A = 0 will not be quite as simple, since the geometric multiplicity is strictly less than the geometric multiplicity. As before, we first compute the generalized eigenspace. Since Theorem GEK [632] says that gT (0) =c ((T - OIcio)10) we can compute this generalized eigenspace as a null space derived from the matrix A, (A - 0110)10 RREF, 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 -1 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 01 0 0 0 -1 -1 1 -2 1 -1 1 0 0 0 -1 0 2 -1 0 2 0 0 0 0 Version 2.02  Subsection JCF.JCF Jordan Canonical Form 655 gT (0) = C((A - 0110)10) K -0- 1 0 0 0 1 1 0 0 0 1- 1 -1 2 -1 1 0 -1 1 0 _ 1- 0 -2 1 0 -2 0 0 0 1. (F) So dim (9T (0)) = 3 = aT (0), as expected. We will use these three basis vectors for the generalized eigenspace to construct a matrix representation of TIgT(0), where F is being defined implicitly as the basis of gT (0). We construct this representation as usual, applying Definition MR [542], ( I PF T g(0 / ( PF TI T(0) ( 0 1 0 0 0 1 1 0 0 1- 1 -1 2 -1 1 0 -1 1 0 -11 0 -2 1 0 -2 0 0 0 _1 _ // \ PF / \ PF --1- 0 2 -1 0 2 0 0 0 -1 -1- 0 -2 1 0 -2 0 0 0 _1= -0- 0 0 0 0 0 0 0 0 _0_ j/ 1\ PF (-1) PF / / 1- 0 -2 1 0 -2 0 0 0 1 (1) 0 0 -1] 1 0 -2 1 0 -2 0 0 0 ~1 / 0 0 1- 7 \ \ ( j/ 1\ / \ / I PF T7'T(0) PF 0 0 0_ / I I So we have the matrix representation M = MT T() 0 0 -1 00 0 0 1t0 Version 2.02  Subsection JCF.JCF Jordan Canonical Form 656 By Theorem RGEN [640] we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem CFNLT [619] to M - (0)13. First check that (M - (0)I3)2 = 0, so we know that the index of M - (0)I3 as a nilpotent matrix, and that therefore A = 0 is an eigenvalue of T with index 2, oT (0) = 2. To determine a basis of C3 that converts M - (0)13 to canonical form, we need the null spaces of the powers of M - (0)I3. For convenience, set N = M - (0)13. 
.1_ 0 N(N1) = 1 ,o 0 -1 -1_ 0 0 N (N2)= 0 ,1 ,0 =C3 Then we choose a vector from N(N2) that is not an element of N(N1). Any vector with unequal first two entries will fit the bill, say z2,1 = 0 0 where we are employing the notation in Theorem CFNLT [619]. The next step is to multiply this vector by N to get part of the basis for N(N1), 0 0 0 1 0 zi,1= Nz2,1= [0 0 0 [ = 0 -1 1 0_ 0 -1_ We need a vector to pair with zil, that will make a basis for the two-dimensional subspace N(N1). Examining the basis for N(N1) we see that a vector with its first two entries equal will do the job. 1 z1,2 [t] -0 Reordering, we find the basis, C, = f{zi,1,z2,1,zi,2}[ =n0I,[0l,1 From this basis, we can get a matrix representation of N (when viewed as a linear transformation) relative to the basis C for C3, [000] = [J2 (0) JO0) Now we add back the eigenvalue A =0 to the representation of N to obtain a representation for M. Of course, with an eigenvalue of zero, the change is not apparent, so we won't display the same matrix again. This is the second block of the Jordan canonical form for T. However, the three vectors in C will not suffice as basis vectors for the domain of T they have the wrong size! The vectors in C are vectors in the domain of a linear transformation defined by the matrix M. But M was a matrix representation of Version 2.02  Subsection JCF.JCF Jordan Canonical Form 657 T gT () - 0IgT () relative to the basis F for CT (0). We need to "uncoordinatize" each of the basis vectors in C to produce a linear combination of vectors in F that will be an element of the generalized eigenspace CT (0). These will be the next three vectors of our final answer, a basis for C10 that has a pleasing matrix representation. 1 0- 0 0 1 0° V3 = pF, 0 =0 y - 1-1 0 0 -0- 0 0 0 1 0 V4 = pF10 = 1 00 - - 1 0 0 0 0 -0 v5 = pF1 1 = 1 1 0 0 _0_] 1- 1 -1 2 -1 1 0 -1 +0 + (-1) -1- 0 -2 1 0 -2 0 0 0 _1 . --1- 0 2 -1 0 2 0 0 0 -1 1 _Lo _ -a1l +0 -1 2 -1 1 0 -1 1 0 1 1 -1 2 -1 1 0 -1 1 0 +0 1 0 -2 1 0 -2 0 0 0 1 1 0 -2 1 0 -2 0 0 0 1 -0- 1 0 0 0 1 1 0 0 0 1- 2 -1 2 -1 2 1 -1 1 0 +1 +0 Five down, five to go. Basis vectors, that is. A = -1 is the smallest eigenvalue, but it will require the most computation. First we compute the generalized eigenspace. Since Theorem GEK [632] says that gQ (-1) = C((T - (-1)Icio) 0) we can compute this generalized eigenspace as a null space derived from the matrix A, (A-(-1)110)'° RREF 0 1 0 Q1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 01 0 0 0 0 0 0 0 1 0 1 0 1 0 02 Q 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 1 -2 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 Q 0 0 0 0 0 0 0 0 0 0 1 0 -2 2 0 0 0 0 0 0 Version 2.02  Subsection JCF.JCF Jordan Canonical Form 658 gT (-1) =C((A - (-1)I1o)1o) K -1 0 1 0 0 0 0 0 0 _0 _ -1- -1 0 -1 1 0 0 0 0 0 _ 1" 0 0 -1 0 2 1 0 0 0 --1- -1 0 0 0 -1 0 1 0 _0. -1 0 0 2 0 -2 0 0 0 _1_ I (F) So dim (gT (-1)) = 5 = aT (-1), as expected. We will use these five basis vectors for the generalized eigenspace to construct a matrix representation of T gT(_1), where F is being recycled and defined now implicitly as the basis of gT (-1). 
We construct this representation as usual, applying Definition MR [542], 1' / PF I9T(-1) -1- 0 1 0 0 0 0 0 0 0 / PF / PF 0 -1- 0 1 0 0 0 0 0 0 0 +0 ill -1- -1 0 -1 1 0 0 0 0 0 +(-2) -1 0 0 0 0 -2 -2 0 0 -1 1 0 0 -1 0 2 1 0 0 0 7 1 -5 3 -1 2 4 0 0 3 / +0 -1 -1 0 0 0 -1 0 1 0 0 +(-1) --1 0 0 2 0 -2 0 0 0 _1_ 0 0 -2 0 -1 1 ( PF T IT(-1) -1- -1 0 -1 1 0 0 0 0 0 / PF II I Version 2.02  Subsection JCF.JCF Jordan Canonical Form 659 ' PF (-5) ~-1 0 1 0 0 0 0 0 0 0 ire + (-1 --1- -1 0 -1 1 0 0 0 0 ~0 PF +4 ~1 0 0 -1 0 2 1 0 0 0 +0 - 1- -1 0 0 0 -1 0 1 0 0 +3 ~-1~ 0 0 2 0 -2 0 0 0 ~1 -5 -1 4 0 3_ / ( 1 PF T IgT (-1) 0 0 -1 0 2 1 0 0 1, 0 -1 / ~-1 0 1 0 0 0 0 0 0 0 0 II PF (-1) +0 - 1- -1 0 -1 1 0 0 0 0 0 +1 1 0 0 1 0 0 1 1 0 0 -1 0 2 1 0 0 0 -1 0 2 -2 -1 1 -1 1 0 -2 I +0 ~-1= -1 0 0 0 -1 0 1 0 0 +1 -1- 0 0 2 0 -2 0 0 0 1= -1 0 1 0 1 I ( / PF T IgT (-1) - 1- -1 0 0 0 -1 0 1 0 0 / PF II / Version 2.02  Subsection JCF.JCF Jordan Canonical Form 660 -1 -1 1 -1 -1 0 -1 0 -1 0 1 0 0 0 0 2 0 -1 -1 0 2 00 1 0 0 0 - =PF 2 0 +(-1) 0 +(-1) 2 +1 _- +(-2) -2 0 0 1 0 0 -2 0 0 0 1 0 -2 0 0 0 0 0 0 0 0 0 1 -1 -7 0 -1 0 6 2 -5 0 -1 PF TgT(-1) -2 PF -2 0 -6 0 2 0 0 1 -6 -1 -1 1 -1 -1 0 -1 0 -1 0 1 0 0 0 0 6 0 -1 -1 0 2 =F 0 1 0 0 0 - 6 PF 6 0 +(-1) 0 +(-6) +2 -1 +(-6) -2 26 0 0 1 0 0 2 0 0 0 1 0 -6 0 0 0 0 0 0 0 0 0 1 So we have the matrix representation of the restriction of T (again recycling and redefining the matrix M) 0 -5 -1 2 6 0 -1 0 -1 -1 M=MFTT(1 -2 4 1 -1 -6 0 0 0 1 2 -1 3 1 -2 -6_ By Theorem RGEN [640] we can obtain a nilpotent matrix from this matrix representation by subtracting the eigenvalue from the diagonal elements, and then we can apply Theorem CFNLT [619] to M - (-1)I5. First check that (M - (-1)Is)3 0, so we know that the index of M - (-1)Is as a nilpotent matrix, and that therefore A =-1 is an eigenvalue of T with index 3, T (-1) =3. To determine a basis of C5 that converts M - (-1)I5 to canonical form, we need the null spaces of the powers of M - (-1)Is. Again, for convenience, set N= M - (-1)I5- 1 -3 0 1 N(Nl) = 1 ,0 0 -2 _0_ _2 _ Version 2.02  Subsection JCF.JCF Jordan Canonical Form 661 3 1 0 -3 1 0 0 0 N(N2) = 0r1,0n,(0 0 0 1 0 _0_ 0_ 0_ 1 _ 1 0 0 0 0 0 1 0 0 0 A(N3)= 0, 0 -1 2 0 0 =C 0 0 0 1 0 -1 3 1 -2 - 1 - Then we choose a vector from (N3) that is not an element of n (N2). The sum of the four basis vectors for (Nr2) sum to a vector with all five entries equal to 1. We will mess with the first entry to create a vector not in N(N2), 0 1 Z3,1= 1 1 whe ar f he employing the notation in Theorem CFNLT [619]. The next step is to multiply this vector by N to get a portion of the basis for(N2), 1 -5 -1 2 6 0 2 0 0 0 -1 -1 1 -2 z2,1 = Nz3,1= -2 4 2 -1 -6 1 = -1 0 0 0 2 2 1 4 -1 3 1 -2 -5_ 1_ _-3_ We have a basis for the two-dimensional subspace N(N1) and we can add to that the vector z2,1 and we have three of four basis vectors for NJ~(N2) . These three vectors span the subspace we call Q2. We need a fourth vector outside of Q2 to complete a basis of the four-dimensional subspace N(N2) . 
Check that the vector 3 1 z2,2 = 3 1 3 3 -1 -2 z1,1 =Nz2,1 = 0 zi,2 = Nz2,2 = -3 2 4 -2_ -4 Version 2.02  Subsection JCF.JCF Jordan Canonical Form 662 Now we reorder these basis vectors, to arrive at the basis 3 2 0 3 3- -1 -2 1 -2 1 C = {Z1,1, Z2,1, Z3,1, Z1,2, Z2,2} = 0 , -1 , 1 , -3 , 3 2 4 1 4 1 -2_ -3_ 1_ -4_ 1_ A matrix representation of N relative to C is 0 1 0 0 0 0 0 1 0 0 J3(0) 0 0 0 0 0 1 0 J2(0) _0 0 0 0 0_ To obtain a matrix representation of M, we add back in the matrix (-1)15, placing the eigenvalue back along the diagonal, and slightly modifying the Jordan blocks, -1 1 0 0 0 0 -1 1 0 0 J- 0 0 -1 0 0 [J3(-1) 01 0 0 0 -1 1 (-1)- 0 0 0 0 -1 The basis C yields a pleasant matrix representation for the restriction of the linear transformation T - (-1)I to the generalized eigenspace CT (-1). However, we must remember that these vectors in C5 are representations of vectors in C10 relative to the basis F. Each needs to be "un-coordinatized" before joining our final basis. Here we go, -1 -1 1 -1 -1 -2- 0 -1 0 -1 0 -1 --1 0 0 0 0 3 0 -1 -1 0 2 -3 1= 0 1 0 0 0 -1 V6 =p-F 0 =3 0 +(-1) +0 2 +2 1 +(-2) -2 2 0 0 1 0 0 0 20 0 0 1 0 2 0 0 0 0 0 0 0 -1 0 -1 1 -2 (-1- [2 [0l [0 [2 0 K] 0 1 0 0 0 -2 v 0 +(-2) 0 +(-1) 2 +4-1 +(-3) -2 0 V7~P~[1jJ4 0 0 1 0 0 -1 0 0 0 1 0 4 0 0 0 0 0 0 0 _ _ 0 _ 0 _ _ 0 _ _ 1 _ -3_ Version 2.02  Subsection JCF.JCF Jordan Canonical Form 663 Subsection JCF.JCF Jordan Canonical Form 663 0 1 V8 pF, 1 1 1 0 3 -2 V9 = pF -3 4 -4 ~-1 0 1 0 0 0 0 0 0 -o -1- 0 1 0 0 0 0 0 0~ ~-1 0 1 0 0 0 0 0 0 -o +1 ~-1- -1 0 -1 1 0 0 0 0 ~0 + (-2 +1 -1~ -1 0 -1 1 0 0 0 0 ~0 +3 -1- 0 0 -1 0 2 1 0 0 =0 +1 + (-s> -1~ -1 0 0 0 -1 0 1 0 0~ 01 0 -1 0 2 1 0 0 -o -1 -1 0 0 0 -1 0 1 0 0~ +4 +1 -1- 0 0 2 0 -2 0 0 0 1~ ~-1 -1 0 0 0 -1 0 1 0 -o -1- 0 0 2 0 -2 0 0 0 1~ + (- -2- 02 0 1 -1 -1 1 0 ~1 -1 0 4 0 --4- -2 3 -3 -2 -2 -3 4 0 L-4 3 1 V1p=pF, 3 1 1 3 +1 ~-1- -1 0 -1 1 0 0 0 0 0 -1- 0 0 -1 0 2 1 0 0 =0 +1 To summarize, we list the entire basis B= {v1, v2, v3, ... , vio}, VI 1 -1 -1 -1 2 1 0 1 0 V2 ~1 1 -2 2 0 -1 0 -1 0 ~1 ~-1~ 0 2 -1 0 2 0 0 0 -1 V4 ~0- 1 0 0 0 1 1 0 0 ~0 V5 -1- 2 -1 2 -1 2 1 -1 1 0 Version 2.02  Subsection JCF.CHT Cayley-Hamilton Theorem 664 -2 -2 -2 -4 -3 -1 -2 -2 -2 -2 3 2 0 3 3 -3 -3 0 -3 -2 -1 -2 1 -2 1 V 6 = 2 V 7 = 0 V 8 -_ V g = - 2 V i 3 0 -1 1 -3 3 2 4 1 4 1 0 0 0 0 0 -2 -3 1 -4 1 The resulting matrix representation is 2 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 MT 0 0 0 0 0 0 0 0 0 0 BB 0 0 0 0 0 -1 1 0 0 0 0 0 0 0 0 0 -1 1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 -1 1 0 0 0 0 0 0 0 0 0 -1 If you are not inclined to check all of these computations, here are a few that should convince you of the amazing properties of the basis B. Compute the matrix-vector products Ave, 1 < i < 10. In each case the result will be a vector of the form Avi + ovi_1, where A is one of the eigenvalues (you should be able to predict ahead of time which one) and b E {0,1}. Alternatively, if we can write inputs to the linear transformation T as linear combinations of the vectors in B (which we can do uniquely since B is a basis, Theorem VRRB [317]), then the "action" of T is reduced to a matrix-vector product with the exceedingly simple matrix that is the Jordan canonical form. Wow! Subsection CHT Cayley-Hamilton Theorem Jordan was a French mathematician who was active in the late 1800's. Cayley and Hamilton were 19th- century contemporaries of Jordan from Britain. The theorem that bears their names is perhaps one of the most celebrated in basic linear algebra. 
While our result applies only to vector spaces and linear transformations with scalars from the set of complex numbers, C, the result is equally true if we restrict our scalars to the real numbers, R. It says that every matrix satisfies its own characteristic polynomial.

Theorem CHT
Cayley-Hamilton Theorem
Suppose A is a square matrix with characteristic polynomial P_A(x). Then P_A(A) = 0.

Proof   Suppose B and C are similar matrices via the matrix S, B = S^{-1}CS, and q(x) is any polynomial. Then q(B) is similar to q(C) via S, q(B) = S^{-1}q(C)S. (See Example HPDM [441] for hints on how to convince yourself of this.)

By Theorem JCFLT [651] and Theorem SCB [583] we know A is similar to a matrix, J, in Jordan canonical form. Suppose λ_1, λ_2, λ_3, ..., λ_m are the distinct eigenvalues of A (and are therefore the eigenvalues and diagonal entries of J). Then by Theorem EMRCP [404] and Definition AME [406], we can factor the characteristic polynomial as

P_A(x) = (x - λ_1)^{α_A(λ_1)} (x - λ_2)^{α_A(λ_2)} (x - λ_3)^{α_A(λ_3)} ... (x - λ_m)^{α_A(λ_m)}

On substituting the matrix J we have

P_A(J) = (J - λ_1 I)^{α_A(λ_1)} (J - λ_2 I)^{α_A(λ_2)} (J - λ_3 I)^{α_A(λ_3)} ... (J - λ_m I)^{α_A(λ_m)}

The matrix J - λ_k I will be block diagonal, and the block arising from the generalized eigenspace for λ_k will have zeros along the diagonal. Suitably adjusted for matrices (rather than linear transformations), Theorem RGEN [640] tells us this matrix is nilpotent. Since the size of this nilpotent matrix is equal to the algebraic multiplicity of λ_k, the power (J - λ_k I)^{α_A(λ_k)} will be a zero matrix (Theorem KPNLT [617]) in the location of this block. Repeating this argument for each of the m eigenvalues will place a zero block in some term of the product at every location on the diagonal. The entire product will then be zero blocks on the diagonal, and zero off the diagonal. In other words, it will be the zero matrix. Since A and J are similar, P_A(A) = P_A(J) = 0.  ■

Annotated Acronyms R
Representations

Definition VR [530]
Matrix representations build on vector representations, so this is the definition that gets us started. A representation depends on the choice of a single basis for the vector space. Theorem VRRB [317] is what tells us this idea might be useful.

Theorem VRILT [535]
As an invertible linear transformation, vector representation allows us to translate, back and forth, between abstract vector spaces (V) and concrete vector spaces (C^n). This is key to all our notions of representations in this chapter.

Theorem CFDVS [535]
Every vector space with finite dimension "looks like" a vector space of column vectors. Vector representation is the isomorphism that establishes that these vector spaces are isomorphic.

Definition MR [542]
Building on the definition of a vector representation, we define a representation of a linear transformation, determined by a choice of two bases, one for the domain and one for the codomain. Notice that vectors are represented by columnar lists of scalars, while linear transformations are represented by rectangular tables of scalars. Building a matrix representation is as important a skill as row-reducing a matrix.

Theorem FTMR [544]
Definition MR [542] is not really very interesting until we have this theorem. The second form tells us that we can compute outputs of linear transformations via matrix multiplication, along with some bookkeeping for vector representations; a small computational sketch of this idea follows.
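For instance, here is a minimal Mathematica sketch of that bookkeeping. Everything in it (the transformation t, the bases bDomain and bCodomain, and the helper vectRep) is invented purely for illustration and is not an example from the text:

(* a hypothetical linear transformation T : C^3 -> C^2, defined by matrix-vector multiplication *)
t[v_] := {{1, 2, 0}, {0, -1, 3}}.v

(* invented bases for the domain and codomain, entered as lists of basis vectors *)
bDomain = {{1, 0, 0}, {1, 1, 0}, {1, 1, 1}};
bCodomain = {{1, 1}, {0, 1}};

(* vector representation: the coordinates of a vector relative to a basis *)
vectRep[basis_, v_] := LinearSolve[Transpose[basis], v]

(* matrix representation: columns are representations of T applied to the domain basis vectors *)
matRep = Transpose[vectRep[bCodomain, t[#]] & /@ bDomain];

(* Theorem FTMR's bookkeeping: represent the input, multiply, then un-coordinatize *)
w = {3, -1, 2};
Transpose[bCodomain].(matRep.vectRep[bDomain, w]) == t[w]    (* True *)

The final comparison is exactly the point of the theorem: a representation of the input, a matrix-vector product, and an un-coordinatization recover the output of the linear transformation.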
Searching forward through the text on "FTMR" is an interesting exercise. You will find reference to this result buried inside many key proofs at critical points, and it also appears in numerous examples and solutions to exercises.

Theorem MRCLT [549]
It turns out that matrix multiplication is a very natural operation: it is just the chaining together (composition) of functions (linear transformations). Beautiful. Even if you don't try to work the problem, study Solution MR.T80 [572] for more insight.

Theorem KNSI [552]
Kernels "are" null spaces. For this reason you'll see these terms used interchangeably.

Theorem RCSI [555]
Ranges "are" column spaces. For this reason you'll see these terms used interchangeably.

Theorem IMR [557]
Invertible linear transformations are represented by invertible (nonsingular) matrices.

Theorem NME9 [560]
The NMEx series has always been important, but we've held off saying so until now. This is the end of the line for this one, so it is a good time to contemplate all that it means.

Theorem SCB [583]
Diagonalization back in Section SD [432] was really a change of basis to achieve a diagonal matrix representation. Maybe we should be highlighting the more general Theorem MRCB [581] here, but its overly technical description just isn't as appealing. However, it will be important in some of the matrix decompositions in Chapter MD [822].

Theorem EER [586]
This theorem, with the companion definition, Definition EELT [574], tells us that eigenvalues and eigenvectors are fundamentally a characteristic of linear transformations (not matrices). If you study matrix decompositions in Chapter MD [822] you will come to appreciate that almost all of a matrix's secrets can be unlocked with knowledge of the eigenvalues and eigenvectors.

Theorem OD [607]
Can you imagine anything nicer than an orthonormal diagonalization? A basis of pairwise orthogonal, unit norm eigenvectors that provide a diagonal representation for a matrix? Here we learn just when this can happen: precisely when a matrix is normal, which is a disarmingly simple property to define.

Theorem CFNLT [619]
Nilpotent linear transformations are the fundamental obstacle to a matrix (or linear transformation) being diagonalizable. This specialized representation theorem is the fundamental expression of just how close we can come to surmounting the obstacle, i.e. how close we can come to a diagonal representation.

Theorem DGES [650]
This theorem is a long time in coming, but perhaps it best explains our interest in generalized eigenspaces. When the dimension of a "regular" eigenspace (the geometric multiplicity) does not meet the algebraic multiplicity of the corresponding eigenvalue, then a matrix is not diagonalizable (Theorem DMFE [438]). However, if we generalize the idea of an eigenspace (Definition GES [631]), then we arrive at invariant subspaces that together give a complete decomposition of the domain as a direct sum. And these subspaces have dimensions equal to the corresponding algebraic multiplicities.

Theorem JCFLT [651]
If you can't diagonalize, just how close can you come? This is an answer (there are others, like rational canonical form). "Canonicalism" is in the eye of the beholder. But this is a good place to conclude our study of a widely accepted canonical form that is possible for every matrix or linear transformation.
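The appendix that follows collects notes on doing these computations in Mathematica. As a small preview, and a companion to the theorem just discussed, Mathematica's built-in JordanDecomposition[] command will produce a Jordan canonical form directly. This sketch uses a matrix invented for illustration, not one from the text:

(* a small matrix, invented for illustration, that is not diagonalizable *)
m = {{2, 1, 0}, {0, 2, 0}, {0, 0, 2}};

(* JordanDecomposition returns {s, j} with m == s.j.Inverse[s] *)
{s, j} = JordanDecomposition[m];

MatrixForm[j]           (* the Jordan canonical form: a 2 x 2 block and a 1 x 1 block, both for eigenvalue 2 *)
m == s.j.Inverse[s]     (* True *)

Comparing j with what Theorem JCFLT [651] predicts for a single eigenvalue of algebraic multiplicity 3 and geometric multiplicity 2 is a worthwhile exercise.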
Appendix CN
Computation Notes

Section MMA
Mathematica

Computation Note ME.MMA
Matrix Entry
Matrices are input as lists of lists, since a list is a basic data structure in Mathematica. A matrix is a list of rows, with each row entered as a list. Mathematica uses braces ({ , }) to delimit lists. So the input

a = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}}

would create a 3 x 4 matrix named a that is equal to

[1  2  3  4]
[5  6  7  8]
[9 10 11 12]

To display a matrix named a "nicely" in Mathematica, type MatrixForm[a], and the output will be displayed with rows and columns. If you just type a, then you will get a list of lists, like how you input the matrix in the first place.

Computation Note RR.MMA
Row Reduce
If a is the name of a matrix in Mathematica, then the command RowReduce[a] will output the reduced row-echelon form of the matrix.

Computation Note LS.MMA
Linear Solve
Mathematica will solve a linear system of equations using the LinearSolve[] command. The inputs are a matrix with the coefficients of the variables (but not the column of constants), and a list containing the constant terms of each equation. This will look a bit odd, since the lists in the matrix are rows, but the column of constants is also input as a list and so looks like a row rather than a column. The result will be a single solution (even if there are infinitely many), reported as a list, or the statement that there is no solution. When there are infinitely many, the single solution reported is exactly that solution used in the proof of Theorem RCLS [53], where the free variables are all set to zero, and the dependent variables come along with values from the final column of the row-reduced matrix. As an example, Archetype A [702] is

x1 -  x2 + 2x3 = 1
2x1 + x2 +  x3 = 8
x1 +  x2       = 5

To ask Mathematica for a solution, enter

LinearSolve[ {{1, -1, 2}, {2, 1, 1}, {1, 1, 0}}, {1, 8, 5} ]

and you will get back the single solution

{3, 2, 0}

We will see later how to coax Mathematica into giving us infinitely many solutions for this system (Computation VFSS.MMA [669]).

Computation Note VLC.MMA
Vector Linear Combinations
Contributed by Robert Beezer
Vectors in Mathematica are represented as lists, written and displayed horizontally. For example, the column vector with entries 1, 2, 3, 4 would be entered and named via the command

v = {1, 2, 3, 4}

Vector addition and scalar multiplication are then very natural. If u and v are two lists of equal length, then

2u + (-3)v

will compute the correct vector and return it as a list. If u and v have different sizes, then Mathematica will complain about "objects of unequal length."

Computation Note NS.MMA
Null Space
Given a matrix A, Mathematica will compute a set of column vectors whose span is the null space of the matrix with the NullSpace[] command. Perhaps not coincidentally, this set is exactly {z_j | 1 <= j <= n - r}. However, Mathematica prefers to output the vectors in the opposite order from the one we have chosen. Here's a small example. Begin with the 3 x 4 matrix A, and its row-reduced version B,

A = [ 1  2 -1  0]              B = [1  0  3 -2]
    [ 3  4  1 -2]    RREF:         [0  1 -2  1]
    [-1  1 -5  3]                  [0  0  0  0]

We could extract entries from B to build the vectors z1 and z2 according to Theorem SSNS [118] and describe N(A) as a span of the set {z1, z2}. Instead, if a has been set to A, then executing the command NullSpace[a] yields the list of lists (column vectors),

{{2, -1, 0, 1}, {-3, 2, 1, 0}}

Notice how our z1 is second in the list. A complete session for this example is sketched below.
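Here is that session written out. The matrix is the one above, and the only command not already discussed is MatrixRank, a standard built-in used here just to confirm that n - r = 2 vectors should appear:

(* the 3 x 4 matrix A from above, entered as a list of rows *)
a = {{1, 2, -1, 0}, {3, 4, 1, -2}, {-1, 1, -5, 3}};

MatrixForm[RowReduce[a]]       (* the reduced row-echelon form B *)
MatrixRank[a]                  (* 2, so n - r = 4 - 2 = 2 null space vectors are expected *)
NullSpace[a]                   (* {{2, -1, 0, 1}, {-3, 2, 1, 0}} *)

(* each vector returned really is in the null space of A *)
a.# & /@ NullSpace[a]          (* {{0, 0, 0}, {0, 0, 0}} *)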
To "correct" this we can use a list-processing command from Mathematica, Reverse[], as follows,

Reverse[NullSpace[a]]

and receive the output in our preferred order. Give it a try yourself.

Computation Note VFSS.MMA
Vector Form of Solution Set
Suppose that A is an m x n matrix and b in C^m is a column vector. We might wish to find all of the solutions to the linear system LS(A, b). Mathematica's LinearSolve[A, b] will return at most one solution (Computation LS.MMA [668]). However, when the system is consistent, then this one solution reported is exactly the vector c, described in the statement of Theorem VFSLS [99]. The vectors u_j, 1 <= j <= n - r, of Theorem VFSLS [99] are exactly the output of Mathematica's NullSpace[] command, though Mathematica lists them in the opposite order from the order we have chosen. These are the same vectors listed as z_j, 1 <= j <= n - r, in Theorem SSNS [118]. With c produced from the LinearSolve[] command, and the u_j coming from the NullSpace[] command, we can use Mathematica's symbolic manipulation commands to create an expression that describes all of the solutions.

Begin with the system LS(A, b). Row-reduce A (Computation RR.MMA [667]) and identify the free variables by determining the non-pivot columns. Suppose, for the sake of argument, that we have the three free variables x3, x7 and x8. Then the following command will build an expression for an arbitrary solution:

LinearSolve[A, b] + {x8, x7, x3}.NullSpace[A]

Be sure to include the "dot" right before the NullSpace[] command; it has the effect of creating a linear combination of the vectors in the null space, using scalars that are symbols reminiscent of the variables.

A concrete example should help here. Suppose we want a solution set for the linear system with coefficient matrix A and vector of constants b,

A = [1  2  3 -5  1 -1  2]        b = [ 8]
    [2  4  0  8 -4  1 -8]            [ 1]
    [3  6  4  0 -2  5  7]            [-5]

If we were to apply Theorem VFSLS [99], we would extract the components of c and the u_j from the row-reduced version of the augmented matrix of the system (obtained with Mathematica, Computation RR.MMA [667]),

[1  2  0  4 -2  0 -5   2]
[0  0  1 -3  1  0  3   1]
[0  0  0  0  0  1  2  -3]

Instead, we will use this augmented matrix in reduced row-echelon form only to identify the free variables. In this example, we locate the non-pivot columns and see that x2, x4, x5 and x7 are free. If we have set a to the coefficient matrix and b to the vector of constants, then we execute the Mathematica command,

LinearSolve[a, b] + {x7, x5, x4, x2}.NullSpace[a]

As output we obtain the column vector (list),

[ 2 - 2x2 - 4x4 + 2x5 + 5x7 ]
[ x2                        ]
[ 1 + 3x4 - x5 - 3x7        ]
[ x4                        ]
[ x5                        ]
[ -3 - 2x7                  ]
[ x7                        ]

Computation Note GSP.MMA
Gram-Schmidt Procedure
Mathematica has a built-in routine that will do the Gram-Schmidt procedure (Theorem GSP [175]). The input is a set of vectors, which must be linearly independent. This is written as a list, containing lists that are the vectors. Let a be such a list of lists, containing the vectors v_i, 1 <= i <= p, from the statement of the theorem. You will need to first load the right Mathematica package.

Proof Technique I
Induction
Consider the theorem that, for every integer n >= 1,

1 + 2 + 3 + ... + n = n(n+1)/2

This is shorthand for the many statements

1 = 1(1+1)/2,   1 + 2 = 2(2+1)/2,   1 + 2 + 3 = 3(3+1)/2,   1 + 2 + 3 + 4 = 4(4+1)/2

and so on. Forever. You can do the calculations in each of these statements and verify that all four are true. We might not be surprised to learn that the fifth statement is true as well (go ahead and check). However, do we think the theorem is true for n = 872? Or n = 1,234,529?
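We can certainly test any single case by brute force. For instance, a quick Mathematica check of the two values just mentioned (a spot-check only; it proves nothing about the infinitely many remaining cases):

(* check the claimed formula for two particular values of n *)
Sum[i, {i, 1, 872}] == 872*873/2                 (* True *)
Sum[i, {i, 1, 1234529}] == 1234529*1234530/2     (* True *)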
To see that these questions are not so ridiculous, consider the following example from Rotman's Journey into Mathematics. The statement "n^2 - n + 41 is prime" is true for integers 1 <= n <= 40 (check a few). However, when we check n = 41 we find 41^2 - 41 + 41 = 41^2, which is not prime.

So how do we prove infinitely many statements all at once? More formally, let's denote our statements as P(n). Then, if we can prove the two assertions

1. P(1) is true.
2. If P(k) is true, then P(k + 1) is true.

then it follows that P(n) is true for all n >= 1. To understand this, I liken the process to climbing an infinitely long ladder with equally spaced rungs. Confronted with such a ladder, suppose I tell you that you are able to step up onto the first rung, and if you are on any particular rung, then you are capable of stepping up to the next rung. It follows that you can climb the ladder as far up as you wish. The first formal assertion above is akin to stepping onto the first rung, and the second formal assertion is akin to assuming that if you are on any one rung then you can always reach the next rung.

In practice, establishing that P(1) is true is called the "base case" and in most cases is straightforward. Establishing that P(k) implies P(k + 1) is referred to as the "induction step," and in this book (and elsewhere) we will typically refer to the assumption of P(k) as the "induction hypothesis." This is perhaps the most mysterious part of a proof by induction, since it looks like you are assuming (P(k)) what you are trying to prove (P(n)). Sometimes it is even worse, since as you get more comfortable with induction, we often don't bother to use a different letter (k) for the index (n) in the induction step. Notice that the second formal assertion never says that P(k) is true, it simply says that if P(k) were true, what might logically follow. We can establish statements like "If I lived on the moon, then I could pole-vault over a bar 12 meters high." This may be a true statement, but it does not say we live on the moon, and indeed we may never live there.

Enough generalities. Let's work an example and prove the theorem above about sums of integers. Formally, our statement is

P(n):  1 + 2 + 3 + ... + n = n(n+1)/2

Proof: Base Case. P(1) is the statement 1 = 1(1+1)/2, which we see simplifies to the true statement 1 = 1.

Induction Step: We will assume P(k) is true, and will try to prove P(k + 1). Given what we want to accomplish, it is natural to begin by examining the sum of the first k + 1 integers.

1 + 2 + 3 + ... + (k + 1) = (1 + 2 + 3 + ... + k) + (k + 1)
                          = k(k+1)/2 + (k + 1)              Induction Hypothesis
                          = (k^2 + k)/2 + (2k + 2)/2
                          = (k^2 + 3k + 2)/2
                          = (k+1)(k+2)/2
                          = (k+1)((k+1)+1)/2

We then recognize the two ends of this chain of equalities as P(k + 1). So, by mathematical induction, the theorem is true for all n.  ■

How do you recognize when to use induction? The first clue is a statement that is really many statements, one for each integer. The second clue would be that you begin a more standard proof and you find yourself using words like "and so on" (as above!) or lots of ellipses (dots) to establish patterns that you are convinced continue on and on forever. However, there are many minor instances where induction might be warranted but we don't bother.

Induction is important enough, and used often enough, that it appears in various variations. The base case sometimes begins with n = 0, or perhaps an integer greater than 1. Some formulate the induction step as P(k - 1) implies P(k).
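Whatever variation is used, the algebraic core of an induction step is routine symbol manipulation, and a computer algebra system can at least confirm that part. A minimal Mathematica sketch, checking only the key identity from the proof above (not the induction itself):

(* the identity at the heart of the induction step, verified for a symbolic k *)
Simplify[k (k + 1)/2 + (k + 1) == (k + 1) ((k + 1) + 1)/2]    (* True *)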
There is also a "strong form" of induction where we assume all of P(1), P(2), P(3), ..., P(k) as a hypothesis for showing the conclusion P(k + 1). You can find examples of induction in the proofs of Theorem GSP [175], Theorem DER [376], Theorem DT [377], Theorem DIM [387], Theorem EOMP [421], Theorem DCP [424], and Theorem KPLT [616].

Proof Technique P
Practice
Here is a technique used by many practicing mathematicians when they are teaching themselves new mathematics. As they read a textbook, monograph or research article, they attempt to prove each new theorem themselves, before reading the proof. Often the proofs can be very difficult, so it is wise not to spend too much time on each. Maybe limit your losses and try each proof for 10 or 15 minutes. Even if the proof is not found, it is time well-spent. You become more familiar with the definitions involved, and the hypothesis and conclusion of the theorem. When you do work through the proof, it might make more sense, and you will gain added insight about just how to construct a proof.

Proof Technique LC
Lemmas and Corollaries
Theorems often go by different titles, two of the most popular being "lemma" and "corollary." Before we describe the fine distinctions, be aware that lemmas, corollaries, propositions, claims and facts are all just theorems. And every theorem can be rephrased as an "if-then" statement, or perhaps a pair of "if-then" statements expressed as an equivalence (Technique E [690]).

A lemma is a theorem that is not too interesting in its own right, but is important for proving other theorems. It might be a generalization or abstraction of a key step of several different proofs. For this reason you often hear the phrase "technical lemma," though some might argue that the adjective "technical" is redundant.

A corollary is a theorem that follows very easily from another theorem. For this reason, corollaries frequently do not have proofs. You are expected to easily and quickly see how a previous theorem implies the corollary.

A proposition or fact is really just a codeword for a theorem. A claim might be similar, but some authors like to use claims within a proof to organize key steps. In a similar manner, some long proofs are organized as a series of lemmas.

In order not to confuse the novice, we have just called all our theorems theorems. It is also an organizational convenience. With only theorems and definitions, the theoretical backbone of the course is laid bare in the two lists of Definitions [viii] and Theorems [ix].

Appendix A
Archetypes
WordNet (an open-source lexical database) gives the following definition of "archetype": something that serves as a model or a basis for making copies. Our archetypes are typical examples of systems of equations, matrices and linear transformations. They have been designed to demonstrate the range of possibilities, allowing you to compare and contrast them. Several are of a size and complexity that is usually not presented in a textbook, but should do a better job of being "typical."

We have made frequent reference to many of these throughout the text, such as the frequent comparisons between Archetype A [702] and Archetype B [707]. Some we have left for you to investigate, such as Archetype J [741], which parallels Archetype I [737].

How should you use the archetypes? First, consult the description of each one as it is mentioned in the text.
See how other facts about the example might illuminate whatever property or construction is being described in the example. Second, each property has a short description that usually includes references to the relevant theorems. Perform the computations and understand the connections to the listed theorems. Third, each property has a small checkbox in front of it. Use the archetypes like a workbook and chart your progress by "checking-off" those properties that you understand. The next page has a chart that summarizes some (but not all) of the properties described for each archetype. Notice that while there are several types of objects, there are fundamental connections between them. That some lines of the table do double-duty is meant to convey some of these connections. Consult this table when you wish to quickly find an example of a certain phenomenon. 699  Appendix A Archetypes 700 Version 2.02  A B C D E F G H I J K L M N O P Q R S TUVWX Type S S S S S S S S S S M M L L L L L L L L Vars,Cols,Domain 3 3 4 4 4 4 2 2 7 9 5 5 5 5 3 3 5 5 3 5 6 4 3 4 Eqns,Rows,CoDom 3 3 3 3 3 4 5 5 4 6 5 5 3 3 5 5 5 5 4 6 4 4 3 4 SolutionSet I U I I N U U N I I Rank 2 3 3 2 2 4 2 2 3 4 5 3 2 3 2 3 4 5 2 5 4 4 3 3 Nullity 1 0 1 2 2 0 0 0 4 5 0 2 3 2 1 0 1 0 1 0 2 0 0 1 Injective XXN N Y N YXY Y N Surjective N Y X X N Y X XY Y Y N FullRank N Y Y N N Y Y Y N N Y N Nonsingular N Y Y Y N Invertible N Y Y Y N N Y Y Y N Determinant 0 -2 -18 16 0 -2 -3 0 Diagonalizable N Y Y Y Y YY Archetype Facts S=System of Equations, M=Matrix, L=Linear Transformation U=Unique solution, I=Infinitely many solutions, N=No solutions Y=Yes, N=No, X=Impossible, blank=Not Applicable  Appendix A Archetypes 702 Version 2.02  Archetype A 703 Archetype A U. 7 Summary Linear system of three equations, three unknowns. Singular coefficient matrix with dimension 1 null space. Integer eigenvalues and a degenerate eigenspace for coefficient matrix. A system of linear equations (Definition SLE [9]): XI - z2 + 2x3 = 1 2x1 + x2 + x3 = 8 Xi + x2 = 5 Some solutions to the system of linear equations (not necessarily exhaustive): x1=2, x2=3, x3=1 xi=3, X2=2, x3O=0 Augmented matrix of the linear system of equations (Definition AM [27]): 1 2 1 -1 2 1 1 1 0 1 8 5 Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 1 0 1 3 0 0 -1 2 0 0 0 0_ Analysis of the augmented matrix (Notation RREFA [30]): r=2 D = {1, 2} F = {3, 4} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. Version 2.02  Archetype A 704 [I 3 -1 X2 = 2 + x3 1 3 _01 Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. Xi - z2 + 2X3 = 0 231 + X2 +3:3 = 0 3I+3:2 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi=0, z2=0, os=0 Xi 1, 2 =1, z3 = 1 zi= -5, z2 =5,os3= 5 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. 
Notice how the entries of the final column remain zeros: 1 0 1 0 0 0 -1 0 0 0 0 0_ Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r = 2 D = {1, 2} F = {3, 4} ]Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. [2 11 Matrix brought to reduced row-echelon form: 10 1 0 Di -1 0 0 0_ Version 2.02  Archetype A 705 Analysis of the row-reduced matrix (Notation RREFA [30]): r = 2 D = {1, 2} F = {3} Matrix (coefficient matrix) is nonsingular or singular? (Theorem NMRRI [72]) at the same time, examine the size of the set F above.Notice that this property does not apply to matrices that are not square. Singular. This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. _- 1 1 Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 1 -1 2 , 1 1 1 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L = [1 -2 3] 3 2 1 0 ]Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Version 2.02  Archetype A 706 1 0 0 [1 D the K { Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) .1_ 0 0 , 1 1 -1 Inverse matrix, if it exists. The inverse is not defined for matrices that are not square, and if the matrix is square, then the matrix must be nonsingular. (Definition MI [213], Theorem NI [228]) Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 3 Rank: 2 Nullity: 1 Determinant of the matrix, which is only defined for square matrices. The matrix is nonsingular if and only if the determinant is nonzero (Theorem SMZD [389]). (Product of all eigenvalues?) Determinant 0 Eigenvalues, and bases for eigenspaces. (Definition EEM [396],Definition EM [404]) A=0 A=2 EF (0) K EA (2) K 1 -) L 1 1i {5~]} Geometric and algebraic multiplicities. (Definition GME [406]Definition AME [406]) 7A (0) 1 YA (2) 1 aA(0) 2 A (2) 1 Diagonalizable? (Definition DZM [435]) Version 2.02  Archetype A 707 No, yA (0) # 6aB (0), Theorem DMFE [438]. 
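Many of the facts recorded for an archetype can be verified quickly with the Mathematica commands of Appendix CN. As a sketch, here are a few checks against the coefficient matrix of Archetype A; Det and Eigenvalues are standard built-ins that go beyond the computation notes:

(* coefficient matrix of Archetype A *)
a = {{1, -1, 2}, {2, 1, 1}, {1, 1, 0}};

RowReduce[a]                 (* two pivot columns, so r = 2 and the nullity is 1 *)
NullSpace[a]                 (* a single vector spanning the null space *)
Det[a]                       (* 0, so the matrix is singular *)
Eigenvalues[a]               (* the eigenvalues 2 and 0, with 0 of algebraic multiplicity 2 *)

(* a particular solution of the original system, as in Computation LS.MMA *)
LinearSolve[a, {1, 8, 5}]    (* {3, 2, 0} *)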
Version 2.02  Archetype B 708 Archetype B N _-Z Summary System with three equations, three unknowns. Nonsingular coefficient matrix. Distinct integer eigenvalues for coefficient matrix. A system of linear equations (Definition SLE [9]): -7xi - 6x2 - 12x3 = -33 5xi + 5X2 + 7X3= 24 zi +4x3= 5 Some solutions to the system of linear equations (not necessarily exhaustive): 1l = -3, x2 = 5, x3 = 2 Augmented matrix of the linear system of equations (Definition AM [27]): -7 -6 -12 -33 5 5 7 24 1 0 4 5 Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 0 0 -3 0 [- 0 5 0 0 [-1 2_ Analysis of the augmented matrix (Notation RREFA [30]): r=3 D={1,2,3} F={4} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. [qi [-3 Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties Version 2.02  Archetype B 709 of this new system will have precise relationships with various properties of the original system. -11x1 + 2x2 - 14x3 = 0 23xi - 6x2 + 33x3 = 0 14xi - 2x2 + 17x3 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi=0, z2=0, os=0 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 1 0 0 0 0 [-1 0 0 0 0 [-]0_ Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r=3 D={1,2,3} F={4} Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. -7 -6 -12 5 5 7 1 0 4 Matrix brought to reduced row-echelon form: 100 0 i 0 ]Analysis of the row-reduced matrix (Notation RREFA [30]): r =3 D ={1, 2, 3} F ={ } Matrix (coefficient matrix) is nonsingular or singular? (Theorem NMRRI [72]) at the same time, examine the size of the set F above.Notice that this property does not apply to matrices that are not Version 2.02  Archetype B 710 square. Nonsingular. This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K{ }) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) - [],-[6] -12 5 , 5 , 7 1 0 4 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of C". 
L=(] 1 0 0 0 , 1 , 0 0 0 1I Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) Version 2.02  Archetype B 711 1 0 0 0 , 1 , 0 Inverse matrix, if it exists. The inverse is not defined for matrices that are not square, and if the matrix is square, then the matrix must be nonsingular. (Definition MI [213], Theorem NI [228]) -10 -12 -9 8 2 5 3 5 Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 3 Rank: 3 Nullity: 0 Determinant of the matrix, which is only defined for square matrices. The matrix is nonsingular if and only if the determinant is nonzero (Theorem SMZD [389]). (Product of all eigenvalues?) Determinant = -2 Eigenvalues, and bases for eigenspaces. (Definition EEM [396],Definition EM [404]) -5 A = -1 EB (-1) = 3 -3 A = 1 EB (1) = 2 -2 A =2 EB (2)= 1 ]Geometric and algebraic multiplicities. (Definition GME [406]Definition AME [406]) Diagonalizable? (Definition DZM [435]) Version 2.02  Archetype B 712 Yes, distinct eigenvalues, Theorem DED [440]. The diagonalization. (Theorem DC [436]) -1 -1 -1 [-7 -6 -12] -5 -3 -2 2 3 1 5 5 7 3 2 1 -1 -2 1_ 1 0 4 _ 1 1 1_ -1 0 0 = 0 1 0 0 0 2_ Version 2.02  Archetype C 713 Archetype C U. 7 Summary System with three equations, four variables. Consistent. Null space of coefficient matrix has dimension 1. A system of linear equations (Definition SLE [9]): 2xi - 3x2 + 13 - 6x4 4xi + 12 + 213 + 914 3xi + 12 +3:3 + 814 -7 -7 -8 ] Some solutions to the system of linear equations (not necessarily exhaustive): 1 = -7, X2 =-2,X3 = 7, X4=I1 xl=-1, x2=7, x3=4, x4 -2 Augmented matrix of the linear system of equations (Definition AM [27]): 2 4 3 -3 1 -6 -7 1 2 9 -7 1 1 8 -8 Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 10 0 0 0 2 -5 0 3 1 LI -1 6] Analysis of the augmented matrix (Notation RREFA [30]): r=3 D = {1, 2, 3} F = {4, 5} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. Version 2.02  Archetype C 714 zI -5 -2 z2 14 -3 z3= 6 1 _z4_ 0 __1 _ Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 2x1 - 3X2 + 33 - 6x4= 0 4x1 +z2 +2x3+ 9X4= 0 3xi + x2 +z3 +8X4 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi =0, z2=0, os =0, z4=0 = -2, z2= -3, X3=1, 34=1 zi = -4, x2 = -6, x3 = 2, x4 = 2 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. 
Notice how the entries of the final column remain zeros: 1 0 0 2 0 0 [ 0 3 0 0 0 [-1 -1 0_ Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r = 3 D = {1, 2, 3} F = {4, 5} ]Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. [2 -3 1 -6] Matrix brought to reduced row-echelon form: 1 0 0 2 0 ] 0 3 _ 0 0 2 -1_ Version 2.02  Archetype C 715 Analysis of the row-reduced matrix (Notation RREFA [30]): r=3 D={1,2,3} F={4} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. -2 -3 Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 3i , - R 1 1 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L=(] 1 0 0 0 , 1 , 0 0 0 1I Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) Version 2.02  Archetype C 716 1 0 0 0 1 0 0 ' 0 ' 1 _2_ 3_ -1_ Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 4 Rank: 3 Nullity: 1 Version 2.02  Archetype D 717 Archetype D U. 7 Summary System with three equations, four variables. Consistent. Null space of coefficient matrix has dimension 2. Coefficient matrix identical to that of Archetype E, vector of constants is different. A system of linear equations (Definition SLE [9]): 2xi + x2 + 73 - 7x4 = 8 -3x1 + 4x2 - 5x3 - 6X4= -12 zi + 2 +44x3 - 5x4 = 4 Some solutions to the system of linear equations (not necessarily exhaustive): Xi=0, X2=1, X3=2, 14 1 zi =4, z2 =0, os=0, z4 =0 zi=7, z2=8, z3=1, z4=3 Augmented matrix of the linear system of equations (Definition AM [27]): 2 1 7 -7 8 -3 4 -5 -6 -12 1 1 4 -5 4 Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 1 0 3 -2 4] 0 F 1 -3 0 0 0 0 0 0] Analysis of the augmented matrix (Notation RREFA [30]): r=2 D = {1, 2} F = {3, 4, 5} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). 
Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. Version 2.02  Archetype D 718 XI 4 -3 2 X32 0 +-311 4 3 [4 0 4] Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 231 + x2 + 733 - 7x4 = 0 -3x1 + 4x2 - 533 - 6X4 =0 Xi + X2 + 4X3 - 5X4 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi =0, z2=0, os =0, z4=0 xi= -3, z2= -1, z3=1, z4=0 zi =2, z2=3, os=0, z4 =1 Xi = -1, z2 =2, z3 = 1, z4= 1 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 1 0 3 -2 0] 0 [ 1 -3 0 0 0 0 0 0] Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r = 2 D = {1, 2} F = {3, 4, 5} ]Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. [j3 4 -5 -6] Matrix brought to reduced row-echelon form: _0 0 0 0_ Version 2.02  Archetype D 719 Analysis of the row-reduced matrix (Notation RREFA [30]): r = 2 D = {1, 2} F = {3, 4} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. -3 [2} Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 2 1 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L = [1 j -1 0 , 1 1 0 Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) Version 2.02  Archetype D 720 1 0 01 3 ' 1 .-2_ _-3_ Subspace dimensions associated with the matrix. 
(Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 4 Rank: 2 Nullity: 2 Version 2.02  Archetype B 721 Archetype E N _-Z Summary System with three equations, four variables. Inconsistent. Null space of coefficient matrix has dimension 2. Coefficient matrix identical to that of Archetype D, constant vector is different. A system of linear equations (Definition SLE [9]): 2xi + x2 + 733 - 7x4 = 2 -3x1 +4x2 - 5x3 - 6X4=3 xi + 2 + 4x3 - 5X4 = 2 Some solutions to the system of linear equations (not necessarily exhaustive): None. (Why?) Augmented matrix of the linear system of equations (Definition AM [27]): 2 1 7 -7 2 -3 4 -5 -6 3 [1 1 4 -5 2 Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 1 0 3 -2 0] 0 i 1-3 0 0 0 0 0 Fl ] Analysis of the augmented matrix (Notation RREFA [30]): r =3 D ={1, 2, 5} F ={3, 4} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. Inconsistent system, no solutions exist. Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties Version 2.02  Archetype B 722 of this new system will have precise relationships with various properties of the original system. 2xi + x2 + 733 - 7x4 = 0 -3x1 +4x2 - 5X3 - 6x4 = 0 zi + X2 + 4x3 - 5X4 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi=0, z2=0, os=0, z4=0 xi=4, z2=13, os=2, z4=5 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 1 0 3 -2 0] 0 [ 1 -3 0 0 0 0 0 0] ] Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r=2 D ={1, 2} F={3,4,5} Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. 2 1 7 -7 -3 4 -5 -6 1 1 4 -5- Matrix brought to reduced row-echelon form: ]Analysis of the row-reduced matrix (Notation RREFA [30]): r =2 D ={1, 2} F ={3, 4} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem Version 2.02  Archetype B 723 BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. -3 [2} Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 2 R1} The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. 
When r = m, the matrix L has no rows and the column space is all of C". L = [I1 j1- 0 , 1 1 0 Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) <1 K0} Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Version 2.02  Archetype B 724 Theorem RPNC [348] Matrix columns: 4 Rank: 2 Nullity: 2 Version 2.02  Archetype F 725 Archetype F U. 7 Summary System with four equations, four variables. Nonsingular coefficient matrix. Integer eigenval- ues, one has "high" multiplicity. A system of linear equations (Definition SLE [9]): 33xi- 16x2 +10x3 - 2X4 =-27 99x1 - 47x2 + 27x3 - 7x4 =-77 78x1 - 36x2 + 17x3 - 6X4 =-52 -9xi+ 2x2 + 3x3 + 4x4 =5 Some solutions to the system of linear equations (not necessarily exhaustive): x1 =1, X2 =2, 13 -2, X4=4 Augmented matrix of the linear system of equations (Definition AM [27]): 33 99 78 -9 -16 -47 -36 2 10 27 17 3 -2 -7 -6 4 -27 -77 -52 5_ Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 10 0 0 1 0 [ 0 0 2 0 0 [ 0 -2 0 0 0 W 4 Analysis of the augmented matrix (Notation RREFA [30]): r=4 D ={1, 2, 3, 4} F = {5} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. Version 2.02  Archetype F 726 zi 1 X2 2 X3 -2 [4_ ] 4 _ ] Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 33xi - 16x2 + 10x3 - 2X4 = 0 99xi - 47x2 + 27x3 - 7X4 = 0 78x1 - 36x2 + 17x3 - 6X4 = 0 -9xi + 2X2 + 3X3 + 4X4 =0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi=0, x2=0, x3=0, x4=0 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 1 0 0 0 0 0 [ 0 0 0 0 0 [ 0 0 0 0 0 [ 0] ] Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r=4 D={1,2,3,4} F={5} Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. [33 -16 10 -21 99 -47 27 -7 78 -36 17 -6 -9 2 3 4] Matrix brought to reduced row-echelon form: 0 0 0 Version 2.02  Archetype F 727 Analysis of the row-reduced matrix (Notation RREFA [30]): r=4 D={1,2,3,4} F={} Matrix (coefficient matrix) is nonsingular or singular? 
(Theorem NMRRI [72]) at the same time, examine the size of the set F above.Notice that this property does not apply to matrices that are not square. Nonsingular. This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K{ }) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 33 -16 10] -2 99 -47 27 -7 78 ' -36 '17'-6 -9 2 3 4 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L = [] 11 0 0 0 0 1 0 0 0 ' 0 ' 1 ' 0 ]Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Version 2.02  Archetype F 728 1 0 0 0 0 1 0 0 0 ' 0 ' 1 ' 0 Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) 1 0 0 0 0 1 0 0 0 ' 0 ' 1 ' 0 Inverse matrix, if it exists. The inverse is not defined for matrices that are not square, and if the matrix is square, then the matrix must be nonsingular. (Definition MI [213], Theorem NI [228]) (129) 86 (17) 1 6 -13 6 -2 1 Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 4 Rank: 4 Nullity: 0 Determinant of the matrix, which is only defined for square matrices. The matrix is nonsingular if and only if the determinant is nonzero (Theorem SMZD [389]). (Product of all eigenvalues?) Determinant = -18 Eigenvalues, and bases for eigenspaces. (Definition EEM [396],Definition EM [404]) A -SF (-1) {K SF 2) { [] } 1 17 A = 3 EF (3) = 10 '42 7_ _02_ Version 2.02  Archetype F 729 Geometric and algebraic multiplicities. (Definition GME [406]Definition AME [406]) 7F (-1) =1GF (-1) =1 7F (2) = 1 aF (2) = 1 7F(3) = 2 aF (3) = 2 Diagonalizable? (Definition DZM [435]) Yes, full eigenspaces, Theorem DMFE [438]. The diagonalization. (Theorem DC [436]) 12 -39 27 7 26 7 -1 0 0 -5 18 13 -7 12 -7 0 2 0 0 1 -7 6 7 5 7 0 0 -1 33 3 99 - 78 -2 [-9 -16 10 -47 27 -36 17 2 3 -2 1 2 1 17 -7 2 5 1 45 -6 0 2 0 21 4 [1 1 7 0] 0 0 3 0 Version 2.02  Archetype G 730 Archetype G U. 7 Summary System with five equations, two variables. Consistent. Null space of coefficient matrix has dimension 0. Coefficient matrix identical to that of Archetype H, constant vector is different. 
A system of linear equations (Definition SLE [9]): 2x1 + 3x2 = 6 -zi + 4x2 3x1 + 10x2 3xi - 92 6x1 + 9x2 -14 -2 20 18 D Some solutions to the system of linear equations (not necessarily exhaustive): 1i =6, z2 -2 Augmented matrix of the linear system of equations (Definition AM [27]): 2 3 6 -1 4 -14 3 10 -2 3 -1 20 6 9 18 _ Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 1 0 6 0 W-2 0 0 0 0 0 0 0 0 0 Analysis of the augmented matrix (Notation RREFA [30]): r=2 D = {1, 2} F = {3} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries Version 2.02  Archetype G 731 of the vectors corresponding to elements of the set F for the larger examples. XI 6 [2] [2_ Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 2x1 + 3x2 = 0 -Xi + 4x2 = 0 3xi + 10x2 = 0 3xi - -2 = 0 6x1 + 9x2 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 100 0 0 0 0 0 0 0 0 0 0 0 0_ Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r =2 D ={1, 2} F ={3} ]Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. 2 3 -1 4 3 10 3 -1 6 9 Version 2.02  Archetype G 732 Matrix brought to reduced row-echelon form: 10 0 W 0 0 0 0 0 0 Analysis of the row-reduced matrix (Notation RREFA [30]): r=2 D={1, 2} F={} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K{ }) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 2 3 3 ,10 3 -1 _ 6 _9 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. 1[ 00 0 - L= 0 1 0 1-j 0 0 1 1 -1 1. ,-- ]Column space of the matrix, expressed as the span of a set of linearly independent vectors. These Version 2.02  Archetype G 733 vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. 
By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. 1 0 0 1 2 , 1 1 -1 Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 2 Rank: 2 Nullity: 0 Version 2.02  Archetype H 734 Archetype H U. 7 Summary System with five equations, two variables. Inconsistent, overdetermined. Null space of coefficient matrix has dimension 0. Coefficient matrix identical to that of Archetype G, constant vector is different. A system of linear equations (Definition SLE [9]): 2xi + 3x2 = 5 -x1 + 4x2 = 6 3xi + 10x2 = 2 3x- 2 =-1 6xi + 9x2 = 3 Some solutions to the system of linear equations (not necessarily exhaustive): None. (Why?) Augmented matrix of the linear system of equations (Definition AM [27]): 2 -1 3 3 6 3 4 10 -1 9 5 6 2 -1 3 _ Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 1 0 0 0 [-1 0 0 0 [- 0 0 0 0 0 0 Analysis of the augmented matrix (Notation RREFA [30]): r=3 D = {1, 2, 3} F={} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries Version 2.02  Archetype H 735 of the vectors corresponding to elements of the set F for the larger examples. Inconsistent system, no solutions exist. ] Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 2xi + 3X2 = 0 -x1 + 4x2 = 0 3xi + 10x2 = 0 3xi - 2= 0 6x1 + 9x2 = 0 Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): zi= 0, z2 = 0 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 100 FLi~oo 0 0 0 0 0 0 0 0 0 0 0 0_ ] Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r =2 D ={1, 2} F ={3} ]Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. 2 3 -1 4 3 10 3 -1 6 9 Version 2.02  Archetype H 736 Matrix brought to reduced row-echelon form: 10 0 [- 0 0 0 0 0 0 Analysis of the row-reduced matrix (Notation RREFA [30]): r=2 D={1, 2} F={} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K{ }) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. 
(Theorem BCS [239]) 2 3 3 ,10 3 -1 _ 6 _9 The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L = [] ]Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing Version 2.02  Archetype H 737 out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. 10 0 1 2 , 1 1-1 3{ _0 _ The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. 1 0 0 0 -j L= 0 1 0 1-j 0 0 1 1 -1 ~0 3 - 1,-1 0 1 Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 2 Rank: 2 Nullity: 0 Version 2.02  Archetype I 738 Archetype I U. 7 Summary System with four equations, seven variables. Consistent. Null space of coefficient matrix has dimension 4. A system of linear equations (Definition SLE [9]): zi+4X2 - X4+ 7z6 - 93:=7 3 2xi +88x2 - x3 + 3X4 + 9x5 - 13x6 + 77:=7 9 2x3 - 3X4 - 4x5 + 12x6 - 8X7=1 -xi - 4X2 + 2x3 + 4x4 + 8x5 - 31x6 + 377 = 4 Some solutions to the system of linear equations (not necessarily exhaustive): -i = -25, z2 = 4, 33 = 22, 34 = 29, zx - 1, zx - 2, 3:7 -3 Xi= -7, z2= -5, 3= -7, X4= 15, os1 4, z6 =2, 37= 1 zi=4, 2=0, s= -2, X4= 1, := -0, z =0, 37=0 ] Augmented matrix of the linear system of equations (Definition AM [27]): 1 2 0 -1 4 0 -1 0 7 -9 3 8 -1 3 9 -13 7 9 0 2 -3 -4 12 -8 1 -4 2 4 8 -31 37 4] Matrix in reduced row-echelon form, row-equivalent to augmented matrix: 14 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 0 1 -3 -6 0 -3 5 6 0 4 2 1 0] Analysis of the augmented matrix (Notation RREFA [30]): r=3 D = {1, 3, 4} F ={2, 5, 6, 7, 8} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries Version 2.02  Archetype I 739 of the vectors corresponding to elements of the set F for the larger examples. 12 13 14 _5 137~ -4- 0 2 1 0 0 .0_ +3:2 --4- 1 0 0 0 0 _0 _ -2 0 -1 -2 1 0 _0 _ + X6 ~-1- 0 3 6 0 1 _0 _ + X7 3. 0 -5 -6 0 0 _1 _ Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. 
xi+4X2 - 34 + 73:6 - 937=0 2x1 +88x2 - 33 + 334 + 9x5 - 13x6 + 73:=7 0 2x3 - 3X4 - 4x5 + 12x6 - 8X7=0 -xi - 4x2 + 2x3 + 4x4 + 8x5 - 31x6 + 377 = 0 zi Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): =o0, x2=o0, x3os , x4=o0, x5os ,zx6=o0,zx7= 0 zi = 3, 2 =0, 3 -5, :4 =-6, x = 0, zx = 0, 37 = 1 I= -1, z2= 0, 3= -3, 34= -6, :5= 0, z1 - 1, 37 = 0 1l = -2, 12 = 0, 13 1, 34 =-2, -5 = 1, z6 = 0, 37 = 0 zi= -4, z2= -1, 33 30, 34= 0, 3:= 0, zx - 0, 37 = 0 zi = -4, z2 - 1, 3 -3, 34 =-2, zx = 1, ze - 1, 3: = 1 Form the augmented matrix of the homogenous linear system, and use row operations to convert to reduced row-echelon form. Notice how the entries of the final column remain zeros: 14 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 0 1 -3 -6 0 -3 0 5 0 6 0 0 0_ Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). Notice the slight variation for the same analysis of the original system only when the original system was consistent: r=3 D = {1, 3, 4} F ={2, 5, 6, 7, 8} Version 2.02  Archetype I 740 ] Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. 1 2 0 [-1 4 0 8 -1 0 2 -4 2 -1 0 3 9 -3 -4 4 8 7 -9 -13 7 12 -8 -31 37_ Matrix brought to reduced row-echelon form: 1 0 0 0 4 0 0 0 0 01 0 0 0 0 01 0 2 1 2 0 1 -3 -3 5 -6 6 0 0] Analysis of the row-reduced matrix (Notation RREFA [30]): r=3 D = {1, 3, 4} F = {2, 5, 6, 7} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K -4- 1 0 0 0 0 0 0 -1 -2 1 0 0 --1~ 0 3 6 0 1 0 3 0 -5 -6 0 0 _1_ Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 1 0 ~-1 2 -1 3 0 '2 '-3 .- 1_ .2 _ _4 _ The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a Version 2.02  Archetype I 741 set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L=[1j -12 137 - 7 - 13- 12- 31 31 31 0 0 1 0 ' 1 ' 0 1 0 0 Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. 1 0 0 0 1 0 0 0 1 31 12 13 -17 -7- -7- Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) n1 0 0 4 0 0 0 1 0 < 0 , 0 , 1 > 2 1 2 1 -3 -6 -3_ _5 _ _6 _ Subspace dimensions associated with the matrix. 
(Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Version 2.02  Archetype J 742 Archetype J N - Summary System with six equations, nine variables. Consistent. Null space of coefficient matrix has dimension 5. A system of linear equations (Definition SLE [9]): xi + 2x2 - 2x3 + 9X4 + 3x5 - 5x6 - 2.:7 + -8 + 27.x9 =-5 231 +44x2 + 3x3 + 4.4 - -x5 + 4-6 + 10x7 + 2x8 - 23x9 =18 xi +22x2 +3:3 + 3x4 + X5 + z6 + 5x7 + 2X8 - 7.9 = 6 2x1 +44x2 + 3X3 + 4.:4 - 7.5 -+2.:6 +4.7 - 11.9= 20 xi+ 2x2 + 5X4 + 2x5 - 4x6+3X7 + 8x8 + 13x= -4 -3xi - 6x2 - x3 - 13x4 + 2x5 - 5x6 - 4.:7 + 13x8 + 10x9= -29 Some solutions to the system of linear equations (not necessarily exhaustive): i =6, 2 = 0, 33 =-1, .x4 = 0, x5 =-1, 6 =2, x7 =0, x8 =0, 3= 0 xi=4, 2 = 1, 3 =-1, .x4 = 0, =-1, zx = 2, :7 =0, x8 =0, x= 0 -i=-17, 2= 7, 33 3, x4= 2, ox= -1, 6 =14, x:7 -1, x8= 3, x= 2 zi= -11, 2= -6, = 1, x4= 5, = -4, x6 =7, x7 3, x8= 1, xg= 1 Augmented matrix of the linear system of equations (Definition AM [27]): 1 2 -2 9 3 -5 -2 1 27 -5 2 4 3 4 -1 4 10 2 -23 18 1 2 1 3 1 1 5 2 -7 6 2 4 3 4 -7 2 4 0 -11 20 1 2 0 5 2 -4 3 8 13 -4 -3 -6 -1 -13 2 -5 -4 13 10 -29_ Matrix in reduced row-echelon form, row-equivalent to augmented matrix: F 20ow 52 0 01-2 3 6] 0 0 0 0 1 0 1 1 -1 -1 0 0 0 0 0 0 -2 -3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Version 2.02  Archetype J 743 Analysis of the augmented matrix (Notation RREFA [30]): r=4 D = {1, 3, 5, 6} F = {2, 4, 7, 8, 9, 10} Vector form of the solution set to the system of equations (Theorem VFSLS [99]). Notice the relation- ship between the free variables and the set F above. Also, notice the pattern of 0's and l's in the entries of the vectors corresponding to elements of the set F for the larger examples. 12 13 14 3:5 3:7 18 .19.. -6- 0 -1 0 -1 2 0 0 .0_ +3:2 -2 1 0 0 0 0 0 0 0 +3:4 --5- 0 2 1 0 0 0 0 0 + X7 -1- 0 -3 0 -1 0 1 0 0 +3:8 -2 0 -5 0 -1 2 0 1 0 + 19 -3- 0 6 0 1 3 0 0 _1. Given a system of equations we can always build a new, related, homogeneous system (Definition HS [62]) by converting the constant terms to zeros and retaining the coefficients of the variables. Properties of this new system will have precise relationships with various properties of the original system. zi + 2x2 - 2x3 + 9x4 + 3x5 - 5x6 - 2x7 + x8 +27:9 231 + 412 + 333 +44 - 5 -+416 + 1037 + 28 - 233:9 zi + 2x2 +3:3 + 3x4 + z5 + z6 + 5x7 + 2x8 - 73:9 2xi + 412 + 313 -|-4x4 - 715 -|-216 -|-417 - 11xg9 xi +2X2+ +5x4 + 2x5 - 4x6+3x7 +8x8 + 13x9 -3xi - 632 - 13 - 1334 + 2x5 - 5x6 - 4x7 + 1338 + 103:9 0 0 0 0 0 0 ] Some solutions to the associated homogenous system of linear equations (not necessarily exhaustive): -i=0, x2 =0, 3 =0, 34=0, x5 =0, z6 =0, 3:=0, x8 =0, 39 =0 -i= -2, z2= -1, 3= -0, 34= -0, 3:= -0, zx - 0, 3: - 0, x8 = 0, :9= 0 -i=-23, 2 =7, x3 =4, x4 =2, x5 =0, 6 =12, 3:7 1, x8 = 3, :9= 2 x1 =-17, 12=-6, 13=2, 14=5) 15 -3, z6 = 5, 3: = 3, x8 = 1, 3g = 1 Form the augmented matrix of the homogenous linear system, and use row operations to convert to Version 2.02  Archetype J 744 reduced row-echelon form. Notice how the entries of the final column remain zeros: Ti 0 0 0 0 0 2 0 0 0 0 0 0 01 0 0 0 0 5 -2 0 0 0 0 0 0 01 0 0 0 0 0 0 01 0 0 1 3 1 0 0 0 -2 5 1 -2 0 0 3 -6 -1 -3 0 0 0 0 0 0 0 0 Analysis of the augmented matrix for the homogenous system (Notation RREFA [30]). 
Notice the slight variation for the same analysis of the original system only when the original system was consistent: r=4 D = {1, 3, 5, 6} F = {2, 4, 7, 8, 9, 10} ] Coefficient matrix of original system of equations, and of associated homogenous system. This matrix will be the subject of further analysis, rather than the systems of equations. 1 2 1 2 1 -3 2 4 2 4 2 -6 -2 3 1 3 0 -1 9 4 3 4 5 -13 3 -1 1 -7 2 2 -5 4 1 2 -4 -5 -2 10 5 4 3 -4 1 2 2 0 8 13 27 -23 -7 -11 13 10 Matrix brought to reduced row-echelon form: fl 2 0 0 0 0 0 0 5 -2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 1 0 0 0 -2 5 1 -2 0 0 3 -6 -1 -3 0 0 0 0 0 0 0 0 0 0 0 Analysis of the row-reduced matrix (Notation RREFA [30]): r=4 D ={1, 3, 5, 6} F ={2, 4, 7, 8, 9} This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. Version 2.02  Archetype J 745 Archetype J 745 K --2- 1 0 0 0 0 0 0 _0 _ -5- 0 2 1 0 0 0 0 0 _ -1" 0 -3 0 -1 0 1 0 0. -2 - 0 -5 0 -1 2 0 1 _0 _ --3- 0 6 0 1 3 0 0 _1_ ] Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) 1 -2 3 - 2 3 -1 4 1 1 1 1 2 ' 3 ' -7 ' 2 1 0 2 -4 -3_ -1_ _2 _ -5_ The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. [1 0 186 51 188 77 1 L 132 131 5131 131 131 131 131 131] K I 77 131 14 131 0 0 0 1 188 - 131 58 131 0 0 1 0 S51- 131 45 131 0 1 0 0 186 131 272 131 1 0 0 0 I ] Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. Version 2.02  Archetype J 746 Archetype J 746 K I 1 0 0 0 -1 29 0 1 0 0 11 94 *7- ~0~ 0 1 0 10 22 0 0 0 1 3 -3- I Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) 1 0 0 0 2 0 0 0 0 1 0 0 5 -2 0 0 < 0 , 0 , 1 , 0 > 0 0 0 1 1 3 1 0 -2 5 1 -2 3 -6 -1 _-3 Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 9 Rank: 4 Nullity: 5 Version 2.02  Archetype K 747 Archetype K U. 7 Summary Square matrix of size 5. Nonsingular. 3 distinct eigenvalues, 2 of multiplicity 2. 
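Before turning to Archetype K, the dimension bookkeeping just reported for Archetype J (nine columns, rank 4, nullity 5, as demanded by Theorem RPNC) can be confirmed by machine. The sketch below is not part of the original text; it assumes the 6 x 9 coefficient matrix has been read correctly from the pages above.

```python
from sympy import Matrix

# Coefficient matrix of Archetype J (six equations, nine variables), as printed above
A = Matrix([[ 1,  2, -2,   9,  3, -5, -2,  1,  27],
            [ 2,  4,  3,   4, -1,  4, 10,  2, -23],
            [ 1,  2,  1,   3,  1,  1,  5,  2,  -7],
            [ 2,  4,  3,   4, -7,  2,  4,  0, -11],
            [ 1,  2,  0,   5,  2, -4,  3,  8,  13],
            [-3, -6, -1, -13,  2, -5, -4, 13,  10]])

r = A.rank()
nullity = len(A.nullspace())
print(r, nullity, r + nullity)   # rank + nullity should equal 9, the number of columns
```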
A matrix: 10 12 -30 27 18 18 -2 -21 30 24 24 -6 -23 36 30 24 0 -30 37 30 -12 -18 39 -30 -20 Matrix brought to reduced row-echelon form: 10 0 0 0 0 F 0 0 0 0 0 F 0 0 0 0 0 F 0 0 0 0 0 1 Analysis of the row-reduced matrix (Notation RREFA [30]): r=5 D = {1, 2, 3, 4, 5} F={} Matrix (coefficient matrix) is nonsingular or singular? (Theorem NMRRI [72]) at the same time, examine the size of the set F above.Notice that this property does not apply to matrices that are not square. Nonsingular. This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. K{ }) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) Version 2.02  Archetype K 748 10 18 24 24 -12 12 -2 -6 0 -18 -30 , -21 , -23 , -30 , 39 27 30 36 37 -30 _ 18 24 _ 30 _ _ 30 _ -20_ The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L=[] 1 0 0 0 0 0 1 0 0 0 0 , 0 , 1 , 0 , 0 0 0 0 1 0 Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. 1 0 0 0 0 0 1 0 0 0 0 , 0 , 1 , 0 , 0 0 0 0 1 0 Row space of the matrix, expressed as a span of a set of linearly independent vectors, obtained from the nonzero rows of the equivalent matrix in reduced row-echelon form. (Theorem BRS [245]) S1 =0 =0 00 0 0 0 0 0 0F] , 1 , Inverse matrix, if it exists. The inverse is not defined for matrices that are not square, and if the matrix is square, then the matrix must be nonsingular. (Definition MI [213], Theorem NI [228]) Version 2.02  Archetype K 749 1 21 2 -15 9 9 (9) 43 (4) 15 4 3 (3) ~2 21 -11 9 3 3 9 -15 10 6 -6 -9 39 -15 (1) Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 5 Rank: 5 Nullity: 0 Determinant of the matrix, which is only defined for square matrices. The matrix is nonsingular if and only if the determinant is nonzero (Theorem SMZD [389]). (Product of all eigenvalues?) Determinant 16 Eigenvalues, and bases for eigenspaces. (Definition EEM [396],Definition EM [404]) A= -2 A=1 =4 EK(-2) EK(1VK{ EK (4) - 2 -1 -2 2 1 , -2 0 1 1 0) 4 -4 -10 18 7 , -17 0 5 2 0 1 -1 0 1 1_ I> Geometric and algebraic multiplicities. (Definition GME [406]Definition AME [406]) 7K(-2)=2 7K (1) = 2 7K(4)= 1 K (-2) aK (1) aK (4) 2 2 1 Diagonalizable? (Definition DZM [435]) Version 2.02  Archetype K 750 Yes, full eigenspaces, Theorem DMFE [438]. The diagonalization. 
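The eigenvalue data above can be spot-checked by machine before looking at the explicit diagonalization that follows. The sketch below is not part of the original text, and it assumes the 5 x 5 matrix has been transcribed correctly from the start of this archetype; the entries should be compared with the original before relying on the output.

```python
from sympy import Matrix

# Archetype K, as printed at the start of this archetype
K = Matrix([[ 10,  12, -30,  27,  18],
            [ 18,  -2, -21,  30,  24],
            [ 24,  -6, -23,  36,  30],
            [ 24,   0, -30,  37,  30],
            [-12, -18,  39, -30, -20]])

print(K.det())        # the text reports determinant 16
print(K.trace())      # 2, which matches (-2) + (-2) + 1 + 1 + 4
print(K.eigenvals())  # the text reports eigenvalues -2 and 1 (each twice) and 4
```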
(Theorem DC [436]) -4 -3 -4 -6 7 10 18 24 24 -12 2 -1 4 -4 1 -7 -5 -6 -8 10 12 -2 -6 0 -18 -2 2 -10 18 -1 1 -1 -1 1 -3 -30 -21 -23 -30 39 1 -2 7 -17 0 1 0 0 1 -2 27 30 36 37 -30 0 1 0 5 1 2 5 6 4 0 18 24 30 30 -20 1 0 2 0 1 -2 0 0 0 0 0 -2 0 0 0 = 0 0 1 0 0 0 0 0 1 0 0 0 0 0 4 Version 2.02  Archetype L 751 Archetype L N __v Summary Square matrix of size 5. Singular, nullity 2. 2 distinct eigenvalues, each of "high" multiplicity. A matrix: -2 -1 -2 -4 4 -6 -5 -4 -4 6 10 7 7 10 -13 -7 -5 -6 -9 10 -4 -3 -4 -6 6 _ Matrix brought to reduced row-echelon form: 1 0 0 1 -2 0 [ 0 -2 2 0 0 [ 2 -1 0 0 0 0 0 0 0 0 0 0 Analysis of the row-reduced matrix (Notation RREFA [30]): r = 5 D = {1, 2, 3} F = {4, 5} Matrix (coefficient matrix) is nonsingular or singular? (Theorem NMRRI [72]) at the same time, examine the size of the set F above.Notice that this property does not apply to matrices that are not square. Singular. This is the null space of the matrix. The set of vectors used in the span construction is a linearly independent set of column vectors that spans the null space of the matrix (Theorem SSNS [118], Theorem BNS [139]). Solve the homogenous system with this matrix as the coefficient matrix and write the solutions in vector form (Theorem VFSLS [99]) to see these vectors arise. <{ 1f [2fl _LO] [1]) Column space of the matrix, expressed as the span of a set of linearly independent vectors that are Version 2.02  Archetype L 752 also columns of the matrix. These columns have indices that form the set D above. (Theorem BCS [239]) --2 -1 -2 -6 -5 -4 10 , 7 , 7 -7 -5 -6 _-4_ -3_ -4_ The column space of the matrix, as it arises from the extended echelon form of the matrix. The matrix L is computed as described in Definition EEF [261]. This is followed by the column space described by a set of linearly independent vectors that span the null space of L, computed as according to Theorem FS [263] and Theorem BNS [139]. When r = m, the matrix L has no rows and the column space is all of Cm. L=_ 10 -2 -6 5 0 1 4 10 -9- -5 6 2 9 -10 -4 0 , 0 , 1 0 1 0 1 _ 0 _ 0 Column space of the matrix, expressed as the span of a set of linearly independent vectors. These vectors are computed by row-reducing the transpose of the matrix into reduced row-echelon form, tossing out the zero rows, and writing the remaining nonzero rows as column vectors. By Theorem CSRST [247] and Theorem BRS [245], and in the style of Example CSROI [247], this yields a linearly independent set of vectors that span the column space. 0 0 1 0 9 5 1 44 2 5<3W 2- 2 Inverse matrix, if it exists. The inverse is not defined for matrices that are not square, and if the matrix is square, then the matrix must be nonsingular. (Definition MI [213], Theorem NI [228]) Version 2.02  Archetype L 753 Subspace dimensions associated with the matrix. (Definition NOM [347], Definition ROM [347]) Verify Theorem RPNC [348] Matrix columns: 5 Rank: 3 Nullity: 2 Determinant of the matrix, which is only defined for square matrices. The matrix is nonsingular if and only if the determinant is nonzero (Theorem SMZD [389]). (Product of all eigenvalues?) Determinant 0 Eigenvalues, and bases for eigenspaces. (Definition EEM [396],Definition EM [404]) A _-1 A=0 EL(-1) K EL (0) =K -5 6 2 9 -10 -4 0 , 0 , 1 0 1 0 1 0 _ 0 _ 2 -1 -2 2 1 ,-2 0 1 1 _ 0 _ I Geometric and algebraic multiplicities. (Definition GME [406]Definition AME [406]) 7L (-1)= 3 aL (-1) = 3 7L (0) = 2 aL (0) = 2 Diagonalizable? (Definition DZM [435]) Yes, full eigenspaces, Theorem DMFE [438]. The diagonalization. 
(Theorem DC [436]) 4 7 -10 -4 -7 3 5 -7 -3 -5 4 6 -7 -4 -6 6 -6 -2 9 -10 -6 -10 13 10 -6 7 -7 -8 10 -4 -1 -5 7 -5 -3 -2 -4 7 -6 -4 -4 -4 10 -9 -6 4 -5 6 9 -13 0 10 0 6 1 6 -10 0 1 0 2 -4 1 0 0 2 -2 1 0 1 -1 2 -2 1 0 _ Version 2.02  Archetype L 754 -1 0 0 0 0 0 -1 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 Version 2.02  Archetype M 755 Archetype M U. 7 Summary Linear transformation with bigger domain than codomain, so it is guaranteed to not be injective. Happens to not be surjective. A linear transformation: (Definition LT [452]) T: C5 - C3, 2 zi + 2x2 + 3x3 + 4x4 + 4.5 T x3 = 3xi + x2 + 4x3 - 3x4 + 7x5 X4 [ i1-3z2-5z14 +z5 1 \XI x5 A basis for the null space of the linear transformation: (Definition KLT [481]) I -2 2 -1 -3 0 , 0 , 0 1 1 0 -1 -1 1 0 0_ I Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. Also, since the rank can not exceed 3, we are guaranteed to have a nullity of at least 2, just from checking dimensions of the domain and the codomain. In particular, verify that / 1 \ 2 38] T -1 = 24 4[-16] \ _5 _/ / 0 \ -3 T 0 5 \ _6 _ L 38 24 -16] This demonstration that T is not injective is constructed with the observation that 0 1 -1 -3 2 -5 0 =-1 + 1 5 4 1 6 5 1 Version 2.02  Archetype M 756 and -_ - -5 z = 1 E /C(T) 1 1 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): .1_ 2 3 4 4 3 ,1 ,4 ,-3 ,7 1 -1 0 -5 1 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 , 1 Surjective: No. (Definition SLT [492]) 3 Notice that the range is not all of C3 since its dimension 2, not 3. In particular, verify that 4] (T), -5 by setting the output equal to this vector and seeing that the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 (4] , is empty. This alone is sufficient to see that the linear transformation is not onto. ]Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 5 Rank: 2 Nullity: 3 Invertible: No. Version 2.02  Archetype M 757 Not injective or surjective. Matrix representation (Theorem MLTCV [460]): ~1 2 3 4 4 T : C5 - C3, T(x)= Ax, A = 3 1 4 -3 7 _1 -1 0 -5 1_ Version 2.02  Archetype N 758 Archetype N U. 7 Summary Linear transformation with domain larger than its codomain, so it is guaranteed to not be injective. Happens to be onto. A linear transformation: (Definition LT [452]) T : C5 - C3, / z1 \ 12 T x3 14 TI 5 2x1 + x2 + 3x3 - 4-4 + 5x5 Xi -2X2+ 3x3 - 9x4 + 3x5 3x1+4x3-6x4+5x _ A basis for the null space of the linear transformation: (Definition KLT [481]) 1 -2 -1 -1 -2, 3 0 1 1 0 I Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. Also, since the rank can not exceed 3, we are guaranteed to have a nullity of at least 2, just from checking dimensions of the domain and the codomain. 
In particular, verify that /-3 \ 1 6 T -2 = 19 -3 6_ \ 1./ /-4\ -4 6 T -2 = 19 -1 6_ \ _4 . This demonstration that T is not injective is constructed with the observation that --4- --3 -1 -4 1 -5 -2 = -2 + 0 -1 -3 2 4 1 3 Version 2.02  Archetype N 759 and -5 z = 0 E /C(T) 2 3 _ so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): .2 1 3 -4 5 1 - , 2 , 3, [-9 3 3 0 4 -6 5 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 0 , 1 , 0 10 0 1- Surjective: Yes. (Definition SLT [492]) Notice that the basis for the range above is the standard basis for C3. So the range is all of C3 and thus the linear transformation is surjective. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 5 Rank: 3 Nullity: 2 DInvertible: No. Not surjective, and the relative sizes of the domain and codomain mean the linear transformation cannot be injective. (Theorem ILTIS [511]) Matrix representation (Theorem MLTCV [460]): 2 1 3 -4 5 T: C5 - C3, T (x) = Ax, A = 1 -2 3 -9 3 3 0 4 -6 5 Version 2.02  Archetype N 760 Version 2.02  Archetype 0 761 Archetype 0 U. 7 Summary Linear transformation with a domain smaller than the codomain, so it is guaranteed to not be onto. Happens to not be one-to-one. D A linear transformation: (Definition LT [452]) T x2 -x3_ -XI + z2 - 313 -1i + 2x2 - 4x3 X:I + 12 + 33 2xi + 3x2 + z3 1i + 213 A basis for the null space of the linear transformation: (Definition KLT [481]) { [-21 1 L1 Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. Also, since the rank can not exceed 3, we are guaranteed to have a nullity of at least 2, just from checking dimensions of the domain and the codomain. In particular, verify that -15 5 -19 T -1 = 7 3 _10 11 T This demonstration that T is not injective is constructed with the observation that 1 5 -4 1 = -1 + 2 5 3 2 and _4- z = 2 E /C(T) 2 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Version 2.02  Archetype 0 762 Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): -1 1 -3 -1 2 -4 1{ , 1 , 2 3 1 1 0 2 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 1 -3 ,2 -7 5 -2 1 Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 3 Rank: 2 Nullity: 1 Surjective: No. 
(Definition SLT [492]) The dimension of the range is 2, and the codomain (C5) has dimension 5. So the transformation is not onto. Notice too that since the domain C3 has dimension 3, it is impossible for the range to have a dimension greater than 3, and no matter what the actual definition of the function, it cannot possibly be onto. 2 3 To be more precise, verify that 1 R(T), by setting the output equal to this vector and seeing that 1 1 the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 1, is empty. This alone is sufficient to see that the linear transformation is not onto. DInvertible: No. Not injective, and the relative dimensions of the domain and codomain prohibit any possibility of being surjective. Matrix representation (Theorem MLTCV [460]): Version 2.02  Archetype 0 763 -1 1 -3 -1 2 -4 T: C3 H-C5, T(x)=Ax, A= 1 1 1 23 1 1 02 Version 2.02  Archetype P 764 Archetype P U. 7 Summary Linear transformation with a domain smaller that its codomain, so it is guaranteed to not be surjective. Happens to be injective. A linear transformation: (Definition LT [452]) T1 T X2 X3_ -XI + 12 + 13 -1i + 2x2 + 213 zi + 12 + 313 2x1 + 3x2 + 13 -2xi + 12 + 3x3_ A basis for the null space of the linear transformation: (Definition KLT [481]) { } Injective: Yes. (Definition ILT [477]) Since C(T) = {0}, Theorem KILT [484] tells us that T is injective. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): -1 1 1 -1 2 2 1 , 1 , 3 2 3 1 -2I i 1_i 3 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 0 1 0 0 , 0 , 1 -10 7 -1 _6 _ -3_ _1 _ Version 2.02  Archetype P 765 Surjective: No. (Definition SLT [492]) The dimension of the range is 3, and the codomain (C5) has dimension 5. So the transformation is not surjective. Notice too that since the domain C3 has dimension 3, it is impossible for the range to have a dimension greater than 3, and no matter what the actual definition of the function, it cannot possibly be surjective in this situation. 2 1 To be more precise, verify that -3 R(T), by setting the output equal to this vector and seeing that 2 6 /2\ 1 the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 -3 2 is empty. This alone is sufficient to see that the linear transformation is not onto. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 3 Rank: 3 Nullity: 0 Invertible: No. The relative dimensions of the domain and codomain prohibit any possibility of being surjective, so apply Theorem ILTIS [511]. Matrix representation (Theorem MLTCV [460]): -1 1 1 -1 2 2 T:(C3FH C5, T(x) =Ax, A= 1 1 3 2 3 1 -2 1 3 Version 2.02  Archetype Q 766 Archetype Q U. 7 Summary Linear transformation with equal-sized domain and codomain, so it has the potential to be invertible, but in this case is not. Neither injective nor surjective. Diagonalizable, though. 
A linear transformation: (Definition LT [452]) T : C5 - C5, /XzI\ 12 T x3 14 \15s / -2xi + 3x2 + 3x3 -16xi + 9x2 + 12x3 -19xi + 7x2 + 14x3 .21x1 + 9x2 + 15x3 -9xi + 5x2 + 733 - - 6x4 + 3x5 - 2834 + 28x5 - 32x4 + 37x5 - 3534 + 39x5 16x4 + 16x5 _ A basis for the null space of the linear transformation: (Definition KLT [481]) 3I 4 1 3 _3_ Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. Also, since the rank can not exceed 3, we are guaranteed to have a nullity of at least 2, just from checking dimensions of the domain and the codomain. In particular, verify that / 1 \ 4 3 55 T -1 = 72 2 77 \ . 31_ /4\ 4 7 55 T 0 = 72 5 77 \ _7 31_ This demonstration that T is not injective is constructed with the observation that 4 1 3 7 3 4 0 = -1 + 1 5 2 3 7 4 3 Version 2.02  Archetype Q 767 and 3 4 z = 1 E /C(T) 3 _3 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): -2 3 3 -6 3 -16 9 12 -28 28 -19 , 7 , 14 , -32 , 37 -21 9 15 -35 39 -9_ _ _7_ -16 16_J If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 0 0 1 0 0 0 , 0 , 1 , 0 0 0 0 1 1 -1 _-1_ 2 Surjective: No. (Definition SLT [492]) The dimension of the range is 4, and the codomain (C5) has dimension 5. So R(T) - C5 and by Theorem RSLT [498] the transformation is not surjective. -_1i 2 To be more precise, verify that 3 g 7Z(T), by setting the output equal to this vector and seeing that -1 4 the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 3, is empty. This alone is sufficient to see that the linear transformation is not onto. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results Version 2.02  Archetype Q 768 for matrices. Verify Theorem RPNDD [517]. Domain dimension: 5 Rank: 4 Nullity: 1 Invertible: No. Neither injective nor surjective. Notice that since the domain and codomain have the same dimension, either the transformation is both onto and one-to-one (making it invertible) or else it is both not onto and not one-to-one (as in this case) by Theorem RPNDD [517]. Matrix representation (Theorem MLTCV [460]): -2 3 3 -6 3 -16 9 12 -28 28 T:(C5F-(C5, T(x)=Ax, A= -19 7 14 -32 37 -21 9 15 -35 39 -9 5 7 -16 16 Eigenvalues and eigenvectors (Definition EELT [574], Theorem EER [586]): 0 2 A = -1 ET(-1) 3 3 11 3 4 A = 0 Er (0) = 1 3 _3_ A=1 E(1) =K 0{ , 0 , Evaluate the linear transformation with each of these eigenvectors as an interesting check. ]A diagonal matrix representation relative to a basis of eigenvectors, B. 0 3 5 -3 1 2 4 3 1 -1 B= 3 , 1 ,0 , 0 , 2 3 3 0 2 0 _1_ 3_ 2_ _0 _ _0 _ Version 2.02  Archetype Q 769 -1 0 0 0 0 0 0 0 0 0 MgT,B= 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1_ Version 2.02  Archetype R 770 Archetype R U. 7 Summary Linear transformation with equal-sized domain and codomain. Injective, surjective, invert- ible, diagonalizable, the works. 
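Before working through Archetype R, the claims just made for Archetype Q can be spot-checked by machine. The sketch below is not part of the original text and assumes Archetype Q's 5 x 5 matrix representation has been read correctly from the previous page.

```python
from sympy import Matrix

# Matrix representation of Archetype Q, as printed above
Q = Matrix([[ -2, 3,  3,  -6,  3],
            [-16, 9, 12, -28, 28],
            [-19, 7, 14, -32, 37],
            [-21, 9, 15, -35, 39],
            [ -9, 5,  7, -16, 16]])

print(Q * Matrix([3, 4, 1, 3, 3]))  # the kernel vector listed for Archetype Q maps to zero
print(Q.rank())                     # the text reports rank 4, so Q is not invertible
print(Q.eigenvals())                # the text reports eigenvalues -1, 0 and 1
```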
A linear transformation: (Definition LT [452]) 12 S 3 3:4 1\5/ -65x1 + 12812 + 1013 - 26214 + 4015 36x1 - 73x2 - 13 + 15114 - 1615 -441i + 8812 + 5x3 - 180X4 + 24x5 341i - 6812 - 313 + 14014 - 1815 12x1 - 24x2 - 33 + 49x4 - 5x5 A basis for the null space of the linear transformation: (Definition KLT [481]) { } Injective: Yes. (Definition ILT [477]) Since the kernel is trivial Theorem KILT [484] tells us that the linear transformation is injective. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): I -65 128 10 36 -73 -1 -44 , 88 , 5 34 -68 -3 12 -24 -1 -262 151 -180 140 49 _ 40 -16 24 -18 _ -5 I If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 0 0 0 1 0 0 0 0 , 0 , 1 , 0 , 0 0 0 0 1 0 _0_ 0_ 0_ 0_ 1_ Version 2.02  Archetype R 771 Surjective: Yes. (Definition SLT [492]) A basis for the range is the standard basis of C5, so R(T) = C5 and Theorem RSLT [498] tells us T is surjective. Or, the dimension of the range is 5, and the codomain (C5) has dimension 5. So the transformation is surjective. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 5 Rank: 5 Nullity: 0 Invertible: Yes. Both injective and surjective (Theorem ILTIS [511]). Notice that since the domain and codomain have the same dimension, either the transformation is both injective and surjective (making it invertible, as in this case) or else it is both not injective and not surjective. Matrix representation (Theorem MLTCV [460]): -65 128 10 -262 40 36 -73 -1 151 -16 T: C5 - C5, T (x) = Ax, A = -44 88 5 -180 24 34 -68 -3 140 -18 12 -24 -1 49 -5 The inverse linear transformation (Definition IVLT [508]): / 1 -47x1 + 92x2 + x3 - 181X4 - 14x5 X2 27xi- 552+ 2X3+ X4 +11X5 T-1 : C5 - C5, T-1 3 = -32x1 + 64x2 -z3 - 126X4 - 12x5 X4 25zi - 50x2 + yzX+19x4 + 9x5 \5_/ _9xi-18x2+2x3+ 19 -- 4-+4X5 Verify that T (T-1 (x)) = x and T (T-1 (x)) = x, and notice that the representations of the transformation and its inverse are matrix inverses (Theorem IMR [557], Definition MI [213]). ]Eigenvalues and eigenvectors (Definition EELT [574], Theorem EER [586]): 5[ 2 0 1 A =-1 E(1V -18 0 52 Version 2.02  Archetype R 772 -10 2 -5 3 A=1 ET(1)=K -6 1 Evaluate the linear transformation with each of these eigenvectors as an interesting check. ]A diagonal matrix representation relative to a basis of eigenvectors, B. B =-18 0 -6 1, -41 0 0 1 -1 0 0 0 0 0 -1 0 0 0 MB B 0 0 1 0 0 0 00_10 140 0 002 Version 2.02  Archetype S 773 Archetype S U. 7 Summary Domain is column vectors, codomain is matrices. Domain is dimension 3 and codomain is dimension 4. Not injective, not surjective. A linear transformation: (Definition LT [452]) Ta -b T:(C3 M22, T b = - [3a+b+c 2a+2b+c -2a-6b-2cJ A basis for the null space of the linear transformation: (Definition KLT [481]) [-} -1 4 Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. 
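That kernel is small enough to compute by hand, but a machine check is also quick. The sketch below is not part of the original text; it records the four entries of the output matrix as linear functions of a, b, c (their placement within the 2 x 2 output does not affect the kernel) and finds the null space of the resulting 4 x 3 matrix.

```python
from sympy import Matrix

# Rows: coefficients of a, b, c in the four entries of T([a, b, c]) for Archetype S:
#   a - b,   2a + 2b + c,   3a + b + c,   -2a - 6b - 2c
M = Matrix([[ 1, -1,  0],
            [ 2,  2,  1],
            [ 3,  1,  1],
            [-2, -6, -2]])

print(M.nullspace())  # one basis vector, a multiple of (-1, -1, 4): the kernel is nontrivial
print(M.rank())       # 2, the dimension of the range of T
```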
Also, since the rank can not exceed 3, we are guaranteed to have a nullity of at least 1, just from checking dimensions of the domain and the codomain. In particular, verify that 2 - 21 9 T 1 = 3 10 -16] 3 - 0- 0 1 9 T -1 [= 11- This demonstration that T is not injective is constructed with the observation that 0 2 -2 -1 = 1 + -2 11 3 8 and -2 z = -2 E IC (T) 8 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT Version 2.02  Archetype S 774 [500]): 21 - 1 2 6 0 1 3 - - 2 J If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: {110 0 12 Surjective: No. (Definition SLT [492]) The dimension of the range is 2, and the codomain (M22) has dimension 4. So the transformation is not surjective. Notice too that since the domain C3 has dimension 3, it is impossible for the range to have a dimension greater than 3, and no matter what the actual definition of the function, it cannot possibly be surjective in this situation. To be more precise, verify that 3 J R(T), by setting the output of T equal to this matrix and seeing that the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 ([1 3 is empty. This alone is sufficient to see that the linear transformation is not onto. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 3 Rank: 2 Nullity: 1 Invertible: No. Not injective (Theorem ILTIS [511]), and the relative dimensions of the domain and codomain prohibit any possibility of being surjective. Matrix representation (Definition MR [542]): B= { [a, [1], [0 C={[ 1 0[0 1] [00 00} 0 0_ ' 0_ ' I [_' 0 1 1 -1 0 T 2 2 1 MB~c 3 1 1 -2 -6 -2_ Version 2.02  Archetype S 775 Archetype S 775 Version 2.02  Archetype T 776 Archetype T U. 7 Summary Domain and codomain are polynomials. Domain has dimension 5, while codomain has dimension 6. Is injective, can't be surjective. A linear transformation: (Definition LT [452]) T: P4 s P, T (p(x)) = (x - 2)p(x) A basis for the null space of the linear transformation: (Definition KLT [481]) { } Injective: Yes. (Definition ILT [477]) Since the kernel is trivial Theorem KILT [484] tells us that the linear transformation is injective. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): {x - 2, 2 - 2x, 23 2, 4 - 2 x5 - 24, X6 - 2X} If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: { 32x + 1 15 16 1 8 45 3) 52 Zjx Surjective: No. (Definition SLT [492]) The dimension of the range is 5, and the codomain (P5) has dimension 6. 
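That count can be confirmed by machine. The sketch below is not part of the original text; it builds the matrix of p(x) -> (x - 2)p(x) relative to the monomial bases of P4 and P5 and computes its rank.

```python
from sympy import zeros

# Matrix of T(p) = (x - 2)p relative to {1, x, ..., x^4} and {1, x, ..., x^5}:
# the image of x^j is x^(j+1) - 2x^j, so column j holds -2 in row j and 1 in row j + 1
M = zeros(6, 5)
for j in range(5):
    M[j, j] = -2
    M[j + 1, j] = 1

print(M.rank())            # 5: the range has dimension 5 inside the 6-dimensional P5
print(len(M.nullspace()))  # 0: the kernel is trivial, so T is injective
```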
So the transformation is not surjective. Notice too that since the domain P4 has dimension 5, it is impossible for the range to have a dimension greater than 5, and no matter what the actual definition of the function, it cannot possibly be surjective in this situation. To be more precise, verify that 1+x+x2+x3 +:4 0 7(T), by setting the output equal to this vector and seeing that the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, Version 2.02  Archetype T 777 T-1 (1 + x + 2 + x3 + x4), is nonempty. This alone is sufficient to see that the linear transformation is not onto. ] Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 5 Rank: 5 Nullity: 0 Invertible: No. The relative dimensions of the domain and codomain prohibit any possibility of being surjective, so apply Theorem ILTIS [511]. Matrix representation (Definition MR [542]): B. C. {1, z, {1, x, MBTC~ -2 1 0 0 0 0 x2 x2 0 -2 1 0 0 0 x3, x4} x3, 5', x} 0 0 -2 1 0 0 0 0 0 -2 1 0 0 0 0 0 -2 1_ Version 2.02  Archetype U 778 Archetype U U. 7 Summary Domain is matrices, codomain is column vectors. Domain has dimension 6, while codomain has dimension 4. Can't be injective, is surjective. A linear transformation: (Definition LT [452]) T:M23 C4, Zd e f a+2b+12c-3d+e+6f 2a-b-c+d-11f a+b+7c+2d+e-3f a+2b+12c+5e-5f _ A basis for the null space of the linear transformation: (Definition KLT [481]) {E1 -4 2 0 -2 1_' 0 -5 1 0 0_ J Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. Also, since the rank can not exceed 4, we are guaranteed to have a nullity of at least 2, just from checking dimensions of the domain and the codomain. In particular, verify that 10 -1 -7 -2-14 1J _ -1 -13_ -3 3 31 [I -7 -14 -1 -13_ This demonstration that T is not injective is constructed with the observation that 5 5 -3 3 -1 1 10 -21[4 -13 11 3 J[3 -111_+2 4 2] and 4 Z 2 -13 1 C(T) 4 2 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): Version 2.02  Archetype U 779 1 2 12 -3 1 6 2 -1 -1 1 0 -11 1 '] 1g7 2 1 -3 1 2 12 0 5 -5 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: 1 0 0 0 0 1 0 0 0 ' 0 ' 1 ' 0 0 0 0 1 Surjective: Yes. (Definition SLT [492]) A basis for the range is the standard basis of C4, so R(T) = C4 and Theorem RSLT [498] tells us T is surjective. Or, the dimension of the range is 4, and the codomain (C4) has dimension 4. So the transformation is surjective. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 6 Rank: 4 Nullity: 2 Invertible: No. The relative sizes of the domain and codomain mean the linear transformation cannot be injective. 
(The- orem ILTIS [511]) Matrix representation (Definition MR [542]): 0o]o'0 1001'[001 [ ' 01 ~1 > 12 -3 1 6 L1 2 12 0 5 -5] Version 2.02  Archetype V 780 Archetype V U. 7 Summary Domain is polynomials, codomain is matrices. Domain and codomain both have dimension 4. Injective, surjective, invertible. Square matrix representation, but domain and codomain are unequal, so no eigenvalue information. A linear transformation: (Definition LT [452]) T : P3 F-M22, T (a+bx +cx2 +dx3) a+b a-2c d b - d A basis for the null space of the linear transformation: (Definition KLT [481]) { } Injective: Yes. (Definition ILT [477]) Since the kernel is trivial Theorem KILT [484] tells us that the linear transformation is injective. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): { [1 1 ][ 1 0 [0 -2 ][0 0 ] If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: { 10 01 0 0 00 {[ 0_ ' 0L0_ '1 0_ '01_ } Surjective: Yes. (Definition SLT [492]) A basis for the range is the standard basis of M22, so 7(T) = M22 and Theorem RSLT [498] tells us Version 2.02  Archetype V 781 T is surjective. Or, the dimension of the range is 4, and the codomain (M22) has dimension 4. So the transformation is surjective. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 4 Rank: 4 Nullity: 0 Invertible: Yes. Both injective and surjective (Theorem ILTIS [511]). Notice that since the domain and codomain have the same dimension, either the transformation is both injective and surjective (making it invertible, as in this case) or else it is both not injective and not surjective. Matrix representation (Definition MR [542]): B ={1, X, 2, x3 C ={[ 10[0 1] [00 0 0] 1 1 0 0 M 1 0 -2 0 M e 0 0 0 1 0 1 0 -1 Since invertible, the inverse linear transformation. (Definition IVLT [508]) T-1:M22 P3 F TP, T- L- - c (a+c+d)+(c+d)x+-(a-b-c-d)2+cxs Version 2.02  Archetype W 782 Archetype W U. 7 Summary Domain is polynomials, codomain is polynomials. Domain and codomain both have dimen- sion 3. Injective, surjective, invertible, 3 distinct eigenvalues, diagonalizable. A linear transformation: (Definition LT [452]) T: P2 H-P2, T (a + bz + cz2) = (19a + 6b - 4c) + (-24a - 7b+ 4c) + (36a + 12b - 9c) A basis for the null space of the linear transformation: (Definition KLT [481]) { } Injective: Yes. (Definition ILT [477]) Since the kernel is trivial Theorem KILT [484] tells us that the linear transformation is injective. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): {19 - 24x + 36x2, 6 - 7x + 12x2, -4 + 4x - 9x2 If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). 
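That independence can also be checked directly. The sketch below is not part of the original text; it coordinatizes the three image polynomials relative to {1, x, x^2} and looks at the rank (or determinant) of the resulting square matrix.

```python
from sympy import Matrix

# Coordinate vectors, relative to {1, x, x^2}, of T(1), T(x), T(x^2) for Archetype W
C = Matrix([[19, -24, 36],
            [ 6,  -7, 12],
            [-4,   4, -9]])

print(C.rank())  # 3, so the three polynomials are linearly independent
print(C.det())   # -3, nonzero, which says the same thing
```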
This spanning set may be converted to a "nice" basis, by making the vectors the rows of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: {1, z, z2 Surjective: Yes. (Definition SLT [492]) A basis for the range is the standard basis of C5, so R(T) = C5 and Theorem RSLT [498] tells us T is surjective. Or, the dimension of the range is 5, and the codomain (C5) has dimension 5. So the transformation is surjective. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 3 Rank: 3 Nullity: 0 Version 2.02  Archetype W 783 Invertible: Yes. Both injective and surjective (Theorem ILTIS [511]). Notice that since the domain and codomain have the same dimension, either the transformation is both injective and surjective (making it invertible, as in this case) or else it is both not injective and not surjective. Matrix representation (Definition MR [542]): B = {i, x, x2} C = {i, zx2 19 6 -4 MB c = -24 -7 4 36 12 -9] Since invertible, the inverse linear transformation. (Definition IVLT [508]) T1P~-P, 1a~xcx)4 20 11 T-1:- P 2 Pa T-1 (a + bz + cz2) = (-5a - 2b+ -c) + (24a + 9b- xc) x+ (12a + 4b - c)x2 3 3 3 Eigenvalues and eigenvectors (Definition EELT [574], Theorem EER [586]): A =-1 ET(-1) =K({2x + 3x2}) A = 1 E(1) = ({-1 + 3x}) A = 3 ET(3) =Q({1 - 2x+X2}) Evaluate the linear transformation with each of these eigenvectors as an interesting check. A diagonal matrix representation relative to a basis of eigenvectors, B. B {2x + 3x2, -1 + 3x, 1 - 2x + x2} -1 0 0 Mno,?'=I0 1 0 Version 2.02  Archetype X 784 Archetype X U. 7 Summary Domain and codomain are square matrices. Domain and codomain both have dimension 4. Not injective, not surjective, not invertible, 3 distinct eigenvalues, diagonalizable. A linear transformation: (Definition LT [452]) T: M22 H M22, T([a b1) -2a+ 15b+3c+27d a-5b-9d lOb+6c+ 18d -a-4b-5c-8dJ A basis for the null space of the linear transformation: (Definition KLT [481]) -6 2 -31 1i Injective: No. (Definition ILT [477]) Since the kernel is nontrivial Theorem KILT [484] tells us that the linear transformation is not injective. In particular, verify that -2 l0 [115 781 -4]) -38 -35] T([41 3] 115 -38 78 -35_ This demonstration that T is not injective is constructed with the observation that 4 3 -2 0] + [2 6 3] and z [=-2 -1 so the vector z effectively "does nothing" in the evaluation of T. A basis for the range of the linear transformation: (Definition RLT [496]) Evaluate the linear transformation on a standard basis to get a spanning set for the range (Theorem SSRLT [500]): {f-2 0 151 3 6 27 8 1 - 1 5 -4J '"0 6-5J' -29 -8 J If the linear transformation is injective, then the set above is guaranteed to be linearly independent (The- orem ILTLI [485]). This spanning set may be converted to a "nice" basis, by making the vectors the rows Version 2.02  Archetype X 785 of a matrix (perhaps after using a vector reperesentation), row-reducing, and retaining the nonzero rows (Theorem BRS [245]), and perhaps un-coordinatizing. A basis for the range is: {1 0 J 0100 Surjective: No. (Definition SLT [492]) The dimension of the range is 3, and the codomain (M22) has dimension 5. So R(T) - JM22 and by Theorem RSLT [498] the transformation is not surjective. 
To be more precise, verify that 31 7ZR(T), by setting the output of T equal to this matrix and seeing that the resulting system of linear equations has no solution, i.e. is inconsistent. So the preimage, T-1 (31_),)is empty. This alone is sufficient to see that the linear transformation is not onto. Subspace dimensions associated with the linear transformation. Examine parallels with earlier results for matrices. Verify Theorem RPNDD [517]. Domain dimension: 4 Rank: 3 Nullity: 1 Invertible: No. Neither injective nor surjective (Theorem ILTIS [511]). Notice that since the domain and codomain have the same dimension, either the transformation is both injective and surjective or else it is both not injective and not surjective (making it not invertible, as in this case). Matrix representation (Definition MR [542]): B = { ,1 0 0n 11 0 01 [ F l 7 0 10,6118 I [ M ~ -1 i4 -5 -8_ ]Eigenvalues and eigenvectors (Definition EELT [574], Theorem EER [586]): =0 ET(0)=K{2 13]} Version 2.02  Archetype X 786 A = 1 EFT (1) = 37 02 -1 -2 A = 3 Er (3) = { 3-2 Evaluate the linear transformation with each of these eigenvectors as an interesting check. A diagonal matrix representation relative to a basis of eigenvectors, B. B{[ 2 1_3] 37 0 2]' -01 _ 2]' [1 J2]} 0 0 0 0 MT - 0 1 0 0 MBB 0 0 3 0 _0 0 0 3_ Version 2.02  Appendix GFDL GNU Free Documentation License Version 1.2, November 2002 Copyright @2000,2001,2002 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law. A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. 
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any 787  Appendix GFDL GNU Free Documentation License 788 mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back- Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightfor- wardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque". Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) 
To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. 2. VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY Version 2.02  Appendix GFDL GNU Free Documentation License 789 If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. 
MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement. C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. Version 2.02  Appendix GFDL GNU Free Documentation License 790 I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. 0. Preserve any Warranty Disclaimers. 
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties-for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. 5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name Version 2.02  Appendix GFDL GNU Free Documentation License 791 of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements". 6. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. 
AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 8. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail. If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the re- quirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. 9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 10. FUTURE REVISIONS OF THIS LICENSE Version 2.02  Appendix GFDL GNU Free Documentation License 792 The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/. Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. 
ADDENDUM: How to use this License for your documents To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright ©YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software. Version 2.02  Part T Topics 793  Section F Fields DRAFT: THIS SECTION COMPLETE, BUT SUBJECT To CHANGE We have chosen to present introductory linear algebra in the Core (Part C [2]) using scalars from the set of complex numbers, C. We could have instead chosen to use scalars from the set of real numbers, R. This would have presented certain difficulties when we encountered characteristic polynomials with complex roots (Definition CP [403]) or when we needed to be sure every matrix had at least one eigenvalue (Theorem EMHE [400]). However, much of the basics would be unchanged. The definition of a vector space would not change, nor would the ideas of linear independence, spanning, or bases. Linear transformations would still behave the same and we would still obtain matrix representations, though our ideas about canonical forms would have to be adjusted slightly. The real numbers and the complex numbers are both examples of what are called fields, and we can "do" linear algebra in just a bit more generality by letting our scalars take values from some unspecified field. So in this section we will describe exactly what constitutes a field, give some finite examples, and discuss another connection between fields and vector spaces. Vector spaces over finite fields are very important in certain applications, so this is partially background for other topics. As such, we will not prove every claim we make. Subsection F Fields Like a vector space, a field is a set along with two binary operations. The distinction is that both operations accept two elements of the set, and then produce a new element of the set. In a vector space we have two sets the vectors and the scalars, and scalar multiplication mixes one of each to produce a vector. Here is the careful definition of a field. Definition F Field Suppose that F is a set upon which we have defined two operations: (1) addition, which combines two elements of F and is denoted by "+", and (2) multiplication, which combines two elements of F and is denoted by juxtaposition. Then F, along with the two operations, is a field if the following properties hold. " ACF Additive Closure, Field If a,3 E F, then a +/3 EF. * MCF Multiplicative Closure, Field If a,3 # F, then a#3 E F. 
* CAF Commutativity of Addition, Field
If α, β ∈ F, then α + β = β + α.
* CMF Commutativity of Multiplication, Field
If α, β ∈ F, then αβ = βα.
* AAF Additive Associativity, Field
If α, β, γ ∈ F, then α + (β + γ) = (α + β) + γ.
* MAF Multiplicative Associativity, Field
If α, β, γ ∈ F, then α(βγ) = (αβ)γ.
* DF Distributivity, Field
If α, β, γ ∈ F, then α(β + γ) = αβ + αγ.
* ZF Zero, Field
There is an element, 0 ∈ F, called zero, such that α + 0 = α for all α ∈ F.
* OF One, Field
There is an element, 1 ∈ F, called one, such that α(1) = α for all α ∈ F.
* AIF Additive Inverse, Field
If α ∈ F, then there exists −α ∈ F so that α + (−α) = 0.
* MIF Multiplicative Inverse, Field
If α ∈ F, α ≠ 0, then there exists 1/α ∈ F so that α(1/α) = 1.

Mostly this definition says that all the good things you might expect really do happen in a field. The one technicality is that the special element 0, the additive identity element, does not have a multiplicative inverse. In other words, no dividing by zero. This definition should remind you of Theorem PCNA [680], and indeed, Theorem PCNA [680] provides the justification for the statement that the complex numbers form a field. Another example of a field is the set of rational numbers

Q = { p/q | p, q are integers, q ≠ 0 }

Of course, the real numbers, R, also form a field. It is this field that you probably studied for many years. You began studying the integers ("counting"), then the rationals ("fractions"), then the reals ("algebra"), along with some excursions into the complex numbers ("imaginary numbers"). So you should have seen three fields already in your previous studies.

Our first observation about fields is that we can go back to our definition of a vector space (Definition VS [279]) and replace every occurrence of C by some general, unspecified field, F, and all our subsequent definitions and theorems are still true, so long as we avoid roots of polynomials (or equivalently, factoring polynomials). So if you consult more advanced texts on linear algebra, you will see this sort of approach. You might study some of the first theorems we proved about vector spaces in Subsection VS.VSP [285] and work through their proofs in the more general setting of an arbitrary field. This exercise should convince you that very little changes when we move from C to an arbitrary field F. (See Exercise F.T10 [800].)

Subsection FF Finite Fields

It may sound odd at first, but there exist finite fields, and even finite vector spaces. We will find certain of these important in subsequent applications, so we collect some ideas and properties here.

Definition IMP Integers Modulo a Prime
Suppose that p is a prime number. Let Z_p = {0, 1, 2, ..., p − 1}. Add and multiply elements of Z_p as integers, but whenever a result lies outside of the set Z_p, find its remainder after division by p and replace the result by this remainder.

We have defined a set, and two binary operations. The result is a field.

Theorem FIMP Field of Integers Modulo a Prime
The set of integers modulo a prime p, Z_p, is a field.

Example IM11 Integers mod 11
Z_11 is a field by Theorem FIMP [795]. Here we provide some sample calculations, all performed in Z_11:

8 + 5 = 2    −8 = 3    5 − 9 = 7    5(7) = 2    1/6 = 2    10^{−1} = 10

We can now "do" linear algebra using scalars from a finite field.

Example VSIM5 Vector space over integers mod 5
Let (Z_5)^3 be the set of all column vectors of length 3 with entries from Z_5. Use Z_5 as the set of scalars.
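Before continuing with the vector calculations in this example, it can be reassuring to check arithmetic in Z_p mechanically. The short sketch below is not part of the text; it assumes Python as a convenient calculator, and the helper names are chosen purely for illustration. It reproduces a few of the Z_11 computations from Example IM11 [795], and setting p = 5 lets you experiment with the scalars used in this example.

# Hedged sketch: arithmetic in Z_p, used here only to double-check Example IM11.
# The helper names (add, mul, neg, inv) are illustrative, not from the text.

p = 11  # the prime from Example IM11; set p = 5 to experiment with Z_5

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def neg(a):
    return (-a) % p

def inv(a):
    # brute-force search for a multiplicative inverse; fine for small primes
    for x in range(1, p):
        if mul(a, x) == 1:
            return x
    raise ValueError("0 has no multiplicative inverse")

assert add(8, 5) == 2       # 8 + 5 = 2
assert neg(8) == 3          # -8 = 3
assert add(5, neg(9)) == 7  # 5 - 9 = 7
assert mul(5, 7) == 2       # 5(7) = 2
assert inv(6) == 2          # 1/6 = 2, since 6(2) = 12 = 1
assert inv(10) == 10        # 10 is its own multiplicative inverse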
Define addition and multiplication the usual way. We exhibit a few sample calculations. 2 4 1 2 1 3 + 1= 4 3 0 =0 4 3 2 4 2 We can, of course, build linear combinations, such as 1 2 1 0 2 3 -4 1 + 2 [4 0 1 4 -0 which almost looks like a relation of linear dependence. The set 1 2 3 ,2 1 K 0 is linearly independent, while the set is linearly dependent, as can be seen from the relation of linear dependence formed by the scalars ail 2, a2 =1 and a3 4. To find these scalars, one would take the same approach as Example LDS [132], but in performing row operations to solve a homogeneous system, you would need to take care that all scalar (field) operations are performed over Z5, especially when multiplying a row by a scalar to make a leading entry equal to 1. One more observation about this example -the set 1 1 1 0 1 1 0 0 1 Version 2.02  Subsection F.FF Finite Fields 797 is a basis for (Z5)3, since it is both linearly independent and spans (Z5)3. In applications to computer science or electrical engineering, Z2 is the most important field, since it can be used to describe the binary nature of logic, circuitry, communications and their intertwined relationships. The vector space of column vectors with entries from Z2, (Z2)", with scalars taken from Z2 is the natural extension of this idea. Notice that Z2 has the minimum number of elements to be a field, since any field must contain a zero and a one (Property ZF [794], Property OF [794]). Example SM2Z7 Symmetric matrices of size 2 over7 We can employ the field of integers modulo a prime to build other examples of vector spaces with novel fields of scalars. Define S2 (7)= a |bab, C Ez7 S22(7L7){b c a which is the set of all 2 x 2 symmetric matrices with entries from Z7. Use the field Z7 as the set of scalars, and define vector addition and scalar multiplication in the natural way. The result will be a vector space. Notice that the field of scalars is finite, as is the vector space, since there are 73 = 343 matrices in S22 (Z7). The set (1 0 0 1 0 0 0 0_' 1 0_' 0 1_ is a basis, so dim (S22 (Z7)) = 3. In a more advanced algebra course it is possible to prove that the number of elements in a finite field must be of the form p", where p is a prime. We can't go so far afield as to prove this here, but we can demonstrate an example. Example FF8 Finite field of size 8 Define the set F as F= {a + bt + ct2 a, b, c E Z2}. Add and multiply these quantities as polynomials in the variable t, but replace any occurrence of t3 by t + 1. This defines a set, and the two operations on elements of that set. Do not be concerned with what t "is," because it isn't. t is just a handy device that makes the example a field. We'll say a bit more about t when we finish. But first, some examples. Remember that 1 + 1= 0 in Z2. Addition is quite simple, for example, (1+t+t2) + (1+t2) = (1+1)+(1+o)t+(1+1)t2 = t Multiplication gets more involved, for example, (1+ t+ t2) (1+ t2) =1+ t2 + t+ ts+t2 + t = 1 + t+(1+ 1)t2+ t3 (1+ t) = 1 + t+(1 +t) (1 +t) = 1i+ t+i1+ t+ t+t2 = (1+ 1) +(1 + 1+1)t + t2 Every element has a multiplicative inverse (Property MIF [794]). What is the inverse of t + t2? Check that (t+t2)(1+t)=t+t2+t2+t3 Version 2.02  Subsection F.FF Finite Fields 798 =t +(1+ 1)t2 +(1+ t) =t+1+t =1+(1+1)t =1 So we can write =1 + t. So that you may experiment, we give you the complete addition and multiplication tables for this field. Addition is simple, while multiplication is more interesting, so verify a few entries of each table. 
Because of the commutativity of addition and multiplication (Property CAF [793], Property CMF [793]), we have just listed half of each table. + 0 1 t t2 t+1 t2+t t2+t+1 t2+1 0 0 1 t t2 t+1 t2+t t2+t+1 t2+1 1 0 t+1 t2+1 t t2+t+1 t2+t t2 t 0 t2+t 1 t2 t2+1 t2+t+1 P20 t2+t+1 t t+1 1 t+1 0 t2+1 t2 t2+t t2+t 0 1 t+1 t2+t+1 0 t t2+1 0 - 0 1 t t2 t+1 t2+t t2+t+1 t2+1 0 0 0 0 0 0 0 0 0 1 1 t t2 t+1 t2+t t2+t+1 t2+1 t t2 t+1 t2+t t2+t+1 t2+1 1 t2 t2+t t2+t+1 t2+1 1 t t+1 t2+1 1 t t2 t2+t t t2 t+1 t2+t+1 1+t t2+t t2+1 t2+t+1 Note that every element of F is a linear combination (with scalars from Z2) of the polynomials 1, t, t2. So B = {1, t, t2 } is a spanning set for F. Further, B is linearly independent since there is no nontrivial relation of linear dependence, and B is a basis. So dim (F) = 3. Of course, this paragraph presumes that F is also a vector space over Z2 (which it is). The defining relation for t (t3 = t + 1) in Example FF8 [796] arises from the polynomial t3 + t + 1, which has no factorization with coefficients from Z2. This is an example of an irreducible polynomial, which involves considerable theory to fully understand. In the exercises, we provide you with a few more irreducible polynomials to experiment with. See the suggested readings if you would like to learn more. Trivially, every field (finite or otherwise) is a vector space. Suppose we begin with a field F. From this we know F has two binary operations defined on it. We need to somehow create a vector space from F, in a general way. First we need a set of vectors. That'll be F. We also need a set of scalars. That'll be F as well. How do we define the addition of two vectors? By the same rule that we use to add them when they are in the field. How do we define scalar multiplication? Since a scalar is an element of F, and a vector is an element of F, we can define scalar multiplication to be the same rule that we use to multiply the two elements as members of the field. With these definitions, F will be a vector space (Exercise F.T20 [800]). This is something of a trivial situation, since the set of vectors and the set of scalars are identical. In particular, do not confuse this with Example FF8 [796] where the set of vectors has eight elements, and the set of scalars has just two elements. Further Reading Robert J. McEliece, Finite Fields for Scientists and Engineers. Kluwer Academic Publishers, 1987. Version 2.02  Subsection F.FF Finite Fields 799 Rudolpf Lidl, Harald Niederreiter, Introduction to Finite Fields and Their Applications, Revised Edi- tion. Cambridge University Press, 1994. Version 2.02  Subsection F.EXC Exercises 800 Subsection EXC Exercises C60 Consider the vector space (Z5)4 composed of column vectors of size 4 with entries from Z5. The matrix A is a square matrix composed of four such column vectors. 3 3 0 3 A-1 2 3 0 A = 1 1 0 2 4 2 2 1 Find the inverse of A. Use this to find a solution to IJS(A, b) when 3 b = 2 0 Contributed by Robert Beezer Solution [801] M10 Suppose we relax the restriction in Definition IMP [794] to allow p to not be a prime. Will the construction given still be a field? Is Z6 a field? Can you generalize? Contributed by Robert Beezer M40 Construct a finite field with 9 elements using the set F={a+bt a,bEZ3} where t2 is consistently replaced by 2t + 1 in any intermediate results obtained with polynomial multiplica- tion. Compute the first nine powers of t (t° through t8). Use this information to aid you in the construction of the multiplication table for this field. 
What is the multiplicative inverse of 2t? Contributed by Robert Beezer M45 Construct a finite field with 25 elements using the set F = {a + bt a, b EZ5} where t2 is consistently replaced by t+3 in any intermediate results obtained with polynomial multiplication. Compute the first 25 powers of t (t0 through t24). Use this information to aid you in computing in this field. What is the multiplicative inverse of 2t? What is the multiplicative inverse of 4? What is the multiplicative inverse of 1 + 4t? Find a basis for F as a vector space with Zs used as the set of scalars. Contributed by Robert Beezer M50 Construct a finite field with 16 elements using the set F ={a +bt +ct2 +dt3 a, b,c, d&E/Z2} where t4 is consistently replaced by t+1 in any intermediate results obtained with polynomial multiplication. Compute the first 16 powers of t (t° through t15). Consider the set G = {0, 1, t5, t10}. Then G will also be a finite field, a subfield of F. Construct the addition and multiplication tables for G. Notice that since both G and F are vector spaces over Z2, and G C F, by Definition S [292], G is a subspace of F. Contributed by Robert Beezer Version 2.02  Subsection F.EXC Exercises 801 T10 Give a new proof of Theorem ZVSM [286] for a vector space whose scalars come from an arbitrary field F. Contributed by Robert Beezer T20 By applying Definition VS [279], prove that every field is also a vector space. (See the construction at the end of this section.) Contributed by Robert Beezer Version 2.02  Subsection F.SOL Solutions 802 Subsection SOL Solutions C60 Contributed by Robert Beezer Statement [799] Remember that every computation must be done with arithmetic in the field, reducing any intermediate number outside of {0, 1, 2, 3, 4} to its remainder after division by 5. The matrix inverse can be found with Theorem CINM [217] (and we discover along the way that A is nonsingular). The inverse is 1 1 3 1 -1' 3 4 1 4 1 4 0 2 3 0 1 0 Then by an application of Theorem SNCM [229] the (unique) solution to the system will be 1 1 3 1 3 2 1-l 3 4 1 4 3 3 1 4 0 2 2 0 3 0 1 0 0 1 Version 2.02  Section T Trace 803 Section T Trace 0 This section contributed by Andy Zimmer. The matrix trace is a function that sends square matrices to scalars. In some ways it is reminiscent of the determinant. And like the determinant, it has many useful and surprising properties. Definition T Trace Suppose A is a square matrix of size n. Then the trace of A, t (A), is the sum of the diagonal entries of A. Symbolically, n t (A)= [A]zz i=1 (This definition contains Notation T.) A The next three proofs make for excellent practice. In some books they would be left as exercises for the reader as they are all "trivial" in the sense they do not rely on anything but the definition of the matrix trace. Theorem TL Trace is Linear Suppose A and B are square matrices of size n. Then t (A + B) = t (A) + t (B). Furthermore, if a E C, then t (oA) = at (A). D Proof These properties are exactly those required for a linear transformation. 
To prove these results we just manipulate sums,

t(A + B) = \sum_{i=1}^{n} [A + B]_{ii}                              Definition T [802]
         = \sum_{i=1}^{n} ([A]_{ii} + [B]_{ii})                     Definition MA [182]
         = \sum_{i=1}^{n} [A]_{ii} + \sum_{i=1}^{n} [B]_{ii}        Property CACN [680]
         = t(A) + t(B)                                              Definition T [802]

The second part is as straightforward as the first,

t(\alpha A) = \sum_{i=1}^{n} [\alpha A]_{ii}                        Definition T [802]
            = \sum_{i=1}^{n} \alpha [A]_{ii}                        Definition MSM [183]
            = \alpha \sum_{i=1}^{n} [A]_{ii}                        Property DCN [681]
            = \alpha t(A)                                           Definition T [802]

Theorem TSRM Trace is Symmetric with Respect to Multiplication
Suppose A and B are square matrices of size n. Then t(AB) = t(BA).

Proof

t(AB) = \sum_{k=1}^{n} [AB]_{kk}                                    Definition T [802]
      = \sum_{k=1}^{n} \sum_{\ell=1}^{n} [A]_{k\ell} [B]_{\ell k}   Theorem EMP [198]
      = \sum_{\ell=1}^{n} \sum_{k=1}^{n} [A]_{k\ell} [B]_{\ell k}   Property CACN [680]
      = \sum_{\ell=1}^{n} \sum_{k=1}^{n} [B]_{\ell k} [A]_{k\ell}   Property CMCN [680]
      = \sum_{\ell=1}^{n} [BA]_{\ell\ell}                           Theorem EMP [198]
      = t(BA)                                                       Definition T [802]

Theorem TIST Trace is Invariant Under Similarity Transformations
Suppose A and S are square matrices of size n and S is invertible. Then t(S^{-1}AS) = t(A).

Proof Invariant means constant under some operation. In this case the operation is a similarity transformation. A lengthy exercise (but possibly an educational one) would be to prove this result without referencing Theorem TSRM [803]. But here we will,

t(S^{-1}AS) = t((S^{-1}A)S)                                         Theorem MMA [202]
            = t(S(S^{-1}A))                                         Theorem TSRM [803]
            = t((SS^{-1})A)                                         Theorem MMA [202]
            = t(A)                                                  Definition MI [213]

Now we could define the trace of a linear transformation as the trace of any matrix representation of the transformation. Would this definition be well-defined? That is, will two different representations of the same linear transformation always have the same trace? Why? (Think Theorem SCB [583].) We will now prove one of the most interesting and surprising results about the trace.

Theorem TSE Trace is the Sum of the Eigenvalues
Suppose that A is a square matrix of size n with distinct eigenvalues \lambda_1, \lambda_2, \lambda_3, ..., \lambda_k. Then

t(A) = \sum_{i=1}^{k} \alpha_A(\lambda_i)\, \lambda_i
Thus an_1 is the sum of these terms, k aen_1 = (-1)n+ EaA (s) Ai i=1 Now we will now show that an_1 is also equal to (-1)"-it (A). For this we will proceed by induction on the size of A. If A is a 1 x 1 square matrix then PA (x) = det (A - xIn) = ([A]11 - x) and (-1)1-t (A) = [A]11 With our base case in hand let's assume A is a square matrix of size n. By Definition CP [403] PA (x) = det (A - zln) = [A - xIn]11 det ((A - xIn) (1|1)) - [A - xIn]12 det ((A - xIn) (1|2)) + [A - xIn]13 det ((A - xInn) (1|3)) - ... + (-1)n+1 [A - xIn]1i det ((A - xln) (1|t)) First let's consider the maximum degree of [A - xIn] i det ((A - xIn) (1|i)) when i # 1. For polynomials, the degree of f, denoted d(f), is the highest power of x in the expression f(x). A well known result of this definition is: if f(x) = g(x)h(x) then d(f) = d(g) + d(h) (can you prove this?). Now [A - xIn] 1 has degree zero when i # 1. Furthermore (A - xIn) (1|i) has n - 1 rows, one of which has all of its entries of degree zero, since column i is removed. The other n - 2 rows have one entry with degree one and the remainder of degree zero. Then by Exercise T.T30 [806], the maximum degree of [A - xIn]1Z det ((A - xIn) (1|i)) is n -2. So these terms will not affect the coefficient of xz-1. Now we are free to focus all of our attention on the term [A - zIn]11 det ((A - zIn) (1|1)). As A (1|1) is a (nt - 1) x (nt - 1) matrix the induction hypothesis tells us that det ((A - zIn) (1|1)) has a coefficient of (-1)n-2t (A (1|1)) for on-2. We also note that the proof of Theorem NEM [425] tells us that the leading coefficient of det ((A - zIn) (1|1)) is (-1)"-1. Then, Expanding the product shows ac_1 (the coefficient of z"-1) to be n-1 (1)"-1 [A]11 + (-1)n-1 E [A (1|1)]kk Definition T [802] k=1 n-1 (-1)-1 ([A]11 + E [A (1|1)]kk) Property DCN [681] k=1 Version 2.02  Section T Trace 806 =(-1)-1 ( [A]11 + S[Akk ) k=2 (-1)"-t(A) With two expressions for an_1, we have our result, t (A) = (-1)n+(_-i t (A) (-1)Th+ _1 k =(-1)n+1(_ ign+1 AA)A i=1 k i=A(A1)Ai Definition SM [375] Definition T [802] 0 Version 2.02  Subsection T.EXC Exercises 807 Subsection EXC Exercises T10 Prove there are no square matrices A and B such that AB - BA= I. Contributed by Andy Zimmer T12 Assume A is a square matrix of size n matrix. Prove t (A) = t (At). Contributed by Andy Zimmer T20 If T= {M E Mn | t (M) =0} then prove Tn is a subspace of Man and determine it's dimension. Contributed by Andy Zimmer T30 Assume A is a n x n matrix with polynomial entries. Define md(A, i) to be the maximum degree of the entries in row i. Then d(det (A)) < md(A, 1) +md(A, 2) +... +md(A, n). (Hint: If f(x) = h(x) +g(x), then d(f) < max{d(h), d(g)}.) Contributed by Andy Zimmer Solution [807] T40 If A is a square matrix, the matrix exponential is defined as A ei Prove that det (eA) - et(A). (You might want to give some thought to the convergence of the infinite sum as well.) Contributed by Andy Zimmer Version 2.02  Subsection T.SOL Solutions 808 Subsection SOL Solutions T30 Contributed by Andy Zimmer Statement [806] We will proceed by induction. If A is a square matrix of size 1, then clearly d(det (A)) c md(A, 1). Now assume A is a square matrix of size n then by Theorem DER [376], det (A) = (-1)2 [A]1,1 det (A (1|1)) + (-1)3 [A]12 det (A (1|2)) + (-1)4 [A]1,3 det (A (1|3)) + - - - + (-1)n+1 [A]1, det (A (1|n)) Let's consider the degree of term j, (-1)1+J [A]1 J det (A (1|j)). By definition of the function md, d([A]1,5) md(A, j). 
We use our induction hypothesis to examine the other part of the product which tells us that d (det (A (1|j))) md(A (1|j) ,1) +md(A (1|j) ,2) + - -+ md(A (1|j) , n-1) Furthermore by definition of A (1|j) (Definition SM [375]) row i of matrix A contains all the entries of the corresponding row in A (lj) then, md(A(lj) ,1) md(A,1) md(A(1|j) ,2) md(A, 2) md(A(1|j),j-1) md(A,j-1) md(A (1|j) ,j) md(A, j + 1) md(A (1|j) , n - 1) < md(A, n) So, d (det (A (1|j))) < md(A (1|j) ,1) + md(A (1|j) ,2) + - - - + md(A (1|j) , n - 1) md(A,1)+md(A,2)+---+md(A,j-1)+md(A,j+1)+---+md(A,rn-1) Then using the property that if f(x) = g(x)h(x) then d(f) = d(g) + d(h), d ((-1)1+j [A]1~g det (A (1|j))) =d ([A]i,5) + d (det (A (1|j))) < md(A, j) + md(A, 1) + md(A, 2) +---+ md(A, j- 1) +md(A, j+ 1) +- -+ md(Arn) =md(A, 1) + md(A, 2) + ---+ md(A,rn) As j is arbitrary the degree of all terms in the determinant are so bounded. Finally using the fact that if f(x) =g(x) + h(x) then d(f) max{d(h), d(g)} we have d(det (A)) md(A, 1) + md(A, 2) + - - + md(A, nt) Version 2.02  Section HP Hadamard Product 809 Section HP Hadamard Product This section is contributed by Elizabeth Million. You may have once thought that the natural definition for matrix multiplication would be entrywise multiplication, much in the same way that a young child might say, "I writed my name." The mistake is understandable, but it still makes us cringe. Unlike poor grammar, however, entrywise matrix multiplica- tion has reason to be studied; it has nice properties in matrix analysis and additionally plays a role with relative gain arrays in chemical engineering, covariance matrices in probability and serves as an inertia preserver for Hermitian matrices in physics. Here we will only explore the properties of the Hadamard product in matrix analysis. Definition HP Hadamard Product Let A and B be m x n matrices. The Hadamard Product of A and B is defined by [A o B]Zj = [A]2j [B]ij foralll3>3 [S]k [D]j [S--]. [D] kJ= 0 for all k f j j=1 k=1 n = [SD]2 [S-1]. Theorem EMP [198] j=1 = [SDS-].. Theorem EMP [198] [A]22Definition ME [182] With equality of each entry of the matrices being equal we know by Definition ME [182] that the two matrices are equal. U We obtain a similar result when we look at the singular value decomposition of square matrices (see exercises). Theorem DMMP Diagonal Matrices and Matrix Products Suppose A, B are m x n~ matrices, and D and £ are diagonal matrices of size m and n, respectively. Then, D(A o B)E =(DAB) o B =(DA) o (BE) Proof M [D(A o B)E]2, = [D]ik [(A o B)E]kJ Theorem EMP [198] k=1 Version 2.02  Subsection HP.DMHP Diagonal Matrices and the Hadamard Product 813 m n >3 Z[D]ik [A o B]kl [E]1j k=1 1=1 m n E D] 2 A] l B] l E] k=1 1=1 E D] k A]k B] E ] k=1 [D]i [A] j[B] j[E] j [D]i [A] j[E] j[B] j n [D]2(2 [A]Zt [E]1) [B]2j l=1 [D]j [AE] j[B] j m k=1 [DAE]2j [B]2j [(DAE) o B] of the matrices being equal we know = [DAE]2j [B]2j n = (E [DA]ik [Elk]) [B], k=1 = [DA] j[E] j[B] j = [DA] j[B] j[E] . 
n = [DA]i (E [B]ik [Elk]) k=1 = [DA] j[BE]g, = [(DA) o (BE)] of the matrices being equal we know Theorem EMP [198] Definition HP [808] [E] = 0 for all l # j [D]ik 0 for all i # k Property CMCN [680] [E]1 = 0 for all l # j Theorem EMP [198] [D]ik 0 for all i # k Theorem EMP [198] Definition HP [808] by Definition ME [182] that the two Definition HP [808] Theorem EMP [198] [Elk] = 0 for all k # j Property CMCN [680] [Elk] = 0 for all k # j Theorem EMP [198] Definition HP [808] by Definition ME [182] that the two With equality of each entry matrices are equal. Also, [(DAE) o B]j With equality of each entry matrices are equal. Version 2.02  Subsection HP.EXC Exercises 814 Subsection EXC Exercises T10 Prove that A o B = AB if and only if both A and B are diagonal matrices. Contributed by Elizabeth Million T20 Suppose A, B are m x n matrices, and D and E are diagonal matrices of size m and n, respectively. Prove both parts of the following equality hold: D(A o B)E = (AE) o (DB) = A o (DBE) Contributed by Elizabeth Million T30 Let A be a square matrix of size n with singular values 01, 02, os, ..., an. Let D be a diagonal matrix from the singular value decomposition of A, A = UDV* (Theorem SVD [839]). Define the vector d by [d]Z = [D]22 = a, 1 < i < n. Prove the following equality, [A]ii = [(U oV)d]. Contributed by Elizabeth Million T40 Suppose A, B and C are m x n matrices. Prove that for all 1 < i < m, [(A o B)C']jz = [(A o C)Bt] Contributed by Elizabeth Million T50 Define the diagonal matrix D of size n with entries from a vector x E Cn by [D][z]j if i - j S0 otherwise Furthermore, suppose A, B are m x n matrices. Prove that [ADBt] = [(A o B)x]i for all 1 < i < m. Contributed by Elizabeth Million Version 2.02  Section VM Vandermonde Matrix 815 Section VM Vandermonde Matrix THIS SECTION IS A DRAFT, SUBJECT TO CHANGES Alexandre-Theophile Vandermonde was a French mathematician in the 1700's who was among the first to write about basic properties of the determinant (such as the effect of swapping two rows). However, the determinant that bears his name (Theorem DVM [814]) does not appear in any of his four published mathematical papers. Definition VM Vandermonde Matrix An square matrix of size n, A, is a Vandermonde matrix if there are scalars, zi, 2, x3, ..., x such that [A]2=x--1,1 0. A For a definition of positive definite replace the inequality in the definition with a strict inequality, and exclude the zero vector from the vectors x required to meet the condition. Similar variations allow definitions of negative definite and negative semi-definite. Our first theorem in this section gives us an easy way to build positive semi-definite matrices. Theorem CPSM Creating Positive Semi-Definite Matrices Suppose that A is any m x n matrix. Then the matrices A*A and AA* are positive semi-definite matrices. Proof We will give the proof for the first matrix, the proof for the second is entirely similar. First we check that A*A is Hermitian, (A*A)* = A* (A*)* Theorem MMAD [204] = A*A Theorem AA [190] so by Definition HM [205], the matrix A*A is Hermitian. Second, for any x E Cn, (A*Ax, x) =(Ax, (A*)* x) Theorem AIP [204] =(Ax, Ax) Theorem AA [190] > 0 Theorem PIP [172] which is the second criteria in the definition of a positive semi-definite matrix (Definition PSM [818]). U A statement very similar to the converse of this theorem is also true. Any positive semi-definite matrix can be realized as the product of a square matrix, B, with its adjoint, B*. 
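The sketch below is not part of the text; it assumes Python with NumPy and simply illustrates Theorem CPSM numerically. For a random complex matrix A it builds A*A and checks that the result is Hermitian with non-negative (real) eigenvalues, in line with the eigenvalue characterization proved later in this section.

# Hedged numerical illustration of Theorem CPSM (assumes NumPy; not from the text).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
P = A.conj().T @ A                    # A*A, where A* is the conjugate-transpose (adjoint)

assert np.allclose(P, P.conj().T)     # P is Hermitian
eigenvalues = np.linalg.eigvalsh(P)   # real eigenvalues, since P is Hermitian
assert np.all(eigenvalues >= -1e-12)  # non-negative, up to roundoff

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
# the quadratic form x*(A*A)x equals <Ax, Ax> = ||Ax||^2, so it is real and non-negative
assert (x.conj() @ P @ x).real >= -1e-12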
(See Exercise PSM.T20 [821] after studying this entire section.) The matrices A*A and AA* will be important later when we define singular values (Section SVD [835]). Positive semi-definite matrices can also be characterized by their eigenvalues, without any mention of inner products. This next result further reinforces the notion that positive semi-definite matrices behave like non-negative real numbers. Version 2.02  Subsection PSM.PSM Positive Semi-Definite Matrices 820 Theorem EPSM Eigenvalues of Positive Semi-definite Matrices Suppose that A is a Hermitian matrix. Then A is positive semi-definite matrix if and only if whenever A is an eigenvalue of A, then A > 0. D Proof Notice first that since we are considering only Hermitian matrices in this theorem, it is always possible to compare eigenvalues with the real number zero, since eigenvalues of Hermitian matrices are all real numbers (Theorem HMRE [427]). Let n denote the size of A. (-) Let x # 0 be an eigenvector of A for A. Then by Theorem PIP [172] we know (x, x) # 0. So 1 A::= A (x, x) (x, x) 1 =Kx, x)(Ax, x) 1 =xx)(Ax, x) Property MICN [681] Theorem IPSM [170] Definition EEM [396] By Theorem PIP [172], (x, x) > 0 and by Definition PSM [818] we have (Ax, x) > 0. With A expressed as the product of these two quantities, we have A > 0. (<) Suppose now that Ai, A2, A3, ..., An are the (not necessarily distinct) eigenvalues of the Her- mitian matrix A, each of which is non-negative. Let B = {x1, x2, x3, ..., xn } be a set of associated eigenvectors for these eigenvalues. Since a Hermitian matrix is normal (Definition HM [205], Definition NM [71]), Theorem OBNM [609] allows us to choose this set of eigenvectors to also be an orthonormal basis of C". Choose any x E C"m and let ai, a2, a3, ..., an be the scalars guaranteed by the spanning property of the basis B such that n x = aix1 + a2x2 + a3x3 -+ --+-anx = axi i=1 Since we have presumed A is Hermitian, we need only check the other defining property, n Anaix, (Ax, x) = KA aixi, Zia1x9 n n Aaixi, Zax9 i=1 j=1 n n - K aAxi, Zagx9 i=1 j=1 n n - E ai Aiyxi)x, ax i=1 j=1 n n = E E(aiAixi, ajxj) i=1 j=1 n n =EEaiAia-j(xi, xj) i=1 j=1 n n n - 5 a Ai ai(xi, xi) + 55EaiAa-j (xi, x3) i=1 i=1 j=1 jai Definition TSVS [313] Theorem MMDAA [201] Theorem MMSMM [201] Definition EEM [396] Theorem IPVA [169] Theorem IPSM [170] Property CACN [680] Version 2.02  Subsection PSM.PSM Positive Semi-Definite Matrices 821 n n n = aiAidi(1) +> aiAaj(0) Definition ONS [177] i=1 i=1 j=1 jii i=1 n = Ai la 2 Definition MCN [682] i=1 With non-negative values for each eigenvalue Xi, 1 < i 3>3 [SDk]2 [S-1] . Theorem EMP [198] k=1 f=1 n n n = >>> [S] [Dk]pf [S-1] fTheorem EMP [198] k=1 P=1 p=1 Version 2.02  Section ROD Rank One Decomposition 825 n k=1 n S [S] Ak [S-1] k=1 n 5 Ak [Slik [s']k k=1 n 5 Ak [xk1 1 [k1 i k=1 n 1 Ak5[Xk jq [Ylqj k=1 q=1 n Ak [xk3<] 3 k=1 n k=1 n 5 [AQli k=1 nA ~ k=1 .2j [Dk1] 0 if p # k, or £ # k [Dklkk Ak Property CMCN [680] Definition of X*, Y* Theorem EMP [198] Definition MSM [183] Definition of Ak Definition MA [182] So by Definition ME [182] we have the desired equality of matrices. The careful reader will have noted that Ak = 0, r + 1 < k < n, since Ak= 0 in these instances. To get the sets X and Y from X* and Y*, simply discard the last n - r vectors. We can safely ignore (or remove) Ar+1, Ar+2, ..., An from the summation just derived. One last assertion to check. What is the rank of Ak, 1 < k < r? 
Every row of Ak is a scalar multiple of y%, row k of the nonsingular matrix S-1 (Theorem MIMI [220]). As a row of a nonsingular matrix, y% cannot be all zeros. In particular, row i of Ak is obtained as a scalar multiple of yt by the scalar a-k [xk]h. We have restricted ourselves to the nonzero eigenvalues of A, and as S is nonsingular, some entry of xk is nonzero. This all implies that some row of Ak will be nonzero. Now consider row-reducing Ak. Swap the nonzero row up into row 1. Use scalar multiples of this row to zero out every other row. This leaves a single nonzero row in the reduced row-echelon form, so Ak has rank one. We record two observations that was not stated in our theorem above. First, the vectors in X, chosen as columns of S, are eigenvectors of A. Second, the product of two vectors from X and Y in the opposite order, by which we mean yjxj, is the entry in row i and column j of the matrix product S--S = In (Theorem EMP [198]). In particular, 1 Iiif i= j 0 if z # We give two computational examples. One small, one a bit bigger. Example ROD2 Rank one decomposition, size 2 Version 2.02  Section ROD Rank One Decomposition 826 Consider the 2 x 2 matrix, A -16 -6 45 17] By the techniques of Chapter E [396] we find the eigenvalues and eigenspaces, Ai=2 EA(2) = {3 A=-1 EA With n = 2 distinct eigenvalues, Theorem DED [440] tells us that A is diagonalizable, and with no zero eigenvalues we see that A has full rank. Theorem DC [436] says we can construct the nonsingular matrix S with eigenvectors of A as columns, so we have S=_ 1 -2 S1 5 2 3 5 -3 -1 From these matrices we obtain the sets of vectors X - 1] 2] Y 5 -3 And we have the matrices, A1=2 1 5 2 _ 5 -2 _ -10 -4 A 2[3] 2 15 6 30 12 -2 -3t 6 2 -6 -2 A2 =(-1)5 -1 ( -115 5 15 5 And you can easily verify that A = A1+ A2. Here's a slightly larger example, and the matrix does not have full rank. Example ROD4 Rank one decomposition, size 4 Consider the 4 x 4 matrix, [34 18 -1 -6 B= -44 -24 -1 9 B 36 18 -3 -6 [36 18 -6 -3] By the techniques of Chapter E [396] we find the eigenvalues and eigenvectors, A11 1 A2 =-2 SB(-2) K{[]} 2 A3 = 0 EA (0) = 2 .2 _ Version 2.02  Section ROD Rank One Decomposition 827 The algebraic and geometric multiplicities of each eigenvalue are equal, so Theorem DMFE [438] tells us that A is diagonalizable. With a single zero eigenvalue we see that A has rank 4 - 1= 3. Theorem DC [436] says we can construct the nonsingular matrix S with eigenvectors of A as columns, so we have 1 1 -1 2- S = 2-1 2 -3 1 1 0 2 -1 2 0 2 ] 4 2 0 -1 1 8 4 -1 -1 S1 -1 0 1 0 _-6 -3 1 1_ Since r = 3, we need only collect three vectors from each of these matrices, X {1 -1 1X 1 ' 0 .-1. 
.2]_ _ 0 2 4 0 Y0 ' -1 ' _-1_ _-1_ 0o And we obtain the matrices, 1 4- B, = 3 12 2 0 -1 -1 1 8 82=3 it 4 2 [-i] -1 -1 t B3 =(-2) 2 0 1 0 0 K 4 2 0 -1 -8 -4 0 2 4 2 0 -1 -4 -2 0 1 8 4 -1 -1] -8 -4 1 1 8 4 -1 -1 16 8 -2 -2] 1 0 -1 0] -2 0 2 0 (-2) 0 0 0 0 0 0 0 0_ 12 6 0 -3 -24 -12 0 6 12 6 0 -3 -12 -6 0 3] 24 12 -3 -3 -24 -12 3 3 24 12 -3 -3 [48 24 -6 -6] -2 0 2 0 4 0 -4 0 0 0 0 0 _0 0 0 0_ Then we verify that B=B1+B2+B3 12 6 0 -3 ~ 24 12 -3 -3 ~-2 0 2 0 -24 -12 0 6 -24 -12 3 3 4 0 -4 0 12 6 0 -3 + 24 12 -3 -3 + 0 0 0 0 -12 -6 0 3_ 48 24 -6 -6_ _ 0 0 0 0_ 34 18 -1 -6 -44 -24 -1 9 36 18 -3 -6 36 18 -6 -3] Version 2.02  Section TD Triangular Decomposition 828 Section TD Triangular Decomposition THIS SECTION IS A DRAFT, SUBJECT TO CHANGES Our next decomposition will break a square matrix into a product of two matrices, one lower triangular and the other upper triangular. So we will write A = LU, and hence many refer to this as LU decompo- sition. We will see that this decomposition is very easy to compute and that it has a direct application to solving systems of equations. Since this section is about triangular matrices you might want to review the definitions and a couple of basic theorems back in Subsection OD.TM [601]. Subsection TD Triangular Decomposition With a slight condition on the nonsingularity of certain submatrices, we can split a matrix into a product of two triangular matrices. Theorem TD Triangular Decomposition Suppose A is a square matrix of size n. Let Ak be the k x k matrix formed from A by taking the first k rows and the first k columns. Suppose that Ak is nonsingular for all 1 < k < n. Then there is a lower triangular matrix L with all of its diagonal entries equal to 1 and an upper triangular matrix U such that A = LU. Furthermore, this decomposition is unique. D Proof We will row reduce A to a row-equivalent upper triangular matrix through a series of row operations, forming intermediate matrices A', 1 j n, that denote the state of the conversion after working on column j. First, the lone entry of A1 is [A]11 and this scalar must be nonzero if A1 is nonsingular (Theorem SMZD [389]). We can use row operations Definition RO [28] of the form oR1 + Rk, 2 < k < n, where a = - [A] 1k / [A]11 to place zeros in the first column below the diagonal. The first two rows and columns of Al are a 2 x 2 upper triangular matrix whose determinant is equal to the determinant of A2, since the matrices are row-equivalent through a sequence of row operations strictly of the third type (Theorem DRCMA [385]). As such the diagonal entries of this 2 x 2 submatrix of Al are nonzero. We can employ this nonzero diagonal element with row operations of the form aR2 + Rk, 3 < k < n to place zeros below the diagonal in the second column. We can continue this process, column by column. The key observations are that our hypothesis on the nonsingularity of the Ak will guarantee a nonzero diagonal entry for each column when we need it, that the row operations employed are always of the third type using a multiple of a row to transform another row with a greater row index, and that the final result will be a nonsingular upper triangular matrix. This is the desired matrix U. Each row operation described in the previous paragraph can be accomplished with matrix multiplication by the appropriate elementary matrix (Theorem EMDRO [372]). Since every row operation employed is adding a multiple of a row to a subsequent row these elementary matrices are of the form Egk (ae) with j < k. 
By Definition ELEM [370], these matrices are lower triangular with every diagonal entry equal to 1. We know that the product of two such matrices will again be lower triangular (Theorem PTMT [601]), but also, as you can also easily check using a proof with a style similar to one above, that the product maintains all l's on the diagonal. Let Eli, E2, E3, ..., Em denote the elementary matrices for this sequence of row operations. Then U = EmEm-1... E3E2E1A = L'A Version 2.02  Subsection TD.TD Triangular Decomposition 829 where L' is the product of the elementary matrices, and we know L' is lower triangular with all l's on the diagonal. Our desired matrix L is then L = (L')-1. By Theorem ITMT [602], L is lower triangular with all l's on the diagonal and A = LU, as desired. The process just described is deterministic. That is, the proof is constructive, with no freedom for each of us to walk through it differently. But could there be other matrices with the same properties as L and U that give such a decomposition of A. In other words, is the decomposition unique (Technique U [693])? Suppose that we have two triangular decompositions, A = L1U1 and A = L2U2. Since A is nonsingular, two applications of Theorem NPNT [226] imply that L1, L2, U1, U2 are all nonsingular. We have Lj1Li I= L21Li Theorem MMIM [200] = L21AA-- L1 Definition MI [213] = L21L2U2 (L1U1)-- L = Lj1L2U2U11L1Li Theorem SS [219] = InU2U1--II, Definition MI [213] = U2 U 1Theorem MMIM [200] Theorem ITMT [602] tells us that L21 is lower triangular and has l's as the diagonal entries. By Theorem PTMT [601], the product L2jL1 is again lower triangular, and it is simple to check (as before) that the diagonal entries of the product are again all l's. By the entirely similar process we can conclude that the product U2Uj1 is upper triangular. Because these two products are equal, their common value is a matrix that is both lower triangular and upper triangular, with all l's on the diagonal. The only matrix meeting these three requirements is the identity matrix (Definition IM [72]). So, we have, In= L2 Li L2 =L1 In=U2U1- U1=U2 which establishes the uniqueness of the decomposition. U Studying the proofs of some previous theorems will perhaps give you an idea for an approach to computing a triangular decomposition. In the proof of Theorem CINM [217] we augmented a nonsingular matrix with an identity matrix of the same size, and row-reduced until the original matrix became the identity matrix (as we knew in advance would happen, since we knew Theorem NMRRI [72]). Theorem PEEF [262] tells us about properties of extended echelon form, and in particular, that B = JA, where A is the matrix that begins on the left, and B is the reduced row-echelon form of A. The matrix J is the result on the right side of the augmented matrix, which is the result of applying the same row operations to the identity matrix. We should recognize now that J is just the product of the elementary matrices (Subsection DM.EM [370]) that perform these row operations. Theorem ITMT [602] used the extended echelon form to discern properties of the inverse of a triangular matrix. Theorem TD [827] proves the existence of a triangular decomposition by applying specific row operations, and tracking the relevant elementary row operations. It is not a great leap to combine these observations into a computational procedure. To find the triangular decomposition of A, augment A with the identity matrix of the same size and call this new 2nt x n~ matrix, M. 
Perform row operations on M that convert the first n~ columns to an upper triangular matrix. Do this using only row operations that add a scalar multiple of one row to another row with higher indez (i.e. lower down). In this way, the last n~ columns of M will be converted into a lower triangular matrix with 1's on the diagonal (since M has l's in these locations initially). We could think of this process as doing about half of the work required to compute the inverse of A. Take the first n~ columns of the row-equivalent version of M and call this matrix U. Take the final n~ columns of the row-equivalent version of M and call this matrix L'. Then by a proof employing elementary matrices, or a proof similar in spirit to the one used to prove Theorem PEEF [262], we arrive at a result similar to the second assertion of Theorem PEEF [262]. Namely, U = L'A. Multiplication on the left, by the inverse of L', will give us a decomposition of A (which we know to be unique). Ready? Lets try it. Version 2.02  Subsection TD.TD Triangular Decomposition 830 Example TD4 Triangular decomposition, size 4 In this example, we will illustrate the process for computing a triangular decomposition, as described in the previous paragraphs. Consider the nonsingular square matrix A of size 4, -2 6 -8 7 -4 16 -14 15 -6 22 -23 26 -6 26 -18 17] We form M by augmenting A with the size 4 identity matrix I4. We will perform the allowed operations, column by column, only reporting intermediate results as we finish converting each column. It is easy to determine exactly which row operations we perform, since the final four columns contain a record of each such operation. We will not verify our hypotheses about the nonsingularity of the Ak, since if we do not have these conditions, we will reach a stage where a diagonal entry is zero and we cannot create the row operations we need to zero out the bottom portion of the associated column. In other words, we can boldly proceed and the necessity of our hypotheses will become apparent. -2 6 -8 7 1 0 0 0 -4 16 -14 15 0 1 0 0 M4= -6 22 -23 26 0 0 1 0 -6 26 -18 17 0 0 0 1] -2 6 -8 7 1 0 0 0 0 4 2 1 -2 1 0 0 0 4 1 5 -3 0 1 0 0 8 6 -4 -3 0 0 1] -2 6 -8 7 1 0 0 0 0 4 2 1 -2 1 0 0 0 0 -1 4 -1 -1 1 0 0 0 2 -6 1 -2 0 1 -2 6 -8 7 1 0 0 0 0 4 2 1 -2 1 0 0 0 0 -1 4 -1 -1 1 0 0 0 0 2 -1 -4 2 1] So at this point, we have U and L', -2 6 -871 0 00 Then by whatever procedure we like (such as Theorem CINM [217]), we find [3 2 -2 1] It is instructive to verify that indeed LU = A. Version 2.02  Subsection TD.TDSSE Triangular Decomposition and Solving Systems of Equations 831 Subsection TDSSE Triangular Decomposition and Solving Systems of Equations In this section we give an explanation of why you might be interested in a triangular decomposition for a matrix. Many of the computational problems in linear algebra revolve around solving large systems of equations, or nearly equivalently, finding inverses of large matrices. Suppose we have a system of equations with coefficient matrix A and vector of constants b, and suppose further that A has the triangular decomposition A = LU. Let y be the solution to the linear system [S(L, b), so that by Theorem SLEMM [195], we have Ly = b. Notice that since L is nonsingular, this solution is unique, and the form of L makes it trivial to solve the system. The first component of y is determined easily, and we can continue on through determining the components of y, without even ever dividing. Now, with y in hand, consider the linear system, IJS(U, y). 
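The two triangular solves just outlined, the forward pass through $\mathcal{LS}(L,\,b)$ and the backward pass through $\mathcal{LS}(U,\,y)$ described next, can be sketched in a few lines of Python (an informal illustration, not the text's notation; the function names are our own).

\begin{verbatim}
def forward_substitution(L, b):
    # Solve L y = b for L lower triangular with 1's on the diagonal.
    # Each component of y is determined in turn, with no division at all.
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    return y

def back_substitution(U, y):
    # Solve U x = y for U upper triangular with nonzero diagonal entries.
    # The last component is found first; each step divides by U[i][i].
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x
\end{verbatim}

Applied to the factors of Example TD4 [829], these two routines reproduce the computation carried out by hand in the next example.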
Let x be the unique solution to this system, so by Theorem SLEMM [195] we have Ux = y. Notice that a system of equations with U as a coefficient matrix is also straightforward to solve, though we will compute the bottom entries of x first, and we will need to divide. The upshot of all this is that x is a solution to [S(A, b), as we now show, Ax = LUx = L (Ux) =Ly = b An application of Theorem SLEMM [195] demonstrates that x is a solution to [S(A, b). Example TDSSE Triangular decomposition solves a system of equations Here we illustrate the previous discussion, recycling the decomposition found previously in Example TD4 [829]. Consider the linear system [S(A, b) with -26 -8 7]-10 -4 16 -14 15b -2 -6 22 -23 26 -1 _-6 26 -18 17_-8 First we solve the system [S(L, b) (see Example TD4 [829] for L), y1 =-10 2y1 + Y2 =-2 3y1 + Y2 + Y3= -1 3y1 + 2y2 - 2y3+ Y4 = -8 Then yi -10 y2 =-2 - 2yi -2 - 2(-10) =18 y3 -1 - 3y1 - y2 =-1 - 3(-10) - 18 =11 y4=-8 - 3y1 - 2y2 + 2ys= -8 - 3(-10) - 2(18) + 2(11) =8 so --10 18 11 8 Version 2.02  Subsection TD.CTD Computing Triangular Decompositions 832 Then we solve the system IS(U, y) (see Example TD4 [829] for U), -2xi + 6x2 - 8x3 + 7x4 = -10 4x2 + 2x3 +4 =18 -33+4-4x4 = 11 2x4 =8 Then 14 13 12 zI 8/2 = 4 (11 - 4x4) /(-1) = (18 - 2x3 - X4)/4 (-10 - 6x2 + 8x3 - (11 - 4(4)) /(-1) = 5 = (18 - 2(5) - 4)/4 1 -7X4) /(-2) (-10-6(1)+8(5) - 7(4)) /(-2) = 2 And so 4 5 x = 1 .2_ is the solution to [S(U, y) and consequently is the unique solution to [S(A, b), as you can easily verify. Subsection CTD Computing Triangular Decompositions It would be a simple matter to adjust the algorithm for converting a matrix to reduced row-echelon form and obtain an algorithm to compute the triangular decomposition of the matrix, along the lines of Example TD4 [829] and the discussion preceding this example. However, it is possible to obtain relatively simple formulas for the entries of the decomposition, and if computed in the proper order, an implementation will be straightforward. We will state the result as a theorem and then give an example of its use. Theorem TDEE Triangular Decomposition, Entry by Entry Suppose that A is a squarematrix of size n with a triangular decomposition A = LU, where L is lower triangular with diagonal entries all equal to 1, and U is upper triangular. Then i-1 [U]i= [A]i - >3[L]ik [U]kJ k=1 1 ( j-1 [L] [A]ig - > [L]ik [U k]) l ( k=1 1 i then [L]ik = 0, while Definition UTM [601], says that if k > j then [U]k] = 0. So we can combine these two facts to assert that if k > min(i, j), [L]ik [U]k] = 0 since at least one term of the product will be zero. Employing this observation, n [A]g =j E [L]ik [U] kJ k=1 Theorem EMP [198] Version 2.02  Subsection TD.CTD Computing Triangular Decompositions 833 min(i, j) -S[L]ik[U]k j k=1 Now, assume that 1 < i < j 1 ( i+ [U ]2j , j) [L]ik [U 1 -A] 3- [L]ik [U]kJ k=1 i-i A] - >3[L]ik [U]kgj k=1 i-1 A] - >3[L]ik [U]kj k=1 i-1 A] - >3 [L]ik [U]kj k=1 ]kj + [U],4 + [U]23 - [L]ii [U]i + [U]ij - [U]i + [U],4 And for 1 j [L]ik [UlkJ + [L]2j [U] J 1 -kU []~ 3 l ik [UlkJ + [L]2j [U1 j-) j1-1 - [U] ([A]ig - > [L]ik [UlkJ - [L]zj [U]jj + [L]4 [U] ) :j-1 - [U ([A]ij - >3 [L]k [UlkJ) 0 At first glance, these formulas may look exceedingly complex. Upon closer examination, it looks even worse. We have expressions for entries of U that depend on other entries of U and also on entries of L. But then the formula for entries of L depend on entries from L and entries from U. Do these formula have circular dependencies? 
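The ordering described in the discussion that follows resolves this, and makes the formulas of Theorem TDEE [831] directly usable. Here is a Python sketch of one workable organization (our own arrangement, assuming every required diagonal entry of $U$ turns out to be nonzero): for each $i$, fill in row $i$ of $L$ and then column $i$ of $U$.

\begin{verbatim}
def entry_by_entry_decomposition(A):
    # Triangular decomposition via the formulas of Theorem TDEE.
    # Assumes the decomposition exists, i.e. no U[j][j] is zero when used.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):          # row i of L
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * U[k][j] for k in range(j))) / U[j][j]
        L[i][i] = 1.0
        for j in range(i + 1):      # column i of U
            U[j][i] = A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))
    return L, U
\end{verbatim}

Running this on the matrix of Example TD4 [829] reproduces the factors found there, which is a reasonable sanity check on the formulas.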
Or perhaps equivalently, how do we get started? The key is to be organized about the computations and employ these two (similar) formulas in a specific order. First compute the first row of L, followed by the first column of U. Then the second row of L, followed by the second column of U. And so on. In this way, all of the values required for each new entry will have already been computed previously. Of course, the formula for entries of L require division by diagonal entries of U. These entries might be zero, but in this case A is nonsingular and does not have a triangular decomposition. So we need not Version 2.02  Subsection TD.CTD Computing Triangular Decompositions 834 check the hypothesis carefully and can launch into the arithmetic dictated by the formulas, confident that we will be reminded when a decomposition is not possible. Note that these formula give us all of the values that we need for the decomposition, since we require that L has l's on the diagonal. If we replace the l's on the diagonal of L by zeros, and add the matrix U, we get an n x n matrix containing all the information we need to resurrect the triangular decomposition. This is mostly a notational convenience, but it is a frequent way of presenting the information. We'll employ it in the next example. Example TDEE6 Triangular decomposition, entry by entry, size 6 We illustrate the application of the formulas in Theorem TDEE [831] for the 6 x 6 matrix A. A 3 -6 9 -6 6 9 3 -4 9 -10 4 3 -3 5 -7 8 -9 -12 -2 2 -7 10 -2 -3 -1 4 0 -1 -10 -21 0 2 1 -7 1 -2 Using the notational convenience of packaging the two triangular matrices into one matrix, and using the ordering of the computations mentioned above, we display the results after computing a single row and column of each of the two triangular matrices. 3 -2 3 -2 2 3 3 -2 3 -2 2 3 3 -2 3 -2 2 3 3 -3 -2 -1 0 3 2 0 -2 -1 -3 3 2 0 -2 -1 -3 -3 -1 2 0 -2 -3 -3 -1 2 0 -2 -3 -2 -2 -1 -2 -2 -1 2 -1 -3 -1 0 2 2 3 1 3 -2 3 -2 2 3 3 -2 3 -2 2 3 3 -2 3 -2 2 3 3 2 0 -2 -1 -3 3 2 0 -2 -1 -3 3 2 0 -2 -1 -3 -1 0 2 2 3 1 1 -3 1 2 0 -3 -1 -3 -1 2 0 -2 -3 -3 -1 2 0 -2 -3 -3 -1 2 0 0 0 -2 -2 -2 -2 -1 2 -1 -3 -2 -2 -1 2 -1 -3 -2 -2 -1 2 0 0 -1 2 -1 2 3 1 -1 2 3 1 1 0 -1 2 3 1 1 0 0 2 1 -3 0 2 1 -3 2 -2 0 2 Splitting out the pieces of this matrix, we have the decomposition, 1 0 -2 1 3 0 -2 -2 2 -1 3 -3 0 0 1 0 -2 -3 0 0 0 1 -1 -3 0 0 0 0 0 0 0 0 1 0 0 1 3 0 0 U=0 0 0 3 2 0 0 0 0 0 2 1 -3 2 -2 The hypotheses of Theorem TD [827] can be weakened slightly to include matrices where not every Ak is nonsingular. The introduces a rearrangement of the rows and columns of A to force as many as Version 2.02  Subsection TD.CTD Computing Triangular Decompositions 835 possible of the smaller submatrices to be nonsingular. Then permutation matrices also enter into the decomposition. We will not present the details here, but instead suggest consulting a more advanced text on matrix analysis. Version 2.02  Section SVD Singular Value Decomposition 836 Section SVD Singular Value Decomposition THIS SECTION IS A DRAFT, SUBJECT TO CHANGES NEEDS NUMERICAL EXAMPLES The singular value decomposition is one of the more useful ways to represent any matrix, even rectan- gular ones. We can also view the singular values of a (rectangular) matrix as analogues of the eigenvalues of a square matrix. Our definitions and theorems in this section rely heavily on the properties of the matrix-adjoint products (A*A and AA*), which we first met in Theorem CPSM [818]. 
We start by examining some of the basic properties of these two matrices. Now would be a good time to review the basic facts about positive semi-definite matrices in Section PSM [818].

Subsection MAP Matrix-Adjoint Product

Theorem EEMAP Eigenvalues and Eigenvectors of Matrix-Adjoint Product
Suppose that $A$ is an $m\times n$ matrix and $A^*A$ has rank $r$. Let $\lambda_1,\,\lambda_2,\,\lambda_3,\,\dots,\,\lambda_p$ be the nonzero distinct eigenvalues of $A^*A$ and let $\rho_1,\,\rho_2,\,\rho_3,\,\dots,\,\rho_q$ be the nonzero distinct eigenvalues of $AA^*$. Then,

1. $p=q$.
2. The distinct nonzero eigenvalues can be ordered such that $\lambda_i=\rho_i$, $1\le i\le p$.
3. Properly ordered, $\alpha_{A^*A}\left(\lambda_i\right)=\alpha_{AA^*}\left(\rho_i\right)$, $1\le i\le p$.
4. The rank of $A^*A$ is equal to the rank of $AA^*$.
5. There is an orthonormal basis $\{x_1,\,x_2,\,x_3,\,\dots,\,x_n\}$ of $\mathbb{C}^n$ composed of eigenvectors of $A^*A$ and an orthonormal basis $\{y_1,\,y_2,\,y_3,\,\dots,\,y_m\}$ of $\mathbb{C}^m$ composed of eigenvectors of $AA^*$ with the following properties. Order the eigenvectors so that $x_i$, $r+1\le i\le n$, are the eigenvectors of $A^*A$ for the zero eigenvalue. Let $\delta_i$, $1\le i\le r$, denote the nonzero eigenvalues of $A^*A$. Then $Ax_i=\sqrt{\delta_i}\,y_i$, $1\le i\le r$, and $Ax_i=\mathbf{0}$, $r+1\le i\le n$. Finally, $y_i$, $r+1\le i\le m$, are eigenvectors of $AA^*$ for the zero eigenvalue. $\square$

Proof  Suppose that $x\in\mathbb{C}^n$ is any eigenvector of $A^*A$ for a nonzero eigenvalue $\lambda$. We will show that $Ax$ is an eigenvector of $AA^*$ for the same eigenvalue, $\lambda$. First, we ascertain that $Ax$ is not the zero vector.
\begin{align*}
\left\langle Ax,\,Ax\right\rangle &= \left\langle Ax,\,(A^*)^*x\right\rangle & &\text{Theorem AA [190]}\\
&= \left\langle A^*Ax,\,x\right\rangle & &\text{Theorem AIP [204]}\\
&= \left\langle \lambda x,\,x\right\rangle & &\text{Definition EEM [396]}\\
&= \lambda\left\langle x,\,x\right\rangle & &\text{Theorem IPSM [170]}
\end{align*}
Since $x$ is an eigenvector, $x\ne\mathbf{0}$, and by Theorem PIP [172], $\left\langle x,\,x\right\rangle\ne 0$. As $\lambda$ was assumed to be nonzero, we see that $\left\langle Ax,\,Ax\right\rangle\ne 0$. Again, Theorem PIP [172] tells us that $Ax\ne\mathbf{0}$.
Much of the sequel turns on the following simple computation. If you ever wonder what all the fuss is about adjoints, Hermitian matrices, square roots, and singular values, return to this brief computation, as it holds the key. There is much more to do in this proof, but after this it is mostly bookkeeping. Here we go. We check that $Ax$ functions as an eigenvector of $AA^*$ for the eigenvalue $\lambda$,
\begin{align*}
\left(AA^*\right)Ax &= A\left(A^*A\right)x & &\text{Theorem MMA [202]}\\
&= A\lambda x & &\text{Definition EEM [396]}\\
&= \lambda\left(Ax\right) & &\text{Theorem MMSMM [201]}
\end{align*}
That's it. If $x$ is an eigenvector of $A^*A$ (for a nonzero eigenvalue), then $Ax$ is an eigenvector of $AA^*$ for the same eigenvalue. Let's see what this buys us.
$A^*A$ and $AA^*$ are Hermitian matrices (Definition HM [205]), and hence are normal (Definition NRML [606]). This provides the existence of orthonormal bases of eigenvectors for each matrix by Theorem OBNM [609]. Also, since each matrix is diagonalizable (Definition DZM [435]), by Theorem OD [607] we can interchange algebraic and geometric multiplicities by Theorem DMFE [438].
Our first step is to establish that an eigenvalue $\lambda$ has the same geometric multiplicity for both $A^*A$ and $AA^*$. Suppose $\{x_1,\,x_2,\,x_3,\,\dots,\,x_s\}$ is an orthonormal basis of eigenvectors of $A^*A$ for the eigenspace $\mathcal{E}_{A^*A}\left(\lambda\right)$. Then for $1\le i<j\le s$, note
\begin{align*}
\left\langle Ax_i,\,Ax_j\right\rangle &= \left\langle Ax_i,\,(A^*)^*x_j\right\rangle & &\text{Theorem AA [190]}\\
&= \left\langle A^*Ax_i,\,x_j\right\rangle & &\text{Theorem AIP [204]}\\
&= \left\langle \lambda x_i,\,x_j\right\rangle & &\text{Definition EEM [396]}\\
&= \lambda\left\langle x_i,\,x_j\right\rangle & &\text{Theorem IPSM [170]}\\
&= \lambda(0) & &\text{Definition ONS [177]}\\
&= 0 & &\text{Property ZCN [681]}
\end{align*}
Then the set $E=\{Ax_1,\,Ax_2,\,Ax_3,\,\dots,\,Ax_s\}$ is an orthogonal set of nonzero eigenvectors of $AA^*$ for the eigenvalue $\lambda$. By Theorem OSLI [174], the set $E$ is linearly independent and so the geometric multiplicity of $\lambda$ as an eigenvalue of $AA^*$ is $s$ or greater.
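The facts established so far lend themselves to a quick numerical check (this draft section still needs numerical examples, so the following is only an illustrative sketch; the dimensions and random seed are arbitrary choices of ours).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Both matrix-adjoint products are Hermitian, so eigh applies.
evals_AsA, X = np.linalg.eigh(A.conj().T @ A)   # n eigenvalues, ascending
evals_AAs, Y = np.linalg.eigh(A @ A.conj().T)   # m eigenvalues, ascending

# A random A of this shape has full column rank, so all n eigenvalues of
# A*A are nonzero and agree with the n largest eigenvalues of AA*.
print(np.allclose(evals_AsA, evals_AAs[-n:]))

# If x is an eigenvector of A*A for a nonzero eigenvalue lam, then Ax is
# an eigenvector of AA* for the same eigenvalue.
x, lam = X[:, -1], evals_AsA[-1]
print(np.allclose((A @ A.conj().T) @ (A @ x), lam * (A @ x)))
\end{verbatim}

Both checks print True, in line with the argument above.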
We have YA*A (A) = YA*A (A) < YAA* (A) = YAA* (A) This inequality applies to any matrix, so long as the eigenvalue is nonzero. We now apply it to the matrix A*, aAA* (A) = (A*)*A* (A) -A*(A*)* (A) = GA*A (A) So for a nonzero eigenvalue, its algebraic multiplicities as an eigenvalue of A*A and AA* are equal. This is enough to establish that p = q and the eigenvalues can be ordered such that Ai = pi for 1 < i < p. For any matrix B, the null space is identical to the eigenspace of the zero eigenvalue, P1(B) = ESB (0), and thus the nullity of the matrix is equal to the geometric multiplicity of the zero eigenvalue. With this, we can examine the ranks of A*A and AA*. r (A*A) = n - n (A*A) P = A*A (0) + e a i=1 (&A. A (0) + ZeA:4 i=1 A(A)) a (A) A(A)) n (A*A) YA*A (0) aA*A (0) Theorem RPNC [348] Theorem NEM [425] Definition GME [406] Theorem DMFE [438] (A*A (0) +Zc aA = Version 2.02  Subsection SVD.MAP Matrix-Adjoint Product 838 aA*A (Ai) i=1 aAA* (Ai) i=1 ( AA* (AA* r(AA* (0) + an. i=1 (0) + aAA i=1 (0) + aA i=1 (AA*) ) (A)) (A)) (A)) YAA* (0) 'yAA* (0) n (AA*) Theorem DMFE [438] Definition GME [406] Theorem NEM [425] Theorem RPNC [348] When A is rectangular, the square matrices A*A and AA* have different sizes. With equal algebraic and geometric multiplicities for their common nonzero eigenvalues, the difference in their sizes is manifest in different algebraic multiplicities for the zero eigenvalue and different nullities. Specifically, n (A*A) = n - r n (AA*) = m - r Suppose that xl, x2, x3, ..., xn is an orthonormal basis of C" composed of eigenvectors of A*A and ordered so that xi, r + 1 < i < n are eigenvectors of AA* for the zero eigenvalue. Denote the associated nonzero eigenvalues of A*A for these eigenvectors by oi, 1 < i < r. Then define 1 yj- Ax i 1 < i < r Let Yr+i, Yr+2, Yr+2, -.., ym be an orthonormal basis for the eigenspace EAA* (0), whose existence is guaranteed by Theorem GSP [175]. As scalar multiples of demonstrated eigenvectors of AA*, yi, 1 < i < r are also eigenvectors of AA*, and yi, r + 1 < i < n have been chosen as eigenvectors of AA*. These eigenvectors also have norm 1, as we now show. For 1 < i < r, 1 |yjll = Axi Ax, aAx e 161 - 1 1a(Axi, Axe) 1 1 - 1 (Axi, Axi) = a (Axi, Axe) 16i (Axi, (A*)* xi) = aZ (A*Axi, x2) 1-x 2 oixi, xi) Theorem IPN [171] Theorem IPSM [170] Theorem HMRE [427] Theorem AA [190] Theorem AIP [204] Definition EEM [396] Version 2.02  Subsection SVD.SVD Singular Value Decomposition 839 1 og (xi, xi) 1 1 Theorem IPSM [170] Definition ONS [177] For r + 1 < i Vi (xi, xy) - (xx - (0) = 0~ Theorem IPSM [170] Theorem HMRE [427] Theorem AA [190] Theorem AIP [204] Definition EEM [396] Theorem IPSM [170] Definition ONS [177] So {yi, y2, y3, ..., ym} is an orthonormal set of eigenvectors for AA*. The critical relationship between these two orthonormal bases is present by design. For 1 < i < r, Axi = = S Axi = dyi For r + 1 < i 0 Definition PSM [818] So, according to Definition PSM [818], S is positive semi-definite. (-) Assume that A = S2, with S positive semi-definite. Then S is Hermitian, and we check that A is Hermitian. A* = (SS)* = S* S* Theorem MMAD [204] = SS Definition HM [205] -A Now for the use of A in an inner product. For any x E C", (Ax, x) =_KS2x, x) - (Sx, S*x) Theorem AIP [204] = (Sx, Sx) Definition HM [205] > 0 Theorem PIP [172] So by Definition PSM [818], A is positive semi-definite. U There is a very close relationship between the eigenvalues and eigenspaces of a positive semi-definite matrix and its positive semi-definite square root. 
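The construction used for Theorem PSMSR [840] is also the natural way to compute a positive semi-definite square root numerically: unitarily diagonalize, take nonnegative square roots of the eigenvalues, and reassemble. A rough Python sketch (the function name is ours, and the clip is only a guard against round-off):

\begin{verbatim}
import numpy as np

def psd_square_root(A):
    # A is assumed Hermitian and positive semi-definite.
    evals, Q = np.linalg.eigh(A)          # unitary diagonalization
    evals = np.clip(evals, 0.0, None)     # discard tiny negative round-off
    return Q @ np.diag(np.sqrt(evals)) @ Q.conj().T

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
A = B.T @ B                               # B^t B is positive semi-definite
S = psd_square_root(A)
print(np.allclose(S @ S, A))              # True
\end{verbatim}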
The next theorem is interesting in its own right, but is also an important technical step in some other important results, such as the upcoming uniqueness of the square root (Theorem USR [843]). Theorem EESR Eigenvalues and Eigenspaces of a Square Root Suppose that A is a positive semi-definite matrix and S is a positive semi-definite matrix such that A = S2 If A1, A2, A3, ..., Ap are the distinct eigenvalues of A, then the distinct eigenvalues of S are A1, A2, A,..., A6, andEs( (v) =EA(A) for 1 i 5p.D Proof Let x be an eigenvector of S for an eigenvalue p. Then, in the style of Theorem EPM [421], Ax =S2x (S) Sx) x p2x so p2 is an eigenvalue of A and must equal some Ai. Furthermore, because S is positive semi-definite, Theorem EPSM [819] tells us that p > 0. The impact for us here is that we cannot have two different eigenvalues of S whose squares equal the same eigenvalue of A, so we can pair each eigenvalue of S with a different eigenvalue of A, equal to its square. (A good exercise is to track through the rest of this proof in the situation where S is not assumed to be positive semi-definite and we do not have this condition on the eigenvalues. Where does the proof then break down?) Let pi, 1 < i < q denote the q distinct eigenvalues of Version 2.02  Subsection SR.SRM Square Root of a Matrix 843 S. The discussion above implies that we can order the eigenvalues of A and S so that Ai = p2 for 1 < as ( Ai)Theorem NEM [425] i=1 q 3 ys ( i) Theorem DMFE [438] i=1 q = dim (ES ( ))Definition GME [406] i=1 q dim (EA (A2)) Theorem PSSD [358] i=1 P < dim (EA (A2)) Definition D [341] i=1 P = YA (A2) Definition GME [406] i=1 P = aA (A2) Theorem DMFE [438] i=1 = n Theorem NEM [425] With equal values at the two ends of this chain of equalities and inequalities, we know that the two inequalities are forced to actually be equalities. In particular, the second inequality implies that p = q and the first, in conjunction with Theorem EDYES [358], implies that Es (vA) = EA (A;) for 1 G i < p. U Notice that we defined the singular values of a matrix A as the square roots of the eigenvalues of A*A (Definition SV [839]). With Theorem EESR [841] in hand we recognize the singular values of A as simply the eigenvalues of A*A1/2. Indeed, many authors take this as the definition of singular values, since it is equivalent to our definition. We have chosen not to wait for a discussion of square roots before making a definition of singular values, allowing us to present the singular value decomposition (Theorem SVD [839]) all the sooner. In the first half of the proof of Theorem PSMSR [840] we could have chosen the matrix £ (which was the essential component of the desired matrix 5) in a variety of ways. Any collection of diagonal entries of £ could be replaced by their negatives and we would maintain the property that £2 =D. However, if we decide to enforce the entries of E as non-negative quantities then E is positive semi-definite, and then S follows along as a positive semi-definite matrix. We now show that of all the possible square roots of a positive semi-definite matrix, only one is itself again positive semi-definite. In other words, the S of Theorem PSMSR [840] is unique. Version 2.02  Subsection SR.SRM Square Root of a Matrix 844 Theorem USR Unique Square Root Suppose A is a positive semi-definite matrix. Then there is a unique positive semi-definite matrix S such that A = S2. Proof Theorem PSMSR [840] gives us the existence of at least one positive semi-definite matrix S such that A = S2. 
As usual, we will assume that Si and S2 are positive semi-definite matrices such that A = S1= S (Technique U [693]). As A is diagonalizable, there is a basis of C" composed entirely of eigenvectors of A (Theorem DC [436]), say B = {x1, x2, x3, ..., xn}. Let 61, a2, 63, ..., 6n denote the associated eigenvalues. Theorem EESR [841] allows to conclude that EA (al) = Est ( ) =Ss2 (/). So S1xi = xxi = S2xi for 1 < i < n. Choose any x E C". The spanning property of B allows us to conclude the existence of a set of scalars, a1, a2, a3, ..., an, yielding x as a linear combination of the vectors in B. So, n n n n n Six =Si > aixi = aiSixi = ai xi = 3 aiS2xi = S2 axi = S2x i=1 i=1 i=1 i=1 i=1 Since S1 and S2 have the same action on every vector, Theorem EMMVP [196] yields the conclusion that S1=S2. U With a criteria that distinguishes one square root from all the rest (positive semi-definiteness) we can now define the square root of a positive semi-definite matrix. Definition SRM Square Root of a Matrix Suppose A is a positive semi-definite matrix and S is the positive semi-definite matrix such that S2 SS = A. Then S is the square root of A and we write S = A1/2. (This definition contains Notation SRM.) A Version 2.02  Section POD Polar Decomposition 845 Section POD Polar Decomposition THIS SECTION IS A DRAFT, SUBJECT TO CHANGES NEEDS NUMERICAL EXAMPLES The polar decomposition of a matrix writes any matrix as the product of a unitary matrix (Definition UM [229])and a positive semi-definite matrix (Definition PSM [818]). It takes its name from a special way to write complex numbers. If you've had a basic course in complex analysis, the next paragraph will help explain the name. If the next paragraph makes no sense to you, there's no harm in skipping it. Any complex number z (EC can be written as z = rei0 where r is a positive number (computed as a square root of a function of the real amd imaginary parts of z) and 0 is an angle of rotation that converts 1 to the complex number ei0 = cos(0) + i sin(0). The polar form of a square matrix is a product of a positive semi-definite matrix that is a square root of a function of the matrix together with a unitary matrix, which can be viewed as achieving a rotation (Theorem UMPIP [231]). OK, enough preliminaries. We have all the tools in place to jump straight to our main theorem. Theorem PDM Polar Decomposition of a Matrix Suppose that A is a square matrix. Then there is a unitary matrix U such that A = (AA*)1/2 U. E Proof This theorem only claims the existence of a unitary matrix U that does a certain job. We will manufacture U and check that it meets the requirements. Suppose A has size n and rank r. We begin by applying Theorem EEMAP [835] to A. Let B = {x1, x2, x3, ..., xn} be the orthonormal basis of C" composed of eigenvectors for A*A, and let C = {Yi, Y2, y3, ..., yn} be the orthonormal basis of C" composed of eigenvectors for AA*. We have Axe = vixi, 1 < i < r, and Axe = 0, r + 1 < i < n, where o2, 1 < i < r are the distinct nonzero eigenvalues of A*A. Define T: C" H C" to be the unique linear transformation such that T (xi) = y2, 1 < i n, as guaranteed by Theorem LTDB [462]. Let E be the basis of standard unit vectors for C"m (Definition SUV [173]), and define U to be the matrix representation (Definition MR [542]) of T with respect to E, more carefully U =ME. This is the matrix we are after. 
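As a practical aside, the two factors promised by Theorem PDM can also be produced numerically from a singular value decomposition rather than from the construction in this proof. A Python sketch (names ours; NumPy's svd returns factors with $A = W\,\mathrm{diag}(s)\,V^*$, so that $AA^* = W\,\mathrm{diag}(s)^2\,W^*$):

\begin{verbatim}
import numpy as np

def polar_decomposition(A):
    # For square A: returns (P, U) with A = P @ U,
    # P positive semi-definite and U unitary.
    W, s, Vh = np.linalg.svd(A)
    P = W @ np.diag(s) @ W.conj().T       # this is (A A*)^(1/2)
    U = W @ Vh                            # unitary factor
    return P, U

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P, U = polar_decomposition(A)
print(np.allclose(P @ U, A))                       # True
print(np.allclose(U @ U.conj().T, np.eye(2)))      # True
\end{verbatim}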
Notice that Uxi = M,EpE (xi) Definition VR [530] = PE (T (xi)) Theorem FTMR [544] = pE (yi) Theorem FTMR [544] =-y Definition VR [530] Since B and C are orthonormal bases, and C is the result of multiplying the vectors of B by U, we conclude that U is unitary by Theorem UMCOB [334]. So once again, Theorem EEMAP [835] is a big part of the setup for a decomposition. Let x E C" be any vector. Since B is a basis of C", there are scalars ai, a2, a3, ..., an expressing x as a linear combination of the vectors in B. then (AA*)l/2 Ux = (AA*)l/2 U> axi Definition B [325] i=1 = (AA*)1/2 Uaix2 Theorem MMDAA [201] i=1 Version 2.02  Section POD Polar Decomposition 846 S a (AA*)l/2 Uxi Theorem MMSMM [201] i=1 5ai (AA*)1/2 i=1 r 12 5 ai (AA*)1/2 yi + 5 ai (AA*)1/2 yi Property AAC [86] i=1 i=r+1 r n 5 ai 6y2 + 5 a (0)yi Theorem EESR [841] i=1 i=r+1 r n 5 a yiY2 + 5 aiO Theorem ZSSM [286] i=1 i=r+1 r n =5aiAxi + 5E aiAxi Theorem EEMAP [835] i=1 i=r+1 n 5 aiAxi Property AAC [86] i=1 n 5 Aa xi Theorem MMSMM [201] i=1 n = A aixi Theorem MMDAA [201] i=1 = Ax So by Theorem EMMVP [196] we have the matrix equality (AA*)1/2 U = A. 0 Version 2.02  Part A Applications 847  Section CF Curve Fitting THIS SECTION IS INCOMPLETE Given two points in the plane, there is a unique line through them. Given three points in the plane, and not in a line, there is a unique parabola through them. Given four points in the plane, there is a unique polynomial, of degree 3 or less, passing through them. And so on. We can prove this result, and give a procedure for finding the polynomial with the help of Vandermonde matrices (Section VM [814]). Theorem IP Interpolating Polynomial Suppose {(x, yi) 1 < i n (the more data we collect, the greater our confidence in the results) and the resulting system is inconsistent. It may be that our model is only an approximate understanding of the relationship between the x2 and y, or our measurements are not completely accurate. Still we would like to understand the situation we are studying, and would like some best answer for a1, a2, a3, ..., an. Let y denote the vector with [y] = yz, 1 < i < m, let a denote the vector with [a]3 = aj, 1 < j < n, and let X denote the m x n matrix with [X]i X=zij, 1 < i < m, 1 < j < n. Then the model equation, evaluated with each run of the experiment, translates to Xa = y. With the presumption that this system has no solution, we can try to minimize the difference between the two side of the equation y - Xa. As a vector, it is hard to imagine what the minimum might be, so we instead minimize the square of its norm S= (y - Xa)t (y - Xa) To keep the logical flow accurate, we will define the minimizing value and then give the proof that it behaves as desired. Definition LSS Least Squares Solution Given the equation Xa y, where X is an m x n~ matrix of rank n, the least squares solution for a is (XLX)lXty. A Theorem LSMR Least Squares Minimizes Residuals Suppose that X is an m x n~ matrix of rank n. The least squares solution of Xa =y, a' = (XLX)i X'y, minimizes the expression S =(y - Xa)t (y - Xa) Proof We begin by finding the critical points of S. In preparation, let X3 denote column j of X, for 1 j n and compute partial derivatives with respect to aj, 1 < j - [y] - [X[]-X[a]a][y-Xa] i=1 k=1 = 2 ( -X ] y - Xa]2 i=1 = -2 (X)t (y - Xa) The first partial derivatives will allow us to find critical points, while second partial derivatives will be needed to confirm that a critical point will yield a minimum. 
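A small numerical illustration of Definition LSS may help fix ideas. The data below is invented purely for the example (fitting $y\approx a_1+a_2x$ to four observations); since $X$ has full column rank, the least squares solution $(X^tX)^{-1}X^ty$ can be obtained by solving the normal equations $X^tX\,a=X^ty$.

\begin{verbatim}
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])               # model y = a1 + a2 * x
y = np.array([1.1, 1.9, 3.2, 3.9])       # invented observations

a = np.linalg.solve(X.T @ X, X.T @ y)    # least squares solution of X a = y
r = y - X @ a                            # residual vector
print(a, r @ r)                          # coefficients and minimized S
\end{verbatim}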
Return to the next-to-last expression for the first partial derivative of $S$,
\begin{align*}
\frac{\partial}{\partial a_k}\,\frac{\partial S}{\partial a_j}
&= \frac{\partial}{\partial a_k}\left(-2\sum_{i=1}^{m}[X]_{ij}\,[y-Xa]_i\right)\\
&= -2\sum_{i=1}^{m}[X]_{ij}\,\frac{\partial}{\partial a_k}\,[y-Xa]_i\\
&= -2\sum_{i=1}^{m}[X]_{ij}\left(-[X]_{ik}\right)\\
&= 2\sum_{i=1}^{m}[X]_{ij}[X]_{ik}\\
&= 2\,(X_j)^{t}X_k
\end{align*}
For $1\le j$