3 Matrices
Handling linear equations and keeping track of the unknowns can be a pain. At a certain point one needs to simplify the notation. This is done by introducing matrices. For example, a system of equations can be represented by a rectangular array (matrix) of numbers. Many of the operations we carry out to solve linear equations might as well be done on this array, forgetting about the unknowns.

3.1 Matrices
3.1.1 Definitions
A rectangular array of numbers is called a matrix. A matrix with $m$ rows and $n$ columns is called an $m \times n$ ($m$ by $n$) matrix. The notation for an $m \times n$ matrix is $A = (a_{ij})$, where $a_{ij}$ denotes the number or entry in the $i$-th row and $j$-th column. If the matrix in (3.2) is denoted $A$, then its entry in the $i$-th row and $j$-th column is $a_{ij}$.

Two matrices are equal if they have the same number of rows and columns and their entries are identical.

A very useful (and famous) open source library in Python (with 1000+ contributors) for handling matrices is NumPy. Here is how the matrix in (3.2) is entered in NumPy.

- A matrix whose entries are all $0$ is called a zero matrix. It is denoted simply by $0$ when it is clear from the context what its numbers of rows and columns are.
- A matrix is called quadratic if it has an equal number of rows and columns. The first two matrices below are quadratic, whereas the third is not.
- The diagonal of a matrix consists of the entries with the same row and column index. Below we have a matrix with the diagonal elements marked. A matrix is called a diagonal matrix if all its entries outside the diagonal are $0$. Below is an example of a quadratic diagonal matrix.
- A matrix is called a row vector if it has only one row. For example, is a row vector with three columns.
- A matrix is called a column vector if it has only one column. For example, is a column vector with three rows.
- The rows in a matrix are called the row vectors of the matrix. The -th row in a matrix is denoted . The matrix in (3.2) contains the row vectors
- The columns in a matrix are called the column vectors of the matrix. The $j$-th column in a matrix is denoted (not to be confused with powers of the matrix introduced later). The matrix in (3.2) contains the column vectors
- A row- or column vector is referred to as a vector.
- Even though we have used the notation $\mathbb{R}^n$ for the $n$-fold cartesian product of $\mathbb{R}$, we will henceforth use $\mathbb{R}^n$ to denote the set of column vectors with $n$ rows (entries). This definition is almost identical to the previous one, except that the tuple is formatted as a column vector. Illustrated by an example,
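Since the chapter uses NumPy, here is a small sketch of how a matrix and a column vector can be entered; the entries below are illustrative and not the matrix from (3.2):

```python
import numpy as np

# Illustrative entries; the concrete matrix from (3.2) is not reproduced here.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(A.shape)    # (2, 3): 2 rows and 3 columns
print(A[0, 1])    # the entry in the 1st row and 2nd column (0-indexed): 2.0

# A column vector in R^3 can be represented as a 3 x 1 matrix.
v = np.array([[1.0],
              [2.0],
              [3.0]])
print(v.shape)    # (3, 1)
```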
3.2 Linear maps
In the first chapter we encountered a miniature version of a neural network. Neural networks are generally incredibly complicated functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. The function given by

even though it looks complicated, is simple in comparison. You probably agree that the function given by

is even simpler. This function (or map) is an example of a linear map. In general, a linear map has the form

$$f(v_1, \dots, v_n) = (a_{11} v_1 + \cdots + a_{1n} v_n,\ \dots,\ a_{m1} v_1 + \cdots + a_{mn} v_n),$$

where the coefficients $a_{ij}$ are real numbers. Using matrices we will use the notation

$$f(v) = A v,$$

where $A$ is the matrix $(a_{ij})$ and $v$ is the vector in $\mathbb{R}^n$. Basically, a linear map is a system of linear equations without the right hand side. In fact, we may write the system of linear equations in (3.1) as $Av = b$.
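A minimal sketch of evaluating a linear map $f(v) = Av$ in NumPy, using an assumed matrix (not one from the text):

```python
import numpy as np

# Assumed 2 x 3 matrix defining a linear map f : R^3 -> R^2, f(v) = A v.
A = np.array([[1.0, 0.0, 2.0],
              [3.0, 1.0, 0.0]])

def f(v):
    # Matrix-vector product: each output entry is a row-column multiplication.
    return A @ v

v = np.array([1.0, 1.0, 1.0])
print(f(v))  # [3. 4.]
```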
Let be the linear map given by the matrix
Does there exist , such that
Quite generally, can we find , such that
for arbitrary ?
Suppose you know that $f$ is a linear map and
that you have a black box giving you the output $f(v)$ if you
supply the input $v$. How would you find
the matrix defining $f$?
3.3 Matrix multiplication
Suppose we are given two linear maps $f$ and $g$. Then it turns out that the composition $f \circ g$ is also a linear map.

A word of advice: the computations below look large and intimidating. They are not. It is important that you carry them out on your own. Do not just look and copy or tell yourself that it looks okay. Do the computations yourself and ask me or fellow students if you get stuck.

Let us look at an example. Suppose that

Then

In terms of the matrices of the linear maps, we write this as

There is nothing special about the numbers in this example. We might as well do the computation in general: suppose that

Then

Again, in terms of the matrices of the linear maps, we write this as

The equation above is the formula for matrix multiplication of two matrices, precisely as it was introduced by Cayley around 1858. Upon closer inspection (and colored in (3.5)), you will see that the number in the $i$-th row and $j$-th column of the product matrix is the row-column multiplication of the $i$-th row and the $j$-th column in the two matrices:
The row-column multiplication between a row vector and a column vector
with the same number of entries is defined as

$$(a_1\ a_2\ \cdots\ a_n) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$
Let $A = (a_{ij})$ be an $m \times n$ matrix and $B = (b_{ij})$ an $n \times p$ matrix.
Then the matrix product $AB$ is defined as the $m \times p$ matrix given by the
row-column multiplication

$$(AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$$

for $i = 1, \dots, m$ and $j = 1, \dots, p$.
If $A$ is an $m \times n$ matrix and $B$ is a $p \times q$ matrix, then the matrix product $AB$ only makes sense if
$n = p$: the number of columns in $A$ must equal the number of rows in $B$.
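A sketch in NumPy, with assumed matrices, checking the shape rule and that the product implements composition of the corresponding linear maps:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [4.0, 0.0]])       # 3 x 2
B = np.array([[1.0, 0.0, 2.0],
              [3.0, 1.0, 0.0]])  # 2 x 3

# A is 3 x 2 and B is 2 x 3: the product AB makes sense and is 3 x 3.
AB = A @ B
print(AB.shape)  # (3, 3)

# The product corresponds to composing the linear maps: (AB)v = A(Bv).
v = np.array([1.0, 1.0, 1.0])
print(np.allclose(AB @ v, A @ (B @ v)))  # True
```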
Suppose that
Which of the matrix products below make sense?
I have been told that my pronunciation of "column" in the video below is wrong. In the area of the US where I got my PhD, people for some reason had this rare (Irish?) pronunciation.
Using matrix product notation, the system of linear equations in (3.1) can now
be written as
Here we multiply an $m \times n$ matrix with an $n \times 1$
matrix. The row-column multiplication gives the $m \times 1$ matrix
This matrix must equal the matrix on the right hand side
for (3.1) to be true.
This is in agreement with our convention for writing linear maps in
section 3.2.
Suppose that
Which of the statements below are true?
If , then .
If , then .
3.3.1 Matrix multiplication in numpy
Matrix multiplication in numpy is represented by the function dot:

3.3.2 The identity matrix
The identity matrix of order $n$ is the $n \times n$ diagonal matrix with $1$ in every entry of the diagonal. Below is a small identity matrix written out. The identity matrix $I$ has the crucial property that

$$AI = A \quad\text{and}\quad IA = A$$

for all matrices $A$.
Prove that the two identities in (3.6) are true for matrices.
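The identities in (3.6) can be checked numerically for an assumed matrix:

```python
import numpy as np

I = np.eye(3)                    # identity matrix of order 3
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])  # an assumed 3 x 3 matrix

# The two identities: A I = A and I A = A.
print(np.allclose(A @ I, A))  # True
print(np.allclose(I @ A, A))  # True
```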
3.3.3 Examples of matrix multiplication
Matrix multiplication is omnipresent in mathematics. Below we give an example, which is a baby version of Google's famous page rank algorithm.
Suppose that % of the people living in the suburbs move to the big city and
that % of the people living in the big city move to the suburbs per year.

Aiming for a model using probabilities, let us be a bit more precise.
- If you live in the suburbs, the probability that you move to the big city is ,
- If you live in the suburbs, the probability that you do not move is .
- If you live in the big city the probability that you move to the suburbs is .
- If you live in the big city the probability that you do not move is .
giving the distribution of the populations for
years. Let us experiment a little:
It seems that the distribution stabilizes around living
in the big city and living in the suburbs of the
original total population.

The matrix is an example of a stochastic matrix.
In general, a quadratic matrix is called a stochastic matrix if its entries are
$\geq 0$ and the sum of the entries in each of its column vectors is $1$.
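The concrete percentages of the example above are not reproduced in this text, so the transition probabilities below are assumed for illustration (10% move from the suburbs to the city, 5% the other way). A sketch of the iteration:

```python
import numpy as np

# Assumed transition probabilities (the chapter's concrete numbers differ).
# Column 1: suburbs -> (stay in suburbs, move to city) = (0.90, 0.10)
# Column 2: city    -> (move to suburbs, stay in city) = (0.05, 0.95)
A = np.array([[0.90, 0.05],
              [0.10, 0.95]])

# The columns sum to 1, so A is a stochastic matrix.
print(A.sum(axis=0))  # [1. 1.]

# Iterate: the distribution after n years is A^n applied to the start.
v = np.array([0.5, 0.5])   # assumed initial split (suburbs, city)
for _ in range(100):
    v = A @ v
print(v)  # stabilizes at [1/3, 2/3] for these assumed probabilities
```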
A simple example of the page rank algorithm is
given in Example 2.13. There you
encountered the equations
In terms of matrix multiplication the first four equations can be rewritten to
Putting
we get a stochastic matrix and may again iterate and compute powers.

Is there a connection between the entries of the powers, when the exponent is very big, and the solutions to the linear equations?
At the end of Example 3.8 (above) a stochastic matrix is defined. Show that
the matrix product of two stochastic matrices is a stochastic matrix.
Below is an example, where matrix multiplication occurs in networks.
Suppose we have five cities connected with roads as shown below.

This network has a so-called incidence matrix $A$, where city $i$ is associated with the
$i$-th row and $i$-th column. A $1$ in the entry in row $i$ and column $j$ of the matrix means that there is a road
from city $i$ to city $j$, whereas a $0$ means that city $i$ and city $j$ are not connected by a road:
Here
What is the interpretation of $A^2$, and of $A^n$ in general?
It turns out that the entry in row $i$ and column $j$ of the matrix $A^n$ is exactly
the number of paths of length $n$ from city $i$ to city $j$. For example, there are paths from city to city of length
corresponding to the paths . The paths from city to city of
length are and the paths of length from city to city
are .

A deeper explanation
Suppose that we have a network with cities $1, \dots, N$ and incidence matrix $A$.

The general proof of the observations above builds on the fact that
a path of length $n$ from city $i$ to city $j$ has to end with a road from a city neighboring $j$.
For every one of these neighboring cities, we may count the number of paths of length
$n - 1$ from city $i$.
If $P_{ik}$ denotes the number of paths of length $n - 1$ from city $i$ to city $k$,
then matrix multiplication tells us that

$$(P A)_{ij} = P_{i1} a_{1j} + \cdots + P_{iN} a_{Nj}.$$

This number is exactly the number of paths of length $n$ from city $i$ to
city $j$, since $a_{kj} = 1$
only when $k$ is a city neighboring city $j$ (and $a_{kj} = 0$ otherwise).
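A small sketch of path counting with an assumed incidence matrix (a three-city network, not the five-city example from the text):

```python
import numpy as np

# Assumed symmetric incidence matrix of a small road network:
# city 1 - city 2 and city 2 - city 3 (no other roads).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

A2 = A @ A
# A2[0, 2] counts paths of length 2 from city 1 to city 3:
# here only 1 -> 2 -> 3, so the entry is 1.
print(A2[0, 2])  # 1
# A2[1, 1] counts round trips of length 2 from city 2: 2->1->2 and 2->3->2.
print(A2[1, 1])  # 2
```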
3.4 Matrix arithmetic
Matrix multiplication is very different from ordinary multiplication of numbers: it is not commutative. Consider the matrices

Then

i.e., $AB \neq BA$. Addition of matrices is like ordinary addition, except that you add all the entries of the involved matrices.

3.4.1 Matrix addition
Addition of two matrices with the same number of rows and columns is defined below. The zero matrix is the () matrix containing zero in all its entries. When its number of rows and columns are clear from the context it is simply denoted by . For matrices for example, we write
Give an example of a non-zero matrix such that
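Matrix addition and the failure of commutativity can be explored in NumPy with assumed matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Matrix addition is entrywise.
print(A + B)  # [[1. 3.] [4. 4.]]
# Adding the zero matrix changes nothing.
print(np.allclose(A + np.zeros((2, 2)), A))  # True
# Multiplication, in contrast, is not commutative in general.
print(np.allclose(A @ B, B @ A))  # False
```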
3.4.2 Multiplication of a number and a matrix
A matrix may be multiplied by a number by multiplying each entry by the number:
Does there exist a number such that
Let $A$ be a matrix such that $AB = BA$
for every other matrix $B$. Show that $A$ is a diagonal matrix of the form
$\lambda I$, i.e., a scalar multiple of the identity matrix.
3.4.3 The distributive law
Ordinary numbers satisfy $a(b + c) = ab + ac$. This rule also holds for matrices and is called the distributive law (multiplication is distributed over plus).
Let $A$ and $B$ be $m \times n$ matrices, $C$ an $n \times p$ matrix and $D$ a $q \times m$ matrix. Then

$$(A + B)C = AC + BC \quad\text{and}\quad D(A + B) = DA + DB.$$
Let us start by looking at $D(A + B) = DA + DB$.
Here it suffices to do the proof when $D$ is a row vector and $A$ and $B$ are
column vectors, since each entry of $D(A + B)$ is the row-column multiplication of a row in $D$ with a column in $A + B$.
For $(A + B)C = AC + BC$, we may reduce to the case where $A$ and $B$
are row vectors and $C$ is a column vector, since the same observation applies.
Both of these cases follow using the distributive law for ordinary numbers.
Suppose that and are two matrices. Is it true that
What about
3.4.4 The miraculous associative law
It does not immediately make sense to multiply three matrices $A$, $B$ and $C$. We have only defined matrix multiplication for two matrices. There are two natural ways of evaluating the product: we can begin by multiplying $A$ by $B$ and then multiply by $C$ from the right, giving $(AB)C$. However, we may just as well start by multiplying $B$ by $C$ and then multiply by $A$ from the left, giving $A(BC)$. It is in no way clear that these two computations give the same result! That this turns out to be true is just one of many miracles in the universe (there is a rather cool mathematical explanation, though, addressed in an exercise below).
Let $A$ be an $m \times n$ matrix, $B$ an $n \times p$ matrix and $C$ a $p \times q$ matrix. Then

$$(AB)C = A(BC).$$

We must prove that $((AB)C)_{ij} = (A(BC))_{ij}$ for $i = 1, \dots, m$ and $j = 1, \dots, q$. The left hand side can be written
The right hand side is
Writing the row-column multiplications in (3.10), we get
Writing the row-column multiplications in (3.11), we get
The rows in the sum in (3.12) correspond to the columns in the sum in (3.13).
Therefore these sums are equal and $(AB)C = A(BC)$.
The associative law is true, but in computing there can be a (big)
difference in the number of multiplications in the two computations
$(AB)C$ and $A(BC)$, i.e., efficiency is not associative for
matrix multiplication. In the notation of Theorem 3.17, computing
$(AB)C$ requires $mnp + mpq$ multiplications, whereas computing $A(BC)$ requires
$npq + mnq$ multiplications. For certain dimensions, one of the two
evaluations can require far fewer multiplications than the other!
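The multiplication counts can be compared with a few lines of Python; the dimensions below are assumed for illustration:

```python
# Counting scalar multiplications for (AB)C versus A(BC), where
# A is m x n, B is n x p and C is p x q.
def cost_AB_then_C(m, n, p, q):
    return m * n * p + m * p * q   # form AB (m x p), then (AB)C

def cost_BC_then_A(m, n, p, q):
    return n * p * q + m * n * q   # form BC (n x q), then A(BC)

# Assumed example: A is 100 x 1, B is 1 x 100, C is 100 x 1.
print(cost_AB_then_C(100, 1, 100, 1))  # 20000
print(cost_BC_then_A(100, 1, 100, 1))  # 200
```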
Verify the associative law for the three matrices
by showing by explicit computation that
There is in fact a high tech explanation of why the associative law for matrices holds. An explanation
that makes the calculations in the above proof superfluous and shows the raw power of
abstract mathematics: suppose that $f$, $g$ and $h$ are linear maps.
Then $(f \circ g) \circ h$ and $f \circ (g \circ h)$ are both linear maps, such that

$$((f \circ g) \circ h)(v) = f(g(h(v))) = (f \circ (g \circ h))(v)$$

for every $v$. How does this relate to the associative law for matrix
multiplication?
3.5 The inverse matrix
You are allowed to divide by a number provided it is $\neq 0$. Does it make sense to divide by matrices? It does, but there are some matrices that correspond to the number $0$ that we are not allowed to divide by.
Let $A$, $B$ and $C$ be matrices. Show that

$$BA = I$$

and

$$AC = I$$

implies that $B = C$.
An $n \times n$ matrix $A$ is called invertible if there exists an $n \times n$ matrix $B$, such that

$$AB = BA = I.$$

In this case, $B$ is called the inverse matrix of $A$ and denoted $A^{-1}$.
Show that a quadratic matrix with a column or row consisting entirely of zeros cannot
be invertible.
Suppose that

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

with $ad - bc \neq 0$. Prove that $A$ is invertible with

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
When is a quadratic diagonal matrix invertible? Look first at the case:
The inverse matrix can be computed in numpy:

The inverse matrix enters the picture when solving linear
equations with unknowns:
can be rewritten using matrix notation as
or more compactly as $Av = b$. If $A$ is invertible, then the associative law gives the following:

$$A^{-1}b = A^{-1}(Av) = (A^{-1}A)v = Iv = v.$$
The system of linear equations
can be rewritten using matrix multiplication to
where
Here is invertible and
One simple matrix multiplication
shows the solution we expect from (3.14).
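A sketch of solving $Av = b$ in NumPy with an assumed invertible matrix, both via the inverse and via `numpy.linalg.solve`:

```python
import numpy as np

# An assumed invertible matrix and right hand side.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
b = np.array([3.0, 2.0])

# v = A^{-1} b via the inverse matrix ...
v1 = np.linalg.inv(A) @ b
# ... or, numerically preferable, by solving A v = b directly.
v2 = np.linalg.solve(A, b)

print(v1)  # [1. 1.]
print(np.allclose(A @ v2, b))  # True
```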
The product of two invertible matrices (when this makes sense)
is an invertible matrix. This is the content of the following result.
The product $AB$ of two invertible matrices $A$ and $B$ is invertible and
$(AB)^{-1} = B^{-1}A^{-1}$.

We must check that

$$(AB)(B^{-1}A^{-1}) = I \quad\text{and}\quad (B^{-1}A^{-1})(AB) = I.$$

Let us check the first condition using the associative law:

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,$$

where $I$ denotes the identity matrix. The second condition is verified in the
same way.
We have defined a matrix $A$ to be invertible if there exists a matrix $B$, such that
$AB = I$ and $BA = I$. Suppose that only $AB = I$. Can we then conclude that $BA = I$?

Find the mistake in the argument below. Suppose that $AB = I$. Then for every $v$ we have $(AB)v = v$.
Therefore $A(Bv) = v$ for every $v$ and we have proved that
$BA = I$.
Let
Compute the powers for i.e., . Now let
where .
Show that is invertible, and
Compute . Do you see a way of generalizing this computation to
matrices with a property shared by the
matrix above?
3.5.1 Well, how do I find the inverse of a matrix?
Finding the inverse of a matrix or deciding that the matrix is not invertible is a matter of solving systems of linear equations.

Given an $n \times n$ matrix $A$, we need to see if there exists an $n \times n$ matrix $B$, such that

$$AB = I,$$

where $I$ is the identity matrix of order $n$. We can do this by computing the columns of $B$. From the definition in (3.15), the $j$-th column $b_j$ of $B$ must satisfy

$$A b_j = e_j,$$

where $e_j$ is the $j$-th column of $I$. This follows from the definition of matrix multiplication! The identity in (3.16) is a system of $n$ linear equations in $n$ unknowns. The unknowns are the entries in the $j$-th column of the inverse matrix (if it exists).
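The column-by-column method just described can be sketched in NumPy for an assumed matrix:

```python
import numpy as np

# An assumed invertible matrix.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

n = A.shape[0]
I = np.eye(n)

# Solve A b_j = e_j for each column e_j of the identity matrix;
# the solutions are the columns of the inverse matrix.
columns = [np.linalg.solve(A, I[:, j]) for j in range(n)]
B = np.column_stack(columns)

print(np.allclose(A @ B, I))             # True
print(np.allclose(B, np.linalg.inv(A)))  # True
```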
Suppose that is a matrix. Then the inverse matrix
(if it exists) can be computed from the systems of linear
equations below.
Writing
for the first and second columns, the systems of linear equations can be written as
where
A concrete example along with a useful way of keeping track of the computation is presented in the video below.
Compute the inverse of the matrix
by employing the method of solving linear equations above. Explain
the steps in your computation. You may find it useful to collect
inspiration from the video in Example 3.30.
3.6 The transposed matrix
The transpose of an $m \times n$ matrix $A = (a_{ij})$ is the $n \times m$ matrix $A^T$ given by $(A^T)_{ij} = a_{ji}$. As an example, we have

Notice also that $(A^T)^T = A$ for an arbitrary matrix $A$.
Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix. Then

$$(AB)^T = B^T A^T.$$

By definition, $((AB)^T)_{ij} = (AB)_{ji}$. This entry is given
by row-column multiplication of the $j$-th row in $A$ and
the $i$-th column in $B$, which is the row-column multiplication of the
$i$-th row in $B^T$ and the $j$-th column in $A^T$.
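The transposition rule can be checked numerically for assumed matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])  # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])       # 3 x 2

# The transpose swaps rows and columns ...
print(A.T.shape)  # (3, 2)
# ... and reverses the order in a product: (AB)^T = B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```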
Let be a quadratic matrix. Prove that is invertible if and only if
is invertible.
In the sage window below, you are supposed to experiment a
bit by entering an arbitrary matrix $A$ and studying the quadratic
matrix $A^T A$. Is there anything special about this product? Press
the Further explanation button below the sage window to display
the rest of the exercise after(!) you have completed your
experimentation.

Further explanation
A quadratic matrix $B$ is called symmetric if $B^T = B$. Prove that
$A^T A$ is a symmetric matrix, where $A$ is an arbitrary matrix.
3.7 Symmetric matrices
A (quadratic) matrix $A$ is called symmetric if $A^T = A$. Visually, this means that $A$ is symmetric around the diagonal, like the matrix

but not like the matrix
Show that
is a symmetric matrix, when is a symmetric matrix and
is an arbitrary matrix. Both matrices are assumed
quadratic of the same dimensions.
If $A$ is a symmetric $n \times n$ matrix, we define the
function $q \colon \mathbb{R}^n \to \mathbb{R}$ given by

$$q(v) = v^T A v.$$

This definition is rather compact. Let us consider the following example for a small matrix.
Then
You are also encouraged to watch the short video below
for an example with concrete numbers.
3.7.1 Positive definite matrices
A symmetric matrix $A$ is called positive definite if $v^T A v > 0$ for every $v \neq 0$. Probably the first example of a positive definite matrix one thinks of is $A$ being the identity matrix. Here

$$v^T I v = v_1^2 + \cdots + v_n^2.$$

Of course, here $v^T I v = 0$ if and only if $v = 0$.
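A sketch of evaluating the quadratic form $v^T A v$ in NumPy; the second matrix below is an assumed example that fails positive definiteness:

```python
import numpy as np

def q(A, v):
    # The quadratic form v^T A v attached to a symmetric matrix A.
    return v @ A @ v

I = np.eye(2)
v = np.array([3.0, 4.0])
print(q(I, v))  # 25.0  (= v_1^2 + v_2^2)

# An assumed symmetric matrix that is NOT positive definite:
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
w = np.array([1.0, -1.0])
print(q(A, w))  # -2.0, so A fails the test v^T A v > 0
```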
Give examples of (non-zero) matrices that are
positive definite and ones that fail to be positive
definite.

When is a diagonal matrix positive definite?
Let $A = (a_{ij})$ be a symmetric matrix. Show that $A$
is not positive definite if $a_{11} \leq 0$.
3.7.2 Positive semi-definite matrices
A symmetric matrix $A$ is called positive semi-definite if $v^T A v \geq 0$ for every $v$. A positive definite matrix is positive semi-definite. Probably the first example of a positive semi-definite, but not positive definite, matrix one thinks of is $A$ being the zero matrix. Here $v^T A v = 0$
for every $v$.
Give an example of a non-zero matrix that is positive semi-definite,
but not positive definite.

When is a diagonal matrix positive semi-definite?
3.7.3 Symmetric reductions
As you probably have noticed, it is rather straightforward to see when a diagonal matrix is positive (semi)definite. For a general symmetric matrix, one needs to reduce to the case of a diagonal matrix. This is done using the following result.
Let $A$ be a symmetric matrix and $C$ an
invertible matrix. Then $A$ is
positive (semi)definite if and only if
$C^T A C$ is positive (semi)definite.
Every vector $v$ is equal to $Cw$ for a unique $w$,
since $C$ is invertible. Why? The upshot is that the equation
$v = Cw$ can be solved by multiplying both sides by $C^{-1}$, giving $w = C^{-1}v$.
So we get

$$v^T A v = (Cw)^T A\, (Cw) = w^T (C^T A C)\, w.$$

This computation shows that $A$ is positive (semi)definite if
$C^T A C$ is positive (semi)definite. The same
reasoning with $C$ replaced by $C^{-1}$ shows that $C^T A C$ is
positive (semi)definite if $A$ is positive (semi)definite.

Notice that it is important that $Cw = 0$ only happens
when $w = 0$.
Let
be a diagonal matrix. What conditions must the diagonal
entries satisfy in order for the matrix
to be positive definite?

Let
denote a symmetric matrix, where . Let
Show that is invertible and compute
Use this to show that is positive definite if and only if
and .

Let be the function defined by
Show that for every .