6Convex functions
In this chapter we will dive deeper into convex functions. The main focus will be convex functions defined on intervals (convex subsets) of the real numbers i.e. convex functions in just one variable. Along the way, differentiability is introduced along with several classical results. We will sketch surrounding ideas, but skip proofs of some of the main results. In this chapter denotes the logarithm with base .Mentimeter6.1 Strictly convex functions
Below we strengthen Definition 4.23 of a convex function.
Let be a convex subset.
A strictly convex function is a convex function , such that
for every number with and .
Strictly convex function. The line segment between and lies strictly () above
the graph of .
Consider the line (function) given by
for .
This function is convex, since we can formally write for every :
However, the computation in (6.1) also shows why
there is no chance that is strictly convex. Intuitively,
the graph of convex functions need to bend and curve a bit to
be strictly convex. No lines should occur in their graphs.
Let be a convex function. Show is strictly convex if and only if
for implies that .
Give an example of a non-constant convex function , which is not strictly convex. Show in
details that is a strictly convex function.Hint
Look back to the relevant part of Exercise 4.25 for
dealing with .
6.2 Why are convex functions interesting?
We begin this section by giving the following result without proof.
A convex function defined on an open convex subset is continuous.
Give an example showing that Theorem 6.6 is not true if the convex function
is defined on a closed convex subset.Hint
Let us now define precisely what is meant by a local vs a global minimum
for a function.
Try to come up with an example like . Look at
the end point .Hint
Well, try out
Let be a function, where is an
arbitrary subset (not necessarily convex, open or closed). Then
is called a local minimum for if
for every , which is sufficiently close to . Being
sufficiently close to means that satisfies
for some fixed .In a much stronger notion, is called a global minimum if
for every (not just locally).
Graph of function defined on an interval. This function has a local minimum, which is not
a global minimum.
Give an example of a local minimum that is not a global minimum for a precisely specified function.
Also give
an example of a global minimum, which is not uniquely defined (again for a precisely specified function).
Uniquely defined means that there is precisely one , such that is minimal.
We might as well have talked about maximum instead of minimum above.
Reformulate Definition 6.8 in order to define
a local and a global maximum.
A local extremum is a point , which is either a
local minimum or a local maximum.Convex functions are interesting, because of the local
nature of the minimization problemIf you run into a local minimum in (6.2),
then you are sure that it also is a global minimum! This is the
content of the result below.
Let be a convex function defined on a convex subset . If is a
local minimum, then is a global minimum. If is strictly
convex, then a global minimum for is unique.
By the definition of local minimum in Definition 6.8, there exists , such that , when and . Suppose that
is not a global minimum. Then there exists with
. Consider the point
where . Then
Since , we can choose sufficiently small such that
implying , since is a local minimum. This
contradicts that for every . Let
be strictly convex and let be a global minimum for . If
, and , then
for . This would contradict that global minimality
of , since for .
The following little result turns out to be very useful and
also very intuitive and drawable! It is a key component
in characterizing convex differentiable
functions in terms of . We will not give the proof here.
Let be a convex function. Then
for .
6.3 Differentiable functions
To appreciate the depth of the notion of differentiability, you should read the story (joke, actually) in the second paragraph of section 8-2 in volume I of the famous Feynman Lectures on Physics. Below is a photograph of the master explainer in action.6.3.1 Definition
Let be a function defined on the open interval . The notion of being differentiable at a point can be glanced from the drawing belowwhere we informally let approach and look at the limiting value of the slope. Newton used to say many hundred years ago, that the derivative of at is the value of this slope just before becomes . In modern day mathematical (obfuscated?) parlance, this gets translated into the following.The function is differentiable at if there exists a number , such that for every there exists with for every satisfying .We will use a somewhat more operational definition below in terms of continuous functions defined around with . This looks difficult, but it is actually a clever way of approaching differentiability (and more in the spirit of Newton).
The function is differentiable at if there exists
- with i.e., and .
- A function continuous at with ,
If a function is
differentiable, we get a new function
giving the (first) derivative at
a point as output. We may ask again if this function
is differentiable. If this is so, we may define a
function given by
called the second derivative. This
procedure may be continued. We use the notation
for the -th derivative.
Let us apply Definition 6.15 to the function at
the point . Here
Here you immediately see that with
(and ) in Definition 6.15.
Use Definition 6.15 to formally show that
if .
In operating with differentiable functions you are supposed to draw on
your previous knowledge. I have summarized some of this knowledge below (even though we will give hints below as how to prove some of the rules).
- If , where , then .
- If , where , then
- If , then
- If , then
- If , then
- If , then
- If and are differentiable functions, then the derivative of their product is
- If and are differentiable functions, then the derivative of their quotient is
- If and are composable differentiable functions, then the derivative of their composite is
Suppose that . What is
6.3.2 Recall the derivative of a product?
From high school you know that the derivative of a product of two functions and is given by the formulaWe can use the -definition (6.4) to derive the product rule in (6.5). The computation below is a bit cumbersome, but actually quite doable. We assume to begin with that and are differentiable at according to (6.4) i.e., Then we start the computation: where the function is seen to be continuous at with . The end result of this computation shows that is differentiable at with again according to (6.4).The formula for the derivative of a fraction i.e., can be derived using a neat little trick. This is the topic of the following exercise.
Show how the product rule may be used to derive the rule for finding the derivative of
a fraction:
Hint
6.3.3 Recall the one variable chain rule?
The formula for the derivative of a composite function is given by where is in the domain of . Let us see how (6.4) applies in showing this.Suppose that is differentiable at and is differentiable at , then we can mess around a bit with the -functions for and for the composite function around : where (take a deep breath) Here is seen to be continuous at with i.e., the composition is differentiable at with derivative The formula (6.9) is extremely important and useful. We give some applications in the exercises below.
For the function for , you already know that . Show that if you define
the function by
for an arbitrary number ,
then .
Compute the derivative of the function given by
using only paper and pencil! You can check your result afterwards using a computer.
Suppose that and are inverse functions i.e.,
If you know the derivative of , how can you use the chain rule
to get the derivative of ? Illustrate with examples like
and , and .
Suppose that is a convex function. We know that
is continuous, but is differentiable at every point ?Hint
Nope. This is wrong. Come up with a convex function and a point ,
such that is not differentiable at .
6.3.4 The Newton-Raphson method for finding roots
We begin this section with a surprising example.
Suppose that and we wish to compute . To do this we may focus on the
quadratic equation and attempt to compute an approximate value ,
such that is close to . Let me at this point disclose that there is
a very effective iterative scheme for doing this. You start by putting and then iterate
using the formula
to get better and better approximations to .
I have been in complete awe of the Newton-Raphson method since my early youth. It is
an algorithm, where the notion of differentiability really shines.The method
comes from Definition 6.15 with :
we are assuming that is very close to , where . Then
Ignoring the very small number and solving this equation for we get
In the Sage window below, I have entered the algorithm starting
in running ten iterations for finding a zero for
.Graph
The formula in (6.10) is derived from
where .
You can try out (6.10)
below.
Give an example, where the Newton-Raphson method cycles between points
and never finds the desired zero. Perhaps a drawing will help here.
The Newton-Raphson converges rapidly in most cases. Of course, it
breaks down violently if it runs into a critical point i.e.,
a point , such that .William Cook has made some nice Sage code available for
experimentation with Newton's method.
The formula (see button in Example 1.60)
for the (monthly) payment on a (car) loan over payments with
a down payment of and an interest rate of (per payment or term) is given by the formula
There is no explicit formula for calculating given and . Here
the Newton-Raphson method is invaluable for estimating by approximating a
zero for the function
Your bank promises you a loan of DKK with yearly payments
of DKK over years. At the same time it claims that
its interest rate is very favorable at only %. Here the bank is
wrong! What is the
real interest rate?
How much money do you save (compared to the original offer from the bank) if you insist that the bank offers you
the promised interest rate of %?
6.3.5 Critical points and extrema
A critical point for a differentiable function is
a point with
Let be a differentiable function. If
is a local extremum for , then is critical point i.e., .
Suppose that is a local maximum and that
according to (6.4). If , then we can choose
sufficiently small, such that if , since and
is continuous in . Therefore
contradicting that is a local maximum. The proof is similar
for and if is a local minimum.
Is the converse of the above lemma true i.e., if is
a local extremum?
Theorem 6.34 below is called the mean value theorem. It is a consequence of Lemma 6.32
and the extremely important Theorem 5.77 about continuous functions
on compact subsets attaining their maxima and minima!
Let be continuous and differentiable
on . Then there exists such that
6.3.6 Increasing functions
The definition below is much simpler than the definition of differentiability.
A function with is called
increasing if
and strictly increasing if
for .
Give an example of an increasing function.
Give an example of an increasing function that is not strictly increasing.
The following very important result is a consequence of Theorem 6.34. You probably already know this result from your previous (danish) education (monotoniforhold!).
Let be a differentiable function.
Then
is increasing if and only if for every . If for every , then is strictly
increasing.
Show that is strictly increasing i.e.,
Hint
but why is always except when ?
Suppose that is a continuous function, such that
is differentiable on the open interval . Is increasing
on if for every ?
Is it possible for a strictly increasing function to be bounded i.e., does
there exist a (positive) number , such that for every ?Hint
Have a look at
6.4 Taylor polynomials
If is a critical point for we cannot conclude that is a local extremum. We know that and we can get more information out of by exploring the signs of Suppose that is a polynomial, then For nice functions like we can play this game ad infinitum. In fact in this way we get the beautiful infinite series If is an times differentiable function defined at , we call the polynomial in (6.11) the Taylor polynomial about the point of degree associated with the . Similarly, one may also define the Taylor polynomial of order about a point by Taylor polynomials can be used to approximate more complicated functions such as and with a well defined error term. This is cool classical mathematics. Unfortunately we do not have time to go deeper into Taylor's theorem, which states this in precise terms.
Compute the Taylor polynomial for up to degree .
Suppose you have a number that satisfies
Can you make sense of the formula
using Taylor polynomials?
In the context of optimization, the following
result becomes important. We will not give the proof, but only notice that Theorem
6.34 also here plays an important role.
Let be a critical point of an times differentiable
function , such that
is a continuous function,
and . If is even, then is a local
minimum if and a local maximum if
. If is odd, then is not a local extremum.
Let us apply Theorem 6.43 to the function
where . Here and
is a critical point (why?). Since
we see that is a local minimum if and a local maximum if
.
Have you seen Example 6.44 elsewhere, perhaps in a more
geometric setting? What type of curve is the graph of ? Here you may consult
your previous mathematical knowledge.What is the outcome, when you apply Theorem 6.43 to the
function at ?
Show that is a critical point of the function
defined by
Use Theorem 6.43 in deciding if it is a local
maximum or minimum or neither.
6.5 Differentiable convex functions
The following theorem is proved using Lemma 6.13 and Theorem 6.34. It immediately implies Corollary 6.48, which is the result mostly used.
Let be a
differentiable function. Then is convex if and only if is
increasing. If is strictly increasing, then is strictly
convex.
Let be a twice differentiable
function. Then is convex if and only if for every
. If for every , then is
strictly convex.
Wait! Stop! Why did I not write if and only if
is strictly convex?
You cannot deduce from Corollary 6.48 that the function
given by is a
strictly convex function. Why not?You can deduce from Corollary 6.48 that is
a strictly convex function. How can
be used to prove that is a strictly convex function?
Show that is a strictly convex function .Show that is a strictly convex function
.
Show that given by
is a strictly convex function.
Another nice application of Lemma 6.13 (and Theorem 6.47) is the following.
Let be a differentiable function. Then
is convex if and only if
for every .
Suppose that is a differentiable
convex function and is a critical point for
. What can you say about using Theorem 6.53?