MathAnalysisCalculusNumerical functions

Convex functions

7 minutes read

In this topic you will learn about convex functions. In optimization theory, there is a special chapter devoted to convex optimization, which deals with convex functions. Often, when setting an optimization problem, these problems are limited only to convex functions. It is important to note that there are things that work for convex functions that cannot be promised for all functions. Moreover, convexity is one of the conditions for the applicability of optimization methods, which are used, for example, in machine learning.

Geometric interpretation of convexity

But before answering this question, let's look at what functions are called convex.

Take a look at two figures representing different functions. Can you guess which function is convex?

Graphs of two different functions

If you are not sure, here is a hint:

  1. Draw an epigraph (the region above the graph of the function).

  2. Decide if this region is a convex set. If yes, then the function is convex. If not, then the function is non-convex.

Only one of this  function is convex

Right now, it should be obvious which function is convex. However, if you do not remember what is a convex set, there is another hint:

  1. Choose any two points on the graph.

  2. Draw a line segment between them.

  3. The function is convex if the line segment lies above or coincides with the graph, no matter which points you select.

Only one of this  function is convex (2)

So, the first function is convex, while the second one is non-convex.

Definition of convexity of a function

Let's define a convex function more formally.

Definition. Function f(x)f(x) is convex if for all 0t10\leq t\leq 1 and all x1,x2x_{1},x_{2} from the domain of f(x)f(x)

f(tx1+(1t)x2)tf(x1)+(1t)f(x2).f(tx_{1}+(1-t)x_{2})\leq tf(x_{1})+(1-t)f(x_{2}).

Note: point xx is from the domain if you can compute f(x)f(x).

For example, for the function f(x)=1xf(x)=\frac{1}{x} point x=0x=0 is not in the domain as it is not allowed to divide by zero.

Line and curve

You can see from the figure above (and also remember from the topic Convex set) that the right part of the definition denotes a line segment between points x1x_1 and x2x_2. The left part of the definition is a part of the graph between x1x_1 and x2x_2. So, it is just a formally written fact that

the line segment between any points on the graph of the convex function lies above or coincides with the graph of this function.

Below, you can find illustrations of this definition for a convex function f(x)=x2f(x)=x^2. The domain of this function is the whole real line. You can note that no matter which points we choose, the line segment between them will always lie above the graph.

  • Quadratic function (convex)

Quadratic function

According to the definition, the linear function is also convex. However, you can see that it somehow differs from other convex functions. In order to formalize it, another definition needs to be introduced.

Definition. The function f(x)f(x) is strictly convex if for all 0t10\leq t\leq 1 and all x1,x2x_{1},x_{2} from the domain of f(x)f(x)

f(tx1+(1t)x2)<tf(x1)+(1t)f(x2).f(tx_{1}+(1-t)x_{2})< tf(x_{1})+(1-t)f(x_{2}).

In this case, the line segment between two points always lies above the graph and never meets it. We can think of strictly convex functions as of having strictly positive "curvature" at all points. For example, the exponent function is strictly convex, but the linear function is just convex.

  • Exponent function (strictly convex)

Exponent function

  • Linear function (convex, but not strictly). You can see that the lie segment coincides with the graph.

Linear function

Let's take a look at the function f(x)=xf(x)=\sqrt{x}. It looks like a convex function, but turned upside down. Functions like that are called concave functions. In concave functions, the region below their graph is a convex set with respect to their domain. For the square function it is x[0;+)x \in [0;+\infty)).

  • Square function (concave)

Square function

In order to turn a function upside down (or reflect it), you need to multiply it by 1−1 to get f(x)−f(x). So, if you reflect a square function, it becomes f(x)=xf(x)=-\sqrt{x}, which is a convex function! It means that the reflection of the convex function is concave. The opposite is also true.

Convex function and  concave function

Definition. A function f(x)f(x) is concave if for all xx in its domain f(x)-f(x) is convex.

Let's prove using the definition that the function f(x)=x3f(x)=x^3 is non-convex. We can select points x1=1,x2=0x_1=-1, x_2 = 0 and set t=12t=\frac{1}{2}. Then, you need to compare

f(1+02)?f(1)2+f(0)218?12+0218>12\begin{align*}\color{green} f\left(\frac{-1+ 0}{2}\right) & \quad ? \color{red}\quad \frac{f(-1)}{2} + \frac{f(0)}{2}\\ \color{green}-\frac{1}{8} & \quad ?\color{red} \quad -\frac{1}{2}+\frac{0}{2}\\ \color{green}-\frac{1}{8}& \quad > \quad\color{red}-\frac{1}{2} \end{align*}

So, you get that the left part is greater than the right part. According to the definition, the function f(x)=x3f(x) = x^3 is non-convex.

Cubic function

We talked a lot about convex functions. However, there are a dozen of functions which are non-convex, which means that at least one line segment between points on the graph lies below the graph. Below, you can see a few examples of such functions.

  • Sine function (non-convex)

Sine function

  • Polynomial function of degree more than 2 (non-convex)

Polynomial function of degree more than 2

By the way, it is hard to test the function for convexity using the definition. There is another way how we can check it, which is described in the next section.

Testing for convexity

There is a simpler way to check a function for the convexity than using definition. You can do so by … differentiating!

A twice differentiable function ff is convex at an interval XX if f(x)0f''(x)\geq0 for all xXx \in X.

A twice differentiable function ff is concave at an interval XX if f(x)0f''(x)\leq0 for all xXx \in X.

The algorithm is the following.

  1. Compute the second derivative of the function.

  2. If it is greater than 00 at an interval, then the function is convex at this interval. Otherwise, it is concave.

For example, let's investigate the function f(x)=exf(x)=e^x. The second derivative is f(x)=exf''(x)=e^x. It is always greater than 00, so f(x)=exf(x)=e^x is a convex function.

The exponential function (2)

Now, let's do the same with f(x)=x2+5x+1f(x) = -x^2+5x+1. The second derivative is f(x)=2f''(x)=-2. Obviously, it is less than 00, so this function is non-convex.

But this function is concave. To check this, we calculate the second derivative for the function g(x)=f(x)=(x2+5x+1)=x25x1.g(x) = - f(x) = -(-x^2 + 5x +1) = x^2 - 5x -1.

The second derivative is g(x)=2g''(x)=2. Obviously, it is more than 00, therefore g(x)g(x) is convex function, and hence f(x)f(x) is concave function.

The quadratic function (2)

Let us test one more function, which was already checked in the previous definition: f(x)=x3f(x)=x^3. The second derivative is f(x)=6xf''(x) = 6x. Is it greater than 00? Is it less than 00? Is it equal to 00?

Well, the answer depends on the xx. If you scroll up to the beginning of this section, you will see the phrase "convex at an interval XX". It is included because there are some situations when on certain intervals the function behaves like a convex function, but on the others — like non-convex. This is exactly what happens with the cubic function. On the interval (;0)(-\infin;0) it is non-convex (concave, to be specific), but on the interval [0;+)[0;+\infin) it is convex! Note: the function f(x)=x3f(x)=x^3 is anyway called non-convex, it just has an interval which shows convexity. Another examples of such functions can be cosx,sinx\cos x, \sin x and 1x\frac{1}{x}.

Graph of a cubic function

We can also check if the function is strictly convex using the second derivative.

A function ff is strictly convex at an interval XX if f(x)>0f''(x)>0 for all xXx \in X.

For example, if the function is f(x)=exf(x)=e^x, then f(x)=ex>0f''(x)=e^x>0, so it is strictly convex. If the function is f(x)=kx+b,k,bRf(x)=kx+b, k,b\in \mathbb{R}, then f(x)=0f''(x)=0, so it is not strictly convex.

Now that we have learned to distinguish convex functions from all others, let's return to the question of why mathematicians like them so much?

Useful property of convex functions

Convex functions are widely used in optimization, because of their properties, discussed in this section.

As you remember, if the first derivative of the function at a point is equal to 00 and the second derivative at the same point is greater than 00, then this point is a local minimum (LM). There might be several local minima for one function. The task of optimization is to find the global minimum (GM), or, in other words, to find the most minimum of all others minima.

LM and GM

If the function is set explicitly, that is quite obvious: you can simply compare the value of the function in these points. In all other ways, you do not know the exact number of local minimum points, so you have to sort out the whole domain to find them (which can take infinite time). That is where convex functions come to the stage because of their property:

Local minimum of the convex function is also a global minimum.

LM and GM (2)

So, if you found local minima in a convex function, you can immediately say that the optimization task is solved. Let's prove it!

The idea of the proof. Imagine you have a convex function with two local minima with the different value of the function at these minima (they are located on a different height). If you draw a line segment between them, this line segment will always lie below the function, which contradicts the definition of convexity. So, the local minimum in a convex function is also a global minimum.

Function graph

Multivariable functions

Before, we investigated a function of one variable. However, in optimization, you often have to deal with multivariable functions. In this section, let's take a look at a function of 2 variables and see how the definitions change (spoiler: it is almost the same).

Definition. The function f(x,y)f(x,y) is convex if for all 0t10\leq t\leq 1 and all points (x1,y1),(x2,y2)(x_{1},y_1), (x_2,y_2) from the domain of f(x,y)f(x,y)

f(tx1+(1t)x2,ty1+(1t)y2))tf(x1,y1)+(1t)f(x2,y2).f\left(tx_{1}+ (1-t)x_{2},ty_1+(1-t)y_2)\right)\leq tf(x_{1},y_1)+(1-t)f(x_{2},y_2).

Multivariable function

You can see the difference only in the way we denote points. The definition still tells us that in a convex function, the line segment between any points lies above the graph between the same points. Moreover, the differential convexity criterion continues to work, as do all other properties no matter we investigate a one-variable or multivariable function.

Conclusion

Let's go over the main points of the topic:

  • The line segment between any point of the convex function lies above or coincides with the graph of this function

  • Linear function is convex, but not strictly

  • If the second derivative of the function is greater or equal to 00 at an interval, the function is convex at this interval

  • There are functions which on some intervals are convex and on some are concave

  • In a convex function, a local minimum is a global minimum

7 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo