MathAnalysisCalculusNumerical functions

Convex functions

Provided by: Edvancium

7 minutes read

In this topic you will learn about convex functions. In optimization theory, there is a special chapter devoted to convex optimization, which deals with convex functions. Often, when setting an optimization problem, these problems are limited only to convex functions. It is important to note that there are things that work for convex functions that cannot be promised for all functions. Moreover, convexity is one of the conditions for the applicability of optimization methods, which are used, for example, in machine learning.

Geometric interpretation of convexity

But before answering this question, let's look at what functions are called convex.

Take a look at two figures representing different functions. Can you guess which function is convex?

Graphs of two different functions

If you are not sure, here is a hint:

Draw an epigraph (the region above the graph of the function).
Decide if this region is a convex set. If yes, then the function is convex. If not, then the function is non-convex.

Only one of this function is convex

Right now, it should be obvious which function is convex. However, if you do not remember what is a convex set, there is another hint:

Choose any two points on the graph.
Draw a line segment between them.
The function is convex if the line segment lies above or coincides with the graph, no matter which points you select.

Only one of this function is convex (2)

So, the first function is convex, while the second one is non-convex.

Definition of convexity of a function

Let's define a convex function more formally.

Definition. Function $f(x)$ is convex if for all $0\leq t\leq 1$ and all $x_{1},x_{2}$ from the domain of $f(x)$

$f(tx_{1}+(1-t)x_{2})\leq tf(x_{1})+(1-t)f(x_{2}).$

Note: point $x$ is from the domain if you can compute $f(x)$ .

For example, for the function $f(x)=\frac{1}{x}$ point $x=0$ is not in the domain as it is not allowed to divide by zero.

Line and curve

You can see from the figure above (and also remember from the topic Convex set) that the right part of the definition denotes a line segment between points $x_1$ and $x_2$ . The left part of the definition is a part of the graph between $x_1$ and $x_2$ . So, it is just a formally written fact that

the line segment between any points on the graph of the convex function lies above or coincides with the graph of this function.

Below, you can find illustrations of this definition for a convex function $f(x)=x^2$ . The domain of this function is the whole real line. You can note that no matter which points we choose, the line segment between them will always lie above the graph.

Quadratic function (convex)

Quadratic function

According to the definition, the linear function is also convex. However, you can see that it somehow differs from other convex functions. In order to formalize it, another definition needs to be introduced.

Definition. The function $f(x)$ is strictly convex if for all $0\leq t\leq 1$ and all $x_{1},x_{2}$ from the domain of $f(x)$

$f(tx_{1}+(1-t)x_{2})< tf(x_{1})+(1-t)f(x_{2}).$

In this case, the line segment between two points always lies above the graph and never meets it. We can think of strictly convex functions as of having strictly positive "curvature" at all points. For example, the exponent function is strictly convex, but the linear function is just convex.

Exponent function (strictly convex)

Exponent function

Linear function (convex, but not strictly). You can see that the lie segment coincides with the graph.

Linear function

Let's take a look at the function $f(x)=\sqrt{x}$ . It looks like a convex function, but turned upside down. Functions like that are called concave functions. In concave functions, the region below their graph is a convex set with respect to their domain. For the square function it is $x \in [0;+\infty)$ ).

Square function (concave)

Square function

In order to turn a function upside down (or reflect it), you need to multiply it by $−1$ to get $−f(x)$ . So, if you reflect a square function, it becomes $f(x)=-\sqrt{x}$ , which is a convex function! It means that the reflection of the convex function is concave. The opposite is also true.

Convex function and concave function

Definition. A function $f(x)$ is concave if for all $x$ in its domain $-f(x)$ is convex.

Let's prove using the definition that the function $f(x)=x^3$ is non-convex. We can select points $x_1=-1, x_2 = 0$ and set $t=\frac{1}{2}$ . Then, you need to compare

$\begin{align*}\color{green} f\left(\frac{-1+ 0}{2}\right) & \quad ? \color{red}\quad \frac{f(-1)}{2} + \frac{f(0)}{2}\\ \color{green}-\frac{1}{8} & \quad ?\color{red} \quad -\frac{1}{2}+\frac{0}{2}\\ \color{green}-\frac{1}{8}& \quad > \quad\color{red}-\frac{1}{2} \end{align*}$

So, you get that the left part is greater than the right part. According to the definition, the function $f(x) = x^3$ is non-convex.

Cubic function

We talked a lot about convex functions. However, there are a dozen of functions which are non-convex, which means that at least one line segment between points on the graph lies below the graph. Below, you can see a few examples of such functions.

Sine function (non-convex)

Sine function

Polynomial function of degree more than 2 (non-convex)

Polynomial function of degree more than 2

By the way, it is hard to test the function for convexity using the definition. There is another way how we can check it, which is described in the next section.

Testing for convexity

There is a simpler way to check a function for the convexity than using definition. You can do so by … differentiating!

A twice differentiable function $f$ is convex at an interval $X$ if $f''(x)\geq0$ for all $x \in X$ .

A twice differentiable function $f$ is concave at an interval $X$ if $f''(x)\leq0$ for all $x \in X$ .

The algorithm is the following.

Compute the second derivative of the function.
If it is greater than $0$ at an interval, then the function is convex at this interval. Otherwise, it is concave.

For example, let's investigate the function $f(x)=e^x$ . The second derivative is $f''(x)=e^x$ . It is always greater than $0$ , so $f(x)=e^x$ is a convex function.

The exponential function (2)

Now, let's do the same with $f(x) = -x^2+5x+1$ . The second derivative is $f''(x)=-2$ . Obviously, it is less than $0$ , so this function is non-convex.

But this function is concave. To check this, we calculate the second derivative for the function $g(x) = - f(x) = -(-x^2 + 5x +1) = x^2 - 5x -1.$

The second derivative is $g''(x)=2$ . Obviously, it is more than $0$ , therefore $g(x)$ is convex function, and hence $f(x)$ is concave function.

The quadratic function (2)

Let us test one more function, which was already checked in the previous definition: $f(x)=x^3$ . The second derivative is $f''(x) = 6x$ . Is it greater than $0$ ? Is it less than $0$ ? Is it equal to $0$ ?

Well, the answer depends on the $x$ . If you scroll up to the beginning of this section, you will see the phrase "convex at an interval $X$ ". It is included because there are some situations when on certain intervals the function behaves like a convex function, but on the others — like non-convex. This is exactly what happens with the cubic function. On the interval $(-\infin;0)$ it is non-convex (concave, to be specific), but on the interval $[0;+\infin)$ it is convex! Note: the function $f(x)=x^3$ is anyway called non-convex, it just has an interval which shows convexity. Another examples of such functions can be $\cos x, \sin x$ and $\frac{1}{x}$ .

Graph of a cubic function

We can also check if the function is strictly convex using the second derivative.

A function $f$ is strictly convex at an interval $X$ if $f''(x)>0$ for all $x \in X$ .

For example, if the function is $f(x)=e^x$ , then $f''(x)=e^x>0$ , so it is strictly convex. If the function is $f(x)=kx+b, k,b\in \mathbb{R}$ , then $f''(x)=0$ , so it is not strictly convex.

Now that we have learned to distinguish convex functions from all others, let's return to the question of why mathematicians like them so much?

Useful property of convex functions

Convex functions are widely used in optimization, because of their properties, discussed in this section.

As you remember, if the first derivative of the function at a point is equal to $0$ and the second derivative at the same point is greater than $0$ , then this point is a local minimum (LM). There might be several local minima for one function. The task of optimization is to find the global minimum (GM), or, in other words, to find the most minimum of all others minima.

LM and GM

If the function is set explicitly, that is quite obvious: you can simply compare the value of the function in these points. In all other ways, you do not know the exact number of local minimum points, so you have to sort out the whole domain to find them (which can take infinite time). That is where convex functions come to the stage because of their property:

Local minimum of the convex function is also a global minimum.

LM and GM (2)

So, if you found local minima in a convex function, you can immediately say that the optimization task is solved. Let's prove it!

The idea of the proof. Imagine you have a convex function with two local minima with the different value of the function at these minima (they are located on a different height). If you draw a line segment between them, this line segment will always lie below the function, which contradicts the definition of convexity. So, the local minimum in a convex function is also a global minimum.

Function graph

Multivariable functions

Before, we investigated a function of one variable. However, in optimization, you often have to deal with multivariable functions. In this section, let's take a look at a function of 2 variables and see how the definitions change (spoiler: it is almost the same).

Definition. The function $f(x,y)$ is convex if for all $0\leq t\leq 1$ and all points $(x_{1},y_1), (x_2,y_2)$ from the domain of $f(x,y)$

$f\left(tx_{1}+ (1-t)x_{2},ty_1+(1-t)y_2)\right)\leq tf(x_{1},y_1)+(1-t)f(x_{2},y_2).$

Multivariable function

You can see the difference only in the way we denote points. The definition still tells us that in a convex function, the line segment between any points lies above the graph between the same points. Moreover, the differential convexity criterion continues to work, as do all other properties no matter we investigate a one-variable or multivariable function.

Conclusion

Let's go over the main points of the topic:

The line segment between any point of the convex function lies above or coincides with the graph of this function
Linear function is convex, but not strictly
If the second derivative of the function is greater or equal to $0$ at an interval, the function is convex at this interval
There are functions which on some intervals are convex and on some are concave
In a convex function, a local minimum is a global minimum

7 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo