
You've probably already heard of performance tuning. To perform it, you must study the application's behavior at runtime and find flaws in it. You can use various manual techniques or special applications called profilers to get the information you need; they help you analyze the resources the application consumes. In this topic, you will learn some manual profiling techniques and get acquainted with the profiler built into IntelliJ IDEA Ultimate — the async profiler.

Manual profiling techniques

Full-fledged profilers are relatively complex tools. Before we have a look at one, let's see how to get the desired information using simpler approaches and tools. In some cases, they alone can answer your questions, and profilers generally use the same techniques under the hood. Those techniques include:

  • Measuring method execution time. This is the simplest approach on the list: you measure how long a piece of code takes to execute. To do that, you can use the System.nanoTime() or System.currentTimeMillis() methods. The first returns the current value of the JVM's high-resolution time source in nanoseconds; the second returns the current time in milliseconds. If you record the returned values before and after the desired method call, the difference between them is the execution time.

  • Thread dump sampling. A very useful technique for monitoring the states of the threads within a process. A thread dump can be obtained with the jstack command-line utility and provides information about deadlocks, CPU consumption spikes, and so on. It can help when your application unexpectedly slows down or freezes. If you take thread dumps at fixed time intervals during program execution and count how many dumps each method appears in, you can estimate whether that method ran for a long time.

  • Memory sampling. The jmap command-line utility shows the details of heap memory. Depending on the flag combination, it can display different information. For instance, the jmap -histo:live <pid> command shows a histogram of live Java objects. Knowing how much memory each class's instances occupy, you can find the ones that take up more memory than expected and figure out why. That is the essence of the sampling technique: taking various samples and analyzing the application's execution based on the collected data.

  • Bytecode manipulation. Profilers use this technique to modify the application's bytecode, which allows them to collect statistics about it or perform other manipulations. You can do the same using the Java agent mechanism (the java.lang.instrument package) and tailor a solution to your individual needs. This is quite an important technique and definitely worth exploring; try searching for materials on Java bytecode instrumentation.
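The first technique is easy to sketch in plain Java. Here, sumBelow() is a made-up stand-in for whatever code you actually want to measure:

```java
public class TimingDemo {
    // Sums the integers below n; stands in for the workload being timed.
    static long sumBelow(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();            // timestamp before the call
        long sum = sumBelow(1_000_000);
        long elapsed = System.nanoTime() - start;  // elapsed time in nanoseconds
        System.out.println("sum = " + sum);
        System.out.println("took " + elapsed / 1_000_000 + " ms");
    }
}
```

Prefer System.nanoTime() for measuring intervals: unlike System.currentTimeMillis(), it is not affected by system clock adjustments.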

Fortunately, there are applications that can perform such operations, so we don't need to write everything from scratch. One of them is the async profiler, which relies heavily on the sampling approach. Let's move on to studying it.
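To get a feel for the sampling idea before opening the profiler, here is a minimal, hypothetical sketch of thread dump sampling in plain Java. Instead of calling jstack, it uses Thread.getAllStackTraces() to take periodic "dumps" and counts how often each method sits on top of a stack; the names SamplingDemo and sample() are our own invention:

```java
import java.util.HashMap;
import java.util.Map;

public class SamplingDemo {
    // Take a fixed number of "thread dumps" at fixed intervals and count
    // how often each method appears at the top of a stack. Methods seen
    // most often are the likeliest CPU-time consumers.
    public static Map<String, Integer> sample(int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length > 0) {
                    StackTraceElement top = stack[0];
                    counts.merge(top.getClassName() + "." + top.getMethodName(),
                            1, Integer::sum);
                }
            }
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sample(10, 20));
    }
}
```

Real profilers are far more sophisticated (and avoid the safepoint bias this naive approach suffers from), but the principle is the same.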

Preparation steps

As of June 2022, the async profiler is officially supported on Linux and Mac machines. You can use its community-supported version on Windows (Windows Async Profiler), but it will show less information. In this topic, we will use Linux Ubuntu 20.04 LTS and IntelliJ IDEA Ultimate 2022.1.2.
It's time to learn how to run this profiler. Since the async profiler is shipped as part of IntelliJ IDEA by default, we don't need to install it. First, let's prepare the application we will use:

import java.util.Arrays;

public class ProfilingDemo {
    public static void main(String[] args) {
        long[] arr = new long[50_000_000];

        multiplyElements(fillArray(arr), 2);
        printArray(arr);
    }

    public static long[] fillArray(long[] arr) {
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i;
        }

        return arr;
    }

    public static long[] multiplyElements(long[] arr, int multiplier) {
        for (int i = 0; i < arr.length; i++) {
            arr[i] *= multiplier;
        }

        return arr;
    }

    public static void printArray(long[] arr) {
        System.out.println(Arrays.toString(arr));
    }
}

The code is quite simple to understand, but it will consume a lot of resources. To profile the application, we just need to open the menu next to the application launch button, where we will find the profiling option.

the application launch button

Note that in older IntelliJ IDEA versions the profiler can have other names. You may find separate CPU Profiler and Allocation Profiler options: these are the async profiler's two modes. In newer versions of IntelliJ IDEA, you don't need to choose one specific mode, as both are executed at once. We will show how to choose the profiling result you need later in this topic.

CPU Profiler

In the previous section, we prepared for the launch of the application. Now let's look at its results. At the bottom of your IDE, there are several tabs: Version Control, TODO, Terminal, etc. One of them is the Profiler tab: click on it after launching the profiler and you will see a window that looks like this:
Profiler tab

This flame graph shows the methods called during application execution and is based on stack traces. Each block represents a single stack frame: yellow blocks are Java frames (both the code written by us and calls to base Java classes), blue blocks are native method calls, and purple blocks are calls to Linux kernel methods.

On the left side of the graph, you can see the threads of the virtual machine itself. You can select a specific one and see the flame graph associated with a specific thread. In the image above, you see a flame graph for all threads.


The blocks are sorted by execution time: the methods that consumed the most CPU time are on the left, and the width of each block corresponds to this value. Here you might ask yourself: if a block's width depends on its execution time, why do we see short blue and purple blocks on the left while there are longer blocks to the right? The answer is that those longer blocks are method calls located inside the main() method's hierarchy.
flame graph

So, on the left, above java.lang.Long.stringSize(long), we can see a native method call in blue that took the most time compared to the other native calls inside the main() hierarchy. Therefore, it sits in the leftmost part, despite the fact that there are method calls elsewhere that consumed more CPU time. The same logic applies to Linux kernel calls: the kernel frames that ran longer than all the others are shown in purple on the left. They aren't wide enough for their names to fit in the graph above, but if you zoom in using the Zoom button, you can see them. That button is located to the right of the All threads merged block at the very top.
Zoom button in action
The graph shows that, of the code we wrote, the most CPU time went into the execution of the printArray() method: 86.72% of the time consumed by the parent main() method. If we study the blocks following main(), we see the invocations of two methods from our code. From the image, you can see that the toString() method is responsible for most of the time spent in main(). Its call was followed by a chain of calls to other methods, and if we follow all the yellow blocks in this chain to the end, we find that the two methods stringSize() and getChars() consume nearly half the time spent by main().
In parallel with the main() method, other methods were executed. Most of the CPU time among them was spent on the native start_thread(), so it is shown immediately after the main() method.
start thread in graph
You can see the overall picture if you right-click the start_thread block and click Focus on method in Call Tree.
Focus on method in Call Tree

Memory allocation

In the first section, we mentioned that when running the profiler, the IDE performs two types of profiling. Now that we have studied the CPU profiling, let's move on to the Memory allocation mode. To do this, you just need to switch to this mode from the upper right corner, where you see Show: CPU Samples. Open the drop-down list and select the Memory Allocations option.
Memory Allocations option

This section doesn't provide as much information as the previous one. Its main purpose is to show statistics regarding memory usage by each method. As you can see from the image above, it tells us how many bytes the method consumed as well as the same data relative to the parent method and the whole application.
In our case, the printArray() hierarchy consumed the most memory among the method invocations from main(), just as with CPU time consumption. However, here the main memory consumer is the copyOf() method: it is responsible for more than half the memory consumed by main(). It turns out that while the stringSize() and getChars() methods consumed the most CPU time, they aren't the biggest memory consumers in the application.
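If you want to cross-check the profiler's allocation numbers in code, HotSpot-based JVMs expose a per-thread allocation counter through com.sun.management.ThreadMXBean. This is a HotSpot-specific extension, not part of the standard API, so treat the sketch below as an assumption that you're running such a JDK; AllocationDemo and allocatedBytes() are names we made up:

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

public class AllocationDemo {
    // Returns the number of bytes allocated so far by the current thread,
    // or -1 if this JVM doesn't support the counter.
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return bean.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        long before = allocatedBytes();
        // Arrays.toString builds the result in a growing buffer,
        // which is where the copyOf() allocations come from.
        String s = Arrays.toString(new long[10_000]);
        long after = allocatedBytes();
        System.out.println("result length: " + s.length());
        System.out.println("allocated roughly " + (after - before) + " bytes");
    }
}
```

The numbers are approximate (the measurement itself allocates a little), but they are good enough to confirm what the Memory Allocations view reports.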

Conclusion

With this topic, you have taken your first steps in a very complex but important process. Application profiling provides invaluable help in improving the application, especially when it comes to high-load applications. Large applications require special efforts if you don't want to face serious issues. Study our topic thoroughly because you need a strong foundation in this area!
