To create a high-quality, reliable application capable of handling high loads, you must control its quality by thoroughly checking how it operates under various conditions. For this purpose, developers use different techniques, and benchmarking is one of them. In this topic, we will explore the basics of benchmarking: you will learn what metrics this technique uses and get acquainted with the micro and macro benchmarking approaches. You will also discover the potential difficulties of benchmarking an application.
Why benchmarking?
Benchmarks are software solutions designed to emulate load on an application (including multithreaded load to emulate concurrent environments) and measure its performance. This performance tuning approach shouldn't be confused with profiling, which serves a similar purpose but works quite differently. Profilers don't aim to load the application; they simply monitor its execution and collect statistics without interfering in the process. Benchmarks, on the contrary, deliberately create different conditions for an application in order to collect statistics on its behavior in various situations. They repeat each measurement many times, performing so-called benchmark iterations, in order to get reproducible results and collect meaningful statistics.
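To make the idea of benchmark iterations concrete, below is a deliberately naive sketch in plain Java. The NaiveBenchmark class and its measureAverageNanos helper are hypothetical names of our own, and real benchmarking tools do far more for you (warmup handling, statistical analysis, protection against JIT optimizations):

    import java.util.function.Supplier;

    public class NaiveBenchmark {

        // Runs the workload repeatedly and returns the average time per call
        // in nanoseconds. Repeating the call many times is what "benchmark
        // iterations" means: a single timed call would be far too noisy.
        static double measureAverageNanos(Supplier<Object> workload, int iterations) {
            long start = System.nanoTime();
            Object last = null;
            for (int i = 0; i < iterations; i++) {
                last = workload.get(); // keep the result so the call isn't optimized away
            }
            long elapsed = System.nanoTime() - start;
            System.out.println("last result: " + last); // use the value to keep it "live"
            return (double) elapsed / iterations;
        }

        public static void main(String[] args) {
            double avg = measureAverageNanos(() -> Math.pow(2, 3), 1_000_000);
            System.out.printf("Average: %.2f ns per call%n", avg);
        }
    }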
Benchmark types
In this section, we will cover two benchmarking types (although, in practice, there are more of them):
Macro benchmarks. With this type, you test the whole application with real-life scenarios, performing the operations a real user would perform while using it. Here, it's hard to detect small performance issues, since you are dealing with the system as a whole.
Micro benchmarks. The purpose of this benchmarking type is to test a small piece of code. It allows you to examine the application in more detail and find small issues affecting its performance. For instance, suppose you have a method that finds Fibonacci numbers and there is more than one candidate implementation: one based on recursion and one that isn't. Microbenchmarking will help you find out which implementation is more efficient (see the sketch after this list). Other cases include choosing the right algorithm to sort an array, choosing between several collections to store elements by comparing their performance for different operations, and so on.
These two approaches provide answers to different questions but complement each other. The best solution is to use both of them.
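To make the Fibonacci example concrete, here is a minimal microbenchmark sketch using JMH (the Java Microbenchmark Harness). The two implementations and the input value 25 are illustrative choices of ours, not a prescribed setup:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class FibonacciBenchmark {

        // Read the input from a non-final field rather than hard-coding it;
        // this helps avoid the constant-folding problem discussed later.
        int n = 25;

        // Exponential-time recursive version.
        static long fibRecursive(int n) {
            return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
        }

        // Linear-time iterative version.
        static long fibIterative(int n) {
            long a = 0, b = 1;
            for (int i = 0; i < n; i++) {
                long next = a + b;
                a = b;
                b = next;
            }
            return a;
        }

        // JMH runs each @Benchmark method in a measured loop. Returning the
        // result prevents the JIT compiler from treating it as dead code.
        @Benchmark
        public long recursive() {
            return fibRecursive(n);
        }

        @Benchmark
        public long iterative() {
            return fibIterative(n);
        }
    }

When run through the JMH runner, the harness reports the throughput or latency of each variant side by side, making the difference between the two algorithms immediately visible.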
Benchmark metrics
In practice, benchmarking generally means writing methods that perform certain operations in order to test the application. Such tools use special metrics to measure application performance which, as mentioned earlier, are collected while repeatedly performing the same operation. Two common metrics are:
Throughput. This metric shows the number of executions of the given method performed in a fixed period of time (for example, operations per second).
Latency. This metric measures the time consumed by an execution of the given method. It can be represented by different values: the average time spent on the method execution across all performed benchmark iterations, or the minimum/maximum time spent on an execution. Finally, it can be represented as a percentile, which shows the percentage of method executions that consume less than a specified time. This form gives a clearer and deeper picture of the situation, allowing you to analyze your case better. For example, the maximum could be 20 ms while the 90th percentile is 12 ms, because of a few outlier measurements. This means that 90% of the method's executions take no more than 12 milliseconds, and the remaining 10% take between 12 and 20 milliseconds.
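If you are curious how a percentile is derived from raw measurements, here is a small self-contained sketch; the sample latencies are made up so that the numbers match the example above:

    import java.util.Arrays;

    public class PercentileExample {

        // Returns the p-th percentile (0 < p <= 100) of the given latencies
        // using the simple "nearest rank" method: sort the values and pick
        // the one below which p percent of the measurements fall.
        static long percentile(long[] latenciesMs, double p) {
            long[] sorted = latenciesMs.clone();
            Arrays.sort(sorted);
            int rank = (int) Math.ceil(p / 100.0 * sorted.length);
            return sorted[rank - 1];
        }

        public static void main(String[] args) {
            // Hypothetical latencies in ms: mostly fast, with one outlier.
            long[] latencies = {8, 9, 10, 10, 11, 11, 12, 12, 12, 20};
            System.out.println("90th percentile: " + percentile(latencies, 90) + " ms"); // 12 ms
            System.out.println("Maximum:         " + Arrays.stream(latencies).max().getAsLong() + " ms"); // 20 ms
        }
    }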
Depending on the benchmarking tool you use and how you configure its settings, the results may look different and include different amounts of information.
Problems with benchmarks
In many situations, the JVM helps us by performing optimizations, but sometimes these optimizations can distort the results of a benchmark. In this section, we will look at some frequently encountered benchmarking problems you need to know about in order to handle them correctly. Among them are:
Dead-code elimination. While writing code in your IDE, you might have faced situations where it highlighted a section of code, warning that it wasn't being used and was therefore useless. Those are the so-called "dead code" sections. The same thing happens at the bytecode level thanks to the JIT compiler, but the IDE won't show it: the JIT compiler eliminates unused code to optimize the application's execution. Microbenchmarks are tiny programs testing a small piece of your application, so it's easy to run into dead code. Suppose you have a method named multiply() and you want to test its performance:

    public static void multiply() {
        Math.pow(2, 3);
    }

This method may be executed somewhere in your application, but the microbenchmark covers only this small piece of code calculating the power of a number. In the scope of the benchmark test, the computed value is useless since it isn't used anywhere else. The JIT compiler will remove Math.pow(2, 3), and you will get incorrect results. So, you should take such cases into account when designing benchmark tests.

Constant folding. This happens when you hard-code values in your code. The code example above has this problem too: if you execute the same computation with the same constant values multiple times, the JVM is smart enough to detect this and will simply remember the result to avoid spending resources on the operation. But this isn't what we need when benchmarking, since it will distort the results (a way to avoid both problems is sketched below).
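As a sketch of how such problems are typically avoided in practice (using JMH as an example; the class and field names are our own), you can read inputs from non-final state fields and either return the computed value or sink it into a Blackhole:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    public class MultiplyBenchmark {

        // Non-final fields: the JIT compiler can't treat them as compile-time
        // constants, which prevents constant folding of Math.pow(base, exponent).
        double base = 2;
        double exponent = 3;

        // Returning the result makes JMH consume it, so the computation
        // cannot be eliminated as dead code.
        @Benchmark
        public double multiplyReturning() {
            return Math.pow(base, exponent);
        }

        // Alternatively, sink the result into a Blackhole provided by JMH.
        @Benchmark
        public void multiplyWithBlackhole(Blackhole bh) {
            bh.consume(Math.pow(base, exponent));
        }
    }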
Warmup period. Due to various internal optimization techniques and the JVM's internal structure (for instance, the class loading mechanism), the JVM and the JIT compiler go through a warmup period at startup during which performance is worse. When benchmarking, you need to perform warmup iterations and wait for a while to get realistic results, after which the performance improves. Below is a graph showing how the execution time can decrease over iterations.
The main problem here is that you can't predict the JVM's behavior, and there is no reliable way to determine the duration of the warmup period in advance. Sometimes you will need to experiment with your benchmark settings.
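For illustration, this is roughly how warmup and measurement iterations can be configured in JMH; the specific counts and durations are arbitrary starting points, not recommendations:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Warmup;

    public class WarmupExample {

        // Run 5 one-second warmup iterations whose results are discarded,
        // then 10 one-second measured iterations, in one forked JVM.
        @Benchmark
        @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
        @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
        @Fork(1)
        public double warmedUp() {
            return Math.log(Math.PI);
        }
    }

If the results are still unstable, increasing the number of warmup iterations (or their duration) is usually the first setting to experiment with.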
Conclusion
This topic is a brief introduction to an important technique that is extremely useful for improving your application's performance. You learned about the main benchmarking types and metrics, as well as some frequently occurring benchmarking problems. However, remember that knowing how benchmarking tools work and how to configure them is not enough to get the job done: you also have to master the JVM's internal structure in depth. This track will guide you on that journey.