Earlier, you learned the basics of working with JMH. Now it's time to expand your knowledge. This topic will guide you deeper into the world of benchmarking with the JMH tool. Here, you'll learn to choose the benchmarking mode and output the time unit, work with multiple threads and make advanced configurations to run benchmarks. The topic will contain many code samples, so get ready — you will learn a lot of new stuff!
Specifying the benchmarking mode
As you already know, JMH uses default configurations to run benchmarks. As for the benchmark mode, it uses the Throughput mode by default, but it isn't limited to that. The code below shows how to choose other modes:
public class MyBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @Fork(1)
    @Warmup(iterations = 5)
    @Measurement(iterations = 2)
    public String testSingleShotTime() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @Fork(1)
    @Warmup(iterations = 5, time = 5)
    @Measurement(iterations = 2, time = 5)
    public String testSampleTime() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    @Warmup(iterations = 5, time = 5)
    @Measurement(iterations = 2, time = 5)
    public String testAverageTime() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }
}
Note that the @Warmup and @Measurement annotations here take two parameters: the iteration count and the duration of each iteration in seconds. When you run this code with the java -jar target/benchmarks.jar command, you will see an output similar to the one below.
Note: if the output shows question marks like "? 10?? s/op" instead of "≈ 10⁻⁷ s/op" on a Windows machine, run CHCP 65001 to enable UTF-8 in your current command prompt.
/* Mode 1 - Average time */
/* Environment info */
# Warmup: 5 iterations, 5 s each
# Measurement: 2 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.testAverageTime
# Run progress: 0,00% complete, ETA 00:01:10
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁷ s/op
# Warmup Iteration 2: ≈ 10⁻⁷ s/op
# Warmup Iteration 3: ≈ 10⁻⁷ s/op
# Warmup Iteration 4: ≈ 10⁻⁸ s/op
# Warmup Iteration 5: ≈ 10⁻⁸ s/op
Iteration 1: ≈ 10⁻⁷ s/op
Iteration 2: ≈ 10⁻⁸ s/op
Result "org.sample.MyBenchmark.testAverageTime":
≈ 10⁻⁷ s/op
/* Mode 2 - Sampling time */
/* Environment info */
# Warmup: 5 iterations, 5 s each
# Measurement: 2 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Sampling time
# Benchmark: org.sample.MyBenchmark.testSampleTime
# Run progress: 50,00% complete, ETA 00:00:35
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁷ s/op
# Warmup Iteration 2: ≈ 10⁻⁷ s/op
# Warmup Iteration 3: ≈ 10⁻⁷ s/op
# Warmup Iteration 4: ≈ 10⁻⁷ s/op
# Warmup Iteration 5: ≈ 10⁻⁷ s/op
Iteration 1: ≈ 10⁻⁷ s/op
testSampleTime·p0.00: ≈ 0 s/op
testSampleTime·p0.50: ≈ 10⁻⁷ s/op
testSampleTime·p0.90: ≈ 10⁻⁷ s/op
testSampleTime·p0.95: ≈ 10⁻⁷ s/op
testSampleTime·p0.99: ≈ 10⁻⁷ s/op
testSampleTime·p0.999: ≈ 10⁻⁶ s/op
testSampleTime·p0.9999: ≈ 10⁻⁵ s/op
testSampleTime·p1.00: ≈ 10⁻⁴ s/op
Iteration 2: ≈ 10⁻⁷ s/op
testSampleTime·p0.00: ≈ 0 s/op
testSampleTime·p0.50: ≈ 10⁻⁷ s/op
testSampleTime·p0.90: ≈ 10⁻⁷ s/op
testSampleTime·p0.95: ≈ 10⁻⁷ s/op
testSampleTime·p0.99: ≈ 10⁻⁷ s/op
testSampleTime·p0.999: ≈ 10⁻⁵ s/op
testSampleTime·p0.9999: ≈ 10⁻⁴ s/op
testSampleTime·p1.00: 0,001 s/op
Result "org.sample.MyBenchmark.testSampleTime":
N = 237234
mean = ≈ 10⁻⁷ ±(99.9%) 0,001 s/op
Histogram, s/op:
[0,000, 0,000) = 237216
[0,000, 0,000) = 11
[0,000, 0,000) = 1
[0,000, 0,000) = 1
[0,000, 0,001) = 1
[0,001, 0,001) = 0
[0,001, 0,001) = 2
[0,001, 0,001) = 0
[0,001, 0,001) = 1
Percentiles, s/op:
p(0,0000) = ≈ 0 s/op
p(50,0000) = ≈ 10⁻⁷ s/op
p(90,0000) = ≈ 10⁻⁷ s/op
p(95,0000) = ≈ 10⁻⁷ s/op
p(99,0000) = ≈ 10⁻⁷ s/op
p(99,9000) = ≈ 10⁻⁵ s/op
p(99,9900) = ≈ 10⁻⁴ s/op
p(99,9990) = 0,001 s/op
p(99,9999) = 0,001 s/op
p(100,0000) = 0,001 s/op
/* Mode 3 - Single shot invocation time */
/* Environment info */
# Warmup: 5 iterations, 5 s each
# Measurement: 2 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 1 thread
# Benchmark mode: Single shot invocation time
# Benchmark: org.sample.MyBenchmark.testSingleShotTime
# Run progress: 99,99% complete, ETA 00:00:00
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁵ s/op
# Warmup Iteration 2: ≈ 10⁻⁵ s/op
# Warmup Iteration 3: ≈ 10⁻⁵ s/op
# Warmup Iteration 4: ≈ 10⁻⁵ s/op
# Warmup Iteration 5: ≈ 10⁻⁵ s/op
Iteration 1: ≈ 10⁻⁵ s/op
Iteration 2: ≈ 10⁻⁵ s/op
/* Summary */
# Run complete. Total time: 00:01:11
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
Benchmark Mode Cnt Score Error Units
MyBenchmark.testAverageTime avgt 2 ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime sample 237234 ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.00 sample ≈ 0 s/op
MyBenchmark.testSampleTime:testSampleTime·p0.50 sample ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.90 sample ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.95 sample ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.99 sample ≈ 10⁻⁷ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.999 sample ≈ 10⁻⁵ s/op
MyBenchmark.testSampleTime:testSampleTime·p0.9999 sample ≈ 10⁻⁴ s/op
MyBenchmark.testSampleTime:testSampleTime·p1.00 sample 0,001 s/op
MyBenchmark.testSingleShotTime ss 2 ≈ 10⁻⁵ s/op
In this output you see three modes:
- The Average time mode shows the average execution time of the method during each 5-second iteration.
- The Sampling time mode randomly samples the method call time and displays the result as a histogram and percentiles. Let's analyze the numbers of each metric. The first line of the histogram is:
/* min max */ [0,000, 0,000) = 237216
The number on the right shows the count of method calls, and on the left you see the range of execution time. On this line you see 0,000 since the execution time is too small to be displayed in seconds. Later you will see how to change the output time unit to get more precise results. The percentiles show similar information but in percent. For instance, take a look at these lines:
p(99,0000) = ≈ 10⁻⁷ s/op
p(99,9000) = ≈ 10⁻⁵ s/op
The first line says that 99% of the calls took less than ≈ 10⁻⁷ seconds, and the second one says that 99.9% of the calls took less than ≈ 10⁻⁵ seconds.
- The Single Shot Time mode performs a single method call during each iteration and displays the result.
In addition, JMH provides the option to choose all of these modes at once. Just choose Mode.All and your benchmark will run in all modes:
@Benchmark
@BenchmarkMode(Mode.All)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 2)
public String testAll() {
    Map<Integer, String> map = new HashMap<>();
    map.put(1, "A");
    return map.get(1);
}

Choosing the output time unit
Another setting you can configure is the output time unit. If you remember, in the previous section the histogram showed the [0,000, 0,000) range in seconds. Now, let's pick a smaller time unit to have a more precise result:
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 2)
public String testSampleTime() {
    Map<Integer, String> map = new HashMap<>();
    map.put(1, "A");
    return map.get(1);
}
Here, the time unit is microseconds and the histogram output looks like this:
Histogram, us/op:
[ 0,000, 1250,000) = 438525
[ 1250,000, 2500,000) = 2
[ 2500,000, 3750,000) = 0
[ 3750,000, 5000,000) = 0
[ 5000,000, 6250,000) = 0
[ 6250,000, 7500,000) = 2
[ 7500,000, 8750,000) = 0
[ 8750,000, 10000,000) = 1
[10000,000, 11250,000) = 0
[11250,000, 12500,000) = 1
[12500,000, 13750,000) = 0
[13750,000, 15000,000) = 0
[15000,000, 16250,000) = 0
[16250,000, 17500,000) = 0
[17500,000, 18750,000) = 0
Of course, you can choose other time units for benchmarks as well.
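For instance, here is a sketch of the same HashMap benchmark reported in nanoseconds via the average-time mode (the method name is made up for this illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

public class MyBenchmark {

    // Report the average time per operation in ns/op instead of s/op,
    // which is much easier to read for sub-microsecond operations.
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(1)
    @Warmup(iterations = 5)
    @Measurement(iterations = 2)
    public String testAverageTimeNanos() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }
}
```

Any value of TimeUnit works here: NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, and so on.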
This and other configurations can also be applied at the class level. You just need to place the annotations above the class declaration:
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 2)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    public String testSingleShotTime() {
        /* your code */
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    public String testSampleTime() {
        /* your code */
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    public String testAverageTime() {
        /* your code */
    }
}
In the example above, all three benchmarks will be executed with the same configurations except for the benchmark mode.
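As a side note, many of these settings can also be overridden from the command line when you launch the benchmarks JAR, without recompiling. A few commonly used flags are sketched below; run java -jar target/benchmarks.jar -h to see the full, authoritative list for your JMH version:

```shell
# Run all benchmarks with 1 fork, 5 warmup iterations, 2 measurement
# iterations, and report results in microseconds:
java -jar target/benchmarks.jar -f 1 -wi 5 -i 2 -tu us

# Run only the benchmarks whose names match a regex,
# forcing the average-time mode:
java -jar target/benchmarks.jar MyBenchmark.testAverageTime -bm avgt
```

Command-line options take precedence over the annotations, which is handy for quick experiments.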
Specifying the number of threads
All the examples you've seen so far were executed on one thread, but what if you need to test your method execution concurrently? For that purpose, JMH provides the @Threads annotation.
public class MyBenchmark {

    @Benchmark
    @Threads(4)
    public String testFourThreads() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }

    @Benchmark
    @Threads(Threads.MAX) // Or @Threads(-1)
    public String testAllAvailableThreads() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "A");
        return map.get(1);
    }
}
In the above two benchmarks, threads will execute the method concurrently. The first method will execute with four threads, and the second one will use all available threads on your machine.
Defining the benchmark state
The JMH tool lets you define the scope of a state object. You can specify this configuration in two ways: by using nested classes or by marking the benchmark class itself with the appropriate annotation. Let's start exploring this feature with nested classes.
You can define three state scopes:
- Thread. Objects of classes marked with this scope belong to a single thread. For each thread, JMH creates a separate instance.

public class MyBenchmark {

    @State(Scope.Thread)
    public static class StateScope {
        public Map<Integer, String> map = new HashMap<>();
    }

    @Benchmark
    @Threads(4)
    public String testMultipleThreads(StateScope scope) {
        scope.map.put(1, "A");
        return scope.map.get(1);
    }
}

In this example, each of the four threads gets its own HashMap object. Note that the state class must be public and static.
- Group. This scope defines a separate instance for each group of benchmarks.

public class MyBenchmark {

    @State(Scope.Group)
    public static class StateScope {
        public Map<Integer, String> map = new HashMap<>();
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @Group("A")
    public String testGroupA(StateScope scope) {
        scope.map.put(1, "A");
        return scope.map.get(1);
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @Group("B")
    public String testGroupB(StateScope scope) {
        scope.map.put(1, "A");
        return scope.map.get(1);
    }
}

In the code above, there are two groups, and each of them will have its own HashMap instance.
- Benchmark. A single instance is shared by all benchmark threads:

public class MyBenchmark {

    @State(Scope.Benchmark)
    public static class StateScope {
        public Map<Integer, String> map = new HashMap<>();
    }

    @Benchmark
    @BenchmarkMode(Mode.All)
    @Threads(2)
    public String testBenchmarkScope1(StateScope scope) {
        scope.map.put(1, "A");
        return scope.map.get(1);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @Threads(4)
    public String testBenchmarkScope2(StateScope scope) {
        scope.map.put(1, "A");
        return scope.map.get(1);
    }
}
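Groups can also be combined with thread counts: with the @GroupThreads annotation, you can build asymmetric benchmarks where different methods of one group run concurrently on different numbers of threads. A minimal sketch with assumed method names:

```java
import java.util.concurrent.atomic.AtomicInteger;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Group;
import org.openjdk.jmh.annotations.GroupThreads;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Group)
public class MyBenchmark {

    AtomicInteger counter = new AtomicInteger();

    // Three threads of the group keep writing...
    @Benchmark
    @Group("readWrite")
    @GroupThreads(3)
    public int writer() {
        return counter.incrementAndGet();
    }

    // ...while one thread of the same group keeps reading.
    @Benchmark
    @Group("readWrite")
    @GroupThreads(1)
    public int reader() {
        return counter.get();
    }
}
```

JMH then reports the results for the group as a whole as well as for each method separately.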
Now, let's see how to implement this functionality without declaring a nested class:
@State(Scope.Thread)
public class MyBenchmark {

    public Map<Integer, String> map = new HashMap<>();

    @Benchmark
    @Threads(4)
    public String testMultipleThreads() {
        map.put(1, "A");
        return map.get(1);
    }
}
This code will operate similarly to the first example shown with the nested State class.
Defining state object invocation level
In the previous section, you got acquainted with state objects. The JMH tool has two annotations to manage the lifecycle of these objects. You can use them only in combination with the @State annotation, and methods marked with them are called fixture methods. Let's explore them to understand their purpose.
- @Setup. Methods marked with this annotation are executed before the benchmark. This means that the code inside such a method won't be measured or included in the results.
- @TearDown. These methods are executed after the benchmark methods.
In the code below, there is one benchmark and two fixture methods. The first one, named doSetup(), is executed before testMethod(), and the second one right after it.
@State(Scope.Thread)
public class MyBenchmark {

    Map<Integer, String> map;

    @Setup
    public void doSetup() {
        map = new HashMap<>();
        System.out.println("Setup");
    }

    @Benchmark
    public String testMethod() {
        map.put(1, "A");
        return map.get(1);
    }

    @TearDown
    public void doTearDown() {
        System.out.println("Tear down");
    }
}
These annotations have levels that let you choose when the fixture methods are invoked. Those levels are:
- Trial. Fixture methods of this level run once before the whole benchmark run in the case of @Setup, and once after it in the case of @TearDown. This is the default level.
- Iteration. This level sets fixture methods to run before and after each benchmark iteration, including warmup iterations.
- Invocation. On this level, fixture methods run before and after every single invocation of the benchmark method.
In this code sample, the doSetup() method will run before each iteration of testMethod(), and doTearDown() will run after each iteration. Each time, doSetup() instantiates the map and adds a pair to it, and doTearDown() removes the pair at the end of the iteration.
@State(Scope.Benchmark)
public class MyBenchmark {

    Map<Integer, String> map;

    @Setup(Level.Iteration)
    public void doSetup() {
        map = new HashMap<>();
        map.put(1, "A");
        System.out.println("Setup");
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @Fork(1)
    @Warmup(iterations = 2)
    @Measurement(iterations = 2)
    public String testMethod() {
        return map.get(1);
    }

    @TearDown(Level.Iteration)
    public void doTearDown() {
        map.remove(1);
        System.out.println("Tear down");
    }
}
You can configure the other levels similarly by replacing Level.Iteration with the appropriate level.
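For example, here is a sketch of Invocation-level fixtures, which reset the state around every single call of the benchmark method. Keep in mind that this level adds overhead to every call, so it only makes sense when the work per invocation is substantial; the method names are assumed for this illustration:

```java
import java.util.HashMap;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Thread)
public class MyBenchmark {

    Map<Integer, String> map;

    // Runs before every single invocation of testMethod()
    @Setup(Level.Invocation)
    public void doSetup() {
        map = new HashMap<>();
        map.put(1, "A");
    }

    @Benchmark
    public String testMethod() {
        return map.get(1);
    }

    // Runs after every single invocation of testMethod()
    @TearDown(Level.Invocation)
    public void doTearDown() {
        map.clear();
    }
}
```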
Passing different parameters to benchmark
In some cases, you'll need to run benchmarks with different parameters. The @Param annotation is designed for that purpose. It injects the specified value before benchmark method invocations. Like the annotations from the previous sections, this one also has to be used inside the @State class.
@State(Scope.Benchmark)
public class MyBenchmark {

    @Param({"A", "B", "C"})
    private String letter;

    Map<Integer, String> map;

    @Setup
    public void doSetup() {
        map = new HashMap<>();
        map.put(1, letter);
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @Fork(1)
    @Warmup(iterations = 2)
    @Measurement(iterations = 2)
    public String testMethod() {
        System.out.println(map);
        return map.get(1);
    }
}
The sample above will run the benchmark three times, once for each of the given values. Note that parameters can be of primitive types, their wrapper classes, String, or enum types.
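If you declare several @Param fields, JMH runs the benchmark for every combination of their values. A sketch with assumed field names:

```java
import java.util.HashMap;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class MyBenchmark {

    // JMH converts the string values to the field type automatically
    @Param({"16", "1024"})
    private int size;

    @Param({"A", "B"})
    private String letter;

    Map<Integer, String> map;

    // Runs for each combination: (16, A), (16, B), (1024, A), (1024, B)
    @Setup
    public void doSetup() {
        map = new HashMap<>(size);
        map.put(1, letter);
    }

    @Benchmark
    public String testMethod() {
        return map.get(1);
    }
}
```

You can also override parameter values from the command line with the -p flag, for example -p letter=C.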
Avoiding dead code elimination and constant folding
As you already know, the JVM performs dead code elimination, which is why each benchmark so far has returned a value. For cases where returning a value isn't convenient, JMH provides a special Blackhole class that consumes values and helps us avoid dead code elimination.
@State(Scope.Benchmark)
public class MyBenchmark {

    @Benchmark
    public void testMethod(Blackhole blackhole) {
        blackhole.consume("Hello".toUpperCase());
        blackhole.consume("World".toUpperCase());
    }
}
The code above uppercases two strings and consumes them one by one, so you don't need to return anything. Opt for this approach when you have two or more results and can't return them all.
Now, think about this: in the example above there are two hardcoded strings, which means the JVM can optimize the computation away and skip performing the required operation. This is a typical case of constant folding. To avoid such a situation, let's pass these strings from a state object:
public class MyBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        public String str1 = "Hello";
        public String str2 = "World";
    }

    @Benchmark
    public void testMethod(BenchmarkState state, Blackhole blackhole) {
        blackhole.consume(state.str1.toUpperCase());
        blackhole.consume(state.str2.toUpperCase());
    }
}
Now both strings come from outside the benchmark method, which prevents constant folding and keeps the results meaningful.
Conclusion
In this topic, you learned some advanced approaches to JMH benchmarking. The capabilities of this tool aren't limited to what is presented here. However, you have learned enough to design many benchmark cases and test your application.