
This topic introduces G1GC, the Garbage First garbage collector of the HotSpot JVM. It first appeared as an experimental option in Java 6, became fully supported in Java 7, and has been the default collector since Java 9. G1 is designed for applications with large heaps and aims for shorter, more predictable stop-the-world pauses than earlier collectors. You will study its internal structure, its garbage collection phases, and how to adjust its parameters. At the end, you will learn how to get the collector logs and read them.

Heap structure

G1 is a generational, region-based garbage collector. Its heap consists of many equal-sized regions, each of which plays the role of a particular generation at a given moment. By default, G1 aims for roughly 2048 regions, and the region size is a power of 2 between 1 MB and 32 MB, chosen from the heap size. Working with regions makes garbage collection more predictable, because the collector can clean up a subset of regions at a time instead of the whole heap. Each region in use belongs either to the young generation, which is made up of Eden and Survivor regions, or to the old generation. New objects are created in Eden regions, and objects that survive a cleanup there are moved to Survivor regions. In turn, objects that keep surviving cleanups in Survivor regions are eventually moved to old regions. Regions are assigned these roles from the pool of free regions during the application runtime when necessary.
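
The sizing rule above can be sketched in a few lines of Java. This is a simplified model and not HotSpot's actual code: the class name and the regionSizeFor helper are hypothetical, and the sketch assumes the collector simply divides the heap by 2048, rounds down to a power of 2, and keeps the result between 1 MB and 32 MB.

```java
public class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;   // smallest region size mentioned in the text
    static final long MAX_REGION = 32 * MB;  // largest region size mentioned in the text
    static final long TARGET_REGIONS = 2048; // approximate number of regions G1 aims for

    // Hypothetical helper: a simplified model of the rule, not the JVM's implementation.
    static long regionSizeFor(long heapBytes) {
        long ideal = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(ideal); // round down to a power of 2
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        long[] heapsInMb = {512, 2048, 16 * 1024, 100 * 1024};
        for (long heapMb : heapsInMb) {
            long region = regionSizeFor(heapMb * MB);
            System.out.println(heapMb + " MB heap -> " + (region / MB) + " MB regions, "
                    + (heapMb * MB / region) + " regions");
        }
    }
}
```

Note that small heaps end up with fewer than 2048 regions and very large heaps with more, because the region size is clamped at both ends.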

[Figure: G1 garbage collector generations and regions]

In addition, there are Humongous regions, which store humongous objects and are also part of the old generation. A humongous object occupies one region or several contiguous regions combined into one. As a rule, an object is humongous if it takes up at least half of the standard region size.
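
To get a feel for the threshold, here is a small sketch that follows the rule of thumb exactly as stated above ("at least half of a region"). The class and method names are hypothetical, object headers are ignored, and the exact boundary comparison inside the JVM may differ.

```java
public class HumongousSketch {
    // Rule of thumb from the text: an object is humongous if it needs at least half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Number of contiguous regions needed to hold one humongous object (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 2L * 1024 * 1024;        // 2 MB regions
        long smallArray = 500L * Long.BYTES;   // payload of long[500], header ignored
        long bigArray = 400_000L * Long.BYTES; // payload of long[400_000], header ignored

        System.out.println("long[500] humongous? " + isHumongous(smallArray, region));
        System.out.println("long[400_000] humongous? " + isHumongous(bigArray, region));
        System.out.println("regions for long[400_000]: " + regionsNeeded(bigArray, region));
    }
}
```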

Garbage collection phases

G1 performs three types of garbage collection:

  • Young GC, when the garbage collector performs a cleanup only in the young generation. Young collections cause STW (stop-the-world) events: application threads are paused while surviving objects are evacuated from Eden regions to Survivor regions, and from Survivor regions to other Survivor or old regions. The pause is bounded by a pause time goal, which is 200 ms by default, and G1 adjusts how many regions the young generation holds so that collecting them fits into this goal. G1 also has a default throughput goal of about 10% of the time spent on garbage collection vs 90% on running the application. The name "garbage first" comes from the way G1 picks old regions: it prefers the regions that contain the most garbage, so it reclaims as much memory as it can without exceeding the pause time goal.

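  • Mixed GC, when the garbage collector performs a cleanup in the young generation and in at least one old generation region during the same pause. Mixed collections are also STW events. They follow a concurrent marking cycle, which determines which old regions are worth collecting.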

  • Full GC, when the garbage collector performs a cleanup of the entire heap space. This GC cycle starts as a fallback when Young and Mixed collections don't manage to reclaim enough memory. A Full GC is a long STW event, so it is likely to exceed the pause time goal.
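
The "garbage first" idea can be illustrated with a toy model: rank candidate regions by how much garbage they hold, then pick them while the estimated work still fits into a pause budget. Everything in this sketch is an assumption made for illustration (the Region class, the cost of 2 microseconds per live KB, the chooseRegions helper); the real collector uses far more detailed predictions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GarbageFirstSketch {
    // Hypothetical description of a region: how much live data and garbage it holds.
    static final class Region {
        final int id;
        final long liveKb;
        final long garbageKb;

        Region(int id, long liveKb, long garbageKb) {
            this.id = id;
            this.liveKb = liveKb;
            this.garbageKb = garbageKb;
        }
    }

    // Assumed cost model: evacuating 1 KB of live data takes 2 microseconds.
    static final long MICROS_PER_LIVE_KB = 2;

    // Picks the regions with the most garbage first, skipping any that would exceed the budget.
    static List<Integer> chooseRegions(List<Region> candidates, long budgetMicros) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageKb).reversed());

        List<Integer> chosen = new ArrayList<>();
        long spent = 0;
        for (Region r : sorted) {
            long cost = r.liveKb * MICROS_PER_LIVE_KB;
            if (spent + cost > budgetMicros) {
                continue; // this region doesn't fit into the remaining pause budget
            }
            spent += cost;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region(0, 100, 1900));
        regions.add(new Region(1, 1500, 500));
        regions.add(new Region(2, 400, 1600));
        regions.add(new Region(3, 1000, 1000));

        System.out.println("Regions chosen for a 1200 microsecond budget: "
                + chooseRegions(regions, 1200));
    }
}
```

Regions that are mostly garbage are cheap to evacuate and free a lot of memory, which is why they are attractive candidates in this model.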

The garbage collection process involving old regions is more complicated than the Young GC and consists of several phases, each with a specific role. In the list below, phases causing stop-the-world pauses are marked as STW. The rest run concurrently with the application.

  1. Initial mark (STW). This phase is performed at the same time as the Young GC. Its purpose is to find and mark Survivor regions containing references to the old generation.

  2. Root region scanning. During this phase, the GC scans Survivor regions to mark objects with references to the old generation.

  3. Concurrent marking. In this phase, the GC searches for live objects across the entire heap.

  4. Remark (STW). This phase completes the marking of live objects. G1 uses the SATB (snapshot-at-the-beginning) algorithm: objects that were live at the beginning of the Concurrent marking phase are treated as live, even if they became unreachable before the end of that phase. Such objects are called floating garbage, and they are reclaimed in a later cycle.

  5. Cleanup (partly STW). The GC performs certain operations concerning the cleanup process, such as accounting for live objects and free regions, and resetting empty regions to include them in the free region list.

  6. Copying (STW). In this phase, live objects are evacuated or copied into new unused regions. This phase can be performed with both young and old generations.

In practice, the garbage collection process isn't limited to these phases. It is much more complicated. These were only some crucial points for you to know to form a basic understanding of the topic.

[Figure: G1 garbage collection phases]

In the diagram above, you see an example of how a GC can combine STW and concurrent phases when performing a cleanup. Here, the purpose of the Compact phase is memory defragmentation which also requires the STW pause.

G1 parameters

Now that you've learned about the G1 structure, let's explore some important settings that you can use.

  • -XX:+UseG1GC. Enables the G1 collector. Since Java 9, G1 is the default collector, so you don't need to specify it explicitly, but if your application runs with another GC, you can switch to G1 with this argument.

  • -XX:G1HeapRegionSize=n. As you probably remember, a region can be from 1 to 32MB, and its size must be a power of 2. The JVM picks the size automatically, but you can also use this argument to set it according to your needs.

  • -XX:InitiatingHeapOccupancyPercent=n. Sets the occupancy percentage of the entire heap at which a concurrent GC cycle starts. By default, the initial value is 45.

  • -XX:NewRatio=n. Sets the ratio of the old generation size to the young generation size. By default, this value is 2.

  • -XX:SurvivorRatio=n. Sets the ratio of the Eden size to the Survivor size. By default, this value is 8.

  • -XX:MaxGCPauseMillis=n. Sets a target for the maximum stop-the-world pause time in milliseconds. G1 tries to meet this goal but doesn't guarantee it. By default, this value is 200.

  • -XX:GCPauseIntervalMillis=n. Sets the target interval between GC pauses in milliseconds. Its value must be bigger than the previous parameter value.

  • -XX:ParallelGCThreads=n. Sets the number of threads used during parallel (STW) garbage collection phases. The default depends on the platform where the JVM is running.

  • -XX:ConcGCThreads=n. Sets the number of threads used during concurrent garbage collection phases. The default depends on the platform where the JVM is running.

Remember that even though you can adjust different parameters to change the default configurations, it isn't recommended to change them without a detailed performance investigation.

To adjust these parameters, you can use them when running your application with the java command from the command line. Let's take a look at an example:

java -Xms1024m -Xmx2048m -XX:G1HeapRegionSize=2048K -XX:MaxGCPauseMillis=150 Main

Here, you see 4 configurations: the initial heap size, the maximum heap size, the size of each heap region, and the target STW pause duration.

You can also enable experimental features. To do so, put the following option before such parameters: -XX:+UnlockExperimentalVMOptions. Examples of such parameters are:

  • -XX:G1NewSizePercent=n. Sets the minimum size of the young generation as a percentage of the heap. By default, the value is 5.

  • -XX:G1MaxNewSizePercent=n. Sets the maximum size of the young generation as a percentage of the heap. By default, the value is 60.

Here is an example of their usage:
java -Xmx2048m -XX:G1HeapRegionSize=2048K -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=20 Main
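
To check which values a running JVM actually uses, you can read the flags from inside the application. The sketch below relies on com.sun.management.HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs, so treat it as an assumption about your runtime. The class name ShowG1Flags and the flagValue helper are made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowG1Flags {
    // Returns the current value of a VM flag, or a placeholder text if it can't be read.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "(not available in this JVM)";
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the JVM has no flag with this name
            return "(not available in this JVM)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "UseG1GC", "G1HeapRegionSize", "MaxGCPauseMillis",
            "InitiatingHeapOccupancyPercent", "ParallelGCThreads", "ConcGCThreads"
        };
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
        System.out.println("Max heap (bytes) = " + Runtime.getRuntime().maxMemory());
    }
}
```

Running this class with and without the arguments shown above is a quick way to see whether a setting was picked up or the default is still in effect.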

Exploring GC logs

In the previous section, you learned about some parameters you can use to configure your garbage collector. One more important parameter is missing from that section: -Xlog:gc*. It provides information about the operations the GC performs. If you add it to the command from the previous section, the JVM will print a log message each time it performs a garbage collection. First, let's write some simple code that allocates a lot of memory:

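public class Main {
    public static void main(String[] args) {
        // Allocate many short-lived arrays to keep the young generation busy
        for (int i = 1; i < 50_000_000; i++) {
            long[] arr = new long[500];

            for (int j = 0; j < arr.length; j++) {
                arr[j] = j;
            }

            arr = null; // drop the reference so the array becomes garbage
        }
    }
}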

Now, let's apply this command to our compiled code:
java -Xms1024m -Xmx2048m -XX:G1HeapRegionSize=2048K -XX:MaxGCPauseMillis=150 -Xlog:gc* Main

You can run the same command but write the output to a file by adding some parameters:
java -Xms1024m -Xmx2048m -XX:G1HeapRegionSize=2048K -XX:MaxGCPauseMillis=150 -Xlog:gc*:file=[path]:utctime,pid Main

Just replace [path] with the path of the file where you want to save the log.


The output starts with general information about the collector, followed by a block of lines for each GC cycle. The message below shows two such cycles. If the application performs more GC cycles, you'll see a similar block for each of them. When reading the message, pay attention to the third column from the left. It holds the log tags, which show what the specific step (line) is associated with.

/* General info. Prints once at the beginning of a log message */ 
[0.009s][info][gc] Using G1
[0.016s][info][gc,init] Version: 17.0.1+12-LTS-39 (release)
[0.017s][info][gc,init] CPUs: 4 total, 4 available
[0.017s][info][gc,init] Memory: 8084M
[0.018s][info][gc,init] Large Page Support: Disabled
[0.019s][info][gc,init] NUMA Support: Disabled
[0.019s][info][gc,init] Compressed Oops: Enabled (32-bit)
[0.020s][info][gc,init] Heap Region Size: 2M
[0.020s][info][gc,init] Heap Min Capacity: 1G
[0.020s][info][gc,init] Heap Initial Capacity: 1G
[0.020s][info][gc,init] Heap Max Capacity: 2G
[0.020s][info][gc,init] Pre-touch: Disabled
[0.021s][info][gc,init] Parallel Workers: 4
[0.021s][info][gc,init] Concurrent Workers: 1
[0.021s][info][gc,init] Concurrent Refinement Workers: 4
[0.021s][info][gc,init] Periodic GC: Disabled
[0.023s][info][gc,metaspace] CDS archive(s) mapped at: [0x0000000800000000-0x0000000800bc0000-0x0000000800bc0000), size 12320768, SharedBaseAddress: 0x0000000800000000, ArchiveRelocationMode: 0.
[0.023s][info][gc,metaspace] Compressed class space mapped at: 0x0000000800c00000-0x0000000840c00000, reserved size: 1073741824
[0.024s][info][gc,metaspace] Narrow klass base: 0x0000000800000000, Narrow klass shift: 0, Narrow klass range: 0x100000000
/* GC log */
[0.097s][info][gc,start    ] GC(0) Pause Young (Normal) (G1 Evacuation Pause)
[0.098s][info][gc,task     ] GC(0) Using 4 workers of 4 for evacuation
[0.101s][info][gc,phases   ] GC(0)   Pre Evacuate Collection Set: 0.1ms
[0.102s][info][gc,phases   ] GC(0)   Merge Heap Roots: 0.2ms
[0.102s][info][gc,phases   ] GC(0)   Evacuate Collection Set: 0.8ms
[0.103s][info][gc,phases   ] GC(0)   Post Evacuate Collection Set: 0.7ms
[0.104s][info][gc,phases   ] GC(0)   Other: 2.1ms
[0.104s][info][gc,heap     ] GC(0) Eden regions: 25->0(28)
[0.105s][info][gc,heap     ] GC(0) Survivor regions: 0->1(4)
[0.106s][info][gc,heap     ] GC(0) Old regions: 0->0
[0.106s][info][gc,heap     ] GC(0) Archive regions: 0->0
[0.106s][info][gc,heap     ] GC(0) Humongous regions: 0->0
[0.107s][info][gc,metaspace] GC(0) Metaspace: 134K(384K)->134K(384K) NonClass: 129K(256K)->129K(256K) Class: 5K(128K)->5K(128K)
[0.107s][info][gc          ] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 50M->0M(1024M) 9.761ms
[0.107s][info][gc,cpu      ] GC(0) User=0.00s Sys=0.00s Real=0.01s
[0.121s][info][gc,start    ] GC(1) Pause Young (Normal) (G1 Evacuation Pause)
[0.121s][info][gc,task     ] GC(1) Using 4 workers of 4 for evacuation
[0.123s][info][gc,phases   ] GC(1)   Pre Evacuate Collection Set: 0.1ms
[0.124s][info][gc,phases   ] GC(1)   Merge Heap Roots: 0.0ms
[0.124s][info][gc,phases   ] GC(1)   Evacuate Collection Set: 0.5ms
[0.125s][info][gc,phases   ] GC(1)   Post Evacuate Collection Set: 0.5ms
[0.125s][info][gc,phases   ] GC(1)   Other: 1.7ms
[0.126s][info][gc,heap     ] GC(1) Eden regions: 28->0(306)
[0.126s][info][gc,heap     ] GC(1) Survivor regions: 1->1(4)
[0.126s][info][gc,heap     ] GC(1) Old regions: 0->0
[0.127s][info][gc,heap     ] GC(1) Archive regions: 0->0
[0.127s][info][gc,heap     ] GC(1) Humongous regions: 0->0
[0.128s][info][gc,metaspace] GC(1) Metaspace: 134K(384K)->134K(384K) NonClass: 129K(256K)->129K(256K) Class: 5K(128K)->5K(128K)
[0.128s][info][gc          ] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 56M->0M(1024M) 7.711ms
[0.129s][info][gc,cpu      ] GC(1) User=0.00s Sys=0.00s Real=0.01s

In many resources, you'll find another option for printing GC logs: -XX:+PrintGCDetails. It has been deprecated since Java 9 in favor of unified logging, so this topic uses the -Xlog:gc* parameter instead.

Some parts of the message may not be clear to you if you aren't familiar with all the phases of the collector's work, but there is also information that you should be able to read. As you can see from the fourth column, this output contains a log of two GC cycles: GC(0) and GC(1), although there will be many more for this code. If you take a closer look at the log, you'll see that each GC cycle consists only of an evacuation pause in the young generation. Now, let's explicitly call the GC by invoking the System.gc() method.

Don't forget that you can't be sure System.gc() will start the garbage collection process. So, avoid using it in the final versions of your application that will be available to users.

public class Main {
    public static void main(String[] args) {
        for (int i = 1; i < 50_000_000; i++) {
            long[] arr = new long[500];

            for (int j = 0; j < arr.length; j++) {
                arr[j] = j;
            }

            arr = null;  // drop the reference so the array becomes garbage
            System.gc(); // request a collection on every iteration
        }
    }
}

This time, we'll get a completely different output. The message below shows details of the first GC cycle. From the first line, you can understand that the application is in a Full GC pause, unlike the previous output, which showed only young generation pauses.

[0.078s][info][gc,start    ] GC(0) Pause Full (System.gc())
[0.078s][info][gc,phases,start] GC(0) Phase 1: Mark live objects
[0.079s][info][gc,phases      ] GC(0) Phase 1: Mark live objects 1.027ms
[0.080s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction
[0.082s][info][gc,phases      ] GC(0) Phase 2: Prepare for compaction 1.807ms
[0.082s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers
[0.084s][info][gc,phases      ] GC(0) Phase 3: Adjust pointers 1.105ms
[0.084s][info][gc,phases,start] GC(0) Phase 4: Compact heap
[0.085s][info][gc,phases      ] GC(0) Phase 4: Compact heap 0.933ms
[0.091s][info][gc,heap        ] GC(0) Eden regions: 2->0(25)
[0.091s][info][gc,heap        ] GC(0) Survivor regions: 0->0(0)
[0.091s][info][gc,heap        ] GC(0) Old regions: 0->1
[0.092s][info][gc,heap        ] GC(0) Archive regions: 0->0
[0.092s][info][gc,heap        ] GC(0) Humongous regions: 0->0
[0.093s][info][gc,metaspace   ] GC(0) Metaspace: 133K(320K)->133K(320K) NonClass: 128K(192K)->128K(192K) Class: 5K(128K)->5K(128K)
[0.093s][info][gc             ] GC(0) Pause Full (System.gc()) 2M->0M(1024M) 15.413ms
[0.094s][info][gc,cpu         ] GC(0) User=0.00s Sys=0.00s Real=0.02s

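From the third column, you can see that the next 8 lines represent the four phases of the Full GC, each logged once when it starts and once when it ends, together with its duration: Mark live objects, Prepare for compaction, Adjust pointers, and Compact heap.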
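
If you need to process such logs automatically, the summary line of each cycle is a convenient starting point, because it contains the pause type, the heap occupancy before and after the collection, the heap capacity, and the pause duration. Below is a minimal parsing sketch. The class name and the regular expression are written for this example and assume that sizes are printed in megabytes, as in the samples above. Real logs can also use other units and contain other line formats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcSummaryParser {
    // Matches summary lines such as:
    //   GC(0) Pause Young (Normal) (G1 Evacuation Pause) 50M->0M(1024M) 9.761ms
    // Assumption: heap sizes are printed in megabytes (M), as in the sample logs.
    private static final Pattern SUMMARY = Pattern.compile(
            "GC\\((\\d+)\\) Pause (\\w+) .* (\\d+)M->(\\d+)M\\((\\d+)M\\) ([0-9.]+)ms");

    // Returns {cycle id, pause type, used before, used after, capacity, pause in ms},
    // or null if the line is not a summary line.
    static String[] parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6)
        };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[0.097s][info][gc,start    ] GC(0) Pause Young (Normal) (G1 Evacuation Pause)",
            "[0.107s][info][gc          ] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 50M->0M(1024M) 9.761ms",
            "[0.093s][info][gc             ] GC(0) Pause Full (System.gc()) 2M->0M(1024M) 15.413ms"
        };
        for (String line : lines) {
            String[] s = parse(line);
            if (s == null) {
                continue; // not a summary line, for example a "start" line
            }
            System.out.println("GC(" + s[0] + ") " + s[1] + ": " + s[2] + "M -> " + s[3]
                    + "M of " + s[4] + "M, pause " + s[5] + " ms");
        }
    }
}
```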

If you wish to explore command line parameters to configure the GC or investigate logs deeper, you can read the official documentation on this topic.
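
As a complement to log files, the standard java.lang.management API exposes basic per-collector counters at runtime. The sketch below prints them and requests a collection with System.gc(), which, as noted earlier, is only a request. The class name GcCounters and the totalCollections helper are made up for this example, and the collector names you see depend on the JVM and the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounters {
    // Sums the collection counts reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // only a request: the JVM may ignore it
        long after = totalCollections();

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms in total");
        }
        System.out.println("Collections before System.gc(): " + before + ", after: " + after);
    }
}
```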

Conclusion

In this topic, you explored the fundamentals of the G1 garbage collector. Now you know how it operates internally and how to adjust different command line parameters to apply settings according to your needs. You also explored samples of GC logs and learned how to get them. The topic can be quite a challenge, so don't stop here if you wish to master it. Keep reading and watching videos, and consider this topic the start of your journey with G1GC!
