Tips from the Team

7 Problems to Look out for When Analyzing Garbage Collection Logs

Modern garbage collectors use sophisticated algorithms to manage unused memory, so you don’t have to. But when things aren’t working right, you need a way to peek under the hood and troubleshoot issues. Analyzing garbage collection (GC) logs can provide key insights into the behavior and performance of programming languages using automatic memory management, such as Java. If you suspect you’ve got a memory problem with your Java app, the GC log is the first place to look.

Let me share the seven common app problems you can uncover by analyzing GC logs.

Starting with JDK 9 the default GC implementation is the Garbage-First Garbage Collector (G1GC). All of the examples in this article assume you’re using JDK 9 or newer with G1GC.

1. GC pauses cause apps to stall

When the garbage collector runs, it can introduce delays into your application. This is because of the way GC is implemented. G1GC will pause your app while it frees unused memory objects and compacts memory regions to reduce wasted space. These GC pauses can introduce visible delays while your app is running.

You can inspect the GC logs to see how long your app was delayed for by using the -Xlog:safepoint command-line option. Here is an example GC log entry illustrating an app stall:

[7.971s][info ][safepoint ] Total time for which application threads were stopped: 0.0111253 seconds, Stopping threads took: 0.0000070 seconds

This example shows the GC pause caused the app threads to stall for 11 milliseconds. If you notice your app is periodically stalling, the GC logs are a good place to look.

2. Memory leaks

Even though Java implementations include a garbage collector, which automatically handles memory management, it’s still possible to introduce memory leaks into your app. One way to do this is to use the static keyword for large objects, e.g.

static final ArrayList<Integer> alist = new ArrayList<Integer>(50000);

Adding objects to this list and forgetting to remove them later causes a memory leak which the garbage collector is unable to work around. Looking at the GC logs can help you detect when this is happening because the amount of memory the garbage collector couldn’t reclaim will continue to increase. The following GC log excerpt demonstrates how to identify if you have a memory leak:

[22.064s][info ][gc ] GC(112) Pause Young (Normal) (G1 Evacuation Pause) 2848M->537M(3946M) 52.395ms

The item marked in red shows the size of objects currently referenced on the heap after a collection has taken place. If you have a memory leak you’ll see this value increase over time.

3. The total heap memory size is wrong

The single most important factor affecting Java app performance is the total size of the heap. This is because the frequency of collections is inversely proportional to the size of the regions—bigger regions mean you can allocate more objects before exhausting memory, which means the garbage collector runs less frequently. At the other extreme, if the size of your heap is too small for your workload you’ll see OutOfMemoryError exceptions as you app tries and fails to allocate unused memory.

The GC log can help you figure out whether the heap is badly sized because it displays statistics at the end of each collection.

[12.504s][info ][gc,heap,exit ] Heap
[12.504s][info ][gc,heap,exit ] garbage-first heap total 262144K, used 171106K [0x00000000f0000000, 0x0000000100000000)
[12.504s][info ][gc,heap,exit ] region size 1024K, 19 young (19456K), 0 survivors (0K)
[12.504s][info ][gc,heap,exit ] Metaspace used 181K, capacity 4486K, committed 4864K, reserved 1056768K
[12.504s][info ][gc,heap,exit ] class space used 6K, capacity 386K, committed 512K, reserved 1048576K

Here we can see the total heap size is 256MB. Other regions are sized relative to heap, so if you’re seeing too many full collections it might be time to increase the size of the heap. The -Xms and -Xmx control the minimum and maximum size of the heap, respectively.

4. High allocation rates can cause performance issues

The GC logs provide a way to capture how frequently your app is allocating objects. While high allocation rates aren’t necessarily a problem, they can lead to performance issues. To see whether this is affecting your app you can compare the size of the young generation after a collection and before the next one.

For example, the following three GC log entries show the app is allocating objects at about 12.48GB/sec.

[31.087s][info ][gc ] GC(153) Pause Young (Normal) (G1 Evacuation Pause) 3105M->794M(3946M) 55.590ms
[31.268s][info ][gc ] GC(154) Pause Young (Normal) (G1 Evacuation Pause) 3108M->798M(3946M) 55.425ms
[31.453s][info ][gc ] GC(155) Pause Young (Normal) (G1 Evacuation Pause) 3113M->802M(3946M) 55.571ms

Between 31.087s and 31.268s 2314M was allocated by the app (3108M-794M) and between 31.268s and 31.453s another 2315M was allocated (3113M-798M). This works out about 2.3GB every 200ms or 12.48GB/sec. Depending on your app’s behavior, allocating objects at that rate could negatively affect its performance.

5. Frequent humongous allocations

Humongous allocations are allocations over 50% of the G1 region size (the region size is another statistic printed in the GC log at the end of each collection). If your app uses humongous allocations heavily it can lead to excessive memory fragmentation and OutOfMemoryError exceptions.

You can check to see whether humongous allocations are causing problems by looking in the GC log for “humongous allocation.” Here’s an example:

[4.765s][info ][gc ] GC(122) Pause Young (Concurrent Start) (G1 Humongous Allocation) 889M->5M(1985M) 3.339ms

If your app uses humongous allocations and you think G1GC is spending too much time managing them, you can increase the region size to avoid costly collections of humongous objects by using the -XX:G1HeapRegionSize option.

6. Young objects are promoted too quickly

In contrast to the previous problem where objects are too large to fight in a G1 region, if the young generation size is too small minor collections will occur less frequently and instead full collections (which take more time to execute) will occur more often.

[4.845s][info ][gc ] GC(437) Pause Full (G1 Evacuation Pause) 255M->98M(256M) 102.056ms

If you see a large number of Pause Full messages in your log you may need to use the -XX:NewRatio option to increase the size of the young generation and avoid the more costly full collections.

7. Too few GC threads

When G1GC needs to do a full GC it spawns severl workers. If you have spare CPU capacity to run more threads, you can significantly reduce GC pause times by letting G1GC use them. For example, the first GC log entry below shows the pause time when running with one thread (-XX:ParallelGCThreads=1) and the second entry shows the pause time with four threads (-XX:ParallelGCThreads=4).

[8.194s][info ][gc ] GC(638) Pause Full (G1 Evacuation Pause) 255M->128M(256M) 242.539ms
[8.509s][info ][gc ] GC(698) Pause Full (G1 Evacuation Pause) 253M->129M(256M) 111.873ms

The results show the GC pause time was cut in half by increasing the number of threads.

Conclusion

Garbage collection algorithms are sophisticated pieces of software designed to alleviate the need to manually manage your app’s memory allocations. The G1GC JVM garbage collector supports several configuration options. To get the best results, you need to tune it appropriately. By carefully analyzing your GC logs you can better understand the performance and behavior of your Java apps and troubleshoot common problems.

However, analyzing GC logs by hand can be extremely time consuming. Fortunately, SolarWinds® Papertrail can automatically parse GC logs and provides an intuitive search syntax for combing through all of your logs from a central interface. If you want to see it for yourself, sign up for a trial or request a demo.

Aggregate, organize, and manage your logs

Papertrail
  • Collect real-time log data from your applications, servers, cloud services, and more
  • Search log messages to analyze and troubleshoot incidents, identify trends, and set alerts
  • Create comprehensive per-user access control policies, automated backups, and archives of up to a year of historical data
Start free trial Fully Functional for 30 Days

Let's talk it over

Contact our team, anytime.
Toll Free: +1 (855) 679-0752
Phone: +1 (512) 498-6011
papertrailapp@solarwinds.com