Get the PID
If a heap dump is created unexpectedly or automatically, check that the JVM associated with the heap dump is still running properly. The name of the heap dump file contains the PID, and the PID can then be used to identify the JVM that produced the dump. Heap dump files are named in this format: java_pid<PID>.hprof (for example, java_pid12345.hprof).
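Assuming the dump file follows the java_pid<PID>.hprof naming seen later in this article, the PID can be pulled out of the file name with shell parameter expansion. This is a minimal sketch; pid_from_hprof is a hypothetical helper name, not part of any JVM tooling.

```shell
# Print the PID embedded in a heap dump file name such as java_pid12345.hprof.
# pid_from_hprof is a hypothetical helper, assumed here for illustration.
pid_from_hprof() {
    local name
    name=$(basename "$1")            # drop the directory part
    name=${name#java_pid}            # strip the "java_pid" prefix
    printf '%s\n' "${name%.hprof}"   # strip the ".hprof" suffix
}

# Usage: pid_from_hprof /path/to/java_pid12345.hprof
```

The function only manipulates the string, so it works even if the dump file has been moved or the process has already exited.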
Is the PID in use
Let's say the PID is 12345. The ps command can be used to determine whether the PID is still associated with the JVM. In this example, the only output is the grep command itself, which means that the PID is no longer in use: the JVM that was associated with the PID is no longer running, or was automatically restarted after the heap dump. If the output also contains a line for a java process with this PID, the JVM is still running. However, the JVM is probably in a bad way. For example, the JVM may be out of memory.
~]# ps -ef | grep 12345
root 12345 1 0 21:54 pts/0 00:00:00 grep 12345
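Because grep always matches its own command line, the ps output above needs to be read carefully. As an alternative sketch, `kill -0` sends no signal and only reports whether the process exists and can be signalled. The helper name pid_in_use is hypothetical, and the check assumes you are root (as in the prompts here) or own the process; otherwise a running process may be reported as not in use.

```shell
# Return success if the given PID exists and can be signalled.
# pid_in_use is a hypothetical helper; kill -0 performs the permission and
# existence checks without actually sending a signal.
pid_in_use() {
    kill -0 "$1" 2>/dev/null
}

# Usage:
#   if pid_in_use 12345; then
#       echo "PID 12345 is still in use"
#   else
#       echo "PID 12345 is not in use"
#   fi
```

Running `ps -p 12345` is another option, since it lists only the process with that PID and never the search command itself.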
Kill the PID if in use
When the JVM is in a bad way, first kill the PID, and then start the JVM.
~]# kill -9 12345
Determine the JVM associated with the PID
You can search the logs for the PID to determine which JVM was associated with it. This command usually produces quite a bit of output, because it searches every file at and below the specified directory for the string (12345 in this example). This may help you find the JVM that had the PID associated with the heap dump. What you are looking for is the line "Dumping heap to /path/to/java_pidxxxxx.hprof" in the JVM's catalina.out log.
~]# grep -R -i -e '12345' /path/to/logs/directory
Dumping heap to /path/to/java_pid12345.hprof
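To cut down the output, the search can be narrowed to the heap dump file name and limited to printing the names of the matching files. The sketch below assumes each JVM writes to its own log directory, so the path of the matching file identifies the JVM; find_dump_logs is a hypothetical helper name.

```shell
# List the log files that mention the heap dump for a given PID.
# find_dump_logs is a hypothetical helper. Arguments: PID, log directory.
find_dump_logs() {
    # -R recurse into subdirectories, -l print only the names of matching
    # files, -F treat the pattern as a fixed string rather than a regex
    grep -R -l -F -e "java_pid$1.hprof" "$2"
}

# Usage: find_dump_logs 12345 /path/to/logs/directory
```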
Ensure the JVM is running
Once you know the JVM that had the PID with the heap dump, determine if the JVM was restarted. You can check the catalina.log file for the event "Starting service Catalina" to determine when the JVM was last restarted.
~]# cat catalina.log
Mar 16, 2018 1:25:56 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
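On a busy server catalina.log can be long, so it may be quicker to print only the most recent start event. This is a sketch that assumes the timestamp is logged on the line directly before the "INFO: Starting service Catalina" message; last_catalina_start is a hypothetical helper name.

```shell
# Print the most recent "Starting service Catalina" event with its timestamp.
# last_catalina_start is a hypothetical helper. Argument: path to catalina.log.
last_catalina_start() {
    # -B1 also prints the line before each match (the timestamp line);
    # tail keeps only the last timestamp/message pair.
    grep -B1 -F -e 'Starting service Catalina' "$1" | tail -n 2
}

# Usage: last_catalina_start catalina.log
```

If the timestamp printed is later than the modification time of the .hprof file, the JVM was restarted after the heap dump was written.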
Check for out of memory
If an OutOfMemoryError event appears in the catalina.out log before the heap dump event, the JVM dumped its heap because it ran out of memory. The message after the colon names the memory area or limit that was exhausted.
java.lang.OutOfMemoryError: PermGen space
java.lang.OutOfMemoryError: Metaspace
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded
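To see whether an OutOfMemoryError was logged before the heap dump, both events can be listed with their line numbers. This is a minimal sketch; oom_timeline is a hypothetical helper name and the argument is the path to the catalina.out file.

```shell
# Print OutOfMemoryError and heap dump events with their line numbers,
# in the order they appear in the log.
# oom_timeline is a hypothetical helper. Argument: path to catalina.out.
oom_timeline() {
    # -n prefixes each matching line with its line number
    grep -n -e 'java\.lang\.OutOfMemoryError' -e 'Dumping heap to' "$1"
}

# Usage: oom_timeline /path/to/catalina.out
```

An OutOfMemoryError line with a lower line number than the "Dumping heap to" line means the error was logged first.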
PermGen / Metaspace
Java heap space
- If you have similar application servers in different environments (development, production), check to see if the heap size is the same across environments
- Analyze the heap dump using IBM's Heap Analyzer or Eclipse Memory Analyzer (MAT)
- Determine what caused the heap dump using Introscope
- Check for a memory leak in Introscope
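For the first check in the list above, one way to compare heap sizes is to read the heap options from the command line of a running JVM in each environment. The sketch below assumes the heap is sized with the standard -Xms and -Xmx options; heap_flags is a hypothetical helper name.

```shell
# Read a Java command line on standard input and print its -Xms/-Xmx options,
# one per line. heap_flags is a hypothetical helper.
heap_flags() {
    # put each argument on its own line, then keep the heap size options
    tr ' ' '\n' | grep -E -e '^-Xm[sx]'
}

# Usage on a running JVM (12345 is the example PID from this article):
#   ps -p 12345 -o args= | heap_flags
```

If nothing is printed, the JVM was started without explicit -Xms or -Xmx options and is using its default heap sizing.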
GC overhead limit exceeded
- Analyze the garbage collection log (gc.log) using GCEasy
- Check for garbage collection problems in Introscope
Check for SEVERE events in catalina.log
When a WAR is redeployed to a Tomcat JVM, some threads associated with the prior instance of the application may remain active and keep objects in the heap. Check catalina.log for SEVERE events such as the following.
SEVERE: The web application [/yourapp] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@544d8937]) and a value of type [org.apache.cxf.BusFactory.BusHolder] (value [org.apache.cxf.BusFactory$BusHolder@2e6ba51e]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
When the new WAR is deployed and becomes active, it creates new threads and places its own objects in the heap. For example, the old and new WAR may each hold an object in the heap for a SQL connection, when only a single object is needed. Both the old and new WAR are then using the heap, which enlarges the heap footprint and can lead to an out of memory situation. Because the leftover threads still reference the old WAR's objects, the garbage collector cannot remove them from the heap. A quick fix is to restart the JVM, which removes the threads associated with the prior instance of the application.
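The SEVERE events described above can be pulled out of catalina.log with grep. This is a minimal sketch that matches the wording of the example message in this section; leak_warnings is a hypothetical helper name, and other Tomcat versions may word the message differently.

```shell
# Print the SEVERE events that report leftover ThreadLocal values after a
# web application was stopped.
# leak_warnings is a hypothetical helper. Argument: path to catalina.log.
leak_warnings() {
    grep -F -e 'SEVERE: The web application' "$1" |
        grep -F -e 'failed to remove it when the web application was stopped'
}

# Usage: leak_warnings /path/to/catalina.log
```

Each line printed names the web application in square brackets, which shows which deployment left objects behind.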