What is a process?
A process contains numerous threads, so before we get into threads, it's good to understand what a process is. A process is a program running on the operating system, and each process has a unique process identification number (PID). On Linux, the ps command can be used to view the processes that are running. Here is an example of the first 5 processes as identified by the ps command.
~]# ps -ef
UID   PID  PPID  C  STIME  TTY  TIME      CMD
root    1     0  0  Mar19  ?    00:00:00  systemd
root    2     0  0  Mar19  ?    00:00:00  kthreadd
root    3     0  0  Mar19  ?    00:00:00  ksoftirqd
root    4     0  0  Mar19  ?    00:00:00  kworker
root    5     0  0  Mar19  ?    00:00:00  migration
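The same information can also be retrieved from inside a Java program. Here is a minimal sketch that assumes Java 9 or later for the `ProcessHandle` API. It prints the PID of the JVM it runs in, followed by the lowest few PIDs it can see, which roughly matches the first rows of `ps -ef`. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListProcesses {
    // Returns up to `limit` PIDs visible to this JVM, lowest first,
    // roughly mirroring the first rows of `ps -ef`.
    public static List<Long> firstPids(int limit) {
        return ProcessHandle.allProcesses()
                .map(ProcessHandle::pid)
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // PID of the JVM running this program.
        System.out.println("This JVM's PID: " + ProcessHandle.current().pid());
        for (long pid : firstPids(5)) {
            // The command may be unavailable (empty Optional) if the OS hides it.
            String cmd = ProcessHandle.of(pid)
                    .flatMap(h -> h.info().command())
                    .orElse("?");
            System.out.println(pid + " " + cmd);
        }
    }
}
```

The set of visible processes, and whether their commands can be read, depends on the operating system and on the user running the JVM.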
What process contains the hung thread?
Since a WebSphere application server runs as a Java virtual machine (JVM), the JVM's Java process is the process that contains the hung thread. You can get the PID of the JVM using the ps command. In this example, the PID is 12345, so process 12345 contains the hung thread.
~]# ps -ef | grep <jvm name>
root  12345  0  0  Mar19  ?  00:00:00  myJVM
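Filtering processes by name works the same way in Java. This sketch assumes Java 9 or later, and it assumes the server name (the example `myJVM` from the ps output) appears somewhere on the JVM's command line. The helper `pidsMatching` is hypothetical. Unlike `ps -ef | grep`, it never matches its own process.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FindJvmPid {
    // Returns PIDs of processes whose full command line contains `name`,
    // similar to `ps -ef | grep <name>`, excluding this JVM itself.
    public static List<Long> pidsMatching(String name) {
        long self = ProcessHandle.current().pid();
        return ProcessHandle.allProcesses()
                .filter(h -> h.pid() != self)
                .filter(h -> h.info().commandLine()
                        .map(cmd -> cmd.contains(name))
                        .orElse(false))   // command line may be hidden by the OS
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "myJVM" is the example name from the ps output; pass your own as an argument.
        String name = args.length > 0 ? args[0] : "myJVM";
        System.out.println(name + " -> " + pidsMatching(name));
    }
}
```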
What is a thread?
A process typically contains many threads, and each thread carries out a piece of work within the process. For example, when an application in the application server makes a database query, that query runs on a thread.
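To see that one process holds many threads, the following sketch asks the JVM for its live thread count through the standard `java.lang.management` API. It then starts one extra thread that stands in for a single database query. The sketch assumes Java 9 or later. The exact counts vary from JVM to JVM, because every JVM runs its own internal threads.

```java
import java.lang.management.ManagementFactory;

public class ThreadsInProcess {
    // Number of live threads (daemon and non-daemon) in this JVM process.
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        long pid = ProcessHandle.current().pid();
        System.out.println("Process " + pid + " has " + liveThreads() + " live threads");

        // One extra thread standing in for a single database query.
        Thread query = new Thread(() -> {
            try {
                Thread.sleep(200);           // pretend the query takes 200 ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "query-1");
        query.start();
        System.out.println("While query-1 runs: " + liveThreads() + " live threads");

        query.join();                        // wait for the query to finish
        System.out.println("query-1 alive after join: " + query.isAlive());
    }
}
```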
Hung Thread Scenario
Let's say an application in the application server has a search engine, and a user searches for "hello world". Once the user clicks submit, a thread (in WebSphere, typically a web container thread such as "WebContainer : 1") executes a query against the SQL database. When the query completes, the thread is done with the request and is released. In this example, the thread would probably be active for no more than a few seconds. This is the ideal situation. On the other hand, let's say the database has an issue (it is down or flooded with requests). The thread could then remain active for a long period of time while it waits for a response, producing a hung thread situation.
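This scenario can be simulated without a real database. In the sketch below, a `CountDownLatch` acts as a hypothetical stand-in for the database: each "query" thread blocks until its latch is released. The healthy database answers immediately. The database that is down never answers, so its thread stays active, which is how a hung thread looks from the outside.

```java
import java.util.concurrent.CountDownLatch;

public class HungQueryDemo {
    // Hypothetical stand-in for a database call: the "query" returns only
    // when the latch is released, i.e. when the database answers.
    static Thread startQuery(String name, CountDownLatch dbAnswered) {
        Thread t = new Thread(() -> {
            try {
                dbAnswered.await();              // blocks until the "database" responds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);                       // a stuck thread must not keep the JVM alive
        t.start();
        return t;
    }

    // Returns {healthyQueryStillAlive, stuckQueryStillAlive} after waiting up to waitMillis.
    public static boolean[] simulate(long waitMillis) {
        CountDownLatch healthyDb = new CountDownLatch(1);
        CountDownLatch downDb = new CountDownLatch(1);   // never released: database is down

        Thread ok = startQuery("query-ok", healthyDb);
        Thread stuck = startQuery("query-stuck", downDb);

        healthyDb.countDown();                   // the healthy database answers at once
        try {
            ok.join(waitMillis);
            stuck.join(waitMillis);              // times out: nobody ever answers
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { ok.isAlive(), stuck.isAlive() };
    }

    public static void main(String[] args) {
        boolean[] alive = simulate(1000);
        System.out.println("query-ok still active:    " + alive[0]);
        System.out.println("query-stuck still active: " + alive[1]);
    }
}
```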
An application server can be configured to write hung thread events to the SystemOut.log.
- In the left panel of the WebSphere admin console, select Servers > Server Types > WebSphere application servers.
- Select an application server.
- Expand Administration and select Custom properties.
- Select New, and create the two properties in the table below. These two properties control how often the server checks for hung threads and when a thread is reported as hung in the JVM's SystemOut.log.
- Select Save.
- Restart the application server for this change to take effect.
| Name | Value | Description |
|---|---|---|
| com.ibm.websphere.threadmonitor.interval | integer | How often, in seconds, the server checks for hung threads, such as 30 (seconds). |
| com.ibm.websphere.threadmonitor.threshold | integer | Number of seconds a thread must be active before it is considered hung, such as 60 (1 minute). |
With a threshold of 600 seconds, for example, a thread is considered hung once it has been active for 10 minutes. A thread may become hung when an application issues a SQL request and the database does not respond in a timely manner. Hung threads are problematic because they tie up system resources (the thread itself, memory, and possibly CPU or a database connection). Because a thread is usually hung unintentionally, those resources are being used unnecessarily.
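A small monitor can illustrate the idea behind the interval and threshold properties. This is a simplified sketch and not WebSphere's implementation. The class `HungThreadMonitor` and its methods are hypothetical, requests are tracked by thread name, and a `ScheduledExecutorService` runs the check once per interval. Any request that has been active for at least the threshold is reported once. A reported request that later finishes is flagged as completed, which mirrors the WSVR0605W/WSVR0606W pair of events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HungThreadMonitor {
    private final long thresholdMillis;
    // Start time of each in-flight request, keyed by the thread handling it.
    private final Map<String, Long> activeSince = new ConcurrentHashMap<>();
    // Threads already reported, so each hung thread is reported only once.
    private final Set<String> reported = ConcurrentHashMap.newKeySet();

    public HungThreadMonitor(long thresholdSeconds) {
        this.thresholdMillis = TimeUnit.SECONDS.toMillis(thresholdSeconds);
    }

    public void requestStarted(String thread, long nowMillis) {
        activeSince.put(thread, nowMillis);
    }

    // Returns true if the finished thread had previously been reported as hung.
    public boolean requestFinished(String thread) {
        activeSince.remove(thread);
        return reported.remove(thread);
    }

    // One pass of the check: returns threads newly found active for >= threshold.
    public List<String> check(long nowMillis) {
        List<String> newlyHung = new ArrayList<>();
        for (Map.Entry<String, Long> e : activeSince.entrySet()) {
            long activeMillis = nowMillis - e.getValue();
            if (activeMillis >= thresholdMillis && reported.add(e.getKey())) {
                newlyHung.add(e.getKey());
            }
        }
        return newlyHung;
    }

    public static void main(String[] args) throws InterruptedException {
        // Tiny values so the demo finishes quickly: check every 1 s, hung after 2 s.
        long intervalSeconds = 1, thresholdSeconds = 2;
        HungThreadMonitor monitor = new HungThreadMonitor(thresholdSeconds);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            for (String t : monitor.check(System.currentTimeMillis())) {
                System.out.println("Thread \"" + t + "\" may be hung");
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);

        monitor.requestStarted("WebContainer : 1", System.currentTimeMillis());
        Thread.sleep(3500);                      // the "request" takes 3.5 s
        if (monitor.requestFinished("WebContainer : 1")) {
            System.out.println("Thread \"WebContainer : 1\" was reported hung but has completed");
        }
        scheduler.shutdown();
    }
}
```

In this sketch the check runs only once per interval, so a request is reported roughly between threshold and threshold + interval seconds after it starts. With the example values of 30 and 60, that is between 60 and 90 seconds.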
If you want the JVM to create a javacore dump when a hung thread is detected, add the following custom property. The javacore dump can be analyzed using IBM's Thread and Monitor Dump Analyzer tool, which is part of IBM Support Assistant.
| Name | Value | Description |
|---|---|---|
| com.ibm.websphere.threadmonitor.dump.java | 1 | Create a javacore dump when a hung thread is detected. |
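When this property is set, the IBM JVM itself writes the javacore. On any JVM, you can capture a comparable per-thread view (name, state, and stack) with the standard `java.lang.management` API. The sketch below is only a portable approximation of a single thread's entry in a dump. It is not a javacore, and the class `StackSnapshot` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackSnapshot {
    // Returns a short text report of one thread's state and full stack,
    // similar in spirit to one thread entry in a dump (this is NOT a javacore).
    public static String describe(Thread t) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(t.getId(), Integer.MAX_VALUE);
        if (info == null) {
            return "Thread \"" + t.getName() + "\" is not alive";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("Thread \"").append(info.getThreadName()).append("\" state=")
          .append(info.getThreadState()).append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // A thread that waits forever, standing in for a hung request.
        Thread stuck = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "WebContainer : 1");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(200);                       // give the thread time to reach wait()
        System.out.println(describe(stuck));
    }
}
```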
Replicate hung thread
If you want to create a servlet in Eclipse that will produce a hung thread, refer to this article.
In the SystemOut.log, a hung thread is identified by event code WSVR0605W, and when the thread is no longer hung, event WSVR0606W is written to the log. The WSVR0605W event identifies the thread, how long it has been active, and the total number of threads in the server that may be hung. In this example, there is 1 hung thread, and it has been active for 655526 milliseconds (about 10.9 minutes).
WSVR0605W: Thread "WebContainer : 1" (000000a1) has been active for 655526 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
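When you review a large SystemOut.log, you can extract the WSVR0605W fields programmatically. This sketch assumes the message text matches the example line above. The regular expression and the `Wsvr0605wParser` class are illustrative and are not an IBM-provided parser. The sketch also converts the active time to minutes: 655526 ms / 60000 ≈ 10.9 minutes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Wsvr0605wParser {
    // Captures thread name, hex thread ID, active time (ms), and total hung count.
    private static final Pattern HUNG = Pattern.compile(
            "WSVR0605W: Thread \"(.+?)\" \\(([0-9a-fA-F]+)\\) has been active for (\\d+) milliseconds"
            + ".*?There is/are (\\d+) thread\\(s\\) in total");

    public static final class HungEvent {
        public final String thread;
        public final String threadId;
        public final long activeMillis;
        public final int totalHung;

        HungEvent(String thread, String threadId, long activeMillis, int totalHung) {
            this.thread = thread;
            this.threadId = threadId;
            this.activeMillis = activeMillis;
            this.totalHung = totalHung;
        }

        public double activeMinutes() {
            return activeMillis / 60_000.0;
        }
    }

    // Returns null when the line is not a WSVR0605W event in the expected format.
    public static HungEvent parse(String line) {
        Matcher m = HUNG.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new HungEvent(m.group(1), m.group(2),
                Long.parseLong(m.group(3)), Integer.parseInt(m.group(4)));
    }

    public static void main(String[] args) {
        String line = "WSVR0605W: Thread \"WebContainer : 1\" (000000a1) has been active for 655526"
                + " milliseconds and may be hung. There is/are 1 thread(s) in total in the server"
                + " that may be hung.";
        HungEvent e = parse(line);
        System.out.printf("%s active for %d ms (%.1f minutes); %d thread(s) may be hung%n",
                e.thread, e.activeMillis, e.activeMinutes(), e.totalHung);
    }
}
```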
The SystemOut.log may contain events that correlate to the hung thread. In this example, you would look for events from roughly 11 minutes (655526 milliseconds) before the hung thread event, which is when the request started. The possible causes of hung threads are far too numerous to list, so each hung thread needs to be investigated individually. However, some common causes are slow or long-running database requests, some sort of system scan (such as anti-virus), or a long-running batch job. Not all hung threads are necessarily bad. For example, an anti-virus scan may be an acceptable reason for a long-running thread.
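The WSVR0605W event reports how long the thread has been active. Subtracting that duration from the event's timestamp therefore gives the approximate time the request started, and that is the point to search around. The sketch below uses `java.time`. The event timestamp (12:10:55.526 on 19 March 2024) is hypothetical, and the one-minute search margin is an arbitrary choice.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class HangStartTime {
    // The request began roughly `activeMillis` before the WSVR0605W event was logged.
    public static LocalDateTime approxStart(LocalDateTime eventTime, long activeMillis) {
        return eventTime.minus(Duration.ofMillis(activeMillis));
    }

    // A window of +/- `slack` around that start time in which to search other logs.
    public static LocalDateTime[] searchWindow(LocalDateTime eventTime, long activeMillis, Duration slack) {
        LocalDateTime start = approxStart(eventTime, activeMillis);
        return new LocalDateTime[] { start.minus(slack), start.plus(slack) };
    }

    public static void main(String[] args) {
        // Hypothetical timestamp for the WSVR0605W event; use the real one from SystemOut.log.
        LocalDateTime event = LocalDateTime.of(2024, 3, 19, 12, 10, 55, 526_000_000);
        LocalDateTime[] w = searchWindow(event, 655_526, Duration.ofMinutes(1));
        System.out.println("Request started around " + approxStart(event, 655_526)); // prints "Request started around 2024-03-19T12:00"
        System.out.println("Search logs between " + w[0] + " and " + w[1]);
    }
}
```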
Once the threads are no longer hung, the SystemOut.log will list the following event.
WSVR0606W: Thread "WebContainer : 1" (000000a1) was previously reported to be hung but has completed.
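To find which threads are still hung at the end of a log, you can pair the WSVR0605W and WSVR0606W events by thread name. This sketch assumes the message format shown above. It also assumes that a thread name identifies one thread over the period examined. Pool threads are reused, so a name that is reported hung again later should be treated as a new event. The `StillHung` class is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StillHung {
    // Event code and thread name from WSVR0605W (hung) or WSVR0606W (completed) lines.
    private static final Pattern EVENT = Pattern.compile("(WSVR060[56]W): Thread \"(.+?)\"");

    // Walks the lines in log order and returns threads that were reported hung
    // and have not yet been reported as completed.
    public static Set<String> stillHung(List<String> lines) {
        Set<String> hung = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = EVENT.matcher(line);
            if (!m.find()) {
                continue;                       // not a hung-thread event
            }
            if (m.group(1).equals("WSVR0605W")) {
                hung.add(m.group(2));
            } else {
                hung.remove(m.group(2));
            }
        }
        return hung;
    }

    public static void main(String[] args) {
        // Hypothetical log excerpt; "WebContainer : 2" never completes.
        List<String> log = List.of(
                "WSVR0605W: Thread \"WebContainer : 1\" (000000a1) has been active for 655526 milliseconds and may be hung.",
                "WSVR0605W: Thread \"WebContainer : 2\" (000000a2) has been active for 612000 milliseconds and may be hung.",
                "WSVR0606W: Thread \"WebContainer : 1\" (000000a1) was previously reported to be hung but has completed.");
        System.out.println("Still hung: " + stillHung(log)); // prints "Still hung: [WebContainer : 2]"
    }
}
```

To run this against a real log, pass in the lines returned by `java.nio.file.Files.readAllLines(Path.of("SystemOut.log"))` or a similar call. Note that `readAllLines` decodes the file as UTF-8 by default.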
If the application server is fronted by a web server, check the web server access log around the time the request that caused the first hung thread was made. Suppose the access log has an event that identifies the requested resource (such as /example/search) and the user who initiated the request. If that user is in your organization, consult with the user and the application developers to determine what the user was doing that caused the application to produce a hung thread.
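Searching the access log for that time period can be sketched as follows. The sketch assumes the web server writes the Apache-style Common Log Format: host, ident, user, [time], "request", status, bytes. If your log format differs, you will need to adapt the regular expression. The sample log lines, user names, and time window are hypothetical, and the `AccessLogWindow` class is illustrative.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogWindow {
    // Assumes Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) \\S+ (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) \\S+");
    private static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Returns "user path" for each request logged between from and to (inclusive).
    public static List<String> requestsBetween(List<String> lines, OffsetDateTime from, OffsetDateTime to) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CLF.matcher(line);
            if (!m.find()) {
                continue;                       // skip lines in other formats
            }
            OffsetDateTime t = OffsetDateTime.parse(m.group(3), CLF_TIME);
            if (!t.isBefore(from) && !t.isAfter(to)) {
                hits.add(m.group(2) + " " + m.group(5));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical access log lines.
        List<String> log = List.of(
                "10.0.0.5 - jsmith [19/Mar/2024:12:00:00 -0400] \"GET /example/search?q=hello+world HTTP/1.1\" 200 512",
                "10.0.0.6 - adoe [19/Mar/2024:12:30:00 -0400] \"GET /example/home HTTP/1.1\" 200 128");
        OffsetDateTime from = OffsetDateTime.parse("2024-03-19T11:59:00-04:00");
        OffsetDateTime to = OffsetDateTime.parse("2024-03-19T12:01:00-04:00");
        requestsBetween(log, from, to).forEach(System.out::println);
    }
}
```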
IBM's Thread and Monitor Dump Analyzer tool, part of IBM Support Assistant, can also be used to spot hung threads in javacore dumps.