You may want to first read up on what a process is and what a thread is.
Hung Thread Scenario
Let's say an application on the application server has a search engine, and a user searches for "hello world". Once the user clicks submit, a thread is created that executes a query against the SQL database. After the query has completed, the thread gets destroyed. In this example, the thread would probably have been active for no more than a few seconds. This is the ideal situation. On the other hand, let's say the database has some issue (it's down or flooded with requests). This could cause the thread to remain active for a long period of time, thus producing a hung thread situation.
An application server can be configured to write hung thread events to the SystemOut.log.
- In the left panel of the WebSphere admin console, select Servers > Server Types > WebSphere application servers.
- Select an application server.
- Expand Administration and select Custom properties.
- Select New, and create the two properties in the table below. These two properties are required to write hung thread events to the JVM's SystemOut.log.
- Select Save.
- Restart the application server for this change to take effect.
|com.ibm.websphere.threadmonitor.interval||integer||Number of seconds between checks for hung threads, such as 30 (seconds).|
|com.ibm.websphere.threadmonitor.threshold||integer||Number of seconds that must elapse before a thread is considered hung, such as 60 (1 minute).|
Using the example values in the table above, a thread will be considered hung once it has been active for 60 seconds (1 minute). With a threshold of 600, it would be 10 minutes. As an example, a thread may be hung when an application issues a SQL request, and the database does not issue a response in a timely manner. Hung threads are problematic because they use system resources (CPU, memory), and usually the thread is unintentionally hung, which means system resources are being used unnecessarily.
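To see how the interval and threshold interact, here is a minimal sketch of the timing. It is not WebSphere's actual implementation. It assumes the monitor wakes up every `interval` seconds and flags any thread whose age has reached `threshold`. The function name `first_report_age` and the `start_offset` parameter are made up for illustration.

```python
import math


def first_report_age(start_offset, interval, threshold):
    """Return the age in seconds of a thread at the first check that flags it.

    Assumed model (not taken from WebSphere itself): a check happens at
    t = 0, interval, 2 * interval, ... and a thread is flagged once its age
    is at least `threshold`. The thread starts `start_offset` seconds after
    the check at t = 0, with 0 <= start_offset < interval.
    """
    if interval <= 0 or threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    if not 0 <= start_offset < interval:
        raise ValueError("start_offset must be in [0, interval)")
    # Index of the first check at which the thread's age reaches the threshold.
    checks_needed = math.ceil((threshold + start_offset) / interval)
    return checks_needed * interval - start_offset


if __name__ == "__main__":
    # Example values from the table: interval 30, threshold 60.
    print(first_report_age(0, 30, 60))
    print(first_report_age(10, 30, 60))
```

Under this model, a hung thread is reported somewhere between `threshold` and `threshold + interval` seconds after it started, so a smaller interval gives a more timely report at the cost of more frequent checks.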
Thread inactivity timeout
In the left panel of the WebSphere admin console, at Servers > Server types > your application server > Thread pools > Web container, by default, thread inactivity timeout will be set to 5000 milliseconds (that's 5 seconds). This is where things can get a bit tricky. The timeout measures inactivity, not activity: a Web container thread that has been idle for 5000 milliseconds is ended and removed from the pool, provided the pool holds more threads than its minimum size. It's important to recognize that threads will not be destroyed if the number of threads is at or below the minimum. A thread that is busy with a request, including one that is hung, is not ended by this setting.
If you want the JVM to create a javacore dump when a hung thread is detected, add the following custom property. The javacore dump can be analyzed using IBM's Thread and Monitor Dump Analyzer tool, which is part of IBM's Support Assistant.
|com.ibm.websphere.threadmonitor.dump.java||1||Create a javacore dump when a hung thread is detected.|
Replicate hung thread
If you want to create a servlet in Eclipse that will produce a hung thread, refer to this article.
In the SystemOut.log, a hung thread will be identified by event code WSVR0605W, and when the thread is no longer hung, event WSVR0606W will be found in the log. When hung threads are detected, the number of hung threads will be identified, along with how long each thread has been active. In this example, there is 1 hung thread that has been active for 655526 milliseconds, which is about 11 minutes.
WSVR0605W: Thread "WebContainer : 1" (000000a1) has been active for 655526 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
The SystemOut.log may list events that correlate to the hung thread. In this example, you would look for events logged around the time the thread started its work, roughly 655526 milliseconds before the hung thread event. The list of possible causes of hung threads is far too vast to enumerate, so each hung thread will need to be looked at individually. However, some common causes of hung threads are long-running database connections, some sort of system scan (such as anti-virus), or some sort of long-running batch job. Not all hung threads are necessarily bad - for example, an anti-virus scan may be an OK reason for a long-running thread.
Once the threads are no longer hung, the SystemOut.log will list the following event.
WSVR0606W: Thread "WebContainer : 1" (000000a1) was previously reported to be hung but has completed.
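If the log is large, it can help to pull these two events out with a script. The following is a minimal sketch that matches only the message text shown in the two examples above. The function name `parse_thread_event` and the shape of the returned dictionary are assumptions made for illustration, and any timestamp or other prefix on the log line is simply ignored.

```python
import re

# Patterns built from the WSVR0605W and WSVR0606W messages shown above.
HUNG = re.compile(
    r'WSVR0605W: Thread "(?P<name>[^"]+)" \((?P<id>[0-9a-fA-F]+)\) '
    r"has been active for (?P<ms>\d+) milliseconds and may be hung\."
)
COMPLETED = re.compile(
    r'WSVR0606W: Thread "(?P<name>[^"]+)" \((?P<id>[0-9a-fA-F]+)\) '
    r"was previously reported to be hung but has completed\."
)


def parse_thread_event(line):
    """Return a dict describing a WSVR0605W or WSVR0606W line, or None."""
    match = HUNG.search(line)
    if match:
        active_ms = int(match.group("ms"))
        return {
            "event": "hung",
            "thread": match.group("name"),
            "id": match.group("id"),
            "active_ms": active_ms,
            # Convert milliseconds to minutes for easier reading.
            "active_minutes": round(active_ms / 60000, 1),
        }
    match = COMPLETED.search(line)
    if match:
        return {
            "event": "completed",
            "thread": match.group("name"),
            "id": match.group("id"),
        }
    return None


if __name__ == "__main__":
    sample = (
        'WSVR0605W: Thread "WebContainer : 1" (000000a1) has been active for '
        "655526 milliseconds and may be hung. There is/are 1 thread(s) in "
        "total in the server that may be hung."
    )
    print(parse_thread_event(sample))
```

Matching on the thread name and ID makes it possible to pair each WSVR0605W event with the WSVR0606W event that later clears it.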
If the application server is fronted by a web server, check the web server access log near the time of the request that caused the first thread to hang. If there is an event in the access log that identifies the resource that was requested (such as /example/search) and the user that initiated the request, and the user is in your organization, consult with the user and the application developers to determine what the user was doing that caused the application to produce a hung thread.
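To work out which part of the access log to look at, subtract the active time in the WSVR0605W event from the time the event was logged. The following is a rough sketch of that arithmetic. The function name `search_window`, the 60 second slack, and the timestamp in the example are all hypothetical.

```python
from datetime import datetime, timedelta


def search_window(reported_at, active_ms, slack_seconds=60):
    """Return (start, end) datetimes around the moment the thread began work.

    reported_at: when the WSVR0605W event was logged.
    active_ms: the "has been active for N milliseconds" value from the event.
    slack_seconds: hypothetical margin on either side of the estimated start.
    """
    started_at = reported_at - timedelta(milliseconds=active_ms)
    slack = timedelta(seconds=slack_seconds)
    return started_at - slack, started_at + slack


if __name__ == "__main__":
    # Hypothetical log time, with the 655526 milliseconds from the example event.
    start, end = search_window(datetime(2024, 1, 1, 12, 0, 0), 655526)
    print(start, end)
```

Requests in the access log that fall inside this window are the most likely candidates for the one that tied up the thread.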
IBM's Support Assistant Thread and Monitor Dump Analyzer tool can also be used to spot hung threads. The tool can be downloaded from IBM.