
Investigating Service Thread Starvation through Netstat Analysis

Context

Service threads are responsible for demarshalling incoming client traffic. When these threads are overwhelmed or stalled, unread data backs up in the network receive queue, which can lead to service thread starvation. This article describes the steps needed to identify and confirm service thread starvation.

A related symptom appears when you count connection states in the netstat output: the number of connections in the CLOSE_WAIT state increases without a corresponding number of ESTABLISHED connections. CLOSE_WAIT means the remote side has closed the connection while the local process has not yet closed its socket, so a large and growing CLOSE_WAIT count indicates that clients attempted to connect, failed to get served, and gave up.
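The state count described above can be sketched in Python. This is a minimal illustration, not an official tool: the function name count_tcp_states is hypothetical, it assumes the Linux net-tools netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the two input lines are copied from the sample output in this article purely to exercise the function.

```python
from collections import Counter


def count_tcp_states(netstat_text: str, port: int) -> Counter:
    """Count TCP states for connections whose local address ends in :port.

    Hypothetical helper; assumes net-tools netstat columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip timestamps, headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(f":{port}"):
            counts[fields[5]] += 1
    return counts


# Two lines taken from the sample output in this article.
sample = """\
tcp        0      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd
tcp     2195      0 192.168.202.18:3010     192.168.34.194:40048    CLOSE_WAIT  1/asd
"""

counts = count_tcp_states(sample, 3010)
print("ESTABLISHED:", counts["ESTABLISHED"], "CLOSE_WAIT:", counts["CLOSE_WAIT"])
```

Running the count on each capture and comparing the numbers over time shows whether CLOSE_WAIT is growing relative to ESTABLISHED.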
 

 


Method

A reliable way to assess service thread performance is to run a script that captures netstat data every 10 seconds on the server node(s). The main value to monitor is Recv-Q, the second column of the netstat output, which reports the number of bytes received by the kernel but not yet read by the application.
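One possible shape for such a capture script is sketched below in Python. It is an assumption-laden example rather than the script used to produce the output in this article: the function names filter_port_lines and capture are hypothetical, the netstat flags (-t for TCP, -n for numeric addresses, -p for the owning program) are those of the Linux net-tools netstat, and -p may require root to show program names.

```python
import subprocess
import time
from datetime import datetime, timezone
from typing import List


def filter_port_lines(netstat_text: str, port: int) -> List[str]:
    """Return the TCP rows whose local address ends in :port (hypothetical helper)."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[3].endswith(f":{port}"):
            rows.append(line.rstrip())
    return rows


def capture(port: int, interval_s: float = 10.0, samples: int = 6) -> None:
    """Print a UTC timestamp and the matching netstat rows every interval_s seconds."""
    for i in range(samples):
        # Assumes net-tools netstat is installed; flags: TCP only, numeric, with program.
        result = subprocess.run(
            ["netstat", "-tnp"], capture_output=True, text=True, check=False
        )
        print(datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S UTC %Y"))
        for row in filter_port_lines(result.stdout, port):
            print(row)
        if i < samples - 1:
            time.sleep(interval_s)
```

Calling capture(3010) on a server node would take six samples ten seconds apart for the service port; redirecting the output to a file keeps a record that can be reviewed later.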

 

Here is sample netstat output for a connection between the server on port 3010 and a client on port 40048:


 

Tue Oct 10 20:50:52 UTC 2023
tcp        0      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd
Tue Oct 10 20:51:02 UTC 2023
tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd
Tue Oct 10 20:51:12 UTC 2023
tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd
Tue Oct 10 20:51:22 UTC 2023
tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd
Tue Oct 10 20:51:32 UTC 2023
tcp     2195      0 192.168.202.18:3010     192.168.34.194:40048    CLOSE_WAIT  1/asd

                 

 

 

When service threads cannot demarshal the queue quickly enough, Recv-Q grows instead of draining. In the sample output, Recv-Q rises from 0 to 2163, stays at 2163 for two further samples, and peaks at 2195. The connection state then changes from ESTABLISHED to CLOSE_WAIT, which suggests that the client timed out and closed its side of the connection.
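The pattern described here (a Recv-Q that becomes non-zero and does not drain, followed by CLOSE_WAIT) can be checked with a short Python sketch. The function names and the threshold of three consecutive samples are assumptions chosen for illustration, not values from the product documentation; the input rows are the five netstat rows from the sample output in this article.

```python
from typing import List, Tuple


def parse_row(line: str) -> Tuple[int, str]:
    """Return (Recv-Q, state) from one netstat TCP row (hypothetical helper)."""
    fields = line.split()
    return int(fields[1]), fields[5]


def looks_starved(rows: List[str], min_stuck_samples: int = 3) -> bool:
    """Heuristic: Recv-Q is non-zero and never shrinks for N consecutive samples.

    min_stuck_samples=3 is an assumed threshold; tune it to the capture interval.
    """
    stuck = 0
    previous = 0
    for row in rows:
        recv_q, _state = parse_row(row)
        if recv_q > 0 and recv_q >= previous:
            stuck += 1
            if stuck >= min_stuck_samples:
                return True
        else:
            stuck = 0
        previous = recv_q
    return False


# The five rows from the sample output in this article, oldest first.
rows = [
    "tcp        0      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd",
    "tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd",
    "tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd",
    "tcp     2163      0 192.168.202.18:3010     192.168.34.194:40048    ESTABLISHED 1/asd",
    "tcp     2195      0 192.168.202.18:3010     192.168.34.194:40048    CLOSE_WAIT  1/asd",
]

peak = max(parse_row(r)[0] for r in rows)
final_state = parse_row(rows[-1])[1]
print(looks_starved(rows), peak, final_state)  # prints: True 2195 CLOSE_WAIT
```

A single non-zero Recv-Q reading is normal under load, which is why the sketch only flags a connection after several consecutive samples without draining.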

 

 


Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version