8.3.1 Concepts of controlling the number of concurrent executions and the pending queue

Processing several requests concurrently is an effective way of enhancing the throughput of applications in the Application Server system. Compared with processing one request in one thread at a time, processing requests in multiple threads enhances the throughput.

If there are bottlenecks in I/O processing or exclusion processing, however, or if the maximum throughput has already been reached, further multi-processing does not enhance the throughput. Check the following points as you tune the number of concurrent executions:

Removing bottlenecks from I/O processing and exclusion processing
If the CPU usage remains low even after multiplexing threads and the throughput is not enhanced, there is probably a bottleneck in I/O processing or exclusion processing, for example where the application accesses a database. In such a case, identify the process that has the bottleneck and remove the bottleneck before tuning. For example, bottlenecks in I/O processing and exclusion processing can often be removed by tuning how the database is accessed or by changing the method of exclusion processing.
Checking the maximum throughput
As you keep increasing the multiplicity of requests and the number of threads, the idle time available on the CPU keeps decreasing, until eventually almost no CPU time is free. Once this stage is reached, the throughput is not enhanced even by increasing the number of threads.
This state indicates that the CPU is the bottleneck and the machine itself has reached its performance limit. In short, the throughput at this point is the maximum throughput of the application on that machine.
To gain higher throughput, you have to enhance the hardware, for example by adding machines or CPUs.
Maintaining the throughput by controlling the number of concurrent executions
If the multiplicity and the number of threads are increased while the CPU usage is already full, the number of runnable threads that cannot be allocated a CPU increases. This leads to lock contention between threads and repeated thread context switches, which can degrade the throughput.
Increasing the number of threads also increases the memory usage of the application server. In short, to avoid increasing the memory usage and degrading the throughput, it is necessary to limit the number of threads to the number that can actually be executed.
By using the function for controlling the number of concurrent executions, you can hold requests that exceed the maximum number of concurrent executions even when you have tuned that maximum and the multiplicity of requests increases further. As a result, a high throughput can be maintained despite temporary overloading and peak-load states.
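The effect of limiting concurrent executions can be sketched with a counting semaphore: excess request threads simply wait for a permit instead of competing for the CPU. This is an illustrative model, not the server's actual control function; the limit of 4, the 32 simulated requests, and the processing time are all assumed values.

```python
import threading
import time

MAX_CONCURRENT = 4  # assumed tuned maximum number of concurrent executions
permits = threading.Semaphore(MAX_CONCURRENT)

active = 0  # requests currently executing
peak = 0    # highest observed concurrency
lock = threading.Lock()

def handle_request():
    global active, peak
    with permits:            # requests beyond the limit wait here
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)     # simulated request processing
        with lock:
            active -= 1

# 32 requests arrive at once, far more than the limit.
threads = [threading.Thread(target=handle_request) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Despite 32 request threads, at most MAX_CONCURRENT ran at the same time.
print("peak concurrency:", peak)
```

Even though all 32 requests eventually complete, the semaphore caps how many are processed at once, which is the behavior that concurrent-execution control provides.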
Tuning the pending queue size
When a request received by the application server exceeds the maximum number of concurrent executions, you can register the request in a queue and keep it pending until the processing of requests in progress is complete. This queue has a maximum size (the pending queue size). If a new request arrives when the pending queue is already full, the request is not registered in the queue and is returned to the client as an error. When setting the pending queue size, it is necessary to secure enough capacity for the number of requests that might need to be held pending.
The concepts of registering a request in the pending queue and returning a request as an error are illustrated in the following figure:

Figure 8-2 Registering a request in a pending queue and returning a request as an error

[Figure]
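The register-or-reject behavior shown in the figure can be sketched with a bounded queue: requests that fit are registered and held pending, and requests that arrive while the queue is full are turned away. This is an illustrative model, not the server's implementation; the pending queue size of 3 and the burst of 10 requests are assumed values.

```python
import queue

PENDING_QUEUE_SIZE = 3  # assumed pending queue size

# Waiting requests sit in a bounded pending queue.
pending = queue.Queue(maxsize=PENDING_QUEUE_SIZE)

accepted = 0
rejected = 0
# A burst of 10 requests arrives while all executors are busy:
for request_id in range(10):
    try:
        pending.put_nowait(request_id)  # register the request in the pending queue
        accepted += 1
    except queue.Full:
        rejected += 1                   # queue full: returned to the client as an error

print("accepted:", accepted, "rejected:", rejected)  # accepted: 3 rejected: 7
```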
Hint
The purpose of keeping requests in the pending queue is to prevent errors caused by temporary overloading and peak-load states. Unnecessarily increasing the size of the pending queue to prevent failures is not an effective solution; instead, increase the number of concurrent executions, or add machines or CPUs as necessary.
Also, when a timeout is specified on the client side, if too much time elapses between the registration of a request in the pending queue and its actual execution, the timeout might occur before the request is executed, resulting in an error.
Specify an appropriate size for the pending queue.
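One rough way to sanity-check the pending queue size against a client-side timeout: a request at the tail of a full queue waits for roughly (queue size ÷ maximum concurrent executions) rounds of the average service time before it starts executing. The figures below are assumptions for illustration only, not recommended values.

```python
# Assumed figures for illustration only.
avg_service_time = 0.5   # seconds to process one request
max_concurrent = 20      # tuned maximum number of concurrent executions
client_timeout = 30      # seconds before the client gives up

# Tail wait ≈ queue_size / max_concurrent * avg_service_time.
# Keeping that wait under the client timeout bounds the useful queue size:
max_useful_queue_size = int(client_timeout * max_concurrent / avg_service_time)
print(max_useful_queue_size)  # 1200
```

Requests queued beyond this bound would time out on the client side before they execute, so a larger queue only turns quick errors into slow ones.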
Balancing the maximum number of concurrent executions in a hierarchical application
In a hierarchical application, tuning only a particular layer to increase its number of concurrent executions does not enhance the performance of the entire system, because the layer with the lowest performance limits the performance of the whole.
The figure below illustrates an example in which meaningless settings are specified as a result of optimizing only a particular layer:

Figure 8-3 Example of meaningless settings due to optimization of a particular layer

[Figure]

If you increase the number of concurrent executions, resources such as memory might be consumed unnecessarily even in an idle state in which no requests are being processed. Therefore, when controlling the number of concurrent executions, check the entire system and specify an appropriate number of concurrent executions for each layer.
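The point about layered settings can be illustrated with a small calculation. The layer names and figures below are hypothetical: the effective concurrency of the whole system is capped by its most constrained layer, so larger settings in the other layers are wasted.

```python
# Hypothetical maximum number of concurrent executions per layer.
layers = {
    "web server": 100,
    "application server": 20,
    "database": 10,
}

# The system as a whole can only execute as many requests concurrently
# as its most constrained layer allows; settings above that value in the
# other layers only consume memory and other resources while idle.
effective_concurrency = min(layers.values())
print(effective_concurrency)  # 10
```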