2.5.2 Important considerations for operating JP1/Performance Management in large systems
This subsection describes some of the important considerations that you must take into account when operating JP1/Performance Management in large systems.
(1) Starting the PFM services of agent hosts
When starting the PFM services of agent hosts that connect to the monitoring manager, try starting the PFM services in several smaller groups, rather than starting them all at one time. To determine the number of agents to be started at one time and the interval between groups, compare the startup time that your actual operations allow with the time it takes for all agents to start without entering stand-alone mode. The following shows, as a guideline, an example of the number of agents to be started at one time and the interval at which the groups are to be started. (Note that this example applies when the Status Server service and the Action Handler service on each agent host are to be started simultaneously.)
- Number of agents to be started at one time: 100
- Interval: 100 seconds
Example: In a large-scale system using 1,500 agents, if the PFM services of agent hosts begin to start at 05:00 while the PFM service of the monitoring manager is running, the PFM services can be started as shown in the following table.
| Startup time | Agent hosts whose PFM services are to be started |
|---|---|
| 05:00:00 | Agents 1 to 100 |
| 05:01:40 | Agents 101 to 200 |
| 05:03:20 | Agents 201 to 300 |
| 05:05:00 | Agents 301 to 400 |
| 05:06:40 | Agents 401 to 500 |
| 05:08:20 | Agents 501 to 600 |
| 05:10:00 | Agents 601 to 700 |
| 05:11:40 | Agents 701 to 800 |
| 05:13:20 | Agents 801 to 900 |
| 05:15:00 | Agents 901 to 1000 |
| 05:16:40 | Agents 1001 to 1100 |
| 05:18:20 | Agents 1101 to 1200 |
| 05:20:00 | Agents 1201 to 1300 |
| 05:21:40 | Agents 1301 to 1400 |
| 05:23:20 | Agents 1401 to 1500 |
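The schedule in this example follows directly from the batch size and the interval, so it can be generated for other system sizes. The following is a minimal Python sketch of that arithmetic only. The function name `staggered_schedule` and its parameters are hypothetical and are not part of JP1/Performance Management; the sketch computes start times and does not start any PFM service.

```python
from datetime import datetime, timedelta


def staggered_schedule(total_agents, batch_size, interval_seconds, first_start):
    """Yield (start_time, first_agent, last_agent) for each startup group.

    Hypothetical helper: it only computes the timetable.
    """
    if total_agents <= 0 or batch_size <= 0 or interval_seconds < 0:
        raise ValueError("total_agents and batch_size must be positive")
    for index, first in enumerate(range(1, total_agents + 1, batch_size)):
        # The last group may be smaller than batch_size.
        last = min(first + batch_size - 1, total_agents)
        yield first_start + timedelta(seconds=index * interval_seconds), first, last


if __name__ == "__main__":
    # Guideline values from this subsection: 100 agents every 100 seconds.
    # The date is arbitrary; only the time of day (05:00:00) matters here.
    start = datetime(2000, 1, 1, 5, 0, 0)
    for when, first, last in staggered_schedule(1500, 100, 100, start):
        print(f"{when:%H:%M:%S}  Agents {first} to {last}")
```

With the guideline values, the sketch produces 15 groups, matching the rows of the table above. Changing the batch size or interval shows how long the full startup would take for a different configuration.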
If the PFM services of agent hosts are started simultaneously (including cases in which the OSs on agent hosts are simultaneously started or periodically restarted), the following notes apply:
- The Event Monitor or Event History window might not display the agent event that is issued when the agent starts (the event whose message text is "Startup").
- Agents and action handlers might temporarily start in stand-alone mode. If this occurs, it might take time before alarm events are issued for those agents or action handlers.
(2) The time it takes for all PFM services to start running in normal mode
Simultaneously starting up all PFM services as part of the simultaneous startup of the operating system or scheduled restart can strain the monitoring managers, causing agents and action handlers to start running in stand-alone mode on a temporary basis. When there are a large number of agents, it takes time for the agents and the action handlers to transition from stand-alone mode to normal mode. During stand-alone mode, records are collected but alarm evaluation is not performed. These are the factors that must be taken into consideration when operating JP1/Performance Management in large systems. The table below shows the approximate time it takes for agents and action handlers to transition to normal mode.
| Number of agents# | Number of action handlers | Approximate time to normal mode activation (units: minutes) |
|---|---|---|
| 100 | 100 | 2 |
| 500 | 500 | 30 |
| 1,200 | 1,024 | 70 |
| 2,500 | 2,500 | 120 |
(3) Command execution time
Because the jpctool config sync command, the jpctool config alarmsync command, and the jpcconf primmgr notify command access agents and action handlers, they take time to execute in large systems. The table below shows the approximate command execution time.
| Number of agents#1 | Number of action handlers | Approximate execution time of the jpctool config sync command (units: minutes) | Approximate execution time of the jpctool config alarmsync command#2 (units: minutes) | Approximate execution time of the jpcconf primmgr notify command (units: minutes) |
|---|---|---|---|---|
| 100 | 100 | 25 | 15 | 2 |
| 500 | 500 | 120 | 55 | 10 |
| 1,200 | 1,024 | 240 | 120 | 20 |
| 2,500 | 2,500 | 585 | 290 | 50 |
The jpctool config sync command synchronizes alarm information and node information between agents and action handlers. The jpctool config alarmsync command synchronizes alarm information between the agents and action handlers whose application status is either Failed or Uncertain. Because commands take a long time to execute in large systems, we recommend that you use different commands under different circumstances as necessary.
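When planning a maintenance window, it can help to look up a conservative estimate from the figures above. The following is a minimal Python sketch, assuming that the number of agents alone selects the table row and that the next larger documented system size is an acceptable upper bound. The names `SIZES`, `MINUTES`, and `approx_minutes` are hypothetical; the values are copied from the table in this subsection and remain approximate.

```python
import bisect

# Documented system sizes (number of agents), in ascending order.
SIZES = [100, 500, 1200, 2500]

# Approximate execution time in minutes for each documented size.
MINUTES = {
    "jpctool config sync": [25, 120, 240, 585],
    "jpctool config alarmsync": [15, 55, 120, 290],
    "jpcconf primmgr notify": [2, 10, 20, 50],
}


def approx_minutes(command, agents):
    """Return the documented time for the smallest table row covering `agents`.

    Assumption: rounding up to the next documented size gives a usable
    upper bound. No interpolation between rows is attempted.
    """
    if agents <= 0:
        raise ValueError("agents must be positive")
    row = bisect.bisect_left(SIZES, agents)
    if row == len(SIZES):
        raise ValueError("agent count exceeds the largest documented system size")
    return MINUTES[command][row]


if __name__ == "__main__":
    for name in MINUTES:
        print(f"{name}: about {approx_minutes(name, 800)} minutes for 800 agents")
```

For example, a system with 800 agents falls under the 1,200-agent row, so the sketch reports 240 minutes for jpctool config sync and 120 minutes for jpctool config alarmsync. That difference is one reason to run the narrower command when only some agents need resynchronization.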
(4) Simultaneously starting all PFM services on a system on which the automatic bind function is used
When you start agents for the first time after setting automatic bind, starting all PFM services simultaneously might cause automatic bind to fail on some agents because of the excessive load that the PFM services place on the system. If automatic bind fails, the KAVE00568-E message is output. If this message is output, set alarm bind again or restart the agents in question and apply alarm information to them.
You can avoid this problem by starting agents in several batches. In doing so, you have to determine the number of agents to be started at a time and the time interval between batches by carefully considering and comparing not only the permissible startup time from the standpoint of operating JP1/Performance Management but also the time it can take for all agents to start running without entering stand-alone mode.
(5) Shutting down the PFM services of agent hosts
When shutting down the PFM services of agent hosts that connect to the monitoring manager, try shutting down the PFM services in several smaller groups, rather than shutting them all down at one time. The following shows, as a guideline, an example of the number of agents to be shut down at one time and the interval at which the agents are to be shut down. (Note that this example applies when the Status Server service and the Action Handler service on each agent host are to be shut down simultaneously.)
- Number of agents to be shut down at one time: 500
- Interval: 60 seconds
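To see what these guideline values mean for a given system, the number of shutdown groups and the delay before the last group begins can be computed as below. This is a minimal Python sketch; the function name `shutdown_plan` is hypothetical, and the result says when the last group starts shutting down, not when its services have finished stopping.

```python
import math


def shutdown_plan(total_agents, batch_size=500, interval_seconds=60):
    """Return (number_of_groups, seconds_until_last_group_begins).

    Defaults are the guideline values from this subsection.
    """
    if total_agents <= 0 or batch_size <= 0:
        raise ValueError("total_agents and batch_size must be positive")
    groups = math.ceil(total_agents / batch_size)
    # The first group starts at offset 0, so only (groups - 1) intervals elapse.
    return groups, (groups - 1) * interval_seconds


if __name__ == "__main__":
    groups, offset = shutdown_plan(1500)
    print(f"{groups} groups; last group begins after {offset} seconds")
```

For the 1,500-agent system used in the startup example, this gives 3 groups, with the last group beginning 120 seconds after the first.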
If the PFM services of agent hosts are shut down simultaneously (including cases in which the OSs on agent hosts are simultaneously shut down or periodically restarted), the following note applies:
- The Event Monitor or Event History window might not display the agent event that is issued when the agent shuts down (the event whose message text is "Shutdown").