Hitachi

JP1 Version 12 JP1/Performance Management Planning and Configuration Guide


2.5.2 Important considerations for operating JP1/Performance Management in large systems

This subsection describes some of the important considerations that you must take into account when operating JP1/Performance Management in large systems.

(1) Starting the PFM services of agent hosts

When starting the PFM services of agent hosts that connect to the monitoring manager, try starting the PFM services in several smaller groups, rather than starting them all at one time. To determine the number of agents to be started at one time and the interval at which the groups are to be started, compare the leeway in startup time that your actual operations allow with the time it takes for all agents to start without entering stand-alone mode. The following shows, as a guideline, an example of the number of agents to be started at one time and the interval at which the agents are to be started. (Note that this example applies when the Status Server service and the Action Handler service on each agent host are to be started simultaneously.)

Example: In a large-scale system using 1,500 agents, if the PFM services of agent hosts begin to start at 05:00 while the PFM service of the monitoring manager is running, the PFM services can be started as shown in the following table.

Table 2‒3: Example of starting agent host services at 05:00:00

| Startup time | Agent hosts whose PFM services are to be started |
|--------------|--------------------------------------------------|
| 05:00:00     | Agents 1 to 100                                  |
| 05:01:40     | Agents 101 to 200                                |
| 05:03:20     | Agents 201 to 300                                |
| 05:05:00     | Agents 301 to 400                                |
| 05:06:40     | Agents 401 to 500                                |
| 05:08:20     | Agents 501 to 600                                |
| 05:10:00     | Agents 601 to 700                                |
| 05:11:40     | Agents 701 to 800                                |
| 05:13:20     | Agents 801 to 900                                |
| 05:15:00     | Agents 901 to 1000                               |
| 05:16:40     | Agents 1001 to 1100                              |
| 05:18:20     | Agents 1101 to 1200                              |
| 05:20:00     | Agents 1201 to 1300                              |
| 05:21:40     | Agents 1301 to 1400                              |
| 05:23:20     | Agents 1401 to 1500                              |
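The schedule in this example follows a fixed pattern: groups of 100 agents, started 100 seconds apart. The following Python sketch reproduces that pattern so that it can be adapted to other agent counts. The function name and parameters are hypothetical and are not part of JP1/Performance Management; the calendar date is an arbitrary placeholder, and the group size and interval must still be validated against your own environment.

```python
from datetime import datetime, timedelta


def staggered_schedule(total_agents, batch_size, interval_seconds, first_start):
    """Return one (start_time, first_agent, last_agent) tuple per group."""
    schedule = []
    first_agents = range(1, total_agents + 1, batch_size)
    for batch_index, first_agent in enumerate(first_agents):
        # The last group may be smaller than batch_size.
        last_agent = min(first_agent + batch_size - 1, total_agents)
        start_time = first_start + timedelta(seconds=batch_index * interval_seconds)
        schedule.append((start_time, first_agent, last_agent))
    return schedule


# Values taken from the example: 1,500 agents, groups of 100, 100 seconds apart.
# The date (2000-01-01) is a placeholder; only the time of day matters here.
schedule = staggered_schedule(1500, 100, 100, datetime(2000, 1, 1, 5, 0, 0))
for start_time, first_agent, last_agent in schedule:
    print(f"{start_time:%H:%M:%S}  Agents {first_agent} to {last_agent}")
```

The sketch only computes start times. How the PFM services are actually started on each host at those times is left to your job scheduler or operational procedure.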

If the PFM services of agent hosts are started simultaneously (including cases in which the OSs on agent hosts are simultaneously started or periodically restarted), the following notes apply:

(2) The time it takes for all PFM services to start running in normal mode

Starting all PFM services simultaneously, for example as part of a simultaneous OS startup or a scheduled restart, can strain the monitoring managers and cause agents and action handlers to start temporarily in stand-alone mode. When there are a large number of agents, it takes time for the agents and the action handlers to transition from stand-alone mode to normal mode. During stand-alone mode, records are collected but alarm evaluation is not performed. Take these factors into consideration when operating JP1/Performance Management in large systems. The table below shows the approximate time it takes for agents and action handlers to transition to normal mode.

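Table 2‒4: The approximate time to normal mode activation

| Number of agents# | Number of action handlers | Approximate time to normal mode activation (units: minutes) |
|-------------------|---------------------------|--------------------------------------------------------------|
| 100               | 100                       | 2                                                            |
| 500               | 500                       | 30                                                           |
| 1,200             | 1,024                     | 70                                                           |
| 2,500             | 2,500                     | 120                                                          |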

#: This is the number of Agent Collectors or RM Collectors.
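For agent counts that fall between the rows of Table 2‒4, a rough planning figure can be obtained by interpolating between the neighboring rows. The sketch below assumes linear interpolation keyed on the number of agents only; this is an assumption made for illustration, not a method stated in this guide, and the function name is hypothetical. Treat the result as an approximation and measure in your own environment.

```python
from bisect import bisect_left

# (number of agents, approximate minutes to normal mode), from Table 2-4.
NORMAL_MODE_MINUTES = [(100, 2), (500, 30), (1200, 70), (2500, 120)]


def interpolate_minutes(agent_count, points=NORMAL_MODE_MINUTES):
    """Linearly interpolate between table rows (an assumption, not a guarantee)."""
    counts = [count for count, _ in points]
    if not counts[0] <= agent_count <= counts[-1]:
        # The table gives no basis for extrapolating outside its range.
        raise ValueError("agent_count is outside the range covered by the table")
    i = bisect_left(counts, agent_count)
    if counts[i] == agent_count:
        return float(points[i][1])
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (agent_count - x0) / (x1 - x0)


print(interpolate_minutes(300))   # midway between the 100-agent and 500-agent rows
print(interpolate_minutes(1200))  # exact table row
```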

(3) Command execution time

Because the jpctool config sync command, the jpctool config alarmsync command, and the jpcconf primmgr notify command access agents and action handlers, they take time to execute in large systems. The table below shows the approximate command execution time.

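Table 2‒5: Approximate command execution time

Approximate command execution time is shown in minutes.

| Number of agents#1 | Number of action handlers | jpctool config sync command | jpctool config alarmsync command#2 | jpcconf primmgr notify command |
|--------------------|---------------------------|-----------------------------|------------------------------------|--------------------------------|
| 100                | 100                       | 25                          | 15                                 | 2                              |
| 500                | 500                       | 120                         | 55                                 | 10                             |
| 1,200              | 1,024                     | 240                         | 120                                | 20                             |
| 2,500              | 2,500                     | 585                         | 290                                | 50                             |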

#1: This is the number of Agent Collectors or RM Collectors.

#2: The execution time shown for this command assumes that all services are subject to synchronization (that is, the application status is either Failed or Uncertain).

The jpctool config sync command synchronizes alarm information and node information between agents and action handlers. The jpctool config alarmsync command synchronizes alarm information between the agents and action handlers whose application status is either Failed or Uncertain. Because these commands take a long time to execute in large systems, we recommend that you select the command that suits each situation.
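When planning a maintenance window, it can help to look up a conservative figure for each command from Table 2‒5. The sketch below returns the values of the nearest table row at or above a given agent count. Rounding up to the next row is an assumption chosen for illustration, and the function name is hypothetical; the figures themselves are the approximate values from the table.

```python
# Approximate execution time in minutes, keyed by number of agents (Table 2-5).
COMMAND_MINUTES = {
    100: {"jpctool config sync": 25, "jpctool config alarmsync": 15, "jpcconf primmgr notify": 2},
    500: {"jpctool config sync": 120, "jpctool config alarmsync": 55, "jpcconf primmgr notify": 10},
    1200: {"jpctool config sync": 240, "jpctool config alarmsync": 120, "jpcconf primmgr notify": 20},
    2500: {"jpctool config sync": 585, "jpctool config alarmsync": 290, "jpcconf primmgr notify": 50},
}


def conservative_estimate(agent_count):
    """Return (table_row_agent_count, minutes_by_command) for the next row up."""
    for table_count in sorted(COMMAND_MINUTES):
        if agent_count <= table_count:
            return table_count, COMMAND_MINUTES[table_count]
    raise ValueError("agent_count exceeds the largest system size in the table")


row, minutes = conservative_estimate(800)
for command, value in minutes.items():
    print(f"{command}: about {value} minutes (based on the {row}-agent row)")
```

The alarmsync figures assume that all services are subject to synchronization, as stated in note #2, so the actual time can be shorter when fewer services are in the Failed or Uncertain status.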

(4) Simultaneously starting all PFM services on a system on which the automatic bind function is used

When agents are started for the first time after automatic bind is set, starting all PFM services simultaneously can place an excessive burden on the system, and automatic bind might not work in some agents. If automatic bind fails, the KAVE00568-E message is output. If this message is output, set alarm bind again or restart the agents in question and apply alarm information to them.

You can avoid this problem by starting agents in several batches. In doing so, you have to determine the number of agents to be started at a time and the time interval between batches by carefully considering and comparing not only the permissible startup time from the standpoint of operating JP1/Performance Management but also the time it can take for all agents to start running without entering stand-alone mode.
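One part of this comparison is simple arithmetic: how long after the first batch the last batch is launched. The sketch below checks a candidate batch size and interval against a permissible startup window. The function names and the 30-minute window are hypothetical values chosen for illustration. The check covers only when the last batch is launched; it does not account for the time each agent then needs to start running in normal mode.

```python
import math


def last_batch_offset_seconds(total_agents, batch_size, interval_seconds):
    """Seconds between the launch of the first batch and the launch of the last."""
    batches = math.ceil(total_agents / batch_size)
    return (batches - 1) * interval_seconds


def fits_window(total_agents, batch_size, interval_seconds, window_seconds):
    """True if the last batch is launched within the permissible window."""
    offset = last_batch_offset_seconds(total_agents, batch_size, interval_seconds)
    return offset <= window_seconds


# 1,500 agents in batches of 100, 100 seconds apart: 15 batches.
print(last_batch_offset_seconds(1500, 100, 100))
# Hypothetical 30-minute permissible window.
print(fits_window(1500, 100, 100, 30 * 60))
print(fits_window(1500, 50, 100, 30 * 60))
```

Smaller batches reduce the load placed on the system at any one time but lengthen the overall startup, so a plan that fails this check needs either larger batches or a shorter interval, each of which must be verified against stand-alone mode behavior in your environment.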

(5) Shutting down the PFM services of agent hosts

When shutting down the PFM services of agent hosts that connect to the monitoring manager, try shutting down the PFM services in several smaller groups, rather than shutting them all down at one time. The following shows, as a guideline, an example of the number of agents to be shut down at one time and the interval at which the agents are to be shut down. (Note that this example applies when the Status Server service and the Action Handler service on each agent host are to be shut down simultaneously.)

If the PFM services of agent hosts are shut down simultaneously (including cases in which the OSs on agent hosts are simultaneously shut down or periodically restarted), the following note applies: