2.2.2 Examples of manager/agent system configurations
Jobs can be distributed among multiple hosts. Install JP1/AJS3 - Manager on the hosts on which jobs will be managed (one or more manager hosts) and JP1/AJS3 - Agent on the hosts on which the jobs will be executed (agent hosts). JP1/AJS3 - Manager can be used instead of JP1/AJS3 - Agent. Install JP1/AJS3 - View on the hosts that will perform job monitoring.
(1) Operation with a single manager host
Install JP1/AJS3 - Manager on the manager host, and JP1/AJS3 - Agent on the agent hosts. Set up a dedicated JP1/AJS3 - View host as required.
The following figure shows an example of a manager/agent system configuration with one manager host.
(2) Operation with multiple manager hosts
Prepare multiple hosts on which to install JP1/AJS3 - Manager, and multiple hosts on which to install JP1/AJS3 - Agent. Set up a dedicated JP1/AJS3 - View host as required.
The following figure shows an example of a manager/agent system configuration with multiple manager hosts.
JP1/AJS3 can operate only in an environment where the IP address can be resolved from the local host name.
When executing jobs on multiple agent hosts, you must also specify settings that allow resolution of the IP addresses for the manager hosts, agent hosts, and the hosts in other systems. When DNS is used, specify settings that allow resolution of host names in the FQDN format. Note, however, that host names in the FQDN format cannot be used for logical host names.
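As a rough illustration of the requirement above, the following Python sketch checks whether a host name resolves to at least one IP address. It is not part of JP1/AJS3. The function names are hypothetical, and only the standard socket module is used.

```python
import socket


def resolve_addresses(host_name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that host_name resolves to.

    An empty list means the name could not be resolved.
    The resolver argument exists only so the check can be tried with test data.
    """
    try:
        infos = resolver(host_name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def local_host_name_resolves():
    """Return the local host name and the addresses it resolves to."""
    name = socket.gethostname()
    return name, resolve_addresses(name)
```

Calling local_host_name_resolves() on each manager host and agent host, and calling resolve_addresses() with the names of the other hosts, gives a quick indication of whether the name resolution settings are in place. FQDN-format names can be passed in the same way.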
(3) Using a single agent host as multiple execution hosts
To execute jobs in JP1/AJS3, you must register agent host information in the manager host. The required information consists of the execution agent (the logical name of the agent host), together with its physical host name.
By creating multiple execution agents (logical names) for a single agent host, you can set up a job execution environment in which a single agent host can be used as multiple execution hosts. For details, see 2.5 Setting the job execution environment.
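The relationship between execution agents and physical hosts can be pictured as a simple mapping. The following Python sketch is only a conceptual model, not the way JP1/AJS3 stores this information. All agent names and host names in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical definitions: execution agent (logical name) -> physical host name.
EXECUTION_AGENTS = {
    "AGT_BATCH": "hostA",
    "AGT_REPORT": "hostA",
    "AGT_BACKUP": "hostB",
}


def agents_by_host(execution_agents):
    """Group execution agent names by the physical host they point to."""
    grouped = defaultdict(list)
    for agent, host in execution_agents.items():
        grouped[host].append(agent)
    return {host: sorted(agents) for host, agents in grouped.items()}


print(agents_by_host(EXECUTION_AGENTS))
# prints {'hostA': ['AGT_BATCH', 'AGT_REPORT'], 'hostB': ['AGT_BACKUP']}
```

In this model, hostA is a single agent host that is used as two execution hosts, one for each execution agent that refers to it.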
(4) Centrally monitoring work tasks
Using JP1/AJS3 Console, you can centrally monitor from one window all the work tasks being managed by multiple manager hosts, or under different scheduler services, or in different job groups.
The following discusses example system configurations in which work tasks are monitored using JP1/AJS3 Console, and examples of how such configurations can be used.
(a) Monitoring work tasks managed by multiple manager hosts
The following figure shows an example of a system configuration in which JP1/AJS3 Console is used to centrally monitor work tasks that are executed and managed by multiple manager hosts.
Work tasks executed by different manager hosts can be centrally monitored in the same window.
(b) Monitoring work tasks managed by one manager host
The following figure shows an example of a system configuration in which JP1/AJS3 Console is used to centrally monitor work tasks running under different scheduler services and managed by a single manager host.
Work tasks managed by different scheduler services can be centrally monitored in the same window.
(5) Notes on manager/agent system configurations
Some cautions on manager/agent configurations are listed below.
- Communication between manager and agent hosts is based on the host names of the hosts. Specify the settings so that the manager host name can be correctly resolved on the agent hosts and the agent host names can be correctly resolved on the manager host.
Hereafter, host name refers to a name that can be checked by using one of the methods below.
- Physical host:
The name returned by executing the hostname command on a JP1/AJS3 host.
- Logical host (Windows):
On the host on which JP1/AJS3 is running, open Control Panel and choose Services, or choose Administrative Tools and then Services. In the Services dialog box that opens, the host name is displayed in the xxxxx portion of JP1/AJS3_xxxxx.
- Logical host (UNIX):
The name displayed after the jajs_spmd process when you execute the ps command.
- Depending on the manager/agent configuration, the Now queuing status of standard jobs, action jobs, custom jobs, or event jobs might not change. If this problem occurs, check whether any of the conditions below are satisfied. If any are satisfied, review the system settings.
- An agent host name cannot be resolved on a manager host, or a manager host name cannot be resolved on an agent host.
Configure the hosts file, DNS server, jp1hosts information, or jp1hosts2 information on each manager host and agent host so that each manager host can resolve the host names of agent hosts and each agent host can resolve the host names of manager hosts. For details about how to define the jp1hosts information and jp1hosts2 information, see the Job Management Partner 1/Base User's Guide.
The following shows an example of specifying the hosts file settings.
Figure 2‒8: Example of hosts file entries that enable hosts to resolve each other's host names
- An alias for an agent host name is defined on the manager host. For this reason, the IP address obtained from the host name displayed by the hostname command executed on the agent host is not the same as the IP address obtained from the host name specified in the alias definition.
The following shows an example of specifying hosts file entries.
Figure 2‒9: Example of specifying the hosts file entries when an alias is used for a host name
- The agent host name is defined in FQDN format in the hosts file on the manager host, but the host name displayed by the hostname command executed on the agent host is a short name. Alternatively, the agent host name in the hosts file is defined with a short name, but the host name displayed by the hostname command is in FQDN format. In either case, the host name cannot be resolved on the manager host.
The following shows an example of a host name that cannot be resolved.
Figure 2‒10: Example of a host name in FQDN format defined in the hosts file
Figure 2‒11: Example of a short host name defined in the hosts file
In the above cases, the communication sequence for the job between the manager and agent is not completed even if the job can be executed. As a result, a large amount of retry data, which could cause the following problems, might remain on the manager and agent:
- Load on the system becomes heavy. Jobs that the user attempted to execute might not be removed from the queue, event detection might be extremely delayed, or the system might be unable to detect events.
- When the JP1/AJS3 service is restarted on the agent host, events that were detected in the past are detected again.
The second problem occurs when the remaining retry data includes data that reports detection of an event, because that report data is sent to the manager again when the agent is restarted.
If these problems occur, stop the JP1/AJS3 service on the manager host and on the agent host, and then specify settings so that both the FQDN-format host name and the short name can be resolved mutually between the manager host and the agent host. Next, cold-start the JP1/AJS3 service on the manager host and on the agent host. If necessary, register the jobnet for execution again.
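The FQDN and short-name mismatch described above can be checked mechanically. The following Python sketch parses hosts-file text and reports which form of a host name is missing. It is an illustration only and does not reproduce the figures referred to above. The host names are hypothetical, the IP addresses are taken from a documentation-only range, and the sketch looks only at a hosts file, not at DNS, jp1hosts information, or jp1hosts2 information.

```python
# Hypothetical hosts file content on a manager host.
HOSTS_TEXT = """\
192.0.2.1    manager.example.com  manager
192.0.2.10   agent1.example.com   # short name agent1 is missing
"""


def parse_hosts(text):
    """Map every name in hosts-file text to its IP address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def unresolved_forms(table, fqdn):
    """Return which of the FQDN and short forms of a host name are missing."""
    short = fqdn.split(".", 1)[0]
    candidates = dict.fromkeys((fqdn, short))  # de-duplicate, keep order
    return [name for name in candidates if name.lower() not in table]


table = parse_hosts(HOSTS_TEXT)
print(unresolved_forms(table, "agent1.example.com"))   # prints ['agent1']
print(unresolved_forms(table, "manager.example.com"))  # prints []
```

An empty result for every manager host and agent host name suggests that both forms can be resolved from that hosts file.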
- Set name resolution so that an IP address other than the loopback address (127.0.0.1) is preferentially assigned to the physical host. However, if it is necessary to assign the loopback address due to operating requirements, configure the jp1hosts information or jp1hosts2 information to assign an IP address that allows the physical host to communicate with other hosts. For details about how to define the jp1hosts information or jp1hosts2 information, see the Job Management Partner 1/Base User's Guide.
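One way to spot the situation described above is to look at the first address returned for the physical host name. The following Python sketch does this for a list of addresses given in resolution order. The function names are hypothetical. Note that the standard ipaddress module treats the whole 127.0.0.0/8 range and ::1 as loopback, which is broader than the single address 127.0.0.1 mentioned above.

```python
import ipaddress


def is_loopback(address):
    """Return True if address is an IPv4 or IPv6 loopback address."""
    # Strip an IPv6 zone identifier such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def preferred_address_is_loopback(addresses):
    """Return True if the first (preferred) address in the list is loopback."""
    return bool(addresses) and is_loopback(addresses[0])


print(preferred_address_is_loopback(["127.0.0.1", "192.0.2.10"]))  # prints True
print(preferred_address_is_loopback(["192.0.2.10", "127.0.0.1"]))  # prints False
```

The address list could be obtained, for example, from socket.getaddrinfo() for the name returned by socket.gethostname(). The order in which a resolver returns addresses depends on the operating system settings.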
- A manager and agent are connected via a network. The table below shows how processing is retried if a network error occurs while a job is being executed.
- #1
This is the default for ObserveInterval, the environment setting parameter for job execution control, which sets the interval for monitoring the job execution agents.
- #2
The send retry interval and maximum number of retries can be changed as needed for the operation. For details, see 6.2.5 Changing the interval and number of resend attempts for job result files in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in Windows) or 15.2.5 Changing the interval and number of resend attempts for job result files in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in UNIX).
- #3
You can specify the setting so that if the manager is unable to connect to the agent host, a timeout error occurs and the manager retries the transfer at the interval defined for errors other than timeout errors. For details, see 6.3.12 Settings for ensuring that the sending of unreported information is retried at regular intervals in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in Windows) or 15.3.13 Settings for ensuring that the sending of unreported information is retried at regular intervals in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in UNIX).
- #4
You can change the retry interval and maximum number of retries to values that are best suited for system operation. For details, see 6.3.13 Changing the send retry interval and the number of retries for sending unreported information in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in Windows) or 15.3.14 Changing the send retry interval and the number of retries for sending unreported information in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1 (in UNIX).
When a network error has occurred, job execution is delayed by the monitoring time only, and operation continues. However, if a network error continues for longer than the monitoring time indicated above, Failed to start is output as the job execution result.
- When the error message KAVU2227-E (A connection error occurred during TCP/IP communication.) is output, all the socket ports might be busy in the entire system. If this happens, take the following corrective action.
- On a Windows host:
Execute the netstat -a command to investigate the system's socket status, and check whether there are a lot of sockets placed in the TIME_WAIT status. If there are, there might be temporary shortages of available socket ports. If a communication error message is output, job execution and job status confirmation might have failed because it was not possible to make a socket connection. In this case, rerun the job when the number of sockets in the TIME_WAIT status has decreased.
Note that you can prevent communication errors from occurring by speeding up recovery of TIME_WAIT sockets managed in Windows. The procedure for this is as follows.
1. Execute the following command to start the Registry Editor:
C:\> regedt32.exe
2. Open the following TCP/IP registry key:
\\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
3. Add the following registry values:
Name: TcpTimedWaitDelay
Data type: REG_DWORD
Data: Any value (decimal notation)
4. Restart Windows.
You can specify any required value for the TcpTimedWaitDelay parameter. Set a value that is appropriate for the operating environment.
The standard value is 240 seconds and the minimum value is 30 seconds.
- On a UNIX host:
If there are a lot of sockets placed in the TIME_WAIT status, there might be temporary shortages of available socket ports. If a communication error message is output, job execution and job status confirmation might have failed because it was not possible to make a socket connection. In this case, rerun the job when the number of sockets in the TIME_WAIT status has decreased.
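Counting the sockets in the TIME_WAIT status by eye can be tedious when the netstat output is long. The following Python sketch counts them from captured output. The sample text is hypothetical and abbreviated, and what counts as "a lot" is left to the judgment of the administrator.

```python
# Hypothetical, abbreviated netstat output.
SAMPLE_OUTPUT = """\
  Proto  Local Address      Foreign Address     State
  TCP    192.0.2.1:50000    192.0.2.10:49152    TIME_WAIT
  TCP    192.0.2.1:50000    192.0.2.10:49153    ESTABLISHED
  TCP    192.0.2.1:50000    192.0.2.10:49154    TIME_WAIT
"""


def count_time_wait(netstat_output):
    """Count the lines of netstat output whose state column is TIME_WAIT."""
    return sum(
        1 for line in netstat_output.splitlines() if "TIME_WAIT" in line.split()
    )


print(count_time_wait(SAMPLE_OUTPUT))  # prints 2
```

To use live data, the output of the command could be captured with subprocess.run(["netstat", "-a"], capture_output=True, text=True).stdout and passed to count_time_wait().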
- JP1/AJS3 - Manager polls the status of PC jobs, Unix jobs, QUEUE jobs, action jobs, and custom jobs running on agent hosts at five-minute intervals.
When polling cannot be performed for 10 minutes or more because of a communication error or power outage at the executing host (agent), for example, the job status is changed. Jobs executed in a jobnet are placed in abnormal end status (return code: -1). Jobs executed by a jpqjobsub command change their status as specified in the command's -rs option.
- If the agent stops while an event job is running, the status of the event job depends on how the agent was stopped and whether the option to continue execution of active event jobs is being used.
If an event job is registered for execution while the agent is stopped, the system retries the job start request at predetermined intervals. For the job statuses for each agent termination status, see 7.2.1(4) Job statuses on the manager host when an agent host is restarted in the Job Management Partner 1/Automatic Job Management System 3 Administration Guide.
For details about the option to continue execution of active event jobs, see 9.2.1 Continuing the execution of event jobs if the JP1/AJS3 service stops in the Job Management Partner 1/Automatic Job Management System 3 Administration Guide. For details about the event job retry process, see Table 2-3.
- If multiple IP addresses are assigned to an agent host, the event/action control manager can manage a maximum of four IPv4 addresses and four IPv6 addresses per host. Make sure that each host has four or fewer IPv4 addresses and four or fewer IPv6 addresses.
If more than four IPv4 addresses can be obtained from a host name, the event/action control manager manages only four of the obtained addresses. However, no rules exist about which addresses are managed. The situation is the same when more than four IPv6 addresses are obtained from a host name.
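The limit of four addresses per address family can be checked with a short script. The following Python sketch counts the IPv4 and IPv6 addresses in a list and reports any family that exceeds the limit stated above. The function names are hypothetical, and the addresses are taken from documentation-only ranges.

```python
import ipaddress

MAX_PER_FAMILY = 4  # limit stated in the text above


def count_by_family(addresses):
    """Count the distinct IPv4 and IPv6 addresses in the list."""
    counts = {4: 0, 6: 0}
    for address in set(addresses):
        # Strip an IPv6 zone identifier such as "%eth0" before parsing.
        version = ipaddress.ip_address(address.split("%", 1)[0]).version
        counts[version] += 1
    return counts


def over_limit(addresses):
    """Return the address families that have more than MAX_PER_FAMILY addresses."""
    counts = count_by_family(addresses)
    return [f"IPv{v}" for v, n in sorted(counts.items()) if n > MAX_PER_FAMILY]


five_v4 = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4", "192.0.2.5"]
print(over_limit(five_v4))                       # prints ['IPv4']
print(over_limit(["192.0.2.1", "2001:db8::1"]))  # prints []
```

The address list for a real host could be collected from socket.getaddrinfo() for the agent host name, as resolved on the manager host.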
- When you define an alias HostB for an agent host with the real host name HostA, make sure that the same IP address is returned for both the real host name and the alias.
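The alias condition above amounts to comparing two resolution results. The following Python sketch shows one way to make that comparison. The function names and the static resolver are hypothetical and exist only so that the example runs without a network. HostC and all IP addresses are made up for the demonstration.

```python
import socket


def make_static_resolver(table):
    """Build a stand-in for socket.getaddrinfo backed by a dict (demo only)."""
    def resolver(host_name, port):
        if host_name not in table:
            raise socket.gaierror(f"cannot resolve {host_name}")
        return [
            (socket.AF_INET, socket.SOCK_STREAM, 0, "", (ip, 0))
            for ip in table[host_name]
        ]
    return resolver


def addresses_of(host_name, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that host_name resolves to."""
    return {info[4][0] for info in resolver(host_name, None)}


def alias_matches(real_name, alias, resolver=socket.getaddrinfo):
    """Return True if the real host name and the alias resolve to the same addresses."""
    return addresses_of(real_name, resolver) == addresses_of(alias, resolver)


# Hypothetical data: HostB is an alias for HostA; HostC points elsewhere.
demo = make_static_resolver({
    "HostA": ["192.0.2.10"],
    "HostB": ["192.0.2.10"],
    "HostC": ["192.0.2.99"],
})
print(alias_matches("HostA", "HostB", demo))  # prints True
print(alias_matches("HostA", "HostC", demo))  # prints False
```

Omitting the resolver argument makes alias_matches() use the name resolution settings of the host on which it runs, so the check is most useful when run on the manager host where the alias is defined.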