10.6.1 Cautionary notes on cluster system operation
The following are cautionary notes on using a cluster system.
- Organization of this subsection
-
(1) Cautionary notes applicable to all JP1/AJS3 programs
-
(2) Cautionary notes applicable to Windows
-
(3) Cautionary notes applicable to UNIX
-
(4) Cautionary notes on the execution environment for event jobs and custom event jobs
-
(5) Cautionary notes on the execution environment for QUEUE jobs and submit jobs
-
(6) Cautionary notes on the queueless job execution environment
-
(7) Cautionary note on the definition pre-check function
-
(8) Cautionary notes on using a logical host in a non-cluster environment
-
(9) Cautionary notes on the execution environment of flexible jobs
(1) Cautionary notes applicable to all JP1/AJS3 programs
-
JP1/AJS3 can use only logical host names of 32 bytes or less. Accordingly, when you create a logical host in JP1/Base, make sure that its name is 32 bytes or less. If you use the kill command (jajs_killall.cluster command) in UNIX, the first 15 bytes of each logical host name must also be unique. For details, see (3) Cautionary notes applicable to UNIX.
-
When you set up JP1/AJS3 in a cluster system, you need to stop the JP1/AJS3 services running on the physical host and existing logical hosts. If you attempt setup while JP1/AJS3 services are running, the JP1/AJS3 services will no longer operate correctly, and you will need to restart the server.
-
To start multiple instances of JP1/AJS3 in a cluster system, you need a system resource for each logical host on which a JP1/AJS3 instance is started.
-
Only one queueless agent service or queueless file transfer service is assigned on a machine. However, these services are available in a cluster system because separate processing on each logical host is enabled by using the cluster software to move the shared disk and the logical IP address.
-
Some cluster software has error simulation functionality built in. If the simulation functionality is used for JP1/AJS3, the cluster software might assume an error without stopping JP1/AJS3 or without waiting for JP1/AJS3 to stop; such a situation might cause unintended operations, such as the unsuccessful restarting of JP1/AJS3. You can avoid this problem by using the cluster software to adjust the restart interval. Note, however, that you cannot use the error simulation functionality with cluster software that is unable to adjust the restart interval.
-
Some cluster software might monitor the start time or stop time of a JP1/AJS3 service and cause a timeout if the start or stop process is not completed within a specified time period. Because the start and stop times of JP1/AJS3 services vary depending on the environment (for example, the number of scheduler services), adjust the timeout value of the cluster software appropriately for the environment.
To determine the start time or stop time of JP1/AJS3 services, measure the time taken when the service is started or stopped directly (as a service or by command), not the time taken when it is started or stopped through the cluster software.
-
Immediately after a JP1/AJS3 service has stopped, some JP1/AJS3 processes might remain. If the cluster software has been set up to restart JP1/AJS3, the restart might fail. However, you can avoid the problem by increasing the restart interval for the cluster software or the number of times restart is performed.
-
Duplication of the database (ISAM) and internal files used for QUEUE jobs and submit jobs is not supported. Use a RAID disk to ensure reliability of the disk system.
-
A disk mounted on a file system connected to a network, such as NFS, cannot be used as a shared disk in a cluster system.
-
If you set up JP1/AJS3 in a cluster system, the same connection source restriction settings are applied to the physical hosts and logical hosts. If you want to use separate settings for physical hosts and logical hosts, you must change the connection source restriction settings on the logical hosts. Even if the same settings can be used, add a logical host IP address to the connection permission configuration files on both the physical and logical hosts. This is because the logical host IP address is newly assigned to the logical host as the local host IP address. For details about how to change the connection source restriction settings, see 7.11 Changing the setting for restricting connection sources.
-
If the same IP address is used for the physical host and logical host, operation instructions from the logical host might be assumed to be operation instructions from the physical host. To avoid this, for Server host in the user mapping definition, specify the host names of both the physical and logical hosts, or specify an asterisk (*).
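The logical host name constraints stated at the start of this list (32 bytes or less, and a unique first 15 bytes when the jajs_killall.cluster command is used in UNIX) can be checked before setup. The following is a minimal sketch, not part of JP1/AJS3. The function name check_logical_host_names and the choice of UTF-8 for counting bytes are assumptions; use the character encoding of your own environment.

```python
from collections import defaultdict

MAX_NAME_BYTES = 32     # limit stated above for JP1/AJS3 logical host names
KILL_PREFIX_BYTES = 15  # leading bytes that must be unique for jajs_killall.cluster (UNIX)


def check_logical_host_names(names, encoding="utf-8"):
    """Return a list of problem descriptions; an empty list means none were found.

    The encoding used to count bytes is an assumption of this sketch.
    """
    problems = []
    by_prefix = defaultdict(list)
    for name in names:
        raw = name.encode(encoding)
        if len(raw) > MAX_NAME_BYTES:
            problems.append(f"{name}: {len(raw)} bytes exceeds {MAX_NAME_BYTES}")
        # Group names by their first 15 bytes to detect collisions.
        by_prefix[raw[:KILL_PREFIX_BYTES]].append(name)
    for group in by_prefix.values():
        if len(group) > 1:
            problems.append("same first 15 bytes: " + ", ".join(group))
    return problems
```

For example, the hypothetical names cluster-host-0001a and cluster-host-0001b are both within 32 bytes but share their first 15 bytes, so the check reports them as a colliding pair.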
(2) Cautionary notes applicable to Windows
-
During cluster system operation, if a JP1/AJS3 process in a JP1/AJS3 service running on a logical host terminates abnormally, JP1/AJS3 stops all processes rather than continuing in reduced-operation mode. If JP1/AJS3 has been set up to restart a JP1/AJS3 process that has terminated abnormally, the restart settings are disabled.
-
Do not set the JP1_HOSTNAME environment variable as a system environment variable or user environment variable. If you do so, the service might not be able to start. Set the JP1_HOSTNAME environment variable from the command prompt or in a batch file. For details about how to specify a logical host name, see 10.1.1(4) Requirements for a logical host name.
-
If a JP1/AJS3 service on a logical host is stopped by choosing Services in the Windows Control Panel window or by choosing Administrative Tools and then Services, some cluster software might assume an error without waiting for JP1/AJS3 to stop. This might cause unintended operations, such as the unsuccessful restarting of JP1/AJS3.
(3) Cautionary notes applicable to UNIX
-
During cluster system operation, if a JP1/AJS3 process in a JP1/AJS3 service running on a logical host terminates abnormally, terminate all processes rather than continuing in reduced-operation mode. If JP1/AJS3 has been set up to restart a JP1/AJS3 process that has terminated abnormally, cancel the restart setting, because otherwise the restart takes precedence. For details about how to set the restart, see 6.3.1 Restarting an abnormally terminated JP1/AJS3 process.
If a JP1/AJS3 process in a JP1/AJS3 service started on a logical host with the -HA option specified terminates abnormally, JP1/AJS3 terminates all processes rather than continuing in reduced-operation mode. If JP1/AJS3 has been set up to restart a JP1/AJS3 process that has terminated abnormally, the restart settings are disabled.
-
To start or stop the physical host in an environment in which the JP1_HOSTNAME environment variable has been set, use a shell in which the JP1_HOSTNAME environment variable is temporarily deleted. For details about how to set up automatic start and termination, see 15.10.1(10) Setting automatic startup and termination of the JP1/AJS3 service that do not depend on the JP1_HOSTNAME environment variable in the JP1/Automatic Job Management System 3 Configuration Guide.
-
When using the kill command (jajs_killall.cluster command) in UNIX, specify a logical host name whose first 15 bytes are unique. This command checks only the first 15 bytes of the logical host name, and kills the corresponding processes. If there are logical hosts that have names of 16 or more bytes, all logical hosts whose names have the same first 15 bytes are subject to the kill operation.
-
In AIX, when a memory shortage occurs, the system issues SIGKILL and the JP1/AJS3 process might be terminated. To prevent this problem, set the environment variables as follows for the physical and logical hosts that are used for JP1/AJS3, and then start JP1/AJS3:
-
PSALLOC=early
-
NODISCLAIM=true
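The environment variable handling described in this list (JP1_HOSTNAME removed when the physical host is started or stopped, and PSALLOC and NODISCLAIM set in AIX) can be prepared in one place by a start script. The following is a minimal sketch under that assumption. The function name build_start_env and its parameters are hypothetical, and no JP1/AJS3 command is invoked.

```python
AIX_MEMORY_SETTINGS = {"PSALLOC": "early", "NODISCLAIM": "true"}


def build_start_env(base_env, logical_host=None, aix=False):
    """Return a copy of base_env prepared for starting or stopping JP1/AJS3.

    logical_host=None means the physical host, so JP1_HOSTNAME is removed.
    """
    env = dict(base_env)            # never modify the caller's environment
    env.pop("JP1_HOSTNAME", None)   # drop any inherited logical host name
    if logical_host is not None:
        env["JP1_HOSTNAME"] = logical_host
    if aix:
        env.update(AIX_MEMORY_SETTINGS)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when the start or stop command is launched, which leaves the environment of the calling shell unchanged.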
(4) Cautionary notes on the execution environment for event jobs and custom event jobs
-
When a cluster environment is set up, the FileWriteMode environment setting parameter, which defines the file update mode of the logical host, is set to sync (synchronous update mode), the recommended value. To use asynchronous update mode, use the jajs_config command to change the value of the FileWriteMode environment setting parameter to nosync after setting up a cluster environment.
For details about the FileWriteMode environment setting parameter, see 20.6.2(2) FileWriteMode (for manager process) in the JP1/Automatic Job Management System 3 Configuration Guide or 20.6.2(21) FileWriteMode (for agent process) in the JP1/Automatic Job Management System 3 Configuration Guide.
-
When a cluster environment is set up, the EVProcessHA environment setting parameter, which defines the detailed process termination option, is set to Y, the recommended value. When a detailed process of event/action control terminates, you might want operations to continue in reduced-operation mode without terminating the event/action control agent process. In such a case, use the jajs_config command to change the value of the EVProcessHA environment setting parameter to N after setting up a cluster environment.
For details about the EVProcessHA environment setting parameter, see 20.6.2(22) EVProcessHA in the JP1/Automatic Job Management System 3 Configuration Guide.
-
If the mail system linkage function that uses Outlook is enabled, a mail system can be linked with only one instance of JP1/AJS3 on the physical host or logical host. Note that the environment setting parameters for the mail system linkage function that uses Outlook must be defined on the physical host even if you link the mail system with JP1/AJS3 on the logical host. If you want to execute a mail reception monitoring job for the mail system linkage function on a UNIX host, define the ExecMode environment setting parameter on the physical host and other environment setting parameters on the logical host.
For details about environment setting parameters, see 2.3.4 Setting up the environment for the mail system linkage in the JP1/Automatic Job Management System 3 Linkage Guide (for Windows) or 2.4.2 Setting up the environment for an email reception monitoring job in the JP1/Automatic Job Management System 3 Linkage Guide (for UNIX).
Note that, for the mail system linkage function that uses Outlook or that runs on a UNIX host, linkage with a mail system on the standby node is not possible.
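The FileWriteMode and EVProcessHA notes above both follow the same pattern: cluster setup applies a recommended value, and a different value is set afterward with the jajs_config command. The following is a minimal sketch that works out which parameters need such a change. The function name changes_needed is hypothetical, the allowed values are limited to those named in this subsection, and the jajs_config command line itself is not constructed here; see the Configuration Guide for its syntax and definition keys.

```python
# Values applied when a cluster environment is set up (from the notes above).
CLUSTER_DEFAULTS = {"FileWriteMode": "sync", "EVProcessHA": "Y"}

# Only the values named in this subsection; an assumption of this sketch.
ALLOWED_VALUES = {
    "FileWriteMode": {"sync", "nosync"},
    "EVProcessHA": {"Y", "N"},
}


def changes_needed(desired):
    """Return the parameters whose desired value differs from the cluster default."""
    changes = {}
    for name, value in desired.items():
        if name not in ALLOWED_VALUES:
            raise KeyError(f"unknown parameter: {name}")
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(f"{name}: unsupported value {value!r}")
        if value != CLUSTER_DEFAULTS[name]:
            changes[name] = value
    return changes
```

For example, requesting asynchronous update mode while keeping EVProcessHA at Y yields only the FileWriteMode change, which is then the one parameter to set with jajs_config.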
(5) Cautionary notes on the execution environment for QUEUE jobs and submit jobs
-
During cluster system operation, if you stop a JP1/AJS3 service while a job is running on the primary node, the job is killed, and operation switches to the secondary node. However, the secondary node does not immediately recognize that the killed job has ended. A few minutes will be required for the status of the job to change to the ended status.
-
When using the jpqreguser command to register VOS3 user information to link with JP1/OJE for VOS3, you need to register the user information on both the primary node and the secondary node. If you have added, changed, or deleted user information on the primary node, you also need to add, change, or delete the same user information on the secondary node. To do so, use the procedure in the cautionary notes on operation in a cluster system in the description of the jpqreguser command in 2. Commands Used during Setup in the manual JP1/Automatic Job Management System 3 Command Reference.
(6) Cautionary notes on the queueless job execution environment
For cautionary notes on the queueless job execution environment, see 8.2.7(3) Notes on automatic attachment and detachment of logical hosts performed when queueless jobs are used in the JP1/Automatic Job Management System 3 Configuration Guide (in Windows) or 17.2.7 Setting up the queueless job execution environment in the JP1/Automatic Job Management System 3 Configuration Guide (in UNIX).
(7) Cautionary note on the definition pre-check function
For cautionary notes on the definition pre-check function, see 8. Definition Pre-Check in the JP1/Automatic Job Management System 3 System Design (Work Tasks) Guide.
(8) Cautionary notes on using a logical host in a non-cluster environment
Because a logical host in a non-cluster system does not inherit the management information on the shared disk, it cannot be failed over. For this reason, do not use such a logical host in a multiple-host environment where a logical host IP is passed from one host to another.
(9) Cautionary notes on the execution environment of flexible jobs
-
Destination agents and broadcast agents do not support cluster configurations.
For details about the availability of the destination agent and broadcast agent, see 2.9 Executing jobs in a cloud environment in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide or 2.10 Considerations for executing a job by broadcast execution in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide.
-
In a configuration that does not use a relay agent, if a failover occurs in JP1/AJS3 - Manager while a flexible job is being executed on another host, the status of the flexible job becomes Killed. In a configuration that uses a relay agent, the execution of flexible jobs continues (the jobs remain in the Now running status). Therefore, we recommend that you use a relay agent if you want to use flexible jobs in a cluster system.