Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 System Design (Work Tasks) Guide


7.6.8 Notes on restarting the JP1/AJS3 service while event jobs are running

If you perform any of the operations listed below while event jobs (including those with a start condition) are running, communication and correlation processing take place between the scheduler control and event/action control functions to ensure consistency in job statuses between the functions. If a large number of event jobs are in Now running status, the amount of data to be processed places a considerable load on the system.

Operations that result in high system load
  • Stopping the scheduler service, and then performing a warm or hot start.

  • Stopping the JP1/AJS3 service on the manager host, and then performing a warm or hot start.

  • Stopping and then restarting the JP1/AJS3 service on the agent host.

  • Executing the jajs_maintain command.

Use the following workarounds to avoid placing a high load on the system:

Workarounds
  • Before you perform any of the above operations, forcibly terminate any jobnets that use event jobs. Register them for execution again when the operation is complete.

  • Before you perform any of the above operations, forcibly terminate all event jobs. Re-execute them when the operation is complete.

  • Do not run any jobs for 30 minutes to an hour after performing the operation.

  • Do not trigger any monitored events for 30 minutes to an hour after performing the operation.
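The first workaround imposes a fixed order: terminate, operate, then register again. The following Python sketch shows only that ordering. It assumes three hypothetical callables, `terminate`, `operate`, and `reregister`, which stand in for whatever procedure or JP1/AJS3 commands your site actually uses. It is not a JP1/AJS3 API. The second workaround has the same shape, applied to individual event jobs instead of jobnets.

```python
from typing import Callable, Iterable, List


def run_with_event_jobnets_stopped(
    jobnets: Iterable[str],
    terminate: Callable[[str], None],   # hypothetical: forcibly terminates one jobnet
    operate: Callable[[], None],        # hypothetical: e.g. restart the JP1/AJS3 service
    reregister: Callable[[str], None],  # hypothetical: registers one jobnet for execution
) -> List[str]:
    """Sketch of workaround 1: stop event-job jobnets, operate, register them again."""
    stopped = list(jobnets)
    # Forcibly terminate every jobnet that uses event jobs first, so that few or
    # no event jobs are in Now running status when the status reconciliation runs.
    for name in stopped:
        terminate(name)
    # Perform the high-load operation (service restart, jajs_maintain, and so on).
    operate()
    # When the operation is complete, register the same jobnets for execution again.
    for name in stopped:
        reregister(name)
    return stopped
```

Returning the list of stopped jobnets lets a caller log exactly which jobnets were re-registered.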

The following problems can occur when a high load is placed on the system.

Problems resulting from a high system load
  1. Event jobs (including those with a start condition) registered for execution immediately after you perform one of the above operations take a long time to enter Now running status.#1

  2. When you forcibly terminate an event job in Now running status, or a jobnet with a start condition in Now monitoring status, it takes a long time to terminate.#1

  3. When you change the status of an event job in Now running status, the change takes a long time to take effect.#1

  4. It takes a long time for an event to be detected when a monitoring condition is satisfied.#1

  5. Event jobs (including those with a start condition) registered for execution immediately after you perform one of the above operations remain in Queuing status.#2

  6. An event job in Now running status, or a jobnet with a start condition in Now monitoring status, does not terminate when you attempt to forcibly terminate it.#2

  7. A monitoring condition is satisfied but no event is detected.#2

#1

Problems 1 through 4 can occur when you perform an operation that results in a high system load while the event/action control manager is near its resource limits. For details about the resource limits that apply to event/action control, see B.8 Limits for the event/action control in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide.

#2

Problems 5 through 7 can occur when you perform an operation that results in a high system load after the event/action control manager has exceeded its resource limits. For details about the resource limits that apply to event/action control, see B.8 Limits for the event/action control in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide.

If you perform an operation that results in a high system load while a large number of event jobs are running, a large volume of communication takes place between the scheduler control and event/action control functions. This increases the number of unreported items of information, which are generated and kept when a communication error occurs. JP1/AJS3 limits the number of unreported items that can be kept, so that processing this data does not monopolize system resources and delay job execution and event detection. When the number of unreported items reaches the limit, items are deleted starting with the oldest. Depending on the content of the deleted data, any of problems 5 through 7 can occur as a result.
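The deletion policy described above behaves like a fixed-capacity, first-in first-out buffer. The following Python sketch is a hypothetical model for illustration only. It is not JP1/AJS3 code, the class and item names are invented, and `limit` is an arbitrary value because the real limit is not disclosed.

```python
from collections import deque
from typing import Deque, List


class UnreportedItemStore:
    """Hypothetical model of a bounded store that drops its oldest items first."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        # deque(maxlen=...) discards from the left end when it is full, so
        # appending on the right always evicts the oldest item.
        self._items: Deque[str] = deque(maxlen=limit)
        # Items that were evicted and will never be reported (illustration only).
        self.dropped: List[str] = []

    def add(self, item: str) -> None:
        if len(self._items) == self._items.maxlen:
            # The store is full: remember which (oldest) item is about to be lost.
            self.dropped.append(self._items[0])
        self._items.append(item)

    def pending(self) -> List[str]:
        return list(self._items)
```

For example, with `limit=3`, adding five items `status-update-0` through `status-update-4` leaves the last three pending and drops `status-update-0` and `status-update-1`. In this model a dropped item is simply lost. That is why the effect depends on what the deleted data contained, and why problems 5 through 7 need manual recovery rather than waiting.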

The upper limit on the number of unreported items of information that can be kept is not disclosed to JP1/AJS3 users. Instead, the resource limits for event/action control are calculated from the number of unreported items that an operation resulting in a high system load generates. When using JP1/AJS3, therefore, use the event/action control limits as your guideline and be careful not to exceed them.

If any of the problems mentioned above occurs, take the following remedial action:

Recovery procedure
For problems 1 to 4

Wait until the processing finishes. This can take 30 minutes to an hour, depending on the number of jobs affected.

For problem 5

Forcibly terminate the event job or jobnet in question, and then re-register it for execution.

For problem 6

For an event job, change the status of the job and terminate it. For a jobnet with a start condition, forcibly terminate the jobnet again.

For problem 7

Forcibly terminate the event job or jobnet in question, and then re-register it for execution. Then, generate the event again.
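The problem list, its two footnotes, and the recovery procedure together map each problem number to a likely cause and a remedy. The following Python sketch encodes that mapping, for example for use in a runbook tool. The names `Guidance`, `GUIDANCE`, and `recovery_for` are illustrative, and the strings paraphrase this section.

```python
from typing import Dict, NamedTuple

NEAR = "Event/action control manager near its resource limits (#1)"
EXCEEDED = "Event/action control manager has exceeded its resource limits (#2)"

WAIT = "Wait until the processing finishes (30 minutes to an hour, depending on the number of jobs)."
REREGISTER = "Forcibly terminate the event job or jobnet, and then re-register it for execution."
RETERMINATE = ("For an event job, change the status of the job and terminate it. "
               "For a jobnet with a start condition, forcibly terminate the jobnet again.")


class Guidance(NamedTuple):
    condition: str  # footnote condition under which the problem can occur
    recovery: str   # remedial action from the recovery procedure


GUIDANCE: Dict[int, Guidance] = {
    1: Guidance(NEAR, WAIT),
    2: Guidance(NEAR, WAIT),
    3: Guidance(NEAR, WAIT),
    4: Guidance(NEAR, WAIT),
    5: Guidance(EXCEEDED, REREGISTER),
    6: Guidance(EXCEEDED, RETERMINATE),
    7: Guidance(EXCEEDED, REREGISTER + " Then, generate the event again."),
}


def recovery_for(problem: int) -> str:
    """Return the remedial action for problem numbers 1 through 7."""
    try:
        return GUIDANCE[problem].recovery
    except KeyError:
        raise ValueError(f"unknown problem number: {problem}") from None
```

Problems 1 through 4 resolve themselves once the backlog is processed. Problems 5 through 7 need an explicit action, because the information they depend on may already have been deleted.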