Nonstop Database, HiRDB Version 9 System Operation Guide

20.17.3 Unit-restarting procedure (restarting the unit after identifying the cause)

The table below lists the causes of system log file shortage. Check whether any of the causes described in this table is applicable.

There might be more than one cause. Investigate all causes that satisfy the conditions.

Table 20-21 Causes of system log file shortage

No. | Causes of system log file shortage | Location where investigation method is explained | Remarks
1 | Because system log files were not unloaded or their status was not changed, the number of files waiting to be unloaded increased, causing a system log file shortage. | (2) Determine the number of files waiting to be unloaded | If release of checking of system log file unload status is used, no investigation is required.
2 | Because the automatic log unloading facility stopped, the number of files waiting to be unloaded increased, causing a system log file shortage. | (2) Determine the number of files waiting to be unloaded | If the automatic log unloading facility is not being used, no investigation is required.
3 | Because transactions were not committed for a long time, the number of files that are overwrite disabled increased, causing a system log file shortage. | (3) Determine the number of files that are overwrite disabled | None
4 | Because synchronization point dump validation processing was skipped, the number of files that are overwrite disabled increased, causing a system log file shortage. | (3) Determine the number of files that are overwrite disabled | None
5 | During updatable online reorganization, the number of files in overwriting denied status for online reorganization increased, causing a system log file shortage. | (4) Determine the number of files that are in the overwriting denied status for online reorganization | If updatable online reorganization is not being performed, no investigation is required.
6 | Because an error occurred in a system log file, there was no file that could be made a swappable target, and this caused a system log file shortage. | (5) Check whether the KFPS01202-E message has been output | None
7 | Because the system log extraction process fell behind during linkage with HiRDB Datareplicator, the number of files in extracting status increased, causing a system log file shortage. | (6) Determine the number of files that are in extracting status | If HiRDB Datareplicator is not used, no investigation is required.
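
The Remarks column can be read as a filter that decides which causes need to be investigated. The following Python sketch illustrates that reading only. The four flag names are hypothetical descriptions of the environment, not HiRDB operands, and must be set by the administrator.

```python
def investigations_needed(unload_check_released: bool,
                          auto_unload_used: bool,
                          online_reorg_running: bool,
                          datareplicator_used: bool) -> list[int]:
    """Return the cause numbers from Table 20-21 that still need investigating.

    All four parameters are hypothetical flags introduced for illustration.
    """
    causes = []
    if not unload_check_released:   # No. 1: skipped if unload-status checking is released
        causes.append(1)
    if auto_unload_used:            # No. 2: skipped if automatic log unloading is not used
        causes.append(2)
    causes += [3, 4]                # Remarks "None": always investigate
    if online_reorg_running:        # No. 5: skipped if no updatable online reorganization
        causes.append(5)
    causes.append(6)                # Remarks "None": always investigate
    if datareplicator_used:         # No. 7: skipped if HiRDB Datareplicator is not used
        causes.append(7)
    return causes
```

For example, a system that uses none of the optional facilities but has released unload-status checking only needs to investigate causes 3, 4, and 6.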

The following subsections explain how to restart the unit after you have identified the cause of system log file shortage.

Organization of this subsection
(1) Identify the back-end server where the system log file shortage occurred
(2) Determine the number of files waiting to be unloaded
(3) Determine the number of files that are overwrite disabled
(4) Determine the number of files that are in the overwriting denied status for online reorganization
(5) Check whether the KFPS01202-E message has been output
(6) Determine the number of files that are in extracting status
(7) Eliminate the identified cause

(1) Identify the back-end server where the system log file shortage occurred

See the KFPS01220-E message output to syslogfile or the message log file to identify the back-end server where the system log file shortage occurred.

Example

 
KFPS01220-E Request to swap sys(bes1) log file unable to be executed
            because there is no standby log file group available.
 

The name in parentheses following sys indicates the back-end server where the system log file shortage occurred. In this example, the system log file shortage occurred at back-end server bes1.
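
When the log contains many messages, the server name can be picked out mechanically. The following Python sketch relies only on the message layout shown in the example above (the server name in parentheses right after sys). The function name is hypothetical, and how the log text is read from syslogfile or the message log file is left to the reader.

```python
import re

# Pattern derived only from the example message: the server name is the text
# in parentheses immediately after "sys" on the KFPS01220-E line.
_SERVER = re.compile(r"KFPS01220-E\b.*?\bsys\(([^)]+)\)")

def shortage_servers(log_text: str) -> list[str]:
    """Return server names found in KFPS01220-E messages, in order, without duplicates."""
    seen = []
    for name in _SERVER.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen

sample = ("KFPS01220-E Request to swap sys(bes1) log file unable to be executed\n"
          "            because there is no standby log file group available.\n")
print(shortage_servers(sample))  # ['bes1']
```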

(2) Determine the number of files waiting to be unloaded

Execute the pdlogls command to check the status of the system log files of the back-end server identified in step (1).

Example

[Figure]

Determine the cause from the number of files waiting to be unloaded. If the following conditional expression is satisfied, it can be concluded that the number of files waiting to be unloaded increased, causing a system log file shortage:

(A + 1) ≥ ⌈B ÷ 3⌉

A: Number of system log files that are waiting to be unloaded and which have the same Run ID as the current file

B: Number of system log files that have the same Run ID as the current file

In the above example, the files that satisfy condition A are the four files log003 to log006. The files that satisfy condition B are the six files log002 to log007. When these numbers are substituted into the formula, the conditional expression is satisfied, since 5 ≥ ⌈6 ÷ 3⌉ = 2. Therefore, it can be concluded that an increase in the number of files waiting to be unloaded caused a system log file shortage.

If the conditional expression is satisfied when the automatic log unloading facility is being used, check whether the facility is stopped. To do so, check whether the KFPS01150-E message was output to syslogfile or the message log file before the unit terminated abnormally. If the KFPS01150-E message was output, the automatic log unloading facility is stopped, and it can be concluded that this caused the system log file shortage.
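
The check in this step can be sketched in Python. The sketch assumes that the conditional expression compares A + 1 against B divided by 3 and rounded up, using "greater than or equal to". The comparison operator and the rounding direction are assumptions, and the function names are hypothetical. The counts A and B must be read from the pdlogls output by hand.

```python
import math

def unload_wait_is_cause(waiting: int, same_run_id: int) -> bool:
    """Evaluate (A + 1) >= ceil(B / 3).

    waiting     -- A: files waiting to be unloaded with the same Run ID as the current file
    same_run_id -- B: files with the same Run ID as the current file
    The >= comparison and the rounding up are assumptions, not confirmed by the manual text.
    """
    return (waiting + 1) >= math.ceil(same_run_id / 3)

def auto_unload_stopped(messages: list[str]) -> bool:
    """True if KFPS01150-E appears in the messages logged before the abnormal termination."""
    return any("KFPS01150-E" in line for line in messages)

# Values from the example: A = 4 (log003 to log006), B = 6 (log002 to log007).
print(unload_wait_is_cause(4, 6))  # True
```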

(3) Determine the number of files that are overwrite disabled

The procedure for determining the number of files that are overwrite disabled is explained using an example.

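If the number of guaranteed-valid generations for synchronization point dump files is 1 (1 is specified for the pd_spd_assurance_count operand, or specification is omitted), use the following method:
• (a) Example 1: Method for determining the number of files that are overwrite disabled from message KFPS01229-I

If the number of guaranteed-valid generations for synchronization point dump files is 2 (2 is specified for the pd_spd_assurance_count operand), use either of the methods listed below. These methods determine the number of system log files that were overwrite disabled by going back to the validation point of the synchronization point dump of one generation earlier.
• (b) Example 2: Method for determining the number of files that are overwrite disabled from the synchronization point dump validation completion message (KFPS02183-I)
• (c) Example 3: Method for determining the number of files that are overwrite disabled from the synchronization point dump validation skip message (KFPS02179-I)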

(a) Example 1: Method for determining the number of files that are overwrite disabled from message KFPS01229-I (showing the system logs that were input during a restart)

Determine whether the system log file shortage was caused by an increase in the number of files that are overwrite disabled. Below, the number of guaranteed-valid generations for synchronization point dump files is assumed to be 1.

Procedure
  1. Execute the pdlogls command to check the information on the system log files of the back-end server identified in step (1).

    [Figure]

    Check the following types of information:
    • Files that have the same Run ID as the current file
    • Gen No. (file generation number) of the above files
    In this example, log002 to log007 have the same Run ID as the current file. The generation numbers of the individual files are as follows:
    log002: Generation number 1
    log003: Generation number 2
    log004: Generation number 3
    log005: Generation number 4
    log006: Generation number 5
    log007: Generation number 6 (current file)
  2. See the KFPS01229-I message to check the information on the system log files that become the input files during unit restart.

    [Figure]

    This message shows the information on the system log files that become input files during unit restart. In this example, the system log file with a generation number of 3 is the leading system log file to be input during unit restart.
    It can therefore be seen that system log files with a generation number of 3 or greater (log004 to log007) are overwrite disabled.
    Reference note
    The system log files that become input files during the database recovery processing executed during unit restart are overwrite disabled.
    Note
    In the following cases, since the unit that was to be restarted terminated abnormally again, the KFPS01220-E message (indicating a system log file shortage) is output two or more times:
    • AUTO or MANUAL1 is specified for the pd_mode_conf operand.
    • The pdstart command was executed after the unit terminated abnormally due to a system log file shortage.
    In these cases, check the KFPS01229-I message that was output before the first KFPS01220-E message (the one output when the unit terminated abnormally the first time).
  3. Determine the cause from the number of files that are overwrite disabled.
    If the following conditional expression is satisfied, it can be concluded that the number of files that are overwrite disabled increased, causing a system log file shortage:
    A ≥ ⌈B ÷ C⌉
    A: Number of system log files that are overwrite disabled and that have the same Run ID as the current file
    B: Number of system log files that have the same Run ID as the current file
    C: Either of the following values:
    • When HiRDB Text Search Plug-in is used or abnormal termination occurred during updatable online reorganization: 4
    • All other cases: 3
    In this example, the files that satisfy condition A are the four files log004 to log007. The files that satisfy condition B are the six files log002 to log007. When these numbers are substituted into the formula, the conditional expression is satisfied, since 4 ≥ ⌈6 ÷ 3⌉ = 2. Therefore, it can be concluded that an increase in the number of files that are overwrite disabled caused a system log file shortage.
  4. See the KFPS02179-I message to determine the cause of the increase in the number of files that are overwrite disabled.

    [Figure]

    The KFPS02179-I message is output when a synchronization point dump validation process is skipped. Since a message with factor code=A01-02 is output several times, it can be concluded that the number of files that are overwrite disabled increased because transactions were not committed for a long time, and that this caused a system log file shortage.
    If KFPS02179-I messages with different factor codes (factor code) are output, use the factor code that appears most often to identify the cause.
    If the factor code is A01-02, it can be concluded that the number of files that are overwrite disabled increased because transactions were not committed for a long time, causing a system log file shortage.
    If the factor code is not A01-02, it can be concluded that the number of files that are overwrite disabled increased because synchronization point dump validation processing was skipped, which caused a system log file shortage.
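
Two parts of Example 1 lend themselves to a short Python sketch: listing the overwrite-disabled files from the generation numbers (steps 1 and 2), and choosing the cause from the most frequent factor code (step 4). The function names are hypothetical, and the generation numbers and factor codes are assumed to have been copied by hand from the pdlogls output and the KFPS02179-I messages. If two factor codes are equally frequent, this sketch picks the one seen first, which is a choice the manual does not specify.

```python
from collections import Counter

def overwrite_disabled(generations: dict[str, int], first_input_gen: int) -> list[str]:
    """Files whose generation number is >= the leading generation input at restart."""
    return sorted(name for name, gen in generations.items() if gen >= first_input_gen)

def cause_from_factor_codes(codes: list[str]) -> str:
    """Map the most frequent KFPS02179-I factor code to a cause (codes must be non-empty)."""
    most_common, _ = Counter(codes).most_common(1)[0]
    if most_common == "A01-02":
        return "transactions were not committed for a long time"
    return "synchronization point dump validation processing was skipped"

# Generation numbers from the example; KFPS01229-I shows generation 3 as the leading input file.
gens = {"log002": 1, "log003": 2, "log004": 3, "log005": 4, "log006": 5, "log007": 6}
print(overwrite_disabled(gens, 3))  # ['log004', 'log005', 'log006', 'log007']
```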
(b) Example 2: Method for determining the number of files that are overwrite disabled from the synchronization point dump validation completion message (KFPS02183-I)

Determine whether the system log file shortage was caused by an increase in the number of files that are overwrite disabled. Below, the number of guaranteed-valid generations for synchronization point dump files is assumed to be 2.

Procedure
  1. Execute the pdlogls command to check the information on the system log files of the back-end server identified in step (1).

    [Figure]

    Check the following types of information:
    • Files that have the same Run ID as the current file
    • Gen No. (file generation number) of the above files
    In this example, log002 to log007 have the same Run ID as the current file. The generation numbers of the individual files are as follows:
    log002: Generation number 1
    log003: Generation number 2
    log004: Generation number 3
    log005: Generation number 4
    log006: Generation number 5
    log007: Generation number 6 (current file)
  2. See the KFPS02183-I message to check the information on the system log files that are overwrite disabled.

    [Figure]

    This message is output when synchronization point dump validation is completed. Since the number of guaranteed-valid generations for synchronization point dump files is 2, system log files corresponding to the synchronization point dump up to two generations earlier are overwrite disabled. Since log004 (with a generation number of 3) is shown in the validation completion message for the synchronization point dump that is one generation earlier than the latest, it can be seen that system log files with a generation number of 3 or greater (log004 to log007) are overwrite disabled.
    Note
    In the following cases, because the unit that was to be restarted terminated abnormally again, the KFPS01220-E message (indicating a system log file shortage) is output two or more times:
    • AUTO or MANUAL1 is specified for the pd_mode_conf operand.
    • The pdstart command was executed after the unit terminated abnormally due to a system log file shortage.
    In these cases, check the KFPS02183-I message that was output before the first KFPS01220-E message (the one output when the unit terminated abnormally the first time).

The steps to be taken from this point on are the same as steps 3 and beyond in Example 1: Method for determining the number of files that are overwrite disabled from message KFPS01229-I (showing the system logs that were input during a restart).

(c) Example 3: Method for determining the number of files that are overwrite disabled from the synchronization point dump validation skip message (KFPS02179-I)

Determine whether the system log file shortage was caused by an increase in the number of files that are overwrite disabled. Below, the number of guaranteed-valid generations for synchronization point dump files is assumed to be 2.

Procedure
  1. Execute the pdlogls command to check the information on the system log files of the back-end server identified in step (1).

    [Figure]

    Check the following types of information:
    • Files that have the same Run ID as the current file
    • Gen No. (file generation number) of the above files
    In this example, log002 to log007 have the same Run ID as the current file. The generation numbers of the individual files are as follows:
    log002: Generation number 1
    log003: Generation number 2
    log004: Generation number 3
    log005: Generation number 4
    log006: Generation number 5
    log007: Generation number 6 (current file)
  2. Confirm that the KFPS02179-I message has been output.

    [Figure]

    A KFPS02179-I message showing a validation skip count (number of skip) of 1 was output while log004, the system log file whose system logs began being input during the restart, was allocated as the current file. Therefore, check whether the KFPS02179-I message was also output while log003, the file from one generation earlier, was allocated as the current file, and identify the system log files that were overwrite disabled.
    In this example, the KFPS02179-I message was not output while log003 was allocated as the current file. Therefore, the system log files with a generation number equal to or greater than that of log002 (generation number of 1), which is the system log file allocated two generations earlier than the system log file whose system logs began being input during restart, are overwrite disabled.
    It can be seen that system log files with a generation number of 1 or greater (log002 to log007) are overwrite disabled.
    Note
    In the following cases, because the unit that was to be restarted terminated abnormally again, the KFPS01220-E message (indicating a system log file shortage) is output two or more times:
    • AUTO or MANUAL1 is specified for the pd_mode_conf operand.
    • The pdstart command was executed after the unit terminated abnormally due to a system log file shortage.
    In these cases, check the KFPS02179-I message that was output before the first KFPS01220-E message (the one output when the unit terminated abnormally the first time).

The steps to be taken from this point on are the same as steps 3 and beyond in Example 1: Method for determining the number of files that are overwrite disabled from message KFPS01229-I (showing the system logs that were input during a restart).

Concept behind step 2 in the above procedure (supplement)
First, look for the KFPS01221-I message, which indicates that the system log file that began being input during the restart was allocated as the current file.
Next, look for a KFPS02179-I message that shows a validation skip count (number of skip) of 1 and that was output while the system log file that began being input during the restart was allocated as the current file.
If the KFPS02179-I message was not output, the system log file that is one generation older than the system log file that began being input during the restart is the file that was overwrite disabled.
If the KFPS02179-I message was output, check whether it was output while the system log file that is one generation older than the system log file that began being input during the restart was being used as the current file.
If it was not output under those circumstances, the system log file that is two generations older than the system log file that began being input during the restart was the system log file that was overwrite disabled.
If the KFPS02179-I message was output while the system log file that is one generation older than the system log file that began being input during the restart was being used as the current file, check the validation skip count. From there, go back to the KFPS02179-I message that shows a validation skip count (number of skip) of 1. The system log files, starting with the one that was being used immediately before this message was output, were overwrite disabled.

(4) Determine the number of files that are in the overwriting denied status for online reorganization

Execute the pdlogls command to check the status of the system log files of the back-end server identified in step (1).

Example

[Figure]

Determine the cause from the number of files that were in the overwriting denied status for online reorganization. If the following conditional expression is satisfied, it can be concluded that the number of files that were in the overwriting denied status for online reorganization increased, causing a system log file shortage:

(A + 1) ≥ ⌈B ÷ 4⌉

A: Number of system log files that were in the overwriting denied status for online reorganization and that have the same Run ID as the current file

B: Number of system log files that have the same Run ID as the current file

In the example above, the files that satisfy condition A are the four files log003 to log006. The files that satisfy condition B are the six files log002 to log007. When these numbers are substituted into the formula, the conditional expression is satisfied, since 5 ≥ ⌈6 ÷ 4⌉ = 2. Therefore, it can be concluded that an increase in the number of files that were in the overwriting denied status for online reorganization caused a system log file shortage.

(5) Check whether the KFPS01202-E message has been output

Check whether the KFPS01202-E message was output to syslogfile or the message log file immediately before the unit terminated abnormally due to a system log file shortage.

Example

[Figure]

If the KFPS01220-E message was output immediately following the KFPS01202-E message, as in this example, it can be concluded that an error occurred in the file that could be made a swappable target, causing a system log file shortage.
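
The "immediately following" check can be sketched in Python as a scan over the logged lines. The sketch assumes that each message line contains its message ID in the form seen in this manual (for example, KFPS01202-E) and that lines without an ID are continuation lines. The function name is hypothetical, and only messages with KFPS IDs are considered when deciding what counts as adjacent.

```python
import re

# Message ID form taken from the IDs that appear in this section (KFPSnnnnn-X).
_MSG_ID = re.compile(r"\bKFPS\d{5}-[A-Z]\b")

def log_file_error_preceded_shortage(lines: list[str]) -> bool:
    """True if a KFPS01220-E message directly follows a KFPS01202-E message."""
    # Keep the first message ID on each line; lines without an ID are skipped.
    ids = [m.group(0) for line in lines if (m := _MSG_ID.search(line))]
    return any(prev == "KFPS01202-E" and cur == "KFPS01220-E"
               for prev, cur in zip(ids, ids[1:]))
```

A result of True corresponds to the situation in the example, where the system log file error is concluded to be the cause.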

(6) Determine the number of files that are in extracting status

Execute the pdlogls command to check the status of the system log files of the back-end server identified in step (1).

Example

[Figure]

Determine the cause from the number of files that are in extracting status. If the following conditional expression is satisfied, it can be concluded that the number of files that are in extracting status increased because system log extraction could not keep up, causing a system log file shortage:

(A + 1) ≥ ⌈(B × 3) ÷ 4⌉

A: Number of system log files that are in extracting status and that have the same Run ID as the current file

B: Number of system log files that have the same Run ID as the current file

In the above example, the files that satisfy condition A are the four files log003 to log006. The files that satisfy condition B are the six files log002 to log007. When these numbers are substituted into the formula, the conditional expression is satisfied, since 5 ≥ ⌈(6 × 3) ÷ 4⌉ = 5. Therefore, it can be concluded that an increase in the number of files that are in extracting status caused a system log file shortage.

(7) Eliminate the identified cause

See 20.17.4 Unit-restarting procedure (after the cause of the system log file shortage has been identified) and eliminate the identified cause. Then, restart the unit.