17.2.1 Troubleshooting problems related to setup and service startup
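- Organization of this subsection
-
(1) A Performance Management program service does not start.
-
(2) A service takes a long time to start once startup is requested.
-
(3) Immediately after a Performance Management program service is stopped, another program starts a service and communication is not performed properly.
-
(4) After the message "The disk capacity is insufficient" is output, the Master Store service or Agent Store service stops.
-
(5) The Correlator service takes a long time to start after PFM - Manager restarts.
-
(6) The Agent Collector service or Remote Monitor Collector service does not start.
-
(7) Multiple agents that start simultaneously take a long time to recover from stand-alone mode.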
(1) A Performance Management program service does not start.
Possible causes and solutions:
-
PFM - Manager is stopped.
If PFM - Manager and PFM - Agent or PFM - RM are installed on the same host, the PFM - Agent or PFM - RM service cannot start while PFM - Manager is stopped. Check whether the PFM - Manager service is running, and if it is not, start it. For details on service startup, see 1. Starting and Stopping Performance Management.
-
The same port number is set for multiple Performance Management program services.
When the same port number is set for multiple Performance Management program services, none of the Performance Management program services can start. By default, port numbers are allocated automatically, so duplicates do not occur. If you fixed the port numbers during Performance Management setup, check the port number settings and correct any port number that is assigned to more than one service. For details on how to set a port number, see the chapter that describes installation and setup in the Job Management Partner 1/Performance Management Planning and Configuration Guide.
-
There is an error in a Store database directory setting.
If any of the following directories is set to a directory that does not exist or cannot be accessed, the Agent Store or Remote Monitor Store service cannot start. Review the directory names and attributes, and correct the settings if necessary.
-
Store database installation directory
-
Store database backup directory
-
Store database partial backup directory
-
Store database export directory
-
Store database import directory
In addition, if the same directory is specified for more than one Agent Store or Remote Monitor Store service, the Agent Store or Remote Monitor Store service cannot start. Review the directory settings and correct them if necessary.
-
-
The host name of the machine was changed using a procedure that is not permitted.
If the host name is changed using a procedure other than the permitted ones, a Performance Management program service might not start, and the following issues might occur. For details on how to change the host name of the machine, see the chapter that describes installation and setup in the Job Management Partner 1/Performance Management Planning and Configuration Guide.
-
The KAVE00493-E message is output to the common message log and services cannot start.
For details on how to recover from this event, see 17.2.2 Recovery method when the KAVE00493-E message is output and services cannot start.
-
If you execute the jpctool service list -id * -host * command on a host where the service has not started, a service with a duplicate service ID will appear in the Service Name column.
For details on the jpctool service list command, see the chapters that describe commands in the manual Job Management Partner 1/Performance Management Reference.
-
-
PFM - Agent was stopped while it was not connected to PFM - Manager, and then the PFM - Agent host name was changed.
Each Performance Management service registers its own service information (IP address, host name, and port number) in PFM - Manager when it starts, and deletes that information when it stops. If a service cannot delete its service information when it stops, for example because it cannot communicate with PFM - Manager, the information remains on the PFM - Manager side. If the service then tries to start using the same service information as the previous time it started, the attempt fails, because Performance Management does not allow a duplicate service instance to be started. (The KAVE00133-E message is output to the common message log on the host where the service failed to start.) In such a case, perform the following procedure to change the service information:
-
Execute the jpcconf port define command to unlock the PFM - Agent port (if it is locked).
-
Restart the PFM - Agent service.
-
Execute the jpcconf port define command again to relock the PFM - Agent port (if necessary).
For details on the jpcconf port define command, see the chapters that describe commands in the manual Job Management Partner 1/Performance Management Reference.
-
-
A previously started process still exists.
If a previously started process still exists, its service cannot be started again, because Performance Management does not allow a duplicate service instance to be started. Use Task Manager (in Windows) or the ps command (in UNIX) to check whether a process remains for the service that failed to start. If such a process exists, terminate it.
-
An error has occurred in the Service Control Manager.
When the jpcspm start command is executed in Windows, the KAVE05163-E message might be output and the service might not start. If this occurs, re-execute the jpcspm start command. If the problem occurs frequently, edit the jpccomm.ini file to change the retry interval and the number of retries for the service startup processing that the jpcspm start command performs. For details on changing the retry interval and the number of retries, see 1.8.3 Starting on a Windows machine.
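Two of the causes above, duplicate fixed port numbers and problems with Store directory settings, can be checked before you start the services. The following Python sketch assumes that you have already collected each service's fixed port number and Store directory settings into dictionaries (for example, from your own setup records). It does not read any Performance Management configuration file, and the function names and input layout are hypothetical.

```python
import os
from collections import defaultdict


def find_duplicate_ports(ports):
    """Return {port: [service names]} for each port assigned to more than one service.

    `ports` maps a service name to its fixed port number (hypothetical input).
    """
    by_port = defaultdict(list)
    for service, port in ports.items():
        by_port[port].append(service)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


def check_store_dirs(store_dirs):
    """Return a list of problems found in Store directory settings.

    `store_dirs` maps an Agent Store or Remote Monitor Store service name to
    {setting name: directory}, e.g. {"backup": "D:/pfm/backup"} (hypothetical).
    """
    problems = []
    users = defaultdict(set)  # normalized directory -> services that use it
    for service, settings in store_dirs.items():
        for setting, path in settings.items():
            if not os.path.isdir(path):
                problems.append(f"{service}: {setting} directory does not exist: {path}")
            elif not os.access(path, os.R_OK | os.W_OK):
                problems.append(f"{service}: {setting} directory is not accessible: {path}")
            users[os.path.normcase(os.path.abspath(path))].add(service)
    # A directory that is used by more than one Store service prevents startup.
    for path, services in sorted(users.items()):
        if len(services) > 1:
            problems.append(f"{path} is used by more than one service: {', '.join(sorted(services))}")
    return problems
```

An empty dictionary from find_duplicate_ports and an empty list from check_store_dirs mean that neither problem was detected in the settings you supplied; the sketch cannot tell you whether those settings match what Performance Management actually uses.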
(2) A service takes a long time to start once startup is requested.
It might take a long time for a service to actually start after you execute the jpcspm start command or start the service from Services in Windows. If the delay is caused by one of the following factors, subsequent startups should take less time.
-
Starting a service in standalone mode might slow down the startup of the service.
-
During initial startup after the Store database is restored, the indexes of the Store database must be rebuilt. This might slow startup of the service.
-
During initial startup after an Agent is newly added, the indexes of the Store database must be created. This might slow startup of the service.
-
If the Store service cannot end normally, for example because of a power failure, the indexes of the Store database are rebuilt at restart, so the Store service might take a long time to start.
(3) Immediately after a Performance Management program service is stopped, another program starts a service and communication is not performed properly.
Immediately after a Performance Management program service stops, a service of another program might start and use the port that the stopped service was using. In this case, communication might not be performed properly. You can use either of the following techniques to avoid this problem:
-
Fix the port numbers to be allocated to the Performance Management program services.
Allocate a fixed port number to each Performance Management program service. For details on how to set a port number, see the chapter that describes installation and setup in the Job Management Partner 1/Performance Management Planning and Configuration Guide.
-
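Set the TCP_TIMEWAIT value.
Use the TCP_TIMEWAIT value to set the connection wait time, that is, how long a closed TCP connection stays in the TIME_WAIT state.
For HP-UX or AIX, specify a connection wait time of at least 75 seconds, as follows:
-
In HP-UX: tcp_time_wait_interval:240000 (the value is in milliseconds, so this is 240 seconds)
-
In AIX: tcp_timewait:5 (the value is in units of 15 seconds, so this is 75 seconds)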
In Windows or Solaris, use the default connection wait time setting. The default settings are:
-
In Solaris: 4 minutes
-
In Windows Server 2003, Windows Server 2008, and Windows Server 2012: 2 minutes
In Linux, you cannot change the connection wait time from the default of 60 seconds. If this problem occurs in Linux, fix the port numbers of the Performance Management program services instead.
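Because HP-UX and AIX express the wait time in different units, a quick conversion helps confirm that a value meets the 75-second minimum. This Python sketch assumes that tcp_time_wait_interval (HP-UX) is in milliseconds and that tcp_timewait (AIX) counts 15-second intervals; the function names are illustrative only.

```python
MINIMUM_SECONDS = 75  # minimum connection wait time for HP-UX and AIX


def hpux_seconds(tcp_time_wait_interval_ms: int) -> float:
    """Convert HP-UX tcp_time_wait_interval (milliseconds) to seconds."""
    return tcp_time_wait_interval_ms / 1000


def aix_seconds(tcp_timewait: int) -> int:
    """Convert AIX tcp_timewait (15-second units) to seconds."""
    return tcp_timewait * 15


for name, seconds in [("HP-UX", hpux_seconds(240000)), ("AIX", aix_seconds(5))]:
    status = "OK" if seconds >= MINIMUM_SECONDS else "too short"
    print(f"{name}: {seconds:g} s ({status})")
```

With the values recommended above, both platforms meet the minimum: 240000 ms is 240 seconds, and 5 units of 15 seconds is exactly 75 seconds.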
-
(4) After the message "The disk capacity is insufficient" is output, the Master Store service or Agent Store service stops.
If there is insufficient space on the disk used by the Store database, data can no longer be stored in the Store database. In this case, the message "The disk capacity is insufficient" is output, and then the Master Store service, Agent Store service, or Remote Monitor Store service stops.
If this message appears, use either of the following techniques to solve this problem.
-
Allocate sufficient disk space.
Estimate the disk usage of the Store database and change the storage location of the Store database to a disk with sufficient space. For details on how to estimate the disk usage of the Store database, see the system requirements in an appendix of each PFM - Agent or PFM - RM manual.
For details on how to change the storage location of the Store database for event data, see the chapter that describes installation and setup in the Job Management Partner 1/Performance Management Planning and Configuration Guide. For details on how to change the storage location of performance data, see each PFM - Agent or PFM - RM manual.
-
Modify the data retention conditions of the Store database.
Modify the data retention conditions of the Store database and adjust the upper limit for the amount of data in the Store database. For details on how to change the retention conditions of the Store database, see 4.1.2 Modifying the retention conditions for performance data (in Store 2.0), 4.1.3 Modifying the retention conditions for performance data (in Store 1.0), or 4.2.1 Changing the maximum number of records for event data.
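Before moving the Store database, you can compare your disk usage estimate with the free space on the candidate disk. The following Python sketch assumes that you have already calculated the estimate in bytes by using the method in the PFM - Agent or PFM - RM manual; the function name and the 10% safety margin are hypothetical choices, not product requirements.

```python
import shutil


def has_room_for_store(directory: str, estimated_bytes: int, margin: float = 0.1) -> bool:
    """Return True if the disk holding `directory` has room for the estimate.

    `estimated_bytes` comes from the estimation method in the PFM - Agent or
    PFM - RM manual; `margin` (10% by default) is an arbitrary safety buffer.
    """
    free = shutil.disk_usage(directory).free
    return free >= estimated_bytes * (1 + margin)


if __name__ == "__main__":
    estimate = 5 * 1024**3  # hypothetical 5 GiB estimate
    print(has_room_for_store(".", estimate))
```

The check only reflects free space at the moment it runs; other programs that write to the same disk can still reduce the space available to the Store database later.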
If the Master Store service, Agent Store service, or Remote Monitor Store service still does not start after you take these actions, the Store database might contain logical errors that cannot be recovered. In this case, restore the Store database from the backup data, and then restart the service. If you have no backup data, initialize the Store database, and then start the service. To initialize the Store database, delete all of the following files in the Store database installation directories:
- When the Store database version is 1.0
-
Files with the extension .DB
-
Files with the extension .IDX
- When the Store database version is 2.0
-
Files with the extension .DB
-
Files with the extension .IDX
-
All files in the STPI, STPD, and STPL directories (do not delete the STPI, STPD, and STPL directories themselves)
-
The following shows the default installation directories of the Store database.
- Store database installation directory for performance data:
-
For details, see the appropriate PFM - Agent or PFM - RM manual.
- Store database installation directory for event data:
-
- When PFM - Manager is in a non-cluster environment
-
-
In Windows:
installation-folder\mgr\store\
-
In UNIX:
/opt/jp1pc/mgr/store/
-
- When PFM - Manager is in a cluster environment
-
-
In Windows:
environment-directory\jp1pc\mgr\store\
-
In UNIX:
environment-directory/jp1pc/mgr/store/
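The file-deletion rules above can be scripted so that you can review the target list before anything is deleted. The following Python sketch is hypothetical: it assumes that the Store service is already stopped, that you pass the actual Store database installation directory (confirm it in the appropriate manual), and that "the files in the STPI, STPD, and STPL directories" means every file beneath those directories. It never deletes the directories themselves, and it deletes nothing unless dry_run is False.

```python
from pathlib import Path

DATA_SUFFIXES = {".DB", ".IDX"}           # compared case-insensitively
STORE20_DIRS = ("STPI", "STPD", "STPL")   # used by Store 2.0 only


def files_to_delete(store_dir, version):
    """List the files that initialization would delete, without deleting anything."""
    store = Path(store_dir)
    targets = [p for p in store.iterdir()
               if p.is_file() and p.suffix.upper() in DATA_SUFFIXES]
    if version == "2.0":
        for name in STORE20_DIRS:
            subdir = store / name
            if subdir.is_dir():
                # Assumption: every file beneath the directory is removed;
                # the directories themselves are always kept.
                targets.extend(p for p in subdir.rglob("*") if p.is_file())
    elif version != "1.0":
        raise ValueError(f"unknown Store database version: {version!r}")
    return sorted(targets)


def initialize_store(store_dir, version, dry_run=True):
    """Delete the Store database files (only when dry_run is False)."""
    targets = files_to_delete(store_dir, version)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Running initialize_store with the default dry_run=True first shows exactly which files would be removed, which is a sensible safeguard because there is no undo once the files are gone.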
-
(5) The Correlator service takes a long time to start after PFM - Manager restarts.
The Correlator service checks alarm status on agents when it starts. If you restart PFM - Manager without stopping agents, the Correlator service might take some time to start. If you want to prevent this, consider enabling the Correlator quick start function.
When the Correlator quick start function is enabled, the Correlator service checks the alarm status of agents as needed after it has started, instead of during startup, so the Correlator service takes less time to start. When the Correlator service reports the checked alarm status, PFM - Manager might issue agent events that contain one of the following messages.
| Message | Description |
|---|---|
| State information | The Correlator service received an alarm event from an agent and successfully checked the alarm status. |
| State information (Unconfirmed) | The Correlator service received an alarm event from an agent but could not check the alarm status. |
| State change (Unconfirmed) | The Correlator service received an alarm event from an agent whose alarm status was unknown. The Correlator service assumed the status of PFM - Agent or PFM - RM based on the content of the received alarm event. |
The following table describes the triggers that prompt PFM - Manager to issue agent events containing the messages described in the above table.
| Status of the Correlator quick start function | Trigger for the Correlator service to check alarm status | Message in agent events if the check succeeds | Message in agent events if the check fails |
|---|---|---|---|
| Disabled (when the Retry Getting Alarm Status label is enabled in the startup information file (jpccomm.ini)) | When PFM - Manager starts | State information | State information (Unconfirmed) |
| Disabled (when the Retry Getting Alarm Status label is enabled in the startup information file (jpccomm.ini)) | When the Correlator service fails to check alarm status on an agent and then receives the next alarm event from the agent | State information | State change (Unconfirmed)# |
| Enabled | When PFM - Manager starts and the Correlator service then receives the first alarm event from an agent | State information | State information (Unconfirmed) |
| Enabled | When the Correlator service fails to check alarm status on an agent and then receives the next alarm event from the agent | State information | State change (Unconfirmed)# |
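Reading the two tables together, the message in an agent event depends only on whether the check was the initial check or a check triggered by the next alarm event after a failure, and on whether the check succeeded; the quick start setting changes only when the initial check happens. The following Python sketch encodes that reading. The enum and function names are hypothetical, the Retry Getting Alarm Status condition on the Disabled rows is not modeled, and neither is the table's # note.

```python
from enum import Enum


class Trigger(Enum):
    """When the Correlator service checks an agent's alarm status."""
    INITIAL_CHECK = "initial"            # first check for the agent
    NEXT_EVENT_AFTER_FAILURE = "retry"   # next alarm event after a failed check


def initial_check_timing(quick_start_enabled: bool) -> str:
    """Return when the initial check happens, according to the table."""
    if quick_start_enabled:
        return ("When PFM - Manager starts and the Correlator service then "
                "receives the first alarm event from an agent")
    return "When PFM - Manager starts"


def agent_event_message(trigger: Trigger, check_succeeded: bool) -> str:
    """Return the message included in the agent event, according to the table."""
    if check_succeeded:
        return "State information"
    if trigger is Trigger.INITIAL_CHECK:
        return "State information (Unconfirmed)"
    return "State change (Unconfirmed)"
```

For example, agent_event_message(Trigger.NEXT_EVENT_AFTER_FAILURE, False) returns "State change (Unconfirmed)" regardless of whether quick start is enabled, matching the last column of both Disabled and Enabled rows.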
To enable or disable the Correlator quick start function:
-
Stop the Performance Management programs and services.
When the Performance Management programs and services are running on the PFM - Manager host, execute the jpcspm stop command to stop all of them. When Performance Management is running in a cluster system, use cluster software to stop all the Performance Management programs and services.
-
Use a text editor to open the jpccomm.ini file on the PFM - Manager host.
The jpccomm.ini file is stored in the following location:
For a physical host
-
For Windows
installation-folder\
-
For UNIX
/opt/jp1pc/
For a logical host
-
For Windows
environment-directory\jp1pc\
-
For UNIX
environment-directory/jp1pc/
-
-
Enable or disable the Correlator quick start function.
In the Common Section section of the jpccomm.ini file, set the Correlator Startup Mode label to one of the following values:
-
Enabling the function
Correlator Startup Mode=1
-
Disabling the function
Correlator Startup Mode=0
-
-
Save and close the jpccomm.ini file.
-
Start the Performance Management programs and services.
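If you change this setting on several PFM - Manager hosts, steps 2 to 4 of the procedure above can be scripted. The following Python sketch is a hypothetical helper, not a Performance Management tool. It assumes that section headers look like [Common Section], that labels are written as Label=value, and that the Correlator Startup Mode label already exists in that section (it raises KeyError rather than adding the label). Run it only while the services are stopped, as in step 1.

```python
import re
import shutil


def set_label(ini_text, section, label, value):
    """Return ini_text with `label` set to `value` inside [section].

    Assumes headers look like "[Common Section]" and labels like
    "Correlator Startup Mode=0" (spaces around "=" are tolerated).
    Raises KeyError if the label is not already present in the section.
    """
    label_re = re.compile(r"\s*" + re.escape(label) + r"\s*=")
    lines = ini_text.splitlines(keepends=True)
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped[1:-1].strip() == section
        elif in_section and label_re.match(line):
            ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            lines[i] = f"{label}={value}{ending}"
            return "".join(lines)
    raise KeyError(f"{label!r} not found in [{section}]")


def set_correlator_startup_mode(path, enabled):
    """Back up jpccomm.ini, then set Correlator Startup Mode to 1 or 0."""
    shutil.copy2(path, str(path) + ".bak")
    # latin-1 round-trips every byte unchanged; newline="" preserves CRLF/LF.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    text = set_label(text, "Common Section", "Correlator Startup Mode",
                     "1" if enabled else "0")
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text)
```

The helper changes only the first matching label in the named section and leaves labels with the same name in other sections untouched; the .bak copy lets you restore the original file if the result is not what you expected.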
(6) The Agent Collector service or Remote Monitor Collector service does not start.
On a PFM - Agent or PFM - RM host running Windows, the Agent Collector service or Remote Monitor Collector service might fail to start when PFM - Agent or PFM - RM starts after Windows restarts. In this case, the following message might be output to the Windows event log:
-
service-name service hung on starting.
This message appears when the Windows Service Control Manager times out. The Service Control Manager is likely to time out if the communication load on PFM - Manager is high and PFM - Manager takes a long time to respond. The message is output if all of the following conditions are met:
-
The communication load on PFM - Manager is high.
For example, many PFM - Agent or PFM - RM instances are starting at the same time.
-
In Services in Windows, the startup type for the PFM - Agent or PFM - RM services is set to automatic.
-
The OS is restarted.
To prevent the Service Control Manager from timing out, perform either of the following procedures:
-
If you want to start the services when the OS restarts, use the jpcspm start command instead of the Windows Service Control Manager.
-
Perform the following on PFM - Agent or PFM - RM hosts to reduce the startup time for PFM - Agent or PFM - RM.
This procedure shortens the reconnection processing that is performed when PFM - Agent or PFM - RM cannot connect to PFM - Manager as its services start. Note that if the connection fails, the PFM - Agent or PFM - RM services are then highly likely to start in standalone mode.
To reduce the startup time for PFM - Agent or PFM - RM, in the Agent Collector x Section# and Agent Store x Section# sections of the startup information file (jpccomm.ini), change the NS Init Retry Count label from NS Init Retry Count = 2 to NS Init Retry Count = 1.
- #
-
x represents the product ID of PFM - Agent or PFM - RM. For details on product IDs, see the list of identifiers in the appendix in the applicable PFM - Agent or PFM - RM manual. When multiple instances of PFM - Agent or PFM - RM are installed on the same host, set the value for the NS Init Retry Count label for each product ID.
The startup information file (jpccomm.ini) is stored in the following location:
- If a PFM - Agent or PFM - RM host is a physical host
-
installation-folder\jpccomm.ini
- If a PFM - Agent or PFM - RM host is a logical host
-
environment-directory#\jp1pc\jpccomm.ini
- #
-
Indicates a directory on the shared disk that is specified when a logical host is created.
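When several product IDs are installed on one host, the same label must be changed in every Agent Collector x Section and Agent Store x Section section. The following Python sketch does this on the text of jpccomm.ini. It assumes that the section headers are written exactly as [Agent Collector x Section] and [Agent Store x Section], where x is the product ID, and that the label already exists in each section; the function name is hypothetical. Apply it only while the services are stopped, and keep a backup of the file.

```python
import re

# Assumed header format: "[Agent Collector x Section]" or "[Agent Store x Section]",
# where x is the product ID.
SECTION_RE = re.compile(r"\[(?:Agent Collector|Agent Store) \S+ Section\]")
LABEL_RE = re.compile(r"(\s*NS Init Retry Count\s*=\s*)\d+(\s*)")


def reduce_ns_init_retry(ini_text, new_value=1):
    """Set NS Init Retry Count in every Agent Collector/Agent Store section.

    Returns (new_text, names of the sections that were changed).
    Other sections are left untouched.
    """
    out, changed = [], []
    current = None  # name of the matching section we are in, if any
    for line in ini_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve CRLF or LF
        stripped = body.strip()
        if stripped.startswith("["):
            current = stripped[1:-1] if SECTION_RE.fullmatch(stripped) else None
        elif current is not None:
            m = LABEL_RE.fullmatch(body)
            if m:
                # Keep the original spacing around "=" and replace only the number.
                body = f"{m.group(1)}{new_value}{m.group(2)}"
                changed.append(current)
        out.append(body + ending)
    return "".join(out), changed
```

The returned list of changed sections lets you confirm that every installed product ID was updated; a product ID missing from that list means its section or label was not found in the expected form.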
(7) Multiple agents that start simultaneously take a long time to recover from stand-alone mode.
A monitoring agent that enters stand-alone mode during startup automatically tries to reconnect to the monitoring manager. If it succeeds, the monitoring agent enters normal mode.
If you start multiple monitoring agents simultaneously, their communication to the monitoring manager is concentrated, connection errors might occur, and multiple monitoring agents might enter stand-alone mode. If those monitoring agents then repeatedly try to reconnect to the monitoring manager at the same times, communication is concentrated again, and the monitoring agents might take longer to enter normal mode.
When this occurs, set the Random Retry Mode label (which disperses reconnection attempts) in the Common Section section of the startup information file (jpccomm.ini) to 1 (enabled).
With this setting, monitoring agents in stand-alone mode try to reconnect to the monitoring manager at random intervals rather than at regular intervals, which avoids concentrated communication.
Note that these settings are applicable when the version of PFM - Manager or PFM - Base in the system is 10-10-20 or later and the version of PFM - Agent or PFM - RM is 10-00 or later.
For details on how to change the startup information file (jpccomm.ini), see the part that explains the startup information file (jpccomm.ini) in the appendixes of the manual Job Management Partner 1/Performance Management Reference.
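The effect of randomizing reconnection intervals can be illustrated with a small simulation. This Python sketch is a generic illustration of retry jitter, not the algorithm used by Random Retry Mode, whose interval range is not described here; the agent count, interval, number of retries, and the random range are all hypothetical.

```python
import random
from collections import Counter


def peak_attempts(agents, interval, rounds, randomize, seed=0):
    """Return the largest number of reconnection attempts made in any one second.

    Each agent starts retrying at t=0. With randomize=False every agent waits
    exactly `interval` seconds between attempts; with randomize=True each wait
    is drawn uniformly from [interval // 2, interval * 3 // 2] (an assumed range).
    """
    rng = random.Random(seed)
    attempts = Counter()  # second -> number of attempts in that second
    for _ in range(agents):
        t = 0
        for _ in range(rounds):
            if randomize:
                t += rng.randint(interval // 2, interval * 3 // 2)
            else:
                t += interval
            attempts[t] += 1
    return max(attempts.values())


if __name__ == "__main__":
    # Hypothetical numbers: 100 agents, 60-second interval, 5 retries each.
    print("fixed intervals: ", peak_attempts(100, 60, 5, randomize=False))
    print("random intervals:", peak_attempts(100, 60, 5, randomize=True))
```

With fixed intervals, all 100 agents retry in the same second every time, so the peak equals the number of agents; with random intervals, the same number of attempts is spread over many different seconds and the peak load on the monitoring manager drops sharply.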