Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Administration Guide


12.5.2 What happens and how to recover from major input errors

(1) Prometheus server scrape specified an incorrect host or port

Phenomenon

The scrape fails, and the up metric is collected as 0.

For data acquired through a Prometheus server whose scrape destination settings are incorrect, the latest information is not displayed in the trend information of the integrated operation viewer.

Recovery methods

Correct the scrape definition and reload or restart the Prometheus server.
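
One way to confirm both the symptom and the fix is to run the query `up` through the Prometheus HTTP API (`/api/v1/query`) and list the targets whose value is 0. The following is a minimal sketch that only parses such a response. The function name, the sample payload, and the job and instance values are hypothetical, and retrieving the response from your Prometheus server is not shown.

```python
import json


def targets_down(query_response):
    """Return the instance labels whose `up` sample is 0."""
    body = json.loads(query_response)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for sample in body["data"]["result"]:
        # An instant-vector sample has the form
        # {"metric": {labels}, "value": [timestamp, "value-as-string"]}.
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return down


# Hypothetical response body for the query `up`.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "hostA:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node", "instance": "hostB:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

print(targets_down(SAMPLE))
```

After the scrape definition is corrected and the Prometheus server is reloaded, the same check should return an empty list for the affected target.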

(2) You specified an incorrect host or port as the remote write destination of the Prometheus server

Phenomenon

For data acquired through a Prometheus server whose remote write destination settings are incorrect, the latest information is not displayed in the trend information of the integrated operation viewer.

Recovery methods

Correct the remote write definition and reload or restart the Prometheus server.

(3) You specified an incorrect host or port as the Prometheus server alert notification destination

Phenomenon

For alerts generated by a Prometheus server whose alert notification destination settings are incorrect, the latest information is not displayed in the JP1 event list of the integrated operation viewer.

Recovery methods

Correct the notification destination definition and reload or restart the Prometheus server.

(4) You specified an incorrect host or port as the Alertmanager notification destination

Phenomenon

For alerts sent from an Alertmanager whose notification destination settings are incorrect, the latest information is not displayed in the JP1 event list of the integrated operation viewer.

Recovery methods

Correct the notification destination definition and reload or restart Alertmanager.

(5) You specified an incorrect host or port as the Blackbox exporter monitoring target

Phenomenon

Acquisition of information from the monitoring target fails, and the probe_success metric is collected as 0.

Recovery methods

Correct the monitored host and port definitions and reload or restart the Prometheus server.

(6) Prometheus server definition file format is incorrect

Phenomenon

When the Prometheus server reload API is executed, status code 500 is returned and the following message is displayed:

failed to reload config: couldn't load configuration (--config.file="file-path"): parsing YAML file file-path: yaml: unmarshal errors:  line line-number: field test not found in type config.plain

Recovery methods

Check the file-path and line-number in the message, correct the definitions, and reload or restart the Prometheus server.
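
When the reload is driven from a script, the file path and line number can be pulled out of the message automatically. The following is a minimal sketch that assumes the message has the shape shown above. The function name, the sample path, and the line number are hypothetical.

```python
import re

# Patterns based on the message layout shown in this subsection.
_PATH = re.compile(r'--config\.file="([^"]+)"')
_LINE = re.compile(r'\bline (\d+):')


def locate_error(message):
    """Return (file_path, line_number); an element is None if it is not found."""
    path = _PATH.search(message)
    line = _LINE.search(message)
    return (
        path.group(1) if path else None,
        int(line.group(1)) if line else None,
    )


# Hypothetical message body returned with status code 500.
SAMPLE_MESSAGE = (
    'failed to reload config: couldn\'t load configuration '
    '(--config.file="/path/to/prometheus.yml"): parsing YAML file '
    '/path/to/prometheus.yml: yaml: unmarshal errors:  line 12: '
    'field test not found in type config.plain'
)

print(locate_error(SAMPLE_MESSAGE))
```

If the message reports several unmarshal errors, this sketch returns only the first line number.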

(7) Incorrect format of discovery configuration file

Phenomenon

Despite successful execution of the jddcreatetree and jddupdatetree commands, the IM management node with the information described in the discovery configuration file is not displayed in the integrated operation viewer.

Recovery methods

Check whether the format of the discovery configuration file is correct, for example, that colons are specified where required and that the number of half-width spaces used for indentation is correct. After correcting the error, run the jddcreatetree command and then the jddupdatetree command (specifying configuration change mode with the -c option) again.

(8) Log monitoring common definition file format is invalid

Description

If the path parameter of the <buffer> directive in the log monitoring common definition file (jpc_fluentd_common.conf) specifies a path name longer than 256 bytes, or a path that contains :, ,, ;, *, ?, ", <, >, |, a tab, or a space, fluentd repeatedly logs the following error:

unexpected error error_class = Errno::ENOENT error="No such file or directory @ dir_s_mkdir - directory-name "

Corrective action

Verify that the format and value of the path parameter of the <buffer> directive in the log monitoring common definition file (jpc_fluentd_common.conf) are correct.
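
The two conditions described above can be checked before the definition file is deployed. The following is a minimal sketch that applies the stated rules literally. The function name and sample paths are hypothetical, byte length is measured in UTF-8 as an assumption, and the sketch does not exempt any character (for example, a drive-letter colon), because this subsection does not state such an exemption.

```python
# Characters listed in this subsection as causing the error.
FORBIDDEN_CHARS = set(':,;*?"<>|\t ')
MAX_PATH_BYTES = 256


def buffer_path_problems(path):
    """Return a list of human-readable problems found in a <buffer> path value."""
    problems = []
    # Assumption: the 256-byte limit is measured on the UTF-8 encoding.
    if len(path.encode("utf-8")) > MAX_PATH_BYTES:
        problems.append("path name is longer than 256 bytes")
    bad = sorted({ch for ch in path if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append(
            "path contains forbidden characters: " + " ".join(repr(ch) for ch in bad)
        )
    return problems


# Hypothetical values for the path parameter.
for candidate in ("/var/log/jp1/buffer", "/var/log/my buffer", "/" + "a" * 300):
    print(candidate[:30], buffer_path_problems(candidate))
```

An empty list means that neither condition was detected; it does not guarantee that the directory can actually be created.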

(9) Text-formatted log file monitoring definition file format is invalid

Description

If the pos_file parameter in the [Input Settings] section of the text-formatted log file monitoring definition file (fluentd_@@trapname@@_tail.conf) is blank, or specifies a string that exceeds the maximum file name length of the OS, fluentd repeatedly logs the following error:

unexpected error error_class = Errno::ENOENT error="No such file or directory @ rb_sysopen - directory-name "

Corrective action

Verify that the format and value of the pos_file parameter in the [Input Settings] section of the text-formatted log file monitoring definition file (fluentd_@@trapname@@_tail.conf) are correct.

(10) Script exporter executed a script that does not exist

Result

The script_success metric is collected as 0. In addition, the script_exit_code metric is collected as -1.

Recovery method

Correct the definition, and then reload or restart Script exporter.
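
To recognize this case from scraped output, the two metric values can be read from the text exposition returned by the exporter. The following is a minimal sketch. The function name, the label set, and the sample text are hypothetical, and the parser is deliberately simplified (it does not handle label values that contain a closing brace).

```python
import re

# Matches "name{labels} value" with an optional trailing timestamp.
_SAMPLE_LINE = re.compile(
    r'^(script_success|script_exit_code)(\{[^}]*\})?\s+(\S+)(?:\s+\d+)?$'
)


def missing_script_suspected(exposition):
    """Return True if script_success is 0 and script_exit_code is -1."""
    values = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = _SAMPLE_LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(3))
    return values.get("script_success") == 0.0 and values.get("script_exit_code") == -1.0


# Hypothetical exposition text; the script label is an assumption.
SAMPLE_TEXT = """\
# sample output for a script that does not exist
script_success{script="check_disk"} 0
script_exit_code{script="check_disk"} -1
"""

print(missing_script_suspected(SAMPLE_TEXT))
```

A script that exists but fails would normally report its own exit code rather than -1, so this combination points to the definition rather than to the script's logic.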

(11) An invalid host or port was specified for Promitor Scraper Resource Discovery

Result

No metrics are acquired from Resource Discovery.

Recovery method

Correct the definition, and then reload or restart the Promitor Scraper.