12.5.2 What happens and how to recover from major input errors
- Organization of this subsection
(1) Prometheus server scrape specified an incorrect host or port
(2) You specified an incorrect host or port as the remote write destination of the Prometheus server
(3) You specified an incorrect host or port as the Prometheus server alert notification destination.
(4) You specified an incorrect host or port as the Alertmanager notification destination
(5) Blackbox exporter has specified an incorrect host or port to monitor
(6) Prometheus server definition file format is incorrect
(7) Incorrect format of discovery configuration file
(8) Log monitoring common definition file format is invalid
(9) Text-formatted log file monitoring definition file format is invalid
(10) Script exporter executed a script that does not exist
(11) An invalid host and port were specified for Promitor Scraper Resource Discovery
(1) Prometheus server scrape specified an incorrect host or port
- Phenomenon
-
The scrape fails and the up metric is collected as 0.
For data acquired via a Prometheus server whose scrape destination is set incorrectly, the latest information is not displayed in the trend information of the integrated operation viewer.
- Recovery methods
-
Correct the scrape definition and reload or restart the Prometheus server.
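
As a rough way to see which scrape targets are affected, the up metric can be queried through the Prometheus HTTP API (instant query `up`) and the samples with value 0 picked out. The following is a minimal sketch that only processes a response body that has already been fetched. The function name, job names, and host names are hypothetical and do not come from this manual.

```python
import json


def down_targets(query_response: dict) -> list[tuple[str, str]]:
    """Return (job, instance) pairs whose `up` sample equals 0."""
    if query_response.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for sample in query_response["data"]["result"]:
        labels = sample["metric"]
        # An instant-vector sample is [unix_timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return down


# Hypothetical response body for the instant query `up`.
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "jobA", "instance": "hostA:20716"},
       "value": [1700000000.0, "1"]},
      {"metric": {"__name__": "up", "job": "jobA", "instance": "hostB:20716"},
       "value": [1700000000.0, "0"]}
    ]
  }
}
""")

print(down_targets(SAMPLE))
```

A target listed here is one that the Prometheus server could not scrape, so its host and port in the scrape definition are the first thing to check.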
(2) You specified an incorrect host or port as the remote write destination of the Prometheus server
- Phenomenon
-
For data acquired via a Prometheus server whose remote write destination is set incorrectly, the latest information is not displayed in the trend information of the integrated operation viewer.
- Recovery methods
-
Correct the remote write definition and reload or restart the Prometheus server.
(3) You specified an incorrect host or port as the Prometheus server alert notification destination.
- Phenomenon
-
For alerts generated by a Prometheus server whose alert notification destination is set incorrectly, the latest information is not displayed in the JP1 event list of the integrated operation viewer.
- Recovery methods
-
Correct the notification destination definition and reload or restart the Prometheus server.
(4) You specified an incorrect host or port as the Alertmanager notification destination
- Phenomenon
-
For alerts sent from an Alertmanager whose notification destination is set incorrectly, the latest information is not displayed in the JP1 event list of the integrated operation viewer.
- Recovery methods
-
Correct the notification destination definition and reload or restart Alertmanager.
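
Because an incorrect notification destination does not surface as a metric, it can help to confirm that the corrected host and port accept TCP connections before reloading. The following is a minimal sketch of such a check. The function name is hypothetical, and the result only reflects reachability from the machine where the script runs, which may differ from the host running the Prometheus server or Alertmanager.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling can_connect with the host and port written in the definition returns False when nothing is listening there. A True result shows only that the port is open, not that the intended service is the one listening.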
(5) Blackbox exporter has specified an incorrect host or port to monitor
- Phenomenon
-
Acquisition of information from the monitoring destination fails, and the probe_success metric is collected as 0.
- Recovery methods
-
Correct the monitored host and port definitions and reload or restart the Prometheus server.
(6) Prometheus server definition file format is incorrect
- Phenomenon
-
When the Prometheus server reload API is executed, status code 500 is returned and the following message is displayed:
failed to reload config: couldn't load configuration (--config.file="file-path"): parsing YAML file file-path: yaml: unmarshal errors: line line-number: field test not found in type config.plain
- Recovery methods
-
Check the file-path and line-number in the message, correct the definitions, and reload or restart the Prometheus server.
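
The file path and line number can also be pulled out of the message mechanically, for example when the reload is run from a script. The following is a minimal sketch that assumes the message has the shape shown above. The function name and the file path in the sample are hypothetical.

```python
import re

# Matches the --config.file="..." part of the message.
CONFIG_FILE_RE = re.compile(r'--config\.file="([^"]*)"')
# Matches each "line N: description" part of the YAML unmarshal errors.
LINE_RE = re.compile(r'\bline (\d+): (.+)')


def parse_reload_error(message: str) -> dict:
    """Extract the configuration file path and the (line, description) pairs."""
    file_match = CONFIG_FILE_RE.search(message)
    return {
        "file": file_match.group(1) if file_match else None,
        "errors": [(int(num), text.strip()) for num, text in LINE_RE.findall(message)],
    }


# Hypothetical message; the path and line number are illustrative only.
SAMPLE_MESSAGE = (
    'failed to reload config: couldn\'t load configuration '
    '(--config.file="/path/to/prometheus.yml"): parsing YAML file '
    '/path/to/prometheus.yml: yaml: unmarshal errors: '
    'line 12: field test not found in type config.plain'
)

print(parse_reload_error(SAMPLE_MESSAGE))
```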
(7) Incorrect format of discovery configuration file
- Phenomenon
-
Despite successful execution of the jddcreatetree and jddupdatetree commands, the IM management node with the information described in the discovery configuration file is not displayed in the integrated operation viewer.
- Recovery methods
-
Check whether the format of the discovery configuration file is correct, for example that colons are specified and that the number of half-width spaces is correct. After correcting the error, run the jddcreatetree and jddupdatetree commands again (specify configuration change mode (-c option)).
(8) Log monitoring common definition file format is invalid
- Description
-
If the path parameter in the <buffer> directive of the log monitoring common definition file (jpc_fluentd_common.conf) specifies a path name longer than 256 bytes, or a path containing :, ,, ;, *, ?, ", <, >, |, tabs, or spaces, fluentd repeatedly logs the following error:
unexpected error error_class = Errno::ENOENT error="No such file or directory @ dir_s_mkdir - directory-name "
- Corrective action
-
Verify that the path parameter in the <buffer> directive of the log monitoring common definition file (jpc_fluentd_common.conf) has the correct format and settings.
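
The two conditions described above (length and prohibited characters) can be checked before the file is deployed. The following is a minimal sketch that applies the listed rules literally. The function name is hypothetical, the byte length is measured in UTF-8 as an assumption, and the sketch does not special-case anything the text above does not mention (for example, a drive letter prefix).

```python
# Characters listed above as causing the error: : , ; * ? " < > | tab space
FORBIDDEN_CHARS = set(':,;*?"<>|\t ')
MAX_PATH_BYTES = 256


def check_buffer_path(path: str, encoding: str = "utf-8") -> list[str]:
    """Return a list of problems found in a <buffer> path value (empty if none)."""
    problems = []
    size = len(path.encode(encoding))
    if size > MAX_PATH_BYTES:
        problems.append(f"path is {size} bytes; the limit is {MAX_PATH_BYTES}")
    found = sorted({ch for ch in path if ch in FORBIDDEN_CHARS})
    if found:
        problems.append("forbidden characters: " + " ".join(repr(ch) for ch in found))
    return problems


# Hypothetical path values, for illustration only.
print(check_buffer_path("/var/tmp/fluentd/buffer"))
print(check_buffer_path("/var/tmp/fluentd/my buffer"))
```

An empty list means that neither condition was detected. It does not guarantee that the directory can actually be created.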
(9) Text-formatted log file monitoring definition file format is invalid
- Description
-
If the pos_file parameter in the [Input Settings] section of the text-formatted log file monitoring definition file (fluentd_@@trapname@@_tail.conf) is blank or specifies a string that exceeds the OS file name length limit, fluentd repeatedly logs the following error:
unexpected error error_class = Errno::ENOENT error="No such file or directory @ rb_sysopen - directory-name "
- Corrective action
-
Make sure that the pos_file parameter in the [Input Settings] section of the text-formatted log file monitoring definition file (fluentd_@@trapname@@_tail.conf) has the correct format and settings.
(10) Script exporter executed a script that does not exist
- Result
-
The script_success metric is collected as 0. In addition, the script_exit_code metric is collected as -1.
- Recovery method
-
Correct the definition, and then reload or restart Script exporter.
(11) An invalid host and port were specified for Promitor Scraper Resource Discovery
- Result
-
No metrics are acquired from Resource Discovery.
- Recovery method
-
Correct the definition, and then reload or restart the Promitor Scraper.