promtool
Function
This command checks the format of Prometheus server definition files and tests alert rules.
Format
promtool check config Prometheus-configuration-file-name
promtool check rules alert-configuration-file-name
promtool test rules test-file-name ...
Execution permission
None
Storage directory
- In Windows:
  Agent-path\tools\
- In Linux:
  /opt/jp1ima/tools/
Arguments
- check config Prometheus-configuration-file-name
  Checks the format of the Prometheus configuration file for errors.
  For details about the Prometheus configuration file, see Prometheus configuration file (jpc_prometheus_server.yml).
- check rules alert-configuration-file-name
  Checks the format of the alert configuration file for errors.
  For details about the alert configuration file, see Alert configuration file (jpc_alerting_rules.yml).
- test rules test-file-name ...
  Runs the alert rule tests written in the test file. You can specify up to 10 test files.
Return values
0 | The format is correct, or the alert rule test succeeded. |
Other than 0 | The format is incorrect, or the alert rule test failed. |
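The following is a minimal sketch of how a script might call the command and act on the return value. The helper names (build_test_command, succeeded, run_tests) are hypothetical and are not part of the product. The sketch assumes only what this page states: the "test rules" subcommand, the limit of 10 test files, and 0 as the return value for success.

```python
import subprocess

MAX_TEST_FILES = 10  # limit stated in the Arguments section


def build_test_command(promtool_path, test_files):
    """Build the argument list for 'promtool test rules' (hypothetical helper)."""
    if not test_files:
        raise ValueError("specify at least one test file")
    if len(test_files) > MAX_TEST_FILES:
        raise ValueError("up to 10 test files can be specified")
    return [promtool_path, "test", "rules", *test_files]


def succeeded(return_code):
    """0 means success; any other value means a format error or a failed test."""
    return return_code == 0


def run_tests(promtool_path, test_files):
    """Run the tests and return (success flag, combined command output)."""
    cmd = build_test_command(promtool_path, test_files)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return succeeded(completed.returncode), completed.stdout + completed.stderr
```

For example, run_tests("/opt/jp1ima/tools/promtool", ["alerts_rules_unit_test.yml"]) would return the success flag together with the text that the command printed.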
Alert rule test file contents
- Format
  Write in YAML format.
- File
  Any-name.yml
- Description
  Item name | Description | Default value |
  rule_files: [ - <file_name> ] | Specify the list of rule files to be tested. File names can contain wildcards. | -- |
  [ evaluation_interval: <duration> ] | Specify the evaluation interval for the alert rules. | 1m |
  group_eval_order: [ - <group_name> ] | Specify group names in the order in which the rule groups are to be evaluated at a given evaluation time. The order is guaranteed only for the groups that are listed. You do not have to list every group. As shown in the description example below the table, you can specify the evaluation order of the rule files listed in rule_files:. | -- |
  tests: [ - <test_group> ] | List all the tests. | -- |
  <Description example> (group_eval_order:)
    group_eval_order:
      - test02.yml
      - test01.yml
- Legend:
  --: Not applicable
- <test_group>
  Item name | Description | Default value |
  interval: <duration> | Specify the interval between the data points in input_series:. | -- |
  input_series: [ - <series> ] | Specify the series data. | -- |
  [ name: <string> ] | Specify a name for the <test_group>. | -- |
  alert_rule_test: [ - <alert_test_case> ] | Specify the tests for the alert rules. The alert rules in the specified rule files are used. | -- |
- Legend:
  --: Not applicable
- <series>
  Item name | Description | Default value |
  series: <string> | Specify the series in the following format: 'metric-name{label-name="label-value", ...}' | -- |
  values: <string> | Specify the data points, separated by spaces. You can use the following expanding notation: 'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) ... a+(c*b)', and 'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) ... a-(c*b)'. | -- |
  <Description example> (series:)
    series_name{label1="value1", label2="value2"}
    go_goroutines{job="prometheus", instance="localhost:9090"}
  <Description example> (values:)
    '-2+4x3' becomes '-2 2 6 10'
    '1-2x4' becomes '1 -1 -3 -5 -7'
- Legend:
  --: Not applicable
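The expanding notation for values: can be checked by hand with a short script. The following is a minimal sketch, not the parser that promtool itself uses. The function name expand_values is hypothetical, and the sketch handles only plain numbers and the two forms 'a+bxc' and 'a-bxc' described above.

```python
import re

# start value, sign of the step, step size, number of repetitions
_TERM = re.compile(r"^(-?\d+(?:\.\d+)?)([+-])(\d+(?:\.\d+)?)x(\d+)$")


def expand_values(values):
    """Expand a values: string into the list of data points it describes."""
    points = []
    for token in values.split():
        match = _TERM.match(token)
        if match is None:
            # A plain number such as '0' or '1073741822'.
            points.append(float(token))
            continue
        start, sign, step, count = match.groups()
        step = float(step) if sign == "+" else -float(step)
        # 'a+bxc' yields c+1 points: a, a+b, ..., a+(c*b)
        points.extend(float(start) + i * step for i in range(int(count) + 1))
    return points


print(expand_values("-2+4x3"))  # [-2.0, 2.0, 6.0, 10.0]
print(expand_values("1-2x4"))   # [1.0, -1.0, -3.0, -5.0, -7.0]
```

Note that 'a+bxc' produces c+1 data points, so '0+10x3 111+10x6' describes 4 + 7 = 11 data points.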
- <alert_test_case>
  Item name | Description | Default value |
  eval_time: <duration> | Specify the time elapsed since time = 0s at which the alerts are to be checked. The data that occurs at the interval specified for interval: in <test_group> is evaluated up to the elapsed time specified for eval_time:. For example, if interval: in <test_group> is 1m and eval_time: is 5m, the evaluation takes place when the sixth data point specified in input_series: has been acquired. | -- |
  alertname: <string> | Specify the name of the alert to be tested. Use the value specified for alert: in the alert configuration file. | -- |
  exp_alerts: [ - <alert> ] | Specify the list of alerts that you expect to fire. To test that the alert rule does not fire, leave exp_alerts: empty. | -- |
- Legend:
  --: Not applicable
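The relationship between interval: and eval_time: can be worked out as follows. This is a minimal sketch with hypothetical helper names. It assumes single-unit durations such as 30s, 1m, or 2h, and does not handle compound durations.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(duration):
    """Convert a single-unit duration such as '5m' to seconds."""
    return int(duration[:-1]) * _UNIT_SECONDS[duration[-1]]


def data_points_at(interval, eval_time):
    """Number of input_series data points acquired by eval_time.

    The first data point is at time = 0s, so one point is added
    to the number of whole intervals that have elapsed.
    """
    return duration_seconds(eval_time) // duration_seconds(interval) + 1


print(data_points_at("1m", "5m"))  # 6
```

With interval: 1m and eval_time: 5m, the data points at 0m, 1m, 2m, 3m, 4m, and 5m have been acquired, which is why the evaluation takes place at the sixth data point.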
- <alert>
  Item name | Description | Default value |
  exp_labels: [ <labelname>: <string> ] | Specify the labels of the alert that you expect to fire. The labels also include the labels of the sample associated with the alert. | -- |
  exp_annotations: [ <labelname>: <string> ] | Specify the annotations of the alert that you expect to fire. | -- |
- Legend:
  --: Not applicable
Sample alert configuration file and alert rule test file
- Examples of "up" metrics
  - Example of description of alert configuration file
    groups:
      - name: alerts
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  - Example of description of alert rule test file
    rule_files:
      - jpc_alerting_rules.yml
    evaluation_interval: 1m
    tests:
      # Test
      - interval: 1m
        # Series data.
        input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0' # success
        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: InstanceDown
            # alertname: AnotherInstanceDown
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                  instance: localhost:9090
                  job: prometheus
                exp_annotations:
                  summary: "Instance localhost:9090 down"
                  description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
- Examples of CPU
  - Example of description of alert configuration file
    groups:
      - name: alerts
        rules:
          - alert: cpu_alert_rule
            expr: sum(rate(windows_cpu_time_total{mode!="idle"}[1m])) < 50
            for: 2m
            labels:
              severity: page
            annotations:
              summary: "CPU alert rule summary"
              description: "CPU alert rule description"
  - Example of description of alert rule test file
    rule_files:
      - jpc_alerting_rules.yml
    evaluation_interval: 1m
    tests:
      # Test
      - interval: 1m
        # Series data.
        input_series:
          - series: 'windows_cpu_time_total'
            #values: '0+3000x10' # fail
            values: '0+2800x10' # success
        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 7m
            alertname: cpu_alert_rule
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                exp_annotations:
                  summary: "CPU alert rule summary"
                  description: "CPU alert rule description"
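The difference between the "success" and "fail" values in the CPU example comes down to simple arithmetic. The sketch below is a simplified model, not the actual rate() implementation. It assumes that, with one data point per minute and a 1m range, the per-second rate equals the increase between two consecutive data points divided by 60.

```python
THRESHOLD = 50          # from the expression "... < 50"
INTERVAL_SECONDS = 60   # interval: 1m


def per_second_rate(step):
    """Per-second rate of a counter that grows by 'step' every interval."""
    return step / INTERVAL_SECONDS


for step in (2800, 3000):
    rate = per_second_rate(step)
    print(step, round(rate, 2), rate < THRESHOLD)
# 2800 46.67 True
# 3000 50.0 False
```

With '0+2800x10', the rate stays below 50, so the condition holds and the expected alert can fire. With '0+3000x10', the rate is exactly 50, which is not less than 50, so no alert fires and the test fails.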
- Examples of memory (#1)
  - Example of description of alert configuration file
    groups:
      - name: alerts
        rules:
          - alert: memory_alert_rule
            expr: windows_memory_available_bytes < 1073741824
            for: 3m
            labels:
              severity: page
            annotations:
              summary: "Memory alert rule summary"
              description: "Memory alert rule description"
  - Example of description of alert rule test file
    rule_files:
      - jpc_alerting_rules.yml
    evaluation_interval: 1m
    tests:
      # Test
      - interval: 1m
        # Series data.
        input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                exp_annotations:
                  summary: "Memory alert rule summary"
                  description: "Memory alert rule description"
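The effect of for: 3m in the memory example can be traced with a small simulation. The sketch below is a simplified model of how a for: duration is handled, written for illustration only. The function name is hypothetical, and the model assumes one data point per minute, that the condition must hold continuously, and that the alert fires once the condition has held for at least the for: duration.

```python
THRESHOLD = 1073741824  # from the expression "... < 1073741824"


def fires_at(values, for_minutes, eval_minute):
    """Return True if the alert is firing at eval_minute.

    values[i] is the data point at minute i (interval: 1m).
    """
    active_since = None
    for minute in range(eval_minute + 1):
        if values[minute] < THRESHOLD:
            if active_since is None:
                active_since = minute   # condition starts to hold
        else:
            active_since = None         # condition broken, start over
    return active_since is not None and eval_minute - active_since >= for_minutes


success = [1073741822, 1073741826, 1073741823, 1073741823,
           1073741823, 1073741823, 1073741828, 1073741822]
fail = [1073741822, 1073741822, 1073741826, 1073741823,
        1073741823, 1073741823, 1073741828, 1073741822]

print(fires_at(success, 3, 5))  # True
print(fires_at(fail, 3, 5))     # False
```

In the "success" values, the condition holds continuously from minute 2 to minute 5, which is 3 minutes, so the alert fires at eval_time: 5m. In the "fail" values, the data point at minute 2 breaks the condition, so it has held for only 2 minutes at eval_time: 5m and no alert fires.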
- Examples of memory (#2)
  - Example of description of alert configuration file
    groups:
      - name: alerts
        rules:
          - alert: memory_alert_rule
            expr: windows_memory_available_bytes < 1073741824
            for: 3m
            labels:
              severity: page
            annotations:
              summary: "Memory alert rule summary"
              description: "Memory alert rule description"
  - Example of description of alert rule test file
    rule_files:
      - jpc_alerting_rules.yml
    evaluation_interval: 1m
    tests:
      # Test
      - interval: 1m
        # Series data.
        input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: page
                exp_annotations:
                  summary: "Memory alert rule summary"
                  description: "Memory alert rule description"
- Examples of interrupt
  - Example of description of alert configuration file
    groups:
      - name: alerts
        rules:
          - alert: alerts_rules_increase
            expr: increase(windows_cpu_interrupts_total{core="0,0"}[3m]) > 100
            for: 0m
            labels:
              severity: critical
            annotations:
              summary: "Alert test case of increase."
              description: "Use increase function in unit test."
          - alert: alerts_rules_increase_all
            expr: sum(increase(windows_cpu_interrupts_total[5m])) > 240
            for: 0m
            labels:
              severity: critical
            annotations:
              summary: "Alert test case of increase."
              description: "Use increase function in unit test."
  - Example of description of alert rule test file
    rule_files:
      - jpc_alerting_rules.yml
    evaluation_interval: 1m
    tests:
      # Test
      - interval: 1m
        # Series data.
        input_series:
          - series: 'windows_cpu_interrupts_total{core="0,0"}'
            #values: '0 10 20 30 110 120 130 140 150 160 170' #fail
            values: '0+10x3 111+10x6' #success
          - series: 'windows_cpu_interrupts_total{core="0,1"}'
            values: '0+10x3 98+13x6'
        # Unit test for alerting rules.
        alert_rule_test:
          # Unit test 1.
          - eval_time: 4m
            alertname: alerts_rules_increase
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  core: "0,0"
                  severity: critical
                exp_annotations:
                  summary: "Alert test case of increase."
                  description: "Use increase function in unit test."
          - eval_time: 8m
            alertname: alerts_rules_increase_all
            exp_alerts:
              # Alert 1.
              - exp_labels:
                  severity: critical
                exp_annotations:
                  summary: "Alert test case of increase."
                  description: "Use increase function in unit test."
Examples
- Example of performing a format check on a Prometheus configuration file (if the format is correct)
  # ./promtool check config jpc_prometheus_server.yml
  Checking jpc_prometheus_server.yml
    SUCCESS: 1 rule files found

  Checking jpc_alerting_rules.yml
    SUCCESS: 1 rules found
- Example of performing a format check on a Prometheus configuration file (if the format is incorrect)
  # ./promtool check config jpc_prometheus_server.yml
  Checking jpc_prometheus_server.yml
    FAILED: parsing YAML file jpc_prometheus_server.yml: yaml: line 42: did not find expected key
- Example of performing a format check of the alert configuration file (if the format is correct)
  # ./promtool check rules jpc_alerting_rules.yml
  Checking jpc_alerting_rules.yml
    SUCCESS: 1 rules found
- Example of performing a format check of the alert configuration file (if the format is incorrect)
  # ./promtool check rules jpc_alerting_rules.yml
  Checking jpc_alerting_rules.yml
    FAILED: jpc_alerting_rules.yml: yaml: unmarshal errors: line 10: field aannotations not found in type rulefmt.RuleNode
- Example of running an alert rule test (if the test is successful)
  # ./promtool test rules alerts_rules_unit_test.yml
  Unit Testing: alerts_rules_unit_test.yml
    SUCCESS
- Example of running an alert rule test (if the test fails (Part 1))
  # ./promtool test rules alerts_rules_unit_test.yml
  Unit Testing: alerts_rules_unit_test.yml
    FAILED:
      alertname:InstanceDown, time:10m,
        exp:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"page\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]",
        got:"[]"
  If the test fails, the alerts that were expected to fire are output in exp, and the alerts that actually fired are output in got. In this execution example, one alert was expected (exp), but no alert actually fired (got).
- Example of running an alert rule test (if the test fails (Part 2))
  # ./promtool test rules test_rule_file.yml
  Unit Testing: test_rule_file.yml
    FAILED:
      alertname:InstanceDown, time:5m,
        exp:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"warn\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]",
        got:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"page\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]"
  If the test fails, the alerts that were expected to fire are output in exp, and the alerts that actually fired are output in got. In this execution example, the test expected the value of severity to be warn (exp), but the value of severity in the alert that actually fired was page (got).