Job Management Partner 1/Integrated Management - Manager Overview and System Design Guide
Using the event guide function, you can record your experience and success in resolving problems, and you can reference and accumulate diagnostic case studies, troubleshooting examples, and so on.
The system administrator manages the system through a process of error detection based on JP1 event monitoring, investigation, and remedial action. If you record your experience and results as event guide information after resolving a problem, users can respond quickly when the same type of JP1 event occurs again.
Event guide information is displayed as detailed information about a JP1 event in the Event Details window of the Central Console.
Only one item of event guide information can be displayed for each JP1 event. However, the larger the system, the greater the number of JP1 events issued from linked JP1 products and user applications. Consider the following points when setting event guide information.
- Organization of this subsection
- (1) Restricting applicable JP1 events
- (2) Setting appropriate event guide information
- (3) Setting event guide information using variables (placeholder strings)
(1) Restricting applicable JP1 events
JP1 events cover a wide range and their number increases according to the size of the system. It would not be easy to set event guide information for every event. Also, the number of items that can be defined in an event guide information file is limited to 1,000.
For these reasons, you must restrict the JP1 events for which event guide information is set. For example, decide which events to target from the following perspectives.
(a) Restricting applicable JP1 events by event level
The JP1 event levels are Emergency, Alert, Critical, Error, Warning, Notice, Information, and Debug. Depending on the types of JP1 events issued by the managed hosts in your system, register event guide information for the more important JP1 events (Error level or higher, for example).
When you use the integrated monitoring database, the user-defined event level applies for JP1 events.
Under the default settings, JP1 events of Emergency, Alert, Critical, Error, or Warning level are forwarded to a manager from JP1/Base on an agent.
(b) Restricting applicable JP1 events by frequency and urgency
Find out what sort of JP1 events are being issued from the managed hosts by performing an event search or by executing the JP1/Base jevexport command, and examine the subtotals in the output results. If it appears that some JP1 events of concern are being issued more often than others, you can target those JP1 events according to which host they originate from, or how urgently they need to be identified and dealt with.
If any JP1 events requiring urgent action are being issued at a high frequency, the system administrator and operators will need to discuss and determine troubleshooting procedures. Set event guide information for these sorts of JP1 events.
For details about the jevexport command, see the chapter on commands in the Job Management Partner 1/Base User's Guide.
- Note
- A maximum of 1,000 items of event guide information can be set. Make sure that you prioritize JP1 events to keep them within this limit.
- If it is difficult to restrict the applicable JP1 events to no more than 1,000, consider the following strategy:
- Group similar events or related events, and write a list of links (used as an index page) in the event-guide message for the group.
- This approach requires the user to search for advice relating to a particular event from the list of links. You should therefore establish clear editing rules and explore other ways of making the list easy to search.
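The prioritization described in (a) and (b) can be sketched as a small tally script. This is only an illustration under stated assumptions: the CSV layout (the event_id, severity, and host columns) and the function name rank_candidates are hypothetical, and the actual jevexport output format must be checked in the Job Management Partner 1/Base User's Guide. The event level names and the 1,000-item limit are taken from this subsection.

```python
import csv
import io
from collections import Counter

# Hypothetical export layout: one row per JP1 event. The real jevexport
# output may use different columns; adapt the field names as needed.
SAMPLE_EXPORT = """event_id,severity,host
00004107,Error,Host-A
00004107,Error,Host-B
00003FA5,Warning,Host-A
00004107,Error,Host-A
00003FA5,Warning,Host-C
00001234,Information,Host-B
"""

# Event levels in the order listed in this subsection (most severe first).
LEVELS = ["Emergency", "Alert", "Critical", "Error",
          "Warning", "Notice", "Information", "Debug"]

MAX_GUIDE_ITEMS = 1000  # limit on items in an event guide information file


def rank_candidates(csv_text, min_level="Warning", limit=MAX_GUIDE_ITEMS):
    """Return (event_id, severity, count) tuples for events at or above
    min_level, ordered by severity and then by descending frequency."""
    cutoff = LEVELS.index(min_level)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        severity = row["severity"]
        if severity in LEVELS and LEVELS.index(severity) <= cutoff:
            counts[(row["event_id"], severity)] += 1
    ranked = sorted(
        counts.items(),
        key=lambda item: (LEVELS.index(item[0][1]), -item[1], item[0][0]),
    )
    # Keep only as many candidates as the guide file can hold.
    return [(event_id, sev, n) for (event_id, sev), n in ranked[:limit]]


if __name__ == "__main__":
    for event_id, severity, count in rank_candidates(SAMPLE_EXPORT):
        print(event_id, severity, count)
```

Sorting by severity before frequency is one possible policy. If frequency matters more in your environment, swap the first two elements of the sort key.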
(2) Setting appropriate event guide information
Because you can set event guide information as you choose, you can set appropriate information for your operational requirements, as in the following examples:
- Event guide information for initial response
State how to respond to a problem detected by a JP1 event, and guide the system administrator on what action to take when the problem occurs. Set this as event guide information.
- Event guide information for error investigation and troubleshooting
State what JP1/IM functions to use when investigating a problem detected by a JP1 event, and write down the action procedure for the problem. Set this as event guide information.
You can also prepare event guide information according to the nature of the JP1 event. For example, for JP1 events of Error level or higher that require urgent action, you might describe the initial response procedure, while for JP1 events of Warning level indicating a preventable future problem, you might describe how to investigate and preempt the problem.
(a) Event guide information for initial response (example)
In this example, event guide information is needed for an event indicating that a JP1/AJS job running on a managed host has ended abnormally.
The JP1 event indicating abnormal termination of a JP1/AJS job has an event ID (B.ID) of 00004107 and an event level (E.SEVERITY) of Error level. Set event guide information for this JP1 event as follows.
Example of contents written in the event guide information file (jco_guide.txt), extract of the condition definition:
    [EV_GUIDE_001]
    EV_COMP=B.ID:00004107:00000000
    EV_COMP=E.SEVERITY:Error
    EV_GUIDE=The job ended abnormally.\n Contact the system administrator in charge of host $E.C0 urgently.\n\n List of system administrator contact details \n Host-A:TEL(03-xxxx-xxxx) Mail(xxxxx@xxx.co.jp) \n Host-B:TEL(03-xxxx-xxxx) Mail(xxxxx@xxx.co.jp) \n Host-C:TEL(03-xxxx-xxxx) Mail(xxxxx@xxx.co.jp)
    [END]
(b) Event guide information for error investigation and troubleshooting (example)
In this example, event guide information is needed for an event indicating that the number of commands queued in JP1/Base running on an agent has reached a set threshold.
The JP1 event indicating that the command queue count threshold has been exceeded has an event ID (B.ID) of 00003FA5 and an event level (E.SEVERITY) of Warning level. Set event guide information for this JP1 event as follows.
Example of contents written in the event guide information file (jco_guide.txt), extract of the condition definition:
    [EV_GUIDE_002]
    EV_COMP=B.IDBASE:00003FA5
    EV_COMP=E.SEVERITY:Warning
    EV_FILE=user-specified-folder(path)\jco_guidemes_002.txt
    [END]
Example of contents written in an event-guide message file (jco_guidemes_002.txt):
    The number of queued commands has exceeded the threshold (10).
    Determine the JP1/Base host from the message text.
    Check whether there is insufficient memory or a backlog of automated actions on the host.
    Open the List of Action Results window, or execute the jcashowa and jcocmdshow commands, to check the statuses of the automated actions.
    If any urgent automated actions are waiting to be executed, cancel them as a temporary measure.
    To cancel an automated action, use the jcacancel or jcocmddel command.
    These two commands display a confirmation message requiring you to type y or n. When executing either command from the Execute Command window, specify the -f option to bypass the confirmation message.
    If this event occurs frequently, use the jcocmddef command to modify the command execution environment.
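Before deploying a large event guide information file, it can help to check it mechanically. The following is a minimal sketch, assuming only the block syntax visible in the examples in this subsection ([EV_GUIDE_n] ... [END] with EV_COMP, EV_GUIDE, and EV_FILE lines). The function name check_guide_definitions and the rule that every block needs either EV_GUIDE or EV_FILE are assumptions made for illustration. The authoritative file format is given in the Command and Definition File Reference.

```python
import re

MAX_GUIDE_ITEMS = 1000  # limit stated in this subsection

# Sample text modeled on the examples above. The second block deliberately
# omits its message so that the check has something to report.
SAMPLE = """[EV_GUIDE_001]
EV_COMP=B.ID:00004107:00000000
EV_COMP=E.SEVERITY:Error
EV_GUIDE=The job ended abnormally.
[END]
[EV_GUIDE_002]
EV_COMP=B.IDBASE:00003FA5
EV_COMP=E.SEVERITY:Warning
[END]
"""


def check_guide_definitions(text):
    """Return (number_of_blocks, list_of_problem_strings)."""
    blocks = {}      # block name -> list of keys defined in that block
    current = None   # name of the block being read, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.fullmatch(r"\[(EV_GUIDE_\w+)\]", line)
        if header:
            current = header.group(1)
            blocks[current] = []
        elif line == "[END]":
            current = None
        elif current is not None and "=" in line:
            blocks[current].append(line.split("=", 1)[0])

    problems = []
    if len(blocks) > MAX_GUIDE_ITEMS:
        problems.append(
            f"{len(blocks)} items defined; the limit is {MAX_GUIDE_ITEMS}")
    for name, keys in blocks.items():
        if "EV_COMP" not in keys:
            problems.append(f"{name}: no EV_COMP condition")
        # Assumption: a block needs an inline message or a message file.
        if "EV_GUIDE" not in keys and "EV_FILE" not in keys:
            problems.append(f"{name}: neither EV_GUIDE nor EV_FILE is set")
    return len(blocks), problems


if __name__ == "__main__":
    count, problems = check_guide_definitions(SAMPLE)
    print(f"{count} event guide items defined")
    for problem in problems:
        print("problem:", problem)
```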
(3) Setting event guide information using variables (placeholder strings)
A variable (placeholder string) can be used to represent a JP1 event attribute in an event-guide message. A variable is written as the attribute name prefixed with $, as with $E.C0 in the examples in this subsection. For example, if you set the host name of the server where the problem originated (B.SOURCESERVER) as a variable, the variable is replaced with the actual host name when the event guide information is displayed, so the message text reflects the actual situation. This reduces the time required to identify the host where the problem occurred.
The following table describes the variables you can use in an event-guide message.
Table 11-4 Variables that can be used in event-guide messages
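| Attribute type | Event attribute | Variable | Format of substituted value |
|---|---|---|---|
| Basic attribute | Serial number | B.SEQNO | Integer character string |
| Basic attribute | Event ID | B.ID | String in the format basic-code:extended-code |
| Basic attribute | Event ID | B.IDBASE | String in the format basic-code |
| Basic attribute | Source process ID | B.PROCESSID | Integer character string |
| Basic attribute | Registered time | B.TIME | Integer character string |
| Basic attribute | Arrived time | B.ARRIVEDTIME | Integer character string |
| Basic attribute | Source user ID | B.USERID | Integer character string |
| Basic attribute | Source group ID | B.GROUPID | Integer character string |
| Basic attribute | Source user name | B.USERNAME | Character string |
| Basic attribute | Source group name | B.GROUPNAME | Character string |
| Basic attribute | Source event server name | B.SOURCESERVER | Character string |
| Basic attribute | Destination event server name | B.DESTSERVER | Character string |
| Basic attribute | Source serial number | B.SOURCESEQNO | Integer character string |
| Basic attribute | Message | B.MESSAGE | Character string |
| Extended attribute | Event level | E.SEVERITY | Character string |
| Extended attribute | User name | E.USER_NAME | Character string |
| Extended attribute | Product name | E.PRODUCT_NAME | Character string |
| Extended attribute | Object type | E.OBJECT_TYPE | Character string |
| Extended attribute | Object name | E.OBJECT_NAME | Character string |
| Extended attribute | Root object type | E.ROOT_OBJECT_TYPE | Character string |
| Extended attribute | Root object name | E.ROOT_OBJECT_NAME | Character string |
| Extended attribute | Object ID | E.OBJECT_ID | Character string |
| Extended attribute | Occurrence | E.OCCURRENCE | Character string |
| Extended attribute | Start time | E.START_TIME | Character string |
| Extended attribute | End time | E.END_TIME | Character string |
| Extended attribute | Return code | E.RESULT_CODE | Character string |
| Extended attribute | Other extended attribute | E.xxxxxx# | Character string |

For the event ID, use either B.ID or B.IDBASE.
#: Any JP1 product-specific extended attribute can be used. For example, a JP1/AJS job execution host is E.C0. For details about program-specific extended attributes, see the documentation for the particular product that issues JP1 events.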
By using these variables, you can write general-purpose event-guide messages that apply to many events of the same type. For example, if you use the variable for a JP1/AJS job execution host (E.C0), you can write event-guide messages like the following.
Example of an event-guide message using a variable (extract of the EV_GUIDE segment):
    EV_GUIDE=The job ended abnormally.\n Check whether an error occurred on host $E.C0.\n In a previous case, the job failed due to insufficient memory on host A.\n Check the available memory using the vmstat command.
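To make the substitution behavior concrete, the following sketch expands $-prefixed variables in a message template from a dictionary of event attributes. It is not the JP1/IM implementation. The function name expand_guide_message, the restriction to uppercase attribute names, and the choice to leave unknown variables unchanged are all assumptions made for illustration.

```python
import re

# Matches placeholders such as $B.SOURCESERVER or $E.C0.
# Assumption: attribute names consist of uppercase letters, digits, and "_".
PLACEHOLDER = re.compile(r"\$([BE]\.[A-Z0-9_]+)")


def expand_guide_message(template, event):
    """Replace each $B.xxx or $E.xxx placeholder with the event's value."""
    def lookup(match):
        name = match.group(1)
        # Assumption: a variable with no matching attribute is left as-is.
        return str(event.get(name, match.group(0)))
    return PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    # Hypothetical attribute values for one JP1 event.
    event = {"B.ID": "00004107:00000000",
             "E.SEVERITY": "Error",
             "E.C0": "Host-A"}
    template = "Check whether an error occurred on host $E.C0."
    print(expand_guide_message(template, event))
```

Because the pattern stops at the first character that cannot belong to an attribute name, the period that ends the sentence after $E.C0 is not consumed by the substitution.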
For details about JP1 event attributes, see 3.1 Attributes of JP1 events in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.
The character strings substituted for a JP1 event attribute (variable) depend on the product that issues the JP1 event. When using variables in event-guide messages, see also the description of JP1 events in the product documentation.
All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.