Hitachi

HA Monitor Cluster Software Messages


Glossary

A

abort code

Cause code for abnormal termination of a UNIX system.

active server

The server that is currently performing job processing.

active server start wait state

When a standby server is started, HA Monitor might be unable, for some reason, to confirm that the active server has started on the remote host. In that case, HA Monitor suspends startup processing on the standby server until the status of the active server can be confirmed. A standby server in this condition is said to be in the active server start wait state, and it waits for user intervention. Once startup processing of the active server is confirmed to have started, HA Monitor starts the standby server.

active system

The system (host) that is performing job processing by running an active server.

alias IP address function

A function that enables multiple IP addresses to be assigned to a single LAN adapter so that the LAN adapter can be shared through the use of different IP addresses.

alive message

A message that is issued between hosts at a specified interval to determine whether the target host is running normally.
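The following is a minimal sketch of how a host could judge a remote host from its alive messages. It is not HA Monitor's implementation. The class name AliveTracker, the interval_sec and missed_limit parameters, and the rule "alive if a message arrived within interval_sec x missed_limit seconds" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AliveTracker:
    """Hypothetical model: remembers when each host last sent an alive message."""

    interval_sec: float            # assumed interval at which alive messages are sent
    missed_limit: int = 3          # assumed number of missed intervals that is tolerated
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, host: str, now: float) -> None:
        """Store the arrival time of an alive message from the given host."""
        self.last_seen[host] = now

    def is_alive(self, host: str, now: float) -> bool:
        """Return True if the host sent an alive message recently enough."""
        seen = self.last_seen.get(host)
        if seen is None:
            # No alive message has ever been received from this host.
            return False
        return (now - seen) <= self.interval_sec * self.missed_limit
```

Times are passed in explicitly rather than read from the clock, so the check can be exercised with fixed values.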

C

chassis

A rack that houses a computer and its peripheral equipment. When the system being used is BladeSymphony, this rack is called a chassis. Multiple systems can be configured in one chassis.

child server

A server whose startup processing begins after its parent server has started. This corresponds to a child in the parent-child relationship in server groups.
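The parent-child startup rule can be illustrated with a dependency sort. The sketch below is only a model: the function startup_order and the parent_of mapping are hypothetical, and it assumes each server has at most one parent. It uses the graphlib module from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional


def startup_order(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return an order in which every parent server comes before its child servers.

    parent_of maps a server name to its parent server name, or to None when
    the server has no parent (an assumed representation, not an HA Monitor format).
    """
    sorter = TopologicalSorter()
    for server, parent in parent_of.items():
        if parent is None:
            sorter.add(server)
        else:
            # The parent is a predecessor: it must be started first.
            sorter.add(server, parent)
    return list(sorter.static_order())
```

For example, startup_order({"db": None, "app": "db", "web": "app"}) places db before app and app before web.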

client

A machine (node) that uses various services provided by programs.

cluster configuration

A system configuration that consists of more than one server system. Multiple server systems are connected via high-speed LANs and processing is distributed among different server systems. A client can perform processing by treating these server systems as a single server system.

core file

An OS-specific file in which memory information (module trace information) for a process is stored in the event the program terminates abnormally. Core files might not always be created. If an error occurs in a user-created program and its core file is available, the user can use OS commands to debug the program.

D

dual server system

A system that improves reliability and availability of the entire system by providing two sets of machines, programs, and resources for the system on which a server is run (or by sharing dual resources).

E

event ID

A number assigned to each event in order to individually manage the events that occur in a UNIX system.

G

grouped-system switchover

A function that groups multiple servers together in advance, allowing operations to be switched over to a standby server for that group if a failure occurs on any of the active servers in the group (server group). HA Monitor enables the user to specify hot standby processing for each server within the server group.

grouped-system switchover wait state

If a server failure occurs on an active server for which no_exchange is specified for the group operand of the server environment definition, HA Monitor delays hot standby processing on the standby server in the standby system. This status for a standby server is called the grouped-system switchover wait state. A standby server in the grouped-system switchover wait state waits for user intervention. However, if a server failure occurs on an active server in the group for which exchange is specified, grouped-system switchover is also performed on the standby server in the grouped-system switchover wait state.
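The effect of the exchange and no_exchange values can be sketched as a small decision function. This is a simplified model, not HA Monitor's logic: the function on_server_failure and its data structures are hypothetical, and it assumes that a failure on an exchange server switches over every server in the group.

```python
from typing import Dict, Set, Tuple


def on_server_failure(
    failed: str,
    group_operand: Dict[str, str],
    waiting: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Model what happens to one server group when an active server fails.

    group_operand maps each server in the group to "exchange" or "no_exchange".
    waiting holds servers whose standby is already in the wait state.
    Returns (servers switched over, servers whose standby is now waiting).
    """
    if group_operand[failed] == "no_exchange":
        # Hot standby processing is delayed; the standby waits for the user.
        return set(), waiting | {failed}
    # "exchange": grouped-system switchover is performed for the group,
    # which also releases standby servers that were already waiting.
    return set(group_operand), set()
```

With a group {"srvA": "exchange", "srvB": "no_exchange"}, a failure on srvB only puts its standby server in the wait state, and a later failure on srvA switches over both servers.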

H

HA Monitor Extension

An HA Monitor optional product. When HA Monitor Extension is installed, hot standby control in a large system configuration can be supported. HA Monitor Extension can be used when the OS is Linux (x86).

host

A unit for a system in which one server is run per CPU. The hardware making up a system and the programs that are run on the system are referred to collectively as a host.

host slowdown

An event in which the execution time on the entire host becomes longer than usual. Its causes include concurrent execution of too many programs or a failure in communication between programs.

hot standby

A function for switching jobs to a standby system (host) or server if a failure occurs on the primary system (host) or server.

hot-standby wait-state

If the standby system fails to reset the active system after a failure on the active system, and the active server is started on the standby system while the status of the active server cannot be verified, two active servers might result. To avoid this, HA Monitor delays startup of the active server on the standby system. This status at the active server is called the hot-standby wait-state. A server in the hot-standby wait-state cannot be started as an active server without user intervention.
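The condition described above can be written as a short guard. The sketch below is illustrative only: the function name, its parameters, and the returned strings are hypothetical, and it assumes that either a successful reset or a confirmed stop of the active server is enough to start safely.

```python
def standby_action(
    reset_succeeded: bool,
    active_confirmed_stopped: bool,
    user_released: bool = False,
) -> str:
    """Decide what the server on the standby system does after an active-system failure."""
    if reset_succeeded or active_confirmed_stopped:
        # The old active server cannot still be running, so no duplicate can result.
        return "start_as_active"
    if user_released:
        # The user has intervened and allowed the startup.
        return "start_as_active"
    # The reset failed and the status is unknown: starting now could produce
    # two active servers, so startup is delayed.
    return "hot_standby_wait"
```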

I

IP address

An address used with the IP protocol. The IP protocol corresponds to the network layer in the OSI basic reference model. The network layer manages addresses used to establish data transfer paths and to determine paths.

K

kernel

The core program of an OS. Its roles include management of tasks, memory, and I/O operations.

L

LAN adapter

Data conversion hardware used to connect a computer and a LAN.

lock

A control to prevent concurrent updating and deadlock on system resources when multiple requests for the same system resources result in contention. In HA Monitor, a function that prevents shared disks from being updated concurrently from both the active server and a standby server is called the lock function.

locked server

If multiple standby servers provide separate services to the standby system, the user can prevent multiple active servers from being run concurrently after hot standby processing.

When a standby server starts operating as an active server, HA Monitor stops all other standby servers running on the same host. A standby server that HA Monitor stops in this way is called a locked server.
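As a rough illustration, the rule can be modeled as follows. The function promote and its return format are hypothetical and are not part of HA Monitor.

```python
from typing import Dict, List, Union


def promote(host_standbys: List[str], promoted: str) -> Dict[str, Union[str, List[str]]]:
    """Model one host on which a standby server takes over as the active server.

    host_standbys lists the standby servers running on the host.
    Every standby server other than the promoted one is stopped and
    becomes a locked server.
    """
    if promoted not in host_standbys:
        raise ValueError(f"{promoted} is not a standby server on this host")
    locked = [name for name in host_standbys if name != promoted]
    return {"active": promoted, "locked": locked}
```

For example, promote(["srv1", "srv2", "srv3"], "srv2") yields srv2 as the active server and srv1 and srv3 as locked servers.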

LVM

Abbreviation for Logical Volume Manager, which is a kernel function. LVM enables the user to group several physical disk devices together as a volume group and to create in that group logical volumes of any number and size.

M

message log

An OS function that stores issued messages in a specific file (message log file).

module trace information

A flow of module processing in HA Monitor that is collected in a module trace buffer (core file). The module trace information is first transferred to a portable medium and then is analyzed.

monitor mode

One of the server operation modes. A server in the monitor mode cannot use some of the HA Monitor functions, such as starting a server on the standby system beforehand or monitoring for server failures. However, the monitor-mode program management function makes some of those functions available.

multi-standby function

A function for providing multiple standby servers for one active server.

This function can protect against a system failure while the primary system is recovering from a failure, in contrast to when only one standby server is provided per active server.

N

N+M cold standby configuration

A system configuration for BladeSymphony in which JP1/ServerConductor or HCSM is used to switch from the primary server module to a standby server module when a hardware failure occurs on the primary server module that was performing job operations. Because one or more standby server modules are provided, reliability is improved for handling failures.

O

operation report

Information that is sent from a server to HA Monitor at specific intervals. HA Monitor monitors the server based on this information.

OS panic

An OS kernel panic.

P

parent server

A server that must be active so that another server can be started or hot standby processing can be started. This corresponds to the parent in the parent-child relationship in server groups, and it is specified in the server environment definition.

partition

A function that divides a server machine into multiple sections and runs each section as a virtual server is called partitioning. Each such section is called a partition.

In BladeSymphony, which enables multiple processors to be installed in a single chassis, each processor is sometimes called a partition.

Hitachi server virtualization enables logical partitions (LPARs) to be configured within a processor.

portable medium

A storage medium, such as DAT, that enables programs and data to be recorded and transported.

primary system

The system (host) that performs job processing when it is started.

program

The program (application) that actually executes jobs. HA Monitor improves system reliability by employing hot-standby configurations to achieve dual programs.

Programs can be classified into two types, programs with an HA Monitor interface and programs without an HA Monitor interface.

program with an HA Monitor interface

A program that has a dedicated HA Monitor interface. When a program with an HA Monitor interface is run in the server mode, HA Monitor monitors both host failures and server failures.

HA Monitor monitors a program with an HA Monitor interface and performs hot standby processing in the event of a failure that cannot be detected by the program.

program without an HA Monitor interface

A program that does not have a dedicated HA Monitor interface. Some of the HA Monitor functions are not available, including the function for starting programs on the standby system in advance and the function for monitoring server failures.

There is no difference in functional limitations between using programs without an HA Monitor interface and running programs with an HA Monitor interface in the monitor mode.

R

resource server

A server used only for sharing resources among multiple servers. A resource server does not provide server functions.

Shared resources are controlled for each server when no resource server is used, while shared resources are controlled for each server group when a resource server is used.

restart wait state

If a server failure occurs on an active server for which restart or manual is specified in the switchtype operand in the server environment definition, HA Monitor does not terminate the active server and instead waits until it is restarted. This status for an active server is called the restart wait state.

S

secondary system

A system (host) that is placed in standby mode when it is started.

server

A service that processes jobs in accordance with requests. In this manual, a program as a unit of hot standby processing is called a server.

Servers are classified roughly as server-mode servers and monitor-mode servers.

server inheritance information

Information that a standby server inherits from an active server when pairing is established between the two servers. If information needs to be passed between servers within a user command, HA Monitor's moninfo -p command (command for specifying and displaying server inheritance information) is used to specify the information beforehand, and then the moninfo -g command is used to reference and display the information.

server mode

A server operation mode that can be selected when the program has an interface with HA Monitor. When a server is run in the server mode, HA Monitor monitors both host failures and server failures.

server slowdown

An event where the server execution time becomes longer than usual. Its causes include program looping and resource contention.

shared resource

A resource, such as shared disks and LANs, that can be shared between the active and a standby system. The shared resources controlled by HA Monitor include shared disks, file systems, LANs, and communication lines. HA Monitor controls shared resources for each server.

A resource server can also be used to share resources among server groups.

standby server

A server that is currently on standby, ready to take over if a failure occurs on the active server.

standby system

A system (host) that is running a standby server, which is ready to take over in the event of a failure.

system dump

An OS function for storing on a portable medium error information that cannot be narrowed down to a specific program. Memory information, information about swap area (virtual memory), and processor-specific information can be collected in a system dump. In general, a system dump is used when the cause of an error in the system cannot be identified.

T

TCP/IP

A standard communication protocol used for connection between UNIX computers. TCP/IP supports both the TCP protocol and the IP protocol.

U

UAP

A user's job created as a program. UAPs can be used as programs on a server in the monitor mode. HA Monitor can monitor UAPs when the UAPs call the APIs provided for this purpose.

user command

A command created by a user. A user command that has been registered into HA Monitor in advance can be issued automatically when HA Monitor processing is triggered by a change in the server status. User commands enable the user to use as shared resources various resources that are not controlled by HA Monitor.