Hitachi

JP1 Version 12 JP1/Network Node Manager i Setup Guide


18.8.1 Application failover and the NNMi database

After you configure NNMi using the database for application failover, NNMi does the following:

  1. The active server performs a database backup, storing the data in a single ZIP file.

  2. NNMi sends this ZIP file across the network to the standby server.

  3. The standby server extracts the ZIP file and configures the database to import transaction logs at the first startup.

  4. The database on the active server generates transaction logs, depending on database activity.

  5. Application failover sends the transaction logs across the network to the standby server, where they accumulate on the disk.

  6. When the standby server becomes active, NNMi starts and the database imports all the transaction logs that were sent across the network.

    The amount of time this takes depends on the number of files and the complexity of the information stored within those files.

  7. After the standby server imports all the transaction logs, the database becomes available and the standby server starts the remaining NNMi processes.

  8. The original standby server is now active, and the procedure starts over at step 1.
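The cycle above follows a general pattern: a full backup followed by log replay. The following is a minimal sketch of that pattern only, not NNMi's implementation. The class names `ToyActive` and `ToyStandby`, their methods, and the in-memory dictionary are all hypothetical and exist purely for illustration.

```python
import copy


class ToyActive:
    """Hypothetical stand-in for the active server's database (not NNMi code)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.pending_logs = []  # changes not yet shipped to the standby

    def write(self, key, value):
        # Step 4: every change is applied and also recorded as a log entry.
        self.rows[key] = value
        self.pending_logs.append((key, value))

    def backup(self):
        # Steps 1-2: a full snapshot supersedes all logs written before it.
        self.pending_logs = []
        return copy.deepcopy(self.rows)

    def drain_logs(self):
        # Step 5: hand over the accumulated logs and start a fresh batch.
        logs, self.pending_logs = self.pending_logs, []
        return logs


class ToyStandby:
    """Hypothetical standby that stores a snapshot plus shipped logs."""

    def __init__(self):
        self.snapshot = {}
        self.received_logs = []

    def receive_backup(self, snapshot):
        # Step 3: a new snapshot makes previously received logs obsolete.
        self.snapshot = snapshot
        self.received_logs = []

    def receive_logs(self, logs):
        # Step 5: logs accumulate until a failover happens.
        self.received_logs.extend(logs)

    def become_active(self):
        # Steps 6-7: restore the snapshot, then replay the logs in order.
        rows = copy.deepcopy(self.snapshot)
        for key, value in self.received_logs:
            rows[key] = value
        return ToyActive(rows)


active, standby = ToyActive(), ToyStandby()
active.write("node1", "up")
standby.receive_backup(active.backup())
active.write("node1", "down")
active.write("node2", "up")
standby.receive_logs(active.drain_logs())
new_active = standby.become_active()
print(new_active.rows)
```

In this toy model, the replay loop in `become_active` is the part whose duration grows with the number of accumulated logs, which mirrors the remark under step 6.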

(1) Network traffic in an application failover environment

NNMi transfers many items across the network from the active server to the standby server in an application failover environment:

The first two items generate 99% of the network traffic used by application failover. This subsection explores these two items in more detail.

Database activity

NNMi generates transaction logs for all database activity.

Database activity covers everything NNMi does. It includes, but is not limited to, the following:

  • Discovering new nodes

  • Discovering attributes about nodes, interfaces, VLANs, and other managed objects

  • State polling and status changes

  • Incidents, events, and root cause analysis

  • Operator actions on the NNMi console

Database activity is outside of your control. For example, an outage on the network results in NNMi generating many incidents and events. These incidents and events trigger state polling of devices on the network, resulting in updates to device status in NNMi. When the outage is resolved, additional node-up incidents result in further status changes. All of this activity updates entries in the NNMi database.

Although the NNMi database itself grows with database activity, it will reach a stable size for your environment and will exhibit only moderate growth over time.

Database transaction logs

The NNMi database creates an empty 16-megabyte file, and then writes database transaction information into that file. NNMi closes this file and makes it available to application failover after 15 minutes or after writing 16 megabytes of data to the file, whichever comes first. As a result, a completely idle database generates one transaction log file every 15 minutes, and this file is essentially empty. Application failover compresses all transaction logs, so an empty 16-megabyte file compresses to under 1 megabyte, and a full 16-megabyte file compresses to about 8 megabytes. During periods of higher database activity, the database generates more transaction logs in a shorter period of time, because each file fills faster.
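The "15 minutes or 16 megabytes, whichever comes first" rule can be turned into a rough estimate of how many transaction log files to expect. The following sketch assumes a constant write rate, which real database activity will not have. The function name `logs_generated` and its parameters are hypothetical.

```python
import math

SEGMENT_MB = 16        # size of each transaction log file (from the text)
ROTATION_MINUTES = 15  # a file is closed after at most 15 minutes (from the text)


def logs_generated(write_rate_mb_per_min: float, minutes: float) -> int:
    """Estimate how many log files are closed in the given period.

    Assumes a constant write rate, which is a simplification.
    """
    if write_rate_mb_per_min <= 0:
        # Idle database: only the 15-minute timer closes files.
        return math.floor(minutes / ROTATION_MINUTES)
    minutes_to_fill = SEGMENT_MB / write_rate_mb_per_min
    # Whichever limit is reached first closes the file.
    minutes_per_file = min(ROTATION_MINUTES, minutes_to_fill)
    return math.floor(minutes / minutes_per_file)


print(logs_generated(0, 24 * 60))  # idle database over one day
print(logs_generated(32, 60))      # hypothetical busy hour at 32 MB/min
```

Under these assumptions an idle database closes 96 files per day, each compressing to under 1 megabyte, while a hypothetical sustained write rate of 32 megabytes per minute would close 120 files in one hour.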

(2) An application failover traffic test

The following test resulted in an average of about two transaction log files per minute, with an average file size of 7 megabytes per file. This was due to the database activity associated with discovery of an additional 5,000 nodes added with each failover event. The database in this test case eventually stabilized at about 1.1 gigabytes (as measured by the size of the backup ZIP file), with 31,000 nodes and 960,000 interfaces.

Testing method

During the first 4 hours, test personnel seeded NNMi with 5,000 nodes and waited until discovery stabilized. After 4 hours, test personnel induced failover (the standby server became active, and the previously active server became the standby). Immediately after failover, test personnel added approximately 5,000 more nodes, waited another 4 hours to let the NNMi discovery process stabilize, and then induced another failover (failed back to the original active server).

Test personnel repeated this cycle several times with some variation in the time between failovers (4 hours, then 6 hours, then 2 hours). After each failover event, test personnel measured the following:

  • Size of the database created when the node first became active

  • Size of the backup ZIP file

  • Transaction logs: total number of files and the amount of disk space used

  • Number of nodes and interfaces in the NNMi database immediately before inducing failover

  • Elapsed time to complete failover

    This was the time from the initial ovstop command on the active server until the standby server became fully active with NNMi running.

Results

The following table summarizes the results:

Table 18‒2: Application failover test results

Hours  DB.zip size (MB)  No. of transaction logs  Transaction log size (GB)  Nodes   Interfaces  Failover time (minutes)
4      6.5               50                       0.3                        5,000   15,000      5
8      34                500                      2.5                        12,000  222,000     10
12     243               500                      2.5                        17,000  370,000     25
16     400               500                      3.5                        21,500  477,000     23
20     498               500                      3.5                        25,500  588,000     32
26     618               1,100                    7.5                        30,600  776,000     30
28     840               400                      2.2                        30,600  791,000     31
30     887               500                      2.5                        30,700  800,000     16

Observations

When NNMi transferred files from the active server to the standby server, the transfer averaged about 5 gigabytes every 4 hours, which is a continuous throughput of approximately 350 kilobytes/sec or 2.8 megabits/sec.
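The throughput figures above can be reproduced with a short calculation. This sketch assumes decimal units (1 gigabyte = 10^9 bytes, 1 kilobyte = 10^3 bytes), which the source does not state. The function name `throughput` is hypothetical.

```python
def throughput(total_bytes: float, seconds: float):
    """Return (kilobytes per second, megabits per second) for a transfer."""
    bytes_per_sec = total_bytes / seconds
    return bytes_per_sec / 1e3, bytes_per_sec * 8 / 1e6


# About 5 gigabytes every 4 hours, as observed in the test.
kb_s, mbit_s = throughput(5e9, 4 * 3600)
print(f"{kb_s:.0f} kilobytes/sec, {mbit_s:.1f} megabits/sec")
```

With these assumptions the result is about 347 kilobytes/sec and 2.8 megabits/sec, which matches the rounded figures quoted above.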

Note
  • This data does not include any other application failover traffic, such as the heartbeat, file consistency checks, or other application failover communications. This data also excludes the overhead of network I/O, such as packet headers. This data includes only the actual network payload of each file's contents moving across the network.

  • The traffic volume generated by an NNMi application failover environment is extremely large. Application failover identifies new transaction logs on the active server every 5 minutes and sends these logs to the standby server. Depending on network speed, the standby server might receive all the new files in a short time, resulting in a relatively idle network for the remainder of that 5-minute interval.
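To illustrate how long each 5-minute batch might occupy the network, the following sketch uses the test averages from this subsection (about two transaction log files per minute at about 7 megabytes each). The link speeds are hypothetical examples, the function name `burst_seconds` is an assumption, and the estimate covers payload only, ignoring protocol overhead and other traffic.

```python
def burst_seconds(files_per_min, mb_per_file, link_mbit_per_sec, interval_min=5):
    """Seconds needed to send one interval's worth of logs (payload only)."""
    payload_megabits = files_per_min * interval_min * mb_per_file * 8
    return payload_megabits / link_mbit_per_sec


INTERVAL_SECONDS = 5 * 60

# Hypothetical link speeds in megabits/sec; not taken from the test.
for link in (10, 100, 1000):
    secs = burst_seconds(2, 7, link)
    idle = max(0.0, 1 - secs / INTERVAL_SECONDS)
    print(f"{link:>5} Mbit/s: {secs:6.2f} s busy, {idle:.0%} of the interval idle")
```

Under these assumptions one batch is about 70 megabytes, which takes roughly 56 seconds on a 10 megabits/sec link and roughly 5.6 seconds on a 100 megabits/sec link, leaving the network relatively idle for the rest of the interval.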

Each time the active and standby servers switch roles (the standby server becomes active and the active server becomes standby), the new active server generates a complete database backup that it sends across the network to the new standby server. NNMi also performs this database backup periodically, every 24 hours by default. Each time NNMi generates a new backup, it sends the backup to the standby server. Having this new backup available on the standby server reduces the failover time, because all the transaction logs that NNMi generated in that 24-hour interval are already in the database and do not need to be imported at the time of a failover.

The information provided in this section will help you understand how the network might perform after a failover when using NNMi with application failover using the NNMi database.