18.8.1 Application failover and the NNMi database

After you configure NNMi application failover to use the NNMi database, NNMi does the following:

1. The active server performs a database backup, storing the data in a single ZIP file.
2. NNMi sends this ZIP file across the network to the standby server.
3. The standby server extracts the ZIP file and configures the database to import transaction logs at the first startup.
4. The database on the active server generates transaction logs, depending on database activity.
5. Application failover sends the transaction logs across the network to the standby server, where they accumulate on the disk.
6. When the standby server becomes active, NNMi starts and the database imports all the transaction logs that were transferred across the network.
   The amount of time this takes depends on the number of files and the complexity of the information stored within those files.
7. After the standby server imports all the transaction logs, the database becomes available and the standby server starts the remaining NNMi processes.
8. The original standby server is now active, and the procedure starts over at step 1.

Organization of this subsection

(1) Network traffic in an application failover environment
(2) An application failover traffic test

(1) Network traffic in an application failover environment

NNMi transfers many items across the network from the active server to the standby server in an application failover environment:

Database activity (the database backup as a single ZIP file)
Transaction logs
A periodic heartbeat so that each application failover node can verify that the other node is still running
File comparison lists so that the standby server can verify that its files are in synchronization with those on the active server
Miscellaneous events, such as changes in parameters (enable/disable failover and others) and nodes joining or leaving the cluster

The first two items generate 99% of the network traffic used by application failover. This subsection explores these two items in more detail.

Database activity

NNMi generates transaction logs for all database activity.

Database activity covers everything NNMi does. It includes, but is not limited to, the following:

Discovering new nodes
Discovering attributes about nodes, interfaces, VLANs, and other managed objects
State polling and status changes
Incidents, events, and root cause analysis
Operator actions on the NNMi console

Database activity is outside of your control. For example, an outage on the network results in NNMi generating many incidents and events. These incidents and events trigger state polling of devices on the network, resulting in updates to device status in NNMi. When the outage is resolved, additional node up incidents result in further status changes. All of this activity updates entries in the NNMi database.

Although the NNMi database itself grows with database activity, it will reach a stable size for your environment and will exhibit only moderate growth over time.

Database transaction logs

The NNMi database works by creating an empty 16-megabyte file, and then writing database transaction information into that file. NNMi closes this file, and then makes it available to application failover after 15 minutes or after writing 16 megabytes of data to the file, whichever comes first. That means that a completely idle database will generate one transaction log file every 15 minutes, and this file will be essentially empty. Application failover compresses all transaction logs, so an empty 16-megabyte file compresses down to under 1 megabyte. A full 16-megabyte file compresses to about 8 megabytes. Keep in mind that during periods of higher database activity, application failover generates more transaction logs in a shorter period of time, because each file gets full faster.
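
The rotation rule above (a file is closed after 15 minutes or after 16 megabytes, whichever comes first) can be turned into a rough estimate of how many transaction log files to expect. The following Python sketch is illustrative only: the function name logs_per_hour and the write rates passed to it are assumptions made for the example, not NNMi settings or measurements.

```python
import math

LOG_SIZE_MB = 16      # size of each transaction log file (from the text)
ROTATE_MINUTES = 15   # a file is closed after at most 15 minutes (from the text)


def logs_per_hour(write_rate_mb_per_min: float) -> float:
    """Estimate how many transaction log files are closed per hour.

    write_rate_mb_per_min is a hypothetical, constant rate at which the
    database writes transaction data. A file closes when it is full or
    when the 15-minute timer expires, whichever comes first.
    """
    if write_rate_mb_per_min < 0:
        raise ValueError("write rate must not be negative")
    if write_rate_mb_per_min == 0:
        minutes_to_fill = math.inf  # an idle database never fills the file
    else:
        minutes_to_fill = LOG_SIZE_MB / write_rate_mb_per_min
    minutes_per_file = min(ROTATE_MINUTES, minutes_to_fill)
    return 60 / minutes_per_file


print(logs_per_hour(0.0))   # idle database: one file every 15 minutes
print(logs_per_hour(32.0))  # assumed busy database: one file every 30 seconds
```

Under these assumptions, an idle database produces 4 files per hour (96 per day, each compressing to under 1 megabyte), while a sustained write rate of 32 megabytes per minute would produce 120 files per hour, or two files per minute.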


(2) An application failover traffic test

The following test resulted in an average of about two transaction log files per minute, with an average file size of 7 megabytes per file. This was due to the database activity associated with discovery of an additional 5,000 nodes added with each failover event. The database in this test case eventually stabilized at about 1.1 gigabytes (as measured by the size of the backup ZIP file), with 31,000 nodes and 960,000 interfaces.

Testing method

During the first 4 hours, test personnel seeded NNMi with 5,000 nodes and waited until discovery stabilized. After 4 hours, test personnel induced failover (the standby server became active, and the previously active server became the standby). Immediately after failover, test personnel added approximately 5,000 more nodes, waited another 4 hours to let the NNMi discovery process stabilize, and then induced another failover (failed back to the original active server).

Test personnel repeated this cycle several times with some variation in the time between failovers (4 hours, then 6 hours, then 2 hours). After each failover event, test personnel measured the following:

Size of the database created when the node first became active
Size of the backup ZIP file
Transaction logs: total number of files and the amount of disk space used
Number of nodes and interfaces in the NNMi database immediately before inducing failover
Elapsed time to complete failover, measured from the initial ovstop command on the active server until the standby server became fully active with NNMi running

Results

The following table summarizes the results:

Table 18‒2: Application failover test results
Hours	DB.zip size (MB)	No. of transaction logs	Transaction log size (GB)	Nodes	Interfaces	Failover time (minutes)
4	6.5	50	0.3	5,000	15,000	5
8	34	500	2.5	12,000	222,000	10
12	243	500	2.5	17,000	370,000	25
16	400	500	3.5	21,500	477,000	23
20	498	500	3.5	25,500	588,000	32
26	618	1,100	7.5	30,600	776,000	30
28	840	400	2.2	30,600	791,000	31
30	887	500	2.5	30,700	800,000	16
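
As a rough cross-check, overall averages can be derived from the rows of Table 18‒2. The Python sketch below assumes that the Hours column is cumulative elapsed time, that the transaction log columns are per-interval values, and that 1 GB equals 1,000 MB; the function names are hypothetical and chosen only for this example.

```python
# Rows of Table 18-2:
# (hours, db_zip_mb, log_count, log_size_gb, nodes, interfaces, failover_min)
ROWS = [
    (4, 6.5, 50, 0.3, 5_000, 15_000, 5),
    (8, 34, 500, 2.5, 12_000, 222_000, 10),
    (12, 243, 500, 2.5, 17_000, 370_000, 25),
    (16, 400, 500, 3.5, 21_500, 477_000, 23),
    (20, 498, 500, 3.5, 25_500, 588_000, 32),
    (26, 618, 1_100, 7.5, 30_600, 776_000, 30),
    (28, 840, 400, 2.2, 30_600, 791_000, 31),
    (30, 887, 500, 2.5, 30_700, 800_000, 16),
]


def average_log_size_mb(rows):
    """Average compressed size per transaction log file, in megabytes."""
    total_gb = sum(row[3] for row in rows)
    total_logs = sum(row[2] for row in rows)
    return total_gb * 1000 / total_logs  # assumes 1 GB = 1,000 MB


def logs_per_minute(rows):
    """Average number of transaction log files per minute over the whole test."""
    total_logs = sum(row[2] for row in rows)
    total_minutes = rows[-1][0] * 60  # assumes Hours is cumulative elapsed time
    return total_logs / total_minutes


print(f"{average_log_size_mb(ROWS):.1f} MB per file")
print(f"{logs_per_minute(ROWS):.2f} files per minute")
```

With these assumptions, the table works out to a little over two files per minute at roughly 6 megabytes per file, which is in the same range as the averages quoted for the test.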

Observations

When NNMi transferred files from the active server to the standby server, the transfer averaged about 5 gigabytes every 4 hours, which is a continuous throughput of approximately 350 kilobytes/sec or 2.8 megabits/sec.
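
The conversion from transferred volume to continuous throughput can be reproduced with a few lines of Python. This is a minimal sketch that assumes decimal units (1 gigabyte = 10^9 bytes, 1 megabit = 10^6 bits); the function name sustained_throughput is hypothetical.

```python
def sustained_throughput(total_bytes, seconds):
    """Return (kilobytes per second, megabits per second) in decimal units."""
    bytes_per_sec = total_bytes / seconds
    kilobytes_per_sec = bytes_per_sec / 1_000
    megabits_per_sec = bytes_per_sec * 8 / 1_000_000
    return kilobytes_per_sec, megabits_per_sec


# About 5 gigabytes transferred every 4 hours, as observed in the test.
kb_s, mbit_s = sustained_throughput(5 * 10**9, 4 * 3600)
print(f"{kb_s:.0f} KB/s, {mbit_s:.2f} Mbit/s")  # 347 KB/s, 2.78 Mbit/s
```

The result matches the approximate figures of 350 kilobytes/sec and 2.8 megabits/sec stated above.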

Note

This data does not include any other application failover traffic, such as the heartbeat, file consistency checks, or other application failover communications. It also excludes the overhead of network I/O, such as packet headers. The data includes only the actual network payload of each file's contents moving across the network.

The traffic generated by an NNMi application failover environment is large and arrives in bursts. Application failover identifies new transaction logs on the active server every 5 minutes and sends these logs to the standby server. Depending on network speed, the standby server might receive all the new files in a short time, resulting in a relatively idle network for the remainder of that 5-minute interval.
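
To illustrate how a 5-minute batch of transaction logs occupies a link, the following Python sketch estimates the transfer time and the share of the interval during which the link is idle. It assumes that the whole link is available to application failover and ignores protocol overhead; the link speed and the function name burst_profile are hypothetical, and the batch size is simply the test average of about 5 gigabytes every 4 hours divided into 5-minute slices.

```python
INTERVAL_SECONDS = 5 * 60  # new transaction logs are sent every 5 minutes


def burst_profile(batch_bytes, link_mbit_per_sec):
    """Return (transfer time in seconds, idle fraction of the interval).

    Assumes the full link speed is available and ignores packet overhead.
    """
    transfer_seconds = batch_bytes * 8 / (link_mbit_per_sec * 1_000_000)
    idle_fraction = max(0.0, 1 - transfer_seconds / INTERVAL_SECONDS)
    return transfer_seconds, idle_fraction


# 4 hours contain 48 five-minute intervals.
batch = 5 * 10**9 / 48
seconds, idle = burst_profile(batch, link_mbit_per_sec=100)  # assumed 100 Mbit/s link
print(f"transfer takes about {seconds:.1f} s; link idle {idle:.0%} of the interval")
```

On the assumed 100 Mbit/s link, an average batch transfers in under 10 seconds, so the link is idle for most of each interval. On a link close to the 2.8 megabits/sec average, the transfer would fill nearly the whole interval.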

Each time the active and standby servers switch roles (the standby server becomes active and the active server becomes standby), the new active server generates a complete database backup that it sends across the network to the new standby server. NNMi also performs this database backup periodically, every 24 hours by default. Each time NNMi generates a new backup, it sends the backup to the standby server. Having this new backup available on the standby server reduces the failover time, because all transaction logs that NNMi generated before the backup are already reflected in the database and do not need to be imported at the time of a failover.

The information in this section is intended to help you understand how the network might perform after a failover when you use NNMi application failover with the NNMi database.
