Cluster failures
A high-availability cluster failure is a loss of connection between the database servers in a cluster. Any of the following situations can cause it:
- Equipment failure or destruction
- A network failure
- An excessive processing delay on one of the database servers
- The DRTIMEOUT configuration parameter value was exceeded without confirmation of communication with other cluster servers.
- A database server in the cluster does not respond to the periodic messaging attempts over the network. Cluster servers ping each other even if the primary server does not send records to the secondary database servers.
A cluster server pings other cluster servers at the interval specified by its DRTIMEOUT configuration parameter.
After a database server detects a cluster failure, it writes a message to its message log (for example, DR: receive error) and turns off data replication. If a cluster failure occurs, the connection between the two database servers is dropped and the secondary database server remains in read-only mode.
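The detection and reaction described above can be sketched as a small timeout monitor. This is only an illustration of the general idea, not the database server's actual implementation: the `PeerMonitor` class, its method names, the peer name `secondary_1`, and the 30-second timeout are all hypothetical, and the only detail taken from the text is the example log message.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Hypothetical model of one cluster server watching its peers."""
    timeout_seconds: float                         # plays the role of DRTIMEOUT
    last_seen: dict = field(default_factory=dict)  # peer name -> time of last reply
    log: list = field(default_factory=list)        # stand-in for the message log
    replication_on: bool = True

    def record_response(self, peer, now):
        """Remember that a peer answered a ping at time `now` (seconds)."""
        self.last_seen[peer] = now

    def check(self, now):
        """Return the peers that have been silent longer than the timeout.

        On the first detected failure, log a message and turn off
        replication, mirroring the behavior described in the text.
        """
        failed = [peer for peer, seen in self.last_seen.items()
                  if now - seen > self.timeout_seconds]
        if failed and self.replication_on:
            self.log.append("DR: receive error")
            self.replication_on = False
        return failed


monitor = PeerMonitor(timeout_seconds=30.0)
monitor.record_response("secondary_1", now=0.0)
print(monitor.check(now=10.0))   # still within the timeout
print(monitor.check(now=45.0))   # silent for longer than the timeout
print(monitor.log, monitor.replication_on)
```

Passing `now` explicitly, rather than reading the system clock, keeps the sketch deterministic and easy to test.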
You can configure automatic switchover for HDR replication pairs by setting the DRAUTO configuration parameter to 1 or 2.
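As a rough way to check this setting, the following sketch reads configuration text and reports whether DRAUTO is set to 1 or 2. It assumes a simple layout of one `NAME value` pair per line with `#` starting a comment; the helper names and the sample excerpt (including the DRTIMEOUT value of 30) are hypothetical and only for illustration.

```python
def read_onconfig(text):
    """Parse 'NAME value' lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)            # split on first whitespace run
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def automatic_switchover_enabled(params):
    """True when DRAUTO is 1 or 2, the values named in the text above."""
    return params.get("DRAUTO") in {"1", "2"}


# Hypothetical configuration excerpt, not taken from a real system.
SAMPLE = """\
# replication settings
DRAUTO 1
DRTIMEOUT 30
"""

params = read_onconfig(SAMPLE)
print(automatic_switchover_enabled(params))   # prints True
```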
You can configure automatic failover for a high-availability cluster by configuring Connection Managers. Connection Managers have many advantages over automatic switchover and can also manage failover to shared-disk (SD) and remote stand-alone (RS) secondary servers.