Recovering a cluster after critical data is damaged
If one of the database servers in a high-availability cluster experiences a failure that damages the root dbspace, the dbspace that contains logical-log files, or the dbspace that contains the physical log, you must treat the failed database server as if it has no data on its disks and is being started for the first time. Use the functioning database server, whose disks are intact, as the database server with the data.
Primary server failure
For the following steps, assume that the configuration consists of a primary server named srv_A and an HDR secondary server named srv_B. The steps for restarting an RS cluster are similar.
To restart HDR after a critical media failure:
1. Check the DRAUTO configuration parameter on srv_B, because its value determines what you do next:
   - If it is set to 0 or 1, you must convert srv_B to the primary server by running the onmode -d make primary command.
   - If it is set to 2, srv_B becomes a primary database server automatically as soon as the connection ends when the old primary server fails.
2. Restore srv_A (the original primary database server) from the last dbspace backup.
3. Use the onmode -d command to set srv_A to an HDR secondary database server and to start HDR. The onmode -d command starts a logical recovery from the logical-log files on srv_B. If logical recovery cannot complete because you backed up and freed logical-log files on srv_B, HDR does not start until you perform the next step.
4. Apply the logical-log files from srv_B (the new primary database server) that were backed up to tape.

The HDR pair is now operational; however, the roles of srv_A and srv_B are swapped. To swap srv_A and srv_B back to their original roles, follow the instructions in Recovering an HDR cluster after the secondary server became the primary server.
Step | On the original primary database server (srv_A) | On the original secondary database server (srv_B) |
---|---|---|
1. | | onmode command: onmode -d make primary srv_B (not needed if DRAUTO is set to 2) |
2. | ontape command: ontape -p. ON-Bar command: onbar -r -p | |
3. | onmode command: onmode -d secondary srv_B | |
4. | ontape command: ontape -l. ON-Bar command: onbar -r -l | |
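The sequence for a primary server failure can be sketched as a small dry-run script. This is only an illustration under stated assumptions: plan_primary_recovery is a hypothetical helper that is not part of the product, it prints the commands rather than running them, and it assumes that the server name passed to onmode -d make primary is the server that is being promoted (srv_B).

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands, does not run them.
# plan_primary_recovery is a hypothetical helper, not a product command.
# srv_A is the failed primary; srv_B is the surviving HDR secondary.
#
# Usage: plan_primary_recovery DRAUTO_VALUE BACKUP_TOOL
#   DRAUTO_VALUE  value of DRAUTO on srv_B (0, 1, or 2)
#   BACKUP_TOOL   ontape or onbar
plan_primary_recovery() {
    drauto=$1
    tool=$2

    # Pick the physical and logical restore commands for the backup tool.
    case "$tool" in
        ontape) physical="ontape -p";   logical="ontape -l" ;;
        onbar)  physical="onbar -r -p"; logical="onbar -r -l" ;;
        *) echo "unknown backup tool: $tool" >&2; return 1 ;;
    esac

    # Step 1: promote srv_B manually unless DRAUTO already did it.
    case "$drauto" in
        0|1) echo "srv_B: onmode -d make primary srv_B" ;;
        2)   echo "srv_B: no action, promoted automatically" ;;
        *) echo "unexpected DRAUTO value: $drauto" >&2; return 1 ;;
    esac

    # Step 2: physical restore of srv_A from the last dbspace backup.
    echo "srv_A: $physical"
    # Step 3: make srv_A an HDR secondary of srv_B and start HDR.
    echo "srv_A: onmode -d secondary srv_B"
    # Step 4: needed only if HDR did not start in step 3.
    echo "srv_A: $logical"
}
```

For example, plan_primary_recovery 0 ontape prints the make primary command for srv_B followed by ontape -p, onmode -d secondary srv_B, and ontape -l for srv_A. The DRAUTO value itself can typically be read from the configuration on srv_B, for example with onstat -c, which displays the onconfig file.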
Secondary server failure
If the secondary database server suffers a critical media failure, recover the cluster by following the steps for starting a cluster for the first time.
Primary and secondary server failure
If both of the computers that are running database servers in a replication pair experience a failure that damages the root dbspace, the dbspace that contains logical-log files, or the dbspace that contains the physical log, you must restart the cluster.
- Restore the primary database server from the storage-space and logical-log backups.
- After you restore the primary database server, treat the other failed database server as if it had no data on the disks and you were starting the high-availability cluster for the first time.
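As a rough illustration of the first step, the following dry-run sketch prints a full restore command for the primary server. It assumes that a complete restore (storage spaces followed by logical logs) is wanted, which ontape -r and onbar -r both perform; plan_dual_failure_recovery is a hypothetical helper name, and nothing is run.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned actions, does not run them.
# plan_dual_failure_recovery is a hypothetical helper, not a product command.
#
# Usage: plan_dual_failure_recovery BACKUP_TOOL   (ontape or onbar)
plan_dual_failure_recovery() {
    # A full restore applies the storage-space backup and then the
    # logical-log backups in one operation.
    case "$1" in
        ontape) echo "primary: ontape -r" ;;
        onbar)  echo "primary: onbar -r" ;;
        *) echo "unknown backup tool: $1" >&2; return 1 ;;
    esac

    # The other server is then treated as empty and set up again.
    echo "secondary: set up as for a first-time cluster start"
}
```

Calling plan_dual_failure_recovery onbar prints the onbar -r restore for the primary server, followed by a reminder to set up the other server as if the cluster were being started for the first time.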