Automatic failover
Switching a master domain manager to a backup master domain manager.
Recovery is easy when you are prepared for potential problems. To ensure continuous operations, if the master domain manager becomes unavailable, a long-term switchmgr operation is triggered and the workload is automatically switched to an eligible backup master domain manager.
Similarly, the backup event processors automatically detect when the active event processor is unavailable, and a long-term switcheventprocessor command is triggered.
- Each backup master domain manager monitors the status of the active master domain manager.
- The master domain manager (active or backup) is self-aware: it monitors the status of its fault-tolerant agent to check on processes such as Batchman, Mailman, and Jobman. If at least one of these processes is down, the master domain manager makes three attempts to restart it. If all three attempts fail, a long-term switchmgr operation is triggered and the workload is automatically switched to an eligible backup master domain manager, while the event processor remains unchanged.
- If WebSphere Application Server Liberty goes down, the watchdog process attempts to restart it. If the attempt fails, long-term switchmgr and switcheventprocessor commands are triggered, moving both the master domain manager and the event processor to their backups.
- If the active master domain manager cannot be automatically restored within 5 minutes (the threshold after which the master is declared unavailable), a permanent switch to a backup is automatically triggered by any of the backup candidates when one or more of the following conditions persist:
- The fault-tolerant agent, WebSphere Application Server Liberty, or both are still down.
- The engine is unable to communicate with the database, for example, due to a network outage.
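The rules in the list above can be summarized as a small decision model. The following Python sketch is illustrative only and is not the product's implementation: the `MasterStatus` fields and the `failover_actions` function are hypothetical names, and treating the 5-minute threshold as "300 seconds or more" is an assumption. Only the three restart attempts, the 5-minute threshold, and the switchmgr and switcheventprocessor outcomes come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RESTART_ATTEMPTS = 3                 # from the text: three restart attempts
UNAVAILABLE_THRESHOLD_SECONDS = 5 * 60   # from the text: 5-minute threshold


@dataclass
class MasterStatus:
    """Hypothetical snapshot of the active master domain manager."""
    agent_processes_up: bool       # Batchman, Mailman, and Jobman all running
    agent_restart_failures: int    # failed restart attempts so far
    liberty_up: bool               # WebSphere Application Server Liberty running
    liberty_restart_failed: bool   # watchdog restart attempt failed
    database_reachable: bool       # engine can communicate with the database
    seconds_unavailable: float     # time since the master stopped being usable


def failover_actions(status: MasterStatus) -> set[str]:
    """Return the switch operations that the described rules would trigger."""
    actions: set[str] = set()

    # Agent processes down and all restart attempts failed:
    # only the master role moves; the event processor stays where it is.
    if (not status.agent_processes_up
            and status.agent_restart_failures >= MAX_RESTART_ATTEMPTS):
        actions.add("switchmgr")

    # Liberty down and the watchdog restart failed: both roles move.
    if not status.liberty_up and status.liberty_restart_failed:
        actions.update({"switchmgr", "switcheventprocessor"})

    # Master not restored within the threshold while a fault persists
    # (assumption: "within 5 minutes" is modeled as >= 300 seconds).
    fault_persists = (not status.agent_processes_up
                      or not status.liberty_up
                      or not status.database_reachable)
    if status.seconds_unavailable >= UNAVAILABLE_THRESHOLD_SECONDS and fault_persists:
        actions.add("switchmgr")

    return actions
```

For example, a status in which only the database has been unreachable for 5 minutes yields a master switch alone, whereas a failed Liberty restart yields both switches.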
The list of potential event processor backups is separate from the list of potential master domain manager backups, because you might have a workstation that can serve as an event processor backup but that you do not want to act as a master domain manager backup. If the event processor fails while the master domain manager is running normally, only the event processor switches to a backup defined in its list of potential backups. Likewise, if the master domain manager fails while the event processor is running normally, only the master domain manager switches.
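As a rough illustration of why the two lists are kept apart, the following Python sketch chooses a backup for each role independently. The workstation names and the `pick_backup` helper are hypothetical, and picking the first available candidate in list order is an assumption made for the example, not a documented selection rule.

```python
from __future__ import annotations


def pick_backup(candidates: list[str], available: set[str]) -> str | None:
    """Return the first candidate that is currently available, if any."""
    for name in candidates:
        if name in available:
            return name
    return None


# Hypothetical workstation names; each role has its own candidate list.
master_backups = ["BKM1", "BKM2"]
event_processor_backups = ["BKM1", "EVT1"]

# EVT1 can take over event processing but is never considered for the master role.
available = {"BKM2", "EVT1"}

new_master = pick_backup(master_backups, available)
new_event_processor = pick_backup(event_processor_backups, available)
```

Because each role consults only its own list, a failure of one role never forces the other role onto a workstation that was not designated for it.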
- <TWA_DATA_DIR>/stdlist/appserver/engineServer/logs/messages.log
- <TWA_home>\TWS\stdlist\appserver\engineServer\logs\messages.log
On Windows™ workstations, the FINAL job stream is not defined on the extended agent, but remains on the master domain manager. The FINAL and FINALPOSTREPORTS job streams and their jobs must be moved from the master domain manager to the extended agent workstation. For this reason, only a short-term switch can be performed automatically; the long-term switch must be performed manually, as documented in Extended loss or permanent change of master domain manager and in Complete procedure for switching a domain manager. See also switchmgr command, which contains both the command-line syntax and the procedural steps to perform the switch from the Dynamic Workload Console.