An auto-failover is triggered when the Acting-Primary or the Arbiter determines that the primary can no longer function. The most common reasons are:
- The Acting-Primary lost contact with the Failover and Arbiter, but the Failover and Arbiter maintained contact with each other
- The Acting-Primary ran into a critical problem, such as an inability to write to disk or major database corruption
- InterSystems Caché on the Acting-Primary was shut down or restarted without selecting “Don’t fail over.” The following image shows “Don’t fail over” selected, which is the best practice for keeping the primary functioning
- The operating system on the Acting-Primary was shut down for any reason while running InterSystems Caché
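The reasons above can be read as a set of independent checks. The following is a minimal sketch of that reading, not Caché's actual logic: `PrimaryStatus`, its field names, and `failover_reasons` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStatus:
    """Hypothetical snapshot of the Acting-Primary; not a Caché data structure."""
    reachable_by_failover: bool
    reachable_by_arbiter: bool
    failover_arbiter_linked: bool           # Failover and Arbiter still see each other
    critical_fault: bool = False            # e.g. cannot write to disk, database corruption
    cache_stopped: bool = False             # Caché was shut down or restarted
    dont_fail_over_selected: bool = False   # "Don't fail over" was chosen at shutdown
    os_shut_down: bool = False              # operating system shut down while Caché was running


def failover_reasons(s: PrimaryStatus) -> list:
    """Return the listed reasons that apply; an empty list means no trigger."""
    reasons = []
    isolated = not s.reachable_by_failover and not s.reachable_by_arbiter
    # Isolation only counts when the other two members can still agree with each other.
    if isolated and s.failover_arbiter_linked:
        reasons.append("primary isolated from Failover and Arbiter")
    if s.critical_fault:
        reasons.append("critical problem on primary")
    if s.cache_stopped and not s.dont_fail_over_selected:
        reasons.append("Caché stopped without \"Don't fail over\"")
    if s.os_shut_down:
        reasons.append("operating system shut down")
    return reasons
```

In this sketch, a planned restart with “Don’t fail over” selected yields an empty list, which matches the best practice noted in the third reason.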
Failover process overview:
- The mirror determines that the Acting-Primary can no longer fulfill its duties
- The mirror confirms that the Failover is able to take over the duties of the Acting-Primary. Once this check passes, the auto-failover cannot be cancelled
- The Failover is promoted into the Acting-Primary role
- If the previous Acting-Primary is still running, its InterSystems Caché instance is forced to shut down. It stays down until it is manually restarted
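The sequence above can be sketched as a small function. This is an illustrative model under stated assumptions, not the mirror's implementation: `Role` and `auto_failover` are hypothetical names, and because the overview does not say what happens when the Failover is not able to take over, the sketch simply raises an error in that case.

```python
from enum import Enum


class Role(Enum):
    ACTING_PRIMARY = "Acting-Primary"
    FAILOVER = "Failover"
    DOWN = "Down"


def auto_failover(primary_ok: bool, failover_ready: bool):
    """Return (role of the original primary, role of the original failover)."""
    # Step 1: the mirror decides whether the Acting-Primary can still fulfill its duties.
    if primary_ok:
        return Role.ACTING_PRIMARY, Role.FAILOVER

    # Step 2: confirm the Failover can take over. Passing this check is the
    # point after which the auto-failover cannot be cancelled.
    if not failover_ready:
        # Assumption: this outcome is not covered by the overview.
        raise RuntimeError("Failover is not able to take over")

    # Step 3: the Failover is promoted into the Acting-Primary role.
    # Step 4: Caché on the previous Acting-Primary is forced down if it is still
    # running, and it stays down until someone restarts it manually.
    return Role.DOWN, Role.ACTING_PRIMARY
```

A successful failover therefore ends with the original primary down and the original Failover acting as primary, which is why the post-failover steps include restarting the old primary.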
What to do after the failover:
- Confirm the new Acting-Primary has every connection in the “On” state. Some connections might not have made the transition
- Restart InterSystems Caché on the previous Acting-Primary
- If the failover was unwanted, gather the cconsole.log file from each system and send the files to Support. These logs are the best way to determine the root cause of the auto-failover
- Please label the files by system of origin for clarity
- If the failover was unwanted, address the root cause so that it does not recur
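Labeling the log files by system of origin can be scripted. The following is a minimal sketch that copies a cconsole.log into a collection folder and prefixes the file name with the system name. The function name `label_console_log`, the destination folder, and the default of using the local host name are assumptions made for illustration; the location of cconsole.log depends on the installation and must be supplied by the caller.

```python
import shutil
import socket
from pathlib import Path


def label_console_log(log_path, dest_dir, system_name=None):
    """Copy a console log into dest_dir, prefixing the name with its system of origin."""
    src = Path(log_path)
    # Assumption: fall back to the local host name when no label is given.
    system = system_name or socket.gethostname()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"{system}_{src.name}"
    # copy2 keeps the file's timestamps, which helps when correlating events.
    shutil.copy2(src, target)
    return target
```

Run on each system, for example `label_console_log(path_to_cconsole_log, "failover_logs", "node-a")`, where `path_to_cconsole_log` and `node-a` are placeholders. This produces files such as `node-a_cconsole.log` that can be sent to Support together.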