Secondary server fails
Scenario
The secondary server fails.
Symptoms
The watchdog poll of the secondary server fails. The state of the primary server is either PRIMARY ALONE or PRIMARY UNCERTAIN.
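The symptom check can be written as a small predicate. This is a minimal sketch that assumes the watchdog already has the result of its poll of the secondary server as a boolean and the primary server's state as a string. The function name `secondary_failed` is hypothetical and is not part of solidDB.

```python
# States that the primary server can report after losing the secondary (from this section).
PRIMARY_STATES_AFTER_SECONDARY_LOSS = {"PRIMARY ALONE", "PRIMARY UNCERTAIN"}


def secondary_failed(secondary_poll_ok: bool, primary_state: str) -> bool:
    """Return True when the symptoms match the 'secondary server fails' scenario.

    secondary_poll_ok: result of the watchdog's poll of the secondary (assumed boolean).
    primary_state: HotStandby state reported by the primary server (assumed string).
    """
    return (not secondary_poll_ok) and primary_state in PRIMARY_STATES_AFTER_SECONDARY_LOSS


if __name__ == "__main__":
    print(secondary_failed(False, "PRIMARY UNCERTAIN"))  # True
    print(secondary_failed(True, "PRIMARY ACTIVE"))      # False
```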
Remedy
The standard remedy is to switch the primary server to PRIMARY ALONE state. After the connection to the secondary server is re-established, you can synchronize the primary and secondary servers.
The following steps describe what the watchdog and the administrator should do to return the service to normal operation.
Server #2 fails.
Server #1 switches state from PRIMARY ACTIVE to PRIMARY UNCERTAIN automatically, and suspends any open transactions.
The watchdog determines that Server #2 is not responding.
Note If AutoPrimaryAlone is set to Yes in solid.ini on the primary server, then the server switches to PRIMARY ALONE state automatically instead of PRIMARY UNCERTAIN state.
The watchdog instructs the primary server to switch state to PRIMARY ALONE by using the command:
ADMIN COMMAND 'hsb set primary alone';
Server #1 commits any open transactions but saves all transactions in the transaction log, in case they have not been committed by Server #2.
Server #1 continues to accept new transactions from applications.
The watchdog continues to monitor responsiveness of servers.
Note If the transaction log on Server #1 fills up before Server #2 is reconnected, you might have to switch Server #1 to STANDALONE state.
The administrator brings Server #2 back up as the secondary server.
The watchdog determines that Server #2 is responsive and instructs Server #1 to connect to Server #2 by using the command:
ADMIN COMMAND 'hsb connect';
Server #2 reads the transaction log from Server #1.
Note If you switched Server #1 to STANDALONE state, you must copy the database from Server #1 to Server #2 before you reconnect the servers; see Synchronizing primary and secondary servers for details.
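The watchdog's part of the steps above can be sketched as a small state machine. This is a minimal sketch, not the sample application's code: `execute` is a hypothetical callback that sends one SQL statement to Server #1, and the class and attribute names are illustrative. Only the two ADMIN COMMAND strings come from this section.

```python
from typing import Callable

# Statement texts quoted from this section (trailing semicolon omitted).
SET_PRIMARY_ALONE = "ADMIN COMMAND 'hsb set primary alone'"
HSB_CONNECT = "ADMIN COMMAND 'hsb connect'"


class SecondaryFailureWatchdog:
    """Hypothetical watchdog logic for the 'secondary server fails' scenario.

    `execute` is an assumed callback that sends one SQL statement to the
    primary server (Server #1); connection handling is left to the caller.
    """

    def __init__(self, execute: Callable[[str], None]) -> None:
        self.execute = execute
        self.secondary_down = False
        self.needs_database_copy = False

    def on_poll(self, secondary_responsive: bool, primary_state: str) -> None:
        """Process one poll result for Server #2 plus the primary's current state."""
        if not secondary_responsive and not self.secondary_down:
            # Server #2 has stopped responding.
            self.secondary_down = True
            if primary_state == "PRIMARY UNCERTAIN":
                # Tell Server #1 to commit suspended transactions and carry on alone.
                self.execute(SET_PRIMARY_ALONE)
            # With AutoPrimaryAlone=Yes the primary is already PRIMARY ALONE: nothing to send.
        elif secondary_responsive and self.secondary_down:
            if primary_state == "STANDALONE":
                # The database must be copied to Server #2 first (administrator task).
                self.needs_database_copy = True
                return
            # Server #2 is back as secondary: reconnect so it can read Server #1's log.
            self.secondary_down = False
            self.execute(HSB_CONNECT)


if __name__ == "__main__":
    sent = []
    watchdog = SecondaryFailureWatchdog(sent.append)
    watchdog.on_poll(False, "PRIMARY UNCERTAIN")  # secondary fails
    watchdog.on_poll(False, "PRIMARY ALONE")      # still down: nothing new is sent
    watchdog.on_poll(True, "PRIMARY ALONE")       # secondary is back
    print(sent)
```

Tracking `secondary_down` keeps the watchdog from re-sending the same command on every poll while Server #2 is unavailable.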
The watchdog or administrator must choose carefully between switching the primary server to PRIMARY ALONE state and taking some alternative action. If a different action is chosen, the watchdog or administrator must take into account that the primary and secondary servers might not have the same data; that is, a transaction that was open when the secondary failed might have been rolled back on one server but not on the other. For example, the failed secondary server might have committed the data and crashed before sending the confirmation to the primary server, so the primary server never committed it. In this situation, the secondary server might actually be "ahead" of the primary server rather than behind it.
As always, the watchdog or administrator must also be careful not to allow both servers to switch to PRIMARY ALONE state at the same time.
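A watchdog can enforce this rule with a guard that runs before it sends 'hsb set primary alone' to any server. This is a minimal sketch under the assumption that the watchdog keeps the last polled state of each server in a dictionary, with None for a failed poll. The function name and server names are hypothetical.

```python
from typing import Dict, Optional


def can_set_primary_alone(target: str, states: Dict[str, Optional[str]]) -> bool:
    """Return False if any server other than `target` is known to be PRIMARY ALONE.

    `states` maps a server name to its last polled HotStandby state, or to None
    when the poll failed. A None entry cannot rule out a second PRIMARY ALONE
    server by itself, so the watchdog still needs its own record of which
    server was the secondary before the failure.
    """
    return all(
        state != "PRIMARY ALONE"
        for name, state in states.items()
        if name != target
    )


if __name__ == "__main__":
    # The secondary (server2) is down: this check allows switching server1.
    print(can_set_primary_alone("server1", {"server1": "PRIMARY UNCERTAIN", "server2": None}))  # True
    # server1 already runs alone: server2 must not be switched as well.
    print(can_set_primary_alone("server2", {"server1": "PRIMARY ALONE", "server2": "SECONDARY ACTIVE"}))  # False
```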
Further scenarios when the secondary is down
If an application receives error message 10047 or 14537 from the primary server:
Try to connect to the secondary server to check whether its state was switched to a primary state.
If its state is not one of the primary server states (PRIMARY ACTIVE or PRIMARY ALONE), see the scenario in Primary server fails.
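The application-side check above can be sketched as a function that returns the next action. This is a minimal sketch: `query_secondary_state` is a hypothetical callback that connects to the secondary server and returns its state (or None if the connection fails), and the returned action labels are illustrative names, not solidDB values. Only the error codes and state names come from this section.

```python
from typing import Callable, Optional

TRIGGER_ERRORS = {10047, 14537}                # error codes named in this section
PRIMARY_STATES = {"PRIMARY ACTIVE", "PRIMARY ALONE"}


def next_action(error_code: int, query_secondary_state: Callable[[], Optional[str]]) -> str:
    """Decide what an application does after an error from the primary server."""
    if error_code not in TRIGGER_ERRORS:
        return "other-error"
    # Ask the secondary server whether it has been switched to a primary state.
    state = query_secondary_state()
    if state in PRIMARY_STATES:
        return "use-secondary-as-primary"
    # The secondary did not take over: follow the "Primary server fails" scenario.
    return "follow-primary-server-fails"


if __name__ == "__main__":
    print(next_action(10047, lambda: "PRIMARY ALONE"))  # use-secondary-as-primary
    print(next_action(14537, lambda: None))             # follow-primary-server-fails
```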