How the Watchdog sample application works
The Watchdog sample application works in one of two modes:
Normal mode
In normal mode, the Watchdog sample checks the connection status of the servers by running the following command on both the primary and the secondary server:
ADMIN COMMAND 'hotstandby status connect'
The Watchdog sample performs this check at regular intervals. The interval is set with the Watchdog.PingInterval parameter in the solid.ini file; see [Watchdog] section of the solid.ini configuration file.
The Watchdog sample determines that there is a problem in the HSB system when it receives no response from the primary server, the secondary server, or both after a given number of polling attempts. The number of attempts is set with the Watchdog.NumRetry parameter.
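The retry rule can be sketched in Python as follows, applied once per server. Here `ping` is a hypothetical callable standing in for whatever sends ADMIN COMMAND 'hotstandby status connect' to one server; `interval_s` (the ping interval converted to seconds) and the injectable `sleep` are also assumptions of this sketch, not names from the sample.

```python
import time
from typing import Callable


def server_unreachable(
    ping: Callable[[], bool],
    num_retry: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if `ping` fails on num_retry consecutive attempts.

    `ping` (hypothetical) returns True if the server answered the status
    command at all, whether with success or an error, and False if it gave
    no response.
    """
    if num_retry < 1:
        raise ValueError("num_retry must be at least 1")
    for attempt in range(num_retry):
        if ping():
            return False  # the server responded, so it is reachable
        if attempt < num_retry - 1:
            sleep(interval_s)  # wait one ping interval before the next attempt
    return True
```

A problem in the HSB system exists if this returns True for the primary, the secondary, or both. Injecting `sleep` lets the loop be exercised without real waiting.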
The Watchdog sample also checks whether the primary server and the secondary server are connected to each other. If the primary or secondary server returns a successful connect status, the two servers are still connected. If either server returns an error, the primary and secondary servers are no longer connected.
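The rules in the last two paragraphs can be combined into one verdict per polling cycle. In this hypothetical Python sketch, each argument is None if the server gave no response after the configured number of attempts, True if it returned a successful connect status, and False if it returned an error. Treating an error from either server as a lost connection is one reading of the rule above, not confirmed sample behavior.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    OK = "ok"                      # both servers answered and are connected
    NO_RESPONSE = "no response"    # a server did not answer within NumRetry attempts
    DISCONNECTED = "disconnected"  # both answered, but the pair is no longer connected


def evaluate(primary: Optional[bool], secondary: Optional[bool]) -> Verdict:
    """Combine the 'hotstandby status connect' results from both servers."""
    if primary is None or secondary is None:
        return Verdict.NO_RESPONSE
    if primary and secondary:
        return Verdict.OK
    # At least one server returned an error for the connect status.
    return Verdict.DISCONNECTED
```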
The Watchdog.AutoSwitch parameter determines whether the Watchdog sample attempts to switch server states or only writes a message to the log to inform the administrator of the error:
If the Watchdog.AutoSwitch parameter is set to YES, the Watchdog sample is responsible for switching server states if the primary server fails. For example, when the primary server is down, the Watchdog sample switches the secondary server to PRIMARY ALONE state, making it the new primary server.
If the Watchdog.AutoSwitch parameter is set to NO, the Watchdog sample does not change the server states; instead, it writes a message to the Watchdog log telling the administrator to switch server states manually.
To continue monitoring, the Watchdog sample then switches to failure mode, in which it repeatedly checks the failed servers for a working connection.
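The AutoSwitch decision and the move to failure mode can be sketched in Python as a small state holder. The Watchdog class, its fields, and the action strings are hypothetical; the sketch records the intended state change as text instead of issuing a real switch command, since the exact commands are outside this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Watchdog:
    auto_switch: bool                                  # value of Watchdog.AutoSwitch
    mode: str = "normal"                               # "normal" or "failure"
    actions: List[str] = field(default_factory=list)  # state changes requested
    log: List[str] = field(default_factory=list)      # messages for the administrator

    def on_primary_failure(self) -> None:
        if self.auto_switch:
            # AutoSwitch=YES: promote the secondary server.
            self.actions.append("switch secondary to PRIMARY ALONE")
            self.log.append("Primary failed; secondary switched to PRIMARY ALONE.")
        else:
            # AutoSwitch=NO: change nothing and ask the administrator to act.
            self.log.append("Primary failed; switch server states manually.")
        # Either way, keep monitoring the failed servers in failure mode.
        self.mode = "failure"
```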
Failure mode
In failure mode, the Watchdog sample waits for the system administrator to fix the problem with the primary and secondary servers. If a second failure occurs before the first failure is fixed, the Watchdog sample does not handle it. This limitation is deliberate: in certain situations, a series of failures and seemingly appropriate responses can result in two primary servers (each in PRIMARY ALONE or STANDALONE state). This is particularly likely when there are brief failures in the network but no failures in the database servers themselves. For an example of a series of events that produces two primary servers, see Dual primary servers.
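The "one failure at a time" rule amounts to a guard on the mode, sketched here in Python. FailureGuard and its methods are hypothetical names; the point is only that a failure reported while already in failure mode is recorded but not acted on.

```python
from typing import List


class FailureGuard:
    """Handle at most one failure until HSB operation is restored."""

    def __init__(self) -> None:
        self.mode = "normal"
        self.handled: List[str] = []
        self.ignored: List[str] = []

    def report_failure(self, description: str) -> bool:
        """Return True if the failure is handled, False if it is ignored."""
        if self.mode == "failure":
            # A second failure before the first is fixed is deliberately not
            # handled, to avoid responses that could leave two primary servers.
            self.ignored.append(description)
            return False
        self.handled.append(description)
        self.mode = "failure"
        return True

    def hsb_restored(self) -> None:
        """Call once the pair is back in PRIMARY ACTIVE / SECONDARY ACTIVE."""
        self.mode = "normal"
```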
While in failure mode, the Watchdog sample polls both the primary and secondary servers. When it is able to connect to both servers, it sends the following command to both servers to determine the state of the servers:
ADMIN COMMAND 'hotstandby state'
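The failure-mode polling loop can be sketched in Python as below. `query_state` is a hypothetical callable that sends ADMIN COMMAND 'hotstandby state' to one server and returns its state name (for example, PRIMARY ALONE), or None if the server cannot be reached; reducing the command's result to a single state string is an assumption. `max_polls` exists only so the loop can be bounded in tests.

```python
import time
from typing import Callable, Dict, Optional, Sequence


def wait_for_states(
    query_state: Callable[[str], Optional[str]],
    servers: Sequence[str],
    interval_s: float,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, str]]:
    """Poll until every server answers, then return {server: HSB state}.

    Returns None if max_polls cycles pass without reaching all servers.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        states: Dict[str, str] = {}
        for server in servers:
            state = query_state(server)
            if state is None:
                break  # at least one server is still unreachable
            states[server] = state
        else:
            return states  # every server answered in this cycle
        sleep(interval_s)  # wait before polling again
    return None
```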
After the Watchdog sample is able to communicate with both servers, the next step is determined by the setting of the Watchdog.DualSecAutoSwitch parameter:
If the parameter is set to YES and both servers are in a secondary state, the Watchdog sample automatically selects one of the secondary servers to be the new primary server.
If the parameter is set to NO, the system administrator must switch one of the servers to be the primary server.
Note: This parameter setting applies whether the Watchdog sample is operating in normal mode or in failure mode.
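The Watchdog.DualSecAutoSwitch decision can be sketched in Python as a pure function over the reported states. A state counts as secondary here if its name starts with SECONDARY. The text above does not say how the sample chooses which secondary to promote, so this sketch picks the server whose name sorts first purely as a placeholder; dual_secondary_action and its return values are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def dual_secondary_action(
    states: Mapping[str, str], dual_sec_auto_switch: bool
) -> Tuple[str, Optional[str]]:
    """Decide what to do once both servers have reported their HSB state.

    Returns ("promote", server), ("notify_admin", None), or ("none", None).
    """
    both_secondary = len(states) == 2 and all(
        state.strip().upper().startswith("SECONDARY") for state in states.values()
    )
    if not both_secondary:
        return ("none", None)
    if dual_sec_auto_switch:
        # Placeholder choice: the real sample may select differently.
        return ("promote", sorted(states)[0])
    # DualSecAutoSwitch=NO: the administrator must choose the new primary.
    return ("notify_admin", None)
```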
When HSB operation is restored (servers are in PRIMARY ACTIVE and SECONDARY ACTIVE state), the Watchdog sample returns to normal mode.
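The exit condition from failure mode can be written as a short Python check. `next_mode` is a hypothetical helper; it normalizes case and spacing of the reported state names before comparing them with the PRIMARY ACTIVE / SECONDARY ACTIVE pair.

```python
from typing import Mapping


def next_mode(current_mode: str, states: Mapping[str, str]) -> str:
    """Return "normal" once the pair is back in PRIMARY ACTIVE / SECONDARY ACTIVE."""
    # Collapse repeated whitespace and ignore case in the reported states.
    normalized = sorted(" ".join(s.split()).upper() for s in states.values())
    if current_mode == "failure" and normalized == ["PRIMARY ACTIVE", "SECONDARY ACTIVE"]:
        return "normal"
    return current_mode
```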
See also: Co-locating a watchdog with an HSB server