Actions of clients and nodes as a result of failures
Note: Check the solidDB Release Notes for any limitations that are associated with using a grid in the current release.
In addition to the node controller and the Grid Availability Monitor (GAM), grid failures can be detected by the client and by other nodes. If the node controller and the GAM can recover the grid, clients and nodes might be able to automatically recover (or re-establish) processes that were affected by the failure.
Client perception of failures
A client that is connected to a failed node (or that cannot connect to a node because of a network failure) receives an error either immediately or after a timeout period. In both cases, if the client had an open transaction, the transaction is rolled back.
If the client immediately tries to reconnect to the node, the connection attempt is blocked until one of the following situations occurs:
The login timeout expires and the connect command returns with an error.
The grid driver successfully connects to a node. The node might be the same node (if the node or network has already recovered from the failure). However, if the failed node remains unresponsive and the grid is still active, the GAM sets the status of the node to MEMBER_FAILED, and reconfigures the grid so that another node can take the place of the failed node.
The duration of the failover depends on the configured threshold values and other simultaneous load, but is typically only a few seconds.
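The client-side behavior described above (an error, a rolled-back transaction, then a blocking reconnect) can be handled with a retry loop in the application. The following Python sketch is illustrative only and does not use a solidDB API: `ConnectionLost`, `FakeConnection`, and `run_with_retry` are hypothetical names, and `connect` stands for whatever call the application uses to open a connection through the grid driver. Because the open transaction is rolled back on failure, the sketch re-executes the whole transaction instead of resuming it.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error: the node or the network failed (not a solidDB class)."""


class FakeConnection:
    """Hypothetical stand-in for a driver connection, used only for illustration."""

    def __init__(self):
        self.committed = False

    def commit(self):
        self.committed = True


def run_with_retry(connect, work, max_attempts=3, backoff_s=0.0):
    """Run work(conn) as one transaction and retry it after a connection failure.

    connect: callable that returns a connection or raises ConnectionLost
             (for example, when the login timeout expires).
    work:    callable that performs the complete transaction on the connection.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            conn = connect()      # may block until a node is reachable
            result = work(conn)   # the rolled-back transaction is redone in full
            conn.commit()
            return result
        except ConnectionLost as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff
    raise last_error
```

Because failover typically takes only a few seconds, a small number of attempts with a short backoff is usually a reasonable starting point. The right values depend on the configured login timeout and failure thresholds.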
Node perception of failures
When a node fails (or is inaccessible due to network failure), the effect on another node in the grid depends on whether the node shared a replication group with the failed node.
Nodes that share a replication group
Nodes that have a replication group in which a replication unit is on the failed node can detect the failure as soon as the replication timeout expires.
If the failed node had the primary replication unit for a replication group, nodes with secondary replication units notice the replication subscription change from ACTIVE to ERROR state. Nodes react automatically to replication errors by periodically attempting to create new replication connections and by restarting failed subscriptions. If the failure is resolved quickly, these automated actions recover the functionality.
If the failed node had a secondary replication unit, other nodes that share the same replication group do not receive any immediate notification of the failure.
If the failure is not resolved, the GAM takes over; see Node failures.
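The automatic reaction of a node with a secondary replication unit (the subscription moves from ACTIVE to ERROR, and the node periodically tries to restore it) can be pictured as a small state machine. The following Python sketch is a conceptual model only, not solidDB internals: the `Subscription` class, its methods, and the `try_connect` callback are hypothetical, and only the ACTIVE and ERROR state names come from the text above.

```python
from enum import Enum


class SubState(Enum):
    ACTIVE = "ACTIVE"
    ERROR = "ERROR"


class Subscription:
    """Hypothetical model of one replication subscription on a secondary node."""

    def __init__(self, try_connect):
        # try_connect: callable that returns True when a new replication
        # connection to the primary could be created (assumption).
        self.state = SubState.ACTIVE
        self.restart_attempts = 0
        self._try_connect = try_connect

    def on_replication_timeout(self):
        """The replication timeout expired: the failure is now visible."""
        self.state = SubState.ERROR

    def tick(self):
        """One periodic recovery attempt; does nothing while ACTIVE."""
        if self.state is SubState.ERROR:
            self.restart_attempts += 1
            if self._try_connect():
                self.state = SubState.ACTIVE  # subscription restarted
        return self.state
```

In this model, a failure that is resolved quickly is absorbed by a few calls to `tick()`. If the state stays at ERROR, recovery is left to the GAM, as described above.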
Nodes that do not share a replication group
Replication connections and subscriptions exist between all nodes in a grid, but if two nodes do not share a replication group, replication between them is in standby mode and the nodes do not actively monitor it for errors. However, the nodes can report the states of all replication subscriptions when the GAM collects health check information from the grid.