solidDB Grid architecture

Note: Check the solidDB Release Notes for any limitations that are associated with using a grid in the current release.

To meet the requirements of scalability, full functionality, and high availability, several mechanisms work together in solidDB Grid:

▪ To achieve read scalability and fault tolerance, data that is written to a grid is replicated to other nodes.

▪ To scale up as data volumes grow, data is horizontally partitioned and balanced across the nodes, so that no single node has to hold all the data.

▪ Read and write latencies, as measured from the application, are similar to those of solidDB standalone databases. Database drivers route most queries and write operations directly to the node where the operation is to be executed, which avoids network hops between grid nodes.

▪ A grid supports continuous operation, allowing hardware and software upgrades of individual nodes to take place without disrupting the grid service.

▪ solidDB Grid fault tolerance is similar to solidDB HotStandby (HSB). A grid can continue operation in the case of node failures without any loss of committed data.

Each grid node is similar to a regular solidDB node: it has its own database files, transaction log files, SQL connectivity, and other mechanisms.

You set up a grid by installing and starting the database server on each node and then running the grid setup to connect the nodes; see Creating a grid.

You control a grid by using a set of extended SQL statements; see Working with grids.

To enable the scaling up of read operations and to provide fault tolerance in case of the loss of any single node, all write operations are replicated to multiple nodes.

The replication factor defines how many copies of a partition must be kept on nodes in the grid. The partition copies (replication units) form a replication group with one primary replication unit.
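To make the terminology concrete, here is a minimal Python sketch of a replication group. It assumes that the replication factor counts all copies of a partition, including the primary, and that each copy must reside on a different node. The `ReplicationGroup` class and its fields are hypothetical and do not correspond to any solidDB API.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationGroup:
    """Hypothetical model of one partition's copies (replication units)."""
    partition_id: int
    primary: str                                          # node with the primary replication unit
    secondaries: list[str] = field(default_factory=list)  # nodes with read-only copies

    def nodes(self) -> list[str]:
        # All replication units of the group, primary first.
        return [self.primary, *self.secondaries]

    def satisfies(self, replication_factor: int) -> bool:
        # Assumption: the factor counts every copy, and no node may hold two
        # copies of the same partition.
        nodes = self.nodes()
        return len(set(nodes)) == len(nodes) and len(nodes) >= replication_factor
```

For example, `ReplicationGroup(7, "node1", ["node2", "node3"]).satisfies(3)` returns `True`, while the same group does not satisfy a replication factor of 4.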

For write operations, the grid-aware database driver determines the location of the primary replication unit for the operation. The operation is executed in that replication unit. The grid then replicates the write operation to the appropriate read-only secondary replication units.

For read operations, the driver can route the queries to any of the replication units.
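The write and read paths described above can be sketched as follows. This is a hypothetical model, not the ODBC or JDBC driver implementation: `LOCATIONS` stands in for the location information that the driver maintains, the three-node layout is invented for illustration, and `replicate_write` models replication as a simple synchronous copy from the primary to the secondaries.

```python
import random
from typing import Optional

# Hypothetical driver-side location cache:
# replication group id -> (primary node, secondary nodes).
LOCATIONS: dict[int, tuple[str, list[str]]] = {
    0: ("node1", ["node2"]),
    1: ("node2", ["node3"]),
    2: ("node3", ["node1"]),
}


def route_write(group_id: int) -> str:
    """Writes are sent to the node that holds the primary replication unit."""
    primary, _ = LOCATIONS[group_id]
    return primary


def route_read(group_id: int, rng: Optional[random.Random] = None) -> str:
    """Reads may be served by any replication unit of the group."""
    primary, secondaries = LOCATIONS[group_id]
    chooser = rng if rng is not None else random
    return chooser.choice([primary, *secondaries])


def replicate_write(group_id: int, key: str, value: object,
                    stores: dict[str, dict[str, object]]) -> None:
    """Simplified model: apply the write at the primary, then copy it to the secondaries."""
    primary, secondaries = LOCATIONS[group_id]
    stores[primary][key] = value
    for node in secondaries:
        stores[node][key] = value
```

Spreading reads over all copies is what lets read capacity grow with the number of copies, whereas every write must first reach the single primary of its group.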

In grid partitioning, all rows that are written to a specific partition of a partitioned table belong to the same replication group. The group is determined by the partitioning key. Location information for the primary and secondary replication units of each replication group is maintained in each node and database driver (ODBC and JDBC).
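Because every node and driver must agree on which replication group a row belongs to, the mapping from partitioning key to group has to be deterministic. The sketch below assumes hash partitioning over a fixed, hypothetical number of groups (`NUM_GROUPS`); the source does not state which partitioning function solidDB uses.

```python
import zlib

NUM_GROUPS = 8  # hypothetical number of replication groups for one table


def replication_group_for(partitioning_key: str, num_groups: int = NUM_GROUPS) -> int:
    """Map a partitioning key to a replication group id (assumed hash partitioning)."""
    # crc32 gives the same value in every process, so all nodes and drivers
    # compute the same group for the same key.
    return zlib.crc32(partitioning_key.encode("utf-8")) % num_groups
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, so two clients using `hash()` could disagree on the group for the same key.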

Like the solidDB HotStandby feature, a solidDB Grid solution can tolerate node failures by keeping redundant copies of all data on other nodes. If a node fails, the primary replication units of all replication groups that resided on the failed node are reallocated to surviving nodes that hold the related data in secondary replication units.
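Failover can be sketched as follows: every replication group whose primary was on the failed node promotes one of its surviving secondaries. The dictionary layout, the function name `fail_over`, and the rule of promoting the first listed secondary are assumptions for illustration; in solidDB Grid, the choice of where to locate primary replication units is made by the Grid Availability Manager.

```python
def fail_over(groups: dict[int, dict], failed_node: str) -> None:
    """Update placements in place after `failed_node` is lost.

    Hypothetical layout: group id -> {"primary": node, "secondaries": [nodes]}.
    """
    for group in groups.values():
        # The failed node no longer hosts any copy.
        group["secondaries"] = [n for n in group["secondaries"] if n != failed_node]
        if group["primary"] == failed_node:
            if not group["secondaries"]:
                raise RuntimeError("no surviving copy of the replication group")
            # Assumed policy: promote the first surviving secondary.
            group["primary"] = group["secondaries"].pop(0)
```

After the call, each affected group again has a primary and can accept writes, but it may hold fewer copies than the replication factor requires.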

After these secondary replication units have been converted to primary replication units, the grid can again process write operations for these replication groups. However, because a node has failed, the number of secondary replication units might no longer satisfy the replication factor. Therefore, the grid starts an operation to create the required secondary replication units on other grid nodes. This operation runs as a background task and does not impact the service of database operations.
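Restoring the replication factor can be sketched as choosing nodes for the missing secondary copies of a group. The function name, the `load` map, and the least-loaded placement rule are hypothetical; the source states only that the grid creates the required secondaries on other nodes as a background task.

```python
def plan_new_secondaries(primary: str, secondaries: list[str],
                         live_nodes: list[str], replication_factor: int,
                         load: dict[str, int]) -> list[str]:
    """Pick nodes to host new secondary replication units for one group."""
    # Copies that exist now: the primary plus the surviving secondaries.
    missing = replication_factor - (1 + len(secondaries))
    if missing <= 0:
        return []
    # A node may hold at most one copy of the group.
    used = {primary, *secondaries}
    # Hypothetical policy: prefer the least-loaded nodes, ties broken by name.
    candidates = sorted((n for n in live_nodes if n not in used),
                        key=lambda n: (load[n], n))
    # Returns fewer nodes than `missing` if the grid has too few eligible nodes.
    return candidates[:missing]
```

In this sketch, a group that cannot find enough eligible nodes simply receives fewer new copies and remains under-replicated.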

For automated background decisions about where to locate primary replication units, and for other operational and data-related conflict situations, the grid uses an internal high availability monitor; see Grid Availability Manager. Consensus algorithm transactions are used to distribute these decisions consistently to all nodes in the grid.
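Consensus algorithms such as Raft typically treat a decision as committed once a strict majority of the participating nodes has acknowledged it; because any two majorities share at least one node, two conflicting decisions cannot both be committed. The sketch below shows only this generic quorum rule. The source does not name the consensus algorithm that solidDB Grid uses, so the functions `quorum_size` and `is_committed` are hypothetical illustrations rather than solidDB's protocol.

```python
def quorum_size(member_count: int) -> int:
    """Smallest strict majority of `member_count` nodes."""
    return member_count // 2 + 1


def is_committed(acks: set[str], members: set[str]) -> bool:
    """True once a strict majority of the members has acknowledged a decision."""
    # Acknowledgements from nodes outside the membership are ignored.
    return len(acks & members) >= quorum_size(len(members))
```

In this model, a three-node membership can keep committing decisions with one node down, because two acknowledgements still form a majority.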

When a grid needs to be scaled up or down, you add or remove nodes; see Processes involved in adding and removing grid nodes. After nodes are added, the grid reallocates the data across the nodes asynchronously, without blocking SQL traffic. When reallocation is complete, the data is evenly balanced across all nodes, including the newly added ones. The process is reversed when nodes are removed.
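The balancing goal can be sketched as a greedy placement that keeps every node within one replication group of every other node. The sketch is hypothetical: it models only one copy per group, uses an invented `rebalance` function, and computes the target placement in a single call, whereas the grid moves data asynchronously in the background.

```python
def rebalance(assignment: dict[int, str], nodes: list[str]) -> dict[int, str]:
    """Return a new group -> node placement balanced across `nodes`.

    Hypothetical model: one copy per replication group; `nodes` lists the
    nodes that are members of the grid after the change.
    """
    result = dict(assignment)
    load = {node: 0 for node in nodes}
    for node in result.values():
        if node in load:
            load[node] += 1

    def least_loaded() -> str:
        return min(nodes, key=lambda n: (load[n], n))

    # Groups placed on nodes that left the grid must move first.
    for group, node in sorted(result.items()):
        if node not in load:
            target = least_loaded()
            result[group] = target
            load[target] += 1

    # Then move one group at a time from the fullest to the emptiest node
    # until no two nodes differ by more than one group.
    while True:
        src = max(nodes, key=lambda n: (load[n], n))
        dst = least_loaded()
        if load[src] - load[dst] <= 1:
            return result
        group = next(g for g, n in sorted(result.items()) if n == src)
        result[group] = dst
        load[src] -= 1
        load[dst] += 1
```

For example, eight groups split between `n1` and `n2` end up two per node after `n3` and `n4` join, and calling the function again with only `n1` and `n2` returns them to four per node. Moving one group at a time keeps the number of moved groups small compared with recomputing the whole placement from scratch.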

You can also use this method to upgrade hardware or software on an individual node: disconnect the node from the grid, upgrade it, and then add it back to the grid.