Introduction to Sync Pull Notify

The "pull" approach is that the replica requests the data. This corresponds to the "refresh" operation in solidDB^® advanced replication. When the replica refreshes, it requests that the master server send all of the data in a particular publication (or the data that has changed since the last refresh operation).

The "push" approach is that the master sends the data to the replica at a time chosen by the master. (This is usually, but not necessarily, immediately after the data was updated on the master). When the master pushes the data, the replica must accept the data. solidDB^® servers do not use a true "push" method.

solidDB^® uses a hybrid approach called "sync pull notify". In this approach, the replica is notified that new data is available. The replica may then "pull" the new data by executing a REFRESH command. The REFRESH is optional; after being notified that there is new data, the replica can choose to refresh immediately, or to refresh after delaying, or simply to ignore the notification and not refresh at all. See the next section for details.

Generally, the master server notifies the replica by calling a remote stored procedure. The replica must already have created this stored procedure. Normally, this procedure will itself contain the appropriate REFRESH command(s). (It is possible to use more indirect methods, but this approach is the shortest and simplest.)

What induces the master to call the remote stored procedure? One way to do this is to create a trigger. For example, suppose that you know that replica_node_01 would like to refresh data in the table named "employees" every time the table is changed on the master. You could create INSERT, UPDATE, and DELETE triggers on the master's copy of the employees table. Each trigger would call the remote stored procedure on replica_node_01 that requests a refresh of the data.

Although the process of notifying a replica is straightforward, there is no automated way to decide which replica(s) should be notified. The person who creates or updates the triggers on the master must know the name of the stored procedure to call on each replica that wants to be notified.

Similarly, the replica must have REFRESH statements that specify the appropriate publications. There is no automated way of having the master server figure out which replicas subscribe to which publications and then automatically creating triggers to call procedures on those replicas. (This wouldn't be sufficient anyway because the master server would also have to compose the appropriate stored procedure and force the replica to create that stored procedure.)

Note If you decide to use a trigger to initiate Sync Pull Notify or use some other mechanism to run synchronization very frequently, always make sure that you are aware of the possible performance consequences. Because synchronization messaging is based on a store & forward architecture that writes data to disk before sending it over the network, there is always some overhead related to synchronization messaging. The relative overhead of the messaging is the bigger, the smaller amount of data is to be synchronized. For example, a solidDB^® server is capable of performing hundreds or thousands of update operations per second but it may be able to handle only some dozens of synchronization messages per second. If you write an update trigger into a frequently updated table to initiate synchronization for each occurred update, the update performance is limited by the synchronization messaging performance. Instead of achieving performance of 1000 updates per second, you may be able to run only 10 updates per second because every update causes a synchronization message that contains only one row of changed data. The remedy to this issue is to make the synchronization occur only, for example, once every few seconds. This way, each synchronization message contains more than one row in it and the overhead of the synchronization message is only a fraction of the worst case scenario.

The server that is executing this statement effectively creates a list of all the replicas that match the WHERE clause and then calls the specified procedure (my_proc) for each replica on the list. While that stored procedure is running, the DEFAULT clause identifies the replica that is currently being processed. For example, suppose that we have three replicas that match the WHERE clause, i.e. three replicas for which region = 'west':

The my_proc procedure is then called three times. During the first call DEFAULT is equal to "California"; during the second call DEFAULT is equal to "Oregon"; and during the third call the DEFAULT is equal to "Washington". If the stored procedure named my_proc wants to call a remote stored procedure on each of the three replicas, it can do so by using the syntax:

If the "remote_proc_name" contains a command to refresh from the master, then the effect will be that all three replicas are notified that there is updated data and they should refresh. Thus by using a combination of START AFTER COMMIT and remote procedure calls we have been able to notify every replica that needs to refresh the data.

One way to make such maintenance easier is to use the "properties" feature described earlier. For example, suppose that you have three replica servers named California, Oregon, and Washington, all of which refresh from a particular publication. Now suppose that you want to add a new replica named Arizona and it should also refresh from the same publication and thus should be notified under the same circumstances as the servers California, Oregon, and Washington. You can simply set the property for that new server to match the property that the other three servers have (and which is used in the WHERE clause of the START AFTER COMMIT). For example,

Sync Pull Notify may increase the load on your network. If your current synchronization approach is to synchronize each replica once per hour, and if you typically have multiple updates per hour, then you only use 1 set of network messages per replica per hour to synchronize. If you switch to using the Sync Pull Notify feature, however, then you will have as many sets of network messages per replica per hour as you have updates.

The converse is also true. If you currently refresh frequently but only have updates infrequently, then Sync Pull Notify will actually reduce your network traffic because you will only refresh when you actually need data; you will not need to frequently "poll" to see if any data has changed. If you work in a situation where updates are infrequent, but it's necessary for you to know about those updates immediately when they occur, then Sync Pull Notify is a very good solution.

Note that "pull" and "push" (Sync Pull Notify) approaches are not mutually exclusive. You may use a combination of these. As an example, you might choose to design your system so that every day at a specific time the master will notify the replicas that it is time to refresh. However, a repair person going out into the field and taking her replica database with her could also issue a "REFRESH" command just before leaving the office, thus making sure that she has the most up-to-date data.