Optimizing synchronization history data management

The first time that a replica refreshes from a publication in the master, the replica must download a copy of all the information in the publication; i.e. the replica must get a "full refresh". After that first download, each time the replica refreshes, the replica only needs to download the records that have changed since the previous download, i.e. an "incremental refresh".

When synchronizing changed data from the master to a replica, solidDB^® (in both the master and replica) must know the data written to these databases since the previous synchronization. If the data updates occurred, then solidDB^® must have a record of the previous version of the row before the update. These old versions of updated rows are recorded to a synchronization history table.

The use of incremental publications is highly recommended for optimum performance. For details on setting up incremental publications, read Creating incremental publications.

In a system where update operations are frequent (relative to how often the replicas request refreshes), the history tables can grow large. By default, solidDB^® creates a new row in the history table whenever a row of the main table is updated in the master database. This is not always necessary, however. You can reduce the amount of data stored in the history table by specifying which columns of a synchronized table need to cause a new entry to the history table during data updates. Only those columns in the publication that are used as search criteria (WHERE clause or join columns) need to be specified as history columns. (See Inside Information: SET HISTORY COLUMNS for technical details.) To specify these columns, use the following command:

If your publication definition contains all rows of a table, then specify the primary key column(s) as the HISTORY COLUMNS of that table.

Without this definition, all update operations in the master database cause a new entry to the history table when the corresponding synchronized table is updated. If you have rows that are frequently updated, setting history columns can significantly reduce overhead in terms of performance and disk space consumption in the master database.

Note In order for ALTER TABLE ... SET HISTORY COLUMNS to succeed, the statement ALTER TABLE ... SET SYNCHISTORY has to be executed first. Executing ALTER TABLE ... SET NOSYNCHISTORY removes also the effect of ALTER TABLE ... SET HISTORY COLUMNS.

Assume you have set the following table, which has already been set for synchronization history:

You can now use the following command to specify that the history entry occurs only if the accountid column value is changed by an update operation.

Now any changes to the balance column value do not cause a history table row update.

You might wonder why the history table only needs to track changes in certain columns. After all, each replica needs to be notified of ANY changes to the data, not just changes to certain columns. A slightly simplified example and explanation are below.

Suppose that your replica database contains information only for customers of the London branch office. Your replica must be notified of each of the following types of changes:

2 Addition of new customers to (or deletion of old customers from) the London office. For example, if you open up a Liverpool office and assign the Liverpool office some customers who were formerly assigned to the London office, then those customers should no longer be received by the London office when it requests refreshes of updated data. In fact, the master must explicitly notify the London replica that it should delete its copy of information about those customers.

In other words, there are changes to records within a set (the London set), and there are changes that cause records to move from one set to another set (for example, from the London set to the Liverpool set). Each of these 2 possible types of changes is tracked using a separate mechanism.

Changes within a set are tracked using the server's general versioning system. For example, if the London office last refreshed data at 12:01 AM December 3, 2001, then the London office will want updates for any records that are in London's set and that have changed since 12:01 AM December 3, 2001.

Changes that affect which set a record appears in (London, Liverpool, and so on) are tracked in the history table. Since the history table only needs to track addition of or removal of customers from a particular set, it only needs to track changes to the fields that determine which set the record is in. Those fields are the "search criteria" — i.e. the WHERE clause and the join fields that you specified when you defined the publication. Since the only way that a customer record can suddenly appear in, or disappear from, the London branch's publication is if there is a change to one of the columns used in the WHERE clause (or the join clause), the history table only needs to record changes to values in those columns.

An example is below:

CREATE TABLE invoice (
  customer_id VARCHAR NOT NULL,
  invoice_number VARCHAR NOT NULL,
  invoice_date DATE NOT NULL,
  invoice_total NUMERIC (12,2),
  PRIMARY KEY (customer_id, invoice_number));

CREATE PUBLICATION customers_by_salesman (salesman_id VARCHAR)
BEGIN
  RESULT SET FOR customer
  BEGIN
    SELECT * FROM customer WHERE salesman_id = :salesman_id ;
    RESULT SET FOR invoice
    BEGIN
      SELECT * FROM invoice WHERE customer_id = customer.id ;
    END
  END
END

To optimize this for update performance, the following ALTER TABLEs are needed:

CAUTION: If you execute the command ALTER TABLE tablename SET HISTORY COLUMNS, but you make a mistake and do not specify all the appropriate columns, then the refresh mechanism will not work properly, and records may not get stored in the proper replica(s). Each time you add a new publication, you must consider whether you need to execute the ...SET HISTORY COLUMNS command again to take into account additional columns used in the search criteria of the new publications.

The ALTER TABLE ... SET SYNC HISTORY command is never required. If the HISTORY COLUMN property is not set for a table, the synchronization will work properly but may not be optimized for performance.

After synchronization, a background process purges the obsolete data (i.e. data that is older than the most recent synchronization) from the history tables. You do not need to manually delete old history data to prevent it from accumulating.