Specifying recovery from fatal errors

The most important rule of error handling is that all transactions must commit in the master database. Any DBMS error is a fatal error and by default, causes the execution of the synchronization message to halt. These errors are system-level errors. For instance, if a write operation fails in the master database during synchronization because of a unique constraint violation, then the execution of the message is halted and an error code is returned to the replica database.

If a fatal error is detected by the business logic of a transaction, the transaction can be aborted and the further execution of the synchronization message halted by putting a rollback request to the bulletin board of the transaction. Remember that the COMMIT WORK and ROLLBACK WORK statements are not allowed in propagated transactions. However, the rollback request can be issued with the following system bulletin board parameters:

The SYS_ROLLBACK parameter is a system-recognized parameter. If the transaction sets the value to 'YES', then the server will automatically roll back the transaction. The previously committed transactions of the same synch message will remain committed. What happens to the rest of the synchronization message depends on the mode of the PROPAGATE TRANSACTIONS operation. (See “IGNORE_ERRORS, FAIL_ERRORS, and LOG_ERRORS flags for propagate transactions command” for details about the error-handling mode of the PROPAGATE TRANSACTIONS operation.)

Here is a sample scenario:

Let us assume that a transaction detects a violation of referential integrity of the database; for example, the customer of an order does not exist in the master database. The transaction can put the following parameters on the bulletin board in order to request rollback and return application-specific error codes:

Put_param(’SYS_ROLLBACK’, ’YES’);
Put_param(’SYS_ERROR_CODE’, ’90001’);
Put_param(’SYS_ERROR_TEXT’, ’Referential integrity violation detected’);

Note that because the transaction management of transactions is done outside the procedure, directly issuing the ROLLBACK WORK or COMMIT WORK command inside the procedure is never allowed.

If an error occurs during transaction propagation, the default behavior is that the server stops processing the message and aborts the current transaction. Any previous transactions in the message remain in effect. This means that you may wind up with just part of a message executed.

solidDB^® supports three error-handling modes for propagated messages.

▪IGNORE_ERRORS - This option means that if an error occurs, the transaction is aborted. Execution continues with the next transaction. In other words, an error does not abort the entire message.

▪LOG_ERRORS - Like IGNORE_ERRORS, this option means that if an error occurs, the transaction is aborted, and execution continues with the next transaction. In addition, failed transaction statements are saved in SYS_SYNC_RECEIVED_STMTS system table for later execution or investigation.

▪FAIL_ERRORS - This option means that if a statement fails, the current transaction is rolled back, and the server does not continue on to process subsequent transactions in the same message. (Any transactions that have already been committed are not undone.) This is the default error-handling mode for propagation.

There are three ways to specify which of these error-handling modes you want to apply to a particular message or transaction.

▪Use an appropriate keyword in the SAVE command. If you specify the error-handling mode in the SAVE command, then the specified mode applies only to that saved statement (not the entire transaction or the entire message).

▪Use a transaction bulletin board parameter to specify the error-handling mode. In this case, the specified mode applies to the current transaction.

▪Use a MESSAGE APPEND PROPAGATE TRANSACTIONS command to specify the error behavior. In this case, the specified mode applies to the entire message.

If error-handling options are specified in both the SAVE and the PROPAGATE TRANSACTIONS statement, then the error-handling options specified in the PROPAGATE TRANSACTIONS statement take precedence.

The failed messages can be examined using SYNC_FAILED_MESSAGES system view and they can be re-executed from there using statement MESSAGE <msg_id> FROM REPLICA <replica_name> RESTART <error_options>.

The syntax for setting values in the parameter bulletin board is shown below.

The parameter name for error handling is SYNC_DEFAULT_PROPAGATE_ERRORMODE

The parameter name used for autosave in the master is:

Autosave is used in hierarchies that have more than two levels. If a replica needs to propagate a transaction not to its direct master, but to a master above that, then the replica may use AUTOSAVE. AUTOSAVE is discussed in more detail later in this chapter.

PUT_PARAM(’SYNC_DEFAULT_PROPAGATE_ERRORMODE’, ’LOG_ERRORS’); PUT_PARAM(’SYNC_DEFAULT_PROPAGATE_SAVEMODE’, ’AUTOSAVE’);

The syntax for setting values in messages is shown below.

MESSAGE <message_name> APPEND PROPAGATE TRANSACTIONS
[ { IGNORE_ERRORS | LOG_ERRORS | FAIL_ERRORS } ] [WHERE ...]

Note that the autosave option is not possible in this statement.

▪NO CHECK: This option means that the statement is not prepared in the replica. This option is useful if the command would not make sense on the replica. For example, if the SQL command calls a stored procedure that exists on the master but not on the replica, then you do not want the replica to try to prepare the statement. If you use this option, then the statement can not have parameter markers.

▪IGNORE_ERRORS: This option means that if a statement fails while executing on the master, then the failed statement is ignored and the transaction is aborted. However, only the transaction, not the entire message, is aborted. The master continues executing the message, resuming with the first transaction after the failed one.

▪LOG_ERRORS: This means that if a statement failed while executing on the master, then the failed statement is ignored and the current transaction is aborted. The failed transaction's statements are saved in

SYS_SYNC_RECEIVED_STMTS system table for later execution or investigation. The failed transactions can be examined using SYNC_FAILED_MESSAGES system view and they can be re-executed from there using MESSAGE <msg_id> FROM REPLICA <replica_name> RESTART statement. Note that, as with the IGNORE_ERRORS option, aborting the transaction does not abort the entire message. The master continues executing the message, resuming with the first transaction after the failed one.

▪FAIL_ERRORS: This option means that if a statement fails, the master stops executing the message. This is the default behavior.

▪AUTOSAVE: This option means that the statement is executed in the master and automatically saved for further propagation if the master is also a replica to some other master (i.e. a middle-tier node).

▪AUTOSAVEONLY: This option means that the statement is NOT executed in the master but instead is automatically saved for further propagation if the master is also a replica to some other master (i.e. is a middle-tier node).

When a master database is propagating a message, the autosave setting is ignored if the node is not also a replica. In other words, the autosave setting is ignored for the topmost master in the hierarchy. The setting affects one node only, for example, all nodes must set it separately if needed.

The table sys_sync_master_msginfo has a new column FAILED_MSG_ID which is part of the primary key. The value is zero for normal messages. The value is msg_id if LOG_ERRORS option is ON and any errors exists. Also the SYS_SYNC_RECEIVED_STMTS table has errcode and err_str columns where actual errors are logged.

The autosave option takes effect if defined in SAVE or if defined in the master bulletin board.

Note If a replica registration or a publication registration message fails, the hung messages are automatically deleted from both the master and the replica, and the status is reset back to what it was before the message execution. The IGNORE_ERRORS, FAIL_ERRORS, and SAVE_ERRORS flags do not apply to these two types of messages.

If transactions were propagated with the LOG_ERRORS option, the server saves the statements that were in the transaction that failed. These statements can be examined the cause of the problem fixed. The failed statements can then be re-executed with the following statement:

If the problem cannot be corrected, or you do not re-execute the transactions that failed, you can delete the entire message with the following command:

Alternatively, you can delete one or more transactions within the message with the following command:

Important: Do not use the following statement to delete the messages:

The message_name is not valid after the message has completed its replica-perceived lifecycle. message_name is removed from the system, for example, after the statement MESSAGE ... GET REPLY. However, msgid applies for as long as the message is logged at the master.

You can create and send a synchronization message from a propagated transaction. Although in most situations explicit commits are not allowed in a propagated transaction, they are allowed when creating and sending a synchronization message from inside a propagated transaction. You may issue an explicit COMMIT in the middle of a propagated transaction if the transaction has not executed any DML statements and if the previous statement was one of the following: