21.7 NDB Cluster Replication

NDB Cluster supports asynchronous replication, more usually referred to simply as 'replication'. This section explains how to set up and manage a configuration in which one group of computers operating as an NDB Cluster replicates to a second computer or group of computers. We assume some familiarity on the part of the reader with standard MySQL replication as discussed elsewhere in this Manual. (See *note replication::).

Note:

NDB Cluster does not support replication using GTIDs; semisynchronous replication and group replication are also not supported by the 'NDB' storage engine.

Normal (non-clustered) replication involves a source server (formerly called a 'master') and a replica server (formerly referred to as a 'slave'), the source being so named because operations and data to be replicated originate with it, and the replica being the recipient of these. In NDB Cluster, replication is conceptually very similar but can be more complex in practice, as it may be extended to cover a number of different configurations including replicating between two complete clusters. Although an NDB Cluster itself depends on the note 'NDB': mysql-cluster. storage engine for clustering functionality, it is not necessary to use note 'NDB': mysql-cluster. as the storage engine for the replica's copies of the replicated tables (see *note mysql-cluster-replication-ndb-to-non-ndb::). However, for maximum availability, it is possible (and preferable) to replicate from one NDB Cluster to another, and it is this scenario that we discuss, as shown in the following figure:

FIGURE GOES HERE: NDB Cluster-to-Cluster Replication Layout

In this scenario, the replication process is one in which successive states of a source cluster are logged and saved to a replica cluster. This process is accomplished by a special thread known as the NDB binary log injector thread, which runs on each MySQL server and produces a binary log ('binlog'). This thread ensures that all changes in the cluster producing the binary log--and not just those changes that are effected through the MySQL Server--are inserted into the binary log with the correct serialization order. We refer to the MySQL source and replica servers as replication servers or replication nodes, and the data flow or line of communication between them as a replication channel.

For information about performing point-in-time recovery with NDB Cluster and NDB Cluster Replication, see *note mysql-cluster-replication-pitr::.

NDB API replica status variables

NDB API counters can provide enhanced monitoring capabilities on replica clusters. These counters are implemented as NDB statistics '_slave' status variables, as seen in the output of note 'SHOW STATUS': show-status, or in the results of queries against the note 'SESSION_STATUS': information-schema-status-table. or note 'GLOBAL_STATUS': information-schema-status-table. table in a note 'mysql': mysql. client session connected to a MySQL Server that is acting as a replica in NDB Cluster Replication. By comparing the values of these status variables before and after the execution of statements affecting replicated note 'NDB': mysql-cluster. tables, you can observe the corresponding actions taken on the NDB API level by the replica, which can be useful when monitoring or troubleshooting NDB Cluster Replication. note mysql-cluster-ndb-api-statistics::, provides additional information.
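
For example, you can sample these counters on the replica before and after executing a statement of interest; the exact set of '_slave' variables available depends on the NDB release in use, but they can be listed with a pattern such as this one:

 mysqlR> SHOW GLOBAL STATUS LIKE 'ndb_api%slave';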

Replication from NDB to non-NDB tables

It is possible to replicate note 'NDB': mysql-cluster. tables from an NDB Cluster acting as the replication source to tables using other MySQL storage engines such as note 'InnoDB': innodb-storage-engine. or note 'MyISAM': myisam-storage-engine. on a replica note 'mysqld': mysqld. This is subject to a number of conditions; see note mysql-cluster-replication-ndb-to-non-ndb::, and note mysql-cluster-replication-ndb-to-nontransactional::, for more information.

21.7.1 NDB Cluster Replication: Abbreviations and Symbols

Throughout this section, we use the following abbreviations or symbols for referring to the source and replica clusters, and to processes and commands run on the clusters or cluster nodes:

Abbreviations used throughout this section referring to source and replica clusters, and to processes and commands run on cluster nodes

Symbol or Abbreviation   Description (Refers to...)

S The cluster serving as the (primary) replication source

R The cluster acting as the (primary) replica

'shellS>' Shell command to be issued on the source cluster

'mysqlS>' MySQL client command issued on a single MySQL server running as an SQL node on the source cluster

'mysqlS*>' MySQL client command to be issued on all SQL nodes participating in the replication source cluster

'shellR>' Shell command to be issued on the replica cluster

'mysqlR>' MySQL client command issued on a single MySQL server running as an SQL node on the replica cluster

'mysqlR*>' MySQL client command to be issued on all SQL nodes participating in the replica cluster

C Primary replication channel

C' Secondary replication channel

S' Secondary replication source

R' Secondary replica

21.7.2 General Requirements for NDB Cluster Replication

A replication channel requires two MySQL servers acting as replication servers (one each for the source and replica). For example, this means that in the case of a replication setup with two replication channels (to provide an extra channel for redundancy), there should be a total of four replication nodes, two per cluster.

Replication of an NDB Cluster as described in this section and those following is dependent on row-based replication. This means that the replication source MySQL server must be running with '--binlog-format=ROW' or '--binlog-format=MIXED', as described in note mysql-cluster-replication-starting::. For general information about row-based replication, see note replication-formats::.

Important:

If you attempt to use NDB Cluster Replication with '--binlog-format=STATEMENT', replication fails to work properly because the 'ndb_binlog_index' table on the source cluster and the 'epoch' column of the 'ndb_apply_status' table on the replica cluster are not updated (see *note mysql-cluster-replication-schema::). Instead, only updates on the MySQL server acting as the replication source propagate to the replica, and no updates from any other SQL nodes in the source cluster are replicated.

The default value for the '--binlog-format' option is 'MIXED'.

Each MySQL server used for replication in either cluster must be uniquely identified among all the MySQL replication servers participating in either cluster (you cannot have replication servers on both the source and replica clusters sharing the same ID). This can be done by starting each SQL node using the '--server-id=ID' option, where ID is a unique integer. Although it is not strictly necessary, we assume for purposes of this discussion that all NDB Cluster binaries are of the same release version.
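
A quick way to verify this is to check the value reported by each SQL node; every server participating in replication should return a different value:

 # Each SQL node in either cluster should report a distinct value
 mysqlS*> SELECT @@server_id;
 mysqlR*> SELECT @@server_id;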

It is generally true in MySQL Replication that both MySQL servers (note 'mysqld': mysqld. processes) involved must be compatible with one another with respect to both the version of the replication protocol used and the SQL feature sets which they support (see note replication-compatibility::). It is due to such differences between the binaries in the NDB Cluster and MySQL Server 5.7 distributions that NDB Cluster Replication has the additional requirement that both note 'mysqld': mysqld. binaries come from an NDB Cluster distribution. The simplest and easiest way to assure that the note 'mysqld': mysqld. servers are compatible is to use the same NDB Cluster distribution for all source and replica *note 'mysqld': mysqld. binaries.

We assume that the replica server or cluster is dedicated to replication of the source cluster, and that no other data is being stored on it.

All 'NDB' tables being replicated must be created using a MySQL server and client. Tables and other database objects created using the NDB API (with, for example, 'Dictionary::createTable()' (https://dev.mysql.com/doc/ndbapi/en/ndb-dictionary.html#ndb-dictionary-createtable)) are not visible to a MySQL server and so are not replicated. Updates by NDB API applications to existing tables that were created using a MySQL server can be replicated.

Note:

It is possible to replicate an NDB Cluster using statement-based replication. However, in this case, the following restrictions apply:

These are in addition to the other limitations of statement-based replication as opposed to row-based replication; see *note replication-sbr-rbr::, for more specific information concerning the differences between the two replication formats.

21.7.3 Known Issues in NDB Cluster Replication

This section discusses known problems or issues when using replication with NDB Cluster.

Loss of connection between source and replica

A loss of connection can occur either between the source cluster SQL node and the replica cluster SQL node, or between the source SQL node and the data nodes of the source cluster. In the latter case, this can occur not only as a result of loss of physical connection (for example, a broken network cable), but due to the overflow of data node event buffers; if the SQL node is too slow to respond, it may be dropped by the cluster (this is controllable to some degree by adjusting the 'MaxBufferedEpochs' and 'TimeBetweenEpochs' configuration parameters). If this occurs, it is entirely possible for new data to be inserted into the source cluster without being recorded in the source SQL node's binary log. For this reason, to guarantee high availability, it is extremely important to maintain a backup replication channel, to monitor the primary channel, and to fail over to the secondary replication channel when necessary to keep the replica cluster synchronized with the source. NDB Cluster is not designed to perform such monitoring on its own; for this, an external application is required.
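
As an illustration, the 'MaxBufferedEpochs' and 'TimeBetweenEpochs' parameters just mentioned are data node parameters, normally set in the '[ndbd default]' section of the cluster's 'config.ini' file; the values in this sketch are examples only, not tuning recommendations:

 [ndbd default]
 # Number of unconsumed epochs allowed before a lagging subscriber is disconnected
 MaxBufferedEpochs=150
 # Time (in milliseconds) between epochs
 TimeBetweenEpochs=100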

The source SQL node issues a 'gap' event when connecting or reconnecting to the source cluster. (A gap event is a type of 'incident event,' which indicates an incident that occurs that affects the contents of the database but that cannot easily be represented as a set of changes. Examples of incidents are server failures, database resynchronization, some software updates, and some hardware changes.) When the replica encounters a gap in the replication log, it stops with an error message. This message is available in the output of note 'SHOW SLAVE STATUS': show-slave-status, and indicates that the SQL thread has stopped due to an incident registered in the replication stream, and that manual intervention is required. See note mysql-cluster-replication-failover::, for more information about what to do in such circumstances.

Important:

Because NDB Cluster is not designed on its own to monitor replication status or provide failover, if high availability is a requirement for the replica server or cluster, then you must set up multiple replication lines, monitor the source note 'mysqld': mysqld. on the primary replication line, and be prepared to fail over to a secondary line if and as necessary. This must be done manually, or possibly by means of a third-party application. For information about implementing this type of setup, see note mysql-cluster-replication-two-channels::, and *note mysql-cluster-replication-failover::.

If you are replicating from a standalone MySQL server to an NDB Cluster, one channel is usually sufficient.

Circular replication

NDB Cluster Replication supports circular replication, as shown in the next example. The replication setup involves three NDB Clusters numbered 1, 2, and 3, in which Cluster 1 acts as the replication source for Cluster 2, Cluster 2 acts as the source for Cluster 3, and Cluster 3 acts as the source for Cluster 1, thus completing the circle. Each NDB Cluster has two SQL nodes, with SQL nodes A and B belonging to Cluster 1, SQL nodes C and D belonging to Cluster 2, and SQL nodes E and F belonging to Cluster 3.

Circular replication using these clusters is supported as long as the following conditions are met:

This type of circular replication setup is shown in the following diagram:

FIGURE GOES HERE: NDB Cluster Circular Replication With All Sources As Replicas

In this scenario, SQL node A in Cluster 1 replicates to SQL node C in Cluster 2; SQL node C replicates to SQL node E in Cluster 3; SQL node E replicates to SQL node A. In other words, the replication line (indicated by the curved arrows in the diagram) directly connects all SQL nodes used as sources and replicas.

It should also be possible to set up circular replication in which not all source SQL nodes are also replicas, as shown here:

FIGURE GOES HERE: NDB Cluster Circular Replication Where Not All Sources Are Replicas

In this case, different SQL nodes in each cluster are used as sources and replicas. However, you must not start any of the SQL nodes with the 'log_slave_updates' system variable enabled. This type of circular replication scheme for NDB Cluster, in which the line of replication (again indicated by the curved arrows in the diagram) is discontinuous, should be possible, but it should be noted that it has not yet been thoroughly tested and must therefore still be considered experimental.

Note:

The *note 'NDB': mysql-cluster. storage engine uses idempotent execution mode, which suppresses duplicate-key and other errors that otherwise break circular replication of NDB Cluster. This is equivalent to setting the global 'slave_exec_mode' system variable to 'IDEMPOTENT', although this is not necessary in NDB Cluster replication, since NDB Cluster sets this variable automatically and ignores any attempts to set it explicitly.

NDB Cluster replication and primary keys

In the event of a node failure, errors in replication of note 'NDB': mysql-cluster. tables without primary keys can still occur, due to the possibility of duplicate rows being inserted in such cases. For this reason, it is highly recommended that all note 'NDB': mysql-cluster. tables being replicated have explicit primary keys.

NDB Cluster Replication and Unique Keys

In older versions of NDB Cluster, operations that updated values of unique key columns of note 'NDB': mysql-cluster. tables could result in duplicate-key errors when replicated. This issue is solved for replication between note 'NDB': mysql-cluster. tables by deferring unique key checks until after all table row updates have been performed.

Deferring constraints in this way is currently supported only by note 'NDB': mysql-cluster. Thus, updates of unique keys when replicating from note 'NDB': mysql-cluster. to a different storage engine such as note 'InnoDB': innodb-storage-engine. or note 'MyISAM': myisam-storage-engine. are still not supported.

The problem encountered when replicating without deferred checking of unique key updates can be illustrated using an *note 'NDB': mysql-cluster. table such as 't', which is created and populated on the source (and transmitted to a replica that does not support deferred unique key updates) as shown here:

 CREATE TABLE t (
     p INT PRIMARY KEY,
     c INT,
     UNIQUE KEY u (c)
 )   ENGINE NDB;

 INSERT INTO t
     VALUES (1,1), (2,2), (3,3), (4,4), (5,5);

The following *note 'UPDATE': update. statement on 't' succeeds on the source, since the rows affected are processed in the order determined by the 'ORDER BY' option, performed over the entire table:

 UPDATE t SET c = c - 1 ORDER BY p;

The same statement fails with a duplicate key error or other constraint violation on the replica, because the ordering of the row updates is performed for one partition at a time, rather than for the table as a whole.

Note:

Every note 'NDB': mysql-cluster. table is implicitly partitioned by key when it is created. See note partitioning-key::, for more information.

GTIDs not supported

Replication using global transaction IDs is not compatible with the 'NDB' storage engine, and is not supported. Enabling GTIDs is likely to cause NDB Cluster Replication to fail.

Multithreaded replicas not supported

NDB Cluster does not support multithreaded replicas. This is because the replica may not be able to separate transactions occurring in one database from those in another if they are written within the same epoch. In addition, every transaction handled by the note 'NDB': mysql-cluster. storage engine involves at least two databases--the target database and the 'mysql' system database--due to the requirement for updating the 'mysql.ndb_apply_status' table (see note mysql-cluster-replication-schema::). This in turn breaks the requirement for multithreading that the transaction is specific to a given database.

Prior to NDB 7.5.7 and NDB 7.6.3, setting any system variables relating to multithreaded slaves such as 'slave_parallel_workers' and 'slave_checkpoint_group' (or the equivalent *note 'mysqld': mysqld. startup options) was completely ignored, and had no effect.

Beginning with NDB 7.5.7 and NDB 7.6.3, 'slave_parallel_workers' is always 0. If set to any other value on startup, 'NDB' changes it to 0, and writes a message to the *note 'mysqld': mysqld. server log file.

Restarting with --initial

Restarting the cluster with the '--initial' option causes the sequence of GCI and epoch numbers to start over from '0'. (This is generally true of NDB Cluster and not limited to replication scenarios involving Cluster.) The MySQL servers involved in replication should in this case be restarted. After this, you should use the note 'RESET MASTER': reset-master. and note 'RESET SLAVE': reset-slave. statements to clear the invalid 'ndb_binlog_index' and 'ndb_apply_status' tables, respectively.
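
A minimal sketch of the statements involved, assuming a single replication channel, is shown here:

 # On each SQL node in the source cluster, after restarting mysqld
 mysqlS*> RESET MASTER;

 # On the replica, after restarting mysqld
 mysqlR> STOP SLAVE;
 mysqlR> RESET SLAVE;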

Replication from NDB to other storage engines

It is possible to replicate an *note 'NDB': mysql-cluster. table on the source to a table using a different storage engine on the replica, taking into account the restrictions listed here:

The next few paragraphs provide additional information about each of the issues just described.

Multiple sources not supported when replicating NDB to other storage engines

For replication from *note 'NDB': mysql-cluster. to a different storage engine, the relationship between the two databases must be one-to-one. This means that bidirectional or circular replication is not supported between NDB Cluster and other storage engines.

In addition, it is not possible to configure more than one replication channel when replicating between note 'NDB': mysql-cluster. and a different storage engine. (An NDB Cluster database can simultaneously replicate to multiple NDB Cluster databases.) If the source uses note 'NDB': mysql-cluster. tables, it is still possible to have more than one MySQL Server maintain a binary log of all changes, but for the replica to change sources (fail over), the new source-replica relationship must be explicitly defined on the replica.

Replicating NDB tables to a storage engine that does not perform binary logging

If you attempt to replicate from an NDB Cluster to a replica that uses a storage engine that does not handle its own binary logging, the replication process aborts with the error 'Binary logging not possible ... Statement cannot be written atomically since more than one engine involved and at least one engine is self-logging' (Error 1595). It is possible to work around this issue in one of the following ways:

Important:

You should not disable replication or binary logging of 'mysql.ndb_apply_status' or change the storage engine used for this table when replicating from one NDB Cluster to another. See *note mysql-cluster-replication-issues-filtering::, for details.

Replication from NDB to a nontransactional storage engine

When replicating from note 'NDB': mysql-cluster. to a nontransactional storage engine such as note 'MyISAM': myisam-storage-engine, you may encounter unnecessary duplicate key errors when replicating *note 'INSERT ... ON DUPLICATE KEY UPDATE': insert-on-duplicate. statements. You can suppress these by using '--ndb-log-update-as-write=0', which forces updates to be logged as updates rather than as writes.
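
For example, the source SQL node might be started as shown here (a sketch only; the option can also be placed in 'my.cnf'):

 shellS> mysqld --ndbcluster --server-id=ID \
         --log-bin --ndb-log-bin \
         --ndb-log-update-as-write=0 &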

Replication and binary log filtering rules with replication between NDB Clusters

If you are using any of the options '--replicate-do-', '--replicate-ignore-', '--binlog-do-db', or '--binlog-ignore-db' to filter databases or tables being replicated, you must take care not to block replication or binary logging of the 'mysql.ndb_apply_status' table, which is required for replication between NDB Clusters to operate properly. In particular, you must keep in mind the following:

  1. Using '--replicate-do-db=DB_NAME' (and no other '--replicate-do-' or '--replicate-ignore-' options) means that only tables in database DB_NAME are replicated. In this case, you should also use '--replicate-do-db=mysql', '--binlog-do-db=mysql', or '--replicate-do-table=mysql.ndb_apply_status' to ensure that 'mysql.ndb_apply_status' is populated on replicas.

    Using '--binlog-do-db=DB_NAME' (and no other '--binlog-do-db' options) means that changes only to tables in database DB_NAME are written to the binary log. In this case, you should also use '--replicate-do-db=mysql', '--binlog-do-db=mysql', or '--replicate-do-table=mysql.ndb_apply_status' to ensure that 'mysql.ndb_apply_status' is populated on replicas.

  2. Using '--replicate-ignore-db=mysql' means that no tables in the 'mysql' database are replicated. In this case, you should also use '--replicate-do-table=mysql.ndb_apply_status' to ensure that 'mysql.ndb_apply_status' is replicated.

    Using '--binlog-ignore-db=mysql' means that no changes to tables in the 'mysql' database are written to the binary log. In this case, you should also use '--replicate-do-table=mysql.ndb_apply_status' to ensure that 'mysql.ndb_apply_status' is replicated.

You should also remember that each replication rule requires the following:

  1. Its own '--replicate-do-' or '--replicate-ignore-' option, and that multiple rules cannot be expressed in a single replication filtering option. For information about these rules, see *note replication-options::.

  2. Its own '--binlog-do-db' or '--binlog-ignore-db' option, and that multiple rules cannot be expressed in a single binary log filtering option. For information about these rules, see *note binary-log::.
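
For example, a replica that must apply only a single database ('db1' is a hypothetical name) while still receiving 'mysql.ndb_apply_status' might combine these rules in its 'my.cnf' file, as in this sketch:

 [mysqld]
 # Replicate only tables in db1 (hypothetical database name) ...
 replicate-do-db=db1
 # ... but keep mysql.ndb_apply_status flowing, as required between NDB Clusters
 replicate-do-table=mysql.ndb_apply_status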

If you are replicating an NDB Cluster to a replica that uses a storage engine other than *note 'NDB': mysql-cluster, the considerations just given may not apply, as discussed elsewhere in this section.

NDB Cluster Replication and IPv6

While the NDB API and MGM API (and thus data nodes and management nodes) do not support IPv6 in NDB 7.5 and 7.6, MySQL Servers--including those acting as SQL nodes in an NDB Cluster--can use IPv6 to contact other MySQL Servers. This means that you can replicate between NDB Clusters using IPv6 to connect the source and replica SQL nodes as shown by the dotted arrow in the following diagram:

FIGURE GOES HERE: Replication Between SQL Nodes Connected Using IPv6

All connections originating within the NDB Cluster--represented in the preceding diagram by solid arrows--must use IPv4. In other words, all NDB Cluster data nodes, management servers, and management clients must be accessible from one another using IPv4. In addition, SQL nodes must use IPv4 to communicate with the cluster.

Since there is currently no support in the NDB and MGM APIs for IPv6, any applications written using these APIs must also make all connections using IPv4.

Attribute promotion and demotion

NDB Cluster Replication includes support for attribute promotion and demotion. The implementation of the latter distinguishes between lossy and non-lossy type conversions, and their use on the replica can be controlled by setting the 'slave_type_conversions' global server system variable.
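
For example, to permit both lossy and non-lossy conversions on the replica, you might set the variable as shown here; whether such conversions are appropriate depends on your schema:

 mysqlR> SET GLOBAL slave_type_conversions = 'ALL_LOSSY,ALL_NON_LOSSY';
 mysqlR> SHOW VARIABLES LIKE 'slave_type_conversions';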

For more information about attribute promotion and demotion in NDB Cluster, see *note replication-features-attribute-promotion::.

'NDB', unlike note 'InnoDB': innodb-storage-engine. or note 'MyISAM': myisam-storage-engine, does not write changes to virtual columns to the binary log; however, this has no detrimental effects on NDB Cluster Replication or replication between 'NDB' and other storage engines. Changes to stored generated columns are logged.

21.7.4 NDB Cluster Replication Schema and Tables

Replication in NDB Cluster makes use of a number of dedicated tables in the 'mysql' database on each MySQL Server instance acting as an SQL node in both the cluster being replicated and in the replica. This is true regardless of whether the replica is a single server or a cluster.

The 'ndb_binlog_index' and 'ndb_apply_status' tables are created in the 'mysql' database. They should not be explicitly replicated by the user. User intervention is normally not required to create or maintain either of these tables, since both are maintained by the note 'NDB': mysql-cluster. binary log (binlog) injector thread. This keeps the source note 'mysqld': mysqld. process updated to changes performed by the note 'NDB': mysql-cluster. storage engine. The note 'NDB': mysql-cluster. binlog injector thread receives events directly from the note 'NDB': mysql-cluster. storage engine. The note 'NDB': mysql-cluster. injector is responsible for capturing all the data events within the cluster, and ensures that all events which change, insert, or delete data are recorded in the 'ndb_binlog_index' table. The replica I/O thread transfers the events from the source's binary log to the replica's relay log.

The 'ndb_replication' table must be created manually. This table can be updated by the user to perform filtering by database or table. See note ndb-replication-ndb-replication::, for more information. 'ndb_replication' is also used in NDB Replication conflict detection and resolution for conflict resolution control; see note conflict-resolution-control::.

Even though 'ndb_binlog_index' and 'ndb_apply_status' are created and maintained automatically, it is advisable to check for the existence and integrity of these tables as an initial step in preparing an NDB Cluster for replication. It is possible to view event data recorded in the binary log by querying the 'mysql.ndb_binlog_index' table directly on the source. This can also be accomplished using the note 'SHOW BINLOG EVENTS': show-binlog-events. statement on either the source or replica SQL node. (See note show-binlog-events::.)
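
For example, the following queries, issued on a source SQL node, show the most recently logged epochs and a sample of the corresponding binary log events; the exact output depends on your workload:

 mysqlS> SELECT File, Position, epoch, inserts, updates, deletes
      -> FROM mysql.ndb_binlog_index
      -> ORDER BY epoch DESC LIMIT 5;

 mysqlS> SHOW BINLOG EVENTS LIMIT 10;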

You can also obtain useful information from the output of *note 'SHOW ENGINE NDB STATUS': show-engine.

Note:

When performing schema changes on note 'NDB': mysql-cluster. tables, applications should wait until the note 'ALTER TABLE': alter-table. statement has returned in the MySQL client connection that issued the statement before attempting to use the updated definition of the table.

ndb_apply_status Table

'ndb_apply_status' is used to keep a record of the operations that have been replicated from the source to the replica. If the 'ndb_apply_status' table does not exist on the replica, *note 'ndb_restore': mysql-cluster-programs-ndb-restore. re-creates it.

Unlike the case with 'ndb_binlog_index', the data in this table is not specific to any one SQL node in the (replica) cluster, and so 'ndb_apply_status' can use the 'NDBCLUSTER' storage engine, as shown here:

 CREATE TABLE `ndb_apply_status` (
     `server_id`   INT(10) UNSIGNED NOT NULL,
     `epoch`       BIGINT(20) UNSIGNED NOT NULL,
     `log_name`    VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
     `start_pos`   BIGINT(20) UNSIGNED NOT NULL,
     `end_pos`     BIGINT(20) UNSIGNED NOT NULL,
     PRIMARY KEY (`server_id`) USING HASH
 ) ENGINE=NDBCLUSTER   DEFAULT CHARSET=latin1;

The 'ndb_apply_status' table is populated only on replicas, which means that, on the source, this table never contains any rows; thus, there is no need to allot any 'DataMemory' to 'ndb_apply_status' there.
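
For example, you can see which epochs have been applied on a replica, and from which source servers, by querying the table there:

 mysqlR> SELECT server_id, epoch, log_name, start_pos, end_pos
      -> FROM mysql.ndb_apply_status;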

Because this table is populated from data originating on the source, it should be allowed to replicate; any replication filtering or binary log filtering rules that inadvertently prevent the replica from updating 'ndb_apply_status', or that prevent the source from writing into the binary log, may prevent replication between clusters from operating properly. For more information about potential problems arising from such filtering rules, see *note mysql-cluster-replication-issues-filtering::.

'0' in the 'epoch' column of this table indicates a transaction originating from a storage engine other than 'NDB'.

ndb_binlog_index Table

NDB Cluster Replication uses the 'ndb_binlog_index' table for storing the binary log's indexing data. Since this table is local to each MySQL server and does not participate in clustering, it uses the note 'InnoDB': innodb-storage-engine. storage engine. This means that it must be created separately on each note 'mysqld': mysqld. participating in the source cluster. (The binary log itself contains updates from all MySQL servers in the cluster.) This table is defined as follows:

 CREATE TABLE `ndb_binlog_index` (
     `Position` BIGINT(20) UNSIGNED NOT NULL,
     `File` VARCHAR(255) NOT NULL,
     `epoch` BIGINT(20) UNSIGNED NOT NULL,
     `inserts` INT(10) UNSIGNED NOT NULL,
     `updates` INT(10) UNSIGNED NOT NULL,
     `deletes` INT(10) UNSIGNED NOT NULL,
     `schemaops` INT(10) UNSIGNED NOT NULL,
     `orig_server_id` INT(10) UNSIGNED NOT NULL,
     `orig_epoch` BIGINT(20) UNSIGNED NOT NULL,
     `gci` INT(10) UNSIGNED NOT NULL,
     `next_position` bigint(20) unsigned NOT NULL,
     `next_file` varchar(255) NOT NULL,
     PRIMARY KEY (`epoch`,`orig_server_id`,`orig_epoch`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Note:

Prior to NDB 7.5.2, this table always used the note 'MyISAM': myisam-storage-engine. storage engine. If you are upgrading from an earlier release, you can use note 'mysql_upgrade': mysql-upgrade. with the '--force' and '--upgrade-system-tables' options after starting the server. The system table upgrade causes an *note 'ALTER TABLE ... ENGINE=INNODB': alter-table. statement to be executed for this table. Use of the 'MyISAM' storage engine for this table continues to be supported for backward compatibility.

'ndb_binlog_index' may require additional disk space after being converted to 'InnoDB'. If this becomes an issue, you may be able to conserve space by using an 'InnoDB' tablespace for this table, changing its 'ROW_FORMAT' to 'COMPRESSED', or both. For more information, see note create-tablespace::, and note create-table::, as well as *note innodb-tablespace::.

The size of the 'ndb_binlog_index' table is dependent on the number of epochs per binary log file and the number of binary log files. The number of epochs per binary log file normally depends on the amount of binary log generated per epoch and the size of the binary log file, with smaller epochs resulting in more epochs per file. You should be aware that empty epochs produce inserts to the 'ndb_binlog_index' table, even when the '--ndb-log-empty-epochs' option is 'OFF', meaning that the number of entries per file depends on the length of time that the file is in use; this relationship can be represented by the formula shown here:

 [number of epochs per file] = [time spent per file] / TimeBetweenEpochs

A busy NDB Cluster writes to the binary log regularly and presumably rotates binary log files more quickly than a quiet one. This means that a 'quiet' NDB Cluster with '--ndb-log-empty-epochs=ON' can actually have a much higher number of 'ndb_binlog_index' rows per file than one with a great deal of activity.
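
One way to observe this relationship is to count index rows per binary log file on the source, as in this example query:

 mysqlS> SELECT File, COUNT(*) AS epochs
      -> FROM mysql.ndb_binlog_index
      -> GROUP BY File;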

When note 'mysqld': mysqld. is started with the '--ndb-log-orig' option, the 'orig_server_id' and 'orig_epoch' columns store, respectively, the ID of the server on which the event originated and the epoch in which the event took place on the originating server, which is useful in NDB Cluster replication setups employing multiple sources. The note 'SELECT': select. statement used to find the closest binary log position to the highest applied epoch on the replica in a multi-source setup (see *note mysql-cluster-replication-multi-source::) employs these two columns, which are not indexed. This can lead to performance issues when trying to fail over, since the query must perform a table scan, especially when the source has been running with '--ndb-log-empty-epochs=ON'. You can improve multi-source failover times by adding an index to these columns, as shown here:

 ALTER TABLE mysql.ndb_binlog_index
     ADD INDEX orig_lookup USING BTREE (orig_server_id, orig_epoch);

Adding this index provides no benefit when replicating from a single source to a single replica, since the query used to get the binary log position in such cases makes no use of 'orig_server_id' or 'orig_epoch'.

See *note mysql-cluster-replication-failover::, for more information about using the 'next_position' and 'next_file' columns.

The following figure shows the relationship of the NDB Cluster replication source server, its binary log injector thread, and the 'mysql.ndb_binlog_index' table.

FIGURE GOES HERE: The Replication Source Cluster

ndb_replication Table

The 'ndb_replication' table is used to control binary logging and conflict resolution, and acts on a per-table basis. Each row in this table corresponds to a table being replicated; it determines how to log changes to that table and, if a conflict resolution function is specified, how to resolve conflicts for that table.

Unlike the 'ndb_binlog_index' and 'ndb_apply_status' tables, the 'ndb_replication' table must be created manually, using the SQL statement shown here:

 CREATE TABLE mysql.ndb_replication  (
     db VARBINARY(63),
     table_name VARBINARY(63),
     server_id INT UNSIGNED,
     binlog_type INT UNSIGNED,
     conflict_fn VARBINARY(128),
     PRIMARY KEY USING HASH (db, table_name, server_id)
 )   ENGINE=NDB
 PARTITION BY KEY(db,table_name);

The columns of this table are listed here, with descriptions:

To enable conflict resolution with NDB Replication, it is necessary to create and populate this table with control information on the SQL node or nodes on which the conflict should be resolved. Depending on the conflict resolution type and method to be employed, this may be the source, the replica, or both servers. In a simple source-replica setup where data can also be changed locally on the replica this is typically the replica. In a more complex replication scheme, such as bidirectional replication, this is usually all of the sources involved. See *note mysql-cluster-replication-conflict-resolution::, for more information.

The 'ndb_replication' table allows table-level control over binary logging outside the scope of conflict resolution, in which case 'conflict_fn' is specified as 'NULL', while the remaining column values are used to control binary logging for a given table or set of tables matching a wildcard expression. By setting the proper value for the 'binlog_type' column, you can make logging for a given table or tables use a desired binary log format, or disable binary logging altogether. Possible values for this column, with descriptions, are shown in the following table:

binlog_type values and descriptions

Value Description

0 Use server default

1 Do not log this table in the binary log (same effect as 'sql_log_bin = 0', but applies to one or more specified tables only)

2 Log updated attributes only; log these as 'WRITE_ROW' events

3 Log full row, even if not updated (MySQL server default behavior)

6 Use updated attributes, even if values are unchanged

7 Log full row, even if no values are changed; log updates as 'UPDATE_ROW' events

8 Log update as 'UPDATE_ROW'; log only primary key columns in before image, and only updated columns in after image (same effect as '--ndb-log-update-minimal', but applies to one or more specified tables only)

9 Log update as 'UPDATE_ROW'; log only primary key columns in before image, and all columns other than primary key columns in after image

Note:

'binlog_type' values 4 and 5 are not used, and so are omitted from the table just shown, as well as from the next table.

Several 'binlog_type' values are equivalent to various combinations of the *note 'mysqld': mysqld. logging options '--ndb-log-updated-only', '--ndb-log-update-as-write', and '--ndb-log-update-minimal', as shown in the following table:

binlog_type values with equivalent combinations of NDB logging options

Value   '--ndb-log-updated-only'   '--ndb-log-update-as-write'   '--ndb-log-update-minimal'

0 - - -

1 - - -

2 ON ON OFF

3 OFF ON OFF

6 ON OFF OFF

7 OFF OFF OFF

8 ON OFF ON

9 OFF OFF ON

Binary logging can be set to different formats for different tables by inserting rows into the 'ndb_replication' table using the appropriate 'db', 'table_name', and 'binlog_type' column values. The internal integer value shown in the preceding table should be used when setting the binary logging format. The following two statements set binary logging to logging of full rows (value 3) for table 'test.a', and to logging of updates only (value 2) for table 'test.b':

 # Table test.a: Log full rows
 INSERT INTO mysql.ndb_replication VALUES("test", "a", 0, 3, NULL);

 # Table test.b: log updates only
 INSERT INTO mysql.ndb_replication VALUES("test", "b", 0, 2, NULL);

To disable logging for one or more tables, use 1 for 'binlog_type', as shown here:

 # Disable binary logging for table test.t1
 INSERT INTO mysql.ndb_replication VALUES("test", "t1", 0, 1, NULL);

 # Disable binary logging for any table in 'test' whose name begins with 't'
 INSERT INTO mysql.ndb_replication VALUES("test", "t%", 0, 1, NULL);

Disabling logging for a given table is the equivalent of setting 'sql_log_bin = 0', except that it applies to one or more tables individually.

If an SQL node is not performing binary logging for a given table, it is not sent the row change events for that table; rather than receiving all changes and discarding some, it simply does not subscribe to those changes.

Disabling logging can be useful for a number of reasons, including those listed here:

Matching with wildcards

In order not to make it necessary to insert a row in the 'ndb_replication' table for each and every combination of database, table, and SQL node in your replication setup, 'NDB' supports wildcard matching on this table's 'db', 'table_name', and 'server_id' columns. Database and table names used in 'db' and 'table_name', respectively, may contain either or both of the following wildcards:

   * '%': matches zero or more characters

   * '_': matches exactly one character

(These are the same wildcards as supported by the MySQL 'LIKE' operator.)

The 'server_id' column supports '0' as a wildcard value that matches any server ID. This is used in the examples shown previously.

A given row in the 'ndb_replication' table can use wildcards to match any of the database name, table name, and server ID in any combination. Where there are multiple potential matches in the table, the best match is chosen, according to the table shown here, where W represents a wildcard match, E an exact match, and the greater the value in the Quality column, the better the match:

'db' 'table_name' 'server_id' Quality

W W W 1

W W E 2

W E W 3

W E E 4

E W W 5

E W E 6

E E W 7

E E E 8

Thus, an exact match on database name, table name, and server ID is considered best (strongest), while the weakest (worst) match is a wildcard match on all three columns. Only the strength of the match is considered when choosing which rule to apply; the order in which the rows occur in the table has no effect on this determination.
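
The following sketch illustrates this using a hypothetical database 'test' and table 't1'. Both rows match changes made to 'test.t1' on any server, but the second row is the one applied to that table, because an exact 'table_name' match (E E W, quality 7) is stronger than a wildcard match (E W W, quality 5):

 # Default for all tables in test: log updated attributes only (binlog_type 2)
 INSERT INTO mysql.ndb_replication VALUES ("test", "%", 0, 2, NULL);

 # Exception for test.t1: log full rows (binlog_type 3); this row wins for t1
 INSERT INTO mysql.ndb_replication VALUES ("test", "t1", 0, 3, NULL);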

Logging Full or Partial Rows

There are two basic methods of logging rows, as determined by the setting of the '--ndb-log-updated-only' option for *note 'mysqld': mysqld.:

It is usually sufficient--and more efficient--to log updated columns only; however, if you need to log full rows, you can do so by setting '--ndb-log-updated-only' to '0' or 'OFF'.

Logging Changed Data as Updates

The setting of the MySQL Server's '--ndb-log-update-as-write' option determines whether logging is performed with or without the 'before' image.

Because conflict resolution for updates and delete operations is done in the MySQL Server's update handler, it is necessary to control the logging performed by the replication source such that updates are updates and not writes; that is, such that updates are treated as changes in existing rows rather than the writing of new rows, even though these replace existing rows.

This option is turned on by default; in other words, updates are treated as writes. That is, updates are by default written as 'write_row' events in the binary log, rather than as 'update_row' events.

To disable the option, start the source note 'mysqld': mysqld. with '--ndb-log-update-as-write=0' or '--ndb-log-update-as-write=OFF'. You must do this when replicating from NDB tables to tables using a different storage engine; see note mysql-cluster-replication-ndb-to-non-ndb::, and *note mysql-cluster-replication-ndb-to-nontransactional::, for more information.

21.7.5 Preparing the NDB Cluster for Replication

Preparing the NDB Cluster for replication consists of the following steps:

  1. Check all MySQL servers for version compatibility (see *note mysql-cluster-replication-general::).

  2. Create a replication account on the source Cluster with the appropriate privileges, using the following two SQL statements:

      mysqlS> CREATE USER 'REPLICA_USER'@'REPLICA_HOST'
           -> IDENTIFIED BY 'REPLICA_PASSWORD';
    
      mysqlS> GRANT REPLICATION SLAVE ON *.*
           -> TO 'REPLICA_USER'@'REPLICA_HOST';

    In the previous statement, REPLICA_USER is the replication account user name, REPLICA_HOST is the host name or IP address of the replica, and REPLICA_PASSWORD is the password to assign to this account.

    For example, to create a replica user account with the name 'myreplica', logging in from the host named 'replica-host', and using the password '53cr37', use the following note 'CREATE USER': create-user. and note 'GRANT': grant. statements:

      mysqlS> CREATE USER 'myreplica'@'replica-host'
           -> IDENTIFIED BY '53cr37';
    
      mysqlS> GRANT REPLICATION SLAVE ON *.*
           -> TO 'myreplica'@'replica-host';

    For security reasons, it is preferable to use a unique user account--not employed for any other purpose--for the replication account.

  3. Set up the replica to use the source. Using the note 'mysql': mysql. client, this can be accomplished with the following note 'CHANGE MASTER TO': change-master-to. statement:

      mysqlR> CHANGE MASTER TO
           -> MASTER_HOST='SOURCE_HOST',
           -> MASTER_PORT=SOURCE_PORT,
           -> MASTER_USER='REPLICA_USER',
           -> MASTER_PASSWORD='REPLICA_PASSWORD';

    In the previous statement, SOURCE_HOST is the host name or IP address of the replication source, SOURCE_PORT is the port for the replica to use when connecting to the source, REPLICA_USER is the user name set up for the replica on the source, and REPLICA_PASSWORD is the password set for that user account in the previous step.

    For example, to tell the replica to use the MySQL server whose host name is 'rep-source' with the replication account created in the previous step, use the following statement:

      mysqlR> CHANGE MASTER TO
           -> MASTER_HOST='rep-source',
           -> MASTER_PORT=3306,
           -> MASTER_USER='myreplica',
           -> MASTER_PASSWORD='53cr37';

    For a complete list of options that can be used with this statement, see *note change-master-to::.

    To provide replication backup capability, you also need to add an '--ndb-connectstring' option to the replica's 'my.cnf' file prior to starting the replication process. See *note mysql-cluster-replication-backups::, for details.

    For additional options that can be set in 'my.cnf' for replicas, see *note replication-options::.

  4. If the source cluster is already in use, you can create a backup of the source and load this onto the replica to cut down on the amount of time required for the replica to synchronize itself with the source. If the replica is also running NDB Cluster, this can be accomplished using the backup and restore procedure described in *note mysql-cluster-replication-backups::. In this case, the 'ndb-connectstring' option added to the replica's 'my.cnf' file (see the previous step) takes the form shown here:

      ndb-connectstring=MANAGEMENT_HOST[:PORT]

    In the event that you are not using NDB Cluster on the replica, you can create a backup with this command on the source:

      shellS> mysqldump --master-data=1

    Then import the resulting data dump onto the replica by copying the dump file over to it. After this, you can use the note 'mysql': mysql. client to import the data from the dumpfile into the replica database as shown here, where DUMP_FILE is the name of the file that was generated using note 'mysqldump': mysqldump. on the source, and DB_NAME is the name of the database to be replicated:

      shellR> mysql -u root -p DB_NAME < DUMP_FILE

    For a complete list of options to use with note 'mysqldump': mysqldump, see note mysqldump::.

    Note:

    If you copy the data to the replica in this fashion, you should make sure that the replica is started with the '--skip-slave-start' option on the command line, or else include 'skip-slave-start' in the replica's 'my.cnf' file to keep it from trying to connect to the source to begin replicating before all the data has been loaded. Once the data loading has completed, follow the additional steps outlined in the next two sections.

  5. Ensure that each MySQL server acting as a replication source is assigned a unique server ID, and has binary logging enabled, using the row-based format. (See note replication-formats::.) In addition, we recommend enabling the 'slave_allow_batching' system variable; beginning with NDB 7.6.23, a warning is issued if this variable is set to 'OFF'. You should also consider increasing the values used with the '--ndb-batch-size' and '--ndb-blob-write-batch-bytes' options as well. All of these options can be set either in the source server's 'my.cnf' file, or on the command line when starting the source note 'mysqld': mysqld. process. See *note mysql-cluster-replication-starting::, for more information.
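
    A 'my.cnf' fragment for a source SQL node incorporating these settings might look like the following sketch; the batch sizes shown are illustrative values only:

      [mysqld]
      ndbcluster
      # Must be unique among all SQL nodes in both clusters
      server-id=1
      log-bin
      binlog-format=ROW
      ndb-log-bin
      # Takes effect when this server acts as a replica
      slave_allow_batching=1
      # Example values only; tune for your workload
      ndb-batch-size=65536
      ndb-blob-write-batch-bytes=262144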

21.7.6 Starting NDB Cluster Replication (Single Replication Channel)

This section outlines the procedure for starting NDB Cluster replication using a single replication channel.

  1. Start the MySQL replication source server by issuing this command, where ID is this server's unique ID (see *note mysql-cluster-replication-general::):

      shellS> mysqld --ndbcluster --server-id=ID \
              --log-bin --ndb-log-bin &

    This starts the server's *note 'mysqld': mysqld. process with binary logging enabled using the proper logging format.

    Note:

    You can also start the source with '--binlog-format=MIXED', in which case row-based replication is used automatically when replicating between clusters. Statement-based binary logging is not supported for NDB Cluster Replication (see *note mysql-cluster-replication-general::).

  2. Start the MySQL replica server as shown here:

      shellR> mysqld --ndbcluster --server-id=ID &

    In the command just shown, ID is the replica server's unique ID. It is not necessary to enable logging on the replica.

    Note:

    You should use the '--skip-slave-start' option with this command or else you should include 'skip-slave-start' in the replica server's 'my.cnf' file, unless you want replication to begin immediately. With the use of this option, the start of replication is delayed until the appropriate *note 'START SLAVE': start-slave. statement has been issued, as explained in Step 4 below.

  3. It is necessary to synchronize the replica server with the source server's replication binary log. If binary logging has not previously been running on the source, run the following statement on the replica:

      mysqlR> CHANGE MASTER TO
           -> MASTER_LOG_FILE='',
           -> MASTER_LOG_POS=4;

    This instructs the replica to begin reading the source server's binary log from the log's starting point. Otherwise--that is, if you are loading data from the source using a backup--see *note mysql-cluster-replication-failover::, for information on how to obtain the correct values to use for 'MASTER_LOG_FILE' and 'MASTER_LOG_POS' in such cases.

  4. Finally, instruct the replica to begin applying replication by issuing this command from the *note 'mysql': mysql. client on the replica:

      mysqlR> START SLAVE;

    This also initiates the transmission of data and changes from the source to the replica.

It is also possible to use two replication channels, in a manner similar to the procedure described in the next section; the differences between this and using a single replication channel are covered in *note mysql-cluster-replication-two-channels::.

It is also possible to improve cluster replication performance by enabling batched updates. This can be accomplished by setting the 'slave_allow_batching' system variable on the replicas' *note 'mysqld': mysqld. processes. Normally, updates are applied as soon as they are received. However, the use of batching causes updates to be applied in batches of 32 KB each; this can result in higher throughput and less CPU usage, particularly where individual updates are relatively small.

Note:

Batching works on a per-epoch basis; updates belonging to more than one transaction can be sent as part of the same batch.

All outstanding updates are applied when the end of an epoch is reached, even if the updates total less than 32 KB.

Batching can be turned on and off at runtime. To activate it at runtime, you can use either of these two statements:

 SET GLOBAL slave_allow_batching = 1;
 SET GLOBAL slave_allow_batching = ON;

If a particular batch causes problems (such as a statement whose effects do not appear to be replicated correctly), batching can be deactivated using either of the following statements:

 SET GLOBAL slave_allow_batching = 0;
 SET GLOBAL slave_allow_batching = OFF;

You can check whether batching is currently being used by means of an appropriate *note 'SHOW VARIABLES': show-variables. statement, like this one:

 mysql> SHOW VARIABLES LIKE 'slave%';
 +---------------------------+-------+
 | Variable_name             | Value |
 +---------------------------+-------+
 | slave_allow_batching      | ON    |
 | slave_compressed_protocol | OFF   |
 | slave_load_tmpdir         | /tmp  |
 | slave_net_timeout         | 3600  |
 | slave_skip_errors         | OFF   |
 | slave_transaction_retries | 10    |
 +---------------------------+-------+
 6 rows in set (0.00 sec)

21.7.7 Using Two Replication Channels for NDB Cluster Replication

In a more complete example scenario, we envision two replication channels to provide redundancy and thereby guard against possible failure of a single replication channel. This requires a total of four replication servers, two source servers on the source cluster and two replica servers on the replica cluster. For purposes of the discussion that follows, we assume that unique identifiers are assigned as shown here:

NDB Cluster replication servers described in the text

Server ID Description

1 Source - primary replication channel (S)

2 Source - secondary replication channel (S')

3 Replica - primary replication channel (R)

4 Replica - secondary replication channel (R')

Setting up replication with two channels is not radically different from setting up a single replication channel. First, the note 'mysqld': mysqld. processes for the primary and secondary replication source servers must be started, followed by those for the primary and secondary replicas. The replication processes can be initiated by issuing the note 'START SLAVE': start-slave. statement on each of the replicas. The commands and the order in which they need to be issued are shown here:

  1. Start the primary replication source:

      shellS> mysqld --ndbcluster --server-id=1 \
                     --log-bin &
  2. Start the secondary replication source:

      shellS'> mysqld --ndbcluster --server-id=2 \
                     --log-bin &
  3. Start the primary replica server:

      shellR> mysqld --ndbcluster --server-id=3 \
                     --skip-slave-start &
  4. Start the secondary replica server:

      shellR'> mysqld --ndbcluster --server-id=4 \
                      --skip-slave-start &
  5. Finally, initiate replication on the primary channel by executing the *note 'START SLAVE': start-slave. statement on the primary replica as shown here:

      mysqlR> START SLAVE;

    Warning:

    Only the primary channel must be started at this point. The secondary replication channel needs to be started only in the event that the primary replication channel fails, as described in *note mysql-cluster-replication-failover::. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replicas.

As mentioned previously, it is not necessary to enable binary logging on the replicas.

21.7.8 Implementing Failover with NDB Cluster Replication

In the event that the primary Cluster replication process fails, it is possible to switch over to the secondary replication channel. The following procedure describes the steps required to accomplish this.

  1. Obtain the time of the most recent global checkpoint (GCP). That is, you need to determine the most recent epoch from the 'ndb_apply_status' table on the replica cluster, which can be found using the following query:

      mysqlR'> SELECT @latest:=MAX(epoch)
            ->        FROM mysql.ndb_apply_status;

    In a circular replication topology, with a source and a replica running on each host, when you are using 'ndb_log_apply_status=1', NDB Cluster epochs are written in the replicas' binary logs. This means that the 'ndb_apply_status' table contains information for the replica on this host as well as for any other host which acts as a replica of the replication source server running on this host.

    In this case, you need to determine the latest epoch on this replica to the exclusion of any epochs from any other replicas in this replica's binary log that were not listed in the 'IGNORE_SERVER_IDS' options of the note 'CHANGE MASTER TO': change-master-to. statement used to set up this replica. The reason for excluding such epochs is that rows in the 'mysql.ndb_apply_status' table whose server IDs have a match in the 'IGNORE_SERVER_IDS' list from the 'CHANGE MASTER TO' statement used to prepare this replica's source are also considered to be from local servers, in addition to those having the replica's own server ID. You can retrieve this list as 'Replicate_Ignore_Server_Ids' from the output of note 'SHOW SLAVE STATUS': show-slave-status. We assume that you have obtained this list and are substituting it for IGNORE_SERVER_IDS in the query shown here, which, like the previous version of the query, selects the greatest epoch into a variable named '@latest':

      mysqlR'> SELECT @latest:=MAX(epoch)
            ->        FROM mysql.ndb_apply_status
            ->        WHERE server_id NOT IN (IGNORE_SERVER_IDS);

    In some cases, it may be simpler or more efficient (or both) to use a list of the server IDs to be included and 'server_id IN SERVER_ID_LIST' in the 'WHERE' condition of the preceding query.
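
    For example, if 'Replicate_Ignore_Server_Ids' is reported for this replica as '2,4' (hypothetical server IDs), the query becomes:

      mysqlR'> SELECT @latest:=MAX(epoch)
            ->        FROM mysql.ndb_apply_status
            ->        WHERE server_id NOT IN (2, 4);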

  2. Using the information obtained from the query shown in Step 1, obtain the corresponding records from the 'ndb_binlog_index' table on the source cluster.

    You can use the following query to obtain the needed records from the 'ndb_binlog_index' table on the source:

      mysqlS'> SELECT
          ->     @file:=SUBSTRING_INDEX(next_file, '/', -1),
          ->     @pos:=next_position
          -> FROM mysql.ndb_binlog_index
          -> WHERE epoch = @latest;

    These are the records saved on the source since the failure of the primary replication channel. We have employed a user variable '@latest' here to represent the value obtained in Step 1. Of course, it is not possible for one *note 'mysqld': mysqld. instance to access user variables set on another server instance directly. These values must be 'plugged in' to the second query manually or by an application.
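
    For example, if the query in Step 1 returned an epoch value of 5952824756541847552 (a hypothetical value used here only for illustration), the query run on the source would be written with that literal value in place of '@latest', like this:

      mysqlS'> SELECT
          ->     @file:=SUBSTRING_INDEX(next_file, '/', -1),
          ->     @pos:=next_position
          -> FROM mysql.ndb_binlog_index
          -> WHERE epoch = 5952824756541847552;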

    Important:

    You must ensure that the replica note 'mysqld': mysqld. is started with '--slave-skip-errors=ddl_exist_errors' before executing note 'START SLAVE': start-slave. Otherwise, replication may stop with duplicate DDL errors.

  3. Now it is possible to synchronize the secondary channel by running the following query on the secondary replica server:

      mysqlR'> CHANGE MASTER TO
            ->     MASTER_LOG_FILE='@file',
            ->     MASTER_LOG_POS=@pos;

    Again we have employed user variables (in this case '@file' and '@pos') to represent the values obtained in Step 2 and applied in Step 3; in practice these values must be inserted manually or using an application that can access both of the servers involved.

    Note:

    '@file' is a string value such as ''/var/log/mysql/replication-source-bin.00001'', and so must be quoted when used in SQL or application code. However, the value represented by '@pos' must not be quoted. Although MySQL normally attempts to convert strings to numbers, this case is an exception.
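
    For example, assuming a binary log file named 'replication-source-bin.00001' and a position of 89587 (both hypothetical values used here only for illustration), the statement actually executed on the secondary replica would resemble this:

      mysqlR'> CHANGE MASTER TO
            ->     MASTER_LOG_FILE='replication-source-bin.00001',
            ->     MASTER_LOG_POS=89587;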

  4. You can now initiate replication on the secondary channel by issuing the appropriate statement on the secondary replica *note 'mysqld': mysqld.:

      mysqlR'> START SLAVE;

Once the secondary replication channel is active, you can investigate the failure of the primary and effect repairs. The precise actions required to do this depend upon the reasons for which the primary channel failed.

Warning:

The secondary replication channel is to be started only if and when the primary replication channel has failed. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replicas.

If the failure is limited to a single server, it should in theory be possible to replicate from S to R', or from S' to R.

 File: manual.info.tmp, Node: mysql-cluster-replication-backups, Next: mysql-cluster-replication-multi-source, Prev: mysql-cluster-replication-failover, Up: mysql-cluster-replication

21.7.9 NDB Cluster Backups With NDB Cluster Replication

This section discusses making backups and restoring from them using NDB Cluster replication. We assume that the replication servers have already been configured as covered previously (see *note mysql-cluster-replication-preparation::, and the sections immediately following). This having been done, the procedure for making a backup and then restoring from it is as follows:

  1. There are two different methods by which the backup may be started.

    * Method A
    
      This method requires that the cluster backup process was
      previously enabled on the source server, prior to starting the
      replication process.  This can be done by including the
      following line in a '[mysql_cluster]' section in the 'my.cnf'
      file, where MANAGEMENT_HOST is the IP address or host name of
      the *note 'NDB': mysql-cluster. management server for the
      source cluster, and PORT is the management server's port
      number:
    
           ndb-connectstring=MANAGEMENT_HOST[:PORT]
    
      *Note*:
    
      The port number needs to be specified only if the default port
      (1186) is not being used.  See *note
      mysql-cluster-install-configuration::, for more information
      about ports and port allocation in NDB Cluster.
    
      In this case, the backup can be started by executing this
      statement on the replication source:
    
           shellS> ndb_mgm -e "START BACKUP"
    
    * Method B
    
      If the 'my.cnf' file does not specify where to find the
      management host, you can start the backup process by passing
      this information to the *note 'NDB': mysql-cluster. management
      client as part of the *note 'START BACKUP':
      mysql-cluster-backup-using-management-client. command.  This
      can be done as shown here, where MANAGEMENT_HOST and PORT are
      the host name and port number of the management server:
    
           shellS> ndb_mgm MANAGEMENT_HOST:PORT -e "START BACKUP"
    
      In our scenario as outlined earlier (see *note
      mysql-cluster-replication-preparation::), this would be
      executed as follows:
    
           shellS> ndb_mgm rep-source:1186 -e "START BACKUP"
  2. Copy the cluster backup files to the replica that is being brought on line. Each system running an *note 'ndbd': mysql-cluster-programs-ndbd. process for the source cluster has cluster backup files located on it, and all of these files must be copied to the replica to ensure a successful restore. The backup files can be copied into any directory on the computer where the replica's management host resides, as long as the MySQL and NDB binaries have read permissions in that directory. In this case, we assume that these files have been copied into the directory '/var/BACKUPS/BACKUP-1'.

    While it is not necessary that the replica cluster have the same number of *note 'ndbd': mysql-cluster-programs-ndbd. processes (data nodes) as the source, it is highly recommended this number be the same. It is necessary that the replica be started with the '--skip-slave-start' option, to prevent premature startup of the replication process.

  3. Create any databases on the replica cluster that are present on the source cluster and that are to be replicated.

    Important:

    A note 'CREATE DATABASE': create-database. (or note 'CREATE SCHEMA': create-database.) statement corresponding to each database to be replicated must be executed on each SQL node in the replica cluster.
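
    For example, if a database named 'db1' (a hypothetical name used here only for illustration) exists on the source cluster and is to be replicated, you would execute the following statement on each SQL node in the replica cluster:

      mysqlR> CREATE DATABASE db1;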

  4. Reset the replica cluster using this statement in the *note 'mysql': mysql. client:

      mysqlR> RESET SLAVE;
  5. You can now start the cluster restoration process on the replica using the *note 'ndb_restore': mysql-cluster-programs-ndb-restore. command for each backup file in turn. For the first of these, it is necessary to include the '-m' option to restore the cluster metadata, as shown here:

      shellR> ndb_restore -c REPLICA_HOST:PORT -n NODE-ID \
              -b BACKUP-ID -m -r DIR

    DIR is the path to the directory where the backup files have been placed on the replica. For the *note 'ndb_restore': mysql-cluster-programs-ndb-restore. commands corresponding to the remaining backup files, the '-m' option should not be used.

    For restoring from a source cluster with four data nodes (as shown in the figure in *note mysql-cluster-replication::) where the backup files have been copied to the directory '/var/BACKUPS/BACKUP-1', the proper sequence of commands to be executed on the replica might look like this:

      shellR> ndb_restore -c replica-host:1186 -n 2 -b 1 -m \
              -r /var/BACKUPS/BACKUP-1
      shellR> ndb_restore -c replica-host:1186 -n 3 -b 1 \
              -r /var/BACKUPS/BACKUP-1
      shellR> ndb_restore -c replica-host:1186 -n 4 -b 1 \
              -r /var/BACKUPS/BACKUP-1
      shellR> ndb_restore -c replica-host:1186 -n 5 -b 1 -e \
              -r /var/BACKUPS/BACKUP-1

    Important:

    The '-e' (or '--restore-epoch') option in the final invocation of note 'ndb_restore': mysql-cluster-programs-ndb-restore. in this example is required to make sure that the epoch is written to the replica's 'mysql.ndb_apply_status' table. Without this information, the replica cannot synchronize properly with the source. (See note mysql-cluster-programs-ndb-restore::.)

  6. Now you need to obtain the most recent epoch from the 'ndb_apply_status' table on the replica (as discussed in *note mysql-cluster-replication-failover::):

      mysqlR> SELECT @latest:=MAX(epoch)
              FROM mysql.ndb_apply_status;
  7. Using '@latest' as the epoch value obtained in the previous step, you can obtain the correct starting position '@pos' in the correct binary log file '@file' from the 'mysql.ndb_binlog_index' table on the source. The query shown here gets these from the 'next_position' and 'next_file' columns from the last epoch applied before the logical restore position:

      mysqlS> SELECT
           ->     @file:=SUBSTRING_INDEX(next_file, '/', -1),
           ->     @pos:=next_position
           -> FROM mysql.ndb_binlog_index
           -> WHERE epoch > @latest
           -> ORDER BY epoch ASC LIMIT 1;

    In the event that there is currently no replication traffic, you can get similar information by running *note 'SHOW MASTER STATUS': show-master-status. on the source and using the value shown in the 'Position' column of the output for the file whose name has the suffix with the greatest value for all files shown in the 'File' column. In this case, you must determine which file this is and supply the name in the next step manually or by parsing the output with a script.

  8. Using the values obtained in the previous step, you can now issue the appropriate note 'CHANGE MASTER TO': change-master-to. statement in the replica's note 'mysql': mysql. client:

      mysqlR> CHANGE MASTER TO
           ->     MASTER_LOG_FILE='@file',
           ->     MASTER_LOG_POS=@pos;
  9. Now that the replica knows from what point in which binary log file to start reading data from the source, you can cause the replica to begin replicating with this statement:

      mysqlR> START SLAVE;

To perform a backup and restore on a second replication channel, it is necessary only to repeat these steps, substituting the host names and IDs of the secondary source and replica for those of the primary source and replica servers where appropriate, and running the preceding statements on them.

For additional information on performing Cluster backups and restoring Cluster from backups, see *note mysql-cluster-backup::.

 File: manual.info.tmp, Node: mysql-cluster-replication-auto-sync, Next: mysql-cluster-replication-pitr, Prev: mysql-cluster-replication-backups, Up: mysql-cluster-replication-backups

21.7.9.1 NDB Cluster Replication: Automating Synchronization of the Replica to the Source Binary Log

It is possible to automate much of the process described in the previous section (see *note mysql-cluster-replication-backups::). The following Perl script 'reset-replica.pl' serves as an example of how you can do this.

 #!/usr/bin/perl -w

 #  file: reset-replica.pl

 #  Copyright (c) 2005, 2020, Oracle and/or its affiliates. All rights reserved.

 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
 #  the Free Software Foundation; either version 2 of the License, or
 #  (at your option) any later version.

 #  This program is distributed in the hope that it will be useful,
 #  but WITHOUT ANY WARRANTY; without even the implied warranty of
 #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 #  GNU General Public License for more details.

 #  You should have received a copy of the GNU General Public License
 #  along with this program; if not, write to:
 #  Free Software Foundation, Inc.
 #  59 Temple Place, Suite 330
 #  Boston, MA 02111-1307 USA
 #
 #  Version 1.1

 ######################## Includes ###############################

 use DBI;

 ######################## Globals ################################

 my  $m_host='';
 my  $m_port='';
 my  $m_user='';
 my  $m_pass='';
 my  $s_host='';
 my  $s_port='';
 my  $s_user='';
 my  $s_pass='';
 my  $dbhM='';
 my  $dbhS='';

 ####################### Sub Prototypes ##########################

 sub CollectCommandPromptInfo;
 sub ConnectToDatabases;
 sub DisconnectFromDatabases;
 sub GetReplicaEpoch;
 sub GetSourceInfo;
 sub UpdateReplica;

 ######################## Program Main ###########################

 CollectCommandPromptInfo;
 ConnectToDatabases;
 GetReplicaEpoch;
 GetSourceInfo;
 UpdateReplica;
 DisconnectFromDatabases;

 ################## Collect Command Prompt Info ##################

 sub CollectCommandPromptInfo
 {
   ### Check that user has supplied correct number of command line args
   die "Usage:\n
        reset-replica >source MySQL host< >source MySQL port< \n
                    >source user< >source pass< >replica MySQL host< \n
                    >replica MySQL port< >replica user< >replica pass< \n
        All 8 arguments must be passed. Use BLANK for NULL passwords\n"
        unless @ARGV == 8;

   $m_host  =  $ARGV[0];
   $m_port  =  $ARGV[1];
   $m_user  =  $ARGV[2];
   $m_pass  =  $ARGV[3];
   $s_host  =  $ARGV[4];
   $s_port  =  $ARGV[5];
   $s_user  =  $ARGV[6];
   $s_pass  =  $ARGV[7];

   if ($m_pass eq "BLANK") { $m_pass = '';}
   if ($s_pass eq "BLANK") { $s_pass = '';}
 }

 ###############  Make connections to both databases #############

 sub ConnectToDatabases
 {
   ### Connect to both source and replica cluster databases

   ### Connect to source
   $dbhM
     = DBI->connect(
     "dbi:mysql:database=mysql;host=$m_host;port=$m_port",
     "$m_user", "$m_pass")
       or die "Can't connect to source cluster MySQL process!
               Error: $DBI::errstr\n";

   ### Connect to replica
   $dbhS
     = DBI->connect(
           "dbi:mysql:database=mysql;host=$s_host",
           "$s_user", "$s_pass")
     or die "Can't connect to replica cluster MySQL process!
             Error: $DBI::errstr\n";
 }

 ################  Disconnect from both databases ################

 sub DisconnectFromDatabases
 {
   ### Disconnect from source

   $dbhM->disconnect
   or warn " Disconnection failed: $DBI::errstr\n";

   ### Disconnect from replica

   $dbhS->disconnect
   or warn " Disconnection failed: $DBI::errstr\n";
 }

 ######################  Find the last good GCI ##################

 sub GetReplicaEpoch
 {
   $sth = $dbhS->prepare("SELECT MAX(epoch)
                          FROM mysql.ndb_apply_status;")
       or die "Error while preparing to select epoch from replica: ",
              $dbhS->errstr;

   $sth->execute
       or die "Selecting epoch from replica error: ", $sth->errstr;

   $sth->bind_col (1, \$epoch);
   $sth->fetch;
   print "\tReplica epoch =  $epoch\n";
   $sth->finish;
 }

 #######  Find the position of the last GCI in the binary log ########

 sub GetSourceInfo
 {
   $sth = $dbhM->prepare("SELECT
                            SUBSTRING_INDEX(File, '/', -1), Position
                          FROM mysql.ndb_binlog_index
                          WHERE epoch > $epoch
                          ORDER BY epoch ASC LIMIT 1;")
       or die "Prepare to select from source error: ", $dbhM->errstr;

   $sth->execute
       or die "Selecting from source error: ", $sth->errstr;

   $sth->bind_col (1, \$binlog);
   $sth->bind_col (2, \$binpos);
   $sth->fetch;
   print "\tSource binary log file =  $binlog\n";
   print "\tSource binary log position =  $binpos\n";
   $sth->finish;
 }

 ##########  Set the replica to process from that location #########

 sub UpdateReplica
 {
   $sth = $dbhS->prepare("CHANGE MASTER TO
                          MASTER_LOG_FILE='$binlog',
                          MASTER_LOG_POS=$binpos;")
       or die "Prepare to CHANGE MASTER error: ", $dbhS->errstr;

   $sth->execute
        or die "CHANGE MASTER on replica error: ", $sth->errstr;
   $sth->finish;
   print "\tReplica has been updated. You may now start the replica.\n";
 }

 # end reset-replica.pl
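
 Assuming that the script has been saved as 'reset-replica.pl' and is executable, it can be invoked with the eight arguments expected by 'CollectCommandPromptInfo'. The host names, ports, and account credentials shown here are hypothetical placeholders, and 'BLANK' is used for an empty password, as required by the script:

 shell> perl reset-replica.pl source_host 3306 root BLANK \
            replica_host 3306 root BLANK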

 File: manual.info.tmp, Node: mysql-cluster-replication-pitr, Prev: mysql-cluster-replication-auto-sync, Up: mysql-cluster-replication-backups

21.7.9.2 Point-In-Time Recovery Using NDB Cluster Replication

Point-in-time recovery--that is, recovery of data changes made since a given point in time--is performed after restoring a full backup that returns the server to its state when the backup was made. Performing point-in-time recovery of NDB Cluster tables with NDB Cluster and NDB Cluster Replication can be accomplished using a native note 'NDB': mysql-cluster. data backup (taken by issuing 'START BACKUP' in the note 'ndb_mgm': mysql-cluster-programs-ndb-mgm. client) and restoring the 'ndb_binlog_index' table (from a dump made using *note 'mysqldump': mysqldump.).

To perform point-in-time recovery of NDB Cluster, it is necessary to follow the steps shown here:

  1. Back up all 'NDB' databases in the cluster, using the note 'START BACKUP': mysql-cluster-backup-using-management-client. command in the note 'ndb_mgm': mysql-cluster-programs-ndb-mgm. client (see *note mysql-cluster-backup::).

  2. At some later point, prior to restoring the cluster, make a backup of the 'mysql.ndb_binlog_index' table. It is probably simplest to use *note 'mysqldump': mysqldump. for this task. Also back up the binary log files at this time.

    This backup should be updated regularly--perhaps even hourly--depending on your needs.

  3. (Catastrophic failure or error occurs.)

  4. Locate the last known good backup.

  5. Clear the data node file systems (using note 'ndbd': mysql-cluster-programs-ndbd. '--initial' or note 'ndbmtd': mysql-cluster-programs-ndbmtd. '--initial').

    Note:

    NDB Cluster Disk Data tablespace and log files are not removed by '--initial'. You must delete these manually.

  6. Use note 'DROP TABLE': drop-table. or note 'TRUNCATE TABLE': truncate-table. with the 'mysql.ndb_binlog_index' table.

  7. Execute note 'ndb_restore': mysql-cluster-programs-ndb-restore, restoring all data. You must include the '--restore-epoch' option when you run note 'ndb_restore': mysql-cluster-programs-ndb-restore, so that the 'ndb_apply_status' table is populated correctly. (See *note mysql-cluster-programs-ndb-restore::, for more information.)

  8. Restore the 'ndb_binlog_index' table from the output of *note 'mysqldump': mysqldump. and restore the binary log files from backup, if necessary.

  9. Find the epoch applied most recently--that is, the maximum 'epoch' column value in the 'ndb_apply_status' table--and save it in the user variable '@LATEST_EPOCH':

      SELECT @LATEST_EPOCH:=MAX(epoch)
          FROM mysql.ndb_apply_status;
  10. Find the latest binary log file ('@FIRST_FILE') and position ('Position' column value) within this file that correspond to '@LATEST_EPOCH' in the 'ndb_binlog_index' table:

      SELECT Position, @FIRST_FILE:=File
          FROM mysql.ndb_binlog_index
          WHERE epoch > @LATEST_EPOCH ORDER BY epoch ASC LIMIT 1;
  11. Using note 'mysqlbinlog': mysqlbinlog, replay the binary log events from the given file and position up to the point of the failure. (See note mysqlbinlog::.)
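
    A minimal sketch of such an invocation is shown here; the binary log file name, starting position, and stop time are hypothetical values used only for illustration, and the output is piped to a *note 'mysql': mysql. client for replay:

      shell> mysqlbinlog --start-position=89587 \
                 --stop-datetime="2020-05-01 09:59:59" \
                 replication-source-bin.00001 | mysql -u root -p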

See also *note point-in-time-recovery::, for more information about the binary log, replication, and incremental recovery.

 File: manual.info.tmp, Node: mysql-cluster-replication-multi-source, Next: mysql-cluster-replication-conflict-resolution, Prev: mysql-cluster-replication-backups, Up: mysql-cluster-replication

21.7.10 NDB Cluster Replication: Bidirectional and Circular Replication

It is possible to use NDB Cluster for bidirectional replication between two clusters, as well as for circular replication between any number of clusters.

Circular replication example

In the next few paragraphs we consider the example of a replication setup involving three NDB Clusters numbered 1, 2, and 3, in which Cluster 1 acts as the replication source for Cluster 2, Cluster 2 acts as the source for Cluster 3, and Cluster 3 acts as the source for Cluster 1. Each cluster has two SQL nodes, with SQL nodes A and B belonging to Cluster 1, SQL nodes C and D belonging to Cluster 2, and SQL nodes E and F belonging to Cluster 3.

Circular replication using these clusters is supported as long as the following conditions are met:

This type of circular replication setup is shown in the following diagram:

FIGURE GOES HERE: NDB Cluster Circular Replication with All Sources As Replicas

In this scenario, SQL node A in Cluster 1 replicates to SQL node C in Cluster 2; SQL node C replicates to SQL node E in Cluster 3; SQL node E replicates to SQL node A. In other words, the replication line (indicated by the curved arrows in the diagram) directly connects all SQL nodes used as replication sources and replicas.

It is also possible to set up circular replication in such a way that not all source SQL nodes are also replicas, as shown here:

FIGURE GOES HERE: NDB Cluster Circular Replication Where Not All Sources Are Replicas

In this case, different SQL nodes in each cluster are used as replication sources and replicas. You must not start any of the SQL nodes with the 'log_slave_updates' system variable enabled. This type of circular replication scheme for NDB Cluster, in which the line of replication (again indicated by the curved arrows in the diagram) is discontinuous, should be possible, but it should be noted that it has not yet been thoroughly tested and must therefore still be considered experimental.

Using NDB-native backup and restore to initialize a replica cluster

When setting up circular replication, it is possible to initialize the replica cluster by using the management client note 'START BACKUP': mysql-cluster-backup-using-management-client. command on one NDB Cluster to create a backup and then applying this backup on another NDB Cluster using note 'ndb_restore': mysql-cluster-programs-ndb-restore. This does not automatically create binary logs on the second NDB Cluster's SQL node acting as the replica; in order to cause the binary logs to be created, you must issue a note 'SHOW TABLES': show-tables. statement on that SQL node; this should be done prior to running note 'START SLAVE': start-slave. This is a known issue.

Multi-source failover example

In this section, we discuss failover in a multi-source NDB Cluster replication setup with three NDB Clusters having server IDs 1, 2, and 3. In this scenario, Cluster 1 replicates to Clusters 2 and 3; Cluster 2 also replicates to Cluster 3. This relationship is shown here:

FIGURE GOES HERE: NDB Cluster Multi-Master Replication With 3 Sources

In other words, data replicates from Cluster 1 to Cluster 3 through 2 different routes: directly, and by way of Cluster 2.

Not all MySQL servers taking part in multi-source replication must act as both source and replica, and a given NDB Cluster might use different SQL nodes for different replication channels. Such a case is shown here:

FIGURE GOES HERE: NDB Cluster Multi-Source Replication, With MySQL Servers

MySQL servers acting as replicas must be run with the 'log_slave_updates' system variable enabled. Which *note 'mysqld': mysqld. processes require this option is also shown in the preceding diagram.

Note:

Using the 'log_slave_updates' system variable has no effect on servers not being run as replicas.
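
For example, this requirement can be met by including the following lines in the '[mysqld]' section of the configuration file used by each such replica SQL node (a sketch only; the file location and any other options depend on your installation):

 [mysqld]
 log_slave_updates=1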

The need for failover arises when one of the replicating clusters goes down. In this example, we consider the case where Cluster 1 is lost to service, and so Cluster 3 loses 2 sources of updates from Cluster 1. Because replication between NDB Clusters is asynchronous, there is no guarantee that Cluster 3's updates originating directly from Cluster 1 are more recent than those received through Cluster 2. You can handle this by ensuring that Cluster 3 catches up to Cluster 2 with regard to updates from Cluster 1. In terms of MySQL servers, this means that you need to replicate any outstanding updates from MySQL server C to server F.

On server C, perform the following queries:

 mysqlC> SELECT @latest:=MAX(epoch)
      ->     FROM mysql.ndb_apply_status
      ->     WHERE server_id=1;

 mysqlC> SELECT
      ->     @file:=SUBSTRING_INDEX(File, '/', -1),
      ->     @pos:=Position
      ->     FROM mysql.ndb_binlog_index
      ->     WHERE orig_epoch >= @latest
      ->     AND orig_server_id = 1
      ->     ORDER BY epoch ASC LIMIT 1;

Note:

You can improve the performance of this query, and thus likely speed up failover times significantly, by adding the appropriate index to the 'ndb_binlog_index' table. See *note mysql-cluster-replication-schema::, for more information.

Copy over the values for @FILE and @POS manually from server C to server F (or have your application perform the equivalent). Then, on server F, execute the following *note 'CHANGE MASTER TO': change-master-to. statement:

 mysqlF> CHANGE MASTER TO
      ->     MASTER_HOST = 'serverC',
      ->     MASTER_LOG_FILE='@file',
      ->     MASTER_LOG_POS=@pos;

Once this has been done, you can issue a *note 'START SLAVE': start-slave. statement on MySQL server F; this causes any missing updates originating from server B to be replicated to server F.

The note 'CHANGE MASTER TO': change-master-to. statement also supports an 'IGNORE_SERVER_IDS' option which takes a comma-separated list of server IDs and causes events originating from the corresponding servers to be ignored. For more information, see note change-master-to::, and note show-slave-status::. For information about how this option interacts with the 'ndb_log_apply_status' variable, see note mysql-cluster-replication-failover::.
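
For example, to have the replica ignore events originating from servers having server IDs 1 and 3 (hypothetical values used only for illustration), you could execute a statement such as this one on the replica while its replication threads are stopped:

 mysqlF> CHANGE MASTER TO IGNORE_SERVER_IDS = (1,3);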

 File: manual.info.tmp, Node: mysql-cluster-replication-conflict-resolution, Prev: mysql-cluster-replication-multi-source, Up: mysql-cluster-replication

21.7.11 NDB Cluster Replication Conflict Resolution

When using a replication setup involving multiple sources (including circular replication), it is possible that different sources may try to update the same row on the replica with different data. Conflict resolution in NDB Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given source should be applied on the replica.

Some types of conflict resolution supported by NDB Cluster ('NDB$OLD()', 'NDB$MAX()', 'NDB$MAX_DELETE_WIN()') implement this user-defined column as a 'timestamp' column (although its type cannot be *note 'TIMESTAMP': datetime., as explained later in this section). These types of conflict resolution are always applied on a row-by-row basis rather than a transactional basis. The epoch-based conflict resolution functions 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' compare the order in which epochs are replicated (and thus these functions are transactional). Different methods can be used to compare resolution column values on the replica when conflicts occur, as explained later in this section; the method used can be set to act on a single table, database, or server, or on a set of one or more tables using pattern matching. See *note ndb-replication-wildcards::, for information about using pattern matches in the 'db', 'table_name', and 'server_id' columns of the 'mysql.ndb_replication' table.

You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.

Requirements

Preparations for conflict resolution must be made on both the source and the replica. These tasks are described in the following list:

When using the functions 'NDB$OLD()', 'NDB$MAX()', and 'NDB$MAX_DELETE_WIN()' for timestamp-based conflict resolution, we often refer to the column used for determining updates as a 'timestamp' column. However, the data type of this column is never note 'TIMESTAMP': datetime.; instead, its data type should be note 'INT': integer-types. (note 'INTEGER': integer-types.) or note 'BIGINT': integer-types. The 'timestamp' column should also be 'UNSIGNED' and 'NOT NULL'.
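
For example, a table intended for use with one of these functions might define its resolution column as shown in this sketch, where the table and column names are hypothetical:

 CREATE TABLE test.t3 (
     id INT UNSIGNED NOT NULL PRIMARY KEY,
     data VARCHAR(30),
     ts BIGINT UNSIGNED NOT NULL   # resolution column; not of type TIMESTAMP
 ) ENGINE=NDB;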

The 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' functions discussed later in this section work by comparing the relative order of replication epochs applied on a primary and secondary NDB Cluster, and do not make use of timestamps.

Source Column Control

We can see update operations in terms of 'before' and 'after' images--that is, the states of the table before and after the update is applied. Normally, when updating a table with a primary key, the 'before' image is not of great interest; however, when we need to determine on a per-update basis whether or not to use the updated values on a replica, we need to make sure that both images are written to the source's binary log. This is done with the '--ndb-log-update-as-write' option for *note 'mysqld': mysqld, as described later in this section.

Important:

Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart note 'mysqld': mysqld, or start a new note 'mysqld': mysqld. instance with different logging options.

Conflict Resolution Control

Conflict resolution is usually enabled on the server where conflicts can occur. Like logging method selection, it is enabled by entries in the 'mysql.ndb_replication' table.

'NBT_UPDATED_ONLY_MINIMAL' and 'NBT_UPDATED_FULL_MINIMAL' can be used with 'NDB$EPOCH()', 'NDB$EPOCH2()', and 'NDB$EPOCH_TRANS()', because these do not require before values of columns which are not primary keys. Conflict resolution algorithms requiring the old values, such as 'NDB$MAX()' and 'NDB$OLD()', do not work correctly with these 'binlog_type' values.

Conflict Resolution Functions

This section provides detailed information about the functions which can be used for conflict detection and resolution with NDB Replication. These functions are listed here in alphabetical order:

NDB$OLD()

If the value of COLUMN_NAME is the same on both the source and the replica, then the update is applied; otherwise, the update is not applied on the replica and an exception is written to the log. This is illustrated by the following pseudocode:

 if (SOURCE_OLD_COLUMN_VALUE == REPLICA_CURRENT_COLUMN_VALUE)
   apply_update();
 else
   log_exception();

This function can be used for 'same value wins' conflict resolution. This type of conflict resolution ensures that updates are not applied on the replica from the wrong source.

Important:

The column value from the source's 'before' image is used by this function.

NDB$MAX()

If the 'timestamp' column value for a given row coming from the source is higher than that on the replica, it is applied; otherwise it is not applied on the replica. This is illustrated by the following pseudocode:

 if (SOURCE_NEW_COLUMN_VALUE > REPLICA_CURRENT_COLUMN_VALUE)
   apply_update();

This function can be used for 'greatest timestamp wins' conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.

Important:

The column value from the source's 'after' image is used by this function.

NDB$MAX_DELETE_WIN()

This is a variation on 'NDB$MAX()'. Due to the fact that no timestamp is available for a delete operation, a delete using 'NDB$MAX()' is in fact processed as 'NDB$OLD', but for some use cases, this is not optimal. For 'NDB$MAX_DELETE_WIN()', if the 'timestamp' column value for a given row adding or updating an existing row coming from the source is higher than that on the replica, it is applied. However, delete operations are treated as always having the higher value. This is illustrated by the following pseudocode:

 if ( (SOURCE_NEW_COLUMN_VALUE > REPLICA_CURRENT_COLUMN_VALUE)
         ||
       OPERATION.TYPE == "delete")
   apply_update();

This function can be used for 'greatest timestamp, delete wins' conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.

Note:

As with 'NDB$MAX()', the column value from the source's 'after' image is the value used by this function.

NDB$EPOCH()

The 'NDB$EPOCH()' function tracks the order in which replicated epochs are applied on a replica cluster relative to changes originating on the replica. This relative ordering is used to determine whether changes originating on the replica are concurrent with any changes that originate locally, and are therefore potentially in conflict.

Most of what follows in the description of 'NDB$EPOCH()' also applies to 'NDB$EPOCH_TRANS()'. Any exceptions are noted in the text.

'NDB$EPOCH()' is asymmetric, operating on one NDB Cluster in a bidirectional replication configuration (sometimes referred to as 'active-active' replication). We refer here to the cluster on which it operates as the primary, and the other as the secondary. The replica on the primary is responsible for detecting and handling conflicts, while the replica on the secondary is not involved in any conflict detection or handling.

When the replica on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary NDB Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary NDB Cluster always wins any conflicts with the secondary--that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This 'primary always wins' rule has the following implications:

'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' do not require any user schema modifications, or application changes to provide conflict detection. However, careful thought must be given to the schema used, and the access patterns used, to verify that the complete system behaves within specified limits.

Each of the 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' functions can take an optional parameter; this is the number of bits to use to represent the lower 32 bits of the epoch, and should be set to no less than the value calculated as shown here:

 CEIL( LOG2( TimeBetweenGlobalCheckpoints / TimeBetweenEpochs ), 1)

For the default values of these configuration parameters (2000 and 100 milliseconds, respectively), this gives a value of 5 bits, so the default value (6) should be sufficient, unless other values are used for 'TimeBetweenGlobalCheckpoints', 'TimeBetweenEpochs', or both. A value that is too small can result in false positives, while one that is too large could lead to excessive wasted space in the database.
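
As a quick check (a sketch only, which leaves out the lower bound of 1 expressed by the formula's second argument), you can evaluate the logarithm term for your own parameter values in the *note 'mysql': mysql. client:

 mysql> SELECT CEIL(LOG2(2000/100)) AS bits;
 +------+
 | bits |
 +------+
 |    5 |
 +------+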

Both 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' insert entries for conflicting rows into the relevant exceptions tables, provided that these tables have been defined according to the same exceptions table schema rules as described elsewhere in this section (see *note mysql-cluster-replication-ndb-old::). You must create any exceptions table before creating the data table with which it is to be used.

As with the other conflict detection functions discussed in this section, 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' are activated by including relevant entries in the 'mysql.ndb_replication' table (see *note ndb-replication-ndb-replication::). The roles of the primary and secondary NDB Clusters in this scenario are fully determined by 'mysql.ndb_replication' table entries.

Because the conflict detection algorithms employed by 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' are asymmetric, you must use different values for the 'server_id' entries of the primary and secondary replicas.

A conflict between 'DELETE' operations alone is not sufficient to trigger a conflict using 'NDB$EPOCH()' or 'NDB$EPOCH_TRANS()', and the relative placement within epochs does not matter.

Limitations on NDB$EPOCH()

The following limitations currently apply when using 'NDB$EPOCH()' to perform conflict detection:

NDB$EPOCH_TRANS()

'NDB$EPOCH_TRANS()' extends the 'NDB$EPOCH()' function. Conflicts are detected and handled in the same way using the 'primary wins all' rule (see *note mysql-cluster-replication-ndb-epoch::) but with the extra condition that any other rows updated in the same transaction in which the conflict occurred are also regarded as being in conflict. In other words, where 'NDB$EPOCH()' realigns individual conflicting rows on the secondary, 'NDB$EPOCH_TRANS()' realigns conflicting transactions.

In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.

'NDB$EPOCH_TRANS()' is subject to the same conditions and limitations as 'NDB$EPOCH()', and in addition requires that Version 2 binary log row events are used ('log_bin_use_v1_row_events' equal to 0), which adds a storage overhead of 2 bytes per event in the binary log. In addition, all transaction IDs must be recorded in the secondary's binary log, using '--ndb-log-transaction-id' set to 'ON'. This adds a variable amount of overhead (up to 13 bytes per row).

See *note mysql-cluster-replication-ndb-epoch::.

NDB$EPOCH2()

The 'NDB$EPOCH2()' function is similar to 'NDB$EPOCH()', except that 'NDB$EPOCH2()' provides for delete-delete handling with a bidirectional replication topology. In this scenario, primary and secondary roles are assigned to the two sources by setting the 'ndb_slave_conflict_role' system variable to the appropriate value on each source (usually one each of 'PRIMARY', 'SECONDARY'). When this is done, modifications made by the secondary are reflected by the primary back to the secondary, which then conditionally applies them.

NDB$EPOCH2_TRANS()

'NDB$EPOCH2_TRANS()' extends the 'NDB$EPOCH2()' function. Conflicts are detected and handled in the same way, and primary and secondary roles are assigned to the replicating clusters, but with the extra condition that any other rows updated in the same transaction in which the conflict occurred are also regarded as being in conflict. That is, 'NDB$EPOCH2()' realigns individual conflicting rows on the secondary, while 'NDB$EPOCH2_TRANS()' realigns conflicting transactions.

'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()' use metadata that is specified per row, per last modified epoch, to determine on the primary whether an incoming replicated row change from the secondary is concurrent with a locally committed change; concurrent changes are regarded as conflicting, with subsequent exceptions table updates and realignment of the secondary. A problem arises when a row is deleted on the primary: there is then no longer any last-modified epoch available to determine whether any replicated operations conflict, which means that conflicting delete operations are not detected. This can result in divergence, an example being a delete on one cluster which is concurrent with a delete and insert on the other; this is why delete operations can be routed to only one cluster when using 'NDB$EPOCH()' and 'NDB$EPOCH_TRANS()'.

'NDB$EPOCH2()' bypasses the issue just described--storing information about deleted rows on the PRIMARY--by ignoring any delete-delete conflict, and by avoiding any potential resultant divergence as well. This is accomplished by reflecting any operation successfully applied on and replicated from the secondary back to the secondary. On its return to the secondary, it can be used to reapply an operation on the secondary which was deleted by an operation originating from the primary.

When using 'NDB$EPOCH2()', you should keep in mind that the secondary applies the delete from the primary, removing the new row until it is restored by a reflected operation. In theory, the subsequent insert or update on the secondary conflicts with the delete from the primary, but in this case, we choose to ignore this and allow the secondary to 'win', in the interest of preventing divergence between the clusters. In other words, after a delete, the primary does not detect conflicts, and instead adopts the secondary's following changes immediately. Because of this, the secondary's state can revisit multiple previous committed states as it progresses to a final (stable) state, and some of these may be visible.

You should also be aware that reflecting all operations from the secondary back to the primary increases the size of the primary's binary log, as well as demands on bandwidth, CPU usage, and disk I/O.

Application of reflected operations on the secondary depends on the state of the target row on the secondary. Whether or not reflected changes are applied on the secondary can be tracked by checking the 'Ndb_conflict_reflected_op_prepare_count' and 'Ndb_conflict_reflected_op_discard_count' status variables. The number of changes applied is simply the difference between these two values (note that 'Ndb_conflict_reflected_op_prepare_count' is always greater than or equal to 'Ndb_conflict_reflected_op_discard_count').

Events are applied if and only if both of the following conditions are true:

If both of these conditions are not met, the reflected operation is discarded by the secondary.

Conflict Resolution Exceptions Table

To use the 'NDB$OLD()' conflict resolution function, it is also necessary to create an exceptions table corresponding to each *note 'NDB': mysql-cluster. table for which this type of conflict resolution is to be employed. This is also true when using 'NDB$EPOCH()' or 'NDB$EPOCH_TRANS()'. The name of this table is that of the table for which conflict resolution is to be applied, with the string '$EX' appended. (For example, if the name of the original table is 'mytable', the name of the corresponding exceptions table should be 'mytable$EX'.) The syntax for creating the exceptions table is as shown here:

 CREATE TABLE ORIGINAL_TABLE$EX  (
     [NDB$]server_id INT UNSIGNED,
     [NDB$]source_server_id INT UNSIGNED,
     [NDB$]source_epoch BIGINT UNSIGNED,
     [NDB$]count INT UNSIGNED,

     [NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
       'REFRESH_ROW', 'READ_ROW') NOT NULL,]
     [NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
       'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,]
     [NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,]

     ORIGINAL_TABLE_PK_COLUMNS,

     [ORIG_TABLE_COLUMN|ORIG_TABLE_COLUMN$OLD|ORIG_TABLE_COLUMN$NEW,]

     [ADDITIONAL_COLUMNS,]

     PRIMARY KEY([NDB$]server_id, [NDB$]source_server_id, [NDB$]source_epoch, [NDB$]count)
 ) ENGINE=NDB;

The first four columns are required. The names of the first four columns and the columns matching the original table's primary key columns are not critical; however, we suggest for reasons of clarity and consistency, that you use the names shown here for the 'server_id', 'source_server_id', 'source_epoch', and 'count' columns, and that you use the same names as in the original table for the columns matching those in the original table's primary key.

If the exceptions table uses one or more of the optional columns 'NDB$OP_TYPE', 'NDB$CFT_CAUSE', or 'NDB$ORIG_TRANSID' discussed later in this section, then each of the required columns must also be named using the prefix 'NDB$'. If desired, you can use the 'NDB$' prefix to name the required columns even if you do not define any optional columns, but in this case, all four of the required columns must be named using the prefix.

Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table. The data types for the columns duplicating the primary key columns of the original table should be the same as (or larger than) those of the original columns. A subset of the primary key columns may be used.

The exceptions table must use the *note 'NDB': mysql-cluster. storage engine. (An example that uses 'NDB$OLD()' with an exceptions table is shown later in this section.)

Additional columns may optionally be defined following the copied primary key columns, but not before any of them; any such extra columns cannot be 'NOT NULL'. NDB Cluster supports three additional, predefined optional columns 'NDB$OP_TYPE', 'NDB$CFT_CAUSE', and 'NDB$ORIG_TRANSID', which are described in the next few paragraphs.

'NDB$OP_TYPE': This column can be used to obtain the type of operation causing the conflict. If you use this column, define it as shown here:

 NDB$OP_TYPE ENUM('WRITE_ROW', 'UPDATE_ROW', 'DELETE_ROW',
     'REFRESH_ROW', 'READ_ROW') NOT NULL

The 'WRITE_ROW', 'UPDATE_ROW', and 'DELETE_ROW' operation types represent user-initiated operations. 'REFRESH_ROW' operations are operations generated by conflict resolution in compensating transactions sent back to the originating cluster from the cluster that detected the conflict. 'READ_ROW' operations are user-initiated read tracking operations defined with exclusive row locks.

'NDB$CFT_CAUSE': You can define an optional column 'NDB$CFT_CAUSE' which provides the cause of the registered conflict. This column, if used, is defined as shown here:

 NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
     'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL

'ROW_DOES_NOT_EXIST' can be reported as the cause for 'UPDATE_ROW' and 'WRITE_ROW' operations; 'ROW_ALREADY_EXISTS' can be reported for 'WRITE_ROW' events. 'DATA_IN_CONFLICT' is reported when a row-based conflict function detects a conflict; 'TRANS_IN_CONFLICT' is reported when a transactional conflict function rejects all of the operations belonging to a complete transaction.

'NDB$ORIG_TRANSID': The 'NDB$ORIG_TRANSID' column, if used, contains the ID of the originating transaction. This column should be defined as follows:

 NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL

'NDB$ORIG_TRANSID' is a 64-bit value generated by 'NDB'. This value can be used to correlate multiple exceptions table entries belonging to the same conflicting transaction from the same or different exceptions tables.
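
For example, the following query (a sketch, using a hypothetical exceptions table named 'mytable$EX') groups exception rows by originating transaction, which can help show how many rows each conflicting transaction affected:

 SELECT NDB$ORIG_TRANSID, COUNT(*)
     FROM mytable$EX
     GROUP BY NDB$ORIG_TRANSID;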

Additional reference columns which are not part of the original table's primary key can be named 'COLNAME$OLD' or 'COLNAME$NEW'. 'COLNAME$OLD' references old values in update and delete operations--that is, operations containing 'DELETE_ROW' events. 'COLNAME$NEW' can be used to reference new values in insert and update operations--in other words, operations using 'WRITE_ROW' events, 'UPDATE_ROW' events, or both types of events. Where a conflicting operation does not supply a value for a given reference column that is not a primary key, the exceptions table row contains either 'NULL', or a defined default value for that column.

Important:

The 'mysql.ndb_replication' table is read when a data table is set up for replication, so the row corresponding to a table to be replicated must be inserted into 'mysql.ndb_replication' before the table to be replicated is created.

Conflict Detection Status Variables

Several status variables can be used to monitor conflict detection. You can see how many rows have been found in conflict by 'NDB$EPOCH()' since this replica was last restarted from the current value of the 'Ndb_conflict_fn_epoch' system status variable.

'Ndb_conflict_fn_epoch_trans' provides the number of rows that have been found directly in conflict by 'NDB$EPOCH_TRANS()'. 'Ndb_conflict_fn_epoch2' and 'Ndb_conflict_fn_epoch2_trans' show the number of rows found in conflict by 'NDB$EPOCH2()' and 'NDB$EPOCH2_TRANS()', respectively. The number of rows actually realigned, including those affected due to their membership in or dependency on the same transactions as other conflicting rows, is given by 'Ndb_conflict_trans_row_reject_count'.

Another server status variable 'Ndb_conflict_fn_max' provides a count of the number of times that a row was not applied on the current SQL node due to 'greatest timestamp wins' conflict resolution since the last time that *note 'mysqld': mysqld. was started. 'Ndb_conflict_fn_max_del_win' provides a count of the number of times that conflict resolution based on the outcome of 'NDB$MAX_DELETE_WIN()' has been applied.

The number of times that a row was not applied as the result of 'same timestamp wins' conflict resolution on a given *note 'mysqld': mysqld. since the last time it was restarted is given by the global status variable 'Ndb_conflict_fn_old'. In addition to incrementing 'Ndb_conflict_fn_old', the primary key of the row that was not used is inserted into an exceptions table, as explained elsewhere in this section.
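
You can view the current values of these and the other conflict detection counters on a given SQL node using a statement such as the one shown here:

 mysql> SHOW GLOBAL STATUS LIKE 'Ndb_conflict%';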

See also *note mysql-cluster-status-variables::.

Examples

The following examples assume that you already have a working NDB Cluster replication setup, as described in note mysql-cluster-replication-preparation::, and note mysql-cluster-replication-starting::.

NDB$MAX() example

Suppose you wish to enable 'greatest timestamp wins' conflict resolution on table 'test.t1', using column 'mycol' as the 'timestamp'. This can be done using the following steps:

  1. Make sure that you have started the source *note 'mysqld': mysqld. with '--ndb-log-update-as-write=OFF'.

  2. On the source, perform this *note 'INSERT': insert. statement:

      INSERT INTO mysql.ndb_replication
          VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');

    Note:

    If the 'ndb_replication' table does not already exist, you must create it. See *note ndb-replication-ndb-replication::.

    Inserting a 0 into the 'server_id' column indicates that all SQL nodes accessing this table should use conflict resolution. If you want to use conflict resolution on a specific *note 'mysqld': mysqld. only, use the actual server ID.

    Inserting 'NULL' into the 'binlog_type' column has the same effect as inserting 0 ('NBT_DEFAULT'); the server default is used.

  3. Create the 'test.t1' table:

      CREATE TABLE test.t1 (
          COLUMNS
          mycol INT UNSIGNED,
          COLUMNS
      ) ENGINE=NDB;

    Now, when updates are performed on this table, conflict resolution is applied, and the version of the row having the greatest value for 'mycol' is written to the replica.

Note:

Other 'binlog_type' options such as 'NBT_UPDATED_ONLY_USE_UPDATE' ('6') should be used to control logging on the source using the 'ndb_replication' table rather than by using command-line options.

NDB$OLD() example

Suppose an *note 'NDB': mysql-cluster. table such as the one defined here is being replicated, and you wish to enable 'same timestamp wins' conflict resolution for updates to this table:

 CREATE TABLE test.t2  (
     a INT UNSIGNED NOT NULL,
     b CHAR(25) NOT NULL,
     COLUMNS,
     mycol INT UNSIGNED NOT NULL,
     COLUMNS,
     PRIMARY KEY pk (a, b)
 )   ENGINE=NDB;

The following steps are required, in the order shown:

  1. First--and prior to creating 'test.t2'--you must insert a row into the 'mysql.ndb_replication' table, as shown here:

      INSERT INTO mysql.ndb_replication
          VALUES ('test', 't2', 0, 0, 'NDB$OLD(mycol)');

    Possible values for the 'binlog_type' column are shown earlier in this section; in this case, we use '0' to specify that the server default logging behavior be used. The value ''NDB$OLD(mycol)'' should be inserted into the 'conflict_fn' column.

  2. Create an appropriate exceptions table for 'test.t2'. The table creation statement shown here includes all required columns; any additional columns must be declared following these columns, and before the definition of the table's primary key.

      CREATE TABLE test.t2$EX  (
          server_id INT UNSIGNED,
          source_server_id INT UNSIGNED,
          source_epoch BIGINT UNSIGNED,
          count INT UNSIGNED,
          a INT UNSIGNED NOT NULL,
          b CHAR(25) NOT NULL,
    
          [ADDITIONAL_COLUMNS,]
    
          PRIMARY KEY(server_id, source_server_id, source_epoch, count)
      )   ENGINE=NDB;

    We can include additional columns for information about the type, cause, and originating transaction ID for a given conflict. We are also not required to supply matching columns for all primary key columns in the original table. This means you can create the exceptions table like this:

      CREATE TABLE test.t2$EX  (
          NDB$server_id INT UNSIGNED,
          NDB$source_server_id INT UNSIGNED,
          NDB$source_epoch BIGINT UNSIGNED,
          NDB$count INT UNSIGNED,
          a INT UNSIGNED NOT NULL,
    
          NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
            'REFRESH_ROW', 'READ_ROW') NOT NULL,
          NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
            'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
          NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
    
          [ADDITIONAL_COLUMNS,]
    
          PRIMARY KEY(NDB$server_id, NDB$source_server_id, NDB$source_epoch, NDB$count)
      )   ENGINE=NDB;

    Note:

    The 'NDB$' prefix is required for the four required columns since we included at least one of the columns 'NDB$OP_TYPE', 'NDB$CFT_CAUSE', or 'NDB$ORIG_TRANSID' in the table definition.

  3. Create the table 'test.t2' as shown previously.

These steps must be followed for every table for which you wish to perform conflict resolution using 'NDB$OLD()'. For each such table, there must be a corresponding row in 'mysql.ndb_replication', and there must be an exceptions table in the same database as the table being replicated.

Read conflict detection and resolution

NDB Cluster also supports tracking of read operations, which makes it possible in circular replication setups to manage conflicts between reads of a given row in one cluster and updates or deletes of the same row in another. This example uses 'employee' and 'department' tables to model a scenario in which an employee is moved from one department to another on the source cluster (which we refer to hereafter as cluster A) while the replica cluster (hereafter B) updates the employee count of the employee's former department in an interleaved transaction.

The data tables have been created using the following SQL statements:

 # Employee table
 CREATE TABLE employee (
     id INT PRIMARY KEY,
     name VARCHAR(2000),
     dept INT NOT NULL
 )   ENGINE=NDB;

 # Department table
 CREATE TABLE department (
     id INT PRIMARY KEY,
     name VARCHAR(2000),
     members INT
 )   ENGINE=NDB;

The contents of the two tables include the rows shown in the (partial) output of the following *note 'SELECT': select. statements:

 mysql> SELECT id, name, dept FROM employee;
 +------+--------+------+
 | id   | name   | dept |
 +------+--------+------+
 ...
 | 998  | Mike   | 3    |
 | 999  | Joe    | 3    |
 | 1000 | Mary   | 3    |
 ...
 +------+--------+------+

 mysql> SELECT id, name, members FROM department;
 +-----+-------------+---------+
 | id  | name        | members |
 +-----+-------------+---------+
 ...
 | 3   | Old project | 24      |
 ...
 +-----+-------------+---------+

We assume that we are already using an exceptions table that includes the four required columns (and these are used for this table's primary key), the optional columns for operation type and cause, and the original table's primary key column, created using the SQL statement shown here:

 CREATE TABLE employee$EX  (
     NDB$server_id INT UNSIGNED,
     NDB$source_server_id INT UNSIGNED,
     NDB$source_epoch BIGINT UNSIGNED,
     NDB$count INT UNSIGNED,

     NDB$OP_TYPE ENUM( 'WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
                       'REFRESH_ROW','READ_ROW') NOT NULL,
     NDB$CFT_CAUSE ENUM( 'ROW_DOES_NOT_EXIST',
                         'ROW_ALREADY_EXISTS',
                         'DATA_IN_CONFLICT',
                         'TRANS_IN_CONFLICT') NOT NULL,

     id INT NOT NULL,

     PRIMARY KEY(NDB$server_id, NDB$source_server_id, NDB$source_epoch, NDB$count)
 )   ENGINE=NDB;

Suppose that two simultaneous transactions occur on the two clusters. On cluster A, we create a new department, then move employee number 999 into that department, using the following SQL statements:

 BEGIN;
   INSERT INTO department VALUES (4, "New project", 1);
   UPDATE employee SET dept = 4 WHERE id = 999;
 COMMIT;

At the same time, on cluster B, another transaction reads from 'employee', as shown here:

 BEGIN;
   SELECT name FROM employee WHERE id = 999;
   UPDATE department SET members = members - 1 WHERE id = 3;
 COMMIT;

The conflicting transactions are not normally detected by the conflict resolution mechanism, since the conflict is between a read ('SELECT') and an update operation. You can circumvent this issue by executing *note 'SET': set-variable. 'ndb_log_exclusive_reads' '= 1' on the replica cluster. Acquiring exclusive read locks in this way causes any rows read on the source to be flagged as needing conflict resolution on the replica cluster. If we enable exclusive reads in this way prior to the logging of these transactions, the read on cluster B is tracked and sent to cluster A for resolution; the conflict on the employee row is subsequently detected and the transaction on cluster B is aborted.

The conflict is registered in the exceptions table (on cluster A) as a 'READ_ROW' operation (see *note conflict-resolution-exceptions-table::, for a description of operation types), as shown here:

 mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
 +-------+-------------+-------------------+
 | id    | NDB$OP_TYPE | NDB$CFT_CAUSE     |
 +-------+-------------+-------------------+
 ...
 | 999   | READ_ROW    | TRANS_IN_CONFLICT |
 +-------+-------------+-------------------+

Any existing rows found in the read operation are flagged. This means that multiple rows resulting from the same conflict may be logged in the exceptions table, as shown by examining the effects of a conflict between an update on cluster A and a read of multiple rows on cluster B from the same table in simultaneous transactions. The transaction executed on cluster A is shown here:

 BEGIN;
   INSERT INTO department VALUES (4, "New project", 0);
   UPDATE employee SET dept = 4 WHERE dept = 3;
   SELECT COUNT(*) INTO @count FROM employee WHERE dept = 4;
   UPDATE department SET members = @count WHERE id = 4;
 COMMIT;

Concurrently a transaction containing the statements shown here runs on cluster B:

 SET ndb_log_exclusive_reads = 1;  # Must be set if not already enabled
 ...
 BEGIN;
   SELECT COUNT(*) INTO @count FROM employee WHERE dept = 3 FOR UPDATE;
   UPDATE department SET members = @count WHERE id = 3;
 COMMIT;

In this case, all three rows matching the 'WHERE' condition in the second transaction's 'SELECT' are read, and are thus flagged in the exceptions table, as shown here:

 mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
 +-------+-------------+-------------------+
 | id    | NDB$OP_TYPE | NDB$CFT_CAUSE     |
 +-------+-------------+-------------------+
 ...
 | 998   | READ_ROW    | TRANS_IN_CONFLICT |
 | 999   | READ_ROW    | TRANS_IN_CONFLICT |
 | 1000  | READ_ROW    | TRANS_IN_CONFLICT |
 ...
 +-------+-------------+-------------------+

Read tracking is performed on the basis of existing rows only. A read based on a given condition tracks conflicts only for rows that are found, and not for any rows that are inserted in an interleaved transaction. This is similar to how exclusive row locking is performed in a single instance of NDB Cluster.

 File: manual.info.tmp, Node: mysql-cluster-news, Prev: mysql-cluster-replication, Up: mysql-cluster