Enterprise Replication (ER)
Enterprise Replication (ER) technology, which is both flexible and powerful, allows businesses to replicate data from branch offices to corporate office, replicate data from corporate office to branch offices, replicate data closer to customers.
ER works by you first defining the servers among which you would like data to be
replicated. This creates a network topology — of root, leaf, and non-root, non-leaf
nodes — that will transfer the data. Each ER node
may be a
single server or a cluster of servers, which this article further discusses below. All
the interconnected nodes together are called an ER domain
. The
domain does not define what data will be transferred, just the paths or routes along
which the data will flow.
Next, you decide what data you would like to be replicated. This is defined by an SQL
query involving a server, database, table, and columns. The query acts as a filter whose
output decides the data to replicate and proves to be a very flexible tool. If you'd
like an entire table to be replicated, your query would be SELECT * FROM
...
. If you'd like just the first three columns to be replicated, your
query would now be SELECT column1, column2, column3 FROM ...
. Or, if
you'd like only certain rows to be replicated, you just use a WHERE
clause in your query. ER can replicate data with very broad or very fine filters. To
help with replication ER requires that the table have a primary key defined.
After the query is written, determine what nodes should participate in replicating the
data. Let's say you want nodes A, B, and C to all have the same data in their Employee
table. So A, B, and C are your participants
. ER can be
configured so that if data changes are made on any server, the other participants become
updated too. This is called an update anywhere
scenario.
What happens if you only want updates to flow from A, into both B and C, but back never
to A? ER gives you the flexibility to do these primary-target
scenarios as well. In situations where data changes occur on more than one participant,
it may happen that a row changed on server B conflicts with a change occurring on server
C. ER allows you to define rules for automatically handling such conflicts. These
include comparing timestamps, running a stored procedure, and ignoring conflicts.
Now that you know the query, the participants, and the scenario, take all this
information and use it to create something called a replicate
.
Think of a replicate as a faucet that controls the flow of data from the query. The
faucet can be turned on or off, and you can have as many faucets as you want. Replicates
can be grouped into sets, which make it easier for users to control multiple replicates.
Also, you can use templates to help you quickly create replicates for multiple tables.
All this work is done through the server utility cdr
.
After a replicate has been created and started, how does the data actually get moved? Well, first of all, the replication happens asynchronously. This means there may be a delay between data being committed on one node and it appears on another node. And only committed data, of course, is replicated. ER works by reading the logical logs, testing if a log record needs to be replicated, and finally sending that information to the appropriate participants.