In this data-driven era, replication is often a critical requirement for achieving a modern, agile database management environment. Designing an enterprise-grade dataset is widely seen as the key to meeting this requirement, but building data marts from such datasets always presents certain challenges.
In this article, we’ll discuss what it takes to set up the “central subscriber with multiple publishers” replication model to create an aggregate dataset from multiple sources, and you’ll also see how to scale with the data.
Here’s a sneak peek of the topics:
- Replication model: central subscriber, multiple publishers
- Understanding the state of the data
- Highlights of a custom transaction log reading/replication
- Setup and internals
- And more!
Technology
For our SQL Server transaction log reading agent we will use ApexSQL Log, a 3rd party tool that can read the SQL Server transaction log and convert transactions into “redo” statements that can be executed on our various subscribers.
Replication Model
The “central subscriber–multiple publishers” model is commonly used in situations where data from multiple sites needs to be consolidated at a central location while each local site retains access to its local data. A shipment data warehouse is a typical example: orders are placed at all publisher locations, and the order confirmation details are pushed to the central subscriber, from which the order is shipped to the respective parties.
In short, the multiple publisher databases replicate their data to the same subscription table, which poses a unique problem: we need a model that guarantees a unique identifier for every row in every publisher database. In this model, several publishers replicate data to a single, central subscriber; essentially, this is a process that supports data consolidation at a central database.
In other instances, the publisher databases replicate the data at the table level. In this case, the synchronization is straightforward, as we deal with a higher level of granularity.
For each article to be published in a central subscriber topology, a unique identifier should be defined. We can use a lookup table to generate a unique combination of values across all the publishers. To publish rows from Publisher 1, a unique combination of keys is defined in the article so that replication does not create a duplicate row. Likewise, the same logic is applied across the articles for Publisher 2, Publisher 3, and Publisher 4.
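The role of the composite key can be illustrated with a small sketch (the `consolidate` function is hypothetical and not part of any replication tool): because each publisher stamps its rows with its own Region_ID, identical Order_Id values from different sites never collide at the central subscriber.

```python
def consolidate(publisher_rows):
    """Merge rows from several publishers into one subscriber table.

    Each row is keyed by the composite key (order_id, region_id). Because
    every publisher writes a distinct region_id, the same order_id can
    exist at every site without colliding at the central subscriber.
    """
    subscriber = {}
    for rows in publisher_rows:
        for row in rows:
            key = (row["order_id"], row["region_id"])
            if key in subscriber:
                # A collision here would mean the articles were defined
                # without a unique key combination across publishers.
                raise ValueError(f"duplicate composite key {key}")
            subscriber[key] = row
    return subscriber
```

With a plain Order_Id key, the second publisher's rows would overwrite the first's; the composite key is what makes the consolidation safe.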
Setup
Let us walk through the entire process in detail:
- Initial preparation
- Load the data into the Orders table on all four publisher databases. The insert statements are generated with region-specific details; in this case, Order_Id and Region_ID together form the primary-key constraint.
- Query the publisher and subscriber databases. You will notice that rows are missing at the subscriber database.
- Build ApexSQL CLI commands
- Start ApexSQL Log
- In Select database, type in the Server and Database details and click Next
- In Select data sources, leave the default Online transaction log option. This option allows the application to read online transaction log files.
- Next, prepare the redo script by selecting Undo/Redo script
- Now, in the Filter setup, select the continuous auditing feature. The tracking file is a unique feature: it tracks which transactions have already been processed by maintaining the LSN (Log Sequence Number) value, which safeguards the integrity of the replicated transactions.
- In Tables, select the object to be published on the central subscriber. In this case, the table Orders is selected as the publishing article.
- The configuration is almost done for the publisher database. Let’s save the generated command via the “Batch file” option. Save the command as replication.bat.
- The batch file content is shown below:
"E:\Program Files\ApexSQL\ApexSQL Log\ApexSQLLog.com" /server:HQDBT01 /database:AdventureWorks2016 /trusted /redo:G:\Replication\redo.sql /operations:DMLALL /transactions:COMMIT BEGIN UNKNOWN /continuous:g:\bkp\AdventureWorks2016.axtr /tables:[dbo].[Orders]
- Next, prepare the CLI commands for the other publishers.
- You also have the option to test individual sets of CLI commands, or you can place them all in a batch file and run it through PowerShell
- In this case, the CLI commands are integrated into a single batch file. It is very simple and straightforward: copy and paste the command once for each publisher database, adjusting the switches accordingly.
- As you see, the ‘/database’ switch has four different databases and the ‘/continuous’ switch has four unique tracking files.
- Each redo SQL file is uniquely identified by its name. A file name of the form <Publisher Name>_<Date and timestamp>.sql is prepared and fed to the ‘/redo’ switch
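Two pieces of bookkeeping carry this pipeline: per-publisher tracking files that remember the last processed LSN, and timestamped redo file names. The sketch below illustrates both ideas only; it is not ApexSQL Log's actual implementation, and the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path

def redo_file_name(publisher, when=None):
    """Build a name of the form <publisher>_<yyyyMMdd-HHmmss>.sql,
    matching the timestamp format the PowerShell script passes along."""
    when = when or datetime.now()
    return f"{publisher}_{when.strftime('%Y%m%d-%H%M%S')}.sql"

def replay_new_transactions(transactions, tracking_file):
    """Return redo statements only for transactions newer than the checkpoint.

    `transactions` is a list of (lsn, statement) tuples ordered by LSN.
    The tracking file stores the last LSN already processed, so repeated
    runs never emit redo statements for the same transaction twice.
    """
    path = Path(tracking_file)
    last_lsn = int(path.read_text()) if path.exists() else -1
    replayed = [stmt for lsn, stmt in transactions if lsn > last_lsn]
    if transactions:
        path.write_text(str(transactions[-1][0]))  # checkpoint the newest LSN
    return replayed
```

This is why each publisher needs its own tracking (.axtr) file: the checkpoint is per-source, and mixing them would skip or duplicate transactions.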
Build PowerShell Script Replication.ps1
- Load the SQL Server Module
- Assign the current date and timestamp value to $datetime variable
- Pass the variable $datetime to ApexSQL Log CLI command
- The batch file “replication.bat” is invoked and it generates four SQL replay files
- Execute the generated SQL files in the same cadence using Invoke-Sqlcmd
Import-Module SQLServer
$datetime = (get-date).ToString("yyyyMMdd-HHmmss")
cmd /c "G:\replication\replication.bat" $datetime
Invoke-SQLcmd -inputfile "G:\replication\Publisher1_$datetime.sql" -serverinstance "hqdbt01" -database "AdventureWorks"
Invoke-SQLcmd -inputfile "G:\replication\Publisher2_$datetime.sql" -serverinstance "hqdbt01" -database "AdventureWorks"
Invoke-SQLcmd -inputfile "G:\replication\Publisher3_$datetime.sql" -serverinstance "hqdbt01" -database "AdventureWorks"
Invoke-SQLcmd -inputfile "G:\replication\Publisher4_$datetime.sql" -serverinstance "hqdbt01" -database "AdventureWorks"

Execute PowerShell script Replication.ps1
Test the data
We can see that the aggregated set of rows has been inserted into the subscriber database.
:Connect HQDBT01\SQL2017
SELECT * FROM WideWorldImporters.dbo.Orders
GO
:Connect HQDBT01
SELECT * FROM AdventureWorks2016.dbo.Orders
GO
:Connect HQDBT01
SELECT * FROM AdventureWorks2014.dbo.Orders
GO
:Connect HQDBT01
SELECT * FROM AdventureWorks2012.dbo.Orders
GO
:Connect HQDBT01
SELECT * FROM AdventureWorks.dbo.Orders
Schedule a job
To automate this, schedule the PowerShell script to run at whatever interval you want, e.g. every 5 minutes. Refer to the scheduling section in the article How to set up a DDL and DML SQL Server database transactional replication solution.
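For a quick test outside of a real scheduler, the cadence can be sketched with a minimal loop. This is a hypothetical stand-in only; `runner` represents whatever launches Replication.ps1 (e.g. a subprocess call).

```python
import time

def run_at_interval(runner, interval_seconds=300, max_runs=None):
    """Call `runner` repeatedly, sleeping `interval_seconds` between calls.

    `runner` is any zero-argument callable, such as a lambda that launches
    Replication.ps1. `max_runs` bounds the loop for testing; leave it as
    None to run indefinitely, as a scheduler would.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        runner()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs
```

In production, SQL Server Agent or Windows Task Scheduler is the more robust choice, since it survives reboots and logs failures.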
Wrap Up
In the article we reviewed the “central subscriber with multiple publishers” use-case and then designed and built the system using a 3rd party tool, ApexSQL Log, a SQL Server transaction log reader, and common scripting technologies e.g. batch files and PowerShell.
I reiterate the importance of replication design. Data replication design is a balancing act between the various forms in which data must be managed, and things get complicated quickly if the model is not designed properly. For example, in this model, subscribers should be treated as read-only because changes are not propagated back to the publishers. Inserts, updates, and deletes should therefore be avoided at the subscriber database, as they could lead to non-convergence.
That’s all for now… If you feel like giving it a try, feel free to let me know how it went in the comments below.
Appendix
Batch Script
@echo off
SET "redofile=%1"
"E:\Program Files\ApexSQL\ApexSQL Log\ApexSQLLog.com" /server:HQDBT01 /database:AdventureWorks2016 /trusted /redo:G:\Replication\publisher1_%redofile%.sql /operations:DMLALL /transactions:COMMIT BEGIN UNKNOWN /continuous:g:\bkp\AdventureWorks2016.axtr /tables:[dbo].[Orders]
"E:\Program Files\ApexSQL\ApexSQL Log\ApexSQLLog.com" /server:HQDBT01 /database:AdventureWorks2014 /trusted /redo:G:\Replication\publisher2_%redofile%.sql /operations:DMLALL /transactions:COMMIT BEGIN UNKNOWN /continuous:g:\bkp\AdventureWorks2014.axtr /tables:[dbo].[Orders]
"E:\Program Files\ApexSQL\ApexSQL Log\ApexSQLLog.com" /server:HQDBT01 /database:AdventureWorks2012 /trusted /redo:G:\Replication\publisher3_%redofile%.sql /operations:DMLALL /transactions:COMMIT BEGIN UNKNOWN /continuous:g:\bkp\AdventureWorks2012.axtr /tables:[dbo].[Orders]
"E:\Program Files\ApexSQL\ApexSQL Log\ApexSQLLog.com" /server:HQDBT01\SQL2017 /database:WideWorldImporters /trusted /redo:G:\Replication\publisher4_%redofile%.sql /operations:DMLALL /transactions:COMMIT BEGIN UNKNOWN /continuous:g:\bkp\WideWorldImporters.axtr /tables:[dbo].[Orders]
SQL
:Connect <Publisher1>
USE AdventureWorks2016
GO
CREATE TABLE Orders
(
Order_Id int,
Item_Name varchar(100),
Order_Qty int,
Order_Date DATETIME DEFAULT GETDATE(),
Region_ID int,
CONSTRAINT PK_OrdersRegion PRIMARY KEY (Order_Id, Region_ID)
)
INSERT INTO Orders VALUES (1, 'ApexSQL Recover', 5, GETDATE(), 3)
INSERT INTO Orders VALUES (2, 'ApexSQL Restore', 3, GETDATE(), 3)

:Connect <Publisher2>
USE AdventureWorks2014
GO
CREATE TABLE Orders
(
Order_Id int,
Item_Name varchar(100),
Order_Qty int,
Order_Date DATETIME DEFAULT GETDATE(),
Region_ID int,
CONSTRAINT PK_OrdersRegion PRIMARY KEY (Order_Id, Region_ID)
)
INSERT INTO Orders VALUES (1, 'ApexSQL Defrag', 5, GETDATE(), 4)
INSERT INTO Orders VALUES (2, 'ApexSQL Decrypt', 3, GETDATE(), 4)

:Connect <Publisher3>
USE AdventureWorks2012
GO
CREATE TABLE Orders
(
Order_Id int,
Item_Name varchar(100),
Order_Qty int,
Order_Date DATETIME DEFAULT GETDATE(),
Region_ID int,
CONSTRAINT PK_OrdersRegion PRIMARY KEY (Order_Id, Region_ID)
)
INSERT INTO Orders VALUES (1, 'ApexSQL Defrag', 5, GETDATE(), 2)
INSERT INTO Orders VALUES (2, 'ApexSQL Decrypt', 3, GETDATE(), 2)

:Connect <Publisher4>
USE WideWorldImporters
GO
CREATE TABLE Orders
(
Order_Id int,
Item_Name varchar(100),
Order_Qty int,
Order_Date DATETIME DEFAULT GETDATE(),
Region_ID int,
CONSTRAINT PK_OrdersRegion PRIMARY KEY (Order_Id, Region_ID)
)
INSERT INTO Orders VALUES (1, 'ApexSQL Defrag', 5, GETDATE(), 1)
INSERT INTO Orders VALUES (2, 'ApexSQL Decrypt', 3, GETDATE(), 1)