Reducing the size of the distribution database & improving transactional replication performance: Part 3

Part 1 of this series of articles looked at the intricacies of SQL Server transactional replication, and made the observation that the clean-up mechanism treats publications as a collective.  Part 2 looked at how differing distribution job schedules can cause unnecessary bloating in the distribution database.

The final (and much delayed) part of this series focuses on the choice between replicating stored procedure calls and replicating the data modified by stored procedure calls.

As an example, consider a scenario where you have a stored procedure that periodically deletes old data from a table.  Let's say that it runs daily and deletes roughly one million records per execution.  Here you have a choice in how you replicate the changes made by the procedure:

  1. You allow the deletes to be replicated individually.
  2. You replicate the stored procedure call.

Both of these choices will end up with the same result; however, the impact of each can be significantly different.

Choice one is the default.  By this I mean that you will typically add articles to replicate data from a table in site A to a table in site B.  With this scenario any inserts, updates, and deletes are replicated individually.  Using this method with the example above, you would be writing one million separate entries into the distribution database, all of which would have to be synchronised individually with the replication subscriber over the network.

Choice two is adding an article to a publication for a specific procedure call.  Then, when that procedure is called, instead of writing each delete to the distribution database individually, just one entry is written for synchronisation: the stored procedure call itself.  Rather than each delete being synchronised individually, the same call that was made on the publisher is made on the subscriber, so the same records are deleted there via the subscriber's own execution of the procedure.  Only one item is replicated rather than one million.  This has a performance benefit for the distribution database, as the amount of data it holds and synchronises can be significantly reduced, and network IO also benefits significantly in this scenario.

Whilst the performance benefits of choice two in the above example are obvious, you probably won't want to replicate every single procedure call that you make.  You'll need to find those procedures that perform a large number of DML changes in an individual call and add them to the list of items to replicate.  If you already have an article that replicates all changes to Table A, you can keep it in place without any changes, even if you also want to replicate a procedure that makes DML changes to Table A.  With the article for the stored procedure call in place, the individual changes made by the procedure simply won't be picked up by the article replicating Table A.  It is simple, and if used in the correct scenarios it can be hugely beneficial to your replication environment and system as a whole.

Implementing

To replicate a stored procedure call you'll need to add it as an article using sp_addarticle. You can create a specific publication for procedure calls or add it to an existing transactional publication; it is up to you. An example execution of sp_addarticle is shown below.

EXEC sp_addarticle @publication = N'Your publication name'
                 , @article = N'Your procedure name'
                 , @source_owner = N'schema owner, e.g. dbo'
                 , @source_object = N'Your procedure name'
                 , @destination_table = N'Your procedure name'
                 , @type = N'proc exec'                 --replicate the execution of the procedure
                 , @description = null
                 , @pre_creation_cmd = N'none'
                 , @schema_option = 0x0000000008000001  --create the object (0x01) and any missing schema (0x08000000) at the subscriber
                 , @status = 16                         --use parameterised statements
                 , @vertical_partition = N'false'
                 , @filter = null
                 , @sync_object = null;

Now one thing that is worth mentioning here is what you select for @type. Microsoft recommends using serializable proc exec; however, with this option set your procedure call will only replicate if it executes within an explicit transaction at the serializable isolation level. If your procedure calls don't meet this criterion you'll need to use proc exec, as in the example above.
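
For illustration, here is a minimal sketch of the shape a call has to take for serializable proc exec to replicate it (dbo.YourProcedureName is a placeholder):

--For 'serializable proc exec' the call must run inside an explicit
--transaction at the serializable isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    EXEC dbo.YourProcedureName; --placeholder procedure name
COMMIT TRANSACTION;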

To check whether your stored procedure call is replicating, you can look in the distribution database after the call has executed.  If your replication article is set up correctly for your procedure, you should be able to amend the following query to specify your procedure name and see some entries returned for it. You then just need to wait for it to replicate (or give it a nudge by executing the distribution job), and check that the expected changes have also been made on the subscriber.

--Query to find repl commands for the specified article
SELECT TOP 10 art.article
     , trn.entry_time
     , cmd.[command]
FROM   [distribution].[dbo].[MSrepl_commands] AS cmd
INNER JOIN [distribution].[dbo].[MSarticles] AS art
   ON  cmd.article_id = art.article_id
INNER JOIN [distribution].[dbo].[MSrepl_transactions] AS trn
   ON  cmd.xact_seqno = trn.xact_seqno
   AND cmd.publisher_database_id = trn.publisher_database_id
WHERE  art.article = 'Your procedure name'
ORDER BY trn.entry_time DESC;
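
One thing to be aware of: the command column in MSrepl_commands is held in a binary format, so it won't read as plain text.  If you want to see the pending commands in a readable form, sp_browsereplcmds can decode them; all of its filter parameters are optional, so a minimal call is:

--Decode pending replication commands into readable text
EXEC [distribution].dbo.sp_browsereplcmds;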

For more information, see the Microsoft documentation for sp_addarticle and on publishing stored procedure execution in transactional replication.

Reducing the size of the distribution database & improving transactional replication performance: Part 2

The impact of having data sitting in the distribution database that doesn't need to be there can be significant.  The aim of this series of articles is to show you how to keep the distribution database as small (and efficient) as possible, and to keep the impact of the clean-up job to a minimum.  Each of the steps outlined will help reduce the data that is unnecessarily held in the distribution database, and free up resources for other things.

Part 1 of this series focussed on the intricacies of SQL Server transactional replication, made the observation that the clean-up mechanism treats publications as a collective rather than as individuals, and looked into publication settings.  Part 2 looks at how distribution job schedules can cause unnecessary bloating in the distribution database.

When the distribution clean-up job runs it executes the sp_MSdistribution_cleanup procedure.  This procedure then executes a number of other procedures as part of the clean-up process.  The most important procedure in terms of what will actually be removed is sp_MSmaximum_cleanup_seqno.  This procedure finds the maximum transaction sequence number to be removed from the distribution DB, and uses the following process to find this:

  1. Find the most recent synchronisation for each publication in the MSdistribution_history table.*
  2. Find the oldest of all of the synchronisations from step 1.
  3. Find the most recent transaction from MSrepl_transactions that is older than the synchronisation from step 2.  This is the latest transaction to remove from the distribution DB.

* The MSdistribution_history table holds details of all synchronisations within the transaction retention period (default of 72 hours).  If any distribution agent has not synchronised during this time then the oldest value in MSsubscriptions is used for that distribution agent.
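
If you want to see this for yourself, a minimal sketch of step 1 is below.  It lists the most recent history entry for each distribution agent (assuming the default distribution database name):

--Most recent synchronisation recorded for each distribution agent
SELECT h.agent_id
     , MAX(h.[time]) AS last_sync_time
FROM   [distribution].[dbo].[MSdistribution_history] AS h
GROUP BY h.agent_id
ORDER BY last_sync_time;

The oldest value returned is the one that will hold back the clean-up in step 2.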

Differing distribution schedules

If, for example, you have two distribution jobs, one with a schedule of once a minute and one with a schedule of once a day, you will see something similar to the following:

  1. Find the most recent synchronisation for each publication in the MSdistribution_history table.
    • You will have one synchronisation up to one minute old, and one synchronisation up to one day old.
  2. Find the oldest of the synchronisations from step 1.
    • This will most likely be the synchronisation up to one day old.
  3. Find the most recent transaction that is older than the synchronisation from step 2.  This is the latest transaction to remove from the distribution DB.

In this specific example the MSrepl_transactions and MSrepl_commands tables will only be cleaned up once per day.  For the rest of the day these tables will be filling up and the clean-up job will be running, reading more data and taking more resources each time it runs, but deleting nothing.

The resolution to this is simple; just set all of the distribution job schedules to be the same or similar.
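
As a sketch of how you might do that from T-SQL, the msdb scheduling procedures can update an existing job schedule.  The schedule name below is a placeholder, so look up your distribution jobs' actual schedule names first (or simply edit the schedules through SQL Server Agent in Management Studio):

--Set a distribution job schedule to run once a minute
EXEC msdb.dbo.sp_update_schedule
       @name = N'Your distribution job schedule name' --placeholder
     , @freq_subday_type = 4       --units of minutes
     , @freq_subday_interval = 1;  --every 1 minute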

Part 3 of the series will focus on replicating intensive stored procedure executions rather than the outcome of the execution.

Reducing the size of the distribution database & improving transactional replication performance: Part 1

In this series of blog posts I will be looking at issues regarding the size and performance of the distribution database as part of a transactional replication environment.

Symptoms:

  • Your distribution database is growing quite large
  • The distribution cleanup job is taking a long time to run, yet not clearing as much data as you would expect
  • Disk reads are high throughout the day regardless of system throughput

When using transactional replication, all of the data to synchronise with the subscriber is held in the MSrepl_commands and MSrepl_transactions tables.  Once transactions have been committed at the subscriber you probably want them to be cleaned up immediately by the Distribution clean up job; however, this may not be happening.  Depending upon the number of transactions being replicated, this can cause the distribution database to grow significantly and impact system performance.
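
To gauge whether these tables are bloating, a quick size check is a reasonable first step, for example:

--Check the size of the replication tables
USE [distribution];
EXEC sp_spaceused N'MSrepl_commands';
EXEC sp_spaceused N'MSrepl_transactions';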

The distribution clean up job runs every ten minutes (on its default schedule) and each time it runs it looks for transactions that can be cleared.  You may well find that the clean up job is running for long periods, and that it is hitting the disk hard whilst achieving very little in terms of actual clean up.

Take a look at the following scenario:

  1. DML changes are made in the published database
  2. Changes get written to the replication tables
  3. Changes get replicated to the subscriber
  4. Cleanup looks to find entries to remove from the replication tables
  5. Cleanup only removes a small subset of entries seemingly ignoring any recently replicated entries
  6. More DML changes get written (back to step 1)

As you can see, with busy systems the replication tables can quickly fill up, and if you are not using the correct publication settings you may well be holding information you don't need, to the detriment of replication and your system as a whole.

This series of articles discusses a number of techniques for reducing the size of the distribution DB and improving Transactional replication performance.

Part 1 – Only keep the data that hasn’t been synchronised

The first issue to check for is data being replicated correctly but not being cleared out of the distribution database afterwards.  If this is the case, it is being caused by the immediate_sync publication setting.  This setting causes all transactions to be held for the full retention period of the distributor rather than just holding the transactions that haven't been synchronised.  This means that each time the distribution clean up job runs it will only be deleting entries older than the retention period.  The default transaction retention period is 72 hours.
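
To see whether this applies to you, you can check the settings directly.  A minimal sketch, run in the published database (where syspublications lives):

--Check the immediate_sync and allow_anonymous settings for each publication
SELECT name
     , immediate_sync
     , allow_anonymous
FROM   dbo.syspublications;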

(Screenshot: Distributor Properties)

The excellent article by Paul Ibison (http://www.replicationanswers.com/TransactionalOptimisation.asp) covers how to identify and resolve this issue by setting the immediate_sync and allow_anonymous values; however, it only mentions altering a specific publication.

If you have multiple transactional publications for the same publisher you will need to ensure that you update each publication.  Otherwise the problem will not be resolved.
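
As a sketch, the settings can be changed for one publication at a time with sp_changepublication ('Your publication name' is a placeholder, and the statements run in the published database).  Note that allow_anonymous has to be disabled before immediate_sync can be:

--allow_anonymous must be disabled before immediate_sync can be
EXEC sp_changepublication @publication = N'Your publication name'
                        , @property = N'allow_anonymous'
                        , @value = 'false';
EXEC sp_changepublication @publication = N'Your publication name'
                        , @property = N'immediate_sync'
                        , @value = 'false';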

The sp_MSdelete_publisherdb_trans procedure is called as part of the Distribution clean up process, and it checks whether there are any publications with immediate_sync set.  If so, it will only delete commands and transactions older than the retention period, regardless of whether there are any publications that aren't set to immediate_sync.

Once you have changed the immediate_sync and allow_anonymous settings for every transactional publication, you should execute the distribution clean up job.  This may well take some time to run the first time; however, when it completes, checking the job log should show a large number of deletes.  From this point on the job should be far quicker to execute and will actually clean up the MSrepl_transactions and MSrepl_commands tables.
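
If you prefer to kick the job off from T-SQL, sp_start_job will do it.  The job name below follows the default naming convention for a distribution database named distribution, so check yours under SQL Server Agent:

--Start the distribution clean up job manually
EXEC msdb.dbo.sp_start_job @job_name = N'Distribution clean up: distribution';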

It is important to note that the distribution clean-up mechanism treats the publications as a collective rather than as individuals, so when it performs deletes it does so based on the collective settings rather than individual ones; however, the way that it does this is not obvious.

This will form the starting point for part 2 of the series.