Hi,
In my merge replication scenario I use Push subscriptions. The
distributor is SQL server 2000 Sp3a, and subscribers are MSDE SP3a.
Everything works fine but recently merge agents started to "hang"
reporting no errors at random phases of replication session. They stop
sending or receiving any data between distributor and subscriber
(there is no network traffic). SQL Profiles show no activity on a
subscriber server. Last message generated by agent is "The process is
running and is waiting for a response from one of the backend
connections".
If there is a problem with communication between servers one would
expect returnig an error code and retrying operation. In my case the
merge agent "hangs" and waits for nothing. After 1 or 2 hours it
reports a communication link failure (waits for a random period of
time).
To continue a Merge process I have to Stop and Start the merge agent
manually.
Since my replication sends a lot of data from subscribers
(MSMerge_genhistory has approx 50000 entries when retention period is
set to 2 days.) after uploading data changes to the publisher (the
procedure sp_MScheckexistsgeneration is executed for every one of
50000 generations - am I right about this?) it takes a long time (40
minutes) to complete. If merge agent hangs, the process must be run
from the beginning. In this situation it is almost impossible to
complete the replication session.
What can be the cause of such behavior of merge agents?
Last lines of output (level 3) generated by agent are as follows:
{call sp_MScheckexistsgeneration (?, ?) }
{call sp_MScheckexistsgeneration (?, ?) }
Percent Complete: 0
The process is running and is waiting for a response from one of the
backend connections.
Repl Agent Status: 3
Tomek
Tomek,
it's difficult to say, but in some sense this looks like a connectivity
issue. However, when I've seen such a situation before, the error was
'General network error'. You might like to verify is this is/is not the
case, by using a Network Monitor tool:
http://support.microsoft.com/default...48942&sd=tech.
Alternatively you could open a window in QA on the publisher, with the
window connected to the subscriber SQL Server. In the window do a simple
select command in a loop and return the date. This should tell you if/when
the connectivity had problems.
Can you check in current activity or sp_who2/sp_lock/sp_blocker to identify
if there is any blocking involved. If the problem is not connectivity then
this would be my next test.
I'd also like to know what happens when you restart the agent - does it work
quickly or still take a long time but is successful.
No doubt you have considered this, but if possible can you synchronize more
often to shorten the batchsize?
You might also optimize performance by running
sp_mergemetadataretentioncleanup manually and optimize
he -DownloadGenerationsPerBatch parameter.
HTH,
Paul Ibison
|||Paul,
Thank you for your response. I hope my answers to your questions will
help us to solve the problem.
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message news:<#EzzqyDQEHA.1312@.TK2MSFTNGP12.phx.gbl>...
> it's difficult to say, but in some sense this looks like a connectivity
> issue. However, when I've seen such a situation before, the error was
> 'General network error'. You might like to verify is this is/is not the
> case, by using a Network Monitor tool:
> http://support.microsoft.com/default...48942&sd=tech.
It can be related to connectivity because merge agents connect to
subscribers that not always use reliable network connection. But the
problem is that instead of reporting General network error and
retrying operation agent waits (hangs) doing nothing.
> Alternatively you could open a window in QA on the publisher, with the
> window connected to the subscriber SQL Server. In the window do a simple
> select command in a loop and return the date. This should tell you if/when
> the connectivity had problems.
QA in loop worked fine (no connectivity problems), but in the same
time agent assigned to the same subscriber hanged.
> Can you check in current activity or sp_who2/sp_lock/sp_blocker to identify
> if there is any blocking involved. If the problem is not connectivity then
> this would be my next test.
I've checked this ne too. Blocking is not the reason.
> I'd also like to know what happens when you restart the agent - does it work
> quickly or still take a long time but is successful.
I will describe how my agent works. Please tell me is it correct
behavior.
1) After connecting to subscriber agent uploads all the data changes
made by subscriber to the publisher. This works very fast and i do not
see any problem here (agent has never stopped at this phase).
2) Merge agent executes sp_MScheckexistsgeneration for each of 50000
generations stored in msmerge_genhistory at subscriber side. If there
is some data to download, it is downloaded to subscriber, and then
merge agent checks next generation.
Now let's assume that merge agent "hanged" ot generation 39000. After
restarting the agent it starts the process from the beginning checking
each of 39000 generations that where checked and merged in previus
session.
The merge agent is succesfull only if it is able to go through 50000
generations without loosing connection to subscriber. That is wy it is
so difficult to complete merge session.
IMHO the merge agent should start from generation 39000 making
completing session possible even on not reliable connections. Is it
"by design" behaviour?
> No doubt you have considered this, but if possible can you synchronize more
> often to shorten the batchsize?
It is not possible. subscribers can not synchronize on saturday and
sunday. So 2 days retention perod is minumum.
> You might also optimize performance by running
> sp_mergemetadataretentioncleanup manually and optimize
> he -DownloadGenerationsPerBatch parameter.
I think the problem is not metadata at publisher but metadata at
subscribers. It is not deleted after succesfull replication session (i
think it must work this way).
DownloadGenerationsPerBatch does not help because most of the time
merge agent executes sp_MScheckexistsgeneration without downloading
any data changes. Or maybe I am misunderstanding this parameter?.
Tomek
|||Tomek,
Slow and unreliable connections can result in more retries by the Merge
Agent, none of which you've seen. Your QA test also indicates the same
thing, so we can cross this off the list.
Answers inline...
> IMHO the merge agent should start from generation 39000 making
> completing session possible even on not reliable connections. Is it
> "by design" behaviour?
>
I guess it is not processed this way to try to avoid potential
non-convergence as inserts and child records could be in separate batches
(http://support.microsoft.com/default...b;enus;Q308266)
>
Having MetadataRetentionCleanup 1 should sort this out, but you could run
manually sp_mergemetadataretentioncleanup on the publisher and subscriber to
see if things are improved. It won't remove the 50000 entries, but what I am
hoping is that the search for new generations will be noticeably quicker
after old ones are removed.
> DownloadGenerationsPerBatch does not help because most of the time
> merge agent executes sp_MScheckexistsgeneration without downloading
> any data changes. Or maybe I am misunderstanding this parameter?.
Increasing the -DownloadGenerationsPerBatch Merge Agent parameters (even to
its maximum value of 2000) is a recommendation where a Subscriber has not
merged data with the Publisher for an extended period time, during which the
Publisher or other Subscribers have made numerous changes
(http://www.microsoft.com/technet/pro...n/mergperf.msp
x). So, I'd definitley test altering this parameter's value.
(Powodzenia)
Paul
Monday, March 26, 2012
Merge Agent Hangs (reporting no errors)
Labels:
agent,
database,
errors,
hangs,
merge,
microsoft,
msde,
mysql,
oracle,
push,
replication,
reporting,
scenario,
server,
sp3a,
sql,
subscribers,
subscriptions,
thedistributor
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment