Wednesday, March 28, 2012

Merge Filtering

I'm trying to get this filtering configuration stuff worked out so that I
can really understand it and it's been a little difficult finding a
complete example to go by. Let me illustrate my situation this way:
Tier1 <--> Tier2 <--> Tier3
So, we have data being created at Tier1 that is being replicated to
Tier2. Tier2 will republish some of this and will also publish some
information created/owned at Tier2 down to Tier3.
There are multiple Tier2 location communicating with Tier1, and multiple
Tier3 locations for each Tier2. Merge replication was selected because
updates and inserts are being done at all tiers and there is the
possibility of frequent disconnects at any point in the chain, possibly
for long periods. We're currently looking at PULL subscriptions that are
anonymous (there could be hundreds of Tier2s).
So the filtering question comes down to how to control the partitioning
of data to each Tier2; some Tier1 data is for everyone, some isn't. The
database contains a configuration table that, along with other things,
contains an identifier for each Tier location (maybe we can use
this...maybe not?). I don't know in advance what the various machine
names are and I don't have any control over the domains they are running
in. Quite likely they will be mixed domains so we have to assume this is
an untrusted situation that will be using SQL accounts for
authentication.
Can someone out there please help point me in the right direction on
this? I see references to filtering by HOST_NAME() but since I can't
anticipate the host names (and there could be hundreds) I would really
rather have a solution that is more dynamic, that can just "figure out"
how the data is split based on a data value compared against a Tier1 list
of all possible values (e.g. a list of all customers at Tier1 is compared
against a customer ID that is passed from the subscriber).
THANKS!
Ken
This can be very very tricky. Normally when you do this sort of republishing
defining ownership becomes critical in setting up the correct topology.
Tier 1 owns everything. Each Tier 2 will own a portion of what Tier 1 owns.
Each Tier 3 will own a portion of Tier 2, and a smaller portion of Tier 1.
You have to come up with a filter which will allow you to filter all the
data for all the tiers.
So it looks like you have a location ID. I would create my filters so they
look like this on Tier 1 pulling to Tier 2.
I think you will need separate publications with multiple filtering criteria
possible using a UDF or a subquery.
Consider a tier system like this
Tier 1 = locationID's of 1, 2, 3, 4, 5, 6
Tier 2 =locationID's of 10,11, 12, 13, 14, 15, 16, 17, 18, 19, 20,21, 22,
23, 24, 25, etc
Tier 1 locationID 1 owns Tier 2 locationsIDs of
10,11,12,13,14,15,16,17,18,19
Tier 1 locationID 2 owns Tier 2 locationsIDs of
20,21,22,23,24,25,26,27,28,29
The create a location table which will look like this:
11, 'tier2Server1'
12, 'tier2Server1'
13, 'tier2Server1'
14, 'tier2Server1'
21, 'tier2Server2'
22, 'tier2Server2'
When your filter will look like this
SELECT <published_columns> FROM [dbo].[MergeTable] WHERE
locationID in (select locationID from LocationTable where
HostName=Host_Name())
Then override the value of -HostName in each of your Pull Subscribers with
tier2ServerX, replace X with the number representing your server in teh look
up table.
you would do the same thing for the Tier3 servers.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Ken Ross" <kross@.horizon-boss.com> wrote in message
news:Xns95DC898AF82C8krosshorizonbosscom@.207.46.24 8.16...
> I'm trying to get this filtering configuration stuff worked out so that I
> can really understand it and it's been a little difficult finding a
> complete example to go by. Let me illustrate my situation this way:
> Tier1 <--> Tier2 <--> Tier3
> So, we have data being created at Tier1 that is being replicated to
> Tier2. Tier2 will republish some of this and will also publish some
> information created/owned at Tier2 down to Tier3.
> There are multiple Tier2 location communicating with Tier1, and multiple
> Tier3 locations for each Tier2. Merge replication was selected because
> updates and inserts are being done at all tiers and there is the
> possibility of frequent disconnects at any point in the chain, possibly
> for long periods. We're currently looking at PULL subscriptions that are
> anonymous (there could be hundreds of Tier2s).
> So the filtering question comes down to how to control the partitioning
> of data to each Tier2; some Tier1 data is for everyone, some isn't. The
> database contains a configuration table that, along with other things,
> contains an identifier for each Tier location (maybe we can use
> this...maybe not?). I don't know in advance what the various machine
> names are and I don't have any control over the domains they are running
> in. Quite likely they will be mixed domains so we have to assume this is
> an untrusted situation that will be using SQL accounts for
> authentication.
> Can someone out there please help point me in the right direction on
> this? I see references to filtering by HOST_NAME() but since I can't
> anticipate the host names (and there could be hundreds) I would really
> rather have a solution that is more dynamic, that can just "figure out"
> how the data is split based on a data value compared against a Tier1 list
> of all possible values (e.g. a list of all customers at Tier1 is compared
> against a customer ID that is passed from the subscriber).
> THANKS!
> Ken
|||"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in
news:eemC3QO#EHA.2032@.tk2msftngp13.phx.gbl:

> This can be very very tricky. Normally when you do this sort of
> republishing defining ownership becomes critical in setting up the
> correct topology.
> The create a location table which will look like this:
> 11, 'tier2Server1'
> 12, 'tier2Server1'
> 13, 'tier2Server1'
> 14, 'tier2Server1'
> 21, 'tier2Server2'
> 22, 'tier2Server2'
> When your filter will look like this
> SELECT <published_columns> FROM [dbo].[MergeTable] WHERE
> locationID in (select locationID from LocationTable where
> HostName=Host_Name())
>
> Then override the value of -HostName in each of your Pull Subscribers
> with tier2ServerX, replace X with the number representing your server
> in teh look up table.
> you would do the same thing for the Tier3 servers.
>
Hilary,
Thanks for the quick reply. I was just reading about overriding HOST_NAME
() on the subscriber and was wondering if that was going to be part of
the solution or if there was some magic with Dynamic Filtering that could
save me. One thing I'm really trying to keep in mind with this is the
ease of setup on the clients; I'm likely going to script out the
subscriber setup and my current thoughts are to tokenize the script so
that at installation time I can query certain information from the user
and local database, modify the script accordingly [search/replace the
tokens], and then run the script through osql to setup the subscription.
For what it's worth, we're also working with Yukon (Ascend) and will be
launching all this on SQL2005 when it ships. Nothing like the bleading
edge, eh?
Thanks again - I'm going to give this a try.
Ken
|||I think key to this are dynamic snapshots and dynamic filtering.
I don't know a lot about your clients, but you might want to look at WSM
(Windows Synchronization Manager) as it has rich features for pulling
(nothing you can't accomplish with a plain vanilla pull subscription).
I am glad to hear you are working with the Ascend team. Yukon has some great
features. Hopefully you will get some high level guidance on how to
accomplish this.
On a side note, you might want to look at incorporating UDF's into your
filters as these may offers better performance.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Ken Ross" <kross@.horizon-boss.com> wrote in message
news:Xns95DC9E6C2B68Bkrosshorizonbosscom@.207.46.24 8.16...
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in
> news:eemC3QO#EHA.2032@.tk2msftngp13.phx.gbl:
>
> Hilary,
> Thanks for the quick reply. I was just reading about overriding HOST_NAME
> () on the subscriber and was wondering if that was going to be part of
> the solution or if there was some magic with Dynamic Filtering that could
> save me. One thing I'm really trying to keep in mind with this is the
> ease of setup on the clients; I'm likely going to script out the
> subscriber setup and my current thoughts are to tokenize the script so
> that at installation time I can query certain information from the user
> and local database, modify the script accordingly [search/replace the
> tokens], and then run the script through osql to setup the subscription.
> For what it's worth, we're also working with Yukon (Ascend) and will be
> launching all this on SQL2005 when it ships. Nothing like the bleading
> edge, eh?
> Thanks again - I'm going to give this a try.
> Ken

No comments:

Post a Comment