Friday, March 30, 2012

Merge Join's poor performance

Hello All,

I'm experiencing performance problems with the Merge Join transformation.
Every time I build a nice package using this component, I end up deleting it and using a SQL statement in the OLE DB source to do the join instead, since it takes forever to run and crashes my computer in the process.
It makes me feel that I'm not using the capabilities SSIS offers over DTS.
Of course, for a few thousand records it works fine, but in a production environment with hundreds of thousands of rows it seems futile.

Has anyone had more luck with it?

|||

Replying to Liran R:

If I were you, I would use the OLE DB Source component to do the join; there is absolutely nothing wrong with doing that. If you have a super-performant relational database engine at your disposal, why not use it?
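As a rough illustration of pushing the join down to the database engine, here is a minimal Python sketch. It assumes SQLite (via the standard `sqlite3` module) as a stand-in for SQL Server, and the `customers`/`orders` tables are hypothetical sample data. In SSIS, the same SELECT ... JOIN text would go into the OLE DB Source's SQL command, so the data flow only ever sees rows that are already joined.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
# The tables and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.0);
    """
)

# The join runs inside the database engine; the caller (in SSIS, the
# data flow behind an OLE DB Source) only receives already-joined rows.
JOIN_QUERY = """
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```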

Donald Farmer touches on this in his OVAL webcast. If you only watch one SSIS webcast in your life, it should be this one.

Donald Farmer's Technet webcast
(http://blogs.conchango.com/jamiethomson/archive/2006/06/14/4076.aspx)

-Jamie

|||

Thanks, I'll take a look. Does the Merge Join take place in the cache?

|||

Can you elaborate as to what you mean by "the cache"?

-Jamie

|||

Sure. When I use a Lookup component to perform the join, the lookup table is cached by default. When I want to join to a large table (several million rows), I understood that it's best practice to use the Merge Join component, but as far as I can see this component also holds the records in the machine's memory, so I don't understand the benefit...

|||

There's a different sort of memory usage going on here.

Yes, the LOOKUP has an area of memory that we call the cache and is used for storing a lookup set.
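A minimal Python sketch of the idea behind a full lookup cache, assuming a plain dict stands in for the component's cache (this is not how SSIS stores it internally): the reference set is loaded once, does not change while rows stream past, and each incoming row probes it by key. `build_lookup_cache` and `lookup` are hypothetical names, and unmatched rows simply get `None` here rather than being redirected or failing as they might in a real package.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def build_lookup_cache(reference_rows: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Load the entire reference set into memory once, up front."""
    return {key: value for key, value in reference_rows}


def lookup(
    rows: Iterable[Tuple[int, str]], cache: Dict[int, str]
) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Probe the static cache for each incoming row; None marks 'no match'."""
    for key, payload in rows:
        # The cache is only read here, never modified, so it stays the
        # same size for the whole run no matter how many rows pass through.
        yield key, payload, cache.get(key)


cache = build_lookup_cache([(1, "Alice"), (2, "Bob")])
result = list(lookup([(1, "order-10"), (3, "order-11")], cache))
print(result)
```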

MERGE JOIN stores data in memory (as do all asynchronous components), but it's a different kind of memory and we don't refer to it as a cache. It is just the component's working area, more commonly termed a buffer. Also, this working area changes as MERGE JOIN does its work, whereas the LOOKUP cache is static.
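To make the "working area" idea concrete, here is a minimal sort-merge inner join sketch in Python. Like SSIS's Merge Join, it expects both inputs to be sorted on the join key already; unlike SSIS, it is just a plain generator, and `merge_join` is a hypothetical name. At any moment it holds only the current run of matching right-hand rows, and that buffer is replaced as the join advances to the next key.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]
_END = object()  # sentinel marking an exhausted input


def merge_join(
    left: Iterable[Row],
    right: Iterable[Row],
    key: Callable[[Row], Any] = lambda row: row[0],
) -> Iterator[Tuple[Row, Row]]:
    """Inner sort-merge join; both inputs must already be sorted by key."""
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    l_key, l_rows = next(left_groups, (_END, None))
    r_key, r_rows = next(right_groups, (_END, None))
    while l_key is not _END and r_key is not _END:
        if l_key < r_key:
            # Left key has no partner on the right: skip it.
            l_key, l_rows = next(left_groups, (_END, None))
        elif l_key > r_key:
            # Right key has no partner on the left: skip it.
            r_key, r_rows = next(right_groups, (_END, None))
        else:
            # Only this run of equal right-hand keys is buffered; it is
            # discarded once the join moves on to the next key.
            right_buffer = list(r_rows)
            for l_row in l_rows:
                for r_row in right_buffer:
                    yield l_row, r_row
            l_key, l_rows = next(left_groups, (_END, None))
            r_key, r_rows = next(right_groups, (_END, None))


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = list(merge_join(left, right))
for pair in joined:
    print(pair)
```

Because the sketch relies on sorted input, an unsorted source would first need sorting (in SSIS, a Sort transformation or an ORDER BY in the source query), which adds its own cost on top of the join itself.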

It is also worth saying that MERGE JOIN can spool data to disk if it is running out of memory. LOOKUP cannot do that with its cache.
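The spooling idea can be sketched generically as a buffer with a memory cap that writes overflow rows to a temporary file. This is a hypothetical illustration (`SpillingBuffer` and `max_in_memory` are invented names), not a description of how the SSIS buffer manager actually spools; it only shows how a working buffer can fall back to disk, whereas a dict-style cache like the lookup sketch has no such fallback.

```python
import io
import pickle
import tempfile
from typing import Any, Iterator, List


class SpillingBuffer:
    """Hypothetical row buffer that keeps at most max_in_memory rows in RAM
    and spills any further rows to a temporary file on disk."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        self._memory: List[Any] = []
        self._spill = tempfile.TemporaryFile()  # binary mode "w+b" by default
        self.spilled_count = 0

    def append(self, row: Any) -> None:
        if len(self._memory) < self.max_in_memory:
            self._memory.append(row)
        else:
            # Always write at the end of the file, even after a read pass.
            self._spill.seek(0, io.SEEK_END)
            pickle.dump(row, self._spill)
            self.spilled_count += 1

    def __iter__(self) -> Iterator[Any]:
        # Rows come back in insertion order: in-memory rows first, then
        # the spilled ones read back from disk.
        yield from self._memory
        self._spill.seek(0)
        for _ in range(self.spilled_count):
            yield pickle.load(self._spill)

    def close(self) -> None:
        self._spill.close()


buf = SpillingBuffer(max_in_memory=2)
for i in range(5):
    buf.append((i, f"row{i}"))
print(buf.spilled_count)
print(list(buf))
```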

-Jamie

|||

Hi SSIS friend,

I remember asking a similar question a couple of months back. I think I had 15+ Merge Join components in a test package and the performance was awful. Jamie's advice then and now is pretty sound.

During my short period experimenting with SSIS, I came to realise that to create efficient packages I had to utilise the power of both the SQL Server and SSIS engines. Each one performs certain tasks better than the other. It takes time and I'm still learning, but the more you play around with it, the easier it gets to choose which engine to use.

|||

Thanks guys.
