Showing posts with label flow. Show all posts
Showing posts with label flow. Show all posts

Friday, March 30, 2012

Merge Join transformation hangs

I have serached through this forum, and I could not find the solution.

My SSIS data flow is a simple one. Extracting two tables from Oracle using OLE DB Source , join them together using merge join and loading into a table in SQL server by SQL Server Destination.

I haven't gone through this simple procedure because Merge join always hangs there. Down to further investigation, I found one input (randomly one of two inputs) is always stuck. Sometimes the input is empty, sometimes is about half way.

Is there workround to see what is happening there and to fix this problem?

TIA

As far as I can see, Mergejoin is not the culprit, it is by design for MergeJoin to wait until it gets buffers from both inputs before it outputs. Looks to me your input(s) did not get through at your Oracle server. I suggest you to use an external tool (rowsetviewer?) to perform the same queries you put on the two OLEDBSrc inputs, to see whether it works there.

thanks

wenyang

|||

Also make sure the 2 inputs of the Merge Join are actually sorted.

Rafael Salas

|||

Hi, Wenyang & Rafael,

Thanks very much for your input.

The 2 inputs of the Merge Join are sorted by "order by" and the Issorted = true. I found the killer is the "data viewer" of the path, which shows the input data. Once I removed the "data viewer", the problem's gone. I cannot explain why.

BTW, what's the external tool (rowsetviewer)? Where can I get it?

Merge Join question

Hi, All,

In the data flow of Intergration Service, my understanding is that the "Merge Join" only does the "join", there is no way to input the "where" clause. If I need to input the the "Where" clause or selection criteria, how should I do it in Integration Service?

TIA,

Maybe put in a conditional split transformation before you go into the merge... Filter your rows there.|||

john zhang wrote:

Hi, All,

In the data flow of Intergration Service, my understanding is that the "Merge Join" only does the "join", there is no way to input the "where" clause. If I need to input the the "Where" clause or selection criteria, how should I do it in Integration Service?

TIA,

Depending in the source you are using; I would use a SQL statement right in the surce components to 'pull' only required rows; otherwise I would use conditional split transforms within the data flow.

Rafael Salas

|||Thanks Phil.|||

Thanks Rafael for the input. Now I have "conditional split" worked for me even it's inconvenient

|||

john zhang wrote:

Thanks Rafael for the input. Now I have "conditional split" worked for me even it's inconvenient

Why is it inconvenient?

-Jamie

|||

With other popular ETL tool, in one transformation, you can do both "join" and "where". But in SQL Server 2005 Integration service, you have to use two transformation to finish the task.

|||

john zhang wrote:

With other popular ETL tool, in one transformation, you can do both "join" and "where". But in SQL Server 2005 Integration service, you have to use two transformation to finish the task.

Why is that a bad thing? Is having 2 boxes on the design surface really such a bad thing?

SSIS components are deliberately "atomic". By that I mean they do one single operation and thus the aim is to build data-flows which are greater than the sum of their parts. Just because the "join" and "where" is in the same box then it doesn't mean its doing any less work. In fact it might actually be doing more because you are using a CPU to do two operations whereas in SSIS this may not be the case.

Please don't think I'm trying to say that you're wrong - this is just my opinion and in a way I'm playing devil's advocate. I think its a useful discussion to have.

-Jamie

|||

here's a discussion, the merge join would be even better if it could join more than 2 tables at a time.

I have many packages that require a left join to 5 or 6 other tables and the only way to do this is to use a merge join for the first two tables, then get the data from the 3rd table, add another merge join transform, then join the output of the first merge join to the data from the third table to get a 3 table join and so on...

It works ok though..

|||

bobbins wrote:

here's a discussion, the merge join would be even better if it could join more than 2 tables at a time.

I have many packages that require a left join to 5 or 6 other tables and the only way to do this is to use a merge join for the first two tables, then get the data from the 3rd table, add another merge join transform, then join the output of the first merge join to the data from the third table to get a 3 table join and so on...

It works ok though..

Nice idea. Submit it at http://connect.microsoft.com

-Jamie

sql

Merge Join (Full outer join) never finishes.

I have a merge join (full outer join) task in a data flow. The left input comes from a flat file source and then a script transformation which does some custom grouping. The right input comes from an oledb source. The script transformation output is asynchronous (SynchronousInputID=0). The left input has many more rows (200,000+) than the right input (2,500). I run it from VS 2005 by right-click/execute on the data flow task. The merge join remains yellow and the task never finishes. I do see a row count above the flat file destination that reaches a certain number and seems to get stuck there. When I test with a smaller file on the left it works OK. Any suggestions?

Try this - replace your destination with a RowCount component - it does nothing except count rows into a variable. if the package succeeds, then you have narrowed the problem down to an issue with your destination.

If there is an issue with your destination, you should start to look into disk and memory usage.

hth - I'll look forward to hearing more.

Donald

|||The way I worked around the problem was to separate it into 2 dataflows, the first one just taking the left branch of the input to the merge join and landing it to a raw destination. Then in the second data flow (the merge join) I simply used the raw file as the left input and it worked fine. Does that eliminate the possibility of it being an issue with the destination? If not then I will recreate the original scenario and replace the destination with a RowCount component as you suggested. Thanks.|||

That's a tricky one. I wonder if the combined memory usage of your script component and the merge join was stressing the box.

Donald