Friday, March 30, 2012

Merge Join (Full outer join) never finishes.

I have a merge join (full outer join) task in a data flow. The left input comes from a flat file source and then a script transformation which does some custom grouping. The right input comes from an oledb source. The script transformation output is asynchronous (SynchronousInputID=0). The left input has many more rows (200,000+) than the right input (2,500). I run it from VS 2005 by right-click/execute on the data flow task. The merge join remains yellow and the task never finishes. I do see a row count above the flat file destination that reaches a certain number and seems to get stuck there. When I test with a smaller file on the left it works OK. Any suggestions?

Try this - replace your destination with a RowCount component - it does nothing except count rows into a variable. if the package succeeds, then you have narrowed the problem down to an issue with your destination.

If there is an issue with your destination, you should start to look into disk and memory usage.

hth - I'll look forward to hearing more.

Donald

|||The way I worked around the problem was to separate it into 2 dataflows, the first one just taking the left branch of the input to the merge join and landing it to a raw destination. Then in the second data flow (the merge join) I simply used the raw file as the left input and it worked fine. Does that eliminate the possibility of it being an issue with the destination? If not then I will recreate the original scenario and replace the destination with a RowCount component as you suggested. Thanks.|||

That's a tricky one. I wonder if the combined memory usage of your script component and the merge join was stressing the box.

Donald

No comments:

Post a Comment