Friday, March 30, 2012

Merge Join: number of output rows unchanged when the amount of input changes

Dear all,

I created a package that seems to work fine with a small amount of data. However, when I run the package with more data (as in production), the Merge Join output is limited to 9,963 rows, no matter how I change the number of input rows.

The situation is as follows.

The package has two OLE DB Sources, in which SQL statements have been defined to retrieve the data.

The flow of source 1 is: retrieving source data -> trimming (non-key) columns -> sorting on the key columns.

The flow of source 2 is: retrieving source data -> deriving 2 new columns -> aggregating the data to the level of source 1 -> sorting on the key columns.

Then both flows are combined with a Merge Join, and further steps are performed.
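To make the described flows concrete, here is a minimal T-SQL sketch of what the two branches and the Merge Join compute together. All table and column names (dbo.OrderHeader, dbo.OrderDetail, Key1, Key2, and so on) are hypothetical stand-ins, since the original queries were not posted:

    -- Hypothetical sketch of the two flows and the Merge Join.
    -- Flow 1: source 1, non-key columns trimmed, sorted on the keys.
    WITH Flow1 AS (
        SELECT Key1, Key2,
               LTRIM(RTRIM(Description)) AS Description   -- trimming step
        FROM   dbo.OrderHeader
    ),
    -- Flow 2: source 2, two derived columns, aggregated to the level of flow 1.
    Flow2 AS (
        SELECT Key1, Key2,
               SUM(Quantity * UnitPrice) AS LineTotal,    -- derived, then aggregated
               COUNT(*)                  AS LineCount
        FROM   dbo.OrderDetail
        GROUP  BY Key1, Key2
    )
    -- The Merge Join (an inner join on the sort keys) then roughly corresponds to:
    SELECT f1.Key1, f1.Key2, f1.Description, f2.LineTotal, f2.LineCount
    FROM   Flow1 AS f1
    JOIN   Flow2 AS f2
      ON   f1.Key1 = f2.Key1 AND f1.Key2 = f2.Key2
    ORDER  BY f1.Key1, f1.Key2;   -- in SSIS, both Merge Join inputs must be sorted on these keys

In the actual package the sorting is done by Sort transformations; a common alternative is to put the ORDER BY in the source query itself and mark the source output as sorted (the IsSorted property on the output and SortKeyPosition on the key columns), which avoids the blocking Sort components.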

If I test with just a couple of rows, it works fine. But when I change the WHERE clause in the source queries so that the number of input rows is, for instance, 15,000 or 150,000, the number of rows after the Merge Join is still 9,963.

When I run the package in debug mode, the step is colored green; nevertheless, an error is displayed:

Error: 0xC0047022 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Merge Join" (4703) failed with error code 0xC0047020. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.

To be honest, a few more error messages appear, but they don't seem related to this issue. The package stops running after some 6,000 rows have been written to the destination.

Any help will be greatly appreciated.

Kind regards,

Albert.

If you could post the full error output here, that would probably be helpful. Sometimes it is the "big view" that helps point you at the cause of the problem, especially since many of SSIS's error messages are not particularly transparent. The earlier errors often show what triggered the later errors, even if they do not appear directly related.

Can you also include details of the next task in the pipeline, the one that accepts the ~9000 rows? It sounds like it's failing on the first buffer it receives as input.

Ok, for the big picture: I feel like a fool.

Solving one of the other errors solved the reported issue as well.

Apologies for bothering you.
