Showing posts with label unchanged. Show all posts
Showing posts with label unchanged. Show all posts

Friday, March 30, 2012

Merge join: nr of output rows unchanged when amount of input changes

Dear all,

I created a package that seems to work fine with a small amount of data. When I run the package however with more data (as in production) the merge join output is limites to 9963 rows, no matter if I change the number of input rows.

Situation as follows.

The package has 2 OLE DB Sources, in which SQL-statements have been defined in order to retrieve the data.

The flow of source 1 is: retrieving source data -> trimming (non-key) columns -> sorting on the key-columns.

The flow of source 2 is: retrieving source data -> deriving 2 new columns -> aggregating the data to the level of source 1 -> sorting on the key columns.

Then both flows are merged and other steps are performed.

If I test with just a couple of rows it works fine. But when I change the where-clause in the data source retrieval, so that the number of rows is for instance 15000 or 150000 the number of rows after the merge join is 9963.

When I run the package in debug-mode the step is colored green, nevertheless an error is displayed:

Error: 0xC0047022 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Merge Join" (4703) failed with error code 0xC0047020. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.

To be honest, a few more errormessages appear, but they don't seem related to this issue. The package stops running after some 6000 rows have been written to the destination.

Any help will be greatly appreciated.

Kind regards,

Albert.

If you could post the full error output here, that would probably be helpful. Sometimes it is the "big view" that helps point you at the cause of the problem, especially since many of SSIS's error messages are not particularly transparent. The earlier errors often show what triggered the later errors, even if they do not appear directly related.

|||Can you also include details of the next task in the pipeline, the one that accepts the ~9000 rows. It sounds like it's failing on the first buffer it receives as input.|||

Ok, for the big picture: I feel like a fool.

Solving one of the other errors solved the reported issue as well.

Apologies for bothering you.

Merge Join conditional split - HeadBanger

I have a merge join that does a full outer join. I then have a conditional split that will breakout by unchanged, insert and update. The update is what I am having a problem with. The conditional split for the update is a follows:

!ISNULL(HISTORICAL) && !ISNULL(TRANSFORM)

Now I believe the problem is related to spaces in the key field let me explain.

The join field is defined as [char](14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

There was an original mismatch between the two tables of VARCHAR(14) & Char(14) but now there both Char(14).

What's interesting is that the few records with Alpha characters correctly does the update but the records with starting numeric data all go down the insert path.

The data of the join columns look like

'308811151 ' - 5 Spaces This Data incorrectly goes to insert path

'TSTRWR02 ' 6 Space This Data correctly goes to update

The data is grouped and sorted by the Key field and the historical and transform column contain a 1 for all records.
while the Doc_nbr which is the key


I tested the join in SQL server and it works with and without the spaces.

From SSIS I also tried RTRIM in the SQL command of the source in addition to substring the first 9 with the right trim and I always received the same results.

I am guessing there is something going on with the not isnull in the conditional split but I can't figure it out if I am missing something or if this is a BUG.


Three paths of data after the merge join and the conditional split

Insert
DOC_NBR,HISTORICAL,TRANSFORM,DOC_NBR_historical
206352185 ,,1,
209900165 ,,1,


Update
DOC_NBR,HISTORICAL,TRANSFORM,DOC_NBR_historical
NEE ,1,1,NEE
New ,1,1,New


Unchanged
DOC_NBR,HISTORICAL,TRANSFORM,DOC_NBR_historical
,1,,0000521096
,1,,0000685216

Any suggestions, am I missing the obvious?

LL


I beleive this to be a bug relating to a combination of the data relating to a char 14 which gets populated with space to the right on short columns in conjunction with the issorted property and the sortkey position of the oledb data source for sql server with the access mode SQL Command.

A work around on this was to remove the order by in the sql statements and to add a sort component before each input to the merge join.

So if anyone receives the same behavior, replace with sort components at least until you can find another solution like the look-up mentioned on this forum. You will take a little bit of a performance hit but at least you will get the anticipated results.

On a lighter note, Currently It's 78 degrees here in Miami and I didn't need a web service to figure that out.

LL

sql