Showing posts with label input. Show all posts
Showing posts with label input. Show all posts

Friday, March 30, 2012

Merge join: nr of output rows unchanged when amount of input changes

Dear all,

I created a package that seems to work fine with a small amount of data. When I run the package however with more data (as in production) the merge join output is limites to 9963 rows, no matter if I change the number of input rows.

Situation as follows.

The package has 2 OLE DB Sources, in which SQL-statements have been defined in order to retrieve the data.

The flow of source 1 is: retrieving source data -> trimming (non-key) columns -> sorting on the key-columns.

The flow of source 2 is: retrieving source data -> deriving 2 new columns -> aggregating the data to the level of source 1 -> sorting on the key columns.

Then both flows are merged and other steps are performed.

If I test with just a couple of rows it works fine. But when I change the where-clause in the data source retrieval, so that the number of rows is for instance 15000 or 150000 the number of rows after the merge join is 9963.

When I run the package in debug-mode the step is colored green, nevertheless an error is displayed:

Error: 0xC0047022 at Data Flow Task, DTS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Merge Join" (4703) failed with error code 0xC0047020. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.

To be honest, a few more errormessages appear, but they don't seem related to this issue. The package stops running after some 6000 rows have been written to the destination.

Any help will be greatly appreciated.

Kind regards,

Albert.

If you could post the full error output here, that would probably be helpful. Sometimes it is the "big view" that helps point you at the cause of the problem, especially since many of SSIS's error messages are not particularly transparent. The earlier errors often show what triggered the later errors, even if they do not appear directly related.

|||Can you also include details of the next task in the pipeline, the one that accepts the ~9000 rows. It sounds like it's failing on the first buffer it receives as input.|||

Ok, for the big picture: I feel like a fool.

Solving one of the other errors solved the reported issue as well.

Apologies for bothering you.

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

Merge Join (Full outer join) never finishes.

I have a merge join (full outer join) task in a data flow. The left input comes from a flat file source and then a script transformation which does some custom grouping. The right input comes from an oledb source. The script transformation output is asynchronous (SynchronousInputID=0). The left input has many more rows (200,000+) than the right input (2,500). I run it from VS 2005 by right-click/execute on the data flow task. The merge join remains yellow and the task never finishes. I do see a row count above the flat file destination that reaches a certain number and seems to get stuck there. When I test with a smaller file on the left it works OK. Any suggestions?

Try this - replace your destination with a RowCount component - it does nothing except count rows into a variable. if the package succeeds, then you have narrowed the problem down to an issue with your destination.

If there is an issue with your destination, you should start to look into disk and memory usage.

hth - I'll look forward to hearing more.

Donald

|||The way I worked around the problem was to separate it into 2 dataflows, the first one just taking the left branch of the input to the merge join and landing it to a raw destination. Then in the second data flow (the merge join) I simply used the raw file as the left input and it worked fine. Does that eliminate the possibility of it being an issue with the destination? If not then I will recreate the original scenario and replace the destination with a RowCount component as you suggested. Thanks.|||

That's a tricky one. I wonder if the combined memory usage of your script component and the merge join was stressing the box.

Donald

Wednesday, March 28, 2012

merge input output selection dialog is hanging

Hello

I'm trying to use the Merge component. When i attach a datasource to the the component, the Select Input/Output dialog box should popup.. It does, but VS.NET is hanging and i can only shutdown the procesess...

Any idea how i should solve this? how can i re-register this component?

ps. sql 2005 sp1 is installed.

Thanks
Marco

I am experiencing the same problem.

I have SQL Server 2005 SP1 and Visual Studio 2005 SP1.

I thought I had worked around this in the past by adding the components to the designer in a different order, but that does not seem to make a difference any longer.

|||

If this is something that happened very randomly and could be worked around by just closing/killing the project and open it again, then it's a known issue, the fix will be in SP2. However if you bumped into a constant repro of this, then likely it is something new, in that case please file a bug at our beta place with a brief description of that you got, we will get back to you soon.

Thanks

wenyang

|||

Beta Place closed some time ago.

You can of course raise a PSS support case for full paid support, or just log the bug/request online at MS Connect, effectively the replacement for Beta Place - http://connect.microsoft.com

merge input output selection dialog is hanging

Hello

I'm trying to use the Merge component. When i attach a datasource to the the component, the Select Input/Output dialog box should popup.. It does, but VS.NET is hanging and i can only shutdown the procesess...

Any idea how i should solve this? how can i re-register this component?

ps. sql 2005 sp1 is installed.

Thanks
Marco

I am experiencing the same problem.

I have SQL Server 2005 SP1 and Visual Studio 2005 SP1.

I thought I had worked around this in the past by adding the components to the designer in a different order, but that does not seem to make a difference any longer.

|||

If this is something that happened very randomly and could be worked around by just closing/killing the project and open it again, then it's a known issue, the fix will be in SP2. However if you bumped into a constant repro of this, then likely it is something new, in that case please file a bug at our beta place with a brief description of that you got, we will get back to you soon.

Thanks

wenyang

|||

Beta Place closed some time ago.

You can of course raise a PSS support case for full paid support, or just log the bug/request online at MS Connect, effectively the replacement for Beta Place - http://connect.microsoft.com