Showing posts with label unexpected. Show all posts
Showing posts with label unexpected. Show all posts

Friday, March 30, 2012

Merge Join Unexpected Results

I noticed a possible bug with the merge join in the June CTP and wanted to see if anyone else is experiencing these issues.

I have two sorted oledb sources with a sort order on columns TicketNumber (1), SequenceNumber (2). Later in my transformation I want to apply a merge join only on TicketNumber (1). The metadata still shows my sort order of TicketNumber, SequenceNumber but my join condition only uses the TicketNumber column.

Upon execution, the first sequence of the ticket on left input joins to every ticket sequence on the right. This is the expected behavior. However, once the sequence number on the left changes and the ticket number remains the same, the left joins to nothing on the right.

In order to get my expected result I had to resort the data only on the TicketNumber (1) column to clear the metadata of the sort order. Below is an example of my results and expected results.

Just wondering what others are experiencing and if this is a potential bug or not. The reduntant sort is killing my performance, but I need the two column sort for work done in previous steps.

Thanks,

Adam

Example.

Left Table
Ticket,Seq,Data
1,1,A
1,2,B
1,3,A

Right Table
Ticket, Seq, Data
1,1,A
1,2,B
1,3,A

Joined Result
L.Ticket, L.Seq, L.Data, R.Ticket, R.Seq, R.Data
1, 1, A, 1, 1, A
1, 1, A, 1, 2, B
1, 1, A, 1, 3, A
1, 2, B, NULL, NULL, NULL
1, 3, A, NULL, NULL, NULL

Expected Result
L.Ticket, L.Seq, L.Data, R.Ticket, R.Seq, R.Data
1, 1, A, 1, 1, A
1, 1, A, 1, 2, B
1, 1, A, 1, 3, A
1, 2, B, 1, 1, A
1, 2, B, 1, 2, B
1, 2, B, 1, 3, A
1, 3, A, 1, 1, A
1, 3, A, 1, 2, B
1, 3, A, 1, 3, A

I have seen this problem before. If I remember correctly, you have to make sure only rows that you want to join on are sorted. It seems as though the merge-join pays more attention to the rows that are sorted than to the "Join Key" checkboxes.
I filed a bug on this and this is the reply:
You can either uncheck the joinkey column in the custom editor or you can change the NumKeyColumns property in the properties grid or advanced editor
I haven't had time to test the NumKeyColumns property, let me know if it works.
-Evan Black|||Evan is right.

Adam, what's the value for NumKeyColumns (a property on Merge Join transform)? If you set to 1, you will get the expected results. That worked for me.|||Thanks guys that did the trick.

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya