Showing posts with label sorted. Show all posts
Showing posts with label sorted. Show all posts

Friday, March 30, 2012

Merge Join Unexpected Results

I noticed a possible bug with the merge join in the June CTP and wanted to see if anyone else is experiencing these issues.

I have two sorted oledb sources with a sort order on columns TicketNumber (1), SequenceNumber (2). Later in my transformation I want to apply a merge join only on TicketNumber (1). The metadata still shows my sort order of TicketNumber, SequenceNumber but my join condition only uses the TicketNumber column.

Upon execution, the first sequence of the ticket on left input joins to every ticket sequence on the right. This is the expected behavior. However, once the sequence number on the left changes and the ticket number remains the same, the left joins to nothing on the right.

In order to get my expected result I had to resort the data only on the TicketNumber (1) column to clear the metadata of the sort order. Below is an example of my results and expected results.

Just wondering what others are experiencing and if this is a potential bug or not. The reduntant sort is killing my performance, but I need the two column sort for work done in previous steps.

Thanks,

Adam

Example.

Left Table
Ticket,Seq,Data
1,1,A
1,2,B
1,3,A

Right Table
Ticket, Seq, Data
1,1,A
1,2,B
1,3,A

Joined Result
L.Ticket, L.Seq, L.Data, R.Ticket, R.Seq, R.Data
1, 1, A, 1, 1, A
1, 1, A, 1, 2, B
1, 1, A, 1, 3, A
1, 2, B, NULL, NULL, NULL
1, 3, A, NULL, NULL, NULL

Expected Result
L.Ticket, L.Seq, L.Data, R.Ticket, R.Seq, R.Data
1, 1, A, 1, 1, A
1, 1, A, 1, 2, B
1, 1, A, 1, 3, A
1, 2, B, 1, 1, A
1, 2, B, 1, 2, B
1, 2, B, 1, 3, A
1, 3, A, 1, 1, A
1, 3, A, 1, 2, B
1, 3, A, 1, 3, A

I have seen this problem before. If I remember correctly, you have to make sure only rows that you want to join on are sorted. It seems as though the merge-join pays more attention to the rows that are sorted than to the "Join Key" checkboxes.
I filed a bug on this and this is the reply:
You can either uncheck the joinkey column in the custom editor or you can change the NumKeyColumns property in the properties grid or advanced editor
I haven't had time to test the NumKeyColumns property, let me know if it works.
-Evan Black|||Evan is right.

Adam, what's the value for NumKeyColumns (a property on Merge Join transform)? If you set to 1, you will get the expected results. That worked for me.|||Thanks guys that did the trick.

Merge Join Output Bug?

I've run into something that looks like a bug to me but I wanted to run it by the board:

Merge join 2 sorted tables.

Table1: ColumnA : Sort Order 1, ColumnB Sort Order 2

Table2 : ColumnA: Sort Order 1, ColumnB Sort Order 2, ColumnC not sorted

Merge Join the two tables on ColumnA and ColumnB...

Choose the following as output columns

A + B + C = works

C = works

A + C = works

B + C = NOT work.. error message: The column with the SortKeyPosition value of 0 is not valid. It should be 2.

Basically if you choose one or more of the sorted columns in the output at least one of them has to be the column with Sort position 1 or you'll get that error.

Is this a bug or intentional? If you do not have sort column 1 in the output that output could no longer be considered sorted... so perhaps the error is related to that (instead of error I'd expect some warning about the sorting). Interesting that it lets you choose C only becuase that also makes the output unsorted.

I see your point Chris.

I think it is intential -

. The reason why B+C not work is because column B has a non-zero sortKeyPosition which indicates the output (to which B belongs) should be sorted (in other words, the output's "isSorted" property is true), but the output can not find a column with SortKeyPosition 1

. As for why C column only works is because the output is then not sorted.

If you think the error message is not very helpful, please log a customer issue through our connect website http://connect.microsoft.com/SQLServer and your request will be addressed soon as appropriate.

Thanks

wenyang

Merge Join Output Bug?

I've run into something that looks like a bug to me but I wanted to run it by the board:

Merge join 2 sorted tables.

Table1: ColumnA : Sort Order 1, ColumnB Sort Order 2

Table2 : ColumnA: Sort Order 1, ColumnB Sort Order 2, ColumnC not sorted

Merge Join the two tables on ColumnA and ColumnB...

Choose the following as output columns

A + B + C = works

C = works

A + C = works

B + C = NOT work.. error message: The column with the SortKeyPosition value of 0 is not valid. It should be 2.

Basically if you choose one or more of the sorted columns in the output at least one of them has to be the column with Sort position 1 or you'll get that error.

Is this a bug or intentional? If you do not have sort column 1 in the output that output could no longer be considered sorted... so perhaps the error is related to that (instead of error I'd expect some warning about the sorting). Interesting that it lets you choose C only becuase that also makes the output unsorted.

I see your point Chris.

I think it is intential -

. The reason why B+C not work is because column B has a non-zero sortKeyPosition which indicates the output (to which B belongs) should be sorted (in other words, the output's "isSorted" property is true), but the output can not find a column with SortKeyPosition 1

. As for why C column only works is because the output is then not sorted.

If you think the error message is not very helpful, please log a customer issue through our connect website http://connect.microsoft.com/SQLServer and your request will be addressed soon as appropriate.

Thanks

wenyang

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

Merge Join

I am trying to understand mege joins. In BOL it reads "The merge join
requires that both inputs be sorted on the merge columns, which are defined
by the equality (WHERE) clauses of the join predicate."
What does it mean "both inputs must be sorted"?
Message posted via droptable.com
http://www.droptable.com/Uwe/Forums...erver/200603/1
> What does it mean "both inputs must be sorted"?
That you on each table either have a clustered index on the columns over which you do the join, or a
non-clustered index that covers the data that is needed from the table.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"cbrichards via droptable.com" <u3288@.uwe> wrote in message news:5d9898960f213@.uwe...
>I am trying to understand mege joins. In BOL it reads "The merge join
> requires that both inputs be sorted on the merge columns, which are defined
> by the equality (WHERE) clauses of the join predicate."
> What does it mean "both inputs must be sorted"?
> --
> Message posted via droptable.com
> http://www.droptable.com/Uwe/Forums...erver/200603/1

Merge Join

I am trying to understand mege joins. In BOL it reads "The merge join
requires that both inputs be sorted on the merge columns, which are defined
by the equality (WHERE) clauses of the join predicate."
What does it mean "both inputs must be sorted"?
Message posted via droptable.com
http://www.droptable.com/Uwe/Forum...server/200603/1> What does it mean "both inputs must be sorted"?
That you on each table either have a clustered index on the columns over whi
ch you do the join, or a
non-clustered index that covers the data that is needed from the table.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"cbrichards via droptable.com" <u3288@.uwe> wrote in message news:5d9898960f213@.uwe...[vbcol
=seagreen]
>I am trying to understand mege joins. In BOL it reads "The merge join
> requires that both inputs be sorted on the merge columns, which are define
d
> by the equality (WHERE) clauses of the join predicate."
> What does it mean "both inputs must be sorted"?
> --
> Message posted via droptable.com
> http://www.droptable.com/Uwe/Forum...server/200603/1[/vbcol]sql

Merge Join

Hello all,

I have a package where I use merge join for two sorted inputs and the output is stored in a raw file.

In another package, the raw file from above package is again merge joined with another sorted input. Now my question is....do we need to sort again the raw file from first package? or is it OK to set the isSorted property to True and define the sort keys?

Thank you.

As long as you know the input is sorted, you can use the issorted property.

If it turns out not to be sorted, you could miss joins.

BobP

|||

Thank you for the reply...

So here in my case I am using a left join (in merge join)....how would I know if the raw file output from first package i sorted or not?

There is no problem to sort it again but really takes a lot of time(as the size is too big).

|||

If you are sorting it and then writing it out, it should stay sorted.

BobP

|||Ok....if it stays sorted then OK.......many Thanks

Merge Join

I am trying to understand mege joins. In BOL it reads "The merge join
requires that both inputs be sorted on the merge columns, which are defined
by the equality (WHERE) clauses of the join predicate."
What does it mean "both inputs must be sorted"?
--
Message posted via SQLMonster.com
http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200603/1> What does it mean "both inputs must be sorted"?
That you on each table either have a clustered index on the columns over which you do the join, or a
non-clustered index that covers the data that is needed from the table.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"cbrichards via SQLMonster.com" <u3288@.uwe> wrote in message news:5d9898960f213@.uwe...
>I am trying to understand mege joins. In BOL it reads "The merge join
> requires that both inputs be sorted on the merge columns, which are defined
> by the equality (WHERE) clauses of the join predicate."
> What does it mean "both inputs must be sorted"?
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200603/1