Friday, March 30, 2012

Merge Join gives unexpected results when the input has speacial characters

Hi
I have a strange issue here.
Both the input datasets of the merge join are sorted . Everything works fine if the dataset doesn't contain speacial characters like - (hipen).
This is the sorted dataset.
ASI311_3
ASI311_3
ASI312
ASI311
ASIKit1
ASIKit1

Merge Joiner has to look for ASI311 and finds it in the list and everything works as expected. But if the dataset is changed to

ASI311_3
ASI311_3
ASI-312
ASI311
ASIKit1
ASIKit1

Then it doesn't find the ASI311 in the above dataset ?
Only change between the two datasets is the - (hipen) which is added in one of the field. Does SSIS has a differenent precedence of the special characters than that of the Database. Both the datasets are on SQLServer.

Can anyone point out the issue here?

Thankyou
Vidya

What is the sort column's data type (in SQL server, not SSIS) and what is its collation?

The issue is highly like one of collation. Hyphen sorts to the top in SQL collations , and to the bottom in Windows collations. If your SQL server column is using a SQL collation, like SQL_Latin_General1_CI_AS, then an "ORDER BY" on the column of SQL collation would not of necessity match an SSIS sort (or a merge join which requires data of a particular collation).

If so, its likely you have a database collation equal or similar to SQL_Latin_General1_CI_AS on the database columns. Data with hyphens present will sort differently in SQL than in SSIS using the aforementioned SQL collation.

You have some options though.

1. Change the collation on the database column from a SQL collation to a windows collation. From example you would use Latin_General1_CI_AS rather than SQL_Latin_General_CI_AS.

2. Change the column data type to the national character variant, e.g. from varchar to nvarchar.

3. Lastly, you can change collation in the source components's "select" statement itself, as in "select x from t order by x COLLATE Latin_General1_CI_AS". This sort order is a windows collation compatible with SSIS.

Now, SSIS is also case-sensitive, whereas SQL server (by default) is not. However, that does not appear to be your issue here, since there is no "captailized" hyphen.

|||
Thanks a lot jadge , it worked Smile
I changed the collation from SQL to Windows as mentioned and it worked .

Thankyou
Vidya
|||Hi,
What is the way to set the same type of property in Oracle 10g database.
Do we need to use NLS_SORT property ?
Can anyone point me to that.

Thankyou
Vidya

No comments:

Post a Comment