Friday, March 30, 2012

Merge Join = 99%

I have got a query in which a merge join is 99% of the cost ... and I am confused ... isn't a merge join supposed to be the fastest? Has anyone seen this before?
Any ideas why this could be happening ... and sorry ... do not ask me to post the code coz I will not be able to ...

||| Just because it is 99% of the cost does not indicate that it was excessive. It just means that out of 100% of the total cost required to run the query (parsing, resolving, compiling (serial and parallel plans), and execution), the execution was 99% of the total operation. This indicates that figuring out the plan isn't taking a long time, but doing the work is.
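If you want to see that split for yourself, here is a minimal sketch (the query itself stays a placeholder, since it was never posted):

-- Hedged sketch: separates compile/parse time from execution time and IO.
-- Substitute the real query where indicated; nothing else is assumed.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO
-- ... run the query here; the messages pane reports
-- "SQL Server parse and compile time" separately from
-- "SQL Server Execution Times", plus logical/physical reads per table.
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO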

Now, if by 99% of the cost you mean the cost shown in the graphical execution plan, then that is still justifiable considering the amount of work SQL Server has to do to perform a join over two sorted inputs.

Sometimes, if a merge join is warranted, SQL Server will sort unsorted columns, because the optimizer estimates that the merge join, even with the extra sort, is cheaper for the query than not performing a merge join.
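You can check whether that is happening here. A rough sketch of where to look (there is no actual plan to quote, so the comments just describe the pattern):

-- Hedged sketch: display the estimated plan as text and look for
-- Sort operators feeding the Merge Join on either input.
SET SHOWPLAN_TEXT ON;
GO
-- ... the query goes here; if the plan shows a Sort directly under
-- the Merge Join, SQL Server is paying for a sort to enable the merge.
GO
SET SHOWPLAN_TEXT OFF;
GO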

If one of the columns involved is a foreign key and it is not indexed, you might want to consider a nonclustered index on this column. This will presort the data in the column and speed up the merge.
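Something along these lines (table and column names are invented, since the actual schema is under NDA):

-- Hedged sketch: index the foreign-key join column so that merge
-- input arrives presorted. All names are hypothetical.
CREATE NONCLUSTERED INDEX IX_BigTable_FKCol
    ON dbo.BigTable (FKCol);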
The graphical plan will probably still indicate the 99% (of the process time), but your IO and/or execution time might decrease.

||| There is a nonclustered index on the columns ... but the optimizer is choosing the clustered index ... and the merge join is taking 5 minutes ...

||| Is the optimizer scanning the clustered index, or doing a seek? Also, how is the where clause on this pig? If you can eliminate "OR"s or "(NOT) IN"s, you will probably do better.

||| The query is on a big fat b#$%%rd table of about 11 million rows ...

Should I take the table with all its where clauses into another temp table and then do a join? Currently I am using a derived table ...

||| How about the DDL and the DML ... skip the sample data ...

Don't forget to script the indexes.

||| Just read up on "Understanding Merge Joins" in BOL. It sounds like in your case the data is not sorted on the merge keys going into the query. There is also the fact that you are dealing with millions of rows. I have only seen the merge join a few times, but every time has been with 1 million rows or more.

Just for grins (because I doubt it will work), what is the result of throwing an ORDER BY on the subquery? The ORDER BY should be in the same order as the join, and ideally there should be an index on the table you are joining to in that same order.
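For what it's worth, SQL Server only accepts an ORDER BY inside a derived table together with TOP, so the experiment would have to look roughly like this (names are hypothetical, and the optimizer is free to discard the ordering anyway):

-- Hedged sketch of the suggested experiment. ORDER BY in a derived
-- table requires TOP, and even then the outer order is not guaranteed.
SELECT b.CustID, s.SomeCol
FROM dbo.BigTable AS b
JOIN (SELECT TOP 100 PERCENT CustID, SomeCol
      FROM dbo.OtherBigTable
      ORDER BY CustID) AS s
  ON s.CustID = b.CustID;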

And this is just the thing I would need to have someone else look at myself, because you never know what is going to jump out at a different pair of eyes.

||| I would avoid the ORDER BY.

It's more overhead...until you get the final result (which is how big again?)

It's either going to use the index or it's not.

Where's the driver data coming from, since you're doing a merge ...

And it will have to scan the driver ...

||| Another pair of eyes I am not gonna get ... am working at the client's place all alone on my laptop ... away from my office ... and as I said before in the YAK Corral ... that Shark Tank story looks a lot like mine ...

||| Another pair of eyes .. i am not gonna get

Dude we are your eyes...

Can't you post it?

||| Just my luck ... am under a strict NDA

||| So am I ... just under a completely unrelated project ;-).

As for the sort in the subquery, it is just a guess based on the description of how a merge join works. It may not work, but it sounds like this is getting down to the "Hail Mary" point.

||| Dude we are your eyes...

Anyway ... Thanks for the nice words ...

What do you think would give better performance under these conditions ...
a) Temp table
b) Table variable
c) Derived table

The table contains data for a year ... the selection is for a month ... and then I have to make a join on a different col from another big fat table containing about 80,000 records ...

||| It's ALL about the indexes...

Get the result set as small as possible...

How many rows in a month of data? If it's not too big you could use a table datatype ...

I still like derived tables
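To make the options concrete, here is a rough sketch of (a) versus (c) — every name is invented, since no schema was posted. Option (b) would be a DECLARE @t TABLE variant of (a), but table variables carry no statistics, which can hurt at around a million rows.

-- Hedged sketch contrasting two of the options. Hypothetical names.

-- (a) Temp table: materialize the month first, then join.
SELECT CustID, Amount
INTO #MonthSlice
FROM dbo.BigTable
WHERE TxnDate >= '20020301' AND TxnDate < '20020401';

SELECT m.CustID, o.SomeCol
FROM #MonthSlice AS m
JOIN dbo.OtherBigTable AS o
  ON o.CustID = m.CustID;

-- (c) Derived table: one statement, letting the optimizer fold the
-- month filter and the join into a single plan.
SELECT m.CustID, o.SomeCol
FROM (SELECT CustID, Amount
      FROM dbo.BigTable
      WHERE TxnDate >= '20020301' AND TxnDate < '20020401') AS m
JOIN dbo.OtherBigTable AS o
  ON o.CustID = m.CustID;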

In the predicate ... is there an index for every column? Are they in the right order?

How many columns in the row?

||| How many rows in option 1, 2, or 3?

||| No of rows in a month ... 11 million divided by 12 ...
And I too like derived tables ...

Not many columns in the row ... if I remember correctly ... 8 or 9 ...

Ok ... going to sleep now ... will update how it goes tomorrow ...

||| Avoid the ORDER BY as much as possible; it won't affect the sort order of the table in the case of the join.

I would assume that the merge is going to occur in this case regardless, based on how the optimizer is costing it.

I would try a join hint (in development only), and initially I would try SET FORCEPLAN ON, just to see what the IO is on your current query as written. Then I would play with the join order just to see how the optimizer chooses its plan.
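Roughly like this (a sketch only — the real query is under NDA, so the join shown is generic and the names are made up):

-- Hedged sketch of the experiment; development environment only.
SET FORCEPLAN ON;  -- joins follow the order the tables appear in FROM
GO
-- ... run the query as written here and note the IO ...
GO
SET FORCEPLAN OFF;
GO

-- A join hint forces the physical join type instead:
SELECT b.CustID, o.SomeCol
FROM dbo.BigTable AS b
INNER LOOP JOIN dbo.OtherBigTable AS o  -- or INNER MERGE/HASH JOIN
  ON o.CustID = b.CustID;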
Oh yeah ... don't forget to update stats and check your indexes etc. before you start using optimizer hints.

||| avoid the order by

What did I say....

So it's about 1 million ... not huge, but still a lot ...

Is the date indexed?

With 8-10 columns in the row, the best you'll have would be an index seek plus a bookmark lookup ...
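If the date isn't indexed yet, something like this would be the starting point (names invented again; the INCLUDE form needs SQL Server 2005 or later, and by covering the query it would remove the bookmark lookup):

-- Hedged sketch. An index on the filter column enables a seek;
-- included columns (SQL Server 2005+) make it covering, so the
-- bookmark lookup disappears. All names are hypothetical.
CREATE NONCLUSTERED INDEX IX_BigTable_TxnDate
    ON dbo.BigTable (TxnDate)
    INCLUDE (CustID, Amount);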

Did you SHOW PLAN yet?
