Friday, February 24, 2012

Memory Pressure in MemtoLeave

Starting in early 2004 our active\active SQL Server cluster started
experiencing intermittent slow downs, mini-dumps, and error messages in
the client and SQL logs (see below). These symptoms would build until
the system became non-responsive. The final symptom was login failures
for connections using Windows authentication. Based on the error
messages and that a restart\failover brought the server back online we
were sure the problem was memory pressure.
The first few times the memory pressure happened we applied standard
troubleshooting. All the latest patches were applied, called PSS, code
was reviewed, questionable processes and troublesome jobs were reviewed
and cleaned up. Three or four times we were confident that we found
the likely cause. A few weeks later the problem would re-occur.
After six months and out of answers we bought Essential Support from
Microsoft and opened a case The level of support and knowledge is much
higher than PSS. Working with Essential Support they confirmed our
in-house diagnosis of memory pressure in the MemToLeave area of the SQL
Server memory space. For more information on MemToLeave find Ken
Henderson's article on MSDN.
It took memory dumps and diagnostics after three incidents before
Microsoft found the problem. There is memory leak in SQL Server and
how it handles disconnects in remote queries across linked servers.
The scenario in my shop is like this: an app server starts a request
on behalf of a remote client. The request starts a remote query on
SQL Server A that reaches across to SQL Server B. Our app server has
aggressive retry logic (long story) so it terminates the original
process if it has waited too long. SQL Server A handles the disconnect
fine but SQL Server B doesn't. It has no place to send the data it
has gathered so that memory allocation on Server B leaks. It is not
de-allocated until the next time SQL Server restarts.
MemToLeave and the processes that run there want large chunks of
contiguous memory. These leaked sections of memory eventually caused
MemToLeave to become fragmented; leading to the infamous unable to
reserve memory message in my SQL Server log.
I'm guessing that someone will post a reply saying SQL Server
doesn't leak; it is our app server etc. Well, SQL Server does have a
leak; a hotfix and KB article will be coming to a SQL Server near you
soon.
Here is what I would recommend for troubleshooting this process if you
get any of the errors or symptoms I've listed:
1. Read Ken Henderson's article on MemToLeave so you understand this
poorly documented part of SQL Server.
http://msdn.microsoft.com/library/d...ev_01262004.asp
2. Set the -G switch in your startup parameter to 384 megs. This
will increase MemtoLeave but reduce the amount of memory available to
the buffer pool.
3. Look for any memory leaks in your own code. Review all use of
Sp_OA, third party or in-house extended stored procedures etc.
Remember every byte counts!
4. Try to reduce your linked server traffic or identify bottlenecks in
your remote queries.
5. Bypass PSS and spend the money for Essential Support.
6. If you suspect your problem is like mine changing this registry
setting. It is the workaround that we are using until the hotfix is
released.
& #91;HKEY_LOCAL_MACHINE\SOFTWARE\Microsof
t\Microsoft SQL Server\your
instance name\Providers\SQLOLEDB]"AllowInProcess"=dword:00000000
Error messages related to problem:
***Unable to get thread context for spid 56
2004-08-13 08:18:04.06 server CImageHelper::Init () Failed load of
dbghelp.dll - Not enough storage is available to process this command.
2004-08-13 08:18:04.44 logon Login failed for user 'User1'. Only
administrators may connect at this time.
2004-08-12 14:24:02.09 spid134 WARNING: Failed to reserve contiguous
memory of Size= 131072.
2004-08-12 14:57:04.98 server SQL Server could not spawn
process_loginread thread.
THIS POSTING IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. USE THE
INFORMATION AND ADVICE AT YOUR OWN RISK.Thanks for the informative post, it might save us some pain.
Did you consider setting the linked server "allow in-process" switch to
false so that the allocations are made out-of-process which would mean I
guess in the buffer pool instead of the MemToLeave?
It wouldn't solve a memory leak, but if the queries use up the memory in the
MemToLeave area, wouldn't this cause the same problems you have.
Did they advise that out-of-process would be much slower.
thanks,
"dosberg@.yahoo.com" wrote:

> Starting in early 2004 our active\active SQL Server cluster started
> experiencing intermittent slow downs, mini-dumps, and error messages in
> the client and SQL logs (see below). These symptoms would build until
> the system became non-responsive. The final symptom was login failures
> for connections using Windows authentication. Based on the error
> messages and that a restart\failover brought the server back online we
> were sure the problem was memory pressure.
> The first few times the memory pressure happened we applied standard
> troubleshooting. All the latest patches were applied, called PSS, code
> was reviewed, questionable processes and troublesome jobs were reviewed
> and cleaned up. Three or four times we were confident that we found
> the likely cause. A few weeks later the problem would re-occur.
> After six months and out of answers we bought Essential Support from
> Microsoft and opened a case The level of support and knowledge is much
> higher than PSS. Working with Essential Support they confirmed our
> in-house diagnosis of memory pressure in the MemToLeave area of the SQL
> Server memory space. For more information on MemToLeave find Ken
> Henderson's article on MSDN.
> It took memory dumps and diagnostics after three incidents before
> Microsoft found the problem. There is memory leak in SQL Server and
> how it handles disconnects in remote queries across linked servers.
> The scenario in my shop is like this: an app server starts a request
> on behalf of a remote client. The request starts a remote query on
> SQL Server A that reaches across to SQL Server B. Our app server has
> aggressive retry logic (long story) so it terminates the original
> process if it has waited too long. SQL Server A handles the disconnect
> fine but SQL Server B doesn't. It has no place to send the data it
> has gathered so that memory allocation on Server B leaks. It is not
> de-allocated until the next time SQL Server restarts.
> MemToLeave and the processes that run there want large chunks of
> contiguous memory. These leaked sections of memory eventually caused
> MemToLeave to become fragmented; leading to the infamous unable to
> reserve memory message in my SQL Server log.
> I'm guessing that someone will post a reply saying SQL Server
> doesn't leak; it is our app server etc. Well, SQL Server does have a
> leak; a hotfix and KB article will be coming to a SQL Server near you
> soon.
> Here is what I would recommend for troubleshooting this process if you
> get any of the errors or symptoms I've listed:
> 1. Read Ken Henderson's article on MemToLeave so you understand this
> poorly documented part of SQL Server.
> http://msdn.microsoft.com/library/d...ev_01262004.asp
> 2. Set the -G switch in your startup parameter to 384 megs. This
> will increase MemtoLeave but reduce the amount of memory available to
> the buffer pool.
> 3. Look for any memory leaks in your own code. Review all use of
> Sp_OA, third party or in-house extended stored procedures etc.
> Remember every byte counts!
> 4. Try to reduce your linked server traffic or identify bottlenecks in
> your remote queries.
> 5. Bypass PSS and spend the money for Essential Support.
> 6. If you suspect your problem is like mine changing this registry
> setting. It is the workaround that we are using until the hotfix is
> released.
> & #91;HKEY_LOCAL_MACHINE\SOFTWARE\Microsof
t\Microsoft SQL Server\your
> instance name\Providers\SQLOLEDB]"AllowInProcess"=dword:00000000
>
> Error messages related to problem:
> ***Unable to get thread context for spid 56
> 2004-08-13 08:18:04.06 server CImageHelper::Init () Failed load of
> dbghelp.dll - Not enough storage is available to process this command.
> 2004-08-13 08:18:04.44 logon Login failed for user 'User1'. Only
> administrators may connect at this time.
> 2004-08-12 14:24:02.09 spid134 WARNING: Failed to reserve contiguous
> memory of Size= 131072.
> 2004-08-12 14:57:04.98 server SQL Server could not spawn
> process_loginread thread.
>
> THIS POSTING IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
> OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. USE THE
> INFORMATION AND ADVICE AT YOUR OWN RISK.
>

No comments:

Post a Comment