Tuesday, March 27, 2012

Help with blocking

Hi all,
I was hoping I could get some help from you all about a blocking
problem that is making me tear my hair out. I've got a client process
that is executing two DELETE statements hourly during our ten hour
production day. The first DELETE statement cleans out "expired"
entries from Table A and the second cleans out "expired" entries from
Table B.
Under normal circumstances, each of these queries executes in a few
seconds. About once a week the second query which deletes from Table B
runs long (2-3 minutes) though it does finish by itself. This creates
tons of blocking as both TABLEA and B are under heavy SELECT, INSERT,
and UPDATE activity. Here are the queries (sorta)...
Query1 - DELETE FROM TABLEA WHERE [FieldModified] < (GetDate() - '00:05:00')
Query2 - DELETE FROM TABLEB WHERE [FieldGUID] NOT IN (SELECT [FieldGUID] FROM TABLEA)
(note: FieldGUID is the PK for TABLEA)
I've checked the execution plan for the second query and I know that it
is going to perform a table scan. I also know the row counts for
Tables A and B and my I/O subsystem, and I can say for certain that
the second query should not run for more than a few seconds.
I've checked sysprocesses while this second query is running and
consistently find that it is not blocked and is not waiting: the [status]
column shows sleeping, the open_tran column equals 2, and the cpu and
physical_io columns show low values that change only slowly over time. I have
also checked the normal metrics that I capture throughout the day and can't
find a problem. My server's CPU usage is low, disk queues are low to
nonexistent, there is little to no paging going on, no other processes
running, etc.
I can't for the life of me figure out what causes this second query to
consistently (but infrequently) run long. I'm running SQL 2000 SP3a
Enterprise Edition on a 2-node Active/Passive cluster. Each cluster
node has 8GB of memory and 4x2.8GHz CPUs.
Any and all help is greatly appreciated. Thanks.
Steve
With "implicit_transaction' on by default, your #1 query is enclosed within
a transaction. Thus, it holds the locks until it completes with the delete.
Your #2 query's SELECT statement is blocked at this point. This is probably
desired because you probably want to be sure of the data in TableA.
So, to improve this, I would suggest putting an index on your TableA's
FieldModified column. Also, consider using NOT EXISTS instead.
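A rough sketch of both suggestions, using the table and column names from
your post (the index name is only illustrative):
-- index to support the range predicate in query #1
CREATE INDEX IX_TABLEA_FieldModified ON TABLEA ([FieldModified])
-- query #2 rewritten with NOT EXISTS instead of NOT IN
DELETE FROM TABLEB
WHERE NOT EXISTS (SELECT 1
                  FROM TABLEA
                  WHERE TABLEA.[FieldGUID] = TABLEB.[FieldGUID])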
Now, if #2 is nothing more than a maintenance step which will run again, you
could force a READPAST (i.e. skip locked rows).
e.g.
DELETE FROM TABLEB
WHERE [FieldGUID] NOT IN (SELECT [FieldGUID]
                          FROM TABLEA WITH (READPAST))
-oj
|||Hi,
Did you check lastwaittype and waitresource in sysprocesses? They might give
you a hint.
We had problems with some UPDATE statements when max degree of parallelism
was set to 0 (all available processors). The queries ran with no problems
when we used OPTION (MAXDOP 1) or set max degree of parallelism to a fixed
value (number of processors - 1). This is just a guess; our server has 8 CPUs.
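For example, using your second DELETE only as an illustration, the
statement-level hint looks like this:
-- same DELETE with parallelism capped for this statement only
DELETE FROM TABLEB
WHERE [FieldGUID] NOT IN (SELECT [FieldGUID] FROM TABLEA)
OPTION (MAXDOP 1)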
Also check your log file growth settings. If your transaction log file is
large and auto-growth is enabled as a percentage (10% etc.), maybe you need
to consider changing the log auto-growth setting to a fixed increment and
reviewing the auto-shrink setting as well.
"steveg@.payerpath.com" wrote:

> Hi all,
> I was hoping I could get some help from you all about a blocking
> problem that is making me tear my hair out. I've got a client process
> that is executing two DELETE statements hourly during our ten hour
> production day. The first DELETE statement cleans out "expired"
> entries from Table A and the second cleans out "expired" entries from
> Table B.
> Under normal circumstances, each of these queries executes in a few
> seconds. About once a week the second query which deletes from Table B
> runs long (2-3 minutes) though it does finish by itself. This creates
> tons of blocking as both TABLEA and B are under heavy SELECT, INSERT,
> and UPDATE activity. Here are the queries (sorta)...
> Query1 - DELETE FROM TABLEA WHERE [FieldModified] < (GetDate() -
> '00:05:00')
> Query2 - DELETE FROM TABLEB WHERE [FieldGUID] NOT IN (SELECT
> [FieldGUID] FROM TABLEA)
> (note: FieldGUID is the PK for TABLEA)
>
> I've checked the execution plan for the second query and I know that's
> it is going to perform a table scan. I also know the row counts for
> Tables A and B and my I/O subsystem and and can say for certain that
> the second query should not run for more than a few seconds.
> I've checked sysprocesses while this second query is running and
> consistently find that it is not blocked, is not waiting, the [status]
> column shows sleeping, open_tran column equals 2, cpu and physical_io
> columns show low values and do change slowly over time. I have also
> checked my normal metrics that I capture throughout the day and can't
> find a problem. My server's CPU usage is low, disk queues are low to
> nonexistent, low to no paging going on, no other processes running,
> etc.
> I can't for the life of me figure out what causes this second query to
> consistently (but infrequently) run long. I'm running SQL 2000 SP3a
> Enterprise Edition on a 2-node Active/Passive cluster. Each cluster
> node has 8GB of memory and 4x2.8GHz CPUs.
> Any and all help is greatly appreciated. Thanks.
> Steve
>
|||oj,
The client that sends these two statements sends them in the same order
every time and won't send the second statement until the first has
completed. At a high level the code looks like the following, so I
don't think the possibility you outlined is what is actually occurring.
try
    prepare first delete
    execute first delete
    prepare second delete
    execute second delete
catch
    error handling logic
Steve
|||Sysprocesses shows waittype=0x0000, waittime=0, and lastwaittype is
PAGELATCH_SH or PAGELATCH_EX. What I get from that information is that
the DELETE statement that causes the trouble isn't currently waiting on
anything, but that the last time it did wait it was on a PAGELATCH_SH or
PAGELATCH_EX latch.
I too have had problems with max degree of parallelism, which is why I
have it configured at the server level to be half the number of logical
CPUs in the server. I can also see in sysprocesses, while the problem
DELETE is running, that it is not running in parallel, as there is only
one row for the problem SPID.
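(For reference, the server-level setting is the standard sp_configure option;
the value 4 below is only an example:)
-- server-level cap on parallelism; the value is only an example
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE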
I have checked the file growth settings for both my database and
transaction log. Both TableA and TableB are in the same database whose
size is very stable throughout the day. This database is also set up
in Simple recovery mode so the transaction log stays mostly empty.
Steve
|||I see.
I just reread your post. You might be experiencing this:
http://support.microsoft.com/kb/329906
-oj
|||Also, there have been some *nasty* bugs with parallelism (some of which
are fixed in SQL 2000 SP4).
http://support.microsoft.com/search/...ery=sql+maxdop
I believe Umut is right on target here.
http://support.microsoft.com/kb/837983
Some info on upcoming fixes:
http://support.microsoft.com/kb/888799
-oj
|||I don't think the DELETE is running as a parallel process, because while
it's running I only see one row for it in sysprocesses. For parallel
queries I would expect to see multiple entries in sysprocesses for the
same SPID, with different values in the ecid column for each process.
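(This is roughly the check I mean; a parallel plan would show more than one
row for the same spid, each with a different ecid:)
-- rough check for SPIDs that currently have more than one sysprocesses row
SELECT spid, COUNT(*) AS rows_for_spid
FROM master..sysprocesses
GROUP BY spid
HAVING COUNT(*) > 1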
Steve
|||Hi Steve,
PAGELATCH_SH or PAGELATCH_EX explanation:
Contention can be caused by issues other than IO or memory performance, for
example, heavy concurrent inserts into the same index range can cause this
type of contention. If a lot of inserts need to be placed on the same page
they are serialized using the latch. A lot of inserts into the same range
can also cause page splits in the index which will hold onto the latch while
allocating a new page (this can take a while). Any read accesses to the same
range as the inserts would also conflict on the latches. (from Tom Davidson
article)
I recommend Tom Davidson's article, Opening Performance Tuning Toolbox,
from SQL Server Magazine. I use it all the time and it helps me a lot.
You did not give any details about how many rows the table has or how many
rows are being deleted. If this is a batch process, consider doing this
outside business hours. If it's part of your application and there are a
lot of records affected by the delete, then I think you need to change
your code.
If there are only a few records affected by the delete, you could try a
cursor and delete them one by one, but I don't recommend this if it's a
batch process and a lot of records are affected.
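Another option, if many rows qualify, is to delete in small batches so each
transaction stays short. A rough SQL 2000 sketch, using the table and column
names from the thread (the batch size is only an example):
-- delete in small batches so locks are held only briefly
SET ROWCOUNT 1000
WHILE 1 = 1
BEGIN
    DELETE FROM TABLEB
    WHERE [FieldGUID] NOT IN (SELECT [FieldGUID] FROM TABLEA)

    IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0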
"steveg@.payerpath.com" wrote:

> Sysprocesses shows waittype=0x0000, waittime=0, and lastwaittype is
> PAGELATCH_SH or PAGELATCH_EX. What I get from that information is that
> the DELETE statement that causes the trouble isn't waiting on anything
> but that the last time it did wait was for PAGELATCH_SH or
> PAGELATCH_EX.
> I too have had problems with max degrees of parallelism which is why I
> have it configured at the server level to be half the number of logical
> CPUs in the server. I can also see in sysprocesses when the problem
> DELETE is running that it is not running in parallel as there is only
> one row for the problem SPID.
> I have checked the file growth settings for both my database and
> transaction log. Both TableA and TableB are in the same database whose
> size is very stable throughout the day. This database is also set up
> in Simple recovery mode so the transaction log stays mostly empty.
> Steve
>
|||I don't think parallel queries cause multiple rows in sysprocesses. I only
see multiple entries in sysprocesses related to the same SPID when lightweight
pooling is set to 1 (NT fibers).
FYI: the parallel-process wait type is CXPACKET.
"steveg@.payerpath.com" wrote:

> I don't thing the DELETE is running as a parallel process because while
> it's running I only see one row for it in sysprocesses. For parallel
> queries I would expect to see multiple entries in sysprocesses for the
> same SPID with different values in the ecid column for each process.
> Steve
>