Thursday, March 29, 2012

Help with complicated query...

Hello All,
I have a simple table that stores messages of different types from
different sources. The definition of the table is shown below. I need
to devise an efficient query to return a "list of the N more recent
messages for a subset of sources within a specified time frame."
create table MessageTbl
(
src nvarchar(50), -- Source of the message
type nvarchar(50), -- Type of the message.
msg nvarchar(1000), -- Text of the message
dt datetime -- When the message was posted
)
We are given the following parameters:
1. declare @.startTime datetime -- Start of the time frame
2. declare @.endTime datetime -- End of the time frame
3. declare @.myTable( src nvarchar(50), type nvarchar(50)) -- This table
contains a list of sources/types for which we want to obtain the
messages.
4. N -- How many messages per source/type
If all I wanted was the 20 more recent messages for source1/type1
between @.startTime and @.endTime, I could do something like:
SELECT TOP 20 *
FROM MessageTbl
WHERE (src ='source1') AND ( type='type1')
AND ( dt BETWEEN @.startTime AND @.endTime )
ORDER BY dt DESC
In my case, however, I have a number of pairs (src,type) in the local
table @.myTable. Therefore, what I ultimately want is equivalent to the
UNION of the results of such query for each pair(src,type). Another
thing missing is that I am using a hardcoded value for the TOP clause.
This also varies.
All the solutions that I can think of are very inneficient, cumbersome,
and involve a number of temporary tables. I was wondering if the
experts could lead me to a cleaner query design.
Thank you
- CDOn 25 Oct 2005 14:57:37 -0700, crbd98@.yahoo.com wrote:

>Hello All,
>I have a simple table that stores messages of different types from
>different sources. The definition of the table is shown below. I need
>to devise an efficient query to return a "list of the N more recent
>messages for a subset of sources within a specified time frame."
>create table MessageTbl
>(
> src nvarchar(50), -- Source of the message
> type nvarchar(50), -- Type of the message.
> msg nvarchar(1000), -- Text of the message
> dt datetime -- When the message was posted
> )
>We are given the following parameters:
>1. declare @.startTime datetime -- Start of the time frame
>2. declare @.endTime datetime -- End of the time frame
>3. declare @.myTable( src nvarchar(50), type nvarchar(50)) -- This table
>contains a list of sources/types for which we want to obtain the
>messages.
>4. N -- How many messages per source/type
>
>If all I wanted was the 20 more recent messages for source1/type1
>between @.startTime and @.endTime, I could do something like:
>SELECT TOP 20 *
>FROM MessageTbl
>WHERE (src ='source1') AND ( type='type1')
> AND ( dt BETWEEN @.startTime AND @.endTime )
>ORDER BY dt DESC
>In my case, however, I have a number of pairs (src,type) in the local
>table @.myTable. Therefore, what I ultimately want is equivalent to the
>UNION of the results of such query for each pair(src,type). Another
>thing missing is that I am using a hardcoded value for the TOP clause.
>This also varies.
>All the solutions that I can think of are very inneficient, cumbersome,
>and involve a number of temporary tables. I was wondering if the
>experts could lead me to a cleaner query design.
>Thank you
>- CD
Hi CD,
Try if this works:
SELECT a.src, a.type, a.msg, a.dt
FROM MessageTbl AS a
INNER JOIN @.myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @.startTime AND @.endTime
AND a.dt IN (SELECT TOP 20 dt
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @.startTime AND @.endTime
ORDER BY c.dt DESC)
(untested)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hello Hugo,
Thank you for your reply. The query that you posted produces the
correct result but it is very expensive. It takes almost 30 seconds
when MessageTbl has about 3600 rows and @.myTable has 51 rows. I checked
the execution plan and it seems there is a lot of table scanning going
on.
Maybe I have to look into a solution that does not use the join...
Any Ideas?
Thanks
- CD|||Are there any indexes? Your original DDL had no keys and you used a
proprietary table variable.
Also, you might want to fix the data element names. They are horrible.|||Hello Celko,
The table MessageTbl has a compound primary key involving src and type.
There are no other indexes. My local table variable has no keys and no
indexes. Do you think this is a problem? Is there any problem in using
table variables or are you just concerned about the portability of the
code?
Any suggestions?
Thank you
CD
--CELKO-- wrote:
> Are there any indexes? Your original DDL had no keys and you used a
> proprietary table variable.
> Also, you might want to fix the data element names. They are horrible.|||On 25 Oct 2005 16:22:34 -0700, crbd98@.yahoo.com wrote:

>Hello Hugo,
>Thank you for your reply. The query that you posted produces the
>correct result but it is very expensive. It takes almost 30 seconds
>when MessageTbl has about 3600 rows and @.myTable has 51 rows. I checked
>the execution plan and it seems there is a lot of table scanning going
>on.
>Maybe I have to look into a solution that does not use the join...
>
>Any Ideas?
Hi CD,
Try if adding this index helps:
CREATE INDEX ProperNameHere
ON MessageTbl (src, type, dt DESC)
On 26 Oct 2005 10:50:55 -0700, crbd98@.yahoo.com wrote:

>Hello Celko,
>The table MessageTbl has a compound primary key involving src and type.
Huh? If there's a compount primary key on src and type, then how can you
find the 20 most recent messages between two moments for a given src and
type? As a result of the primary key, there will be only one message for
each src / type combination!!
But since you apparently have a PRIMARKY KEY that was not included in
your first post, please post the complete CREATE TABLE statement, WITH
all constraints, properties and indexes. My suggestion above might well
be invalidated by your current keys and indexes.

>There are no other indexes. My local table variable has no keys and no
>indexes. Do you think this is a problem?
You might have duplicates in the table variable, which will never
improve performance.
You can't define indexes for a table variable, but you can define
PRIMARY KEY or UNIQUE constraints (and they DO automatically add an
index). In your case, try if adding a PRIMARY KEY (src, type) helps the
performance. And if it doesn't, but doesn't hinder performance either,
then do leave it in.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hello Hugo,
Thank you for the suggestion of creating an index with the src, type,
and dt. That reduced the execution time from 30 sec to less than a
second. My last problem was to parameterize the value for TOP clause. I
created a huge dynamic SQL statement that I use with sp_executesql.
VERY UGLY!!! Do you have any alternative suggestion?
Thanks
Cassiano|||On 28 Oct 2005 00:52:02 -0700, crbd98@.yahoo.com wrote:

> Hello Hugo,
> Thank you for the suggestion of creating an index with the src, type,
> and dt. That reduced the execution time from 30 sec to less than a
> second. My last problem was to parameterize the value for TOP clause. I
> created a huge dynamic SQL statement that I use with sp_executesql.
> VERY UGLY!!! Do you have any alternative suggestion?
> Thanks
> Cassiano
Issue a SET ROWCOUNT in the stored procedure:
CREATE PROCEDURE foo (@.nRows int, @.bar varchar(30))
AS
SET ROWCOUNT @.nRows
SELECT * FROM sysobjects where name <> @.bar
SET ROWCOUNT 0
GO
SQL Server specific, but it works, and there's no dynamic SQL or even
recompilation.|||On 28 Oct 2005 00:52:02 -0700, crbd98@.yahoo.com wrote:

>Hello Hugo,
>Thank you for the suggestion of creating an index with the src, type,
>and dt. That reduced the execution time from 30 sec to less than a
>second. My last problem was to parameterize the value for TOP clause. I
>created a huge dynamic SQL statement that I use with sp_executesql.
>VERY UGLY!!! Do you have any alternative suggestion?
>Thanks
>Cassiano
Hi Cassiano,
The suggestion made by Ross (SET ROWCOUNT) is fine if you wwant to limit
the total number of rows returned by the query. But I seem to recall
that your problem was more complex than that.
Going back in the thread, I see this query I posted a few days ago - is
this the one you are using, and where you want to replace TOP 20 with a
variable number?
SELECT a.src, a.type, a.msg, a.dt
FROM MessageTbl AS a
INNER JOIN @.myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @.startTime AND @.endTime
AND a.dt IN (SELECT TOP 20 dt
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @.startTime AND @.endTime
ORDER BY c.dt DESC)
The easiest answer is to wait a few ws. TOP @.variable will be
suppported in SQL Server 2000, which will hit the streets in the w of
November 7th.
Or use the following (which is ANSI standard to boot):
SELECT a.src, a.type, a.msg, a.dt
FROM MessageTbl AS a
INNER JOIN @.myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @.startTime AND @.endTime
AND (SELECT COUNT(*)
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @.startTime AND @.endTime
AND c.dt <= a.dt) <= 20
(untested)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hello Hugo,
Thank you very much for your reply. This whole discussion has been very
enlightening.
I compared the performance of two variations of the solution:
#1. The last solution that you suggested.
#2. Based on your original solution (the one with hardcoded TOP
clause). The only modification that I made was to create the query
string dynamically to simmulate the effect of a variable TOP clause.
The table has the following indices (in addition to a PK index on a
MsgId column that I did not include in the posting).
CREATE INDEX IX_MessageTbl1
ON MessageTbl (src, type, dt DESC)
and
CREATE INDEX IX_MessageTbl2
ON MessageTbl (dt DESC)
I noticed that #1 was 6 times slower than #2. Although #2 is faster, I
do not like it, because I create the query dynamically. Do you know if
there is any index that I can create or any hint that I can use to
speed-up the query #1.
Thank you
- CDsql

No comments:

Post a Comment