Showing posts with label complex. Show all posts
Showing posts with label complex. Show all posts

Thursday, March 29, 2012

Help with complex summary of data please

Hi, I'm fairly new to RS and am looking for pointers to help me with
the following situation.
I have a system that tracks the activity of our warehouse and drivers.
Basically, it logs when an order is entered and when it leaves the
warehouse. There are various types of orders and seven warehouses. I
have a view which gives me all the detail I need including the
TimeToSend of each order (the difference between order time and send
time) all in one record per order. I have to create a report that
summarises all of this data on a daily basis by providing the
following:
At the entire company level and then broken down by each warehouse for
each day:
- #orders of TypeA where TimeToSend < X minutes with a % of Total
orders
- #orders of TypeA where TimeToSend > X and < Y minutes with a % of
Total orders
- #orders of TypeA where TimeToSend >Y minutes with a % of Total
orders
- AVG TimeToSend for the day for TypeA orders
- #orders of TypeB where TimeToSend < X minutes with a % of Total
orders
- #orders of TypeB where TimeToSend > X and < Y minutes with a % of
Total orders
- #orders of TypeB where TimeToSend >Y minutes with a % of Total
orders
- AVG TimeToSend for the day for TypeB orders
I have achieved the above using two datasets and two tables in my
report aligned next to each other and sorted so the warehouses appear
next to each other.
I have been asked to add additional summary data for each warehouse.
Such as:
- Count of orders of TypeC
- Best AVG TimeToSend of all days (i.e. best day ever)
Now, I can get these new data items from more datasets and add more
tables to the report but am I going about this the right way. Ideally,
having more than one dataset per table would be great. Or if I could
link the tables somehow.
Should I look at trying to return all the data in one dataset? Is that
even possible? My sql is fairly good but I don't know how to combine
the disparate data together.
Any suggestions would be appreciated.Hi,
I am not sure whether I have understood you fully, however, I think you may
integrate the data in another new Table first or a temp table?
If you want to use multiple datasets, it is not a easy job as Data Regions
(Table, Matrix, Chart, and List) are bound to a single data set. You can
refer to other data sets but the reference must be wrapped in an aggregate
such as First(Fields!FieldName.Value, "DataSetX").
Your options are to write a single query, using SQL Server's OpenRowset
functionality, that joins the multiple data sources into a single data set.
The other choice is to use multiple side by side tables though this might
introduce formatting issues.
Sincerely yours,
Michael Cheng
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

help with complex SQL query

I need some help from prog. gurus with sql query.

I have a commodity table and I would like to pull information from it based on the conditions. I can write code to pull straight forward information from the table but I need help in writing query such that the later columns are based on the former columns.

To clear my question, I am writing the sample code below.
The column 4, 5 and 6 are based on column 1, 2, 3.
Column6 is column3/column2 (which is simple)
But column4 & 5 are too complex for me to write the code for. Column4 is the % of Quantity (cty_code)/(all countries) for that particular commodity. cty_code correspends to a country.

Table information:

[dbo].[2005exp](
[dom_or_for] [char](1) NOT NULL,
[commodity] [char](10) NOT NULL,
[cty_code] [char](4) NOT NULL,
[district] [char](2) NOT NULL,
[stat_month] [char](2) NOT NULL,
[cards_mo] [decimal](8, 0) NOT NULL,
[all_qy1_mo] [decimal](12, 0) NOT NULL,
[all_val_mo] [decimal](12, 0) NOT NULL
)


Table data snapshot:


Query1:
select top 10 commodity, sum(all_qy1_mo) as Quantity, sum(all_val_mo) as Price from [2005exp]
where cty_code=5310 group by Commodity order by price desc


Output:

Commodity

Quantity

Price

Column4Column5Price/Quantity

8517305000

0

46307629

8517905000

0

11990255

3003100000

2268

2687905

0.35%29.92%1185.1437

8524990000

148

1815000

8471300000

2591

1673570

9802400000

0

1560247

9880004000

0

1197407

8802300080

1

1100000

3819000000

601192

899417

9802200000

0

811996

Query2: (based on the results of query 1 i.e., commodity)
select Sum(all_qy1_mo) as Quantity, sum(all_val_mo) as Price from [2005exp]
where commodity=3003100000

Output:

Quantity

Price

645261

8982928

2268 / 645261 = 0.35% Column 4

2687905 / 8982928 = 29.92% Column 5

Quantity/Price = Average (2687906/2268 = 1185.1437) i.e. column3/column2=Column6

The final output should like: (There are additional 2 columns in the table below but they are not required)

select a.commodity, sum(a.all_qy1_mo) as Quantity, sum(a.all_val_mo) as Price,

CASE WHEN (select Sum(b.all_qy1_mo)from [2005exp] b

where a.commodity=b.commodity)<>0 THEN (sum(a.all_qy1_mo)/(select Sum(b.all_qy1_mo) from [2005exp] b

where a.commodity=b.commodity))*100.00 ELSE NULL END as col4, CASE WHEN (select sum(b.all_val_mo) from [2005exp] b where a.commodity=b.commodity)<>0 THEN (sum(a.all_val_mo)/(select sum(b.all_val_mo) from [2005exp] b where a.commodity=b.commodity))*100.00 ELSE NULL END as col5,

CASE WHEN sum(a.all_qy1_mo)<> 0 THEN sum(a.all_val_mo)/sum(a.all_qy1_mo) ELSE NULL END as col6

from [2005exp] a

WHERE a.cty_code=5310

GROUP BY a.commodity

|||

Thank you so much. The code works perfectly. I really appreciate all your help.

I wanted to add another column 7 which is column5/column4. How can I save the column5 and Column4 values in 2 different variables and then perform the division ?

I tried to use just the alias names i.e col5/col4, but its not working.

I tried using the actual code for both columns and its working but its taking 4 minutes to execute the query. Its repeating the same steps twice. How is it done right way ?

Also, I was trying to delcare a variable for the table name but without any success. Who is it done ?

Use test
Go

Declare @.Country Int
Set @.Country=5310

select top 10 a.commodity as Commodity, c.descrip_1 as Description, c.quantity_1 as Unit,

sum(a.all_qy1_mo) as Quantity, sum(a.all_val_mo) as [Value],

CASE WHEN (select Sum(b.all_qy1_mo)from [2005exp] b where a.commodity=b.commodity)<>0 THEN (sum(a.all_qy1_mo)/(select Sum(b.all_qy1_mo) from [2005exp] b where a.commodity=b.commodity))*100.00 ELSE NULL END as [U.S. Share(Quantity) %],

CASE WHEN (select sum(b.all_val_mo) from [2005exp] b where a.commodity=b.commodity)<>0 THEN (sum(a.all_val_mo)/(select sum(b.all_val_mo) from [2005exp] b where a.commodity=b.commodity))*100.00 ELSE NULL END as [U.S. Share(Value) %],

CASE WHEN sum(a.all_qy1_mo)<> 0 THEN sum(a.all_val_mo)/sum(a.all_qy1_mo) ELSE NULL END as [Average Price],

CASE WHEN (select Sum(b.all_qy1_mo)from [2005exp] b where a.commodity=b.commodity)<>0 THEN
(((sum(a.all_val_mo)/(select sum(b.all_val_mo) from [2005exp] b where a.commodity=b.commodity)))/((sum(a.all_qy1_mo)/(select Sum(b.all_qy1_mo) from [2005exp] b where a.commodity=b.commodity)))) ELSE NULL END as [Price Ratio]

from [2005exp] a inner join concord c on a.commodity=c.commodity

WHERE a.cty_code=@.Country
GROUP BY a.commodity, c.descrip_1, c.quantity_1 order by [Value] desc

Help With complex SELECT

Can someone help me with an advanced query?

I have two tables

Table 1: Employees
Cols: UID, lname, fname

Table 2: StatEntry
Cols: UID, Timestamp, description

The queary should reaturn all the information in both tables. If more
than one entry exists in the second table, it should return the one
with the greatest timestamp. If not entries exist I would like the
second table columns set to "no value"

Something link:
select Employees.*,StatEntry.* from Employees JOIN StatEntry ON
employees.uid == statentry.uid WHERE timestamp in (select
MAX(timestamp) from statentry where uid=employees.uid).

Anyone db guru's out there?
-SJOK, I am close.

Here is what I have

SELECT a.*,b.* from employees as a LEFT JOIN statentry as b ON
a.uid=b.uid
WHERE b.timestamp IN (select MAX(timestamp) from statentry where
uid=b.uid)
OR b.timestamp IS NULL

This returns me all the values, but for some rease UID in the result
set is allways null. ANy idea why?

THanks,
-SJ

On Jan 30, 6:00 pm, "SJ" <sjour...@.gmail.comwrote:

Quote:

Originally Posted by

Can someone help me with an advanced query?
>
I have two tables
>
Table 1: Employees
Cols: UID, lname, fname
>
Table 2: StatEntry
Cols: UID, Timestamp, description
>
The queary should reaturn all the information in both tables. If more
than one entry exists in the second table, it should return the one
with the greatest timestamp. If not entries exist I would like the
second table columns set to "no value"
>
Something link:
select Employees.*,StatEntry.* from Employees JOIN StatEntry ON
employees.uid == statentry.uid WHERE timestamp in (select
MAX(timestamp) from statentry where uid=employees.uid).
>
Anyone db guru's out there?
-SJ

|||You could try something like:

SELECT a.*, b.*
FROM employees as a
LEFT OUTER
JOIN (SELECT *
FROM statentry as X
WHERE X.timestamp =
(select max(timestamp)
from statentry as Y
where X.uid = Y.uid)) as b
ON a.uid = b.uid

Roy Harvey
Beacon Falls, CT

On 30 Jan 2007 17:49:11 -0800, "SJ" <sjourdan@.gmail.comwrote:

Quote:

Originally Posted by

>OK, I am close.
>
>Here is what I have
>
>SELECT a.*,b.* from employees as a LEFT JOIN statentry as b ON
>a.uid=b.uid
>WHERE b.timestamp IN (select MAX(timestamp) from statentry where
>uid=b.uid)
>OR b.timestamp IS NULL
>
>
>This returns me all the values, but for some rease UID in the result
>set is allways null. ANy idea why?
>
>THanks,
>-SJ
>
>On Jan 30, 6:00 pm, "SJ" <sjour...@.gmail.comwrote:

Quote:

Originally Posted by

>Can someone help me with an advanced query?
>>
>I have two tables
>>
>Table 1: Employees
>Cols: UID, lname, fname
>>
>Table 2: StatEntry
>Cols: UID, Timestamp, description
>>
>The queary should reaturn all the information in both tables. If more
>than one entry exists in the second table, it should return the one
>with the greatest timestamp. If not entries exist I would like the
>second table columns set to "no value"
>>
>Something link:
>select Employees.*,StatEntry.* from Employees JOIN StatEntry ON
>employees.uid == statentry.uid WHERE timestamp in (select
>MAX(timestamp) from statentry where uid=employees.uid).
>>
>Anyone db guru's out there?
>-SJ


>

|||On Jan 30, 7:49 pm, "SJ" <sjour...@.gmail.comwrote:

Quote:

Originally Posted by

OK, I am close.
>
Here is what I have
>
SELECT a.*,b.* from employees as a LEFT JOIN statentry as b ON
a.uid=b.uid
WHERE b.timestamp IN (select MAX(timestamp) from statentry where
uid=b.uid)
OR b.timestamp IS NULL
>
This returns me all the values, but for some rease UID in the result
set is allways null. ANy idea why?
>
THanks,
-SJ
>
On Jan 30, 6:00 pm, "SJ" <sjour...@.gmail.comwrote:
>
>
>

Quote:

Originally Posted by

Can someone help me with an advanced query?


>

Quote:

Originally Posted by

I have two tables


>

Quote:

Originally Posted by

Table 1: Employees
Cols: UID, lname, fname


>

Quote:

Originally Posted by

Table 2: StatEntry
Cols: UID, Timestamp, description


>

Quote:

Originally Posted by

The queary should reaturn all the information in both tables. If more
than one entry exists in the second table, it should return the one
with the greatest timestamp. If not entries exist I would like the
second table columns set to "no value"


>

Quote:

Originally Posted by

Something link:
select Employees.*,StatEntry.* from Employees JOIN StatEntry ON
employees.uid == statentry.uid WHERE timestamp in (select
MAX(timestamp) from statentry where uid=employees.uid).


>

Quote:

Originally Posted by

Anyone db guru's out there?
-SJ- Hide quoted text -


>
- Show quoted text -


Thank you very very much!|||On 31 Jan 2007 10:00:39 -0800, "SJ" <sjourdan@.gmail.comwrote:

Quote:

Originally Posted by

>Thank you so much, This is perfect. However, when I do dbUID =
>result.Fields.Item("uid") always get a null
>If I do for each over the result set, it comes back fine. Any ideas
>on that? should it be someting like dbUID =
>result.Fields.Item("a.uid")?
>
>Thanks again.


Try removing the quotes. Use result.Fields.Item(uid) rather than
result.Fields.Item("uid"), result.Fields.Item(a.uid) rather than
result.Fields.Item("a.uid").

Roy Harvey
Beacon Falls, CT

Wednesday, March 28, 2012

Help with complex query

Hi Everyone,

I need help writing the following query. I have to group my data by Department and have a field that will calculation the number of minutes that employee worked in that department. So basically I take the total number of minutes worked in the department and divide it by the total number of minutes the employee worked for the specified date range.

--

Agent Name: John Doe

Date Range: 1/1/2007 - 6/30/2007

RowID Work Minutes in Dept Total Work Minutes Dept

1 26355 52920 Service

2 9000 52920 Parts

3 17565 52920 Dispatch

Service = 26355 / 52920 = 0.499 = 50%

Parts = 9000 / 52920 = 0.17 = 17%

Dispatch= 17565 / 52920 = 0.33 = 33%

--

How can I accomplish this?

I am using SQL Server 2005 Express

Thank You

Assuming your table is the grouped sum's by department:

Code Snippet

create table #t1 (RowID int, [Work Minutes] int, [Total Work Minutes] int, Dept varchar(20) )

insert into #t1

select 1, 26355, 52920, 'Service'

union all select 2, 9000, 52920, 'Parts'

union all select 3, 17565, 52920, 'Dispatch'

select Dept, ' = ' + convert(varchar(15), [Work Minutes]) + ' / ' + convert(varchar(15), [Total Work Minutes]),

round(([Work Minutes]*100.00)/[Total Work Minutes], 0) as 'Percentage'

from #t1

|||

DaleJ,

The table data is not grouped. Thats what makes this query complex.

|||

SamCosta wrote:

DaleJ,

The table data is not grouped. Thats what makes this query complex.

Can you right click the tables that you are using and choose "Script Table As..." and "Create To.." and post those back here. Once we have your structure we can help further.|||

And some additional sample data.

It's not that difficult, but would like to get it right the first time (or two )

|||

Code Snippet

SELECT E.EMPID,

E.DEPTID,

CMS.Productivity(CMS.TrueCalls(SUM(D.ti_stafftime), SUM(D.ti_availtime), SUM(D.acdcalls)), SUM(D.acdcalls),
Agent.AbsentPercentage(SUM(CONVERT(int, A.TOTALABSENT)), SUM(CONVERT(int, A.TOTALWORKMI))), E.DEPTID) AS AvgPLevel,

SUM(CONVERT(int, A.TOTALWORKMI)) AS DEPTWORKMI

FROM CMS.dAgent AS D INNER JOIN dbo.EMP_DEPT_ASSOC AS E ON D.EmployeeID = E.EMPID AND D.row_date >= E.STARTDATE AND D.row_date

<= E.STOPDATE INNER JOIN EmpAbsents As A ON D.row_date = A.ROW_DATE AND D.EmployeeID = A.EMPID

WHERE (D.row_date BETWEEN @.FromDate AND @.ToDate) AND (D.EmployeeID = @.EmpID)


GROUP BY E.ID, E.DEPTID

This query outputs:

EmpID DeptID AvgPLevel DeptWorkMI

28899 Service 2 17244

28899 Parts 3 9000

28899 Dispatch 1 27836

I need to then group the query results by EmpID to get a total Average Productivity Level

Result:

EmpID AvgPLevel

28899 2

How to calculate total productivity level:

(2 * 17244 / 54080) + (3 * 9000 / 54080) + (1 * 27836 / 54080) = 1.65 = Level 2

54080 is the total number of minutes worked in ALL departs. (17244 + 9000 + 27836 = 54080)

Thank You

|||

See if this does what you need:

Code Snippet

;WITH base

AS

(

SELECT E.EMPID,

E.DEPTID,

CMS.Productivity(CMS.TrueCalls(SUM(D.ti_stafftime), SUM(D.ti_availtime), SUM(D.acdcalls)), SUM(D.acdcalls),

Agent.AbsentPercentage(SUM(CONVERT(int, A.TOTALABSENT)), SUM(CONVERT(int, A.TOTALWORKMI))), E.DEPTID) AS AvgPLevel,

SUM(CONVERT(int, A.TOTALWORKMI)) AS DEPTWORKMI

FROM CMS.dAgent AS D INNER JOIN dbo.EMP_DEPT_ASSOC AS E ON D.EmployeeID = E.EMPID AND D.row_date >= E.STARTDATE AND D.row_date

<= E.STOPDATE INNER JOIN EmpAbsents As A ON D.row_date = A.ROW_DATE AND D.EmployeeID = A.EMPID

WHERE (D.row_date BETWEEN @.FromDate AND @.ToDate) AND (D.EmployeeID = @.EmpID)

GROUP BY E.ID, E.DEPTID

),

Totals

AS

(

SELECT EmpID, SUM(DeptWorkMi) AS TotalMinutes

FROM base

GROUP BY EmpID

)

SELECT b.EmpID, ROUND(SUM(1.0 * b.AvgPLevel * b.DeptWorkMI / t.TotalMinutes), 0) as AvgPLevel

FROM base b

INNER JOIN Totals AS t

ON b.EmpID = t.EmpID

GROUP BY b.EmpID

|||

Thank you DaleJ. I was able to solve this problem using the CTE query example you provided.

Help with complex query

Hi Everyone,

I need some help writing a query that joins a table to a UNION query. I was wondering what is the most efficent way to do this.

Tables

Employees (EMPID, FULLNAME)

DailySchedules (SCHID,EMPID, SCHDATE,DEPTID)

GeneralSegments (GSID,EMPID,STARTTIME,STOPTIME)

DetailSegments (DSID,EMPID,STARTTIME,STOPTIME)

I need to join the records from GeneralSegments and DetailSegments THEN inner join DailySchedules and Employees.

Query must output:

EMPID, FULLNAME, SCHDATE,STARTTIME,STOPDTIME

Thank You

Can you provide some sample data (preferrably in the form of insert statements) along with a data representation of what you want for output?

Thanks.

|||

Untested, but, it should give you an idea

Code Snippet


SELECT Segments.EMPID,
Employees.FULLNAME,
DailySchedules.SCHDATE,
Segments.STARTTIME,
Segments.STOPDTIME

FROM

(
SELECT EMPID, STARTTIME, STOPTIME FROM GeneralSegments
UNION
SELECT EMPID, STARTTIME, STOPTIME FROM DetailSegments
) Segments
INNER JOIN Employees
ON Employees.EMPID = Segments.EMPID
INNER JOIN DailySchedules
ON DailySchedules.EMPID = Segments.EMPID

|||

Hi David,

That example will work. One question though, I have to display the next 14 days (2 weeks) on the web. Where should I place my WHERE clause? In the outer query or include the date range in both union statements?

ie. WHERE SCHDATE BETWEEN GETDATE() AND GETDATE() + 14

Thank You

|||

Is SCHDATE column present in the tables in the DailySchedule table or the segments tables? In any case, it doesn't matter. SQL Server will automatically roll the schdate predicate into the inner queries as well based on their reference. For example, if you want to filter on STARTTIME then you can include just one WHERE clause like below and it will be applied to both GeneralSegments & DetailSegments

Code Snippet

SELECT Segments.EMPID,
Employees.FULLNAME,
d.SCHDATE,
Segments.STARTTIME,
Segments.STOPDTIME

FROM

(
SELECT EMPID, STARTTIME, STOPTIME FROM GeneralSegments
UNION
SELECT EMPID, STARTTIME, STOPTIME FROM DetailSegments
) Segments
INNER JOIN Employees
ON Employees.EMPID = Segments.EMPID
INNER JOIN DailySchedules
ON DailySchedules.EMPID = Segments.EMPID
WHERE Segments.STARTTIME >= @.Start

sql

Friday, March 23, 2012

Help with accessing data from a webform please...

Ok,
I have a web app that calls several very complex (at least to me) sql
queries like this for example:
SELECT t1.MemberId, t1.PeriodID,
SUM(CASE WHEN t2.amountTypeId = 7 THEN t2.amount WHEN
t2.amountTypeId = 23 THEN -t2.amount END) AS Purchase,
SUM(CASE WHEN t2.amountTypeId = 8 THEN t2.amount WHEN
t2.amountTypeId = 24 THEN -t2.amount END) AS Matrix,
SUM(CASE WHEN t2.amountTypeId = 20 THEN t2.amount WHEN
t2.amountTypeId = 21 THEN -t2.amount END) AS QualiFly,
SUM(CASE WHEN t2.amountTypeId = 9 THEN t2.amount
WHEN t2.amountTypeId = 25 THEN -t2.amount END) AS Dist,
SUM(CASE WHEN t2.amountTypeId = 10 THEN t2.amount WHEN
t2.amountTypeId = 26 THEN -t2.amount END) AS SM,
SUM(CASE WHEN t2.amountTypeId = 11 THEN t2.amount
WHEN t2.amountTypeId = 27 THEN -t2.amount END) AS BreakAway,
SUM(CASE WHEN t2.amountTypeId = 13 THEN t2.amount WHEN
t2.amountTypeId = 14 THEN -t2.amount END) AS Transfer,
SUM(CASE WHEN t2.amountTypeId = 28 THEN t2.amount WHEN
t2.amountTypeId = 15 THEN -t2.amount END) AS Spent
FROM tblTravelDetail t1 INNER JOIN
tblTravelDetailAmount t2 ON t1.TravelDetailId =
t2.TravelDetailId INNER JOIN
tblTravelDetail t3 ON t2.TravelDetailId =
t3.TravelDetailId INNER JOIN
tblTravelDetailMember t4 ON t3.TravelDetailId =
t4.TravelDetailId INNER JOIN
tblTravelEvent t5 ON t1.TravelEventId =
t5.TravelEventId INNER JOIN
amount_type t6 ON t2.amountTypeId =
t6.amount_type_id
WHERE (t1.MemberId = 12391) AND (t2.amount <> 0)
GROUP BY t1.MemberId, t1.PeriodID...
It is so slow on our very fast servers. Would an XML connection with
the "SQL XML Support in IIS" be an option? How can I get this view to
be faster for users to get over the net?
Thanks for any help,
Trinttrint
Have you defined any indexes on the table?
Have you seen an execution plan of the query?
"trint" <trinity.smith@.gmail.com> wrote in message
news:1123496340.977863.196110@.g47g2000cwa.googlegroups.com...
> Ok,
> I have a web app that calls several very complex (at least to me) sql
> queries like this for example:
> SELECT t1.MemberId, t1.PeriodID,
> SUM(CASE WHEN t2.amountTypeId = 7 THEN t2.amount WHEN
> t2.amountTypeId = 23 THEN -t2.amount END) AS Purchase,
> SUM(CASE WHEN t2.amountTypeId = 8 THEN t2.amount WHEN
> t2.amountTypeId = 24 THEN -t2.amount END) AS Matrix,
> SUM(CASE WHEN t2.amountTypeId = 20 THEN t2.amount WHEN
> t2.amountTypeId = 21 THEN -t2.amount END) AS QualiFly,
> SUM(CASE WHEN t2.amountTypeId = 9 THEN t2.amount
> WHEN t2.amountTypeId = 25 THEN -t2.amount END) AS Dist,
> SUM(CASE WHEN t2.amountTypeId = 10 THEN t2.amount WHEN
> t2.amountTypeId = 26 THEN -t2.amount END) AS SM,
> SUM(CASE WHEN t2.amountTypeId = 11 THEN t2.amount
> WHEN t2.amountTypeId = 27 THEN -t2.amount END) AS BreakAway,
> SUM(CASE WHEN t2.amountTypeId = 13 THEN t2.amount WHEN
> t2.amountTypeId = 14 THEN -t2.amount END) AS Transfer,
> SUM(CASE WHEN t2.amountTypeId = 28 THEN t2.amount WHEN
> t2.amountTypeId = 15 THEN -t2.amount END) AS Spent
> FROM tblTravelDetail t1 INNER JOIN
> tblTravelDetailAmount t2 ON t1.TravelDetailId =
> t2.TravelDetailId INNER JOIN
> tblTravelDetail t3 ON t2.TravelDetailId =
> t3.TravelDetailId INNER JOIN
> tblTravelDetailMember t4 ON t3.TravelDetailId =
> t4.TravelDetailId INNER JOIN
> tblTravelEvent t5 ON t1.TravelEventId =
> t5.TravelEventId INNER JOIN
> amount_type t6 ON t2.amountTypeId =
> t6.amount_type_id
> WHERE (t1.MemberId = 12391) AND (t2.amount <> 0)
> GROUP BY t1.MemberId, t1.PeriodID...
> It is so slow on our very fast servers. Would an XML connection with
> the "SQL XML Support in IIS" be an option? How can I get this view to
> be faster for users to get over the net?
> Thanks for any help,
> Trint
>|||I have not defined any indexes on the table ( none are defined ). The
tables are created by another department, but I have exported all of
them as 'copies' to my local machine for my c# app development. I am
willing to go for it though and then present this, if faster, to my
boss to get the changes in the real instances of SQL Server. I will
research on how to do this now.
Any help on making this faster (or easier) is appreciated.
Thanks,
Trintsql

Monday, March 19, 2012

Help with a confusing and advanced query - If Exsists?

Ok,

I need to select a list of products based on a complex criteria. First off, here is a list of the tables I am using and their function.

Dbo.Products – This table holds product names and information.

Dbo.Products_Attributes – This table holds a list of attributes for each product. It has a many to 1 relationship with Products. This table holds only 2 fields – ProductID and AttributeID

Dbo.Customers – This table holds basic customer information and some of the needed criteria for the product search.

Dbo.Customers_Attributes – This table contains a list of attributes that the customer needs in his/her products. This table has a many to 1 relationship with Customers and only has 2 fields, CustomerNum and AttributeID

Dbo.Attributes – This table contains all of the different attributes possible for our products to have. Each can be used by a customer when choosing criteria.

Background: Each attribute is a True / False. Either they want that attribute, or they do not care if they get it or not. This is where the hard part comes in. If a particular attribute is listed as needed in dbo.Customers_Attributes, then I do not want any records pulled from Products that DOES NOT have this attribute. On the other hand, if the attribute is NO listed in the customers_attributes list, it is assumed that the customer wants it. In other words, if there were NO attributes in the customers_attributes list, ALL products would be returned. Attributes listed in the customers_attributes table are "required" (no product will be returned that does not have what is in the list).

I hope I was clear enough… I had to change the table names a functions slightly because of some stupid policy about giving out too much proprietary information here.

If any of you know how to do the above with a subquery of some kind, please let me know.

Dave LarsonI'm not sure that I catch your task properly, but try something like that:

SELECT ProductId, ProductName
FROM dbo.Products
WHERE (NOT (ProductId IN
(SELECT dbo.Products.ProductId
FROM dbo.Products INNER JOIN
dbo.Product_Attributes ON dbo.Products.ProductId = dbo.Product_Attributes.Product
WHERE (dbo.Product_Attributes.Attribute IN
(SELECT dbo.Castomer_Attributes.Attribute
FROM dbo.Castomer_Attributes INNER JOIN
dbo.Customers ON dbo.Castomer_Attributes.Customer = dbo.Customers.CustomerId
WHERE (dbo.Customers.CustomerId = @.CustomerId))))))

where @.CustomerId is a parameter.

Help with a complex UPDATE query

Well, I think it's complex anyway -- you might not :)

TableDef:
CREATE TABLE CustTransactions (
TransactionKey int IDENTITY(1,1) NOT NULL,
CustomerID int,
AmountSpent float,
CustSelected bit default 0);

TransactionKey is the primary key, CustomerID and AmountSpent are both
indexed (non unique).

What I would like to do is, for all of the records in descending order
of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE
such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000).

What I'm doing at the moment is a "SELECT * FROM CustTransactions WHERE
CustSelected = TRUE ORDER BY AmountSpent;", programatically looping
through all the records until AmountSpent 50000, then continuine to
loop through the remainder of the records setting CustSelected = FALSE.
This does exactly what I want but is slow and inefficient. I am sure it
could be done in a single SQL statement with subqueries, but I lack the
knowledge and experience to figure out how.

The closest I can get is:-

UPDATE CustTransactions SET CustSelected = FALSE
WHERE (CustSelected = TRUE)
AND TransactionKey NOT IN
(SELECT TOP 50000 TransactionKey FROM CustTransactions WHERE
(((CustTransactions.CustSelected)=TRUE))
ORDER BY AmountSpect DESC, TransactionKey ASC);

However, this mereley ensures only the top 50,000 customers by amount
spent remain "selected", not the top "X" customers whose total spend
is $50,000. I really need to replace the "SELECT TOP 50000" with some
form of "SELECT TOP (X rows until sum(AmountSpent) =50000)".

Is it even possible to achieve what I'm trying to do?

Thanks in advance for any assistance offered!
--
SlowerThanYouHi,

Consider the following sample data:

INSERT INTO CustTransactions VALUES (1, 1000, 0)
INSERT INTO CustTransactions VALUES (2, 1000, 1)
INSERT INTO CustTransactions VALUES (2, 2500, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (3, 30000, 1)
INSERT INTO CustTransactions VALUES (3, 17000, 1)

What is the expected result (the output of SELECT * FROM
CustTransactions) ?

Also consider this sample data:

INSERT INTO CustTransactions VALUES (1, 10000, 0)
INSERT INTO CustTransactions VALUES (2, 20000, 1)
INSERT INTO CustTransactions VALUES (2, 25000, 0)
INSERT INTO CustTransactions VALUES (2, 2500, 0)

What is the expected result in this case ?

Razvan

Slower Than You wrote:

Quote:

Originally Posted by

Well, I think it's complex anyway -- you might not :)
>
TableDef:
CREATE TABLE CustTransactions (
TransactionKey int IDENTITY(1,1) NOT NULL,
CustomerID int,
AmountSpent float,
CustSelected bit default 0);
>
TransactionKey is the primary key, CustomerID and AmountSpent are both
indexed (non unique).
>
What I would like to do is, for all of the records in descending order
of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE
such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000).
>
What I'm doing at the moment is a "SELECT * FROM CustTransactions WHERE
CustSelected = TRUE ORDER BY AmountSpent;", programatically looping
through all the records until AmountSpent 50000, then continuine to
loop through the remainder of the records setting CustSelected = FALSE.
This does exactly what I want but is slow and inefficient. I am sure it
could be done in a single SQL statement with subqueries, but I lack the
knowledge and experience to figure out how.
>
The closest I can get is:-
>
UPDATE CustTransactions SET CustSelected = FALSE
WHERE (CustSelected = TRUE)
AND TransactionKey NOT IN
(SELECT TOP 50000 TransactionKey FROM CustTransactions WHERE
(((CustTransactions.CustSelected)=TRUE))
ORDER BY AmountSpect DESC, TransactionKey ASC);
>
However, this mereley ensures only the top 50,000 customers by amount
spent remain "selected", not the top "X" customers whose total spend
is $50,000. I really need to replace the "SELECT TOP 50000" with some
form of "SELECT TOP (X rows until sum(AmountSpent) =50000)".
>
Is it even possible to achieve what I'm trying to do?
>
Thanks in advance for any assistance offered!
--
SlowerThanYou

|||You do not have a table at all; it is an attempt to mimic a deck of
punch cards. You confuse columns and fields, rows and records and use
the wrong data types.

do these transactions create a customer or a sale? Why is there DDL in
narratives? Why did youn use an IDENTITY columns? Why FLOAT for money?

CREATE TABLE SalesTransactions
(sales_nbr INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL
REFERENCES Customers (customer_id),
sales_amt DECIMAL(12,2) NOT NULL);

Quote:

Originally Posted by

Quote:

Originally Posted by

>What I would like to do is, for all of the records [sic] in descending order of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE


such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000). <<

Did you know that SQL has no Boolean data type? That using BIT is
proprietary and an awful coding practice? We updated punch cards like
you are doing because we had no choice about it.

Quote:

Originally Posted by

Quote:

Originally Posted by

>What I'm doing at the moment is a "SELECT * FROM CustTransactions WHERE CustSelected = TRUE ORDER BY AmountSpent;", programatically looping through all the records [sic] until AmountSpent 50000, then continuine to loop through the remainder of the records [sic] setting CustSelected = FALSE.


You did not say what to do about ties; if I have five sales of
$50,000.00 which one would I mark?

give us a RELATIONAL spec and we can probably help you|||Razvan Socol wrote:

Quote:

Originally Posted by

Consider the following sample data:
>
INSERT INTO CustTransactions VALUES (1, 1000, 0)
INSERT INTO CustTransactions VALUES (2, 1000, 1)
INSERT INTO CustTransactions VALUES (2, 2500, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (3, 30000, 1)
INSERT INTO CustTransactions VALUES (3, 17000, 1)
>
What is the expected result (the output of SELECT * FROM
CustTransactions) ?


Hi Razvan,

Thanks for responding. The expected result for the above sample data
would be:-

1, 1000, 0
2, 1000, 0
2, 2500, 0
1, 1000, 1
1, 1000, 1
3, 30000, 1
3, 17000, 1

To clarify this:-

1) The first row is completely ignored because its CustSelected field
is FALSE (as would be any other records where CustSelected = 0)

2) The rows WHERE CustSelected = 1 are sorted in descending order of
AmountSpent (where two or more records have equal values for
AmountSpent, the ordered of them is arbitrary - I don't care).

3) Any rows that would cause the sum of AmountSpent WHERE CustSelected
= 1 to exceed our selection criteria ($50,000) have their
CustSelected value set to 0.

Quote:

Originally Posted by

>
Also consider this sample data:
>
INSERT INTO CustTransactions VALUES (1, 10000, 0)
INSERT INTO CustTransactions VALUES (2, 20000, 1)
INSERT INTO CustTransactions VALUES (2, 25000, 0)
INSERT INTO CustTransactions VALUES (2, 2500, 0)
>
What is the expected result in this case ?


Assuming our "target" figure is 50000 again:-

1, 10000, 0
2, 20000, 1
2, 25000, 0
2, 2500, 0

The three records where CustSelected = 0 are ignored. As a possible
point of additional interest, if the target figure was less than 20000
then row two would have had its CustSelected column set to 0 (because
this would have caused the "target" figure to be exceeded.

I hope I've done a better job of explaining my requirement this time
around!
--
SlowerThanYou|||--CELKO-- wrote:

Quote:

Originally Posted by

You do not have a table at all; it is an attempt to mimic a deck of
punch cards. You confuse columns and fields, rows and records and use
the wrong data types.
>
do these transactions create a customer or a sale? Why is there DDL
in narratives? Why did youn use an IDENTITY columns? Why FLOAT for
money?


Forget about the datatypes; they are largely irrelevant to the problem
I am trying to solve. I have abstracted the problem to attempt to make
it as easy to explain as possible. The real table I am trying to update
is, in fact, not called CustTransactions and has nothing to do with
"customers" and it does, in fact, have a non-monetary floating point
value that is the focus of my update. You are reading more than I
intended into the column names I've used in my example.

Quote:

Originally Posted by

Did you know that SQL has no Boolean data type? That using BIT is
proprietary and an awful coding practice? We updated punch cards like
you are doing because we had no choice about it.


No, I didn't know that SQL has no boolean data type, and that BIT is
proprietary, so thanks for that information. You can pretend it is an
integer type if you prefer. Again, do not read anything into the table
and field names I have used in my abstract example - just assume that
there is a True/False type flag that I need to record for each row
according to the critera I outlined.

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

What I'm doing at the moment is a "SELECT * FROM CustTransactions


WHERE CustSelected = TRUE ORDER BY AmountSpent;", programatically
looping through all the records [sic] until AmountSpent 50000, then
continuine to loop through the remainder of the records [sic] setting
CustSelected = FALSE.
>
You did not say what to do about ties; if I have five sales of
$50,000.00 which one would I mark?
>
give us a RELATIONAL spec and we can probably help you


Please have a look at my reply to Razvan, which I hope describes the
problem I am trying to solve more accurately than my previous post
(which was not as coherent as it might have been, for which I
apologise).

--
SlowerThanYou|||"Slower Than You" <no.way@.josewrote in
news:1163790273.8354.0@.iris.uk.clara.net:

Quote:

Originally Posted by

Well, I think it's complex anyway -- you might not :)
>
TableDef:
CREATE TABLE CustTransactions (
TransactionKey int IDENTITY(1,1) NOT NULL,
CustomerID int,
AmountSpent float,
CustSelected bit default 0);
>
TransactionKey is the primary key, CustomerID and AmountSpent are both
indexed (non unique).
>
What I would like to do is, for all of the records in descending order
of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE
such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000).
>
What I'm doing at the moment is a "SELECT * FROM CustTransactions WHERE
CustSelected = TRUE ORDER BY AmountSpent;", programatically looping
through all the records until AmountSpent 50000, then continuine to
loop through the remainder of the records setting CustSelected = FALSE.
This does exactly what I want but is slow and inefficient. I am sure it
could be done in a single SQL statement with subqueries, but I lack the
knowledge and experience to figure out how.
>
The closest I can get is:-
>
UPDATE CustTransactions SET CustSelected = FALSE
WHERE (CustSelected = TRUE)
AND TransactionKey NOT IN
(SELECT TOP 50000 TransactionKey FROM CustTransactions WHERE
(((CustTransactions.CustSelected)=TRUE))
ORDER BY AmountSpect DESC, TransactionKey ASC);
>
However, this mereley ensures only the top 50,000 customers by amount
spent remain "selected", not the top "X" customers whose total spend
is $50,000. I really need to replace the "SELECT TOP 50000" with some
form of "SELECT TOP (X rows until sum(AmountSpent) =50000)".
>
Is it even possible to achieve what I'm trying to do?


See example 4 (cumulative sum) of
http://www.databasejournal.com/feat...10894_3373861_2
HTH

--
For e-mail address, remove the XXs|||Slower Than You wrote:

Quote:

Originally Posted by

Razvan Socol wrote:
>

Quote:

Originally Posted by

Consider the following sample data:

INSERT INTO CustTransactions VALUES (1, 1000, 0)
INSERT INTO CustTransactions VALUES (2, 1000, 1)
INSERT INTO CustTransactions VALUES (2, 2500, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (3, 30000, 1)
INSERT INTO CustTransactions VALUES (3, 17000, 1)

What is the expected result (the output of SELECT * FROM
CustTransactions) ?


>
Hi Razvan,
>
Thanks for responding. The expected result for the above sample data
would be:-
>
1, 1000, 0
2, 1000, 0
2, 2500, 0
1, 1000, 1
1, 1000, 1
3, 30000, 1
3, 17000, 1


The above result has a sum of 49000. From your narrative, I would
expect a result which has a sum of 49500, for example this:

1, 1000, 0
2, 1000, 0
2, 2500, 1
1, 1000, 0
1, 1000, 0
3, 30000, 1
3, 17000, 1

Which one is the correct result ?

Razvan|||Razvan Socol wrote:

Quote:

Originally Posted by

>
Slower Than You wrote:

Quote:

Originally Posted by

Razvan Socol wrote:

Quote:

Originally Posted by

Consider the following sample data:
>
INSERT INTO CustTransactions VALUES (1, 1000, 0)
INSERT INTO CustTransactions VALUES (2, 1000, 1)
INSERT INTO CustTransactions VALUES (2, 2500, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (3, 30000, 1)
INSERT INTO CustTransactions VALUES (3, 17000, 1)
>
What is the expected result (the output of SELECT * FROM
CustTransactions) ?


Hi Razvan,

Thanks for responding. The expected result for the above sample data
would be:-

1, 1000, 0
2, 1000, 0
2, 2500, 0
1, 1000, 1
1, 1000, 1
3, 30000, 1
3, 17000, 1


>
The above result has a sum of 49000. From your narrative, I would
expect a result which has a sum of 49500, for example this:
>
1, 1000, 0
2, 1000, 0
2, 2500, 1
1, 1000, 0
1, 1000, 0
3, 30000, 1
3, 17000, 1
>
Which one is the correct result ?


You are absolutely right - I was a little to hasty in putting my
response together. The sum of 49500 is correct.
--
SlowerThanYou|||Chris Cheney wrote:

Quote:

Originally Posted by

"Slower Than You" <no.way@.josewrote in
news:1163790273.8354.0@.iris.uk.clara.net:
>

Quote:

Originally Posted by

Well, I think it's complex anyway -- you might not :)

TableDef:
CREATE TABLE CustTransactions (
TransactionKey int IDENTITY(1,1) NOT NULL,
CustomerID int,
AmountSpent float,
CustSelected bit default 0);

TransactionKey is the primary key, CustomerID and AmountSpent are
both indexed (non unique).

What I would like to do is, for all of the records in descending
order of "AmountSpent" where "CustSelected = TRUE", set
CustSelected to FALSE such that the sum of all the AmountSpent
records with CustSelected = TRUE is no greater than a specified
amount (say $50,000).

What I'm doing at the moment is a "SELECT * FROM CustTransactions
WHERE CustSelected = TRUE ORDER BY AmountSpent;", programatically
looping through all the records until AmountSpent 50000, then
continuine to loop through the remainder of the records setting
CustSelected = FALSE. This does exactly what I want but is slow
and inefficient. I am sure it could be done in a single SQL
statement with subqueries, but I lack the knowledge and experience
to figure out how.

The closest I can get is:-

UPDATE CustTransactions SET CustSelected = FALSE
WHERE (CustSelected = TRUE)
AND TransactionKey NOT IN
(SELECT TOP 50000 TransactionKey FROM CustTransactions WHERE
(((CustTransactions.CustSelected)=TRUE))
ORDER BY AmountSpect DESC, TransactionKey ASC);

However, this mereley ensures only the top 50,000 customers by
amount spent remain "selected", not the top "X" customers whose
total spend is $50,000. I really need to replace the "SELECT TOP
50000" with some form of "SELECT TOP (X rows until sum(AmountSpent)
=50000)".

Is it even possible to achieve what I'm trying to do?


>
See example 4 (cumulative sum) of
http://www.databasejournal.com/feat...hp/10894_337386
1_2
>
HTH


Ahah! That helped enormously - thanks, much appreciated.
--
SlowerThanYou|||>No, I didn't know that SQL has no boolean data type, and that BIT is proprietary, so thanks for that information. You can pretend it is an integer type if you prefer. Again, do not read anything into the table and field [sic] names .. <<

O)kay. You have SERIOUS conceptual problems with SQL and RDBMS. The
reason that SQL has no BOOLEAN data types is one of those "mathematical
foundations" things that has to do with NULLs, 3-valued logic and
logic. In 25 words or less, we discover a state of being via
predicates rather than by looking for a flag.

In procedural, step-by-step file system models you set flags in step
(n) to pass control information to step (n+1) of the process. In the RM
model, multiple users can change the basic facts of a schema and thus
the criteria of the subset, so we do not store computed columns. You
compute subset membership at run time.

Fields have mean because of the program that reads them; columns have a
domain, a value and constraints in the schema -- totally separate from
any program that uses them -- which give them meaning.

It does not matter if you use a Standard data type; you are still not
programming with relational data model. Think in terms of predicates,
sets and declarations, not flags, sequences and procedures.|||--CELKO-- wrote:

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

No, I didn't know that SQL has no boolean data type, and that BIT


is proprietary, so thanks for that information. You can pretend it is
an integer type if you prefer. Again, do not read anything into the
table and field [sic] names .. <<
>
O)kay. You have SERIOUS conceptual problems with SQL and RDBMS.


Yeah well thanks for the opinion and all, but with the helpful efforts
of a number of posters to this group, I've understood and solved the
problem now and everything is just lovely. I'm happy, my customer is
happy, and my customer's customer is happy. Flowers bloom, birds sing,
and I've moved on to other things.

The last time I did any serious database development work was as a
young contractor, way back in the days of DBaseII before all this SQL
malarkey existed. In those days we had tables that consisted of records
made up of one or more fields. Rows and columns where for spreadsheets.
I'm sorry if that terminology annoys you but old habits die hard, and
it least it gives you a reason to try to act all superior, eh?
--
SlowerThanYou|||--CELKO-- wrote:

Quote:

Originally Posted by

You do not have a table at all; it is an attempt to mimic a deck of
punch cards. You confuse columns and fields, rows and records


This would be a lot more helpful if you'd explain the difference (or
rather, since the explanation is probably long-ish, include a URL
where the explanation can be found).

Quote:

Originally Posted by

do these transactions create a customer or a sale?


*looks down* Oh, you're alluding to SalesTransactions being a better
name than CustTransactions. But are you sure? Customers may engage
in sales, returns, credit memos and debit memos (the latter two are
used to adjust the customer's balance without inventory changing hands,
e.g. if they were over/undercharged for something). Of course, stuffing
multiple types of transactions into a single table without an explicit
TransactionType column is a separate error, but perhaps the table
definition was simplified by omitting columns not directly relevant to
the task at hand.

Quote:

Originally Posted by

Why is there DDL in narratives?


Why wouldn't there be? How many questions lacking DDL receive
the initial response "please post DDL to create your tables and
populate them with data illustrating the issue"?

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

>>What I would like to do is, for all of the records [sic] in descending order of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE


such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000). <<


Quote:

Originally Posted by

You did not say what to do about ties; if I have five sales of
$50,000.00 which one would I mark?


This gap in the spec can be bridged by picking an arbitrary
rule (e.g. mark rows with lower TransactionKey first), on the
assumption that the questioner will be able to adjust that
part of the answer to fit whatever the actual rule is.

Quote:

Originally Posted by

give us a RELATIONAL spec and we can probably help you


This is a more general case of the above. "Your style is lousy,
so I'm going to point that out _and not answer your question_."|||--CELKO-- wrote:

Quote:

Originally Posted by

O)kay. You have SERIOUS conceptual problems with SQL and RDBMS. The
reason that SQL has no BOOLEAN data types is one of those "mathematical
foundations" things that has to do with NULLs, 3-valued logic and
logic.


This is not quite true. SQL in general has an optional BOOLEAN data
type; MSSQL in particular does not support the option.

http://troels.arvin.dk/db/rdbms/#data_types-boolean
Also, most of your message boils down to "you shouldn't store computed
data that can become outdated", but starting it out with the
above-quoted material gives the impression of "you shouldn't use
flags", which is untrue.

Quote:

Originally Posted by

Fields have mean because of the program that reads them; columns have
a domain, a value and constraints in the schema -- totally separate
from any program that uses them -- which give them meaning.


Aha, here's the answer to that "what's the difference between a field
and a column?" question that was raised earlier. No wonder I felt
confused - I'm familiar with program-independent constraints enforced
by the database, but did not strictly associate "column" with their
existence and "field" with their non-existence. (Spreadsheets, in
particular, play havoc with this.)

The intended difference between "row" and "record" is similarly
non-obvious to the lay reader, though I think I've read about it
before: namely, records have an inherent order, while rows have
no guaranteed order unless you specify one. (Spreadsheets play
havoc with this, too. So do certain indexes, especially clustering
indexes, which novices can easily mistake for an inherent order.)|||See if this is what you want:

CREATE TABLE SalesTransactions
(sales_nbr INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL
REFERENCES Customers (customer_id),
sales_amt DECIMAL(12,2) NOT NULL);

Create a VIEW or CTE with each customers sales ordered from high to
low. This is a greedy algorithm. The ROW_NUMBER() will randomly pick
an ordering in the event of ties.

Using that derived table, we can find the subset of purchase in each
customer that are at or below the threshold. amount, something like
this:

WITH (SELECT customer_id, sales_amt,
ROW_NUMBER()
OVER (PARTITION BY customer_id
ORDER BY sales_amt DESC)
FROM SalesTransactions AS S1)
AS SalesScores (customer_id, sales_amt, score)

SELECT S1.customer_id, S1.score
FROM SalesScores AS S1
WHERE @.threshold_amt <=
(SELECT SUM(S2.sales_amt)
FROM SalesScores AS S2
WHERE S1.customer_id = S2.customer_id
AND S1.score >= S2.score);

You can do this in one statement with the full OLAP features, which
would have a RANGE clause in the SUM() OVER() construct. SQL Server is
a bit behind.

But the important point is that you use virtual tables, rather than
mimicing a deck of punch cards. Think LOGICAL and not PHYSICAL! Think
sets, not sequences.

Help with a complex Join

All
I could use some help with a complex join. I have 3 SIMPLIFIED
FICTIONAL tables (add table scripts and data on bottom of message):
In words I want the following:
a resultset of the avg(x), count(flag), sum(flag) for each and every
(outer join) Station_Number where cast_id<>'full' and flag=0 and z<=3.
I would expect that the count(flag) would return the number of rows in
tblLog but it returns number of rows in tblData (the child of the
1:many). What am I doing wrong? The sql I am using is below (as well
as the result).
daben
--SQL
QUERY---
SELECT TOP 100 PERCENT
tblStation.Station_Number,
AVG(CASE WHEN x <> -999 AND flag=0 and cast_id<>'full' THEN x
END) AS AVG_x,
ISNULL(STDEV(CASE WHEN x <> -999 AND flag=0 and
cast_id<>'full' THEN x END),0)AS STDEV_x,
SUM(flag) AS SUM_flag, COUNT(flag) AS COUNT_flag
FROM
((tblStation LEFT JOIN tblLog
ON (tblStation.Station_Number = tblLog.Station_Number))
LEFT JOIN tblData
ON (tblLog.Log_Index = tblData.Log_Index
AND z<=3))
GROUP BY
tblStation.Station_Number
ORDER BY
tblStation.Station_Number
--QUERY
RESULTSET---
Station_Number AVG_x
STDEV_x SUM_flag
COUNT_flag
-- ----
---- --
--
1 1.4285714285714286
0.53452248382484868 3 13
2 NULL
0.0 NULL 0
3 NULL
0.0 NULL 0
4 NULL
0.0 0 1
5 NULL
0.0 6 7
6 NULL
0.0 NULL 0
7 NULL
0.0 NULL 0
8 2.2000000000000002
1.0954451150103324 3 10
9 NULL
0.0 NULL 0
10 NULL
0.0 NULL 0
(10 row(s) affected)
--SQL ADD
TABLE---
CREATE TABLE [dbo].[tblData] (
[Log_Index] [int] NOT NULL ,
[z] [float] NOT NULL ,
[x] [float] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblLog] (
[Log_Index] [int] NOT NULL ,
[Station_Number] [int] NULL ,
[cast_id] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[flag] [int] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblStation] (
[Station_Number] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblData] WITH NOCHECK ADD
CONSTRAINT [PK_tblData] PRIMARY KEY CLUSTERED
(
[Log_Index],
[z]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblLog] WITH NOCHECK ADD
CONSTRAINT [PK_tblLog] PRIMARY KEY CLUSTERED
(
[Log_Index]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblStation] WITH NOCHECK ADD
CONSTRAINT [PK_tblStation] PRIMARY KEY CLUSTERED
(
[Station_Number]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblLog] ADD
CONSTRAINT [DF_tblLog_cast_id] DEFAULT ('NA') FOR [cast_id],
CONSTRAINT [DF_tblLog_flag] DEFAULT (0) FOR [flag]
GO
ALTER TABLE [dbo].[tblData] ADD
CONSTRAINT [FK_tblData_tblLog] FOREIGN KEY
(
[Log_Index]
) REFERENCES [dbo].[tblLog] (
[Log_Index]
) ON DELETE CASCADE ON UPDATE CASCADE
GO
ALTER TABLE [dbo].[tblLog] ADD
CONSTRAINT [FK_tblLog_tblStation] FOREIGN KEY
(
[Station_Number]
) REFERENCES [dbo].[tblStation] (
[Station_Number]
) ON DELETE CASCADE ON UPDATE CASCADE
GO
--DATA--
tblStation
[Station_Number]
1
2
3
4
5
6
7
8
9
10
tblLog
Log_Index Station_Number cast_id flag
1 1 full 0
2 1 up 0
3 1 down 1
4 1 up 0
5 1 down 0
6 4 full 0
7 5 full 0
8 5 up 1
9 5 down 1
10 8 full 0
11 8 up 0
12 8 up 0
13 8 down 1
tblData
Log_Index z x
2 1 1
2 2 2
2 3 2
2 4 2
2 5 1
2 6 2
2 7 2
2 8 1
3 1 5
3 2 5
3 3 5
3 4 5
3 5 5
3 6 5
3 7 5
3 8 5
4 1 1
4 2 1
4 3 1
4 4 1
4 5 1
4 6 1
4 7 1
4 8 1
5 1 2
5 2 -999
5 3 -999
5 4 -999
5 5 3
5 6 3
5 7 3
5 8 3
8 1 1
8 2 1
8 3 1
8 4 1
9 1 3
9 2 3
9 3 3
9 4 3
11 1 3
11 2 3
11 3 3
11 4 -999
12 1 1
12 2 1
12 3 -999
12 4 1
13 1 -999
13 2 -999
13 3 -999
13 4 -999CREATE TABLE Stations
(station_number INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Data
(log_index INTEGER NOT NULL PRIMARY KEY,
z FLOAT NOT NULL,
x FLOAT NULL,
PRIMARY KEY (log_index, z));
CREATE TABLE Log
(log_index INTEGER NOT NULL
REFERENCES Log (log_index)
ON DELETE CASCADE
ON UPDATE CASCADE,
station_number INTEGER NULL
REFERENCES Stations (station_number)
ON DELETE CASCADE
ON UPDATE CASCADE,
cast_id VARCHAR(10) DEFAULT 'NA',
flag INTEGER DEFAULT 0,
PRIMARY KEY (log_index, station_number));
I am going to make a gues beore I go to bed.
SELECT S.station_number,
AVG(D.x) AS x_avg,
COALESCE(STDEV(D.x, 0.0) AS x_stdev,
SUM(L.flag) AS flag_tot,
COUNT(L.flag) AS flag_cnt
FROM Station AS S
LEFT OUTER JOIN
Log AS L
ON S.station_number = L.station_number
AND L.flag = 0
AND L.cast_id <> 'full'
LEFT OUTER JOIN
Data AS D
ON L.log_index = D.log_index
AND D.z <= 3
AND D.x <> -999
GROUP BY S.station_number;|||Hi
Thanks for the reply. Your solution is much more elegant than mine, but
it fails on one point. It returns a count of 7 (indicating how many
rows in table D that are found) NOT a count of 4 (which would indicate
how many rows in table Log). I have found that this works (but is
sloppy) using correlated sub queries:
SELECT TOP 100 PERCENT
tblStation.Station_Number,
AVG(CASE WHEN x <> -999 AND flag=0 and cast_id<>'full' THEN x
END) AS AVG_x,
--ISNULL(STDEV(CASE WHEN x <> -999 AND flag=0 and
cast_id<>'full' THEN x END),0)AS STDEV_x,
(SELECT SUM(CASE WHEN cast_id<>'full' THEN flag END) FROM tblLog WHERE
tblStation.Station_Number=tblLog.Station_Number) AS SUM_flag,
(SELECT COUNT(CASE WHEN cast_id<>'full' THEN flag END) FROM tblLog
WHERE tblStation.Station_Number=tblLog.Station_Number ) AS COUNT_flag,
COUNT(flag) as COUNT_flag2
FROM
((tblStation LEFT JOIN tblLog
ON (tblStation.Station_Number = tblLog.Station_Number))
LEFT JOIN tblData
ON (tblLog.Log_Index = tblData.Log_Index
AND z<=3))
GROUP BY
tblStation.Station_Number
ORDER BY
tblStation.Station_Number