Restore Database: Help with a complex UPDATE query

Well, I think it's complex anyway -- you might not :)

TableDef:
CREATE TABLE CustTransactions (
TransactionKey int IDENTITY(1,1) NOT NULL,
CustomerID int,
AmountSpent float,
CustSelected bit default 0);

TransactionKey is the primary key, CustomerID and AmountSpent are both
indexed (non unique).

What I would like to do is, for all of the records in descending order
of "AmountSpent" where "CustSelected = TRUE", set CustSelected to FALSE
such that the sum of all the AmountSpent records with CustSelected =
TRUE is no greater than a specified amount (say $50,000).

What I'm doing at the moment is a "SELECT * FROM CustTransactions WHERE
CustSelected = TRUE ORDER BY AmountSpent;", programmatically looping
through all the records until the running total of AmountSpent exceeds
50000, then continuing to loop through the remainder of the records
setting CustSelected = FALSE. This does exactly what I want but is slow
and inefficient. I am sure it could be done in a single SQL statement
with subqueries, but I lack the knowledge and experience to figure out
how.

The closest I can get is:-

UPDATE CustTransactions SET CustSelected = FALSE
WHERE (CustSelected = TRUE)
AND TransactionKey NOT IN
(SELECT TOP 50000 TransactionKey FROM CustTransactions WHERE
(((CustTransactions.CustSelected)=TRUE))
ORDER BY AmountSpent DESC, TransactionKey ASC);

However, this merely ensures only the top 50,000 customers by amount
spent remain "selected", not the top "X" customers whose total spend
is $50,000. I really need to replace the "SELECT TOP 50000" with some
form of "SELECT TOP (X rows until sum(AmountSpent) <= 50000)".

Is it even possible to achieve what I'm trying to do?

Thanks in advance for any assistance offered!
--
SlowerThanYou

Hi,

Consider the following sample data:

INSERT INTO CustTransactions VALUES (1, 1000, 0)
INSERT INTO CustTransactions VALUES (2, 1000, 1)
INSERT INTO CustTransactions VALUES (2, 2500, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (1, 1000, 1)
INSERT INTO CustTransactions VALUES (3, 30000, 1)
INSERT INTO CustTransactions VALUES (3, 17000, 1)

What is the expected result (the output of SELECT * FROM
CustTransactions) ?

Also consider this sample data:

INSERT INTO CustTransactions VALUES (1, 10000, 0)
INSERT INTO CustTransactions VALUES (2, 20000, 1)
INSERT INTO CustTransactions VALUES (2, 25000, 0)
INSERT INTO CustTransactions VALUES (2, 2500, 0)

What is the expected result in this case ?

Razvan


You do not have a table at all; it is an attempt to mimic a deck of
punch cards. You confuse columns and fields, rows and records and use
the wrong data types.

Do these transactions create a customer or a sale? Why is there DDL in
narratives? Why did you use an IDENTITY column? Why FLOAT for money?

CREATE TABLE SalesTransactions
(sales_nbr INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL
REFERENCES Customers (customer_id),
sales_amt DECIMAL(12,2) NOT NULL);

>> What I would like to do is, for all of the records [sic] in
descending order of "AmountSpent" where "CustSelected = TRUE", set
CustSelected to FALSE such that the sum of all the AmountSpent records
with CustSelected = TRUE is no greater than a specified amount (say
$50,000). <<

Did you know that SQL has no Boolean data type? That using BIT is
proprietary and an awful coding practice? We updated punch cards like
you are doing because we had no choice about it.

>> What I'm doing at the moment is a "SELECT * FROM CustTransactions
WHERE CustSelected = TRUE ORDER BY AmountSpent;", programmatically
looping through all the records [sic] until the running total of
AmountSpent exceeds 50000, then continuing to loop through the
remainder of the records [sic] setting CustSelected = FALSE. <<

You did not say what to do about ties; if I have five sales of
$50,000.00 which one would I mark?

Give us a RELATIONAL spec and we can probably help you.

Razvan Socol wrote:

> Consider the following sample data:
>
> INSERT INTO CustTransactions VALUES (1, 1000, 0)
> INSERT INTO CustTransactions VALUES (2, 1000, 1)
> INSERT INTO CustTransactions VALUES (2, 2500, 1)
> INSERT INTO CustTransactions VALUES (1, 1000, 1)
> INSERT INTO CustTransactions VALUES (1, 1000, 1)
> INSERT INTO CustTransactions VALUES (3, 30000, 1)
> INSERT INTO CustTransactions VALUES (3, 17000, 1)
>
> What is the expected result (the output of SELECT * FROM
> CustTransactions) ?

Hi Razvan,

Thanks for responding. The expected result for the above sample data
would be:-

1, 1000, 0
2, 1000, 0
2, 2500, 0
1, 1000, 1
1, 1000, 1
3, 30000, 1
3, 17000, 1

To clarify this:-

1) The first row is completely ignored because its CustSelected field
is FALSE (as would be any other records where CustSelected = 0)

2) The rows WHERE CustSelected = 1 are sorted in descending order of
AmountSpent (where two or more records have equal values for
AmountSpent, their order is arbitrary - I don't care).

3) Any rows that would cause the sum of AmountSpent WHERE CustSelected
= 1 to exceed our selection criteria ($50,000) have their
CustSelected value set to 0.

> Also consider this sample data:
>
> INSERT INTO CustTransactions VALUES (1, 10000, 0)
> INSERT INTO CustTransactions VALUES (2, 20000, 1)
> INSERT INTO CustTransactions VALUES (2, 25000, 0)
> INSERT INTO CustTransactions VALUES (2, 2500, 0)
>
> What is the expected result in this case ?

Assuming our "target" figure is 50000 again:-

1, 10000, 0
2, 20000, 1
2, 25000, 0
2, 2500, 0

The three records where CustSelected = 0 are ignored. As a possible
point of additional interest, if the target figure were less than 20000
then row two would have had its CustSelected column set to 0 (because
leaving it selected would have caused the "target" figure to be
exceeded).

I hope I've done a better job of explaining my requirement this time
around!
--
SlowerThanYou

--CELKO-- wrote:

> You do not have a table at all; it is an attempt to mimic a deck of
> punch cards. You confuse columns and fields, rows and records and use
> the wrong data types.
>
> Do these transactions create a customer or a sale? Why is there DDL
> in narratives? Why did you use an IDENTITY column? Why FLOAT for
> money?

Forget about the datatypes; they are largely irrelevant to the problem
I am trying to solve. I have abstracted the problem to attempt to make
it as easy to explain as possible. The real table I am trying to update
is, in fact, not called CustTransactions and has nothing to do with
"customers" and it does, in fact, have a non-monetary floating point
value that is the focus of my update. You are reading more than I
intended into the column names I've used in my example.

> Did you know that SQL has no Boolean data type? That using BIT is
> proprietary and an awful coding practice? We updated punch cards like
> you are doing because we had no choice about it.

No, I didn't know that SQL has no boolean data type, and that BIT is
proprietary, so thanks for that information. You can pretend it is an
integer type if you prefer. Again, do not read anything into the table
and field names I have used in my abstract example - just assume that
there is a True/False type flag that I need to record for each row
according to the criteria I outlined.

>> What I'm doing at the moment is a "SELECT * FROM CustTransactions
>> WHERE CustSelected = TRUE ORDER BY AmountSpent;", programmatically
>> looping through all the records [sic] until the running total of
>> AmountSpent exceeds 50000, then continuing to loop through the
>> remainder of the records [sic] setting CustSelected = FALSE.
>
> You did not say what to do about ties; if I have five sales of
> $50,000.00 which one would I mark?
>
> Give us a RELATIONAL spec and we can probably help you.

Please have a look at my reply to Razvan, which I hope describes the
problem I am trying to solve more accurately than my previous post
(which was not as coherent as it might have been, for which I
apologise).

--
SlowerThanYou

"Slower Than You" wrote in
news:1163790273.8354.0@.iris.uk.clara.net:

See example 4 (cumulative sum) of
http://www.databasejournal.com/feat...10894_3373861_2
HTH
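
The linked article is not reproduced here, so what follows is only a sketch of one common way to write a cumulative sum in SQL Server 2005-era T-SQL: a correlated subquery that, for each selected row, adds up every selected row sorted at or before it. The aliases t and p, the RunningTotal column name and the TransactionKey tie-break are assumptions for illustration, not taken from the article.

```sql
-- Running total of AmountSpent over the selected rows, largest first.
-- Ties on AmountSpent are broken by TransactionKey (an assumption; the
-- thread leaves the tie rule open).
SELECT t.TransactionKey,
       t.AmountSpent,
       (SELECT SUM(p.AmountSpent)
          FROM CustTransactions AS p
         WHERE p.CustSelected = 1
           AND (p.AmountSpent > t.AmountSpent
                OR (p.AmountSpent = t.AmountSpent
                    AND p.TransactionKey <= t.TransactionKey))) AS RunningTotal
  FROM CustTransactions AS t
 WHERE t.CustSelected = 1
 ORDER BY t.AmountSpent DESC, t.TransactionKey ASC;
```

Each row's RunningTotal includes the row itself, so the rows to keep are those whose RunningTotal does not exceed the target figure.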

--
For e-mail address, remove the XXs

Slower Than You wrote:

> Razvan Socol wrote:
>
>> Consider the following sample data:
>>
>> What is the expected result (the output of SELECT * FROM
>> CustTransactions) ?
>
> Hi Razvan,
>
> Thanks for responding. The expected result for the above sample data
> would be:-
>
> 1, 1000, 0
> 2, 1000, 0
> 2, 2500, 0
> 1, 1000, 1
> 1, 1000, 1
> 3, 30000, 1
> 3, 17000, 1

The above result has a sum of 49000. From your narrative, I would
expect a result which has a sum of 49500, for example this:

1, 1000, 0
2, 1000, 0
2, 2500, 1
1, 1000, 0
1, 1000, 0
3, 30000, 1
3, 17000, 1

Which one is the correct result ?

Razvan

Razvan Socol wrote:

> Slower Than You wrote:
>
>> Hi Razvan,
>>
>> Thanks for responding. The expected result for the above sample data
>> would be:-
>>
>> 1, 1000, 0
>> 2, 1000, 0
>> 2, 2500, 0
>> 1, 1000, 1
>> 1, 1000, 1
>> 3, 30000, 1
>> 3, 17000, 1
>
> The above result has a sum of 49000. From your narrative, I would
> expect a result which has a sum of 49500, for example this:
>
> 1, 1000, 0
> 2, 1000, 0
> 2, 2500, 1
> 1, 1000, 0
> 1, 1000, 0
> 3, 30000, 1
> 3, 17000, 1
>
> Which one is the correct result ?

You are absolutely right - I was a little to hasty in putting my
response together. The sum of 49500 is correct.
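
With the 49500 result confirmed, the requirement can be sketched as a single UPDATE: clear the flag on any selected row whose running total, taken largest first, exceeds the target. This is a sketch under assumptions rather than a tested solution: @Target is a hypothetical variable, and ties on AmountSpent are broken by TransactionKey, which is an arbitrary choice.

```sql
-- Hypothetical threshold variable; 50000 is the figure used in the thread.
DECLARE @Target float;
SET @Target = 50000;

-- Deselect every selected row whose running total (largest AmountSpent
-- first, TransactionKey as tie-break) goes over the target. The subquery
-- sees the table as it was before this UPDATE changed anything.
UPDATE t
   SET CustSelected = 0
  FROM CustTransactions AS t
 WHERE t.CustSelected = 1
   AND (SELECT SUM(p.AmountSpent)
          FROM CustTransactions AS p
         WHERE p.CustSelected = 1
           AND (p.AmountSpent > t.AmountSpent
                OR (p.AmountSpent = t.AmountSpent
                    AND p.TransactionKey <= t.TransactionKey))) > @Target;
```

Against the first sample data set this should leave the 30000, 17000 and 2500 rows selected (49500) and clear the three selected 1000 rows. The second sample should be unchanged, because its single selected row is under the target. As in the original loop, once the running total passes the target every later row is cleared, assuming no negative amounts.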
--
SlowerThanYou

Chris Cheney wrote:

> See example 4 (cumulative sum) of
> http://www.databasejournal.com/feat...hp/10894_3373861_2
> HTH

Ahah! That helped enormously - thanks, much appreciated.
--
SlowerThanYou

>> No, I didn't know that SQL has no boolean data type, and that BIT is
proprietary, so thanks for that information. You can pretend it is an
integer type if you prefer. Again, do not read anything into the table
and field [sic] names .. <<

Okay. You have SERIOUS conceptual problems with SQL and RDBMS. The
reason that SQL has no BOOLEAN data types is one of those "mathematical
foundations" things that has to do with NULLs, 3-valued logic and
logic. In 25 words or less, we discover a state of being via
predicates rather than by looking for a flag.

In procedural, step-by-step file system models you set flags in step
(n) to pass control information to step (n+1) of the process. In the RM
model, multiple users can change the basic facts of a schema and thus
the criteria of the subset, so we do not store computed columns. You
compute subset membership at run time.
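
One way to read "compute subset membership at run time" is to derive the qualifying rows in a query instead of storing a flag. A minimal sketch, assuming every row in CustTransactions is a candidate and using a hypothetical function name, dbo.TransactionsWithinTarget; the TransactionKey tie-break is also an assumption:

```sql
-- Hypothetical inline table-valued function: returns the rows that fit
-- under @Target when taken largest AmountSpent first. Nothing is stored;
-- membership is recomputed each time the function is queried.
CREATE FUNCTION dbo.TransactionsWithinTarget (@Target float)
RETURNS TABLE
AS
RETURN
(
    SELECT t.TransactionKey, t.CustomerID, t.AmountSpent
      FROM dbo.CustTransactions AS t
     WHERE (SELECT SUM(p.AmountSpent)
              FROM dbo.CustTransactions AS p
             WHERE p.AmountSpent > t.AmountSpent
                OR (p.AmountSpent = t.AmountSpent
                    AND p.TransactionKey <= t.TransactionKey)) <= @Target
);
```

It would be queried with something like SELECT * FROM dbo.TransactionsWithinTarget(50000). The correlated subquery rescans the table for each row, so this may be slow on large tables.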

Fields have meaning because of the program that reads them; columns have a
domain, a value and constraints in the schema -- totally separate from
any program that uses them -- which give them meaning.

It does not matter if you use a Standard data type; you are still not
programming with the relational data model. Think in terms of predicates,
sets and declarations, not flags, sequences and procedures.

--CELKO-- wrote:

>> No, I didn't know that SQL has no boolean data type, and that BIT
>> is proprietary, so thanks for that information. You can pretend it is
>> an integer type if you prefer. Again, do not read anything into the
>> table and field [sic] names .. <<
>
> Okay. You have SERIOUS conceptual problems with SQL and RDBMS.

Yeah well thanks for the opinion and all, but with the helpful efforts
of a number of posters to this group, I've understood and solved the
problem now and everything is just lovely. I'm happy, my customer is
happy, and my customer's customer is happy. Flowers bloom, birds sing,
and I've moved on to other things.

The last time I did any serious database development work was as a
young contractor, way back in the days of DBaseII before all this SQL
malarkey existed. In those days we had tables that consisted of records
made up of one or more fields. Rows and columns were for spreadsheets.
I'm sorry if that terminology annoys you but old habits die hard, and
at least it gives you a reason to try to act all superior, eh?
--
SlowerThanYou

--CELKO-- wrote:

> You do not have a table at all; it is an attempt to mimic a deck of
> punch cards. You confuse columns and fields, rows and records

This would be a lot more helpful if you'd explain the difference (or
rather, since the explanation is probably long-ish, include a URL
where the explanation can be found).

> Do these transactions create a customer or a sale?

*looks down* Oh, you're alluding to SalesTransactions being a better
name than CustTransactions. But are you sure? Customers may engage
in sales, returns, credit memos and debit memos (the latter two are
used to adjust the customer's balance without inventory changing hands,
e.g. if they were over/undercharged for something). Of course, stuffing
multiple types of transactions into a single table without an explicit
TransactionType column is a separate error, but perhaps the table
definition was simplified by omitting columns not directly relevant to
the task at hand.

> Why is there DDL in narratives?

Why wouldn't there be? How many questions lacking DDL receive
the initial response "please post DDL to create your tables and
populate them with data illustrating the issue"?

>> What I would like to do is, for all of the records [sic] in
>> descending order of "AmountSpent" where "CustSelected = TRUE", set
>> CustSelected to FALSE such that the sum of all the AmountSpent
>> records with CustSelected = TRUE is no greater than a specified
>> amount (say $50,000). <<
>
> You did not say what to do about ties; if I have five sales of
> $50,000.00 which one would I mark?

This gap in the spec can be bridged by picking an arbitrary
rule (e.g. mark rows with lower TransactionKey first), on the
assumption that the questioner will be able to adjust that
part of the answer to fit whatever the actual rule is.

> Give us a RELATIONAL spec and we can probably help you.

This is a more general case of the above. "Your style is lousy,
so I'm going to point that out _and not answer your question_."

--CELKO-- wrote:

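> The reason that SQL has no BOOLEAN data types is one of those
> "mathematical foundations" things that has to do with NULLs, 3-valued
> logic and logic. In 25 words or less, we discover a state of being via
> predicates rather than by looking for a flag.
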
This is not quite true. SQL in general has an optional BOOLEAN data
type; MSSQL in particular does not support the option.

http://troels.arvin.dk/db/rdbms/#data_types-boolean

Also, most of your message boils down to "you shouldn't store computed
data that can become outdated", but starting it out with the
above-quoted material gives the impression of "you shouldn't use
flags", which is untrue.

> Fields have meaning because of the program that reads them; columns have
> a domain, a value and constraints in the schema -- totally separate
> from any program that uses them -- which give them meaning.

Aha, here's the answer to that "what's the difference between a field
and a column?" question that was raised earlier. No wonder I felt
confused - I'm familiar with program-independent constraints enforced
by the database, but did not strictly associate "column" with their
existence and "field" with their non-existence. (Spreadsheets, in
particular, play havoc with this.)

The intended difference between "row" and "record" is similarly
non-obvious to the lay reader, though I think I've read about it
before: namely, records have an inherent order, while rows have
no guaranteed order unless you specify one. (Spreadsheets play
havoc with this, too. So do certain indexes, especially clustering
indexes, which novices can easily mistake for an inherent order.)

See if this is what you want:

CREATE TABLE SalesTransactions
(sales_nbr INTEGER NOT NULL PRIMARY KEY,
customer_id INTEGER NOT NULL
REFERENCES Customers (customer_id),
sales_amt DECIMAL(12,2) NOT NULL);

Create a VIEW or CTE with each customer's sales ordered from high to
low. This is a greedy algorithm. The ROW_NUMBER() will randomly pick
an ordering in the event of ties.

Using that derived table, we can find the subset of purchases for each
customer that are at or below the threshold amount, something like
this:

WITH SalesScores (customer_id, sales_amt, score)
AS (SELECT customer_id, sales_amt,
ROW_NUMBER()
OVER (PARTITION BY customer_id
ORDER BY sales_amt DESC)
FROM SalesTransactions)

-- @threshold_amt is assumed to be declared and set earlier in the batch
SELECT S1.customer_id, S1.score
FROM SalesScores AS S1
WHERE @threshold_amt >=
(SELECT SUM(S2.sales_amt)
FROM SalesScores AS S2
WHERE S1.customer_id = S2.customer_id
AND S1.score >= S2.score);

You can do this in one statement with the full OLAP features, which
would have a RANGE clause in the SUM() OVER() construct. SQL Server is
a bit behind.
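
For reference, SQL Server 2012 and later accept ORDER BY and a frame clause inside SUM() OVER(), so the running total can be written directly. A sketch under that assumption, reusing the SalesTransactions columns above; the CTE name RunningSales, the running_amt column and the sales_nbr tie-break are illustrative additions, not part of the original post:

```sql
-- Assumed threshold; 50000 is the figure used earlier in the thread.
DECLARE @threshold_amt DECIMAL(12,2);
SET @threshold_amt = 50000.00;

-- Running total per customer, largest sale first. sales_nbr makes the
-- ordering deterministic when two sales have the same amount.
WITH RunningSales AS
(SELECT sales_nbr, customer_id, sales_amt,
        SUM(sales_amt)
          OVER (PARTITION BY customer_id
                ORDER BY sales_amt DESC, sales_nbr
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS running_amt
   FROM SalesTransactions)
SELECT sales_nbr, customer_id, sales_amt, running_amt
  FROM RunningSales
 WHERE running_amt <= @threshold_amt;
```

ROWS together with the sales_nbr tie-break gives each row its own running total. Dropping PARTITION BY customer_id gives a single running total across all customers, which is closer to the original question.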

But the important point is that you use virtual tables, rather than
mimicking a deck of punch cards. Think LOGICAL and not PHYSICAL! Think
sets, not sequences.

Monday, March 19, 2012
