I am banging my head against a brick wall over this problem, so any
help in the correct direction would be muchly appreciated!
I have 2 SQL (MS SQL) server tables, realated to -
a Property,
Sales of that property.
A property is uniquely identifed by its Roll, valuation Number and
Suffix (not my choosing).
Each property can only appear in the property table once, and can only
have 1 assessment - but can have multiple sales (ie - over the
annalysis period the same property can sell more than once).
There is approximatly 19000 properties relating to about 8000 sales.
When creating a query to list property and most recent sale (if there
is any) I end up with somthing like this -
SELECT [roll], [valuation], [suffix], [sale date]
FROM [property]
LEFT JOIN [sales]
ON
[property].[roll] = [sales].[roll] AND
[property].[valuation] = [sales].[valuation] AND
[property].[suffix] = [sales].[suffix]
(table names simplifed).
I get rows where there is all the property data there, but sale date
(etc.) is null (as I would expect from a left join), but the problem is
- when there is more than 1 sale for a property it pulls out another
copy of the property data.
In short, because of that I come out with more records than properties.
ie -
roll valuation suffix sale date
12 456789 A 1/1/2003
12 788988 B NULL
14 123456 A 1/1/2003
14 123456 A 1/1/2004
(Note - the last two are the same property).
I didn't know that the left join can affect both joined tables!
Is there any way around this? Any suggestions/hints in the right
direction would be very much appreciated!
THANKS!On 11 Apr 2005 21:27:53 -0700, "MaxPenguin"
<malcolm.lockyer@.origen.co.nz> wrote:
>Hi All,
>
>I am banging my head against a brick wall over this problem, so any
>help in the correct direction would be muchly appreciated!
>
>I have 2 SQL (MS SQL) server tables, realated to -
>a Property,
>Sales of that property.
>
>A property is uniquely identifed by its Roll, valuation Number and
>Suffix (not my choosing).
>Each property can only appear in the property table once, and can only
>have 1 assessment - but can have multiple sales (ie - over the
>annalysis period the same property can sell more than once).
>There is approximatly 19000 properties relating to about 8000 sales.
>
>When creating a query to list property and most recent sale (if there
>is any) I end up with somthing like this -
>SELECT [roll], [valuation], [suffix], [sale date]
>FROM [property]
>LEFT JOIN [sales]
>ON
>[property].[roll] = [sales].[roll] AND
>[property].[valuation] = [sales].[valuation] AND
>[property].[suffix] = [sales].[suffix]
>(table names simplifed).
>I get rows where there is all the property data there, but sale date
>(etc.) is null (as I would expect from a left join), but the problem is
>- when there is more than 1 sale for a property it pulls out another
>copy of the property data.
>In short, because of that I come out with more records than properties.
>ie -
>roll valuation suffix sale date
>12 456789 A 1/1/2003
>12 788988 B NULL
>14 123456 A 1/1/2003
>14 123456 A 1/1/2004
>(Note - the last two are the same property).
>
>I didn't know that the left join can affect both joined tables!
>
>Is there any way around this? Any suggestions/hints in the right
>direction would be very much appreciated!
>THANKS!
Are you able to tell us what want the query to return?
(And why?)|||It isn't really a matter of the left join "affecting both tables". For
any join that you use, if you have a many to one relationship then you
are going to get multiple rows for the "one" side since the join
effectively creates a cartesian product to start with. Whittling that
down is a matter of your join criteria.
In your description of the problem you state that you want the "most
recent sale" but there is nothing in your query to try to limit the
results to the most recent sale. Either of the queries below should
work. I usually see better performance using the LEFT JOIN/IS NOT NULL
method, but using NOT EXISTS is a bit more readable/logical in my
opinion.
SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
LEFT JOIN sales s2 ON s2.roll = s.roll
AND s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.[sale date] > s.[sale date]
WHERE s2.roll IS NULL
SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
WHERE NOT EXISTS (SELECT *
FROM sales s2
WHERE s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.roll = s.roll
AND s2.[sale date] > s.[sale date])
Both methods assume that you cannot have two sales on the same exact
date for the same property. They both effectively remove any rows where
there is another row for the same property with a later sale date. That
will leave only those rows with the latest (most recent) sale date.
HTH,
-Tom.|||Thomas R. Hummel wrote:
> In your description of the problem you state that you want the "most
> recent sale" but there is nothing in your query to try to limit the
> results to the most recent sale. Either of the queries below should
> work. I usually see better performance using the LEFT JOIN/IS NOT
NULL
> method, but using NOT EXISTS is a bit more readable/logical in my
> opinion.
Thanks your solution works perfectly. Thats exactly what I needed
really - a way to tell it that I did just want the most recent sales.
Thanks again :)!
> SELECT p.roll, p.valuation, p.suffix, s.[sale date]
> FROM property p
> LEFT JOIN sales s ON p.roll = s.roll
>   AND p.valuation = s.valuation
>   AND p.suffix = s.suffix
> LEFT JOIN sales s2 ON s2.roll = s.roll
>    AND s2.valuation = s.valuation
>    AND s2.suffix = s.suffix
>    AND s2.[sale date] > s.[sale date]
> WHERE s2.roll IS NULL
> SELECT p.roll, p.valuation, p.suffix, s.[sale date]
> FROM property p
> LEFT JOIN sales s ON p.roll = s.roll
>   AND p.valuation = s.valuation
>   AND p.suffix = s.suffix
> WHERE NOT EXISTS (SELECT *
>    FROM sales s2
>    WHERE s2.valuation = s.valuation
>    AND s2.suffix = s.suffix
>    AND s2.roll = s.roll
>    AND s2.[sale date] > s.[sale date])
> Both methods assume that you cannot have two sales on the same exact
> date for the same property. They both effectively remove any rows
where
> there is another row for the same property with a later sale date.
That
> will leave only those rows with the latest (most recent) sale date.
>  HTH,
>  -Tom.
 
No comments:
Post a Comment