Wednesday, March 21, 2012

Help with a query

Hi all,
I have a table that describes bus routes. The table includes the following
columns:
ID - just an autonumber
Line - The line number / bus number
Way - A way number. Each line/bus can have multiple ways it travels on
OrderNo - The order of how the stations will be visited (starting from
1,2,3, etc) for every way
Distance - Distance in meters between the start and end station
StartStation - The start station number
EndStation - The end station number
An example recordset could look this:
ID Line Way Order Distance Start End
1 1 1 1 100 A B
2 1 1 2 150 B C
3 1 1 3 250 C D
4 1 2 1 150 A B
5 1 2 2 110 B C
6 1 2 3 200 C D
7 1 2 4 200 D E
8 2 1 1 100 Y Z
9 2 1 2 100 Z A
10 2 1 3 100 A B
11 2 1 4 150 B X
12 2 1 5 150 X W
13 2 1 6 150 W C
14 2 1 7 250 C D
and so on...
Imagine that I want to go from A to C, so my start station will be A and end
station will be C. From the example above I know I have 3 possibilities, and
these are:
Line 1, Way 1: A-B-C (ids: 1-2)
Line 1, Way 2: A-B-C (ids: 4-5)
Line 2, Way 1: A-B-X-W-C (ids: 10-11-12-13)
Basically I need a query that returns a recordset with the above results.
Is it possible to build an SQL statement of this kind?
Thanks,
IvanWith the arsenal available today, you are not likely to easily solve this
with T-SQL alone, although you can potentially kludge this with a view or
two. It is more likely you will have "client" code, meaning your app that
consumes the results of this query.
In SQL Server 2005, you have additional options, including .NET code in the
database, to produce the results you want without much pain.
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA
***************************
Think Outside the Box!
***************************
"Ivan Debono" wrote:

> Hi all,
> I have a table that describes bus routes. The table includes the following
> columns:
> ID - just an autonumber
> Line - The line number / bus number
> Way - A way number. Each line/bus can have multiple ways it travels on
> OrderNo - The order of how the stations will be visited (starting from
> 1,2,3, etc) for every way
> Distance - Distance in meters between the start and end station
> StartStation - The start station number
> EndStation - The end station number
> An example recordset could look this:
> ID Line Way Order Distance Start End
> 1 1 1 1 100 A B
> 2 1 1 2 150 B C
> 3 1 1 3 250 C D
> 4 1 2 1 150 A B
> 5 1 2 2 110 B C
> 6 1 2 3 200 C D
> 7 1 2 4 200 D E
> 8 2 1 1 100 Y Z
> 9 2 1 2 100 Z A
> 10 2 1 3 100 A B
> 11 2 1 4 150 B X
> 12 2 1 5 150 X W
> 13 2 1 6 150 W C
> 14 2 1 7 250 C D
> and so on...
> Imagine that I want to go from A to C, so my start station will be A and e
nd
> station will be C. From the example above I know I have 3 possibilities, a
nd
> these are:
> Line 1, Way 1: A-B-C (ids: 1-2)
> Line 1, Way 2: A-B-C (ids: 4-5)
> Line 2, Way 1: A-B-X-W-C (ids: 10-11-12-13)
> Basically I need a query that returns a recordset with the above results.
> Is it possible to build an SQL statement of this kind?
> Thanks,
> Ivan
>
>|||Actually, there is a neat solution in SQL-2005 that uses a recursive
CTE to implement Dijkstra's algorithm for the shortest path. There is
no need to add .NET code to the schema.|||I got an email asking me how to find paths in a graph using SQL. The
author of the email had seen my chapter on graphs in SQL FOR SMARTIES
2nd, and read that I was not happy with my own answers. What he wanted
was a list of paths from any two nodes in a directed graph, and I would
assume that he wanted the cheapest path.
After thinking about this for awhile, the best way is probably to do
the Floyd-Warshall or Johnson algorithm in a procedural language and
load a table with the results. But I want to do this in pure SQL as an
exercise.
Let's start with a simple graph and represent it as an adjacency list
with weights on the edges.
CREATE TABLE Graph
(source CHAR(2) NOT NULL,
destination CHAR(2) NOT NULL,
cost INTEGER NOT NULL,
PRIMARY KEY (source, destination));
I got data for this table from the book Introduction to Algorithms by
Cormen, Leiserson and Rivest (ISBN 0-262-03141-8), page 518. This book
is very popular in college courses in the United States. I made one
decision that will be important later; I added self-traversal edges
(i.e. the node is both the source and the destination) with weights of
zero.
INSERT INTO Graph VALUES ('s', 's', 0);
INSERT INTO Graph VALUES ('s', 'u', 3);
INSERT INTO Graph VALUES ('s', 'x', 5);
INSERT INTO Graph VALUES ('u', 'u', 0);
INSERT INTO Graph VALUES ('u', 'v', 6);
INSERT INTO Graph VALUES ('u', 'x', 2);
INSERT INTO Graph VALUES ('v', 'v', 0);
INSERT INTO Graph VALUES ('v', 'y', 2);
INSERT INTO Graph VALUES ('x', 'u', 1);
INSERT INTO Graph VALUES ('x', 'v', 4);
INSERT INTO Graph VALUES ('x', 'x', 0);
INSERT INTO Graph VALUES ('x', 'y', 6);
INSERT INTO Graph VALUES ('y', 's', 3);
INSERT INTO Graph VALUES ('y', 'v', 7);
INSERT INTO Graph VALUES ('y', 'y', 0);
I am not happy about this approach, because I have to decide the
maximum number of edges in path before I start looking for an answer.
But this will work and I know that a path will have no more than the
total number of nodes in the graph. Let's create a table to hold the
paths:
CREATE TABLE Paths
(step1 CHAR(2) NOT NULL,
step2 CHAR(2) NOT NULL,
step3 CHAR(2) NOT NULL,
step4 CHAR(2) NOT NULL,
step5 CHAR(2) NOT NULL,
total_cost INTEGER NOT NULL,
path_length INTEGER NOT NULL,
PRIMARY KEY (step1, step2, step3, step4, step5));
The step1 node is where I begin the path. The other columns are the
second step, third step, fourth step, and so forth. The last step
column is the end of the journey. The total_cost column is the total
cost, based on the sum of the weights of the edges, on this path. The
path length column is harder to explain, but for now, let's just say
that it is a count of the nodes visited in the path.
To keep things easier, let's look at all the paths from "s" to "y" in
the graph. The INSERT INTO statement for construction that set looks
like this:
INSERT INTO Paths
SELECT G1.source, -- it is 's' in this example
G2.source,
G3.source,
G4.source,
G4.destination, -- it is 'y' in this example
(G1.cost + G2.cost + G3.cost + G4.cost),
(CASE WHEN G1.source NOT IN (G2.source, G3.source, G4.source)
THEN 1 ELSE 0 END
+ CASE WHEN G2.source NOT IN (G1.source, G3.source, G4.source)
THEN 1 ELSE 0 END
+ CASE WHEN G3.source NOT IN (G1.source, G2.source, G4.source)
THEN 1 ELSE 0 END
+ CASE WHEN G4.source NOT IN (G1.source, G2.source, G3.source)
THEN 1 ELSE 0 END)
FROM Graph AS G1, Graph AS G2, Graph AS G3, Graph AS G4
WHERE G1.source = 's'
AND G1.destination = G2.source
AND G2.destination = G3.source
AND G3.destination = G4.source
AND G4.destination = 'y';
I put in "s" and "y" as the source and destination of the path, and
made sure that the destination of one step in the path was the source
of the next step in the path. This is a combinatorial explosion, but it
is easy to read and understand.
The sum of the weights is the cost of the path, which is easy to
understand. The path_length calculation is a bit harder. This sum of
CASE expressions looks at each node in the path. If it is unique within
the row, it is assigned a value of one, if it is not unique within the
row, it is assigned a value of zero.
All paths will have five steps in them because that is the way to table
is declared. But what if a path exists between the two nodes which is
shorter than five steps? That is where the self-traversal rows are
used! Consecutive pairs of steps in the same row can be repetitions of
the same node.
Here is what the rows of the Paths table look like after this INSERT
INTO statement, ordered by descending path_length, and then by
ascending cost.
Paths step1 step2 step3 step4 step5 total_cost path_length
========================================
==============
s s x x y 11 0
s s s x y 11 1
s x x x y 11 1
s x u x y 14 2
s s u v y 11 2
s s u x y 11 2
s s x v y 11 2
s s x y y 11 2
s u u v y 11 2
s u u x y 11 2
s u v v y 11 2
s u x x y 11 2
s x v v y 11 2
s x x v y 11 2
s x x y y 11 2
s x y y y 11 2
s x y v y 20 4
s x u v y 14 4
s u v y y 11 4
s u x v y 11 4
s u x y y 11 4
s x v y y 11 4
Clearly, all pairs of nodes could be picked from the original Graph
table and the same INSERT INTO run on them with a minor change in the
WHERE clause. However, this example is big enough for a short magazine
article. And it is too big for most applications. It is safe to assume
that people really want the cheapest path. In this example, the
total_cost column defines the cost of an path, so we can eliminate some
of the paths from the Paths table with this statement.
DELETE FROM Paths
WHERE total_cost
> (SELECT MIN(total_cost)
FROM Paths);
Again, if you had all the paths for all possible pairs of nodes, the
subquery expression would have a WHERE clause to correlate it to the
subset of paths for each possible pair.
In this example, it got rid of 3 out of 22 possible paths. It is
helpful and in some situations we might like having all the options.
But these are not distinct options.
As one of many examples, the paths
(s, x, v, v, y, 11, 2)
and
(s, x, x, v, y, 11, 2)
are both really the same path, (s, x, v, y). Before we decide to write
a statement to handle these equivalent rows, let's consider another
cost factor. People do not like to change airplanes or trains. If they
can go from Amsterdam to New York City on one plane without changing
planes for the same cost, they are happy. This is where that
path_length column comes in. It is a quick way to remove the paths that
have more edges than they need to get the job done.
DELETE FROM Paths
WHERE path_length
> (SELECT MIN(path_length)
FROM Paths);
In this case, that last DELETE FROM statement will reduce the table to
one row: (s, s, x, x, y, 11, 0) which reduces to (s, x, y). This single
remaining row is very convenient for my article, but if you look at the
table, you will see that there was also a subset of equivalent rows
that had higher path_length numbers.
(s, s, s, x, y, 11, 1)
(s, x, x, x, y, 11, 1)
(s, x, x, y, y, 11, 2)
(s, x, y, y, y, 11, 2)
Your task is to write code to handle equivalent rows. Hint: the
duplicate nodes will always be contigous across the row.|||Actually, I managed to find a way using a view with 2 inner joins.
It does find the path (not the shortest) from source to target, but the
problem is that it does not find paths when changes occur (that is, a change
in line). Therefore it finds only paths on the same lines.
This is the SQL statement:
select d2.* from descriptions_search d1
inner join descriptions_search d2
on d1.desline = d2.desline
and d1.deswayno = d2.deswayno
and d1.desdirection = d2.desdirection
and d2.desorder>= d1.desorder
inner join descriptions_search d3
on d1.desline = d3.desline
and d1.deswayno = d3.deswayno
and d1.desdirection = d3.desdirection
and d2.desorder<= d3.desorder
where d1.startstation = 1301 and d3.endstation = 1313
order by d2.desline, d2.deswayno, d2.desdirection, d2.desorder
1301 and 1313 and example bus stops. descriptions_search is a view based on
the graph table with further joins to get station numbers instead of id's
I wouldn't know how to further develop the view to find paths with changes.
Any ideas?
Ivan
"--CELKO--" <jcelko212@.earthlink.net> schrieb im Newsbeitrag
news:1117822495.714286.127820@.g47g2000cwa.googlegroups.com...
> I got an email asking me how to find paths in a graph using SQL. The
> author of the email had seen my chapter on graphs in SQL FOR SMARTIES
> 2nd, and read that I was not happy with my own answers. What he wanted
> was a list of paths from any two nodes in a directed graph, and I would
> assume that he wanted the cheapest path.
> After thinking about this for awhile, the best way is probably to do
> the Floyd-Warshall or Johnson algorithm in a procedural language and
> load a table with the results. But I want to do this in pure SQL as an
> exercise.
> Let's start with a simple graph and represent it as an adjacency list
> with weights on the edges.
> CREATE TABLE Graph
> (source CHAR(2) NOT NULL,
> destination CHAR(2) NOT NULL,
> cost INTEGER NOT NULL,
> PRIMARY KEY (source, destination));
>
> I got data for this table from the book Introduction to Algorithms by
> Cormen, Leiserson and Rivest (ISBN 0-262-03141-8), page 518. This book
> is very popular in college courses in the United States. I made one
> decision that will be important later; I added self-traversal edges
> (i.e. the node is both the source and the destination) with weights of
> zero.
> INSERT INTO Graph VALUES ('s', 's', 0);
> INSERT INTO Graph VALUES ('s', 'u', 3);
> INSERT INTO Graph VALUES ('s', 'x', 5);
> INSERT INTO Graph VALUES ('u', 'u', 0);
> INSERT INTO Graph VALUES ('u', 'v', 6);
> INSERT INTO Graph VALUES ('u', 'x', 2);
> INSERT INTO Graph VALUES ('v', 'v', 0);
> INSERT INTO Graph VALUES ('v', 'y', 2);
> INSERT INTO Graph VALUES ('x', 'u', 1);
> INSERT INTO Graph VALUES ('x', 'v', 4);
> INSERT INTO Graph VALUES ('x', 'x', 0);
> INSERT INTO Graph VALUES ('x', 'y', 6);
> INSERT INTO Graph VALUES ('y', 's', 3);
> INSERT INTO Graph VALUES ('y', 'v', 7);
> INSERT INTO Graph VALUES ('y', 'y', 0);
> I am not happy about this approach, because I have to decide the
> maximum number of edges in path before I start looking for an answer.
> But this will work and I know that a path will have no more than the
> total number of nodes in the graph. Let's create a table to hold the
> paths:
>
> CREATE TABLE Paths
> (step1 CHAR(2) NOT NULL,
> step2 CHAR(2) NOT NULL,
> step3 CHAR(2) NOT NULL,
> step4 CHAR(2) NOT NULL,
> step5 CHAR(2) NOT NULL,
> total_cost INTEGER NOT NULL,
> path_length INTEGER NOT NULL,
> PRIMARY KEY (step1, step2, step3, step4, step5));
>
> The step1 node is where I begin the path. The other columns are the
> second step, third step, fourth step, and so forth. The last step
> column is the end of the journey. The total_cost column is the total
> cost, based on the sum of the weights of the edges, on this path. The
> path length column is harder to explain, but for now, let's just say
> that it is a count of the nodes visited in the path.
> To keep things easier, let's look at all the paths from "s" to "y" in
> the graph. The INSERT INTO statement for construction that set looks
> like this:
> INSERT INTO Paths
> SELECT G1.source, -- it is 's' in this example
> G2.source,
> G3.source,
> G4.source,
> G4.destination, -- it is 'y' in this example
> (G1.cost + G2.cost + G3.cost + G4.cost),
> (CASE WHEN G1.source NOT IN (G2.source, G3.source, G4.source)
> THEN 1 ELSE 0 END
> + CASE WHEN G2.source NOT IN (G1.source, G3.source, G4.source)
> THEN 1 ELSE 0 END
> + CASE WHEN G3.source NOT IN (G1.source, G2.source, G4.source)
> THEN 1 ELSE 0 END
> + CASE WHEN G4.source NOT IN (G1.source, G2.source, G3.source)
> THEN 1 ELSE 0 END)
> FROM Graph AS G1, Graph AS G2, Graph AS G3, Graph AS G4
> WHERE G1.source = 's'
> AND G1.destination = G2.source
> AND G2.destination = G3.source
> AND G3.destination = G4.source
> AND G4.destination = 'y';
>
> I put in "s" and "y" as the source and destination of the path, and
> made sure that the destination of one step in the path was the source
> of the next step in the path. This is a combinatorial explosion, but it
> is easy to read and understand.
> The sum of the weights is the cost of the path, which is easy to
> understand. The path_length calculation is a bit harder. This sum of
> CASE expressions looks at each node in the path. If it is unique within
> the row, it is assigned a value of one, if it is not unique within the
> row, it is assigned a value of zero.
> All paths will have five steps in them because that is the way to table
> is declared. But what if a path exists between the two nodes which is
> shorter than five steps? That is where the self-traversal rows are
> used! Consecutive pairs of steps in the same row can be repetitions of
> the same node.
> Here is what the rows of the Paths table look like after this INSERT
> INTO statement, ordered by descending path_length, and then by
> ascending cost.
> Paths step1 step2 step3 step4 step5 total_cost path_length
> ========================================
==============
> s s x x y 11 0
> s s s x y 11 1
> s x x x y 11 1
> s x u x y 14 2
> s s u v y 11 2
> s s u x y 11 2
> s s x v y 11 2
> s s x y y 11 2
> s u u v y 11 2
> s u u x y 11 2
> s u v v y 11 2
> s u x x y 11 2
> s x v v y 11 2
> s x x v y 11 2
> s x x y y 11 2
> s x y y y 11 2
> s x y v y 20 4
> s x u v y 14 4
> s u v y y 11 4
> s u x v y 11 4
> s u x y y 11 4
> s x v y y 11 4
> Clearly, all pairs of nodes could be picked from the original Graph
> table and the same INSERT INTO run on them with a minor change in the
> WHERE clause. However, this example is big enough for a short magazine
> article. And it is too big for most applications. It is safe to assume
> that people really want the cheapest path. In this example, the
> total_cost column defines the cost of an path, so we can eliminate some
> of the paths from the Paths table with this statement.
> DELETE FROM Paths
> WHERE total_cost
> FROM Paths);
> Again, if you had all the paths for all possible pairs of nodes, the
> subquery expression would have a WHERE clause to correlate it to the
> subset of paths for each possible pair.
> In this example, it got rid of 3 out of 22 possible paths. It is
> helpful and in some situations we might like having all the options.
> But these are not distinct options.
> As one of many examples, the paths
> (s, x, v, v, y, 11, 2)
> and
> (s, x, x, v, y, 11, 2)
> are both really the same path, (s, x, v, y). Before we decide to write
> a statement to handle these equivalent rows, let's consider another
> cost factor. People do not like to change airplanes or trains. If they
> can go from Amsterdam to New York City on one plane without changing
> planes for the same cost, they are happy. This is where that
> path_length column comes in. It is a quick way to remove the paths that
> have more edges than they need to get the job done.
> DELETE FROM Paths
> WHERE path_length
> FROM Paths);
> In this case, that last DELETE FROM statement will reduce the table to
> one row: (s, s, x, x, y, 11, 0) which reduces to (s, x, y). This single
> remaining row is very convenient for my article, but if you look at the
> table, you will see that there was also a subset of equivalent rows
> that had higher path_length numbers.
> (s, s, s, x, y, 11, 1)
> (s, x, x, x, y, 11, 1)
> (s, x, x, y, y, 11, 2)
> (s, x, y, y, y, 11, 2)
> Your task is to write code to handle equivalent rows. Hint: the
> duplicate nodes will always be contigous across the row.
>|||if you have SQL Server 2005 with CTE, you can use this approach:
http://www.sqlservercentral.com/col...lserver2005.asp|||unfortunately not :(
"--CELKO--" <jcelko212@.earthlink.net> schrieb im Newsbeitrag
news:1118080444.728097.178340@.g14g2000cwa.googlegroups.com...
> if you have SQL Server 2005 with CTE, you can use this approach:
>
http://www.sqlservercentral.com/col...>
5.asp

>

No comments:

Post a Comment