trunk/doc/ALGORITHM.txt

                              Routino : Algorithm
                              ===================


   This page describes the development of the algorithm that is used in
   the software demonstrated here.


Simplest Algorithm
------------------

   The algorithm to find a route is fundamentally simple: Start at the
   beginning, follow all possible routes and keep going until you reach
   the end.

   While this method does work, it isn't fast. To be able to find a route
   quickly needs a different algorithm, one that can find the correct
   answer without wasting time on routes that lead nowhere.


Improved Algorithm
------------------

   The simplest way to do this is to follow all possible segments from the
   starting node to the next nearest node (an intermediate node in the
   complete journey). For each node that is reached store the shortest
   route from the starting node and the length of that route. The list of
   intermediate nodes needs to be maintained in order of shortest overall
   route on the assumption that there is a straight line route from here
   to the end node.
   At each point the intermediate node that has the shortest potential
   overall journey time is processed before any other node. From the first
   node in the list follow all possible segments and place the newly
   discovered nodes into the same list ordered in the same way. This will
   tend to constrain the list of nodes examined to be the ones that are
   between the start and end nodes. If at any point you reach a node that
   has already been reached by a longer route then you can discard that
   route since the newly discovered route is shorter. Conversely if the
   previously discovered route is shorter then discard the new route.
   At some point the end node will be reached and then any routes with
   potential lengths longer than this actual route can be immediately
   discarded. The few remaining potential routes must be continued until
   they are found to be shorter or have no possibility of being shorter.
   The shortest possible route is then found.

   At all times when looking at a node only those segments that are
   possible by the chosen means of transport are followed. This allows the
   type of transport to be handled easily. When finding the quickest route
   the same rules apply except that criterion for sorting is the shortest
   potential route (assuming that from each node to the end is the fastest
   possible type of highway).

   This method also works, but again it isn't very fast. The problem is
   that the complexity is proportional to the number of nodes or segments
   in all routes examined between the start and end nodes. Maintaining the
   list of intermediate nodes in order is the most complex part.


Final Algorithm
---------------

   The final algorithm that is implemented in the router is basically the
   one above but with an important difference. Instead of finding a long
   route among a data set of 5,000,000 nodes (number of highway nodes in
   UK at beginning of 2009) it finds one long route in a data set of
   500,000 nodes and a few hundred very short routes in the full data set.
   Since the time taken to find a route is proportional to the number of
   nodes the main route takes 1/10th of the time and the very short routes
   take almost no time at all.

   The solution to making the algorithm fast is therefore to discard most
   of the nodes and only keep the interesting ones. In this case a node is
   deemed to be interesting if it is the junction of three or more
   segments or the junction of two segments with different properties. In
   the algorithm these are classed as super-nodes. Between each super-node
   and a neighbouring super-node a super-segment is generated that
   contains the shortest path along segments with identical properties.
   This decision making process can be repeated until the only the most
   important and interesting nodes remain.

   To find a route now comprises finding a route along super-segments
   between the start node and the end node followed by finding a route
   between each adjacent pair of super-nodes in that route. (The routes
   between the start and end nodes and all adjacent super-nodes need to be
   found before the route using only super-nodes can be found.) This is
   considerably quicker than using all nodes but gives a result that still
   contains the full list of nodes that are visited.


Implementation
--------------

   The hardest part of implementing this router is the data organisation.
   The arrangement of the data to minimise the number of operations
   required to follow a route from one node to another is much harder than
   designing the algorithm itself.

   The final implementation uses a separate table for nodes, segments and
   ways. Each table individually is implemented as a C-language data
   structure that is written to disk by a program which parses the
   OpenStreetMap XML data file. In the router these data structures are
   memory mapped so that the operating system handles the problems of
   loading the needed data blocks from disk.

   Each node contains a latitude and longitude and they are sorted
   geographically so that converting a latitude and longitude coordinate
   to a node is fast as well as looking up the coordinate of a node. The
   node also contains the location in the array of segments for the first
   segment that uses that node.
   Each segment contains the location of the two nodes as well as the way
   that the segment came from. The location of the next segment that uses
   one of the two nodes is also stored; the next segment for the other
   node is the following one in the array. The length of the segment is
   also pre-computed and stored.
   Each way has a name, a highway type, a list of allowed types of
   traffic, a speed limit and any weight, height, width or length
   restrictions.

   The super-nodes are mixed in with the nodes and the super-segments are
   mixed in with the segments. For the nodes they are the same as the
   normal nodes, so just a flag is needed to indicate that they are super.
   The super-segments are in addition to the normal segments so they
   increase the database size (by about 10%) and are also marked with a
   flag.


Practicalities
--------------

   At the time of writing (April 2010) the OpenStreetMap data for Great
   Britain (taken fromGeoFabrik) contains:
     * 14,675,098 nodes
          + 8,767,521 are highway nodes
          + 1,120,297 are super-nodes
     * 1,876,822 ways
          + 1,412,898 are highways
               o 9,316,328 highway segments
               o 1,641,009 are super-segments
     * 60,572 relations

   The database files when generated are 41.5 MB for nodes, 121.6 MB for
   segments and 12.6 MB for ways and are stored uncompressed. By having at
   least 200 MB or RAM available the routing can be performed with no disk
   accesses (once the data has been read once).


--------

Copyright 2008-2010 Andrew M. Bishop.
1	amb	381	Routino : Algorithm
2			===================
3	amb	152
4
5			This page describes the development of the algorithm that is used in
6			the software demonstrated here.
7
8	amb	381
9	amb	152	Simplest Algorithm
10	amb	381	------------------
11	amb	152
12			The algorithm to find a route is fundamentally simple: Start at the
13			beginning, follow all possible routes and keep going until you reach
14			the end.
15
16			While this method does work, it isn't fast. To be able to find a route
17			quickly needs a different algorithm, one that can find the correct
18			answer without wasting time on routes that lead nowhere.
19
20	amb	381
21	amb	152	Improved Algorithm
22	amb	381	------------------
23	amb	152
24			The simplest way to do this is to follow all possible segments from the
25			starting node to the next nearest node (an intermediate node in the
26			complete journey). For each node that is reached store the shortest
27			route from the starting node and the length of that route. The list of
28			intermediate nodes needs to be maintained in order of shortest overall
29			route on the assumption that there is a straight line route from here
30			to the end node.
31			At each point the intermediate node that has the shortest potential
32			overall journey time is processed before any other node. From the first
33			node in the list follow all possible segments and place the newly
34			discovered nodes into the same list ordered in the same way. This will
35			tend to constrain the list of nodes examined to be the ones that are
36			between the start and end nodes. If at any point you reach a node that
37			has already been reached by a longer route then you can discard that
38			route since the newly discovered route is shorter. Conversely if the
39			previously discovered route is shorter then discard the new route.
40			At some point the end node will be reached and then any routes with
41			potential lengths longer than this actual route can be immediately
42			discarded. The few remaining potential routes must be continued until
43			they are found to be shorter or have no possibility of being shorter.
44			The shortest possible route is then found.
45
46			At all times when looking at a node only those segments that are
47			possible by the chosen means of transport are followed. This allows the
48			type of transport to be handled easily. When finding the quickest route
49			the same rules apply except that criterion for sorting is the shortest
50			potential route (assuming that from each node to the end is the fastest
51			possible type of highway).
52
53			This method also works, but again it isn't very fast. The problem is
54			that the complexity is proportional to the number of nodes or segments
55			in all routes examined between the start and end nodes. Maintaining the
56			list of intermediate nodes in order is the most complex part.
57
58	amb	381
59	amb	152	Final Algorithm
60	amb	381	---------------
61	amb	152
62			The final algorithm that is implemented in the router is basically the
63			one above but with an important difference. Instead of finding a long
64			route among a data set of 5,000,000 nodes (number of highway nodes in
65			UK at beginning of 2009) it finds one long route in a data set of
66			500,000 nodes and a few hundred very short routes in the full data set.
67			Since the time taken to find a route is proportional to the number of
68			nodes the main route takes 1/10th of the time and the very short routes
69			take almost no time at all.
70
71			The solution to making the algorithm fast is therefore to discard most
72			of the nodes and only keep the interesting ones. In this case a node is
73			deemed to be interesting if it is the junction of three or more
74			segments or the junction of two segments with different properties. In
75			the algorithm these are classed as super-nodes. Between each super-node
76			and a neighbouring super-node a super-segment is generated that
77			contains the shortest path along segments with identical properties.
78			This decision making process can be repeated until the only the most
79			important and interesting nodes remain.
80
81			To find a route now comprises finding a route along super-segments
82			between the start node and the end node followed by finding a route
83			between each adjacent pair of super-nodes in that route. (The routes
84			between the start and end nodes and all adjacent super-nodes need to be
85			found before the route using only super-nodes can be found.) This is
86			considerably quicker than using all nodes but gives a result that still
87			contains the full list of nodes that are visited.
88
89	amb	381
90	amb	152	Implementation
91	amb	381	--------------
92	amb	152
93			The hardest part of implementing this router is the data organisation.
94			The arrangement of the data to minimise the number of operations
95			required to follow a route from one node to another is much harder than
96			designing the algorithm itself.
97
98			The final implementation uses a separate table for nodes, segments and
99			ways. Each table individually is implemented as a C-language data
100			structure that is written to disk by a program which parses the
101			OpenStreetMap XML data file. In the router these data structures are
102			memory mapped so that the operating system handles the problems of
103			loading the needed data blocks from disk.
104
105			Each node contains a latitude and longitude and they are sorted
106			geographically so that converting a latitude and longitude coordinate
107			to a node is fast as well as looking up the coordinate of a node. The
108			node also contains the location in the array of segments for the first
109			segment that uses that node.
110			Each segment contains the location of the two nodes as well as the way
111			that the segment came from. The location of the next segment that uses
112			one of the two nodes is also stored; the next segment for the other
113			node is the following one in the array. The length of the segment is
114			also pre-computed and stored.
115			Each way has a name, a highway type, a list of allowed types of
116			traffic, a speed limit and any weight, height, width or length
117			restrictions.
118
119			The super-nodes are mixed in with the nodes and the super-segments are
120			mixed in with the segments. For the nodes they are the same as the
121			normal nodes, so just a flag is needed to indicate that they are super.
122			The super-segments are in addition to the normal segments so they
123			increase the database size (by about 10%) and are also marked with a
124			flag.
125
126	amb	381
127	amb	152	Practicalities
128	amb	381	--------------
129	amb	152
130	amb	381	At the time of writing (April 2010) the OpenStreetMap data for Great
131	amb	416	Britain (taken fromGeoFabrik) contains:
132	amb	381	* 14,675,098 nodes
133			+ 8,767,521 are highway nodes
134			+ 1,120,297 are super-nodes
135			* 1,876,822 ways
136	amb	416	+ 1,412,898 are highways
137			o 9,316,328 highway segments
138	amb	381	o 1,641,009 are super-segments
139			* 60,572 relations
140	amb	152
141			The database files when generated are 41.5 MB for nodes, 121.6 MB for
142			segments and 12.6 MB for ways and are stored uncompressed. By having at
143			least 200 MB or RAM available the routing can be performed with no disk
144			accesses (once the data has been read once).
145
146
147			--------
148
149	amb	381	Copyright 2008-2010 Andrew M. Bishop.