trunk/doc/ALGORITHM.txt

                             Routino : Algorithm
                             ===================


   This page describes the development of the algorithm that is used in
   Routino for finding routes.


Simplest Algorithm
------------------

   The algorithm to find a route is fundamentally simple: Start at the
   beginning, follow all possible routes and keep going until you reach
   the end.

   While this method does work, it isn't fast. To be able to find a route
   quickly needs a different algorithm, one that can find the correct
   answer without wasting time on routes that lead nowhere.


Improved Algorithm
------------------

   The simplest way to do this is to follow all possible segments from the
   starting node to the next nearest node (an intermediate node in the
   complete journey). For each node that is reached store the shortest
   route from the starting node and the length of that route. The list of
   intermediate nodes needs to be maintained in order of shortest overall
   route on the assumption that there is a straight line route from here
   to the end node.
   At each point the intermediate node that has the shortest potential
   overall journey time is processed before any other node. From the first
   node in the list follow all possible segments and place the newly
   discovered nodes into the same list ordered in the same way. This will
   tend to constrain the list of nodes examined to be the ones that are
   between the start and end nodes. If at any point you reach a node that
   has already been reached by a longer route then you can discard that
   route since the newly discovered route is shorter. Conversely if the
   previously discovered route is shorter then discard the new route.
   At some point the end node will be reached and then any routes with
   potential lengths longer than this actual route can be immediately
   discarded. The few remaining potential routes must be continued until
   they are found to be shorter or have no possibility of being shorter.
   The shortest possible route is then found.

   At all times when looking at a node only those segments that are
   possible by the chosen means of transport are followed. This allows the
   type of transport to be handled easily. When finding the quickest route
   the same rules apply except that the criterion for sorting is the
   shortest potential route (assuming that from each node to the end is
   the fastest possible type of highway).

   This method also works, but again it isn't very fast. The problem is
   that the complexity is proportional to the number of nodes or segments
   in all routes examined between the start and end nodes. Maintaining the
   list of intermediate nodes in order is the most complex part.


Final Algorithm
---------------

   The final algorithm that is implemented in the router is basically the
   one above but with an important difference. Instead of finding a long
   route among a data set of 8,000,000 nodes (number of highway nodes in
   UK at beginning of 2010) it finds one long route in a data set of
   1,000,000 nodes and a few hundred very short routes in the full data
   set. Since the time taken to find a route is proportional to the number
   of nodes that need to be considered the main route takes 1/10th of the
   time and the very short routes take almost no time at all.

   The solution to making the algorithm fast is therefore to discard most
   of the nodes and only keep the interesting ones. In this case a node is
   deemed to be interesting if it is the junction of three or more
   segments or the junction of two segments with different properties or
   has a routing restriction different from the connecting segments. In
   the algorithm and following description these are classed as
   super-nodes. Starting at each super-node a super-segment is generated
   that finishes on another super-node and contains the shortest path
   along segments with identical properties (and these properties are
   inherited by the super-segment). The point of choosing the shortest
   route is that since all segments considered have identical properties
   they will be treated identically when properties are taken into
   account. This decision making process can be repeated until the only
   the most important and interesting nodes remain.

   To find a route between a start and finish point now comprises the
   following steps (assuming a shortest route is required):

    1. Find all shortest routes from the start point along normal segments
       and stopping when super-nodes are reached.
    2. Find all shortest routes from the end point backwards along normal
       segments and stopping when super-nodes are reached.
    3. Find the shortest route along super-segments from the set of
       super-nodes in step 1 to the set of super-nodes in step 2 (taking
       into account the lengths found in steps 1 and 2 between the
       start/finish super-nodes and the ultimate start/finish point).
    4. For each super-segment in step 3 find the shortest route between
       the two end-point super-nodes.

   This multi-step process is considerably quicker than using all nodes
   but gives a result that still contains the full list of nodes that are
   visited. There are some special cases though, for example very short
   routes that do not pass through any super-nodes, or routes that start
   or finish on a super-node. In these cases one or more of the steps
   listed can be removed or simplified.

   When the first route reaches the final node the length of that route is
   retained as a benchmark. Any shorter complete route that is calculated
   later would replace this benchmark. As routes are tested any partial
   routes that are longer than the benchmark can be immediately discarded.
   Other partial routes have the length of a perfect straight highway to
   the final node added to them and if the total exceeds the benchmark
   they can also be discarded. Very quickly the number of possible routes
   is reduced until the absolute shortest is found.

   For routes that do not start or finish on a node in the original data
   set a fake node is added to an existing segment. This requires special
   handling in the algorithm but it gives mode flexibility for the start,
   finish and intermediate points in a route.


Algorithm Evolution
-------------------

   In Routino versions 1.0 to 1.4 the algorithm used to select a
   super-node was the same as above except that node properties were not
   included. Routino versions 1.4.1 to 1.5.1 used a slightly different
   algorithm which only chose nodes that were junctions between segments
   with different properties (or has a routing restriction that is
   different from connecting segments in versions 1.5 and 1.5.1). The
   addition of turn restrictions (described in more detail below) requires
   the original algorithm since the super-segments more accurately reflect
   the underlying topology.


Routing Preferences
-------------------

   One of the important features of Routino is the ability to select a
   route that is optimum for a set of criteria such as preferences for
   each type of highway, speed limits and other restrictions and highway
   properties.

   All of these features are handled by assigning a score to each segment
   while calculating the route and trying to minimise the score rather
   than simply minimising the length.

   Segment length
          When calculating the shortest route the length of the segment is
          the starting point for the score.

   Speed preference
          When calculating the quickest route the time taken calculated
          from the length of the segment and the lower of the highway's
          own speed limit and the user's speed preference for the type of
          highway is the starting point for the score.

   One-way restriction
          If a highway has the one-way property in the opposite direction
          to the desired travel and the user's preference is to obey
          one-way restrictions then the segment is ignored.

   Weight, height, width & length limits
          If a highway has one of these limits and its value is less than
          the user's specified requirement then the segment is ignored.

   Highway preference
          The highway preference specified by the user is a percentage,
          these are scaled so that the most preferred highway type has a
          weighted preference of 1.0 (0% always has a weighted preference
          of 0.0). The calculated score for a segment is divided by this
          weighted preference.

   Highway properties
          The other highway properties are specified by the user as a
          percentage and each highway either has that property or not. The
          user's property preference is scaled into the range 0.0 (for 0%)
          to 1.0 (for 100%) to give a weighted preference, a second
          "non-property" weighted preference is calculated in the same way
          after subtracting the user's preference from 100%. If a segment
          has a particular property then the calculated score is divided
          by the weighted preference for that property, if not then it is
          divided by the non-property weighted preference. A non-linear
          transformation is applied so that changing property preferences
          close to 50% do not cause large variations in routes.


Turn Restrictions
-----------------

   The addition of turn restrictions adds a set of further complications
   because it introduces a set of constraints that are far more complex
   than one-way streets.

   A turn restriction in the simplest case is a combination of a segment,
   node and segment such that routes are not allowed to go from the first
   segment to the second one through the specified node. Exceptions for
   certain types of traffic can also be specified. Currently only this
   simplest type of turn restriction is handled by the algorithm.

   The first complication of turn restrictions is that the algorithm above
   requires that super-segments are composed of segments with identical
   properties. A turn restriction is not the same in both directions so a
   super-segment cannot include any route through that turn restriction.
   The node at the centre of the turn restriction must therefore be a
   super-node to avoid this. In addition to this all nodes connected to
   the turn restriction node by a single segment must also be super-nodes
   to avoid any long-distance super-segments starting at the restricted
   node.

   The second complication of a turn restriction is that the optimum route
   may require passing through the same node more than once. This can
   happen where the route needs to work around a turn restriction by
   driving past it, turning round (on a roundabout perhaps) and coming
   back along the same highway. Without turn restrictions a route could be
   defined purely by the set of nodes that were passed; no node would
   exist more than once along a route between two points. With turn
   restrictions the route is defined by a node and the segment used to get
   there; no route between two points will ever need to follow the same
   segment in the same direction more than once.

   A side-effect of this is that a route that works around a turn
   restriction must be calculable using the super-segments that are stored
   in the database. This puts a limit on the amount of database
   optimisation that can be performed because if too many super-segments
   are removed the optimum work-around may also be removed. The solution
   to this is to ensure that the database preserves all loops that can be
   used to turn around and reverse direction, previously super-segments
   that started and finished on the same super-node were disallowed.

   Another side-effect of having the route composed of a set of locations
   (nodes) as well as the direction of travel (segments used to reach
   them) is that via points in the route can be forced to continue in the
   original direction. If the chosen method of transport obeys turn
   restrictions then it will not reverse direction at a via point but will
   find an optimum route continuing in the same direction. The only
   exception to this is when the route ahead at a waypoint is into a
   dead-end and an immediate U-turn is allowed.

   A side-effect of having the starting direction at a via point defined
   by the previous part of the route is that overall non-optimal routes
   may be found even though each section between via points is optimal.
   For a route with a start, middle and end point defined it can be the
   case that the shortest route from the start to the middle arrives in
   the opposite direction to that required for the optimal route from the
   middle to the end. The calculation of the route in separate sections
   therefore may give a non-optimum result even though each section is
   itself optimum based on the start conditions.

   Overall the presence of turn restrictions in the database makes the
   routing slower even for regions of the map that have no turn
   restrictions.


Data Implementation
-------------------

   The hardest part of implementing this router is the data organisation.
   The arrangement of the data to minimise the number of operations
   required to follow a route from one node to another is much harder than
   designing the algorithm itself.

   The final implementation uses a separate table for nodes, segments and
   ways. Each table individually is implemented as a C-language data
   structure that is written to disk by a program which parses the
   OpenStreetMap XML data file. In the router these data structures are
   memory mapped so that the operating system handles the problems of
   loading the needed data blocks from disk.

   Each node contains a latitude and longitude and they are sorted
   geographically so that converting a latitude and longitude coordinate
   to a node is fast as well as looking up the coordinate of a node. The
   node also contains the location in the array of segments for the first
   segment that uses that node.
   Each segment contains the location of the two nodes as well as the way
   that the segment came from. The location of the next segment that uses
   one of the two nodes is also stored; the next segment for the other
   node is the following one in the array. The length of the segment is
   also pre-computed and stored.
   Each way has a name, a highway type, a list of allowed types of
   traffic, a speed limit, any weight, height, width or length
   restrictions and the highway properties.

   The super-nodes are mixed in with the nodes and the super-segments are
   mixed in with the segments. For the nodes they are the same as the
   normal nodes, so just a flag is needed to indicate that they are super.
   The super-segments are in addition to the normal segments so they
   increase the database size (by about 10%) and are also marked with a
   flag.

   The relations are stored separately from the nodes, segments and ways.
   For the turn restriction relations the initial and final segments are
   stored along with the restricted node itself. Each node that has a turn
   restriction is marked in the main node storage with a flag to indicate
   this information.


Practicalities
--------------

   At the time of writing (April 2010) the OpenStreetMap data for Great
   Britain (taken from GeoFabrik) contains:
     * 14,675,098 nodes
          + 8,767,521 are highway nodes
          + 1,120,297 are super-nodes
     * 1,876,822 ways
          + 1,412,898 are highways
               o 9,316,328 highway segments
               o 1,641,009 are super-segments
     * 60,572 relations

   The database files when generated are 41.5 MB for nodes, 121.6 MB for
   segments and 12.6 MB for ways and are stored uncompressed. By having at
   least 200 MB or RAM available the routing can be performed with no disk
   accesses (once the data has been read once).


--------

Copyright 2008-2011 Andrew M. Bishop.
1	amb	429	Routino : Algorithm
2			===================
3	amb	152
4
5			This page describes the development of the algorithm that is used in
6	amb	429	Routino for finding routes.
7	amb	152
8	amb	381
9	amb	152	Simplest Algorithm
10	amb	381	------------------
11	amb	152
12			The algorithm to find a route is fundamentally simple: Start at the
13			beginning, follow all possible routes and keep going until you reach
14			the end.
15
16			While this method does work, it isn't fast. To be able to find a route
17			quickly needs a different algorithm, one that can find the correct
18			answer without wasting time on routes that lead nowhere.
19
20	amb	381
21	amb	152	Improved Algorithm
22	amb	381	------------------
23	amb	152
24			The simplest way to do this is to follow all possible segments from the
25			starting node to the next nearest node (an intermediate node in the
26			complete journey). For each node that is reached store the shortest
27			route from the starting node and the length of that route. The list of
28			intermediate nodes needs to be maintained in order of shortest overall
29			route on the assumption that there is a straight line route from here
30			to the end node.
31			At each point the intermediate node that has the shortest potential
32			overall journey time is processed before any other node. From the first
33			node in the list follow all possible segments and place the newly
34			discovered nodes into the same list ordered in the same way. This will
35			tend to constrain the list of nodes examined to be the ones that are
36			between the start and end nodes. If at any point you reach a node that
37			has already been reached by a longer route then you can discard that
38			route since the newly discovered route is shorter. Conversely if the
39			previously discovered route is shorter then discard the new route.
40			At some point the end node will be reached and then any routes with
41			potential lengths longer than this actual route can be immediately
42			discarded. The few remaining potential routes must be continued until
43			they are found to be shorter or have no possibility of being shorter.
44			The shortest possible route is then found.
45
46			At all times when looking at a node only those segments that are
47			possible by the chosen means of transport are followed. This allows the
48			type of transport to be handled easily. When finding the quickest route
49	amb	621	the same rules apply except that the criterion for sorting is the
50			shortest potential route (assuming that from each node to the end is
51			the fastest possible type of highway).
52	amb	152
53			This method also works, but again it isn't very fast. The problem is
54			that the complexity is proportional to the number of nodes or segments
55			in all routes examined between the start and end nodes. Maintaining the
56			list of intermediate nodes in order is the most complex part.
57
58	amb	381
59	amb	152	Final Algorithm
60	amb	381	---------------
61	amb	152
62			The final algorithm that is implemented in the router is basically the
63			one above but with an important difference. Instead of finding a long
64	amb	445	route among a data set of 8,000,000 nodes (number of highway nodes in
65			UK at beginning of 2010) it finds one long route in a data set of
66			1,000,000 nodes and a few hundred very short routes in the full data
67			set. Since the time taken to find a route is proportional to the number
68	amb	659	of nodes that need to be considered the main route takes 1/10th of the
69			time and the very short routes take almost no time at all.
70	amb	152
71			The solution to making the algorithm fast is therefore to discard most
72			of the nodes and only keep the interesting ones. In this case a node is
73	amb	659	deemed to be interesting if it is the junction of three or more
74			segments or the junction of two segments with different properties or
75			has a routing restriction different from the connecting segments. In
76			the algorithm and following description these are classed as
77	amb	445	super-nodes. Starting at each super-node a super-segment is generated
78			that finishes on another super-node and contains the shortest path
79			along segments with identical properties (and these properties are
80			inherited by the super-segment). The point of choosing the shortest
81			route is that since all segments considered have identical properties
82			they will be treated identically when properties are taken into
83			account. This decision making process can be repeated until the only
84			the most important and interesting nodes remain.
85	amb	152
86	amb	445	To find a route between a start and finish point now comprises the
87			following steps (assuming a shortest route is required):
88	amb	152
89	amb	445	1. Find all shortest routes from the start point along normal segments
90			and stopping when super-nodes are reached.
91			2. Find all shortest routes from the end point backwards along normal
92			segments and stopping when super-nodes are reached.
93			3. Find the shortest route along super-segments from the set of
94			super-nodes in step 1 to the set of super-nodes in step 2 (taking
95			into account the lengths found in steps 1 and 2 between the
96			start/finish super-nodes and the ultimate start/finish point).
97			4. For each super-segment in step 3 find the shortest route between
98			the two end-point super-nodes.
99	amb	381
100	amb	445	This multi-step process is considerably quicker than using all nodes
101			but gives a result that still contains the full list of nodes that are
102			visited. There are some special cases though, for example very short
103			routes that do not pass through any super-nodes, or routes that start
104			or finish on a super-node. In these cases one or more of the steps
105			listed can be removed or simplified.
106
107	amb	621	When the first route reaches the final node the length of that route is
108			retained as a benchmark. Any shorter complete route that is calculated
109			later would replace this benchmark. As routes are tested any partial
110			routes that are longer than the benchmark can be immediately discarded.
111			Other partial routes have the length of a perfect straight highway to
112			the final node added to them and if the total exceeds the benchmark
113			they can also be discarded. Very quickly the number of possible routes
114			is reduced until the absolute shortest is found.
115
116			For routes that do not start or finish on a node in the original data
117			set a fake node is added to an existing segment. This requires special
118			handling in the algorithm but it gives mode flexibility for the start,
119			finish and intermediate points in a route.
120
121
122	amb	659	Algorithm Evolution
123			-------------------
124
125			In Routino versions 1.0 to 1.4 the algorithm used to select a
126			super-node was the same as above except that node properties were not
127			included. Routino versions 1.4.1 to 1.5.1 used a slightly different
128			algorithm which only chose nodes that were junctions between segments
129			with different properties (or has a routing restriction that is
130			different from connecting segments in versions 1.5 and 1.5.1). The
131			addition of turn restrictions (described in more detail below) requires
132			the original algorithm since the super-segments more accurately reflect
133			the underlying topology.
134
135
136	amb	445	Routing Preferences
137	amb	621	-------------------
138	amb	445
139			One of the important features of Routino is the ability to select a
140			route that is optimum for a set of criteria such as preferences for
141			each type of highway, speed limits and other restrictions and highway
142			properties.
143
144			All of these features are handled by assigning a score to each segment
145			while calculating the route and trying to minimise the score rather
146			than simply minimising the length.
147
148			Segment length
149			When calculating the shortest route the length of the segment is
150			the starting point for the score.
151
152			Speed preference
153			When calculating the quickest route the time taken calculated
154			from the length of the segment and the lower of the highway's
155			own speed limit and the user's speed preference for the type of
156			highway is the starting point for the score.
157
158	amb	737	One-way restriction
159			If a highway has the one-way property in the opposite direction
160	amb	445	to the desired travel and the user's preference is to obey
161	amb	737	one-way restrictions then the segment is ignored.
162	amb	445
163			Weight, height, width & length limits
164			If a highway has one of these limits and its value is less than
165			the user's specified requirement then the segment is ignored.
166
167			Highway preference
168			The highway preference specified by the user is a percentage,
169			these are scaled so that the most preferred highway type has a
170			weighted preference of 1.0 (0% always has a weighted preference
171			of 0.0). The calculated score for a segment is divided by this
172			weighted preference.
173
174			Highway properties
175			The other highway properties are specified by the user as a
176			percentage and each highway either has that property or not. The
177			user's property preference is scaled into the range 0.0 (for 0%)
178	amb	515	to 1.0 (for 100%) to give a weighted preference, a second
179	amb	737	"non-property" weighted preference is calculated in the same way
180	amb	445	after subtracting the user's preference from 100%. If a segment
181	amb	515	has a particular property then the calculated score is divided
182	amb	671	by the weighted preference for that property, if not then it is
183			divided by the non-property weighted preference. A non-linear
184			transformation is applied so that changing property preferences
185			close to 50% do not cause large variations in routes.
186	amb	445
187	amb	901
188	amb	621	Turn Restrictions
189			-----------------
190
191	amb	659	The addition of turn restrictions adds a set of further complications
192			because it introduces a set of constraints that are far more complex
193			than one-way streets.
194	amb	621
195			A turn restriction in the simplest case is a combination of a segment,
196			node and segment such that routes are not allowed to go from the first
197			segment to the second one through the specified node. Exceptions for
198			certain types of traffic can also be specified. Currently only this
199			simplest type of turn restriction is handled by the algorithm.
200
201	amb	659	The first complication of turn restrictions is that the algorithm above
202			requires that super-segments are composed of segments with identical
203			properties. A turn restriction is not the same in both directions so a
204			super-segment cannot include any route through that turn restriction.
205			The node at the centre of the turn restriction must therefore be a
206			super-node to avoid this. In addition to this all nodes connected to
207			the turn restriction node by a single segment must also be super-nodes
208			to avoid any long-distance super-segments starting at the restricted
209			node.
210
211			The second complication of a turn restriction is that the optimum route
212	amb	621	may require passing through the same node more than once. This can
213			happen where the route needs to work around a turn restriction by
214			driving past it, turning round (on a roundabout perhaps) and coming
215			back along the same highway. Without turn restrictions a route could be
216	amb	659	defined purely by the set of nodes that were passed; no node would
217			exist more than once along a route between two points. With turn
218	amb	621	restrictions the route is defined by a node and the segment used to get
219			there; no route between two points will ever need to follow the same
220			segment in the same direction more than once.
221
222	amb	659	A side-effect of this is that a route that works around a turn
223			restriction must be calculable using the super-segments that are stored
224			in the database. This puts a limit on the amount of database
225			optimisation that can be performed because if too many super-segments
226			are removed the optimum work-around may also be removed. The solution
227			to this is to ensure that the database preserves all loops that can be
228			used to turn around and reverse direction, previously super-segments
229			that started and finished on the same super-node were disallowed.
230	amb	621
231	amb	659	Another side-effect of having the route composed of a set of locations
232			(nodes) as well as the direction of travel (segments used to reach
233			them) is that via points in the route can be forced to continue in the
234			original direction. If the chosen method of transport obeys turn
235			restrictions then it will not reverse direction at a via point but will
236	amb	737	find an optimum route continuing in the same direction. The only
237			exception to this is when the route ahead at a waypoint is into a
238			dead-end and an immediate U-turn is allowed.
239	amb	621
240	amb	659	A side-effect of having the starting direction at a via point defined
241			by the previous part of the route is that overall non-optimal routes
242			may be found even though each section between via points is optimal.
243			For a route with a start, middle and end point defined it can be the
244			case that the shortest route from the start to the middle arrives in
245			the opposite direction to that required for the optimal route from the
246			middle to the end. The calculation of the route in separate sections
247			therefore may give a non-optimum result even though each section is
248			itself optimum based on the start conditions.
249
250			Overall the presence of turn restrictions in the database makes the
251			routing slower even for regions of the map that have no turn
252			restrictions.
253
254
255	amb	621	Data Implementation
256			-------------------
257
258	amb	152	The hardest part of implementing this router is the data organisation.
259			The arrangement of the data to minimise the number of operations
260			required to follow a route from one node to another is much harder than
261			designing the algorithm itself.
262
263			The final implementation uses a separate table for nodes, segments and
264			ways. Each table individually is implemented as a C-language data
265			structure that is written to disk by a program which parses the
266			OpenStreetMap XML data file. In the router these data structures are
267			memory mapped so that the operating system handles the problems of
268			loading the needed data blocks from disk.
269
270			Each node contains a latitude and longitude and they are sorted
271			geographically so that converting a latitude and longitude coordinate
272			to a node is fast as well as looking up the coordinate of a node. The
273			node also contains the location in the array of segments for the first
274			segment that uses that node.
275			Each segment contains the location of the two nodes as well as the way
276			that the segment came from. The location of the next segment that uses
277			one of the two nodes is also stored; the next segment for the other
278			node is the following one in the array. The length of the segment is
279			also pre-computed and stored.
280			Each way has a name, a highway type, a list of allowed types of
281	amb	445	traffic, a speed limit, any weight, height, width or length
282			restrictions and the highway properties.
283	amb	152
284			The super-nodes are mixed in with the nodes and the super-segments are
285			mixed in with the segments. For the nodes they are the same as the
286			normal nodes, so just a flag is needed to indicate that they are super.
287			The super-segments are in addition to the normal segments so they
288			increase the database size (by about 10%) and are also marked with a
289			flag.
290
291	amb	621	The relations are stored separately from the nodes, segments and ways.
292			For the turn restriction relations the initial and final segments are
293			stored along with the restricted node itself. Each node that has a turn
294			restriction is marked in the main node storage with a flag to indicate
295			this information.
296	amb	381
297	amb	621
298	amb	152	Practicalities
299	amb	381	--------------
300	amb	152
301	amb	381	At the time of writing (April 2010) the OpenStreetMap data for Great
302	amb	429	Britain (taken from GeoFabrik) contains:
303	amb	381	* 14,675,098 nodes
304			+ 8,767,521 are highway nodes
305			+ 1,120,297 are super-nodes
306			* 1,876,822 ways
307	amb	416	+ 1,412,898 are highways
308			o 9,316,328 highway segments
309	amb	381	o 1,641,009 are super-segments
310			* 60,572 relations
311	amb	152
312			The database files when generated are 41.5 MB for nodes, 121.6 MB for
313			segments and 12.6 MB for ways and are stored uncompressed. By having at
314			least 200 MB or RAM available the routing can be performed with no disk
315			accesses (once the data has been read once).
316
317
318			--------
319
320	amb	621	Copyright 2008-2011 Andrew M. Bishop.