Check out the latest version of Routino: svn co http://routino.org/svn/trunk routino
Contents of /trunk/doc/ALGORITHM.txt
Parent Directory
|
Revision Log
Revision 1461 -
(show annotations)
(download)
Fri Jul 12 15:04:56 2013 UTC (11 years, 8 months ago) by amb
File MIME type: text/plain
File size: 19588 byte(s)
Fri Jul 12 15:04:56 2013 UTC (11 years, 8 months ago) by amb
File MIME type: text/plain
File size: 19588 byte(s)
Update the algorithm documentation with a description of the algorithm used for finding the shortest path.
1 | Routino : Algorithm |
2 | =================== |
3 | |
4 | |
5 | This page describes the development of the algorithm that is used in |
6 | Routino for finding routes. |
7 | |
8 | |
9 | Simplest Algorithm |
10 | ------------------ |
11 | |
12 | The algorithm to find a route is fundamentally simple: Start at the |
13 | beginning, follow all possible routes and keep going until you reach |
14 | the end. |
15 | |
16 | While this method does work, it isn't fast. To be able to find a route |
17 | quickly needs a different algorithm, one that can find the correct |
18 | answer without wasting time on routes that lead nowhere. |
19 | |
20 | |
21 | Improved Algorithm |
22 | ------------------ |
23 | |
24 | The simplest way to do this is to follow all possible segments from the |
25 | starting node to the next nearest node (an intermediate node in the |
26 | complete journey). For each node that is reached store the shortest |
27 | route from the starting node and the length of that route. The list of |
28 | intermediate nodes needs to be maintained in order of shortest overall |
29 | route on the assumption that there is a straight line route from here |
30 | to the end node. |
31 | At each point the intermediate node that has the shortest potential |
32 | overall journey time is processed before any other node. From the first |
33 | node in the list follow all possible segments and place the newly |
34 | discovered nodes into the same list ordered in the same way. This will |
35 | tend to constrain the list of nodes examined to be the ones that are |
36 | between the start and end nodes. If at any point you reach a node that |
37 | has already been reached by a longer route then you can discard that |
38 | route since the newly discovered route is shorter. Conversely if the |
39 | previously discovered route is shorter then discard the new route. |
40 | At some point the end node will be reached and then any routes with |
41 | potential lengths longer than this actual route can be immediately |
42 | discarded. The few remaining potential routes must be continued until |
43 | they are found to be shorter or have no possibility of being shorter. |
44 | The shortest possible route is then found. |
45 | |
46 | At all times when looking at a node only those segments that are |
47 | possible by the chosen means of transport are followed. This allows the |
48 | type of transport to be handled easily. When finding the quickest route |
49 | the same rules apply except that the criterion for sorting is the |
50 | shortest potential route (assuming that from each node to the end is |
51 | the fastest possible type of highway). |
52 | |
53 | This method also works, but again it isn't very fast. The problem is |
54 | that the complexity is proportional to the number of nodes or segments |
55 | in all routes examined between the start and end nodes. Maintaining the |
56 | list of intermediate nodes in order is the most complex part. |
57 | |
58 | |
59 | Final Algorithm |
60 | --------------- |
61 | |
62 | The final algorithm that is implemented in the router is basically the |
63 | one above but with an important difference. Instead of finding a long |
64 | route among a data set of 8,000,000 nodes (number of highway nodes in |
65 | UK at beginning of 2010) it finds one long route in a data set of |
66 | 1,000,000 nodes and a few hundred very short routes in the full data |
67 | set. Since the time taken to find a route is proportional to the number |
68 | of nodes that need to be considered the main route takes 1/10th of the |
69 | time and the very short routes take almost no time at all. |
70 | |
71 | The solution to making the algorithm fast is therefore to discard most |
72 | of the nodes and only keep the interesting ones. In this case a node is |
73 | deemed to be interesting if it is the junction of three or more |
74 | segments or the junction of two segments with different properties or |
75 | has a routing restriction different from the connecting segments. In |
76 | the algorithm and following description these are classed as |
77 | super-nodes. Starting at each super-node a super-segment is generated |
78 | that finishes on another super-node and contains the shortest path |
79 | along segments with identical properties (and these properties are |
80 | inherited by the super-segment). The point of choosing the shortest |
81 | route is that since all segments considered have identical properties |
82 | they will be treated identically when properties are taken into |
83 | account. This decision making process can be repeated until the only |
84 | the most important and interesting nodes remain. |
85 | |
86 | To find a route between a start and finish point now comprises the |
87 | following steps (assuming a shortest route is required): |
88 | |
89 | 1. Find all shortest routes from the start point along normal segments |
90 | and stopping when super-nodes are reached. |
91 | 2. Find all shortest routes from the end point backwards along normal |
92 | segments and stopping when super-nodes are reached. |
93 | 3. Find the shortest route along super-segments from the set of |
94 | super-nodes in step 1 to the set of super-nodes in step 2 (taking |
95 | into account the lengths found in steps 1 and 2 between the |
96 | start/finish super-nodes and the ultimate start/finish point). |
97 | 4. For each super-segment in step 3 find the shortest route between |
98 | the two end-point super-nodes. |
99 | |
100 | This multi-step process is considerably quicker than using all nodes |
101 | but gives a result that still contains the full list of nodes that are |
102 | visited. There are some special cases though, for example very short |
103 | routes that do not pass through any super-nodes, or routes that start |
104 | or finish on a super-node. In these cases one or more of the steps |
105 | listed can be removed or simplified. |
106 | |
107 | When the first route reaches the final node the length of that route is |
108 | retained as a benchmark. Any shorter complete route that is calculated |
109 | later would replace this benchmark. As routes are tested any partial |
110 | routes that are longer than the benchmark can be immediately discarded. |
111 | Other partial routes have the length of a perfect straight highway to |
112 | the final node added to them and if the total exceeds the benchmark |
113 | they can also be discarded. Very quickly the number of possible routes |
114 | is reduced until the absolute shortest is found. |
115 | |
116 | For routes that do not start or finish on a node in the original data |
117 | set a fake node is added to an existing segment. This requires special |
118 | handling in the algorithm but it gives mode flexibility for the start, |
119 | finish and intermediate points in a route. |
120 | |
121 | Algorithm Evolution |
122 | - - - - - - - - - - |
123 | |
124 | In Routino versions 1.0 to 1.4 the algorithm used to select a |
125 | super-node was the same as above except that node properties were not |
126 | included. Routino versions 1.4.1 to 1.5.1 used a slightly different |
127 | algorithm which only chose nodes that were junctions between segments |
128 | with different properties (or has a routing restriction that is |
129 | different from connecting segments in versions 1.5 and 1.5.1). The |
130 | addition of turn restrictions (described in more detail below) requires |
131 | the original algorithm since the super-segments more accurately reflect |
132 | the underlying topology. |
133 | |
134 | Algorithm Implementation |
135 | - - - - - - - - - - - - |
136 | |
137 | The algorithm that is used for finding the route between the |
138 | super-nodes using super-segments is the A* algorithm (or a slight |
139 | variation of it). This was not a deliberate design decision, but |
140 | evolved into it during development. This algorithm relies on |
141 | calculating the lowest score (shortest distance or quickest time) to |
142 | each node from the starting node. The remaining score for the path to |
143 | the destination node is estimated (based on a straight line using the |
144 | fastest type of highway) and added to the current score and the result |
145 | recorded. At each step the unvisited node that has the lowest current |
146 | score is examined and all nodes connected to it have their scores |
147 | calculated. When the destination node has been reached all remaining |
148 | unvisited nodes with scores higher than the destination node's score |
149 | can be discarded and the few remaining nodes examined. |
150 | |
151 | The algorithm used to find the route between super-nodes using normal |
152 | segments is Dijkstra's algorithm (although it is implemented as the |
153 | same algorithm as above but with no estimated cost). Since these routes |
154 | tend to be short and the CPU time for calculating the heuristic cost |
155 | function is relatively large this tends to give a quicker solution. |
156 | |
157 | |
158 | Routing Preferences |
159 | ------------------- |
160 | |
161 | One of the important features of Routino is the ability to select a |
162 | route that is optimum for a set of criteria such as preferences for |
163 | each type of highway, speed limits and other restrictions and highway |
164 | properties. |
165 | |
166 | All of these features are handled by assigning a score to each segment |
167 | while calculating the route and trying to minimise the score rather |
168 | than simply minimising the length. |
169 | |
170 | Segment length |
171 | When calculating the shortest route the length of the segment is |
172 | the starting point for the score. |
173 | |
174 | Speed preference |
175 | When calculating the quickest route the time taken calculated |
176 | from the length of the segment and the lower of the highway's |
177 | own speed limit and the user's speed preference for the type of |
178 | highway is the starting point for the score. |
179 | |
180 | One-way restriction |
181 | If a highway has the one-way property in the opposite direction |
182 | to the desired travel and the user's preference is to obey |
183 | one-way restrictions then the segment is ignored. |
184 | |
185 | Weight, height, width & length limits |
186 | If a highway has one of these limits and its value is less than |
187 | the user's specified requirement then the segment is ignored. |
188 | |
189 | Highway preference |
190 | The highway preference specified by the user is a percentage, |
191 | these are scaled so that the most preferred highway type has a |
192 | weighted preference of 1.0 (0% always has a weighted preference |
193 | of 0.0). The calculated score for a segment is divided by this |
194 | weighted preference. |
195 | |
196 | Highway properties |
197 | The other highway properties are specified by the user as a |
198 | percentage and each highway either has that property or not. The |
199 | user's property preference is scaled into the range 0.0 (for 0%) |
200 | to 1.0 (for 100%) to give a weighted preference, a second |
201 | "non-property" weighted preference is calculated in the same way |
202 | after subtracting the user's preference from 100%. If a segment |
203 | has a particular property then the calculated score is divided |
204 | by the weighted preference for that property, if not then it is |
205 | divided by the non-property weighted preference. A non-linear |
206 | transformation is applied so that changing property preferences |
207 | close to 50% do not cause large variations in routes. |
208 | |
209 | |
210 | Data Pruning |
211 | ------------ |
212 | |
213 | From version 2.2 there are options to "prune" nodes and segments from |
214 | the input data which means to remove nodes and/or segments without |
215 | significantly changing the routing results. |
216 | |
217 | The pruning options must meet a number of conditions to be useful: |
218 | * The topology relevant to routing must remain unchanged. The |
219 | instructions that are produced from the reduced set of nodes and |
220 | segments must be sufficiently accurate for anybody trying to follow |
221 | them on the ground. |
222 | * Any restrictions belonging to nodes or segments that stop certain |
223 | types of traffic from following a particular highway must be |
224 | preserved. |
225 | * The total length must be calculated using the original data and not |
226 | the simplified data which by its nature will typically be shorter. |
227 | * The location of the remaining nodes and segments must be a good |
228 | representation of the original nodes and segments. Since the |
229 | calculated route may be displayed on a map the remaining nodes and |
230 | segments must clearly indicate the route to take. |
231 | |
232 | The prune options all have user-controllable parameters which allow the |
233 | geographical accuracy to be controlled. This means that although the |
234 | topology is the same the geographical accuracy can be sacrificed |
235 | slightly to minimise the number of nodes and segments. |
236 | |
237 | The pruning options that are available are: |
238 | * Removing the access permissions for a transport type from segments |
239 | if it is not possible to route that transport type from those |
240 | segments to a significant number of other places. The limit on the |
241 | pruning is set by the total length of the isolated group of |
242 | segments. This significantly increases the chance that a route will |
243 | be found by not putting waypoints in inaccessible places. |
244 | * Removing short segments, the limit is set by the length of the |
245 | segment. This removes a number of redundant segments (and |
246 | associated nodes) but rules are applied to ensure that removing the |
247 | segments does not alter junction topology or remove node access |
248 | permissions or changes in way properties. |
249 | * Removing nodes from almost straight highways, the limit is set by |
250 | the distance between the remaining segments and the original nodes. |
251 | This removes a large number of redundant nodes (and therefore |
252 | segments) but again care is taken not to remove node access |
253 | permissions or changes in way properties. |
254 | |
255 | |
256 | Turn Restrictions |
257 | ----------------- |
258 | |
259 | The addition of turn restrictions in version 2.0 adds a set of further |
260 | complications because it introduces a set of constraints that are far |
261 | more complex than one-way streets. |
262 | |
263 | A turn restriction in the simplest case is a combination of a segment, |
264 | node and segment such that routes are not allowed to go from the first |
265 | segment to the second one through the specified node. Exceptions for |
266 | certain types of traffic can also be specified. Currently only this |
267 | simplest type of turn restriction is handled by the algorithm. |
268 | |
269 | The first complication of turn restrictions is that the algorithm above |
270 | requires that super-segments are composed of segments with identical |
271 | properties. A turn restriction is not the same in both directions so a |
272 | super-segment cannot include any route through that turn restriction. |
273 | The node at the centre of the turn restriction must therefore be a |
274 | super-node to avoid this. In addition to this all nodes connected to |
275 | the turn restriction node by a single segment must also be super-nodes |
276 | to avoid any long-distance super-segments starting at the restricted |
277 | node. |
278 | |
279 | The second complication of a turn restriction is that the optimum route |
280 | may require passing through the same node more than once. This can |
281 | happen where the route needs to work around a turn restriction by |
282 | driving past it, turning round (on a roundabout perhaps) and coming |
283 | back along the same highway. Without turn restrictions a route could be |
284 | defined purely by the set of nodes that were passed; no node would |
285 | exist more than once along a route between two points. With turn |
286 | restrictions the route is defined by a node and the segment used to get |
287 | there; no route between two points will ever need to follow the same |
288 | segment in the same direction more than once. This means that the |
289 | optimisation algorithm calculates scores for directed segments (indexed |
290 | by segment and end node) rather than for nodes. |
291 | |
292 | A side-effect of this is that a route that works around a turn |
293 | restriction must be calculable using the super-segments that are stored |
294 | in the database. This puts a limit on the amount of database |
295 | optimisation that can be performed because if too many super-segments |
296 | are removed the optimum work-around may also be removed. The solution |
297 | to this is to ensure that the database preserves all loops that can be |
298 | used to turn around and reverse direction, previously super-segments |
299 | that started and finished on the same super-node were disallowed. |
300 | |
301 | Another side-effect of having the route composed of a set of locations |
302 | (nodes) as well as the direction of travel (segments used to reach |
303 | them) is that via points in the route can be forced to continue in the |
304 | original direction. If the chosen method of transport obeys turn |
305 | restrictions then it will not reverse direction at a via point but will |
306 | find an optimum route continuing in the same direction. The only |
307 | exception to this is when the route ahead at a waypoint is into a |
308 | dead-end and an immediate U-turn is allowed. |
309 | |
310 | A side-effect of having the starting direction at a via point defined |
311 | by the previous part of the route is that overall non-optimal routes |
312 | may be found even though each section between via points is optimal. |
313 | For a route with a start, middle and end point defined it can be the |
314 | case that the shortest route from the start to the middle arrives in |
315 | the opposite direction to that required for the optimal route from the |
316 | middle to the end. The calculation of the route in separate sections |
317 | therefore may give a non-optimum result even though each section is |
318 | itself optimum based on the start conditions. |
319 | |
320 | Overall the presence of turn restrictions in the database makes the |
321 | routing slower even for regions of the map that have no turn |
322 | restrictions. |
323 | |
324 | |
325 | Data Implementation |
326 | ------------------- |
327 | |
328 | The hardest part of implementing this router is the data organisation. |
329 | The arrangement of the data to minimise the number of operations |
330 | required to follow a route from one node to another is much harder than |
331 | designing the algorithm itself. |
332 | |
333 | The final implementation uses a separate table for nodes, segments and |
334 | ways. Each table individually is implemented as a C-language data |
335 | structure that is written to disk by a program which parses the |
336 | OpenStreetMap XML data file. In the router these data structures are |
337 | memory mapped so that the operating system handles the problems of |
338 | loading the needed data blocks from disk. |
339 | |
340 | Each node contains a latitude and longitude and they are sorted |
341 | geographically so that converting a latitude and longitude coordinate |
342 | to a node is fast as well as looking up the coordinate of a node. The |
343 | node also contains the location in the array of segments for the first |
344 | segment that uses that node. |
345 | Each segment contains the location of the two nodes as well as the way |
346 | that the segment came from. The location of the next segment that uses |
347 | one of the two nodes is also stored; the next segment for the other |
348 | node is the following one in the array. The length of the segment is |
349 | also pre-computed and stored. |
350 | Each way has a name, a highway type, a list of allowed types of |
351 | traffic, a speed limit, any weight, height, width or length |
352 | restrictions and the highway properties. |
353 | |
354 | The super-nodes are mixed in with the nodes and the super-segments are |
355 | mixed in with the segments. For the nodes they are the same as the |
356 | normal nodes, so just a flag is needed to indicate that they are super. |
357 | The super-segments are in addition to the normal segments so they |
358 | increase the database size (by about 10%) and are also marked with a |
359 | flag. Some segments are therefore flagged as both normal segments and |
360 | super-segments if they both have the same end nodes. |
361 | |
362 | The relations are stored separately from the nodes, segments and ways. |
363 | For the turn restriction relations the initial and final segments are |
364 | stored along with the restricted node itself. Each node that has a turn |
365 | restriction is marked in the main node storage with a flag to indicate |
366 | this information. |
367 | |
368 | |
369 | -------- |
370 | |
371 | Copyright 2008-2013 Andrew M. Bishop. |
Properties
Name | Value |
---|---|
cvs:description | Description of the algorithm. |