Check out the latest version of Routino: svn co http://routino.org/svn/trunk routino
Annotation of /trunk/doc/ALGORITHM.txt
Parent Directory
|
Revision Log
Revision 901 -
(hide annotations)
(download)
Sat Nov 12 11:26:25 2011 UTC (13 years, 4 months ago) by amb
File MIME type: text/plain
File size: 16169 byte(s)
Sat Nov 12 11:26:25 2011 UTC (13 years, 4 months ago) by amb
File MIME type: text/plain
File size: 16169 byte(s)
Small formatting changes.
1 | amb | 429 | Routino : Algorithm |
2 | =================== | ||
3 | amb | 152 | |
4 | |||
5 | This page describes the development of the algorithm that is used in | ||
6 | amb | 429 | Routino for finding routes. |
7 | amb | 152 | |
8 | amb | 381 | |
9 | amb | 152 | Simplest Algorithm |
10 | amb | 381 | ------------------ |
11 | amb | 152 | |
12 | The algorithm to find a route is fundamentally simple: Start at the | ||
13 | beginning, follow all possible routes and keep going until you reach | ||
14 | the end. | ||
15 | |||
16 | While this method does work, it isn't fast. To be able to find a route | ||
17 | quickly needs a different algorithm, one that can find the correct | ||
18 | answer without wasting time on routes that lead nowhere. | ||
19 | |||
20 | amb | 381 | |
21 | amb | 152 | Improved Algorithm |
22 | amb | 381 | ------------------ |
23 | amb | 152 | |
24 | The simplest way to do this is to follow all possible segments from the | ||
25 | starting node to the next nearest node (an intermediate node in the | ||
26 | complete journey). For each node that is reached store the shortest | ||
27 | route from the starting node and the length of that route. The list of | ||
28 | intermediate nodes needs to be maintained in order of shortest overall | ||
29 | route on the assumption that there is a straight line route from here | ||
30 | to the end node. | ||
31 | At each point the intermediate node that has the shortest potential | ||
32 | overall journey time is processed before any other node. From the first | ||
33 | node in the list follow all possible segments and place the newly | ||
34 | discovered nodes into the same list ordered in the same way. This will | ||
35 | tend to constrain the list of nodes examined to be the ones that are | ||
36 | between the start and end nodes. If at any point you reach a node that | ||
37 | has already been reached by a longer route then you can discard that | ||
38 | route since the newly discovered route is shorter. Conversely if the | ||
39 | previously discovered route is shorter then discard the new route. | ||
40 | At some point the end node will be reached and then any routes with | ||
41 | potential lengths longer than this actual route can be immediately | ||
42 | discarded. The few remaining potential routes must be continued until | ||
43 | they are found to be shorter or have no possibility of being shorter. | ||
44 | The shortest possible route is then found. | ||
45 | |||
46 | At all times when looking at a node only those segments that are | ||
47 | possible by the chosen means of transport are followed. This allows the | ||
48 | type of transport to be handled easily. When finding the quickest route | ||
49 | amb | 621 | the same rules apply except that the criterion for sorting is the |
50 | shortest potential route (assuming that from each node to the end is | ||
51 | the fastest possible type of highway). | ||
52 | amb | 152 | |
53 | This method also works, but again it isn't very fast. The problem is | ||
54 | that the complexity is proportional to the number of nodes or segments | ||
55 | in all routes examined between the start and end nodes. Maintaining the | ||
56 | list of intermediate nodes in order is the most complex part. | ||
57 | |||
58 | amb | 381 | |
59 | amb | 152 | Final Algorithm |
60 | amb | 381 | --------------- |
61 | amb | 152 | |
62 | The final algorithm that is implemented in the router is basically the | ||
63 | one above but with an important difference. Instead of finding a long | ||
64 | amb | 445 | route among a data set of 8,000,000 nodes (number of highway nodes in |
65 | UK at beginning of 2010) it finds one long route in a data set of | ||
66 | 1,000,000 nodes and a few hundred very short routes in the full data | ||
67 | set. Since the time taken to find a route is proportional to the number | ||
68 | amb | 659 | of nodes that need to be considered the main route takes 1/10th of the |
69 | time and the very short routes take almost no time at all. | ||
70 | amb | 152 | |
71 | The solution to making the algorithm fast is therefore to discard most | ||
72 | of the nodes and only keep the interesting ones. In this case a node is | ||
73 | amb | 659 | deemed to be interesting if it is the junction of three or more |
74 | segments or the junction of two segments with different properties or | ||
75 | has a routing restriction different from the connecting segments. In | ||
76 | the algorithm and following description these are classed as | ||
77 | amb | 445 | super-nodes. Starting at each super-node a super-segment is generated |
78 | that finishes on another super-node and contains the shortest path | ||
79 | along segments with identical properties (and these properties are | ||
80 | inherited by the super-segment). The point of choosing the shortest | ||
81 | route is that since all segments considered have identical properties | ||
82 | they will be treated identically when properties are taken into | ||
83 | account. This decision making process can be repeated until the only | ||
84 | the most important and interesting nodes remain. | ||
85 | amb | 152 | |
86 | amb | 445 | To find a route between a start and finish point now comprises the |
87 | following steps (assuming a shortest route is required): | ||
88 | amb | 152 | |
89 | amb | 445 | 1. Find all shortest routes from the start point along normal segments |
90 | and stopping when super-nodes are reached. | ||
91 | 2. Find all shortest routes from the end point backwards along normal | ||
92 | segments and stopping when super-nodes are reached. | ||
93 | 3. Find the shortest route along super-segments from the set of | ||
94 | super-nodes in step 1 to the set of super-nodes in step 2 (taking | ||
95 | into account the lengths found in steps 1 and 2 between the | ||
96 | start/finish super-nodes and the ultimate start/finish point). | ||
97 | 4. For each super-segment in step 3 find the shortest route between | ||
98 | the two end-point super-nodes. | ||
99 | amb | 381 | |
100 | amb | 445 | This multi-step process is considerably quicker than using all nodes |
101 | but gives a result that still contains the full list of nodes that are | ||
102 | visited. There are some special cases though, for example very short | ||
103 | routes that do not pass through any super-nodes, or routes that start | ||
104 | or finish on a super-node. In these cases one or more of the steps | ||
105 | listed can be removed or simplified. | ||
106 | |||
107 | amb | 621 | When the first route reaches the final node the length of that route is |
108 | retained as a benchmark. Any shorter complete route that is calculated | ||
109 | later would replace this benchmark. As routes are tested any partial | ||
110 | routes that are longer than the benchmark can be immediately discarded. | ||
111 | Other partial routes have the length of a perfect straight highway to | ||
112 | the final node added to them and if the total exceeds the benchmark | ||
113 | they can also be discarded. Very quickly the number of possible routes | ||
114 | is reduced until the absolute shortest is found. | ||
115 | |||
116 | For routes that do not start or finish on a node in the original data | ||
117 | set a fake node is added to an existing segment. This requires special | ||
118 | handling in the algorithm but it gives mode flexibility for the start, | ||
119 | finish and intermediate points in a route. | ||
120 | |||
121 | |||
122 | amb | 659 | Algorithm Evolution |
123 | ------------------- | ||
124 | |||
125 | In Routino versions 1.0 to 1.4 the algorithm used to select a | ||
126 | super-node was the same as above except that node properties were not | ||
127 | included. Routino versions 1.4.1 to 1.5.1 used a slightly different | ||
128 | algorithm which only chose nodes that were junctions between segments | ||
129 | with different properties (or has a routing restriction that is | ||
130 | different from connecting segments in versions 1.5 and 1.5.1). The | ||
131 | addition of turn restrictions (described in more detail below) requires | ||
132 | the original algorithm since the super-segments more accurately reflect | ||
133 | the underlying topology. | ||
134 | |||
135 | |||
136 | amb | 445 | Routing Preferences |
137 | amb | 621 | ------------------- |
138 | amb | 445 | |
139 | One of the important features of Routino is the ability to select a | ||
140 | route that is optimum for a set of criteria such as preferences for | ||
141 | each type of highway, speed limits and other restrictions and highway | ||
142 | properties. | ||
143 | |||
144 | All of these features are handled by assigning a score to each segment | ||
145 | while calculating the route and trying to minimise the score rather | ||
146 | than simply minimising the length. | ||
147 | |||
148 | Segment length | ||
149 | When calculating the shortest route the length of the segment is | ||
150 | the starting point for the score. | ||
151 | |||
152 | Speed preference | ||
153 | When calculating the quickest route the time taken calculated | ||
154 | from the length of the segment and the lower of the highway's | ||
155 | own speed limit and the user's speed preference for the type of | ||
156 | highway is the starting point for the score. | ||
157 | |||
158 | amb | 737 | One-way restriction |
159 | If a highway has the one-way property in the opposite direction | ||
160 | amb | 445 | to the desired travel and the user's preference is to obey |
161 | amb | 737 | one-way restrictions then the segment is ignored. |
162 | amb | 445 | |
163 | Weight, height, width & length limits | ||
164 | If a highway has one of these limits and its value is less than | ||
165 | the user's specified requirement then the segment is ignored. | ||
166 | |||
167 | Highway preference | ||
168 | The highway preference specified by the user is a percentage, | ||
169 | these are scaled so that the most preferred highway type has a | ||
170 | weighted preference of 1.0 (0% always has a weighted preference | ||
171 | of 0.0). The calculated score for a segment is divided by this | ||
172 | weighted preference. | ||
173 | |||
174 | Highway properties | ||
175 | The other highway properties are specified by the user as a | ||
176 | percentage and each highway either has that property or not. The | ||
177 | user's property preference is scaled into the range 0.0 (for 0%) | ||
178 | amb | 515 | to 1.0 (for 100%) to give a weighted preference, a second |
179 | amb | 737 | "non-property" weighted preference is calculated in the same way |
180 | amb | 445 | after subtracting the user's preference from 100%. If a segment |
181 | amb | 515 | has a particular property then the calculated score is divided |
182 | amb | 671 | by the weighted preference for that property, if not then it is |
183 | divided by the non-property weighted preference. A non-linear | ||
184 | transformation is applied so that changing property preferences | ||
185 | close to 50% do not cause large variations in routes. | ||
186 | amb | 445 | |
187 | amb | 901 | |
188 | amb | 621 | Turn Restrictions |
189 | ----------------- | ||
190 | |||
191 | amb | 659 | The addition of turn restrictions adds a set of further complications |
192 | because it introduces a set of constraints that are far more complex | ||
193 | than one-way streets. | ||
194 | amb | 621 | |
195 | A turn restriction in the simplest case is a combination of a segment, | ||
196 | node and segment such that routes are not allowed to go from the first | ||
197 | segment to the second one through the specified node. Exceptions for | ||
198 | certain types of traffic can also be specified. Currently only this | ||
199 | simplest type of turn restriction is handled by the algorithm. | ||
200 | |||
201 | amb | 659 | The first complication of turn restrictions is that the algorithm above |
202 | requires that super-segments are composed of segments with identical | ||
203 | properties. A turn restriction is not the same in both directions so a | ||
204 | super-segment cannot include any route through that turn restriction. | ||
205 | The node at the centre of the turn restriction must therefore be a | ||
206 | super-node to avoid this. In addition to this all nodes connected to | ||
207 | the turn restriction node by a single segment must also be super-nodes | ||
208 | to avoid any long-distance super-segments starting at the restricted | ||
209 | node. | ||
210 | |||
211 | The second complication of a turn restriction is that the optimum route | ||
212 | amb | 621 | may require passing through the same node more than once. This can |
213 | happen where the route needs to work around a turn restriction by | ||
214 | driving past it, turning round (on a roundabout perhaps) and coming | ||
215 | back along the same highway. Without turn restrictions a route could be | ||
216 | amb | 659 | defined purely by the set of nodes that were passed; no node would |
217 | exist more than once along a route between two points. With turn | ||
218 | amb | 621 | restrictions the route is defined by a node and the segment used to get |
219 | there; no route between two points will ever need to follow the same | ||
220 | segment in the same direction more than once. | ||
221 | |||
222 | amb | 659 | A side-effect of this is that a route that works around a turn |
223 | restriction must be calculable using the super-segments that are stored | ||
224 | in the database. This puts a limit on the amount of database | ||
225 | optimisation that can be performed because if too many super-segments | ||
226 | are removed the optimum work-around may also be removed. The solution | ||
227 | to this is to ensure that the database preserves all loops that can be | ||
228 | used to turn around and reverse direction, previously super-segments | ||
229 | that started and finished on the same super-node were disallowed. | ||
230 | amb | 621 | |
231 | amb | 659 | Another side-effect of having the route composed of a set of locations |
232 | (nodes) as well as the direction of travel (segments used to reach | ||
233 | them) is that via points in the route can be forced to continue in the | ||
234 | original direction. If the chosen method of transport obeys turn | ||
235 | restrictions then it will not reverse direction at a via point but will | ||
236 | amb | 737 | find an optimum route continuing in the same direction. The only |
237 | exception to this is when the route ahead at a waypoint is into a | ||
238 | dead-end and an immediate U-turn is allowed. | ||
239 | amb | 621 | |
240 | amb | 659 | A side-effect of having the starting direction at a via point defined |
241 | by the previous part of the route is that overall non-optimal routes | ||
242 | may be found even though each section between via points is optimal. | ||
243 | For a route with a start, middle and end point defined it can be the | ||
244 | case that the shortest route from the start to the middle arrives in | ||
245 | the opposite direction to that required for the optimal route from the | ||
246 | middle to the end. The calculation of the route in separate sections | ||
247 | therefore may give a non-optimum result even though each section is | ||
248 | itself optimum based on the start conditions. | ||
249 | |||
250 | Overall the presence of turn restrictions in the database makes the | ||
251 | routing slower even for regions of the map that have no turn | ||
252 | restrictions. | ||
253 | |||
254 | |||
255 | amb | 621 | Data Implementation |
256 | ------------------- | ||
257 | |||
258 | amb | 152 | The hardest part of implementing this router is the data organisation. |
259 | The arrangement of the data to minimise the number of operations | ||
260 | required to follow a route from one node to another is much harder than | ||
261 | designing the algorithm itself. | ||
262 | |||
263 | The final implementation uses a separate table for nodes, segments and | ||
264 | ways. Each table individually is implemented as a C-language data | ||
265 | structure that is written to disk by a program which parses the | ||
266 | OpenStreetMap XML data file. In the router these data structures are | ||
267 | memory mapped so that the operating system handles the problems of | ||
268 | loading the needed data blocks from disk. | ||
269 | |||
270 | Each node contains a latitude and longitude and they are sorted | ||
271 | geographically so that converting a latitude and longitude coordinate | ||
272 | to a node is fast as well as looking up the coordinate of a node. The | ||
273 | node also contains the location in the array of segments for the first | ||
274 | segment that uses that node. | ||
275 | Each segment contains the location of the two nodes as well as the way | ||
276 | that the segment came from. The location of the next segment that uses | ||
277 | one of the two nodes is also stored; the next segment for the other | ||
278 | node is the following one in the array. The length of the segment is | ||
279 | also pre-computed and stored. | ||
280 | Each way has a name, a highway type, a list of allowed types of | ||
281 | amb | 445 | traffic, a speed limit, any weight, height, width or length |
282 | restrictions and the highway properties. | ||
283 | amb | 152 | |
284 | The super-nodes are mixed in with the nodes and the super-segments are | ||
285 | mixed in with the segments. For the nodes they are the same as the | ||
286 | normal nodes, so just a flag is needed to indicate that they are super. | ||
287 | The super-segments are in addition to the normal segments so they | ||
288 | increase the database size (by about 10%) and are also marked with a | ||
289 | flag. | ||
290 | |||
291 | amb | 621 | The relations are stored separately from the nodes, segments and ways. |
292 | For the turn restriction relations the initial and final segments are | ||
293 | stored along with the restricted node itself. Each node that has a turn | ||
294 | restriction is marked in the main node storage with a flag to indicate | ||
295 | this information. | ||
296 | amb | 381 | |
297 | amb | 621 | |
298 | amb | 152 | Practicalities |
299 | amb | 381 | -------------- |
300 | amb | 152 | |
301 | amb | 381 | At the time of writing (April 2010) the OpenStreetMap data for Great |
302 | amb | 429 | Britain (taken from GeoFabrik) contains: |
303 | amb | 381 | * 14,675,098 nodes |
304 | + 8,767,521 are highway nodes | ||
305 | + 1,120,297 are super-nodes | ||
306 | * 1,876,822 ways | ||
307 | amb | 416 | + 1,412,898 are highways |
308 | o 9,316,328 highway segments | ||
309 | amb | 381 | o 1,641,009 are super-segments |
310 | * 60,572 relations | ||
311 | amb | 152 | |
312 | The database files when generated are 41.5 MB for nodes, 121.6 MB for | ||
313 | segments and 12.6 MB for ways and are stored uncompressed. By having at | ||
314 | least 200 MB or RAM available the routing can be performed with no disk | ||
315 | accesses (once the data has been read once). | ||
316 | |||
317 | |||
318 | -------- | ||
319 | |||
320 | amb | 621 | Copyright 2008-2011 Andrew M. Bishop. |
Properties
Name | Value |
---|---|
cvs:description | Description of the algorithm. |