Check out the latest version of Routino: svn co http://routino.org/svn/trunk routino
Contents of /trunk/doc/ALGORITHM.txt
Parent Directory
|
Revision Log
Revision 679 -
(show annotations)
(download)
Sat Apr 23 16:05:56 2011 UTC (13 years, 11 months ago) by amb
File MIME type: text/plain
File size: 16161 byte(s)
Sat Apr 23 16:05:56 2011 UTC (13 years, 11 months ago) by amb
File MIME type: text/plain
File size: 16161 byte(s)
Add description of U-turns at dead-ends.
1 | Routino : Algorithm |
2 | =================== |
3 | |
4 | |
5 | This page describes the development of the algorithm that is used in |
6 | Routino for finding routes. |
7 | |
8 | |
9 | Simplest Algorithm |
10 | ------------------ |
11 | |
12 | The algorithm to find a route is fundamentally simple: Start at the |
13 | beginning, follow all possible routes and keep going until you reach |
14 | the end. |
15 | |
16 | While this method does work, it isn't fast. To be able to find a route |
17 | quickly needs a different algorithm, one that can find the correct |
18 | answer without wasting time on routes that lead nowhere. |
19 | |
20 | |
21 | Improved Algorithm |
22 | ------------------ |
23 | |
24 | The simplest way to do this is to follow all possible segments from the |
25 | starting node to the next nearest node (an intermediate node in the |
26 | complete journey). For each node that is reached store the shortest |
27 | route from the starting node and the length of that route. The list of |
28 | intermediate nodes needs to be maintained in order of shortest overall |
29 | route on the assumption that there is a straight line route from here |
30 | to the end node. |
31 | At each point the intermediate node that has the shortest potential |
32 | overall journey time is processed before any other node. From the first |
33 | node in the list follow all possible segments and place the newly |
34 | discovered nodes into the same list ordered in the same way. This will |
35 | tend to constrain the list of nodes examined to be the ones that are |
36 | between the start and end nodes. If at any point you reach a node that |
37 | has already been reached by a longer route then you can discard that |
38 | route since the newly discovered route is shorter. Conversely if the |
39 | previously discovered route is shorter then discard the new route. |
40 | At some point the end node will be reached and then any routes with |
41 | potential lengths longer than this actual route can be immediately |
42 | discarded. The few remaining potential routes must be continued until |
43 | they are found to be shorter or have no possibility of being shorter. |
44 | The shortest possible route is then found. |
45 | |
46 | At all times when looking at a node only those segments that are |
47 | possible by the chosen means of transport are followed. This allows the |
48 | type of transport to be handled easily. When finding the quickest route |
49 | the same rules apply except that the criterion for sorting is the |
50 | shortest potential route (assuming that from each node to the end is |
51 | the fastest possible type of highway). |
52 | |
53 | This method also works, but again it isn't very fast. The problem is |
54 | that the complexity is proportional to the number of nodes or segments |
55 | in all routes examined between the start and end nodes. Maintaining the |
56 | list of intermediate nodes in order is the most complex part. |
57 | |
58 | |
59 | Final Algorithm |
60 | --------------- |
61 | |
62 | The final algorithm that is implemented in the router is basically the |
63 | one above but with an important difference. Instead of finding a long |
64 | route among a data set of 8,000,000 nodes (number of highway nodes in |
65 | UK at beginning of 2010) it finds one long route in a data set of |
66 | 1,000,000 nodes and a few hundred very short routes in the full data |
67 | set. Since the time taken to find a route is proportional to the number |
68 | of nodes that need to be considered the main route takes 1/10th of the |
69 | time and the very short routes take almost no time at all. |
70 | |
71 | The solution to making the algorithm fast is therefore to discard most |
72 | of the nodes and only keep the interesting ones. In this case a node is |
73 | deemed to be interesting if it is the junction of three or more |
74 | segments or the junction of two segments with different properties or |
75 | has a routing restriction different from the connecting segments. In |
76 | the algorithm and following description these are classed as |
77 | super-nodes. Starting at each super-node a super-segment is generated |
78 | that finishes on another super-node and contains the shortest path |
79 | along segments with identical properties (and these properties are |
80 | inherited by the super-segment). The point of choosing the shortest |
81 | route is that since all segments considered have identical properties |
82 | they will be treated identically when properties are taken into |
83 | account. This decision making process can be repeated until the only |
84 | the most important and interesting nodes remain. |
85 | |
86 | To find a route between a start and finish point now comprises the |
87 | following steps (assuming a shortest route is required): |
88 | |
89 | 1. Find all shortest routes from the start point along normal segments |
90 | and stopping when super-nodes are reached. |
91 | 2. Find all shortest routes from the end point backwards along normal |
92 | segments and stopping when super-nodes are reached. |
93 | 3. Find the shortest route along super-segments from the set of |
94 | super-nodes in step 1 to the set of super-nodes in step 2 (taking |
95 | into account the lengths found in steps 1 and 2 between the |
96 | start/finish super-nodes and the ultimate start/finish point). |
97 | 4. For each super-segment in step 3 find the shortest route between |
98 | the two end-point super-nodes. |
99 | |
100 | This multi-step process is considerably quicker than using all nodes |
101 | but gives a result that still contains the full list of nodes that are |
102 | visited. There are some special cases though, for example very short |
103 | routes that do not pass through any super-nodes, or routes that start |
104 | or finish on a super-node. In these cases one or more of the steps |
105 | listed can be removed or simplified. |
106 | |
107 | When the first route reaches the final node the length of that route is |
108 | retained as a benchmark. Any shorter complete route that is calculated |
109 | later would replace this benchmark. As routes are tested any partial |
110 | routes that are longer than the benchmark can be immediately discarded. |
111 | Other partial routes have the length of a perfect straight highway to |
112 | the final node added to them and if the total exceeds the benchmark |
113 | they can also be discarded. Very quickly the number of possible routes |
114 | is reduced until the absolute shortest is found. |
115 | |
116 | For routes that do not start or finish on a node in the original data |
117 | set a fake node is added to an existing segment. This requires special |
118 | handling in the algorithm but it gives mode flexibility for the start, |
119 | finish and intermediate points in a route. |
120 | |
121 | |
122 | Algorithm Evolution |
123 | ------------------- |
124 | |
125 | In Routino versions 1.0 to 1.4 the algorithm used to select a |
126 | super-node was the same as above except that node properties were not |
127 | included. Routino versions 1.4.1 to 1.5.1 used a slightly different |
128 | algorithm which only chose nodes that were junctions between segments |
129 | with different properties (or has a routing restriction that is |
130 | different from connecting segments in versions 1.5 and 1.5.1). The |
131 | addition of turn restrictions (described in more detail below) requires |
132 | the original algorithm since the super-segments more accurately reflect |
133 | the underlying topology. |
134 | |
135 | |
136 | Routing Preferences |
137 | ------------------- |
138 | |
139 | One of the important features of Routino is the ability to select a |
140 | route that is optimum for a set of criteria such as preferences for |
141 | each type of highway, speed limits and other restrictions and highway |
142 | properties. |
143 | |
144 | All of these features are handled by assigning a score to each segment |
145 | while calculating the route and trying to minimise the score rather |
146 | than simply minimising the length. |
147 | |
148 | Segment length |
149 | When calculating the shortest route the length of the segment is |
150 | the starting point for the score. |
151 | |
152 | Speed preference |
153 | When calculating the quickest route the time taken calculated |
154 | from the length of the segment and the lower of the highway's |
155 | own speed limit and the user's speed preference for the type of |
156 | highway is the starting point for the score. |
157 | |
158 | Oneway restriction |
159 | If a highway has the oneway property in the opposite direction |
160 | to the desired travel and the user's preference is to obey |
161 | oneway restrictions then the segment is ignored. |
162 | |
163 | Weight, height, width & length limits |
164 | If a highway has one of these limits and its value is less than |
165 | the user's specified requirement then the segment is ignored. |
166 | |
167 | Highway preference |
168 | The highway preference specified by the user is a percentage, |
169 | these are scaled so that the most preferred highway type has a |
170 | weighted preference of 1.0 (0% always has a weighted preference |
171 | of 0.0). The calculated score for a segment is divided by this |
172 | weighted preference. |
173 | |
174 | Highway properties |
175 | The other highway properties are specified by the user as a |
176 | percentage and each highway either has that property or not. The |
177 | user's property preference is scaled into the range 0.0 (for 0%) |
178 | to 1.0 (for 100%) to give a weighted preference, a second |
179 | "non-property" weighted preference is calcuated in the same way |
180 | after subtracting the user's preference from 100%. If a segment |
181 | has a particular property then the calculated score is divided |
182 | by the weighted preference for that property, if not then it is |
183 | divided by the non-property weighted preference. A non-linear |
184 | transformation is applied so that changing property preferences |
185 | close to 50% do not cause large variations in routes. |
186 | |
187 | Turn Restrictions |
188 | ----------------- |
189 | |
190 | The addition of turn restrictions adds a set of further complications |
191 | because it introduces a set of constraints that are far more complex |
192 | than one-way streets. |
193 | |
194 | A turn restriction in the simplest case is a combination of a segment, |
195 | node and segment such that routes are not allowed to go from the first |
196 | segment to the second one through the specified node. Exceptions for |
197 | certain types of traffic can also be specified. Currently only this |
198 | simplest type of turn restriction is handled by the algorithm. |
199 | |
200 | The first complication of turn restrictions is that the algorithm above |
201 | requires that super-segments are composed of segments with identical |
202 | properties. A turn restriction is not the same in both directions so a |
203 | super-segment cannot include any route through that turn restriction. |
204 | The node at the centre of the turn restriction must therefore be a |
205 | super-node to avoid this. In addition to this all nodes connected to |
206 | the turn restriction node by a single segment must also be super-nodes |
207 | to avoid any long-distance super-segments starting at the restricted |
208 | node. |
209 | |
210 | The second complication of a turn restriction is that the optimum route |
211 | may require passing through the same node more than once. This can |
212 | happen where the route needs to work around a turn restriction by |
213 | driving past it, turning round (on a roundabout perhaps) and coming |
214 | back along the same highway. Without turn restrictions a route could be |
215 | defined purely by the set of nodes that were passed; no node would |
216 | exist more than once along a route between two points. With turn |
217 | restrictions the route is defined by a node and the segment used to get |
218 | there; no route between two points will ever need to follow the same |
219 | segment in the same direction more than once. |
220 | |
221 | A side-effect of this is that a route that works around a turn |
222 | restriction must be calculable using the super-segments that are stored |
223 | in the database. This puts a limit on the amount of database |
224 | optimisation that can be performed because if too many super-segments |
225 | are removed the optimum work-around may also be removed. The solution |
226 | to this is to ensure that the database preserves all loops that can be |
227 | used to turn around and reverse direction, previously super-segments |
228 | that started and finished on the same super-node were disallowed. |
229 | |
230 | Another side-effect of having the route composed of a set of locations |
231 | (nodes) as well as the direction of travel (segments used to reach |
232 | them) is that via points in the route can be forced to continue in the |
233 | original direction. If the chosen method of transport obeys turn |
234 | restrictions then it will not reverse direction at a via point but will |
235 | find an optimum route continuing in the same direction. This route may |
236 | require turning around to get out of a dead-end and it is assumed that |
237 | all dead-ends will allow this. |
238 | |
239 | A side-effect of having the starting direction at a via point defined |
240 | by the previous part of the route is that overall non-optimal routes |
241 | may be found even though each section between via points is optimal. |
242 | For a route with a start, middle and end point defined it can be the |
243 | case that the shortest route from the start to the middle arrives in |
244 | the opposite direction to that required for the optimal route from the |
245 | middle to the end. The calculation of the route in separate sections |
246 | therefore may give a non-optimum result even though each section is |
247 | itself optimum based on the start conditions. |
248 | |
249 | Overall the presence of turn restrictions in the database makes the |
250 | routing slower even for regions of the map that have no turn |
251 | restrictions. |
252 | |
253 | |
254 | Data Implementation |
255 | ------------------- |
256 | |
257 | The hardest part of implementing this router is the data organisation. |
258 | The arrangement of the data to minimise the number of operations |
259 | required to follow a route from one node to another is much harder than |
260 | designing the algorithm itself. |
261 | |
262 | The final implementation uses a separate table for nodes, segments and |
263 | ways. Each table individually is implemented as a C-language data |
264 | structure that is written to disk by a program which parses the |
265 | OpenStreetMap XML data file. In the router these data structures are |
266 | memory mapped so that the operating system handles the problems of |
267 | loading the needed data blocks from disk. |
268 | |
269 | Each node contains a latitude and longitude and they are sorted |
270 | geographically so that converting a latitude and longitude coordinate |
271 | to a node is fast as well as looking up the coordinate of a node. The |
272 | node also contains the location in the array of segments for the first |
273 | segment that uses that node. |
274 | Each segment contains the location of the two nodes as well as the way |
275 | that the segment came from. The location of the next segment that uses |
276 | one of the two nodes is also stored; the next segment for the other |
277 | node is the following one in the array. The length of the segment is |
278 | also pre-computed and stored. |
279 | Each way has a name, a highway type, a list of allowed types of |
280 | traffic, a speed limit, any weight, height, width or length |
281 | restrictions and the highway properties. |
282 | |
283 | The super-nodes are mixed in with the nodes and the super-segments are |
284 | mixed in with the segments. For the nodes they are the same as the |
285 | normal nodes, so just a flag is needed to indicate that they are super. |
286 | The super-segments are in addition to the normal segments so they |
287 | increase the database size (by about 10%) and are also marked with a |
288 | flag. |
289 | |
290 | The relations are stored separately from the nodes, segments and ways. |
291 | For the turn restriction relations the initial and final segments are |
292 | stored along with the restricted node itself. Each node that has a turn |
293 | restriction is marked in the main node storage with a flag to indicate |
294 | this information. |
295 | |
296 | |
297 | Practicalities |
298 | -------------- |
299 | |
300 | At the time of writing (April 2010) the OpenStreetMap data for Great |
301 | Britain (taken from GeoFabrik) contains: |
302 | * 14,675,098 nodes |
303 | + 8,767,521 are highway nodes |
304 | + 1,120,297 are super-nodes |
305 | * 1,876,822 ways |
306 | + 1,412,898 are highways |
307 | o 9,316,328 highway segments |
308 | o 1,641,009 are super-segments |
309 | * 60,572 relations |
310 | |
311 | The database files when generated are 41.5 MB for nodes, 121.6 MB for |
312 | segments and 12.6 MB for ways and are stored uncompressed. By having at |
313 | least 200 MB or RAM available the routing can be performed with no disk |
314 | accesses (once the data has been read once). |
315 | |
316 | |
317 | -------- |
318 | |
319 | Copyright 2008-2011 Andrew M. Bishop. |
Properties
Name | Value |
---|---|
cvs:description | Description of the algorithm. |