OpenStreetMap (OSM) data has a unique structure that is not directly reconcilable with other modes of representing spatial data, notably including the widely-adopted Simple Features (SF) scheme of the Open Geospatial Consortium (OGC). The three primary spatial objects of OSM are:
1. nodes, which are directly translatable to spatial points;
2. ways, which may be closed, in which case they form polygons, or unclosed,
   in which case they are (non-polygonal) lines;
3. relations, which are higher-level objects used to specify relationships
   between collections of ways and nodes. While there are several recognised
   categories of relations, in spatial terms these may be reduced to a binary
   distinction between:
   (a) multipolygon relations, which specify relationships between an
       exterior polygon (through designating role='outer') and possible inner
       polygons (role='inner'). These may or may not be designated with
       type=multipolygon. Political boundaries, for example, often have
       type=boundary rather than explicit type=multipolygon. osmdata
       identifies multipolygons as those relation objects having at least one
       member with role=outer or role=inner.
   (b) non-polygonal relations. In the absence of inner and outer roles, an
       OSM relation is assumed to be non-polygonal, and to instead form a
       collection of non-enclosing lines.
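The role-based distinction described above can be sketched in a few lines of base R. The function `classify_relation` is a hypothetical helper written for this illustration and is not part of osmdata; it only restates the rule given in the text.

```r
# Hypothetical helper: classify a relation from the roles of its members.
# Any member with role "outer" or "inner" marks the relation as a
# multipolygon; otherwise it is treated as a collection of lines.
classify_relation <- function (roles) {
    if (any (roles %in% c ("outer", "inner"))) {
        "multipolygon"
    } else {
        "multilinestring"
    }
}

classify_relation (c ("outer", "inner", "inner")) # e.g. an area with two holes
classify_relation (c ("", "alternative"))         # e.g. a route with a variant
classify_relation (character (0))                 # no roles at all
```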
The representation of spatial objects as Simple Features is described at length by the OGC, with this document merely reviewing relevant aspects. The SF system assumes that spatial features can be represented in one of seven distinct primary classes, which by convention are referred to in all capital letters. Relevant classes for OSM data are:
1. POINT
2. MULTIPOINT
3. LINESTRING
4. MULTILINESTRING
5. POLYGON
6. MULTIPOLYGON
(The seventh primary class is GEOMETRYCOLLECTION, which contains several
objects with different geometries.) An SF (where that acronym may connote both
singular and plural) consists of a sequence of spatial coordinates, which for
OSM data are only ever XY coordinates represented as strings enclosed within
brackets. In addition to coordinate data and associated coordinate reference
systems, an SF may include any number of additional data which quantify or
qualify the feature of interest. In the
sf extension to R, for example, a
single SF is represented by one row of a data.frame, with the geometry stored
in a single column, and any number of other columns containing these additional
data.
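As a minimal sketch of this row-per-feature layout, the following builds a small R::sf object by hand. The coordinates and attribute values are invented for illustration only.

```r
library (sf)

# Two illustrative points (longitude, latitude); values are invented.
geom <- st_sfc (st_point (c (144.32, -37.39)),
                st_point (c (144.33, -37.38)),
                crs = 4326)

# One row per feature: attribute columns plus a single geometry column.
x <- st_sf (name = c ("Example cafe", "Example ford"),
            amenity = c ("cafe", NA),
            geometry = geom)

inherits (x, "data.frame")   # an sf object is also a data.frame
names (x)                    # attribute columns, then the geometry column
st_as_text (st_geometry (x)) # coordinates as text, enclosed within brackets
```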
Simple Feature geometries are referred to in this vignette using all capital
letters (such as POLYGON), while OSM geometries use lower case (such as
polygon). Similarly, the Simple Features standard of the OGC is referred to as
SF, while the R package of the same name is referred to as R::sf (upper
case R followed by lower case sf). Much functionality of R::sf is
determined by the underlying
Geospatial Data Abstraction Library (GDAL; described
below). Representations of data are often discussed here with reference to
GDAL/sf, in which case it may always be assumed that the translation and
representation of data are determined by GDAL and not directly by the creators
of R::sf.
osmdata translates OSM into Simple Features

OSM nodes translate directly into SF::POINT objects, with all OSM key-value
pairs stored in additional data.frame columns.
OSM ways may be either polygons or (non-polygonal) lines. osmdata translates
these into SF::POLYGON and SF::LINESTRING objects, respectively. Although
polygonal and non-polygonal ways may have systematically different key fields,
they are conflated here to the single set of key values common to all way
objects regardless of shape. This enables direct comparison and uniform
operation on both SF::LINESTRING and SF::POLYGON objects.
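The distinction between closed and unclosed ways can be illustrated with a short sketch using R::sf constructors. The helper `way_to_sfg` is hypothetical and not an osmdata function. It also assumes that closure can be judged by comparing the first and last coordinates, whereas OSM itself defines ways through node references.

```r
library (sf)

# Hypothetical helper: convert the coordinate matrix of one way into an SF
# geometry. A way is taken to be closed when its first and last coordinates
# coincide and it has at least four points (the minimum for a valid ring).
way_to_sfg <- function (xy) {
    closed <- nrow (xy) >= 4 && all (xy [1, ] == xy [nrow (xy), ])
    if (closed) {
        st_polygon (list (xy))
    } else {
        st_linestring (xy)
    }
}

open_way <- rbind (c (0, 0), c (1, 0), c (1, 1))
closed_way <- rbind (c (0, 0), c (1, 0), c (1, 1), c (0, 0))

class (way_to_sfg (open_way))   # includes "LINESTRING"
class (way_to_sfg (closed_way)) # includes "POLYGON"
```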
OSM relations comprising members with role=outer or role=inner are
translated into SF::MULTIPOLYGON objects; otherwise they form
SF::MULTILINESTRING objects. As in the preceding case of OSM ways, potentially
systematic differences between OSM key fields for multipolygon and other
relation objects are ignored in favour of returning identical key fields in
both cases, whether or not value fields for those keys exist.
An OSM multipolygon is translated by osmdata into a single SF::MULTIPOLYGON
object which has an additional column specifying num_members. The SF
geometry thus consists of a list (an R::List object) of this number of
polygons, the first of which is the outer polygon, with all subsequent members
forming closed inner rings (either individually or in combination).
Each of these inner polygons is also represented as one or more OSM objects, which
will generally include detailed data on the individual components not able
to be represented in the single multipolygon representation. Each inner polygon
is therefore additionally stored in the sf::MULTIPOLYGON data.frame along
with all associated data. Thus the row containing a multipolygon of
num_polygon polygons is followed by num_polygon - 1 rows containing the data
for each inner polygon.
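The ordering of an outer polygon followed by inner rings mirrors the way a MULTIPOLYGON is built in R::sf. The following sketch uses invented coordinates and shows only the generic SF structure, not the additional columns or rows that osmdata adds.

```r
library (sf)

# Rings are matrices of XY coordinates whose first and last rows coincide.
outer <- rbind (c (0, 0), c (4, 0), c (4, 4), c (0, 4), c (0, 0))
inner <- rbind (c (1, 1), c (1, 2), c (2, 2), c (2, 1), c (1, 1))

# One polygon made of an outer ring followed by one inner ring (a hole),
# wrapped in a MULTIPOLYGON.
mp <- st_multipolygon (list (list (outer, inner)))

length (unclass (mp) [[1]]) # number of rings in the first polygon
st_area (st_sfc (mp))       # area of the outer ring minus the hole
```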
Note that OSM relation objects generally have fewer (or different) key-value
pairs than do OSM way objects. In the OSM system, data describing the detailed
properties of the constituent ways of a given OSM relation are stored with those
ways rather than with the relation. osmdata follows this general
principle, and stores the geometry of all ways of a relation with the
relation itself (that is, as part of the MULTIPOLYGON or MULTILINESTRING
object), while those ways are also stored themselves as LINESTRING (or
potentially POLYGON) objects, from where their additional key-value data may
be accessed.
OSM relations that are not multipolygons are translated into
SF::MULTILINESTRING objects. Each member of any OSM relation is attributed a
role, which may be empty. osmdata collates all ways within a relation
according to their role attributes. Thus, unlike multipolygon relations which
are always translated into a single sf::MULTIPOLYGON object, multilinestring
relations are translated by osmdata into potentially several
sf::MULTILINESTRING objects, one for each unique role.
This is particularly useful because relations are often used to designate
extended highways (for example, designated bicycle routes or motorways), yet
these often exist in primary and alternative forms, with these categories
specified in roles. Separating these roles enables ready access to any desired
role.
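The collation of ways by role can be sketched as follows. This is not the osmdata implementation: the member data are invented, and the label "(no role)" for empty roles is chosen here purely for illustration.

```r
library (sf)

# Invented members of one route relation: each has a role and the XY
# coordinates of its way.
members <- list (
    list (role = "", xy = rbind (c (0, 0), c (1, 0))),
    list (role = "", xy = rbind (c (1, 0), c (2, 0))),
    list (role = "alternative", xy = rbind (c (1, 0), c (1, 1), c (2, 0)))
)

roles <- vapply (members, function (m) m$role, character (1))
roles [roles == ""] <- "(no role)" # label for empty roles (an assumption)

# Group the coordinate matrices by role, then build one MULTILINESTRING
# for each unique role.
ways_by_role <- split (lapply (members, function (m) m$xy), roles)
mls <- lapply (ways_by_role, st_multilinestring)

names (mls)   # one entry per unique role
lengths (mls) # number of ways collated under each role
```

Keeping each role in its own object means that, for example, only the primary route can be extracted without filtering individual ways.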
These multilinestring objects also have a column specifying num_members, as
for multipolygons, with the primary member followed by num_members rows, one
for each member of the multilinestring.
GDAL Translation of OSM into Simple Features

The R package sf provides an R
implementation of Simple Features, and provides a wrapper around GDAL for
reading geospatial data. GDAL provides a 'driver' to read OSM
data, and thus sf can also be used to read
OSM data in R,
as detailed in the main osmdata vignette.
However, the GDAL translation of OSM data differs in several important ways
from the osmdata translation.
The primary difference is that GDAL only returns unique objects of each
spatial (SF) type. Thus sf::POINT objects consist of only those points that
are not otherwise members of some 'higher' object (line, polygon, or
relation objects). Although a given set of OSM data may actually contain a
great many points, attempting to load these with
sf::st_read (file, layer = 'points')
will generally return surprisingly few points.
Apart from the numerical difference arising through osmdata returning an
sf::POINT structure containing all nodes within a given set of OSM data,
while sf::st_read (file, layer='points') returns only those points not
represented in other structures, the representation of points remains otherwise
broadly similar. The only other major difference is that osmdata retains all
key-value pairs present in a given set of OSM data, whereas GDAL/sf only
retains a select few of these. Moreover, the keys returned by GDAL/sf are
pre-defined and invariant, meaning that data returned from sf::st_read (...)
may often contain key columns in the resultant data.frame which contain no
(non-NA) data. This difference is illustrated in an example repeated here from
the
main osmdata vignette,
with the same principles applying to all of the following classes of OSM data.
The following three lines define a query and download the resultant data to an
XML file.
q <- opq (bbox = 'Trentham, Australia')
q <- add_osm_feature (q, key = 'name') # any named objects
osmdata_xml (q, 'trentham.osm')
These data may then be converted into SF representations using either R::sf or
osmdata, with OSM keys being the column names of the resultant data.frame
objects.
names (sf::st_read ('trentham.osm', layer = 'points', quiet = TRUE))
## [1] "osm_id" "name" "barrier" "highway" "ref"
## [6] "address" "is_in" "place" "man_made" "other_tags"
## [11] "geometry"
names (osmdata_sf (q, 'trentham.osm')$osm_points)
## [1] "osm_id" "name" "X_description_"
## [4] "X_waypoint_" "addr.city" "addr.housenumber"
## [7] "addr.postcode" "addr.street" "amenity"
## [10] "barrier" "denomination" "foot"
## [13] "ford" "highway" "leisure"
## [16] "note_1" "phone" "place"
## [19] "railway" "railway.historic" "ref"
## [22] "religion" "shop" "source"
## [25] "tourism" "waterway" "geometry"
osmdata returns far more key fields than does GDAL/sf. More importantly,
however, GDAL/sf returns pre-defined key fields regardless of whether they
contain any data:
addr <- sf::st_read ('trentham.osm', layer = 'points', quiet = TRUE)$address
all (is.na (addr))
## [1] TRUE
In contrast, osmdata returns only those key fields which contain data (and
so excludes address in the above example).
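The effect of retaining only non-empty key fields can be approximated on any attribute table. The sketch below uses a plain data.frame with invented values, and the helper `drop_empty_keys` is hypothetical. Applying the same idea to an sf object would need additional care with the geometry column.

```r
# Hypothetical helper: drop columns that contain no non-NA values.
drop_empty_keys <- function (x) {
    empty <- vapply (x, function (col) all (is.na (col)), logical (1))
    x [, !empty, drop = FALSE]
}

# Invented attribute table with one pre-defined but empty key ("address").
pts <- data.frame (osm_id = c ("1", "2"),
                   name = c ("Example cafe", NA),
                   address = c (NA_character_, NA_character_))

names (drop_empty_keys (pts)) # "address" is no longer present
```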
As for points, GDAL/sf only returns those ways that are not represented or
contained in 'higher' objects (OSM relations interpreted as SF::MULTIPOLYGON
or SF::MULTILINESTRING objects). osmdata returns all ways, and thus enables,
for example, examination of the full attributes of any member of a multigeometry
object. This is not possible with the GDAL/sf translation. As for points, the
only additional difference between osmdata and GDAL/sf is that osmdata
retains all key-value pairs, whereas GDAL retains only a select few.
Translation of OSM relations into Simple Features differs more significantly
between osmdata and GDAL/sf.
As indicated above, multipolygon relations are translated in broadly comparable
ways by both osmdata and sf/GDAL. Note, however, that the way members of an OSM
relation may be specified in arbitrary order, and the multipolygonal way may not
necessarily be traced through simply following the segments in the order
returned by sf/GDAL.
Linestring relations are simply read by GDAL directly in terms of their
constituent ways, resulting in a single SF::MULTILINESTRING object that
contains exactly the same number of lines as the ways in the OSM relation,
regardless of their role attributes. Note that roles are frequently used to
specify alternative multi-way routes through a single OSM relation. Such
distinctions between primary and alternative are erased with GDAL/sf reading.
Navigable paths, routes, and ways are all tagged within OSM as highway,
readily enabling an overpass query to return only ways that can be used for
routing purposes. Routes are nevertheless commonly assembled within OSM
relations, particularly where they form major, designated transport ways such as
long-distance foot or bicycle paths or major motorways.
sf/GDAL

A query for key=highway translated through GDAL/sf will return those ways
not part of any 'higher' structure as SF::LINESTRING objects, but components
of an entire transport network might also be returned as:
1. SF::MULTIPOLYGON objects, holding all single ways which form simple
   polygons (that is, in which start and end points are the same);
2. SF::MULTIPOLYGON objects holding all single (non-polygonal) ways which
   combine to form an OSM multipolygon relation (that is, in which the
   collection of ways ultimately forms a closed role=outer polygon);
3. SF::MULTILINESTRING objects holding all single (non-polygonal) ways which
   combine to form an OSM relation that is not a multipolygon.

Translating these data into a single form usable for routing purposes is not
simple. A particular problem that is extremely difficult to resolve is
reconciling the SF::MULTIPOLYGON objects with the geometry of the
SF::LINESTRING objects. Highway components contained in SF::MULTIPOLYGON
objects need to be re-connected with the network represented by the
SF::LINESTRING objects, yet the OSM identifiers of the MULTIPOLYGON
components are removed by sf/GDAL, preventing these components from being
directly re-connected. The only way to ensure connection would be to re-connect
those geographic points sharing identical coordinates. This would require code
too long and complicated to be worthwhile demonstrating here.
osmdata

osmdata retains all of the underlying ways of 'higher' structures
(SF::MULTIPOLYGON or SF::MULTILINESTRING objects) as
SF::LINESTRING or SF::POLYGON objects. The geometries of the latter objects
duplicate those of the 'higher' relations, yet contain additional key-value
pairs corresponding to each way. Most importantly, the OSM ID values for all
members of a relation are stored within that relation, readily enabling the
individual ways (LINESTRING or POLYGON objects) to be identified from the
relation (MULTIPOLYGON or MULTILINESTRING object).
The osmdata translation thus readily enables a singularly complete network to
be reconstructed by simply combining the SF::LINESTRING layer with the
SF::POLYGON layer. These layers will always contain entirely independent
members, and so will always be able to be directly combined without duplicating
any objects.
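A minimal sketch of such a combination is given below, using invented geometries in place of real data. With an object returned by osmdata_sf, the corresponding inputs would presumably be the geometry columns of the osm_lines and osm_polygons components. Only geometries are combined here, because the attribute columns of the two layers need not match.

```r
library (sf)

# Invented stand-ins for the geometry columns of the line and polygon layers.
lines <- st_sfc (st_linestring (rbind (c (0, 0), c (1, 0))), crs = 4326)
polys <- st_sfc (
    st_polygon (list (rbind (c (1, 0), c (2, 0), c (2, 1), c (1, 0)))),
    crs = 4326
)

# Take the exterior ring of each polygon as a closed LINESTRING.
rings <- st_sfc (
    lapply (polys, function (p) st_linestring (unclass (p) [[1]])),
    crs = st_crs (polys)
)

# Concatenate both sets of geometries into one collection of lines.
network <- c (lines, rings)
st_geometry_type (network)
```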