Marble/NaturalEarth: Difference between revisions

= Natural Earth =

== Introduction ==

Marble currently uses the very old and outdated MWDBII dataset for its vector base map (national borders, coastlines and so on), and we really need to replace it with more up-to-date data.  However, MWDBII has two key advantages: it is very compact, which enables Marble to ship it by default, and the individual nodes have a zoom level value, which speeds up drawing.

The current vector layer also has the disadvantage that it cannot be manipulated either programmatically or by the user.  This prevents it from being used for such things as KGeography or other educational uses where you would want to select and manipulate a geographic entity.

Improving the vector base maps would thus consist of two closely related parts:
* Improving the vector drawing code to allow interaction with the vectors, and improving performance to allow more detailed vectors to be drawn.
* Importing a new base layer dataset.

== Using NaturalEarth ==
 
The [http://www.naturalearthdata.com/ Natural Earth data set] is a "public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software."  This data set seems ideal as a replacement for the MWDBII.
 
Advantages:
* Free / Public Domain data
* Regularly updated
* Wide variety of political and geographic features
* Available at 3 different scales: 1:10m 1:50m and 1:110m
* All feature nodes at same scale are matched
* Data attributes such as country code, population, relative magnitude, etc
 
Disadvantages:
* Is in Shapefile format which is space inefficient
* No per node zoom level attribute
* Different scale datasets do not match so cannot efficiently be used together for zooming
 
The 1:10m dataset seems ideal as the base map in Marble as it provides a higher level of detail than the current MWDBII.  The 1:110m dataset seems ideal for use in a country selector widget in kdelibs.  The 1:50m dataset is less detailed than the current MWDBII so may be less useful.
 
Using the data in the default shapefile format is not considered desirable however:
* No shapefile format support in Marble (yet), would have to rely on an external library or write our own
* Space inefficient (14 MB vs 2.6 MB for MWDBII)
* No zoom level attribute or any node level attributes  which would slow drawing
* Vector level attributes are stored in .dbf format which adds complexity to implementing shapefile support
 
The ideal solution would seem to be to convert the Natural Earth data into a more efficient file format and either calculate and store the zoom level attribute in the file or calculate it on file load.  The full Natural Earth dataset would be converted, but Marble would only ship the minimal dataset required (approx 4-5 MB?), with the remainder of the data being downloaded later via GHNS or as a separate package.
 
=== Vector Layer Improvements ===
 
The main changes required to Marble will be in the vector layer itself, removing the old PNT file vector drawing code and implementing the new dataset using the new GeoData library vector support.
 
Two main issues will need to be solved here
* if a new file format is needed for efficient storage
* if a zoom level attribute is needed for fast drawing, and if so where and how to implement the attribute
 
Some possibilities for the file format are:
* Adapt the existing PNT format used for MWDBII to have a higher degree of accuracy and add an attributes table
* Use the existing Marble serial/cache format which will be faster but the size efficiency may not be sufficient
* Use shapefile by implementing a lightweight parser without dbf support but our own simple attribute table.
* Implement full shapefile support including dbf (possibly using libshape)


The zoom level problem can be solved by either:
* Calculating a zoom level for each point during the file conversion and saving it in the new file format, but this will require more storage space.
* Improving the vector drawing layer to calculate the zoom level on the fly, which would benefit all vector drawing but may be too slow.

The Douglas-Peucker algorithm may be usable here.


Some pros/cons to consider:

* The 1:10m country file is 6.55 MB and contains 533,202 points = 12.28 bytes/point, compared to the PNT file which is 745 KB and contains 127,246 points = 5.85 bytes/point. This suggests the NE data in PNT format would be about half the size, so around 6 MB in total.  This could probably be further reduced by a light application of Douglas-Peucker.
* The NE shapefiles have been carefully processed so that shared borders and overlapping features like rivers match exactly, along with other such niceties; applying the Douglas-Peucker algorithm might affect that.
* A shapefile parser would allow users/apps to load other shapefiles.
* We would have to reconvert and check the data every time there's a new NE release, which could be a lot of effort, but an automated shp2pnt script could prove useful to allow apps/users to display their own shapefiles in a simple way.
* It is unknown whether the Douglas-Peucker (D-P) algorithm can be deployed in a way that marks each point with a detail level rather than just throwing the points away.

==== Calculating Zoom Level on the fly ====
Using GeoPainter and GeoDataLineString ("libgeodata"):
* Apply Douglas-Peucker dynamically in the GeoDataLineString class to set the detail level
* When a GeoDataLineString is modified (add/del point), set a dirty flag
* When a node is accessed for drawing, the dirty flag would trigger D-P to calculate the detail level and would then be unset
* Would benefit all vector formats, e.g. kml, ppx, shp
 
==== Possible new file format (Proposed by John) ====
Potential new Marble file format based on PNT:
* 1st integer (32 bit): Latitude in arcseconds. The highest bit indicates that a new polygon starts, in which case header information has to be read from the 3rd integer.
* 2nd integer (32 bit): Longitude in arcseconds
* Optional 3rd integer: Feature key. The highest bit gives the feature geometry (line or ring).
 
Applying this to the 1:10m dataset:
* Each point takes 64 bits/8 bytes
* The start of each polygon takes 96 bits
* Roughly 533,202 x 8 bytes = 4 MB for the country borders alone, not including internal border and coastline files
* If that's too much to ship, then ship the 1:50m dataset as the default and download the 1:10m dataset once online
 
==== Possible more size efficient new file format (Proposed by Torsten) ====
 
For the Natural Earth layer, providing the default data set at 0.5 arcminute resolution should be enough.
This file format allows for even better packed data than the PNT format. For detailed polygons at arcminute scale it should on average use only 33% of the space used by PNT.
 
=====Description of the file format=====


The file starts with a file header that provides the file format version and the number of polygons stored inside the file. A polygon starts with the Polygon Header, which provides the feature id and the number of so-called "absolute nodes" that are about to follow. Absolute nodes always contain absolute geodetic coordinates. The Polygon Header also provides a flag that specifies whether the polygon is supposed to represent a line string ("0") or a linear ring ("1").

Each absolute node can be followed by relative nodes. These relative nodes always follow in correct order inside the polygon after "their" absolute node. Each absolute node specifies the number of relative nodes that follow it; these contain coordinates relative to their absolute node. So an absolute node provides the absolute reference for relative nodes across a theoretical area of 2x2 square degrees (which in practice frequently might rather amount to 1x1 square degrees).

So much of the compression works by just storing lat/lon diffs against special "absolute nodes". Hence the compression will work especially well for polygons with many nodes and a high node density.

The parser has to convert these relative coordinates to absolute coordinates.

=====File Structure=====

'''File header'''

* quint8: File format version
* quint32: Number of polygons contained in the file.
 
'''Polygon Header'''
 
* quint32: Feature id (either Natural Earth or Geonames).
* quint32: Number of parent node chunks to follow
* quint8: Flags: 1st bit: polygonIsClosed
 
'''Absolute node chunk'''
 
* qint16: Latitude in halfarcminutes (allowed range = [-10800;+10800 ] halfarcminutes = [-90;+90 ] degrees )
* qint16: Longitude in halfarcminutes (allowed range = [-21600;+21600 ] halfarcminutes = [-180;+180 ] degrees )
* qint16: Number of child node chunks to follow (equals "0" if there are no child nodes)
 
'''Relative node chunk'''
 
* qint8: Latitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1]  degrees)
* qint8: Longitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees)
 
==== Attribute Database ====
 
Metadata file:
* convert / filter dbf into our own format
* could just be csv or xml? or sqlite?
 
Rather than the Geonames ID, we could just use the Natural Earth object ID,
then a look-up file/table that matches the NE ID to the ISO / FIPS / whatever
code (NE provides this in the metadata) and Geonames ID (which we would have
to provide).  This would allow look-ups via whatever code or ID is available,
and we wouldn't be reliant on Geonames IDs staying constant or being online.
 
==== Spatialite ====
 
One option would be to integrate Spatialite and use this as both the data storage for the vectors and as the attribute database.  Spatialite is an extension to SQLite implementing a Spatial SQL database.  Among the features this provides are a compact data storage format and the ability to import Shapefiles and CSV files, as well as access to all the standard GEOS tools if installed.
 
There is a 20 MB zip file available for Natural Earth in Spatialite format, but it is unclear how much of Natural Earth it contains.  A minimal dataset could be shipped by default with the full dataset downloaded later.
 
Spatial SQL queries could return just those vectors currently in the viewport, but repeated reloading and redrawing could be inefficient.  However this may also solve the Level-of-Detail problem.
 
The major downside is the dependencies which include SQLite, PROJ and GEOS so on a platform like Windows would require a larger monolithic binary which defeats the purpose of shipping slimmed down data.
 
More research is required here.  It may not be a suitable option for the default Atlas view, but would be a very powerful extension for Marble to provide lightweight GIS-like functionality.
 
=== Action Plan ===
 
A possible action plan is
# Fix GeoPainter so that LinearRings which contain a pole are rendered correctly
# Implement Douglas-Peucker reduction in GeoDataLineString
# New PNT file format definition (with a different name, MBL?)
# Metadata file format definition
# New GeoData PNT2 file loading code (convert old data).
# shp2pnt2 script to convert shp to new formats (using Perl::shp? there's shp2xxx scripts out there we could copy?), including matching to Geonames ID
# split files into 'ship with', 'download asap', 'ghns'
 
Later add simple shapefile loading to GeoData, maybe with an attribute layer?
 
== Natural Earth Datasets ==
 
=== Dataset Sizes ===
 
Key Natural Earth data files from v1.2, recent updates to 1.3 not included.
<pre>
                            1:110m     1:50m      1:10m
                            ------    -------    -------
Admin level 0 countries     172 KB    1.36 MB    6.55 MB
Admin level 0 land borders   39 KB     301 KB     896 KB
...
Bathymetry                                       11.64 MB


</pre>


=== Dataset Details ===

* Countries
** Matched boundary lines and polygons with names attributes for countries and sovereign states. Includes dependencies (French Polynesia), map units (U.S. Pacific Island Territories) and sub-national map subunits (Corsica versus mainland Metropolitan France).
** Core data

* Disputed areas and breakaway regions
** From Kashmir to the Elemi Triangle, Northern Cyprus to Western Sahara.
** Core data

* Internal boundaries
** Core data??

* Coastline
** Ocean coastline, including major islands. Coastline is matched to land and water polygons.
** Core data?

* First order admin (provinces, departments, states, etc.)
** Internal boundaries and polygons for all but a few tiny island nations. Includes names attributes and some statistical groupings of the same for smaller countries.
** Optional download

* Populated places
** Point symbols with name attributes. Includes capitals, major cities and towns, plus significant smaller towns in sparsely inhabited regions. We favor regional significance over population census in determining rankings.
** Optional download, or use to replace current places file?

* Urban polygons
** Derived from 2002-2003 MODIS satellite data.
** Optional download

* Pacific nation groupings
** Boxes for keeping these far-flung islands tidy.
** Optional download

* Water boundary indicators
** Partial selection of key 200-mile nautical limits, plus some disputed, treaty, and median lines.
** Optional download

* Land
** Land polygons including major islands
** Optional download

* Ocean
** Ocean polygon split into contiguous pieces.
** Optional download

* Minor Islands
** Additional small ocean islands ranked to two levels of relative importance.
** Optional download

* Reefs
** Major coral reefs from WDB2.
** Optional download

* Physical region features
** Polygon and point labels of major physical features.
** Optional download

* Rivers and Lake Centerlines
** Ranked by relative importance. Includes name and line width attributes. Don’t want minor lakes? Turn on their centerlines to avoid unseemly data gaps.
** Optional download

* Lakes
** Ranked by relative importance, coordinating with river ranking. Includes name attributes.
** Optional download

* Glaciated areas
** Polygons derived from DCW, except for Antarctica derived from MOA. Includes name attributes for major polar glaciers.
** Optional download

* Antarctic ice shelves
** Derived from 2003-2004 MOA. Reflects recent ice shelf collapses.
** Optional download

* Bathymetry
** Nested polygons at 0, -200, -1,000, -2,000, -3,000, -4,000, -5,000, -6,000, -7,000, -8,000, -9,000, and -10,000 meters. Created from SRTM Plus.
** Optional download

* Parks and protected areas
** US Only
** Don't use, maybe user download layer


* Geographic lines
** Polar circles, tropical circles, equator, and International Date Line.
** Probably not useful to Marble as we have a plugin for most of these.
** International Date Line could be extracted

* Graticules
** 1-, 5-, 10-, 15-, 20-, and 30-degree increments. Includes WGS84 bounding box.
** Probably not useful to Marble as we have a Graticule plugin.

Latest revision as of 04:59, 26 October 2016

Natural Earth

Introduction

Marble currently uses the very old and outdated MWDBII dataset for its vector base map (national borders, coastlines and so on), and we really need to replace it with more up-to-date data. However, MWDBII has two key advantages: it is very compact, which enables Marble to ship it by default, and the individual nodes have a zoom level value, which speeds up drawing.

The current vector layer also has the disadvantage that it cannot be manipulated either programmatically or by the user. This prevents it from being used for such things as KGeography or other educational uses where you would want to select and manipulate a geographic entity.

Improving the vector base maps would thus consist of two closely related parts:

  • Improving the vector drawing code to allow interaction with the vectors, and improved performance to allow more detailed vectors to be drawn.
  • Importing a new base layer dataset.

Using NaturalEarth

The Natural Earth data set is a "public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software." This data set seems ideal as a replacement for the MWDBII.

Advantages:

  • Free / Public Domain data
  • Regularly updated
  • Wide variety of political and geographic features
  • Available at 3 different scales: 1:10m 1:50m and 1:110m
  • All feature nodes at same scale are matched
  • Data attributes such as country code, population, relative magnitude, etc

Disadvantages:

  • Is in Shapefile format which is space inefficient
  • No per node zoom level attribute
  • Different scale datasets do not match so cannot efficiently be used together for zooming

The 1:10m dataset seems ideal as the base map in Marble as it provides a higher level of detail than the current MWDBII. The 1:110m dataset seems ideal for use in a country selector widget in kdelibs. The 1:50m dataset is less detailed than the current MWDBII so may be less useful.

Using the data in the default shapefile format is not considered desirable however:

  • No shapefile format support in Marble (yet), would have to rely on an external library or write our own
  • Space inefficient (14 MB vs 2.6 MB for MWDBII)
  • No zoom level attribute or any node level attributes which would slow drawing
  • Vector level attributes are stored in .dbf format which adds complexity to implementing shapefile support

The ideal solution would seem to be to convert the Natural Earth data into a more efficient file format and either calculate and store the zoom level attribute in the file or calculate it on file load. The full Natural Earth dataset would be converted, but Marble would only ship the minimal dataset required (approx 4-5 MB?), with the remainder of the data being downloaded later via GHNS or as a separate package.

Vector Layer Improvements

The main changes required to Marble will be in the vector layer itself, removing the old PNT file vector drawing code and implementing the new dataset using the new GeoData library vector support.

Two main issues will need to be solved here

  • if a new file format is needed for efficient storage
  • if a zoom level attribute is needed for fast drawing, and if so where and how to implement the attribute

Some possibilities for the file format are:

  • Adapt the existing PNT format used for MWDBII to have a higher degree of accuracy and add an attributes table
  • Use the existing Marble serial/cache format which will be faster but the size efficiency may not be sufficient
  • Use shapefile by implementing a lightweight parser without dbf support but our own simple attribute table.
  • Implement full shapefile support including dbf (possibly using libshape)

The zoom level problem can be solved by either:

  • Calculating a zoom level for each point during the file conversion and saving it in the new file format, but this will require more storage space.
  • Improving the vector drawing layer to calculate the zoom level on the fly, which would benefit all vector drawing but may be too slow.

The Douglas-Peucker algorithm may be usable here.

Some pros/cons to consider:

  • The 1:10m country file is 6.55 MB and contains 533,202 points = 12.28 bytes/point, compared to the PNT file which is 745 KB and contains 127,246 points = 5.85 bytes/point. This suggests the NE data in PNT format would be about half the size, so around 6 MB in total. This could probably be further reduced by a light application of Douglas-Peucker.
  • The NE shapefiles have been carefully processed so that shared borders and overlapping features like rivers match exactly, along with other such niceties; applying the Douglas-Peucker algorithm might affect that.
  • A shapefile parser would allow users/apps to load other shapefiles.
  • We would have to reconvert and check the data every time there's a new NE release, which could be a lot of effort, but an automated shp2pnt script could prove useful to allow apps/users to display their own shapefiles in a simple way.
  • It is unknown whether the Douglas-Peucker (D-P) algorithm can be deployed in a way that marks each point with a detail level rather than just throwing the points away.
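The size estimate in the first point above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 MB = 1,000,000 bytes) and that the whole 14 MB shapefile set would shrink by the same ratio as the country file, which the page does not confirm.

```python
# Figures quoted on this page, in decimal units (assumption: 1 MB = 1,000,000 bytes).
SHP_BYTES = 6.55e6      # 1:10m Natural Earth country shapefile
SHP_POINTS = 533_202
PNT_BYTES = 745e3       # MWDBII country PNT file
PNT_POINTS = 127_246
SHP_TOTAL_MB = 14       # shapefiles needed to match the current MWDBII features

shp_bytes_per_point = SHP_BYTES / SHP_POINTS
pnt_bytes_per_point = PNT_BYTES / PNT_POINTS
ratio = pnt_bytes_per_point / shp_bytes_per_point

# Assumption: every NE file shrinks by the same ratio as the country file.
estimated_total_mb = SHP_TOTAL_MB * ratio

print(f"shapefile: {shp_bytes_per_point:.2f} bytes/point")
print(f"PNT:       {pnt_bytes_per_point:.2f} bytes/point")
print(f"estimated NE size in PNT format: {estimated_total_mb:.1f} MB")
```

The ratio comes out a little under one half, which is where the "around 6 MB" figure comes from.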

Calculating Zoom Level on the fly

Using GeoPainter and GeoDataLineString ("libgeodata"):

  • Apply Douglas-Peucker dynamically in the GeoDataLineString class to set the detail level
  • When a GeoDataLineString is modified (add/del point), set a dirty flag
  • When a node is accessed for drawing, the dirty flag would trigger D-P to calculate the detail level and would then be unset
  • Would benefit all vector formats, e.g. kml, ppx, shp
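One possible way to make Douglas-Peucker mark points with a detail level instead of discarding them is sketched below in Python. It is not Marble code: the `LineString` class, `significance()` and `points_for_level()` are hypothetical names, the tolerances are arbitrary, and distances are computed in the plane on raw coordinates, which ignores the spherical geometry a real implementation would need. Each point records the distance at which D-P would have kept it, capped by its parent's value, so thresholding that value gives the same result as running D-P at that tolerance.

```python
import math


def _point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def significance(points):
    """For each point, the largest D-P tolerance at which it is still kept."""
    n = len(points)
    sig = [0.0] * n
    if n == 0:
        return sig
    sig[0] = sig[-1] = math.inf          # end points are always kept
    stack = [(0, n - 1, math.inf)]       # (first, last, cap from parent split)
    while stack:
        first, last, cap = stack.pop()
        if last - first < 2:
            continue
        best_i, best_d = first + 1, -1.0
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > best_d:
                best_i, best_d = i, d
        s = min(best_d, cap)             # never more significant than the parent
        sig[best_i] = s
        stack.append((first, best_i, s))
        stack.append((best_i, last, s))
    return sig


class LineString:
    """Hypothetical stand-in for GeoDataLineString with lazy detail levels."""

    def __init__(self, points=(), tolerances=(1.0, 0.1, 0.01)):
        self._points = list(points)
        self._tolerances = tuple(tolerances)   # coarse to fine
        self._levels = []
        self._dirty = True

    def append(self, point):
        self._points.append(point)
        self._dirty = True                     # levels must be recalculated

    def points_for_level(self, max_level):
        if self._dirty:                        # recalculate only when needed
            self._levels = [
                next((i for i, tol in enumerate(self._tolerances) if s > tol),
                     len(self._tolerances))
                for s in significance(self._points)
            ]
            self._dirty = False
        return [p for p, lvl in zip(self._points, self._levels) if lvl <= max_level]


line = LineString([(0, 0), (1, 0.05), (2, 2), (3, 0.05), (4, 0)])
coarse = line.points_for_level(0)   # only the most significant points
fine = line.points_for_level(1)     # adds the next level of detail
```

Level 0 here holds the points that survive the coarsest tolerance; drawing at a higher zoom would simply ask for a higher `max_level`, and no point is ever thrown away.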

Possible new file format (Proposed by John)

Potential new Marble file format based on PNT:

  • 1st integer (32 bit): Latitude in arcseconds. The highest bit indicates that a new polygon starts, in which case header information has to be read from the 3rd integer.
  • 2nd integer (32 bit): Longitude in arcseconds
  • Optional 3rd integer: Feature key. The highest bit gives the feature geometry (line or ring).

Applying this to the 1:10m dataset:

  • Each point takes 64 bits/8 bytes
  • The start of each polygon takes 96 bits
  • Roughly 533,202 x 8 bytes = 4 MB for the country borders alone, not including internal border and coastline files
  • If that's too much to ship, then ship the 1:50m dataset as the default and download the 1:10m dataset once online
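A minimal Python sketch of how such a record stream could be written follows. Several details are assumptions that the proposal leaves open: big-endian byte order, storing coordinates with an offset (+90 and +180 degrees) so the value stays non-negative and the top bit is free for the flag, and the names `encode_polygon`, `NEW_POLYGON` and `IS_RING`, which are purely illustrative.

```python
import struct

NEW_POLYGON = 0x8000_0000   # top bit of the 1st integer: a polygon starts here
IS_RING = 0x8000_0000       # top bit of the 3rd integer: ring rather than line
LAT_OFFSET = 90 * 3600      # assumption: store latitude + 90 degrees, in arcseconds
LON_OFFSET = 180 * 3600     # assumption: store longitude + 180 degrees, in arcseconds


def encode_polygon(feature_key, is_ring, coords_deg):
    """Encode one polygon given as (lat, lon) pairs in degrees."""
    out = bytearray()
    for i, (lat, lon) in enumerate(coords_deg):
        lat_word = round(lat * 3600) + LAT_OFFSET
        lon_word = round(lon * 3600) + LON_OFFSET
        if i == 0:
            # First point: 3 integers = 96 bits, flag set on the latitude word.
            header = feature_key | (IS_RING if is_ring else 0)
            out += struct.pack(">III", lat_word | NEW_POLYGON, lon_word, header)
        else:
            # Every other point: 2 integers = 64 bits.
            out += struct.pack(">II", lat_word, lon_word)
    return bytes(out)


data = encode_polygon(42, True, [(48.5, 2.25), (-10.0, -75.0), (0.0, 0.0)])
print(len(data))  # 12 bytes for the first point + 8 for each of the other two

# Size of the point data for the 1:10m country file, ignoring polygon headers.
COUNTRY_POINTS = 533_202
print(COUNTRY_POINTS * 8 / 1e6, "MB")
```

The per-polygon cost is only one extra integer, so the total is dominated by the 8 bytes per point.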

Possible more size efficient new file format (Proposed by Torsten)

For the Natural Earth layer, providing the default data set at 0.5 arcminute resolution should be enough. This file format allows for even better packed data than the PNT format. For detailed polygons at arcminute scale it should on average use only 33% of the space used by PNT.

Description of the file format

The file starts with a file header that provides the file format version and the number of polygons stored inside the file. A polygon starts with the Polygon Header, which provides the feature id and the number of so-called "absolute nodes" that are about to follow. Absolute nodes always contain absolute geodetic coordinates. The Polygon Header also provides a flag that specifies whether the polygon is supposed to represent a line string ("0") or a linear ring ("1").

Each absolute node can be followed by relative nodes. These relative nodes always follow in correct order inside the polygon after "their" absolute node. Each absolute node specifies the number of relative nodes that follow it; these contain coordinates relative to their absolute node. So an absolute node provides the absolute reference for relative nodes across a theoretical area of 2x2 square degrees (which in practice frequently might rather amount to 1x1 square degrees).

So much of the compression works by just referencing lat/lon diffs to special "absolute nodes". Hence the compression will especially work well for polygons with many nodes with a high node density.
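
As a rough check of the 33% figure, the per-node cost can be compared with PNT. This sketch uses the chunk sizes listed under "File Structure" (6 bytes per absolute node, 2 bytes per relative node) and 8 bytes per node for PNT. It ignores the file and polygon headers, and the function name is illustrative.

```python
PNT_BYTES_PER_NODE = 8   # two 32-bit integers per node in the PNT layout
ABSOLUTE_NODE_BYTES = 6  # three qint16 values
RELATIVE_NODE_BYTES = 2  # two qint8 values


def size_ratio(relative_per_absolute: float) -> float:
    """Size of the proposed format divided by the PNT size for the same nodes."""
    nodes = 1 + relative_per_absolute
    proposed = ABSOLUTE_NODE_BYTES + RELATIVE_NODE_BYTES * relative_per_absolute
    return proposed / (nodes * PNT_BYTES_PER_NODE)


for n in (0, 5, 50):
    print(n, round(size_ratio(n), 3))
```

Under these assumptions, an average of five relative nodes per absolute node gives exactly one third of the PNT size, and the ratio approaches 25% as the node density grows.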

The parser has to convert these relative coordinates to absolute coordinates.

File Structure

File header

  • quint8: File format version
  • quint32: Number of polygons contained in the file.

Polygon Header

  • quint32: Feature id (either Natural Earth or Geonames).
  • quint32: Number of absolute (parent) node chunks to follow
  • quint8: Flags: 1st bit: polygonIsClosed

Absolute node chunk

  • qint16: Latitude in halfarcminutes (allowed range = [-10800;+10800 ] halfarcminutes = [-90;+90 ] degrees )
  • qint16: Longitude in halfarcminutes (allowed range = [-21600;+21600 ] halfarcminutes = [-180;+180 ] degrees )
  • qint16: Number of relative (child) node chunks to follow (equals "0" if there are no child nodes)

Relative node chunk

  • qint8: Latitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees)
  • qint8: Longitude-diff in halfarcminutes compared to the parent (allowed range = [-60;+60] arcminutes = [-1;+1] degrees)
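
A minimal sketch of a reader for this layout, written in Python for brevity. Several details are assumptions rather than part of the proposal: the byte order (big-endian, as in QDataStream's default), tight packing without padding, and the reading of "1st bit" as the least significant bit. The names read_file and sample are illustrative.

```python
import struct

HALF_ARCMINUTES_PER_DEGREE = 120.0


def read_file(data: bytes):
    """Return (version, polygons); each polygon is (feature_id, is_closed, nodes in degrees)."""
    offset = 0
    version, polygon_count = struct.unpack_from(">BI", data, offset)  # file header
    offset += 5
    polygons = []
    for _ in range(polygon_count):
        # Polygon header: feature id, number of absolute nodes, flags
        feature_id, absolute_count, flags = struct.unpack_from(">IIB", data, offset)
        offset += 9
        is_closed = bool(flags & 0x01)  # assumption: "1st bit" is the lowest bit
        nodes = []
        for _ in range(absolute_count):
            lat, lon, relative_count = struct.unpack_from(">hhh", data, offset)
            offset += 6
            nodes.append((lat / HALF_ARCMINUTES_PER_DEGREE, lon / HALF_ARCMINUTES_PER_DEGREE))
            for _ in range(relative_count):
                # Relative nodes store the difference to their absolute node
                dlat, dlon = struct.unpack_from(">bb", data, offset)
                offset += 2
                nodes.append(((lat + dlat) / HALF_ARCMINUTES_PER_DEGREE,
                              (lon + dlon) / HALF_ARCMINUTES_PER_DEGREE))
        polygons.append((feature_id, is_closed, nodes))
    return version, polygons


# One closed polygon with one absolute node at 50N 10E and two relative nodes
sample = (struct.pack(">BI", 1, 1)
          + struct.pack(">IIB", 42, 1, 0x01)
          + struct.pack(">hhh", 6000, 1200, 2)
          + struct.pack(">bb", 60, -60)
          + struct.pack(">bb", -120, 120))
print(read_file(sample))
```

The conversion from relative to absolute coordinates is a single addition per node, so parsing stays cheap even though the data is packed more tightly than in PNT.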

Attribute Database

Metadata file:

  • convert / filter dbf into our own format
  • could just be csv or xml? or sqlite?

Rather than the Geonames ID, we could just use the Natural Earth object ID, then a look-up file/table that matches the NE ID to the ISO / FIPS / whatever code (NE provides this in the metadata) and Geonames ID (which we would have to provide). This would allow look-ups via whatever code or ID is available, and we wouldn't be reliant on Geonames IDs staying constant or being online.
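
Such a look-up table could be as simple as a CSV file indexed by each code. The following sketch assumes hypothetical column names (ne_id, iso_a2, geonames_id), and the ID values are placeholders, not real Natural Earth or Geonames IDs.

```python
import csv
import io

# Placeholder rows; the numeric IDs are made up for illustration
SAMPLE = """ne_id,iso_a2,geonames_id
1,DE,1001
2,FR,1002
"""


def build_indexes(csv_text: str) -> dict:
    """Index every row by each of its code columns."""
    indexes = {"ne_id": {}, "iso_a2": {}, "geonames_id": {}}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key in indexes:
            indexes[key][row[key]] = row
    return indexes


def lookup(indexes: dict, key: str, value) -> dict:
    """Find a feature by whichever code or ID is available; None if unknown."""
    return indexes[key].get(str(value))


indexes = build_indexes(SAMPLE)
print(lookup(indexes, "iso_a2", "FR")["ne_id"])
print(lookup(indexes, "geonames_id", 1001)["iso_a2"])
```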

Spatialite

One option would be to integrate Spatialite and use it as both the data storage for the vectors and the attribute database. Spatialite is an extension to SQLite that implements a Spatial SQL database. Among the features it provides are a compact data storage format and the ability to import Shapefiles and CSV files, as well as access to all the standard GEOS tools if installed.

There is a 20 MB zip file available for Natural Earth in Spatialite format, but it is unclear how much of Natural Earth it contains. A minimal dataset could be shipped by default, with the full dataset downloaded later.

Spatial SQL queries could return just those vectors currently in the viewport, but repeated reloading and redrawing could be inefficient. However this may also solve the Level-of-Detail problem.
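
To illustrate the idea of a viewport query, here is a sketch using plain SQLite with one bounding box per feature. It deliberately does not use Spatialite or its spatial index; the table layout and names are hypothetical, and viewports crossing the date line are not handled.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table holding one bounding box per feature."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT, "
        "min_lat REAL, max_lat REAL, min_lon REAL, max_lon REAL)"
    )
    return conn


def features_in_viewport(conn, south, north, west, east):
    """Return ids of features whose bounding box overlaps the viewport."""
    rows = conn.execute(
        "SELECT id FROM features "
        "WHERE max_lat >= ? AND min_lat <= ? AND max_lon >= ? AND min_lon <= ? "
        "ORDER BY id",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "box A", 47.0, 55.0, 6.0, 15.0),      # illustrative boxes only
     (2, "box B", -44.0, -10.0, 113.0, 154.0)],
)
print(features_in_viewport(conn, 40.0, 60.0, 0.0, 20.0))
```

Only the features returned by the query would need to be loaded and drawn, which is where the potential saving comes from.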

The major downside is the dependencies, which include SQLite, PROJ and GEOS, so a platform like Windows would require a larger monolithic binary, which defeats the purpose of shipping slimmed-down data.

More research is required here. It may not be a suitable option for the default Atlas view, but would be a very powerful extension for Marble to provide lightweight GIS-like functionality.

Action Plan

A possible action plan is:

  1. Fix GeoPainter so that LinearRings which contain a pole are rendered correctly
  2. Implement Douglas-Peucker reduction in GeoDataLineString
  3. New PNT file format definition (with a different name, MBL?)
  4. Metadata file format definition
  5. New GeoData PNT2 file loading code (convert old data).
  6. shp2pnt2 script to convert shp to new formats (using Perl::shp? there's shp2xxx scripts out there we could copy?), including matching to Geonames ID
  7. split files into 'ship with', 'download asap', 'ghns'
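
For step 2, the Douglas-Peucker reduction can be sketched as follows. This is a plain Python illustration, not Marble code: it treats coordinates as planar (x, y) pairs rather than geodetic ones, and the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance of p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / (dx * dx + dy * dy) ** 0.5


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_distance:
            index, max_distance = i, d
    if max_distance <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
print(douglas_peucker(line, 0.5))
```

A larger epsilon removes more nodes, so different tolerances could be used per zoom level. For very long line strings an iterative version with an explicit stack would avoid deep recursion.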

Later add simple shapefile loading to GeoData, maybe with attribute layer?

Natural Earth Datasets

Dataset Sizes

Key Natural Earth data files from v1.2; recent updates in v1.3 are not included.

                            1:110m     1:50m       1:10m
                            ------    -------     -------
Admin level 0 countries     172 KB    1.36 MB     6.55 MB
Admin level 0 land borders   39 KB     301 KB      896 KB
Admin level 0 sea borders    12 KB      40 KB       79 KB
Admin level 0 disputed                  40 KB      157 KB
Admin level 1 regions        39 KB     339 KB     13.9 MB *
Admin level 1 land borders   16 KB      60 KB     4.82 MB
Coastlines                   79 KB     883 KB     2.15 MB
Rivers                       19 KB     420 KB     3.29 MB
Lakes                        10 KB     286 KB      786 KB
Glaciers                     13 KB     208 KB     1.23 MB
Dateline                     18 KB      18 KB       18 KB
Playas                                  18 KB      106 KB
Ice Shelves                            105 KB      211 KB
Minor Islands                                      449 KB
Reefs                                              171 KB
                           -------    -------   ---------
                            417 KB    4.08 MB    34.03 MB

* level 1 regions are USA/Canada only at 110m and 50m, but whole world at 10m, 
perfect for KGeography use :-)

Other useful files:
Physical Features Land      146 KB    1.50 MB      692 KB
Physical Features Sea       348 KB     836 KB      836 KB
Populated Places                       347 KB     1.48 MB
Urban Areas                            439 KB     3.48 MB
Bathymetry                                       11.64 MB

Dataset Details

  • Countries
    • matched boundary lines and polygons with names attributes for countries and sovereign states. Includes dependencies (French Polynesia), map units (U.S. Pacific Island Territories) and sub-national map subunits (Corsica versus mainland Metropolitan France).
    • Core data
  • Disputed areas and breakaway regions
    • From Kashmir to the Elemi Triangle, Northern Cyprus to Western Sahara.
    • Core data
  • Internal boundaries
    • Core data??
  • Coastline
    • ocean coastline, including major islands. Coastline is matched to land and water polygons.
    • Core data?
  • First order admin (provinces, departments, states, etc.)
    • internal boundaries and polygons for all but a few tiny island nations. Includes names attributes and some statistical groupings of the same for smaller countries.
    • Optional download
  • Populated places
    • point symbols with name attributes. Includes capitals, major cities and towns, plus significant smaller towns in sparsely inhabited regions. We favor regional significance over population census in determining rankings.
    • Optional download, or use to replace current places file?
  • Urban polygons
    • derived from 2002-2003 MODIS satellite data.
    • Optional download
  • Pacific nation groupings
    • boxes for keeping these far-flung islands tidy.
    • Optional download
  • Water boundary indicators
    • partial selection of key 200-mile nautical limits, plus some disputed, treaty, and median lines.
    • Optional download
  • Land
    • Land polygons including major islands
    • Optional download
  • Ocean
    • Ocean polygon split into contiguous pieces.
    • Optional download
  • Minor Islands
    • additional small ocean islands ranked to two levels of relative importance.
    • Optional download
  • Reefs
    • major coral reefs from WDB2.
    • Optional download
  • Physical region features
    • polygon and point labels of major physical features.
    • Optional download
  • Rivers and Lake Centerlines
    • ranked by relative importance. Includes name and line width attributes. Don’t want minor lakes? Turn on their centerlines to avoid unseemly data gaps.
    • Optional download
  • Lakes
    • Ranked by relative importance, coordinating with river ranking. Includes name attributes.
    • Optional download
  • Glaciated areas
    • polygons derived from DCW, except for Antarctica derived from MOA. Includes name attributes for major polar glaciers.
    • Optional download
  • Antarctic ice shelves
    • Derived from 2003-2004 MOA. Reflects recent ice shelf collapses.
    • Optional download
  • Bathymetry
    • nested polygons at 0, -200, -1,000, -2,000, -3,000, -4,000, -5,000, -6,000, -7,000, -8,000, -9,000, and -10,000 meters. Created from SRTM Plus.
    • Optional download
  • Parks and protected areas
    • US Only
    • Don't use, maybe user download layer
  • Geographic lines
    • Polar circles, tropical circles, equator, and International Date Line.
    • Probably not useful to Marble as we have a plugin for most of these.
    • International Date Line could be extracted
  • Graticules
    • 1-, 5-, 10-, 15-, 20-, and 30-degree increments. Includes WGS84 bounding box.
    • Probably not useful to Marble as we have a Graticule plugin.