Marble/NaturalEarth: Difference between revisions
Revision as of 22:56, 1 March 2011
Natural Earth
Marble currently uses the very old and outdated MWDBII dataset for vector map outlines and we really need to replace it with more up-to-date data. However, MWDBII has two key advantages: it is very compact in size, enabling Marble to ship it by default, and the individual nodes have a zoom level value which speeds up drawing.
The Natural Earth data set is a "public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software." This data set seems ideal as a replacement for the MWDBII.
Advantages:
- Free / Public Domain data
- Regularly updated
- Wide variety of political and geographic features
- Available at 3 different scales: 1:10m, 1:50m and 1:110m
- All feature nodes at same scale are matched
- Data attributes such as country code, population, relative magnitude, etc
Disadvantages:
- Is in Shapefile format which is space inefficient
- No per node zoom level attribute
- Different scale datasets do not match so cannot efficiently be used together for zooming
The 1:10m dataset seems ideal as the base map in Marble as it provides a higher level of detail than the current MWDBII. The 1:110m dataset seems ideal for use in a country selector widget. The 1:50m dataset, being less detailed than the MWDBII, is probably not useful to Marble.
Using the data in the default shapefile format is not considered desirable, however:
- No shapefile format support in Marble (yet), would have to rely on an external library or write our own
- Space inefficient (14 MB vs 2.6 MB for MWDBII)
- No zoom level attribute or any node level attributes
- Vector level attributes are stored in .dbf format which adds complexity to implementing shapefile support
The ideal solution would therefore be to convert the Natural Earth data into a more efficient file format that includes a zoom level attribute.
As you say, there are two approaches that could work for the lightweight default layer:
a) Convert just the required NE datasets to PNT format, either merging the 3 scale levels into a single file with just 3 detail levels, or using Douglas-Peucker on the 1:10m files to create the required detail levels.
b) Implement an internal lightweight shapefile parser without dbf support and ship only the required NE datasets.
Some pros/cons to consider:
The 1:10m country file is 6.55MB and contains 533,202 points = 12.28
bytes/point compared to the PNT which is 745KB and contains 127,246 points =
5.85 bytes/point, which would suggest the NE data in PNT format would be half
the size, so 6 MB in total. This could probably be further reduced by a light
application of Douglas-Peucker.
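The per-point figures above can be reproduced with a few lines of arithmetic. This is only a quick check, assuming decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes), which is what the quoted numbers are consistent with; the function name is made up for illustration.

```python
def bytes_per_point(file_bytes, points):
    """Average storage cost of one point in a file."""
    return file_bytes / points

# Figures quoted above, assuming decimal units (1 MB = 1,000,000 bytes).
shp_cost = bytes_per_point(6.55e6, 533_202)   # 1:10m countries shapefile
pnt_cost = bytes_per_point(745e3, 127_246)    # existing PNT file

# Estimated size of the 1:10m countries data if stored at the PNT cost per point.
estimated_mb = 533_202 * pnt_cost / 1e6

print(f"{shp_cost:.2f} bytes/point (shapefile)")
print(f"{pnt_cost:.2f} bytes/point (PNT)")
print(f"{estimated_mb:.2f} MB estimated for the countries file alone")
```

Note that the last figure covers the countries file only, not the other layers.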
The NE shapefiles have been carefully processed so that shared borders and overlapping features like rivers match exactly, along with other such niceties; applying the Douglas-Peucker algorithm might affect that.
A lightweight shapefile parser would allow users/apps to load other shapefiles.
We would have to reconvert and check the data every time there's a new NE release which could be a lot of effort, but an automated shp2pnt script could prove useful to allow apps/users to display their own shapefiles in a simple way.
Overall, it seems the best approach for updating the lightweight layer is indeed to convert the shapefiles to PNT format, provided the D-P algorithm can be deployed in a way that marks each point with a detail level rather than just throwing the points away.
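One way the "mark rather than discard" idea could work is sketched below. It is not Marble code: the function names and the tolerance values are assumptions, and distances are computed on plain x/y coordinates rather than on the sphere. Each point records the distance at which Douglas-Peucker would have selected it, capped by its parent's value so that coarser levels are always subsets of finer ones; those values are then bucketed into detail levels.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def significance(points):
    """Per-point tolerance below which Douglas-Peucker would keep the point."""
    n = len(points)
    sig = [0.0] * n
    if n == 0:
        return sig
    sig[0] = sig[-1] = math.inf          # end points are always kept
    stack = [(0, n - 1, math.inf)]
    while stack:
        first, last, cap = stack.pop()
        if last - first < 2:
            continue
        # Find the interior point farthest from the chord first-last.
        best, best_d = first + 1, -1.0
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > best_d:
                best, best_d = i, d
        # Cap by the parent's value so every simplification is nested.
        sig[best] = min(best_d, cap)
        stack.append((first, best, sig[best]))
        stack.append((best, last, sig[best]))
    return sig

def detail_levels(points, tolerances):
    """Level 0 = always drawn; higher levels appear only when zoomed in.

    tolerances is a descending list of thresholds (hypothetical values).
    """
    return [sum(1 for t in tolerances if s < t) for s in significance(points)]

line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
levels = detail_levels(line, tolerances=[2.0, 0.5])
print(levels)
```

No point is removed: a renderer would simply skip points whose level is higher than the current zoom allows.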
For the heavier vector layer with attributes, rather than just implementing a shapefile importer based on shapelib (a C library), it might be better to implement a GDAL/OGR importer which would import any vector format and is a C++ library. That would depend on packaging weight and difficulty, so needs more investigation, but that's for another day.
Cheers!
John.
Key Natural Earth data files from v1.2, recent updates to 1.3 not included.

                                  1:110m     1:50m      1:10m
                                  ------     -----      -----
Admin level 0 countries           172 KB   1.36 MB    6.55 MB
Admin level 0 land borders         39 KB    301 KB     896 KB
Admin level 0 sea borders          12 KB     40 KB      79 KB
Admin level 0 disputed                       40 KB     157 KB
Admin level 1 regions              39 KB    339 KB    13.9 MB *
Admin level 1 land borders         16 KB     60 KB    4.82 MB
Coastlines                         79 KB    883 KB    2.15 MB
Rivers                             19 KB    420 KB    3.29 MB
Lakes                              10 KB    286 KB     786 KB
Glaciers                           13 KB    208 KB    1.23 MB
Dateline                           18 KB     18 KB      18 KB
Playas                                       18 KB     106 KB
Ice Shelves                                 105 KB     211 KB
Minor Islands                                          449 KB
Reefs                                                  171 KB
                                 -------   -------  ---------
                                  417 KB   4.08 MB   34.03 MB

* level 1 regions are USA/Canada only at 110m and 50m, but whole world at 10m, perfect for KGeography use :-)

Other useful files:

                                  1:110m     1:50m      1:10m
                                  ------     -----      -----
Physical Features Land            146 KB   1.50 MB     692 KB
Physical Features Sea             348 KB    836 KB     836 MB
Populated Places                            347 KB    1.48 MB
Urban Areas                                 439 KB    3.48 MB
Bathymetry                                            11.64 MB
Datasets:
Countries
– matched boundary lines and polygons with names attributes for countries and sovereign states. Includes dependencies (French Polynesia), map units (U.S. Pacific Island Territories) and sub-national map subunits (Corsica versus mainland Metropolitan France).
Disputed areas and breakaway regions
From Kashmir to the Elemi Triangle, Northern Cyprus to Western Sahara.
First order admin (provinces, departments, states, etc.)
– internal boundaries and polygons for all but a few tiny island nations. Includes names attributes and some statistical groupings of the same for smaller countries.
Populated places
– point symbols with name attributes. Includes capitals, major cities and towns, plus significant smaller towns in sparsely inhabited regions. We favor regional significance over population census in determining rankings.
Urban polygons
– derived from 2002-2003 MODIS satellite data.
Parks and protected areas
- US Only
- Don't use, maybe user download layer
Pacific nation groupings
– boxes for keeping these far-flung islands tidy.
Water boundary indicators
– partial selection of key 200-mile nautical limits, plus some disputed, treaty, and median lines.
Coastline
– ocean coastline, including major islands. Coastline is matched to land and water polygons.
Land
– Land polygons including major islands
Ocean
– Ocean polygon split into contiguous pieces.
Minor Islands
– additional small ocean islands ranked to two levels of relative importance.
Reefs
– major coral reefs from WDB2.
Physical region features
– polygon and point labels of major physical features.
Rivers and Lake Centerlines
– ranked by relative importance. Includes name and line width attributes. Don’t want minor lakes? Turn on their centerlines to avoid unseemly data gaps.
Lakes
– ranked by relative importance, coordinating with river ranking. Includes name attributes.
Glaciated areas
– polygons derived from DCW, except for Antarctica derived from MOA. Includes name attributes for major polar glaciers.
Antarctic ice shelves
– derived from 2003-2004 MOA. Reflects recent ice shelf collapses.
Bathymetry
– nested polygons at 0, -200, -1,000, -2,000, -3,000, -4,000, -5,000, -6,000, -7,000, -8,000, -9,000, and -10,000 meters. Created from SRTM Plus.
Geographic lines
– Polar circles, tropical circles, equator, and International Date Line.
Graticules
– 1-, 5-, 10-, 15-, 20-, and 30-degree increments. Includes WGS84 bounding box.
Using GeoPainter and GeoDataLineString ("libgeodata"):
- Apply Douglas-Peucker dynamically in GeoDataLineString class to set Detail Level
- When GeoDataLineString modified (add/del point) set dirty flag
- When node accessed for drawing, Dirty flag would trigger D-P to calculate detail level and unset dirty flag
- Would benefit all vector formats, e.g. kml, gpx, shp
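The dirty-flag behaviour described in this list could look roughly like the sketch below. `LazyLineString` is a hypothetical stand-in, not the real GeoDataLineString API, and the level function passed in is a toy placeholder for an actual Douglas-Peucker pass.

```python
class LazyLineString:
    """Hypothetical stand-in for GeoDataLineString; not Marble's actual API."""

    def __init__(self, compute_levels):
        self._points = []
        self._levels = []
        self._dirty = False
        self._compute_levels = compute_levels  # e.g. a Douglas-Peucker pass
        self.recomputations = 0                # only here to make the behaviour visible

    def append(self, point):
        self._points.append(point)
        self._dirty = True                     # geometry changed, levels are stale

    def remove(self, index):
        del self._points[index]
        self._dirty = True

    def nodes_for_drawing(self, max_level):
        """Return the points whose detail level is <= max_level."""
        if self._dirty:
            # Recompute lazily, only when a drawing pass actually needs the levels.
            self._levels = self._compute_levels(self._points)
            self._dirty = False
            self.recomputations += 1
        return [p for p, lvl in zip(self._points, self._levels) if lvl <= max_level]


# Toy level function for the demo: every second point counts as fine detail.
def alternate_levels(points):
    return [i % 2 for i in range(len(points))]

line = LazyLineString(alternate_levels)
for p in [(0, 0), (1, 1), (2, 0), (3, 1)]:
    line.append(p)
coarse = line.nodes_for_drawing(0)
line.nodes_for_drawing(1)   # second access reuses the cached levels
```

Four appends trigger a single recomputation at the first draw, which is the point of deferring the work until a node is accessed.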
New PNT format:
- 1st integer (32 bit): Latitude in arcseconds
  highest bit indicates that a new polygon starts: header information then has to be read from the 3rd integer
- 2nd integer (32 bit): Longitude in arcseconds
- optional 3rd integer: feature key
  highest bit indicates the feature geometry (line or ring).
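To make the proposed layout concrete, here is a minimal round-trip sketch. Several details are assumptions that the notes above do not settle: big-endian byte order, a set high bit in the third integer meaning "ring", and storing coordinates offset by +90° / +180° so that values are non-negative and the highest bit stays free for the flag.

```python
import struct

START_FLAG = 0x80000000   # highest bit of the 1st integer: a new polygon starts here
RING_FLAG = 0x80000000    # highest bit of the 3rd integer: ring (set) or line (clear)
LAT_OFFSET = 90 * 3600    # assumption: shift to non-negative so the top bit stays free
LON_OFFSET = 180 * 3600

def encode(features):
    """features: list of (feature_key, is_ring, [(lat_arcsec, lon_arcsec), ...])."""
    out = bytearray()
    for key, is_ring, points in features:
        for i, (lat, lon) in enumerate(points):
            first = lat + LAT_OFFSET
            if i == 0:
                # First point of a polygon carries the flag and the header integer.
                header = key | (RING_FLAG if is_ring else 0)
                out += struct.pack(">III", first | START_FLAG, lon + LON_OFFSET, header)
            else:
                out += struct.pack(">II", first, lon + LON_OFFSET)
    return bytes(out)

def decode(data):
    features, pos = [], 0
    while pos < len(data):
        first, second = struct.unpack_from(">II", data, pos)
        pos += 8
        if first & START_FLAG:
            # Flag set: the next integer is the feature header.
            (header,) = struct.unpack_from(">I", data, pos)
            pos += 4
            features.append((header & ~RING_FLAG, bool(header & RING_FLAG), []))
        point = ((first & ~START_FLAG) - LAT_OFFSET, second - LON_OFFSET)
        features[-1][2].append(point)
    return features

sample = [
    (7, True, [(0, 0), (3600, -7200), (-3600, 7200)]),
    (9, False, [(-324000, -648000), (324000, 648000)]),
]
blob = encode(sample)
print(len(blob), "bytes for", sum(len(f[2]) for f in sample), "points")
```

Feature keys must fit in 31 bits under this scheme, since the top bit of the third integer is taken by the geometry flag.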
1:10m dataset in PNT2 format:
- each point takes 64 bits/8 bytes
- except the start of a polygon, which takes 96 bits
- roughly 533,202 x 8 bytes = 4 MB for the country borders alone
- If that's too much to ship, then ship the 1:50m dataset and download the 1:10m asap
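The size estimate can be written as a small helper. The function name is hypothetical, and the polygon count for the countries file is not given here, so the example leaves it at zero and therefore yields a lower bound.

```python
POINT_BYTES = 8      # two 32-bit integers per point
HEADER_BYTES = 4     # extra 32-bit integer at the start of each polygon

def pnt2_size(points, polygons):
    """Estimated PNT2 file size in bytes."""
    return points * POINT_BYTES + polygons * HEADER_BYTES

# Polygon count unknown, so this is a lower bound for the 1:10m countries file.
size = pnt2_size(533_202, polygons=0)
print(size, "bytes")
```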
Metadata file:
- convert / filter dbf into our own format
Rather than the Geonames ID, we could just use the Natural Earth object ID, then a look-up file/table that matches the NE ID to the ISO / FIPS / whatever code (NE provides this in the metadata) and Geonames ID (which we would have to provide). This would allow look-ups via whatever code or ID is available, and we wouldn't be reliant on Geonames IDs staying constant.
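A look-up table of this kind could be as simple as a CSV file keyed on the Natural Earth ID with one index per code scheme. The sketch below is only an illustration: the column names are assumptions, and every ID and code in the sample data is a placeholder rather than real Natural Earth, ISO, FIPS or Geonames data.

```python
import csv
import io

# Hypothetical look-up file: every ID and code below is a placeholder, not real data.
LOOKUP_CSV = """ne_id,iso_a2,fips,geonames_id
1,AA,XA,1000001
2,BB,XB,1000002
"""

def build_indexes(text):
    """Build one dictionary per code scheme, each mapping a code to the NE ID."""
    rows = list(csv.DictReader(io.StringIO(text)))
    indexes = {}
    for column in ("iso_a2", "fips", "geonames_id"):
        indexes[column] = {row[column]: int(row["ne_id"]) for row in rows if row[column]}
    return indexes

def find_ne_id(indexes, scheme, code):
    """Return the NE ID for a code in the given scheme, or None if unknown."""
    return indexes.get(scheme, {}).get(str(code))

indexes = build_indexes(LOOKUP_CSV)
print(find_ne_id(indexes, "iso_a2", "BB"))
```

Because the NE ID is the primary key, a changed Geonames ID would only require updating one column of the table.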
So required work is:
1) Fix GeoPainter: LinearRings which contain a pole are not rendered correctly
- Torsten knows; problem with fill in flat map, needs to create polygon if closed and crosses dateline once only
2) Implement Douglas-Peucker reduction in GeoDataLineString
- big investigation
3) New PNT file format definition (with a different name, MBL?)
- existing serial format from geodata???
4) Metadata file format definition (could just be csv or xml? or sqlite?)
5) New GeoData PNT2 file loading code (convert old data).
6) shp2mbl script to convert shp to new formats (using Perl::shp? there's shp2xxx scripts out there we could copy?), including matching to Geonames ID
7) Split files into 'ship with', 'download asap', 'ghns'
Later add simple shapefile loading to GeoData, maybe with attribute layer?
GHNS:
- can it download layers as well as themes?