GeoJSON vs Shapefile vs GeoPackage
Compare GeoJSON Shapefile GeoPackage and other geospatial formats. When to use each and migration strategies.
Overview
GeoJSON was defined in RFC 7946 (2016), superseding the original 2008 community spec. It encodes geographic features as standard JSON and has become the default interchange format for web mapping APIs, Mapbox, Leaflet, and PostGIS. Its primary use is transmitting and displaying spatial data in browsers and REST APIs.
Shapefile is an ESRI proprietary format introduced in 1998 that stores geometry in a binary .shp file alongside mandatory companion files. Despite being technically obsolete, it remains the most common format for exchanging data between desktop GIS tools simply because every GIS application has shipped a Shapefile importer for over two decades.
GeoPackage is an OGC standard (OGC 12-128r18) built on SQLite, published in 2014. It is a single portable file that can hold multiple vector layers, raster tiles, and attribute tables with full SQL types. It is the preferred format for offline GIS workflows and data distribution.
FlatGeobuf is a binary, column-oriented format published by Björn Harrtell in 2019. Its flat-buffer encoding (built on FlatBuffers) enables zero-copy reads and native HTTP range request support, making it the fastest format for streaming large datasets directly to a browser without a tile server.
TopoJSON is a JSON extension created by Mike Bostock in 2012 that encodes topology: shared borders between polygons are stored once rather than duplicated in each feature. A TopoJSON file of US county boundaries is typically 80% smaller than the equivalent GeoJSON, making it the preferred format for D3.js choropleth maps where file size directly affects page load time.
Comparison Table
| Format | File Type | Max Size | Multi-file? | Geometry Types | Property Types | Web-ready | Streaming |
|---|---|---|---|---|---|---|---|
| GeoJSON | .json / .geojson | Limited by RAM | No — single file | All 7 RFC 7946 types | string, number, bool, null | Yes | No |
| Shapefile | .shp + .shx + .dbf + .prj | 2 GB per component file | Yes — 3–7 files required | Limited — one type per file, no GeometryCollection | 10-char field names, 254-char string max | No | No |
| GeoPackage | .gpkg | No hard limit (SQLite) | No — single SQLite file | All types including curves | Full SQL types (INTEGER, REAL, TEXT, BLOB, DATE) | No — needs SQLite parser | No |
| FlatGeobuf | .fgb | No hard limit | No — single file | All OGC Simple Feature types | Full types (int, float, bool, string, datetime, binary) | Yes — HTTP range requests | Yes |
| TopoJSON | .json | Limited by RAM | No — single file | All types (encoded as arcs) | string, number, bool, null | Yes | No |
When to Use Each Format
These are opinionated thresholds based on common workflows, not committee recommendations.
GeoJSON
Under 10 MB with web display as the goal: use GeoJSON. It works natively in every JavaScript mapping library without a build step. It passes through REST APIs as plain JSON and is readable in a text editor. Above 10 MB, start profiling load times — a 10 MB GeoJSON file parsed in a browser main thread blocks rendering for 200–800 ms on mid-range hardware.
Shapefile
Only use Shapefile when a downstream system requires it and you have no control over the requirement. Shapefiles truncate field names to 10 characters, cap string values at 254 characters, have no native support for dates with time zones, and require shipping at minimum three files together. Every one of these constraints will bite you eventually. Prefer GeoPackage for any new desktop GIS workflow.
GeoPackage
Over 50 MB for a desktop GIS workflow, or when you need to bundle multiple layers and attribute tables in a single file: use GeoPackage. It handles millions of features without breaking a sweat because SQLite's B-tree index is built in. QGIS, GDAL/OGR, ArcGIS Pro, and PostGIS (via ST_ImportGeoPackage) all support it natively.
FlatGeobuf
Over 50 MB with a web viewer that must load data without a tile server: use FlatGeobuf. The built-in R-tree spatial index means a browser can issue an HTTP range request for only the features inside the current map viewport, fetching kilobytes instead of gigabytes. The flatgeobuf npm package handles the HTTP streaming client side in roughly 30 lines of JavaScript.
TopoJSON
Static visualizations where file size matters and shared topology is meaningful — US states, country borders, census tracts — use TopoJSON. The format was designed for D3.js and the two are still the canonical pairing. Do not use TopoJSON for editing workflows: reconstructing geometry from arc deltas is lossy and topology-aware editing tools are rare outside the Observable/D3 ecosystem.
Migration Strategies
GDAL's ogr2ogr command handles every conversion listed here. Install it via brew install gdal on macOS or apt install gdal-bin on Debian/Ubuntu.
Shapefile to GeoJSON
bashogr2ogr -f GeoJSON output.geojson input.shpWhat you lose: nothing structural, but field names longer than 10 characters were already truncated in the Shapefile. Check ogrinfo -al input.shp to inspect the actual field names before assuming your data is intact.
GeoJSON to GeoPackage
bashogr2ogr -f GPKG output.gpkg input.geojsonWhat you lose: nothing. GeoPackage is a strict superset of GeoJSON's type system. The GeoPackage will also add an fid integer primary key column automatically.
GeoJSON to FlatGeobuf
bashogr2ogr -f FlatGeobuf output.fgb input.geojsonGDAL 3.1+ is required for FlatGeobuf write support. The output includes a spatial index by default. Use -lco SPATIAL_INDEX=NO to skip indexing if you are pre-processing the file before a final conversion.
GeoJSON to TopoJSON
bash# Install the reference implementation
npm install -g topojson-server
# Convert, quantizing coordinates to reduce file size
geo2topo -q 1e5 features=input.geojson > output.topojsonThe -q quantization flag controls precision. 1e5 (100,000 steps) is sufficient for most country-level maps. Reducing to 1e4 cuts file size further but introduces visible coordinate snapping at zoom levels above 10.
GeoPackage to Shapefile
bashogr2ogr -f "ESRI Shapefile" output_dir/ input.gpkgWhat you lose: field names are truncated to 10 characters, string values over 254 characters are silently cut, datetime fields lose timezone information, and any field name containing non-ASCII characters is mangled. Run ogrinfo -al input.gpkg first to audit your schema.
The Shapefile Problem
The Shapefile format was introduced by ESRI in 1998 and has not changed in 26 years. The specification document (ESRI White Paper, July 1998) is still the authoritative reference. Despite being older than Google, the format dominates government data portals, academic datasets, and enterprise GIS pipelines because every GIS tool that has ever existed ships a Shapefile importer.
The format's technical deficiencies are not minor inconveniences — they are architectural limits baked into the binary specification:
- 10-character field name limit — stored in the
.dbfdBASE III file format. A field namedpopulation_density_per_km2becomespopulationwith no warning. - 2 GB per component file — the file size is stored as a 32-bit integer counting 16-bit words. There is no extension mechanism.
- No mixed geometry types — a single Shapefile can hold only one geometry type. Storing both Points and Polygons requires two separate Shapefiles.
- No native UTF-8 — the
.dbfformat predates Unicode. Character encoding is stored in a single byte code page identifier and is frequently wrong or missing, causing mojibake in non-ASCII field values. - Mandatory companion files — a "Shapefile" is actually a minimum of three files (
.shp,.shx,.dbf) and typically four with the projection file (.prj). Sending a Shapefile over email means zipping, and "where's the .prj?" is a support ticket category in every GIS organization.
The reason Shapefiles persist is not technical — it is organizational. Every municipality, agency, and university has years of Shapefile exports in shared drives. The format is the lingua franca of GIS data exchange because it is the lowest common denominator every tool can read. Switching to GeoPackage or FlatGeobuf requires convincing every recipient that their tool can open the new format, which is a coordination problem, not a technical one.
The practical solution: accept Shapefiles as input, convert immediately on ingest, and never produce Shapefiles as output unless a downstream consumer explicitly requires it. The Switch from Shapefile initiative documents these issues with citations if you need to convince a colleague.
GeoJSON Limitations You Should Know
No Built-in CRS Declaration
RFC 7946 mandates WGS 84 (EPSG:4326) exclusively and explicitly forbids the "crs" member that existed in the pre-RFC GeoJSON spec. This means you cannot encode projected coordinates (e.g., EPSG:3857 Web Mercator or EPSG:27700 British National Grid) in a standards-compliant GeoJSON file. If your source data is in a projected CRS, you must reproject to WGS 84 before writing GeoJSON. GDAL does this automatically with -t_srs EPSG:4326.
No Topology
GeoJSON stores each feature's geometry independently. Two adjacent polygons that share a border store that border as duplicate coordinate sequences. This means editing one polygon's border does not automatically update its neighbor — a sliver gap is created. For topology-aware editing, use a PostGIS database or convert to TopoJSON for read-only visualization.
No Spatial Index
A GeoJSON file has no index structure. Finding all features within a bounding box requires a full linear scan of every feature in the file. For a file with 100,000 features, this is 100,000 bounding box intersection tests per query. FlatGeobuf solves this with a packed Hilbert R-tree; GeoPackage solves it with an RTree virtual table in SQLite. For query-heavy workflows, neither GeoJSON nor Shapefile is the right tool.
53-Bit Float Precision
GeoJSON coordinates are JSON numbers, which are IEEE 754 double-precision floats with 53 bits of significand. This gives approximately 15–17 significant decimal digits — more than sufficient for geographic coordinates (WGS 84 longitude to 8 decimal places gives sub-millimeter precision at the equator). In practice, the precision limit is not the issue; the issue is that many tools serialize coordinates with excessive decimal places (e.g., -73.98574523456789012) which inflates file size with meaningless digits. Rounding to 6 decimal places (±11 cm precision) typically reduces GeoJSON file size by 20–40% with no meaningful data loss.
Verbosity at Scale
GeoJSON repeats the keys "type", "geometry", and "properties" for every feature. A FeatureCollection with 1 million simple Point features — each with two coordinate values and one string property — will be roughly 120–150 bytes per feature, or 120–150 MB total. The equivalent FlatGeobuf binary is approximately 40–50 MB. For analytical workloads at this scale, consider converting to GeoParquet (Parquet files with WKB geometry columns), which adds columnar compression and predicate pushdown. See also /geojson-to-kml and /geojson-to-wkt for format conversion tools.