FlatGeobuf Optimization Techniques

FlatGeobuf is a cloud-native vector format whose packed spatial index and HTTP range-request model let a client read only the bytes a bounding box touches — but a naively converted .fgb artifact silently discards that advantage, ballooning cold-storage cost and turning sub-second map queries into full-file downloads. This page shows data engineers and GIS archivists how to tune indexing, compression, schema, and CRS handling inside the broader Format Conversion & Pipeline Automation workflow so that archival .fgb files stay small, deterministic, and cheap to serve at scale.

The Failure Mode: Indexes and Bytes Out of Alignment

The inefficiency this topic solves is subtle because a broken FlatGeobuf file still opens correctly in every reader. Three regressions creep in during automated conversion. First, the packed Hilbert R-tree index is omitted (GDAL builds it by default, but explicit pipeline flags or streaming writers frequently disable it), so a bounding-box query degrades into a sequential scan of the entire object. Second, because FlatGeobuf has no internal codec, teams forget to compress at the storage or transport layer and pay full uncompressed egress on every retrieval. Third, unbounded attribute schemas and implicit reprojection inflate each feature record, multiplying both the per-GET byte count and the audit surface. The symptom — retrieval latency that grows linearly with archive size instead of staying flat — only appears once the archive is large enough that a re-conversion is expensive. Catching these at conversion time is the entire point.

Prerequisite Context

This page assumes you have already converted source data to FlatGeobuf as part of a repeatable pipeline rather than a one-off export, and that the surrounding archival decisions are in place: a target object store and storage class selected, a retention policy framework governing how long .fgb artifacts are held, and a hot/warm/cold tier design that decides which tier serves live web maps versus deep archive. FlatGeobuf is the web-delivery and range-read tier in that design; if your access pattern is analytical (columnar predicate pushdown, vectorized scans), the GeoParquet Migration Workflows pipeline owns that tier and the two formats coexist behind one manifest. This page is a deep-dive under the parent Format Conversion & Pipeline Automation topic — start there for the end-to-end conversion architecture.

How Range Reads Work

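The packed spatial index lets a client fetch only the bytes its bounding box needs. A reader first requests the fixed-size prefix (the 8-byte magic number and a 4-byte header length), then the header, then only the index nodes whose bounding boxes intersect the query window, and finally the byte ranges of the matching features.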
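Every range read starts the same way: the client fetches the 12-byte prefix and from it learns exactly which byte range to request next. The sketch below parses that prefix from a local file, assuming the FlatGeobuf 3.x magic layout (`fgb`, a major-version byte of 3, `fgb` again, a patch-version byte) followed by a little-endian uint32 header length. The function name `fgb_header_range` is hypothetical; for a remote object the same 12 bytes would come from a ranged request such as `curl -s -r 0-11 "$URL"`.

```sh
# Sketch: locate the FlatGeobuf header from the fixed 12-byte file prefix.
#   bytes 0-7 : magic "fgb" 0x03 "fgb" <patch>   (FlatGeobuf 3.x)
#   bytes 8-11: header length, unsigned 32-bit little-endian
# Prints "<first_byte> <last_byte>" of the header; the spatial index (if any)
# starts at last_byte + 1. Hypothetical helper name.
fgb_header_range() {
  local file=$1 magic len
  # Hex-encode the first 8 bytes and compare everything except the patch byte.
  magic=$(head -c 8 "$file" | od -An -tx1 | tr -d ' \n')
  case $magic in
    66676203666762??) ;;
    *) echo "not a FlatGeobuf 3.x file: $file" >&2; return 1 ;;
  esac
  # Decode the four length bytes one by one so host endianness is irrelevant.
  set -- $(od -An -tu1 -j8 -N4 "$file")
  if [ "$#" -ne 4 ]; then
    echo "truncated prefix: $file" >&2
    return 1
  fi
  len=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
  echo "12 $(( 12 + len - 1 ))"
}
```

A client would then issue `Range: bytes=12-<last_byte>` for the header, which tells it the feature count and index node size needed to walk the index.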

Concept & Design Decisions

Every optimization decision for an archival .fgb file reduces to four levers: whether to pack the spatial index, how to compress, how wide to let the schema grow, and which CRS to freeze.

Spatial index packing. Retrieval performance is governed by the packed Hilbert R-tree. Enabling it sorts features along a Hilbert space-filling curve so that spatially adjacent geometries sit adjacent on disk, minimizing the number of distinct byte ranges a bounding-box query must request. This matters most for anisotropic datasets (linear infrastructure corridors, coastal transects, pipeline networks), where a small geographic window can otherwise scatter across the whole file. The trade-off: Hilbert packing adds roughly 15–25% write-time CPU but cuts cold-storage random-access latency by up to ~60%. The decision rule: enable the index for any dataset that will be queried by extent (web maps, tile servers, point lookups); disable it (SPATIAL_INDEX=NO) only for archives that are exclusively read as bulk sequential scans or compliance exports, where the index is dead weight of about 40 bytes per feature plus internal nodes (roughly 43 MB per 1M features at the default node size of 16).
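The index footprint can be computed before writing anything. Per the FlatGeobuf spec, each index node is 40 bytes (four float64 bounds plus a uint64 byte offset), there is one leaf node per feature, and each parent level holds ceil(n / node_size) nodes until a single root remains. The hypothetical helper below is a sketch of that arithmetic, mirroring the level computation of the reference packed R-tree, with 16 as the header's default node size.

```sh
# Estimate the packed Hilbert R-tree size in bytes for a FlatGeobuf file.
# Usage: fgb_index_bytes <feature_count> [node_size]   (node_size defaults to 16)
fgb_index_bytes() {
  local n=$1 node_size=${2:-16} total
  total=$n                      # one leaf entry per feature
  # Add parent levels, ceil(n / node_size) nodes each, until one root remains.
  while :; do
    n=$(( (n + node_size - 1) / node_size ))
    total=$(( total + n ))
    [ "$n" -le 1 ] && break
  done
  echo $(( total * 40 ))        # node = 4 x float64 bounds + uint64 offset
}
```

For example, `fgb_index_bytes 1000000` prints 42666760 (about 43 MB), and the 1,048,576-feature corridor used later on this page carries 44739240 bytes of index.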

Compression layer. FlatGeobuf’s on-disk layout is uncompressed binary tuned for range reads, so compression is an external decision, not a creation option. Compress at the object-store tier (store the object ZSTD-compressed) or at the transport tier (serve it gzip/Brotli over HTTP). A byte range cut from a zstd, gzip, or Brotli stream cannot be decoded on its own, so either approach pays off for whole-object transfers and cold copies, while objects that clients range-read are best kept uncompressed at rest. For tuning the codec level itself, the same logic from ZSTD Level Configuration for Spatial Files applies: level 9 typically yields ~40% better ratios than level 3 with negligible decompression cost on modern x86/ARM silicon. When attribute payloads are highly repetitive, dictionary encoding for GIS attributes at the storage layer compounds the savings.
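Rather than relying on a generic ratio, it is cheap to measure levels on a representative .fgb before fixing one in the pipeline. A minimal sketch, assuming the `zstd` CLI is on PATH; the function name is hypothetical, and it only reports sizes without writing any output files.

```sh
# Compare zstd levels on one sample file without writing compressed files.
# Usage: zstd_level_report <file> <level>...
zstd_level_report() {
  local file=$1 orig level size
  shift
  orig=$(wc -c < "$file")
  for level in "$@"; do
    # -c streams to stdout, -q suppresses progress; wc counts compressed bytes.
    size=$(zstd -q -c -"$level" "$file" | wc -c)
    awk -v l="$level" -v o="$orig" -v s="$size" \
      'BEGIN { printf "level %s: %d bytes, ratio %.2f\n", l, s, o / s }'
  done
}
```

Running `zstd_level_report pipeline_north.fgb 3 9 19` on a sample from each dataset family shows whether the higher level is worth its CPU for your data.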

Schema width. Strip every column not required for delivery or compliance. A narrower schema shrinks each feature record, which directly lowers the byte count of every range read and shrinks the set of personal-data fields a breach assessment or data subject access request must account for. Push column selection into the conversion step rather than post-processing.
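One way to push selection into the conversion step is to generate the `-sql` statement from a versioned allowlist, so the column set is reviewable and deterministic. A minimal sketch, assuming one plain column identifier per line (names needing SQL quoting are not handled); the helper name and the allowlist path in the usage line are hypothetical.

```sh
# Build a deterministic OGR SQL SELECT from a one-column-per-line allowlist.
# Blank lines and '#' comments are ignored; column order follows the file.
# Usage: allowlist_select <allowlist_file> <layer_name>
allowlist_select() {
  local allowlist=$1 layer=$2 cols
  cols=$(grep -Ev '^[[:space:]]*(#|$)' "$allowlist" | paste -s -d, - | sed 's/,/, /g')
  if [ -z "$cols" ]; then
    echo "empty allowlist: $allowlist" >&2
    return 1
  fi
  printf 'SELECT %s FROM %s\n' "$cols" "$layer"
}
```

In a pipeline this would feed ogr2ogr directly, e.g. `-sql "$(allowlist_select schema/pipeline_north.cols pipeline_north)"`, and an empty allowlist fails the run instead of silently exporting nothing.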

CRS freeze. Declare the target CRS explicitly so the pipeline cannot silently reproject. A single drifting EPSG code across partitions breaks spatial joins and invalidates extent-based audits; this is handled rigorously in CRS Synchronization in Pipelines.
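A pre-conversion guard can refuse inputs whose CRS is undeclared instead of letting them reach serialization. This sketch reads `ogrinfo -so` text on stdin and assumes a missing CRS is reported as `(unknown)` on the line after `Layer SRS WKT:`, as recent GDAL releases print it; confirm against the GDAL version in your pipeline. The function name is hypothetical.

```sh
# Exit non-zero if `ogrinfo -so` output on stdin shows no declared layer CRS
# (or no CRS line at all). Exit zero only when a WKT definition is present.
crs_declared() {
  awk '
    /^Layer SRS WKT:/ {
      seen = 1
      if ((getline nxt) > 0 && nxt == "(unknown)") bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

In use, pipe `ogrinfo -ro -so <source> <layer>` into `crs_declared` and move the input to a quarantine location when it fails, before any ogr2ogr call runs.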

Implementation

The following GDAL commands encode all four decisions. Build the index, freeze the CRS, and prune the schema in a single deterministic ogr2ogr pass over an archival corridor dataset, then write a ZSTD-compressed copy for the cold tier.

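# Convert an infrastructure-corridor shapefile to an indexed, schema-pruned,
# CRS-frozen FlatGeobuf archive.
#   -t_srs        : reproject to one declared CRS; ogr2ogr errors out on a
#                   source with no CRS rather than guessing one
#   SPATIAL_INDEX : pack the Hilbert R-tree for bounding-box range reads
#   -sql          : allowlist only delivery/compliance columns (prunes payload)
#   -nln          : pin the output layer name instead of inheriting it from -sql
# DBF caps shapefile field names at 10 characters, so "recorded_a" is an
# assumed truncated source name, aliased back to recorded_at.
ogr2ogr -f FlatGeobuf \
  datasets/corridors/2023/pipeline_north.fgb \
  datasets/corridors/2023/pipeline_north.shp \
  -t_srs EPSG:4326 \
  -lco SPATIAL_INDEX=YES \
  -nln pipeline_north \
  -sql "SELECT id, status, recorded_a AS recorded_at FROM pipeline_north"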

# FlatGeobuf has no internal codec. A single zstd frame cannot be range-read,
# so keep the uncompressed .fgb for range-served delivery and write this
# compressed copy only for the cold, sequentially-read archive tier.
zstd -9 datasets/corridors/2023/pipeline_north.fgb \
  -o datasets/corridors/2023/pipeline_north.fgb.zst

For purely sequential archives where the index is dead weight, invert the index flag while keeping the same CRS and schema discipline:

# Bulk compliance-export variant: drop the index, keep CRS + schema controls.
ogr2ogr -f FlatGeobuf \
  exports/compliance/2023/parcels_bulk.fgb \
  datasets/parcels/2023/parcels.gpkg \
  -t_srs EPSG:4326 \
  -lco SPATIAL_INDEX=NO \
  -sql "SELECT parcel_id, owner_class, assessed_at FROM parcels"

To keep web delivery and analytics aligned, record a dual-format manifest at write time so the router sends map traffic to the .fgb and analytical traffic to its GeoParquet twin without re-deriving either.
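A manifest entry mainly needs to pin both artifacts by content hash and name the route each serves. The sketch below emits one JSON line per dataset; the field names (`dataset`, `routes`, `web`, `analytics`, `sha256`) are hypothetical rather than a published manifest schema, and it assumes paths contain no quotes or backslashes because it does not escape JSON.

```sh
# Print the SHA-256 hex digest of a file (GNU sha256sum or BSD/macOS shasum).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Emit one JSON manifest line routing web traffic to the .fgb and analytics
# to its GeoParquet twin. Usage: dual_manifest_line <dataset> <fgb> <parquet>
dual_manifest_line() {
  local name=$1 fgb=$2 pq=$3
  printf '{"dataset":"%s","routes":{"web":{"path":"%s","sha256":"%s"},"analytics":{"path":"%s","sha256":"%s"}}}\n' \
    "$name" "$fgb" "$(sha256_of "$fgb")" "$pq" "$(sha256_of "$pq")"
}
```

Appending the line at write time, for example `dual_manifest_line pipeline_north datasets/corridors/2023/pipeline_north.fgb datasets/corridors/2023/pipeline_north.parquet >> manifest.jsonl` (the .parquet and manifest paths are illustrative), lets the router and later audits detect any artifact that changed after promotion.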

Validation Gate

Never promote a converted artifact to immutable storage on faith — confirm the index is present and the geometry/feature counts survived the conversion. ogrinfo reports both:

ogrinfo -so datasets/corridors/2023/pipeline_north.fgb pipeline_north

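Expected output (abridged). The index line is the gate: Spatial Index: YES must appear. Report fields vary across GDAL versions, so confirm that your build prints this line before wiring it into CI: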

INFO: Open of `datasets/corridors/2023/pipeline_north.fgb'
      using driver `FlatGeobuf' successful.

Layer name: pipeline_north
Geometry: Line String
Feature Count: 1048576
Extent: (-124.482003, 32.528832) - (-114.131211, 42.009518)
Spatial Index: YES
Layer SRS WKT: GEOGCRS["WGS 84", ...]

Then prove the index actually serves range reads by querying a known extent and confirming a tiny subset returns:

# A small window over a 1M-feature corridor must return far fewer features.
ogrinfo -spat -120.0 36.0 -119.5 36.5 \
  -ro datasets/corridors/2023/pipeline_north.fgb pipeline_north \
  -sql "SELECT COUNT(*) FROM pipeline_north"
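Both checks can be scripted so CI fails on its own rather than relying on someone reading the report. This hypothetical helper parses only the `Feature Count:` line of saved `ogrinfo -so` reports, assuming that `ogrinfo -so` applies `-spat` before printing the count. It fails unless the full layer matches the expected count and the windowed report returns strictly fewer features. A smaller windowed count proves the filter works, not that the index served it; the bytes actually transferred on a ranged read are the stronger evidence.

```sh
# Extract N from the first "Feature Count: N" line of ogrinfo -so text on stdin.
feature_count() {
  sed -n 's/^Feature Count: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage: count_gate <expected_total> <full_report> <window_report>, where
#   ogrinfo -ro -so file.fgb layer                             > full_report
#   ogrinfo -ro -so -spat xmin ymin xmax ymax file.fgb layer   > window_report
count_gate() {
  local expected=$1 total window
  total=$(feature_count < "$2")
  window=$(feature_count < "$3")
  if [ "$total" != "$expected" ]; then
    echo "feature count '$total' != expected $expected" >&2
    return 1
  fi
  if [ -z "$window" ] || [ "$window" -ge "$total" ]; then
    echo "window returned '$window' of $total features" >&2
    return 1
  fi
}
```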

Most common failure — Spatial Index: NO on a file you meant to index. Root cause is almost always a streaming or append writer (or a default-overriding -lco SPATIAL_INDEX=NO left in a pipeline template) that emitted features without buffering them for the Hilbert sort, since the packed index can only be built once the full extent is known. Re-run the conversion as a single batched ogr2ogr pass with SPATIAL_INDEX=YES; do not attempt to retrofit an index onto the existing object.
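Because the usual culprit is a stale override in a pipeline template, a cheap lint can fail CI whenever SPATIAL_INDEX=NO appears outside templates reserved for sequential exports. A sketch, assuming templates live under one directory and that bulk-export templates are recognisable by a path pattern such as `compliance|bulk`; both the layout and the function name are hypothetical.

```sh
# List templates that disable the FlatGeobuf index outside allowed paths.
# Exit 1 if any are found. Usage: lint_index_override <dir> [allowed_regex]
lint_index_override() {
  local dir=$1 allowed=${2:-'compliance|bulk'} hits
  # grep -rl lists matching files; the second grep drops exempt paths (ERE).
  hits=$(grep -rl 'SPATIAL_INDEX=NO' "$dir" | grep -Ev "$allowed")
  if [ -n "$hits" ]; then
    printf 'index disabled in:\n%s\n' "$hits" >&2
    return 1
  fi
}
```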

Cost & Performance Trade-offs

The levers above translate directly into object-storage economics, where every GET carries a fixed request charge plus per-byte egress.

Decision	Storage / CPU impact	Retrieval impact
Hilbert index ON	+15–25% write CPU; ~43 MB / 1M features (40-byte index nodes)	Up to ~60% lower random-access latency; far fewer ranged `GET`s
Hilbert index OFF	Smallest file; minimal write CPU	Full-scan reads — only acceptable for sequential/bulk access
ZSTD level 3 (storage)	Low CPU	Baseline ratio
ZSTD level 9 (storage)	Higher CPU	~40% smaller objects → lower egress per read
Schema pruning (`-sql`)	Smaller records, lower write cost	Fewer bytes per ranged read; smaller audit surface

The economic crossover to watch: once a dataset exceeds ~50 GB or is queried analytically at high frequency, the per-GET overhead of range-reading FlatGeobuf exceeds the cost of columnar predicate pushdown, so the analytical tier should move to GeoParquet while FlatGeobuf retains web delivery. Validating these numbers against your own access logs before committing to immutable storage is cheaper than re-converting a cold archive later.
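A back-of-the-envelope model makes that access-log validation concrete: monthly cost is requests times the per-request price plus bytes transferred times the egress price. The hypothetical helper below takes all four inputs as arguments; the prices in the example are placeholders, not quotes from any provider.

```sh
# Estimate monthly retrieval cost for one access pattern.
# Usage: retrieval_cost <gets_per_month> <avg_bytes_per_get> \
#                       <price_per_1000_gets> <egress_price_per_gb>
# Uses decimal GB (1e9 bytes) and prints dollars with two decimals.
retrieval_cost() {
  awk -v gets="$1" -v bytes="$2" -v p_req="$3" -v p_gb="$4" 'BEGIN {
    request_cost = gets / 1000 * p_req
    egress_cost  = gets * bytes / 1e9 * p_gb
    printf "%.2f\n", request_cost + egress_cost
  }'
}
```

With placeholder prices, `retrieval_cost 1000000 65536 0.0004 0.09` prints 6.30. Running it once with the average ranged-read size from your logs and once with the full object size shows what the index saves at a given request volume.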

Failure Modes & Edge Cases

Index omitted by a streaming writer. As in the validation gate above, append/stream paths cannot build the packed Hilbert index because it requires the full extent up front. Always materialize the layer in a single batched conversion when the index is required.
CRS metadata loss on conversion. Without an explicit -t_srs, GDAL may carry forward an ambiguous or missing .prj and downstream consumers silently assume EPSG:4326. Quarantine inputs with undeclared CRS and resolve them through CRS Synchronization in Pipelines before serialization.
Schema drift on upstream column additions. When a provider adds fields, an unguarded SELECT * re-widens the archive and breaks the deterministic deserialization contract. Route new columns to a staging layer, validate them against the allowlist in Schema Mapping & Attribute Validation, and merge into the archival manifest only after approval.
Compression block size vs. range-read granularity. Compressing a whole .fgb as one ZSTD frame forces a full-object decompress on any range read, defeating the index. HTTP Content-Encoding does not rescue this, because a byte range of a gzip or Brotli stream is just as undecodable on its own. For range-served objects, keep the stored .fgb uncompressed, or use block-aligned (seekable) compression only if your reader can address the blocks; reserve whole-file zstd -9 for cold, sequentially-read archives.
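The schema-drift case is easy to catch before conversion by diffing the source field list against the allowlist. This sketch assumes both lists are plain files with one column name per line (the source list extracted from `ogrinfo -so` field lines by a separate step); the function name and file layout are hypothetical.

```sh
# Print source columns absent from the allowlist and exit 1 if there are any,
# so the pipeline can route them to staging instead of the archive.
# Usage: new_columns <allowlist_file> <source_columns_file>
new_columns() {
  local tmp_a tmp_s extra
  tmp_a=$(mktemp) && tmp_s=$(mktemp) || return 2
  # comm needs identically sorted input; drop comments and blank lines first.
  grep -Ev '^[[:space:]]*(#|$)' "$1" | LC_ALL=C sort -u > "$tmp_a"
  grep -Ev '^[[:space:]]*(#|$)' "$2" | LC_ALL=C sort -u > "$tmp_s"
  # -13 keeps only lines unique to the second file (the source columns).
  extra=$(LC_ALL=C comm -13 "$tmp_a" "$tmp_s")
  rm -f "$tmp_a" "$tmp_s"
  if [ -n "$extra" ]; then
    printf '%s\n' "$extra"
    return 1
  fi
}
```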

Operational Execution Checklist

Enable the packed Hilbert spatial index (-lco SPATIAL_INDEX=YES) for any extent-queried archive
Compress .fgb at the storage/transport layer — FlatGeobuf has no internal codec
Prune attributes to a compliance allowlist via an explicit -sql column list
Declare the target CRS with -t_srs; quarantine inputs with undeclared CRS
Gate every artifact on ogrinfo -so showing Spatial Index: YES and correct feature count
Generate a dual-format manifest routing web delivery to FlatGeobuf and analytics to GeoParquet
Automate post-conversion validation and per-GET cost tracking in CI/CD

Consult the GDAL FlatGeobuf driver documentation for the full list of creation options and known limitations, and the ZSTD reference for level tuning and dictionary training.

Related

Format Conversion & Pipeline Automation — the parent topic covering the end-to-end conversion and pipeline architecture this page sits within.
GeoParquet Migration Workflows — the columnar analytical counterpart to FlatGeobuf, paired behind a dual-format manifest.
CRS Synchronization in Pipelines — enforce a single coordinate reference system so .fgb extents and joins stay valid.
Schema Mapping & Attribute Validation — field-level allowlists and type checks that keep pruned schemas deterministic.
Optimizing FlatGeobuf for Web Mapping Archives — applying these techniques to public and internal mapping portals.
ZSTD Level Configuration for Spatial Files — tuning the codec level used when compressing .fgb objects at rest.
