Retention Policy Frameworks

A retention policy framework is the deterministic control plane that decides when a geospatial asset moves from active compute to cold storage, when it becomes immutable, and when — if ever — it is permitted to be deleted. The failure this page solves is retention by convention: lifecycle decisions left to ad-hoc administrative cleanup, where a frequently joined basemap gets purged on a calendar timer while a decade of superseded drone captures quietly accrues Standard-class charges, and where no immutable audit trail exists to prove to a regulator that a litigation-held dataset was never touched. This guide is for the data engineers, GIS archivists, and cloud architects who must engineer retention as executable state machines — codified in infrastructure-as-code, validated before any irreversible action, and auditable at the individual object level.

The Failure Mode: Retention Drift

Retention drift is the gap between the policy a compliance team believes is in force and the lifecycle rules actually executing against the bucket. It manifests in three recurring ways across spatial archives, and each one is expensive.

The first is policy-without-immutability: a lifecycle rule transitions objects to an archive tier but nothing prevents an over-privileged ETL role or a terraform destroy from deleting them inside their mandated retention window. A regulator does not accept “we had a deletion-protection policy”; they require WORM (Write Once, Read Many) enforcement at the storage layer that even the account root cannot override.

The second is uniform expiry on non-uniform data. Geospatial workloads have wildly divergent decay curves — high-frequency raster time-series, LiDAR point clouds, and transactional vector feature classes all age at different rates. A single 365-day expiration rule applied across a bucket simultaneously over-retains ephemeral processing intermediates and prematurely archives reference layers that downstream tile servers still query weekly.

The third is replica divergence: a legal hold suspends deletion on the primary bucket but the cross-region replica keeps executing its own expiration timer, silently destroying the very evidence the hold was meant to preserve. Retention that is not enforced identically across every replica is not enforced at all.

A retention framework fixes all three by making the retention class a tagged, computed property of each dataset, by binding that class to a hardware-enforced lock mode, and by propagating both across every replica and into the discovery catalog.
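One way to make drift visible is to diff the intended policy against what each bucket actually reports. The sketch below is a minimal illustration, assuming the observed state has already been fetched into plain records; the `BucketState` type, its field names, and the bucket names are hypothetical and not part of any provider API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BucketState:
    """Observed retention state of one bucket (hypothetical shape)."""
    name: str
    lock_mode: Optional[str]        # "COMPLIANCE", "GOVERNANCE", or None
    retention_days: Optional[int]
    legal_hold: bool


def find_drift(intended: BucketState, observed: list[BucketState]) -> list[str]:
    """Return one finding per field that differs from the intended policy."""
    findings = []
    for state in observed:
        for field in ("lock_mode", "retention_days", "legal_hold"):
            want, got = getattr(intended, field), getattr(state, field)
            if want != got:
                findings.append(f"{state.name}: {field} is {got!r}, expected {want!r}")
    return findings


intended = BucketState("policy", "COMPLIANCE", 2557, True)
observed = [
    BucketState("primary", "COMPLIANCE", 2557, True),
    BucketState("replica-eu", "COMPLIANCE", 2557, False),  # hold never propagated
]
print(find_drift(intended, observed))
```

Here the replica reports no legal hold, which is exactly the replica-divergence case: the check returns a single finding for `replica-eu` and nothing for the primary.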

Retention Enforcement Flow

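Retention is enforced at every stage from ingest through audit, not bolted on later: classify and tag at ingest, bind the lock mode, transition by class, propagate to replicas and the catalog, then validate and audit.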

Prerequisite Context

This page assumes the surrounding system from the Spatial Archival Architecture & Tiering Strategy is already operating. A retention framework is the enforcement layer on top of that architecture, not a replacement for it, so confirm three things are in place before you enable any irreversible rule:

Storage classes and a tier model are configured. Retention windows attach to objects that already live in the right class. Resolve the tier transitions first with the Hot/Warm/Cold Tier Design for Geospatial Data, because the expiry and minimum-storage interactions below depend on which class an object lands in.
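A backend with immutable-lock support is selected. Compliance-grade retention requires Object Lock or its equivalent, and the available lock modes differ by provider. Settle vendor selection through Object Storage Selection for GIS Archives first. Enable Object Lock when the bucket is created wherever possible: AWS S3 can now enable it on an existing versioned bucket, but objects written before that point are not locked retroactively, and other backends may not support retrofitting at all.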
A metadata catalog exists to receive status changes. When an object crosses a retention boundary its discovery record must change too. Wire the framework into Metadata Cataloging & Discovery so that archived assets stop being advertised by live WFS/WCS endpoints.

Concept & Design Decisions

Three decisions define a retention framework: the lock mode, the classification scheme, and the transition thresholds.

Lock mode: Compliance vs. Governance

Object Lock offers two modes and choosing wrong is a compliance finding waiting to happen.

Compliance mode blocks deletion and retention-shortening for the entire window, for every principal including the account root. Use it for datasets under a statutory or contractual retention mandate — cadastral records, environmental impact baselines, anything a regulator can subpoena. It cannot be undone, so a fat-fingered 100-year window is permanent.
Governance mode blocks the same operations but permits principals holding s3:BypassGovernanceRetention to override. Use it as the default for operational data where you need deletion protection but must retain an authorized escape hatch for genuine errors.

Legal holds are orthogonal to both: a hold is an open-ended, flag-based suspension of deletion with no fixed expiry, applied and released independently of the retention timer. Litigation uses holds; statute uses Compliance-mode retention.
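The interaction between the two modes and a legal hold can be modeled as a small decision function. This is a simplified sketch of the rules described above, not a reproduction of any provider's authorization logic; the function name and parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def delete_allowed(
    mode: Optional[str],
    retain_until: Optional[datetime],
    legal_hold: bool,
    can_bypass_governance: bool,
    now: datetime,
) -> bool:
    """Model whether a permanent delete of one object version should succeed."""
    if legal_hold:
        return False                   # a hold blocks deletion regardless of mode
    if mode is None or retain_until is None or now >= retain_until:
        return True                    # no lock, or the window has elapsed
    if mode == "GOVERNANCE":
        return can_bypass_governance   # escape hatch for authorized principals
    return False                       # COMPLIANCE: nobody can delete early


now = datetime(2026, 6, 26, tzinfo=timezone.utc)
until = datetime(2033, 6, 26, tzinfo=timezone.utc)
print(delete_allowed("COMPLIANCE", until, False, True, now))   # False
print(delete_allowed("GOVERNANCE", until, False, True, now))   # True
print(delete_allowed("GOVERNANCE", until, True, True, now))    # False
```

The third call shows why holds are orthogonal: the bypass permission overrides a Governance timer but not a hold.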

Classification by dataset class, not creation date

Lifecycle rules must be parameterized by what the data is, not merely when it was created. Drive every rule from object tags applied at ingest, so the policy engine evaluates dataset class, derivative status, and access recency rather than a blanket age timer. A workable baseline taxonomy:

`retention-class` tag	Example assets	Lock mode	Min retention	Transition schedule
`regulatory`	Cadastral, flood-zone, EIA baselines	Compliance	7–30 yr	Deep Archive @ 90d, no expiry
`reference`	Active basemaps, admin boundaries	Governance	3 yr	Standard-IA @ 60d
`derivative`	Tiles, COG overviews, rendered mosaics	Governance	90 d	IA @ 30d, expire @ 365d
`ephemeral`	ETL intermediates, scratch reprojections	none	—	expire @ 14d
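Expressing the taxonomy as data lets a policy engine reject unclassified objects instead of letting them fall through. The following is a minimal sketch: the `RetentionPolicy` type and `policy_for` helper are hypothetical names, the regulatory window uses the 7-year lower bound, and the regulatory transition follows the lifecycle configuration in the Implementation section (Deep Archive at 90 days).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    lock_mode: Optional[str]            # None means no Object Lock
    min_retention_days: Optional[int]
    transition: Optional[tuple]         # (storage class, age in days)
    expire_days: Optional[int]          # None means never expire


# Values mirror the baseline taxonomy; adjust them to your own mandate.
TAXONOMY = {
    "regulatory": RetentionPolicy("COMPLIANCE", 2557, ("DEEP_ARCHIVE", 90), None),
    "reference": RetentionPolicy("GOVERNANCE", 3 * 365, ("STANDARD_IA", 60), None),
    "derivative": RetentionPolicy("GOVERNANCE", 90, ("STANDARD_IA", 30), 365),
    "ephemeral": RetentionPolicy(None, None, None, 14),
}


def policy_for(tags: dict) -> RetentionPolicy:
    """Resolve the policy for an object's tags, rejecting unclassified objects."""
    retention_class = tags.get("retention-class")
    if retention_class not in TAXONOMY:
        raise ValueError(f"unclassified or unknown retention-class: {retention_class!r}")
    return TAXONOMY[retention_class]


print(policy_for({"retention-class": "derivative"}).expire_days)   # 365
```

Raising on a missing or unknown tag is a deliberate choice here: an ingest pipeline that fails loudly is easier to audit than one that silently applies a default.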

Transition thresholds tied to spatial I/O

Premature archival of frequently joined layers introduces unacceptable restore latency and early-deletion penalties; over-retention of intermediates inflates monthly spend. Evaluate multiple signals before triggering a transition: dataset age, last-access timestamp, derivative-generation status, spatial-index freshness, and query SLA. A reference basemap that backs a live tile service should never be eligible for an archive tier regardless of age, which is why the reference class above stops at Standard-IA. Abstract these provider-specific constraints behind a single policy-as-code layer (Terraform, Crossplane, or Open Policy Agent) so the same intent enforces identically across regions and clouds.
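A multi-signal gate of this kind might look like the sketch below. The `AssetSignals` fields and the default thresholds (90 days of age, 60 days idle) are illustrative assumptions, and the query-SLA signal is reduced to a single `backs_live_service` flag for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetSignals:
    age_days: int
    days_since_last_access: int
    backs_live_service: bool       # e.g. a basemap behind a tile server
    derivatives_complete: bool     # downstream tiles/overviews already generated
    index_fresh: bool              # spatial index rebuilt since the last edit


def archive_eligible(s: AssetSignals, min_age_days: int = 90, min_idle_days: int = 60) -> bool:
    """Every signal must agree before an asset may move to an archive tier."""
    if s.backs_live_service:
        return False               # never archive data a live endpoint still queries
    return (
        s.age_days >= min_age_days
        and s.days_since_last_access >= min_idle_days
        and s.derivatives_complete
        and s.index_fresh
    )


basemap = AssetSignals(900, 2, True, True, True)
old_survey = AssetSignals(900, 400, False, True, True)
print(archive_eligible(basemap), archive_eligible(old_survey))   # False True
```

The live-service check runs first so that age alone can never push a queried basemap into an archive tier.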

Implementation

The following Terraform provisions an Object Lock-enabled archive bucket and binds tag-scoped lifecycle rules to the taxonomy above. Object Lock requires versioning and is best declared at bucket creation, so that no object is ever written without a lock.

# archive bucket: enable Object Lock at creation so no object is ever written unlocked
resource "aws_s3_bucket" "spatial_archive" {
  bucket              = "org-spatial-archive-prod"
  object_lock_enabled = true
}

resource "aws_s3_bucket_versioning" "spatial_archive" {
  bucket = aws_s3_bucket.spatial_archive.id
  versioning_configuration { status = "Enabled" }
}

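# default COMPLIANCE retention for regulatory geospatial records (7 years)
# NOTE: a bucket default applies to every new object uploaded without explicit lock
# settings, so derivative and ephemeral data belong in a separate bucket or must
# carry explicit per-object retention at upload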
resource "aws_s3_bucket_object_lock_configuration" "spatial_archive" {
  bucket = aws_s3_bucket.spatial_archive.id
  rule {
    default_retention {
      mode = "COMPLIANCE"
      days = 2557 # 7 years; immutable even to account root
    }
  }
}

# tag-scoped lifecycle: each rule filters on the retention-class tag set at ingest
resource "aws_s3_bucket_lifecycle_configuration" "spatial_archive" {
  bucket = aws_s3_bucket.spatial_archive.id

  rule {
    id     = "regulatory-deep-archive"
    status = "Enabled"
    filter { tag { key = "retention-class", value = "regulatory" } }
    transition { days = 90, storage_class = "DEEP_ARCHIVE" }
    # no expiration — Compliance lock governs deletion
  }

  rule {
    id     = "reference-warm"
    status = "Enabled"
    filter { tag { key = "retention-class", value = "reference" } }
    transition { days = 60, storage_class = "STANDARD_IA" }
  }

  rule {
    id     = "derivative-tiles"
    status = "Enabled"
    filter { tag { key = "retention-class", value = "derivative" } }
    transition { days = 30, storage_class = "STANDARD_IA" }
    expiration { days = 365 } # rendered tiles are regenerable
  }

  rule {
    id     = "ephemeral-scratch"
    status = "Enabled"
    filter { tag { key = "retention-class", value = "ephemeral" } }
    expiration { days = 14 }
    # clean up failed multipart uploads of large rasters/point clouds
    abort_incomplete_multipart_upload { days_after_initiation = 7 }
  }
}

Apply the retention-class tag at ingest, never as a later batch job — an object that lands untagged falls through every filter and is silently retained forever on Standard. A minimal ingest tagging call:

aws s3api put-object \
  --bucket org-spatial-archive-prod \
  --key datasets/cadastral/2026/parcels_region_north.gpkg \
  --body parcels_region_north.gpkg \
  --tagging "retention-class=regulatory&compliance-tier=verified" \
  --object-lock-mode COMPLIANCE \
  --object-lock-retain-until-date 2033-06-26T00:00:00Z
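The tagging string and retain-until timestamp in that call can be derived rather than typed by hand, which avoids date arithmetic mistakes on an irreversible lock. This is a minimal sketch with a hypothetical helper name; it only builds the argument values and does not call any AWS API.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def ingest_lock_args(tags: dict, ingested_at: datetime, retention_days: int) -> dict:
    """Build the tagging string and retain-until timestamp for one upload."""
    if "retention-class" not in tags:
        raise ValueError("refusing to ingest an object without a retention-class tag")
    retain_until = ingested_at + timedelta(days=retention_days)
    return {
        "tagging": urlencode(tags),   # URL-encoded key=value pairs joined by '&'
        "retain_until": retain_until.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


args = ingest_lock_args(
    {"retention-class": "regulatory", "compliance-tier": "verified"},
    datetime(2026, 6, 26, tzinfo=timezone.utc),
    2557,
)
print(args["tagging"])        # retention-class=regulatory&compliance-tier=verified
print(args["retain_until"])   # 2033-06-26T00:00:00Z
```

With an ingest date of 2026-06-26, the 2557-day window lands on the same retain-until date used in the CLI call, because the span contains two leap days.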

Validation Gate

Never enable irreversible rules without a dry run. Validate in three checks before and after rollout.

First, confirm the lock configuration is actually COMPLIANCE and not silently absent:

aws s3api get-object-lock-configuration --bucket org-spatial-archive-prod

Expected output — the mode and window must match your intent exactly:

{
  "ObjectLockConfiguration": {
    "ObjectLockEnabled": "Enabled",
    "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 2557 } }
  }
}
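Comparing that output by eye is error-prone, so the check can be scripted. The sketch below parses the JSON shown above and reports any mismatch; `check_lock_config` is a hypothetical helper, and it assumes the default retention is expressed in `Days` (a configuration using `Years` would need an extra branch).

```python
import json


def check_lock_config(raw_json: str, want_mode: str, want_days: int) -> list:
    """Compare get-object-lock-configuration output with intent; return problems."""
    cfg = json.loads(raw_json).get("ObjectLockConfiguration", {})
    problems = []
    if cfg.get("ObjectLockEnabled") != "Enabled":
        problems.append("Object Lock is not enabled")
    retention = cfg.get("Rule", {}).get("DefaultRetention", {})
    if retention.get("Mode") != want_mode:
        problems.append(f"mode is {retention.get('Mode')!r}, expected {want_mode!r}")
    if retention.get("Days") != want_days:
        problems.append(f"days is {retention.get('Days')!r}, expected {want_days!r}")
    return problems


sample = """
{
  "ObjectLockConfiguration": {
    "ObjectLockEnabled": "Enabled",
    "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 2557 } }
  }
}
"""
print(check_lock_config(sample, "COMPLIANCE", 2557))   # []
```

An empty list means the bucket matches intent; a CI gate could fail the rollout on any non-empty result.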

Second, prove an object is genuinely immutable by attempting a delete that should fail:

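# A delete without --version-id only adds a delete marker and succeeds even under
# Object Lock, so target the locked version explicitly ($VERSION_ID is a placeholder
# for the ID returned by put-object or list-object-versions).
aws s3api delete-object \
  --bucket org-spatial-archive-prod \
  --key datasets/cadastral/2026/parcels_region_north.gpkg \
  --version-id "$VERSION_ID"
# Expected: An error occurred (AccessDenied) when calling the DeleteObject operation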

Third, surface objects that escaped classification, since these are the silent cost and compliance leak:

aws s3api list-objects-v2 --bucket org-spatial-archive-prod \
  --query "Contents[].[Key]" --output text | while IFS= read -r k; do
    case "$k" in ""|None) continue ;; esac   # an empty bucket prints "None"
    t=$(aws s3api get-object-tagging --bucket org-spatial-archive-prod --key "$k" \
        --query "TagSet[?Key=='retention-class'].Value" --output text)
    [ -z "$t" ] && echo "UNCLASSIFIED: $k"
  done
# Expected output: (empty), meaning every object carries a retention-class tag

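Most common failure: the delete in check two succeeds. First rule out a false positive: on a versioned bucket, a delete issued without --version-id only writes a delete marker and succeeds even when the underlying version is locked, so the test must target a specific version. If a version-targeted delete still succeeds, that version carries no retention. The usual root cause is that Object Lock was never enabled on the bucket, or was enabled after the object was written: a lifecycle expiration rule or a versioning-only config gives the appearance of retention without WORM enforcement. On AWS S3, Object Lock can now be enabled on an existing versioned bucket, but default retention only covers objects written afterwards, so existing versions must have retention applied explicitly (for example with S3 Batch Operations). Where the backend offers no retrofit path, the remediation is to create a new lock-enabled bucket and re-copy the data into it.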

Cost & Performance Trade-offs

Retention decisions are storage-economics decisions. The dominant levers are minimum-storage duration (which creates early-deletion penalties), retrieval pricing, and the per-object overhead that punishes small-file spatial archives.

Storage class	Min-storage window	Early-delete penalty	Retrieval latency	Best-fit retention class
Standard	none	none	ms	freshly ingested, pre-classification
Standard-IA	30 days	charged to 30 d	ms	`reference` basemaps
Glacier Flexible	90 days	charged to 90 d	minutes–hours	aging `regulatory`
Deep Archive	180 days	charged to 180 d	up to 12 h (standard), up to 48 h (bulk)	long-horizon `regulatory`

Two spatial-specific cost effects dominate the matrix. Early-deletion penalties make premature transitions actively worse than doing nothing: archive a 2 TB orthomosaic to Deep Archive, discover next week a downstream pipeline still needs it, and you pay the full 180-day minimum storage charge plus a bulk retrieval fee to get it back. Per-object minimums and overhead punish archives of many small vector files: on AWS S3, Standard-IA bills every object as at least 128 KB, and Glacier Flexible Retrieval and Deep Archive add roughly 40 KB of billed metadata per object, so a folder of thousands of tiny GeoJSON tiles costs far more than its byte count suggests. Consolidate small features before archival; tuning ZSTD Level Configuration for Spatial Files on the consolidated objects compounds the saving by shrinking the bytes that sit under the long retention window.
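Both effects reduce to simple arithmetic, sketched below. The price per GB-month and the 128 KiB per-object minimum are illustrative parameters rather than quoted rates, and the pro-rating assumes a 30-day billing month; check the provider's current pricing page before relying on any figure.

```python
def early_delete_charge(size_gb: float, price_gb_month: float,
                        min_days: int, days_stored: int) -> float:
    """Pro-rated charge for the unused part of the minimum-storage window."""
    remaining = max(min_days - days_stored, 0)
    return size_gb * price_gb_month * remaining / 30   # assumes a 30-day month


def billed_bytes(object_bytes: int, min_billable: int = 0, overhead: int = 0) -> int:
    """Bytes actually billed once a per-object minimum or overhead is applied."""
    return max(object_bytes, min_billable) + overhead


# Hypothetical price: 0.001 currency units per GB-month in a 180-day class,
# for a 2 TB object deleted after 7 days.
penalty = early_delete_charge(size_gb=2048, price_gb_month=0.001, min_days=180, days_stored=7)
print(round(penalty, 2))

# 10,000 GeoJSON tiles of 4 KiB each under an assumed 128 KiB per-object minimum.
actual = 10_000 * 4096
billed = 10_000 * billed_bytes(4096, min_billable=128 * 1024)
print(billed // actual)   # 32
```

Under these assumptions the small-tile archive is billed at 32 times its real size, which is the motivation for consolidating features before they cross a retention boundary.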

Failure Modes & Edge Cases

Multi-file legacy formats transition non-atomically. A Shapefile is not one object — it is .shp + .shx + .dbf + .prj (and often .cpg/.sbn). Tag-scoped lifecycle rules can move the .shp while leaving a sidecar on a different timer, producing an unreadable archive. Route these through the dedicated lifecycle rules for Shapefile archives procedure, which keeps the sibling files on one rule, or convert to a single-object format first.
Replica retention drift. A legal hold or Compliance window on the primary does not automatically apply to a cross-region replica. Replication must propagate the lock state, and a hold must suspend expiration on every replica simultaneously, or the replica becomes the deletion path that defeats the hold.
Catalog desync on transition. When an object cold-tiers, its live service endpoints (WFS/WCS/tile servers) must be deprecated and discovery routed to an archived proxy. If the catalog is not updated bidirectionally, analysts hit stale cache entries or broken spatial joins and assume data loss.
Compliance-mode over-commitment. A Compliance window cannot be shortened by anyone. A misconfigured 100-year default on a derivative class permanently blocks deletion of regenerable tiles, inflating storage indefinitely. Default-retention values belong in version-controlled IaC and code review, never a console click.
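The multi-file edge case above can be checked mechanically before any rule is enabled. The sketch below groups keys by their path stem and flags Shapefile sets that are incomplete or whose members carry different retention classes; it assumes the tags have already been collected into a plain mapping of key to `retention-class` value, and the helper name is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}   # .prj, .cpg, .sbn are optional sidecars


def shapefile_problems(tags_by_key: dict) -> list:
    """Flag Shapefile groups with missing members or mixed retention classes."""
    groups = defaultdict(dict)
    for key, retention_class in tags_by_key.items():
        path = PurePosixPath(key)
        groups[str(path.with_suffix(""))][path.suffix.lower()] = retention_class
    problems = []
    for stem, members in sorted(groups.items()):
        if ".shp" not in members:
            continue                   # not a Shapefile group
        missing = REQUIRED - set(members)
        if missing:
            problems.append(f"{stem}: missing {sorted(missing)}")
        classes = sorted(set(members.values()), key=str)
        if len(classes) > 1:
            problems.append(f"{stem}: mixed retention classes {classes}")
    return problems


tags = {
    "a/roads.shp": "reference",
    "a/roads.shx": "reference",
    "a/roads.dbf": "derivative",   # sidecar on a different timer
    "a/roads.prj": "reference",
}
print(shapefile_problems(tags))
```

Any finding here means a lifecycle rule could move or expire one sibling ahead of the others, so the set should be retagged or converted before rollout.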

Operational Execution Checklist

For authoritative lifecycle-configuration syntax and retention semantics, consult the AWS S3 Object Lifecycle Management documentation, and align sanitization controls at end of retention with NIST SP 800-88 Rev. 1, Guidelines for Media Sanitization.

Related

Up one level: the Spatial Archival Architecture & Tiering Strategy sets the lifecycle this framework enforces.
The Hot/Warm/Cold Tier Design for Geospatial Data defines the transitions that retention rules attach to.
Object Storage Selection for GIS Archives determines which lock modes and archive tiers are available.
Metadata Cataloging & Discovery receives the visibility changes a retention transition triggers.
Implementing Lifecycle Rules for Shapefile Archives handles the multi-file atomicity edge case.
Cross-domain: shrink the bytes under long retention windows with ZSTD Level Configuration for Spatial Files.
