May 14, 2026 · ssp.sh

Internal vs. External Storage? What's the Limit of External Tables

Data Engineering, Data Platforms, Data Modeling, Tools & Products, Industry

Simon Späti traces the 25-year history of external tables from Oracle 9i in 2001 to modern lakehouse architectures, arguing they represent one of data engineering's most enduring patterns. He explains that external tables are essentially pointers to data stored outside a database, queryable via SQL without moving data. The post compares internal vs. external storage tradeoffs around query speed, cost, and data freshness, with benchmarks showing external tables pay only a 1.3×–1.7× performance tax while offering up to 20× storage cost savings. Späti shows how the pattern evolved from CSV parsing to open table formats with ACID guarantees, and argues that with DuckLake and open catalogs, the distinction between external and internal is dissolving. He invokes the Lindy Effect to predict external tables will persist another 25 years.
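To make the "pointer" idea concrete, here is a minimal sketch in plain Python. It is not how Oracle, DuckDB, or any other engine implements external tables. The hypothetical `ExternalTable` class stores only a file path and column names. Every query re-reads the CSV file where it lives, so nothing is loaded or copied ahead of time, and rows appended to the file show up on the next scan without a reload step.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator


@dataclass(frozen=True)
class ExternalTable:
    """Hypothetical external table: it holds metadata only; the data stays in the file."""
    path: Path
    columns: tuple[str, ...]

    def scan(self) -> Iterator[dict[str, str]]:
        # Re-open the file on every scan: no copy is kept inside the "database".
        with self.path.open(newline="") as f:
            # The file has no header row, so column names come from the table definition.
            for row in csv.DictReader(f, fieldnames=self.columns):
                yield row

    def select(self, where: Callable[[dict[str, str]], bool]) -> list[dict[str, str]]:
        # Roughly: SELECT * FROM table WHERE <predicate>, evaluated while reading.
        return [row for row in self.scan() if where(row)]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "orders.csv"
        data.write_text("1,alice,30\n2,bob,70\n")
        orders = ExternalTable(data, ("id", "customer", "amount"))
        print(orders.select(lambda r: int(r["amount"]) > 50))
        # -> [{'id': '2', 'customer': 'bob', 'amount': '70'}]

        # Append to the file directly: the table sees the new row immediately.
        with data.open("a") as f:
            f.write("3,carol,90\n")
        print(len(orders.select(lambda r: True)))  # -> 3
```

The tradeoff the post describes is visible even here: each query pays the cost of parsing the raw file again (the performance tax), in exchange for never duplicating or reloading the data.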

External tables are one of data engineering's most durable patterns because the principle behind them, reading data in place rather than moving it, only grows more valuable as formats evolve. With modern lakehouse architectures, the line between external and internal storage is finally dissolving.
  • Everything that is old will be new again. That is exactly what the Lindy Effect is about.

  • The modern external table isn't external anymore. DuckDB reads the underlying files directly, and DuckLake handles the metadata that multi-file lakehouse architectures would otherwise drown in.

  • The lesson from history is that whenever someone tries to replace the pattern, reading data in place still beats moving it.

  • The external table, once a humble workaround for avoiding data loads, has become the architectural foundation of the data lakehouse.

  • With open data catalogs, the warehouse becomes a stateless rental over a bucket you own, and external tables are increasingly managed by the catalog.

  • The cycle from "external table for cheap storage" to "external table as a full ACID database on S3" took 25 years, completing the journey back to database principles while keeping storage and compute separate.

  • Disk space was growing more quickly than our compute needs, which is what triggered the adoption of external tables.
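The DuckLake and open-catalog takeaways rest on one idea: keep table metadata, such as which data files make up each version of a table, in a small transactional database instead of rediscovering it by listing objects in a bucket. The sketch below illustrates that idea with Python's built-in `sqlite3`. The schema and the functions `open_catalog`, `commit_files`, and `files_at` are hypothetical and do not reproduce DuckLake's actual catalog format. Each commit registers its files inside a single SQL transaction, so a reader sees either the old snapshot or the new one, never a half-written state.

```python
import sqlite3

# Hypothetical catalog schema: one row per snapshot, one row per data file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS data_files (
    path       TEXT NOT NULL,
    table_name TEXT NOT NULL,
    added_in   INTEGER NOT NULL REFERENCES snapshots(snapshot_id)
);
"""


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata catalog; the data files themselves live elsewhere."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def commit_files(conn: sqlite3.Connection, table: str, paths: list[str]) -> int:
    """Register new data files as one atomic snapshot and return its id."""
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute("INSERT INTO snapshots (table_name) VALUES (?)", (table,))
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO data_files (path, table_name, added_in) VALUES (?, ?, ?)",
            [(p, table, snapshot_id) for p in paths],
        )
    return snapshot_id


def files_at(conn: sqlite3.Connection, table: str, snapshot_id: int) -> list[str]:
    """Files a reader must scan for `table` as of `snapshot_id` (simple time travel)."""
    rows = conn.execute(
        "SELECT path FROM data_files "
        "WHERE table_name = ? AND added_in <= ? ORDER BY path",
        (table, snapshot_id),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    cat = open_catalog()
    # Bucket and file names are made up for illustration.
    s1 = commit_files(cat, "orders", ["s3://bucket/orders/part-0.parquet"])
    s2 = commit_files(cat, "orders", ["s3://bucket/orders/part-1.parquet"])
    print(files_at(cat, "orders", s1))  # -> ['s3://bucket/orders/part-0.parquet']
    print(files_at(cat, "orders", s2))  # both part files
```

This toy catalog is append-only. A real lakehouse catalog would also need to track deleted files and schema changes, but the design choice is the same: a query engine asks the catalog which files to read instead of scanning the bucket.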
