Deployed

DuckDB WASM Demo

Browser-based SQL analytics with DuckDB and WebAssembly. Share and query datasets without a local setup.

DuckDBWASMSQLData AnalyticsWeb WorkersProgressive Web AppWater CSS

DuckDB is an embedded database system similar to SQLite. Datasets can be saved as .duckdb files and shared directly between developers, or hosted alongside the WASM-compiled DuckDB engine. With the WASM approach, queries can be run against datasets directly from a browser.

Tech Stack

  • DuckDB
  • HTML/CSS/JS
  • WASM
  • CDNs
  • Web Workers

Architecture

  • Static files served from a basic NGINX reverse proxy.
  • CDN hosting the DuckDB Javascript wrappers and WASM binaries.
  • Web Workers to run the database engine asynchronously.

Challenges

  • Initializing the DuckDB engine in the browser adds noticeable load time to pages.
  • Frequently changing datasets are not suitable for static hosting.
  • Large datasets (larger than ~50-200MB) strain client memory, since the entire file is loaded into an ArrayBuffer before DuckDB can read it.
  • DuckDB is a relatively new (but certainly reputable) tool that some are unfamiliar with.

Learnings

  • WASM is more approachable than expected, especially when JavaScript wrappers and WASM binaries are available from a CDN.
    • In the DuckDB case, the JavaScript wrapper identifies the best WASM binary to fetch based on the client’s browser.
  • Web Workers let us run queries asynchronously without blocking the main UI thread.
  • Web Workers cannot be created using cross-origin URLs — the Worker() constructor strictly enforces a same-origin policy, regardless of CORS headers.
    • The solution is to wrap the CDN script URL in a Blob and call URL.createObjectURL(), which produces a blob:// URL that is same-origin to the page.
  • Window functions are a nice addition to the SQL specification.

For the Future

  • Explore the file size limits of datasets.
  • Compare WASM query performance against native query performance.
  • Build out a richer UI that can browse available datasets and load them on demand.
  • Fetch datasets from a separate file server or API, rather than the same static website that serves the DuckDB WASM code.
    • This would avoid rebuilding the static site when datasets change.
  • Use Parquet files hosted on a remote server, allowing DuckDB to fetch only the relevant row groups via HTTP range requests.