Deployed
DuckDB WASM Demo
Browser-based SQL analytics with DuckDB and WebAssembly. Share and query datasets without a local setup.
DuckDB is an embedded database system similar to SQLite. Datasets can be saved as .duckdb files and shared directly between developers, or hosted alongside the WASM-compiled DuckDB engine. With the WASM approach, queries can be run against datasets directly from a browser.
Tech Stack
- DuckDB
- HTML/CSS/JS
- WASM
- CDNs
- Web Workers
Architecture
- Static files served from a basic NGINX reverse proxy.
- CDN hosting the DuckDB Javascript wrappers and WASM binaries.
- Web Workers to run the database engine asynchronously.
Challenges
- Initializing the DuckDB engine in the browser adds noticeable load time to pages.
- Frequently changing datasets are not suitable for static hosting.
- Large datasets (larger than ~50-200MB) strain client memory, since the entire file is loaded into an
ArrayBufferbefore DuckDB can read it. - DuckDB is a relatively new (but certainly reputable) tool that some are unfamiliar with.
Learnings
- WASM is more approachable than expected, especially when JavaScript wrappers and WASM binaries are available from a CDN.
- In the DuckDB case, the JavaScript wrapper identifies the best WASM binary to fetch based on the client’s browser.
- Web Workers let us run queries asynchronously without blocking the main UI thread.
- Web Workers cannot be created using cross-origin URLs — the
Worker()constructor strictly enforces a same-origin policy, regardless of CORS headers.- The solution is to wrap the CDN script URL in a
Bloband callURL.createObjectURL(), which produces ablob://URL that is same-origin to the page.
- The solution is to wrap the CDN script URL in a
- Window functions are a nice addition to the SQL specification.
For the Future
- Explore the file size limits of datasets.
- Compare WASM query performance against native query performance.
- Build out a richer UI that can browse available datasets and load them on demand.
- Fetch datasets from a separate file server or API, rather than the same static website that serves the DuckDB WASM code.
- This would avoid rebuilding the static site when datasets change.
- Use Parquet files hosted on a remote server, allowing DuckDB to fetch only the relevant row groups via HTTP range requests.