Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration
d041308
Weather and climate science is producing increasingly large, high-dimensional datasets from numerical simulations, Earth system models, and AI-based weather and climate models. Embedding-based representations can make these data searchable through similarity search and analog retrieval, but nearest neighbors in latent space are not automatically scientifically meaningful. Researchers need tools to inspect how embeddings organize meteorological data, compare representation models, develop retrieval strategies, and verify results against physical evidence. We present an open-source visual analytics workbench for inspectable, configurable, and scalable embedding-based search over weather and climate data. The system links embedding experiments to source data, metadata, spatial context, model configurations, and retrieval parameters, allowing users to explore latent spaces, construct global or localized queries, and inspect retrieved analogs through meteorological views. We demonstrate the workbench through tropical-cyclone retrieval using ERA5-derived embeddings and IBTrACS metadata, and evaluate its out-of-core retrieval backend to show that large embedding collections can be searched beyond in-memory limits on commodity workstation hardware.
dataset
https://gdex.ucar.edu/datasets/d041308/
protocol: https
name: Dataset Description
description: Related Link
function: information
https://gdex.ucar.edu/datasets/d041308/dataaccess/
protocol: https
name: Data Access
description: Related Link
function: download
climatologyMeteorologyAtmosphere
dataset
revision
2021-03-30
ECMWF ERA5 > ECMWF ERA5 Atmospheric Reanalysis
revision
2026-04-29
EARTH SCIENCE > ATMOSPHERE > WEATHER EVENTS > TROPICAL CYCLONES
revision
2026-04-29
publication
2026-04-30
notPlanned
Creative Commons Attribution 4.0 International License
None
pointOfContact
NSF NCAR Geoscience Data Exchange
name: NSF NCAR Geoscience Data Exchange
description: The Geoscience Data Exchange (GDEX), managed by the Computational and Information Systems Laboratory (CISL) at NSF NCAR, contains a large collection of meteorological, atmospheric composition, and oceanographic observations, and operational and reanalysis model outputs, integrated with NSF NCAR High Performance Compute services to support atmospheric and geosciences research.
function: download
pointOfContact
2026-04-30T21:15:10Z