Identification

Title

Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration

Alternative title(s)

d041308

Abstract

Weather and climate science is producing increasingly large, high-dimensional datasets from numerical simulations, Earth system models, and AI-based weather and climate models. Embedding-based representations can make these data searchable through similarity search and analog retrieval, but nearest neighbors in latent space are not automatically scientifically meaningful. Researchers need tools to inspect how embeddings organize meteorological data, compare representation models, develop retrieval strategies, and verify results against physical evidence. We present an open-source visual analytics workbench for inspectable, configurable, and scalable embedding-based search over weather and climate data. The system links embedding experiments to source data, metadata, spatial context, model configurations, and retrieval parameters, allowing users to explore latent spaces, construct global or localized queries, and inspect retrieved analogs through meteorological views. We demonstrate the workbench through tropical-cyclone retrieval using ERA5-derived embeddings and IBTrACS metadata, and evaluate its out-of-core retrieval backend to show that large embedding collections can be searched beyond in-memory limits on commodity workstation hardware.

Resource type

dataset

Resource locator

https://gdex.ucar.edu/datasets/d041308/

protocol: https

name: Dataset Description

description: Related Link

function: information

https://gdex.ucar.edu/datasets/d041308/dataaccess/

protocol: https

name: Data Access

description: Related Link

function: download

Unique resource identifier

code

codeSpace

Dataset language

Spatial reference system

code identifying the spatial reference system

Classification of spatial data and services

Topic category

climatologyMeteorologyAtmosphere

Keywords

Keyword set

keyword value

dataset

originating controlled vocabulary

title

Resource Type

reference date

date type

revision

effective date

2021-03-30

Keyword set

keyword value

ECMWF ERA5 > ECMWF ERA5 Atmospheric Reanalysis

originating controlled vocabulary

title

U.S. National Aeronautics and Space Administration Global Change Master Directory

reference date

date type

revision

effective date

2026-04-29

Keyword set

keyword value

EARTH SCIENCE > ATMOSPHERE > WEATHER EVENTS > TROPICAL CYCLONES

originating controlled vocabulary

title

U.S. National Aeronautics and Space Administration Global Change Master Directory

reference date

date type

revision

effective date

2026-04-29

Geographic location

West bounding longitude

East bounding longitude

North bounding latitude

South bounding latitude

Temporal reference

Temporal extent

Begin position

End position

Dataset reference date

date type

publication

effective date

2026-04-30

Frequency of update

notPlanned

Quality and validity

Lineage

Conformity

Data format

name of format

version of format

Constraints related to access and use

Constraint set

Use constraints

Creative Commons Attribution 4.0 International License

Limitations on public access

None

Responsible organisations

Responsible party

organisation name

email address

datahelp@ucar.edu

responsible party role

pointOfContact

Metadata on metadata

Metadata point of contact

organisation name

NSF NCAR Geoscience Data Exchange

email address

datahelp@ucar.edu

web address

https://gdex.ucar.edu

name: NSF NCAR Geoscience Data Exchange

description: The Geoscience Data Exchange (GDEX), managed by the Computational and Information Systems Laboratory (CISL) at NSF NCAR, contains a large collection of meteorological, atmospheric composition, and oceanographic observations, and operational and reanalysis model outputs, integrated with NSF NCAR High Performance Compute services to support atmospheric and geosciences research.

function: download

responsible party role

pointOfContact

Metadata date

2026-04-30T21:15:10Z

Metadata language

eng; USA