Data guide introduction

When simulating radiative transfer, input data is as important as the algorithm that integrates the radiative transfer equation. Data handling and procurement in Eradiate follow these principles:

  • Eradiate reads standard data formats that are understood by the main data processing libraries of the scientific Python ecosystem. Most data are supplied in the NetCDF format and loaded as xarray datasets. Xarray provides a comprehensive, robust and convenient interface to read, write, manipulate and visualize NetCDF data.

  • Eradiate ships data to use as “sensible defaults”, similar to most radiative transfer models. We try to make shipped data transparent and take advantage of modern documentation facilities to make it comprehensive.

  • Eradiate lets users swap their input with data sourced by themselves. If you want to use your own surface reflection spectra, aerosol single-scattering properties or molecular absorption data, it is possible thanks to the documented data formats.

  • Eradiate provides an interface to facilitate the delivery of shipped data to users. Downloading a dataset usually takes no more than a single command in a terminal.

Data store configuration

Eradiate ships data managed by its global data store. This data store aggregates multiple data sources that can point to different locations (local or online) and implement different shipment behaviours (download with or without integrity checks). The following configuration items drive the behaviour of Eradiate’s data store:

Development mode

This behaviour is controlled by the ERADIATE_SOURCE_DIR environment variable. In development mode, parts of the data are shipped in a Git submodule. Otherwise, these data are downloaded upon access request.

Offline mode

This behaviour is controlled by the offline setting. In offline mode, all download requests made to the data store are denied. This mode is safer if you want to deliver the data yourself or operate with a bandwidth-limited or unstable connection.

Download directory

Upon download, Eradiate stores data in a directory defined by the download_dir setting. The default location, if this setting is not overridden by the user, depends on whether Eradiate is operating in development mode or not.
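
The interplay between these settings can be pictured with a small, self-contained sketch. It is not Eradiate's implementation: the ToyDataStore and ToySource classes and the DownloadDenied exception are hypothetical names introduced here for illustration only, and the default download directory is a placeholder. The sketch assumes a store that queries its sources in order, serves files from local sources directly, and refuses any download while offline mode is on.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


class DownloadDenied(RuntimeError):
    """Raised when a download is requested while offline (hypothetical)."""


@dataclass
class ToySource:
    """A hypothetical data source: a named set of files, local or online."""

    name: str
    files: set
    online: bool = False


@dataclass
class ToyDataStore:
    """A hypothetical store aggregating several sources (illustration only)."""

    sources: list
    offline: bool = False
    # Placeholder default; the real default location is not reproduced here.
    download_dir: PurePosixPath = PurePosixPath(".eradiate_downloads")
    downloaded: list = field(default_factory=list)

    def fetch(self, filename: str) -> PurePosixPath:
        """Resolve a file against the sources, in the order they are listed."""
        for source in self.sources:
            if filename not in source.files:
                continue
            if not source.online:
                # Local source, e.g. data shipped in a Git submodule.
                return PurePosixPath(source.name) / filename
            if self.offline:
                # Offline mode: every download request is denied.
                raise DownloadDenied(f"offline mode: cannot download {filename!r}")
            # A real store would download (and possibly verify) the file here.
            self.downloaded.append(filename)
            return self.download_dir / filename
        raise FileNotFoundError(filename)
```

With such a design, switching the offline attribute on leaves locally available files accessible and turns every request that would need a download into an explicit error, which is the behaviour described above for offline mode.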

Downloading data

Data is downloaded using the eradiate data fetch command (see Command-line interface reference). The most common way to download data is to reference a file list when calling eradiate data fetch. Known file lists are displayed using the --list option, e.g.:

$ eradiate data fetch --list
Known file lists:
  all
  minimal
  komodo
  gecko
  monotropa
  mycena
  panellus

A specific file list can then be downloaded by simply requesting it, e.g.:

$ eradiate data fetch komodo
Fetching 'spectra/absorption/mono/komodo/komodo.nc'
✓ found
[/home/username/src/eradiate/.eradiate_downloads/stable/spectra/absorption/mono/komodo/komodo.nc]
Fetching 'spectra/absorption/mono/komodo/metadata.json'
✓ found
[/home/username/src/eradiate/.eradiate_downloads/stable/spectra/absorption/mono/komodo/metadata.json]

Note

Some data might require ancillary files that are not part of the download lists. For example, the current format of molecular absorption databases uses index and spectral coverage tables, which can be missing from the download list but which Eradiate can build. This happens automatically when an AbsorptionDatabase is created. It can also be done manually before running computations with Eradiate, using the eradiate data check command with the --fix option, e.g.:

$ eradiate data check monotropa --fix
[10:44:35] INFO     Opening 'monotropa'
           WARNING  Could not find spectral coverage table, building it
[10:44:37] INFO     Success!

Accessing data (advanced users and developers)

Every file managed by the global data store can be accessed using the eradiate.data.open_dataset() function:

>>> import eradiate
>>> ds = eradiate.data.open_dataset("spectra/solar_irradiance/thuillier_2003.nc")

This function behaves similarly to xarray.open_dataset(). The eradiate.data.load_dataset() function is also available for eager data loading. If necessary, file access triggers data download and caching.

Warning

The data module does not support concurrent download requests from multiple processes running Eradiate. This means that in such cases, two processes requesting the same resource using e.g. eradiate.data.load_dataset() might trigger two downloads that will overwrite each other, resulting in unpredictable (but surely incorrect) behaviour.

If your use case requires running Eradiate from multiple processes, we strongly recommend downloading all required data in advance using the eradiate data fetch command (see Downloading data).
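
One way to follow this recommendation from a Python driver script is to run the fetch commands sequentially, in a single process, before any worker process is started. The sketch below is only an illustration: build_fetch_command and prefetch are hypothetical helper names, not part of Eradiate, and the sketch assumes that the eradiate executable is available on the PATH.

```python
import subprocess


def build_fetch_command(file_list):
    """Return the argument list for `eradiate data fetch <file_list>`."""
    return ["eradiate", "data", "fetch", file_list]


def prefetch(file_lists, runner=subprocess.run):
    """Fetch each file list sequentially, from the calling process only.

    `runner` is injectable (hypothetical design choice) so that the function
    can be exercised without Eradiate installed; by default, each command is
    executed with subprocess.run.
    """
    commands = [build_fetch_command(name) for name in file_lists]
    for command in commands:
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so that workers are never started with incomplete data.
        runner(command, check=True)
    return commands
```

Calling prefetch(["komodo"]) once at the top of the driver script, before creating a process pool, means that worker processes should find the data already in place and have no reason to trigger downloads of their own.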

See also

eradiate.data: complete data module reference.