Data guide introduction
Input data are as important as the algorithm that integrates the radiative transfer equation when it comes to simulating radiative transfer. Data handling and procurement in Eradiate follow these principles:
- Eradiate reads standard data formats understood by the main data processing libraries of the scientific Python ecosystem. Most data are supplied in the NetCDF format and loaded as xarray datasets. Xarray provides a comprehensive, robust and convenient interface to read, write, manipulate and visualize NetCDF data.
- Eradiate ships data to be used as “sensible defaults”, as most radiative transfer models do. We try to make shipped data transparent and use modern documentation tools to describe them comprehensively.
- Eradiate lets users replace the shipped input data with their own. Thanks to the documented data formats, you can supply your own surface reflection spectra, aerosol single-scattering properties or molecular absorption data.
- Eradiate provides an interface that facilitates the delivery of shipped data to users. Downloading a dataset usually takes no more than a single command in a terminal.
Data store configuration
Data shipped with Eradiate are managed by its global data store. This data store aggregates multiple data sources that can point to different locations (local or online) and implement different shipment behaviours (download with or without integrity checks). The following configuration items drive the behaviour of Eradiate’s data store:
- Development mode
This behaviour is controlled by the ERADIATE_SOURCE_DIR environment variable. In development mode, part of the data is shipped in a Git submodule; otherwise, these data are downloaded when access is requested.
- Offline mode
This behaviour is controlled by the offline setting. In offline mode, all download requests made to the data store are denied. This mode is safer if you want to deliver the data yourself or work with a bandwidth-limited or unstable connection.
- Download directory
Upon download, Eradiate stores data in a directory defined by the download_dir setting. If the user does not override this setting, the default location depends on whether Eradiate is operating in development mode.
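The way these three settings interact can be pictured with the small sketch below. It is a hypothetical model written for illustration, not Eradiate's implementation: the function names, the `OfflineModeError` exception and the non-development default directory are assumptions, and the `.eradiate_downloads` subdirectory only mirrors the paths printed by eradiate data fetch in the download example of this guide.

```python
from pathlib import Path
from typing import Mapping, Optional


class OfflineModeError(RuntimeError):
    """Hypothetical error raised when a download is requested in offline mode."""


def is_development_mode(env: Mapping[str, str]) -> bool:
    # Development mode is signalled by a non-empty ERADIATE_SOURCE_DIR variable.
    return bool(env.get("ERADIATE_SOURCE_DIR"))


def resolve_download_dir(
    download_dir: Optional[str], env: Mapping[str, str], non_dev_default: str
) -> Path:
    # An explicit download_dir setting overrides any default.
    if download_dir is not None:
        return Path(download_dir)
    # Assumption: in development mode, downloads go inside the source tree.
    if is_development_mode(env):
        return Path(env["ERADIATE_SOURCE_DIR"]) / ".eradiate_downloads"
    # Otherwise, fall back to a caller-supplied default (hypothetical value).
    return Path(non_dev_default)


def check_download_allowed(filename: str, offline: bool) -> None:
    # In offline mode, every download request is denied.
    if offline:
        raise OfflineModeError(f"download of {filename!r} denied (offline mode)")
```

For example, `resolve_download_dir(None, os.environ, "/path/to/data")` gives the directory this model would pick in the current shell; the real defaults are defined by Eradiate's settings.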
Downloading data
Data are downloaded using the eradiate data fetch command (see Command-line interface reference). The most common way to download data is to pass the name of a file list to eradiate data fetch. Known file lists are displayed using the --list option, e.g.:
$ eradiate data fetch --list
Known file lists:
all
minimal
komodo
gecko
monotropa
mycena
panellus
A specific file list can then be downloaded by simply requesting it, e.g.:
$ eradiate data fetch komodo
Fetching 'spectra/absorption/mono/komodo/komodo.nc'
✓ found
[/home/username/src/eradiate/.eradiate_downloads/stable/spectra/absorption/mono/komodo/komodo.nc]
Fetching 'spectra/absorption/mono/komodo/metadata.json'
✓ found
[/home/username/src/eradiate/.eradiate_downloads/stable/spectra/absorption/mono/komodo/metadata.json]
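Before switching to offline mode, it can be useful to confirm that every file of a list is present locally. The sketch below assumes, based only on the paths printed above, that a fetched file is stored at <download_dir>/stable/<relative path>; the `stable` subdirectory, the `missing_files` name and the hard-coded file list are illustrative assumptions, and the authoritative contents of a file list come from Eradiate itself.

```python
from pathlib import Path
from typing import Iterable, List

# Relative paths copied from the `eradiate data fetch komodo` output above.
KOMODO_FILES = [
    "spectra/absorption/mono/komodo/komodo.nc",
    "spectra/absorption/mono/komodo/metadata.json",
]


def missing_files(
    download_dir: Path, relpaths: Iterable[str], subdir: str = "stable"
) -> List[str]:
    """Return the relative paths that are not present on disk (hypothetical helper)."""
    # Assumption: fetched files live under <download_dir>/<subdir>/<relpath>.
    root = Path(download_dir) / subdir
    return [p for p in relpaths if not (root / p).is_file()]
```

An empty result, e.g. from `missing_files(Path("/home/username/src/eradiate/.eradiate_downloads"), KOMODO_FILES)`, suggests the list is complete under this layout assumption.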
Note
Some data might require ancillary files that are not part of the download lists. For example, the current format of molecular absorption databases uses index and spectral coverage tables, which may be missing from the download list but can be built automatically by Eradiate. This happens automatically when an AbsorptionDatabase is created, but it can also be done manually before running computations, using the eradiate data check command with the --fix option, e.g.:
$ eradiate data check monotropa --fix
[10:44:35] INFO Opening 'monotropa'
WARNING Could not find spectral coverage table, building it
[10:44:37] INFO Success!
Accessing data (advanced users and developers)
Every file managed by the global data store can be accessed using the eradiate.data.open_dataset() function:
>>> import eradiate
>>> ds = eradiate.data.open_dataset("spectra/solar_irradiance/thuillier_2003.nc")
This function behaves similarly to xarray.open_dataset(). The eradiate.data.load_dataset() function is also available for eager data loading. If necessary, file access triggers data download and caching.
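The "download if necessary, then cache" behaviour can be pictured with the sketch below. It is a simplified, hypothetical model rather than Eradiate's data store: `resolve_path`, the `fetch` callback and the flat <cache_dir>/<relative path> layout are assumptions, and integrity checks are omitted. Note that nothing in it prevents two processes from both seeing a cache miss and downloading the same file at once, which is the situation described in the warning below.

```python
from pathlib import Path
from typing import Callable


def resolve_path(
    relpath: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None],
) -> Path:
    """Return the local path of `relpath`, downloading it on first access."""
    target = Path(cache_dir) / relpath
    if target.is_file():
        return target  # cache hit: no network access
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(relpath, target)  # cache miss: the callback writes the file to `target`
    return target
```

Subsequent calls with the same `relpath` return the cached file without calling `fetch` again.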
Warning
The data module does not support concurrent download requests from multiple processes running Eradiate. In such cases, two processes requesting the same resource, e.g. with eradiate.data.load_dataset(), might trigger two downloads that overwrite each other, resulting in unpredictable (and almost certainly incorrect) behaviour.
If your use case requires running Eradiate from multiple processes, we strongly recommend downloading all required data in advance using the eradiate data fetch command (see Downloading data).
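A minimal sketch of this recommendation: download the required file lists once, from the parent process, and only then start the worker processes. It assumes the needed file lists are known in advance; `prefetch`, `simulate` and `run_all` are hypothetical names, the worker body is a placeholder for your own Eradiate computation, and the injectable `run` parameter only exists so the logic can be tested without Eradiate installed.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List


def prefetch(
    file_lists: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[List[str]]:
    """Download each file list once, sequentially, from the parent process."""
    commands = []
    for name in file_lists:
        cmd = ["eradiate", "data", "fetch", name]
        run(cmd, check=True)  # check=True raises CalledProcessError on failure
        commands.append(cmd)
    return commands


def simulate(case: int) -> int:
    # Placeholder for an Eradiate computation; the data it needs are already
    # on disk when it runs, so workers do not trigger concurrent downloads.
    return case


def run_all(cases: Iterable[int], file_lists: Iterable[str]) -> List[int]:
    prefetch(file_lists)  # step 1: single-process download
    with ProcessPoolExecutor() as pool:  # step 2: parallel computations
        return list(pool.map(simulate, cases))
```

In a script, call `run_all(range(8), ["komodo"])` under an `if __name__ == "__main__":` guard, which process-based executors require on platforms that spawn workers.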
See also
eradiate.data: complete data module reference.