[1]:

%reload_ext eradiate.notebook.tutorials

Last updated: 2024-03-07 15:48 (eradiate v0.25.1.dev31+g2a83b0ba.d20240229)

Post-processing pipelines#

Overview

Eradiate provides, for each experiment type, a default post-processing pipeline. However, you might want to perform tasks that go beyond what the default pipeline can achieve, such as applying several spectral response functions to a recorded signal. This tutorial explains how to intercept post-processing pipeline data and how to apply pipeline steps manually.

Prerequisites

Basic knowledge of the Eradiate workflow (see First steps with Eradiate).
Prior knowledge of the Hamilton library is useful but not required.

What you will learn

How to inspect post-processing pipelines.
How to retrieve arbitrary data from the pipeline.
How to build a custom post-processing chain.

Imports#

As for all tutorials the first step is to import all necessary classes and create aliases for convenience:

[2]:

%load_ext eradiate

import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import eradiate
eradiate.set_mode("ckd")
eradiate.config.progress = "spectral_loop"

import eradiate.scenes as ertsc
from eradiate import unit_registry as ureg

Visualizing the pipeline#

We set up a basic simulation to visualize its post-processing pipeline:

[3]:

exp = eradiate.experiments.AtmosphereExperiment(
    illumination={
        "type": "directional",
        "zenith": 30,
    },
    measures={
        "type": "mdistant",
        "construct": "hplane",
        "azimuth": 0.0,
        "zeniths": np.arange(-75.0, 76.0, 5.0),
        "srf": "sentinel_2a-msi-3",
        "spp": 10000,
    }
)

Now, we can use the pipeline() method to fetch the experiment’s post-processing pipeline, which is a Hamilton Driver object:

[4]:

drv = exp.pipeline(0)

The Driver defines how post-processing operations chain with each other, builds a graph to represent the pipeline and specifies the input, output and intermediate data handled during workflow execution. The Driver instance has a method to display its internal graph:

[5]:

drv.display_all_functions()

[5]:

../../_images/tutorials_advanced_postprocessing_pipelines_10_1.svg

On this graph, we can see all the functions that will be executed during post-processing, as well as the inputt and configuration variables, together with their corresponding data types. We can also list the variables as a table:

[6]:

eradiate.pipelines.list_variables(drv, as_table=True)

[6]:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Name                          ┃ Type         ┃ Input ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━┩
│ angles                        │ ndarray      │ x     │
│ bitmaps                       │ dict         │ x     │
│ illumination                  │ Illumination │ x     │
│ spectral_set                  │ SpectralSet  │ x     │
│ srf                           │ Spectrum     │ x     │
│ bidirectional_reflectance     │ dict         │       │
│ bidirectional_reflectance_srf │ dict         │       │
│ brdf                          │ DataArray    │       │
│ brdf_srf                      │ DataArray    │       │
│ brf                           │ DataArray    │       │
│ brf_srf                       │ DataArray    │       │
│ extract_irradiance            │ dict         │       │
│ gather_bitmaps                │ dict         │       │
│ irradiance                    │ DataArray    │       │
│ irradiance_srf                │ DataArray    │       │
│ radiance                      │ DataArray    │       │
│ radiance_raw                  │ DataArray    │       │
│ radiance_srf                  │ DataArray    │       │
│ solar_angles                  │ Dataset      │       │
│ spectral_response             │ DataArray    │       │
│ spp                           │ DataArray    │       │
│ viewing_angles                │ Dataset      │       │
│ weights_raw                   │ DataArray    │       │
└───────────────────────────────┴──────────────┴───────┘

Upon pipeline execution, Hamilton builds an execution graph from the global one we just displayed. For that purpose, it needs the outputs requested by the user. We can pick output names from the table above to see what happens if the driver is requested the irradiance and radiance products. In order to visualize execution, Hamilton also requires that we provide inputs: to start with, we provide a fake input dictionary.

Note

The list of inputs may look like it was magically made up; it was not! If you try and call the visualize_execution() method with missing input, an exception will be raised and it will include the list of missing inputs.

[7]:

drv.visualize_execution(
    final_vars=["radiance", "irradiance"],
    inputs={"angles": None, "bitmaps": None, "illumination": None, "spectral_set": None},
    orient="TD",
)

[7]:

../../_images/tutorials_advanced_postprocessing_pipelines_14_1.svg

We can see that Hamilton pruned the global graph to execute only the nodes that will contribute to the requested variables. We can also request intermediate outputs in a similar way:

[8]:

drv.visualize_execution(
    final_vars=["radiance", "irradiance", "solar_angles"],
    inputs={"angles": None, "bitmaps": None, "illumination": None, "spectral_set": None},
    orient="TD",
)

[8]:

../../_images/tutorials_advanced_postprocessing_pipelines_16_1.svg

With this, we have the basic tool to understand which are the input and output variable of the post-processing pipeline. Now, let’s see how to use it.

Running custom post-processing operations#

So far, we have been manipulating manually the default post-processing pipeline. Sometimes, however, we might want to do more than what this default pipeline can do. While writing a custom pipeline is possible, it is most often an unnecessary time investment, and we have a much simpler approach to build a custom, single-use post-processing chain. In this section, we will collect intermediate data and apply post-processing routines directly to build our customized output dataset.

As a supporting case, we consider a simulation in which we want to see what two similar instruments observing the same target will record. There are several ways to achieve this, listed here in increasing order of efficiency:

Set up two full-fledged experiments and run the Monte Carlo ray tracing simulation twice. This scenario is the most wasteful because it repeats all steps (experiment pre-processing, processing and post-processing) twice.
Set up a single experiment with two measures. This scenario wastes a little less computational time because it repeats only the ray tracing simulation.
Set up a single experiment covering the union of the spectral range covered by both sensors, extract the spectral signal and apply the spectral response function of both instruments as a custom post-processing step. This is the most efficient method.

We will see now how to implement method 3. We will simulate a signal recorded by the MSI instruments onboard Sentinel-2A and Sentinel–2B (bands 2, 3 and 4). Let us first have a look at their spectral response functions:

[14]:

bands = [
    "sentinel_2a-msi-2",
    "sentinel_2a-msi-3",
    "sentinel_2a-msi-4",
    "sentinel_2b-msi-2",
    "sentinel_2b-msi-3",
    "sentinel_2b-msi-4",
]

srf_ds = {
    band: eradiate.data.load_dataset(f"spectra/srf/{band}.nc")
    for band in bands
}

fig, ax = plt.subplots(1, 1)
for band, ds in srf_ds.items():
    ds.srf.plot(label=band, ax=ax)
plt.legend(ncol=2, loc="lower center", bbox_to_anchor=(0.5, 1.0))
plt.show()
plt.close()

../../_images/tutorials_advanced_postprocessing_pipelines_31_1.png

We can easily extract the bounds of the wavelength domain we should cover:

[15]:

df = pd.DataFrame(
    {
        band: {"wmin": float(ds.w.min()), "wmax": float(ds.w.max())}
        for band, ds in srf_ds.items()
    }
).T
print(f"wmin = {df.wmin.min()}")
print(f"wmax = {df.wmax.max()}")

wmin = 456.0
wmax = 686.0

We use this information to configure define a fictitious SRF which will let the experiment know that we want to process a given interval without apply a particular spectral response function. The simplest way to achieve this is to use the MultiDeltaSpectrum class with many wavelengths in the [\(\lambda_\mathrm{min}\), \(\lambda_\mathrm{max}\)] interval. As explained in the Spectral discretization guide, each wavelength specified in the MultiDeltaSpectrum definition will be used to activate the relevant bin in the spectral discretization. For safety, we use a fairly dense wavelength list (with 1 nm spacing), which we know will be more dense than the spectral discretization. This density would have to be inscreased if we would use a high-resolution atmospheric dataset. We also take a bit of margin on the bounds and set them to 450 nm and 690 nm.

Note

For simplicity, we are here performing the simulation on a connected spectral domain. This makes data processing and plotting simpler, but it is also suboptimal. Ideally, we would use an unconnected spectral domain specification:

srf = {
    "type": "multi_delta",
    "wavelengths": np.concatenate(
        (
            np.arange(450.0, 590.0, 1.0) * ureg.nm, # Bands 2 and 3
            np.arange(640.0, 690.0, 1.0) * ureg.nm, # Band 4
        )
    ),
}

Our minimal experiment setup and run looks like this:

[16]:

exp = eradiate.experiments.AtmosphereExperiment(
    illumination={
        "type": "directional",
        "zenith": 30,
    },
    measures={
        "type": "mdistant",
        "construct": "hplane",
        "azimuth": 0.0,
        "zeniths": np.arange(-75.0, 76.0, 5.0),
        "srf": {
            "type": "multi_delta",
            "wavelengths": np.arange(450.0, 690.0, 1.0) * ureg.nm
        },
        "spp": 10000,
    }
)
result = eradiate.run(exp)
result

Since we used a MultiDeltaSpectrum to specify the instrument’s SRF, only the spectral radiance product is available in the result dataset. Let’s post-process it manually and apply the spectral response function manually. For that purpose, we use the apply_spectral_response() function, which takes as parameters the SRF as an InterpolatedSpectrum, and the radiance data array. The SRF object has to be created manually, and we automate this with a short function:

[17]:

from eradiate.scenes.spectra import InterpolatedSpectrum

# We need to create Eradiate Spectrum instances representing the SRF from the datasets
def srf_spectrum(ds):
    w = ureg.Quantity(ds.w.values, ds.w.attrs["units"])
    srf = ds.data_vars["srf"].values
    return InterpolatedSpectrum(quantity="dimensionless", wavelengths=w, values=srf)

We now proceed with the post-processing operation and aggregate the results into a single data array, each radiance array being indexed on a new band dimension:

[18]:

from eradiate.pipelines.logic import apply_spectral_response

radiances_srf = []

for band in bands:
    srf = srf_spectrum(srf_ds[band])
    radiances_srf.append(apply_spectral_response(result.radiance, srf))

# Now, we concatenate all of these into a data array for convenience
radiance_srf = xr.concat(radiances_srf, dim="band").assign_coords({"band": bands})
radiance_srf

And we can finally plot the resulting dataset using xarray’s built-in plotting routines:

[19]:

radiance_srf.plot(hue="band", x="vza")
plt.show()
plt.close()

../../_images/tutorials_advanced_postprocessing_pipelines_41_1.png

Final words#

We have covered the basics on how to perform automatic and manual post-processing. For simplicity, we did not perform a realistic simulation nor address a practical use-case; to go further you can:

Make the simulation more realistic by adding an atmosphere (see the Molecular atmosphere basics and Particle layer basics).
Increase the sample count to reduce the noise in the simulated signal.
In the manual post-processing section, you can compute the BRDF and BRF using the compute_bidirectional_reflectance() function. You will first need to apply the SRF to the irradiance spectrum in a way similar to what we did for radiance.

Post-processing pipelines

Contents