Derived variables
This notebook shows how to use derived variables. A derived variable is a variable that is not available as an input dataset, but is instead computed from one or more input variables.
import pandas as pd
import yaml
import esmvalcore.preprocessor
from esmvalcore.cmor.table import get_tables
from esmvalcore.config import CFG
from esmvalcore.dataset import Dataset, DerivedDataset, datasets_to_recipe
pd.set_option("display.max_colwidth", None)
First, we configure ESMValCore so it searches the ESGF for data:
CFG["projects"]["CMIP6"].pop(
"data",
None,
) # Clear existing CMIP6 configuration for finding input data
CFG.nested_update(
{
"projects": {
"CMIP6": {
"data": {
"intake-esgf": {
"type": "esmvalcore.io.intake_esgf.IntakeESGFDataSource",
"priority": 2,
"facets": {
"activity": "activity_drs",
"dataset": "source_id",
"ensemble": "member_id",
"exp": "experiment_id",
"institute": "institution_id",
"grid": "grid_label",
"mip": "table_id",
"project": "project",
"short_name": "variable_id",
},
},
},
},
},
},
)
Which variables can be derived?
The interface for working with derived variables from Python is not very polished yet. To list all available derived variables, we can run:
pd.DataFrame.from_dict(
[
{
"short_name": short_name,
}
| {
k: getattr(
get_tables(CFG, project="CMIP6").get_variable(
table_name="x",
short_name=short_name,
derived=True,
),
k,
None,
)
for k in ["units", "long_name"]
}
for short_name in esmvalcore.preprocessor._derive.ALL_DERIVED_VARIABLES # noqa: SLF001
],
).sort_values("short_name")
| | short_name | units | long_name |
|---|---|---|---|
| 29 | alb | 1 | albedo at the surface |
| 38 | amoc | kg s-1 | Atlantic Meridional Overturning Circulation |
| 44 | asr | W m-2 | Absorbed shortwave radiation |
| 32 | chlora | kg m-3 | chlorophyll concentration |
| 46 | clhmtisccp | % | ISCCP High Level Medium-Thickness Cloud Area Fraction |
| 2 | clhtkisccp | % | ISCCP high level thick cloud area fraction |
| 7 | cllmtisccp | % | ISCCP Low Level Medium-Thickness Cloud Area Fraction |
| 11 | clltkisccp | % | ISCCP low level thick cloud area fraction |
| 0 | clmmtisccp | % | ISCCP Middle Level Medium-Thickness Cloud Area Fraction |
| 36 | clmtkisccp | % | ISCCP Middle Level Thick Cloud Area Fraction |
| 40 | co2s | 1e-06 | Atmosphere CO2 |
| 42 | ctotal | kg m-2 | Total Carbon Mass in Ecosystem |
| 47 | et | mm day-1 | Evapotranspiration |
| 5 | hfns | W m-2 | Surface Net Heat Flux |
| 9 | hurs | % | Near-Surface Relative Humidity |
| 26 | lapserate | K km-1 | Lapse Rate |
| 20 | lvp | W m-2 | Latent Heat Release from Precipitation |
| 8 | lwcre | W m-2 | TOA Longwave Cloud Radiative Effect |
| 41 | lwp | kg m-2 | Liquid Water Path |
| 31 | netcre | W m-2 | TOA Net Cloud Radiative Effect |
| 23 | ohc | J | Heat content in grid cell |
| 43 | qep | kg m-2 s-1 | Net moisture flux into atmosphere |
| 39 | rlns | W m-2 | Surface Net downward Longwave Radiation |
| 13 | rlnst | W m-2 | Net Atmospheric Longwave Cooling |
| 33 | rlnstcs | W m-2 | Net Atmospheric Longwave Cooling assuming clear sky |
| 12 | rlntcs | W m-2 | TOA Net downward Longwave Radiation assuming clear sky |
| 45 | rlus | W m-2 | Surface Upwelling Longwave Radiation |
| 28 | rsns | W m-2 | Surface Net downward Shortwave Radiation |
| 25 | rsnst | W m-2 | Heating from Shortwave Absorption |
| 34 | rsnstcs | W m-2 | Heating from Shortwave Absorption assuming clear sky |
| 22 | rsnstcsnorm | % | Heating from Shortwave Absorption assuming clear sky normalized by incoming solar radiation |
| 27 | rsnt | W m-2 | TOA Net downward Shortwave Radiation |
| 3 | rsntcs | W m-2 | TOA Net downward Shortwave Radiation assuming clear sky |
| 10 | rsus | W m-2 | Surface Upwelling Shortwave Radiation |
| 17 | rtnt | W m-2 | TOA Net downward Total Radiation |
| 1 | sfcwind | NaN | NaN |
| 30 | siextent | 1 | Sea Ice Extent |
| 14 | sispeed | m s-1 | Sea-Ice Speed |
| 37 | sithick | m | Sea Ice Thickness |
| 15 | sm | m3 m-3 | Volumetric Moisture in Upper Portion of Soil Column |
| 16 | soz | m | Stratospheric Ozone Column (O3 mole fraction >= 125 ppb) |
| 4 | swcre | W m-2 | TOA Shortwave Cloud Radiative Effect |
| 21 | toz | m | Total Column Ozone |
| 6 | troz | m | Tropospheric Ozone Column (O3 mole fraction < 125 ppb) |
| 35 | uajet | degrees | Jet position expressed as latitude of maximum meridional wind speed |
| 19 | vegfrac | % | Vegetation Fraction |
| 24 | xch4 | 1 | Column-average Dry-air Mole Fraction of Atmospheric Methane |
| 18 | xco2 | 1 | Column-average Dry-air Mole Fraction of Atmospheric Carbon Dioxide |
Note that modules, functions, and variables starting with a single _ character should be considered internal, so there are no guarantees about the stability of this interface.
Finding available datasets
We define a dataset template to search for all CMIP6 models that provide every input dataset required to derive lwcre, the longwave cloud radiative effect at the top of the atmosphere, at monthly resolution for the historical experiment. Note that ESMValCore uses its own facet names, so that naming is uniform across different CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in Facets.
dataset_template = DerivedDataset(
short_name="lwcre",
mip="Amon",
project="CMIP6",
exp="historical",
dataset="*",
institute="*",
ensemble="r1i1p1f1",
grid="gn",
)
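To illustrate the facet-name translation mentioned above, here is a minimal sketch in plain Python. The mapping is copied from the facets section of the configuration earlier in this notebook. The helper to_esgf_facets is hypothetical and not part of ESMValCore, which handles this translation internally; rlut is used as the example variable because it is one of the inputs needed for lwcre.

```python
# Mapping from ESMValCore facet names to ESGF facet names, copied from the
# "facets" section of the intake-esgf configuration above.
FACET_MAP = {
    "activity": "activity_drs",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "institute": "institution_id",
    "grid": "grid_label",
    "mip": "table_id",
    "project": "project",
    "short_name": "variable_id",
}


def to_esgf_facets(facets):
    """Rename ESMValCore facets to their ESGF names (hypothetical helper)."""
    return {FACET_MAP[name]: value for name, value in facets.items()}


# Facets for one input variable of lwcre, using the template values above.
input_facets = {
    "short_name": "rlut",
    "mip": "Amon",
    "project": "CMIP6",
    "exp": "historical",
    "dataset": "*",
    "institute": "*",
    "ensemble": "r1i1p1f1",
    "grid": "gn",
}

esgf_facets = to_esgf_facets(input_facets)
print(esgf_facets)
```

Only the facet names change; the values, including the * wildcards, are passed through untouched in this sketch.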
Next, we use the DerivedDataset.from_files method to build a list of datasets from the available files. This may take a while, because searching the ESGF for many files can be slow. Search results are cached, so subsequent searches will be faster.
datasets = list(dataset_template.from_files())
print(f"Found {len(datasets)} datasets, showing the first 10:")
datasets[:10]
Found 37 datasets, showing the first 10:
[DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=TaiESM1, institute=AS-RCEC, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=AWI-CM-1-1-MR, institute=AWI, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=AWI-ESM-1-1-LR, institute=AWI, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=BCC-CSM2-MR, institute=BCC, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=BCC-ESM1, institute=BCC, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=CAMS-CSM1-0, institute=CAMS, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=CAS-ESM2-0, institute=CAS, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=FGOALS-g3, institute=CAS, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=IITM-ESM, institute=CCCR-IITM, ensemble=r1i1p1f1, grid=gn),
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=CanESM5-1, institute=CCCma, ensemble=r1i1p1f1, grid=gn)]
Composing a recipe with derived variables
To use the datasets found above in a recipe, we specify the name of the variable that needs to be derived, along with the derive: true option:
recipe_datasets = [
Dataset(
diagnostic="diagnostic_name",
derive=True,
**dataset.facets,
)
for dataset in datasets
]
print(yaml.safe_dump(datasets_to_recipe(recipe_datasets)))
datasets:
- dataset: ACCESS-CM2
institute: CSIRO-ARCCSS
- dataset: ACCESS-ESM1-5
institute: CSIRO
- dataset: AWI-CM-1-1-MR
institute: AWI
- dataset: AWI-ESM-1-1-LR
institute: AWI
- dataset: BCC-CSM2-MR
institute: BCC
- dataset: BCC-ESM1
institute: BCC
- dataset: CAMS-CSM1-0
institute: CAMS
- dataset: CAS-ESM2-0
institute: CAS
- dataset: CESM2
institute: NCAR
- dataset: CESM2-FV2
institute: NCAR
- dataset: CESM2-WACCM
institute: NCAR
- dataset: CESM2-WACCM-FV2
institute: NCAR
- dataset: CMCC-CM2-HR4
institute: CMCC
- dataset: CMCC-CM2-SR5
institute: CMCC
- dataset: CMCC-ESM2
institute: CMCC
- dataset: CanESM5
institute: CCCma
- dataset: CanESM5-1
institute: CCCma
- dataset: FGOALS-g3
institute: CAS
- dataset: FIO-ESM-2-0
institute: FIO-QLNM
- dataset: GISS-E2-1-G
institute: NASA-GISS
- dataset: GISS-E2-1-G-CC
institute: NASA-GISS
- dataset: GISS-E2-1-H
institute: NASA-GISS
- dataset: GISS-E2-2-G
institute: NASA-GISS
- dataset: GISS-E2-2-H
institute: NASA-GISS
- dataset: ICON-ESM-LR
institute: MPI-M
- dataset: IITM-ESM
institute: CCCR-IITM
- dataset: MIROC6
institute: MIROC
- dataset: MPI-ESM-1-2-HAM
institute: HAMMOZ-Consortium
- dataset: MPI-ESM1-2-HR
institute: MPI-M
- dataset: MPI-ESM1-2-LR
institute: MPI-M
- dataset: MRI-ESM2-0
institute: MRI
- dataset: NESM3
institute: NUIST
- dataset: NorCPM1
institute: NCC
- dataset: NorESM2-LM
institute: NCC
- dataset: NorESM2-MM
institute: NCC
- dataset: SAM0-UNICON
institute: SNU
- dataset: TaiESM1
institute: AS-RCEC
diagnostics:
diagnostic_name:
variables:
lwcre:
derive: true
ensemble: r1i1p1f1
exp: historical
grid: gn
mip: Amon
project: CMIP6
There is also a force_derivation option available for use in the recipe. When set to true, it causes the variable to be derived even if it is already available as a dataset.
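As a sketch of where this option would go, the variables section of the recipe printed above can be written as a Python dictionary with force_derivation placed next to derive. The dictionary is illustrative only: the facet values are taken from the recipe above, and json is used for printing only to keep the example free of third-party imports.

```python
import json

# Variables section of the recipe above, with force_derivation added.
# This only illustrates the layout; it is not produced by ESMValCore.
variables = {
    "lwcre": {
        "derive": True,
        "force_derivation": True,  # derive even if lwcre exists as a dataset
        "ensemble": "r1i1p1f1",
        "exp": "historical",
        "grid": "gn",
        "mip": "Amon",
        "project": "CMIP6",
    },
}

print(json.dumps(variables, indent=2))
```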
Computing the derived variable
Let’s load the data to derive the first dataset:
dataset = datasets[0]
dataset
DerivedDataset(short_name=lwcre, mip=Amon, project=CMIP6, exp=historical, dataset=TaiESM1, institute=AS-RCEC, ensemble=r1i1p1f1, grid=gn)
cubes = dataset.load()
cubes
WARNING:esmvalcore.cmor.check:There were warnings in variable rlut:
rlut: attribute positive not present
loaded from file
WARNING:esmvalcore.cmor.check:There were warnings in variable rlutcs:
rlutcs: attribute positive not present
loaded from file
| Toa Longwave Cloud Radiative Effect (W m-2) | time | latitude | longitude |
|---|---|---|---|
| Shape | 1980 | 192 | 288 |
| Dimension coordinates | |||
| time | x | - | - |
| latitude | - | x | - |
| longitude | - | - | x |
| Attributes | |||
| Conventions | 'CF-1.7 CMIP-6.2' | ||
| activity_drs | 'CMIP' | ||
| activity_id | 'CMIP' | ||
| branch_method | 'Hybrid-restart from year 0671-01-01 of piControl' | ||
| branch_time | 0.0 | ||
| branch_time_in_child | -674885 | ||
| branch_time_in_parent | 171550.0 | ||
| cmor_version | '3.5.0' | ||
| contact | 'Dr. Wei-Liang Lee (leelupin@gate.sinica.edu.tw)' | ||
| data_specs_version | '01.00.31' | ||
| experiment | 'all-forcing simulation of the recent past' | ||
| experiment_id | 'historical' | ||
| external_variables | 'areacella' | ||
| forcing_index | 1 | ||
| frequency | 'mon' | ||
| further_info_url | 'https://furtherinfo.es-doc.org/CMIP6.AS-RCEC.TaiESM1.historical.none.r ...' | ||
| grid | 'finite-volume grid with 0.9x1.25 degree lat/lon resolution' | ||
| grid_label | 'gn' | ||
| initialization_index | 1 | ||
| institution | 'Research Center for Environmental Changes, Academia Sinica, Nankang, Taipei ...' | ||
| institution_id | 'AS-RCEC' | ||
| license | 'CMIP6 model data produced by NCC is licensed under a Creative Commons Attribution ...' | ||
| member_id | 'r1i1p1f1' | ||
| mip_era | 'CMIP6' | ||
| model_id | 'TaiESM1' | ||
| nominal_resolution | '100 km' | ||
| original_units | 'W/m2' | ||
| parent_activity_id | 'CMIP' | ||
| parent_experiment_id | 'piControl' | ||
| parent_mip_era | 'CMIP6' | ||
| parent_source_id | 'TaiESM1' | ||
| parent_sub_experiment_id | 'none' | ||
| parent_time_units | 'days since 1850-1-1 00:00:00' | ||
| parent_variant_label | 'r1i1p1f1' | ||
| physics_index | 1 | ||
| positive | 'down' | ||
| product | 'model-output' | ||
| realization_index | 1 | ||
| realm | 'atmos' | ||
| references | '10.5194/gmd-2019-377' | ||
| run_variant | 'N/A' | ||
| source | 'TaiESM 1.0 (2018): \naerosol: SNAP (same grid as atmos)\natmos: TaiAM1 ...' | ||
| source_id | 'TaiESM1' | ||
| source_type | 'AOGCM AER BGC' | ||
| sub_experiment | 'none' | ||
| sub_experiment_id | 'none' | ||
| table_id | 'Amon' | ||
| table_info | 'Creation Date:(24 July 2019) MD5:0bb394a356ef9d214d027f1aca45853e' | ||
| title | 'TaiESM1 output prepared for CMIP6' | ||
| variant_label | 'r1i1p1f1' | ||
Implementing your own derived variables
Guidance on adding new built-in derived variables to ESMValCore is available in Deriving a variable. However, if you are only using the Python interface, you can define an ad-hoc derived variable by subclassing the DerivedDataset class and implementing a custom required attribute and derive method. The required attribute defines the facets that describe the input data:
dataset.required
[{'short_name': 'rlut'}, {'short_name': 'rlutcs'}]
In this case, we see that lwcre is derived from the variables rlut and rlutcs. The derive method is a function that takes as its argument the iris cubes that result from loading the datasets described by the facets in the required attribute, and computes the derived variable from them.
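For lwcre, the computation itself is a simple difference. The sketch below uses plain Python lists in place of iris cubes and assumes the common definition of the longwave cloud radiative effect: clear-sky minus all-sky outgoing longwave radiation at the top of the atmosphere (rlutcs - rlut). The function name and the sample values are made up for illustration, and this is not the ESMValCore implementation.

```python
def derive_lwcre(rlut, rlutcs):
    """Compute lwcre as rlutcs - rlut, element by element (illustrative).

    rlut:   all-sky outgoing longwave radiation at TOA, in W m-2
    rlutcs: clear-sky outgoing longwave radiation at TOA, in W m-2
    """
    if len(rlut) != len(rlutcs):
        raise ValueError("rlut and rlutcs must have the same length")
    return [clear - allsky for allsky, clear in zip(rlut, rlutcs)]


# Made-up sample values in W m-2, standing in for the data of two cubes.
rlut = [240.0, 235.5, 250.0]
rlutcs = [265.0, 262.5, 270.0]

lwcre = derive_lwcre(rlut, rlutcs)
print(lwcre)  # [25.0, 27.0, 20.0]
```

A real derive method would operate on iris cubes rather than lists, so that coordinates and metadata are carried along with the data.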