NetCDF Conventions

This document is a proposed convention for the formatting and distribution of in situ solar radiation measurement data. The goal is to apply best practices for standardizing data and to improve interoperability. This makes it possible to develop generic tools for visualization, quality control (QC), statistics, …

It should be considered a DRAFT, open for discussion.

This convention is implemented in libinsitu, which provides Python and CLI tools to:

  • Transform in situ measurements from various networks into standardized datasets

  • Explore and extract data from files following this convention

  • Apply quality checks on NetCDF files and embed resulting flags

libinsitu embeds a Common Data Language (CDL) template describing a NetCDF file format, which is filled at runtime with metadata gathered for several networks and their stations.


File format

We propose to format the data into NetCDF files.

NetCDF is a widespread file format for numerical data. It has several benefits:

  • Compact: NetCDF is very efficient for storing large amounts of data. It supports lossless and lossy compression.

  • Well supported: Most languages and tools (Python, R, Matlab, …) have libraries for reading and writing NetCDF files.

  • Self-descriptive: NetCDF stores metadata describing the data (bounds, units, …) and the context (station characteristics).

The present convention is based on two other conventions:

  • The Climate and Forecast (CF) conventions

  • The Attribute Convention for Data Discovery (ACDD)

We advise using NetCDF version 4 or above.


Data granularity

We recommend distributing one file per measurement station.

Providers can split the data into yearly or monthly subsets, but should also provide aggregated datasets for easier requesting and subsetting.


Compression

We recommend enabling lossless zlib compression.

Optionally, we recommend lossy compression by truncating data values to their meaningful number of decimal digits. For this purpose we use the attribute least_significant_digit, supported by the Python driver of NetCDF.
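To illustrate what this truncation does to a value, here is a minimal Python sketch of the rounding rule described in the netCDF4-python documentation for least_significant_digit: values are rounded to the nearest multiple of 1/2**bits, where bits is the smallest integer such that 2**bits is at least 10**least_significant_digit. The function name quantize is hypothetical; the sketch only mimics the arithmetic and is not the driver's code.

```python
import math


def quantize(value, least_significant_digit):
    """Round `value` so that a precision of 10**-least_significant_digit is kept.

    Sketch of the rule documented by netCDF4-python:
    around(scale * data) / scale, with scale = 2**bits.
    """
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return round(value * scale) / scale


# With one decimal digit, scale is 16: values become multiples of 1/16.
print(quantize(512.3456, 1))   # 512.375
# With zero decimal digits, values are rounded to whole numbers.
print(quantize(101325.4, 0))   # 101325.0
```

The rounded values contain many trailing zero bits, which is what lets the lossless zlib stage compress them more efficiently.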


Dimensions

Each file should have only one dimension:

  • Unlimited dimension for time, named time


Variables

We propose to include a subset of standard CF variables.

We suggest names for these variables, but we only enforce:

  • Their standard_name attribute, as per CF conventions

  • Their units

Time

Each NetCDF file should have a single time variable, with standard_name “time”, along the time dimension. Time should be expressed as seconds since 1 January 1970. Hence, following the CF conventions, the units of this variable should be “seconds since 1970-01-01T00:00:00”. The data type of the time variable can be either double or int (preferred, as it is more compact).

The time axis should be uniform:

  • No gaps or duplicate values from start to end

  • The same regular time resolution for the whole time span, described with the global attribute time_coverage_resolution

Times should be expressed in UTC. The local time zone of the station can optionally be specified in the global attribute local_time_zone.

Here is an example of CDL for the time variable:

int time(time) ;
    time:long_name = "Time of measurement" ;
    time:standard_name = "time" ;
    time:units = "seconds since 1970-01-01 00:00:00" ;
    time:axis = "T" ;
    time:calendar = "gregorian" ;
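
The uniformity requirement on the time variable can be checked with a few lines of Python. The following is a minimal sketch, assuming the time values have already been read into a list of integer seconds since 1970-01-01 UTC; the helper names to_epoch_seconds and is_uniform are illustrative and not part of libinsitu.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt):
    """Convert a timezone-aware datetime to integer seconds since 1970-01-01 UTC."""
    return int(dt.timestamp())


def is_uniform(times, resolution):
    """Return True if consecutive timestamps all differ by exactly `resolution` seconds.

    A gap, a duplicate or a change of resolution makes the check fail.
    """
    return all(b - a == resolution for a, b in zip(times, times[1:]))


start = to_epoch_seconds(datetime(2016, 1, 1, tzinfo=timezone.utc))
uniform = [start + 60 * i for i in range(5)]   # 1-minute resolution
with_gap = uniform[:2] + uniform[3:]           # one timestamp missing

print(is_uniform(uniform, 60))    # True
print(is_uniform(with_gap, 60))   # False
```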

Station name and coordinates

Following the CF conventions, some station metadata are stored as separate variables:

  • The name of the station, as a string

  • The coordinates of the station, as three separate float variables with no dimensions (single point)

| Name | Standard name | Unit |
|---|---|---|
| station_name | platform_name | |
| latitude | latitude | degrees_north |
| longitude | longitude | degrees_east |
| elevation | height_above_mean_sea_level | m |

Here is the corresponding CDL:

string station_name ;
    station_name:standard_name = "platform_name" ;
    station_name:long_name = "station name" ;
    station_name:cf_role = "timeseries_id" ;

float latitude ;
    latitude:long_name = "station latitude" ;
    latitude:standard_name = "latitude" ;
    latitude:units = "degrees_north" ;
    latitude:axis = "Y" ;

float longitude ;
    longitude:long_name = "station longitude" ;
    longitude:standard_name = "longitude" ;
    longitude:units = "degrees_east" ;
    longitude:axis = "X" ;

float elevation ;
    elevation:long_name = "Elevation above mean sea level" ;
    elevation:standard_name = "height_above_mean_sea_level" ;
    elevation:units = "m" ;
    elevation:axis = "Z" ;

CRS

An empty variable named crs should be created to store information about the coordinate reference system. It should be referenced by any data variable via the attribute grid_mapping.

double crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:longitude_of_prime_meridian = 0.0 ;
    crs:semi_major_axis = 6378137.0 ;
    crs:inverse_flattening = 298.257223563 ;
    crs:epsg_code = "EPSG:4326" ;

Data variables

Data variables should be one-dimensional along the time axis. Their type should be float or double.

They should declare the following CF attributes:

  • standard_name (mandatory): Used to identify them.

  • units (mandatory): Unit of the values. SI units are preferred.

  • grid_mapping (mandatory): Set to “crs”, the variable defined above.

  • long_name (optional): Name used for display.

  • valid_min_, valid_max_ (optional): Float attributes giving the expected minimum and maximum (used for QC). Note that we do not directly use the CF attributes valid_min and valid_max, since some drivers remove values outside of this range. We want to keep full control over the data, and only use this metadata for flagging values.

  • least_significant_digit (optional): Number of decimal digits to retain. Used by the Python driver at creation time for lossy compression.

  • _FillValue: It is better to set an explicit fill value. We use -999.0, which is a common value.
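
As an illustration of how valid_min_ and valid_max_ are meant to be used, the following Python sketch flags out-of-range values without modifying or removing them. It assumes the values have already been read into a list and that missing data are encoded with the fill value -999.0; the function name out_of_range is hypothetical.

```python
FILL_VALUE = -999.0


def out_of_range(values, valid_min, valid_max, fill_value=FILL_VALUE):
    """Return one boolean per value: True if it lies outside [valid_min, valid_max].

    Fill values mark missing data and are never flagged.
    The input values themselves are left untouched.
    """
    return [v != fill_value and not (valid_min <= v <= valid_max) for v in values]


ghi = [0.0, 850.2, -5.0, -999.0, 3200.0]
flags = out_of_range(ghi, valid_min=0.0, valid_max=3000.0)
print(flags)  # [False, False, True, False, True]
```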

We propose to include the following subset of CF data variables, depending on their availability.

The variable names are suggestions. The standard names and units should be respected. We propose to use SI units when possible.

| Name | standard_name | unit |
|---|---|---|
| GHI | surface_downwelling_shortwave_flux_in_air | W m-2 |
| DHI | surface_diffuse_downwelling_shortwave_flux_in_air | W m-2 |
| BNI | direct_downwelling_shortwave_flux_in_air | W m-2 |
| T2 | air_temperature | K |
| RH | relative_humidity | “1” (ratio) |
| P | air_pressure | Pa |
| WS | wind_speed | m s-1 |
| WD | wind_direction | degrees |

This translates into the following CDL:

float GHI(time) ;
    GHI:long_name = "Global Horizontal Irradiance" ;
    GHI:standard_name = "surface_downwelling_shortwave_flux_in_air" ;
    GHI:abbreviation = "SWD" ;
    GHI:units = "W m-2" ;
    GHI:valid_min_ = 0.0 ;
    GHI:valid_max_ = 3000.0 ;
    GHI:grid_mapping = "crs" ;
    GHI:least_significant_digit = 1 ;
    GHI:_FillValue = -999.0 ;

float DHI(time) ;
    DHI:long_name = "Diffuse horizontal radiation" ;
    DHI:standard_name = "surface_diffuse_downwelling_shortwave_flux_in_air" ;
    DHI:abbreviation = "DHI" ;
    DHI:units = "W m-2" ;
    DHI:valid_min_ = 0.0 ;
    DHI:valid_max_ = 3000.0 ;
    DHI:grid_mapping = "crs" ;
    DHI:least_significant_digit = 1 ;
    DHI:_FillValue = -999.0 ;

float BNI(time) ;
    BNI:long_name = "Beam (or direct) normal radiation" ;
    BNI:standard_name = "direct_downwelling_shortwave_flux_in_air" ;
    BNI:abbreviation = "BNI" ;
    BNI:units = "W m-2" ;
    BNI:valid_min_ = 0.0 ;
    BNI:valid_max_ = 3000.0 ;
    BNI:grid_mapping = "crs" ;
    BNI:least_significant_digit = 1 ;
    BNI:_FillValue = -999.0 ;

float T2(time) ;
    T2:long_name = "Air temperature at 2 m height" ;
    T2:standard_name = "air_temperature" ;
    T2:abbreviation = "T2" ;
    T2:units = "K" ;
    T2:valid_min_ = 123.0 ;
    T2:valid_max_ = 372.9 ;
    T2:grid_mapping = "crs" ;
    T2:least_significant_digit = 1 ;
    T2:_FillValue = -999.0 ;

float RH(time) ;
    RH:long_name = "Relative humidity" ;
    RH:standard_name = "relative_humidity" ;
    RH:abbreviation = "RH" ;
    RH:units = "1" ;
    RH:valid_min_ = 0.0 ;
    RH:valid_max_ = 1.0 ;
    RH:grid_mapping = "crs" ;
    RH:least_significant_digit = 3 ;
    RH:_FillValue = -999.0 ;

float WS(time) ;
    WS:long_name = "Wind speed" ;
    WS:standard_name = "wind_speed" ;
    WS:abbreviation = "windspd" ;
    WS:units = "m s-1" ;
    WS:valid_min_ = 0.0 ;
    WS:valid_max_ = 100.0 ;
    WS:grid_mapping = "crs" ;
    WS:least_significant_digit = 2 ;
    WS:_FillValue = -999.0 ;

float WD(time) ;
    WD:long_name = "Wind direction, clockwise from north" ;
    WD:standard_name = "wind_direction" ;
    WD:abbreviation = "winddir" ;
    WD:units = "degrees" ;
    WD:valid_min_ = 0.0 ;
    WD:valid_max_ = 360.0 ;
    WD:grid_mapping = "crs" ;
    WD:least_significant_digit = 1 ;
    WD:_FillValue = -999.0 ;

float P(time) ;
    P:parameter = "Station pressure" ;
    P:long_name = "air pressure at station height" ;
    P:standard_name = "air_pressure" ;
    P:units = "Pa" ;
    P:valid_min_ = 0.0 ;
    P:valid_max_ = 120000.0 ;
    P:grid_mapping = "crs" ;
    P:least_significant_digit = 0 ;
    P:_FillValue = -999.0 ;
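
Because the units are enforced, raw measurements may need to be converted before they are written. The following Python sketch assumes, purely for illustration, a source network that reports temperature in degrees Celsius, relative humidity in percent and pressure in hPa; the function names are hypothetical.

```python
def celsius_to_kelvin(t_celsius):
    """Air temperature: degrees Celsius to K."""
    return t_celsius + 273.15


def percent_to_ratio(rh_percent):
    """Relative humidity: percent to a ratio between 0 and 1 (units "1")."""
    return rh_percent / 100.0


def hpa_to_pa(p_hpa):
    """Air pressure: hPa to Pa."""
    return p_hpa * 100.0


print(celsius_to_kelvin(20.0))
print(percent_to_ratio(45.0))
print(hpa_to_pa(1013.25))
```

After conversion, the values fall in the ranges given by valid_min_ and valid_max_ above (for example, a relative humidity between 0.0 and 1.0).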

Quality flags

Optionally, we propose to include quality check (QC) flags directly in the NetCDF file, as a bitmap variable.

We follow the recommendations of the CF conventions on flags for encoding and metadata. We use an unsigned int variable named QC, with each bit assigned to a given flag.

Here is the corresponding CDL:

uint QC(time) ;
    QC:long_name = "QC flag status" ;
    QC:comment = "Flag=1 means QC test failed" ;
    QC:coordinates = "time latitude longitude elevation" ;
    QC:flag_masks = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ;
    QC:flag_meanings = "T1C_ppl_GHI T1C_erl_GHI T1C_ppl_DIF T1C_erl_DIF T1C_ppl_DNI T1C_erl_DNI T2C_bsrn_kt T2C_seri_kn_kt T2C_seri_k_kt T3C_bsrn_3cmp tracker_off" ;
    QC:_FillValue = 0 ;

The list of flags is up to the data producer and depends on the usage.

The list of flags currently produced by libinsitu is detailed in a dedicated section.
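
To show how such a bitmap is read, here is a minimal Python sketch that decodes a QC value into the names of the failed tests, using the flag_masks and flag_meanings of the CDL example above. The function names decode_qc and encode_qc are illustrative and not part of libinsitu.

```python
FLAG_MEANINGS = (
    "T1C_ppl_GHI T1C_erl_GHI T1C_ppl_DIF T1C_erl_DIF T1C_ppl_DNI T1C_erl_DNI "
    "T2C_bsrn_kt T2C_seri_kn_kt T2C_seri_k_kt T3C_bsrn_3cmp tracker_off"
).split()

# One bit per flag: 1, 2, 4, ..., 1024
FLAG_MASKS = [1 << i for i in range(len(FLAG_MEANINGS))]


def decode_qc(qc_value):
    """Return the names of the QC tests that failed (bit set to 1)."""
    return [name for name, mask in zip(FLAG_MEANINGS, FLAG_MASKS) if qc_value & mask]


def encode_qc(failed_tests):
    """Build the QC bitmap value from a list of failed test names."""
    value = 0
    for name in failed_tests:
        value |= FLAG_MASKS[FLAG_MEANINGS.index(name)]
    return value


print(decode_qc(5))                # ['T1C_ppl_GHI', 'T1C_ppl_DIF']
print(encode_qc(["tracker_off"]))  # 1024
```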

Global attributes

Here, we propose a list of recommended global metadata providing additional information about the data and the station.

We try to stick as much as possible to the existing CF and ACDD conventions.

Some of these attributes may seem redundant with the contents of some variables. They are useful nonetheless, as it is simpler to fetch all global attributes than to dig into the values of the variables, especially for remote access (OpenDAP).

Main info

| Name | Content | Example |
|---|---|---|
| id | {NetWorkId}-{StationID} | “BSRN-CAP” |
| title | Title of the timeseries | “Timeseries of Baseline Surface Radiation Network (BSRN). Station : Cape Baranova” |
| summary | Short description | “Archive of solar radiation networks worldwide provided by the Webservice-Energy initiative supported by MINES Paris PSL. Files are provided as NetCDF file format with the support of a Thredds Data Server” |
| keywords | List of keywords | “meteorology, station, time, Earth Science > Atmosphere > Atmospheric Radiation > Incoming Solar Radiation, Earth Science > Atmosphere > Atmospheric Temperature > Surface Temperature > Air Temperature, Earth Science > Atmosphere > Atmospheric Pressure > Sea Level Pressure” |
| keywords_vocabulary | “GCMD Science Keywords” | “GCMD Science Keywords” |
| featureType | “timeSeries” | “timeSeries” |
| Conventions | List of conventions | “CF-1.9,ACDD-1.3” |

Publisher info

| Name | Content | Example |
|---|---|---|
| publisher_name | Name of the publisher | “Lionel MENARD, Raphael JOLIVET, Yves-Marie SAINT-DRENAN, Philippe BLANC” |
| publisher_email | Email of publisher | “lionel.menard@mines-paristech.fr, raphael.jolivet@mines-paristech.fr, saint-drenan@mines-paristech.fr, philippe.blanc@mines-paristech.fr” |
| publisher_url | URL of institution | “https://www.oie.minesparis.psl.eu/” |
| publisher_institution | Name of publisher institution | “Mines Paristech - PSL” |

Creator info

Information on the creator of the data, who may or may not be the same as the publisher.

| Name | Content | Example |
|---|---|---|
| creator_name | Name of the maintainer of the data / station | “Olga Sidorova (olsid@aari.ru)” |
| institution | Institution of the creator | NOAA |
| creator_url | URL of the creator / network | https://bsrn.awi.de/ |
| references | Academic references for the data | “https://doi.org/10.5194/essd-10-1491-2018.” |
| license | Link to the license of the data | “https://bsrn.awi.de/data/conditions-of-data-release/” |

Station info

The station information is mapped to ACDD attributes.

| Name | Content | Example |
|---|---|---|
| project | Full name of the network | “Baseline Surface Radiation Network” |
| platform | Full name of the station | “Cape Baranova” |
| geospatial_lat_min | Latitude (float, not str) | 79.27 |
| geospatial_lon_min | Longitude (float, not str) | 101.75 |
| geospatial_lat_max | Latitude (float, not str) | 79.27 |
| geospatial_lon_max | Longitude (float, not str) | 101.75 |
| geospatial_bounds | POINT({Station_Latitude} {Station_Longitude}) | “POINT(79.27 101.75)” |
| geospatial_bounds_crs | Projection | “EPSG:4326” |
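
Since a station is a single point, the minimum and maximum bounds are identical and all geospatial attributes can be derived from one latitude/longitude pair. The following Python sketch builds them as a dictionary; the function name station_attributes is hypothetical, and how the dictionary is then written to the file is left out.

```python
def station_attributes(network_name, station_name, latitude, longitude):
    """Build the station-related ACDD global attributes for a single-point station."""
    lat = float(latitude)   # stored as float, not str
    lon = float(longitude)
    return {
        "project": network_name,
        "platform": station_name,
        "geospatial_lat_min": lat,
        "geospatial_lat_max": lat,
        "geospatial_lon_min": lon,
        "geospatial_lon_max": lon,
        # Latitude first, then longitude, as in the example above
        "geospatial_bounds": f"POINT({lat} {lon})",
        "geospatial_bounds_crs": "EPSG:4326",
    }


attrs = station_attributes(
    "Baseline Surface Radiation Network", "Cape Baranova", 79.27, 101.75
)
print(attrs["geospatial_bounds"])  # POINT(79.27 101.75)
```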

Time information

| Name | Content | Example |
|---|---|---|
| time_coverage_start | First timestamp of data (in ISO 8601 format) | “2016-01-01T00:00:00” |
| time_coverage_end | Last timestamp of data (in ISO 8601 format) | “2016-12-31T23:59:00” |
| time_coverage_resolution | Resolution in ISO 8601:2004 duration format: “PT{minutes}M” | “PT1M” |
| local_time_zone | Local time zone offset | “UTC+07:00” |
| date_created | Creation time | “2021-01-01T00:00:00” |
| date_modified | Modification time | “2021-01-01T00:00:00” |
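
The start and end attributes can be derived from the time variable itself. The following Python sketch assumes the time values are integer seconds since 1970-01-01 UTC, as required for the time variable; the helper names iso_utc and time_coverage are illustrative and not part of libinsitu.

```python
from datetime import datetime, timezone


def iso_utc(epoch_seconds):
    """Format seconds since 1970-01-01 UTC as an ISO 8601 timestamp."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S")


def time_coverage(times):
    """Return the start and end coverage attributes for a sorted time axis."""
    return {
        "time_coverage_start": iso_utc(times[0]),
        "time_coverage_end": iso_utc(times[-1]),
    }


# One full year (2016, a leap year) at 1-minute resolution
start = 1451606400  # 2016-01-01T00:00:00 UTC
times = range(start, start + 366 * 86400, 60)

print(time_coverage(times))
# {'time_coverage_start': '2016-01-01T00:00:00', 'time_coverage_end': '2016-12-31T23:59:00'}
```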

Custom in situ metadata

The following attributes are not part of the CF or ACDD conventions.

They are additional metadata recommended for this specific use case.

IDs

Unique IDs useful for identifying the network and the station.

| Name | Content | Example |
|---|---|---|
| network_id | Short ID of the network | BSRN |
| station_id | Short ID of the station. Same as the content of the station_name variable | CAP |
| station_uid | Numeric ID of the station, if any | 102 |
| station_wmo_id | WMO ID of the station, if any | |

Surface

Description of the surface around the station

| Name | Content | Example |
|---|---|---|
| surface_type | rock, grass, concrete, cultivated, … | “concrete” |
| topography_type | flat, hilly, mountain valley, mountain top, … | |
| rural_urban | “rural” or “urban” | “rural” |

Station location

| Name | Content | Example |
|---|---|---|
| network_region | Region of the network | “Global” |
| station_country | Country of the station | “France” |
| station_address | Address of the station | “100, Erfurterweg (123)” |
| station_city | City of the station | “Carpentras” |

Misc

| Name | Content | Example |
|---|---|---|
| climate | Climate at the station (Köppen-Geiger classification) | “EF” |
| operation_status | ‘active’, ‘inactive’ or ‘closed’ | “closed” |

Distribution of files

We advise distributing the NetCDF files with a THREDDS Data Server (TDS).

The server should be configured to provide at least the following services:

  • File server: HTTP download

  • OpenDAP: Remote data requests

The files should be organized in a regular hierarchy. We advise serving one file per station and grouping the files by network:

  • NetworkA/

    • NetworkA-station1.nc

    • NetworkA-station2.nc

Alternatively, the data can be split into monthly or yearly NetCDF files. In that case, we advise also serving aggregated data for easier requesting over OpenDAP:

  • NetworkA/

    • Station1/

      • station1-aggregated.nc

      • 2018/

        • station1-2018-01.nc

        • station1-2018-02.nc

      • 2019/
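
The two layouts above can be generated programmatically. The following Python sketch builds the relative paths for both cases; the function names are hypothetical, and the capitalized station folder in the monthly layout simply mirrors the listing above rather than a rule stated by this convention.

```python
from pathlib import PurePosixPath


def station_path(network, station):
    """One file per station, e.g. NetworkA/NetworkA-station1.nc"""
    return PurePosixPath(network) / f"{network}-{station}.nc"


def monthly_path(network, station, year, month):
    """Monthly split, e.g. NetworkA/Station1/2018/station1-2018-01.nc"""
    folder = station.capitalize()  # assumption: folder name as in the listing above
    return (
        PurePosixPath(network)
        / folder
        / f"{year:04d}"
        / f"{station}-{year:04d}-{month:02d}.nc"
    )


print(station_path("NetworkA", "station1"))           # NetworkA/NetworkA-station1.nc
print(monthly_path("NetworkA", "station1", 2018, 1))  # NetworkA/Station1/2018/station1-2018-01.nc
```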