ins-transform
ins_transform parses external files of various formats and encodes them into NetCDF.
Transforms In-Situ data into NetCDF files
usage: ins-transform [-h] [--network NETWORK] --station-id <SID>
[--mapping <mapping.json>]
[--station-metadata <station-meta.csv>]
[--network-metadata <network-meta.csv>]
[--cdl <schema.cdl>] [--incremental]
[--strict-resolution] [--no-qc] [--check]
[--status-folder <folder>]
<out.nc> <file|dir> [<file|dir> ...]
Positional Arguments
- <out.nc>
Output file
- <file|dir>
Input files or folders
Named Arguments
- --network, -n
Network name
- --station-id, -s
Station ID
- --mapping, -m
Use a generic parser with a custom mapping. To be used in conjunction with --station-metadata and --network-metadata
- --station-metadata, -sm
Use custom station metadata (Station_* properties) instead of embedded ones.
- --network-metadata, -nm
Use custom metadata for the networks (Network_* properties) instead of the embedded ones.
- --cdl
Use a custom CDL (NetCDF schema)
- --incremental, -i
Incremental mode: skip input files that already have a ‘.done’ status file
Default: False
- --strict-resolution, -sr
Skip chunks having a different resolution
Default: False
- --no-qc
Do not compute QC flags
Default: False
- --check, -c
Check potential override of data
Default: False
- --status-folder, -f
Folder for status files in incremental mode
Converting supported networks
The following command encodes all zip files of the folder in/ABOM/LER into the NetCDF file ABOM-LER.nc.
ins-transform --network ABOM --station-id LER ABOM-LER.nc in/ABOM/LER/*.zip
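When the conversion is driven from a script rather than from a shell, the wildcard has to be expanded explicitly, since no shell does it. A minimal Python sketch of that, assuming `ins-transform` is available on the PATH; the helper `build_transform_command` is hypothetical and only uses the options documented above:

```python
import glob
import shlex


def build_transform_command(network, station_id, out_file, pattern):
    """Build the ins-transform argument list for one station (hypothetical helper)."""
    # Expand the wildcard ourselves and sort for a reproducible order.
    inputs = sorted(glob.glob(pattern))
    return ["ins-transform", "--network", network,
            "--station-id", station_id, out_file, *inputs]


cmd = build_transform_command("ABOM", "LER", "ABOM-LER.nc", "in/ABOM/LER/*.zip")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually run the conversion.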
Converting custom Excel or CSV files
By default, libinsitu embeds decoders for several networks. To convert your own custom files, you need:
- A CDL schema file describing the layout of the output NetCDF file, its variables and its metadata. This file may refer to the placeholders {Station_XXX} and {Network_XXX}, which are replaced by the values found in the station and network CSV files. By default, the embedded CDL is used.
- A CSV file containing the metadata for each station, similar to the embedded ones.
- Optionally, a CSV file containing network metadata, used to replace the {Network_XXX} placeholders.
- A mapping.json (or YAML) file describing the mapping between the columns of the input files and the variables of the output NetCDF file.
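The placeholder mechanism can be pictured as a plain text substitution. The sketch below is not libinsitu's implementation: it assumes that each column of the station CSV is exposed as `Station_<column>`, that the CSV is comma-separated, and that unknown placeholders are left untouched. The function names are hypothetical.

```python
import csv
import io
import re

# Matches placeholders such as {Station_Name} or {Network_ID}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def load_properties(csv_text, prefix, id_column=None, row_id=None):
    """Return {prefix + column: value} for the selected row of a CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if id_column is not None:
        rows = [r for r in rows if r[id_column] == row_id]
    return {prefix + key: value for key, value in rows[0].items()}


def fill_placeholders(template, properties):
    """Replace {Name} by properties[Name]; unknown names are kept as-is."""
    return PLACEHOLDER.sub(
        lambda m: properties.get(m.group(1), m.group(0)), template)


stations_csv = "Name,ID,Latitude,Longitude\nStation AAA,AAA,9.0667,7.4833\n"
props = load_properties(stations_csv, "Station_", id_column="ID", row_id="AAA")
print(fill_placeholders(':platform = "{Station_Name}" ;', props))
# :platform = "Station AAA" ;
```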
Format of the mapping file
The mapping file can be either JSON or YAML. It should follow this format:
{
  "separator" : ";", # [Optional] Separator for CSV files. Default: ","
  "skip_lines": [1, 2, 4], # [Optional] List of header lines to skip, starting at one. Default: None
  "mapping" : { # Actual mapping of variables. Keys are destination variables as found in the CDL schema.
    "time" : "timestamp", # Compact format for time mapping, with a single column name
    # -- OR --
    "time" : { # Expanded mapping for time
      "col" : ["date", "time"], # One or more source columns for time
      "format" : "%Y/%m/%d %H:%M:%S", # [Optional] Format of date and time. Inferred by default
      "timezone" : "CET", # [Optional] UTC by default. Can be "TimezoneName", "+0400" or a placeholder such as "{Station_Timezone}"
    },
    # -- Data var mapping --
    "dest_var1" : "source_col1", # Compact format for mapping
    # -- OR --
    "dest_var2" : { # Expanded mapping for var
      "col" : "source_col", # Name of source column
      "scale" : 100, # [Optional] Scale to apply to source data. 1 by default (no scale)
      "offset" : 12.1, # [Optional] Offset to apply to source data. 0 by default. Offset is applied after scale.
    }
  }
}
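The order of operations in the expanded mapping matters: the offset is applied after the scale, so a source value `x` becomes `x * scale + offset`. A minimal sketch of that arithmetic follows; the function `apply_mapping` is hypothetical and only mirrors the rule stated in the comments above, it is not the library's code. The Celsius-to-kelvin entry is an illustrative assumption.

```python
def apply_mapping(value, mapping):
    """Convert one source value according to a compact or expanded mapping entry."""
    if isinstance(mapping, str):
        # Compact form: the entry is just the column name, no conversion.
        return value
    scale = mapping.get("scale", 1)    # 1 by default (no scale)
    offset = mapping.get("offset", 0)  # 0 by default
    # Scale first, then offset.
    return value * scale + offset


# Hypothetical entry: a column in degrees Celsius stored in a variable in kelvin.
celsius_to_kelvin = {"col": "Temp", "scale": 1, "offset": 273.15}
print(apply_mapping(10.5, celsius_to_kelvin))
```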
Examples of transforming custom files
Here is an example of transforming a custom CSV file into NetCDF:
ins-transform \
--no-qc \
--station-metadata stations.csv \
--cdl schema.cdl \
--mapping mapping.yaml \
--station-id AAA \
AAA.nc input.csv
This command creates the file AAA.nc. If the file already exists, it is updated.
The input files look like this:
mapping.yaml
mapping:
  time:
    col: [Date, Time]
    format: "%Y-%m-%d %H:%M"
    timezone: "-01:00"
  temperature: Temp
separator: ";"
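Since the mapping file may also be JSON, the same mapping can be generated from a Python dictionary. This sketch only restates the content of mapping.yaml; the target file name `mapping.json` is the placeholder name used in the usage line.

```python
import json

mapping = {
    "separator": ";",
    "mapping": {
        "time": {
            "col": ["Date", "Time"],
            "format": "%Y-%m-%d %H:%M",
            "timezone": "-01:00",
        },
        "temperature": "Temp",
    },
}

# Strict JSON carries no comments, unlike the annotated format description.
text = json.dumps(mapping, indent=2)
print(text)
```

Writing `text` to `mapping.json` gives a file that could be passed to `--mapping` in place of the YAML version.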
input.csv
Date | Time | Temp |
---|---|---|
2008-08-01 | 00:05 | 10.5 |
2008-08-01 | 00:10 | 12 |
2008-08-01 | 00:15 | 13 |
2008-08-01 | 00:20 | 14 |
stations.csv
Name | ID | Latitude | Longitude | Elevation | TimeResolution | StartDate |
---|---|---|---|---|---|---|
Station AAA | AAA | 9.0667 | 7.4833 | 536 | 5M | 2008-07-30 |
schema.cdl
This custom schema defines the list of variables and metadata, following CF conventions, with placeholders replaced by values from the file stations.csv.
netcdf base {
dimensions:
    time = UNLIMITED ; # Main dimension
variables:
    # Dummy var, holding CRS data for GIS software
    double crs;
        *:grid_mapping_name = "latitude_longitude" ;
        *:longitude_of_prime_meridian = 0.0 ;
        *:semi_major_axis = 6378137.0 ;
        *:inverse_flattening = 298.257223563 ;
        *:epsg_code = "EPSG:4326";
        *:_FillValue = -999.0;
    # Static scalar value, filled from meta data
    string station_name;
        *:standard_name = "platform_name" ;
        *:long_name = "station_name" ;
        *:cf_role = "timeseries_id" ;
        *:_value = {!Station_ID};
    # Single scalar value, filled from meta data
    float latitude;
        *:long_name = "station latitude" ;
        *:standard_name = "latitude" ;
        *:units = "degrees_north" ;
        *:_CoordinateAxisType = "Lat" ;
        *:_value = {!Station_Latitude};
        *:axis = "Y" ;
    # Single scalar value, filled from meta data
    float longitude;
        *:long_name = "station longitude" ;
        *:standard_name = "longitude" ;
        *:units = "degrees_east" ;
        *:_CoordinateAxisType = "Lon" ;
        *:_value = {!Station_Longitude};
        *:axis = "X" ;
    # Single scalar value, filled from meta data
    float elevation;
        *:long_name = "Elevation above mean seal level" ;
        *:standard_name = "height_above_mean_sea_level" ;
        *:_CoordinateAxisType = "Z" ;
        *:units = "m" ;
        *:_value = {!Station_Elevation};
        *:axis = "Z" ;
    # Time: UTC, uniform, expressed as seconds since epoch.
    uint time(time) ;
        *:long_name = "Time of measurement" ;
        *:standard_name = "time" ;
        *:units = "seconds since 1970-01-01 00:00:00";
        *:time_origin = "1970-01-01 00:00:00" ;
        *:time_zone = "UTC" ;
        *:_CoordinateAxisType = "Time" ;
        *:axis = "T" ;
        *:calendar = "gregorian" ;
    # Single data var
    float temperature(time) ;
        *:long_name = "Air temperature at 2 m height" ;
        *:standard_name = "air_temperature" ;
        *:coordinates = "time latitude longitude elevation " ;
        *:units = "K";
        *:grid_mapping = "crs" ;
        *:least_significant_digit = 1; # Expected precision, for lossy compression
        *:_FillValue = -999.0;
    # Global attributes
    # Main info
    :id = "{Network_ID}-{Station_ID}";
    :title = "Timeseries of {Network_ID}. Station : {Station_Name}" ;
    :keywords_vocabulary = "GCMD Science Keywords" ;
    :keywords_vocabulary_url = "https://gcmd.earthdata.nasa.gov/static/kms/" ;
    :record = "Basic measurements (global irradiance, direct irradiance, diffuse irradiance, air temperature, relative humidity, pressure)" ;
    :featureType = "timeSeries" ;
    :cdm_data_type = "timeSeries";
    :product_version = "libinsitu {Version}" ;
    # Conventions
    :Conventions = "CF-1.10 ACDD-1.3";
    # Publisher [ACDD1.3]
    :publisher_name = "Name of publisher of data";
    :publisher_email = "publisher@email.com";
    :publisher_url = "http://publisher.url" ;
    :publisher_institution = "Publisher institution name" ;
    # Creator info [ACDD1.3]
    :creator_name = "Creator of data" ;
    :institution = "{Station_Institute}" ;
    :metadata_link = "{Station_Url}";
    :creator_email = "{Network_Email}";
    :creator_url = "{Network_URL}" ;
    :references = "http://some.doi" ;
    :license = "{Network_License}" ;
    :comment = "{Station_Comment}" ;
    # Station info & coordinates [ACDD1.3]
    :project = "Network name"; # Network long name
    :platform = "{Station_Name}" ; # Should be a long / full name
    :geospatial_lat_min = {Station_Latitude} ;
    :geospatial_lon_min = {Station_Longitude} ;
    :geospatial_lat_max = {Station_Latitude} ;
    :geospatial_lon_max = {Station_Longitude} ;
    :geospatial_vertical_min = {Station_Elevation};
    :geospatial_vertical_max = {Station_Elevation};
    :geospatial_bounds = "POINT({Station_Latitude} {Station_Longitude})";
    :geospatial_bounds_crs = "EPSG:4326";
    # Time information
    :time_coverage_start = "{Station_StartDate}T00:00:00" ; # First data [Dataset Discovery v1.0]
    :time_coverage_end = "{LastData}"; # Last data [Dataset Discovery v1.0]
    :time_coverage_resolution = "P{Station_TimeResolution}"; # Resolution in ISO 8601:2004 duration format [Dataset Discovery v1.0]
    :local_time_zone = "{Station_Timezone}" ;
    :date_created = "{CreationTime}";
    :date_modified = "{UpdateTime}";
}
The resulting NetCDF file should contain all data and metadata.
Here is the output of ins-cat -t CSV -hd AAA.nc, which dumps NetCDF data and metadata as a CSV file:
# id = MyNetwork-AAA
# title = Timeseries of . Station : Station AAA
# keywords_vocabulary = GCMD Science Keywords
# keywords_vocabulary_url = https://gcmd.earthdata.nasa.gov/static/kms/
# record = Basic measurements (global irradiance, direct irradiance, diffuse irradiance, air temperature, relative humidity, pressure)
# featureType = timeSeries
# cdm_data_type = timeSeries
# product_version = libinsitu unset (local)
# Conventions = CF-1.10 ACDD-1.3
# publisher_name = Name of publisher of data
# publisher_email = publisher@email.com
# publisher_url = http://publisher.url
# publisher_institution = Publisher institution name
# creator_name = Creator of data
# references = http://some.doi
# project = Network name
# platform = Station AAA
# geospatial_lat_min = 9.0667
# geospatial_lon_min = 7.4833
# geospatial_lat_max = 9.0667
# geospatial_lon_max = 7.4833
# geospatial_vertical_min = 536
# geospatial_vertical_max = 536
# geospatial_bounds = POINT(9.0667 7.4833)
# geospatial_bounds_crs = EPSG:4326
# time_coverage_start = 2008-07-30T00:00:00
# time_coverage_resolution = 300
# date_created = 2023-11-28T17:46:03.687062
# date_modified = 2023-11-28T17:46:03.687062
# latitude = 9.0667
# longitude = 7.4833
# elevation = 536.0
# station_name = AAA
# variables:
# crs:
# _FillValue = -999.0
# grid_mapping_name = latitude_longitude
# longitude_of_prime_meridian = 0.0
# semi_major_axis = 6378137.0
# inverse_flattening = 298.257223563
# epsg_code = EPSG:4326
# station_name:
# standard_name = platform_name
# long_name = station_name
# cf_role = timeseries_id
# latitude:
# long_name = station latitude
# standard_name = latitude
# units = degrees_north
# _CoordinateAxisType = Lat
# axis = Y
# longitude:
# long_name = station longitude
# standard_name = longitude
# units = degrees_east
# _CoordinateAxisType = Lon
# axis = X
# elevation:
# long_name = Elevation above mean seal level
# standard_name = height_above_mean_sea_level
# _CoordinateAxisType = Z
# units = m
# axis = Z
# time:
# long_name = Time of measurement
# standard_name = time
# units = seconds since 1970-01-01 00:00:00
# time_origin = 1970-01-01 00:00:00
# time_zone = UTC
# abbreviation = Date/Time
# _CoordinateAxisType = Time
# axis = T
# calendar = gregorian
# temperature:
# _FillValue = -999.0
# least_significant_digit = 1
# long_name = Air temperature at 2 m height
# standard_name = air_temperature
# coordinates = time latitude longitude elevation
# abbreviation = T2
# units = K
# grid_mapping = crs
time,temperature
2008-08-01 01:05:00,10.5
2008-08-01 01:10:00,12.0
2008-08-01 01:15:00,13.0
2008-08-01 01:20:00,14.0
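The timestamps in this dump are one hour later than those of input.csv because the mapping declares the source times with a `-01:00` offset, while the `time` variable is stored in UTC. The conversion can be reproduced with the Python standard library. This is an illustrative sketch, and `to_utc` is a hypothetical helper, not part of libinsitu:

```python
from datetime import datetime, timezone


def to_utc(date, time, fmt="%Y-%m-%d %H:%M", offset="-01:00"):
    """Parse separate date and time columns given at a fixed UTC offset, return UTC."""
    # %z accepts offsets written as "-01:00" (Python 3.7+).
    local = datetime.strptime(f"{date} {time} {offset}", fmt + " %z")
    return local.astimezone(timezone.utc)


utc = to_utc("2008-08-01", "00:05")
print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2008-08-01 01:05:00
# Seconds since 1970-01-01, the unit declared for the time variable.
print(int(utc.timestamp()))
```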