netcdf_to_dataframe

This function takes a local or remote (via OpenDAP) NetCDF file and transforms it into a pandas DataFrame.

libinsitu.netcdf_to_dataframe(ncfile: Union[Dataset, str], start_time: Optional[Union[datetime, datetime64]] = None, end_time: Optional[Union[datetime, datetime64]] = None, rel_start_time=None, rel_end_time=None, drop_duplicates=True, skip_na=False, skip_qc=False, vars=None, user=None, password=None, chunked=False, chunk_size=5000, steps=1, rename_cols=False, expand_qc=False)

Load a NetCDF in-situ file (or part of it) into a pandas DataFrame, with time as index.

Parameters
  • ncfile – NetCDF Dataset or filename, or OpenDAP URL

  • rename_cols – If True, rename solar irradiance columns as per convention (GHI, BNI, DHI). False by default, as in the signature above.

  • drop_duplicates – If True (default), duplicate rows are dropped

  • skip_qc

    If True, filters out rows having any failing QC. False by default (no filter).

    You can also provide a list of flags to filter on: ["T3C_bsrn_3cmp", "T2C_seri_kn_kt"]

    Or filter on all flags except some, by prepending '!': ["!T3C_bsrn_3cmp", "!T2C_seri_kn_kt"]

    For full list of flags, see the [online doc](https://libinsitu.readthedocs.io/en/latest/qc.html)

  • skip_na – If True, drop rows containing only NaN values

  • start_time – Start time (first record by default): datetime or datetime64

  • end_time – End time (last record by default): datetime or datetime64

  • rel_end_time – End time, relative to actual start time : relativedelta

  • rel_start_time – Start time, relative to actual end time : relativedelta

  • vars – List of columns names to convert (all by default)

  • user – Optional login for OpenDAP URL

  • password – Optional password for OpenDAP URL

  • chunked – If True, does not load the whole file into memory at once: returns an iterator over DataFrame chunks instead.

  • chunk_size – Size of chunks for chunked data

  • steps – Downsampling (1 by default)

  • expand_qc – If True, expand the QC bitmaps into one boolean column for each flag with name “QC.<flag>”
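The skip_qc and expand_qc descriptions above can be illustrated with a small, library-free sketch. It is not libinsitu's implementation: the bit positions in FLAG_BITS, the extra flag name "example_flag", and the convention that a set bit means a failed test are all assumptions made for illustration. Lists mixing plain and '!'-prefixed names are not described in this section, so the sketch only handles the cases that are.

```python
# Hypothetical bit layout: the real flag-to-bit mapping is defined by libinsitu
# and is not given in this section. "example_flag" is an invented name.
FLAG_BITS = {"T3C_bsrn_3cmp": 0, "T2C_seri_kn_kt": 1, "example_flag": 2}


def expand_qc(bitmap):
    """Expand one integer QC bitmap into {"QC.<flag>": bool}.

    Assumption: a set bit means the corresponding QC test failed.
    """
    return {f"QC.{flag}": bool((bitmap >> bit) & 1) for flag, bit in FLAG_BITS.items()}


def selected_flags(skip_qc):
    """Resolve a skip_qc value into the set of flags used for filtering."""
    if skip_qc is True:
        return set(FLAG_BITS)      # any failing flag drops the row
    if not skip_qc:
        return set()               # False / None / empty list: no filtering
    excluded = {name[1:] for name in skip_qc if name.startswith("!")}
    if excluded:
        return set(FLAG_BITS) - excluded   # all flags except the '!' ones
    return set(skip_qc)            # only the listed flags


def keep_row(bitmap, skip_qc):
    """Return True if the row passes every selected QC flag."""
    expanded = expand_qc(bitmap)
    return not any(expanded[f"QC.{flag}"] for flag in selected_flags(skip_qc))


# Three rows: no failure, T3C_bsrn_3cmp failing, example_flag failing.
rows = [0b000, 0b001, 0b100]
# Filter on every flag except T3C_bsrn_3cmp.
kept = [bitmap for bitmap in rows if keep_row(bitmap, ["!T3C_bsrn_3cmp"])]
```

In this sketch the second row survives because its only failing flag is excluded by the '!' prefix, while the third row is dropped.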

Returns

pandas DataFrame, or an iterator over DataFrames if chunked is True
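When chunked=True, the result is consumed chunk by chunk rather than as one DataFrame. The sketch below shows one way to aggregate over such an iterator without holding all rows in memory. The helper column_mean and the stand-in generator fake_chunks are hypothetical; the stand-in yields plain dicts of lists so the sketch runs without a NetCDF file, and the same loop is assumed to work on DataFrame chunks because it only indexes a column and iterates over its values.

```python
import math


def column_mean(chunks, column):
    """Mean of one column across an iterator of chunks, ignoring NaN values."""
    total, count = 0.0, 0
    for chunk in chunks:
        # chunk[column] may be a pandas Series or, as here, a plain list
        for value in chunk[column]:
            if value is not None and not math.isnan(value):
                total += value
                count += 1
    return total / count if count else math.nan


def fake_chunks():
    """Hypothetical stand-in for netcdf_to_dataframe(..., chunked=True)."""
    yield {"GHI": [100.0, 200.0]}
    yield {"GHI": [float("nan"), 300.0]}


mean_ghi = column_mean(fake_chunks(), "GHI")

# With a real file, the call would presumably look like:
# column_mean(netcdf_to_dataframe("station.nc", chunked=True, chunk_size=5000), "GHI")
```

Here "station.nc" is a placeholder path, not a file referenced by this documentation.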

Example

from datetime import datetime

from libinsitu import netcdf_to_dataframe

# Fetch data between 2020-01-01 and 2021-03-01 over the network (OpenDAP), for 3 variables
df = netcdf_to_dataframe(
    "http://tds.webservice-energy.org/thredds/dodsC/nrelmidc-stations/NREL_MIDC-BMS.nc",
    start_time=datetime(2020, 1, 1),
    end_time=datetime(2021, 3, 1),
    vars=["GHI", "BNI", "DHI"])