netcdf_to_dataframe
This function takes a local or remote (via OpenDAP) NetCDF file and transforms it into a pandas DataFrame.
- libinsitu.netcdf_to_dataframe(ncfile: Union[Dataset, str], start_time: Optional[Union[datetime, datetime64]] = None, end_time: Optional[Union[datetime, datetime64]] = None, rel_start_time=None, rel_end_time=None, drop_duplicates=True, skip_na=False, skip_qc=False, vars=None, user=None, password=None, chunked=False, chunk_size=5000, steps=1, rename_cols=False, expand_qc=False)
Load a NetCDF in-situ file (or part of it) into a pandas DataFrame, with time as index.
- Parameters
ncfile – NetCDF Dataset or filename, or OpenDAP URL
rename_cols – If True, rename solar irradiance columns as per convention (GHI, BNI, DHI). False by default, per the signature above.
drop_duplicates – If True (default), duplicate rows are dropped
skip_qc –
If True, filters out rows having any failing QC flag. False by default (no filter).
You can also provide a list of flags to filter on: ["T3C_bsrn_3cmp", "T2C_seri_kn_kt"]
Or filter on all flags except some, by prepending '!': ["!T3C_bsrn_3cmp", "!T2C_seri_kn_kt"]
For full list of flags, see the [online doc](https://libinsitu.readthedocs.io/en/latest/qc.html)
skip_na – If True, drop rows containing only NaN values
start_time – Start time (first record by default): datetime or datetime64
end_time – End time (last record by default): datetime or datetime64
rel_end_time – End time, relative to actual start time: relativedelta
rel_start_time – Start time, relative to actual end time: relativedelta
vars – List of column names to convert (all by default)
user – Optional login for OpenDAP URL
password – Optional password for OpenDAP URL
chunked – If True, does not load the whole file in memory at once: returns an iterator over DataFrame chunks.
chunk_size – Size of chunks for chunked data (5000 by default)
steps – Downsampling (1 by default)
expand_qc – If True, expand the QC bitmaps into one boolean column for each flag with name “QC.<flag>”
- Returns
pandas DataFrame, or an iterator over DataFrames if chunking is activated
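When chunked=True, the result has to be consumed chunk by chunk. The sketch below shows one way to aggregate over such an iterator without holding all rows in memory. The helper name mean_over_chunks is hypothetical (it is not part of libinsitu), and the chunks are synthetic stand-ins so the snippet runs without a NetCDF file.

```python
import pandas as pd


def mean_over_chunks(chunks, column):
    """Mean of one column across an iterator of DataFrame chunks, ignoring NaN.

    Hypothetical helper for illustration; not part of libinsitu.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        total += chunk[column].sum()    # Series.sum() skips NaN by default
        count += chunk[column].count()  # Series.count() counts non-NaN values only
    return total / count if count else float("nan")


# Synthetic stand-in for the iterator returned when chunked=True
fake_chunks = iter([
    pd.DataFrame({"GHI": [100.0, 200.0]}),
    pd.DataFrame({"GHI": [300.0, None]}),
])
mean_ghi = mean_over_chunks(fake_chunks, "GHI")
```

With a real file, the iterator would presumably come from a call such as netcdf_to_dataframe(ncfile, chunked=True, chunk_size=5000), passed in place of fake_chunks.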
Example
from datetime import datetime
from libinsitu import netcdf_to_dataframe

# Fetch data from 2020-01-01 to 2021-03-01 over the network (OpenDAP), for 3 variables
df = netcdf_to_dataframe(
    "http://tds.webservice-energy.org/thredds/dodsC/nrelmidc-stations/NREL_MIDC-BMS.nc",
    start_time=datetime(2020, 1, 1),
    end_time=datetime(2021, 3, 1),
    vars=["GHI", "BNI", "DHI"])
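As a variant of the example above, the following sketch requests a window with rel_end_time instead of an absolute end_time, and adds a QC filter. It rests on two assumptions taken from the parameter descriptions: that the effective end is the start time plus the relativedelta, and that a '!' prefix excludes a flag from the filter. The names load_window and exclude_flags are hypothetical helpers, not part of libinsitu.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

start = datetime(2020, 1, 1)
window = relativedelta(years=1)

# Assumed interpretation of rel_end_time: end = start + window
effective_end = start + window


def exclude_flags(flags):
    """Build a skip_qc list meaning "filter on every flag except these"."""
    return ["!" + flag for flag in flags]


qc_filter = exclude_flags(["T3C_bsrn_3cmp"])


def load_window(source):
    """Load GHI/BNI/DHI for the window above (requires libinsitu)."""
    # Imported lazily so the rest of the snippet runs without libinsitu installed
    from libinsitu import netcdf_to_dataframe

    return netcdf_to_dataframe(
        source,
        start_time=start,
        rel_end_time=window,
        skip_qc=qc_filter,
        vars=["GHI", "BNI", "DHI"],
    )
```

Calling load_window with a local path or an OpenDAP URL would then return the DataFrame; whether the end bound is inclusive is not stated in the parameter descriptions, so it is worth checking the index of the result.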