Preparing simulation files



Submission of simulations

  • Simulations are submitted to a dedicated file system on a central server located at DKRZ Hamburg. To access this server, you will need an account; please see Accessing ISIMIP data on the DKRZ server for more information.
  • You will submit your simulations to the upload folder at DKRZ: /work/bb0820/ISIMIP/<simulation-round>/UploadArea/<sector>/<model>/_tmp/ . Within it you will find subfolders corresponding to the time periods of the simulation round. Be sure to place your files in the correct folders. Incorrect uploads can delay the processing of your files.
  • Before uploading your files, please review the protocol corresponding to your simulation round. Changes to the protocols are documented on our website for ISIMIP2b and ISIMIP2a.
  • Please comply precisely with the formatting specified below to facilitate comparison among different models and between global and regional scale. Incorrect formatting can delay the analysis of your files.
  • Before submitting complete simulations, please upload a single example file for checking and send a quick email to isimip-data@pik-potsdam.de to let us know - this could save much time and bandwidth!
  • You can download an ISIMIP2b example NetCDF file to help you with your file preparation.

Working with NetCDF files

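Files should be provided in compressed NetCDF format, a self-describing, machine-independent data format that supports the creation, access, and sharing of array-oriented scientific data. It can be read, written and processed with, for example, CDO, NCO and the NetCDF command-line utilities (ncdump, nccopy), all of which appear in the commands further below.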

File names

Within every protocol there is a section dedicated to the conventions on file naming, which applies to all sectors.

File names consist of a series of identifiers, separated by underscores. Identifiers depend on the simulation round and time interval, and may depend on the sector (see details below).

Things to note:

  • Report one variable per file (there are a few exceptions, see below)
  • In file names, use lowercase letters only
  • Use underscores (“_”) to separate identifiers
  • Use hyphens (“-”) to separate strings within an identifier, e.g. in a model's or a variable's name
  • The NetCDF file extension is .nc4 (for simulations belonging to ISIMIP3, the extension is .nc)
  • Use only the combinations of scenarios specified in the protocols
  • Start and end years should correspond to the reporting period, the time interval, and whether the simulation is global or local

In general, file names should follow this convention:

<model-name>_<gcm/observations>_<bias-correction>_<climate-scenario>_<socio-econ-scenario>_<sens-scenarios>_<variable>_<region>_<timestep>_<start-year>_<end-year>.nc4

Here you'll find example file names for the biomes and global water sector.
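
The naming convention above can be sketched as a small helper that joins the identifiers and rejects values that break the lowercase and separator rules. This is only an illustration: the function name build_filename, the keyword names, the regular expression and the example values are assumptions made here, and the protocol of your simulation round remains the reference for valid identifiers.

```python
import re

# Order of identifiers as given in the general naming convention.
FIELDS = (
    "model", "gcm", "bias_correction", "climate_scenario",
    "soc_scenario", "sens_scenario", "variable", "region",
    "timestep", "start_year", "end_year",
)

# Assumption: identifiers consist of lowercase letters and digits,
# optionally joined by hyphens; underscores are reserved as separators.
_IDENTIFIER = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def build_filename(extension=".nc4", **identifiers):
    """Join the identifiers with underscores after checking each one."""
    parts = []
    for field in FIELDS:
        value = str(identifiers[field])
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(
                f"{field}={value!r}: use lowercase letters, digits and hyphens only"
            )
        parts.append(value)
    if int(identifiers["start_year"]) > int(identifiers["end_year"]):
        raise ValueError("start year must not be later than end year")
    return "_".join(parts) + extension


# Illustrative values only; check the protocol for the valid ones.
example = dict(
    model="mymodel", gcm="gfdl-esm2m", bias_correction="ewembi",
    climate_scenario="rcp26", soc_scenario="2005soc", sens_scenario="co2",
    variable="qtot", region="global", timestep="daily",
    start_year=2006, end_year=2010,
)
print(build_filename(**example))
```

For ISIMIP3 files, the same helper could be called with extension=".nc".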

File formats and meta data

Your submitted files should follow these general formatting specifications (you'll find examples of NetCDF headers below):

  • Format:
    • gridded NetCDF4 classic, internally zipped (check with command cdo showformat FILE); if internal compression is not possible, deliver uncompressed files
  • Grid:
    • global grid ranges from 89.75 to -89.75° latitude, and from -179.75 to 179.75° longitude, i.e. 0.5° grid spacing, 360 rows and 720 columns, or 259200 grid cells in total (corresponding to the resolution of the climate input data)
      • please report the output data row-wise starting at 89.75 and -179.75, and ending at -89.75 and 179.75
      • reporting intervals are 0.5 degrees_east for longitude, and -0.5 degrees_north for latitude
      • submitting data at lower resolution than 0.5x0.5 degrees is only encouraged in exceptional cases
    • regional data has specific grid requirements (see below)
    • cdo gridtype should be lonlat (not generic)
    • gridpoints you do not simulate should be filled with the missing_value and _FillValue marker (1.e+20f)
  • Variables and dimensions:
    • every dimension should have an associated variable
    • variable precision is float
    • precision of lon, lat and time, and secondary variables such as depth (see below), should be double
    • internal names of dimensions and variables all lowercase
    • for internal name of dimensions (coordinate variables), standard_name, long_name and unit, follow conventions given in table below
  • Time axis:
    • relative time axis with reference time "1661-01-01, 00:00:00" for ISIMIP2b files, and "1901-01-01, 00:00:00" for ISIMIP2a files
    • time increment according to temporal resolution of the data, e.g. “[days|months|years] since [reference time]”
    • specify the calendar used (the preferred calendar is proleptic_gregorian) and, if needed, explain how leap days are handled, e.g. calendar = "proleptic_gregorian" (see valid calendar options)
    • start and end dates according to time interval and sector (see details below)
  • Chunk sizes of variable (check with command ncdump -hs FILE):
    • for global 2d data:
      variable:_ChunkSizes = 1, 360, 720 ;
    • for global 3d data (in this case, assuming 10 vertical layers):
      variable:_ChunkSizes = 1, 10, 360, 720 ;
    • Note: NetCDF4 internally chunks the data into subsets. Usually one chunk is defined by a record, i.e. a combination of one horizontal field at one time step and one vertical layer. Chunking the data differently makes operations on the data far more time-consuming.
  • Global attributes:
    • add and fill institution and contact as global attributes
    • add global comments as you see fit with additional information about the run; the preferred name for this global attribute is comment (avoid the attribute "description"; it is used during ESGF publication)
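
As a rough illustration of the grid layout described in the list above, the following sketch generates the cell-centre coordinates of the global 0.5° grid and the row-wise position of a cell. The function names global_grid and cell_index are hypothetical and only serve this example.

```python
SPACING = 0.5  # grid spacing in degrees, as required for global data


def global_grid(spacing=SPACING):
    """Return cell-centre latitudes (north to south) and longitudes (west to east)."""
    n_lat = round(180 / spacing)
    n_lon = round(360 / spacing)
    lat = [90 - spacing / 2 - row * spacing for row in range(n_lat)]
    lon = [-180 + spacing / 2 + col * spacing for col in range(n_lon)]
    return lat, lon


def cell_index(lat_value, lon_value, spacing=SPACING):
    """Row-wise position of a cell centre, counting from 0 at 89.75, -179.75."""
    n_lon = round(360 / spacing)
    row = round((90 - spacing / 2 - lat_value) / spacing)
    col = round((lon_value + 180 - spacing / 2) / spacing)
    return row * n_lon + col


lat, lon = global_grid()
print(len(lat), len(lon), len(lat) * len(lon))
print(lat[0], lat[-1], lon[0], lon[-1])
```

Latitudes decrease by 0.5 per row and longitudes increase by 0.5 per column, which matches the reporting intervals of -0.5 degrees_north and 0.5 degrees_east.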

Note 1: Regional data and variables with (varying) depth layers have additional specifications, which you'll find in subsections below.

Note 2: If your file does not have correct chunk sizes, or if you encounter any other issue with the formatting of your files, you may find a solution among the quick fixes listed under Quality check of your simulation data.

Dimensions

Conventions for the internal name of dimensions (coordinate variables), standard_name, long_name and unit:

Axis   Internal name   standard_name   long_name        units
X      lon             longitude       longitude        degrees_east
Y      lat             latitude        latitude         degrees_north
Time   time            time            [no long name]   [days/months/years] since [reference date+time]
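
To make the relative time axis concrete, here is a minimal sketch of how a calendar date could be converted into a time value relative to the reference time. It relies on Python's datetime module, which uses the proleptic Gregorian calendar; the function names time_value and units_attribute are assumptions of this example, and the default reference date is the ISIMIP2b one (use 1901-01-01 for ISIMIP2a).

```python
from datetime import date

ISIMIP2B_REFERENCE = date(1661, 1, 1)


def time_value(when, unit, reference=ISIMIP2B_REFERENCE):
    """Offset of `when` from the reference date in days, months or years."""
    if unit == "days":
        return (when - reference).days
    if unit == "months":
        return (when.year - reference.year) * 12 + (when.month - reference.month)
    if unit == "years":
        return when.year - reference.year
    raise ValueError(f"unsupported unit: {unit!r}")


def units_attribute(unit, reference=ISIMIP2B_REFERENCE):
    """Text for the units attribute of the time variable."""
    return f"{unit} since {reference.isoformat()} 00:00:00"


print(units_attribute("days"))
print(time_value(date(1861, 1, 1), "days"))
print(time_value(date(1861, 1, 1), "months"))
```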

Time Interval

  • Non-daily and/or non-global output should be reported as single files per experiment and climate scenario, covering the entire time period of the simulation round
  • For daily global model output only, split data into decadal chunks starting with year one of the decade and ending with year zero of the next decade, or the first or last year of the climate scenario period. Files containing a single year have both identifiers filled with the same year. Examples: 1861_1870, 2001_2005, 2006_2010, 2091_2099, 2100_2100, 2101_2110
  • Be sure to place your files in the subfolders of the upload area (.../_tmp) for the corresponding time periods; in ISIMIP2b, for example:
    subfolder         time period
    pre-industrial    1661-1860
    historical        1861-2005
    future            2006-2099
    future_extended   2100-2299
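
The decadal splitting rule for daily global output can be sketched as follows. The function name decadal_chunks is hypothetical; the logic simply ends each chunk at the next year ending in zero, or at the last year of the period, whichever comes first.

```python
def decadal_chunks(first_year, last_year):
    """Split a period into (start, end) pairs that end on years ending in zero."""
    chunks = []
    start = first_year
    while start <= last_year:
        # Next year ending in zero at or after `start`, capped at the period end.
        end = min(((start + 9) // 10) * 10, last_year)
        chunks.append((start, end))
        start = end + 1
    return chunks


for first, last in [(1861, 2005), (2006, 2099), (2100, 2299)]:
    chunks = decadal_chunks(first, last)
    print(chunks[0], chunks[1], "...", chunks[-1])
```

Each pair maps directly onto the <start-year>_<end-year> identifiers of a file name, e.g. (2100, 2100) becomes 2100_2100.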

NetCDF header

A proper NetCDF header for daily global data shown with ncdump -h FILE should look like this:

dimensions:
   lon = 720 ;
   lat = 360 ;
   time = UNLIMITED ;
variables:
   double lon(lon) ;
       lon:standard_name = "longitude" ;
       lon:long_name = "longitude" ;
       lon:units = "degrees_east" ;
       lon:axis = "X" ;

   double lat(lat) ;
       lat:standard_name = "latitude" ;
       lat:long_name = "latitude" ;
       lat:units = "degrees_north" ;
       lat:axis = "Y" ;

   double time(time) ;
       time:standard_name = "time" ;
       time:units = "days since 1661-01-01 00:00:00" ;
       time:calendar = "proleptic_gregorian" ;

   float tas(time, lat, lon) ;
       tas:_FillValue = 1.e+20f ;
       tas:missing_value = 1.e+20f ;
       tas:units = "K" ;
       tas:standard_name = "air_temperature" ;
       tas:long_name = "Near-Surface Air Temperature" ;

// global attributes:
       :contact = "ISIMIP Coordination Team <info@isimip.org>";
       :institution = "Potsdam-Institute for Climate Impact Research (PIK)";
       :comment = "Data prepared for ISIMIP2b" ;

Requirements for regional data

  • regional model teams should interpolate their output data to the same, common 0.5x0.5° grid and submit only the region of interest
  • single grid cell (one-point) time series have to be embedded in a 1x1 point grid with properly set coordinates

Requirements for variables with fixed levels

For variables with fixed levels (e.g. layers whose depths change neither over time nor over space), we require the following:

  • The simulated variable contains levels and each level indicates the midpoint data (e.g. the midpoint of a layer's depth or dbh class)
  • The simulated variable depends on a level dimension as well; e.g. soilmoist(time, lev, lat, lon)
  • The level dimension has a specific name per sector: levlak for the lakes sector, lev for the water regional sector (some models have named it differently, e.g. depth, solay or soil; but please try to stick to the convention), or ‘dbh_class’ for forest models
  • The level dimension is double
  • The specifications (e.g. depth) of every level should be indicated either in units (see NetCDF header example below), or online within a model's documentation, or within the NetCDF file as a comment within the level dimension's attributes
  • If you want to introduce lower and upper boundaries to every level, you should also introduce variable depth_bnds, and in this case the following applies:
    • depth_bnds is double
    • depth_bnds depends on the level dimension and on index bnds; e.g. depth_bnds(lev, bnds)
    • bnds contains two entries, and what these indicate should be specified either in units, or in a comment attribute of variable depth_bnds; e.g. depth_bnds:comment = "bnds=0 for the top of the layer, and bnds=1 for the bottom of the layer"
  • To reduce the size of these files, store them in netcdf4_classic format and compress the data with zip level 9 (try command: ncks -7 -L 9 IFILE OFILE). For more info, check http://nco.sourceforge.net/nco.html#Compression and http://nco.sourceforge.net/nco.html#File-Formats-and-Conversion
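
As an illustration of how level midpoints relate to the optional boundaries, the sketch below derives both from a list of layer thicknesses. The function name and the thickness values are made up for this example; your model's own layer definition applies.

```python
def layer_bounds_and_midpoints(thicknesses):
    """Return (top, bottom) pairs and midpoints for layers stacked from the surface down."""
    bounds = []
    midpoints = []
    top = 0.0
    for thickness in thicknesses:
        bottom = top + thickness
        bounds.append((top, bottom))            # would go into depth_bnds(lev, bnds)
        midpoints.append((top + bottom) / 2)    # would go into the level coordinate
        top = bottom
    return bounds, midpoints


# Hypothetical layer thicknesses in metres.
bounds, midpoints = layer_bounds_and_midpoints([0.5, 0.5, 1.0, 2.0])
print(bounds)
print(midpoints)
```

Here index 0 of each pair is the top of the layer and index 1 the bottom, in line with the depth_bnds comment suggested above.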

Specific uses

  • For variables that may or may not have layers (e.g. variable “harv” in Forest Models), add the global attribute dbhclass_profile and set it to "true" if the file contains layers (e.g. multiple dbh classes) or to "false" otherwise

NetCDF header

A proper NetCDF header for monthly global data of a variable with fixed depth layers from the water global sector, with layers' depths specified in units, shown with ncdump -h FILE should look like this:

dimensions:
	lon = 720 ;
	lat = 360 ;
	time = UNLIMITED ; // (2400 currently)
	depth = 5 ;
	bnds = 2 ;

variables:
	double lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:_Storage = "chunked" ;
		lon:_ChunkSizes = 720 ;
		lon:_DeflateLevel = 5 ;

	double lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:_Storage = "chunked" ;
		lat:_ChunkSizes = 360 ;
		lat:_DeflateLevel = 5 ;

	double time(time) ;
		time:units = "months since 1661-01-01" ;
		time:calendar = "365_day" ;
		time:_Storage = "chunked" ;
		time:_ChunkSizes = 1 ;
		time:_DeflateLevel = 5 ;

	double depth(depth) ;
		depth:units = "m" ;
		depth:bounds = "depth_bnds" ;
		depth:long_name = "depth_below_land" ;
		depth:positive = "down" ;
		depth:axis = "Z" ;
		depth:_Storage = "chunked" ;
		depth:_ChunkSizes = 5 ;
		depth:_DeflateLevel = 5 ;

	double depth_bnds(depth, bnds) ;
		depth_bnds:units = "m" ;
		depth_bnds:_Storage = "chunked" ;
		depth_bnds:_ChunkSizes = 5, 2 ;
		depth_bnds:_DeflateLevel = 5 ;

	float soilmoist(time, depth, lat, lon) ;
		soilmoist:short_field_name = "soilmoist" ;
		soilmoist:long_field_name = "soil moisture" ;
		soilmoist:long_name = "soil moisture" ;
		soilmoist:units = "kg m-2" ;
		soilmoist:_FillValue = 1.e+20f ;
		soilmoist:missing_value = 1.e+20f ;
		soilmoist:_Storage = "chunked" ;
		soilmoist:_ChunkSizes = 1, 5, 360, 720 ;
		soilmoist:_DeflateLevel = 5 ;

// global attributes:
       :contact = "Name <email@place.com>" ;
       :institution = "Institution of Affiliation (ACRONYM)" ;
       :comment = "Data prepared for ISIMIP2b" ;

Requirements for variables with varying levels

For variables with levels that vary over time and/or over space (e.g. layers that can get deeper or shallower over time, or have different depths at different locations), we request the following additional attributes:

  • https://www.unidata.ucar.edu/software/netcdf/coords/0093.html
  • The simulated variable contains a fixed number of levels, and an attribute of these levels varies over time (e.g. varying depth of layers)
  • In every level, the data for the simulated variable indicates the midpoint data (e.g. the midpoint of a layer's depth)
  • The simulated variable depends on a level-attribute variable; e.g. if depth is the varying attribute (i.e. the level-attribute variable) of soil layers in variable soilmoist, then we would have soilmoist(time, depth, lat, lon)
  • The level-attribute variable is double
  • The level-attribute variable depends on a level dimension, with a specific name per sector; e.g. in the lakes sector, where lev is the level dimension, we would have depth(time, lev, lat, lon)
  • The level dimension has a specific name per sector: levlak for the lakes sector and lev for the water regional sector (some models have named it differently, e.g. depth, solay or soil; but please try to stick to the convention)
  • If you want to introduce lower and upper boundaries to every level, you should also introduce an additional boundaries variable (e.g. depth_bnds), and in this case the following applies:
    • The lower and upper boundaries of every layer are specified within the boundaries variable
    • The level-attribute variable and the boundaries variable are double
    • The boundaries variable depends on the level dimension and on index bnds; e.g. depth_bnds(bnds, time, lev, lat, lon)
    • What bnds indicates should be specified in a comment attribute of the boundaries variable; e.g. depth_bnds:comment = "bnds=0 for the top of the layer, and bnds=1 for the bottom of the layer"

Specific uses

  • For variables where depth of layers varies over time, add global attribute time_varying_soil_layer_depth and use label "true" or "false" depending on the case
  • For variables where depth of layers varies per grid cell, add global attribute location_varying_soil_layer_depth and use label "true" or "false" depending on the case

NetCDF header

A proper NetCDF header for daily global data of a variable with time-varying depth layers from the water global sector, shown with ncdump -h FILE, should look like this:

dimensions:
   time = UNLIMITED ;
   lon = 720 ;
   lat = 360 ;
   lev = 13 ;
   bnds = 2 ;

variables:
   double time(time) ;
       time:standard_name = "time" ;
       time:units = "days since 1661-01-01 00:00:00" ;
       time:calendar = "proleptic_gregorian" ;

   double lon(lon) ;
       lon:standard_name = "longitude" ;
       lon:long_name = "longitude" ;
       lon:units = "degrees_east" ;
       lon:axis = "X" ;

   double lat(lat) ;
       lat:standard_name = "latitude" ;
       lat:long_name = "latitude" ;
       lat:units = "degrees_north" ;
       lat:axis = "Y" ;

   double lev(lev) ;
       lev:standard_name = "level" ;
       lev:long_name = "level of vertical soil layers" ;
       lev:units = "1" ;
       lev:axis = "Z" ;
       lev:positive = "down" ;
       lev:_Storage = "contiguous" ;
       lev:_Endianness = "little" ;

   double depth(time, lev, lat, lon) ;
       depth:_FillValue = 1.e+20 ;
       depth:standard_name = "depth" ;
       depth:long_name = "depth of layer middle below surface" ;
       depth:units = "m" ;
       depth:positive = "down" ;
       depth:bounds = "depth_bnds" ;
       depth:_Storage = "chunked" ;
       depth:_ChunkSizes = 1, 13, 360, 720 ;
       depth:_DeflateLevel = 9 ;
       depth:_Shuffle = "true" ;
       depth:_Endianness = "little" ;

   double depth_bnds(bnds, time, lev, lat, lon) ;
       depth_bnds:_FillValue = 1.e+20 ;
       depth_bnds:standard_name = "depth_bounds" ;
       depth_bnds:long_name = "depth of layer\'s top and bottom below surface" ;
       depth_bnds:units = "m" ;
       depth_bnds:positive = "down" ;
       depth_bnds:comment = "bnds=0 for the top of the layer, and bnds=1 for the bottom of the layer" ;
       depth_bnds:_Storage = "chunked" ;
       depth_bnds:_ChunkSizes = 2, 1, 13, 360, 720 ;
       depth_bnds:_DeflateLevel = 9 ;
       depth_bnds:_Shuffle = "true" ;
       depth_bnds:_Endianness = "little" ;

   float soilmoist(time, lev, lat, lon) ;
       soilmoist:_FillValue = 1.e+20f ;
       soilmoist:long_name = "Soil moist" ;
       soilmoist:units = "kg m-2" ;
       soilmoist:missing_value = 1.e+20f ;
       soilmoist:_Storage = "chunked" ;
       soilmoist:_ChunkSizes = 1, 13, 360, 720 ;

// global attributes:
       :contact = "Name <email@place.com>" ;
       :institution = "Institution of Affiliation (ACRONYM)" ;
       :comment = "Data prepared for ISIMIP2b" ;
       :time_varying_soil_layer_depth = "true" ;
       :location_varying_soil_layer_depth = "false" ;

Quality check of your simulation data

All data uploaded will be quality checked by the ISIMIP data managers. The goal is to have all runs available with a consistent set of variables.

If you are not sure about something, you can upload just a small subset to _tmp/ below your model folder and let us know at isimip-data@pik-potsdam.de.

On request we offer consistency checks for your model.

Checking process

  • Files in UploadArea will automatically get moved to a private folder 24 hrs after upload if they meet a basic naming scheme. Please refrain from re-uploading identical files that have already been copied to the DKRZ.
  • Checks identify severe and fixable errors, correct files without severe errors and produce a report of the checks. Summary of checks:
    • NetCDF format: NetCDF4 Classic ZIP and .nc4 extension
    • File name: lower case, correct model and scenario names (severe), gcm/climate_driver/social_scenario/co2_scenario combinations according to protocol
    • Variable: same file name and NetCDF variable name (severe), units according to protocol, only one variable per submitted file (severe; with some exceptions)
    • Grid: names and units of grid coordinates, grid type, grid increments, grid mask (severe), missing_value, _FillValue, _ChunkSizes (only for global data)
    • Time axis: name of time dimension (severe), time units, time increment (severe), reference date, number of time steps (severe), start and end year in agreement with simulation period and file name (severe)
    • Global metadata: contact and institution, and metadata for variables with varying depth layers (when it applies)
  • Log files are written to the DKRZ folder: /work/bb0820/ISIMIP/[SIMULATION-ROUND]/UploadArea/[SECTOR]/[MODEL]/_qc_reports/ . Non-severe issues (those without “!!!”) will be fixed by the data managers. All others will need your assistance. Please check this folder on a regular basis and get in touch with Iliusi Vega or Matthias Büchner to discuss fixes on the files.
  • We check every gcm/climate_driver/social_scenario/co2_scenario combination that has been uploaded and has passed the format checks. The check internally generates a list of all variables provided, then goes through all the combinations found and verifies that each of those variables is present for every combination.
  • Files that successfully pass the Quality Check will appear in the folder /work/bb0820/ISIMIP/[SIMULATION-ROUND]/OutputData, where they are available for analysis by ISIMIP participants.
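
The completeness check across scenario combinations can be pictured with the following sketch. It is not the data managers' actual tool; the function name missing_variables, the input layout (pairs of combination label and variable name) and the example values are assumptions made purely for illustration.

```python
def missing_variables(files):
    """Map each scenario combination to the variables it lacks.

    `files` is an iterable of (combination, variable) pairs, one per uploaded file.
    """
    by_combination = {}
    for combination, variable in files:
        by_combination.setdefault(combination, set()).add(variable)

    # Every variable that was provided for at least one combination.
    all_variables = set().union(*by_combination.values())

    report = {}
    for combination, present in by_combination.items():
        absent = all_variables - present
        if absent:
            report[combination] = sorted(absent)
    return report


# Hypothetical upload: "dis" was provided for one combination only.
uploaded = [
    ("gfdl-esm2m/rcp26", "qtot"),
    ("gfdl-esm2m/rcp26", "dis"),
    ("gfdl-esm2m/rcp60", "qtot"),
]
print(missing_variables(uploaded))
```

Running a check of this kind on your own file list before uploading may help you spot gaps early.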

Quick formatting fixes

Some simple issues in your files (such as wrong chunk sizes, wrong NetCDF format, an inverted grid, or wrong variable and dimension names) can be solved with the following fixes. For further instructions, please take a look at the NetCDF utilities mentioned above.

Chunk sizes

If your file does not have correct chunk sizes, try rewriting the data with:

2d: nccopy -k4 -d1 -c "time/1,lat/360,lon/720" IFILE OFILE
3d: nccopy -k4 -d1 -c "time/1,depth/10,lat/360,lon/720" IFILE OFILE

In some cases, when all dimensions are set as contiguous, the commands above might not work. In those cases, try:

cdo -f nc4c -z zip -copy IFILE OFILE

NetCDF format

nccopy -k4 -d5 IFILE OFILE

Change variable name

This implies changing the name of the variable from VAROLD to VARNEW:

ncrename -O -h -v VAROLD,VARNEW IFILE

Change dimension name

This implies changing both the name of the dimension (from DIMOLD to DIMNEW) and coordinate variable (from DIMNAMEOLD to DIMNAMENEW):

ncrename -O -h -d DIMOLD,DIMNEW -v DIMNAMEOLD,DIMNAMENEW IFILE

Reading ASCII data time series into NetCDF

Here are instructions for converting your global daily ASCII data into a NetCDF file with the required meta data. The tools cdo and ncatted (from NCO) are needed (see links above).

  • Prepare ASCII data with one data value per line sorted by time in data.txt
  • Import data starting at 179.75°W,89.75°N on a proleptic gregorian calendar:
cdo --history -f nc4c -z zip -setmissval,1e+20 -setunit,"UNIT" -setname,VARIABLE -setreftime,1661-01-01,00:00:00,1INCREMENT -settaxis,STARTYEAR-01-01,00:00:00,1INCREMENT -input,grid.txt data.nc4 < data.txt
  • You need to specify STARTYEAR, INCREMENT [days, months, years], VARIABLE, UNIT and the grid description file (grid.txt).
  • Add meta data into NetCDF file:
ncatted -O -h -a contact,global,o,c,"NAME <EMAIL>" -a institution,global,o,c,"INSTITUTION (SHORT)" -a long_name,VARIABLE,o,c,"VARIABLE LONG NAME" data.nc4
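
The grid description file passed to cdo as grid.txt is not shown above. The sketch below writes one for the global 0.5° grid, assuming the usual key = value layout of CDO grid description files (gridtype, xsize, ysize, xfirst, xinc, yfirst, yinc); please verify the result against the CDO documentation before use. The function name cdo_grid_description is hypothetical.

```python
def cdo_grid_description(spacing=0.5):
    """Text of a lonlat grid description starting at 179.75°W, 89.75°N."""
    xsize = round(360 / spacing)
    ysize = round(180 / spacing)
    entries = [
        ("gridtype", "lonlat"),
        ("gridsize", xsize * ysize),
        ("xsize", xsize),
        ("ysize", ysize),
        ("xfirst", f"{-180 + spacing / 2:g}"),
        ("xinc", f"{spacing:g}"),
        ("yfirst", f"{90 - spacing / 2:g}"),
        ("yinc", f"{-spacing:g}"),  # negative: rows run from north to south
    ]
    return "".join(f"{key} = {value}\n" for key, value in entries)


if __name__ == "__main__":
    # Write the file next to data.txt so that -input,grid.txt can find it.
    with open("grid.txt", "w") as handle:
        handle.write(cdo_grid_description())
```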