Preparing simulation files



  • Please comply precisely with the formatting specified below to facilitate comparison among different models and between the global and regional scales. Incorrect formatting can delay the analysis.
  • Before submitting complete simulations, please upload a single example file for checking and send a quick email to isimip-data@pik-potsdam.de to let us know; this could save a lot of time and bandwidth!
  • Here you can download a sample file to help you with your file preparation.

File names

Conventions on file naming are documented in the general part of the protocol (Information for all sectors) in section 3.

Here you'll find example file names for the Biomes and Global Water sectors.

File formats and metadata

Files should be provided in compressed NetCDF format, a self-describing, machine-independent data format that supports the creation, access, and sharing of array-oriented scientific data. It can be read, written, and processed by, for example:

  • ncdump (inspecting NetCDF headers and raw data), part of the NetCDF software
  • ncview (simple data exploration and graphics)
  • command line tools, e.g. Climate Data Operators (CDO), netCDF Operators (NCO) and UVCDAT: show, convert, split, merge, write and perform arithmetic and statistical operations on NetCDF data
  • command line graphics, e.g. NCAR Command Language (NCL), UVCDAT and R
  • other applications, e.g. Matlab, Ferret, Panoply and many more
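For a first look at a file's structure, the header and a content summary can be printed from the command line, for example (the file name is a placeholder):

ncdump -h model_output.nc4   # dimensions, variables and attributes
cdo sinfon model_output.nc4  # summary of variables, grid and time axis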

Requirements for NetCDF files

gridded NetCDF4 classic format, internally compressed (zipped); if internal compression is not possible, deliver uncompressed files

global grid ranging from 89.75 to -89.75° latitude and -179.75 to 179.75° longitude, i.e. 0.5° grid spacing, 360 rows and 720 columns, or 259200 grid cells in total (corresponding to the resolution of the climate input data)

gridtype: lonlat (not generic)
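As a sketch, a CDO grid description file (grid.txt) matching this grid could look as follows, assuming the data are ordered from the north-west corner as in the ASCII import example further below:

gridtype = lonlat
xsize    = 720
ysize    = 360
xfirst   = -179.75
xinc     = 0.5
yfirst   = 89.75
yinc     = -0.5

Such a file can be applied with cdo setgrid,grid.txt IFILE OFILE, or used with the input operator shown in the ASCII conversion section below.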

variable precision is float

regional model teams should interpolate their output data to the same common 0.5°x0.5° grid and submit only the region of interest

single grid cell (one-point) time series have to be embedded onto a 1x1 point grid with properly set coordinates
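A minimal one-point grid description, with placeholder station coordinates, might look like this and can likewise be applied with cdo setgrid:

gridtype = lonlat
xsize    = 1
ysize    = 1
xfirst   = 12.25
yfirst   = 52.25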

grid cells you do not simulate should be filled with the missing_value and _FillValue marker (1.e+20f)
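If your model writes a different missing-value marker, it can be converted with CDO's setmissval operator (which also appears in the import command further below):

cdo -f nc4c -z zip setmissval,1e+20 IFILE OFILE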

relative time axis with reference time 1661-01-01, 00:00:00

time units according to the temporal resolution of the data, i.e. "[days|months|years] since 1661-01-01 00:00:00" for daily, monthly or annual data, respectively
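As a sanity check, assuming time stamps at 00:00: for daily data beginning 1901-01-01, the first value on the time axis should be 87657, i.e. 240 x 365 days plus 57 leap days between 1661-01-01 and 1901-01-01 in the proleptic Gregorian calendar. The first time stamp can be printed with:

cdo showtimestamp -seltimestep,1 FILE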

dimension and variable names all lowercase
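Should any names be uppercase or mixed case, NCO's ncrename can rename dimensions and variables in place, e.g. (the names here are placeholders):

ncrename -O -d Time,time -v Time,time data.nc4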

the preferred calendar is "proleptic_gregorian" (calendar = "proleptic_gregorian"). Please specify the calendar used and, if needed, explain how leap days are to be handled

add and fill the institution and contact global attributes

global comments with additional information about the run may be added as desired. The preferred global attribute is "comment" (avoid the attribute "description"; it is used during ESGF publication)

NetCDF4 internally chunks the data into subsets. Usually one chunk is defined by a record, i.e. a combination of one horizontal field at one time step and one vertical layer. Chunking the data differently makes operations on the data considerably more time-consuming. Check the chunk sizes of a variable with ncdump -hs FILE

for 2d data it should give you:
variable:_ChunkSizes = 1, 360, 720 ;

for 3d data it should give you:
variable:_ChunkSizes = 1, 10, 360, 720 ;
(assuming 10 vertical layers)

if necessary, rewrite the data with

- 2d: nccopy -k4 -d1 -c "time/1,lat/360,lon/720" IFILE OFILE
- 3d: nccopy -k4 -d1 -c "time/1,depth/10,lat/360,lon/720" IFILE OFILE
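To confirm that the rewritten file has the intended layout, the ncdump call from above can be reused:

ncdump -hs OFILE | grep _ChunkSizes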

Time Interval

non-daily and/or non-global output should be reported as a single file per experiment and climate scenario

for daily global model output only, split the data into decadal chunks that start with year one of a decade and end with year zero of the next decade, clipped to the first or last year of the climate scenario period where necessary. Files containing a single year have both identifiers filled with the same year. Examples: 1861_1870, 2001_2005, 2006_2010, 2091_2099, 2100_2100, 2101_2110. A shell sketch for such splitting follows after the folder list below.

under the upload folder /path/to/UploadArea/sector/model/_tmp/ you will find four subfolders corresponding to the time periods listed below. Be sure to place your files in the correct folders:
- pre-industrial: 1661-1860
- historical: 1861-2005
- future: 2006-2099
- future_extended: 2100-2299
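As announced above, a minimal shell sketch for the decadal splitting, assuming a daily historical file covering 1861-2005 (file names are placeholders; the resulting chunks still need to be renamed according to the file naming conventions):

for start in $(seq 1861 10 2001); do
    end=$(( start + 9 ))
    [ $end -gt 2005 ] && end=2005   # clip to the end of the scenario period
    cdo -f nc4c -z zip selyear,$start/$end daily_1861_2005.nc4 daily_${start}_${end}.nc4
done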

Dimensions

Axis; NetCDF internal name; standard_name; long_name; Unit

X; lon; longitude; longitude; degrees_east

Y; lat; latitude; latitude; degrees_north

Time; time; time; [no long name]; [relative to reference date and time]

NetCDF header

A proper NetCDF header for daily global data, as shown by ncdump -h FILE, should look like this:


dimensions:
   lon = 720 ;
   lat = 360 ;
   time = UNLIMITED ;
variables:
   double lon(lon) ;
       lon:standard_name = "longitude" ;
       lon:long_name = "longitude" ;
       lon:units = "degrees_east" ;
       lon:axis = "X" ;

   double lat(lat) ;
       lat:standard_name = "latitude" ;
       lat:long_name = "latitude" ;
       lat:units = "degrees_north" ;
       lat:axis = "Y" ;

   double time(time) ;
       time:standard_name = "time" ;
       time:units = "days since 1661-01-01 00:00:00" ;
       time:calendar = "proleptic_gregorian" ;

   float tas(time, lat, lon) ;
       tas:_FillValue = 1.e+20f ;
       tas:missing_value = 1.e+20f ;
       tas:units = "K" ;
       tas:standard_name = "air_temperature" ;
       tas:long_name = "Near-Surface Air Temperature" ;

// global attributes:
       :contact = "ISIMIP Coordination Team <info@isimip.org>" ;
       :institution = "Potsdam Institute for Climate Impact Research (PIK)" ;
       :comment = "Data prepared for ISIMIP2b" ;

Reading ASCII time series data into NetCDF

Here are instructions for converting your global daily ASCII data into a NetCDF file with the required metadata. The tools cdo and ncatted (from NCO) are needed (see the links above).

  • Prepare the ASCII data with one data value per line, sorted by time, in data.txt
  • Import the data starting at 179.75°W, 89.75°N on a proleptic Gregorian calendar:

cdo --history -f nc4c -z zip -setmissval,1e+20 -setunit,"UNIT" -setname,VARIABLE \
    -setreftime,1661-01-01,00:00:00,1INCREMENT -settaxis,STARTYEAR-01-01,00:00:00,1INCREMENT \
    -input,grid.txt data.nc4 < data.txt
  • You need to specify STARTYEAR, INCREMENT [days|months|years], VARIABLE, UNIT and the grid description file grid.txt (see the sketch in the requirements section above).
  • Add meta data into NetCDF file:

ncatted -O -h -a contact,global,o,c,"NAME <EMAIL>" -a institution,global,o,c,"INSTITUTION (SHORT)" -a long_name,VARIABLE,o,c,"VARIABLE LONG NAME" data.nc4
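Afterwards, the result can be checked against the requirements above, for example:

cdo griddes data.nc4   # print the grid description for comparison with grid.txt
cdo sinfon data.nc4    # summary of variable, grid and time axis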

Quality check of your simulation data

All data uploaded will be quality-checked by the ISIMIP data managers. Log files are written to the DKRZ folder /work/bb0820/ISIMIP/[SIMULATION-ROUND]/UploadArea/[SECTOR]/[MODEL]/_qc_reports/. Non-severe issues (reported without "!!!") will be fixed by the data managers; all others will need your assistance. Please check this folder regularly and get in touch with Jan Volkholz or Matthias Büchner to discuss fixes to the files.

Files in the UploadArea are automatically moved to a private folder 24 hours after upload. Please refrain from re-uploading identical files that have already been copied to the DKRZ.

Files that successfully pass the quality check will appear in the folder /work/bb0820/ISIMIP/[SIMULATION-ROUND]/OutputData, where they are available for analysis by ISIMIP participants.

On request, we offer consistency checks for your model. The check looks for every gcm/climate_driver/social_scenario/co2_scenario combination that has been uploaded and has successfully passed the format checks. It then internally generates a list of all variables provided, goes through all the combinations found, and checks whether all of those variables are present. The goal is to have all runs available with a consistent set of variables.

If you are not sure about anything, you can upload just a small subset to _tmp/ below your model folder and let us know at isimip-data@pik-potsdam.de.