New guidelines for uploading model results and quality checks

Posted on Dec. 7, 2015

We would like to share with you some important changes to the way that simulation data are uploaded to the DKRZ server. The aim of the changes is to ensure that ISI-MIP participants access only the quality-checked and harmonised version of your data (harmonised with respect to file naming, meta data, grid definitions etc.), which conforms to the ISI-MIP format, and to prevent participants from accidentally accessing data that have not yet been quality checked.

In summary:

  • from now on, new data need to be uploaded to a temporary structure (_tmp)
  • already uploaded data will be moved to the _tmp structure
  • a quality check will be performed on the data and, if it succeeds, the data will be transferred to the final download location
  • if the checks fail with severe errors, you will be asked to help correct the files

Please note that the quality check does not test your data in any scientific sense.

At the bottom of this email you will find a sketch of the new file structure.

The folder structure for the three experiment types (historical observation runs, catch-up runs and cross-sectoral runs) has been mirrored within your model's _tmp folder (marked green below).

From now on you can only upload your data to the <MODEL>/_tmp structure. Over the coming days, data that have already been uploaded will be moved to _tmp, where every member of the user group isimip will have permission to alter files. Please do not move any data on your own. Please also note that your write permissions will be revoked for all former folders except the _tmp folder. A hidden copy of these files will be kept to avoid unintentional removal of the data in case of errors in the quality control (QC) process.

The newly implemented QC is integrated into the transfer of data from the _tmp structure to the main structure. The QC covers basic checks on file naming, NetCDF meta data and grid configuration. Minor issues will be fixed by us. _tmp files that have passed the QC will appear in the main structure (shown in blue below). For each file where major issues have been found and your assistance is needed, a corresponding QC log file will be written in a similar structure below _tmp/_qc_reports/. We will also inform you via email. Until these issues are fixed, your data will not appear in the main structure. The QC is repeated regularly, ensuring that corrected files are transferred promptly to the main structure. Since there are already more than 50,000 files to quality check, please be patient while the first checks are completed.
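As a rough illustration of how the two locations relate, the following Python sketch maps a file path inside _tmp to the place where it would appear in the main structure after passing the QC, and to a possible report location below _tmp/_qc_reports/. The function names, the example model folder and the ".log" suffix are assumptions made for this sketch; the actual names of the QC log files are not specified here.

```python
from pathlib import PurePosixPath

TMP_DIR = "_tmp"
QC_REPORT_DIR = "_qc_reports"


def final_path(tmp_path: str) -> PurePosixPath:
    """Map <MODEL>/_tmp/<experiment>/... to <MODEL>/<experiment>/..."""
    parts = PurePosixPath(tmp_path).parts
    if TMP_DIR not in parts:
        raise ValueError(f"not inside a {TMP_DIR} tree: {tmp_path}")
    i = parts.index(TMP_DIR)
    # Drop the _tmp component; everything else stays in place.
    return PurePosixPath(*parts[:i], *parts[i + 1:])


def qc_report_path(tmp_path: str, suffix: str = ".log") -> PurePosixPath:
    """Mirror the path below _tmp/_qc_reports/ (the suffix is an assumption)."""
    parts = PurePosixPath(tmp_path).parts
    if TMP_DIR not in parts:
        raise ValueError(f"not inside a {TMP_DIR} tree: {tmp_path}")
    i = parts.index(TMP_DIR)
    report = PurePosixPath(*parts[:i + 1], QC_REPORT_DIR, *parts[i + 1:])
    return report.with_name(report.name + suffix)


# "mymodel" and "example.nc" are placeholder names.
print(final_path("mymodel/_tmp/hist-obs/watch/example.nc"))
print(qc_report_path("mymodel/_tmp/hist-obs/watch/example.nc"))
```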

To reduce communication overhead and time-consuming procedures for fixing improperly formatted files, you are again asked to thoroughly examine your files before uploading them. Details about file formatting and meta data requirements can be found in the simulation protocol. For example, remember to put your institution and contact as global meta data into the NetCDF files. Specification of all relevant scenarios (harmonisation, sensitivity, irrigation, co2, where applied) is crucial to clearly identifying how your runs have been initialised. Splitting daily and monthly time series as described in the protocol also seems to be a common source of formatting errors. If you are not sure about anything, you can upload just a small subset to _tmp and let Matthias Büchner have a look at it before transferring the rest of your data.
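A simple pre-upload check for the global meta data could look like the sketch below. It only tests that the required attributes are present and non-empty. The attribute names "institution" and "contact" are assumptions based on the wording above; the simulation protocol is the authoritative source for the exact names. The function works on a plain mapping of attributes so that it does not depend on a particular NetCDF library.

```python
from typing import Iterable, List, Mapping

# Assumed attribute names; check the simulation protocol for the exact ones.
REQUIRED_GLOBAL_ATTRS = ("institution", "contact")


def missing_global_attrs(
    attrs: Mapping[str, object],
    required: Iterable[str] = REQUIRED_GLOBAL_ATTRS,
) -> List[str]:
    """Return the required global attributes that are absent or empty."""
    missing = []
    for name in required:
        value = attrs.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# If the netCDF4 Python package is installed, the mapping could be built as:
#     from netCDF4 import Dataset
#     with Dataset("example.nc") as ds:
#         attrs = {name: ds.getncattr(name) for name in ds.ncattrs()}
# Here a hand-written mapping with placeholder values is used instead.
example_attrs = {"institution": "Example Institute", "contact": ""}
print(missing_global_attrs(example_attrs))
```

An empty result means that both attributes are set; any listed name should be added to the file before uploading.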

So please remember:
- upload only to the _tmp tree (below in green)
- download only from the parent tree (below in blue), as before.

Here is the extended folder structure:

|-- _code
|-- _doc
|-- _tmp
|   |-- catch-up
|   |   |-- gfdl-esm2m
|   |   |-- hadgem2-es
|   |   |-- ipsl-cm5a-lr
|   |   |-- miroc-esm-chem
|   |   `-- noresm1-m
|   |-- cross-sec
|   |   |-- hadgem2-es
|   |   `-- ipsl-cm5a-lr
|   |-- hist-obs
|   |   |-- gswp3
|   |   |-- princeton
|   |   |-- watch
|   |   `-- wfdei
|   `-- _qc_reports
|       |-- catch-up ...
|       |-- cross-sec ...
|       `-- hist-obs ...
|-- catch-up
|   |-- gfdl-esm2m
|   |-- hadgem2-es
|   |-- ipsl-cm5a-lr
|   |-- miroc-esm-chem
|   `-- noresm1-m
|-- cross-sec
|   |-- hadgem2-es
|   `-- ipsl-cm5a-lr
`-- hist-obs
    |-- gswp3
    |-- princeton
    |-- watch
    `-- wfdei

This information will also be made available on the ISI-MIP website under For Modellers>ISI-MIP>Output Data.
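
For scripted uploads, a target directory can be checked against the folder structure above before any files are sent. The following Python sketch is only an illustration: the function name and the "mymodel" folder are hypothetical, and the experiment and input data names are copied from the tree, on the assumption that the tree is rooted in your model's folder.

```python
from pathlib import PurePosixPath

# Experiment types and their sub-folders, copied from the tree above.
UPLOAD_TREE = {
    "catch-up": {"gfdl-esm2m", "hadgem2-es", "ipsl-cm5a-lr",
                 "miroc-esm-chem", "noresm1-m"},
    "cross-sec": {"hadgem2-es", "ipsl-cm5a-lr"},
    "hist-obs": {"gswp3", "princeton", "watch", "wfdei"},
}


def is_valid_upload_dir(path: str) -> bool:
    """True if path lies in _tmp/<experiment>/<sub-folder> as in the tree."""
    parts = PurePosixPath(path).parts
    if "_tmp" not in parts:
        return False  # uploads are only allowed below _tmp
    rest = parts[parts.index("_tmp") + 1:]
    if len(rest) < 2:
        return False
    experiment, subfolder = rest[0], rest[1]
    return subfolder in UPLOAD_TREE.get(experiment, set())


print(is_valid_upload_dir("mymodel/_tmp/hist-obs/watch"))  # inside _tmp
print(is_valid_upload_dir("mymodel/hist-obs/watch"))       # main structure
```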