\section{Purpose and limitations}
The purpose of this document is to serve as an unambiguous single reference for administrators of IS-ENES2 ESGF datanodes, so that they can configure their datanodes and publish data in compliance with the rules discussed and adopted by all datanode managers. This document aggregates information from sources such as the Trieste meeting notes \cite{trieste}, Martin Juckes' `CORDEX: ESGF Search Facet Mappings' document \cite{cordexfacetsdoc} and other discussions which have led to collective consensus. It only covers publishing and maintaining data on an ESGF datanode and should not be relied upon for any other purpose.
\section{Latest Version}
The latest version of this document will always be available at:\\
\url{https://github.com/snic-nsc/datanode-mgr-doc/raw/master/ro/Datanodemgr-doc.pdf} \\
The entire repository, which includes the \LaTeX{} source file, can be cloned from:\\
\url{https://github.com/snic-nsc/datanode-mgr-doc.git}
\section{IS-ENES2 ESGF datanode Search Facet Configuration}
IS-ENES2 ESGF datanodes have some additional search facets pertaining to CORDEX. The complete list of facets used on an IS-ENES2 ESGF datanode is given below. This file is available in the `\textbf{configfiles}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}. \\
Filename: \texttt{facets.properties}\\
Standard location: \texttt{/esg/config/facets.properties}
\begin{footnotesize}
\begin{verbatimtabinput}[4]{configfiles/facets.properties}
\end{verbatimtabinput}
\end{footnotesize}
\newpage
\section{ESGF Attribute Services}
\label{attribservicesfile}
File name: \texttt{/esg/config/esgf\_ats\_static.xml}\\
For information about how to set up your datanode to correctly enforce restrictions on CORDEX data usage, refer to Section~\ref{enforcegrouprestrictions}. This file is available in the `\textbf{configfiles}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}.
\begin{tiny}
\begin{verbatimtabinput}[4]{configfiles/esgf_ats_static.xml}
\end{verbatimtabinput}
\end{tiny}
\section{ESGF IDP Whitelist settings}
File name: \texttt{/esg/config/esgf\_idp\_static.xml}. This file is available in the `\textbf{configfiles}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}.
\begin{tiny}
\begin{verbatimtabinput}[4]{configfiles/esgf_idp_static.xml}
\end{verbatimtabinput}
\end{tiny}
\section{ESGF Search Shard configuration settings}
File name: \texttt{/esg/config/esgf\_shards\_static.xml}. This file is available in the `\textbf{configfiles}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}.
\begin{tiny}
\begin{verbatimtabinput}[4]{configfiles/esgf_shards_static.xml}
\end{verbatimtabinput}
\end{tiny}
\section{Publication Version}
It was decided at the Trieste meeting that all data published on IS-ENES2 datanodes will clearly specify the version number, which is the date of publication expressed in the format \textbf{v}\textit{yyyymmdd}. This requires the creation of a directory with that name in the physical directory structure. This directory has to be created below the `Variable name' directory. Examples:\\
\begin{tiny}
\texttt{/datapool1/cordexgeneral/cordex/output/MNA-22/SMHI/ECMWF-ERAINT/evaluation/r0i0p0/SMHI-RCA4/v1/fx/orog/\yellowline{v20131101}}\\
\texttt{/datapool1/cordexgeneral/cordex/output/ARC-44/SMHI/NCC-NorESM1-M/historical/r0i0p0/SMHI-RCA4/v1/fx/sftlf/\yellowline{v20140123}}\\
\end{tiny}
\\To get this version number correctly, append the option \texttt{\myopt new-version $<$versionnum$>$} to the \texttt{esgpublish} command.
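The naming rule above can be sketched in a few lines of Python. This is only an illustration of the \textbf{v}\textit{yyyymmdd} convention; the function names \texttt{version\_dir\_name} and \texttt{is\_valid\_version\_dir} are hypothetical and are not part of the ESGF tools.

```python
import re
from datetime import date

# Pattern for the vYYYYMMDD convention described in the text.
VERSION_DIR = re.compile(r"v(\d{4})(\d{2})(\d{2})")


def version_dir_name(publication_date):
    """Return the version directory name (vYYYYMMDD) for a publication date."""
    return publication_date.strftime("v%Y%m%d")


def is_valid_version_dir(name):
    """True if name looks like vYYYYMMDD and encodes a real calendar date."""
    match = VERSION_DIR.fullmatch(name)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        # For example month 13 or day 32.
        return False
    return True


if __name__ == "__main__":
    print(version_dir_name(date(2013, 11, 1)))
    print(is_valid_version_dir("v20140123"))
    print(is_valid_version_dir("v20141301"))
```

A check of this kind could be run over the leaf directories of a data tree before publication, to catch version directories that were mistyped by hand.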
\section{Directory Structure}
The path to the directory tree containing the data shall have \texttt{Project/Product} followed by the directory tree containing the data. \\
Given below are examples of valid and invalid directory structures.\\
\vspace{0mm}\\
\texttt{/cordex/output/...} \cmark\\
\texttt{/localfs/localpath/cordex/output/...} \cmark \footnote{Some sites use the lower-case `cordex' while some use `CORDEX'; while there is no rule, the lower-case `cordex' may be considered the preferred option.}\footnote{`output' is the value of the `Product' facet option here. It may take other values that are applicable to the `Product' facet in the future.} \\
\texttt{/corddata/output/...} \xmark{ } //non-standard name corresponding to `Project'. \\
\texttt{/cordex/AFR-44/...} \xmark{ } //there is no directory corresponding to `Product'. Here is a complete \texttt{directory\_format} line, for reference:
\begin{verbatim}
directory_format = %(root)s/cordex/%(product)s/%(domain)s/%(institute)s/\
%(driving_model)s/%(experiment)s/%(ensemble)s/%(rcm_model)s/%(rcm_version)s/\
%(time_frequency)s/%(variable)s/v%(version)s
\end{verbatim}
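As a minimal sketch, a directory path can be checked against this layout by splitting it into components and naming them after the fields of \texttt{directory\_format}. The function \texttt{parse\_cordex\_path} is hypothetical, and it assumes that the project directory is named `cordex' in either case and that nothing other than the listed fields follows it.

```python
# Field names in the order they follow the project directory in
# directory_format; 'version' is the component written as v<version>.
FIELDS = ["product", "domain", "institute", "driving_model", "experiment",
          "ensemble", "rcm_model", "rcm_version", "time_frequency",
          "variable", "version"]


def parse_cordex_path(path, project="cordex"):
    """Split a dataset directory path into the fields of directory_format.

    Returns a dict of field values, or None if the path does not contain
    the project directory followed by exactly the expected components.
    """
    parts = [part for part in path.split("/") if part]
    # Accept either 'cordex' or 'CORDEX' as the project directory.
    positions = [i for i, part in enumerate(parts) if part.lower() == project]
    if not positions:
        return None
    tail = parts[positions[-1] + 1:]
    if len(tail) != len(FIELDS):
        return None
    fields = dict(zip(FIELDS, tail))
    # The last component carries a leading 'v' in directory_format.
    if not fields["version"].startswith("v"):
        return None
    return fields


if __name__ == "__main__":
    good = ("/datapool1/cordexgeneral/cordex/output/MNA-22/SMHI/ECMWF-ERAINT/"
            "evaluation/r0i0p0/SMHI-RCA4/v1/fx/orog/v20131101")
    print(parse_cordex_path(good))
    # No 'Product' directory, so the component count is wrong.
    print(parse_cordex_path("/cordex/AFR-44/SMHI/ECMWF-ERAINT/evaluation/"
                            "r0i0p0/SMHI-RCA4/v1/fx/orog/v20131101"))
```

The sketch only checks the shape of the path; it does not verify that each component is a legal value for its facet.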
\section{Variables to be excluded during publish: CORDEX}
\label{skipvars}
The following declaration inside \texttt{/esg/esgcet/esg.ini} should be used to exclude certain variables from the THREDDS catalogues generated by \texttt{esgpublish}. Note that this differs from the default value created by previous versions of \texttt{esgsetup}; \yellowline{in particular, managers should ensure that the variable \texttt{basin} is NOT excluded.}
\begin{verbatimtab}[4]
thredds_exclude_variables = a,a_bnds,alev1,alevel,alevhalf,alt40,b,b_bnds,bnds,\
bounds_lat,bounds_lon,dbze,depth,depth0m,depth100m,depth_bnds,geo_region,height,\
height10m,height2m,heightv,Lambert_Conformal,lat,lat_bnds,lat_bounds,\
lat_vertices,latitude,latitude_bnds,layer,lev,lev_bnds,location,lon,lon_bnds,\
lon_bounds,lon_vertices,longitude,longitude_bnds,olayer100m,olevel,oline,p0,\
p220,p500,p560,p700,p840,plev,plev3,plev7,plev8,plev_bnds,plevs,pressure1,region,\
rho,rlat,rlat_bnds,rlon,rlon_bnds,rotated_pole,Rotated_Pole,scatratio,sdepth,\
sdepth1,sza5,tau,tau_bnds,time,time1,time2,time_bnds,vegtype,x,y
\end{verbatimtab}
\section{Checking for variables that need to be skipped}
We saw in Section~\ref{skipvars} the compiled list of variables that need to be present in the \\
\texttt{thredds\_exclude\_variables} list, prior to data publication. However, there might well be variables in your data which need to be similarly excluded, but are not part of this list yet. This has caused problems in the past. Here's a simple script that can be used to inspect the data prior to publication. It reports variables which \textbf{may} need to be added to the exclude list. It also logs potential problems. The script uses the \texttt{ncdump} utility which is shipped with \textbf{uvcdat/cdat} on ESGF datanodes. This script is available in the `\textbf{scripts}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}.
\begin{tiny}
\begin{verbatimtabinput}[4]{scripts/exclvarcheck.sh}
\end{verbatimtabinput}
\end{tiny}
\textbf{How to use}
\begin{enumerate}
\item Ensure that you set the correct path to the variable `ncdumplocation'.
\item Choose a location for `scripthome'. Copy the \texttt{exclvarcheck.sh} and \texttt{excl\_cordex}\footnote{available in the `\textbf{scripts}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}} files to that location.
\item If you have run the script before, remember to clear the files in \texttt{/tmp/ncchecks}.
\item To run, simply \textbf{cd} to the directory where your datafiles reside and run:
\begin{verbatim}
find . -name '*.nc' -exec bash <script home>/exclvarcheck.sh {} \;
\end{verbatim}
\item When it completes, inspect the file \texttt{/tmp/ncchecks/exclvars} to see if you might need to add any variables.
\item The file \texttt{/tmp/ncchecks/suspects} lists the files which produced the additions.
\item The file \texttt{/tmp/ncchecks/notfound} will list cases, if any, where a variable after which the file is named, isn't present in the file itself!
\item If you do find variables that need to be added to the list of excluded variables, please let me know.
\end{enumerate}
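The idea behind such a check can be sketched in Python as follows. This is not the \texttt{exclvarcheck.sh} script itself: it assumes that the header text of a file has already been obtained (for example with \texttt{ncdump -h}), and the helper names and the sample header are made up for illustration.

```python
import re

# Primitive type names that can start a variable declaration in CDL,
# the text format printed by ncdump.
CDL_TYPES = ("byte|ubyte|char|short|ushort|int|uint|long|int64|uint64|"
             "float|real|double|string")
# A declaration is '<type> <name>(dims) ;' or '<type> <name> ;'.
VAR_DECL = re.compile(r"^\s*(?:%s)\s+([^\s(;:]+)\s*[(;]" % CDL_TYPES)

# Made-up header used only to demonstrate the functions below.
SAMPLE_HEADER = """netcdf orog_sample {
dimensions:
    rlat = 2 ;
    rlon = 3 ;
variables:
    float orog(rlat, rlon) ;
        orog:units = "m" ;
    double rlat(rlat) ;
    double rlon(rlon) ;
    char rotated_pole ;
    float weird_extra(rlat) ;

// global attributes:
        :institute_id = "SMHI" ;
}
"""


def variables_in_header(cdl_text):
    """Return the variable names declared in the 'variables:' section."""
    names = []
    in_variables = False
    for line in cdl_text.splitlines():
        stripped = line.strip()
        if stripped == "variables:":
            in_variables = True
            continue
        if stripped.startswith("// global attributes") or stripped == "data:":
            in_variables = False
        if in_variables:
            match = VAR_DECL.match(line)
            if match:
                names.append(match.group(1))
    return names


def unexpected_variables(cdl_text, main_variable, excluded):
    """Variables that are neither the main variable nor already excluded."""
    return [name for name in variables_in_header(cdl_text)
            if name != main_variable and name not in excluded]


if __name__ == "__main__":
    excluded = {"rlat", "rlon", "rotated_pole"}
    print(unexpected_variables(SAMPLE_HEADER, "orog", excluded))
```

Any name reported this way is only a candidate for the exclude list; whether it really should be excluded still needs a human decision.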
\section{Value for the RCMModelName facet}
It was decided that the value of the `rcm\_name' facet should NOT contain the institute information, as this information is already captured and presented by the `Institute' facet. However, the directory corresponding to the `rcm\_model' contains the name of the institute too, along with the model name, as stipulated by the CORDEX archive specifications\footnote{``RCMModelName is an alphanumeric identifier chosen by the modeling group; it should consist of an institute acronym and a model acronym, connected by a dash, e.g., DMI-HIRHAM5 or SMHI-RCA3.''\cite{cordexarchivespecs}}. This calls for some special handling.
\mypar
The easiest way to handle this is by creating a substitution map for the variable.
\begin{enumerate}
\item Under the options for \texttt{[project:cordex]}, find the configuration line that begins with \texttt{maps}.
\item Add the \texttt{rcm\_name} category to the \texttt{categories} option:\\
\texttt{categories =\\ rcm\_name | string | false | true | 7}
\item Edit the \texttt{maps} line to read as follows:\\
\texttt{maps = rcm\_name\_map,institute\_map, las\_time\_delta\_map, domain\_map}
\item Create a new map `rcm\_name\_map' and populate it with entries that correspond to the models that you handle, leaving out the institute part in the last field.
\item Look at the example below for reference:
\begin{verbatimtab}[4]
rcm_name_map = map(project,rcm_model : rcm_name)
cordex |SMHI-RCA4| RCA4
cordex |SMHI-RCA4-SN| RCA4-SN
\end{verbatimtab}
\item Use the field `rcm\_name' in place of the field corresponding to the model directory, in the \texttt{dataset\_id} string.
\vspace{-6mm}\\
\begin{verbatimtab}[4]
dataset_id = cordex.%(product)s.%(domain)s.%(institute)s.%(driving_model)s.\
%(experiment)s.%(ensemble)s.%(rcm_name)s.%(rcm_version)s.%(time_frequency)s.\
%(variable)s
\end{verbatimtab}
\end{enumerate}
\subsection{Complete \texttt{rcm\_name\_map}}
The current and comprehensive list of CORDEX models may be obtained from:\\
\url{http://cordex.dmi.dk/joomla/images/CORDEX/RCMModelName.txt}. \\
Given below is a script that can generate a complete \texttt{rcm\_name\_map} table, which can then be pasted into the ini file. This script is available in the `\textbf{scripts}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}. The full output of the script is also reproduced here.
\begin{tiny}
\begin{verbatimtabinput}[4]{scripts/makemodelmap.sh}
\end{verbatimtabinput}
\vspace{-8mm}
\begin{verbatimtab}[4]
rcm_name_map = map(project,rcm_model : rcm_name)
cordex| AUTH-LHTEE-WRF321B| WRF321B
cordex| AUTH-Met-WRF331A| WRF331A
cordex| AWI-HIRHAM5| HIRHAM5
cordex| BCCR-WRF331C| WRF331C
cordex| CCCma-CanRCM4| CanRCM4
cordex| CHMI-ALADIN52| ALADIN52
cordex| CLMcom-CCLM4-8-17| CCLM4-8-17
cordex| CNRM-ALADIN52| ALADIN52
cordex| CNRM-ARPEGE52| ARPEGE52
cordex| CRP-GL-WRF331A| WRF331A
cordex| CUNI-RegCM4-2| RegCM4-2
cordex| DHMZ-RegCM4-2| RegCM4-2
cordex| DMI-HIRHAM5| HIRHAM5
cordex| ENEA-RegCM4-3| RegCM4-3
cordex| HMS-ALADIN52| ALADIN52
cordex| ICTP-RegCM4-3| RegCM4-3
cordex| IDL-WRF331D| WRF331D
cordex| IPSL-INERIS-WRF331F| WRF331F
cordex| KNMI-RACMO21P| RACMO21P
cordex| KNMI-RACMO22T| RACMO22T
cordex| MIUB-WRF331A| WRF331A
cordex| MOHC-HadGEM3-RA| HadGEM3-RA
cordex| MOHC-HadRM3P| HadRM3P
cordex| MPI-CSC-REMO2009| REMO2009
cordex| NUIM-WRF331F| WRF331F
cordex| SMHI-RCA4| RCA4
cordex| SMHI-RCA4-SN| RCA4-SN
cordex| SMHI-RCAO| RCAO
cordex| SMHI-RCAO-SN| RCAO-SN
cordex| UCAN-WRF331G| WRF331G
cordex| UCAN-WRF350I| WRF350I
cordex| UCLM-PROMES| PROMES
cordex| UHOH-WRF331H| WRF331H
cordex| UQAM-CRCM5| CRCM5
\end{verbatimtab}
\end{tiny}
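The construction of such a table can also be sketched in Python. This is not the \texttt{makemodelmap.sh} script: it assumes that the institute of each model is already known, since both institute acronyms (e.g. MPI-CSC) and model acronyms (e.g. RCA4-SN) may themselves contain dashes, so the name cannot be split reliably on the first dash. The function name \texttt{rcm\_name\_map\_lines} is hypothetical.

```python
def rcm_name_map_lines(entries, project="cordex"):
    """Build the lines of an rcm_name_map table.

    entries is an iterable of (institute, rcm_model) pairs, where rcm_model
    is the full RCMModelName, i.e. '<institute>-<model>'.
    """
    lines = ["rcm_name_map = map(project,rcm_model : rcm_name)"]
    for institute, rcm_model in sorted(entries, key=lambda entry: entry[1]):
        prefix = institute + "-"
        if not rcm_model.startswith(prefix):
            raise ValueError("%s does not start with %s" % (rcm_model, prefix))
        # Drop the institute part to obtain the value of the rcm_name facet.
        rcm_name = rcm_model[len(prefix):]
        lines.append("%s| %s| %s" % (project, rcm_model, rcm_name))
    return lines


if __name__ == "__main__":
    sample = [("SMHI", "SMHI-RCA4-SN"),
              ("MPI-CSC", "MPI-CSC-REMO2009"),
              ("CLMcom", "CLMcom-CCLM4-8-17")]
    print("\n".join(rcm_name_map_lines(sample)))
```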
\section{\texttt{esgcet\_models\_table.txt}}
Apart from the rcm\_name map, another map that lists models and their parent organizations is \texttt{/esg/config/esgcet\_models\_table.txt}. After making any changes to it, execute \texttt{esginitialize -c} to apply them. If that does not work, you may need to `downgrade' the database by executing \texttt{esginitialize \myopt d0} and then executing \texttt{esginitialize -c} again.
\begin{tiny}
\begin{verbatim}
test | test | http://www-pcmdi.llnl.gov | Test
test | ncar_ccsm3_0 | http://www.ccsm.ucar.edu| NCAR Community Climate System Model, CCSM 3.0
cordex | RCA4 | SMHI | www.smhi.se
cordex | RCA4-SN | SMHI | www.smhi.se
cordex | RCAO | SMHI | www.smhi.se
cordex | RCAO-SN | SMHI | www.smhi.se
\end{verbatim}
\end{tiny}
\section{Displaying the project name in upper case}
Though the project name is always expressed in lower case in catalogs and metadata, it is displayed in upper case in the web frontend. This requires setting a simple substitution string: add the name of the project, first in lower case and then in upper case, separated by a colon. The file into which this string goes is:
\vspace{-4mm}
\begin{small}
\begin{verbatim}
/usr/local/tomcat/webapps/esg-search/WEB-INF/classes/esg/search/config/projects.properties
cmip5:CMIP5
obs4mips:obs4MIPs
cssef:CSSEF
tamip:TAMIP
lucid:LUCID
test:TEST
pmip3:PMIP3
geomip:GeoMIP
euclipse:EUCLIPSE
cordex:CORDEX
\end{verbatim}
\end{small}
\section{Enforcing group restrictions on CORDEX data}
\label{enforcegrouprestrictions}
CORDEX data published on the ESGF datanodes in the federation are made available only to those who apply for membership of one of the two groups associated with CORDEX data. Apart from restricting who can access these datasets, these groups also serve as a mechanism to specify additional terms of data access. The \texttt{CORDEX\_RESEARCH} group is for individuals who wish to download and use the data only for non-commercial purposes, whereas \texttt{CORDEX\_COMMERCIAL} is for individuals who may wish to use the data for commercial purposes. CORDEX data which is open for unrestricted use is made available to both groups, whereas data which is meant for non-commercial use only is made accessible only to members of the \texttt{CORDEX\_RESEARCH} group. \greenline{Unless otherwise specified by the data-provider, all CORDEX datasets should be accessible by members of both \texttt{CORDEX\_RESEARCH} and \texttt{CORDEX\_COMMERCIAL} groups.} Attribute management for these CORDEX groups is handled on the \texttt{esg-dn1.nsc.liu.se} datanode; to configure your datanode to use this attribute service, refer to Section~\ref{attribservicesfile}.
\subsection{Ensure presence of license files}
If you are running the latest version of the middleware (1.6.x), you may skip to Section~\ref{segregatedata}. If you are running an older release, check whether the following files are present, on your datanode:
\begin{enumerate}
\item \$tomcatdir/webapps/esg-orp/licenses/CordexResearchLicense.xml
\item \$tomcatdir/webapps/esg-orp/licenses/CordexCommercialLicense.xml
\end{enumerate}
If the above listed files are NOT present:
\begin{enumerate}
\item git clone the repository containing this document, along with the license files from \url{https://github.com/snic-nsc/datanode-mgr-doc.git}
\item Copy the license files present in the \texttt{cordexlicensefiles} directory over to their respective target locations on the datanode (specified in the file `filelocations', also in the same directory).
\item Ensure that you replace the default `registration-request.jsp' file with the one present in the \texttt{cordexlicensefiles} directory, as this file activates the usage of the CORDEX license files.
\item Restart \texttt{esg-node}
\end{enumerate}
\subsection{Segregating data}
\label{segregatedata}
The ESGF attribute service can be used to restrict access to data by creating different policies for different file paths. This means that data with different levels of access restrictions ought to be in distinct directory hierarchies. This needs some conscious planning by datanode managers, preferably prior to data publication, as it may be inconvenient to move data directories later. Planning is required to set up unambiguous and intuitive directory trees, which will then have different restriction policies applied to them. To reduce confusion and the possibility of errors at publication time, it is strongly recommended to set up entirely separate directory trees, that is, distinct \texttt{thredds\_dataset\_roots}, rather than having a mix of the two types under the same tree.
\subsection{Caveat}
Unlike most commercial scenarios, where a paying or `commercial' customer gets additional features or privileges, in the CORDEX sense a commercial user is one who can access fewer datasets; this is because datasets which are meant for non-commercial access are not available to these users. What this means is that \yellowline{naming a top-level directory/dataset\_root as \texttt{Commercial} or similar, would be counter-intuitive as it would be available for all users.} It is however beneficial to create a directory/dataset\_root called \texttt{Non-Commercial}, as this would clearly indicate that it is only for non-commercial use, that is, only available to users belonging to the \texttt{CORDEX\_RESEARCH} group.
\subsection{Paths and regexes}
The ESGF attribute service sees paths as presented to it by THREDDS. You can use that to design the regex that you need. \yellowline{Ensure that you don't design a regex which gets triggered by unintended elements in the path, including the hostname of the node itself!} While configuring the attribute service on the DMI datanode, the hostname of the node, \texttt{cordexesg.dmi.dk}, was triggering the regex match for the expression \texttt{.*cordex.*}, causing every URL to match.
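The pitfall is easy to reproduce with Python's \texttt{re} module. The two URLs below are made up for illustration, and the sketch assumes that a policy regex has to match the whole URL; the actual matching is done by the ESGF middleware, not by this code.

```python
import re


def matches(policy_regex, url):
    """True if the policy regex matches the whole URL."""
    return re.fullmatch(policy_regex, url) is not None


# Hypothetical URLs on a node whose hostname contains 'cordex'.
OTHER_URL = ("http://cordexesg.dmi.dk/thredds/fileServer/"
             "esg_dataroot/cmip5/file.nc")
CORDEX_URL = ("http://cordexesg.dmi.dk/thredds/fileServer/"
              "esg_dataroot/cordex/output/file.nc")

if __name__ == "__main__":
    # Too broad: the hostname alone satisfies the expression.
    print(matches(".*cordex.*", OTHER_URL))
    # Requiring 'cordex' after 'fileServer' ignores the hostname.
    print(matches(".*fileServer.*cordex.*", OTHER_URL))
    print(matches(".*fileServer.*cordex.*", CORDEX_URL))
```

Testing a candidate regex against a handful of real URLs from the node, including URLs that should not match, is a cheap way to catch this kind of mistake before the policy file is deployed.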
\subsection{Setting up the \texttt{esgf\_policies\_local.xml}}
Let's consider the following configuration lines:
\begin{tiny}
\begin{verbatimtab}[4]
<policy resource=".*fileServer.*cordexnoncommercial.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cordexgeneral.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cordexgeneral.*" attribute_type="CORDEX_Commercial" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cord.*" attribute_type="wheel" attribute_value="super" action="Write"/>
\end{verbatimtab}
\end{tiny}
These lines indicate that THREDDS URLs containing the element \texttt{cordexnoncommercial} are only accessible to members of the \texttt{CORDEX\_RESEARCH} group, whereas URLs containing \texttt{cordexgeneral} are accessible to all CORDEX data users. We can also see that \texttt{Write} or \texttt{Publish} access is only provided to users of the group \texttt{wheel} with attribute \texttt{super}. This allows the special user account \texttt{rootAdmin} to be used for all publication activities.
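To make the reading of these lines concrete, here is a minimal Python sketch that evaluates them against a URL and a set of user attributes. It is an illustration only: the enclosing \texttt{<policies>} element, the URLs used for testing and the rule that any single matching policy grants access are assumptions of the sketch, not a description of how the ESGF middleware is implemented.

```python
import re
import xml.etree.ElementTree as ET

# The four policy lines from the text, wrapped in an assumed root element.
POLICIES_XML = """<policies>
<policy resource=".*fileServer.*cordexnoncommercial.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cordexgeneral.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cordexgeneral.*" attribute_type="CORDEX_Commercial" attribute_value="user" action="Read"/>
<policy resource=".*fileServer.*cord.*" attribute_type="wheel" attribute_value="super" action="Write"/>
</policies>"""


def load_policies(xml_text):
    """Return a list of (compiled regex, type, value, action) tuples."""
    root = ET.fromstring(xml_text)
    return [(re.compile(policy.get("resource")),
             policy.get("attribute_type"),
             policy.get("attribute_value"),
             policy.get("action"))
            for policy in root.iter("policy")]


def is_allowed(policies, url, action, user_attributes):
    """True if some policy grants the action on url to the user.

    user_attributes is a set of (attribute_type, attribute_value) pairs.
    """
    for pattern, attr_type, attr_value, policy_action in policies:
        if (policy_action == action
                and (attr_type, attr_value) in user_attributes
                and pattern.fullmatch(url)):
            return True
    return False


if __name__ == "__main__":
    policies = load_policies(POLICIES_XML)
    commercial_user = {("CORDEX_Commercial", "user")}
    # Hypothetical URL under a dataset root named esg_cordexnoncommercial.
    url = ("http://node.example.org/thredds/fileServer/"
           "esg_cordexnoncommercial/output/file.nc")
    print(is_allowed(policies, url, "Read", commercial_user))
```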
\subsection{Corresponding \texttt{thredds\_dataset\_roots} entries}
The \texttt{thredds\_dataset\_roots} entries can be set up in many ways. Let's consider two cases.
\begin{enumerate}
\item Both non-commercial and general data being under the same dataset\_root:
\begin{verbatim}
thredds_dataset_roots =
esg_dataroot1| /data
\end{verbatim}
Here, the non-commercial data would be placed under \texttt{/data/cordexnoncommercial} whereas the general data would be under \texttt{/data/cordexgeneral}.
\item Non-commercial and general data being under different dataset\_roots:
\begin{verbatim}
thredds_dataset_roots =
esg_cordexnoncommercial| /dir1/cordex
esg_cordexgeneral| /dir2/cordex
\end{verbatim}
\end{enumerate}
\redline{Caution!!} The part of the path specified as the \texttt{thredds\_dataset\_root} is substituted by the name associated with the dataset\_root in the THREDDS URL. This means that if your \texttt{thredds\_dataset\_root} value reads \texttt{esg\_data| /partition1/noncommercial}, the `\texttt{partition1/noncommercial}' part of the path will be replaced by \texttt{esg\_data} in the THREDDS URL and hence will not match a regex you had planned to capture `noncommercial'. It is therefore preferable to simply use the name of the \texttt{thredds\_dataset\_root} as the regex match.
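The substitution can be illustrated with a short Python sketch. The URL layout \texttt{/thredds/fileServer/<root name>/<rest of path>} and the function name \texttt{thredds\_url\_path} are assumptions made for the example; the real URLs are produced by the publisher and THREDDS.

```python
import re


def thredds_url_path(file_path, dataset_roots):
    """Replace the dataset root directory by its name, as in a THREDDS URL.

    dataset_roots maps a root name to its directory, mirroring the
    'name| directory' entries of thredds_dataset_roots.
    """
    for name, directory in dataset_roots.items():
        prefix = directory.rstrip("/") + "/"
        if file_path.startswith(prefix):
            return "/thredds/fileServer/%s/%s" % (name, file_path[len(prefix):])
    raise ValueError("no dataset root covers %s" % file_path)


if __name__ == "__main__":
    # The directory name 'noncommercial' disappears from the URL ...
    roots = {"esg_data": "/partition1/noncommercial"}
    url = thredds_url_path("/partition1/noncommercial/cordex/output/a.nc", roots)
    print(url)
    print(re.fullmatch(".*fileServer.*noncommercial.*", url) is not None)

    # ... whereas the name of the dataset root is always present.
    roots = {"esg_cordexnoncommercial": "/dir1/cordex"}
    url = thredds_url_path("/dir1/cordex/output/a.nc", roots)
    print(url)
    print(re.fullmatch(".*fileServer.*cordexnoncommercial.*", url) is not None)
```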
\subsection{Data restricted to `Non-Commercial usage only', by site}
\begin{longtable}{|l|l|c|}
\hline
\multicolumn{1}{|c|}{\textbf{Sl}} & \multicolumn{1}{c|}{\textbf{Site}} & \multicolumn{1}{c|}{\textbf{Data}}\endhead
\hline
1. & BADC & None \\
\hline
2. & DKRZ & CLMcom data\\
\hline
3. & DMI & HMS data\\
\hline
4. & IPSL & None\\
\hline
5. & LIU-NSC & None\\
\hline
6. & UIO & None\\
\hline
7. & UNICAN & All\\
\hline
\end{longtable}
\captionof{table}{Data restricted to `non-commercial usage only', by site}
\section{Enabling Gridftp and OPeNDAP access}
\label{gridftpaccess}
\label{opendapaccess}
In order to enable OPeNDAP access, simply ensure that the following lines are present in your \texttt{esg.ini} file:
\begin{verbatimtab}[4]
thredds_file_services =
HTTPServer | /thredds/fileServer/ | HTTPServer | fileservice
GridFTP | gsiftp://<nodename>:2811/ | GRIDFTP | fileservice
OpenDAP | /thredds/dodsC/ | OpenDAP | fileservice
\end{verbatimtab}
To allow incoming OPeNDAP requests, you should also ensure that access is granted to CORDEX group members. This is done by adding these lines to the \texttt{/esg/config/esgf\_policies\_local.xml} file:
\begin{tiny}
\begin{verbatimtab}[4]
<policy resource=".*dodsC.*cordex.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*dodsC.*cordex.*" attribute_type="CORDEX_Commercial" attribute_value="user" action="Read"/>
<policy resource=".*cordex.*aggregation.*" attribute_type="CORDEX_Research" attribute_value="user" action="Read"/>
<policy resource=".*cordex.*aggregation.*" attribute_type="CORDEX_Commercial" attribute_value="user" action="Read"/>
\end{verbatimtab}
\end{tiny}
If you intend to offer GridFTP, remember to allow inbound access to TCP port 2811.
\section{Handy pre-publish tips}
\subsection{Adding checksums}
It is good practice to publish data along with their checksums, so that silent corruption or download errors do not go unnoticed. Generating checksums at publish time may drastically slow down a publication, so it is advisable to precompute them and then insert them into the mapfile. There are several ways to generate checksums. I prefer to use \href{http://www.gnu.org/software/parallel}{GNU parallel} to speed things up.
\begin{verbatimtab}
time (find . -name '*.nc' -print | parallel sha256sum) > unsorted-sha256sums \
2> checksumout
sort -k 2,2 unsorted-sha256sums > sha256sums
\end{verbatimtab}
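Where \texttt{sha256sum} or GNU parallel are not available, the same checksums can be computed with Python's standard \texttt{hashlib} module. The sketch below is an alternative illustration rather than part of the procedure above; the function names are hypothetical, and files are read in chunks so that large NetCDF files do not have to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path):
    """Format one line the way sha256sum prints it: digest, two spaces, path."""
    return "%s  %s" % (sha256_of_file(path), path)


if __name__ == "__main__":
    import sys
    for filename in sys.argv[1:]:
        print(checksum_line(filename))
```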
Generate the map file using the standard \texttt{esgscan\_directory} and then use the following python script to insert the checksums. This script is available in the `\textbf{scripts}' directory in the \href{https://github.com/snic-nsc/datanode-mgr-doc.git}{repo}.
\begin{tiny}
\begin{verbatimtabinput}[4]{scripts/addchecksum.py}
\end{verbatimtabinput}
\end{tiny}
Simply call the script like this:
\begin{verbatimtab}
python addchecksum.py <checksumfile> <mapfile> <outputfile>
\end{verbatimtab}
\section{Acknowledgments}
Many people have contributed to this document, pointing out errors and suggesting improvements. Thanks in particular to Hans Ramthun, Katharina Berger and Stephanie Legutke of the DKRZ, Stephen Pascoe of the BADC, and Jose Carlos Blanco of UNICAN, for their suggestions and help. Together, we strive to make the task of datanode administration a bit less of a hopeless task!