add more configuration options to THREDDS catalog
mishaschwartz committed Oct 4, 2024
1 parent 3d7c8d6 commit 3acb24e
Showing 4 changed files with 76 additions and 13 deletions.
24 changes: 23 additions & 1 deletion CHANGES.md
@@ -15,7 +15,29 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
## Changes

- THREDDS: add more options to configure catalog.xml
  - Currently, the default THREDDS configuration creates two default datasets: the Service Data dataset and the
    Main dataset. The Service Data dataset is used internally and hosts WPS outputs. The Main dataset is
    where users access data served by THREDDS. Both are configured to serve files with the following
    extensions: .nc .ncml .txt .md .rst .csv

  - To allow the THREDDS server to serve files with additional extensions, this change introduces two new
    variables:
    - `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: lets users add
      [filter elements](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html#including-only-desired-files)
      to the Service Data dataset. This is especially useful if a WPS writes files with an extension other than
      the defaults (e.g. .h5) to the `wps_outputs/` directory.
    - `THREDDS_DATASET_DATASETSCAN_BODY`: lets users specify the whole body of the Main dataset's
      [`<datasetScan>`](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html) element,
      and so fully customize how this dataset serves files.

  - The configuration options for the Service Data dataset are more limited than those for the Main dataset
    because the Service Data dataset requires a basic configuration in order to serve WPS outputs properly.
    Significant changes to this configuration could have unexpected negative impacts on WPS usage.

  - The defaults for these new variables are fully backwards compatible. If these variables are left unchanged,
    the THREDDS server should behave exactly the same as before.
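
As a rough illustration of the backwards-compatibility point, the sketch below fills a cut-down `<filter>` block first with an empty value and then with one extra `.h5` filter. The `render_filter` function and its heredoc are stand-ins invented for this example; they are not how birdhouse-deploy renders `catalog.xml.template`, and the filter list is shortened for brevity.

```shell
# Illustrative only: a heredoc stands in for whatever templating step the
# real deployment uses (that step is not shown in this commit).
render_filter() {
  # $1: text dropped in where the template has its placeholder
  cat <<EOF
<filter>
<include wildcard="*.nc" />
<include wildcard="*.csv" />
$1
</filter>
EOF
}

# Default (empty) value: only the built-in filters are present.
default_render=$(render_filter '')

# Override: additionally serve .h5 files.
custom_render=$(render_filter '<include wildcard="*.h5" />')

printf '%s\n' "$custom_render"
```

With the empty default, the placeholder leaves only a blank line behind, so the set of `<include>` elements is unchanged.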

[2.5.3](https://github.com/bird-house/birdhouse-deploy/tree/2.5.3) (2024-09-11)
------------------------------------------------------------------------------------------------------------------
14 changes: 2 additions & 12 deletions birdhouse/components/thredds/catalog.xml.template
@@ -27,24 +27,14 @@
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
${THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS}
</filter>

</datasetScan>

<datasetScan name="${THREDDS_DATASET_LOCATION_NAME}" ID="${THREDDS_DATASET_URL_PATH}" path="${THREDDS_DATASET_URL_PATH}" location="${THREDDS_DATASET_LOCATION_ON_CONTAINER}">

<metadata inherited="true">
<serviceName>all</serviceName>
</metadata>

<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
</filter>
${THREDDS_DATASET_DATASETSCAN_BODY}

</datasetScan>

17 changes: 17 additions & 0 deletions birdhouse/components/thredds/default.env
@@ -17,7 +17,22 @@ export THREDDS_SERVICE_DATA_LOCATION_NAME='Birdhouse'
export THREDDS_DATASET_URL_PATH='datasets'
export THREDDS_SERVICE_DATA_URL_PATH='birdhouse'

export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''

export THREDDS_DATASET_DATASETSCAN_BODY='
<metadata inherited="true">
<serviceName>all</serviceName>
</metadata>

<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />
</filter>
'
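
This file registers `THREDDS_DATASET_DATASETSCAN_BODY` under `VARS` and `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS` under `OPTIONAL_VARS`. One plausible reading, which this commit does not confirm, is that `VARS` entries are expected to be non-empty while `OPTIONAL_VARS` entries may be empty, which would fit the empty default given to the extra-filters variable. The sketch below shows that idea with a hypothetical helper, `missing_vars`, which is not part of birdhouse-deploy.

```shell
# Hypothetical helper (an assumption for illustration, not project code):
# print the names from a VARS-style list whose values are empty or unset.
missing_vars() {
  # $1: whitespace-separated names, each written as $NAME
  for entry in $1; do
    name=${entry#\$}            # strip the leading '$'
    eval "value=\${$name}"      # indirect lookup; the list is trusted input
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

THREDDS_DATASET_URL_PATH='datasets'
THREDDS_DATASET_DATASETSCAN_BODY='<filter><include wildcard="*.nc" /></filter>'
THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS=''

REQUIRED_LIST='
  $THREDDS_DATASET_URL_PATH
  $THREDDS_DATASET_DATASETSCAN_BODY
'
OPTIONAL_LIST='
  $THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS
'

missing_required=$(missing_vars "$REQUIRED_LIST")
missing_optional=$(missing_vars "$OPTIONAL_LIST")

printf 'empty required: [%s]\n' "$missing_required"
printf 'empty optional: [%s]\n' "$missing_optional"
```

Under that assumption, a variable whose default is the empty string would fail a required-variable check, so listing it as optional is the safer choice.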

# add any new variables not already in 'VARS' or 'OPTIONAL_VARS' that must be replaced in templates here
VARS="
@@ -28,6 +43,7 @@ VARS="
\$THREDDS_DATASET_LOCATION_NAME
\$THREDDS_DATASET_URL_PATH
\$THREDDS_DATASET_LOCATION_ON_CONTAINER
\$THREDDS_DATASET_DATASETSCAN_BODY
"

OPTIONAL_VARS="
@@ -39,6 +55,7 @@ OPTIONAL_VARS="
\$THREDDS_IMAGE
\$THREDDS_IMAGE_URI
\$THREDDS_ADDITIONAL_CATALOG
\$THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS
"

export DELAYED_EVAL="
34 changes: 34 additions & 0 deletions birdhouse/env.local.example
@@ -476,6 +476,40 @@ export THREDDS_ADDITIONAL_CATALOG=""
# </datasetScan>
#"

# Additional file filters to add for the Service Data THREDDS dataset. By default, the Service Data dataset will only
# serve files with the following extensions: .nc .ncml .txt .md .rst .csv
# If you need this dataset to serve other files, set THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS to add
# more file filters.
# This may be useful to set if a WPS outputs files to the wps_outputs/ directory (hosted under the Service Data dataset)
# in a file format other than one of the defaults.
# See the example below which would also enable serving .png and .h5 files.
#export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='
# <include wildcard="*.png" />
# <include wildcard="*.h5" />
#'

# Set this variable to customize the body of the <datasetScan> XML element for the main THREDDS dataset. This is typically
# the dataset where you would store most of the data served by THREDDS (additional datasets can be configured by setting the
# THREDDS_ADDITIONAL_CATALOG variable).
# By default, the main dataset will only serve files with the following extensions: .nc .ncml .txt .md .rst .csv and will use
# the THREDDS service named "all" (see components/thredds/catalog.xml.template). However, this can be customized if desired.
# See the example below, which would change the configuration to serve .h5 and .json files instead of .md and .rst files.
# See the THREDDS documentation for the <datasetScan> element for all configuration options.
#export THREDDS_DATASET_DATASETSCAN_BODY='
# <metadata inherited="true">
# <serviceName>all</serviceName>
# </metadata>
#
# <filter>
# <include wildcard="*.nc" />
# <include wildcard="*.ncml" />
# <include wildcard="*.txt" />
# <include wildcard="*.h5" />
# <include wildcard="*.json" />
# <include wildcard="*.csv" />
# </filter>
#'
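
Because these values hold XML attributes written with double quotes, the outer shell quoting matters. The minimal sketch below, using made-up variable names `good` and `bad`, compares the two quoting styles in a POSIX shell: single quotes keep the attribute quotes, while unescaped double quotes silently drop them.

```shell
# Single quotes around the value keep the XML attribute quotes intact.
good='<include wildcard="*.h5" />'

# Double quotes around the value end at the first inner double quote, so the
# shell removes the attribute quotes and joins the pieces into one word.
bad="<include wildcard="*.h5" />"

printf 'good: %s\n' "$good"
printf 'bad:  %s\n' "$bad"
```

The second form would leave `wildcard=*.h5` without quotes, which is not well-formed XML, so single-quoting the whole value (or escaping each inner double quote) is the safer pattern.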

# Allow using Github as external AuthN/AuthZ provider with Magpie
# To setup Github as login, goto <https://github.com/settings/developers> under section [OAuth Apps]
# and create a new Magpie application with configurations: