Merge branch 'main' into HEAD
baloola committed Jun 14, 2024
2 parents 5845908 + b7d6ca8 commit 397e212
Showing 76 changed files with 4,333 additions and 716 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/pages.yml
@@ -2,12 +2,9 @@
name: Deploy static content to Pages

on:
# Runs on pushes targeting the default branch
push:
branches:
- main
issues:
types: [opened, labeled, unlabeled, edited]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
@@ -25,6 +22,7 @@ concurrency:
cancel-in-progress: false

jobs:

# Single deploy job since we're just deploying
deploy:
environment:
@@ -59,4 +57,3 @@ jobs:
PASSWORD: ${{ secrets.PASSWORD }}
run: |
python3 ./stac/stac-generator/update_items.py
26 changes: 26 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,26 @@
name: Validate generated stac item in the PR

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - '**'
      - '!master'
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: install dependencies
        run: |
          pip install -r ./stac/stac-generator/requirements.txt
      - name: validate stac items

        run: |
          pytest --tb=no ./stac/stac-generator/test/validator.py
9 changes: 9 additions & 0 deletions CoverageEncoding/CategoryInformation.md
@@ -0,0 +1,9 @@
# Example codespace for SWE:Category

Dominant Leaf Type. Details at https://land.copernicus.eu/pan-european/high-resolution-layers/forests/dominant-leaf-type/status-maps/dominant-leaf-type-2018

| Label | Value | Color | Color RGBA |
|-------|-------|-------|-----------|
| all non-tree covered areas | 0 | ![grafik](https://github.com/FAIRiCUBE/data-requests/assets/11915304/1d01084f-cd75-4052-8019-9e122270be47) | 240,240,240,255 |
| broadleaved trees | 1 | ![grafik](https://github.com/FAIRiCUBE/data-requests/assets/11915304/1254b337-78df-4560-bc35-b1ac5ceedc5f) | 70,158,74,255 |
| coniferous trees | 2 | ![grafik](https://github.com/FAIRiCUBE/data-requests/assets/11915304/cfdee270-398f-4d3f-901f-236e6ea6258e) | 28,92,36,255 |
122 changes: 122 additions & 0 deletions CoverageEncoding/rangeType.md
@@ -0,0 +1,122 @@
# Specification for provision of rangeType information

OGC CIS uses [SWE Common](https://portal.ogc.org/files/?artifact_id=41157) DataRecord types for the provision of rangeType (a description of what the numbers provided in the coverage range actually mean).
To date, this element has often been filled with erroneous values, not taking requirements from the SWE Common specification into account.
In order to ensure correct provision of CIS encoded data, we will describe the requirements here, together with examples.

## SWE Common
The OGC Sensor Web Enablement Suite (SWE) aims to cover requirements pertaining to measured or observed data.
For CIS encodings, the following requirements classes are of relevance:
- Record Components Package: `The “DataRecord” class is modeled on the definition of ‘Record’ from ISO 11404. In this definition, a record is a composite data type composed of one to many fields, each of which having its own name and type definition.`
- Basic Types and Simple Components Schemas: `XML Schema elements and types defined in the “basic_types.xsd” and “simple_components.xsd” schema files implement all classes defined respectively in the “Basic Types” and “Simple Components” UML packages.`

From the Basic Types and Simple Components, we rely on the Quantity and Category Elements.
While the Count Element would also be applicable, for the moment we will handle Counts as Quantities with a uom of "1" for "unitless".

## SWE:DataRecord
SWE:DataRecord, derived from the SWE Common AbstractDataComponent, can be used to group multiple components via the `field` attribute.
Each field describes one band of the described Coverage.

![grafik](https://github.com/FAIRiCUBE/data-requests/assets/11915304/f943b189-cc85-4ccf-a39e-d85cad4a3e6f)

The name attribute of the swe:field must be provided, whereby Req 39 stipulates that `Each “field” attribute in a given instance of the “DataRecord” class shall be identified by a name that is unique to this instance.`

Depending on the nature of the data provided in each band, the correct Simple Component must be provided in the corresponding DataRecord.field.
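
The uniqueness rule in Req 39 can be checked mechanically. The following is a minimal sketch, assuming the DataRecord is available as an XML string in the SWE Common 2.0 namespace; the function name `duplicate_field_names` and the sample band names are illustrative and not part of any FAIRiCUBE tooling. It treats all `swe:field` elements in the document as belonging to one record.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SWE_NS = "http://www.opengis.net/swe/2.0"  # SWE Common 2.0 XML namespace


def duplicate_field_names(data_record_xml: str) -> list[str]:
    """Return swe:field names that occur more than once (Req 39 violations)."""
    root = ET.fromstring(data_record_xml.strip())
    names = [field.get("name") for field in root.iter(f"{{{SWE_NS}}}field")]
    counts = Counter(names)
    # dict.fromkeys keeps first-seen order while dropping repeats
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


# Hypothetical record with a repeated band name, for illustration only
SAMPLE = """
<swe:DataRecord xmlns:swe="http://www.opengis.net/swe/2.0">
  <swe:field name="band_1"/>
  <swe:field name="band_2"/>
  <swe:field name="band_1"/>
</swe:DataRecord>
"""

print(duplicate_field_names(SAMPLE))  # ['band_1']
```

An empty list means every field name in the record is unique.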

## SWE:SimpleComponent Elements
All SWE:SimpleComponent types are derived from the SWE:AbstractSimpleComponent. Elements defined for SWE:AbstractSimpleComponent apply to all SWE:SimpleComponent types, thus are described jointly in this section.

![grafik](https://github.com/FAIRiCUBE/data-requests/assets/11915304/04029405-8c1d-4b67-a210-7ce1288423c4)

### definition
`definition` is a mandatory attribute of the SWE:SimpleComponent types.

Requirement - http://www.opengis.net/spec/SWE/2.0/req/xsd-simple-components/definition-resolvable
`Req 62. The “definition” attribute shall contain a URI that can be resolved to the complete human readable definition of the property that is represented by the data component.`

Additional information on the definition attribute is provided in clause 7.2.2 as follows:
`The “definition” attribute identifies the property (often an observed property in our context) that the data component represents by using a scoped name. It should map to a controlled term defined in an (web accessible) dictionary, registry or ontology. Such terms provide the formal textual definition agreed upon by one or more communities, eventually illustrated by pictures and diagrams as well as additional semantic information such as relationships to units and other concepts, ontological mappings, etc.`

As Observed Property registers covering FAIRiCUBE requirements have not been identified, alternative solutions may be required. A simple solution would be the use of the [QUDT Quantity Kind Vocabulary](http://qudt.org/2.1/vocab/quantitykind), e.g. [velocity](https://qudt.org/vocab/quantitykind/Velocity) or [radiance](https://qudt.org/vocab/quantitykind/Radiance), in order to provide basic information on the property being conveyed. When re-providing data from Copernicus Services, a reference to the page describing the dataset can be used, e.g. [High Resolution Layer Dominant Leaf Type](https://land.copernicus.eu/en/products/high-resolution-layer-dominant-leaf-type).
If a reference to an Observed Property register is available, this should be used.

When working from a data request, the `Definition` field in the Bands section provides the required content.
If this is missing, and there is only one band, the text from the dataset `Documentation Link` or `Data Source` may be used.
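
Req 62 requires the `definition` to be a resolvable URI. As a first, purely syntactic filter, a sketch like the following can catch values that cannot be resolved at all, such as bare labels. It assumes that only `http` and `https` URIs are of interest; the function name `is_http_uri` is illustrative. It does not perform an HTTP request, so a `True` result is a precondition for resolvability, not proof of it.

```python
from urllib.parse import urlparse


def is_http_uri(definition: str) -> bool:
    """True if the value is an absolute http(s) URI, a precondition for resolving it."""
    parts = urlparse(definition.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_uri("https://qudt.org/vocab/quantitykind/Radiance"))  # True
print(is_http_uri("Radiance"))  # False
```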

### label
The `label` element is a short descriptive human readable label describing what property the component represents.

When working from a data request, the `cell components` field in the Bands section provides the required content.
If this is missing, and there is only one band, the text from the dataset `Title` may be used.

### description
The `description` element is a longer human-readable text describing the property that the component represents.

When working from a data request, the `Description` field in the Bands section provides the required content.
If this is missing, and there is only one band, the text from the dataset Description may be used.

### nilValues
The `nilValues` element is defined on the SWE:AbstractSimpleComponentType. If nil values are used in the coverage, they must be provided here, together with a reason. The swe:NilValuesType must be used.

When working from a data request, the `Null values` field in the Bands section provides the required content.
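
Because each nil value must come with a reason, it can help to flag entries whose `reason` attribute is empty. This is a minimal sketch, assuming the rangeType fragment is available as an XML string in the SWE Common 2.0 namespace; the function name and the sample values are illustrative.

```python
import xml.etree.ElementTree as ET

SWE_NS = "http://www.opengis.net/swe/2.0"  # SWE Common 2.0 XML namespace


def nil_values_without_reason(range_type_xml: str) -> list[str]:
    """Return the nil values whose reason attribute is missing or blank."""
    root = ET.fromstring(range_type_xml.strip())
    missing = []
    for nil_value in root.iter(f"{{{SWE_NS}}}nilValue"):
        reason = (nil_value.get("reason") or "").strip()
        if not reason:
            missing.append((nil_value.text or "").strip())
    return missing


# Illustrative fragment: one entry with an empty reason, one with a reason
SAMPLE = """
<swe:NilValues xmlns:swe="http://www.opengis.net/swe/2.0">
  <swe:nilValue reason="">65535</swe:nilValue>
  <swe:nilValue reason="no data">240</swe:nilValue>
</swe:NilValues>
"""

print(nil_values_without_reason(SAMPLE))  # ['65535']
```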

## SWE:Quantity
The SWE:Quantity type adds a mandatory unit of measure element, `uom`, of type swe:UnitReference.

When working from a data request, the unit of measure stated for the band in the Bands section provides the required content.

Requirement - http://www.opengis.net/spec/SWE/2.0/req/xsd-simple-components/ucum-code-used
`Req 64. The UCUM code for a unit of measure shall be used as the value of the “code” XML attribute whenever it can be constructed using the UCUM 1.8 specification. Otherwise the “href” XML attribute shall be used to reference an external unit definition.`

Note: UCUM 1.8 has been deprecated; the current version is [UCUM 2.1](https://ucum.org/ucum). We will therefore use UCUM 2.1 in FAIRiCUBE.

The following shows an example of a Quantity rangeType taken from the Demography dataset. Note that ideally we would use the swe:Count type for this purpose.

```
<cis11:RangeType>
  <swe:DataRecord>
    <swe:field name="Population_total">
      <swe:Quantity definition="https://ec.europa.eu/eurostat/web/gisco/geodata/population-distribution/geostat">
        <swe:label>Population total</swe:label>
        <swe:description>The 2021 census contained a major innovation with the presentation of key census topics on an EU-wide 1 km² grid.</swe:description>
        <swe:nilValues>
          <swe:NilValues>
            <swe:nilValue reason="">65535</swe:nilValue>
          </swe:NilValues>
        </swe:nilValues>
        <swe:uom code="1"/>
      </swe:Quantity>
    </swe:field>
  </swe:DataRecord>
</cis11:RangeType>
```
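
A record like the Quantity example can also be generated programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; it builds only the `swe:DataRecord` (not the enclosing `cis11:RangeType`), and the helper names `swe` and `quantity_record`, the shortened description, and the nil value reason are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SWE_NS = "http://www.opengis.net/swe/2.0"  # SWE Common 2.0 XML namespace
ET.register_namespace("swe", SWE_NS)


def swe(tag: str) -> str:
    """Qualify a tag name with the SWE namespace."""
    return f"{{{SWE_NS}}}{tag}"


def quantity_record(name, definition, label, description, uom_code, nil_values):
    """Build a swe:DataRecord with a single swe:Quantity field."""
    record = ET.Element(swe("DataRecord"))
    field = ET.SubElement(record, swe("field"), {"name": name})
    quantity = ET.SubElement(field, swe("Quantity"), {"definition": definition})
    ET.SubElement(quantity, swe("label")).text = label
    ET.SubElement(quantity, swe("description")).text = description
    if nil_values:
        outer = ET.SubElement(quantity, swe("nilValues"))
        inner = ET.SubElement(outer, swe("NilValues"))
        for value, reason in nil_values.items():
            ET.SubElement(inner, swe("nilValue"), {"reason": reason}).text = str(value)
    # uom comes after nilValues, matching the element order of the example
    ET.SubElement(quantity, swe("uom"), {"code": uom_code})
    return record


record = quantity_record(
    name="Population_total",
    definition="https://ec.europa.eu/eurostat/web/gisco/geodata/population-distribution/geostat",
    label="Population total",
    description="Total population per 1 km² grid cell.",  # shortened placeholder text
    uom_code="1",  # "1" marks a unitless quantity
    nil_values={65535: "no data"},  # reason text is an assumed placeholder
)
ET.indent(record)  # pretty-print; requires Python 3.9+
print(ET.tostring(record, encoding="unicode"))
```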

## SWE:Category
The SWE:Category type adds a `codeSpace` element of type swe:Reference. This element uses xlink:href to reference an external dictionary, taxonomy or ontology representing the code space. It is implicitly mandatory: Req 25 stipulates that either a code space or a `constraint` with a list of allowed values be provided, and as the constraint alternative does not allow for additional semantics on the individual entries, the `codeSpace` approach is preferred.

However, the `codeSpace` approach requires a URI that resolves to information on the categorization used in the data, which is often not available. While a machine-readable approach would be preferable, in the absence of identified standards in this area a simple [GitHub page](https://github.com/FAIRiCUBE/data-requests/blob/main/CoverageEncoding/CategoryInformation.md) can suffice.

When working from a data request, the `Category List` field in the Bands section provides the required content.

The following shows an example of a Category rangeType taken from the DominantLeafType dataset:

```
<cis11:RangeType>
  <swe:DataRecord>
    <swe:field name="DominantLeafType">
      <swe:Category definition="https://land.copernicus.eu/en/products/high-resolution-layer-dominant-leaf-type">
        <swe:label>Dominant Leaf Type</swe:label>
        <swe:description>The HRL Forest 2018 primary status layer Dominant Leaf Type (DLT) has been created in frame of the tender “EEA/IDM/R0/18/009 - High Resolution land cover characteristics for the 2018 reference year” as part of the EEA Copernicus Land Monitoring Service (CLMS, https://land.copernicus.eu). The DLT raster product provides a basic land cover classification with 3 thematic classes (all non-tree covered areas / broadleaved / coniferous) at 10m spatial resolution and covers the full of EEA39 area. More about the High Resolution Layers and CLMS datasets can be found at https://land.copernicus.eu/pan-european.</swe:description>
        <swe:nilValues>
          <swe:NilValues>
            <swe:nilValue reason="no data">240</swe:nilValue>
            <swe:nilValue reason="outside area">255</swe:nilValue>
          </swe:NilValues>
        </swe:nilValues>
        <swe:codeSpace xlink:href="https://github.com/FAIRiCUBE/data-requests/blob/main/CoverageEncoding/CategoryInformation.md/"/>
      </swe:Category>
    </swe:field>
  </swe:DataRecord>
</cis11:RangeType>
```
```
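
On the client side, the code space and the nil values together determine how a cell value is interpreted. The sketch below hard-codes the Dominant Leaf Type categories and the two nil values from the example above; the names `DOMINANT_LEAF_TYPE`, `NIL_VALUES` and `describe` are illustrative, and a real client would read this information from the referenced code space rather than from a literal dictionary.

```python
# Category value -> (label, RGBA colour), as listed in CategoryInformation.md
DOMINANT_LEAF_TYPE = {
    0: ("all non-tree covered areas", (240, 240, 240, 255)),
    1: ("broadleaved trees", (70, 158, 74, 255)),
    2: ("coniferous trees", (28, 92, 36, 255)),
}

# Nil value -> reason, as given in the swe:nilValues of the example
NIL_VALUES = {240: "no data", 255: "outside area"}


def describe(value: int) -> str:
    """Translate a raw cell value into a human-readable description."""
    if value in NIL_VALUES:
        return f"nil: {NIL_VALUES[value]}"
    if value in DOMINANT_LEAF_TYPE:
        label, _rgba = DOMINANT_LEAF_TYPE[value]
        return label
    return "unknown value"


for cell in (1, 2, 255, 7):
    print(cell, "->", describe(cell))
```

Nil values are checked first so that a value listed in `swe:nilValues` is never mistaken for a category.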

50 changes: 13 additions & 37 deletions README.md
@@ -2,49 +2,25 @@

This space is used for issues and documentation (until a canonical documentation place has been defined) of data ingestion as part of Fairicube WP5 work.

As data ingest is tightly connected with metadata management, use of data, etc., consider also these related spaces:

- [resource-metadata](https://github.com/FAIRiCUBE/resource-metadata): in addition to the issues providing metadata for resources, also used to discuss technical details on resource metadata
- [Fairicube Hub](https://github.com/FAIRiCUBE/FAIRiCUBE-Hub-issue-tracker): for general FAIRiCUBE topics

In a nutshell, data needed by the Use Cases will be imported into the FAIRiCUBE HUB to form datacubes. These make data access easier because they are homogenized in structure and access (if you have ever had to go through different data repositories and harvest data, followed by your own homogenization work, you will see that pushing that work "behind the curtain" by automating access is a big advantage). Of course, to provide data in such a "beautified" manner, the homogenization work still needs to be done by somebody. In FAIRiCUBE, that is us. Tools like the data request form and the rasdaman ETL suite assist greatly, but given the vast divergence of data, the final homogenized datacubes still need careful human attention; improving the life of data wranglers is in fact a key mission of the project.

Contents of this space:

- [How to Get Data Added](#how-to-get-data-added)
- [Further details](https://github.com/FAIRiCUBE/data-requests/wiki) like finding data, datacube access how-to, and use case specific modeling and access


## How to Get Data Added

Please hand in your requests for data to be made available within FAIRiCUBE HUB by simply [adding an issue](https://github.com/FAIRiCUBE/data-requests/issues/new/choose) in this repository; this will add your request to the queue. You need to provide the agreed slate of description elements so that data can be downloaded, understood, and the datacube can be created.

**Make sure the description is correct and complete, otherwise iterations and misinterpretations can occur which lead to delays and extra effort on all sides.**
- [How to Get Data Added](https://github.com/FAIRiCUBE/data-requests/wiki/How-to-Add-Data)
- [Choosing the right pixel type](https://github.com/FAIRiCUBE/data-requests/wiki/Choosing-the-Right-Pixel-Type)
- [Details on the Coverage range type, as inherited from SWE Common](https://github.com/FAIRiCUBE/data-requests/blob/main/CoverageEncoding/rangeType.md)
- [Connecting catalog with datacubes](https://github.com/FAIRiCUBE/data-requests/wiki/Connection-Catalog-Datacubes)
- [Finding data ingested, datacube access how-to](https://github.com/FAIRiCUBE/data-requests/wiki)
- [Use case specific modeling and access](https://github.com/FAIRiCUBE/data-requests/wiki/Data-Overview)
- [complete rasdaman documentation](https://doc.rasdaman.com)

If questions remain open (and this has turned out to be the regular case) a request clearing meeting will be held subsequently. Commonly, details can be clarified this way, or the data requestor gets asked to provide more technical details on the data. After successful data import the requestor will be contacted for verification, after which the git issue gets closed.

### Coverage Naming Conventions

Ultimately, the catalog of the FAIRiCUBE hub will allow convenient search across data. For now, until this catalog becomes available, a pragmatic convention has been adopted: the coverage name is a combination of campaign year, reference year, inventory year and release version:
- [Mapping_Campaign]_[CLC_Reference_Year]_[Created_Inventory_Year]_[Version]

Stable version example with name CLC2006_CLC2000_V2018_20 means:
- CLC2006_ That the file came from the 2006 mapping campaign (the file content was last modified in this campaign)
- _CLC2000_ That the file captures Land Cover mapping results for 2000 reference year
- _V2018_ That this file comes from a delivery created in 2018 inventory year
- _20 That this is the final stable version

Beta-version example with name CLC2006_CLC2000_V2018_20b2 means:
- CLC2006_ That the file came from the 2006 mapping campaign (the file content was last modified in this campaign)
- _CLC2000_ That the file captures Land Cover mapping results for 2000 reference year
- _V2018_ That this file comes from a delivery created in 2018 inventory year
- _20b2 That this is the second beta-version
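
The naming convention is regular enough to be parsed automatically. The following is a minimal sketch, assuming that names always follow the four-part pattern shown in the two examples; the regular expression, the group names and the function `parse_coverage_name` are illustrative and may need adjusting for names outside the CLC family.

```python
import re

# [Mapping_Campaign]_[CLC_Reference_Year]_[Created_Inventory_Year]_[Version]
NAME_PATTERN = re.compile(
    r"^(?P<campaign>[A-Za-z]+\d{4})"
    r"_(?P<reference>[A-Za-z]+\d{4})"
    r"_V(?P<inventory_year>\d{4})"
    r"_(?P<version>\d+)(?:b(?P<beta>\d+))?$"
)


def parse_coverage_name(name: str) -> dict:
    """Split a coverage name into its parts; beta is None for stable versions."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["beta"] = int(parts["beta"]) if parts["beta"] else None
    return parts


print(parse_coverage_name("CLC2006_CLC2000_V2018_20"))
print(parse_coverage_name("CLC2006_CLC2000_V2018_20b2"))
```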


### Metadata
As data ingest is tightly connected with metadata management, use of data, etc., consider also these related spaces:

In order to become visible to all, the new dataset also needs its twin entry in the catalog (currently one of https://catalog.eoxhub.fairicube.eu and https://catalog.fairicube.eu). To this end, the catalog team needs to be contacted (ideally by mentioning them in the corresponding data upload issue). Note that the catalog will typically require additional descriptive metadata.
- [metadata-editor WebGUI](https://catalog-editor.eoxhub.fairicube.eu/): to provide and edit metadata to be shown in the [data catalog (STAC-fastapi)](https://catalog.eoxhub.fairicube.eu/?.language=en)

[Here](encoding-examples/dominant_leaf_type-metadata.xml) is an example of a metadata record compliant with the OGC coverage standard.
- [resource-metadata](https://github.com/FAIRiCUBE/resource-metadata): in addition to the issues providing metadata for resources, also used to discuss technical details on resource metadata
- [Fairicube Hub](https://github.com/FAIRiCUBE/FAIRiCUBE-Hub-issue-tracker): for general FAIRiCUBE topics

A hitherto unsolved problem is the project's policy for data and processing access management. Preliminaries:
- [FAIRiCUBE User Management](https://github.com/FAIRiCUBE/data-requests/wiki/user-management)

80 changes: 80 additions & 0 deletions encoding-examples/near_surface_air_temperature_wcs_2_0_1.xml

Large diffs are not rendered by default.
