Adapt the Occurrence API to work with multiple taxonomies #342

Open
fmendezh opened this issue Mar 14, 2024 · 5 comments

fmendezh commented Mar 14, 2024

Draft a proposal for changes to the Occurrence API to support: filtering by checklist, additional ranks, full classification, the checklist dataset key, and data type changes.

Decide whether these changes can be applied to API v1 or whether a v2 is needed.

djtfmartin commented Aug 29, 2024

Draft proposal for Occurrence API changes

Below is a set of additional web services and additional parameter options for existing web services.
These changes are intended as backwards-compatible additions to the existing v1 API, i.e. no breaking changes: they add to the output format but leave existing fields and nested structures in place.


Search response - including multiple classifications

The search response includes a classifications array, which contains 0..n classifications associated with the occurrence record. Example JSON below (shortened for brevity). The existing gbifClassification will remain in place with integer keys.

{
    "offset": 0,
    "limit": 20,
    "endOfRecords": false,
    "count": 412833,
    "results": [
        {
            "key": 462028,
            "datasetKey": "9bd520e3-00fa-4955-a554-924ea440862c",
            "publishingOrgKey": "d2b97690-bfd6-11de-b279-d52977ace833",
            "installationKey": "99672740-f762-11e1-a439-00145eb45e9a",
            "hostingOrganizationKey": "d2b97690-bfd6-11de-b279-d52977ace833",
            "publishingCountry": "IE",
            "protocol": "DWC_ARCHIVE",
            "lastCrawled": "2024-09-05T18:36:01.493+00:00",
            "lastParsed": "2024-09-12T14:10:38.809+00:00",
            "crawlId": 176,
            "extensions": {},
            "basisOfRecord": "HUMAN_OBSERVATION",
            "occurrenceStatus": "PRESENT",
            "sex": "MALE",
            "lifeStage": "Adult",
            "classifications": [
                {
                    "datasetKey": "7ddf754f-d193-4cc9-b351-99906754a03b",
                    "usage": {
                        "key": "8C2QW",
                        "name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
                        "rank": "SPECIES"
                    },
                    "acceptedUsage": {
                        "key": "8C2QW",
                        "name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
                        "rank": "SPECIES"
                    },
                    "classification": [
                        {
                            "key": "RT",
                            "name": "Arthropoda",
                            "rank": "PHYLUM"
                        },
                        {
                            "key": "CHP6G",
                            "name": "Hexapoda",
                            "rank": "SUBPHYLUM"
                        },
                        {
                            "key": "D2P",
                            "name": "Diptera",
                            "rank": "ORDER"
                        },
                        {
                            "key": "BXZTG",
                            "name": "Episyrphus",
                            "rank": "SUBGENUS"
                        },
                        {
                            "key": "BXZTD",
                            "name": "Episyrphus",
                            "rank": "GENUS"
                        },
                        {
                            "key": "B7XFC",
                            "name": "Syrphini",
                            "rank": "TRIBE"
                        },
                        {
                            "key": "8C2QW",
                            "name": "Episyrphus balteatus",
                            "rank": "SPECIES"
                        },
                        {
                            "key": "N",
                            "name": "Animalia",
                            "rank": "KINGDOM"
                        },
                        {
                            "key": "5T6MX",
                            "name": "Biota",
                            "rank": "UNRANKED"
                        },
                        {
                            "key": "H6",
                            "name": "Insecta",
                            "rank": "CLASS"
                        },
                        {
                            "key": "9H6NG",
                            "name": "Syrphinae",
                            "rank": "SUBFAMILY"
                        },
                        {
                            "key": "GVS",
                            "name": "Syrphidae",
                            "rank": "FAMILY"
                        }
                    ]
                }
            ],
            "type": "Occurrence"
        }
    ],
    "facets": []
}
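
As a rough illustration of how a client might read the classifications array, the Python sketch below picks the classification for one checklist and indexes it by rank. The helper names (`classification_for`, `names_by_rank`) and the trimmed sample record are assumptions made for this example, not part of the API. Because the example response above does not list the ranks in hierarchical order, the sketch looks ranks up by name rather than relying on position.

```python
from typing import Optional

# Trimmed copy of one occurrence from the example response above.
occurrence = {
    "key": 462028,
    "classifications": [
        {
            "datasetKey": "7ddf754f-d193-4cc9-b351-99906754a03b",
            "usage": {
                "key": "8C2QW",
                "name": "Episyrphus (Episyrphus) balteatus (De Geer, 1776)",
                "rank": "SPECIES",
            },
            "classification": [
                {"key": "RT", "name": "Arthropoda", "rank": "PHYLUM"},
                {"key": "GVS", "name": "Syrphidae", "rank": "FAMILY"},
                {"key": "N", "name": "Animalia", "rank": "KINGDOM"},
            ],
        }
    ],
}


def classification_for(occ: dict, checklist_key: str) -> Optional[dict]:
    """Return the classification entry for the given checklist, or None."""
    for entry in occ.get("classifications", []):
        if entry.get("datasetKey") == checklist_key:
            return entry
    return None


def names_by_rank(entry: dict) -> dict:
    """Map rank -> (key, name) for one classification entry."""
    return {t["rank"]: (t["key"], t["name"]) for t in entry.get("classification", [])}
```

A record with no match for the requested checklist yields `None`, which mirrors the 0..n wording above.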

Searches with checklistKey

Searches with the new request parameter checklistKey allow users to retrieve records associated with a checklist.
This is possibly only of real use for smaller thematic checklists.
The checklistKey is the GBIF dataset key for the checklist, e.g. 7ddf754f-d193-4cc9-b351-99906754a03b for Catalogue of Life.

https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b

This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching.
Occurrence records that have been matched to a taxon in the specified checklist will be returned.


Searches with taxonKey and checklistKey

Searches combining the new request parameter checklistKey with taxonKey allow users to specify the checklist in use. The following would be a query with a taxon from Catalogue of Life:

https://api.gbif-dev2.org/v1/occurrence/search?taxonKey=CB2MR&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b

This query finds records associated with the supplied taxonKey in the checklist specified by the checklistKey.
This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching.
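
A minimal sketch of how a client could assemble such a request, assuming the development endpoint used in the examples here. The function name `occurrence_search_url` is made up for illustration; the only parameters it relies on are the ones described in this proposal.

```python
from urllib.parse import urlencode

# Development endpoint taken from the examples in this proposal.
SEARCH_URL = "https://api.gbif-dev2.org/v1/occurrence/search"
COL_CHECKLIST_KEY = "7ddf754f-d193-4cc9-b351-99906754a03b"


def occurrence_search_url(checklist_key: str, **params: str) -> str:
    """Build a GET search URL scoped to a single checklist."""
    query = dict(params)
    # The GET API takes exactly one checklistKey per request.
    query["checklistKey"] = checklist_key
    return f"{SEARCH_URL}?{urlencode(query)}"


url = occurrence_search_url(COL_CHECKLIST_KEY, taxonKey="CB2MR")
```

The same helper can carry other filters, since `urlencode` takes care of escaping values.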


Searches with scientificName and checklistKey

Searches combining the new request parameter checklistKey with scientificName allow users to specify the taxonomy used when matching the scientificName provided.

https://api.gbif-dev2.org/v1/occurrence/search?scientificName=Episyrphus%20(Episyrphus)%20balteatus&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b

This performs name usage matching against the checklist with the specified checklistKey.
The name is resolved to a taxonKey in that checklist, and this taxonKey is used for occurrence searching.

This query finds records associated with the matched taxonKey in the checklist specified by the checklistKey.
This only returns occurrence results when the specified checklist is one of the checklists supported by multi-taxonomy matching.

Facet on checklistKey

Faceting on checklistKey is supported with any query, to retrieve a list of relevant checklists for a particular search:

https://api.gbif-dev2.org/v1/occurrence/search?facet=checklistKey&limit=0

Will return:

{
  "offset": 0,
  "limit": 0,
  "endOfRecords": false,
  "count": 100,
  "results": [ ],
  "facets": [
    {
      "field": "CHECKLIST_KEY",
      "counts": [
        {
          "name": "2d59e5db-57ad-41ff-97d6-11f5fb264527",
          "count": 100
        },
        {
          "name": "7ddf754f-d193-4cc9-b351-99906754a03b",
          "count": 100
        },
        {
          "name": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
          "count": 100
        }
      ]
    }
  ]
}

The UUIDs returned here are datasetKey values in the GBIF registry.
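
The facet block can be flattened into a simple lookup on the client side. The sketch below is only an illustration: `facet_counts` is a hypothetical helper, and the sample data is a trimmed copy of the response above.

```python
# Trimmed copy of the example facet response above.
response = {
    "count": 100,
    "results": [],
    "facets": [
        {
            "field": "CHECKLIST_KEY",
            "counts": [
                {"name": "2d59e5db-57ad-41ff-97d6-11f5fb264527", "count": 100},
                {"name": "7ddf754f-d193-4cc9-b351-99906754a03b", "count": 100},
            ],
        }
    ],
}


def facet_counts(search_response: dict, field: str) -> dict:
    """Return {name: count} for one facet field, or {} if it is absent."""
    for facet in search_response.get("facets", []):
        if facet.get("field") == field:
            return {c["name"]: c["count"] for c in facet.get("counts", [])}
    return {}


checklist_counts = facet_counts(response, "CHECKLIST_KEY")
```

The keys of `checklist_counts` are the dataset keys of the checklists that have matches for the search.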

Facets with checklistKey filter

The facets for taxonKey and higher-rank taxon keys (e.g. kingdomKey, genusKey) return values based on the GBIF taxonomy by default.
If a checklistKey is specified, the results come from that checklist. For example:

https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=2d59e5db-57ad-41ff-97d6-11f5fb264527&facet=familyKey

This returns facets for familyKey values from WoRMS:

{
    "offset": 0,
    "limit": 0,
    "endOfRecords": false,
    "count": 497984,
    "results": [],
    "facets": [{
        "field": "FAMILY_KEY",
        "counts": [{
            "name": "urn:lsid:marinespecies.org:taxname:235102",
            "count": 64908
        }, {
            "name": "urn:lsid:marinespecies.org:taxname:147429",
            "count": 23931
        }, {
            "name": "urn:lsid:marinespecies.org:taxname:196044",
            "count": 18861
        }, {
            "name": "urn:lsid:marinespecies.org:taxname:234449",
            "count": 18357
        }]
    }]
}

Search by any rank

Searching by any taxonomic rank is supported. Applications using the web services can retrieve a list of the checklists indexed.
With a checklist ID, a list of rank key field names can be retrieved:

https://api.gbif-dev2.org/v1/occurrence/search/checklist/2d59e5db-57ad-41ff-97d6-11f5fb264527/rankKeys

Rank keys can be used to search occurrences by ranks other than the major Linnaean ranks, such as subphylum or suborder:

https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=2d59e5db-57ad-41ff-97d6-11f5fb264527&subphylumKey=urn:lsid:marinespecies.org:taxname:886369

This example searches by subphylum using the WoRMS checklist.

Search by taxonDepth

To aid UI development, particularly taxonomic tree browsing components, and because Catalogue of Life and other taxonomic sources such as WoRMS require searching across different ranks, we can add support for taxonDepth.
This allows querying the taxonomic tree by numerical depth within the tree, as opposed to a specific taxonomic rank (e.g. kingdom).

This URL will return root taxa (regardless of rank) for the specified checklist.

https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b&facet=taxonDepth0&limit=0

This URL will return child taxa (regardless of rank) of the taxon with taxonKey=5T6MX for the specified checklist.

https://api.gbif-dev2.org/v1/occurrence/search?checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b&facet=taxonDepth1&limit=0&taxonDepth0=5T6MX

Example output

{
  "offset": 0,
  "limit": 0,
  "endOfRecords": false,
  "count": 450926,
  "results": [ ],
  "facets": [
    {
      "field": "TAXON_DEPTH_1",
      "counts": [
        {
          "name": "P",
          "count": 314969
        },
        {
          "name": "N",
          "count": 135686
        },
        {
          "name": "c2ce3656-5b6e-46ea-b042-2056011ddb30",
          "count": 188
        },
        {
          "name": "B6LM6",
          "count": 78
        },
        {
          "name": "F",
          "count": 4
        },
        {
          "name": "C",
          "count": 1
        }
      ]
    }
  ]
}
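
For a tree-browsing component, the two URLs above suggest a simple pattern: facet on the depth one level below the node being expanded, and filter on that node at its own depth. The sketch below follows that pattern. `children_facet_url` is a hypothetical helper, and its use for depths beyond 0 and 1 is an assumption extrapolated from the two examples, not something this proposal states.

```python
from typing import Optional
from urllib.parse import urlencode

# Development endpoint taken from the examples in this proposal.
SEARCH_URL = "https://api.gbif-dev2.org/v1/occurrence/search"


def children_facet_url(checklist_key: str,
                       parent_key: Optional[str] = None,
                       parent_depth: int = 0) -> str:
    """Build the facet URL listing root taxa or the children of one taxon.

    With no parent_key, the root taxa (depth 0) are requested. Otherwise the
    facet targets parent_depth + 1, filtered to the parent at parent_depth
    (assumed to generalise beyond the depth 0 and 1 examples).
    """
    if parent_key is None:
        query = {"checklistKey": checklist_key, "facet": "taxonDepth0", "limit": 0}
    else:
        query = {
            "checklistKey": checklist_key,
            "facet": f"taxonDepth{parent_depth + 1}",
            "limit": 0,
            f"taxonDepth{parent_depth}": parent_key,
        }
    return f"{SEARCH_URL}?{urlencode(query)}"
```

Each name in the returned facet counts can then be passed back in as `parent_key`, with `parent_depth` increased by one, to expand the next level.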

Predicate search API

In the predicate API, the EqualsPredicate and InPredicate have been extended with a checklistKey field, allowing the user to specify the checklist that should be used for taxonomic key fields and taxon depth fields.
The predicate API supports searching with multiple taxonomies in a single query,
e.g. users can combine a taxonKey from WoRMS and a taxonKey from Catalogue of Life in one search.

Example with single SPECIES_KEY

{
    "predicate": {
        "type": "and",
        "predicates": [
            {
                "type": "equals",
                "key": "SPECIES_KEY",
                "value": "6HQ2Y",
                "checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
            }
        ]
    }
}

Example with TAXON_DEPTH_0

{
    "predicate": {
        "type": "and",
        "predicates": [
            {
                "type": "equals",
                "key": "TAXON_DEPTH_0",
                "value": "5T6MX",
                "checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
            }
        ]
    }
}

Example with multiple SPECIES_KEY values with taxa from different checklists (WoRMS and CoL in this example):

{
    "predicate": {
        "type": "or",
        "predicates": [
            {
                "type": "equals",
                "key": "SPECIES_KEY",
                "value": "5T6MX",
                "checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
            },
            {
                "type": "equals",
                "key": "SPECIES_KEY",
                "value": "urn:lsid:marinespecies.org:taxname:159142",
                "checklistKey": "2d59e5db-57ad-41ff-97d6-11f5fb264527"
            }
        ]
    }
}

For testing with curl:

curl --request POST \
  --header "Content-Type: application/json" \
  --data '{
    "predicate": {
      "type": "and",
      "predicates": [
        {
          "type": "equals",
          "key": "TAXON_DEPTH_0",
          "value": "5T6MX",
          "checklistKey": "7ddf754f-d193-4cc9-b351-99906754a03b"
        }
      ]
    }
  }' \
  https://api.gbif-dev2.org/v1/occurrence/search/predicate 

Example with curl, using multiple species key values from WoRMS:

curl --request POST \
  --header "Content-Type: application/json" \
  --data '{
    "predicate": {
      "type": "and",
      "predicates": [
        {
          "type": "in",
          "key": "SPECIES_KEY",
          "values": [
              "urn:lsid:marinespecies.org:taxname:159142",
              "urn:lsid:marinespecies.org:taxname:159037"
          ], 
          "checklistKey": "2d59e5db-57ad-41ff-97d6-11f5fb264527"
        }
      ]
    }
  }' \
  https://api.gbif-dev2.org/v1/occurrence/search/predicate 
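
The request bodies above can also be generated rather than written by hand. The following Python sketch builds the same JSON structures; the helper names (`equals`, `is_in`, `any_of`, `request_body`) are invented for this example and are not part of any GBIF client library. Sending the body is left out; it would be POSTed to the predicate endpoint exactly as in the curl examples.

```python
import json

COL = "7ddf754f-d193-4cc9-b351-99906754a03b"
WORMS = "2d59e5db-57ad-41ff-97d6-11f5fb264527"


def equals(key, value, checklist_key=None):
    """Equals predicate; checklistKey is only added when one is given."""
    pred = {"type": "equals", "key": key, "value": value}
    if checklist_key is not None:
        pred["checklistKey"] = checklist_key
    return pred


def is_in(key, values, checklist_key=None):
    """In predicate over several values from the same checklist."""
    pred = {"type": "in", "key": key, "values": list(values)}
    if checklist_key is not None:
        pred["checklistKey"] = checklist_key
    return pred


def any_of(*predicates):
    """Combine predicates with OR."""
    return {"type": "or", "predicates": list(predicates)}


def request_body(predicate):
    """Wrap a predicate in the top-level object used in the examples."""
    return json.dumps({"predicate": predicate}, indent=2)


# Taxa from two different checklists in a single query.
body = request_body(any_of(
    equals("SPECIES_KEY", "5T6MX", COL),
    equals("SPECIES_KEY", "urn:lsid:marinespecies.org:taxname:159142", WORMS),
))
```

Keeping checklistKey on each leaf predicate is what lets one query mix checklists, since every taxon key carries its own scope.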

djtfmartin added a commit to gbif/gbif-common-ws that referenced this issue Sep 17, 2024
taxon depth facet support
facetting on any rank level for a checklist
query on any rank level for a checklist
gbif/occurrence#342
MortenHofft commented Sep 25, 2024

This is a large new piece of functionality, so I suppose large changes are expected. Here are the things that surprised me.

Searches with taxonKey and checklistKey in GET API

The current version only allows for one checklist, which is okay I suppose. It might be unlikely that anyone wants to use more than one.

I'm not dead against this, but it is slightly puzzling because it changes the behaviour of what taxonKey refers to. I get 10 results for taxonKey=3, then I add an additional filter for checklistKey=123 and get more results. And I can only add it once, which is a bit unusual, but not crazy - the same goes for flags, but given that this uses keys I expected to be able to add multiple.

Ideas
For species search we have flags that indicate changed behaviour (verbose=true, strict, qField=SCIENTIFIC). We could have something like matchMultipleChecklists=true which indicates changed behaviour. Once I add that flag, taxonKey would match against all checklists, and I can then decide to narrow that by adding one or more checklistKey values.

Another version: checklistTaxonKey=[datasetKey]:[taxonKey]
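
To make the combined-key idea concrete, here is a small sketch of how such a value might be split. Everything in it is hypothetical, including the function name and the choice to split on the first colon. That choice assumes the dataset key is a UUID (which contains no colons), while taxon keys such as WoRMS LSIDs do contain them.

```python
def parse_checklist_taxon_key(value: str) -> tuple:
    """Split '[datasetKey]:[taxonKey]' on the first colon only.

    Taxon keys like 'urn:lsid:marinespecies.org:taxname:159142' contain
    colons themselves, so only the first separator is significant.
    """
    checklist_key, sep, taxon_key = value.partition(":")
    if not sep or not checklist_key or not taxon_key:
        raise ValueError(f"expected '<datasetKey>:<taxonKey>', got {value!r}")
    return checklist_key, taxon_key
```

Because the value is self-describing, the parameter could in principle be repeated to mix checklists in one GET request.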

Search using predicates

If I understand the conversation elsewhere correctly, then the predicate approach is

{
  "type": "and",
  "predicates": [
    {
      "type": "equals",
      "key": "checklistKey",
      "value": "123-123-123"
    },
    {
      "type": "equals",
      "key": "taxonKey",
      "value": "5dX"
    }
  ]
}

And to only allow the checklist predicate once.

This is confusing to me. Firstly, it isn't clear to me how the 2 predicates in the AND influence each other. Secondly, it is odd that it can only be used once. And lastly, it is unclear what part of the tree it applies to in that case (I imagine a more complex predicate with multiple AND/OR/NOT).

If it is only allowed once, then it isn't a predicate in my mind, but belongs on the same level as the q param: outside the predicate structure.

Something like {type: equals, key: taxonKey, checklist: '123-123-123', value: 5dX} is easier to understand and more expressive I would think.

Or {type: equals, key: checklistTaxonKey, checklist: '123-123-123', value: 5dX} or even a new type like
{type: checklistContext, checklistKey: '123-123-123', predicates: []} which then specifies the taxon scope for anything beneath.
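
One way to read the checklistContext idea is as shorthand that gets pushed down onto the predicates beneath it. The sketch below shows that reading only. The checklistContext type is hypothetical, treating it as an AND over its children is an assumption, and so is the single-child `predicate` field on a `not` node. Leaves that already name a checklist keep their own.

```python
from typing import Optional


def apply_checklist_context(predicate: dict, checklist_key: Optional[str] = None) -> dict:
    """Rewrite hypothetical checklistContext nodes into per-leaf checklistKey fields."""
    if predicate["type"] == "checklistContext":
        # Assumption: the context adds no condition itself and ANDs its children.
        children = [apply_checklist_context(p, predicate["checklistKey"])
                    for p in predicate["predicates"]]
        return children[0] if len(children) == 1 else {"type": "and", "predicates": children}
    if "predicates" in predicate:  # compound nodes such as and / or
        return {**predicate,
                "predicates": [apply_checklist_context(p, checklist_key)
                               for p in predicate["predicates"]]}
    if "predicate" in predicate:  # assumed shape of a not node
        return {**predicate,
                "predicate": apply_checklist_context(predicate["predicate"], checklist_key)}
    # Leaf: inherit the enclosing checklist unless one is already set.
    # Non-taxonomic leaves would carry a key that is simply not relevant to them.
    if checklist_key is not None and "checklistKey" not in predicate:
        return {**predicate, "checklistKey": checklist_key}
    return dict(predicate)
```

Under this reading the context node and a per-predicate checklist field express the same thing, and the nearest enclosing context decides the scope of each leaf.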

djtfmartin commented Sep 27, 2024

I've updated the main "Draft proposal" a bit to include predicates. I think where I've landed thus far is:

  • The REST API only supports searching with a single checklist. I think overloading single request parameters with multiple IDs might not be a good thing, and if folks want to search across multiple taxonomies (arguably an edge case), then they can use the Predicate API.
  • The Predicate API supports searching with multiple checklists. The predicate API is already more expressive than the REST API, so I think this disparity between the REST and predicate APIs is something we can live with.
  • The EqualsPredicate and InPredicate have been changed to include an optional checklistKey, in a similar manner to the matchCase field, which only makes sense for certain OccurrenceSearchParameters and is ignored if it's not relevant for the type.

@fmendezh

Another option to consider is to use the checklist explicitly as part of the services that support multiple taxonomies, for example:

https://api.gbif-dev2.org/v1/occurrence/search/checklist/{checklistKey}?....

https://api.gbif-dev2.org/v1/occurrence/search/checklist/7ddf754f-d193-4cc9-b351-99906754a03b?....

@MattBlissett

https://api.gbif-dev2.org/v1/occurrence/search/checklist=7ddf754f-d193-4cc9-b351-99906754a03b?....

Another option.
