Overhaul taxon subsets #3363

Open: wants to merge 4 commits into master

@gouttegd (Collaborator) commented Sep 18, 2024

This PR updates the way we are building “taxon subsets”.

As explained in #3362, we currently have, for reasons unknown (to me at least), two slightly different methods to create taxon subsets: one producing the -view subsets (human-view, mouse-view, xenopus-view) and one producing the -basic subsets (amniote-basic, euarchontoglires-basic). Both methods rely on OWLTools.

This PR replaces both methods with a single one, based on a new command in Uberon’s custom ROBOT plugin, so that all taxon subsets are produced in the same way.

The PR does not change which taxon subsets are produced and released by default (the five subsets mentioned above: human, mouse, xenopus, amniote, and euarchontoglires).

More subsets can be produced on demand. All that is needed is to define a TAXON_ID_subsetname Make variable pointing to the desired NCBITaxon ID.

For example, to create a subset for, say, insects, one can do:

sh run.sh make subsets/insect-view.owl TAXON_ID_insect=NCBITaxon:50557
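
The naming pattern behind that command (the subset name appears both in the target and in the TAXON_ID_ variable) can be spelled out in a small dry-run helper. This is only a sketch: the `subset_cmd` function is hypothetical and not part of the repository, and it prints the command instead of running it.

```shell
# Hypothetical helper (not part of the Uberon repository): print the make
# invocation for one on-demand taxon subset, without running it.
#   $1 = subset name, used in both the target and the variable name
#   $2 = NCBITaxon ID the subset should be built for
subset_cmd() {
    printf 'sh run.sh make subsets/%s-view.owl TAXON_ID_%s=%s\n' "$1" "$1" "$2"
}

# Reproduces the insect example from the PR description.
subset_cmd insect NCBITaxon:50557
```

Piping the printed line to `sh` from src/ontology would run the build for real.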

The PR also adds a possibility to create, not a subset directly, but a small component containing only oboInOwl:inSubset annotations to “tag” classes that belong to a taxon subset. For example:

sh run.sh make subsets/human-tags.ofn

would create a human-tags.ofn component containing, for all Uberon classes that belong to the human subset, oboInOwl:inSubset <http://purl.obolibrary.org/obo/uberon/core#human_subset> annotation assertion axioms. Such a component can then be merged with the main ontology for downstream use (e.g., extracting all the classes of the subset).
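
As a rough illustration of the downstream use case, the sketch below lists the classes tagged with a given subset IRI in a -tags file. It makes several assumptions that are not confirmed by the PR: the file is written with one AnnotationAssertion per line, the assertions carry no axiom annotations, and the subject appears as a single token (prefixed name or full IRI). The `tagged_classes` function name is made up for this example.

```shell
# Hypothetical helper: list the subjects of inSubset annotations that point
# at a given subset IRI.
#   $1 = OWL Functional Syntax file (e.g. a *-tags.ofn component)
#   $2 = subset IRI, without angle brackets
# Assumes one "AnnotationAssertion(property subject value)" per line.
tagged_classes() {
    grep -F "<$2>" "$1" \
        | sed -nE 's/^AnnotationAssertion\([^ ]*inSubset>? ([^ ]+) .*$/\1/p' \
        | sort -u
}

# Usage (assuming the tags file has been built as described above):
#   tagged_classes subsets/human-tags.ofn \
#       'http://purl.obolibrary.org/obo/uberon/core#human_subset'
```

Once the component has been merged into the main ontology (for example with ROBOT's `merge` command), the same annotations would be available to any tool that reads oboInOwl:inSubset.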

closes #3362

Use latest version (0.3.1) of the Uberon-specific ROBOT plugin, which
provides a new command to facilitate the creation of taxon subsets.

The custom Makefile contains two sets of rules to create taxon subsets
in two different ways:

* one set using OWLTools' `--make-species-subset` command (resulting in
  the `*-basic.owl` subsets);
* one set using the files in `src/ontology/contexts` to do basically the
  same thing as `--make-species-subset`, merely in a slightly different
  way (resulting in the `*-view.owl` subsets).

This commit replaces both sets of rules with a single rule that relies on
the newly available `create-species-subset` command in the Uberon ROBOT
plugin.

In addition, a new rule is added to allow the creation of a component
file that contains `oboInOwl:inSubset` annotations to tag all the
classes that belong to a given subset. That rule is currently unused,
but the expectation is that it could be used by downstream applications
to facilitate the use of taxon-specific subsets.

There is no reason to have two different naming conventions for the
taxon-specific subsets (-view and -basic). Let's settle for -view.

This may require updating the PURL configuration for Uberon, if there
are people out there that are using the euarchontoglires-basic.owl
and/or amniote-basic.owl artifacts (though GitHub download stats suggest
nobody ever downloaded them).
@matentzn (Contributor) left a comment

I have reviewed the code changes and they look great. I would like to offer a single word of caution: renaming files, even subset files, may break existing pipelines somewhere on the deep web unless we also add a PURL redirect to the OBO PURL config.

src/ontology/uberon.Makefile
--reasoner ELK \
$(foreach root,$(TAXON_SUBSET_ROOTS),--root $(root)) \
reason --reasoner ELK --equivalent-classes-allowed all \
--exclude-tautologies structural \
relax \
remove --axioms equivalent \
relax \

I can't imagine what this second relax does, but well, since it's there...

@gouttegd (Collaborator, Author) commented Oct 3, 2024

> renaming files, even subset files, may break existing pipelines somewhere on the deep web unless we also add a purl redirect to the OBO purl config.

I know. But this is precisely what should be handled at the PURL level, which exists exactly for this purpose. There is no reason for us to refrain from renaming files if it brings better consistency (in this case, by having all taxon subsets consistently named something-view.owl instead of having some subsets named something-basic).

@dosumis (Contributor) commented Oct 3, 2024

Hi @gouttegd - great to see this.

I'm especially excited to see this:

> The PR also adds a possibility to create, not a subset directly, but a small component containing only oboInOwl:inSubset annotations to “tag” classes that belong to a taxon subset. For example:
>
> sh run.sh make subsets/human-tags.ofn
>
> would create a human-tags.ofn component containing, for all Uberon classes that belong to the human subset, oboInOwl:inSubset <http://purl.obolibrary.org/obo/uberon/core#human_subset> annotation assertion axioms. Such a component can then be merged with the main ontology for downstream use (e.g., extracting all the classes of the subset).

I would really like the tags to be incorporated into the release files. We can use them straight away in our autosuggest pipelines to boost species-relevant terms.

@gouttegd (Collaborator, Author) commented Oct 3, 2024

> I would really like the tags to be incorporated into the release files.

I wasn’t sure whether this was your preferred option, so for now the idea was merely to produce the -tags file and let downstream users of Uberon merge it with uberon.owl if they wanted.

But it certainly can be done directly upstream if preferred.

Do we want all release artefacts to include those tags (e.g. uberon.owl, uberon-basic.owl, but also all the organ-specific subsets such as nephron-minimal, sensory-minimal, etc.)? Or just one supplementary artefact containing the taxon subset tags (e.g. something like uberon-with-subset-tags.owl)?

Based on your comment I assume the former (tags included in all release artefacts), in which case we’ll need a new intermediate file (upstream of uberon.owl) from which the subset tags can be generated before we produce the final uberon.owl with the subset tags included.
