Citing GA4GH standards #39

Open
susanfairley opened this issue Apr 29, 2022 · 14 comments
Labels
CRITICAL Decision needs to be made soon GA4GH Organization

Comments

@susanfairley
Contributor

This issue has been raised recently in two Work Streams.

LSG have discussed the issue here: samtools/hts-specs#179

In addition, Discovery have had this question in relation to citation of the Data Connect standard in the time period before a publication can be released.

As additional background, GA4GH is redeveloping its website, which could theoretically play a role in some of the possible solutions to this.

It would be useful for TASC to investigate and determine an approach that can, ideally, be applied consistently across GA4GH.

@ianfore

ianfore commented Jun 27, 2022

Do we see the citing use case as different from the need to reference a standard from within data?

The identifiers discussion of recent years (identifiers.org, n2t.net, bioregistry.io) has recognized that these use cases have much in common and can share a common approach.

A specific relevance to GA4GH is how GA4GH standards would be referenced from the service registry and service-info.

Most specifically, the type in service-info references a "Type of a GA4GH service".
Most of the following service-info response is specific to the implementation. The type field cites the service being used.

{
    "id": "drs.starterkit.federatedgenomics.org",
    "name": "Federated Genomics DRS service",
    "description": "Data Repository Service (DRS) instance serving public genomics datasets. Deployment of the GA4GH Starter Kit.",
    "contactUrl": "mailto:[email protected]",
    "documentationUrl": "https://apidocs.federatedgenomics.org/drs",
    "createdAt": "2022-07-10T09:00:00Z",
    "updatedAt": "2022-07-10T09:00:00Z",
    "environment": "development",
    "version": "1.0.0",
    "type": {
        "group": "org.ga4gh",
        "artifact": "drs",
        "version": "1.1.0"
    },
    "organization": {
        "name": "Federated Genomics",
        "url": "https://this-is-not-a-site.federatedgenomics.org"
    }
}

Following identifier practices that use compact identifiers (CURIEs), the following approach may be useful:
ga4gh:drs/1.1.0

Referencing GA4GH standards this way seems an appropriate use of the ga4gh namespace (#16). It can likely co-exist with the VRS use of the namespace, in which VRS IDs indicate their type as part of the identifier.
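
A minimal sketch of how such a compact identifier could be derived from the `type` object of a service-info response. The `ga4gh:<artifact>/<version>` layout is only the form proposed in this comment, not an agreed GA4GH convention, and the function names `type_to_curie` and `curie_to_type` are hypothetical.

```python
import json


def type_to_curie(service_info: dict) -> str:
    """Build a compact identifier from the service-info `type` object.

    Assumption: the `ga4gh:<artifact>/<version>` layout proposed in this
    thread. It is not an agreed GA4GH convention.
    """
    service_type = service_info["type"]
    group = service_type["group"]
    if group != "org.ga4gh":
        raise ValueError(f"not a GA4GH service type: {group!r}")
    return f"ga4gh:{service_type['artifact']}/{service_type['version']}"


def curie_to_type(curie: str) -> dict:
    """Inverse mapping, back to a service-info style `type` object."""
    prefix, _, local_id = curie.partition(":")
    artifact, _, version = local_id.partition("/")
    if prefix != "ga4gh" or not artifact or not version:
        raise ValueError(f"unexpected identifier: {curie!r}")
    return {"group": "org.ga4gh", "artifact": artifact, "version": version}


# Trimmed copy of the service-info response shown above.
response = json.loads("""
{
    "id": "drs.starterkit.federatedgenomics.org",
    "type": {"group": "org.ga4gh", "artifact": "drs", "version": "1.1.0"}
}
""")

print(type_to_curie(response))  # ga4gh:drs/1.1.0
```

The round trip in `curie_to_type` illustrates that, under this layout, the identifier carries the same information as the `type` object, so the two could be used interchangeably in data and in service-info.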

@michaelmhoffman

I see the use cases as distinct—I see citation as being used in documents such as journal articles that have primarily human readers. DOIs (in the form of a URL starting with https://doi.org/) are the most advantageous identifier for this. Non-DOI URLs could work. Anything else is going to be used in all sorts of systems, where first-class support for arbitrary identifier schemes is never happening. Even citing standards from extremely well-known organizations such as ISO is awkward compared to a DOI.

@ajhpage
Collaborator

ajhpage commented Jun 29, 2022

We do not currently have a consistent method for citing standards in journal articles (but I'd definitely be in favor of having one!). I think the motivation for writing a paper has often been exactly that: to create a citable reference. There was previously discussion about using the DOI approach, and I have a sneaking suspicion that @mcourtot may know more about that and why it didn't turn into reality. I think the most common reference used has been the URL of the documentation for the standard in question, and that has been acceptable to editors. Here are two recent examples:
https://doi.org/10.1093/bioinformatics/btab524
https://doi.org/10.1093/bioinformatics/btac010

@mcourtot

We struggled with this quite a bit for DUO until the paper was published, as there was a related project which had a publication available, and this was cited by default, even though we had specific instructions for citation in the DUO repository using a PURL.
I like Zenodo for specifications: it supports versioning and provides a DOI. Maybe the GA4GH technical team would be willing to drive this, and then add CFF files to all GA4GH repos? At least we would have consistency in representing how the spec authors would like their work to be cited, and setting this as a shared expectation may drive adoption?
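
As an illustration of the CFF suggestion, here is a rough sketch that renders a minimal `CITATION.cff` for a specification release. It assumes Citation File Format 1.2.0 and emits only a handful of its keys. The helper name `render_citation_cff`, the title, the author, the date, and the Zenodo-style DOI are all placeholders, not real GA4GH records.

```python
import json


def render_citation_cff(title, version, date_released, doi, authors):
    """Render a minimal CITATION.cff document as text.

    Values are written with json.dumps, because a JSON string is also a
    valid double-quoted YAML scalar. `authors` is a list of
    (family_names, given_names) tuples.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this specification, please cite it as below."',
        f"title: {json.dumps(title)}",
        f"version: {json.dumps(version)}",
        f"doi: {json.dumps(doi)}",
        f"date-released: {json.dumps(date_released)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {json.dumps(family)}")
        lines.append(f"    given-names: {json.dumps(given)}")
    return "\n".join(lines) + "\n"


# All values below are placeholders for illustration only.
cff_text = render_citation_cff(
    title="Example GA4GH Specification",
    version="1.1.0",
    date_released="2022-01-01",
    doi="10.5281/zenodo.0000000",
    authors=[("Author", "Example")],
)
print(cff_text)
```

Generating the file from release metadata, rather than editing it by hand in each repository, is one way the same citation format could be kept consistent across repos.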

@jmarshall
Member

jmarshall commented Jun 30, 2022

Using Zenodo means depositing a copy of a specification with Zenodo, and the resulting DOI refers to the copy at a zenodo.org URL.

IMHO if GA4GH is a serious standards-setting organisation, it should be capable of using DOIs that point to GA4GH's canonical specification documents or landing pages. For example, I believe becoming a member of CrossRef would be a way to produce such DOIs. (This also of course largely presupposes that GA4GH is capable of maintaining a stable technical website containing specifications at stable URLs. This is not something GA4GH has focussed on to date, but to my mind it would also be part of being a serious standards-setting organisation.)

I previously attempted to summarise the options for DOIs that should be investigated in samtools/hts-specs#179 (comment). Also as noted in the samtools/hts-specs#179 discussion, there are some other options that should be investigated in addition to DOIs.

@ianfore

ianfore commented Nov 7, 2022

Given that what I posted here, samtools/hts-specs#179 (comment), came out of a TASC call, perhaps this thread would have been a better place for it. Cross-linking.

Discussion continues in that other thread, which is maybe not so bad, as that is where the actual need came from.

@andrewyatz
Contributor

My feeling here is we have a number of issues colliding with each other such as

  1. Creating DOIs which point to a long-term archive of a standard (the Zenodo method)
  2. Creating DOIs to point to active documentation/artefact of a live standard (the CrossRef method)
  3. A manuscript which is cited with a DOI (publication)

Where are the priorities here? Have I missed another use case?

@mshadbolt

mshadbolt commented Nov 10, 2022

Coming to the party late here and not an expert, but I think Zenodo would be a great option for a lot of reasons. Chief among them is that it is already set up and easy to use, and it could perhaps be a solution until GA4GH sets up something more permanent or decides to mint DOIs and provide stable long-term storage. It gives you the ability to cite properly, attribute authorship properly, and get a DOI for every version that is uploaded as well as a URL that always resolves to the latest version (see here for more info on that). Plus there are integrations (OpenAIRE/ORCID), APIs, etc. You can also set up 'communities' that group everything together, e.g. https://zenodo.org/communities/australianbiocommons/?page=1&size=20 . Getting metrics on views and downloads could also be a useful feature.

I don't think this would negate the need to also publish in journals, but I think having something citable until a standard is published in a journal, as well as something that is update-able with new versions over time (that may not need to be re-published) is important.

Making records at FAIRsharing could also be an option, e.g. the Beacon entry https://fairsharing.org/FAIRsharing.6fba91. I like this because you can link together GitHub, documentation, publication, etc. all in one place. I think you still need to store the standard somewhere stable outside of their platform though.

@michaelmhoffman

michaelmhoffman commented Nov 15, 2022

I just noticed that standards are a first-class record type at CrossRef. See their Standards markup guide.

@uniqueg

uniqueg commented Sep 25, 2023

EDIT: Just realizing that I'm basically parroting what @mshadbolt has already said above. 100% agree. But CrossRef looks good, too, as @michaelmhoffman suggested. For me either Zenodo or CrossRef will be fine (or any other equivalent solution), as long as we have any solution that works for most use cases.


My perspective here: It would of course be great if GA4GH hosted its products itself and minted DOIs for them. But if (or as long as) that is not an option, this shouldn't stop us from solving this issue somehow in the meantime by creating guidelines for:

  1. Citing GA4GH products (this issue)
  2. Creating releases for GA4GH products, including where to host them, where to host the documentation and how to make them citable (agreeing with @michaelmhoffman: DOIs would be my favorite); there is already an issue for that: Streamlined API release policies #46; ideally we could write a GitHub Action that products could easily include in their CI workflows

In fact, I believe that once 2. is available and adopted for all releases (past and future), then 1. becomes fairly trivial for the main use case of citing a specific release of a specific version. Citing a paper for a given product (if available) is complementary, in my opinion, and instructions for citing such a paper vs the specs (or both) can be included in the standard, docs or an accompanying file somewhere.

I think this could be done fairly easily via Zenodo, e.g., see the RO-Crate 1.1 spec. It also allows setting one DOI for each release of a product, and one DOI for the product as a whole (which always points to the latest release).

As for citing unreleased discussions/proposals/merges: These could be referred to via GitHub permalinks, but the guidelines should probably hint at the risks and discourage such citations in favor of DOIs of stable release snapshots wherever possible.

@andrewyatz
Contributor

To give an update here: Angela has worked quite hard with CrossRef and we certainly have a way forward there. CrossRef supporting standards as a first-class entity was a major reason for adopting them. The GA4GH technical team is looking at how to mint these identifiers and provide additional tooling to help GA4GH create DOIs. What is still unclear are the issues around:

  • What are the items to be DOI'd?
  • What DOI structure do we want to have?
  • Should DOIs be predictable or obfuscated? Note that the recommendation from CrossRef is to obfuscate.

It's in this light I'd like to frame TASC's discussions. Certainly any resource wishing to mint DOIs via another method is welcomed to do so and GA4GH does not want to get in the way of this.

@ajhpage
Collaborator

ajhpage commented Sep 27, 2023 via email

@andrewyatz
Contributor

Absolutely, the requirements for DOIs are different depending on the part of the organisation you refer to. So DOIs for the individual pages will make sense, but so would DOIs for individual documents about standards, and I think the latter is what TASC might be thinking about more than the top-level pages.

To quote CrossRef though about their reasons:

Suffixes are best when they include short strings that are easily displayed and typed but are ‘dumb’ - meaning, the suffixes contains no readable information, including metadata.
Keep suffixes short. This makes them easier to read and to re-type. Remember, DOIs will appear online and in print.
Remember, DOIs are persistent and not subject to correction or deletion.

As for the multiple minting, it would potentially be confusing but not the end of the world. I was more suggesting it as a stop-gap until we get this CrossRef work off the ground :)
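
To make the CrossRef guidance quoted above concrete, here is a minimal sketch of minting short, opaque ('dumb') suffixes. It is not CrossRef's or GA4GH's tooling. The alphabet, the suffix length, the helper names and the DOI prefix `10.5555` are all assumptions chosen for illustration; `10.5555` is a placeholder, not a prefix assigned to GA4GH.

```python
import secrets

# Assumed alphabet: lower-case letters and digits, minus characters that are
# easily confused when re-typed from print (0/o, 1/l/i).
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"

# Placeholder prefix for illustration only.
PLACEHOLDER_PREFIX = "10.5555"


def mint_suffix(existing: set, length: int = 8) -> str:
    """Return a random suffix that carries no readable information.

    `existing` holds every suffix minted so far. Because DOIs are persistent
    and cannot be corrected or deleted, collisions are checked before a
    suffix is handed out, and the new suffix is recorded in `existing`.
    """
    while True:
        suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if suffix not in existing:
            existing.add(suffix)
            return suffix


def make_doi(prefix: str, suffix: str) -> str:
    """Join a registrant prefix and a suffix into a DOI string."""
    return f"{prefix}/{suffix}"


minted = set()
doi = make_doi(PLACEHOLDER_PREFIX, mint_suffix(minted))
print(doi)                        # random, so the suffix differs on every run
print(f"https://doi.org/{doi}")   # the URL form used when citing
```

A predictable scheme would instead encode the product name and version in the suffix, which is easier to guess but bakes metadata into an identifier that can never be changed.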

@jkbonfield

A DOI to a standard page allows citing of the overall standard, but not citing a specific version in use, which can sometimes be vital for reproducibility. I think both have merit.

DOIs can have metadata attached, the most obvious being a URL, but this also permits authors. Having versions of specs with DOIs means the authors that arrive later on can still get credit for their input to that specific version of a specification, which is why I feel DOIs to spec versions are important.
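
The two kinds of DOI discussed here (one for the standard as a whole, one per version with its own author list) can be sketched as a small data structure. This is only an illustration of the idea, not a CrossRef or Zenodo schema. The class names, DOIs, URLs and author names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpecVersion:
    version: str   # e.g. "1.2.0"
    doi: str       # DOI of this specific version
    url: str       # where this version of the document lives
    authors: list  # people credited for this version


@dataclass
class SpecRecord:
    name: str
    overall_doi: str  # DOI for the standard as a whole
    versions: dict = field(default_factory=dict)

    def add(self, spec_version: SpecVersion) -> None:
        self.versions[spec_version.version] = spec_version

    def latest(self) -> SpecVersion:
        # Compare versions numerically, so that 1.10.0 sorts after 1.2.0.
        return max(
            self.versions.values(),
            key=lambda v: tuple(int(part) for part in v.version.split(".")),
        )

    def resolve(self, doi: str) -> SpecVersion:
        """The overall DOI resolves to the latest version; a version DOI
        resolves to exactly that version."""
        if doi == self.overall_doi:
            return self.latest()
        for spec_version in self.versions.values():
            if spec_version.doi == doi:
                return spec_version
        raise KeyError(doi)


# Hypothetical identifiers and names, for illustration only.
record = SpecRecord(name="Example Spec", overall_doi="10.5555/example")
record.add(SpecVersion("1.2.0", "10.5555/example.v1-2-0",
                       "https://example.org/spec/1.2.0", ["Author A"]))
record.add(SpecVersion("1.10.0", "10.5555/example.v1-10-0",
                       "https://example.org/spec/1.10.0", ["Author A", "Author B"]))

print(record.resolve("10.5555/example").version)                 # 1.10.0
print(record.resolve("10.5555/example.v1-2-0").authors)          # ['Author A']
```

Here "Author B" is credited only on the version they contributed to, which is the attribution benefit of per-version DOIs, while the overall DOI still gives a stable way to cite the standard itself.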

@mamanambiya mamanambiya added GA4GH Organization CRITICAL Decision needs to be made soon labels May 22, 2024