locations constraints on DRS Pointer #400
Comments
The CRDC-driven work in fasp-scripts had this use case in mind. The basic model was to use DRS to find out where the provider (CRDC, BDC, Anvil, etc.) had made the data available and "go with the flow" of running compute there rather than downloading. In essence, the provider supplies the guidance by having the DRS service tell the consumer where the data is available. Some providers didn't enforce the expectation that the consumer would compute in place; they simply expected the consumer to "go with the flow". If we need the addition proposed here, it would likely be better as an attribute of an access method, describing the constraints on usage in that particular location. |
I think that this is a valid concern for data that is being indexed by a DRS server however I am not sure that the |
Trying to mock this up as part of the access method: could this be informational in this way?
Some things I included here:
|
In this corrected version:
|
In the Cloud WS meeting on August 12, 2024, we decided to simplify the feature described in Issue #400 for DRS release 1.5. PR #407, intended for DRS release 1.5, simply adds a string "cloud" to the access response. We now include only cloud, region, and type information; there is no support for cloud or geolocation constraints, for example. The fields we will include are:
After DRS 1.5 we can revisit how we express region, cloud, geolocation, and similar constraints in DRS, which is a much bigger issue. |
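As a rough illustration of the "go with the flow" model with the simplified 1.5 fields, a client could pick the access method that matches where it is computing. This is a minimal sketch, not part of the specification: the function name pick_access_method, the sample values, and the exact dictionary shape are assumptions for illustration; only the cloud, region, and type field names come from the discussion above.

```python
from typing import Dict, List, Optional


def pick_access_method(
    access_methods: List[Dict[str, str]],
    cloud: str,
    region: Optional[str] = None,
) -> Optional[Dict[str, str]]:
    """Return the access method closest to where the consumer computes.

    Preference order: same cloud and same region, then same cloud only.
    Returns None when the data is not offered in the consumer's cloud.
    """
    same_cloud = [m for m in access_methods if m.get("cloud") == cloud]
    if region is not None:
        for method in same_cloud:
            if method.get("region") == region:
                return method
    return same_cloud[0] if same_cloud else None


# Hypothetical access methods as a DRS server might advertise them.
methods = [
    {"type": "s3", "cloud": "aws", "region": "us-east-1", "access_id": "a1"},
    {"type": "gs", "cloud": "gcp", "region": "us-central1", "access_id": "g1"},
]

chosen = pick_access_method(methods, cloud="gcp", region="us-central1")
```

Nothing here is enforced by the server; the client simply follows the locations the provider chose to advertise.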
In the CRDC driver project and also in BioDataCatalyst we have a situation where the host of the data would like to provide guidance on how to use the data and where to use it.
In other words, they would like any platform downstream of the DRS server to compute on the data in certain cloud locations, which are usually the same locations where the data is stored. The reasons for this request vary, ranging from keeping egress costs down to keeping the data from leaving its security level.
Given that in the end DRS hands out a download URL, it would be quite difficult to enforce this. I therefore suggest we move towards an approach where the host "suggests" the preferred way to access the data, and DRS clients accessing the data honor the request to the best of their ability.
Proposal
The proposal aims to enhance the GA4GH DRS (Data Repository Service) specification by introducing a new field that provides metadata regarding the intended usage and location constraints for data objects. This additional field will allow data providers to specify their preferences and requirements for how the data should be accessed and utilized. The proposed field will offer the following options:
Cloud Exclusive (cloud_exclusive): the data object is intended for use exclusively within a cloud environment. Users are expected to access and process the data only within a cloud computing infrastructure and not outside of it; for example, the data should not be downloaded to someone's laptop.
Cloud Provider-Limited (cloud_provider_limited): the data object should not leave the cloud provider's ecosystem. Users are restricted from moving the data to external locations or platforms. It must remain within the boundaries of the specified cloud provider.
Cloud Region-Limited (cloud_region_limited): the data object is restricted to a specific cloud region. Users are required to access and process the data within the designated region and are prohibited from transferring it to other geographic locations within the cloud provider's infrastructure.
By introducing this new field, data providers and administrators can communicate their data access and usage policies more effectively, ensuring that data is handled in accordance with their specific requirements. This addition not only enhances the flexibility of the DRS specification but also strengthens data governance and compliance for genomic and health-related data in cloud-based environments.
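To make the three options concrete, here is a minimal sketch of how a DRS client might check whether its compute location honors a given policy. The enum values mirror the identifiers proposed above; the class name UsagePolicy, the function honors_policy, and its parameters are hypothetical, and the check is advisory only, in keeping with the "best of their ability" intent.

```python
from enum import Enum
from typing import Optional


class UsagePolicy(Enum):
    CLOUD_EXCLUSIVE = "cloud_exclusive"
    CLOUD_PROVIDER_LIMITED = "cloud_provider_limited"
    CLOUD_REGION_LIMITED = "cloud_region_limited"


def honors_policy(
    policy: UsagePolicy,
    data_provider: str,
    data_region: str,
    compute_provider: Optional[str],
    compute_region: Optional[str] = None,
) -> bool:
    """Return True if computing at the given location respects the policy.

    compute_provider is None when the client runs outside any cloud
    (for example, on a laptop).
    """
    if compute_provider is None:
        return False  # every proposed option requires a cloud environment
    if policy is UsagePolicy.CLOUD_EXCLUSIVE:
        return True  # any cloud is acceptable
    if compute_provider != data_provider:
        return False  # the two remaining options keep data with the provider
    if policy is UsagePolicy.CLOUD_PROVIDER_LIMITED:
        return True
    # CLOUD_REGION_LIMITED: same provider and same region
    return compute_region == data_region
```

For example, honors_policy(UsagePolicy.CLOUD_REGION_LIMITED, "aws", "us-east-1", "aws", "us-west-2") evaluates to False, because the compute region differs from the data's region.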
It could look like this:
In this structure:
cloud_provider to specify the preferred cloud provider, and
cloud_region to designate the desired cloud region.
This structured metadata allows data providers to clearly communicate their data access and usage policies, ensuring that users are aware of the intended constraints. It also enables data consumers to make informed decisions about how to handle and access the data. The specific values for access_type can be defined in the DRS specification, and they should correspond to the proposed usage policy options. This structure helps promote consistency and interoperability across different implementations of the DRS specification.