Support S3 additional checksums in high-level S3 commands #6750

Closed
m-radzikowski opened this issue Feb 26, 2022 · 18 comments
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue s3

Comments

@m-radzikowski

Is your feature request related to a problem? Please describe.

The newly released additional S3 checksums feature enhances SDK operations by calculating a selected checksum value on file upload, including multipart uploads. However, this feature is not available in the high-level S3 commands.

Describe the solution you'd like

A --checksum-algorithm parameter in the aws s3 commands, especially in aws s3 cp.

Describe alternatives you've considered

Using low-level commands.

@m-radzikowski m-radzikowski added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Feb 26, 2022
@tim-finnigan
Contributor

tim-finnigan commented Mar 1, 2022

Hi @m-radzikowski thanks for the feature request. There has already been some discussion on the team about how these checksums could enhance commands like aws s3 cp and aws s3 sync. But it will take more time and discussion to think through the implementation. In the meantime we can leave this issue open to track the request.

@tim-finnigan tim-finnigan added s3 and removed needs-triage This issue or PR still needs to be triaged. labels Mar 1, 2022
@kdaily kdaily mentioned this issue Jun 3, 2022
2 tasks
@rajivnarayan

This would be a useful addition to the high-level commands. It would have been nice if MD5 digests were included as an option. For reference, here is a workaround using s3api:

# aws-cli version 2.7.16
# https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body <file_name> --checksum-algorithm crc32 --bucket <bucket_name> --key <key_name>

# retrieve the checksum
# response fields: ChecksumCRC32 ChecksumCRC32C ChecksumSHA1 ChecksumSHA256
aws s3api head-object --bucket <bucket_name> --key <key_name> --checksum-mode ENABLED --query ChecksumCRC32 --output text

@jonathansampson

+1 to support for checksums when syncing.

@tim-finnigan tim-finnigan added the p2 This is a standard priority issue label Nov 14, 2022
@saksham

saksham commented Jan 24, 2023

+1 on this feature.

@genvidkyle

+1

@jbutz

jbutz commented Mar 16, 2023

I've got a client migrating a small but critical dataset to S3, and they have strict requirements for data integrity validation. With checksum support missing from the S3 sync higher-level command, we expect an increased effort to meet the client's requirements. This is a significant gap as far as missing functionality goes.

@ashepherd

+1

@MaksymSimchuk-prxt

+1

@khilnani

+1

@animeshsg

+1

@sarthakjain271095

@rajivnarayan I stumbled upon this. The SHA-256 checksum value returned by AWS doesn't seem to be right. Additionally, according to the CLI docs, the --checksum-algorithm parameter is only supported when using an SDK. Have you run into issues with the SHA-256 value not being calculated correctly?
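
One possible explanation for a mismatch like this, offered as an assumption since the thread does not confirm the cause: S3 reports its additional checksums as base64-encoded digests, whereas tools such as sha256sum print hexadecimal. For objects uploaded in multiple parts, the reported value is also a checksum of the part checksums rather than a digest of the whole object. The sketch below shows how a local value could be brought into the base64 form for comparison with ChecksumSHA256 on a single-part upload. The function names are hypothetical and exist only for illustration.

```python
import base64
import hashlib


def s3_style_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 of a local file, base64-encoded.

    Hypothetical helper: the result is meant to be compared with the
    ChecksumSHA256 value of an object uploaded in a single part.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def hex_to_s3_style(hex_digest):
    """Convert a hex digest (for example sha256sum output) to base64."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
```

If the base64 value still differs from what head-object returns, the object may have been uploaded in multiple parts, in which case a whole-file digest is not expected to match.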

@rajivnarayan

rajivnarayan commented Jul 3, 2023 via email

@dpeger

dpeger commented Aug 9, 2023

> But it will take more time and discussion to think through the implementation.

@tim-finnigan could you perhaps elaborate on what the key problems are with adding checksum support to the s3 commands? Since it is supported by the low-level s3api commands, I'd expect support in the high-level commands to be straightforward. Other libraries such as boto3 support S3-based checksum computation in their high-level API functions (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS).

@Park-minkyu

I believe that most use cases rely on the high-level s3 commands, s3 cp or s3 sync.
Can we have more information about what needs to be thought through for the implementation?

@YoongLoong

I support making this change to the high-level aws s3 sync command. However, this feature should be temporarily disabled for as long as it remains unfixed. We have no idea when this "new" feature will exist (the thread has been open for more than a year), but the "sync" command misleads users into believing they have "synchronized" their files when that is not always the case. It may cause financial loss to a company if the "wrong" object has been synchronized. I am forced to use a workaround for this aws s3 sync issue to ensure that files with a different md5sum but the same file size get uploaded (they are skipped by aws s3 sync at the moment).
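
A content-based check along the lines of this workaround could look roughly like the sketch below. It is not the workaround actually used in this comment, which is not shown in the thread. It assumes the remote checksums have already been collected into a dictionary (for example via s3api head-object with checksum mode enabled), that they are base64-encoded SHA-256 values, and that the objects were uploaded in a single part. All names, including `files_needing_upload` and `remote_checksums`, are hypothetical.

```python
import base64
import hashlib
from pathlib import Path


def local_sha256_b64(path):
    """Base64-encoded SHA-256 of a local file (read fully into memory)."""
    return base64.b64encode(hashlib.sha256(Path(path).read_bytes()).digest()).decode("ascii")


def files_needing_upload(local_dir, remote_checksums):
    """List keys whose local content differs from the remote checksum.

    remote_checksums is a hypothetical mapping of object key to the
    base64 SHA-256 reported for that object; missing keys count as stale.
    """
    root = Path(local_dir)
    stale = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        # Compare content, not size or timestamp.
        if remote_checksums.get(key) != local_sha256_b64(path):
            stale.append(key)
    return stale
```

Unlike a size-and-timestamp comparison, this flags a file whose content changed while its size stayed the same, at the cost of reading every local file.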

@YoongLoong

May I have an update on the aws sync bug? It is causing a lot of inconvenience when syncing file(s) from AWS S3.

@aemous
Contributor

aemous commented Nov 13, 2024

This feature has been released in version 2.18.0. Closing issue.

@aemous aemous closed this as completed Nov 13, 2024