Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The AmazonS3Uri class does not seem to support some characters #3585

Open
1 task
mhroberts7734 opened this issue Dec 19, 2024 · 3 comments
Open
1 task

The AmazonS3Uri class does not seem to support some characters #3585

mhroberts7734 opened this issue Dec 19, 2024 · 3 comments
Assignees
Labels
bug This issue is a bug. needs-review p2 This is a standard priority issue s3

Comments

@mhroberts7734
Copy link

Describe the bug

When using the AmazonS3Uri class to parse pre-signed urls to get the key and bucket name, The class gets some characters incorrect.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

key = "180天OTB+(锦江)1216+-+sample+(1).xlsx"

Current Behavior

key = "180天OTB+(é\u0094¦æ±\u009fï¼\u00891216+-+sample+(1).xlsx"

Reproduction Steps

var filename = "https://my-bucket.s3.us-east-1.amazonaws.com/180天OTB+(锦江)1216+-+sample+(1).xlsx";
var url = new Uri(filename);
var s3Url = new AmazonS3Uri(url);
var key = s3Url.Key;
var s3Key = Uri.UnescapeDataString(url.AbsolutePath);

Possible Solution

No response

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

AWSSDK.S3 3.7.410.8

Targeted .NET Platform

.NET 8.0

Operating System and version

Windows 11

@mhroberts7734 mhroberts7734 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 19, 2024
@bhoradc bhoradc added needs-reproduction This issue needs reproduction. s3 p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Dec 19, 2024
@bhoradc bhoradc self-assigned this Dec 19, 2024
@bhoradc
Copy link

bhoradc commented Dec 24, 2024

Hello @mhroberts7734,

Thank you for reporting the issue.

 var filename = "https://my-bucket.s3.us-east-1.amazonaws.com/180天OTB+(锦江)1216+-+sample+(1).xlsx";
 var url = new Uri(filename);
 var s3Url = new AmazonS3Uri(url);
 var key = s3Url.Key;
 Console.WriteLine(key);
 var s3Key = Uri.UnescapeDataString(url.AbsolutePath);

Above steps, reproduces key as 180天OTB+(é�¦æ±�ï¼�1216+-+sample+(1).xlsx instead of 180天OTB+(é\u0094¦æ±\u009fï¼\u00891216+-+sample+(1).xlsx.

I am running this on a windows console application with System.Text.UTF8Encoding. Kindly let me know your environment and console output encoding settings.

Regardless, the expected behavior should be that the AmazonS3Uri class correctly decodes and displays the S3 key, including any non-ASCII characters, without turning to Unicode escape sequences. However, the fact that the key is being displayed with Unicode escape sequences (\u0094, \u009f, \u0089) for certain characters, suggests that the class may not be properly handling or decoding those characters.

There seems to be a similar issue reported in past - #1902. Meanwhile, as you provide the environment and encoding details, I will review this with the team.

Regards,
Chaitanya

@bhoradc bhoradc added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. needs-review and removed needs-reproduction This issue needs reproduction. labels Dec 24, 2024
@mhroberts7734
Copy link
Author

Hello, I am not using the console to view the output of the two, I was using the debugger and just hovering over the variables. That makes sense that the console would replace the unicode sequences with those glyphs though. This example just isolates an issue that we are having in our production system, where we are trying to parse an s3 presigned url to build an s3 request. We're getting the key and bucket name from this url, and it results in the s3 client not finding the object in the bucket, even though it is there; the key is just corrupted.

@bhoradc
Copy link

bhoradc commented Dec 24, 2024

Hello @mhroberts7734,

Yes, I acknowledge that whether the corrupted S3 key appears as Unicode escape sequences in the debugger or as garbled characters in the console, they are both manifestations of the same root cause: the AmazonS3Uri class's character encoding issue.

I have verified that this specific scenario has already been documented in our issue tracking system. However, at this time, I cannot provide a specific timeline for when this issue will get prioritized.

Regards,
Chaitanya

@bhoradc bhoradc removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. needs-review p2 This is a standard priority issue s3
Projects
None yet
Development

No branches or pull requests

2 participants