Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize UrlEncode in Utils #3307

Merged
merged 25 commits into from
Jul 10, 2024
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
5ae6c49
Add unit tests
danielmarbach Apr 28, 2024
a7d48d4
Remove netstandard from unit tests since netstandard is not a target …
danielmarbach Apr 28, 2024
5ea6d15
Start Url Encode implementation
danielmarbach Apr 24, 2024
200bfcd
Better version
danielmarbach Apr 28, 2024
ea8eaa4
Allow unsafe
danielmarbach Apr 28, 2024
3ed9139
Encoding extensions and skip locals init
danielmarbach Apr 28, 2024
49a75bb
Skips locals init and cleanup
danielmarbach Apr 28, 2024
53b7d54
Remove comment because I will make this a reviewer comment
danielmarbach Apr 28, 2024
b0d00e4
Removed no longer needed dictionary
danielmarbach Apr 28, 2024
85b32e7
Comment
danielmarbach Apr 28, 2024
0f0c220
Curlies
danielmarbach Apr 28, 2024
3bd4a8e
Move to utils
danielmarbach May 30, 2024
f29b1d4
Move to missing types
danielmarbach May 30, 2024
3ca6552
Better comment
danielmarbach May 30, 2024
593c79f
Multi byte test
danielmarbach May 30, 2024
653a25f
Remove nullable for now
danielmarbach May 30, 2024
372f1b2
Fix extensions
danielmarbach May 30, 2024
0003d63
Fix ifdef
danielmarbach May 30, 2024
a3c4646
Unsafe for Netframework
danielmarbach Jul 3, 2024
721ac7d
Move extensions to BCL and netstandard folder
danielmarbach Jul 3, 2024
0b34f45
Bring back netstandard
danielmarbach Jul 3, 2024
c4d330f
Failing test for template
danielmarbach Jul 6, 2024
618ef3e
Well that was an easy fix
danielmarbach Jul 6, 2024
f5fe55e
Span indexof
danielmarbach Jul 6, 2024
4df6816
Remove extra test case for now
danielmarbach Jul 9, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions sdk/src/Core/AWSSDK.Core.NetFramework.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@
<NoWarn>$(NoWarn);CS1591</NoWarn>

<SignAssembly>True</SignAssembly>

<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>

<PropertyGroup Condition=" '$(RuleSetFileForBuild)' == 'false' Or '$(RuleSetFileForBuild)' == '' ">
Expand Down
2 changes: 2 additions & 0 deletions sdk/src/Core/AWSSDK.Core.NetStandard.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@
<NoWarn>$(NoWarn);CS1591;CA1822</NoWarn>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<SignAssembly>True</SignAssembly>

<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
</PropertyGroup>
<PropertyGroup Condition="'$(TargetFramework)' == 'net8.0'">
<WarningsAsErrors>IL2026,IL2075</WarningsAsErrors>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#if !NET8_0_OR_GREATER
namespace System.Runtime.CompilerServices
{
/// <summary>Indicates to the compiler that the .locals init flag should not be set in nested method headers when emitting to metadata.</summary>
[AttributeUsage(AttributeTargets.Module | AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Constructor | AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Event | AttributeTargets.Interface, Inherited = false)]
internal sealed class SkipLocalsInitAttribute : Attribute
{
}
}
#endif
92 changes: 64 additions & 28 deletions sdk/src/Core/Amazon.Util/AWSSDKUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading;
using Amazon.Runtime.Endpoints;
using ThirdParty.RuntimeBackports;
Expand Down Expand Up @@ -79,12 +80,6 @@ public static partial class AWSSDKUtils
// Default value of progress update interval for streaming is 100KB.
public const long DefaultProgressUpdateInterval = 102400;

internal static Dictionary<int, string> RFCEncodingSchemes = new Dictionary<int, string>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did check the entire SDK and couldn't find another reference so I got rid of it. Is it possible to entirely get rid of the different RFC implementations? If that is possible we could potentially even get rid of the custom code and replace it with the System.Net implementation?

https://learn.microsoft.com/en-us/dotnet/api/system.net.webutility.urlencode?view=net-8.0

{
{ 3986, ValidUrlCharacters },
{ 1738, ValidUrlCharactersRFC1738 }
};

internal const string S3Accelerate = "s3-accelerate";
internal const string S3Control = "s3-control";

Expand Down Expand Up @@ -1028,35 +1023,76 @@ public static string UrlEncode(string data, bool path)
/// Currently recognised RFC versions are 1738 (Dec '94) and 3986 (Jan '05).
/// If the specified RFC is not recognised, 3986 is used by default.
/// </remarks>
[SkipLocalsInit]
public static string UrlEncode(int rfcNumber, string data, bool path)
{
StringBuilder encoded = new StringBuilder(data.Length * 2);
string validUrlCharacters;
if (!RFCEncodingSchemes.TryGetValue(rfcNumber, out validUrlCharacters))
validUrlCharacters = ValidUrlCharacters;

string unreservedChars = String.Concat(validUrlCharacters, (path ? ValidPathCharacters : ""));
foreach (char symbol in System.Text.Encoding.UTF8.GetBytes(data))
byte[] sharedDataBuffer = null;
const int MaxStackLimit = 256;
try
{
if (unreservedChars.IndexOf(symbol) != -1)
{
encoded.Append(symbol);
}
else
if (!TryGetRFCEncodingSchemes(rfcNumber, out var validUrlCharacters))
validUrlCharacters = ValidUrlCharacters;

var unreservedChars = string.Concat(validUrlCharacters, path ? ValidPathCharacters : string.Empty).AsSpan();

var dataAsSpan = data.AsSpan();
var encoding = Encoding.UTF8;

var dataByteLength = encoding.GetMaxByteCount(dataAsSpan.Length);
var encodedByteLength = 2 * dataByteLength;
danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
var dataBuffer = encodedByteLength <= MaxStackLimit
? stackalloc byte[MaxStackLimit]
: sharedDataBuffer = ArrayPool<byte>.Shared.Rent(encodedByteLength);
// Instead of stack allocating or renting two buffers we use one buffer with at least twice the capacity of the
// max encoding length. Then store the character data as bytes in the second half reserving the first half of the buffer
// for the encoded representation.
var encodingBuffer = dataBuffer.Slice(dataBuffer.Length - dataByteLength);
var bytesWritten = encoding.GetBytes(dataAsSpan, encodingBuffer);

var index = 0;
foreach (var symbol in encodingBuffer.Slice(0, bytesWritten))
{
encoded.Append('%');

// Break apart the byte into two four-bit components and
// then convert each into their hexadecimal equivalent.
byte b = (byte)symbol;
int hiNibble = b >> 4;
int loNibble = b & 0xF;
encoded.Append(ToUpperHex(hiNibble));
encoded.Append(ToUpperHex(loNibble));
if (unreservedChars.IndexOf((char)symbol) != -1)
{
dataBuffer[index++] = symbol;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we could potentially remove bound checks here by doing something like

Unsafe.Add(ref MemoryMarshal.GetReference(dataBuffer), index++) = symbol;

but I decided to not go down that path since it increases the complexity of the code

}
else
{
dataBuffer[index++] = (byte)'%';

// Break apart the byte into two four-bit components and
// then convert each into their hexadecimal equivalent.
var hiNibble = symbol >> 4;
var loNibble = symbol & 0xF;
dataBuffer[index++] = (byte)ToUpperHex(hiNibble);
dataBuffer[index++] = (byte)ToUpperHex(loNibble);
}
}

return encoding.GetString(dataBuffer.Slice(0, index));
}
finally
{
if (sharedDataBuffer != null) ArrayPool<byte>.Shared.Return(sharedDataBuffer);
}
}

internal static bool TryGetRFCEncodingSchemes(int rfcNumber, out string encodingScheme)
{
if (rfcNumber == 3986)
{
encodingScheme = ValidUrlCharacters;
return true;
}

return encoded.ToString();
if (rfcNumber == 1738)
{
encodingScheme = ValidUrlCharactersRFC1738;
return true;
}

encodingScheme = null;
return false;
}

private static void ToHexString(Span<byte> source, Span<char> destination, bool lowercase)
Expand Down
70 changes: 70 additions & 0 deletions sdk/src/Core/Amazon.Util/_bcl+netstandard/Extensions.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
using System;
using System.Globalization;
using System.Text;

namespace Amazon.Util
{
internal static class Extensions
{
internal static string ToUpper(this String str, CultureInfo culture)
{
if (culture != CultureInfo.InvariantCulture)
throw new ArgumentException("The extension method ToUpper only works for invariant culture");
return str.ToUpperInvariant();
}

#if NETSTANDARD || NETFRAMEWORK
/// <summary>
/// Encodes into a span of bytes a set of characters from the specified read-only span.
/// </summary>
/// <param name="encoding">The encoding to be used.</param>
/// <param name="src">The span containing the set of characters to encode.</param>
/// <param name="dest">The byte span to hold the encoded bytes.</param>
/// <returns>The count of encoded bytes.</returns>
/// <remarks>
/// The method was introduced as a compatibility shim for .NET Standard and can be replaced for target frameworks that provide those methods out of the box.
/// </remarks>
/// <seealso
/// href="https://docs.microsoft.com/dotnet/api/system.text.encoding.getbytes?view=netstandard-2.1#system-text-encoding-getbytes(system-readonlyspan((system-char))-system-span((system-byte)))" />
public static unsafe int GetBytes(this Encoding encoding,
ReadOnlySpan<char> src,
Span<byte> dest)
{
if (src.Length == 0) return 0;

if (dest.Length == 0) return 0;

fixed (char* charPointer = src)
{
fixed (byte* bytePointer = dest)
{
return encoding.GetBytes(
charPointer,
src.Length,
bytePointer,
dest.Length);
}
}
}

/// <summary>
/// When overridden in a derived class, decodes all the bytes in the specified byte span into a string.
/// </summary>
/// <param name="encoding">The encoding to be used.</param>
/// <param name="bytes">A read-only byte span to decode to a Unicode string.</param>
/// <returns>A string that contains the decoded bytes from the provided read-only span.</returns>
/// <remarks>
/// The method was introduced as a compatibility shim for .NET Standard and can be replaced for target frameworks that provide those methods out of the box.
/// </remarks>
public static unsafe string GetString(this Encoding encoding, ReadOnlySpan<byte> bytes)
{
if (bytes.Length == 0) return string.Empty;

fixed (byte* bytePointer = bytes)
{
return encoding.GetString(bytePointer, bytes.Length);
}
}
#endif
}
}
18 changes: 0 additions & 18 deletions sdk/src/Core/Amazon.Util/_netstandard/Extensions.cs

This file was deleted.

73 changes: 72 additions & 1 deletion sdk/test/NetStandard/UnitTests/Core/AWSSDKUtilsTests.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
using System.Collections.Generic;
using System.Collections.Generic;
using Amazon.Util;
using System.Text;
using Amazon.Runtime.Internal;
Expand Down Expand Up @@ -63,5 +63,76 @@ public void DetermineService(string url, string expectedService)

Assert.Equal(expectedService, service);
}

[Theory]
[InlineData("value, with special chars!", "value%2C%20with%20special%20chars%21")]
danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
[InlineData("value, with special chars and path {/+:}", "value%2C%20with%20special%20chars%20and%20path%20%7B%2F%2B%3A%7D")]
[InlineData(TemplateText, "%7B%0A%09%22AWSTemplateFormatVersion%22%20%3A%20%222010-09-09%22%2C%0A%0A%09%22Description%22%20%3A%20%22This%20is%20a%20sample%20template%22%2C%0A%0A%09%22Parameters%22%20%3A%20%7B%0A%09%09%22TopicName%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Type%22%20%3A%20%22String%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Default%22%20%3A%20%22TheTopic%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Description%22%20%3A%20%22A%20topic.%22%0A%09%09%7D%2C%0A%20%20%20%20%20%20%20%20%22QueueName%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Type%22%20%3A%20%22String%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Default%22%20%3A%20%22TheQueue%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Description%22%20%3A%20%22A%20queue.%22%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%2C%0A%0A%09%22Resources%22%20%3A%20%7B%0A%0A%09%09%22TheQueue%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Type%22%20%3A%20%22AWS%3A%3ASQS%3A%3AQueue%22%2C%0A%09%09%20%20%20%20%22Properties%22%20%3A%20%7B%0A%09%09%09%09%22QueueName%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22QueueName%22%20%7D%0A%09%09%20%20%20%20%7D%0A%09%09%7D%2C%0A%0A%20%20%20%20%20%20%20%20%22TheTopic%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Type%22%20%3A%20%22AWS%3A%3ASNS%3A%3ATopic%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Properties%22%20%3A%20%7B%0A%09%09%09%09%22TopicName%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TopicName%22%20%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Subscription%22%20%3A%20%5B%0A%09%09%09%09%09%7B%22Protocol%22%20%3A%20%22sqs%22%2C%20%22Endpoint%22%20%3A%20%7B%22Fn%3A%3AGetAtt%22%20%3A%20%5B%20%22TheQueue%22%2C%20%22Arn%22%5D%7D%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%2C%0A%0A%09%22Outputs%22%20%3A%20%7B%0A%09%09%22TopicARN%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Value%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TheTopic%22%20%7D%0A%09%09%7D%2C%0A%20%20%20%20%20%20%20%20%22QueueURL%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Value%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TheQueue%22%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%0A%7D%0A")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test case that is failing which I didn't remember was a new test case until reviewing the changes again. This likely passes on a Linux based machine but will fail on Windows because of how some of the methods automatically translate \n to \r\n in windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah the input needs to be normalized. I can only do that after my vacation. Feel free to push it if you have time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen on checkout on the build agent?

danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
public void UrlEncodeWithoutPath(string input, string expected)
{
var encoded = AWSSDKUtils.UrlEncode(input, path: false);

Assert.Equal(expected, encoded);
}

const string TemplateText = @"{
""AWSTemplateFormatVersion"" : ""2010-09-09"",

""Description"" : ""This is a sample template"",

""Parameters"" : {
""TopicName"" : {
""Type"" : ""String"",
""Default"" : ""TheTopic"",
""Description"" : ""A topic.""
},
""QueueName"" : {
""Type"" : ""String"",
""Default"" : ""TheQueue"",
""Description"" : ""A queue.""
}
},

""Resources"" : {

""TheQueue"" : {
""Type"" : ""AWS::SQS::Queue"",
""Properties"" : {
""QueueName"" : { ""Ref"" : ""QueueName"" }
}
},

""TheTopic"" : {
""Type"" : ""AWS::SNS::Topic"",
""Properties"" : {
""TopicName"" : { ""Ref"" : ""TopicName"" },
""Subscription"" : [
{""Protocol"" : ""sqs"", ""Endpoint"" : {""Fn::GetAtt"" : [ ""TheQueue"", ""Arn""]}}
]
}
}
},

""Outputs"" : {
""TopicARN"" : {
""Value"" : { ""Ref"" : ""TheTopic"" }
},
""QueueURL"" : {
""Value"" : { ""Ref"" : ""TheQueue"" }
}
}
}
";

[Theory]
[InlineData("\ud83d\ude02 value, with special chars!", "%F0%9F%98%82%20value%2C%20with%20special%20chars!")]
[InlineData("value, with special chars!", "value%2C%20with%20special%20chars!")]
[InlineData("value, with special chars and path {/+:}", "value%2C%20with%20special%20chars%20and%20path%20%7B/%2B:%7D")]
danielmarbach marked this conversation as resolved.
Show resolved Hide resolved
public void UrlEncodeWithPath(string input, string expected)
{
var encoded = AWSSDKUtils.UrlEncode(input, path: true);

Assert.Equal(expected, encoded);
}
}
}