Why is the colon in the name not translated? #105

Open
lwlwudi opened this issue Nov 7, 2022 · 8 comments

Comments

@lwlwudi

lwlwudi commented Nov 7, 2022

Why is the colon in the name not percent-encoded?

@lwlwudi
Author

lwlwudi commented Nov 7, 2022

like this pkg:generic/SuSE%20Linux%20Enterprise%20Server%2012%20SP5:[email protected]%2Bgit20140911.61c1681-38.13.1.x86_64

@lwlwudi
Author

lwlwudi commented Nov 8, 2022

It seems to be encoded in the Java package.

@lwlwudi
Author

lwlwudi commented Nov 8, 2022

Does the colon not need to be encoded anywhere?

@shiqi0715

It seems there is an unnecessary step in the encoding that goes against the spec. Is it a bug?

def quote(s):
    """
    Return a percent-encoded unicode string, except for colon :, given an `s`
    byte or unicode string.
    """
    if isinstance(s, unicode):
        s = s.encode('utf-8')
    quoted = _percent_quote(s)
    if not isinstance(quoted, unicode):
        quoted = quoted.decode('utf-8')
    quoted = quoted.replace('%3A', ':')              # this step seems unnecessary per the spec
    return quoted

@matt-phylum

This is not a bug. The spec does not say to escape ':', and the test suite gives examples of not escaping ':'.

@houdini91

houdini91 commented Dec 5, 2023

What about a use case where there is a URL with a port as part of the name?

For example
pkg:container/index.myregstry.io:5000/my-image@v1

Would you expect the ':' to be encoded when going into the toString func?
I was thinking it should not be encoded, but it seems to fail on the urlparse check.

It seems to recognize 'index.myregistry.io' as a URL scheme and fails with a somewhat misleading error.

if scheme or authority:

If you prefer, I can open a separate issue.

I can prepare a pr depending on the discussion.
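
The scheme detection described above can be reproduced with the standard library alone. This is a sketch, not the library's actual parsing path: it assumes the text left after stripping `pkg:container/` is handed to `urllib.parse.urlsplit`, which is an assumption about how the remainder is derived.

```python
from urllib.parse import urlsplit

# Assumed remainder of the purl after removing "pkg:container/".
remainder = "index.myregstry.io:5000/my-image@v1"

parts = urlsplit(remainder)
print(parts.scheme)  # index.myregstry.io
print(parts.path)    # 5000/my-image@v1

# With the colon percent-encoded, nothing looks like a scheme any more.
encoded = urlsplit("index.myregstry.io%3A5000/my-image@v1")
print(repr(encoded.scheme))  # ''
print(encoded.path)          # index.myregstry.io%3A5000/my-image@v1
```

Everything before the first ':' consists of characters that are valid in a URL scheme, so `urlsplit` reports the host name as the scheme, which would trip a check such as `if scheme or authority:`.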

@houdini91

OK, after taking the previous advice, I see you expect such a purl to be
pkg:docker/my_image@sha256:244fd47e07d1004f0aed9c?repository_url=index.my-regstory.io:500

Did I understand correctly? Can you elaborate on this logic or point me to the related spec? I tried it with the golang library and did not see this limitation.

@matt-phylum

The spec says "the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded elsewhere," meaning it does not need to be encoded here. However:

  1. Some PURL implementations use generic escaping methods which escape more characters than necessary. This should be okay when parsed using a parser that correctly implements the spec, but it can cause problems when supposedly canonical PURLs are compared as strings and they aren't really canonical. Because of this, I'd recommend comparing PURLs using the URL algorithm, where you parse and then reserialize both PURLs with whatever PURL implementation you're using to get a consistent representation.
  2. The way PURL encodes qualifiers is very similar to x-www-form-urlencoded, which encodes more characters and has special rules about '+' characters. The PURL spec does not mention x-www-form-urlencoded anywhere, but some implementations use x-www-form-urlencoded anyway, leading to unnecessary escaping and incorrect serialization and parsing of some PURLs. (define handling of plus and space purl-spec#261)
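
Both points can be illustrated with a short standard-library sketch. It is a stand-in, not a PURL implementation: `normalize_segment` is a hypothetical helper that decodes and re-encodes a single segment, whereas the recommendation above is to round-trip the whole PURL through a real PURL library. The sample values are illustrative.

```python
from urllib.parse import quote, quote_plus, unquote


def normalize_segment(segment):
    """Decode, then re-encode one segment, keeping ':' literal (hypothetical helper)."""
    return quote(unquote(segment), safe=":")


# Point 1: the same value, once over-escaped and once minimally escaped.
over_escaped = "sha256%3A244fd47e07d1004f0aed9c"
minimal = "sha256:244fd47e07d1004f0aed9c"
print(over_escaped == minimal)  # False
print(normalize_segment(over_escaped) == normalize_segment(minimal))  # True

# Point 2: plain percent-encoding versus form-style encoding of space and '+'.
value = "1.0+git 2014"
print(quote(value))       # 1.0%2Bgit%202014
print(quote_plus(value))  # 1.0%2Bgit+2014
```

A parser that percent-decodes without form rules reads the '+' in the second output as a literal plus sign rather than a space, which is the kind of mismatch the second point describes.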
