Missing validations for purl component rules #3

Open
Brcrwilliams opened this issue Oct 26, 2022 · 0 comments

Per https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#rules-for-each-purl-component, some components have rules which currently are not checked by the library during parsing or initialization.

Below, I'll attempt to itemize the rules. I've skipped some of the ones that describe how the component is parsed.

✅ = the rule is currently checked

❌ = the rule is currently not checked.

❓ = not sure

type:

  • ❌ The package type is composed only of ASCII letters and numbers, '.', '+' and '-' (period, plus, and dash)
  • ❌ The type cannot start with a number
  • ❌ The type cannot contain spaces
  • ❓ The type must NOT be percent-encoded: Code assumes that type is not encoded, but performs no checks. I'm unsure that it's necessary to do so.
  • ✅ The type is case insensitive. The canonical form is lowercase
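
As a rough illustration of how the unchecked type rules could be expressed, here is a minimal sketch. The constant `TYPE_PATTERN` and the method `valid_purl_type?` are hypothetical names, not part of the library's current API. Because the allowed character set excludes both spaces and '%', a single pattern would also reject percent-encoded types.

```ruby
# Hypothetical helper, not part of the library.
# First character: any allowed character except a digit.
# Remaining characters: ASCII letters, digits, '.', '+' and '-'.
TYPE_PATTERN = /\A[A-Za-z.+-][A-Za-z0-9.+-]*\z/

def valid_purl_type?(type)
  type.is_a?(String) && TYPE_PATTERN.match?(type)
end

valid_purl_type?("npm")     # accepted
valid_purl_type?("1type")   # rejected: starts with a number
valid_purl_type?("my type") # rejected: contains a space
valid_purl_type?("np%6D")   # rejected: '%' is outside the allowed set
```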

namespace:

  • ✅ Leading and trailing slashes '/' are not significant and should be stripped in the canonical form. They are not part of the namespace
  • ✅ Each namespace segment must be a percent-encoded string
  • When percent-decoded, a segment:
    • ❌ must not contain a '/'
    • ✅ must not be empty
  • ❌ A URL host or Authority must NOT be used as a namespace. Use instead a repository_url qualifier. Note however that for some types, the namespace may look like a host.

name:

  • ✅ A name must be a percent-encoded string

version:

  • ✅ A version must be a percent-encoded string

qualifiers:

  • ❌ key must be unique within the keys of the qualifiers string
  • ✅ value cannot be an empty string: a key=value pair with an empty value is the same as no key/value at all for this key
  • For each pair of key = value:
    • ❌ The key must be composed only of ASCII letters and numbers, '.', '-' and '_' (period, dash and underscore)
    • ❌ A key cannot start with a number
    • ❓ A key must NOT be percent-encoded: Code assumes that key is not encoded, but performs no checks. I'm unsure that it's necessary to do so.
    • ✅ A key is case insensitive. The canonical form is lowercase
    • ❌ A key cannot contain spaces
    • ✅ A value must be a percent-encoded string
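
The unchecked qualifier key rules (character set, no leading number, no spaces, uniqueness) could be sketched roughly as follows. `KEY_PATTERN` and `qualifier_key_errors` are hypothetical names. The sketch assumes uniqueness is compared after lowercasing, since keys are case insensitive.

```ruby
# Hypothetical helper, not part of the library.
# Keys: ASCII letters, digits, '.', '-' and '_', not starting with a digit.
KEY_PATTERN = /\A[A-Za-z._-][A-Za-z0-9._-]*\z/

# Takes a raw qualifiers string such as "arch=x86&os=linux" and
# returns a list of problems found with its keys (empty if none).
def qualifier_key_errors(qualifiers_string)
  errors = []
  seen = {}
  qualifiers_string.split("&").each do |pair|
    key = pair.split("=", 2).first.to_s
    errors << "invalid qualifier key: #{key.inspect}" unless KEY_PATTERN.match?(key)
    normalized = key.downcase # assumption: duplicates are detected case-insensitively
    errors << "duplicate qualifier key: #{normalized.inspect}" if seen[normalized]
    seen[normalized] = true
  end
  errors
end
```

With this sketch, `qualifier_key_errors("arch=x86&ARCH=arm")` reports a duplicate, while `qualifier_key_errors("arch=x86&os=linux")` returns an empty list.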

subpath:

  • ✅ Leading and trailing slashes '/' are not significant and should be stripped in the canonical form
  • ✅ Each subpath segment must be a percent-encoded string
  • When percent-decoded, a segment:
    • ❌ must not contain a '/'
    • ✅ must not be any of '..' or '.'
    • ✅ must not be empty
  • ✅ The subpath must be interpreted as relative to the root of the package (Cannot feasibly be checked here. IMO, is the responsibility of programs using this library.)
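
The "must not contain a '/' when percent-decoded" rule applies to both namespace and subpath segments. A minimal sketch of that check follows. `segments_with_decoded_slash` is a hypothetical name, and the use of `URI.decode_www_form_component` for decoding is an assumption (it also turns '+' into a space, which may not match how the library decodes).

```ruby
require "uri"

# Hypothetical helper, not part of the library.
# Splits a raw namespace or subpath on '/', ignores empty segments,
# and returns the segments that contain a '/' once percent-decoded.
# Note: URI.decode_www_form_component raises ArgumentError on
# malformed percent-encoding such as "%zz".
def segments_with_decoded_slash(path)
  path.split("/").reject(&:empty?).select do |segment|
    URI.decode_www_form_component(segment).include?("/")
  end
end

segments_with_decoded_slash("a/b%2Fc/d") # flags "b%2Fc"
```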

Proposal

Have #parse raise InvalidPackageURL if:

  • type does not have the desired character set
  • type begins with a number
  • type contains spaces
  • qualifier key does not have the desired character set
  • qualifier key begins with a number
  • qualifier key contains spaces
  • decoded namespace segment contains /
  • decoded subpath segment contains /

Have #new raise an ArgumentError if:

  • type does not have the desired character set
  • type begins with a number
  • type contains spaces
  • qualifier key does not have the desired character set
  • qualifier key begins with a number
  • qualifier key contains spaces

Since to_s calls URI.encode_www_form_component on each namespace and subpath segment, any '/' inside a segment is percent-encoded on output, so it seems unnecessary to check for '/' in these in #new.
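
To make the split between the two error types concrete, here is a heavily simplified sketch. `SketchPURL` is a hypothetical stand-in, not the library's class. Its parser only understands `pkg:type/name?key=value`, and how `InvalidPackageURL` relates to `ArgumentError` in the real library is not assumed here. The constructor raises `ArgumentError`, and the parser translates that into `InvalidPackageURL`.

```ruby
require "uri"

# Hypothetical stand-in for illustration only.
class SketchPURL
  class InvalidPackageURL < StandardError; end

  TYPE_PATTERN = /\A[A-Za-z.+-][A-Za-z0-9.+-]*\z/
  KEY_PATTERN  = /\A[A-Za-z._-][A-Za-z0-9._-]*\z/

  attr_reader :type, :name, :qualifiers

  def initialize(type:, name:, qualifiers: {})
    raise ArgumentError, "invalid type: #{type.inspect}" unless TYPE_PATTERN.match?(type.to_s)
    qualifiers.each_key do |key|
      raise ArgumentError, "invalid qualifier key: #{key.inspect}" unless KEY_PATTERN.match?(key.to_s)
    end
    @type = type.to_s.downcase
    @name = name
    @qualifiers = qualifiers
  end

  # Simplified: no namespace, version or subpath handling.
  def self.parse(string)
    rest, query = string.delete_prefix("pkg:").split("?", 2)
    type, name = rest.to_s.split("/", 2)
    qualifiers = query.to_s.split("&").to_h do |pair|
      key, value = pair.split("=", 2)
      [key.to_s, value.to_s]
    end
    new(type: type, name: name, qualifiers: qualifiers)
  rescue ArgumentError => e
    raise InvalidPackageURL, e.message
  end
end

# A '/' inside a segment is escaped on output, which is why
# the constructor does not need to reject it:
URI.encode_www_form_component("a/b") # => "a%2Fb"
```

In this sketch the validation lives in one place (the constructor), so `parse` and `new` cannot drift apart. Whether the library should share the checks this way is a design choice left open here.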
