Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URNs produce the wrong result for .uri() method #2102

Open
tomschr opened this issue Sep 10, 2024 · 3 comments
Open

URNs produce the wrong result for .uri() method #2102

tomschr opened this issue Sep 10, 2024 · 3 comments

Comments

@tomschr
Copy link

tomschr commented Sep 10, 2024

Thanks for this great project! ❤️

  • Faker version: 28.4.1
  • OS: openSUSE 15.5

If you run the .uri() method and pass the URN (Uniform Resource Name) scheme, it returns the wrong result.

Steps to reproduce

Consider the following snippet:

>>> from faker import Faker
>>> fake = Faker()
>>> fake.uri(schemes=["urn"])
'urn://thompson.org/tagsauthor.htm'

Expected behavior

URNs are described in RFC 1737 and later in RFC 2141, I would have expected to have something like this:

urn:isbn:0451450523
urn:mpeg:mpeg7:schema:2001
urn:mrn:iala:pub:g1143

Find more examples in the Wikipedia article about Uniform Resource Name.

Actual behavior

At the moment, the URN scheme uses slashes (/) which is wrong according to the mentioned specs. As the method name .uri() implies, it's the generic term for URNs and URLs.

Possible solution

I can think about different solutions to solve this issue:

  1. Fix the .uri() method and return the correct result when a URN scheme is requested.
  2. Introduce a new .urn() method.

Idea 1 would probably the more generic solution as it works for URLs and URNs. On the other hand, idea 2 would make URNs stick out.

@Abdujabbar
Copy link
Contributor

@tomschr I did a little research and found this list of URI schemas: https://en.wikipedia.org/wiki/List_of_URI_schemes

Based on this list, schemas=['urn'] is not the right way because it's not in the list of URI schemas.

I think it would be better to have a new method .urn().

It's just opinion.

@tomschr
Copy link
Author

tomschr commented Oct 16, 2024

@tomschr I did a little research and found this list of URI schemas: https://en.wikipedia.org/wiki/List_of_URI_schemes

Based on this list, schemas=['urn'] is not the right way because it's not in the list of URI schemas.

Ahh, that's interesting. I didn't know, thanks!

I think it would be better to have a new method .urn().

It's just opinion.

I'm fine with using .urn() as the way to go. 👍

My only concern is that users will find it. So one way to ensure that could be:

  • If the users use .uri() for getting URNs, raise an exception and direct users to use .urn()
  • Document it in the docstring and the official documentation.
    (I guess that's the normal approach anyway, right?)

@Abdujabbar
Copy link
Contributor

Abdujabbar commented Oct 17, 2024

@tomschr

  • I think the normal it should be added some white list for schemas list, and if passed schemas will not match to whitelist schemas then raise an exception for .uri() should be good practice.
  • Yes I agree this should be documented of course.

In this case, it would be great to hear @fcurella opinion also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants