Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-6836] schema_registry/json: Support Internal References #22815

Conversation

BenPope
Copy link
Member

@BenPope BenPope commented Aug 9, 2024

As described in the tests, support "$ref": "#...":

{
  "$ref": "#/$defs/a_ref",
  "$defs": {
    "a_ref": {
      "type": "string"
    }
  }
}

I.e., they are internal-only, qualified as starting with #.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@BenPope BenPope added the area/schema-registry Schema Registry service within Redpanda label Aug 9, 2024
@BenPope BenPope requested a review from andijcr August 9, 2024 13:05
@BenPope BenPope self-assigned this Aug 9, 2024
@BenPope BenPope requested a review from pgellert August 9, 2024 13:05
Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have so far looks good. I just have a question about whether there are any other kinds of internal references to support.

Comment on lines +187 to +205
// Internal reference
if (ref.starts_with("#")) {
json::Pointer ptr{
ref.data() + 1,
ref.length() - 1,
};
if (auto* p = ptr.Get(_schema.doc); p) {
return *p;
}
throw as_exception(error_info{
error_code::schema_invalid,
fmt::format("Reference not found: '{}'", ref)});
}

throw as_exception(error_info{
error_code::schema_invalid,
fmt::format("External references not supported: '{}'", ref)});
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By internal references, do you mean non-cross-schema-version references in general or references starting with # specifically? Looking at https://json-schema.org/understanding-json-schema/structuring#dollarref, it's possible to have "internal" references (ie. a reference to a schema within the same schema document) that don't start with #. Do we want to support those kinds of references as well (eg. "$ref": "/schemas/address")?

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Aug 9, 2024

Comment on lines +106 to +109
std::string_view as_string_view(json::Value const& v) {
return {v.GetString(), v.GetStringLength()};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@@ -1197,7 +1223,7 @@ bool is_not_combinator_superset(
// for not combinator, we want to check if the "not" newer subschema is
// less strict than the older subschema, because this means that newer
// validated less data than older
return is_superset(newer_it->value, older_it->value);
return is_superset(ctx, newer_it->value, older_it->value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to swap the order of contexts , like newer and older are subschema are swapped.
pseudo code:

Suggested change
return is_superset(ctx, newer_it->value, older_it->value);
return is_superset({ctx.newer, ctx.older}, newer_it->value, older_it->value);

otherwise newer will access the context for older as older will access the context for newer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot!

if (v.IsObject()) {
if (auto it = v.FindMember("$ref"); it != v.MemberEnd()) {
return ctx.resolve(as_string_view(it->value)).GetObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be the ok for draft4/7 ($ref overwrites the schema), but reading the spec seems like for 2019/20 the result should be merged on top of the schema.

A schema can reference another schema using the $ref keyword. The value of $ref is a URI-reference that is resolved against the schema's [Base URI](https://json-schema.org/understanding-json-schema/structuring#base-uri). When evaluating a $ref, an implementation uses the resolved identifier to retrieve the referenced schema and applies that schema to the [instance](https://json-schema.org/learn/glossary#instance).
Draft-specific info
In Draft 4-7, $ref behaves a little differently. When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put in that object will not be treated as JSON Schema keywords and will be ignored by the validator. $ref can only be used where a schema is expected.

Then again, I suspect our reference implementation does not implement the merge behavior.
I'll check

@BenPope BenPope marked this pull request as draft September 20, 2024 14:53
@BenPope
Copy link
Member Author

BenPope commented Oct 3, 2024

Closed in favour of: #23461

@BenPope BenPope closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/schema-registry Schema Registry service within Redpanda
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants