Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduces SchemaRegistry and related classes #8614

Merged
merged 10 commits into from
Sep 24, 2024

Conversation

tbenr
Copy link
Contributor

@tbenr tbenr commented Sep 18, 2024

This PR introduces

  • SchemaTypes that will hold the static schemaIds for all datastructres types (they won't be milestone dependant, we will have only one even if the schema will change along the different forks)
  • SchemaRegistry class and SchemaCache responsible for implementing the registry with its internal cross-milestone cache
  • SchemaProvider interface and a base abstract class which implements the rules for generating a schema (the actual schema instantiation for which milestone(s)) for a given schemaId.
  • SchemaRegistryBuilder will be responsible to build the schema registry for each milestone via build method.

Note the design of the registry is based on the assumption that all registries will be built at startup (during SpecVersion creations) and registries will be immediately "primed". This guarantees that cache writes (puts) will happen only during that phase, which is a single-threaded (non concurrent). This is why the cache is not dealing with any concurrency\synchronization. After initialization the cache will be "readonly".

related to #8592
the full direction can be seen here: #8592

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

@tbenr tbenr changed the title introduces SchemaRegistry Introduces SchemaRegistry and related classes Sep 18, 2024
Comment on lines 46 to 73
public <T> T get(final SchemaId<T> schemaClass) {
SchemaProvider<T> provider = (SchemaProvider<T>) providers.get(schemaClass);
if (provider == null) {
throw new IllegalArgumentException(
"No provider registered for schema "
+ schemaClass
+ " or it does not support milestone "
+ milestone);
}
T schema = cache.get(milestone, schemaClass);
if (schema != null) {
return schema;
}
final SpecMilestone effectiveMilestone = provider.getEffectiveMilestone(milestone);
if (effectiveMilestone != milestone) {
schema = cache.get(effectiveMilestone, schemaClass);
if (schema != null) {
cache.put(milestone, schemaClass, schema);
return schema;
}
}
schema = provider.getSchema(this);
cache.put(effectiveMilestone, schemaClass, schema);
if (effectiveMilestone != milestone) {
cache.put(milestone, schemaClass, schema);
}
return schema;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are relying on the fact that we are calling SchemaRegistryBuilder from a "concurrency free" part of the code. I understand this is by design but maybe we can remove that assumption (so it does not bite us in the future).

Once the registry is primed, we should always hit the cache. However, it isn't thread-safe before priming.

What about separating the logic on get() to only read the cache. If we attempt to call get() w/o priming the registry first, we fail. We can move the logic that populates the cache to primeRegistry(), and make it synchronized so we know it is thread-safe no matter where it is being called from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree! Generally, a better separation between priming and lookup reflects the usage we expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relooking at this, the separation can't be done.
The way this works is that, when the schema is being created, it can lookup in the registry itself so it may trigger a creation of a schema (dependency). This help us not dealing with ordering (it works as soon as there are no dependency loops).

The real point here is if we want or not a dependency loop detection. To me it can only happen if as a developer you pick the wrong schemaId (and it is compatible with the type you need in that moment). Let me think

dependency loop detection
refactored package and visibility
add comments for readability
@tbenr
Copy link
Contributor Author

tbenr commented Sep 19, 2024

  • I added a loop dependency check (just in case) because it won't affect performance after everything is primed and could catch some dev problems instead of causing a bad stack overflow.

  • I added some general checks in the registry but no concurrency handling, relying on the building phase to be synchronized.

  • from visibility perspective, I decided to move everything under the same package registry because it makes all visibility clear: the only public things are Builder.build and Registry.get, all the rest is package private. The drawback is that we will have all providers together with the main classes, but I think it pays off.

  • Added some additional safety checks in the Builder.

Copy link
Member

@lucassaldanha lucassaldanha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tbenr tbenr enabled auto-merge (squash) September 24, 2024 07:58
@tbenr tbenr disabled auto-merge September 24, 2024 07:58
@tbenr tbenr enabled auto-merge (squash) September 24, 2024 08:42
@tbenr tbenr merged commit fd599c4 into Consensys:master Sep 24, 2024
17 checks passed
@tbenr tbenr deleted the Introduce-schema-registry branch September 24, 2024 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants