Introduces `SchemaRegistry` and related classes #8614

tbenr · 2024-09-18T14:36:56Z

This PR introduces

SchemaTypes that will hold the static schemaIds for all datastructres types (they won't be milestone dependant, we will have only one even if the schema will change along the different forks)
SchemaRegistry class and SchemaCache responsible for implementing the registry with its internal cross-milestone cache
SchemaProvider interface and a base abstract class which implements the rules for generating a schema (the actual schema instantiation for which milestone(s)) for a given schemaId.
SchemaRegistryBuilder will be responsible to build the schema registry for each milestone via build method.

Note the design of the registry is based on the assumption that all registries will be built at startup (during SpecVersion creations) and registries will be immediately "primed". This guarantees that cache writes (puts) will happen only during that phase, which is a single-threaded (non concurrent). This is why the cache is not dealing with any concurrency\synchronization. After initialization the cache will be "readonly".

related to #8592
the full direction can be seen here: #8592

Documentation

I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

I thought about adding a changelog entry, and added one if I deemed necessary.

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/AbstractSchemaProvider.java

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaRegistry.java

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaCache.java

lucassaldanha · 2024-09-18T22:44:27Z

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaRegistry.java

+  public <T> T get(final SchemaId<T> schemaClass) {
+    SchemaProvider<T> provider = (SchemaProvider<T>) providers.get(schemaClass);
+    if (provider == null) {
+      throw new IllegalArgumentException(
+          "No provider registered for schema "
+              + schemaClass
+              + " or it does not support milestone "
+              + milestone);
+    }
+    T schema = cache.get(milestone, schemaClass);
+    if (schema != null) {
+      return schema;
+    }
+    final SpecMilestone effectiveMilestone = provider.getEffectiveMilestone(milestone);
+    if (effectiveMilestone != milestone) {
+      schema = cache.get(effectiveMilestone, schemaClass);
+      if (schema != null) {
+        cache.put(milestone, schemaClass, schema);
+        return schema;
+      }
+    }
+    schema = provider.getSchema(this);
+    cache.put(effectiveMilestone, schemaClass, schema);
+    if (effectiveMilestone != milestone) {
+      cache.put(milestone, schemaClass, schema);
+    }
+    return schema;
+  }


We are relying on the fact that we are calling SchemaRegistryBuilder from a "concurrency free" part of the code. I understand this is by design but maybe we can remove that assumption (so it does not bite us in the future).

Once the registry is primed, we should always hit the cache. However, it isn't thread-safe before priming.

What about separating the logic on get() to only read the cache. If we attempt to call get() w/o priming the registry first, we fail. We can move the logic that populates the cache to primeRegistry(), and make it synchronized so we know it is thread-safe no matter where it is being called from.

Yes I agree! Generally, a better separation between priming and lookup reflects the usage we expect.

Relooking at this, the separation can't be done.
The way this works is that, when the schema is being created, it can lookup in the registry itself so it may trigger a creation of a schema (dependency). This help us not dealing with ordering (it works as soon as there are no dependency loops).

The real point here is if we want or not a dependency loop detection. To me it can only happen if as a developer you pick the wrong schemaId (and it is compatible with the type you need in that moment). Let me think

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaCache.java

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaTypes.java

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/SchemaRegistry.java

dependency loop detection refactored package and visibility add comments for readability

tbenr · 2024-09-19T17:21:26Z

I added a loop dependency check (just in case) because it won't affect performance after everything is primed and could catch some dev problems instead of causing a bad stack overflow.
I added some general checks in the registry but no concurrency handling, relying on the building phase to be synchronized.
from visibility perspective, I decided to move everything under the same package registry because it makes all visibility clear: the only public things are Builder.build and Registry.get, all the rest is package private. The drawback is that we will have all providers together with the main classes, but I think it pays off.
Added some additional safety checks in the Builder.

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/registry/AbstractSchemaProvider.java

improves Builder checks

ethereum/spec/src/main/java/tech/pegasys/teku/spec/schemas/registry/AbstractSchemaProvider.java

lucassaldanha

LGTM

…istry/AbstractSchemaProvider.java Mehdi's approach Co-authored-by: Mehdi AOUADI <[email protected]>

introduces SchemaRegistry

e8dacba

github-advanced-security bot found potential problems Sep 18, 2024

View reviewed changes

baseVersion to effectiveMilestone

394d70c

github-advanced-security bot found potential problems Sep 18, 2024

View reviewed changes

tbenr mentioned this pull request Sep 18, 2024

Add schema registry in schema definitions #8615

Merged

2 tasks

tbenr changed the title ~~introduces SchemaRegistry~~ Introduces SchemaRegistry and related classes Sep 18, 2024