
Implement cache update deduplication per fetch cycle and backoff algorithm. #5509

Open · wants to merge 23 commits into base: main
Conversation

edwbuck (Contributor) commented Sep 20, 2024

fixes #5341
fixes #5349

This PR fixes #5341 by introducing a backoff algorithm, implemented in the new "eventTracker".
This PR also fixes #5349 by collecting all registration entry (or attested node) updates into a single set of items to pull (using the golang "map keys" pattern) and, only after that set is built, pulling those items to update the cache.

The two items are combined into one PR because previously there were three different "sections" of event monitoring, with each section pulling items to update the cache independently. For #5349 this meant that de-duplication of cache entry updates was not always guaranteed, because the same item could be selected independently in two "sections" of the event monitoring, leading to two cache updates.

The sections of the algorithm:

  • beforeFirstEvent (any event that arrives with a lower id than the first event)
  • polledEvents (any event that was skipped in a previous call of newEvents)
  • newEvents (any event that is detected past the previous last event)

Description of fix for #5349

Each of the sections of the algorithm above no longer updates the cache directly. Instead, when a section detects a Registration Entry / Attested Node that requires a cache update, it records the need to fetch that item by storing the appropriate key into fetchEntries.

Because fetchEntries is a map (keyed by an int or string id), it will contain at most one key per item to be fetched, de-duplicating the fetch regardless of why the item needs fetching (see the sketch below). This also has the desirable side effect of keeping event processing separated from entry updating: all event processing now occurs prior to all entry processing. This will be important when / if we unroll the cache update loop in updateCachedEntries, as we can unroll only one loop, not three.
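To make the pattern concrete, here is a minimal, self-contained sketch of the two-phase approach (the names here are illustrative, not the exact identifiers in this PR):

package main

import "fmt"

type event struct{ entryID string }

// updateCacheEntry stands in for the real cache refresh.
func updateCacheEntry(id string) { fmt.Println("fetching", id) }

func main() {
    beforeFirstEvents := []event{{"A"}, {"B"}}
    polledEvents := []event{{"B"}, {"C"}}
    newEvents := []event{{"A"}, {"D"}}

    // Phase 1: every section only records which entries need refreshing.
    fetchEntries := make(map[string]struct{})
    for _, e := range beforeFirstEvents {
        fetchEntries[e.entryID] = struct{}{}
    }
    for _, e := range polledEvents {
        fetchEntries[e.entryID] = struct{}{}
    }
    for _, e := range newEvents {
        fetchEntries[e.entryID] = struct{}{}
    }

    // Phase 2: each key is fetched exactly once, no matter how many sections
    // requested it ("A" and "B" appear twice above but are fetched only once).
    for id := range fetchEntries {
        updateCacheEntry(id)
    }
}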

Description of fix for #5341

While the title of #5341 and its initial request were about incrementing the scanned Event IDs so that some of the skipped events might not be scanned at all, the investigation showed that this approach would likely lead to the SPIRE server getting out of sync with the "correct" skipped events, so the fix for #5341 was eventually determined to be a backoff algorithm.

The backoff algorithm is contained in the new eventTracker.go module, and consists of a number of pollPeriods, a boundary slice that divides the pollPeriods into regions, and a map of eventStats. Within each region, an event will only be polled once, and each region is defined by its starting index, which is constrained to [0, pollPeriods). If the first region starts after 0, then an event is polled every polling cycle until it enters the first region.

As events are added to the eventTracker over time, eventStats keeps track of each event's progress while it is tracked. ticks counts the number of times the event was considered for polling; polls counts the number of times the event was actually selected to be polled. From these two counts, one can calculate which region the event is currently within.

  • If the event is before the first boundary, it is always returned as an item to be polled.
  • If the event is within a later region, it is polled exactly once within that region, at a tick selected from the number of ticks spanned by that region.

To illustrate the last point: given boundaries (..., 5, 10, ...), the region starting at "5" contains five ticks prior to encountering boundary "10". The last region extends from its starting index to pollPeriods.

Within a region, all tracked events should distribute roughly evenly across its ticks (e.g., if you have 1000 tracked events and a region 10 ticks wide, you should get about 100 events polled at each tick). This is achieved by using a hash of the event id. Unit testing ensures that the hash uses all the slots roughly equally for increments of 2, 3, 4, and 5 over regions of length 2, 3, 4, and 5.
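To show how these pieces fit together, here is an illustrative sketch of the polling decision (field and method names are simplified; the real eventTracker.go differs in detail):

type eventStats struct {
    hash  uint // precomputed hash of the event id
    ticks uint // times the event has been considered for polling
    polls uint // times the event was actually polled
}

type eventTracker struct {
    pollPeriods uint   // total number of poll cycles an event is tracked for
    boundaries  []uint // starting tick of each region, ascending, within [0, pollPeriods)
    events      map[uint]*eventStats
}

// shouldPoll reports whether an event should be polled on its current tick.
func (et *eventTracker) shouldPoll(es *eventStats) bool {
    // Before the first boundary, the event is polled on every tick.
    if len(et.boundaries) == 0 || es.ticks < et.boundaries[0] {
        return true
    }
    // Find the region the current tick falls into.
    region := 0
    for region+1 < len(et.boundaries) && es.ticks >= et.boundaries[region+1] {
        region++
    }
    // Region width: up to the next boundary, or to pollPeriods for the last region.
    width := et.pollPeriods - et.boundaries[region]
    if region+1 < len(et.boundaries) {
        width = et.boundaries[region+1] - et.boundaries[region]
    }
    // The hash selects exactly one tick inside the region for this event.
    return es.ticks-et.boundaries[region] == es.hash%width
}

With 1000 tracked events and a region 10 ticks wide, es.hash%width spreads the events across the ten ticks at roughly 100 per tick, which is the even distribution described above.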

edwbuck (Contributor Author) commented Sep 20, 2024

Converting to a draft until I can fix the unit tests, which haven't been updated to match the new structure of the old algorithm.

Maintainers are free to comment on items they'd like changed early, if they have the time to review what exists.

Comment on lines +163 to +171
func hash(event uint) uint {
    h := event
    h ^= h >> 16    // fold the high bits into the low bits
    h *= 0x119de1f3 // spread the mixed bits with an odd multiplicative constant
    h ^= h >> 15    // fold again so the low bits depend on every input bit
    h *= 0x119de1f3
    h ^= h >> 16
    return h
}
Contributor

You provided a thorough explanation of this (and the rest of this file's code as well) during today's contributor sync. It would be very useful to have that explanation included as comments.

Contributor Author

Done in next push. It's a big explanation, but it provides future devs with an appreciation of what's happening, and why it is needed (and if they read it carefully, what would be required in a modification).

amoore877 (Member), Oct 1, 2024

I may have to watch that recording to get the why :)

I'm finding it a bit hard to understand here how the logic specifically solves the problems we're looking at.

at the top it may help to type in some examples. you have some example already in the PR description- it would be helpful having this in comments in the code

speaking of, if we wanted to get extra spicy- a point was brought up earlier in the year how SPIRE diagrams haven't been getting added to over time for the core complex logics. that could fully solidify understanding for maintainers, but understandable if we want to stick to code docs

Contributor Author

@amoore877 The "why" is to get a number that starts off repeatedly incrementing, and arrive at a pseudo random hash that distributes through the entire number space. That way, when we mod the "hash" by any other number, the remainders should be approximately evenly distributed through between [0, 1, 2, ... number).

To do so, you can't just multiply the number, as it will preserve its factors (divisible by 2 means always divisible by 2, because (2a) * b = 2(a * b)). So hashing algorithms tend to "mix" bits from the high end of the number into the low-end bits, hence all the >> 16 and >> 15.

There are plenty of writeups on the internet about hashes: https://en.wikipedia.org/wiki/Hash_function This is a non-cryptographic hash, which is the kind you would likely have to study when hand-writing a hash table. This one isn't overly special, and like all non-cryptographic hashes, it is designed to approximate a good dispersion without using many CPU cycles.

So the multiplies and shifts combine to make something like 1, 2, 3, 4, 5 transform into 324231623, 01231385, 9923749123, 234202056, 49702374, which, when modded by something, should roughly distribute across the entire range of remainders, like (mod 3) (0, 2, 1, 1, 0). (All numbers above are examples, and not the actual output of this hash.)

The reason we need it is that every bucket can potentially have a different size. This means we want those remainders to distribute evenly across a large number of potential divisors. If they didn't, then "poll once over 4 polling periods" might look like (90%, 2%, 5%, 3%), which would be a bad balance across that bucket.
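If you want to see the dispersion for yourself, here is a quick runnable example using the hash from the diff above (exact counts will vary, but each slot should receive a similar share):

package main

import "fmt"

// Same mixing function as in the diff above.
func hash(event uint) uint {
    h := event
    h ^= h >> 16
    h *= 0x119de1f3
    h ^= h >> 15
    h *= 0x119de1f3
    h ^= h >> 16
    return h
}

func main() {
    // Even-only ids, as one writer in an even/odd split would produce.
    counts := make([]int, 4)
    for id := uint(0); id < 1000; id += 2 {
        counts[hash(id)%4]++
    }
    // Without the hash, id%4 would only ever hit slots 0 and 2; with it,
    // the 500 ids should land in all four slots in roughly equal counts.
    fmt.Println(counts)
}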

// fmt.Print(" last range\n")
bucketWidth := et.pollPeriods - et.boundaries[bucket]
bucketPosition := eventStats.hash % bucketWidth
//fmt.Printf("event %d, hash %d, bucket %d\n", event, eventStats.hash, bucketPosition)
Contributor

Still needed?

edwbuck (Contributor Author), Sep 25, 2024

The fmt.Printf's can go. I just put them in there to debug the hashing index into "boundary's hash table" that doesn't exist in any struct.

Removed in the next push.

Altered the wording to get rid of the extra idea of "bucket" as keeping the terminology aligned with "boundary" introduces fewer ideas / words for ease of maintenance.

Contributor Author

@heymarcel Please resolve this conversation item if there are no other follow-ups on removing the fmt.Printfs. It helps me when I have to scroll through unresolved conversations for follow-ups.

// fmt.Print(" not last range\n")
bucketWidth := et.boundaries[1+bucket] - et.boundaries[bucket]
bucketPosition := eventStats.hash % bucketWidth
//fmt.Printf("event %d, hash %d, bucket %d\n", event, eventStats.hash, bucketPosition)
Contributor

Artifact, or still needed?

Contributor Author

Leftovers from designing the reporting of the boundaryIndex in the virtual hash table of the boundary.

Removing in the next push.

Contributor Author

@heymarcel Please resolve this conversation item if there are no other follow-ups on removing the fmt.Printfs. It helps me when I have to scroll through unresolved conversations for follow-ups.

Signed-off-by: Edwin Buck <[email protected]>
Improved testing of authorized_entryfetcher_attested_nodes.go
Minor fixes to authorized_entryfetcher_registration_entries.go
Fixed authorizedentries/cache_test.go
Better documentation for eventTracker.go
Fixed lack of sub-minute polling on eventTracker.go
Fixed unit tests to match sub-minute polling abilities.

Signed-off-by: Edwin Buck <[email protected]>
Renamed items in authorized entryfetcher attested nodes to not conflict.

Signed-off-by: Edwin Buck <[email protected]>
…ies.

Renamed similar test in attested nodes unit test to not conflict.

Signed-off-by: Edwin Buck <[email protected]>
Rename equivalent attested nodes unit test to avoid collision.
Rename registration entries func call to match attested nodes pattern.

Signed-off-by: Edwin Buck <[email protected]>
@edwbuck changed the title from "Backoff" to "Implement cache update deduplication per fetch cycle and backoff algorithm." on Oct 1, 2024
Signed-off-by: Edwin Buck <[email protected]>
Signed-off-by: Edwin Buck <[email protected]>
Signed-off-by: Edwin Buck <[email protected]>
@edwbuck marked this pull request as ready for review on October 1, 2024 14:07
edwbuck (Contributor Author) commented Oct 1, 2024

Ready for review. Failing unit test is a timeout on an unrelated module (ca manager).

@heymarcel Would you please go through your questions and resolve all of the items that no longer need work? Also, feel free to give it another review, if you have the time.

@azdagron Thanks for the preliminary review yesterday. Not trying to get a review per day out of you 😃, but it's ready.

@azdagron added this to the 1.11.0 milestone on Oct 1, 2024
expectedBoundaries: []uint{0, 2, 6, 14, 30, 62},
},
{
name: "distributed linear polling for a while, then exponential",
amoore877 (Member), Oct 1, 2024

so iiuc, this could help in testing if the DB client's increment strategy changes during runtime? or are the boundaries not related to the increments at all?

Contributor Author

Every time you check to see if something is to be scheduled, its time increments. A boundary is one region of time (for example, between the 6th check and the 14th check) in which that item should be checked once.

The hash's remainder when divided by the size of the boundary determines at which time within that boundary the item will be polled. Any number divided can only have one remainder, so this ensures that every polled item gets polled just once in the boundary.

The main reason to make this configurable (by allowing someone to pass boundaries to the eventTracker) is that we don't know what the most effective polling approach will be. It might even be different for different deployments / organizational policies. The BoundaryBuilder is just a default first attempt that's known to be better than "poll every time".

When talking to the people impacted the most by event auto-increments greater than 1, we found they didn't have a lot of confidence that they could predict the correct approach; that's why it is configurable. If this isn't sufficient, we will expose the boundary selection in some way through the configuration file.
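As a hedged usage sketch of that configurability, based on the unit test further down in this review (endpoints.NewEventTracker(pollPeriods, boundaries)); the fragment below uses illustrative values and assumed types, not recommendations:

pollPeriods := uint(120) // e.g. 10 minutes of 5-second poll cycles

// Exponential-style backoff: poll every cycle until tick 2, then once per
// region, with each region roughly twice as wide as the last.
exponential := endpoints.NewEventTracker(pollPeriods, []uint{2, 4, 8, 16, 32, 64})

// Linear-style backoff: one poll in each block of 5 cycles after the first 5.
linear := endpoints.NewEventTracker(pollPeriods, []uint{5, 10, 15, 20, 25, 30})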

Contributor Author

The strategy doesn't change during runtime, but it's not hard coded. That permits one to alter it in the future for different runtimes.

Right now there's no idea of what kinds of strategies Uber might need, but it's clearly not exponential (the default strategy people mention when they hear backoff). It might be linear up to a "minimum polling interval" time, it might be something else. When Uber gets this in their environment (800,000 to 1,600,000 tracked entries estimated) we can determine what needs to be adjusted in future PRs.

Contributor Author

@amoore877 Sorry, I just reread this and discovered that I was thinking of the program's strategy changing, not the database's strategy.

Yes, if the database's strategy changes at runtime, the program will automatically adjust. It will just be different "patterns" of skipped items, if any are skipped at all. That's intentional, because the database setup that gives the current algorithm so much trouble isn't a "skip every other" pattern, it is a "one database writes even ids, the other writes odd ids" setup. That prevents the two databases from having to synchronize on ids or communicate when an id is issued.

As we never know when the write database will switch, and there's no guarantee that one db will fully finish its work before a switch, just figuring out the ones to skip (a previous suggestion) doesn't really work. Additionally, there is the ability to set up 3 or more databases, such that each one uses ids that don't conflict with the others under this kind of setup pattern.

That's the reason there are unit tests that check that the items distribute evenly when they increment by 2 (even), by 2 (odd), by 3 (divides evenly), by 3 (divides with remainder of 1), by 3 (divides with remainder of 2), and so on. Some programming flaws could make it such that an item that always divides by 2 evenly would only use 1/2 of the time slots, an item that always divides by 3 evenly would only use 1/3 of the time slots, etc.

Contributor Author

As far as the boundaries not being related to the increments, they aren't. But no matter what the increment is, any item being polled by any increment should be polled only once per boundary. Also, all elements incrementing by whatever amount they are incrementing by should be polled evenly across any sized boundary.

Unit testing only tests for increments between 2 and 5 (at every divisibility remainder) and boundaries between 2 and 5 (gotta cut the tests off at some time). If these are found to be "not enough" it's always possible to add more using the table-driven approach that exists already.

Now each boundary can be different, so you basically implement the backoff algorithm by assembling lots of boundaries. The change in polling with respect to time (polls/time) is set by the width of the boundary. If the widths are all equal, it's linear polling; if the widths grow exponentially, it's exponential backoff. If you mix in different boundary widths, you can create nearly any polling algorithm one wishes (exponential resetting at every hour of polling, any algorithm capped at a certain minimum polling rate, linear at one rate that then decreases to another, etc.), but this does push the complexity of the algorithm's implementation into the boundary choices.

t.Run(tt.name, func(t *testing.T) {
eventTracker := endpoints.NewEventTracker(tt.pollPeriods, tt.boundaries)

require.Equal(t, tt.expectedPollPeriods, eventTracker.PollPeriods(), "expecting %d poll periods; but, %d poll periods reported", eventTracker.PollPeriods(), tt.expectedPollPeriods)
amoore877 (Member), Oct 1, 2024

hypernit, and maybe it's just my style- in tests I find using require aggressively may hide signal from code maintainers. I only use it if there would later be a panic (such as in a nil pointer) or there's otherwise no point in continuing the test (like if a dependency wasn't created correctly). in most other cases, assert will let a contributor get all relevant signal on what failed. as an example here, it could be interesting for a contributor to know that they broke the code in such a way that poll periods failed and boundaries failed but not expectedpolls.
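for illustration, a schematic testify example (Compute, PollPeriods, and Boundaries are placeholders here, not the real test's API):

import (
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestExample(t *testing.T) {
    got := Compute() // placeholder for building the thing under test

    // require: stops this test at the first failure; later checks never run.
    require.NotNil(t, got)

    // assert: records the failure and keeps going, so a single run reports
    // every mismatched field rather than only the first one.
    assert.Equal(t, uint(10), got.PollPeriods)
    assert.Equal(t, []uint{2, 4, 8}, got.Boundaries)
}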

Contributor Author

This is interesting, and I just read up on the differences between require and assert. That said, there's no acceptable test in the suite that can fail and the product still be deemed "ok to use" because any failure would mean a runtime deviation between the cache and the database.

If it's a nit that keeps bothering you I can update the code, but any failure probably should just fail the module's unit tests as quickly as possible.

Member

not at all necessary to update- for me nits are always optional but just something to point out :)

// Update the cache
if err := a.updateCacheEntry(ctx, event.SpiffeID); err != nil {
return err
// track any skipped event ids, should the appear later.
Contributor

Suggested change
// track any skipped event ids, should the appear later.
// track any skipped event ids, should they appear later.

Contributor Author

Fixed in the next push, in both the authorized_entryfetcher_registration_entries.go and authorized_entryfetcher_attested_nodes.go files.

Thanks for the signoff and commit link, but by redoing it in my environment and pushing it myself, I don't get "not all commits are signed" errors should I ever have to update it again.

Contributor Author

@stevend-uber Please verify that the changes were implemented in the code push, and if they meet your satisfaction, please resolve this conversation. It keeps the focus on the actively open items.

heymarcel (Contributor) left a comment

My apologies for a slower review cycle than I would normally provide. The eventTracker code is complex and difficult to reason about. Is this a known algorithm that's documented elsewhere, or is it entirely novel? If it's the former, it would be great to include a link to external documentation.

If the complexity of this algorithm is partly due to its performance goals, I would suggest running simulations on expected loads and comparing it with better-known backoff algorithms (say, exponential). This would let us see the performance deltas under different circumstances. It's a technique I've seen used many times at Amazon and AWS when investigating different approaches.


/**
* The default boundary strategy.
*
Contributor

The default poll rate is every 5 seconds for the first minute, right? And I'm assuming that these events are fairly distributed in practice, so we don't need to build in any jitter. Is that right?

Contributor

As an aside, I like the explanation of the expected behavior. It might be nice to add it as a file-level comment, along with a comparison with a simpler backoff algorithm (e.g. exponential backoff, or something similar).

Contributor Author

This strategy is completely arbitrary, in the sense that I created it from nothing, using only a vague idea of what might be required, because the people asking for the backoff also didn't have any concrete idea of what they wanted either.

That's because an exponential backoff won't work (it was initially suggested). With exponential, your cache can drift from the database state far beyond what operations would permit (30 minutes to 8 hours to detect a late-arriving change is quite possible with exponential).

So I figured 1 minute at the configured polling rate, 9 minutes of twice per minute, and the rest once per minute was a good starting point for future changes. That's also why the algorithm is configurable (by passing in boundaries).

When we get specific requests on how to configure it, we'll handle that with a config file entry or perhaps a new generator that the config file will reference. Till then, we need something, and this is a "first guess".
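As an illustration only (assuming a 5-second poll cycle; the real BoundaryBuilder output may differ), the strategy described above could be written as a boundary slice like this:

// First minute (12 five-second cycles): no boundaries, so every cycle polls.
// Minutes 2 through 10: one poll per 30-second region (6 cycles wide).
// After 10 minutes: one poll per 60-second region (12 cycles wide).
func sketchBoundaries(pollPeriods uint) []uint {
    var b []uint
    for tick := uint(12); tick < 120 && tick < pollPeriods; tick += 6 {
        b = append(b, tick)
    }
    for tick := uint(120); tick < pollPeriods; tick += 12 {
        b = append(b, tick)
    }
    return b
}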

Contributor Author

Sorry, I missed one question.

I assume that these events are fairly distributed in practice....

I don't assume that. Most of these events are reactions to the controller manager adding in tons of entries, modifying them, and removing them in conjunction with kubernetes job launches. While that doesn't mean that all jobs in kubernetes launch at the same moment, my past experience with building enterprise scale schedulers is that it will be "a series of bursts of entry changes, with each group of near-simultaneous entries distributed over time"

And the hashing distributing over a boundary means that we don't need to add jitter, and we really don't want too much jitter anyway, as it's more valuable to have the database under a constant low load than to see it experience spikes and valleys in tracked events over different periods of time.

Contributor Author

I'll add in examples of different polling algorithms in the comments:

  • linear: (5, 10, 15, 20, 25, ...)
  • exponential (2, 4, 8, 16, 32, 64, ....)

but you can really tune this to be anything, if you just pick the offsets that match. (cosine, sine)


eventTracker: NewEventTracker(pollPeriods, pollBoundaries),

// initialize guages to nonsense values to force a change.
Contributor

Suggested change
// initialize guages to nonsense values to force a change.
// initialize gauges to nonsense values to force a change.

Contributor Author

Fixed in next push

Contributor Author

@stevend-uber Please review that the spelling changes have been applied in the new commit, and if they meet your satisfaction, please resolve this conversation to keep the focus on the active items.

default:
log.WithError(err).Error("Failed to fetch info about missed Attested Node event")
// Node was deleted
if node == nil {
Contributor

IIUC, this accounts for nodes that were deleted between fetching the list and getting the details.

I don't think this accounts for updating the cache for nodes that were deleted prior to the fetch?

edwbuck (Contributor Author), Oct 1, 2024

When fetching the attested node data from the database, there are only two possibilities.

  1. The data exists
  2. The data doesn't.

If the node was deleted while we were calculating the nodes to update, or if the node was deleted even before we calculated the nodes to update, the node will remain missing from the database, and as the cache is a representation of what's in the database, it should be cleared from the cache in both scenarios.

There's no separate logic for "deleted before" we determine it needed to be updated.

Likewise, even if it was a delete event, and someone slipped in a create event afterwards, the same node would be deemed to need a cache update (once, as its node id is a key in the fetchNodes map), and the one fetch would pull the latest state of the node (the recreation) out of the database, even though two events exist, one delete followed by one create.

Previously, when we fetched nodes while we were calculating whether we should fetch them, timing issues like the kind you are suggesting existed and had to be accounted for; that's why the refactoring into "figure out what needs updating" and then "update those items" reduces the code complexity a lot.
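Schematically, the single code path looks like this (placeholder names and types, not the actual code in authorized_entryfetcher_attested_nodes.go):

type attestedNode struct{ spiffeID string } // stand-in for the datastore record

// updateCachedNodes drains the de-duplicated fetchNodes set, fetching each
// node once and mirroring the database state into the cache.
func updateCachedNodes(
    fetchNodes map[string]struct{},
    fetch func(id string) (*attestedNode, error),
    removeFromCache func(id string),
    updateCache func(n *attestedNode),
) error {
    for id := range fetchNodes {
        node, err := fetch(id)
        if err != nil {
            return err
        }
        if node == nil {
            // Deleted, whether before or after we decided to refresh it:
            // either way the cache entry is removed.
            removeFromCache(id)
        } else {
            // Present (including delete-then-recreate): one fetch pulls the
            // latest state, no matter how many events referenced the node.
            updateCache(node)
        }
        delete(fetchNodes, id)
    }
    return nil
}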

edwbuck (Contributor Author) commented Oct 1, 2024

My apologies for a slower review cycle than I would normally provide. The eventTracker code is complex and difficult to reason about. Is this a known algorithm that's documented elsewhere, is it entirely novel? If it's the former, it would be great to include a link to external documentation.

If you have never had to hand-write a hashtable (something that honestly is less needed every day) then I wouldn't expect you to have run into code like this. Additionally, most hash tables focus on creating in-memory data structures (an array to hold the bucket list of a hash, a set of "buckets" to hold the entries, etc.).

Since we really only want to track presence, the hash is constant, and the number of buckets in the hash table is constant, we simply avoid the entire data structure by computing the hash bucket index every time from the item's hash value and the width of the hash table, which is "boundaryWidth" in the code.

The only novelty here is that instead of looking up an object by its id, the object and the id are the same, so we only need to know where it would be stored to distribute it across the bucket indexes. Then we use the bucket indexes to determine which offset from the beginning of the boundary the item will be polled at.

Why all the effort to not create the structs and entries to make it look like a regular hash table?

  1. No need to store the input, in a bucket, as it's identical to the key being searched.
  2. Every item being tracked would always be in every hash table.
  3. Actually creating the hash tables in RAM would be expensive, especially with the primary person asking for this having 800,000 to 1,600,000 items to track across nearly 864,000 poll periods (each, which start at independent times).
  4. The important part of the table isn't if the item is in it (they all are), it's the item's bucket index.
  5. The memory savings is significant and additionally is tied to performance. No malloc or free on a table that doesn't exist.

The hash is pre-computed and stored, and then only the mod needs to be performed. To figure out which number to mod against, we calculate the hash table's width (the distance between the starting indices of the buckets).

If the complexity of this algorithm is partly due to its performance goals, I would suggest running simulations on expected loads and comparing it with better-known backoff algorithms (say, exponential). This would let us see the performance deltas under different circumstances. It's a technique I've seen used many times at Amazon and AWS when investigating different approaches.

Exponential backoff algorithms were initially suggested, but nobody wants their cache to be sync'd to the database with an upper bound of 2^8 polling periods (~21 minutes with default polling settings, doubling to ~42 minutes after that time period).

The selection of boundaries permits any backoff algorithm to be created. If you want exponential, use (2, 4, 8, 16, 32, 64, 128, 256, 512, etc.) If you want linear, (5, 10, 15, 20, 25, 30, 35, etc.) This is a defensive measure, because we know we don't know the ideal backoff algorithm. Getting the initial deployment out will help Uber determine if the algorithm needs tweaking, and a follow up commit can even expose the bucket choice through the configuration in some manner if deemed necessary.

Additionally, any "normal" backoff doesn't work, because we want:

  1. A guaranteed poll at the end of the item's polling time.
  2. An ability to set a hard upper limit on the backoff, so operational people can know the database and the cache will be in sync after "some fixed period of time"
  3. Guarantees that any item will be polled only once, at a point within the time period it is to be polled in (which complicates rescheduling based on last poll time).
  4. The ability to adjust the ramp up between "poll every cycle" and "poll at the slowest rate possible".

BoundaryBuilder(...) is 100% guaranteed to be non-optimal, mostly because every investigation into what is optimal yields the answer, "we don't know, it will have to be observed and tuned post-deployment." I hope it's a good enough "first offering" and nothing more. If it isn't, it is poised to be altered easily.
