
fix(HMS-2181): statuser throttling configuration options #608

Merged
1 commit merged into RHEnVision:main from delay-statuser on Jul 20, 2023

Conversation

@lzap (Member) commented Jul 19, 2023

We are tracking a ticket to implement caching of source availability checks when necessary (AWS API limits). However, the numbers from both stage and production show that we do hundreds of checks per hour, so it would be premature to work on this.

https://issues.redhat.com/browse/HMS-1244

However, it makes sense to prepare app configuration values just in case we hit some API limits and need to slow down the rate of availability checks. One configuration value (delay) can be used to arbitrarily slow down the pace of checks per hyperscaler. Another config value (rate) can be used to randomly skip checks in case we need to buy time for a proper caching implementation.

We have about 150 checks in total on stage, and production is currently similar. Therefore I suggest starting with the default value of 1 second, which leaves plenty of room for growth. If we start getting Kafka lag (we have an SLO for that), we can easily either shorten the delay or enable dice rolling (e.g. every 2 out of 3 checks will be skipped on average = rate 0.33).
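For illustration, here is a minimal sketch of how the two options could be applied in the statuser loop. The `throttle` type, field names, and wiring are placeholders for this comment, not the actual code in the PR:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// throttle mirrors the two options described above; the real option
// names and wiring in the PR may differ (this is only an illustration).
type throttle struct {
	delay time.Duration // pause after each availability check, e.g. 1s
	rate  float64       // probability that a check is performed; 1.0 = never skip
}

func main() {
	t := throttle{delay: time.Second, rate: 1.0}
	for i := 0; i < 3; i++ {
		// Dice roll: with rate 0.33 roughly every 2 out of 3 checks
		// would be skipped on average.
		if rand.Float64() >= t.rate {
			continue
		}
		fmt.Println("performing availability check", i)
		// Arbitrary per-hyperscaler slowdown to stay under API limits.
		time.Sleep(t.delay)
	}
}
```

With roughly 150 checks per run and a 1 second delay, a full pass stays in the low minutes, which is why 1 second looks like a safe starting point.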

@ezr-ondrej (Member)

I do love the approach, but why are we implementing it when we do not see a problem? 🤔
Given the PR is not very complex, can't we implement it once there is an issue?
Or do you see us hitting the AWS limits soon?

@lzap (Member, Author) commented Jul 19, 2023

why are we implementing it when we do not see a problem

I want to be ready if something happens; in that case it will take us just an hour to get the app-interface configuration changed, which can be done for stage or production quite quickly. After all, this is why we chose to run the statuser as a single pod, so this is the ultimate goal of our effort - not implementing it actually feels weird.

@lzap (Member, Author) commented Jul 19, 2023

Rebased the example config file, why do I always forget? :-D

@avitova (Member) left a comment

I think that this is a valid approach. 👍 TY

@adiabramovitch (Member) commented Jul 20, 2023

What about the skipped requests? Shouldn't we address those whenever possible? Or do we prefer to wait for another identical request to arrive from the user (if it occurs)?

@lzap (Member, Author) commented Jul 20, 2023

What is a "skipped request"?

@lzap (Member, Author) commented Jul 20, 2023

Oh, so you mean when a probability-based check is skipped while this feature is enabled.

Indeed, this is a problem for user-initiated checks. We asked the sources team to solve this in https://issues.redhat.com/browse/RHCLOUD-22776, but until then, if we ever enabled the probability, it would probably be only a temporary measure to buy us some time. As you can see, there is no skipping in the default configuration (default value is 1.0 = all checks are performed).
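To make that concrete, a quick sketch under the assumption that the skip decision compares the rate against a uniform random number (not the literal code from this PR):

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// rand.Float64() returns values in [0.0, 1.0), so with the default
	// rate of 1.0 the skip condition can never trigger: every check runs.
	rate := 1.0
	skipped := 0
	for i := 0; i < 1000; i++ {
		if rand.Float64() >= rate {
			skipped++
		}
	}
	fmt.Println("skipped checks:", skipped) // always 0 with the default rate
}
```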

@adiabramovitch merged commit dce8174 into RHEnVision:main on Jul 20, 2023
6 checks passed
@lzap deleted the delay-statuser branch on July 20, 2023 13:17