Controlled rollout of ZOS on mainnet #2413

Open
xmonader opened this issue Sep 2, 2024 · 14 comments

@xmonader
Collaborator

xmonader commented Sep 2, 2024

We need to implement a controlled rollout for ZOS upgrades, especially on mainnet, to facilitate controlled experiments on different nodes or farms. The primary goal is to allow testing the next version of ZOS on selected nodes or farms without impacting the entire network. This will be crucial for evaluating new features or optimizations before rolling them out to the broader network.

  • We should be able to define the mainnet farms list to be used in the A/B testing
  • We should be able to define the mainnet nodes list to be used in the A/B testing
  • safe_to_upgrade_network, defaulting to false: this flag indicates whether it is safe to proceed with a network-wide upgrade to the latest zos version specified on chain
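
To make the three settings above concrete, here is a minimal sketch of how a node could decide whether to upgrade. It is an illustration only, not the ZOS implementation: `RolloutConfig`, `test_farms`, `test_nodes` and `should_upgrade` are hypothetical names, and only `safe_to_upgrade_network` comes from this issue.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RolloutConfig:
    # Hypothetical field names; only safe_to_upgrade_network is named in the issue.
    test_farms: frozenset = field(default_factory=frozenset)
    test_nodes: frozenset = field(default_factory=frozenset)
    safe_to_upgrade_network: bool = False  # defaults to false, as proposed


def should_upgrade(farm_id: int, node_id: int, cfg: RolloutConfig) -> bool:
    """Return True if this node may move to the zos version specified on chain."""
    if cfg.safe_to_upgrade_network:
        # The whole network has been cleared to upgrade.
        return True
    # Otherwise only the explicitly listed farms or nodes upgrade.
    return farm_id in cfg.test_farms or node_id in cfg.test_nodes


if __name__ == "__main__":
    cfg = RolloutConfig(test_farms=frozenset({1}), test_nodes=frozenset({42}))
    for farm_id, node_id in [(1, 7), (2, 42), (2, 7)]:
        print(farm_id, node_id, should_upgrade(farm_id, node_id, cfg))
```

With the flag left at its default, only nodes in a listed farm or on the node list upgrade; setting the flag to true lets every node follow the on-chain version.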

TODO...

@iwanbk
Member

iwanbk commented Sep 3, 2024

I don't mean to be picky about the wording.

We need to implement A/B Testing functionality for ZOS upgrades to facilitate controlled experiments on different nodes or farms. The primary goal is to allow for testing the next version of ZOS on selected nodes or farms without impacting the entire network

But I think what you really mean is staging, not A/B testing, @xmonader?
The main difference is that with staging, we really do want the feature/version to be deployed in the end.
With A/B testing, we want to choose one of two or more alternatives.

This will be crucial for evaluating new features or optimizations before rolling them out to the broader network.

Other than staging, we can also employ the feature flag/toggle technique:

  • activate the feature for several controlled users/nodes
  • deactivate when things go wrong.

@xmonader
Collaborator Author

xmonader commented Sep 3, 2024

We already have qanet and testnet as staging environments for releases in the pipeline. What is needed is a controlled rollout on a small, defined subset of nodes on mainnet.

@iwanbk
Member

iwanbk commented Sep 4, 2024

Oh okay, feature flag/toggle then

@xmonader
Collaborator Author

xmonader commented Sep 4, 2024

The flag is already toggled/set as part of the zos upgrade. When we want to upgrade nodes on mainnet, we create a proposal on tfchain that specifies the zos version to upgrade the whole network to, and as soon as a node picks up that proposal's approval it starts its upgrade. What is needed is breaking that into two steps:

  • changing the version of zos on chain
  • controlled rollout: some of the nodes upgrade as soon as they become aware of the new version, and a manual approval then propagates it across the network, after testing on that first batch of nodes.
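
The two steps can be sketched as a small simulation over a fleet. This is a hedged illustration, not how ZOS or tfchain behave today: `target_versions`, the node names and the `"old"`/`"new"` version labels are all made up for the example.

```python
def target_versions(fleet, current, on_chain, first_batch, approved):
    """Map each node name to the version it should run.

    fleet:       iterable of node names (hypothetical identifiers)
    current:     version the network runs before the change
    on_chain:    version newly set on chain (step 1)
    first_batch: nodes allowed to follow the chain immediately
    approved:    True once the manual approval has been given (step 2)
    """
    return {
        node: on_chain if (approved or node in first_batch) else current
        for node in fleet
    }


if __name__ == "__main__":
    fleet = ["node-a", "node-b", "node-c"]
    first_batch = {"node-a"}

    # Step 1: the version changes on chain; only the first batch moves.
    print(target_versions(fleet, "old", "new", first_batch, approved=False))

    # Step 2: after testing the first batch, manual approval moves everyone.
    print(target_versions(fleet, "old", "new", first_batch, approved=True))
```

Keeping the approval as a separate input means changing the on-chain version alone never triggers a network-wide upgrade.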

@iwanbk
Member

iwanbk commented Sep 4, 2024

controlled rollout

OK, so it is clear that what we want is a controlled rollout.

xmonader changed the title from "Introduce A/B Testing to ZOS" to "Controlled rollout of ZOS on mainnet" Sep 4, 2024
rawdaGastan self-assigned this Sep 5, 2024
rawdaGastan added the type_feature (New feature or request) label Sep 5, 2024
rawdaGastan added this to the 3.12 milestone Sep 5, 2024
@rawdaGastan
Contributor

Farm IDs can be included in the A/B testing but we can't include node IDs.
We can't get the ID of the node before the registration.

@xmonader
Collaborator Author

Farm IDs can be included in the A/B testing but we can't include node IDs. We can't get the ID of the node before the registration.

If the node isn't registered, it's not part of the allowed nodes list by design, no?

@rawdaGastan
Contributor

I mean the registration/noded module in general, even if the node is already registered. The node ID only becomes known at that step, which runs after the identityd module (the one responsible for the upgrade).

@xmonader
Collaborator Author

Alright, let's remove the nodes list and stick to farm ids only

@rawdaGastan
Contributor

Do you want to use the node address instead, or will farm IDs alone be enough?

@xmonader
Collaborator Author

I think farm IDs are enough; addresses are too cumbersome IMO.

@rawdaGastan
Contributor

WIP:

  • waiting for the qa release to test
  • used the zos configs repo for the rollout configs

@rawdaGastan
Contributor

Testing in progress

  • Will be tested in the next qa release

@ashraffouda
Collaborator

The current situation is that we have a config file which specifies whether the version is safe to upgrade to or not: https://github.com/threefoldtech/zos-config/blob/main/development-v4.json#L68
This introduces an inconsistency: the version is added to the chain, so on each change on the chain we also need to change this file. A better solution is to include safe_to_upgrade in the version itself, something like
{"version": "3.15", "safe_to_upgrade": false}
This would be passed when creating the motion to upgrade the environment,
and we keep the farms to be tested first in zos-config.
This will require a very small change in ts_client and go_clients.
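
As a rough sketch of what reading such a value could look like on the client side, the snippet below parses the JSON shape shown above. `parse_version` is a hypothetical helper, not part of ts_client or go_clients, and the fallback for a plain version string (treated as not safe for the whole network) is an assumption made for the example.

```python
import json


def parse_version(raw: str):
    """Return (version, safe_to_upgrade) from the version value.

    Accepts the JSON object shape proposed above. Anything else is
    treated as a plain version string with safe_to_upgrade=False,
    which is an assumption of this sketch, not a confirmed behaviour.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None

    if isinstance(data, dict) and "version" in data:
        # Only a literal JSON true marks the version as safe for everyone.
        return str(data["version"]), data.get("safe_to_upgrade") is True

    return raw.strip(), False


if __name__ == "__main__":
    print(parse_version('{"version": "3.15", "safe_to_upgrade": false}'))
    print(parse_version('{"version": "3.15", "safe_to_upgrade": true}'))
```

Defaulting a missing or malformed flag to false keeps the conservative behaviour: a node outside the test farms would not upgrade unless the value explicitly says it is safe.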

Projects
Status: Pending Deployment