Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running e2e tests on AWS #45

Draft
wants to merge 43 commits into
base: main
Choose a base branch
from
Draft

Conversation

eaudetcobello
Copy link
Contributor

@eaudetcobello eaudetcobello commented Aug 19, 2024

  • ci-e2e-tests.sh → my goal is to make this script a one-step solution to run the e2e tests, it launches a lxc container on the local machine and all the necessary tools are installed into it, and then tests are run within. It requires AWS credential configuration variables to be set on the local machine.

  • e2e-deleteme.yaml → temporary file I created to not interefere with the existing workflows that are defined in e2e.yaml. End goal is that whatever is in e2e-deleteme.yaml should be merged with e2e.yaml and a test matrix should be used that includes each architecture.

  • aws-nuke-config.yaml → file is still a WIP. It’s the config. to use with aws-nuke. The goal is for this config to cleanup all resources that have a certain tag. I’m still evaluating if this is the best way to go about things. Because of the way aws-nuke is built, we need to pass an exact tag, which means we can’t use a glob to include all capi resources (like tag:sigs.k8s.io/cluster-api/cluster/*). We need to say specifically say tag:sigs.k8s.io/cluster-api/cluster/capick8s-test-2djn3. One way to circumvent this would be to grep the cluster name from the output of the e2e test logs, then use sed to replace the tag in the aws-nuke configuration file.

  • install-aws-nuke.sh, install-clusterawsadm.sh, install-clusterctl.sh → install these tools into the lxc container's /usr/local/bin

  • write-provider-config.sh → this writes the CAPI provider configuration to the clusterctl.yaml config. file. We need this because ck8s is not a part of the default providers, so we need to tell clusterctl where to find our provider.

  • cluster_upgrade_test.go → Here we replace the hardcoded reference to docker infrastructure and instead tell the test suite to resolve `. This is then read from the config. file of the test (I think), which lives at test/config/ck8s-{docker,aws}.yaml

  • create_test.go → same as above.

  • test/e2e/data/infrastructure-aws folder → this folder contains the templates that will be applied to the cluster during the test. cluster-template.yaml is the one I have tested “Workload cluster creation” test with. The remediation.yaml templates are pretty much the same as the cluster-template.yaml except that they add a MachineHealthCheck and labels under MachineDeployment.

  • test/e2e/shared/v1beta1 → this defines the major and minor contract version of the core cluster api components that we use for the tests. (See ck8s-aws.yaml line 25)

  • test/e2e/shared/v1beta1_aws → this defines the major and minor contract version of the CAPA resources we deploy. (See ck8s-aws.yaml Line 38)

  • kcp_remediation_test.go, md_remediation_test.go, node_scale_test.go → only changed the hardcoded docker provider to clusterctl.DefaultInfrastructureProvider

@eaudetcobello eaudetcobello requested a review from a team as a code owner August 19, 2024 19:07
@eaudetcobello eaudetcobello changed the title Support running e2e tests on aws Support running e2e tests on AWS Aug 19, 2024
Copy link
Contributor

@bschimke95 bschimke95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work @eaudetcobello, added couple of comments
we should probably get #22 in before this.

@@ -92,14 +92,14 @@ KUSTOMIZE := $(TOOLS_BIN_DIR)/$(KUSTOMIZE_BIN)-$(KUSTOMIZE_VER)
# Ginkgo
TEST_DIR := $(shell pwd)/test
ARTIFACTS ?= $(shell pwd)/_artifacts
GINKGO_FOCUS ?=
GINKGO_FOCUS ?= Workload cluster creation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a left over from testing?

Copy link
Contributor Author

@eaudetcobello eaudetcobello Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is temporary. I am leaving it like this for now because this test passes on all infras and it's easier to iterate with one test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by temporary do we mean that it's gonna get removed before this PR is merged? or are we planning to change this in the up coming PRs? if it's the latter, maybe we can add a note or todo.
also I think this will change the scope for both the docker and aws tests, right? I think we only need this for the aws test.

# when bootstrapping with tilt, it will use
# the defaultProviderVersion in https://github.com/kubernetes-sigs/cluster-api/blob/main/hack/tools/internal/tilt-prepare/main.go as
# default version for docker infrastructure provider
# name here should match defaultProviderVersion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like it does not belong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It belongs, but I will clean up the message.

test/e2e/config/ck8s-aws.yaml Show resolved Hide resolved
test/e2e/create_test.go Show resolved Hide resolved
@eaudetcobello eaudetcobello marked this pull request as draft August 20, 2024 12:51
Makefile Outdated Show resolved Hide resolved
test/e2e/e2e_suite_test.go Outdated Show resolved Hide resolved
test/e2e/data/infrastructure-aws/cluster-template.yaml Outdated Show resolved Hide resolved
bschimke95 and others added 17 commits August 23, 2024 21:04
- Change service CIDR block to match our standard range
- Create a bastion by default
- Copy cluster bootstrap/join overrides to control-plane
- Remove unneeded prerunCommands
- Remove tolerations that don't apply to our setup (kubernetes.io/master)
- Add args from default provider template to aws-cloud-controller-manager args
- Restore some RBAC rules from default provider template
no hardcoded defaults, empty variables by default,
use new nodeName field to specify node name
support tilt with tilt-provider.json
@eaudetcobello eaudetcobello force-pushed the KU-1266/e2e-test-aws branch 19 times, most recently from 190c2a5 to 54dddf0 Compare September 12, 2024 00:09
Copy link
Contributor

@HomayoonAlimohammadi HomayoonAlimohammadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work! Thanks a lot @eaudetcobello! did an initial pass and left some comments.

strategy:
matrix:
ginkgo_focus:
#- "KCP remediation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we plan to uncomment these? or are these commented by mistake? maybe we can add a note or todo if this is intentional

- name: Install requirements
run: |
sudo apt install make
sudo apt install wget
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can install everything in a single line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the name of this file, I guess the contents of this file was supposed to be carried over to the e2e.yaml file or something

@@ -2,6 +2,8 @@ name: E2E Tests

on:
pull_request:
branches:
- does-not-exist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I guess after moving the content of deleteme file to this file, we should remove this line

@@ -92,14 +92,14 @@ KUSTOMIZE := $(TOOLS_BIN_DIR)/$(KUSTOMIZE_BIN)-$(KUSTOMIZE_VER)
# Ginkgo
TEST_DIR := $(shell pwd)/test
ARTIFACTS ?= $(shell pwd)/_artifacts
GINKGO_FOCUS ?=
GINKGO_FOCUS ?= Workload cluster creation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by temporary do we mean that it's gonna get removed before this PR is merged? or are we planning to change this in the up coming PRs? if it's the latter, maybe we can add a note or todo.
also I think this will change the scope for both the docker and aws tests, right? I think we only need this for the aws test.

@@ -0,0 +1,96 @@
regions:
- us-east-2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can extract this from this file and also the workflow file. I think this might cause subtle problems if we duplicate it.

sudo sysctl fs.inotify.max_user_instances=8192
- name: Run e2e tests
run: |
sudo -E ./hack/ci-e2e-tests.sh true aws v0.1.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't understand this true here, would elaborate a bit? I see ${0}, ${2} and ${3} in the script, this should be ${1}, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice work putting up this script together. it should have been really challenging. but I strongly feel like we need to change this to a (maybe) python script cause it's too complicated for a bash script. not for this task but in the near future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants