
Delete/Recreate openshift marketplace pods #1482

Merged
merged 1 commit into from
Apr 17, 2024

Conversation

@raukadah (Contributor) commented Apr 16, 2024

On the crc Zuul reproducer job, the cert-manager operator installation is failing with the following error.

the cert-manager CRDs are not yet installed on the Kubernetes API server

The cert-manager operator gets installed from the OpenShift marketplace. After digging deeper, we found that pods under the openshift-marketplace namespace are hitting CrashLoopBackOff due to the following error.

failed to populate resolver cache from source redhat-operators/openshift-marketplace:
failed to list bundles: rpc error: code = Unavailable desc = connection error: desc =

Based on crc-org/crc#4109 (comment), deleting and recreating the openshift-marketplace pods fixes the issue.

Since OCP is deployed after the pre_infra hook and the cert_manager role is called before post_infra, there is no way to run this workaround as a hook.

It would be best to include it under the openshift_setup role.

As the pull request owner and reviewers, we checked that:

  • Appropriate testing is done and actually running
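The workaround described above — delete the crashing pods so OLM recreates them — could be sketched as a pair of Ansible tasks like the following. This is a minimal illustration, not the exact diff from the PR; the kubeconfig variable follows the role's existing conventions, and the container-state filter is an assumption (CrashLoopBackOff is reported as a container waiting reason, not a pod phase).

```yaml
# Sketch of the workaround: list pods in openshift-marketplace and
# delete those whose containers are waiting in CrashLoopBackOff.
# OLM's catalog-operator then recreates the catalog pods.
- name: Gather pods in the openshift-marketplace namespace
  kubernetes.core.k8s_info:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    kind: Pod
    namespace: openshift-marketplace
  register: _marketplace_pods

- name: Delete pods stuck in CrashLoopBackOff
  kubernetes.core.k8s:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    state: absent
    kind: Pod
    namespace: openshift-marketplace
    name: "{{ item.metadata.name }}"
  loop: "{{ _marketplace_pods.resources | default([]) }}"
  when: >-
    item.status.containerStatuses | default([])
    | selectattr('state.waiting', 'defined')
    | selectattr('state.waiting.reason', 'equalto', 'CrashLoopBackOff')
    | list | length > 0
```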

@@ -0,0 +1,18 @@
---
- name: Disable/Enable default CatalogSource
Collaborator
I'd do this only if the issue is hitting; AFAIK, it's happening only in the latest 4.15, but I'd rather not mess with the marketplace always as a default.

Contributor Author
Done


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a3722701902d4a80a1d4f3fc59925e8d

✔️ openstack-k8s-operators-content-provider SUCCESS in 30m 43s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 8m 45s
✔️ noop SUCCESS in 0s
cifmw-pod-pre-commit FAILURE in 7m 51s (non-voting)

@raukadah raukadah force-pushed the fix_cert_manager branch 2 times, most recently from 6e9b003 to 05e96bf Compare April 16, 2024 13:12

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f6bf3b2e2b8c40b684c4021b6323cd3e

✔️ openstack-k8s-operators-content-provider SUCCESS in 29m 43s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 8m 44s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 39s (non-voting)

kubernetes.core.k8s_info:
kind: Pod
kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
name: "{{ pod_list.stdout | regex_search('^pod/redhat-operators-.*$', multiline=True) | split('/') | last }}"
Collaborator
Hmm, I don't get this one; this will query a single pod since you are passing the name field. Isn't it enough to remove the previous task and the name? The field_selector here should do the rest.

Contributor Author
Yes, got it now! Updated it.
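For reference, a single-task lookup along the lines the reviewer suggests might look like this. It is only a sketch, not the final diff: `olm.catalogSource` is the label OLM conventionally puts on catalog registry pods, but the exact selector used in the merged change may differ.

```yaml
# One k8s_info call replaces the shell lookup plus name filtering:
# select the redhat-operators catalog pods directly via a label selector.
- name: Get redhat-operators catalog pods
  kubernetes.core.k8s_info:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    kind: Pod
    namespace: openshift-marketplace
    label_selectors:
      - olm.catalogSource=redhat-operators
  register: _pod_status
```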


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/d3f10df365794f53936424b8e6df5aca

✔️ openstack-k8s-operators-content-provider SUCCESS in 38m 08s
podified-multinode-edpm-deployment-crc FAILURE in 18m 00s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 29s (non-voting)

@raukadah (Contributor Author)

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/ee2bcb7ff4f74e11a38412cb7bd97ca6

✔️ openstack-k8s-operators-content-provider SUCCESS in 34m 41s
podified-multinode-edpm-deployment-crc FAILURE in 17m 32s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 22s (non-voting)

@lewisdenny (Collaborator) left a comment

Interesting issue, Chandan. I was going to suggest we add a TODO to remove this "workaround" once fixed, but it seems from the RH solution you linked that this is just how it works.

One request left

@@ -4,6 +4,12 @@ cifmw_use_libvirt: false

cifmw_openshift_setup_skip_internal_registry_tls_verify: true

pre_infra:
Collaborator

We can't use pre_infra here as OCP isn't available at that stage.

post_infra looks to be the correct hook to use but I didn't test.

You can see CI failing due to this[1]:

fatal: [localhost]: FAILED! => {"changed": false, "msg": "Could not find or access '/home/zuul/.crc/machines/crc/kubeconfig' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}

[1] https://logserver.rdoproject.org/82/1482/43267db1d1546643bcf8d8f5f0a5fc8cc69f8f2d/github-check/podified-multinode-edpm-deployment-crc/a1df9e4/controller/ci-framework-data/logs/ci_script_000_run_disable_enable_red_hat.log

Contributor Author

Done!

Contributor Author

Thank you @pablintino @lewisdenny for the review. Since OCP is deployed after the pre_infra hook and the cert_manager role is called before post_infra, there is no way to run this workaround as a hook, so we need to include it in the cert_manager role itself. I have updated it accordingly.

Collaborator

Nice, let's see how CI likes it. Thanks for adding the comment too :)

namespace: openshift-marketplace
field_selectors:
- status.phase=CrashLoopBackOff
register: _pod_status
Contributor

Just deleting the crashing pod will also recover it: crc-org/crc#4109 (comment).
Issue seen in 4.15.3 at least; 4.15.8 didn't hit it.

Contributor Author

Done!
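Per this suggestion, the fix reduces to deleting the affected pods and letting OLM reconcile. A hedged sketch of such a delete step, reusing the `_pod_status` register from the diff above:

```yaml
# Deleting the crashing catalog pod is enough: the catalog-operator
# notices the missing pod for the CatalogSource and recreates it.
- name: Delete crashing openshift-marketplace pods
  kubernetes.core.k8s:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    state: absent
    kind: Pod
    namespace: openshift-marketplace
    name: "{{ item.metadata.name }}"
  loop: "{{ _pod_status.resources | default([]) }}"
```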

@raukadah raukadah force-pushed the fix_cert_manager branch 4 times, most recently from 137c8f7 to 8ea3496 Compare April 17, 2024 06:10
On the crc Zuul reproducer job, cert-manager operator installation is
failing with the following error.
```
the cert-manager CRDs are not yet installed on the Kubernetes API server
```

The cert-manager operator gets installed from the OpenShift marketplace.
After digging deeper, we found that pods under the openshift-marketplace
namespace are hitting CrashLoopBackOff due to the following error.
```
failed to populate resolver cache from source redhat-operators/openshift-marketplace:
failed to list bundles: rpc error: code = Unavailable desc = connection error: desc =
```

Based on crc-org/crc#4109 (comment),
deleting and recreating the openshift-marketplace pods fixes the issue.

Since OCP is deployed after the pre_infra hook and the cert_manager role is
called before post_infra, there is no way to run this workaround
as a hook.

It would be best to include it under the openshift_setup role.

Signed-off-by: Chandan Kumar <[email protected]>
@raukadah raukadah changed the title [cert_manager]Disable/Enable default catalogsource Delete/Recreate openshift marketplace Apr 17, 2024
@raukadah raukadah changed the title Delete/Recreate openshift marketplace Delete/Recreate openshift marketplace pods Apr 17, 2024
@arxcruz (Contributor) commented Apr 17, 2024

/lgtm

@rebtoor (Contributor) commented Apr 17, 2024

/approve

openshift-ci bot commented Apr 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rebtoor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a113e1a into main Apr 17, 2024
10 checks passed
@openshift-merge-bot openshift-merge-bot bot deleted the fix_cert_manager branch April 17, 2024 09:36