Delete/Recreate openshift marketplace pods #1482
@@ -0,0 +1,18 @@
---
- name: Disable/Enable default CatalogSource
I'd do this only when the issue actually hits, which, AFAIK, happens only on the latest 4.15. I'd rather not mess with the marketplace by default.
Done
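For readers unfamiliar with the workaround, the following is a minimal sketch of what a "Disable/Enable default CatalogSource" step could look like. The diff above shows only the task name, so the body is an assumption: it toggles `spec.disableAllDefaultSources` on the cluster-scoped `OperatorHub` resource named `cluster`, which makes OpenShift remove and then recreate the default catalog sources and their pods. The task names are illustrative and not taken from the PR.

```yaml
# Sketch only: the real tasks in the PR may differ.
- name: Disable the default CatalogSources
  kubernetes.core.k8s:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    state: patched
    definition:
      apiVersion: config.openshift.io/v1
      kind: OperatorHub
      metadata:
        name: cluster
      spec:
        disableAllDefaultSources: true

- name: Re-enable the default CatalogSources
  kubernetes.core.k8s:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    state: patched
    definition:
      apiVersion: config.openshift.io/v1
      kind: OperatorHub
      metadata:
        name: cluster
      spec:
        disableAllDefaultSources: false
```

Because this touches every default catalog source, gating it behind a condition, as requested in the review, keeps healthy clusters untouched.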
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/a3722701902d4a80a1d4f3fc59925e8d ✔️ openstack-k8s-operators-content-provider SUCCESS in 30m 43s
6e9b003 to 05e96bf
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/f6bf3b2e2b8c40b684c4021b6323cd3e ✔️ openstack-k8s-operators-content-provider SUCCESS in 29m 43s
  kubernetes.core.k8s_info:
    kind: Pod
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    name: "{{ pod_list.stdout | regex_search('^pod/redhat-operators-.*$', multiline=True) | split('/') | last }}"
Hmm, I don't get this one: this will query a single pod because you are passing the name field. Isn't it enough to remove the previous task and the name? The field_selector here should do the rest.
Yes, got it now! Updated it.
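The reviewer's point can be sketched as follows: when `name` is omitted, `kubernetes.core.k8s_info` returns every pod in the namespace that matches the selectors, so no earlier task is needed to look up a pod name. The selector value `status.phase=Running` and the registered variable `_marketplace_pods` are illustrative assumptions, not the values used in the PR.

```yaml
# Sketch only: selector value and variable names are hypothetical.
- name: List pods in openshift-marketplace without passing a name
  kubernetes.core.k8s_info:
    kind: Pod
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    namespace: openshift-marketplace
    field_selectors:
      - status.phase=Running
  register: _marketplace_pods

- name: Show the names of the pods that matched
  ansible.builtin.debug:
    msg: "{{ _marketplace_pods.resources | map(attribute='metadata.name') | list }}"
```

The module returns the matches under `resources`, so later tasks can loop over that list instead of parsing command output with `regex_search`.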
05e96bf to 43267db
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/d3f10df365794f53936424b8e6df5aca ✔️ openstack-k8s-operators-content-provider SUCCESS in 38m 08s
recheck
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/ee2bcb7ff4f74e11a38412cb7bd97ca6 ✔️ openstack-k8s-operators-content-provider SUCCESS in 34m 41s
Interesting issue, Chandan. I was going to suggest we add a TODO to remove this "workaround" once fixed, but it seems from the RH solution you linked that this is just how it works.
One request left
scenarios/centos-9/multinode-ci.yml
Outdated
@@ -4,6 +4,12 @@ cifmw_use_libvirt: false

cifmw_openshift_setup_skip_internal_registry_tls_verify: true

pre_infra:
We can't use pre_infra here as OCP isn't available at that stage. post_infra looks to be the correct hook to use, but I didn't test it.
You can see CI failing due to this[1]:
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Could not find or access '/home/zuul/.crc/machines/crc/kubeconfig' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
Done!
Thank you @pablintino @lewisdenny for the review. Since OCP is deployed after the pre_infra hook and the cert_manager role is called before post_infra, there is no way to run this workaround as a hook, so we need to include it in the cert_manager role itself. I have updated the PR accordingly.
Nice, let's see how CI likes it. Thanks for adding the comment too :)
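The CI failure quoted earlier in this thread comes from the kubeconfig not existing yet when the tasks run. As a hedged illustration of the ordering problem, a guard like the one below would fail early with a clearer message when the tasks run before OCP is deployed. It is not part of the PR; the registered variable `_kubeconfig_stat` and the message text are assumptions.

```yaml
# Sketch only: not part of the PR; variable name and message are hypothetical.
- name: Check that the OCP kubeconfig exists before touching the cluster
  ansible.builtin.stat:
    path: "{{ cifmw_openshift_kubeconfig }}"
  register: _kubeconfig_stat

- name: Fail early with a clear message when OCP is not deployed yet
  ansible.builtin.assert:
    that:
      - _kubeconfig_stat.stat.exists
    fail_msg: >-
      {{ cifmw_openshift_kubeconfig }} is missing; OCP is probably not
      deployed at this stage.
```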
    namespace: openshift-marketplace
    field_selectors:
      - status.phase=CrashLoopBackOff
  register: _pod_status
Just deleting the crashing pod will also recover it; see crc-org/crc#4109 (comment).
The issue was seen in 4.15.3 at least; 4.15.8 didn't hit it.
Done!
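A minimal sketch of the "just delete the crashing pod" approach follows. One caveat worth stating: Kubernetes reports CrashLoopBackOff as a container waiting reason under `status.containerStatuses`, while `status.phase` only takes the values Pending, Running, Succeeded, Failed, or Unknown. The sketch therefore lists all pods in the namespace and filters on the container state rather than on the phase. Task names and the registered variable `_marketplace_pods` are hypothetical, and the final tasks in the PR may differ.

```yaml
# Sketch only: names are hypothetical; the merged tasks may look different.
- name: Collect all pods in openshift-marketplace
  kubernetes.core.k8s_info:
    kind: Pod
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    namespace: openshift-marketplace
  register: _marketplace_pods

- name: Delete pods that have a container waiting in CrashLoopBackOff
  kubernetes.core.k8s:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    state: absent
    kind: Pod
    namespace: openshift-marketplace
    name: "{{ item.metadata.name }}"
  loop: "{{ _marketplace_pods.resources }}"
  loop_control:
    label: "{{ item.metadata.name }}"
  when: >-
    item.status.containerStatuses | default([])
    | selectattr('state.waiting', 'defined')
    | selectattr('state.waiting.reason', 'equalto', 'CrashLoopBackOff')
    | list | length > 0
```

Deleting the pod is enough because the catalog source's controller recreates it, which matches the recovery described in the linked crc issue comment.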
137c8f7 to 8ea3496
On the crc Zuul reproducer job, the cert-manager operator installation is failing with the following error.

```
the cert-manager CRDs are not yet installed on the Kubernetes API server
```

The cert-manager operator gets installed from the openshift marketplace. After digging deeper, we found that pods under the openshift-marketplace namespace are hitting CrashLoopBackOff due to the following error.

```
failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc =
```

Based on crc-org/crc#4109 (comment), deleting and recreating the openshift-marketplace pods fixes the issue.

Since OCP is deployed after the pre_infra hook and the cert_manager role is called before post_infra, there is no way to run this workaround as a hook. It would be best to include it under the openshift_setup role.

Signed-off-by: Chandan Kumar <[email protected]>
8ea3496 to 2b60416
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rebtoor
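On the crc Zuul reproducer job, the cert-manager operator installation is failing with the following error: "the cert-manager CRDs are not yet installed on the Kubernetes API server".
The cert-manager operator gets installed from the openshift marketplace. After digging deeper, we found that pods under the openshift-marketplace namespace are hitting CrashLoopBackOff due to the following error: "failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc =".
Based on crc-org/crc#4109 (comment), deleting and recreating the openshift-marketplace pods fixes the issue.
Since OCP is deployed after the pre_infra hook and the cert_manager role is called before post_infra, there is no way to run this workaround as a hook.
It would be best to include it under the openshift_setup role.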
As a pull request owner and reviewers, I checked that: