AWS Pacemaker awsvip failing with different errors #1876

jayan800 · 2023-06-23T02:06:42Z

Hi All,

We are running a two-node Pacemaker cluster in AWS and use the "awsvip" resource type to configure the VIP. The configuration is below.

pcs resource show privip_node1

Resource: privip_node1 (class=ocf provider=heartbeat type=awsvip)
Attributes: secondary_private_ip=10.x.x.x
Operations: migrate_from interval=0s timeout=30s (privip_node1-migrate_from-interval-0s)
migrate_to interval=0s timeout=30s (privip_node1-migrate_to-interval-0s)
monitor interval=20s timeout=30s (privip_node1-monitor-interval-20s)
start interval=0s timeout=30s (privip_node1-start-interval-0s)
stop interval=0s timeout=30s (privip_node1-stop-interval-0s)
validate interval=0s timeout=10s (privip_node1-validate-interval-0s)

pcs resource show node1_vip

Resource: node1_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.x.x.x
Operations: monitor interval=10s timeout=20s (node1_vip-monitor-interval-10s)
start interval=0s timeout=20s (node1_vip-start-interval-0s)
stop interval=0s timeout=20s (node1_vip-stop-interval-0s)

The EC2 instance is configured to use IMDSv2. The fence_aws agent and the resource-agents package have also been upgraded to the most recent versions, which support IMDSv2. Additionally, the resource is set up to use the IAM profile credentials.

fence-agents-aws-4.2.1-41.el7_9.3.x86_64
python-s3transfer-0.1.13-1.0.1.el7.noarch
resource-agents-4.1.1-61.el7_9.15.x86_64

pip list | grep -i boto
boto3 (1.10.0)
botocore (1.13.50)

aws --version
aws-cli/2.9.4 Python/3.9.11 Linux/3.10.0-1160.80.1.0.1.el7.x86_64 exe/x86_64.oracle.7 prompt/off

pip3 list | grep -i boto
boto3 1.23.10
botocore 1.26.10

The privip resource consistently fails with different errors:

pengine: warning: unpack_rsc_op_failure: Processing failed monitor of privip_node2 on node2: unknown error | rc=1
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000:109357 - timed out after 30000ms

Jun 16 10:01:43 node2 lrmd[36967]: notice: privip_node2_monitor_20000:13042:stderr [ Unable to locate credentials. You can configure credentials by running "aws configure". ]
Jun 16 10:01:43 node2 crmd[36970]: notice: privip_node2_monitor_20000:91 [ % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 359 100 359 0 0 37513 0 --:--:-- --:--:-- --:--:-- 39888\n\nUnable to locate credentials. You can configure credentials by running "aws configure".\n ]

Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]

Failed Resource Actions:

privip_node1_start_0 on node1 'not running' (7): call=250, status=complete, exitreason='instance_id not found. Is this a EC2 instance?',
last-rc-change='Fri May 26 07:27:46 2023', queued=0ms, exec=6597ms
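
Because the failures above fall into several distinct groups (monitor timeouts, missing credentials, metadata connection errors, missing instance ID), it may help to count how often each one occurs. The following is a minimal sketch, not part of the original report: the function name `classify_awsvip_errors` is hypothetical, and the log file to scan (for example `/var/log/messages`) is an assumption that depends on how logging is set up on the nodes.

```shell
#!/bin/sh
# Count how many log lines contain each failure signature quoted in this report.
# Usage (log path is an assumption): classify_awsvip_errors /var/log/messages
classify_awsvip_errors() {
    log="$1"
    for sig in 'timed out after' \
               'Unable to locate credentials' \
               'Failed connect to 169.254.169.254' \
               'The request must contain the parameter InstanceId' \
               'instance_id not found'; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when the count is 0, so "|| true" keeps the loop going.
        count=$(grep -c -F -- "$sig" "$log" || true)
        printf '%6d  %s\n' "$count" "$sig"
    done
}
```

Comparing the counts between node1 and node2, or between time windows, could show whether one failure mode dominates.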

Any advice would be great.

oalbrigt · 2023-06-23T10:03:07Z

Try running pcs resource debug-start --full <resource>. That should show you all the commands it's running, and hopefully some pointers to what's wrong.
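
A small wrapper can keep the output of that command for later comparison. This is only a sketch: the function name `debug_start`, the `PCS` override variable, and the default log location under `/tmp` are assumptions made for illustration; only the `pcs resource debug-start --full` invocation comes from the suggestion above.

```shell
#!/bin/sh
# PCS can be overridden (for example for a dry run); defaults to the real pcs.
PCS="${PCS:-pcs}"

# Run the debug start for one resource and save everything it prints.
# $1 = resource name, $2 = optional log file (hypothetical default in /tmp).
debug_start() {
    resource="$1"
    log="${2:-/tmp/$resource-debug-start.log}"
    # 2>&1 folds stderr into the saved output; note that the pipeline's
    # exit status is tee's, not pcs's.
    "$PCS" resource debug-start --full "$resource" 2>&1 | tee "$log"
}
```

For the resource in this report the call would be `debug_start privip_node1`.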

jayan800 · 2023-06-26T04:32:10Z

Thank you.

The debug command completed without any errors.

Is there anything else to check?
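
Since a one-off debug run succeeds while the logged failures mention 169.254.169.254 and a missing instance ID, one thing that could be checked by hand is whether the instance metadata service answers reliably. The sketch below uses the standard IMDSv2 two-step flow (request a token, then read `instance-id` with it); it is not taken from the awsvip agent itself, and the function name `imds_instance_id`, the `IMDS_URL` override, and the 5-second timeout are assumptions.

```shell
#!/bin/sh
# Base URL of the instance metadata service; overridable for testing.
IMDS_URL="${IMDS_URL:-http://169.254.169.254}"

# Print the instance ID via IMDSv2, or return non-zero if either step fails.
imds_instance_id() {
    # Step 1: IMDSv2 requires a session token obtained with a PUT request.
    token=$(curl -sS --fail --max-time 5 -X PUT \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
        "$IMDS_URL/latest/api/token") || return 1
    # Step 2: present the token when reading the metadata path.
    curl -sS --fail --max-time 5 \
        -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS_URL/latest/meta-data/instance-id"
}
```

Calling `imds_instance_id` repeatedly on each node, for instance every few seconds for several minutes, might reveal whether the metadata endpoint fails intermittently in the same way the monitor does.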

oalbrigt · 2023-06-26T07:17:58Z

You can run pcs resource update <resource> trace_ra=1 and then disable/enable or restart the resource.

The trace files will be available in /var/lib/heartbeat/trace_ra/.
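
The two steps can be combined into a short helper. This is a sketch under assumptions: the function name `trace_resource` and the `PCS` and `TRACE_DIR` override variables are hypothetical, and the default trace directory is assumed to be `/var/lib/heartbeat/trace_ra`. It uses `pcs resource restart` as the restart step.

```shell
#!/bin/sh
# Both values can be overridden; the defaults are assumptions for this sketch.
PCS="${PCS:-pcs}"
TRACE_DIR="${TRACE_DIR:-/var/lib/heartbeat/trace_ra}"

# Enable tracing for a resource, restart it, then list the trace files.
trace_resource() {
    resource="$1"
    # Turn on agent tracing for this resource.
    "$PCS" resource update "$resource" trace_ra=1 || return 1
    # Restart so the next start/monitor operations are traced.
    "$PCS" resource restart "$resource" || return 1
    # Print every file found under the trace directory, sorted by path.
    find "$TRACE_DIR" -type f 2>/dev/null | sort
}
```

Because the monitor failures are intermittent, the trace files may need to be collected after the next failure rather than right after the restart.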

jayan800 · 2023-06-27T01:30:19Z

Thank you. I will enable the trace.
Fingers crossed.

jayan800 changed the title ~~AWS Pacemaker awsvip faling with different errors~~ AWS Pacemaker awsvip failing with different errors Jun 26, 2023
