ETCD: operator appends random string to endpoint #20

Open
mkania-cisco opened this issue Sep 11, 2022 · 12 comments
@mkania-cisco

I'm using MetalLB as the LoadBalancer provider.

I've created two services:

mkania@linux-700-2:~$ kubectl get svc -n cnwan-msc-green
NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)          AGE
svc-msc-simple-k8s-app-bar   LoadBalancer   10.111.59.63    70.70.70.129   8000:30273/TCP   174m
svc-msc-simple-k8s-app-foo   LoadBalancer   10.103.51.228   70.70.70.128   8000:31586/TCP   174m

which autopopulated endpoints:

mkania@linux-700-2:~$ kubectl get endpoints -n cnwan-msc-green
NAME                         ENDPOINTS                        AGE
svc-msc-simple-k8s-app-bar   10.0.1.56:8000,10.0.4.80:8000    176m
svc-msc-simple-k8s-app-foo   10.0.1.131:8000,10.0.2.32:8000   176m

However, when cnwan-reader received the changes, it reported an error:

7:00PM INF sending data...	| func=queue.senderWorkQueue.sendData length=1
7:00PM INF received response from the adaptor	| func=services.servicesHandler.logResponseError response="207 - INVALID RESOURCES: Some resources have not been processed successfully. List of failed resources is included." status-code=207
7:00PM WRN adaptor error occurred on resource	| error="Resource 'svc-msc-simple-k8s-app-foo-b7f9137dfd': 400 ENDPOINT NOT FOUND  Cannot process DELETE event: resource  IP 70.70.70.128 and port 8000 does not exist. Ignoring this event." func=services.servicesHandler.logResponseError status-code=207
7:00PM INF events sent successfully	| func=queue.senderWorkQueue.sendData length=1

Looking into etcd, I see a different endpoint name:

/service-registry/namespaces/cnwan-msc-green/services/svc-msc-simple-k8s-app-foo
name: svc-msc-simple-k8s-app-foo
namespaceName: cnwan-msc-green
metadata:
    cnwan.io/traffic-profile: green
    owner: cnwan-operator

/service-registry/namespaces/cnwan-msc-green/services/svc-msc-simple-k8s-app-foo/endpoints/svc-msc-simple-k8s-app-foo-b7f9137dfd
name: svc-msc-simple-k8s-app-foo-b7f9137dfd
serviceName: svc-msc-simple-k8s-app-foo
namespaceName: cnwan-msc-green
metadata:
    owner: cnwan-operator
address: 70.70.70.128
port: 8000

I deleted these endpoints manually and recreated them to match the names shown by kubectl, but they still do not propagate to vManage...

@asimpleidea asimpleidea self-assigned this Sep 12, 2022
@asimpleidea
Member

asimpleidea commented Sep 12, 2022

Hi mkania,

thank you for posting this. This is expected behavior: the seemingly random string is appended to prevent name collisions between endpoints of the same Service, and it is computed as sha256(address+":"+port).
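To illustrate the naming scheme described above, here is a minimal Python sketch. It is not the operator's actual code: the helper name `endpoint_name`, the UTF-8 encoding of the input, and the truncation to 10 hex characters (inferred only from the length of the suffixes seen in etcd) are all assumptions.

```python
import hashlib


def endpoint_name(service: str, address: str, port: int, suffix_len: int = 10) -> str:
    """Hypothetical helper: '<service>-<prefix of sha256(address:port)>'.

    suffix_len=10 is an assumption based on the observed suffix length,
    not a documented value.
    """
    digest = hashlib.sha256(f"{address}:{port}".encode("utf-8")).hexdigest()
    return f"{service}-{digest[:suffix_len]}"


if __name__ == "__main__":
    # The same address and port always yield the same name, so the suffix
    # is deterministic rather than random.
    print(endpoint_name("svc-msc-simple-k8s-app-foo", "70.70.70.128", 8000))
```

Because the suffix depends only on the address and port, two endpoints of the same Service get different names, while re-registering the same endpoint reuses its name.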

The error that occurred (Resource 'svc-msc-simple-k8s-app-foo-b7f9137dfd': 400 ENDPOINT NOT FOUND Cannot process DELETE event: resource IP 70.70.70.128 and port 8000 does not exist. Ignoring this event.) means that the cnwan-reader was not able to reach the cnwan-adaptor.

Is the adaptor running properly and reachable?

@mkania-cisco
Author

mkania-cisco commented Sep 12, 2022

Thanks @asimpleidea for the quick response!

I think it should be reachable; it does not complain on startup (I tried with both the Docker IP and the host):

root@linux-700-2:/home/mkania# docker run  \
>               --name reader \
>               --rm \
>               cnwan/cnwan-reader:v0.8.0 watch etcd \
>               --metadata-keys cnwan.io/traffic-profile \
>               --adaptor-api http://172.17.0.3:8080/cnwan \
>               --endpoints 70.70.72.2:3379 \
>               --prefix /service-registry/ \
>               --interval 5
7:28AM INF getting current state of service registry from etcd...
7:28AM INF watching for changes...
7:28AM INF /service-registry/
7:28AM INF sending data...	| func=queue.senderWorkQueue.sendData length=2
7:28AM INF received response from the adaptor	| func=services.servicesHandler.logResponseError response=<> status-code=204
7:28AM INF events sent successfully	| func=queue.senderWorkQueue.sendData length=2
7:30AM INF detected deleted endpoint key=namespaces/cnwan-msc-green/services/svc-msc-simple-k8s-app-foo/endpoints/svc-msc-simple-k8s-app-foo-b7f9137dfd
getting before the delete
7:30AM INF sending data...	| func=queue.senderWorkQueue.sendData length=1
7:30AM INF detected deleted endpoint key=namespaces/cnwan-msc-green/services/svc-msc-simple-k8s-app-bar/endpoints/svc-msc-simple-k8s-app-bar-5216f6163b
getting before the delete
7:30AM INF received response from the adaptor	| func=services.servicesHandler.logResponseError response="207 - INVALID RESOURCES: Some resources have not been processed successfully. List of failed resources is included." status-code=207
7:30AM WRN adaptor error occurred on resource	| error="Resource 'svc-msc-simple-k8s-app-foo-b7f9137dfd': 400 ENDPOINT NOT FOUND  Cannot process DELETE event: resource  IP 70.70.70.128 and port 8000 does not exist. Ignoring this event." func=services.servicesHandler.logResponseError status-code=207

And when I change to an endpoint that is not supposed to work, I get an error on startup:

root@linux-700-2:/home/mkania# docker run  \
>               --name reader \
>               --rm \
>               cnwan/cnwan-reader:v0.8.0 watch etcd \
>               --metadata-keys cnwan.io/traffic-profile \
>               --adaptor-api http://172.17.0.3:1234/cnwan \
>               --endpoints 70.70.72.2:3379 \
>               --prefix /service-registry/ \
>               --interval 5
7:31AM INF getting current state of service registry from etcd...
7:31AM INF sending data...	| func=queue.senderWorkQueue.sendData length=2
7:31AM ERR error while getting response	| error="Post \"http://172.17.0.3:1234/cnwan/events\": dial tcp 172.17.0.3:1234: connect: connection refused" func=services.servicesHandler.Send
7:31AM INF watching for changes...
7:31AM INF /service-registry/

Is there a way to enable debugging on the adaptor side?
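The difference between the two runs above (a normal response on port 8080, "connection refused" on port 1234) can also be checked independently of the reader. The following is a minimal sketch, assuming only that the adaptor listens on a plain TCP port; `is_reachable` is a hypothetical helper, not part of any CN-WAN component.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that something is listening on the port; it says
    nothing about whether the service behind it handles requests correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `is_reachable("172.17.0.3", 8080)` would be expected to succeed in the first setup and `is_reachable("172.17.0.3", 1234)` to fail in the second, matching the reader's own behavior.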

@asimpleidea
Member

Judging by the error, it looks like the adaptor could not find endpoint svc-msc-simple-k8s-app-bar-5216f6163b in any policies.

The reader is only relaying an error that comes from the adaptor, so before moving this issue to the adaptor's repo, could you repeat the same steps with --debug added to the reader's command and post the output again, please?

I want to make sure that the reader is sending all events appropriately before moving on to investigate the adaptor. Thanks!

@mkania-cisco
Author

root@linux-700-2:/home/mkania# docker run  \
>               --name reader \
>               --rm \
>               cnwan/cnwan-reader:v0.8.0 watch etcd \
>               --metadata-keys cnwan.io/traffic-profile \
>               --adaptor-api http://172.17.0.3:8080/cnwan \
>               --endpoints 70.70.72.2:3379 \
>               --prefix /service-registry/ \
>               --interval 5 \
>               --debug
8:02AM INF getting current state of service registry from etcd...
8:02AM INF watching for changes...
8:02AM INF /service-registry/
8:02AM INF sending data...	| func=queue.senderWorkQueue.sendData length=2
8:02AM INF received response from the adaptor	| func=services.servicesHandler.logResponseError response=<> status-code=204
8:02AM INF events sent successfully	| func=queue.senderWorkQueue.sendData length=2 

The --debug flag does not seem to change the logging level.

@asimpleidea asimpleidea transferred this issue from CloudNativeSDWAN/cnwan-operator Sep 12, 2022
@asimpleidea
Member

Transferring this to the adaptor's repo.

Could you post here any logs you see from the adaptor?

@arnatal
Member

arnatal commented Sep 12, 2022

Hi @mkania-cisco, thanks for posting this! Which version of vManage are you running? Note that the Adaptor has been tested mainly with vManage version 20.3.1 (recommended) and 19.2.1. Thanks!

@mkania-cisco
Author

mkania-cisco commented Sep 12, 2022

Hey @arnatal

> Hi @mkania-cisco, thanks for posting this! Which version of vManage are you running? Note that the Adaptor has been tested mainly with vManage version 20.3.1 (recommended) and 19.2.1. Thanks!

Unfortunately a pretty recent one: 20.7.1.1.

> Transferring this to the adaptor's repo.
>
> Could you post here any logs you see from the adaptor?

root@linux-700-2:/home/mkania# docker logs adaptor
 * Serving Flask app '__main__' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

Not much in the default logging.

@mkania-cisco
Author

mkania-cisco commented Sep 12, 2022

Enabling debugging for Flask does not help:

docker run -d \
            -p 80:8080 \
            --rm \
            --name adaptor \
            -e SDWAN_IP=*** \
            -e SDWAN_USERNAME=*** \
            -e SDWAN_PASSWORD=*** \
            -e MERGE_POLICY=cnwan_merge \
            -e FLASK_ENV=development \
            ghcr.io/cloudnativesdwan/cnwan-adaptor
root@linux-700-2:/home/mkania# curl -H 'Content-Type: application/json' -X POST -d '{"metadataKey":"cnwan.io/traffic-profile","metadataValue": "green", "policyType": "Data", "policyName": "cnwan_dp" }' http://localhost:80/mappings
{
  "detail": "Config OK"
}
root@linux-700-2:/home/mkania# docker logs adaptor
 * Serving Flask app '__main__' (lazy loading)
 * Environment: development
 * Debug mode: on

@asimpleidea
Member

Ok then this is probably due to a response sent by vManage.

Thank you, will update you asap!

@arnatal
Member

arnatal commented Sep 12, 2022

@mkania-cisco If I remember correctly the adaptor keeps the logs inside the docker container. Could you get inside the adaptor container and check "adaptor.log"?

docker exec -it [container-id] /bin/sh
cat adaptor.log

@mkania-cisco
Author

mkania-cisco commented Sep 12, 2022

Thanks -- I haven't dug that deep to discover that!

DEBUG:metadata_adaptor.vmanage_functions:AppRoute policy loaded: {'description': 'cnwan_merge',
 'mode': '',
 'name': 'cnwan_merge',
 'optimized': 'false',
 'sequences': [{'actions': [{'parameter': [{'field': 'name',
                                            'ref': 'e68bcf64-a8f9-4fb1-b2f4-803cb1907aba'},
                                           {'field': 'preferredColor',
                                            'value': 'green'},
                                           {'field': 'strict'}],
                             'type': 'slaClass'}],
                'match': {'entries': []},
                'sequenceId': 10,
                'sequenceIpType': 'ipv4',
                'sequenceName': 'App Route',
                'sequenceType': 'appRoute'}],
 'type': 'appRoute'}
DEBUG:metadata_adaptor.core_lib:New merge policy for AppRoute is [{'actions': [{'parameter': [{'field': 'name',
                              'ref': 'e68bcf64-a8f9-4fb1-b2f4-803cb1907aba'},
                             {'field': 'preferredColor', 'value': 'green'},
                             {'field': 'strict'}],
               'type': 'slaClass'}],
  'match': {'entries': []},
  'sequenceId': 10,
  'sequenceIpType': 'ipv4',
  'sequenceName': 'App Route',
  'sequenceType': 'appRoute'}]
DEBUG:metadata_adaptor.vmanage_functions:PUT https://10.62.141.179:8443/dataservice/template/policy/definition/approute/cbce642b-46e1-457f-b5a8-e8541cdbb769
DEBUG:metadata_adaptor.vmanage_functions:Sending this payload: {"name": "cnwan_merge", "type": "appRoute", "description": "cnwan_merge", "sequences": [{"sequenceId": 10, "sequenceName": "App Route", "sequenceType": "appRoute", "sequenceIpType": "ipv4", "match": {"entries": []}, "actions": [{"type": "slaClass", "parameter": [{"field": "name", "ref": "e68bcf64-a8f9-4fb1-b2f4-803cb1907aba"}, {"field": "preferredColor", "value": "green"}, {"field": "strict"}]}]}], "mode": "", "optimized": "false"}
DEBUG:urllib3.connectionpool:https://10.62.141.179:8443 "PUT /dataservice/template/policy/definition/approute/cbce642b-46e1-457f-b5a8-e8541cdbb769 HTTP/1.1" 200 None
DEBUG:metadata_adaptor.vmanage_functions:Status Code:  200
DEBUG:connexion.apis.abstract:Getting data and status code
DEBUG:connexion.apis.abstract:Prepared body and status code (204)
DEBUG:connexion.apis.abstract:Got framework response

From here it looks OK, with a 200 status code.

There seem to be no errors either:

/usr/src/app # cat adaptor.log | grep ERROR
/usr/src/app #
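Since grep matches single lines while some records in adaptor.log span several lines (the pretty-printed policy above, for instance), a small script can group continuation lines with the record they belong to. This is a sketch that assumes every record starts with `LEVEL:logger_name:`, as in the excerpt above; `iter_records` and `records_at_level` are hypothetical helper names.

```python
import re
from typing import Iterable, Iterator, List, Sequence

# A record starts with e.g. "DEBUG:metadata_adaptor.core_lib:"; any other
# line is treated as a continuation of the previous record.
RECORD_START = re.compile(r"^(DEBUG|INFO|WARNING|ERROR|CRITICAL):[^:]+:")


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group raw log lines into records (first line plus continuations)."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def records_at_level(
    lines: Iterable[str],
    levels: Sequence[str] = ("WARNING", "ERROR", "CRITICAL"),
) -> Iterator[List[str]]:
    """Yield only the records whose first line carries one of the levels."""
    for record in iter_records(lines):
        match = RECORD_START.match(record[0])
        if match and match.group(1) in levels:
            yield record
```

Inside the container this could be used by opening adaptor.log, passing the file object to `records_at_level`, and printing each record joined with newlines.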

@mkania-cisco
Author

mkania-cisco commented Sep 12, 2022

...and after an etcd refresh I see an ERROR:

ERROR:metadata_adaptor.core_lib:An error ocurred while communicating with the SDWAN controller.
ERROR:metadata_adaptor.core_lib:Details: 'encap'

Probably some change in the templates?

Looking here does not give any further ideas aside from checking for BREAKING CHANGES in the vManage APIs.
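One possible reading of `Details: 'encap'`, which the log itself does not confirm: in Python, a KeyError converted to a string is just the quoted missing key, so this is what a log line looks like when code indexes a dictionary with a key that is absent. The sketch below reproduces that formatting; the function names and the payload are hypothetical and are not taken from the adaptor's source.

```python
import logging

logger = logging.getLogger("example")


def read_encap(entry: dict) -> str:
    # Hypothetical lookup: raises KeyError if the field is absent.
    return entry["encap"]


def describe_failure(entry: dict) -> str:
    """Return the value, or a 'Details: ...' string like the one in the log."""
    try:
        return read_encap(entry)
    except KeyError as exc:
        # logger.exception also records the traceback, which would show
        # which lookup failed; logging only str(exc) loses that context.
        logger.exception("missing key in controller response")
        return f"Details: {exc}"


if __name__ == "__main__":
    print(describe_failure({"color": "green"}))  # prints: Details: 'encap'
```

If this reading is right, the response from the newer vManage version may no longer contain an `encap` field where the adaptor expects one, which would fit the suspicion of an API or template change.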
