
kubeflow/spark-operator:2.0.0-rc.0 - installation showing config parameters of v2.0.2 (was working earlier) #2330

Open
karanalang opened this issue Nov 21, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@karanalang

What happened?

  • ✋ I have searched the open/closed issues and my issue is not listed.

I have kubeflow/spark-operator:2.0.0-rc.0 installed on GKE, and it is working fine.

When I create a new Helm install with the same version, it gives me an error.

Here is the Helm install command:

helm install spark-operator spark-operator/spark-operator \
  --namespace so350 \
  --set image.tag=2.0.0-rc.0 \
  --create-namespace \
  --set webhook.enable=true \
  --set webhook.port=443 \
  --set webhook.namespaceSelector="spark-webhook-enabled=true" \
  --set logLevel=debug \
  --set enableResourceQuotaEnforcement=true \
  --set webhook.failOnError=true \
  --set controller.resources.limits.cpu=100m \
  --set controller.resources.limits.memory=200Mi \
  --set controller.resources.requests.cpu=50m \
  --set controller.resources.requests.memory=100Mi \
  --set webhook.resources.limits.cpu=100m \
  --set webhook.resources.limits.memory=200Mi \
  --set webhook.resources.requests.cpu=50m \
  --set webhook.resources.requests.memory=100Mi \
  --set "sparkJobNamespaces={spark-apps}" 
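Note that this command pins only `image.tag`, not the chart version, so `helm install` pulls the most recent chart from the repo. The chart versions the repo serves can be listed with:

```shell
# List available chart versions; without --version, helm install uses the
# newest one (2.0.2 at the time of this issue):
helm repo update
helm search repo spark-operator/spark-operator --versions | head -n 5
```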

Error in the controller pod logs:

+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=info --namespaces=default --controller-threads=10 --enable-ui-service=true --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-controller-lock --leader-election-lock-namespace=so350 --workqueue-ratelimiter-bucket-qps=50 --workqueue-ratelimiter-bucket-size=500 --workqueue-ratelimiter-max-delay=6h
Error: unknown flag: --workqueue-ratelimiter-bucket-qps
Usage:
  spark-operator controller start [flags]

Flags:
      --cache-sync-timeout duration                      Informer cache sync timeout. (default 30s)
      --controller-threads int                           Number of worker threads used by the SparkApplication controller. (default 10)
      --enable-batch-scheduler                           Enable batch schedulers.
      --enable-http2                                     If set, HTTP/2 will be enabled for the metrics and webhook servers
      --enable-metrics                                   Enable metrics.
      --enable-ui-service                                Enable Spark Web UI service. (default true)
      --health-probe-bind-address string                 The address the probe endpoint binds to. (default ":8081")
  -h, --help                                             help for start
      --ingress-class-name string                        Set ingressClassName for ingress resources created.
      --ingress-url-format string                        Ingress URL format.
      --kubeconfig string                                Paths to a kubeconfig. Only required if out-of-cluster.
      --leader-election                                  Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
      --leader-election-lease-duration duration          Leader election lease duration. (default 15s)
      --leader-election-lock-name string                 Name of the ConfigMap for leader election. (default "spark-operator-lock")
      --leader-election-lock-namespace string            Namespace in which to create the ConfigMap for leader election. (default "spark-operator")
      --leader-election-renew-deadline duration          Leader election renew deadline. (default 14s)
      --leader-election-retry-period duration            Leader election retry period. (default 4s)
      --metrics-bind-address string                      The address the metric endpoint binds to. Use the port :8080. If not set, it will be 0 in order to disable the metrics server (default "0")
      --metrics-endpoint string                          Metrics endpoint. (default "/metrics")
      --metrics-job-start-latency-buckets float64Slice   Buckets for the job start latency histogram. (default [30.000000,60.000000,90.000000,120.000000,150.000000,180.000000,210.000000,240.000000,270.000000,300.000000])
      --metrics-labels strings                           Labels to be added to the metrics.
      --metrics-prefix string                            Prefix for the metrics.
      --namespaces strings                               The Kubernetes namespace to manage. Will manage custom resource objects of the managed CRD types for the whole cluster if unset.
      --secure-metrics                                   If set the metrics endpoint is served securely
      --zap-devel                                        Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
      --zap-encoder encoder                              Zap log encoding (one of 'json' or 'console')
      --zap-log-level level                              Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity (default )
      --zap-stacktrace-level level                       Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
      --zap-time-encoding time-encoding                  Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.

unknown flag: --workqueue-ratelimiter-bucket-qps

Describing the Deployment:

(base) Karans-MacBook-Pro:~ karanalang$ kc describe deployment.apps/spark-operator-controller -n so350
Name:                   spark-operator-controller
Namespace:              so350
CreationTimestamp:      Thu, 21 Nov 2024 12:39:10 -0800
Labels:                 app.kubernetes.io/component=controller
                        app.kubernetes.io/instance=spark-operator
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=spark-operator
                        app.kubernetes.io/version=2.0.2
                        helm.sh/chart=spark-operator-2.0.2
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: spark-operator
                        meta.helm.sh/release-namespace: so350
Selector:               app.kubernetes.io/component=controller,app.kubernetes.io/instance=spark-operator,app.kubernetes.io/name=spark-operator
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=spark-operator
                    app.kubernetes.io/name=spark-operator
  Annotations:      prometheus.io/path: /metrics
                    prometheus.io/port: 8080
                    prometheus.io/scrape: true
  Service Account:  spark-operator-controller
  Containers:
   spark-operator-controller:
    Image:      docker.io/kubeflow/spark-operator:2.0.0-rc.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Args:
      controller
      start
      --zap-log-level=info
      --namespaces=default
      --controller-threads=10
      --enable-ui-service=true
      --enable-metrics=true
      --metrics-bind-address=:8080
      --metrics-endpoint=/metrics
      --metrics-prefix=
      --metrics-labels=app_type
      --leader-election=true
      --leader-election-lock-name=spark-operator-controller-lock
      --leader-election-lock-namespace=so350
      --workqueue-ratelimiter-bucket-qps=50
      --workqueue-ratelimiter-bucket-size=500
      --workqueue-ratelimiter-max-delay=6h
    Limits:
      cpu:     100m
      memory:  200Mi
    Requests:
      cpu:         50m
      memory:      100Mi
    Liveness:      http-get http://:8081/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:     http-get http://:8081/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:   <none>
    Mounts:        <none>
  Volumes:         <none>
  Node-Selectors:  <none>
  Tolerations:     <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   spark-operator-controller-56789bb775 (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  2m42s  deployment-controller  Scaled up replica set spark-operator-controller-56789bb775 to 1
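The mismatch is visible in the labels above: the image is 2.0.0-rc.0, but the chart label reads spark-operator-2.0.2. This can also be confirmed from Helm itself:

```shell
# The CHART column shows which chart version the release was rendered
# from, independent of the image tag the pods actually run:
helm list -n so350
```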

It seems the 2.0.2 chart is being installed, and that chart passes these flags:

--workqueue-ratelimiter-bucket-qps=50
--workqueue-ratelimiter-bucket-size=500
--workqueue-ratelimiter-max-delay=6h

If I remove these flags manually from the Deployment, the controller pod starts up.
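For reference, one non-interactive way to drop the three flags is a JSON patch (a sketch; the array indices assume the exact args order shown in the describe output above, where the workqueue flags are the last three entries):

```shell
# Remove the three unsupported --workqueue-* args (indices 14-16 in the
# rendered args list); remove from the highest index down so earlier
# removals do not shift the later ones:
kubectl -n so350 patch deployment spark-operator-controller --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/args/16"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/args/15"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/args/14"}
]'
```

This is only a workaround; the underlying fix is installing a chart version that matches the image.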

Can you please take a look and let me know why this is happening?

Thanks!

Reproduction Code

No response

Expected behavior

No response

Actual behavior

No response

Environment & Versions

  • Kubernetes Version: 1.28
  • Spark Operator Version: 2.0.0-rc.0
  • Apache Spark Version: 3.5

Additional context

No response

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

@karanalang karanalang added the kind/bug Something isn't working label Nov 21, 2024
@ChenYi015
Contributor

@karanalang Try this:

helm repo update

helm install spark-operator spark-operator/spark-operator \
  --namespace so350 \
  --create-namespace \
  --version=2.0.0-rc.0 \
  --set controller.logLevel=debug \
  --set controller.resources.limits.cpu=100m \
  --set controller.resources.limits.memory=200Mi \
  --set controller.resources.requests.cpu=50m \
  --set controller.resources.requests.memory=100Mi \
  --set webhook.enable=true \
  --set webhook.logLevel=debug \
  --set webhook.port=9443 \
  --set webhook.failurePolicy=Fail \
  --set webhook.resources.limits.cpu=100m \
  --set webhook.resources.limits.memory=200Mi \
  --set webhook.resources.requests.cpu=50m \
  --set webhook.resources.requests.memory=100Mi \
  --set webhook.resourceQuotaEnforcement.enable=true \
  --set "spark.jobNamespaces={spark-apps}" 
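The key difference from the original command is `--version=2.0.0-rc.0`, which pins the chart (and therefore the flags it renders into the Deployment) rather than just the image tag. If the value keys for a given chart version are unclear, they can be inspected before installing:

```shell
# Print the default values for the exact chart version, to see which keys
# (e.g. spark.jobNamespaces vs the older sparkJobNamespaces) it expects:
helm show values spark-operator/spark-operator --version 2.0.0-rc.0
```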
