✋ I have searched the open/closed issues and my issue is not listed.
We are encountering an issue where Spark jobs executed as part of an Argo Workflow are not being triggered successfully. Workflows that use Spark fail to proceed beyond the execution step; the problem started after upgrading the Spark-Operator from version v1beta2-1.4.6-3.5.0 to 2.0.2.
The status.applicationState.state field is not being updated and does not reflect the actual state of the Spark job during execution. [cx-lab-create1-vv52g, NOTE: the job with Complete status ran before the upgrade]
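For context, the Argo step that submits the job waits on this field via success/failure conditions, which is what the logs below show. A minimal sketch of such a resource template (assumed setup; the template name and manifest contents are illustrative, not taken from our actual workflow):

```yaml
# Hypothetical Argo Workflows step: creates the SparkApplication and waits on
# status.applicationState.state, matching the conditions visible in the logs below.
- name: run-spark-job
  resource:
    action: create
    successCondition: status.applicationState.state == COMPLETED
    failureCondition: status.applicationState.state == FAILED
    manifest: |
      apiVersion: sparkoperator.k8s.io/v1beta2
      kind: SparkApplication
      metadata:
        generateName: lab-create1-
      spec:
        # ... Spark driver/executor spec ...
```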
Spark application pod logs lab-create1-xxxxx
time="2024-12-18T14:29:02.450Z" level=info msg="Get sparkapplications 200"
time="2024-12-18T14:29:02.450Z" level=info msg="failure condition '{status.applicationState.state == [FAILED]}' evaluated false"
time="2024-12-18T14:29:02.450Z" level=info msg="success condition '{status.applicationState.state == [COMPLETED]}' evaluated false"
time="2024-12-18T14:29:02.450Z" level=info msg="0/1 success conditions matched"
time="2024-12-18T14:29:02.450Z" level=info msg="Waiting for resource sparkapplication.sparkoperator.k8s.io/lab-create1-xxxxx in namespace <NAMESPACE> resulted in retryable error: Neither success condition nor the failure condition has been matched. Retrying..."
spark-operator-controller pod logs
Displaying logs from Namespace: spark for Pod: spark-operator-controller-b6bdb5dd9-txzk6. Logs from 12/18/2024, 4:53:36 PM
++ id -g
+ gid=185
+ set +e
++ getent passwd 185
+ uidentry=spark:x:185:185::/home/spark:/bin/sh
+ set -e
+ [[ -z spark:x:185:185::/home/spark:/bin/sh ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=info --namespaces=default --controller-threads=10 --enable-ui-service=true --enable-metrics=true --me
Spark Operator Version: 2.0.2+HEAD+unknown
Build Date: 2024-10-11T01:46:23+00:00
Git Commit ID:
Git Tree State: clean
Go Version: go1.23.1
Compiler: gc
Platform: linux/amd64
I1218 14:53:38.142572 10 request.go:697] Waited for 1.035793361s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/karpenter.k8s.aws/v1
2024-12-18T14:53:38.607Z INFO controller/start.go:298 Starting manager
2024-12-18T14:53:38.608Z INFO controller-runtime.metrics server/server.go:205 Starting metrics server
2024-12-18T14:53:38.608Z INFO manager/server.go:50 starting server {"kind": "health probe", "addr": "0.0.0.0:8081"}
2024-12-18T14:53:38.608Z INFO controller-runtime.metrics server/server.go:244 Serving metrics server {"bindAddress": ":8080", "secure": false}
I1218 14:53:38.609333 10 leaderelection.go:250] attempting to acquire leader lease spark/spark-operator-controller-lock...
I1218 14:53:54.335544 10 leaderelection.go:260] successfully acquired lease spark/spark-operator-controller-lock
2024-12-18T14:53:54.335Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1beta2.SparkApplication"}
2024-12-18T14:53:54.336Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1.Pod"}
2024-12-18T14:53:54.336Z INFO controller/controller.go:186 Starting Controller {"controller": "spark-application-controller"}
2024-12-18T14:53:54.335Z INFO controller/controller.go:178 Starting EventSource {"controller": "scheduled-spark-application-controller", "source": "kind source: *v1beta2.ScheduledSparkApplication"}
2024-12-18T14:53:54.336Z INFO controller/controller.go:186 Starting Controller {"controller": "scheduled-spark-application-controller"}
2024-12-18T14:53:54.437Z INFO controller/controller.go:220 Starting workers {"controller": "spark-application-controller", "worker count": 10}
2024-12-18T14:53:54.437Z INFO controller/controller.go:220 Starting workers {"controller": "scheduled-spark-application-controller", "worker count": 10}
Steps to Reproduce:
Upgrade the Spark-Operator from v1beta2-1.4.6-3.5.0 to 2.0.2.
Submit a Spark job.
Observe the status.applicationState.state field.
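A minimal SparkApplication along these lines can be used for step 2 (the image, main class, and jar path are placeholders, not the manifest from our environment):

```yaml
# Hypothetical SparkApplication used to reproduce; adjust image/jar/serviceAccount to your cluster.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lab-create1-test
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 512m
```

The field can then be inspected with kubectl get sparkapplication lab-create1-test -n spark -o jsonpath='{.status.applicationState.state}'; after the upgrade it never transitions to COMPLETED, so the Argo success/failure conditions shown above never match.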
Expected Behavior:
The Spark job should be successfully triggered and executed as part of the Argo-Workflow.
Additional context
Argo-Workflow version: v3.2.7
Spark-Operator version: 2.0.2
EKS version: 1.25