
torchserve bloom7b1 demo Load model failed #3202

Open · zqc2011hy opened this issue Jun 22, 2024 · 2 comments

Labels: kfserving, triaged (Issue has been reviewed and triaged)

Comments

@zqc2011hy

🐛 Describe the bug

2024-06-22T03:41:52,860 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
2024-06-22T03:41:52,861 [ERROR] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:242) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: bloom7b1, error: Worker died.
2024-06-22T03:41:52,863 [DEBUG] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-bloom7b1_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-06-22T03:41:52,863 [WARN ] W-9000-bloom7b1_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-06-22T03:41:52,864 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STOPPED


Installation instructions

https://kserve.github.io/website/latest/modelserving/v1beta1/llm/torchserve/accelerate/

Model Packaging

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

config.properties

No response

Versions

torchserve --start --model-store=/mnt/models/model-store --ts-config=/mnt/models/config/config.properties

Repro instructions

gs://kfserving-examples/models/torchserve/llm/Huggingface_accelerate/bloom

Possible Solution

No response

@agunapal
Collaborator

Hi @zqc2011hy, looking at this log line: "Backend worker did not respond in given time", it seems you need to increase the default_response_timeout value in config.properties.
This value may need to change depending on the hardware you are using.
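As a rough sketch, a config.properties with an increased worker timeout could look like the following; the 600-second value is an illustrative assumption, and should be tuned to however long bloom7b1 actually takes to load and respond on your hardware:

```properties
# /mnt/models/config/config.properties
# Seconds TorchServe waits for a backend worker before failing with
# "Backend worker did not respond in given time" (default is 120).
default_response_timeout=600
```

After editing the file, restart TorchServe (or redeploy the InferenceService) so the new timeout takes effect.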

@agunapal added the triaged and kfserving labels on Jun 22, 2024
@zqc2011hy
Author

zqc2011hy commented Jun 23, 2024

SERVICE_HOSTNAME=$(kubectl get inferenceservice bloom7b1 -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./text.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bloom7b1:predict

{"predictions":["My dog is cute.\nNice.\n- Hey, Mom.\n- Yeah?\nWhat color's your dog?\n- It's gray.\n- Gray?\nYeah.\nIt looks gray to me.\n- Where'd you get it?\n- Well, Dad says it's kind of...\n- Gray?\n- Gray.\nYou got a gray dog?\n- It's gray.\n- Gray.\nIs your dog gray?\nAre you sure?\nNo.\nYou sure"]}

Could you share the exact contents expected in text.json?
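For reference, the KServe v1 predict protocol wraps inputs in an "instances" array, so a text.json for this endpoint might look like the sketch below; the prompt string is an assumption inferred from the prediction shown above, not a confirmed input:

```json
{
  "instances": [
    {
      "data": "My dog is cute"
    }
  ]
}
```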
