What happened in your environment?
Sometimes we see long image pull latency, even with lazy pull enabled. Looking into the containerd/overlaybd logs, we found that overlaybd hits the Get meta timedout error on an image blob, which blocks later image pull requests for the duration of the timeout.
Below is a rough timeline from the containerd/overlaybd logs. Sorry, I can't share all of the logs, but let me know if you need any more details.
It seems the snapshotter (or overlaybd) cannot process requests in parallel, even for requests for different images.
# (containerd) - notice the long latency for img2, both are overlaybd images
"2024-12-03T01:24:32.973066470Z" PullImage img1
"2024-12-03T01:24:33.300435436Z" PullImage img1 returns image reference
"2024-12-03T01:24:36.923843733Z" PullImage img2
"2024-12-03T01:25:06.203165092Z" PullImage img2 returns image reference
# (overlaybd-snapshotter)
# snapshotter starts to have img2 related logs only AFTER the previous img1 blob timed out.
"2024-12-03T01:24:33Z" "{repoBlobUrl: img1/blobs/... }"
"2024-12-03T01:25:06Z" "failed to enable target for ... failed:failed to open remote file ... Get meta timedout"
...
"2024-12-03T01:25:06Z" "{repoBlobUrl: img2/blobs/...}"
# (overlaybd)
"2024/12/03 01:24:36" /src/src/switch_file.cpp:50|try_open_zfile:open file as zfile format, path: img1/blobs/...
"2024/12/03 01:25:06" /src/src/image_file.cpp:165|__open_ro_remote:failed to open remote file img1/blobs/... Get meta timedout errno=110(Connection timed out)
"2024/12/03 01:25:08" /src/src/image_file.cpp:145|__open_ro_remote:open file from remotefs img2/blobs/...
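For reference, the pull latencies implied by the containerd timestamps above can be computed directly. This is only a small Python sketch for the arithmetic. The helper name `parse_ts` and the `events` table are mine and not part of containerd or overlaybd.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a containerd-style RFC 3339 timestamp with nanosecond precision."""
    # datetime only keeps microseconds, so truncate the fraction to 6 digits.
    base, frac = ts.rstrip("Z").split(".")
    parsed = datetime.strptime(f"{base}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# (PullImage start, PullImage returns image reference), copied from the log above.
events = {
    "img1": ("2024-12-03T01:24:32.973066470Z", "2024-12-03T01:24:33.300435436Z"),
    "img2": ("2024-12-03T01:24:36.923843733Z", "2024-12-03T01:25:06.203165092Z"),
}

latencies = {
    image: (parse_ts(end) - parse_ts(start)).total_seconds()
    for image, (start, end) in events.items()
}

for image, seconds in latencies.items():
    print(f"{image}: {seconds:.3f}s")
```

img1 returns in about 0.33s, while img2 takes about 29.3s, which lines up with the roughly 30s gap between the img1 open at 01:24:36 and its Get meta timedout at 01:25:06 in the overlaybd log.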
What did you expect to happen?
A Get Meta timeout in one image shouldn't block requests for other images.
Make the Get Meta timeout configurable, so we can reduce the wait time. (I believe the payload for Get Meta should be small, so 30s seems too long.)
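To illustrate the two expectations above, here is a minimal Python sketch of the behavior I have in mind. It is not overlaybd or snapshotter code: `get_meta` is a hypothetical stand-in that just sleeps to simulate a remote metadata fetch, and `open_images` and the `timeout` parameter are names I made up for the example. Each image is handled on its own worker, and the timeout is passed in rather than hard-coded.

```python
import concurrent.futures as cf
import time


def get_meta(image: str, simulated_delay: float, timeout: float) -> str:
    """Hypothetical stand-in for a remote metadata fetch (not overlaybd code)."""
    # Simulate the fetch: wait for the response or the timeout, whichever is first.
    time.sleep(min(simulated_delay, timeout))
    if simulated_delay > timeout:
        raise TimeoutError(f"{image}: Get meta timed out after {timeout}s")
    return f"{image}: meta ok"


def open_images(requests: dict[str, float], timeout: float) -> dict[str, tuple[str, float]]:
    """Open all images concurrently; return (status, seconds until done) per image."""
    start = time.monotonic()
    results = {}
    with cf.ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        futures = {
            pool.submit(get_meta, image, delay, timeout): image
            for image, delay in requests.items()
        }
        for future in cf.as_completed(futures):
            image = futures[future]
            try:
                future.result()
                status = "ok"
            except TimeoutError:
                status = "timeout"
            results[image] = (status, time.monotonic() - start)
    return results


# img1 simulates an unresponsive blob; img2 responds quickly.
results = open_images({"img1": 5.0, "img2": 0.05}, timeout=0.3)
for image, (status, seconds) in sorted(results.items()):
    print(f"{image}: {status} after {seconds:.2f}s")
```

In this sketch img2 completes as soon as its own fetch returns, regardless of img1, and img1 fails after the configured timeout instead of a fixed 30s.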
How can we reproduce it?
Not sure. This only happens in one region. We have captured some live nodes showing the issue, but haven't been able to reproduce it yet.
What is the version of your Overlaybd?
overlaybd: 0.6.17
overlaybd-snapshotter: 0.6.7
What is your OS environment?
Ubuntu 20.04
Are you willing to submit PRs to fix it?
Yes, I am willing to fix it.