Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable retry and recover for tarfs #538

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

jiangliu
Copy link
Contributor

Enhance tarfs implementation by:

  • retry downloading from registry if the background downloading task fails
  • recover snapshot information related to tarfs on startup
  • remount EROFS associated with tarfs when restart
  • avoid redundant merge operation if the image has only one layer
  • auto detach loop devices when they are not used anymore

@jiangliu jiangliu force-pushed the tarfs-recover branch 4 times, most recently from c001e9e to d276eea Compare September 19, 2023 03:18
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Attention: Patch coverage is 14.16838% with 418 lines in your changes missing coverage. Please review.

Project coverage is 33.97%. Comparing base (5009c52) to head (e0124e7).
Report is 138 commits behind head on main.

Files with missing lines Patch % Lines
pkg/tarfs/tarfs.go 20.97% 255 Missing and 5 partials ⚠️
pkg/tarfs/losetup_linux.go 0.00% 105 Missing ⚠️
snapshot/snapshot.go 0.00% 45 Missing ⚠️
pkg/manager/manager.go 0.00% 7 Missing ⚠️
snapshot/process.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   33.63%   33.97%   +0.33%     
==========================================
  Files          65       68       +3     
  Lines        8259     9352    +1093     
==========================================
+ Hits         2778     3177     +399     
- Misses       5166     5817     +651     
- Partials      315      358      +43     
Files with missing lines Coverage Δ
snapshot/process.go 0.00% <0.00%> (ø)
pkg/manager/manager.go 0.00% <0.00%> (ø)
snapshot/snapshot.go 0.00% <0.00%> (ø)
pkg/tarfs/losetup_linux.go 0.00% <0.00%> (ø)
pkg/tarfs/tarfs.go 28.23% <20.97%> (ø)

... and 5 files with indirect coverage changes

if lock {
st.mutex.Lock()
}
st.mutex.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lock this by default, Can we add lock and release it after calling getSnapshotStatus?

Otherwise, change getSnapshotStatus to getSnapshotStatusWithLock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ensure atomicity by acquiring the lock on snapshotStatus with the Manager.mutex" locked. So renamed getSnapshotStatustogetSnapshotStatusWithLock`.

pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
}

// AttachLoopDevice attaches a specified backing file to a loop device
func AttachLoopDevice(backingFile string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough if we use it by importing "github.com/containerd/containerd/mount"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some modification is needed, so coopied it here.

pkg/tarfs/tarfs.go Outdated Show resolved Hide resolved
The field `blobTarFilePath` is only used once, and it's easy to build
it on demand. So remove it from struct snapshotStatus.

Signed-off-by: Jiang Liu <[email protected]>
`waitLayerReady()` alreayds ensures the layer is in ready state,
so avoid redundant status check.

Signed-off-by: Jiang Liu <[email protected]>
Refactor tarfs::blobProcess() for coming changes.

Signed-off-by: Jiang Liu <[email protected]>
Tarfs downloads and converts layer content on background, there's
no flexible way to report failure from background tasks to containerd.
So retry downloading and converting tasks when preparing rw layer
for container. If it still fails, failure will be reported to the
rw layer preparation request.

Signed-off-by: Jiang Liu <[email protected]>
Avoid redundant merge operation for images with only one layer.

Signed-off-by: Jiang Liu <[email protected]>
When nydus snapshotter restarts, we need to recover information
related to tarfs. Otherwise all tarfs instance will become unusable.

Signed-off-by: Jiang Liu <[email protected]>
Import losetup.go from containerd, to replace go-losetup.

Signed-off-by: Jiang Liu <[email protected]>
Replace go-losetup with the version from containerd, to support
auto-clean unused loop devices.

Signed-off-by: Jiang Liu <[email protected]>
On startup, we need to recover information for all tarfs related
snapshots, and remount EROFS filesystems.

Signed-off-by: Jiang Liu <[email protected]>
@jiangliu jiangliu force-pushed the tarfs-recover branch 10 times, most recently from 8835297 to bfc11e7 Compare November 20, 2023 01:50
@jiangliu jiangliu force-pushed the tarfs-recover branch 4 times, most recently from ea0b54c to 8f31e8e Compare November 20, 2023 02:28
@jiangliu jiangliu force-pushed the tarfs-recover branch 24 times, most recently from 62c85f1 to cc2f9f6 Compare November 22, 2023 11:20
.github/workflows/ci.yml Outdated Show resolved Hide resolved
Add unit test cases for tarfs.

Signed-off-by: Jiang Liu <[email protected]>
Fix a data race condition
WARNING: DATA RACE
Write at 0x00c000178428 by goroutine 27:
  github.com/containerd/nydus-snapshotter/pkg/remote/remotes/docker.(*httpReadSeeker).Close()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/remote/remotes/docker/httpreadseeker.go:87 +0x57
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).blobProcess.func2.1()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:339 +0x48
  runtime.deferreturn()
      /opt/hostedtoolcache/go/1.20.1/x64/src/runtime/panic.go:476 +0x32
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).blobProcess.func3()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:394 +0x71

Previous read at 0x00c000178428 by goroutine 40:
  github.com/containerd/nydus-snapshotter/pkg/remote/remotes/docker.(*httpReadSeeker).Read()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/remote/remotes/docker/httpreadseeker.go:48 +0x68
  bufio.(*Reader).Read()
      /opt/hostedtoolcache/go/1.20.1/x64/src/bufio/bufio.go:223 +0x2c3
  github.com/containerd/containerd/archive/compression.(*bufferedReader).Read()
      /home/runner/go/pkg/mod/github.com/containerd/[email protected]/archive/compression/compression.go:113 +0xa4
  io.copyBuffer()
      /opt/hostedtoolcache/go/1.20.1/x64/src/io/io.go:427 +0x28d
  io.Copy()
      /opt/hostedtoolcache/go/1.20.1/x64/src/io/io.go:386 +0x88
  os.genericReadFrom()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/file.go:161 +0x34
  os.(*File).ReadFrom()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/file.go:155 +0x324
  io.copyBuffer()
      /opt/hostedtoolcache/go/1.20.1/x64/src/io/io.go:413 +0x1c5
  io.Copy()
      /opt/hostedtoolcache/go/1.20.1/x64/src/io/io.go:386 +0x84
  os/exec.(*Cmd).childStdin.func1()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/exec/exec.go:511 +0x45
  os/exec.(*Cmd).Start.func2()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/exec/exec.go:717 +0x42
  os/exec.(*Cmd).Start.func3()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/exec/exec.go:729 +0x47

Goroutine 27 (running) created at:
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).blobProcess()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:393 +0x9dd
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).PrepareLayer()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:465 +0x444
  github.com/containerd/nydus-snapshotter/pkg/tarfs.TestPrepareLayer()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs_test.go:33 +0x188
  testing.tRunner()
      /opt/hostedtoolcache/go/1.20.1/x64/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /opt/hostedtoolcache/go/1.20.1/x64/src/testing/testing.go:1629 +0x47

Goroutine 40 (finished) created at:
  os/exec.(*Cmd).Start()
      /opt/hostedtoolcache/go/1.20.1/x64/src/os/exec/exec.go:716 +0xf8e
  github.com/containerd/containerd/archive/compression.cmdStream()
      /home/runner/go/pkg/mod/github.com/containerd/[email protected]/archive/compression/compression.go:284 +0x36f
  github.com/containerd/containerd/archive/compression.gzipDecompress()
      /home/runner/go/pkg/mod/github.com/containerd/[email protected]/archive/compression/compression.go:272 +0x152
  github.com/containerd/containerd/archive/compression.DecompressStream()
      /home/runner/go/pkg/mod/github.com/containerd/[email protected]/archive/compression/compression.go:203 +0x3e4
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).blobProcess.func2()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:341 +0x1b1
  github.com/containerd/nydus-snapshotter/pkg/tarfs.(*Manager).blobProcess.func3()
      /home/runner/work/nydus-snapshotter/nydus-snapshotter/pkg/tarfs/tarfs.go:394 +0x71
==================
    testing.go:1446: race detected during execution of test

Signed-off-by: Jiang Liu <[email protected]>
- name: Setup Nydus
run: |
# Download nydus components
NYDUS_VER=v$(curl --header 'authorization: Bearer ${{ secrets.GITHUB_TOKEN }}' -s "https://api.github.com/repos/dragonflyoss/nydus/releases/latest" | jq -r .tag_name | sed 's/^v//')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No auth is required when requesting the latest version.

// Download and convert layer content in background.
// Will retry when the content is actually needed if the background process failed.
go func() {
_ = process(rc, remote)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also give a warning log if an error happens.

@imeoer
Copy link
Collaborator

imeoer commented Dec 11, 2023

Can we add some e2e test cases for tarfs in https://github.com/containerd/nydus-snapshotter/blob/main/integration/entrypoint.sh?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants