Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(reaper): refactor to allow retries and fix races #2728

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stevenh
Copy link
Collaborator

@stevenh stevenh commented Aug 9, 2024

Refactor how the reaper is created to allow for proper retries of temporary errors such as container not found issues during startup or shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at hand and replaces it with a singleton spawner with locked access.

Wrap reaper errors so we can determine the cause of failures more easily.

Fix race condition in port wait when loading from container by always waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper connection handling which could never recover correctly from a partial read.

Move the reaper creation just before connections are established in compose to ensure its still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith which means it could have already shutdown before connections are made during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct so that resources aren't reaped when the reaper is disabled by excluding session id when reaper is disabled.

Error when creating a reaper when the config says it's disabled so that we avoid hard to debug issues because a reaper is running when it shouldn't be.

@stevenh stevenh requested a review from a team as a code owner August 9, 2024 15:48
Copy link

netlify bot commented Aug 9, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 4218dcc
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/66fd047a0951ef0008057e5e
😎 Deploy Preview https://deploy-preview-2728--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@stevenh stevenh marked this pull request as draft August 9, 2024 15:55
@stevenh
Copy link
Collaborator Author

stevenh commented Aug 9, 2024

Depends on #2738

Copy link
Member

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit on top to satisfy the usage of several missing functions (see comments).

Other than than, I merged the branch with current main and resolved conflicts.

docker.go Show resolved Hide resolved
docker.go Show resolved Hide resolved
reaper.go Show resolved Hide resolved
reaper.go Show resolved Hide resolved
@stevenh
Copy link
Collaborator Author

stevenh commented Sep 8, 2024

@mdelapenya reminder this is dependent on #2728 which is why it has missing components still.

I'll rebase once #2738 is merged.

@stevenh
Copy link
Collaborator Author

stevenh commented Sep 12, 2024

Rebased and picking up again, hopefully wont require many fixes. Will convert from draft when ready.

@stevenh stevenh force-pushed the fix/reaper-retries-race branch 3 times, most recently from bf961f0 to 7649e0c Compare September 13, 2024 08:34
@stevenh stevenh marked this pull request as ready for review September 13, 2024 13:12
Copy link
Member

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added a few super minor comments, other than that, my only concern is about the implications of these changes with the reuse mode we are proposing in #2768

I'm fine merging this, although I'd like to discuss with you about reuse more in depth, as it could apply to the reaper code as well.

reaper.go Outdated Show resolved Hide resolved
reaper.go Show resolved Hide resolved
reaper_test.go Show resolved Hide resolved
@stevenh stevenh marked this pull request as draft September 13, 2024 23:16
@mdelapenya
Copy link
Member

@stevenh is there anything I can do to move forward with this PR?

@stevenh stevenh marked this pull request as ready for review September 19, 2024 13:24
@stevenh stevenh marked this pull request as draft September 19, 2024 14:17
@stevenh
Copy link
Collaborator Author

stevenh commented Sep 19, 2024

Blocked on #2786

generic.go Show resolved Hide resolved
internal/core/labels.go Outdated Show resolved Hide resolved
reaper.go Outdated Show resolved Hide resolved
@stevenh stevenh force-pushed the fix/reaper-retries-race branch 2 times, most recently from a304b09 to 1f4141c Compare September 21, 2024 12:21
@stevenh stevenh marked this pull request as ready for review September 21, 2024 17:50
@mdelapenya
Copy link
Member

There are failing tests for the reaper. Could you take a look?

@stevenh
Copy link
Collaborator Author

stevenh commented Oct 2, 2024

There are failing tests for the reaper. Could you take a look?

Thanks for the poke I was expecting some issues here, but hadn't got round to checking yet.

Refactor how the reaper is created to allow for proper retries of
temporary errors such as container not found issues during startup or
shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at
hand and replaces it with a singleton spawner with locked access.

Wrap reaper errors so we can determine the cause of failures more
easily.

Fix race condition in port wait when loading from container by always
waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper
connection handling which could never recover correctly from a partial
read.

Move the reaper creation just before connections are established in
compose to ensure its still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith which
means it could have already shutdown before connections are made
during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct so that resources aren't reaped
when the reaper is disabled by excluding session id when reaper is
disabled.

Error when creating a reaper when the config says it's disabled so that
we avoid hard to debug issues because a reaper is running when it
shouldn't be.

Set org.testcontainers.reap label for containers which should be reaped
by the reaper, to prevent containers which disable the reaper from being
incorrectly reaped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants