v6.22.1 patch includes breaking(?) change #1341

Syriiin · 2024-07-10T09:06:15Z

Describe the bug
The v6.22.1 patch doesn't mention it in the release notes, but it includes a PR that fixes a bug where fix-buildkite-agent-builds-permissions is run always instead of only when docker user namespace remapping is disabled.

Even with docker user namespace remapping enabled, there are cases where files written from a container via a bind mount will be owned by a user that prevents buildkite-agent from cleaning them upon running a new step, hence this can be considered a breaking change.

I noticed this when our pipeline broke following the v6.22.1 upgrade.
The root cause for us was actually a typo, of chown 775 instead of chmod 775, but even still, this should probably not prevent further jobs from being run.

Steps To Reproduce
Steps to reproduce the behavior:

Run a step on an agent that creates a new directory and file within it, both owned by the container non-root user

mkdir testdir
chmod 777 testdir
docker run --rm -u 10001 -v $(pwd):/app -w /app alpine sh -c "mkdir testdir/containerdir && touch testdir/containerdir/test.txt"
# (`-u` is used here for simplicity, but an image that is configured to use a non-root user is the real use case)

Run a new step and observe the git clean -ffxdq fail

Expected behavior
A step should run, regardless of what any previous step has done to the file permissions within the cloned repo.

Actual behaviour
If file permissions are sufficiently broken from a previous step on the agent, further steps will fail to clean the working directory for checkout.

Stack parameters (please complete the following information):

AWS Region: ap-southeast-2
Version: v6.22.1

Related to #409

The text was updated successfully, but these errors were encountered:

DrJosh9000 · 2024-07-17T05:21:10Z

Hi @Syriiin, thanks for the report! #1334 was not intended as a breaking change, so it's always interesting to hear when a use case is broken.

That said, this seems as though the change that happened as a result of #1334 uncovered a bug in your pipeline more than it uncovered a bug in the stack. The intention was always to only run fix-perms when Docker user remapping is disabled, because checking and fixing the owning user/group on every file in a repo on every job can be expensive.

I agree that ideally the agent would be always be able to clean up after a previous job, but this is in tension with wanting to run jobs quickly since that relies on reusing previous state on the instance. There are all sorts of things a job could do to the instance state that would prevent future jobs running, even if fix-perms ran on every job.

A compromise might be to expose (yet another) stack parameter to force fix-perms regardless of the other settings - WDYT?

Syriiin · 2024-07-17T05:42:50Z

uncovered a bug in your pipeline more than it uncovered a bug in the stack

Yea definitely. I've fixed these issues with our pipelines now and have been able to upgrade without issue.
Mainly just wanted to call this out in case others run into it and aren't able to fix their pipelines as easily.

A compromise might be to expose (yet another) stack parameter to force fix-perms regardless of the other settings - WDYT?

Yea that was my thought exactly.
As you've said though, it's not an intended use case so it might not be worth the extra parameter.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v6.22.1 patch includes breaking(?) change #1341

v6.22.1 patch includes breaking(?) change #1341

Syriiin commented Jul 10, 2024 •

edited

Loading

DrJosh9000 commented Jul 17, 2024

Syriiin commented Jul 17, 2024 •

edited

Loading

v6.22.1 patch includes breaking(?) change #1341

v6.22.1 patch includes breaking(?) change #1341

Comments

Syriiin commented Jul 10, 2024 • edited Loading

DrJosh9000 commented Jul 17, 2024

Syriiin commented Jul 17, 2024 • edited Loading

Syriiin commented Jul 10, 2024 •

edited

Loading

Syriiin commented Jul 17, 2024 •

edited

Loading