Pin package versions? Use snapshot archives? #154

Open
ahachete opened this issue Aug 13, 2024 · 12 comments
@ahachete

I'm new to Chisel, so my understanding may be wrong. But it seems like packages are always picked from the "latest" available version. While this may be good for experimenting, it doesn't help with making stable (read: reproducible) builds.

Is it possible to pin package versions? Is it possible to use snapshot archives?

From a quick look at the source code, it seems like repositories are hardcoded. If so, are there any plans to offer alternatives in these areas?

@rebornplusplus
Member

rebornplusplus commented Aug 13, 2024

Hello @ahachete, you are correct in assuming that Chisel fetches the "latest" version. See below:

func (a *ubuntuArchive) selectPackage(pkg string) (control.Section, *ubuntuIndex, error) {
	var selectedVersion string
	var selectedSection control.Section
	var selectedIndex *ubuntuIndex
	for _, index := range a.indexes {
		section := index.packages.Section(pkg)
		if section != nil && section.Get("Filename") != "" {
			version := section.Get("Version")
			if selectedVersion == "" || deb.CompareVersions(selectedVersion, version) < 0 {
				selectedVersion = version
				selectedSection = section
				selectedIndex = index
			}
		}
	}
	if selectedVersion == "" {
		return nil, nil, fmt.Errorf("cannot find package %q in archive", pkg)
	}
	return selectedSection, selectedIndex, nil
}

You are also right on the reproducible builds part. If a new version is available with modified contents, a re-run of Chisel will generate different rootfs. It was ultimately a design choice to always get the latest packages, as they might have security updates. But your concern is also quite valid and we might need to think about that.
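To make the reproducibility concern concrete, here is a minimal, self-contained sketch of the same "highest version wins" loop. It is not Chisel code: `compareVersions` is a simplified stand-in for `deb.CompareVersions` that only understands dotted numeric versions, and the package name, versions, and the map-based indexes are invented for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions is a simplified stand-in for a real Debian version
// comparison: it only understands dotted numeric versions such as "1.2.10".
// It returns -1, 0 or 1 when a is lower than, equal to or higher than b.
func compareVersions(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// selectLatest returns the highest version of pkg found across all indexes.
// Each index is modelled as a map from package name to version.
func selectLatest(indexes []map[string]string, pkg string) (string, error) {
	selected := ""
	for _, index := range indexes {
		version, ok := index[pkg]
		if !ok {
			continue
		}
		if selected == "" || compareVersions(selected, version) < 0 {
			selected = version
		}
	}
	if selected == "" {
		return "", fmt.Errorf("cannot find package %q in archive", pkg)
	}
	return selected, nil
}

func main() {
	release := map[string]string{"libfoo": "1.2.3"}
	updates := map[string]string{"libfoo": "1.2.10"}

	// First run: only the original index carries the package.
	before, _ := selectLatest([]map[string]string{release}, "libfoo")
	// Later run: an update was published, so the same request picks it up.
	after, _ := selectLatest([]map[string]string{release, updates}, "libfoo")
	fmt.Println(before, after)
}
```

The two calls use the same package name but return different versions once the second index appears, which is why the resulting rootfs can change between runs.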

Unfortunately, pinning packages is not currently supported. You can pin archives at a package level via the archive field, but I reckon that doesn't help much with this particular issue.

About the hardcoded repository URLs, there aren't any immediate plans to make this configurable (via the chisel.yaml file). If it is very important to use a snapshot or a mirror, one workaround might be to somehow modify the network calls in the running host -- replacing archive.ubuntu.com with your snapshot endpoint. (Related: #135) I am not too sure if this is doable, to be honest with you.

Let me know if you have any more queries! Cheers.

@rebornplusplus rebornplusplus self-assigned this Aug 13, 2024
@ahachete
Author

Thank you very much for your reply @rebornplusplus . It's very insightful.

> About the hardcoded repository URLs, there aren't any immediate plans to make this configurable (via the chisel.yaml file). If it is very important to use a snapshot or a mirror, one workaround might be to somehow modify the network calls in the running host -- replacing archive.ubuntu.com with your snapshot endpoint. (Related: #135) I am not too sure if this is doable, to be honest with you.

I see. What would be the effect, though, of patching Chisel code and replacing that hardcoded string by a snapshot as in https://snapshot.ubuntu.com/ ? If a particular snapshot is selected, package selection should work the same way.

If so, it doesn't look to me like this would be a hard change. I'd love to contribute it, but I'm not a Go programmer. I can program, though, and I'm happy to give it a try if needed, but I'd like to make sure there is interest in adopting this approach.

In general, I'm strongly focused on reproducible container images, and would love to use Chisel for a new upcoming OSS project that wants to rely on Ubuntu as a base image, but needs to make reproducible container images. Chisel seems like a perfect fit save for this issue. As for now we've been building with Distroless and it works well, but we prefer Ubuntu for the longer LTS support window (project is WIP and not public yet).

@rebornplusplus
Member

Yes, locally, you could replace the hardcoded string with a repository snapshot URL. As long as the relative paths (dists/<suite>/InRelease, pool/.../*.deb, etc.) are the same, as they should be for a Debian repository, Chisel will fetch everything from the new link. So, you could totally maintain a fork with the link changed. But currently there are no plans to do that in the upstream repo (this one).
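As a rough illustration of why swapping the base URL is enough, the sketch below joins the same relative path onto two different bases. Everything in it is an assumption made for the example: the `archiveURL` helper is hypothetical, the `jammy` suite is arbitrary, and the snapshot URL layout (a timestamp directory under `/ubuntu/`) should be checked against the snapshot service before relying on it.

```go
package main

import (
	"fmt"
	"net/url"
)

// archiveURL joins an archive base URL with a path relative to the
// repository root, for example "dists/jammy/InRelease".
func archiveURL(base string, elem ...string) (string, error) {
	return url.JoinPath(base, elem...)
}

func main() {
	// Both bases are illustrative values, not Chisel configuration.
	bases := []string{
		"http://archive.ubuntu.com/ubuntu/",
		"https://snapshot.ubuntu.com/ubuntu/20240813T000000Z/",
	}
	for _, base := range bases {
		u, err := archiveURL(base, "dists", "jammy", "InRelease")
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```

Only the base changes; the `dists/...` suffix is identical in both results, which is the property a fork would rely on.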

> In general, I'm strongly focused on reproducible container images, and would love to use Chisel for a new upcoming OSS project that wants to rely on Ubuntu as a base image, but needs to make reproducible container images. Chisel seems like a perfect fit save for this issue. As for now we've been building with Distroless and it works well, but we prefer Ubuntu for the longer LTS support window (project is WIP and not public yet).

This is nice to hear! All the best and please feel free to let us know if you have any more concerns!

@ahachete
Author

Thank you for the information, it's very valuable and I'll try to fork it to support this functionality.

> But currently there are no plans to do that in the upstream repo (this one).

Just for the record, may I ask why not? It seems to me like it would be:

  • a valuable change that others may also benefit from
  • a (possibly) relatively simple change to create and adopt
  • something that won't change the default behavior of pulling the latest versions, but is rather an additional option.

I'd be more than happy to contribute such an improvement.

@rebornplusplus
Member

Ah, it's mainly because we are focusing on other priority items currently. But I agree that this is an interesting and quite useful feature request. I will make sure to raise it to the team and track it. I will let you know of any updates! I will keep this issue open until then. Meanwhile, if you would like to do a proof-of-concept, I would be more than happy to take a look at it!

@letFunny
Collaborator

@ahachete Even though this is a useful feature (like @rebornplusplus said) it is not that we are not prioritizing it, it is more that it goes against the fundamental design of Chisel, at least at the moment.

There are several problems with pinning software versions, among the most important ones:

  • We would have to create and maintain different slice definitions based on the specific versions of the package. The problem is much bigger than that though, because we guarantee that there are no conflicts no matter what slices you install. If we pinned versions, that would be incredibly hard, or impossible, to do because we could not guarantee that versions 1.0 and 2.0 both work with all other slices. If you think about it, that quickly becomes incredibly restrictive.
  • There is also the fact that we want to encourage users to use the latest version, which is patched and updated by Ubuntu with the latest security fixes. Ideally our UX would not encourage users to pin versions and create outdated images by default, only to go through the pain of updating the packages and finding that the slice definitions broke, which perpetuates the cycle of not updating in a timely manner.

Lastly, you could pin your package versions by having a local archive and/or caching the packages. That is a feature that will come eventually but we do have other priorities at the moment. A workaround might be to build Chisel and point it to a local registry for example.

@ahachete
Author

ahachete commented Aug 22, 2024

Thank you for your comments @rebornplusplus and @letFunny.

From what I understand it looks like Chisel chose security (from the perspective of always having latest versions of packages to minimize CVEs) over reproducibility. That may be a reasonable compromise --except I believe they are not at odds and you don't need to compromise. Both can be achieved simultaneously.

Why I think reproducibility is as important as security: without reproducibility, once an image is built it becomes a "golden image". It's now the source of truth of deployments, and needs to be properly backed up and copied/distributed everywhere. It cannot be lost. Probably it has to be built on some special "golden servers" with restricted access to make sure such a golden image is not tainted. If these provisions are not taken, you run many risks, such as inconsistent deployments (where different versions of the images are deployed to parts of the same fleet); development environments working on different versions than the production images (which in turn breaks one of the main advantages of using containers); hard-to-troubleshoot problems (if you cannot reproduce the very exact environment elsewhere); etc.

It's worth noting that reproducibility doesn't mean that you run old versions of packages all the time. It just means (in this respect; it also requires other additional work, of course) that you source packages from a given snapshot or a set of pinned packages, and that you can build your image anywhere and at any time and get the same byte-for-byte output. No golden images, no special servers, no need to back up images or redistribute them. Just rebuild them whenever needed. To avoid using old software with CVEs, build and re-deploy images at a relatively high frequency, which anyone can choose depending on their operational needs.

Therefore reproducibility and security (via using latest versions) can and, in my opinion should, co-exist.

If Chisel will ignore reproducibility and just focus on security, that's a choice I respect but one I'd respectfully consider not the best one and therefore a project I doubt I will put to use. But that's fine, it's just me (well, maybe others too). But if this feedback serves for any purpose to re-thinking this strategy, I'd be happy to have contributed that.

On more practical terms:

> We would have to create and maintain different slice definitions based on the specific versions of the package.

I believe you have to do this anyway. Whenever a new version of an existing package introduces a change that requires updating the slice definitions, you will be forced to update them. Actually, I'd say your burden is higher, since you need to update the slice definition within a very short timeframe, or else that package is temporarily broken (in contrast, if you pin and cherry-pick versions, you can choose when to update them and avoid having temporarily broken packages).

Therefore, package slice definitions need to be updated regardless, and the only difference when allowing pinning versions would be that you will need to keep that history of slice definitions explicitly (and not only on git history). That could be done by maintaining a list of slice definitions per package and add version boundaries (lower and possibly upper) for each element of that list. This is a bit of work in designing this now, but IMHO nothing really big and should not require additional maintenance work in the future (since, again, updating slice definitions will need to be done anyway).
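A minimal sketch of what such a list with version boundaries could look like is below. It is purely hypothetical: the `versionedSlices` type, its fields, and the paths are invented, and versions are plain integers only to keep the example short (real Debian versions would need a proper comparison function).

```go
package main

import "fmt"

// versionedSlices is a hypothetical record: one set of slice definitions
// together with the package versions it is valid for.
type versionedSlices struct {
	min   int      // inclusive lower bound
	max   int      // inclusive upper bound; 0 means "no upper bound"
	paths []string // paths declared by this set of definitions
}

// slicesFor returns the first definition whose bounds contain version.
func slicesFor(defs []versionedSlices, version int) (versionedSlices, bool) {
	for _, d := range defs {
		if version >= d.min && (d.max == 0 || version <= d.max) {
			return d, true
		}
	}
	return versionedSlices{}, false
}

func main() {
	defs := []versionedSlices{
		{min: 1, max: 3, paths: []string{"/usr/lib/libfoo.so.1"}},
		{min: 4, paths: []string{"/usr/lib/libfoo.so.2"}},
	}
	for _, v := range []int{2, 5} {
		d, ok := slicesFor(defs, v)
		fmt.Println(v, ok, d.paths)
	}
}
```

The lookup itself is simple; the sketch says nothing about whether every entry in the list stays compatible with all other packages.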

> Ideally our UX would not encourage users to pin versions

At no point would I have expected pinning to be the default behavior. It could just be an option that you need to explicitly opt in to, for those who are conscious of the advantages of combining latest or recent package versions with reproducibility.

@letFunny
Collaborator

letFunny commented Aug 23, 2024

Hey @ahachete, thanks for your comment, you have a lot of valid points. I feel I didn't do a good job of explaining the reasoning in my previous comment so I will try to elaborate a bit more here.

> And that you can build your image anywhere and at any time

I think this is the key point because this use case is very different from "general" package version pinning. If I understand correctly, what we want is to be able to rebuild an image given an inventory of what went inside it. That is indeed a good idea and something that we could support, because we are already making the scripts deterministic (given the same package versions) and we are working on producing a manifest that could serve as an inventory. This does not contradict the design at all and, if you have a snapshot of the slice definitions used and the manifest, we could in fact add a feature to rebuild the same image.

What goes against the current design is version pinning or saying I want x version of package A and y version of package B. That is indeed a much harder problem if we consider the guarantees that we make about compatibility and updates. I will elaborate why by responding to your comments:

> Whenever a new version of an existing package introduces a change that requires updating the slice definitions, you will be forced to do this anyway. Actually, I'd say your burden is higher, since you need to update the slice definition on a very short timeframe, or else that package is temporarily broken

We see that as a feature and not a bug. In the same way that Ubuntu updates the versions of packages and does not break the system we attempt to do the same thing in a timely manner. Even if we pinned versions we would have to update as fast as possible to get the latest security fixes. Lastly, we can rely on the fact that Ubuntu packagers are not going to change packages unexpectedly for an arbitrary reason, substantial changes only happen for new releases.

> That could be done by maintaining a list of slice definitions per package and add version boundaries

I think this is where we have the biggest disconnect. I think this is a valid point for traditional package managers which support different compatibility matrices for packages. Marking one version as compatible with another is more of a manual process where it is the case by default if the packages do not interact. Chisel however has much stronger guarantees. Because we specify the contents of the packages upfront and we verify that there are no conflicts without downloading them, we have to be more restrictive. As a result, we can be sure that if there is no conflict in the slice definitions then there is never going to be a conflict in the final image.

The problem with maintaining different versions of packages is that none of the versions can have any conflict with any other package or its versions. Coupled with the restrictiveness of the slice definitions, that means that it will be really hard to guarantee that there are no conflicts. Just to give you an example, we do not allow different slices to declare the same path (with some exceptions) because we cannot guarantee that the extracted content will be the same. That translates roughly to a one-slice-per-path rule. We would have a combinatorial explosion if we allowed all versions to co-exist, where some of them declare a path and are incompatible with some packages and some of them don't. But all of them have to be compatible, so we would need to change them in a non-obvious way. The alternative is to end up with groups of packages that are compatible with each other and taint the rest. The moment one group has one package that is incompatible with a package in another group, both will be incompatible as a whole, which is why we do not want to segment it.
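To show the rough shape of the one-slice-per-path rule, here is a small sketch that only inspects declared paths and never downloads a package. It is not how Chisel implements the check: the `findConflicts` helper, the slice names, and the paths are made up, and the exceptions mentioned above are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// findConflicts returns every path that is declared by more than one slice.
// slices maps a slice name to the paths it declares.
func findConflicts(slices map[string][]string) map[string][]string {
	owners := make(map[string][]string)
	for name, paths := range slices {
		for _, p := range paths {
			owners[p] = append(owners[p], name)
		}
	}
	conflicts := make(map[string][]string)
	for p, names := range owners {
		if len(names) > 1 {
			// Map iteration order is random, so sort for stable results.
			sort.Strings(names)
			conflicts[p] = names
		}
	}
	return conflicts
}

func main() {
	slices := map[string][]string{
		"libfoo_libs":   {"/usr/lib/libfoo.so.1"},
		"libfoo_config": {"/etc/foo.conf"},
		"bar_config":    {"/etc/foo.conf"},
	}
	for path, names := range findConflicts(slices) {
		fmt.Printf("%s is declared by %v\n", path, names)
	}
}
```

With several versions per package, every versioned set of paths would have to pass this kind of check against every other one, which is where the combinatorial growth comes from.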

Then there is the question of backporting features and Chisel bug/security fixes. If we maintained slices for each version of a package, we would effectively have to do that for each version, for no real benefit, as the outdated version will not contain the latest security fixes for the package.

EDIT: Please tell me if the conflict resolution bit is clear or not because maybe we need to write something more detailed in the general documentation than what we currently have.

@jjmaestro

> What goes against the current design is version pinning or saying I want x version of package A and y version of package B.

@letFunny agreed, but maybe there's an easier / simpler way forward. When you use an Ubuntu snapshot, the "pinning" would be "whatever was in that Ubuntu version at that point in time". Unless I'm mistaken, that's exactly what you have today, except you only maintain the "latest snapshot" because chisel-releases are only "versioned" by the Ubuntu release and thus don't have any guarantees with respect to that Ubuntu version at previous points in time.

So, what if on top of the --release argument chisel accepts a --snapshot argument? It could use both to (1) resolve which chisel-releases/ubuntu-##.##/slices to use and (2) when pulling the packages.

I think (2) is easy because it's just building the snapshot URL and then pulling whatever version of the package is in the snapshot. And for (1), since snapshots are time-based, it would be a matter of knowing what snapshots are compatible with what slices. Or rather, at which point in time, if any, a snapshot would break for a given set of slices. Then, it would be a matter of keeping different slices-<SNAPSHOT> folders that would be compatible up to that SNAPSHOT.
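A sketch of step (1) might look like the following. All names here are hypothetical (neither `--snapshot` nor `slices-<SNAPSHOT>` folders exist in Chisel today), and it assumes snapshot IDs in a form like `20240813T000000Z`, which sorts chronologically as plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// slicesDirFor picks the slices folder for a requested snapshot. cutoffs
// holds the snapshot IDs that name the hypothetical "slices-<SNAPSHOT>"
// folders, each compatible up to and including its own snapshot. Snapshots
// newer than every cutoff use the current "slices" folder.
func slicesDirFor(cutoffs []string, snapshot string) string {
	sorted := append([]string(nil), cutoffs...)
	sort.Strings(sorted)
	// Index of the earliest cutoff that is not older than the snapshot.
	i := sort.SearchStrings(sorted, snapshot)
	if i == len(sorted) {
		return "slices"
	}
	return "slices-" + sorted[i]
}

func main() {
	cutoffs := []string{"20240301T000000Z", "20240601T000000Z"}
	requests := []string{"20240115T120000Z", "20240601T000000Z", "20240813T000000Z"}
	for _, s := range requests {
		fmt.Println(s, "->", slicesDirFor(cutoffs, s))
	}
}
```

The lookup only needs the list of cutoff snapshots, so the real work would be in discovering those cutoffs.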

This is something that, by definition, you should have known anyway. Otherwise, chisel would have been broken if used at the time the snapshot was created. And, as you said:

> (...) we specify the contents of the packages upfront and we verify that there are no conflicts without downloading them, we have to be more restrictive. As a result, we can be sure that if there is no conflict in the slice definitions then there is never going to be a conflict in the final image.

Thus I think the problem could be reduced to knowing what snapshots are allowed to work for the current slices. And I think this could be done by using the current Spread tests to run "regression tests" against the Ubuntu snapshots. If they break, you would mark that snapshot as the minimum supported one and then, keep supporting future ones going forward (which is something that's implicitly done today when someone changes a slice because it breaks with a future version of a package).

Would something like this work?

@ahachete
Author

I think there are some common concepts shared between your comments, @letFunny and @jjmaestro, and my initial ideas. When I mentioned the option to "pin packages", I did not clearly state the limits within which that may happen. The general case, where a user may pin some packages to some versions, others to other versions, and leave some unpinned, would be quite challenging to solve (plus I don't think it brings any significant value).

My line of thought was closer to what @jjmaestro says, something along the lines of having some --snapshot or equivalent flag, where you may specify a valid snapshot repository (e.g. https://snapshot.ubuntu.com/) and use packages, all, as they were at that particular instant of time.

Actually, from this perspective, Chisel could do this even when the flag is not requested: if a user asks for "latest-latest", check what the very latest available snapshot is (if I'm not mistaken they happen every 4 hours or so, which means that they would be very recent) and use that snapshot (internally; it could be exposed to the user or not) to build the chiselled container image.

I understand Chisel requirements are more strict with slices to prevent them overlapping (that makes sense). But if the need for a snapshot functionality could be accommodated within these bounds, it would be great --and I'd be happy to try to contribute (not a Go programmer, but I'd do my best) and definitely test and evaluate it.

@letFunny
Collaborator

@ahachete @jjmaestro

> Thus I think the problem could be reduced to knowing what snapshots are allowed to work for the current slices. And I think this could be done by using the current Spread tests to run "regression tests" against the Ubuntu snapshots.

> My line of thought was closer to what @jjmaestro says, something along the lines of having some --snapshot or equivalent flag, where you may specify a valid snapshot repository (e.g. https://snapshot.ubuntu.com/) and use packages, all, as they were at that particular instant of time.

Thank you for these because they are indeed very good ideas and something that could, in theory, support reproducibility without any of the downsides I mentioned in the comments above. Tagging releases with the supported snapshot(s) without having to make any of the extra guarantees or sacrifice security looks like something that could also work well with the rest of the design. That being said, I think this feature would require a lot more discussion and a more detailed design (apart from the implementation), and right now we want to focus on the foundations for Chisel, at least for the next few months.

So I will make a note and when we finish the priority features we will come back to study this use-case, thanks again!

@ahachete
Author

> So I will make a note and when we finish the priority features we will come back to study this use-case, thanks again!

Awesome! Feel free to ping here when the time comes, happy to provide feedback. Thank you @letFunny
