formula: add make_deduplication_links_in #18478

Draft
wants to merge 1 commit into
base: master

Member

@cho-m cho-m commented Oct 1, 2024

Particularly useful for Java dependents that commonly duplicate JARs, e.g.

  • prestodb can be reduced from 2GB to 600MB
  • joern can be reduced from 1.3GB to 500MB

It can also be used for PostgreSQL dependents that install the same SQL files to support multiple postgresql@X formulae.
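
To illustrate the idea described above, here is a minimal sketch of content-based deduplication. It is not the code from this PR: the helper name `dedupe_links_in`, its return value, and the choice of SHA-256 and relative symlinks are all assumptions made for illustration. Only the `hardlink:` option name is taken from the review thread.

```ruby
require "digest"
require "fileutils"
require "pathname"
require "tmpdir"

# Hypothetical helper, NOT the method added in this PR.
# Replaces files with identical content under `dir` by links to the first
# copy found. Returns the list of paths that were replaced.
def dedupe_links_in(dir, hardlink: false)
  root = Pathname(dir)
  originals = {} # SHA-256 hex digest => first Pathname seen with that content
  replaced = []

  # Materialize and sort the listing first so that deleting files below
  # does not disturb the traversal and the result is deterministic.
  root.find.sort.each do |path|
    # Check symlink? first because file? follows symlinks.
    next if path.symlink? || !path.file?

    digest = Digest::SHA256.file(path.to_s).hexdigest
    original = originals[digest]
    if original.nil?
      originals[digest] = path
      next
    end
    # Already the same inode (e.g. hardlinked by an earlier run).
    next if File.identical?(original.to_s, path.to_s)

    path.unlink
    if hardlink
      FileUtils.ln(original.to_s, path.to_s)
    else
      # A relative target keeps the tree relocatable.
      path.make_symlink(original.relative_path_from(path.dirname))
    end
    replaced << path
  end

  replaced
end
```

A formula with many duplicated JARs could, under these assumptions, call something like `dedupe_links_in(libexec)` at the end of `install`. Hashing every file is the simplest approach; comparing sizes first would avoid hashing files that cannot be duplicates.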

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

Still need to write some tests and experiment with this locally.

Member

@MikeMcQuaid MikeMcQuaid left a comment

Nice idea, looking good so far! Any thoughts on what would be required to do this automatically?

Comment on lines +1944 to +1945
# FIXME: Hardlinks are not fully supported so using `hardlink: true` will only
# reduce the bottle size or source build but will be duplicated on bottle pour.
Member

What makes them not fully supported, out of interest?

Member Author

@cho-m cho-m Oct 2, 2024

My old, incomplete PR #13154 is related: BSD cp duplicates hardlinks rather than preserving them. Trying to switch away from it brings its own set of problems due to bugs and limitations in other commands.

An example of the bottle pour behavior is trino, which has a bottle/estimated-unpack size of <1GB but pours to >2GB.

There also needs to be special handling when crossing filesystem boundaries (the most extreme case being someone running on a USB drive formatted as (ex)FAT, which doesn't support hardlinks). Off the top of my head, some tools that can handle this are rsync -H and GNU cp --preserve=links.
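
The special handling mentioned above could look roughly like the following sketch: try to create a hardlink and fall back to a plain copy when the filesystem refuses. The helper names `link_or_copy` and `same_filesystem?` are hypothetical and not part of brew. The rescued error list is an assumption about which errors typically signal a cross-device or unsupported hardlink.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: true when both paths live on the same device,
# which is a precondition for hardlinking between them.
def same_filesystem?(path, other_dir)
  File.stat(path).dev == File.stat(other_dir).dev
end

# Hypothetical helper: hardlink `source` to `destination`, or copy it when
# hardlinking is not possible. Returns :hardlink or :copy.
def link_or_copy(source, destination)
  File.link(source, destination)
  :hardlink
rescue Errno::EXDEV, Errno::EPERM, Errno::ENOTSUP, NotImplementedError
  # EXDEV: source and destination are on different filesystems.
  # EPERM/ENOTSUP: assumed to cover filesystems without hardlink support.
  FileUtils.cp(source, destination, preserve: true)
  :copy
end
```

Falling back to a copy means the result is always usable, at the cost of the duplication returning on filesystems that cannot hardlink.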

Member

Thanks for context!

I think an opt-in approach like this makes sense. Ideally, this would be handled with automatic hardlinks in future (given they are supported on the default filesystems on both macOS and Linux), accepting that you'll get duplication wherever hardlinks aren't supported.

Member Author

cho-m commented Oct 2, 2024

> Any thoughts on what would be required to do this automatically?

Symlinks are a bit risky to create automatically since they can be processed in different ways, e.g. if the symlink is resolved, it can break functionality. That is one reason for switching bin symlinks to exec scripts.

Hardlinks can be safer as they don't behave differently under readlink/realpath, but brew and filesystem support are things to consider.
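
The readlink/realpath difference can be seen in a small sketch. The helper `link_resolution` and the `libexec`/`bin` layout are made up for illustration. A program that locates its resources relative to its resolved path would see a different directory through the symlink, but not through the hardlink.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical demo: create one symlink and one hardlink to the same file
# and report what File.realpath returns for each.
def link_resolution(dir)
  dir = File.realpath(dir) # canonicalize, e.g. /var -> /private/var on macOS
  original = File.join(dir, "libexec", "tool")
  symlink  = File.join(dir, "bin", "tool-symlink")
  hardlink = File.join(dir, "bin", "tool-hardlink")

  FileUtils.mkdir_p([File.dirname(original), File.dirname(symlink)])
  File.write(original, "#!/bin/sh\n")
  File.symlink(original, symlink)
  File.link(original, hardlink)

  {
    # The symlink resolves to the original's location under libexec/.
    symlink_resolves_to_original: File.realpath(symlink) == original,
    # The hardlink is an ordinary directory entry, so its path is unchanged.
    hardlink_keeps_own_path: File.realpath(hardlink) == hardlink,
    # Both names still refer to the same underlying file.
    same_inode: File.identical?(original, hardlink),
  }
end

Dir.mktmpdir { |dir| p link_resolution(dir) }
```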
