zio_compress: introduce max_size threshold #9416

Closed
wants to merge 3 commits

Conversation

@gmelikov (Member) commented Oct 5, 2019

Created a brand-new PR because of problems rebasing the old one (#9311), plus an updated description.

Motivation and Context

The 87.5% minimum compression threshold is way too strict: on 1 TB with a 16M recordsize this artificial limitation can waste up to 125 GB (12.5% of 1 TB) in the worst case (but you must be very, very unlucky).

So I propose to check only against the pool's max ashift; on large recordsizes this may let you keep nearly 99% of the achievable compressratio.

And I think that, nowadays, if you enable a slow compression algorithm on a dataset you prefer a better compressratio over skipping decompression on reads.

So, pros:

  • better compressratio
  • better I/O if decompression is faster than the disks
  • we're already compressing everything anyway
  • it can be tuned for poorly compressible data via a module parameter

Cons:

  • if you have data which, due to the previous code, used to be written uncompressed, you will now see a better compressratio at the price of more CPU load

Description

Now the default compression is lz4, which can stop the
compression process by itself on incompressible data.
So we can check only against the common sector size.
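
For illustration, here is a rough C sketch of the difference (illustrative names only, not this PR's actual diff): the old rule keeps the compressed version only if it saves at least 1/8 of the logical size, while the proposed rule keeps it if it saves at least one sector of the pool's largest ashift.

#include <stdbool.h>
#include <stddef.h>

/* Old rule: the compressed output must fit into 87.5% of the logical size. */
static bool
worth_it_old(size_t s_len, size_t c_len)
{
        return (c_len <= s_len - (s_len >> 3));
}

/* Proposed rule: the compressed output, rounded up to the pool's largest
 * sector size (1 << ashift), must save at least one whole sector. */
static bool
worth_it_proposed(size_t s_len, size_t c_len, unsigned ashift)
{
        size_t sector = (size_t)1 << ashift;
        size_t rounded = (c_len + sector - 1) & ~(sector - 1);

        return (rounded <= s_len - sector);
}

For a 16M record on an ashift=12 pool the old rule demands 2 MiB of savings before a block is stored compressed, while the proposed rule demands only 4 KiB.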

How Has This Been Tested?

Generate a file that is 12% compressible (just enough to fail the old 87.5% threshold) with fio:

# cat fio.conf
[workload]
bs=128k
ioengine=libaio
iodepth=1
numjobs=1
direct=1
runtime=10
size=100m
filename=bigfile
rw=read
thread
unified_rw_reporting=1
group_reporting=1
buffer_compress_percentage=12
refill_buffers
buffer_pattern=0xdeadbeef
zfs set compression=lz4 rpool/test
version          recordsize   ashift   written   diff (from original)
0.7.13           1M           12       100M      -
master_patched   1M           12       88.8M     ~11.2%

So there can be up to ~12.5% of additional compression gain for barely compressible files.

Tested manually in a VM; I haven't run a production system on this code yet.
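
For reference, a possible way to reproduce this measurement (the dataset name rpool/test comes from the command above; the default mountpoint /rpool/test and fio.conf being present there are assumptions):

# cd /rpool/test && fio fio.conf
# zfs get compressratio,written rpool/test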

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@codecov (bot) commented Oct 6, 2019

Codecov Report

Merging #9416 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #9416      +/-   ##
==========================================
+ Coverage   79.15%   79.15%   +<.01%     
==========================================
  Files         416      416              
  Lines      123655   123658       +3     
==========================================
+ Hits        97876    97880       +4     
+ Misses      25779    25778       -1
Flag Coverage Δ
#kernel 79.75% <85.71%> (+0.01%) ⬆️
#user 66.64% <85.71%> (-0.54%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 685abd5...39d1434.

@GregorKopka (Contributor) left a comment


See inline comments.

Also, it should be verified that changing the module parameter won't modify the result for existing data when called from https://github.com/zfsonlinux/zfs/blob/13a4027a7cd68069cb252e94c18ba1e5eb5af1cd/module/zfs/arc.c#L1625, as that would break encryption MAC validation.

Possibly this needs to be a feature tracking the TXG it's activated on to behave differently depending on the age of the data.

man/man5/zfs-module-parameters.5: review thread (outdated, resolved)
module/zfs/zio_compress.c: review thread (outdated, resolved)
module/zfs/zio_compress.c: review thread (outdated, resolved)
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 9, 2019
@gmelikov gmelikov force-pushed the compress_ng branch 2 times, most recently from 10488d9 to 39d1434 on October 22, 2019
@gmelikov gmelikov added the Status: Work in Progress Not yet ready for general review label Nov 20, 2019
@gmelikov gmelikov mentioned this pull request Dec 23, 2019
@PrivatePuffin (Contributor) commented Dec 25, 2019

@gmelikov Thanks for pointing me here.
I can at least confirm that 12% compressible writes still don't get compressed on current master, so this is not just an issue with 0.7.13.

Good work on this, LGTM, this PR deserves some more love! 👍

Edit
To be more clear:
This issue isn't just with extremely large recordsizes; with a 1M recordsize this can already contribute significant savings!

@gmelikov gmelikov changed the title zio_compress: use min sector size as threshold WIP: zio_compress: use min sector size as threshold Apr 14, 2020
module/zfs/zio_compress.c: review thread (outdated, resolved)
@ahrens (Member) commented Apr 14, 2020

I would prefer to eliminate the module parameter, and instead "use sector size of this pool as threshold". I.e. always make the threshold 1 sector (spa_max_ashift).

@gmelikov (Member Author)

@ahrens hmm, I agree with you; we don't need a larger threshold here with lz4 and the future zstd.

I'll get back to this PR in a week.

@gmelikov gmelikov force-pushed the compress_ng branch 2 times, most recently from c147e6d to f7523c3 on April 17, 2020
@codecov-io commented Apr 18, 2020

Codecov Report

Merging #9416 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #9416      +/-   ##
==========================================
- Coverage   79.38%   79.38%   -0.01%     
==========================================
  Files         388      388              
  Lines      123392   123396       +4     
==========================================
  Hits        97953    97953              
- Misses      25439    25443       +4     
Flag Coverage Δ
#kernel 79.83% <87.50%> (-0.10%) ⬇️
#user 66.07% <75.00%> (+0.56%) ⬆️
Impacted Files Coverage Δ
module/zfs/arc.c 85.65% <100.00%> (+0.44%) ⬆️
module/zfs/zio.c 88.76% <100.00%> (+0.25%) ⬆️
module/zfs/zio_compress.c 92.85% <100.00%> (+0.54%) ⬆️
module/os/linux/spl/spl-zlib.c 55.35% <0.00%> (-28.58%) ⬇️
module/os/linux/spl/spl-kmem-cache.c 75.58% <0.00%> (-8.14%) ⬇️
module/zfs/dsl_synctask.c 92.40% <0.00%> (-2.54%) ⬇️
module/zfs/zil.c 91.25% <0.00%> (-1.87%) ⬇️
module/zfs/vdev_indirect.c 73.83% <0.00%> (-0.84%) ⬇️
module/os/linux/zfs/vdev_disk.c 83.27% <0.00%> (-0.73%) ⬇️
module/zfs/vdev_trim.c 95.66% <0.00%> (-0.73%) ⬇️
... and 51 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7929f3...f7523c3.

@gmelikov (Member Author)

Updated:

  • I've removed the module parameter and now use the pool's max_ashift;
  • tested interoperability with the old code: manually imported a pool with the new code, created data with 89% effective compression, imported it with the old code (scrub/read/write is OK), rewrote it, imported it with the new code again (scrub/read/write OK);
  • nightly ztest is almost all green; I got just one ASSERT at ztest.c:6050:ztest_scrub()/home/debian/zfs/cmd/ztest/.libs/ztest(+0x9444)[0x564c51a51444], but it doesn't look related to this patch;
  • I've found only one place in the code where we recompress data to checksum it - https://github.com/openzfs/zfs/pull/9416/files#diff-163982413ed6b6861e08633c89a77395R1759 - so it looks like an encrypted dataset plus an uncompressed ARC may trigger a checksum error on a pool with 89+% compressed data under the old code. @tcaputi could you please help me here and either confirm it or share some information? I can't reproduce this case, so it looks like I'm wrong;
  • I looked through all zio_compress_data() usages and can conclude that if data wasn't compressed, it is saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything, we can just compress it with a zero threshold;
  • the old code (except for the uncompressed-ARC-plus-encryption case) doesn't recompress data for checksums, so if that case is resolved we can use this patch without a feature flag!

@gmelikov gmelikov changed the title WIP: zio_compress: use min sector size as threshold WIP: zio_compress: use sector size as threshold Apr 18, 2020
@tcaputi (Contributor) commented May 7, 2020

@gmelikov without looking at it too closely, I think I might know what's going on. I think there are two things at play which combine to make the ARC's "recompress before checksumming" work OK, but it is a pretty brittle dependency.

For some quick background, we need to store data in the L2ARC exactly the same as it is on disk in order for the decryption to work. This is why we do the re-compression here in the ARC layer.

When the data gets recompressed, the ARC uses the compression algorithm that it stored in the ARC header. This info comes from the bp when the block first enters the ARC. This patch doesn't change the way compression is done (AFAICT), it just changes the threshold at which we decide the compression is "worth it". However, no matter what that threshold is, if ZFS decides not to compress the block it sets the compression algorithm to ZIO_COMPRESS_OFF. So the encryption code will only recompress it if it was actually compressed. This code does NOT have the same check for whether or not the compression was worth it. It simply relies on the value from the bp, so it ends up doing the right thing in all these cases.
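
A minimal sketch of that dependency, with hypothetical stand-in names rather than the actual arc.c code:

#include <stdbool.h>

/* Hypothetical stand-in for the compression algorithm recorded in the
 * bp / ARC header (ZIO_COMPRESS_* in the real code). */
enum bp_compress { BP_COMPRESS_OFF, BP_COMPRESS_LZ4 };

/* The L2ARC/MAC path must reproduce the exact on-disk bytes, so it only
 * re-runs the compressor when the bp says the block was stored compressed.
 * Blocks that failed the "worth it" check at write time were recorded as
 * BP_COMPRESS_OFF and are never recompressed, which is why the value from
 * the bp, not a re-applied threshold, ends up deciding here. */
static bool
must_recompress(enum bp_compress bp_alg)
{
        return (bp_alg != BP_COMPRESS_OFF);
}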

That being said, this discussion is making me wonder if QAT compression support breaks L2ARC with encryption + no compression. I suspect it does, but that is a very niche bug with very minor repercussions in that case (the data simply fails to read from the L2ARC and is read from the main pool instead). Probably a problem for another time.

Anyway, hope that helps.

@gmelikov (Member Author) commented May 7, 2020

@tcaputi Thank you for the insight, you're right: if it was compressed, we can use a zero threshold here too, so I was wrong here (and glad to be)!

So, this patch is ready for review; I've rebased it.

@gmelikov gmelikov removed the Status: Work in Progress Not yet ready for general review label May 7, 2020
@gmelikov gmelikov changed the title WIP: zio_compress: use sector size as threshold zio_compress: use sector size as threshold May 7, 2020
module/zfs/arc.c (outdated diff)
@@ -1755,7 +1756,7 @@ arc_hdr_authenticate(arc_buf_hdr_t *hdr, spa_t *spa, uint64_t dsobj)
abd_take_ownership_of_buf(abd, B_TRUE);

csize = zio_compress_data(HDR_GET_COMPRESS(hdr),
hdr->b_l1hdr.b_pabd, tmpbuf, lsize);
@gmelikov (Member Author)

Nope, here are the steps to get the problem:

  • use the code from this PR;
  • get an 89+% compressed encrypted block;
  • have an L2ARC attached to the pool;
  • load the pool with pre-PR code;
  • zio_compress_data will not compress this block again here, due to the old 12.5% threshold inside it.

Heck, since this really is a problem, I'll add a read-only feature flag just to make this case safe. In the future zio_compress_data will take a threshold argument, so we won't get into this problem again.

@ahrens (Member)

I see, if you move the pool to a system with older software, it doesn't correctly write to the L2ARC, because even though the block was compressed, the old L2ARC-writing code will not compress it if it doesn't save at least 1/8th of the space. This only applies if the old system has disabled compressed ARC, right?

Introducing a readonly-compatible feature flag is a bit unfortunate. It would mean maintaining the old 1/8th code forever. I wonder how common it is to disable compressed ARC, nowadays? It might be reasonable to say that if you disable compressed ARC, and use L2ARC, and bring a newer pool back to old bits, then some of the data won't be able to "stick" in the L2ARC. I mean, we previously considered (but didn't implement) saying that disabling compressed ARC makes L2ARC not work with compressed blocks at all (i.e. removing the recompress-on-L2ARC-write code entirely).

@gmelikov (Member Author) commented May 10, 2020

@ahrens yes, you've got the point. The funny thing is that I can't reproduce the problem in this theoretical scenario: L2ARC works well in manual tests with data written by the new code but read with the old module and disabled ARC compression.

There is a "best effort" note near the arc_hdr_authenticate function, maybe that's the reason?

I've tried to add a feature flag, and the code began to transform from beautiful and easy to read into an ugly if-if monster. I don't want to do this :(

If it's already a problematic and rare case (compressed L2ARC with encryption on old code, with a pool and data from the new code), and we are ready to go with this, I propose to merge it as is without a feature flag; the PR is ready for it. I can add a comment about this somewhere (I'd be glad if someone pointed me to the exact place for it).

@gmelikov gmelikov added the Status: Work in Progress Not yet ready for general review label May 7, 2020
@adamdmoss (Contributor)

(as an aside, the comments here talk about a >1MB recordsize a few times - I thought 1MB was the upper limit?)

@gmelikov (Member Author)

@adamdmoss by default it is, but technically 16MB is the upper limit now; you can just change this module parameter and voila: https://github.com/openzfs/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_max_recordsize

It's good if you have a really large-block workload with rare reads.
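
For example (the sysfs path is the standard location for ZFS module parameters on Linux; the dataset name is hypothetical, and the pool also needs the large_blocks feature enabled for records above 128K):

# echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize
# zfs set recordsize=16M rpool/bigblocks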

@gmelikov gmelikov mentioned this pull request May 18, 2020
@tonyhutter (Contributor) left a comment


Sorry this fell off our radar - approved

@gmelikov (Member Author)

I don't see CI failures related to these changes; previous runs were successful, and I've rebased to the current master.

@behlendorf (Contributor)

@gmelikov the Fedora build failures do look like they're introduced by this PR.

@gmelikov gmelikov force-pushed the compress_ng branch 4 times, most recently from 9b3d7f7 to 5737823 on August 29, 2024
@gmelikov (Member Author)

@behlendorf not sure what was up with the Fedora runner; after the rebase I've got 2 green runs.

But I had flaky tests (each failed only once):

  • import_devices_missing
  • redundancy_draid_spare1 (seen in some other PRs too)

gmelikov and others added 3 commits September 19, 2024 10:40
Now default compression is lz4, which can stop
compression process by itself on incompressible data.
If there are additional size checks -
we will only make our compressratio worse.

New usable compression thresholds are:
- less than BPE_PAYLOAD_SIZE (embedded_data feature);
- at least one saved sector.

Old 12.5% threshold is left to minimize effect
on existing user expectations of CPU utilization.

If data wasn't compressed - it will be saved as
ZIO_COMPRESS_OFF, so if we really need to recompress
data without ashift info and check anything -
we can just compress it with zero threshold.
So, we don't need a new feature flag here!

Signed-off-by: George Melikov <[email protected]>
ZLE compressor needs additional bytes to process
d_len argument efficiently.
Don't use BPE_PAYLOAD_SIZE as d_len with it
before we rework zle compressor somehow.

Signed-off-by: George Melikov <[email protected]>
On compression we could be more explicit here for cases
where we can not recompress the data.

Co-authored-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
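
For reference, a sketch of the thresholds described in the first commit message above, extending the earlier sketch with the embedded_data case (illustrative only: BPE_PAYLOAD_SIZE is shown as a placeholder constant, and the retained 12.5% early-abort behaviour is omitted):

#include <stdbool.h>
#include <stddef.h>

#define BPE_PAYLOAD_SIZE_SKETCH 112 /* placeholder for the real constant */

/* Keep the compressed version if it is small enough to be embedded in the
 * block pointer, or if it saves at least one whole sector once rounded up
 * to the sector size. */
static bool
keep_compressed(size_t s_len, size_t c_len, size_t sector)
{
        size_t rounded;

        if (c_len < BPE_PAYLOAD_SIZE_SKETCH)
                return (true);
        rounded = (c_len + sector - 1) & ~(sector - 1);
        return (rounded <= s_len - sector);
}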
@behlendorf (Contributor) left a comment

Thanks for updating this. Looks good, the CI failures were unrelated.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 20, 2024
behlendorf pushed a commit that referenced this pull request Sep 20, 2024
ZLE compressor needs additional bytes to process
d_len argument efficiently.
Don't use BPE_PAYLOAD_SIZE as d_len with it
before we rework zle compressor somehow.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes #9416
behlendorf pushed a commit that referenced this pull request Sep 20, 2024
On compression we could be more explicit here for cases
where we can not recompress the data.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Co-authored-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes #9416
mcmilk pushed commits to mcmilk/zfs that referenced this pull request on Sep 20-21, 2024 (mirroring the three commits above).
Labels: Status: Accepted (Ready to integrate: reviewed, tested); Type: Performance (Performance improvement or performance problem)