Avoid BUG in migrate_folio_extra #16568
Merged
Linux page migration code won't wait for writeback to complete unless it needs to call release_folio. Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for us.
Motivation and Context
Thanks for considering this PR.
I came across issue #15140 from the Proxmox VE 8.1 release notes, and gave it a good long look over. As far as I can tell, what's happening is that the Linux kernel page migration code is starting writeback on some pages, not waiting for writeback to complete, and then throwing a BUG when it finds that pages are still under writeback.
Pretty much all of the interesting action happens in fallback_migrate_folio(), which doesn't show up in the stack traces listed in #15140, but suffice it to say that it's called from move_to_new_folio(), which does appear in the stack traces. What appears to be happening in the case of the crashes described in #15140 is that fallback_migrate_folio() is being called upon dirty ZFS page-cache pages, so it's starting writeback by calling writeout(). Then, since ZFS doesn't store private data in any page cache pages, it skips the call to filemap_release_folio() (because folio_test_private() returns false), and immediately calls migrate_folio(), which in turn calls migrate_folio_extra(). Then, at the beginning of migrate_folio_extra(), it BUGs out because the page is still under writeback (folio_test_writeback() returns true).
Notably, if the page did have private data, then fallback_migrate_folio() would call into filemap_release_folio(), which would return false for pages under writeback, causing fallback_migrate_folio() to exit before calling migrate_folio().
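The sequence described above can be sketched as a toy user-space model. This is not the kernel source: the struct, the enum, and every model_* name are hypothetical, and the logic only mirrors the steps as narrated in this description (writeback started but not waited for, release hook consulted only when private data is present, migration step rejecting folios under writeback).

```c
#include <stdbool.h>

/* Toy stand-in for a page-cache folio; NOT the kernel's struct folio. */
struct model_folio {
    bool dirty;        /* folio has unwritten changes */
    bool writeback;    /* writeback has started and not yet completed */
    bool has_private;  /* filesystem attached private data to the folio */
};

/* Hypothetical outcomes; MODEL_BUG stands in for the kernel BUG. */
enum model_result { MODEL_MIGRATED, MODEL_RETRY, MODEL_BUG };

/* Models a release hook that refuses to release a folio under writeback. */
static bool model_release_folio(const struct model_folio *f)
{
    return !f->writeback;
}

/* Models the fallback path as narrated above (assumption, simplified). */
static enum model_result model_fallback_migrate(struct model_folio *f)
{
    if (f->dirty) {
        /* Writeback is started but nobody waits for it to finish. */
        f->dirty = false;
        f->writeback = true;
    }

    /* The release hook is consulted only when private data is present. */
    if (f->has_private && !model_release_folio(f))
        return MODEL_RETRY;

    /* The migration step rejects folios that are still under writeback. */
    if (f->writeback)
        return MODEL_BUG;

    return MODEL_MIGRATED;
}
```

In this model, a dirty folio without private data reaches the migration step while still under writeback (MODEL_BUG), whereas the same folio with private data is turned away by the release hook (MODEL_RETRY), which is the effect the proposed change relies on.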
So, in summary, in order for the BUG to happen a few things need to be true:
I went through the code for all of the filesystems in the Linux kernel and didn't see any that met all three conditions. Notably, pretty much all traditional filesystems store buffers in page private data. Those filesystems that don't store buffers either store something else in page_private (e.g. shmem/tmpfs, iomap), or don't do asynchronous writeback (e.g. ecryptfs, fuse, romfs, squashfs). So it would appear as if ZFS may be the only filesystem that experiences this particular behavior. As far as I can tell, the above-described behavior goes back all the way to when page migration was first implemented in kernel 2.6.16.
The way I see it, there are two ways to make the problem go away:
I assume the latter may be preferable (even if only temporarily) so that ZFS can avoid this crash for any/all kernel versions, but I'm happy to defer to the ZFS devs on which option(s) you choose to pursue.
The latter is the approach I took in the patch proposed here.
Description
Call SetPagePrivate wherever PageUptodate is set and define .release_folio, to cause fallback_migrate_folio to wait for writeback to complete.
How Has This Been Tested?
Tested by user @JKDingwall - results in the following comments on #15140:
Also regression-tested by running the ZFS Test Suite on Ubuntu 23.10, running kernel version 6.5.0-35-generic. No new test failures were observed. See attached files:
Types of changes
Checklist: