Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

O_DIRECT regression on TIP post 2.2.6 release #16596

Open
Qubitium opened this issue Oct 2, 2024 · 10 comments
Open

O_DIRECT regression on TIP post 2.2.6 release #16596

Qubitium opened this issue Oct 2, 2024 · 10 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@Qubitium
Copy link

Qubitium commented Oct 2, 2024

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 24.04
Kernel Version 6.8/6.10.9/6.10.12
Architecture Linux x86_64
OpenZFS Version d34d4f9

Describe the problem you're observing

percona-mysql, mariadb using tokudb/innodb with O_DIRECT (innodb) enabled or DIRECTIO (toku) enabled will crash due to tokudb (22) init error and innodb will crash about writing to ibtmp1 and expecting n byte written but got 0 bytes.

Describe how to reproduce the problem

Install latest tip and enable O_DIRECT for innodb and/or DIRECTIO for tokudb.

ZFS 2.2.6 does not have this issue. The error is caused by regression between changes made after 2.2.6 and <= commit: d34d4f97a81f6895de3da67ffbad6f986b2cdae6.

At first I thought it was the 6.10.9/6.10.12 kernel I was using. It is not kernel related. I rolled back to Ubuntu 24.04 6.8 kernel and issue persisted.

Only when I disabled O_DIRECT options for innodb/toku did the errors go away with latest compiled zfs from commit d34d4f9. The errors also went away with O_DIRECT enabled with stable 2.2.6 release.

This appears to be a really bad regression. Please double check the Linux O_DIRECT unit tests if any.

Extra notes on my zfs pool:

  • stripped (no raidz protection)
  • nvme
  • encryption enabled
  • blocksize=16KB
  • no compression
  • no dedup

EDIT: looks to be directly related to #16594 Which also details other directio bugs on tip.

@Qubitium Qubitium added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 2, 2024
@rincebrain
Copy link
Contributor

That one, the person thinks should have been fixed by a specific commit already in git tip.

Also, keep in mind, 2.2.6 is not on the same series of commits as HEAD of master, 2.2 branched from master a while ago and gets cherrypicks, basically, so that's a lot larger delta than just "since 2.2.6".

That said, I agree it's probably the O_DIRECT changes if it's anything, but if you wanted to bisect, just be aware that's not a line you can bisect on effectively.

@behlendorf
Copy link
Contributor

@Qubitium thanks for testing out O_DIRECT and reporting this. There are some known issues we're working to wrap up. We'll be disabling this in the upcoming 2.3 release branch until fully resolved, #16597. The intent is to leave it enabled in the master branch to continue to surface any potential issues.

@Qubitium
Copy link
Author

Qubitium commented Oct 3, 2024

@behlendorf Thanks. I will apply the zfs level direct io disable right-away and stay away from using tip. I did not even realize tip is not 2.2.next but a larger dev branch.

@rincebrain

Also, keep in mind, 2.2.6 is not on the same series of commits as HEAD of master, 2.2 branched from master a while ago and gets cherrypicks, basically, so that's a lot larger delta than just "since 2.2.6".

This is seriously confusing to the end-user (non-contributor). The version tagged in the master is 2.2.99 so to anyone compiling and installing, it is telling the user that is a direct lineage of 2.2.next branch.

@Qubitium
Copy link
Author

Qubitium commented Oct 3, 2024

@behlendorf @rincebrain I have isolated the regression range to be > 1713aa7 and <= d34d4f97a81f6895de3da67ffbad6f986b2cdae6

@rincebrain
Copy link
Contributor

I don't think the expectation is for end users to run git tip, generally.

That said, I don't know that there's any option that makes more semantic sense - it's more or less [current].99-r[number of commits since branch]-g[tag] or [next].0pre[something][same suffix], if people want something that also keeps some kind of versioning ordering for packages.

So I can't think of any way to get a nice stable ordering that's not "infinitely after [current] but smaller than [next].0" other than something like what's used now.

@Qubitium
Copy link
Author

Qubitium commented Oct 3, 2024

That said, I don't know that there's any option that makes more semantic sense - it's more or less [current].99-r[number of commits since branch]-g[tag] or [next].0pre[something][same suffix], if people want something that also keeps some kind of versioning ordering for packages.

I don't know the reasons but having master as non-release tree is bound to confuse a lot of people. Most people expect master to be the release/dev branch where releases are tagged and non-release dev code are in their branches. Not the other way where master becomes a long 2.3 major release and in-between are sprinkled 2.2.x branch offshoots cuts/cherrypicks. If anything, 2.3 should be on 2.3 branch, and the cherry picks for interim 2.2.x to release are from 2.3 to master + tags.

@rincebrain
Copy link
Contributor

This is how e.g. Linux and FreeBSD work, 6.0.1 isn't on the main branch, it's patches on 6.0, and FreeBSD's release process works similarly.

@behlendorf
Copy link
Contributor

Most people expect master to be the release/dev branch where releases are tagged and non-release dev code are in their branches.

This is basically the model being used. All development work is done on branches and only merged in to master once its complete. For all of the supported releases (2.0.x, 2.1.x, 2.2.x) there is a release branch which get backports of needed changes from the master branch.

https://github.com/openzfs/zfs/tree/zfs-2.0-release
https://github.com/openzfs/zfs/tree/zfs-2.1-release
https://github.com/openzfs/zfs/tree/zfs-2.2-release

One thing we could potentially do is make the default branch the latest release branch. That would perhaps be a bit more annoying for the developers, but it would ensure anyone casually cloning the git tree would get the latest release version,

@clhedrick
Copy link

or call master version 2.99

@amotin
Copy link
Member

amotin commented Oct 3, 2024

or call master version 2.99

@clhedrick No. Then the version number would be bigger than later released 2.3, which is not right. Good side of 2.2.99 is that it is bigger than any 2.2.x, but smaller than 2.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

5 participants