Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a diagnostic kstat for obtaining pool status #16484

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

usaleem-ix
Copy link
Contributor

Motivation and Context

This PR is an updated version of previous #16026

In the original PR, JSON was written into the buffer directly and nvlists were also converted to JSON, which was redundant.

Description

This PR creates an output nvlist and later nvlist is printed in JSON format to provided buffer. Spares and l2cache devices were also not showing up with previous #16026. This is also fixed now.

How Has This Been Tested?

Manually tested in different pool configurations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

.gitignore Outdated Show resolved Hide resolved
include/sys/spa_impl.h Outdated Show resolved Hide resolved
#define JPRINTF(start, end, ...) \
do { \
if (start < end) \
start += snprintf(start, end - start, __VA_ARGS__); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be safe we need to check that the return value is not negative here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated.

module/zfs/spa.c Outdated Show resolved Hide resolved
module/zfs/spa_json_stats.c Outdated Show resolved Hide resolved
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 28, 2024
fredw and others added 2 commits August 29, 2024 10:48
This kstat output does not require taking the spa_namespace
lock, as in the case for 'zpool status'. It can be used for
investigations when pools are in a hung state while holding
global locks required for a traditional 'zpool status' to
proceed.

This kstat is not safe to use in conditions where pools are
in the process of configuration changes (i.e., adding/removing
devices).  Therefore, this kstat is not intended to be a general
replacement or alternative to using 'zpool status'.

Sponsored-by: Wasabi Technology, Inc.
Sponsored-By: Klara Inc.

Co-authored-by: Don Brady <[email protected]>
Signed-off-by: Don Brady <[email protected]>
This commit updates the kstat for pool status and simplifies by
creating an nvlist that contains the pool status. This nvlist is then
printed to provided buffer in JSON format. The redundant parts of code
have also been removed.

Signed-off-by: Umer Saleem <[email protected]>
@tonyhutter
Copy link
Contributor

@usaleem-ix is the main goal of this to get the JSON into a kstat so it's lockless? Or was there another use case for wanting the JSON specifically in the kstats? I ask because I'm working on some prototype code that would remove spa_namespace_lock from the zpool status callpath. Early tests are promising but I'm not 100% sure it's going to work yet.

@usaleem-ix
Copy link
Contributor Author

@tonyhutter yes, the main goal of this is to get zpool status JSON into a kstat so it is lockless.

Also, instead of executing zpool status command in a subprocess in python, ease of accessing the kstats within our TrueNAS middleware is also one of the motivations for this.

@yocalebo
Copy link

yocalebo commented Sep 3, 2024

@usaleem-ix is the main goal of this to get the JSON into a kstat so it's lockless? Or was there another use case for wanting the JSON specifically in the kstats? I ask because I'm working on some prototype code that would remove spa_namespace_lock from the zpool status callpath. Early tests are promising but I'm not 100% sure it's going to work yet.

@tonyhutter For TrueNAS, we have a few primary reasons for wanting zpool status in procfs.

  1. lockless (already mentioned)
  2. preventing the necessity for fork+exec'ing (this can become expensive)
  3. simplifying a large part of our code-base that uses libzfs (parsing structured output from procfs becomes "trivial" compared to using libzfs)

EDIT:
4. parts of libzfs aren't thread safe and so we have to use a process pool. A process pool adds a non-trivial amount of overhead to our application and so offloading certain information to procfs allows us to lessen the usage of said process pool.

@tonyhutter
Copy link
Contributor

tonyhutter commented Sep 4, 2024

@usaleem-ix @yocalebo thanks for the info.

  1. I just opened WIP: Remove spa_namespace_lock from zpool status #16507 for lockless zpool status.

  2. The kstat means we're doing JSON generation in kernel-space. That makes me worry more about JSON edge cases (special characters in a vdev name, overflowing the JSON buffer, etc...)

  3. This kstat JSON is just a dump of the pool's nvlist. That means we're exposing a bunch of internal variables that the user never needs to see:

$ sudo cat /proc/spl/kstat/zfs/tank/status.json | jq
{
  "status_json_version": 4,
  "scl_config_lock": true,
  "scan_error": 2,
  "scan_stats": {
    "func": "NONE",
    "state": "NONE"
  },
...

It also API-ifys the config nvlist, which is something I think we should avoid.

  1. The status.json kstat doesn't match the zpool status JSON output.

  2. This PR introduces a second JSON implementation in the same codebase.

  3. The zpool status -j route allows us to use our delegation support, wheras the kstat does not.

Overall, I think putting the JSON functionality in libzfs may be the better route, even if it also means fixing whatever thread safety issues we have in the library. It's just nice to have all the JSON generation done in userspace.

Alternatively, if you want to go the fork+exec zpool status -j route, it's possible pre-forking could speed things up a bit.

@yocalebo
Copy link

yocalebo commented Sep 6, 2024

1. I just opened [WIP: Remove spa_namespace_lock from zpool status #16507](https://github.com/openzfs/zfs/pull/16507) for lockless `zpool status`.

This is great. Removing that lock for zpool status will benefit everyone (not just selfish developers like myself 😄 )

2. The kstat means we're doing JSON generation in kernel-space.  That makes me worry more about JSON edge cases (special characters in a vdev name, overflowing the JSON buffer, etc...)

I tend to agree this isn't that big of a deal. The linux kernel does yaml formatting for certain nfs statistics (and we actually found it wasn't escaping characters properly) and was producing invalid yaml. Not quite the same argument you're making about special chars, over/under flowing etc, but procfs is pretty resilient and has a ton of information from all kinds of esoteric subsystems these days.

3. This kstat JSON is just a dump of the pool's nvlist.  That means we're exposing a bunch of internal variables that the user never needs to see:

I agree with you on this point. I see no benefit in exposing this type of information. I was under the assumption this would essentially mirror the zpool status -j output (for the most part).

It also API-ifys the config nvlist, which is something I think we should avoid.

One of the biggest gripes I have is that libzfs isn't versioned and is hard to utilize 😄 I've got no strong opinion on this side of the argument though.

4. The `status.json` kstat doesn't match the `zpool status` JSON output.

Yeah, this is a problem and I agree with you 100% here. Some subtle differences are okay but not matching at a large percentage is bad.

5. This PR introduces a second JSON implementation in the same codebase.

Agree with you 100% on this too. No reason to add unnecessary complexity like this.

6. The `zpool status -j` route allows us to use our delegation support, wheras the kstat does not.

That's a valid point.

Overall, I think putting the JSON functionality in libzfs may be the better route, even if it also means fixing whatever thread safety issues we have in the library. It's just nice to have all the JSON generation done in userspace.

Yeah, I'm all for improving libzfs. We've had a couple community members try to user our python libzfs bindings and they have run into issues with non-reentrant calls being made in libzfs (and IIRC, there were some global memory objects wreaking havoc at some point.) Anyways, one of the community members actually tried to help improve thread-safety by opening a PR here. There were quite a few follow-up commits to make the lib thread-safe. Maybe it is thread-safe (for the most part) and we just need to test it.

Alternatively, if you want to go the fork+exec zpool status -j route, it's possible pre-forking could speed things up a bit.

We already do this with a process pool. However, eventually the child processes get reaped and new ones get forked. This all becomes moot, however, if the library is indeed thread-safe. More tests need to be done on our side I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants