Add a diagnostic kstat for obtaining pool status #16484

usaleem-ix · 2024-08-28T06:49:48Z

Motivation and Context

This PR is an updated version of previous #16026

In the original PR, JSON was written into the buffer directly and nvlists were also converted to JSON, which was redundant.

Description

This PR creates an output nvlist and later nvlist is printed in JSON format to provided buffer. Spares and l2cache devices were also not showing up with previous #16026. This is also fixed now.

How Has This Been Tested?

Manually tested in different pool configurations.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

.gitignore

include/sys/spa_impl.h

behlendorf · 2024-08-28T18:28:43Z

module/nvpair/nvpair.c

+#define	JPRINTF(start, end, ...)				\
+do {						\
+	if (start < end)	\
+		start += snprintf(start, end - start, __VA_ARGS__);	\


To be safe we need to check that the return value is not negative here.

Thanks, updated.

module/zfs/spa.c

module/zfs/spa_json_stats.c

This kstat output does not require taking the spa_namespace lock, as in the case for 'zpool status'. It can be used for investigations when pools are in a hung state while holding global locks required for a traditional 'zpool status' to proceed. This kstat is not safe to use in conditions where pools are in the process of configuration changes (i.e., adding/removing devices). Therefore, this kstat is not intended to be a general replacement or alternative to using 'zpool status'. Sponsored-by: Wasabi Technology, Inc. Sponsored-By: Klara Inc. Co-authored-by: Don Brady <[email protected]> Signed-off-by: Don Brady <[email protected]>

This commit updates the kstat for pool status and simplifies by creating an nvlist that contains the pool status. This nvlist is then printed to provided buffer in JSON format. The redundant parts of code have also been removed. Signed-off-by: Umer Saleem <[email protected]>

tonyhutter · 2024-08-31T01:21:07Z

@usaleem-ix is the main goal of this to get the JSON into a kstat so it's lockless? Or was there another use case for wanting the JSON specifically in the kstats? I ask because I'm working on some prototype code that would remove spa_namespace_lock from the zpool status callpath. Early tests are promising but I'm not 100% sure it's going to work yet.

usaleem-ix · 2024-09-03T12:22:35Z

@tonyhutter yes, the main goal of this is to get zpool status JSON into a kstat so it is lockless.

Also, instead of executing zpool status command in a subprocess in python, ease of accessing the kstats within our TrueNAS middleware is also one of the motivations for this.

yocalebo · 2024-09-03T12:33:15Z

@usaleem-ix is the main goal of this to get the JSON into a kstat so it's lockless? Or was there another use case for wanting the JSON specifically in the kstats? I ask because I'm working on some prototype code that would remove spa_namespace_lock from the zpool status callpath. Early tests are promising but I'm not 100% sure it's going to work yet.

@tonyhutter For TrueNAS, we have a few primary reasons for wanting zpool status in procfs.

lockless (already mentioned)
preventing the necessity for fork+exec'ing (this can become expensive)
simplifying a large part of our code-base that uses libzfs (parsing structured output from procfs becomes "trivial" compared to using libzfs)

EDIT:
4. parts of libzfs aren't thread safe and so we have to use a process pool. A process pool adds a non-trivial amount of overhead to our application and so offloading certain information to procfs allows us to lessen the usage of said process pool.

tonyhutter · 2024-09-04T22:02:04Z

@usaleem-ix @yocalebo thanks for the info.

I just opened WIP: Remove spa_namespace_lock from zpool status #16507 for lockless zpool status.
The kstat means we're doing JSON generation in kernel-space. That makes me worry more about JSON edge cases (special characters in a vdev name, overflowing the JSON buffer, etc...)
This kstat JSON is just a dump of the pool's nvlist. That means we're exposing a bunch of internal variables that the user never needs to see:

$ sudo cat /proc/spl/kstat/zfs/tank/status.json | jq
{
  "status_json_version": 4,
  "scl_config_lock": true,
  "scan_error": 2,
  "scan_stats": {
    "func": "NONE",
    "state": "NONE"
  },
...

It also API-ifys the config nvlist, which is something I think we should avoid.

The status.json kstat doesn't match the zpool status JSON output.
This PR introduces a second JSON implementation in the same codebase.
The zpool status -j route allows us to use our delegation support, wheras the kstat does not.

Overall, I think putting the JSON functionality in libzfs may be the better route, even if it also means fixing whatever thread safety issues we have in the library. It's just nice to have all the JSON generation done in userspace.

Alternatively, if you want to go the fork+exec zpool status -j route, it's possible pre-forking could speed things up a bit.

yocalebo · 2024-09-06T11:56:35Z

1. I just opened [WIP: Remove spa_namespace_lock from zpool status #16507](https://github.com/openzfs/zfs/pull/16507) for lockless `zpool status`.

This is great. Removing that lock for zpool status will benefit everyone (not just selfish developers like myself 😄 )

2. The kstat means we're doing JSON generation in kernel-space.  That makes me worry more about JSON edge cases (special characters in a vdev name, overflowing the JSON buffer, etc...)

I tend to agree this isn't that big of a deal. The linux kernel does yaml formatting for certain nfs statistics (and we actually found it wasn't escaping characters properly) and was producing invalid yaml. Not quite the same argument you're making about special chars, over/under flowing etc, but procfs is pretty resilient and has a ton of information from all kinds of esoteric subsystems these days.

3. This kstat JSON is just a dump of the pool's nvlist.  That means we're exposing a bunch of internal variables that the user never needs to see:

I agree with you on this point. I see no benefit in exposing this type of information. I was under the assumption this would essentially mirror the zpool status -j output (for the most part).

It also API-ifys the config nvlist, which is something I think we should avoid.

One of the biggest gripes I have is that libzfs isn't versioned and is hard to utilize 😄 I've got no strong opinion on this side of the argument though.

4. The `status.json` kstat doesn't match the `zpool status` JSON output.

Yeah, this is a problem and I agree with you 100% here. Some subtle differences are okay but not matching at a large percentage is bad.

5. This PR introduces a second JSON implementation in the same codebase.

Agree with you 100% on this too. No reason to add unnecessary complexity like this.

6. The `zpool status -j` route allows us to use our delegation support, wheras the kstat does not.

That's a valid point.

Overall, I think putting the JSON functionality in libzfs may be the better route, even if it also means fixing whatever thread safety issues we have in the library. It's just nice to have all the JSON generation done in userspace.

Yeah, I'm all for improving libzfs. We've had a couple community members try to user our python libzfs bindings and they have run into issues with non-reentrant calls being made in libzfs (and IIRC, there were some global memory objects wreaking havoc at some point.) Anyways, one of the community members actually tried to help improve thread-safety by opening a PR here. There were quite a few follow-up commits to make the lib thread-safe. Maybe it is thread-safe (for the most part) and we just need to test it.

Alternatively, if you want to go the fork+exec zpool status -j route, it's possible pre-forking could speed things up a bit.

We already do this with a process pool. However, eventually the child processes get reaped and new ones get forked. This all becomes moot, however, if the library is indeed thread-safe. More tests need to be done on our side I guess.

behlendorf reviewed Aug 28, 2024

View reviewed changes

behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 28, 2024

fredw and others added 2 commits August 29, 2024 10:48

usaleem-ix force-pushed the kstat-json branch from 26344e7 to 138612c Compare August 29, 2024 06:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a diagnostic kstat for obtaining pool status #16484

Add a diagnostic kstat for obtaining pool status #16484

usaleem-ix commented Aug 28, 2024

behlendorf Aug 28, 2024

usaleem-ix Aug 29, 2024

tonyhutter commented Aug 31, 2024

usaleem-ix commented Sep 3, 2024

yocalebo commented Sep 3, 2024 •

edited

Loading

tonyhutter commented Sep 4, 2024 •

edited

Loading

yocalebo commented Sep 6, 2024

Add a diagnostic kstat for obtaining pool status #16484

Are you sure you want to change the base?

Add a diagnostic kstat for obtaining pool status #16484

Conversation

usaleem-ix commented Aug 28, 2024

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

behlendorf Aug 28, 2024

Choose a reason for hiding this comment

usaleem-ix Aug 29, 2024

Choose a reason for hiding this comment

tonyhutter commented Aug 31, 2024

usaleem-ix commented Sep 3, 2024

yocalebo commented Sep 3, 2024 • edited Loading

tonyhutter commented Sep 4, 2024 • edited Loading

yocalebo commented Sep 6, 2024

yocalebo commented Sep 3, 2024 •

edited

Loading

tonyhutter commented Sep 4, 2024 •

edited

Loading