vdev_id JBOD enclosure numbering is not deterministic #16572

Open
mrobbert opened this issue Sep 26, 2024 · 0 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

System information

Type | Version/Name
--- | ---
Distribution Name | Red Hat Enterprise Linux
Distribution Version | 9.4
Kernel Version | 5.14.0-427.20.1.el9_4.x86_64
Architecture | x86_64
OpenZFS Version | 2.1.15

Describe the problem you're observing

When I added a second JBOD to a system with an existing pool, the device names in /dev/disk/by-vdev/ changed. The disks in the new JBOD were assigned JBOD 1 names, which is how the disks in the old JBOD are named in the existing pool, and the disks in the old JBOD were now named as if they were in JBOD 2.
I have a single HBA with 2 ports. My JBODs are HGST Data60s, cabled as recommended in the HGST user manual for these enclosures: multipath in a daisy-chain configuration.
Here is the vdev_id.conf that I am using:

multipath yes
topology sas_direct
multijbod yes
enclosure_symlinks yes

#       PCI_ID  HBA PORT  CHANNEL NAME
channel ca:00.0 0         A
channel ca:00.0 1         A

I did some investigating to see how the enclosure numbers get assigned, and it looks like they are based on the order in which the enclosures show up in /sys/class/enclosure. It is my understanding that this order can change every time the server reboots, and if it does, the by-vdev device names of all my drives will change.
If I am doing something wrong, please let me know, but this looks like a bug to me. There may be a better way to make this assignment deterministic, but I was able to fix it for my own use case with a small patch to the vdev_id script that reads some new lines I added to vdev_id.conf. Each new configuration line defines a fixed enclosure/JBOD number for one enclosure, keyed on the enclosure's ID as presented in /sys/class/enclosure/*/id.
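
To see which enclosure ID belongs to which sysfs entry on a given system, something like the following sketch can be used. It assumes each entry under /sys/class/enclosure has a readable `id` file, as in the path above. The function name `list_enclosure_ids` and its optional base-directory argument are only illustrative and are not part of vdev_id.

```shell
#!/bin/sh
# Print "<sysfs entry name> <enclosure id>" for each enclosure, in the
# order the shell glob returns them.
# The base directory is a parameter only so the function can be pointed
# at a test tree; on a real system it would be /sys/class/enclosure.
list_enclosure_ids() {
	base=${1:-/sys/class/enclosure}
	for enc in "$base"/*; do
		# Skip entries without a readable id file. This also covers an
		# unmatched glob when the directory is empty or missing.
		[ -r "$enc/id" ] || continue
		printf '%s %s\n' "${enc##*/}" "$(cat "$enc/id")"
	done
}
```

Running `list_enclosure_ids` with no argument before and after a reboot, or before and after adding a JBOD, shows whether the same ID still sits at the same position in the listing.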

Here is the diff of my change:

--- udev/vdev_id	2024-09-26 10:06:54.594634865 -0600
+++ /root/vdev_id	2024-09-26 10:15:39.972563744 -0600
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/usr/bin/sh
 #
 # vdev_id: udev helper to generate user-friendly names for JBOD disks
 #
@@ -240,7 +240,8 @@
 			d=$(eval echo '$'{$i})
 			id=$(cat "/sys/class/enclosure/${d}/id")
 			if [ "$d" = "$DEVEXP" ] && [ $id = $count ] ; then
-				MAPPED_JBOD=$j
+				JBOD_NUM=$(awk -v enc_id="$id" '$1 == "encid" && $2 == enc_id { print $3; exit }' $CONFIG)
+				MAPPED_JBOD=$JBOD_NUM
 				break
 			fi
 			i=$((i + 1))

And here is my new vdev_id.conf:

multipath yes
topology sas_direct
multijbod yes
enclosure_symlinks yes

#       PCI_ID  HBA PORT  CHANNEL NAME
channel ca:00.0 0         A
channel ca:00.0 1         A

#       Enc. ID  		JBOD Num
encid	0x5000ccab053b9e00	1
encid	0x5000ccab054e2880	2

Let me know what you think!
