zcommon causes a kernel crash RHEL9.4 #16540

Open
khain0 opened this issue Sep 16, 2024 · 2 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

khain0 commented Sep 16, 2024

System information

Type | Version/Name
Distribution Name | Red Hat Enterprise Linux
Distribution Version | 9.4
Kernel Version | 5.14.0-427.22.1.el9_4.x86_64
Architecture | x86_64
OpenZFS Version | zfs-dkms 2.1.15-3

Command to find OpenZFS version:
zfs-2.1.15-3
zfs-kmod-2.1.15-3

Describe the problem you're observing

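The zcommon module caused a kernel crash: a general protection fault in kfpu_end, called from abd_fletcher_4_iter during Fletcher-4 checksum computation.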

Describe how to reproduce the problem

No known trigger. The kernel crashes on its own after some uptime.

Include any warning/errors/backtraces from the system logs

Kernel crash log

[6972591.685926] general protection fault, maybe for address 0xff43024862a6b000: 0000 [#1] PREEMPT SMP NOPTI
[6972591.687227] CPU: 27 PID: 2524776 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  X  -------  ---  5.14.0-427.22.1.el9_4.x86_64 #1
[6972591.689706] Hardware name: Dell Inc. PowerEdge XE8640/0TVHHH, BIOS 2.0.3 05/15/2024
[6972591.690719] RIP: 0010:kfpu_end+0x34/0xa0 [zcommon]
[6972591.691725] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 65 8b 05 4a c2 85 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 19 fb 65 ff 0d 29 c2 85 3f 75 05 0f 1f 44 00 00 48 8b 44 24
[6972591.693723] RSP: 0018:ff4781d383f5f9a0 EFLAGS: 00010046
[6972591.694708] RAX: 00000000ffffffff RBX: ff4781d383f5faa0 RCX: ff43024862a6b000
[6972591.695684] RDX: 00000000ffffffff RSI: ff430229a0368000 RDI: ff4781d383f5fac0
[6972591.696645] RBP: 0000000000001000 R08: ff4781d383f5faa0 R09: ff430229a03687e6
[6972591.697593] R10: ff430229a0368000 R11: 00000000000007e6 R12: ff430229a0368000
[6972591.698529] R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000000008
[6972591.699452] FS:  0000000000000000(0000) GS:ff430282ff340000(0000) knlGS:0000000000000000
[6972591.700366] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6972591.701272] CR2: 0000000012808000 CR3: 000000fdb80c4003 CR4: 0000000000771ee0
[6972591.702169] PKRU: 55555554
[6972591.703046] Call Trace:
[6972591.703901]  <TASK>
[6972591.704741]  ? show_trace_log_lvl+0x1c4/0x2df
[6972591.705580]  ? show_trace_log_lvl+0x1c4/0x2df
[6972591.706406]  ? abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[6972591.707217]  ? __die_body.cold+0x8/0xd
[6972591.708005]  ? die_addr+0x39/0x60
[6972591.708801]  ? exc_general_protection+0x1aa/0x400
[6972591.709549]  ? asm_exc_general_protection+0x22/0x30
[6972591.710291]  ? kfpu_end+0x34/0xa0 [zcommon]
[6972591.711026]  abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[6972591.711742]  abd_iterate_func.part.0+0xbd/0x1c0 [zfs]
[6972591.712531]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[6972591.713213]  abd_fletcher_4_native+0x7c/0xc0 [zfs]
[6972591.713995]  ? find_busiest_group+0x11d/0x240
[6972591.714645]  zio_checksum_compute+0xc7/0x3f0 [zfs]
[6972591.715374]  ? __kmem_cache_alloc_node+0x1c7/0x2d0
[6972591.715994]  ? spl_kmem_alloc+0xb2/0x100 [spl]
[6972591.716604]  ? spl_kmem_alloc+0xb2/0x100 [spl]
[6972591.717190]  ? __kmalloc_node+0x4e/0x140
[6972591.717755]  ? spl_kmem_alloc+0xb2/0x100 [spl]
[6972591.718309]  ? zio_write_compress+0x768/0x9c0 [zfs]
[6972591.718932]  zio_checksum_generate+0x4c/0x70 [zfs]
[6972591.719533]  zio_execute+0x80/0x120 [zfs]
[6972591.720115]  taskq_thread+0x2cc/0x500 [spl]
[6972591.720612]  ? __pfx_default_wake_function+0x10/0x10
[6972591.721091]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[6972591.721630]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[6972591.722080]  kthread+0xdd/0x100
[6972591.722509]  ? __pfx_kthread+0x10/0x10
[6972591.722922]  ret_from_fork+0x29/0x50
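
To compare reports like this one, it can help to reduce the log to the faulting function and the reliable call chain. Below is a minimal Python sketch, assuming the log layout shown above; the names parse_oops, RIP_RE and FRAME_RE are illustrative, and "kernel" is a placeholder chosen here for frames that carry no module tag. Frames prefixed with "?" are skipped because the unwinder marks them as unreliable.

```python
import re

# Excerpt of the log above (RIP line plus part of the call trace).
OOPS = """\
[6972591.690719] RIP: 0010:kfpu_end+0x34/0xa0 [zcommon]
[6972591.703046] Call Trace:
[6972591.703901]  <TASK>
[6972591.710291]  ? kfpu_end+0x34/0xa0 [zcommon]
[6972591.711026]  abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[6972591.711742]  abd_iterate_func.part.0+0xbd/0x1c0 [zfs]
[6972591.712531]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[6972591.713213]  abd_fletcher_4_native+0x7c/0xc0 [zfs]
[6972591.713995]  ? find_busiest_group+0x11d/0x240
[6972591.714645]  zio_checksum_compute+0xc7/0x3f0 [zfs]
[6972591.718932]  zio_checksum_generate+0x4c/0x70 [zfs]
[6972591.719533]  zio_execute+0x80/0x120 [zfs]
[6972591.720115]  taskq_thread+0x2cc/0x500 [spl]
[6972591.722080]  kthread+0xdd/0x100
[6972591.722922]  ret_from_fork+0x29/0x50
"""

# "RIP: <segment>:<symbol>+<offset>/<size> [<module>]"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]+:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)
# "[<timestamp>]  [? ]<symbol>+<offset>/<size> [<module>]"
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?\s*$"
)


def parse_oops(text):
    """Return ((rip_symbol, module), [(symbol, module), ...]) for reliable frames."""
    rip = None
    frames = []
    for line in text.splitlines():
        m = RIP_RE.search(line)
        if m and rip is None:
            rip = (m.group(1), m.group(2) or "kernel")
            continue
        m = FRAME_RE.match(line)
        if m and not m.group(1):  # group(1) is the "?" marker: skip those frames
            frames.append((m.group(2), m.group(3) or "kernel"))
    return rip, frames


rip, frames = parse_oops(OOPS)
print("fault in:", rip)
for symbol, module in frames:
    print(f"  {module:8s} {symbol}")
```

For this report the fault is in kfpu_end (zcommon) and the first reliable caller is abd_fletcher_4_iter (zcommon), reached through zio_checksum_generate on a write-issue thread.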

khain0 commented Sep 18, 2024

The same crash occurred again today, this time in a z_rd_int_1 thread on the checksum-verification path (zio_checksum_error):

[103992.441067] CPU: 79 PID: 2564 Comm: z_rd_int_1 Kdump: loaded Tainted: P           OE  X  -------  ---  5.14.0-427.22.1.el9_4.x86_64 #1
[103992.443048] Hardware name: Dell Inc. PowerEdge XE8640/0TVHHH, BIOS 2.0.3 05/15/2024
[103992.444053] RIP: 0010:kfpu_end+0x34/0xa0 [zcommon]
[103992.445062] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 65 8b 05 4a c2 74 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 19 fb 65 ff 0d 29 c2 74 3f 75 05 0f 1f 44 00 00 48 8b 44 24
[103992.447128] RSP: 0000:ff4809c2396478a0 EFLAGS: 00010046
[103992.448170] RAX: 00000000ffffffff RBX: ff4809c2396479a0 RCX: ff16a4239c857000
[103992.449228] RDX: 00000000ffffffff RSI: ff16a4799fde0000 RDI: ff4809c2396479c0
[103992.450284] RBP: 0000000000020000 R08: ff4809c2396479a0 R09: 0000000000000000
[103992.451163] R10: 0000000000000000 R11: ff16a465cee3f578 R12: ff16a4799fde0000
[103992.451978] R13: 0000000000020000 R14: 0000000000000000 R15: 0000000000000008
[103992.452796] FS:  0000000000000000(0000) GS:ff16a4a17f9c0000(0000) knlGS:0000000000000000
[103992.453627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[103992.454454] CR2: 00007fb1fbc6a560 CR3: 000000afba548004 CR4: 0000000000771ee0
[103992.455229] PKRU: 55555554
[103992.456234] Call Trace:
[103992.457231]  <TASK>
[103992.457985]  ? show_trace_log_lvl+0x1c4/0x2df
[103992.458926]  ? show_trace_log_lvl+0x1c4/0x2df
[103992.459914]  ? abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[103992.460886]  ? __die_body.cold+0x8/0xd
[103992.461829]  ? die_addr+0x39/0x60
[103992.462749]  ? exc_general_protection+0x1aa/0x400
[103992.463614]  ? asm_exc_general_protection+0x22/0x30
[103992.464441]  ? kfpu_end+0x34/0xa0 [zcommon]
[103992.465247]  abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[103992.466032]  abd_iterate_func.part.0+0xbd/0x1c0 [zfs]
[103992.466907]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[103992.467666]  abd_fletcher_4_native+0x7c/0xc0 [zfs]
[103992.468521]  ? update_sg_lb_stats+0x7e/0x450
[103992.469119]  ? blk_mq_start_request+0x34/0x120
[103992.469713]  ? nvme_prep_rq.part.0+0xab/0x110 [nvme]
[103992.470298]  ? nvme_queue_rqs+0x1e7/0x290 [nvme]
[103992.470959]  zio_checksum_error_impl+0xf9/0x640 [zfs]
[103992.471667]  ? __pfx_abd_fletcher_4_native+0x10/0x10 [zfs]
[103992.472362]  ? __blk_flush_plug+0xf1/0x150
[103992.473015]  ? remove_entity_load_avg+0x2e/0x70
[103992.473617]  ? migrate_task_rq_fair+0x14c/0x1d0
[103992.474228]  ? sched_clock+0xc/0x30
[103992.474743]  ? __smp_call_single_queue+0x93/0x120
[103992.475425]  ? ttwu_queue_wakelist+0xf2/0x110
[103992.475978]  ? try_to_wake_up+0x3e2/0x5d0
[103992.476622]  zio_checksum_error+0x64/0xc0 [zfs]
[103992.477363]  vdev_raidz_io_done+0x1b6/0x550 [zfs]
[103992.478090]  zio_vdev_io_done+0x7c/0x220 [zfs]
[103992.478811]  zio_execute+0x80/0x120 [zfs]
[103992.479534]  taskq_thread+0x2cc/0x500 [spl]
[103992.480143]  ? __pfx_default_wake_function+0x10/0x10
[103992.480731]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[103992.481404]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[103992.481976]  kthread+0xdd/0x100
[103992.482520]  ? __pfx_kthread+0x10/0x10
[103992.483028]  ret_from_fork+0x29/0x50
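
Both reports mark the same faulting bytes in the "Code:" line (<0f> c7 19). The Python sketch below decodes just that instruction, assuming the standard x86 ModRM layout and the Intel SDM assignments for the memory forms of opcode 0F C7. The helper names are illustrative, and only the plain [register] addressing form without a REX prefix is handled.

```python
# Bytes around the marker, identical in both reports' "Code:" lines.
CODE_EXCERPT = "b8 ff ff ff ff 89 c2 <0f> c7 19 fb 65 ff 0d"

# Memory-operand forms of the two-byte opcode 0F C7, keyed by the ModRM.reg
# field (Intel SDM encodings; register forms such as RDRAND are not covered).
GROUP_0F_C7_MEM = {
    1: "cmpxchg8b",
    3: "xrstors",
    4: "xsavec",
    5: "xsaves",
    6: "vmptrld",
    7: "vmptrst",
}
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the byte values starting at the <..> marker of a Code: line."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    raise ValueError("no <..> marker found")


def decode_0f_c7(data):
    """Decode '0F C7 /r' with a simple [reg] memory operand."""
    if data[:2] != [0x0F, 0xC7]:
        raise ValueError("not a 0F C7 instruction")
    modrm = data[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):
        raise ValueError("only the plain [reg] form is handled in this sketch")
    return f"{GROUP_0F_C7_MEM[reg]} [{REGS64[rm]}]"


insn = decode_0f_c7(faulting_bytes(CODE_EXCERPT))
print(insn)
```

Under those assumptions the marked instruction decodes to xrstors [rcx]. That fits kfpu_end restoring saved FPU state. In the first report RCX (ff43024862a6b000) equals the address named in the general protection fault line, and the bytes just before the marker load 0xffffffff into EAX and EDX. The decode alone does not explain why the restore faulted.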

rincebrain (Contributor) commented

I think this is #14989. Its workaround is in 2.2.x but has not been backported to any 2.1.x release so far (it's in 2.1.16-staging, but I don't know whether 2.1.16 will ever be released).

If you can't upgrade to 2.2.x, you could try cherry-picking from f288fdb, but upgrading to 2.2.x would probably be the simpler solution.
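
To check a machine against the comment above, here is a small Python sketch that parses version strings in the form quoted in the report (zfs-2.1.15-3, zfs-kmod-2.1.15-3). It assumes, per that comment only, that 2.2.x releases carry the workaround and released 2.1.x versions do not. The function names and the 2.2.6 string in the usage lines are illustrative, and a locally patched 2.1.x build would be misreported.

```python
import re


def parse_zfs_version(line):
    """Parse 'zfs-2.1.15-3' or 'zfs-kmod-2.1.15-3' into (major, minor, patch)."""
    m = re.fullmatch(r"zfs(?:-kmod)?-(\d+)\.(\d+)\.(\d+)(?:-\S+)?", line.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {line!r}")
    return tuple(int(part) for part in m.groups())


def has_workaround(line):
    # Assumption taken from the comment above: present in 2.2.x,
    # absent from every released 2.1.x.
    return parse_zfs_version(line) >= (2, 2, 0)


print(has_workaround("zfs-kmod-2.1.15-3"))  # version from this report
print(has_workaround("zfs-kmod-2.2.6-1"))   # hypothetical 2.2.x string
```

The zfs-kmod line is probably the one that matters, since the crash is in the kernel module rather than in the userland tools.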
