rh_kselftests_vm: kernel selftests execution in guest #4114

Merged
merged 1 commit into from
Sep 29, 2024


mcasquer
Contributor

@mcasquer mcasquer commented Jul 23, 2024

rh_kselftests_vm: kernel selftests execution in guest

Creates a new test case that executes the kernel selftests
inside the VM using the RPM that has been previously downloaded
and installed. It could be expanded with more tests in the future.

Signed-off-by: mcasquer [email protected]
ID: 2637

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch from f024df5 to fb7b441 on July 23, 2024 07:41
@mcasquer
Contributor Author

 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.vm_hugetlb_selftests.q35: STARTED
 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.vm_hugetlb_selftests.q35: PASS (171.86 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@mcasquer mcasquer marked this pull request as ready for review July 23, 2024 07:58
@mcasquer
Contributor Author

@zhenyzha @MiriamDeng @fbq815 @zhencliu @YongxueHong please could you review this PR? Thanks !

@mcasquer
Contributor Author

mcasquer commented Aug 7, 2024

@zhenyzha @MiriamDeng @fbq815 @zhencliu @YongxueHong this is a kind reminder: could you please review this PR? Thanks!

@zhenyzha
Contributor

zhenyzha commented Aug 8, 2024

@mcasquer Could you provide the test results of rhel.10? I have not been able to compile successfully on 10 recently.
qemu.vm_hugetlb_selftests.arm64-pci: FAIL: Error during mm selftests compilation: <sys/capability.h>

@mcasquer
Contributor Author

mcasquer commented Aug 8, 2024

> @mcasquer Could you provide the test results of rhel.10? I have not been able to compile successfully on 10 recently. qemu.vm_hugetlb_selftests.arm64-pci: FAIL: Error during mm selftests compilation: <sys/capability.h>

Indeed, good catch !

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch 3 times, most recently from 4aeb981 to f76de3b on August 13, 2024 07:09
@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch from f76de3b to 370b56e on August 16, 2024 06:35
@yanan-fu
Contributor

Hi @mcasquer, I had a quick look at the kernel CKI testing for the mm part: any code change matching the following trigger sources triggers the test automatically:

trigger_sources:
  - tools/testings/selftests/.*
  - mm/.*
  - arch/x86/mm/.*
  - arch/s390/mm/.*
  - arch/arm64/mm/.*
  - arch/powerpc/mm.*
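The trigger sources are regular expressions over the paths touched by a change. A minimal sketch of how such a list could be evaluated, assuming each pattern has to match the whole path; that matching rule is an assumption, and kpet's real implementation may differ:

```python
import re

# Patterns copied verbatim from the trigger_sources list above.
TRIGGER_SOURCES = [
    r"tools/testings/selftests/.*",
    r"mm/.*",
    r"arch/x86/mm/.*",
    r"arch/s390/mm/.*",
    r"arch/arm64/mm/.*",
    r"arch/powerpc/mm.*",
]


def is_triggered(changed_files, patterns=TRIGGER_SOURCES):
    """Return True if any changed path fully matches any trigger pattern.

    Assumption: full-match semantics; the real kpet-db matcher may differ.
    """
    compiled = [re.compile(p) for p in patterns]
    return any(rx.fullmatch(path) for path in changed_files for rx in compiled)
```

For example, a patch touching only mm/hugetlb.c would trigger the mm selftests under this model, while one touching only drivers/ would not.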

hugetlb is covered by:
https://gitlab.com/redhat/centos-stream/tests/kernel/kpet-db/-/blob/main/cases/selftests/kselftests/mm.yaml?ref_type=heads#L64

I would like to double-check: does this meet your requirements, or is it a must to check it in a qemu-kvm based VM OS, which is the purpose of this patch?
Thanks

@mcasquer
Contributor Author

> I would like to double confirm does this meet your requirements, or it is must to check it in a qemu-kvm based VM os which is the purpose of this patch ? Thanks

@yanan-fu the idea is to execute those tests inside a VM backed by hugepages. My understanding is that the kernel team is not covering this scenario, which is why we need this test case.

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch from 370b56e to 061d464 on August 26, 2024 10:39
@zhencliu
Contributor

> @yanan-fu the idea is to execute those tests inside a VM backed by hugepages, my understanding is the kernel team is not covering this scenario so that's why we need the test case

Hi @mcasquer, I am curious: do we need to cover different page sizes? Or is the page size part of the test matrix?

@mcasquer
Contributor Author

> Hi @mcasquer , I am curious do we need to cover different page size? Or the page size is a part of test matrix?

@zhencliu not really; at least for x86_64, 2MB hugepages are fine.
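To confirm which default hugepage size a guest actually uses, one option is to parse the Hugepagesize field of /proc/meminfo (2048 kB on an x86_64 guest with default settings). A minimal sketch; the function name is hypothetical, and the text could be captured with something like session.cmd_output("cat /proc/meminfo"):

```python
import re


def default_hugepage_size_kb(meminfo_text):
    """Return the default hugepage size in kB from /proc/meminfo text.

    Raises ValueError if the Hugepagesize line is missing.
    """
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("Hugepagesize not found in meminfo output")
    return int(match.group(1))
```

On a Linux host the same function can be exercised with open("/proc/meminfo").read().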

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch 3 times, most recently from edc0751 to 68c1f57 on September 2, 2024 13:56
@mcasquer
Contributor Author

mcasquer commented Sep 2, 2024

@YongxueHong @zhencliu could you review this PR again? Thanks!

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch 4 times, most recently from 9d7799f to 083e5ca on September 3, 2024 07:22
@mcasquer
Contributor Author

mcasquer commented Sep 3, 2024

@zhenyzha @fbq815 do you think this test case can be supported on your corresponding archs? Please have a look at the issue and at the internal patch as well.

@mcasquer
Contributor Author

mcasquer commented Sep 3, 2024

> @zhenyzha @fbq815 do you consider this test case can be supported in you corresponding archs? Please have a look to the issue and to the internal patch as well

That way I will remove the "only x86_64" filter from the cfg.

@mcasquer
Contributor Author

@yanan-fu @PaulYuuu could you review this PR? Thanks !

Contributor

@fbq815 fbq815 left a comment

Based on the test results above, LGTM.

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch 2 times, most recently from ac5e452 to 7906012 on September 25, 2024 05:34
@mcasquer
Contributor Author

@PaulYuuu @yanan-fu sorry for pinging you again, but it would be ideal to merge this PR before the end of the month, thanks !

test.fail("Error during selftests execution: %s" % o)

test.log.info("The selftests results: %s" % o)
error_context.context("Cleaning kernel files", test.log.debug)
Contributor

This should go in the finally section of a try block, so that whether or not the test case fails, we clean up the downloaded RPM (and also uninstall it?).
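A minimal sketch of the suggested structure, assuming an aexpect-like session that provides cmd_status_output(cmd, timeout) and cmd(cmd). The function name, the cleanup_cmd argument (for example a command removing kernel_path and, if desired, uninstalling the RPM) and the RuntimeError standing in for test.fail() are all hypothetical:

```python
def run_selftests_with_cleanup(session, execution_cmd, cleanup_cmd, log, timeout=180):
    """Run the selftests, then always run the cleanup command.

    session is assumed to behave like an aexpect shell session; RuntimeError
    is an illustrative stand-in for test.fail().
    """
    try:
        status, output = session.cmd_status_output(execution_cmd, timeout)
        log.info("The selftests results: %s", output)
        if status != 0:
            raise RuntimeError("Error during selftests execution")
        return output
    finally:
        # Executed on success, on failure and on unexpected exceptions alike.
        log.debug("Cleaning kernel files")
        session.cmd(cleanup_cmd)
```

Because the cleanup sits in finally, it runs even when the failure path raises, which is the point of the review comment.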

Contributor

Or use clone_master = yes to skip the cleanup step.

Contributor Author

@PaulYuuu done !

@yanan-fu
Contributor

> Following @yanan-fu suggestion I took qemu_guest_agent as a reference please take a look and let me know if this is the idea you have in mind FYI @zhencliu

Thanks Mario, I'm afraid that's not exactly it. If we follow qemu_guest_agent, we introduce a class abstraction that encapsulates everything, i.e. setting up and cleaning up the kselftest RPM package, and running the specific test simply through an execute() member function. Currently, the class you introduced only does part of the pre-test work: setting the Cartesian params in the .py file. We actually don't recommend changing the global params at run time.
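A rough sketch of the class abstraction described here, modeled on the idea in the comment rather than on the actual qemu_guest_agent code. The class name, the constructor arguments and the aexpect-like session methods (cmd, cmd_status_output) are assumptions:

```python
class KernelSelftests:
    """Hypothetical wrapper around one kselftests variant.

    setup() prepares the guest (e.g. installs the kselftests RPM), execute()
    runs the variant's tests and cleanup() undoes setup(). The commands are
    passed in instead of being written into the global params at run time.
    """

    def __init__(self, session, setup_cmd, execution_cmd, cleanup_cmd, timeout=180):
        self.session = session
        self.setup_cmd = setup_cmd
        self.execution_cmd = execution_cmd
        self.cleanup_cmd = cleanup_cmd
        self.timeout = timeout

    def setup(self):
        self.session.cmd(self.setup_cmd, timeout=self.timeout)

    def execute(self):
        # Returns (status, output) so the caller decides on PASS/FAIL.
        return self.session.cmd_status_output(self.execution_cmd, self.timeout)

    def cleanup(self):
        self.session.cmd(self.cleanup_cmd)

    def run(self):
        # If setup() itself fails, there is nothing to clean up yet.
        self.setup()
        try:
            return self.execute()
        finally:
            self.cleanup()
```

With this shape, supporting a new variant means building a new instance (or subclass) instead of adding branches to the test's main function.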
My previous idea is a simpler approach: e.g. we define selftests_function = 'test_vm_kernel_mm' for the 'mm' variant, then define a 'test_vm_kernel_mm' function in the .py file, so function_obj is locals()[params["selftests_function"]]. If we add a new variant 'nn' in the future, with selftests_function = 'test_vm_kernel_nn', we define a 'test_vm_kernel_nn' function to run that test.
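The name-based dispatch can be sketched as follows. Here run() takes only params to keep the sketch self-contained (an avocado-vt test entry point also receives test and env), and the function bodies are placeholders:

```python
def run(params):
    """Dispatch to the per-variant function named by params["selftests_function"]."""

    def test_vm_kernel_mm():
        # Placeholder for the mm variant's real test logic.
        return "mm selftests executed"

    def test_vm_kernel_nn():
        # Placeholder for a future 'nn' variant.
        return "nn selftests executed"

    # locals() here contains params and the two nested functions above.
    function_obj = locals()[params["selftests_function"]]
    return function_obj()
```

An explicit dict mapping names to functions would behave the same way while not exposing unrelated local names such as params.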
But as we discussed previously, tests_execution_cmd is a simpler approach; if no pre-test task needs to be performed, we are OK with your previous solution. So what about using your last commit, given the deadline? You could consider the qemu_guest_agent solution when you have time.

@zhencliu OK, I see. Yes, it's better to go back to the tests_execution_cmd approach for now, thanks!

I just saw this version now.
It is not qga style!!! Maybe I did not make it clear enough: the key point is gagent_check_type, which leads to an individual function for each test case (variant). The solution is what @zhencliu mentioned above.

Is there any blocker that forces us to give it up now? It's just a code structure change.

s, o = session.cmd_status_output(tests_execution_cmd, 180)

# Exit code for skipped selftests is 4, raise a warning until it is fixed
if s == 4:
Contributor

I do not think this is an issue that needs to be fixed in kernel selftests; it is intentional.

From checking the source code: if there is a skipped case after a failed case, the return value is 4. That is wrong (not in the scope of this PR, but of kselftest).

I suggest using a whitelist for the known tests that can be skipped, and parsing the test output to make the final decision on the test case result.

Contributor Author

@yfu but then how will the user know that there are skipped tests? If we mark skipped tests as passed, the user may not be aware of them, right?

Contributor

@PaulYuuu PaulYuuu Sep 25, 2024

We are talking about exitcode in this thread.

run_test() {
	if test_selected ${CATEGORY}; then
		# On memory constrainted systems some tests can fail to allocate hugepages.
		# perform some cleanup before the test for a higher success rate.
		if [ ${CATEGORY} == "thp" ] | [ ${CATEGORY} == "hugetlb" ]; then
			echo 3 > /proc/sys/vm/drop_caches
			sleep 2
			echo 1 > /proc/sys/vm/compact_memory
			sleep 2
		fi

		local test=$(pretty_name "$*")
		local title="running $*"
		local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
		printf "%s\n%s\n%s\n" "$sep" "$title" "$sep" | tap_prefix

		("$@" 2>&1) | tap_prefix
		local ret=${PIPESTATUS[0]}
		count_total=$(( count_total + 1 ))
		if [ $ret -eq 0 ]; then
			count_pass=$(( count_pass + 1 ))
			echo "[PASS]" | tap_prefix
			echo "ok ${count_total} ${test}" | tap_output
		elif [ $ret -eq $ksft_skip ]; then
			count_skip=$(( count_skip + 1 ))
			echo "[SKIP]" | tap_prefix
			echo "ok ${count_total} ${test} # SKIP" | tap_output
			exitcode=$ksft_skip
		else
			count_fail=$(( count_fail + 1 ))
			echo "[FAIL]" | tap_prefix
			echo "not ok ${count_total} ${test} # exit=$ret" | tap_output
			exitcode=1
		fi
	fi # test_selected
}

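In the loop, the final exit code for two consecutive tests is (columns: result of test 1, rows: result of test 2):

             1: PASS   1: SKIP   1: FAIL
  2: PASS       0         4         1
  2: SKIP       4         4         4
  2: FAIL       1         1         1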
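The table can be reproduced with a small model of the exitcode updates in the quoted run_test(). This is a sketch that assumes exitcode starts at 0 and ignores everything except the PASS/SKIP/FAIL branches:

```python
KSFT_SKIP = 4  # ksft_skip, the kselftest "skipped" exit code


def final_exitcode(results):
    """Mimic how run_test() updates exitcode for a sequence of results.

    PASS leaves exitcode unchanged, SKIP sets it to 4 and FAIL sets it to 1,
    so the last non-PASS result wins, including FAIL followed by SKIP.
    """
    exitcode = 0
    for result in results:
        if result == "SKIP":
            exitcode = KSFT_SKIP
        elif result == "FAIL":
            exitcode = 1
    return exitcode


if __name__ == "__main__":
    outcomes = ("PASS", "SKIP", "FAIL")
    # One row per result of test 2, one column per result of test 1.
    for second in outcomes:
        print(second, [final_exitcode([first, second]) for first in outcomes])
```

The FAIL then SKIP case returning 4 is exactly the ordering problem discussed below.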

The scenarios we need to agree on are SKIP --> FAIL (exit 1) and FAIL --> SKIP (exit 4). I think FAIL --> SKIP must return 1, but that needs to be changed in the Linux source code; for now, I don't think we have a good way to handle it. A workaround is to check the output and count the [PASS], [SKIP] and [FAIL] markers rather than checking the exit code. Keeping if s == 4: in this version is meant to safely raise a warning if new mm test cases are added in the future.
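A minimal sketch of that workaround, counting the markers instead of trusting the exit code. It assumes each [PASS]/[SKIP]/[FAIL] marker sits on its own line, optionally behind the "# " prefix that tap_prefix is assumed to add; the function name is hypothetical:

```python
import re

# One marker per line, optionally prefixed with "#" and whitespace.
MARKER = re.compile(r"(?:#\s*)?\[(PASS|SKIP|FAIL)\]")


def count_results(output):
    """Count the [PASS], [SKIP] and [FAIL] markers printed by run_test()."""
    counts = {"PASS": 0, "SKIP": 0, "FAIL": 0}
    for line in output.splitlines():
        # strip() also drops a trailing "\r" from serial or ssh output.
        match = MARKER.fullmatch(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Unlike the exit code, these counts do not depend on the order in which failures and skips occur.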

Contributor

> @yfu but then how will the user know that there are skipped tests? If we set the skipped as passed it's possible the user won't be aware, right?

Any skipped case that is not in the whitelist should fail this auto case. Checking the output is needed.

  1. Using the exit code is incorrect if a skipped case comes after a failed one.
  2. If the skipped case is in the whitelist, I prefer to mark the test case status as PASS instead of WARN, as it is a known issue.
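Those two rules can be expressed as a small decision function. This is a sketch only: the function name and the (verdict, unexpected_skips) return shape are illustrative, and the skipped and failed test names are assumed to have been extracted from the output already:

```python
def evaluate(skipped_tests, failed_tests, whitelist):
    """Return (verdict, unexpected_skips) following the rules above.

    - any failed test fails the case;
    - a skipped test that is not whitelisted fails the case;
    - skipped tests that are all whitelisted count as PASS, not WARN.
    """
    unexpected_skips = sorted(set(skipped_tests) - set(whitelist))
    if failed_tests or unexpected_skips:
        return "FAIL", unexpected_skips
    return "PASS", unexpected_skips
```

Returning the unexpected skips lets the caller put their names in a short failure message.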

Contributor Author

Added the whitelist approach

@zhencliu
Contributor

> Just some code structure change.

Hi @yanan-fu, time is limited. As we discussed before, tests_execution_cmd is OK for the current testing; we may have to enhance it when we need to do some pre-test work or when the execution differs from the mm test. So what about keeping the current code and making the change once we have to?

@yanan-fu
Contributor

> hi @yanan-fu , time is limited, as we talked before, tests_execution_cmd is OK for the current testing, and we may have to enhance it when we need to do some pre-test work or the execution is different from mm test, so what about keeping the current code, and make the change once we have to

okay, then let's focus on the result parser now, cc @mcasquer

@mcasquer
Contributor Author

Results in x86_64

 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: STARTED
 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: PASS (213.96 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Results in s390x (now marked as passed)

 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: STARTED
 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: PASS (101.15 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@mcasquer
Contributor Author

@yanan-fu @zhencliu @PaulYuuu could you review this PR again? Thanks!

kvm_module_parameters = 'hpage=1'
setup_hugepages = yes
tests_execution_cmd = "cd ${kselftests_path}/mm && sh run_vmtests.sh -t hugetlb"
whitelist = "hugetlb_fault_after_madv"
Contributor

Move this into the s390x: block, since this case is skipped on s390x only, if I remember correctly.

Contributor Author

Done !

session = vm.wait_for_login()
kernel_path = params.get("kernel_path", "/tmp/kernel")
tests_execution_cmd = params.get("tests_execution_cmd")
whitelist = params.get("whitelist").split()
Contributor

In line with the comment above, only s390x currently has a known skipped case, so:
whitelist = params.get("whitelist", "").split() or whitelist = params.objects("whitelist")
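Why the default matters, as a small sketch with a plain dict standing in for the Cartesian params (params.objects() in avocado-vt performs, as far as I know, the same whitespace split with an empty default). The helper name is hypothetical:

```python
def get_whitelist(params):
    """Read the space-separated whitelist, tolerating a missing key.

    Without the "" default, params.get("whitelist") returns None on every
    arch that does not define it, and None.split() raises AttributeError.
    """
    return params.get("whitelist", "").split()
```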

Contributor Author

Done !

skipped_tests = True if "[SKIP]" in o else False
test.log.debug("Skipped tests: %r" % skipped_tests)
for test_name in whitelist:
    if skipped_tests and test_name in o:
Contributor

test_name is always in o, whether its status is pass, skip or fail.
You need to parse the output and get the skipped case names.
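A sketch of extracting the skipped test names, based on the TAP line echoed by the quoted run_test() for skips (`ok ${count_total} ${test} # SKIP`); the exact spacing of real output and the function name are assumptions:

```python
import re

# Matches TAP result lines such as "ok 3 hugetlb_fault_after_madv # SKIP".
SKIP_LINE = re.compile(r"^ok\s+\d+\s+(.+?)\s+#\s+SKIP\s*$")


def skipped_test_names(output):
    """Return the names of the tests reported as skipped in the TAP output."""
    names = []
    for line in output.splitlines():
        match = SKIP_LINE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names
```

Lines for passed ("ok N name") and failed ("not ok N name # exit=...") tests do not match, so a whitelisted name that merely appears in the output is no longer mistaken for a skip.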

Contributor Author

Done !

@mcasquer mcasquer force-pushed the 2637_hugetlb_kernel_selftests branch 3 times, most recently from cc64e54 to b851155 on September 26, 2024 07:38
@mcasquer
Contributor Author

Results in x86_64

 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: STARTED
 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: PASS (228.19 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Results in s390x

 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: STARTED
 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: PASS (102.73 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Comment on lines 53 to 58
if len(skipped_list) == num_skipped_tests:
    return True
elif len(skipped_list) < num_skipped_tests:
    raise exceptions.TestWarn("Some skipped test(s) are not in the whitelist")
if s != 0:
    test.fail("Error during selftests execution: %s" % o)
Contributor

Almost LGTM, except here.

If we have 3 test cases (1 PASS, 1 SKIP, 1 FAIL) and the skipped one is on the whitelist, the current logic still returns True, even though PASS + SKIP != 3.
I think the summary line, e.g. SUMMARY: PASS=9 SKIP=1 FAIL=0, can help decide how to handle it.
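A minimal sketch of parsing that summary line into counts, assuming the format shown in the example above; the function name is hypothetical:

```python
import re

SUMMARY = re.compile(r"SUMMARY:\s*PASS=(\d+)\s+SKIP=(\d+)\s+FAIL=(\d+)")


def parse_summary(output):
    """Parse a "# SUMMARY: PASS=9 SKIP=1 FAIL=0" line into a dict of ints.

    Raises ValueError if no summary line is present.
    """
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("No SUMMARY line found in selftests output")
    passed, skipped, failed = (int(g) for g in match.groups())
    return {"PASS": passed, "SKIP": skipped, "FAIL": failed}
```

With the 1 PASS / 1 SKIP / 1 FAIL example above, the parsed FAIL count is 1, so the case can be failed before the whitelist is even consulted.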

Contributor Author

@PaulYuuu updated: failed cases are considered first, then skipped ones, so nothing should be missed now.

Contributor

@PaulYuuu PaulYuuu left a comment

@yanan-fu I am okay with this version, how about you?

test.log.info("The selftests results: %s" % o)

summary = re.findall(r"\# SUMMARY.+", o)
num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
Contributor

Suggested change
- num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
+ num_failed_tests = int(re.findall(r"FAIL\=\d+", str(summary))[0].split('=')[1])

This allows for counts of 10 or more; the same applies to the skipped test count.
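A quick illustration of the difference, using a hypothetical summary with a two-digit failure count:

```python
import re

summary = "# SUMMARY: PASS=9 SKIP=0 FAIL=12"

# \d consumes a single digit, so a two-digit count is silently truncated.
one_digit = int(re.findall(r"FAIL\=\d", summary)[0].split("=")[1])
# \d+ consumes all consecutive digits.
all_digits = int(re.findall(r"FAIL\=\d+", summary)[0].split("=")[1])

print(one_digit, all_digits)  # prints: 1 12
```

For the skipped count the truncation is more dangerous: SKIP=10 would be read as 1.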

Contributor Author

Updated !

test.log.debug("Number of failed tests: %d" % num_failed_tests)

if num_failed_tests != 0:
    test.fail("Error during selftests execution: %s" % o)
Contributor

You already log the output above, so let's simplify the error message; the current one is too long for the test status.

Contributor Author

Updated !


summary = re.findall(r"\# SUMMARY.+", o)
num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
test.log.debug("Number of failed tests: %d" % num_failed_tests)
Contributor

Suggested change
- test.log.debug("Number of failed tests: %d" % num_failed_tests)
+ test.log.debug("Number of failed tests: %d", num_failed_tests)

A small suggestion: with this form, the logging module does the formatting itself, and only if the record is actually emitted. See the logging documentation for details.
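A small sketch of the difference between the two forms; the logger name and the value 3 are arbitrary, and makeRecord() is used only to show what logging stores internally:

```python
import logging

log = logging.getLogger("rh_kselftests_vm")
log.setLevel(logging.INFO)

num_failed_tests = 3

# Eager form: "%" builds the string before debug() is called, even though
# this logger discards DEBUG records.
log.debug("Number of failed tests: %d" % num_failed_tests)

# Lazy form: the template and the arguments are passed separately and are
# only combined if a record is created and formatted.
log.debug("Number of failed tests: %d", num_failed_tests)

# What logging does internally: the record keeps msg and args apart and
# getMessage() performs the % substitution on demand.
record = log.makeRecord(log.name, logging.DEBUG, "example.py", 0,
                        "Number of failed tests: %d", (num_failed_tests,), None)
message = record.getMessage()
print(message)  # prints: Number of failed tests: 3
```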

Contributor Author

Updated !

@mcasquer
Contributor Author

Results in x86_64

 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: STARTED
 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.rh_kselftests_vm.mm.q35: PASS (198.72 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Results in s390x

 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: STARTED
 (1/1) Host_RHEL.m9.u5.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.5.0.s390x.io-github-autotest-qemu.rh_kselftests_vm.mm.s390-virtio: PASS (102.31 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@YongxueHong YongxueHong merged commit 2b6aff3 into autotest:master Sep 29, 2024
7 checks passed
8 participants