
Same test executed multiple times in parallel on different processes and some of them not at all #1242

Open
bartam1 opened this issue Jul 26, 2023 · 11 comments



bartam1 commented Jul 26, 2023

Dear Community!

I'm working on an E2E test using Ginkgo.
I'm doing the same thing @onsi advised for running tests with different parameters in parallel (#561 (comment)).

Currently my setup looks like this:

Describe("root", func() {
    Describe(fmt.Sprintf("PID: %d:", GinkgoParallelProcess()), Ordered, func() {
        Describe(test, Ordered, func() {
            It(...)
            It(...)
            It(...)
        })
    })
    Describe(fmt.Sprintf("PID: %d:", GinkgoParallelProcess()), Ordered, func() {
        Describe(test, Ordered, func() {
            It(...)
            It(...)
            It(...)
        })
    })
    ...
})

When I run this serially everything is fine, but when I run in parallel I get different test results every time.

  • Sometimes more tests run than expected, sometimes fewer
  • Sometimes the same test runs multiple times while others are skipped

In the following example I run 5 specs:
MockTest1 has MockTest1-1 and MockTest1-2
MockTest2 has MockTest2-1, MockTest2-2 and MockTest2-3

Expected (what we get in serial; all tests executed in order):

Will run 5 of 5 specs
------------------------------
 PID: 1: MockTest1(testContextName3) MockTest1-1 [clusterID:clusterID3, testID:9112f3d1ae]
• [1.002 seconds]
------------------------------
 PID: 1: MockTest1(testContextName3) MockTest1-2 [clusterID:clusterID3, testID:9112f3d1ae]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-1 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-2 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
 PID: 1: MockTest2(testContextName2) MockTest2-3 [clusterID:clusterID2, testID:005f598d5b]
• [1.001 seconds]
------------------------------
Ran 5 of 5 Specs in 5.009 seconds
SUCCESS! -- 5 Passed | 0 Failed | 0 Pending | 0 Skipped

Actual (what we get in parallel):

Will run 5 of 5 specs
Running in parallel across 7 processes
------------------------------
• [1.001 seconds]
 PID: 4: MockTest1(testContextName3) MockTest1-1 [clusterID:clusterID3, testID:9112f3d1ae]
------------------------------
• [1.001 seconds]
 PID: 3: MockTest1(testContextName1) MockTest1-1 [clusterID:clusterID1, testID:0aa3b528cc]
------------------------------
• [1.001 seconds]
 PID: 3: MockTest1(testContextName1) MockTest1-2 [clusterID:clusterID1, testID:0aa3b528cc]
------------------------------
• [1.001 seconds]
 PID: 4: MockTest1(testContextName3) MockTest1-2 [clusterID:clusterID3, testID:9112f3d1ae]
------------------------------
Ran 4 of 5 Specs in 2.015 seconds
SUCCESS! -- 4 Passed | 0 Failed | 0 Pending | 0 Skipped
  • We can see that 4 of 5 specs executed and 4 passed (nothing is reported as skipped or failed -- something is buggy)
  • MockTest2's specs were silently skipped

This is another buggy run, from a second execution:

Will run 5 of 5 specs
Running in parallel across 7 processes
------------------------------
• [1.001 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-1 [clusterID:clusterID1, testID:4050bf3235]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-1 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.001 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-2 [clusterID:clusterID1, testID:4050bf3235]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-2 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.001 seconds]
 PID: 1: MockTest2(testContextName2) MockTest2-3 [clusterID:clusterID2, testID:005f598d5b]
------------------------------
• [1.002 seconds]
 PID: 5: MockTest2(testContextName1) MockTest2-3 [clusterID:clusterID1, testID:4050bf3235]
------------------------------

Ran 6 of 5 Specs in 3.020 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 0 Skipped
  • We can see "Ran 6 of 5 Specs"?
  • MockTest1 was skipped

Can you please give me some explanation of why this happens?
I cannot figure it out :(


onsi commented Jul 26, 2023

hey @bartam1 is there any chance you can share the actual code so I can see how the clusterId and testId parameters are being injected - and how the test is being constructed? Sometimes what you are seeing can happen if different test processes construct different trees (Ginkgo's parallelisation model requires that the constructed tree is 100% consistent across test runners.)


bartam1 commented Jul 26, 2023

Hello @onsi! Thank you for the fast reply!

This is the PR I'm working on:
banzaicloud/koperator#1020
Important parts: https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L358
https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L92
It is currently configured to produce the same output I described in the issue.


bartam1 commented Jul 26, 2023

> hey @bartam1 is there any chance you can share the actual code so I can see how the clusterId and testId parameters are being injected - and how the test is being constructed? Sometimes what you are seeing can happen if different test processes construct different trees (Ginkgo's parallelisation model requires that the constructed tree is 100% consistent across test runners.)

This could be the problem. I'm looking into it.
Thank you for the clue!


onsi commented Jul 26, 2023

hey there - i think the issue might be on lines 92-95 of tests.go. the closure captures tests, which will be changed as you loop through testsByClusterID. you'll want tests := tests between lines 92 and 93. give that a try.
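A minimal sketch of the capture pattern described above, with illustrative names (`collectLens`, `groups`) rather than the actual tests.go code. Before Go 1.22, a `range` variable was a single variable reused across iterations, so every closure built in the loop observed its final value unless you shadowed it:

```go
package main

import "fmt"

// collectLens builds one closure per test group and returns what each
// closure reports. The `tests := tests` line is the suggested fix: it
// shadows the range variable so every closure captures its own copy.
// (Go 1.22 made range variables per-iteration, so the shadowing is only
// required on earlier Go versions, such as those current in 2023.)
func collectLens(groups [][]string) []int {
	var closures []func() int
	for _, tests := range groups {
		tests := tests // shadow the loop variable; without this (pre-1.22), all closures see the last group
		closures = append(closures, func() int { return len(tests) })
	}
	var out []int
	for _, f := range closures {
		out = append(out, f())
	}
	return out
}

func main() {
	groups := [][]string{
		{"MockTest1-1", "MockTest1-2"},
		{"MockTest2-1", "MockTest2-2", "MockTest2-3"},
	}
	fmt.Println(collectLens(groups)) // [2 3]
}
```

Without the shadowing line (on pre-1.22 Go), both closures would report the length of the last group, which is exactly the kind of cross-contamination that makes each process build a different spec tree.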


bartam1 commented Jul 26, 2023

> hey there - i think the issue might be on lines 92-95 of tests.go. the closure captures tests, which will be changed as you loop through testsByClusterID. you'll want tests := tests between lines 92 and 93. give that a try.

Thanks, that one helped.
I also had a random mix between runs which caused spec-tree inconsistency. (https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L175)
I could still reproduce the issue, probably because iterating over map keys yields a different order in each run, so a different tree can be constructed.
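One way to make the tree construction deterministic, sketched here with illustrative names (`orderedClusterIDs` and the data shape are not taken from tests.go): collect the map keys and sort them before iterating, since Go deliberately randomizes map iteration order.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedClusterIDs returns the map's keys in sorted order, so every
// parallel process walks the map in the same order and builds the
// same spec tree.
func orderedClusterIDs(testsByClusterID map[string][]string) []string {
	ids := make([]string, 0, len(testsByClusterID))
	for id := range testsByClusterID {
		ids = append(ids, id)
	}
	sort.Strings(ids) // Go map iteration order is intentionally randomized
	return ids
}

func main() {
	testsByClusterID := map[string][]string{
		"clusterID2": {"MockTest2-1"},
		"clusterID1": {"MockTest1-1"},
	}
	for _, id := range orderedClusterIDs(testsByClusterID) {
		fmt.Println(id, testsByClusterID[id])
	}
	// prints clusterID1 first, then clusterID2, on every run
}
```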


onsi commented Jul 26, 2023

hey - that makes sense. the map issue should have been fixed in v2.7.0 - since that release ginkgo breaks ties for specs in the same location using the spec name. but you do need to make sure the spec names are identical between processes (eg that random test id you are generating needs to be the same for each process).
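One hedged way to keep a "random-looking" test ID identical across processes, assuming the ID only needs to be stable rather than truly random: derive it from stable inputs with a hash instead of an RNG. `testID` and its inputs are illustrative, not the koperator implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// testID derives a short, stable identifier from the cluster ID and
// test name. Because it uses no randomness, every parallel process
// computes the same ID for the same spec, so spec names (and therefore
// Ginkgo's tie-breaking by name) stay consistent.
func testID(clusterID, testName string) string {
	sum := sha256.Sum256([]byte(clusterID + "/" + testName))
	return hex.EncodeToString(sum[:])[:10]
}

func main() {
	fmt.Println(testID("clusterID1", "MockTest1") == testID("clusterID1", "MockTest1")) // true
}
```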


bartam1 commented Jul 26, 2023

> hey - that makes sense. the map issue should have been fixed in v2.7.0 - since that release ginkgo breaks ties for specs in the same location using the spec name. but you do need to make sure the spec names are identical between processes (eg that random test id you are generating needs to be the same for each process).

testID is fixed now.
Could this be a problem: https://github.com/banzaicloud/koperator/blob/61a0815f099ae7409b828eba2ba795acca119958/tests/e2e/pkg/tests/tests.go#L92 ?
Here I use GinkgoParallelProcess() in each Describe container name, so the name will be different in each process.


onsi commented Jul 26, 2023

oh yes good catch - that’ll be a problem too. sorry this is so confusing :( there’s not a great way to ensure consistent order when a loop like this is in play.

you can attach the process index as metadata via AddReportEntry or as a Label if you’d like


onsi commented Jul 26, 2023

i really should add a checksum to validate that all trees are the same and give the user feedback. i’ll add that to the backlog.


bartam1 commented Jul 27, 2023

Thank you for the help! It looks like it works fine so far.
I could fix every problem with your guidance.
Thank you for the fast replies!


bartam1 commented Jul 27, 2023

> i really should add a checksum to validate that all trees are the same and give the user feedback. i’ll add that to the backlog.

It is a good idea. Please let me know when the PR is up :)
