Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order of rows in crosscheck metrics files are non-deterministic #1854

Open
kachulis opened this issue Feb 1, 2023 · 4 comments · Fixed by #1892
Open

Order of rows in crosscheck metrics files are non-deterministic #1854

kachulis opened this issue Feb 1, 2023 · 4 comments · Fixed by #1892

Comments

@kachulis
Copy link
Contributor

kachulis commented Feb 1, 2023

The order of rows output by CrosscheckFingerprints seems to change from run to run on the same input data. Not sure if this is new behavior or has always been the case.

This isn't a massive problem, but would be nice for the order to always be the same.

@lbergelson
Copy link
Member

As always I suspect HashMap/Set

@serge2016
Copy link

Is it possible to fix it?

@tmelman tmelman linked a pull request Jun 13, 2023 that will close this issue
5 tasks
droazen pushed a commit that referenced this issue Nov 14, 2023
…tively for a more stable result (#1892)

First step towards fixing #1854
@droazen
Copy link
Contributor

droazen commented Nov 14, 2023

Reopening, as this was not completely fixed by #1892

@droazen droazen reopened this Nov 14, 2023
@droazen
Copy link
Contributor

droazen commented Nov 14, 2023

Remaining TODOs (courtesy of @lbergelson ):

I looked at this a bit more, it seems like there is instability built in at the start since FingerPrintChecker.fingerprintFiles() builds an unsorted map from the output of a parallel and unpredictable process. I think it's going to require a more complicated change than just changing the other sets to linked sets. Either change the inputs to return their results in order, or sort the maps according to some criteria.

I think we either need to change it so the input maps are built as the result of Futures instead of being added to the concurrent map as the are built. The errors are actually handled this way so it wouldn't be that hard to change.

The other option would be to pick an arbitrary sort order and sort the input maps after they're built and then maintain that order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants