-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvement(setup): improve caching of local tarballs #557
improvement(setup): improve caching of local tarballs #557
Conversation
22be9ac
to
2c32b7a
Compare
ccmlib/utils/download.py
Outdated
@@ -172,7 +172,8 @@ def get_url_hash(url: str) -> str: | |||
""" | |||
|
|||
if os.path.exists(url): # if file/dir is local, hash based on the path | |||
return hashlib.md5(url).hexdigest() | |||
# Strings must be encoded before hashing | |||
return hashlib.md5(url.encode('utf-8')).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might worth adding a unit test for this case (we have test for downloading, but not for using local file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
2d5f7dd
to
262b1ed
Compare
9823d09
to
64970e2
Compare
ccmlib/common.py
Outdated
# get hash from file | ||
for line in lines: | ||
if line.startswith("hash="): | ||
current_hash = line.replace("hash=", "").strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that the source_file
does not contain a line of "hash=.."? is it possible that it contains multiple lines of "hash=.."? if both answers are "no", i'd suggest put something like:
if not source_file.exists():
return ""
with open(source_file, 'r') as f:
for line in f:
if line.startswith("hash="):
return line.replace("hash=", "").strip()
throw ArgumentError(f"{source_file} does not contains the hash")
less indent this way. and less tolerant for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few lines with "hash" is not possible because we replace whole file and not add the lines.
No "hash": it is possible. Due to this I added the comment:
# Current hash may be None. It may be due to uncompleted downloading and installation.
# Or because of it is old installation that was not saved the hash yet.
# In any case the new download and installation should be performed.
# For this goal we need to remove existing folder and start downloading again
It means that if hash is not in the source.txt file - version will be downloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. then still good to return early in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think we need to return and stop the test? Maybe there is the reason that I did not think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @tchaikov is talking more on the style, i.e. readability of it, nothing logical you are missing here.
i.e. no need for exception, but the style suggestion still can be relevert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov I changed it, let me know if it is more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. indeed. i was concerned about the readability. the code was and still correct.
64970e2
to
ae4bc72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a couple comments.
ccmlib/scylla_repository.py
Outdated
if validate_by_hash and packages: | ||
""" | ||
Validate if packages hash was changed and the new package(s) have to be downloaded | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, i'd suggest use a comment instead of a multi-line string for comment. as it cannot be used as doc-string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, if we had a linter of any kind on this project, it would have complain about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ccmlib/scylla_repository.py
Outdated
if not (new_hash and current_hash) or new_hash != current_hash: | ||
# remove version_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, new_hash
should always be a non-empty string, right? so, probably we can simplify this a little bit. how about:
for package in zip(packages._fields, packages):
# ...
if current_hash is None or new_hash != current_hash:
rmdirs(version_dir)
version_dir = None
break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I do not know, may be from some reason it will be empty. I added this validation to be sure. If it will be None (I can not let you example now), the comparison new_hash != current_hash
will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. the !
and not
in the condition expression makes my eyes hurt. hence this suggestion. if we can trust the 'ETag' returned by S3, then it should not be empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if get_url_hash
always returns string, there no need to check for emptiness of it return value.
also get_installed_scylla_package_hash
doesn't return None (at least not now), it always return a string, could be empty, so using is None
wouldn't work as one might expect.
so current_hash and new_hash != current_hash
should be suffice
if get_url_hash
might return None
so current_hash and new_hash and new_hash != current_hash
should do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we can trust. 'ETag' is not mandatory. https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html
@fruch please, let you opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov I changed it, let me know if it is more clear
tests/test_scylla_repository.py
Outdated
|
||
def test_get_local_tarball_hash(self): | ||
this_path = Path().resolve() | ||
url_hash = get_url_hash(url=str(this_path / "tests" / "test_data" / "scylla_unified_master_2023_04_03.tar.gz")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is that file ? in the the source code ?
let make sure it's not a full sized release, the a minimal empty tar.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yeah, CI show there no such file:
FAILED tests/test_scylla_repository.py::TestScyllaRepositoryRelease::test_get_local_tarball_hash - requests.exceptions.MissingSchema: Invalid URL '/home/runner/work/scylla-ccm/scylla-ccm/tests/test_data/scylla_unified_master_2023_04_03.tar.gz': No scheme supplied. Perhaps you meant https:///home/runner/work/scylla-ccm/scylla-ccm/tests/test_data/scylla_unified_master_2023_04_03.tar.gz?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took is from unstable. It's size 166M
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yeah, CI show there no such file:
FAILED tests/test_scylla_repository.py::TestScyllaRepositoryRelease::test_get_local_tarball_hash - requests.exceptions.MissingSchema: Invalid URL '/home/runner/work/scylla-ccm/scylla-ccm/tests/test_data/scylla_unified_master_2023_04_03.tar.gz': No scheme supplied. Perhaps you meant https:///home/runner/work/scylla-ccm/scylla-ccm/tests/test_data/scylla_unified_master_2023_04_03.tar.gz?
Yeah, I did not added it. fixing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fruch
Oh, it is too big. It's is allowed 100m. I'll take any tarball. It is not must to be scylla package, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ae4bc72
to
52be412
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliayakovlev as I've said, if we can make it even smaller, I would rather not had 24mb binary on a git repository
and if we need something at this size for some reason, that it would fetch or generated during the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliayakovlev as I've said, if we can make it even smaller, I would rather not had 24mb binary on a git repository
and if we need something at this size for some reason, that it would fetch or generated during the test
replaced it with small 131K. is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better than 24Mb
15e000f
to
35aba32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better. thanks!
bf32b6e
to
7a4d007
Compare
ad3a94e
to
61671b9
Compare
If the test version is unstable (not release, maybe private branch) and installation folder exists, we want to check if this version was downloaded nd installed already in the past and was changed. In this case the version should be downloaded and installed again. Compare hash of saved version (it is saved in the 'scylla-core-package/source.txt' file) and hash of a new package. If it is same - skip the download. If not - remove existing folder and download again.
61671b9
to
bdf4ea1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@juliayakovlev can you also update dtest readme, and remove the comment about this issue?
If the test version is unstable (not release, maybe private branch) and installation folder exists, we want to check if this version was downloaded nd installed already in the past and was changed.
In this case the version should be downloaded and installed again. Compare hash of saved version (it is saved in the 'scylla-core-package/source.txt' file) and hash of a new package.
If it is same - skip the download. If not - remove existing folder and download again.
Tests: