fix: ignore datasets that haven't been processed when redirecting
When redirecting to the latest version (or the latest major/minor version),
ignore datasets that have not yet been processed by the backend worker.

This lets clients trust that "latest" is the latest fully usable version of
the data.

Co-authored-by: Michal Charemza <[email protected]>
Co-authored-by: Mohizur Khan <[email protected]>
Mohizurkhan and michalc committed Jul 2, 2024
1 parent 91d32e9 commit 3de0349
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions app.py
@@ -230,10 +230,15 @@ def semver_key(path):
         v_major_str, minor_str, patch_str = path.split('.')
         return (int(v_major_str[1:]), int(minor_str), int(patch_str))
 
-    folders = aws_list_folders(
+    keys = aws_list_keys(
         signed_s3_request, prefix=request.view_args['dataset_id'] + '/'
     )
-    matching_folders = filter(predicate, folders)
+    folders_with_processed_datasets = set(
+        key.partition('/')[0]
+        for key in keys
+        if '__CSV_VERSION_' in key
+    )
+    matching_folders = filter(predicate, folders_with_processed_datasets)
     latest_matching_version = max(
         matching_folders, default=None, key=semver_key
     )
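
For reviewers who want to try the new selection logic in isolation, here is a minimal sketch. It assumes the listed keys are relative to the dataset prefix (so the first path component is the version folder) and that a processed version has at least one key containing `__CSV_VERSION_`. The function name `latest_processed_version`, the default predicate, and the example key names are hypothetical and not taken from app.py.

```python
def semver_key(path):
    # 'v1.2.3' -> (1, 2, 3), so versions compare numerically, not as strings
    v_major_str, minor_str, patch_str = path.split('.')
    return (int(v_major_str[1:]), int(minor_str), int(patch_str))


def latest_processed_version(keys, predicate=lambda folder: True):
    # Hypothetical wrapper around the logic in the diff.
    # Keep only version folders that contain at least one marker key.
    folders_with_processed_datasets = {
        key.partition('/')[0]
        for key in keys
        if '__CSV_VERSION_' in key
    }
    matching_folders = filter(predicate, folders_with_processed_datasets)
    # default=None covers the case where nothing has been processed yet
    return max(matching_folders, default=None, key=semver_key)


# Hypothetical key listing: v0.1.0 has been uploaded but not yet processed.
example_keys = [
    'v0.0.2/data.json',
    'v0.0.2/data.json__CSV_VERSION_1',
    'v0.0.10/data.json',
    'v0.0.10/data.json__CSV_VERSION_1',
    'v0.1.0/data.json',
]

print(latest_processed_version(example_keys))  # v0.0.10
```

In this example v0.1.0 is skipped because it has no marker key, and v0.0.10 beats v0.0.2 because `semver_key` compares the parts as integers.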