[SYNPY-1548] Swap to a FIFO queue #1147

Merged
BryanFauble merged 14 commits into develop from worker-and-queue-for-concurrent-downloads on Dec 17, 2024

@BryanFauble BryanFauble commented Dec 6, 2024

Problem:

  1. AsyncIO does not guarantee the order in which the tasks we create are run. This can lead to us requesting an auth token for a specific file and not using it until much later, by which point it has expired.
  2. Logging and messaging around the download process were inconsistent and did not clearly indicate which message belonged to which Synapse ID.
  3. The TQDM download bar (see https://sagebionetworks.jira.com/browse/SYNPY-1507) was broken during concurrent file downloads, and the sync-from-Synapse logic did not use a context-managed download bar.

Solution:

  1. Move to a FIFO queue for file downloads. It works by indexing the container to be downloaded and then starting a number of workers that concurrently download the files. This means only a limited number of files can be in the download process at once, rather than everything.
  2. Add retry logic around getting the file size, and move the logic that gets the authenticated URL to the same thread where the file size is retrieved.
  3. Update the messaging for file downloads to include a prefix of [ID:Name]:

image

  4. Update the TQDM bar to track concurrent file downloads and only close the transfer bar when there are no active transfers. Also swap the sync-from-Synapse container logic to use a context-managed transfer bar.
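
The queue-and-workers idea in item 1 can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `download_file`, `download_all`, and `max_workers` are hypothetical names, and the download itself is simulated with a short sleep.

```python
from __future__ import annotations

import asyncio


async def download_file(name: str, log: list[str]) -> str:
    """Hypothetical stand-in for a single file download."""
    log.append(f"start {name}")
    await asyncio.sleep(0.01)  # Simulate network I/O.
    log.append(f"done {name}")
    return name


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Pull coroutines off the queue in FIFO order and await them one at a time."""
    while True:
        work_item = await queue.get()
        try:
            results.append(await work_item)
        except asyncio.CancelledError:
            raise
        except Exception as ex:
            # Record the failure so one bad file does not stop the worker.
            results.append(ex)
        finally:
            queue.task_done()


async def download_all(names: list[str], max_workers: int = 2):
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    log: list[str] = []
    # Index everything first; a coroutine does not start until a worker awaits it.
    for name in names:
        queue.put_nowait(download_file(name, log))
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(max_workers)
    ]
    await queue.join()  # Wait until every queued item has been processed.
    for task in workers:
        task.cancel()  # The workers loop forever, so stop them explicitly.
    await asyncio.gather(*workers, return_exceptions=True)
    return results, log
```

With `max_workers=2`, at most two simulated downloads are in flight at any time, and a later file only starts once a worker has finished an earlier one.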

Testing:

  1. I tested and benchmarked 1000 files of 1 MiB each and 100 files of 1000 MiB each, with no significant changes in the results.
  2. I redid the benchmarking for 100 GB file downloads and verified no changes to the benchmarking results.
  3. @thomasyu888 also verified around 800 GB of file downloads without error with a Synapse project.

@BryanFauble BryanFauble changed the title from "Swap to a FIFO queue" to "[SYNOY-1548] Swap to a FIFO queue" on Dec 16, 2024

pep8speaks commented Dec 16, 2024

Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 279:89: E501 line too long (108 > 88 characters)
Line 284:89: E501 line too long (102 > 88 characters)
Line 557:89: E501 line too long (92 > 88 characters)

Line 79:89: E501 line too long (91 > 88 characters)

Comment last updated at 2024-12-17 21:34:38 UTC

@BryanFauble BryanFauble changed the title from "[SYNOY-1548] Swap to a FIFO queue" to "[SYNPY-1548] Swap to a FIFO queue" on Dec 16, 2024
Comment on lines +108 to +114
transfer_count: int = getattr(_thread_local, "transfer_count", 0)
transfer_count -= 1
if transfer_count < 0:
    transfer_count = 0

_thread_local.transfer_count = transfer_count
if progress_bar is not None and not transfer_count:

See https://sagebionetworks.jira.com/browse/SYNPY-1507. There was an issue where using asyncio.gather to execute a bunch of download tasks in parallel would cause the progress bar to be closed and no longer shown. By tracking the files being transferred, incrementing a counter when we open a new bar and decrementing it when closing a bar, we can close the bar only when the last transfer finishes. This allows the bar to remain open for the duration of the transfer and maintain its progress/context throughout.

Comment on lines 58 to 92
async def worker(
    self,
    queue: asyncio.Queue,
    failure_strategy: FailureStrategy,
    synapse_client: Synapse,
) -> NoReturn:
    """
    Coroutine that will process the queue of work items. This will process the
    work items until the queue is empty. This will be used to download files in
    parallel.

    Arguments:
        queue: The queue of work items to process.
        failure_strategy: Determines how to handle failures when retrieving items
            out of the queue and an exception occurs.
        synapse_client: The Synapse client to use to download the files.
    """
    while True:
        # Get a "work item" out of the queue.
        work_item = await queue.get()

        try:
            result = await work_item
        except asyncio.CancelledError as ex:
            raise ex
        except Exception as ex:
            result = ex

        self._resolve_sync_from_synapse_result(
            result=result,
            failure_strategy=failure_strategy,
            synapse_client=synapse_client,
        )

        queue.task_done()

By taking this worker approach we limit the number of concurrent file transfers. Asyncio also provides a semaphore (https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore), but this first-in, first-out queue seemed like the more intuitive approach to solving this problem.
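
For comparison, the semaphore alternative mentioned above might look something like this. It is only a sketch of the approach that was not taken: `download_with_limit` and `max_concurrent` are hypothetical names, and the download is simulated with a sleep.

```python
import asyncio


async def download_with_limit(names, max_concurrent=2):
    """Run one task per file, but let only max_concurrent run the body at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def download_file(name):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)  # Track the highest observed concurrency.
            await asyncio.sleep(0.01)  # Simulate network I/O.
            active -= 1
        return name

    # Every task is created up front; the semaphore only gates the body.
    results = await asyncio.gather(*(download_file(n) for n in names))
    return results, peak
```

One difference from the queue is that all tasks exist from the start and wait on the semaphore, whereas with workers a file's coroutine does not begin until a worker picks it up.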

Comment on lines 409 to 413
side_effect=[
    mocked_project_rest_api_dict(),
    mocked_folder_rest_api_dict(),
    mocked_file_rest_api_dict(),
    mocked_folder_rest_api_dict(),
],

Because of the swap to an AsyncIO queue, files start downloading as soon as they are added to the queue, instead of waiting for the code to execute all of the tasks as Folder tasks are added to the list. This changes the order of operations slightly and, as a result, broke these unit tests.

@BryanFauble BryanFauble marked this pull request as ready for review December 17, 2024 18:49
@BryanFauble BryanFauble requested a review from a team as a code owner December 17, 2024 18:49
Comment on lines +232 to +234
with syn._requests_session_storage.stream(
    method="GET", url=presigned_url_provider.get_info().url
) as response:

Moving the logic here aligns with how it works when streaming each individual part of a file. This executes in a different thread, so the presigned URL should be used immediately.

I also wrapped this in a retry block to ensure it remains functional.
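
The general shape of that idea, fetching the URL and using it inside the same retried call, could be sketched like this. None of these names come from the client: `with_retry`, `fetch_size`, `get_presigned_url`, and `open_stream` are hypothetical, and the retry policy (three attempts with a short exponential delay) is an assumption.

```python
import time


def with_retry(func, retries=3, base_delay=0.01,
               retry_on=(ConnectionError, TimeoutError)):
    """Call func, retrying on the given exceptions with a short exponential delay."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # Out of attempts; surface the last error.
            time.sleep(base_delay * 2 ** (attempt - 1))


def fetch_size(get_presigned_url, open_stream):
    """Request a URL and use it right away, so each retry gets a fresh URL."""

    def attempt():
        url = get_presigned_url()  # Fetched inside the attempt, never reused.
        return open_stream(url)

    return with_retry(attempt)
```

Because the URL is requested inside the attempt, a retry after an expired or failed request does not reuse the stale URL.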

Comment on lines +282 to +285
else:
    client.logger.info(
        f"[{file.id}:{file_name}]: Found existing file at {download_path}, skipping download."
    )

In lieu of a structured logging setup, this is at least a start toward clearer logging within the client. I updated a bunch of the messages around the download process so that it's clear which Synapse ID/file/entity produced the message.
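
The PR builds the prefix with an f-string in each message. One possible way to centralize it with the standard library, shown here purely as a sketch, is a `logging.LoggerAdapter`; `EntityLogAdapter` and the example ID and file name are hypothetical.

```python
import logging


class EntityLogAdapter(logging.LoggerAdapter):
    """Prefix every message with [ID:Name] taken from the adapter's extra dict."""

    def process(self, msg, kwargs):
        prefix = f"[{self.extra['id']}:{self.extra['name']}]"
        return f"{prefix}: {msg}", kwargs


# Hypothetical usage: one adapter per entity being downloaded.
entity_logger = EntityLogAdapter(
    logging.getLogger("download_example"),
    {"id": "syn123", "name": "data.csv"},
)
entity_logger.info("Found existing file at %s, skipping download.", "/tmp/data.csv")
```

This keeps the prefix format in one place, at the cost of creating an adapter for each entity.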

@BWMac BWMac left a comment

LGTM! Awesome update

synapseclient/core/download/download_functions.py (outdated, resolved)

@thomasyu888 thomasyu888 left a comment

🔥 Nice use of FIFO queue so that it's not just a completely random ordering of tasks being executed. Does this slow down download speeds a bit?

@BryanFauble BryanFauble commented

@thomasyu888 I reran a few of the benchmark tests, and they were all within 5% of the previous results.

@BryanFauble BryanFauble merged commit be4c78a into develop Dec 17, 2024
24 checks passed
@BryanFauble BryanFauble deleted the worker-and-queue-for-concurrent-downloads branch December 17, 2024 22:45
4 participants