Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand payload to 10MB #59

Merged
merged 1 commit into from
Sep 25, 2024
Merged

Expand payload to 10MB #59

merged 1 commit into from
Sep 25, 2024

Conversation

mbosio85
Copy link
Member

expands the maximum payload to 10MB

@pditommaso
Copy link
Collaborator

We may have found who is collapsing Platform memory ..

@mbosio85
Copy link
Member Author

I forgot to mention that this is related to partially solve the issue mentioned in:
https://seqera.atlassian.net/browse/PLAT-280 "[Bug] Seqera Platform displays truncated reports when using Tower Agent"

@mbosio85
Copy link
Member Author

@pditommaso ,

We may have found who is collapsing Platform memory ..

Is this an actual worry? If so, 2x or 3x the max payload may create additional pressure.
I would wait to merge this until we are sure it's safe.
Also, I'd increase it to 15MB to match the value on the platform side and to answer @ewels's request in https://seqera.atlassian.net/browse/PLAT-280

@pditommaso
Copy link
Collaborator

I forgot to mention that this is related to partially solve the issue mentioned in:
https://seqera.atlassian.net/browse/PLAT-280

Indeed, context is useful.

Now this was a known limitation. It was discussed ages ago with @jordigg to implement a kind of chucking mechanism to prevent such hard limit. Worth also mentioning that @jimmypoms pointed out that Websocket should already support chunking, so it's not clear why there's this hard limit.

Apart this, it can be increased to 10MB, but we should be aware that's kicking the can down the road.

Also, there's a suspect that the agent is related to the abnormal usage of memory in platform. But we have no yet evidence for it.

@jordeu
Copy link
Member

jordeu commented Aug 27, 2024

It should theoretically be possible to leverage Websocket fragmentation (I don't know how to use it from Micronaut), but it may make things more difficult because there are other messages like "command-request", "heartbeat," and "info" that we may need to "defragment" and handle on our side the JSON deserialization. So, adding a new message, "command-response-fragment", may make things easier with the current setup and more backwards compatible. On top of this, we need to stream the fragments throw Redis so we may end up fragmenting again the websocket fragments.

Regarding "there's a suspect that the agent is related to the abnormal usage of memory in the platform", I suspect that the use of countSubscribers in Redis is not always working properly, so it might be related.

Overall, a more long-term solution to decouple this complexity could be to leverage Seqera Connect to handle the Agent-Platform communication channel.

@mbosio85
Copy link
Member Author

mbosio85 commented Sep 5, 2024

Overall, a more long-term solution to decouple this complexity could be to leverage Seqera Connect to handle the Agent-Platform communication channel.

While we wait/decide on a longer-term solution, I'd move this PR forward knowing it is a stop-gap.

@mbosio85 mbosio85 requested a review from jordeu September 5, 2024 08:21
@robnewman
Copy link
Member

robnewman commented Sep 17, 2024

@jordeu Any progress on this PR review? Ideally we would increase this to 15MB, but 10MB is an okay stop-gap to resolve the issue one of our key customers (Ziad at Biocommons, who is giving a talk at NF Summit BCN) is experiencing.

CC: @pditommaso - Any major concerns about Platform memory, or can we safely move to 10MB?

@pditommaso
Copy link
Collaborator

Think it's fine to move ahead, but we need to plan a more sustainable solution

Copy link
Member

@jordeu jordeu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take into account that this 10485760 size will allow to transfer files of ~7.5MB because they are base64 encoded. We cannot increase much more without changing also the limit that we have on platform side.

Agent was never thought to transfer files, only to execute commands. In fact, we are transferring files using a command line execution of a cat. This is very inefficient and we should plan to add proper support for transferring files.

@robnewman
Copy link
Member

@jordeu Thanks for the review. The build is failing the checks. Anything we can do about that?

@jordeu
Copy link
Member

jordeu commented Sep 17, 2024

Seems that GH Actions are broken and need an update. I can check tomorrow.

@robnewman
Copy link
Member

@jordeu Any chance we could get this fixed on Monday so we can provide an update to the customer? Thanks!

@jordeu
Copy link
Member

jordeu commented Sep 23, 2024

After merging #54 we'll be able to update and build this branch.

@jordeu jordeu changed the base branch from master to update-graalvm September 23, 2024 16:48
@jordeu
Copy link
Member

jordeu commented Sep 23, 2024

Meanwhile, we wait for #54 approval, I've rebased this PR on top of that so it builds the binary and JAR distributions that can be downloaded from the artifacts section here: https://github.com/seqeralabs/tower-agent/actions/runs/10998755756

@robnewman @mbosio85 if you wish you can download and test it from there.

@jordeu jordeu changed the base branch from update-graalvm to master September 25, 2024 03:34
@jordeu jordeu merged commit 357e6d2 into master Sep 25, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants