How to safely re-enqueue an aborted job? #485

sashkent3 · 2024-11-09T15:21:21Z

I need to perform something similar to:

await Job("id", pool).abort()
await pool.enqueue_job("func", _job_id="id")

However, sometimes this can lead to the abort of the freshly enqueued job. From my observations, this always happens if the aborted job is not found. ~~Waiting until a key in the abort queue expires (1 minute, not configurable) seems to help.~~ My questions are then:

1. Is waiting for arq.constants.abort_job_max_age guaranteed to be enough for the freshly enqueued job to not be aborted?
2. Is it possible to not abort a job if it's not found? Simply checking job.status() first is a race condition.


sashkent3 · 2024-11-19T23:37:47Z

I believe I've found the culprit of my problem. Calling await Job("id", pool).abort() isn't ever safe. If the job isn't found, the specified job_id will remain in the abort_jobs_ss until a job with the specified id is run. At that point, the job will be "aborted before start".
The comment here is incorrect: items in abort_jobs_ss older than abort_job_max_age are not deleted. The line here seems rather confusing. The worker removes items from abort_jobs_ss whose scores are at least abort_job_max_age (60 ms) in the future. Such items can only appear if Job.abort was called immediately after the above line but before the pipeline is executed.
If the only intended purpose here was to deliver on the abort_job_max_age comment's promise, the line should probably look like this:

pipe.zremrangebyscore(abort_jobs_ss, min=0, max=timestamp_ms() - abort_job_max_age)

I'm willing to submit a PR if the issue is confirmed and the proposed resolution is accepted by the maintainers.
Also, it seems like the value of the abort_job_max_age constant was specified in milliseconds by mistake, and the 60-second max-age was intended.
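
To make the difference between the two score ranges concrete, here is a minimal sketch that models the sorted set as a plain dict, so it runs without Redis. The helper names (`zremrangebyscore`, `cleanup_as_described`, `cleanup_as_proposed`) and the sample timestamps are hypothetical; the "described" variant encodes the worker behaviour as it is characterised in this comment, not a verified copy of the arq source.

```python
import math

# Value discussed in this thread, interpreted as milliseconds.
ABORT_JOB_MAX_AGE = 60


def zremrangebyscore(zset: dict, min_score: float, max_score: float) -> int:
    """Remove members with min_score <= score <= max_score (inclusive, like Redis)."""
    doomed = [member for member, score in zset.items() if min_score <= score <= max_score]
    for member in doomed:
        del zset[member]
    return len(doomed)


def cleanup_as_described(zset: dict, now_ms: int) -> int:
    # Behaviour described above: only entries scored in the future are removed.
    return zremrangebyscore(zset, now_ms + ABORT_JOB_MAX_AGE, math.inf)


def cleanup_as_proposed(zset: dict, now_ms: int) -> int:
    # Proposed fix: remove entries older than abort_job_max_age.
    return zremrangebyscore(zset, 0, now_ms - ABORT_JOB_MAX_AGE)


now_ms = 1_000_000
stale = {"id": now_ms - 5_000}  # abort requested 5 seconds ago
fresh = {"id": now_ms - 10}     # abort requested 10 ms ago

removed_stale_described = cleanup_as_described(dict(stale), now_ms)
removed_stale_proposed = cleanup_as_proposed(dict(stale), now_ms)
removed_fresh_proposed = cleanup_as_proposed(dict(fresh), now_ms)
```

Under these assumptions the stale entry survives the described cleanup but is removed by the proposed one, while an abort request younger than the max age is left in place.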

drizzt · 2024-11-23T12:44:48Z

hi, did you see if your line fixes the problem?

sashkent3 · 2024-11-23T12:47:30Z

> hi, did you see if your line fixes the problem?

I did very limited testing so take it for what it's worth. But yes, the fix seems to be working for me.

SteniMariyaThomas · 2024-12-26T07:12:40Z

@sashkent3 The enqueued job will always be in the queue until it is taken for execution. How, then, can it be not found when aborting?

Can you explain the scenario in which the job is not found?

sashkent3 · 2024-12-26T07:24:12Z

@SteniMariyaThomas well, nothing prevents the user from running await Job("id", pool).abort() while there's no job with the id "id". For example, this could be the first command on a fresh (empty) queue.
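
A toy model of that scenario, as a rough sketch only: `ToyQueue` and its methods are hypothetical and do not mirror arq's real classes. It assumes, as described earlier in the thread, that an abort request is recorded even when no matching job exists and that the request is consumed when a job with that id is next picked up.

```python
class ToyQueue:
    """Simplified stand-in for the queue plus abort set; not arq's API."""

    def __init__(self) -> None:
        self.queued: set[str] = set()
        self.abort_requests: set[str] = set()

    def abort(self, job_id: str) -> None:
        # Assumption: the request is stored whether or not the job exists.
        self.abort_requests.add(job_id)

    def enqueue(self, job_id: str) -> None:
        self.queued.add(job_id)

    def run(self, job_id: str) -> str:
        self.queued.discard(job_id)
        if job_id in self.abort_requests:
            # Assumption: the pending request is consumed by the first matching job.
            self.abort_requests.discard(job_id)
            return "aborted before start"
        return "complete"


queue = ToyQueue()
queue.abort("id")    # first command on an empty queue: no such job yet
queue.enqueue("id")  # freshly enqueued job reuses the same id
first_result = queue.run("id")

queue.enqueue("id")  # the stale request is gone, so a retry runs normally
second_result = queue.run("id")
```

In this model the first run of the fresh job is the one that gets aborted, which matches the behaviour reported at the top of the thread.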
