Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ContinuousTimetable false triggering when last run ends in future #45175

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jx2lee
Copy link
Contributor

@jx2lee jx2lee commented Dec 23, 2024

closes: #45081


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@jx2lee jx2lee marked this pull request as ready for review December 23, 2024 12:41
@jx2lee jx2lee requested a review from uranusjr as a code owner December 23, 2024 12:41
@jx2lee jx2lee force-pushed the fix-timeteable-continuous branch from 336824a to 0e8de6d Compare December 23, 2024 13:26
Comment on lines 144 to +148
start = last_automated_data_interval.end
end = timezone.coerce_datetime(timezone.utcnow())
end = current_time

if start > end: # Skip scheduling if the last run ended in the future
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if last_automated_data_interval.end > current_time:
    return None
start = ...
end = ...

Exactly the same logic, but more readable IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is returning None the right thing to do here? It would make this DAG not run at all in the future, even after the previous run is no longer in the future.

Copy link
Contributor Author

@jx2lee jx2lee Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that return None prevents the DAG from running anymore. So, how about applying start and end as shown below when the start_date is in the future(instead of None)?

        if last_automated_data_interval is not None:  # has already run once
            if last_automated_data_interval.end > current_time:  # start date is future
                start = restriction.earliest
                elapsed = last_automated_data_interval.end - last_automated_data_interval.start # elapsed already run

                end = start + elapsed.as_timedelta() # set end to "start + timedelta", not current_time
            else:
                start = last_automated_data_interval.end
                end = current_time

start is set to restriction.earliest, and end is calculated by adding the previous execution's time difference to start. This way, even if it's set in the future, expect the DAG to run. Could this approach cause any problems? (Also, since date calculations are involved, would it be better to put it in a separate function?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Continuous timetable causes false dag triggering if dag has already run and start_date is in future
2 participants