Better integration between datasets and data intervals #45187
Labels: area:datasets, kind:feature, needs-triage
Description
Currently, a DAG can be triggered by a dataset, a time schedule, or a DatasetOrTimeSchedule, but it would be good if the dataset itself (or a dataset event) could be associated with a schedule or logical_date. E.g. a monthly dataset, where a DAG emits at most one event for a given month, and where the catchup argument of a downstream DAG is respected.
For example, take a DAG with two dataset dependencies: if dataset1 has been produced for month1 and dataset2 is produced for month2, the DAG will be triggered even though the two dataset events relate to separate intervals. I'd like to trigger the DAG only if the dataset events were emitted for the same interval.
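To make the requested behaviour concrete, here is a minimal plain-Python sketch of the matching rule described above. It does not use or represent any Airflow API: `DatasetEvent`, `interval_start`, `ready_intervals`, and the example URIs are all hypothetical names invented for illustration. The only assumption is that each dataset event carries the start of the data interval it was produced for.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetEvent:
    """Hypothetical event: a dataset URI plus the start of the interval it covers."""

    uri: str
    interval_start: date


def ready_intervals(events, required_uris):
    """Return, sorted, the intervals for which every required dataset has an event."""
    seen = defaultdict(set)  # interval_start -> URIs seen for that interval
    for event in events:
        seen[event.interval_start].add(event.uri)
    required = set(required_uris)
    return sorted(start for start, uris in seen.items() if required <= uris)


events = [
    DatasetEvent("s3://bucket/dataset1", date(2024, 1, 1)),  # dataset1, month 1
    DatasetEvent("s3://bucket/dataset2", date(2024, 2, 1)),  # dataset2, month 2
]
required = ["s3://bucket/dataset1", "s3://bucket/dataset2"]

# The two events belong to different months, so no interval is complete yet.
print(ready_intervals(events, required))

# Once dataset2 is also produced for month 1, that month becomes runnable.
events.append(DatasetEvent("s3://bucket/dataset2", date(2024, 1, 1)))
print(ready_intervals(events, required))
```

Under this rule the downstream DAG would run once per complete interval rather than once per "all datasets updated since the last run", which is what would also let a catchup-style backfill be expressed per interval.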
I'm fairly new to using datasets so apologies if my issue already has a solution or workaround.
Use case/motivation
I have a few issues with datasets that I'm having trouble solving:
Technically, this could be accomplished with TriggerDagRunOperator/ExternalTaskSensor, but these have other issues that datasets solve quite nicely. The benefit of decoupling DAGs using datasets is huge. However, by using datasets, some of the benefits of time schedules are lost.
Related issues
#36618
Are you willing to submit a PR?
Code of Conduct