-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AIP-72 | "unsafe" level #45200
Comments
Not going to happen IMHO. I would definitely be voting NO if it came to a vote. First of all airflow DB is an internal detail. Secondly you have REST APi and Python client and we are going to have built in authentication to work automatically from worker so that you can use Python. Client'Pythonic APi to interact with Airflow DB over REST API and this is the way it is going to be the 'official'way of interacting with Airflow DB. If you think you miss some ways of interacting with Airflow DB (do r example bulk updates) - the right approach is to add appropriate API endpoints to do it NOT use the SQL queries on the Database that might and will change any time without any warning. What you are asking for is a step back in security and architecture and I highly doubt any maintainer would support it . BTW. In many cases using. python Client will produce more readable and simpler code than using sqlqchemy |
Thank you for the answer 👍 Without security compromise, would it be possible to create a pluggable SDK extension system ? So at run time an airflow "custom_sdk_extension" or "sdk_custom_backend" that would let the user enrich the API of the SDK (so no need to maintain a custom fork airflow project for the user ) ? Thanks |
Why Python client is not enough ? |
What precisely features you miss and why they cannot be added to standard API as pull request ? The 'extendable sdk' is anti-thesis of having an API. The REST API is designed to make a maintainable, public way of interacting with airflow. What your are asking for is maintainable API to be able to extend rhe API which is 'meta API API' The thing is that you are NOT supposed to be interacting with the DB of Airflow because it might and will change. So if you do it (which you still can do if you really want by configuring your own way of accessing the same DB and writing your own code to do so. Then you are doing it on absolutely your own risk and cost. The whole thing about exposing the API by maintainers and not giving ANY official way to interacting with the DB that aims for exactly this - to make you solely and ultimately responsible if you access the things we made 'internal' deliberately . So what you are asking for REALLY is for maintainers and community to take responsibility and maintenance cost and burden for accessing airflow db in the ways we are not able to foresee. IMHO - this is very clear ask from maintainers: a) if you see some access patterns that can be reused by others - contribute it back as REST API b) if you see something useful only for you - do what you need In your but without impacting community in ways that it will make it more difficult to maintain and make changes. You need to take full responsibility for adapting the changes to future airflow version and it should be clear to you that you are commiting to it Simply 'with great powers come great responsibilities' and if you want more power you should take full responsibility for what it means |
Yes you are right, the main pattern that I have in mind is the bulk maintenance of the airflow database ( there are few users, doing >100k task_run by day ). Thanks again for your detailed answer, it will help many users realize that building directly on top of Airflow a very custom platform ( I seen many cases of users using XCOM as almost a product database 😱, and also querying the airflow database for very very custom orchestration decisioning ) without separation of concern is not a good idea. I guess that users will surely have intriguing and subtle questions when airflow V3 will be release 😀 Thanks again |
I think it's not a problem to build custom platform - the REST API of ours is precisely what's making it possible and it was the main motivator for the API implementation. And again - if some features are missing there, adding an API endpoint via PR is precisely how you can achieve what you need. You can also export the data somewhere (even using Airflow DAG !) using the API and do whatever you need outside of airflow DB. This is a typical pattern that internal DB needs to be performant and maintainable for the Application using it and if you need to do any other mechanism to access the data and aggregate or process it in other ways you export the data and so it 'outside'. And yes - you can query and retrieve data using XCOM REST API. And again - interacting with them over the XCOM part of the Python client is even easier than using sqlalchemy or SQL. And the fact that you should not be using db access had not only been mentioned every time someone brought the subject in slack and GitHub but also very clearly explained in https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html#what-is-not-part-of-the-public-interface-of-apache-airflow When we implemented REST API 4 years ago for Airflow 2 we made it very clear that it is the only maintained and supported interface and we gave our users 4 years to adapt. So I think it is not unexpected, it's a very deliberate, very well announced move and we went a great length to add tools and explanations how to use it. If anyone is still not ready for that, well, there is still time to prepare. |
I hope so. And I hope that it will lead that very users to contribute back whatever they need and miss in Airflow 3 API in a reusable way for others That is one of the intentions we had when we added the API |
With airflow 2.11 we will be able to use the sdk , right ? |
I think it is planned as one of the rules . Certainly we discussed it as an option and we already did something similar for db isolation AIP-44 work - so we already have the right code implemented in v2-10-test branch, we just need to turn the AIP-44 error into deprecation warning If it is not already there #41641 - it's.a.good idea to add it there |
Description
I would like to still be able to write dag with a low level access to airflow database for operational purposes in airflow v3
it's convenient to be able to programmatically access the database session in a pythonoperator for specific cleaning / operational purposes
example
wdyt ?
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: