Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support AES256 encryption of sensitive params #45195

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

andrii-korotkov-verkada

Implements support for _ENCRYPTED versions of env variables and _encrypted versions of config params. Those would be encrypted using a parameter params_encryption_aes256_key (which can be retrieved from a secret for example). The params encryption key itself doesn't support encrypted suffix, or it'd create an infinite recursion.

closes: #45194


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@andrii-korotkov-verkada
Copy link
Author

Starting with a draft with a POC. Please, let me know if this is a good direction.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 45194-support-aes256-encryption-of-sensitive-params branch from 692732e to 56d5d47 Compare December 24, 2024 15:01
Implements support for `_ENCRYPTED` versions of env variables and `_encrypted` versions of config params.
Those would be encrypted using a parameter `params_encryption_aes256_key` (which can be retrieved from a secret for example).
The params encryption key itself doesn't support encrypted suffix, or it'd create an infinite recursion.

closes: apache#45194

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 45194-support-aes256-encryption-of-sensitive-params branch from 56d5d47 to c2ca77e Compare December 24, 2024 15:06
@potiuk
Copy link
Member

potiuk commented Dec 24, 2024

Before we go any deeper. I would like to dicuss the reasoning of doing it

I simply think it does not achieve any "more" security than we currently have. Both parameters and decryption key are necesarlly in the same environment and you neeed to have access to both - if you don't, then you cannot decrypt them.

We already have encryption using fernet key (symmetric) but there we use it to encrypt the database values, becuase then for example if you have only access to database backups/dumps or database itself, you are not able to decrypt it without access to the configuraiton of Airlfow - where the fernet key is stored. So here values and keys are kept in two separate "spaces" / "security scopes/perimeters".

We also already support _CMD pattern - where the actual sensitive values are retrieved by running a script that can (for example) use workload identity to retrieve sensitive values from somewhere else (say secrets manager) - so that it is not stored in the configuration at all.

In this case and your proposal, seemlngly both encrypted values and the key are supposed to be stored in the same "space" - so if you have access to one, you have access to other, and you can use the key to decrypt the values, so at best it's security-by-obscurity.

Do you see any scenario where someone could get hold of the values but not the key - where it could be helpful? What kind of attacks and which actor behaviours it would prevent ?

It might be you have some concrete use cases in mind that I don't know.

@andrii-korotkov-verkada
Copy link
Author

andrii-korotkov-verkada commented Dec 24, 2024

Sure. I think of this as more of a convenience feature rather than a feature that heightens a security.
ArgoCD is used to manage manifests, as well as there are multiple shards (i.e. AWS accounts and clusters), where each needs to have some configuration, ideally with as little overhead as possible. Some secret values can be based on large things, e.g. pgbouncer.ini config, so automated way for managing them would be very helpful.
Here are several options I've considered:

  • Storing generated manifests in the repo, even for secrets. That's a hard no from a security perspective.
  • Generate and apply secrets to the cluster directly without committing them to the repo. This makes secrets not very durable, e.g. if fernet key is removed for some reason, the encrypted data is effectively lost. This would require duplicating the key in some password manager, which would make management a bit clunkier.
  • Use secrets manager backend to store secrets. This helps with durability and overall seems like a good option, however adding secrets to the manager and rotating them would be clunky, since it's not integrated with helm templating.
  • Using _CMD env variables. This can work, but has a few downsides in my case, like aws cli not being pre-installed in the default image (at least I didn't see it in constraints) and cmd code needs to be repeated for each parameter.
  • Use AES256 encryption to store encrypted versions of secrets. This solves durability, auditing, security and convenience. Since helm templating supports aes256 encryption natively, I can just generate all manifests with it with all secrets needed for all shards, which is especially convenient given some secrets like connection are based on multiple helm inputs (e.g. host, username, password). This way, I reduce the number of secrets to manage with external means (like cluster secret or secrets manager) to just one.

I'd have to think separately on how to work with secret things mounted as files, e.g. pgbouncer.ini. Though probably a similar approach with having a key like pgbouncer.ini.encrypted and using the same key to decrypt would work.

Please, let me know if these arguments make sense. Happy to discuss further.

@andrii-korotkov-verkada
Copy link
Author

Here's an example when there's a secret value in a config where it doesn't seem to be possible to hide it https://airflow.apache.org/docs/apache-airflow/1.10.8/security.html#github-enterprise-ghe-authentication.

@potiuk
Copy link
Member

potiuk commented Dec 26, 2024

Just to note - not that I am against it totally, but I want you to think and come up with proposal how to tell the user to maintain "secret" status of such values?

If you are keeping both - secret values and private key in the same configuration place (secure perimeter), then you can use the key to decrypt the values. What your advise should be for the users who are going to use that feature, how they should keep the key and what exactly will the encryption will be protecting against?

If we are going to accept some form of it, we need have detailed documentation describing the users what kind of security they can expect and how they should protect (and against what) - the thing is that often when people see some "Encrypted" values, they think they are generally protected against the values leaking (but they don't realise for example that storing the key to decrypt the values in the same security perimeter/storage gives them false sense of security).

What your "user documentation" for that feature and user recommendation would look ilike for it?

BTW:

Using _CMD env variables. This can work, but has a few downsides in my case, like aws cli not being pre-installed in the default image (at least I didn't see it in constraints) and cmd code needs to be repeated for each parameter.

Not really. You can have a single command that is used there and pass the parameter to that command indicating which secret you want to retrieve - ...SQL_CON_CMD=/opt/airflow/bin/get_secret_value.sh SQL_CONN etc. That's what it was designed for.

@potiuk
Copy link
Member

potiuk commented Dec 26, 2024

Also. I think it needs an explanation on why you cannot use "k8s secrets" (if you are talking about k8s) - and why is it problematic to deploy them separately and mount as config files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support AES256-encrypted values for sensitive parameters in configs, env variables etc.
2 participants