-
Notifications
You must be signed in to change notification settings - Fork 964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a Permission caveat to restrict Macaroons to permissions #15240
base: main
Are you sure you want to change the base?
Conversation
fb6ff2a
to
f7bf2e7
Compare
Note: I'm thinking we may want to use integers for permissions here, and an explicit registry of permission <-> integer which will also allow us to limit which permissions are even valid in a macaroon. |
Ok, after thinking about this more, I'm pretty sold on the idea of serializing our permissions to integers instead of as strings, which gives us some benefits:
This does have two downsides (one of which is the same downside we've chosen to live with for all of our caveat serialization decisions):
I think that tradeoff makes sense (particularly the first one, since it's the trade off we've made repeatedly for Macaroon serialization). An interesting open question still is how exactly do we want to serialize permissions as integer(s). The simplest option is to just map a permission string to an integer value, and when you get a list of permission strings you just turn that into a list of integers and serialize that (or reverse it for deserialization). That works, but it has some overhead, you need at least 2 bytes for the The other option, is we don't have to model our list of permissions as actually a list of permissions, but rather we could model it as a series of boolean flags, which we can compactly encode into a single integer using bitwise math, which lets us encode up to 100 individual permissions in 31 bytes of JSON (vs 291 bytes for list of integers). Immediately we can see that treating our permissions as bitflags minimizes the maximum amount of storage space by quite a lot (list of integers has almost 10x as much space used), but the trick is that is for a "worst case" scenario where we have 100 permissions exposed and all 100 are being included on this token. If we only want to encode a single permission, things get a lot murkier. For the list of integers case it will consume 2 bytes for the For the bitflags case, it depends entirely on which bit corresponds to the permission we want to set. If we have 100 permissions, but the permission we want to set is in the first 3 bits, then we only require a single byte, on the flip side if the permission we want to set is on the 100th bit, then we will require 31 bytes, even for a single permission. I'm leaning towards using bitflags being the right tradeoff to represent our permissions when serialized in a Macaroon caveat, for the following reasons:
Essentially the choice between bitflags and list of integers is that by choosing bitflags we drastically limit how bad the worst case is, but we also make it more likely that we'll hit that worst case (which can be mitigated by careful planning), and I think minimizing the worst case, paired with the mitigations we can do for common scenarios, is going to be our best path forward. There's a third option here, which is we don't actually have to pick between list of integers and bitflags, and we can support both, and if we support both we could actually have the caveat serialization choose whichever one results in a smaller serialized structure (at the expense of needing to run serialization twice or needing to program in rules for when one should be preferred over another). I think that we can punt on that for now, as we currently only expose a single permission, and it's not until we get into 14+ permissions that bitflags isn't just universally better, and we can always add it in later. So I'm going to move forward with a bitflags based implementation for now. Footnotes
|
I've got this in a state locally where it uses a bitflags (via Basically as it stands we treat creating a caveat through deserialization and through instantiating the class as the same thing, and they support the same input types. What I want to do is have I'm thinking of ways around this, but the answer might be to refactor caveats away from using pydantic and to use something else. That shouldn't be a huge refactor/change and comes with it's own benefit (one of the things about pydantic is they are very ideological that "validation" means "the output shape is what you declare it to be" not "the input shape is validated" and do a lot of coercing to make it so. We turn some of that off through using |
f7bf2e7
to
cb990be
Compare
# This is intended only for use in testing, to allow us to create legacy | ||
# style Macaroons that do not have a permission scope. | ||
_require_permission_scope=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily think it's a better idea, but just to write it down: if we want to avoid this test-only state, we could probably have the relevant tests do explicit/raw DB inserts instead of calling create_macaroon
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm little surprised we don't have a db-centric MacaroonFactory
yet - so that might be a good addition to the testing stack, and we could create a Trait to make the legacy-style if needed. Although I don't know how that might work with the need to have a serialized macaroon for testing, which we get from create_macaroon()
- maybe we need a mock implementation of IMacaroonService
to be used in tests?
This overall approach makes sense to me, and is much cleaner! I agree 100% with the int enum/flags rationale. An adjacent thought: if we're continuing to put pressure on token sizes, maybe we use this opportunity to also change the envelope format itself? In particular, using either msgpack, CBOR, or similar instead of JSON would also save us some bytes here. |
When I made the Caveats v2 branch I tried message pack but pymacaroons had a hard check that the serialized caveat was valid utf8 which severely limited our ability to use a more compact encoding... but I was looking the other day and it looks like maybe that is gone now? Of course we have the problem of reliably detecting a json encoded caveat vs a msgpack encoded one, but we could just do something smarter for the future and treat the first bite as a serialization version, and if it's |
Nevermind, it's still here: https://github.com/ecordell/pymacaroons/blob/master/pymacaroons/caveat_delegates/first_party.py#L23-L26 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through, had a bunch of questions and comments, hopefully they make sense, let me know if they don't.
I'm warming to the idea of bitmasks, but completely agree that they will be confusing to decipher without some tooling in place.
Regarding the serialization format, I'm of the mind of not choosing the potential third option ("There's a third option here, which is we don't actually have to pick between list of integers and bitflags") since that's pushing even more conditional complexity on to the logic here, which is already having to account for old and new tokens.
I've also been refactoring other Permissions - I haven't gotten to upload
yet, but that'll be likely the last one, unless it makes sense to do that sooner to minimize merge conflicts? Let me know!
def upgrade(): | ||
# Initially set our default to true, so all existing macaroons get this set | ||
# to true, then change it so new macaroons get it set to false. | ||
op.add_column( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note (non-blocking): will affect some 200k rows in prod (some 40k are oidc generated). Does the migration need to account for longer locking times considering we have trusted publishers auto-creating macaroons with some frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the locking side, but thinking out loud for the row count: we can probably purge 99% of those OIDC generated tokens, since they should all be expired and (IIRC) shouldn't need to be persisted for any UI state (since the relevant bits are separately persisted via the Event
model payloads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@woodruffw irrespective of this PR, it'd be great to have a daily task run to expire/purge useless OIDC macaroons from the DB to prevent growth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%! I'll file a separate tracking issue for that.
revision = "a78d021addb1" | ||
down_revision = "812e14a4cddf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chore: needs an alembic rebase before merging
|
||
# Previous Macaroon were generated without a caveat restricting what | ||
# permissions they were valid for, so we'll store a flag to indicate whether | ||
# this Macaroon predated the permission caveat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): Reading this, I found myself getting confused between predated the permission caveat
and the permissions_caveat
on this model. Should we invest in refactoring/removing that column before proceeding here now that we have caveats
to reduce the potential confusion?
if permission not in ["upload"]: | ||
return WarehouseDenied( | ||
f"API tokens are not valid for permission: {permission}!", | ||
reason="invalid_permission", | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving the removal, but main
has changed since and I've added:
warehouse/warehouse/macaroons/security_policy.py
Lines 160 to 164 in 68c1db9
# TODO: Adding API-specific routes here is not sustainable. However, | |
# removing this guard would allow Macaroons to be used for Session-based | |
# operations, bypassing any 2FA requirements. | |
Permissions.APIEcho, | |
Permissions.APIObservationsAdd, |
I'm not certain yet how to perform this kind of guard yet - have you figured out how to prevent using API Tokens for web interactions?
# with a permission caveat), so if this particular Macaroon predates that then | ||
# we will do a legacy fallback here. | ||
if dm.predates_permission_caveat: | ||
if permission != "upload": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Would this also need to be replaced by an enum?
# This is intended only for use in testing, to allow us to create legacy | ||
# style Macaroons that do not have a permission scope. | ||
_require_permission_scope=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm little surprised we don't have a db-centric MacaroonFactory
yet - so that might be a good addition to the testing stack, and we could create a Trait to make the legacy-style if needed. Although I don't know how that might work with the need to have a serialized macaroon for testing, which we get from create_macaroon()
- maybe we need a mock implementation of IMacaroonService
to be used in tests?
# The structure of these values should basically always be 1 << N, where N | ||
# is a unique integer per permission, starting with 0 and increasing from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought (non-blocking): while the list is currently small, should we add the @unique
decorator to prevent accidental duplicate values? (self-note - probably a good idea to add to the Permissions
enum too!)
# For instance, if we have internal permissions for "create project", | ||
# "create release", and "upload a new file" as well as a public | ||
# permission for each _and_ a overall "upload" permission that | ||
# encapsulates all 3, granting someone the "upload a new file" | ||
# permission shouldn't also grant the "upload" permission but both | ||
# upload and "upload a new file" permission should work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Really liking this example, helps describe the use case.
Does this mean that by playing with two lists, we could feasibly split apart PublicPermissions.Upload
to more discrete action, or would we use a new PublicPermissions.XXX
to denote the more fine-grained scope and emit that instead (or in addition) 🤔?
@dataclass(frozen=True) | ||
class Expiration(Caveat): | ||
class Expiration(Caveat, frozen=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious - is there anything preventing us from shipping this refactor independently of other changes? Looks like a nice cleanup.
This PR removes (mostly) the hardcoded check for the upload permission, and instead moves it into a "Permission" caveat, the semantics of which is that the permission that is being asked for must be one of the ones mentioned in that caveat.
To ensure that we don't create tokens that can expand in scope as we add new API endpoints, the caveat does not have a way to say "all" permissions or to use any sort of wild cards, each permission must be explicitly enumerated AND we require that all new macaroons include a permission caveat (though we don't make any requirements on what those permissions it limits the macaroon to are).
To support the existing Macaroons, we add a column to the Macaroon database table which records whether this particular Macaroon predates the permission caveat or not, and if it does predate it we do a manual check that the permission is upload. If we ever force everyone to roll new tokens, we could eliminate this manual check, but for now it is easy to have to stay 1.
The biggest question left I think is how do we want to serialize permissions? Right now this PR just serializes them as their string value, which means that our internal permission strings become part of our public API and we can't refactor them without ensuring that the old permission strings still work. This also is one of the least space efficient way of serializing these, since we waste a minimum 2 bytes, and by their nature strings are longer.
We could serialize them to integers, and we just assign an integer per scope, which means that only the integer value becomes part of our public API and we can pack a lot more permissions into a smaller serialized structure.
Throwing this up here now though to get some eyes on it, overall this should be working, though I want to do some more manual testing as well (on top of the open question).
Footnotes
We do this rather than detect the presence of a
Permission
caveat, because anyone with a token can add aPermission
caveat, which means in theory they could expand the scope of a legacy token from upload only to something else, but the same is not true for a database column. ↩