💡 [Feature] Protect JupyterHub #334

Open
fmigneault opened this issue Jun 1, 2023 · 21 comments
Assignees
Labels
component/jupyterhub Related to JupyterHub as development frontend with notebooks enhancement New feature or request project/DACCS Related to DACCS project (https://github.com/orgs/DACCS-Climate) security Issues or features related to security concerns

Comments

@fmigneault
Collaborator

Description

Ensure JupyterHub is accessed behind Magpie/Twitcher authentication/authorization.
Currently, it uses Magpie/Twitcher after the fact to log in the user, but the initial request and subsequent ones are not protected by default. Therefore, any user can still reach the entrypoints (though they cannot log in).

References

@fmigneault fmigneault added the enhancement New feature or request label Jun 1, 2023
@fmigneault fmigneault changed the title from "💡 [Feature]" to "💡 [Feature] Protect JupyterHub" Jun 1, 2023
@fmigneault fmigneault added security Issues or features related to security concerns project/DACCS Related to DACCS project (https://github.com/orgs/DACCS-Climate) component/jupyterhub Related to JupyterHub as development frontend with notebooks labels Jun 1, 2023
@mishaschwartz
Collaborator

If we put all jupyterhub routes behind magpie/twitcher then we also have the opportunity to allow users who have already signed in with magpie to not have to sign in again with jupyterhub.

The advantage of this is that users will have a single place to sign in. However it will change the user experience in the following ways:

  • the default sign in page will be the magpie sign in page, not the jupyterhub one
  • the information currently displayed on the jupyterhub sign in page will have to be displayed elsewhere (either on the magpie page, or on the jupyterhub home/spawner page once the user has logged in)

Another option is to keep the jupyterhub sign in page as the default but also have the jupyterhub sign in trigger the magpie sign in. This would avoid the situation where a user is signed in with jupyterhub but is not signed in with magpie which could mean that they are not authorized to access certain routes even though they have signed in from the user's perspective.

My vote would be to go with option 2 and have the jupyterhub sign in trigger the magpie sign in. This changes the fewest things from a user's perspective and keeps the jupyterhub sign in page as the "default" sign in location.

@huard
Collaborator

huard commented Jun 12, 2023

I suspect that using the Magpie sign-in page could have long-term advantages, for example displaying the current permission profile for data, services, other daccs nodes, etc.

@tlogan2000 Other thoughts regarding this ?

@tlogan2000
Collaborator

tlogan2000 commented Jun 12, 2023

I suppose my quick thought would be to agree with @huard, but certain changes need to be made to the magpie page in my opinion ... Currently the magpie sign-in is very much aligned to an admin user (options for user, service perms etc. are visible even if inaccessible after login): see https://pavics.ouranos.ca/magpie/. The basic user would have to see something like 'Please sign in here' then a 'take me to the Jupyterlab' type navigation. Possibly an account info / change password option.

@mishaschwartz
Collaborator

@huard @tlogan2000 That all makes sense to me. Just to clarify what we're proposing:

  • make Magpie the default sign-in page
  • modify the magpie page to allow us to display additional information such as "current permission profile for data, services, other daccs nodes"
  • after a non-admin user signs in through magpie, display different options such as:
  • if a user goes to the jupyterhub url directly:
    • if they have not signed in to magpie: redirects them to the magpie sign in page (or shows them an error message saying they're not authorized)
    • if they have signed in to magpie: redirects them to the jupyter/hub/spawn url
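The redirect rules in the last bullet could be sketched roughly as follows. This is only an illustration: the function name, the `/magpie/ui/login` path and the `next` query parameter are assumptions, not something Magpie or JupyterHub is known to provide as written.

```python
from typing import Optional
from urllib.parse import urlencode


def route_hub_request(
    magpie_user: Optional[str],
    requested_path: str,
    signin_path: str = "/magpie/ui/login",   # assumed sign-in page location
    spawn_path: str = "/jupyter/hub/spawn",  # spawn url mentioned in the list above
) -> str:
    """Return the path a request to the jupyterhub url should be redirected to."""
    if magpie_user is None:
        # Not signed in to magpie: send the user to the sign-in page and remember
        # where they came from ("next" is a hypothetical parameter name).
        return f"{signin_path}?{urlencode({'next': requested_path})}"
    # Already signed in to magpie: go straight to the spawner.
    return spawn_path
```

Showing an error message instead of redirecting would only change the first branch.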

@huard
Collaborator

huard commented Jun 13, 2023

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

Also, could we "re-brand" the magpie sign-in page ? I'm concerned if people see Magpie and its logo, they'll think they left the DACCS node.

@mishaschwartz
Collaborator

@huard

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

Yes that should be possible

Also, could we "re-brand" the magpie sign-in page ? I'm concerned if people see Magpie and its logo, they'll think they left the DACCS node.

Another option is to leave magpie alone and create a separate "DACCS branded" sign in page that makes a call to the magpie api (see: https://pavics-magpie.readthedocs.io/en/latest/api.html#tag/Session%2Fpaths%2F~1signin%2Fpost). That way we don't have to force magpie to be something it isn't.

@huard
Collaborator

huard commented Jun 13, 2023

I like the API call idea. Good for me.
@tlvu Thoughts ?

@fmigneault
Collaborator Author

I also agree with option 2 (keep the jupyterhub sign in page as the default but also have the jupyterhub sign in trigger the magpie sign in). I think it is much easier to use JupyterHub as the entrypoint, because even if we log in on Magpie first, we would not have the JupyterHub session ID and session cookie...

When JupyterHub does the login, it performs some Magpie login using this:

c.JupyterHub.authenticator_class = 'jupyterhub_magpie_authenticator.MagpieAuthenticator'

Can't it return the Magpie Cookie at the same time if not already set?
Can't it do a pre-check that a Cookie matching Magpie's definition is already present and valid?
It should be sufficient to send a request to https://pavics.ouranos.ca/magpie/session with the detected cookies to validate if the user is already logged in in Magpie, and that login did not expire, to skip the login on the JupyterHub side.
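A minimal sketch of that pre-check, assuming the `/session` endpoint answers with JSON shaped like `{"authenticated": bool, "user": {"user_name": ...}}`. The response shape and the helper names are assumptions to verify against the Magpie API docs; the `fetch` argument is injectable so the logic can be exercised without a live server.

```python
import json
import urllib.request
from typing import Callable, Mapping, Optional

MAGPIE_SESSION_URL = "https://pavics.ouranos.ca/magpie/session"  # URL from the comment above


def fetch_session(url: str, cookie_header: str) -> dict:
    """Forward the browser's cookies to Magpie and return the parsed JSON body."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header, "Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def magpie_user_from_cookies(
    cookies: Mapping[str, str],
    fetch: Callable[[str, str], dict] = fetch_session,
) -> Optional[str]:
    """Return the Magpie user name if the cookies carry a valid session, else None."""
    if not cookies:
        return None
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    try:
        body = fetch(MAGPIE_SESSION_URL, cookie_header)
    except (OSError, ValueError):
        # Network failure, HTTP error or invalid JSON: treat as "not logged in".
        return None
    # Assumed response shape: {"authenticated": bool, "user": {"user_name": str}}
    if not body.get("authenticated"):
        return None
    return (body.get("user") or {}).get("user_name")
```

If this returns a user name, the authenticator could skip its own login form; if it returns `None` (including for an expired session), the normal JupyterHub login would proceed.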

Where is even that jupyterhub_magpie_authenticator.MagpieAuthenticator implementation?

certain changes need to be made to the magpie page

Note that modifying Magpie to have access to JupyterHub or other useful links is not that straightforward.
Magpie is not used exclusively in birdhouse, and JupyterHub makes no sense in other platforms.
Therefore, it would need some kind of HTML templating to dynamically add extra content to be displayed by platform-specific overrides. The logic and utilities for such templating are already in Magpie (for the notification/registration emails), but would have to be added to that UI page. Also, the JupyterHub Session ID/Cookie would still be missing, so the user would still have to log in again on the Jupyter side anyway, unless this template basically reimplements what jupyterhub_magpie_authenticator.MagpieAuthenticator does.

Is it possible to rig this so that if I go to pavics.ouranos.ca/jupyter, once the user signs-in in Magpie, it goes directly back to Jupyter (ie you're not stuck in magpie) ?

That would require some callback URL to be specified. Magpie can already do something like that when handling "external provider" logins to return to the signin page after resolving the external login, but I'm not certain it will work out of the box for internal logins (pretty sure it won't).

@mishaschwartz
Collaborator

mishaschwartz commented Jun 13, 2023

@fmigneault

Note that modifying Magpie to have access to JupyterHub or other useful links is not that straightforward.
Magpie is not used exclusively in birdhouse, and JupyterHub makes no sense in other platforms.

I agree, which is why I suggested:

Another option is to leave magpie alone and create a separate "DACCS branded" sign in page that makes a call to the magpie api (see: https://pavics-magpie.readthedocs.io/en/latest/api.html#tag/Session%2Fpaths%2F~1signin%2Fpost). That way we don't have to force magpie to be something it isn't.

how do you feel about this suggestion?

Also... The jupyterhub_magpie_authenticator.MagpieAuthenticator can easily be modified to automatically log in users through jupyter if they're already logged in through magpie. If you're interested I've been experimenting with this on the jupyter-behind-twitcher branch, if you have a look at this file it'll give you an idea of a possible solution:

https://github.com/bird-house/birdhouse-deploy/blob/jupyter-behind-twitcher/birdhouse/config/jupyterhub/config/magpie/authenticator/jupyterhub_magpie_authenticator.py

@tlvu
Collaborator

tlvu commented Jun 13, 2023

When JupyterHub does the login, it performs some Magpie login using this:

c.JupyterHub.authenticator_class = 'jupyterhub_magpie_authenticator.MagpieAuthenticator'

Can't it return the Magpie Cookie at the same time if not already set?
Can't it do a pre-check that a Cookie matching Magpie's definition is already present and valid?
It should be sufficient to send a request to https://pavics.ouranos.ca/magpie/session with the detected cookies to validate if the user is already logged in in Magpie, and that login did not expire, to skip the login on the JupyterHub side.

Agreed.

Where is even that jupyterhub_magpie_authenticator.MagpieAuthenticator implementation?

It is here https://github.com/Ouranosinc/jupyterhub/blob/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py, from David Caron. Is he still at CRIM? I forgot.

I think we can combine Misha's idea of using the Magpie API with the idea of making the Jupyterhub Magpie authenticator also detect and set Magpie cookies, possibly using this same Magpie API.

I agree keeping the Jupyterhub login page is a nicer experience as the Magpie login page is mostly for admin users. As for creating a brand new DACCS login page, then this DACCS login page will have to deal with both Jupyterhub and Magpie sessions cookies. Not sure if this is easier than making the existing Jupyterhub Magpie authenticator play nicer with Magpie sessions.

But I still have a fundamental question that is not very clear to me. Is the current way protecting illegal JupyterHub login not enough? In terms of protection what does putting it behind Twitcher and route all traffic behind Twitcher offer more in terms of protection?

Just to be clear, I see the advantage of better integrating the Jupyterhub session with the Magpie session to offer a single sign-on experience. I think we can achieve this without having to route all Jupyterhub traffic behind Twitcher. As such, probably a better title for this issue would be "Allow single sign-on between Jupyterhub and Magpie" instead of "Protect JupyterHub".

@mishaschwartz
Collaborator

@tlvu

But I still have a fundamental question that is not very clear to me. Is the current way protecting illegal JupyterHub login not enough? In terms of protection what does putting it behind Twitcher and route all traffic behind Twitcher offer more in terms of protection?

By putting it behind magpie/twitcher it would allow us to use magpie to specify permissions on a more fine-grained level (we can allow access to jupyterhub for specific groups of users for example).

Right now, the MagpieAuthenticator simply checks if a user exists in magpie in order to allow them access. This would give us more flexibility.
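To illustrate the difference, a group-based check could look roughly like the sketch below. The helper and the group names are hypothetical; in practice the decision would come from permissions configured in magpie rather than a hard-coded list.

```python
from typing import Iterable


def allowed_by_group(user_groups: Iterable[str], allowed_groups: Iterable[str]) -> bool:
    """Allow access only if the user belongs to at least one permitted group."""
    return not set(user_groups).isdisjoint(allowed_groups)


# Hypothetical group names, for illustration only.
JUPYTERHUB_GROUPS = {"jupyterhub-users", "administrators"}

# A user who merely exists in magpie (member of some unrelated group) is refused,
# whereas the existence-only check described above would let them in.
print(allowed_by_group(["anonymous"], JUPYTERHUB_GROUPS))
print(allowed_by_group(["jupyterhub-users"], JUPYTERHUB_GROUPS))
```

A check like this could plausibly be called from the authenticator once the user's magpie groups are known.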

@mishaschwartz
Collaborator

mishaschwartz commented Jun 13, 2023

@tlvu

As for creating a brand new DACCS login page, then this DACCS login page will have to deal with both Jupyterhub and Magpie sessions cookies

Not necessarily, we could imagine a workflow like this:

  • user goes to custom login page and signs in
  • a POST request gets sent to magpie to log in the user and sets the magpie cookies (if successful)
  • the user then goes to the jupyterhub page
  • jupyterhub checks if the user is already logged in through magpie and automatically sets the jupyterhub cookies (see my earlier comment for details: #334 (comment))
  • jupyterhub redirects automatically to the user's jupyterlab environment

Essentially this means that magpie becomes the "source of truth" for whether a user is logged in or not and other components (custom login page, jupyterhub) just have to interact with magpie, not with each other.
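As a rough sketch of the second step of that workflow, the custom login page's backend (or an equivalent script) could build the sign-in request like this. The JSON field names (`user_name`, `password`, `provider_name`) and the default provider value are assumptions to check against the Magpie API docs linked earlier; the function only builds the request and does not send it.

```python
import json
import urllib.request


def build_signin_request(
    magpie_url: str, user_name: str, password: str, provider_name: str = "ziggurat"
) -> urllib.request.Request:
    """Build (but do not send) a POST request to magpie's signin endpoint."""
    payload = {
        # Assumed field names; verify against the Magpie API reference.
        "user_name": user_name,
        "password": password,
        "provider_name": provider_name,
    }
    return urllib.request.Request(
        url=f"{magpie_url.rstrip('/')}/signin",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_signin_request("https://pavics.ouranos.ca/magpie", "some-user", "some-password")
# Sending it with urllib.request.urlopen(request) would, on success, return the
# Set-Cookie header that the login page needs to pass on to the browser.
```

Everything after that step (jupyterhub detecting the magpie session and setting its own cookies) stays on the jupyterhub side, so the login page never needs to know about jupyterhub.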

@tlvu
Collaborator

tlvu commented Jun 13, 2023

@tlvu

But I still have a fundamental question that is not very clear to me. Is the current way protecting illegal JupyterHub login not enough? In terms of protection what does putting it behind Twitcher and route all traffic behind Twitcher offer more in terms of protection?

By putting it behind magpie/twitcher it would allow us to use magpie to specify permissions on a more fine-grained level (we can allow access to jupyterhub for specific groups of users for example).

Right now, the MagpieAuthenticator simply checks if a user exists in magpie in order to allow them access. This would give us more flexibility.

I see, this is now much clearer. We are basically missing both single sign-on and fine grained permissions, like for other WPS services.

If the Magpie API can also provide this group membership information, can the MagpieAuthenticator use this for finer grained permission?

Not necessarily, we could imagine a workflow like this:

  • user goes to custom login page and signs in
  • a POST request gets sent to magpie to log in the user and sets the magpie cookies (if successful)
  • the user then goes to the jupyterhub page
  • jupyterhub checks if the user is already logged in through magpie and automatically sets the jupyterhub cookies (see my comment here for details

But here the user is logged into Magpie, but do they have permission to access JupyterHub? This looks like the same problem as with MagpieAuthenticator.

Essentially this means that magpie becomes the "source of truth" for whether a user is logged in or not and other components (custom login page, jupyterhub) just have to interact with magpie, not with each other.

Currently this is already the case I think. Thredds and all WPS are behind Twitcher/Magpie. Jupyterhub login uses Magpie users.

If I summarize, we want

  • single sign-on between Magpie and JupyterHub
  • fine-grained permissions for JupyterHub
  • nice user experience at the login page (a way to customize the login page, as is possible with the JupyterHub login page right now, with a custom banner and information for users)

I have a feeling all changes can be done at the MagpieAuthenticator level but if a separate DACCS login page is easier then I have no objections. Is this DACCS login page a static page or another app to be deployed?

@fmigneault
Collaborator Author

@mishaschwartz

how do you feel about this suggestion?

I'm not sure if this only moves the problem to another service to keep in sync.
The user could still log in with Magpie or JupyterHub.
Long seems to have also identified this issue.

@tlvu

David Caron. Is he still at CRIM?

No. Gone for quite a while now.

Since even the custom DACCS login approach would require that the JupyterHub handler checks if the user is already logged in through Magpie and automatically sets the JupyterHub cookies, I think it is just easier to keep JupyterHub as the main login location and leave it up to the handler to sync items as needed.

@mishaschwartz
Collaborator

mishaschwartz commented Jun 14, 2023

I'm not sure if this only moves the problem to another service to keep in sync.
The user could still log in with Magpie or JupyterHub.

We wouldn't have to keep another service in sync. We're still only ever logging in through magpie. We're just adding a static page that we can customize so that we don't have to modify any of the magpie code. Think of it as simply replacing the look of the magpie login page (without actually changing the magpie code).

If it can combine the authenticate method from https://github.com/Ouranosinc/jupyterhub/blob/master/jupyterhub_magpie_authenticator/jupyterhub_magpie_authenticator.py and a definition of https://jupyterhub.readthedocs.io/en/stable/reference/api/auth.html#jupyterhub.auth.Authenticator.check_allowed, that should cover most cases.

That's a great idea to combine those

I think it is just easier to keep JupyterHub as the main login location and leave it up to the handler to sync items as needed.

Yes it is easier @fmigneault, but @huard's point:

I suspect that using the Magpie sign-in page could have long-term advantages, as for example displaying the current permission profile for data, services, other daccs nodes, etc.

does have a lot of advantages for DACCS specifically (even if the advantages to CRIM's use of birdhouse-deploy are not as clear)

@mishaschwartz
Collaborator

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

  1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub
  2. create another PR that creates a separate optional component that implements the updated MagpieAuthenticator (described in the jupyter-behind-twitcher branch) as well as creating a customizable static login page (as described above).

Then, you can choose to have jupyterhub as your main login page or you can choose to enable this optional component to have a customizable login page.

I would prioritize step 1 in order to resolve this issue and then we can work on step 2 at a later date.

@tlvu
Collaborator

tlvu commented Jun 19, 2023

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub

Good first step in achieving single sign-on between JupyterHub and Magpie.

I guess that with what you propose, a user already logged in to JupyterHub will not need to log in again for Magpie, but maybe not the other way around?

It's okay since this is the first step. We can implement the reverse scenario in subsequent steps.

2. create another PR that creates a separate optional component that implements the updated MagpieAuthenticator (described in the jupyter-behind-twitcher) as well as creating a customizable static login page (as described above).

Then, you can choose to have jupyterhub as your main login page or you can choose to enable this optional component to have a customizable login page.

I would prioritize step 1 in order to resolve this issue and then we can work on step 2 at a later date.

I like this flexibility to let the user decide whether JupyterHub or another login page is preferred.

@tlvu
Collaborator

tlvu commented Jun 19, 2023

I have an idea for a compromise that I hope will make everyone happy. Please let me know what you think:

1. create one PR that puts all jupyterhub routes behind twitcher and changes the MagpieAuthenticator so that it sets the magpie cookies as well when you log in through jupyterhub

Just to be clear, all routes behind Twitcher means all data flows through Twitcher or simply the "verify" trick so data do not flow through Twitcher to avoid performance penalty?

@mishaschwartz
Collaborator

Just to be clear, all routes behind Twitcher means all data flows through Twitcher

Not this one

or simply the "verify" trick so data do not flow through Twitcher to avoid performance penalty?

Yes this one

mishaschwartz added a commit that referenced this issue Oct 31, 2023
## Overview

Sets magpie cookies whenever a user logs in or out through jupyterhub so
that they are automatically logged in or out through magpie as well.
Ensures that the user has permission to access jupyterhub according to
magpie when logging in.

## Changes

**Non-breaking changes**
- adds jupyterhub as a provider in magpie so that admin users can set
api permissions in magpie for jupyterhub

**Breaking changes**

## Related Issue / Discussion

- implements step 1 from this comment:
#334 (comment)

## Additional Information
mishaschwartz added a commit that referenced this issue Nov 30, 2023
…permission to access (#402)

## Overview

By setting the `JUPYTERHUB_CRYPT_KEY` environment variable in the
`env.local` file, jupyterhub will store users' authentication
information (session cookie) in the database. This allows jupyterhub to
periodically check whether the user still has permission to access
jupyterhub (the session cookie is not expired and the permissions have
not changed).

The minimum duration between checks can be set with the
`JUPYTERHUB_AUTHENTICATOR_REFRESH_AGE` variable which is an integer (in
seconds).
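As an illustration only, the two variables could be mapped onto JupyterHub's `enable_auth_state` and `auth_refresh_age` authenticator options roughly as follows. The helper name and the exact mapping are assumptions; the actual birdhouse-deploy configuration may differ.

```python
import os
from typing import Mapping


def authenticator_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Translate the two environment variables into Authenticator option values."""
    settings = {}
    if environ.get("JUPYTERHUB_CRYPT_KEY"):
        # JupyterHub only persists auth state when an encryption key is available.
        settings["enable_auth_state"] = True
    refresh_age = environ.get("JUPYTERHUB_AUTHENTICATOR_REFRESH_AGE")
    if refresh_age:
        # Minimum number of seconds between two permission checks.
        settings["auth_refresh_age"] = int(refresh_age)
    return settings


# In a jupyterhub_config.py (where `c` is provided by JupyterHub) this could be applied as:
#     for name, value in authenticator_settings().items():
#         setattr(c.Authenticator, name, value)
```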

Note that users who are already logged in to jupyterhub will need to log
out and log in for these changes to take effect.

To forcibly log out all users currently logged in to jupyterhub you can
run the following command to force the recreation of the cookie secret:

  ```shell
docker exec jupyterhub rm /persist/jupyterhub_cookie_secret && docker restart jupyterhub
  ```

First discussed here:
#358 (comment)

## Changes

**Non-breaking changes**
- Adds two new environment variables to configure additional jupyterhub
authentication
- New jupyterhub version pavics/jupyterhub:4.0.2-20231024

**Breaking changes**

## Related Issue / Discussion

Related to #334

- [x] Note that this PR requires
Ouranosinc/jupyterhub#23 to be merged in first
and the jupyterhub version updated to match.

## Additional Information


birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false