Cohort inspector #892

Draft
wants to merge 5 commits into
base: master

Conversation

thcrock
Contributor

@thcrock thcrock commented Apr 16, 2022

Just putting something together quickly for inspecting cohorts.

I added pydantic here as a prototype for using it elsewhere in Triage. I think when we specify a particular dict structure for arguments (which we do all over the place) or return values (which we usually don't, but this PR certainly does), we should be more explicit about what that dict is supposed to contain. Pydantic makes it pretty easy to do this.
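
To illustrate the general idea (a minimal sketch, not the code in this PR): a pydantic model can stand in for a loosely documented dict return value. The model name `CohortSummary`, its fields, and the `summarize` helper are all hypothetical, chosen only for this example.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class CohortSummary(BaseModel):
    """Hypothetical return type for a single-date cohort inspection."""

    as_of_date: date
    num_rows: int
    num_distinct_entities: int


def summarize(raw: dict) -> CohortSummary:
    # Validation happens at construction time: a missing key or a value of
    # the wrong type raises ValidationError here, rather than surfacing
    # later as a KeyError or TypeError somewhere downstream.
    return CohortSummary(**raw)


summary = summarize(
    {"as_of_date": "2022-01-01", "num_rows": 120, "num_distinct_entities": 118}
)
print(summary.as_of_date, summary.num_rows)

try:
    # "many" is not an integer and num_distinct_entities is missing.
    summarize({"as_of_date": "2022-01-01", "num_rows": "many"})
except ValidationError as err:
    print("rejected:", type(err).__name__)
```

Callers then get attribute access (`summary.num_rows`) and a single place that documents what the structure contains.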

Anyway, this is just the single-date version. I wanted a check-in to see if we should keep going this direction, and maybe decide what else the interface should be. Logging all of these values maybe?

@shaycrk
Contributor

shaycrk commented Apr 19, 2022

Thanks, @thcrock -- this seems like a good start to me! The main extension from here is probably specifying the date(s) you want to pass, and one piece that seems like it could be useful to support is to (optionally) start from the temporal config and take something like the last as_of_date or all the as_of_dates in the last training or test matrix. What do you think?
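
One way this could look, as a rough sketch only: the helper below picks as-of-dates out of a list of temporal splits. It assumes each split is a plain dict shaped like `{"train_matrix": {"as_of_times": [...]}, "test_matrices": [{"as_of_times": [...]}, ...]}`; that shape, the function name `as_of_dates_from_last_split`, and its parameters are assumptions made for illustration, not an existing Triage interface.

```python
from datetime import datetime
from typing import Dict, List


def as_of_dates_from_last_split(
    splits: List[Dict], matrix: str = "test", only_last: bool = False
) -> List[datetime]:
    """Return as-of-dates from the most recent split (hypothetical helper).

    ASSUMPTION: each split has a "train_matrix" dict and a list of
    "test_matrices" dicts, each carrying an "as_of_times" list.
    """
    if not splits:
        raise ValueError("expected at least one temporal split")

    # Treat the split whose training matrix reaches furthest forward in
    # time as the "last" one.
    last_split = max(splits, key=lambda s: max(s["train_matrix"]["as_of_times"]))

    if matrix == "train":
        dates = list(last_split["train_matrix"]["as_of_times"])
    elif matrix == "test":
        dates = [d for m in last_split["test_matrices"] for d in m["as_of_times"]]
    else:
        raise ValueError("matrix must be 'train' or 'test'")

    dates = sorted(set(dates))
    # only_last=True keeps just the final as-of-date, still as a list.
    return dates[-1:] if only_last else dates


splits = [
    {
        "train_matrix": {"as_of_times": [datetime(2020, 1, 1), datetime(2020, 7, 1)]},
        "test_matrices": [{"as_of_times": [datetime(2021, 1, 1)]}],
    },
    {
        "train_matrix": {
            "as_of_times": [datetime(2020, 1, 1), datetime(2020, 7, 1), datetime(2021, 1, 1)]
        },
        "test_matrices": [{"as_of_times": [datetime(2021, 7, 1), datetime(2022, 1, 1)]}],
    },
]
print(as_of_dates_from_last_split(splits, matrix="test", only_last=True))
```

Returning a list in both modes means the inspector itself would only need to handle "a list of dates", whether the user asked for one date or many.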

I've been imagining that a use case here might be via a notebook just to provide a little more of an interface, so it might help to add an example notebook that people could start from, for instance along the lines of this one for visualize_chops.

Also, using pydantic makes sense to me for cases where we want to be explicit about the structure of something like a dict being passed. I'm not sure about trying to integrate it everywhere to make things more strongly typed, but I'm curious what your thoughts are on that balance?

@thcrock
Contributor Author

thcrock commented Apr 29, 2022

@shaycrk Regarding multiple as-of-dates: What would you want to see as the output in that case? I think a histogram from matplotlib would bring this more in line with visualize_chops: the x axis is each as-of-date that you pass in, and the y axis is the # of rows on that date. If you pass in just one date, it could just plot the one bar on that histogram, I guess.
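
Roughly what that plot could look like, as a sketch under assumptions: the function name `plot_cohort_sizes` and its input (a dict mapping each as-of-date to a row count) are made up for illustration, and the counts in the usage line are placeholder values rather than real cohort output.

```python
from datetime import date
from typing import Dict

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_cohort_sizes(rows_per_date: Dict[date, int]):
    """Draw one bar per as-of-date showing the number of cohort rows."""
    dates = sorted(rows_per_date)
    labels = [d.isoformat() for d in dates]
    counts = [rows_per_date[d] for d in dates]

    fig, ax = plt.subplots()
    ax.bar(labels, counts)  # string labels are treated as categories
    ax.set_xlabel("as-of-date")
    ax.set_ylabel("# of rows in cohort")
    ax.set_title("Cohort size by as-of-date")
    fig.autofmt_xdate()  # slant the date labels so they do not overlap
    return ax


# Placeholder counts; a single-entry dict simply yields a single bar.
ax = plot_cohort_sizes({date(2021, 1, 1): 950, date(2021, 7, 1): 1020})
```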

If not a graph, what kind of result would you be looking to see for multiple dates? Print the info for each date?

@thcrock
Contributor Author

thcrock commented Apr 29, 2022

@shaycrk By histogram I just meant bar chart, I guess. I just included an example notebook with such a bar chart. It's just hardcoded and not calling the cohort inspector right now, but if you think that's a good way to visualize it we could make something like that work.

@shaycrk
Contributor

shaycrk commented Jun 21, 2022

Thanks @thcrock (and sorry for the very slow reply!).

I think something along those lines makes sense, but I might opt for a line chart, more along the lines of audition, rather than bars. Returning a small number of example entity-date pairs might also be helpful here (or just showing in the example notebook how to grab them from the resulting table) to let users double-check the logic or look at characteristics of some example cohort entities.
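
As a sketch of both pieces (cohort size per date for the chart, plus a few example pairs), here is a self-contained version using an in-memory SQLite table. Triage runs against Postgres, so this is only a stand-in: the table name `cohort`, its two columns, and the helper names are assumptions for illustration, and the inserted rows are toy data.

```python
import sqlite3

# Toy stand-in for a cohort table: one row per (entity_id, as_of_date).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort (entity_id INTEGER, as_of_date TEXT)")
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?)",
    [
        (1, "2021-01-01"),
        (2, "2021-01-01"),
        (1, "2021-07-01"),
        (2, "2021-07-01"),
        (3, "2021-07-01"),
    ],
)


def cohort_size_by_date(conn):
    """Rows per as-of-date, in date order (the series a line chart would plot)."""
    query = (
        "SELECT as_of_date, COUNT(*) FROM cohort "
        "GROUP BY as_of_date ORDER BY as_of_date"
    )
    return conn.execute(query).fetchall()


def example_pairs(conn, as_of_date, limit=5):
    """A few entity-date pairs for one date, for spot-checking cohort logic."""
    query = (
        "SELECT entity_id, as_of_date FROM cohort "
        "WHERE as_of_date = ? ORDER BY entity_id LIMIT ?"
    )
    return conn.execute(query, (as_of_date, limit)).fetchall()


print(cohort_size_by_date(conn))
print(example_pairs(conn, "2021-07-01", limit=2))
```

Ordering by `entity_id` keeps the examples deterministic; a random sample might be more representative for a real cohort.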

Certainly well beyond the scope here, but one could imagine a more fully-featured cohort inspector that makes it easy to look at crosstabs of the resulting cohort and how they vary over time.

2 participants