Cohort inspector #892

thcrock · 2022-04-16T04:54:08Z

Just putting something together quick for inspecting cohorts.

I added pydantic here as a prototype for using it elsewhere in Triage. I think when we expect a particular dict structure for arguments (which we do all over the place) or return values (which we usually don't, but this PR certainly does), we should be more explicit about what that dict is supposed to contain. Pydantic makes it pretty easy to do this.
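To make the idea concrete, here is a minimal sketch of a pydantic model standing in for a loosely specified dict. The `CohortSummary` class, its field names, and the `summarize` helper are hypothetical illustrations, not the names used in this PR.

```python
from datetime import date
from typing import List

from pydantic import BaseModel, ValidationError


class CohortSummary(BaseModel):
    """Hypothetical shape for one as-of-date's cohort inspection result."""

    as_of_date: date
    num_rows: int
    example_entity_ids: List[int] = []


def summarize(raw: dict) -> CohortSummary:
    # Validation happens when the model is constructed, so a dict with a
    # missing or mistyped key raises ValidationError instead of failing later.
    return CohortSummary(**raw)


summary = summarize(
    {"as_of_date": "2022-04-16", "num_rows": 120, "example_entity_ids": [1, 2, 3]}
)
print(summary.as_of_date, summary.num_rows)
```

Callers then get attribute access and a documented structure, and a dict that lacks `num_rows` is rejected at the boundary rather than somewhere downstream.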

Anyway, this is just the single-date version. I wanted a check-in to see if we should keep going in this direction, and maybe decide what else the interface should be. Logging all of these values maybe?

shaycrk · 2022-04-19T00:09:05Z

Thanks, @thcrock -- this seems like a good start to me! The main extension from here is probably specifying the date(s) you want to pass, and one piece that seems like it could be useful to support is to (optionally) start from the temporal config and take something like the last as_of_date or all the as_of_dates in the last training or test matrix. What do you think?

I've been imagining that a use case here might be via a notebook just to provide a little more of an interface, so it might help to add an example notebook that people could start from, for instance along the lines of this one for visualize_chops.

Also, using pydantic makes sense to me for cases where we want to be explicit about the structure of something like a dict being passed. I'm not sure about trying to integrate it to make things more strongly typed everywhere, but I'm curious what your thoughts are on that balance.

thcrock · 2022-04-29T04:39:45Z

@shaycrk Regarding multiple as-of-dates: What would you want to see as the output in that case? I think a histogram from matplotlib would bring this more in line with visualize_chops: the x axis is each as-of-date that you pass in, and the y axis is the # of rows on that date. If you pass in one date, it could just plot the one bar on that histogram I guess.

If not a graph, what kind of result would you be looking to see for multiple dates? Print the info for each date?

thcrock · 2022-04-29T05:04:50Z

@shaycrk By histogram I just meant bar chart, I guess. I just included an example notebook with such a bar chart. It's just hardcoded and not calling the cohort inspector right now, but if you think that's a good way to visualize it we could make something like that work.
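A rough sketch of that kind of chart, for reference. The `plot_cohort_sizes` function and the counts are made up for illustration and are not taken from the notebook or the cohort inspector.

```python
from datetime import date

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_cohort_sizes(sizes_by_date):
    """Draw one bar per as-of-date; `sizes_by_date` maps date -> row count."""
    dates = sorted(sizes_by_date)
    labels = [d.isoformat() for d in dates]
    counts = [sizes_by_date[d] for d in dates]
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_xlabel("as-of-date")
    ax.set_ylabel("# of rows in cohort")
    ax.set_title("Cohort size by as-of-date")
    return fig, ax


# Hardcoded, hypothetical counts; a single date simply yields a single bar.
fig, ax = plot_cohort_sizes({date(2022, 1, 1): 120, date(2022, 2, 1): 135})
```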

shaycrk · 2022-06-21T21:16:35Z

Thanks @thcrock (and sorry for the very slow reply!).

I think something along those lines makes sense, but might opt for a line chart, more along the lines of audition, rather than bars. Returning a small number of example entity-date pairs might also be helpful here (or just showing in the example notebook how to grab them from the resulting table) to let users double-check the logic or look at characteristics of some example cohort entities.
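As a sketch of what that could return, assuming the cohort is available as a list of `(entity_id, as_of_date)` tuples: the `inspect_cohort` name, its parameters, and the sample rows below are hypothetical. The size series is sorted by date so it could feed a line chart directly.

```python
import random
from collections import Counter
from datetime import date


def inspect_cohort(rows, num_examples=3, seed=0):
    """Summarize (entity_id, as_of_date) rows: size per date plus a few examples."""
    # Count how many cohort rows fall on each as-of-date.
    sizes = Counter(as_of_date for _, as_of_date in rows)
    series = sorted(sizes.items())  # [(as_of_date, count), ...] in date order
    # Seeded sampling keeps the example pairs reproducible between runs.
    rng = random.Random(seed)
    examples = rng.sample(rows, min(num_examples, len(rows)))
    return series, examples


rows = [
    (1, date(2022, 1, 1)),
    (2, date(2022, 1, 1)),
    (1, date(2022, 2, 1)),
]
series, examples = inspect_cohort(rows, num_examples=2)
print(series)
print(examples)
```

The example pairs give users something to check against the cohort query by hand.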

Certainly well beyond the scope here, but one could imagine a more fully-featured cohort inspector that makes it easy to look at crosstabs of the resulting cohort and how they vary over time.

thcrock added 4 commits April 15, 2022 23:41

Add prototype cohort inspector

7efb8d4

Squash to one function

c9ac190

Add return type annotation

cdbc40b

Add failure state

8e4877d

thcrock requested review from shaycrk and tweddielin April 16, 2022 04:54

Add cohort size bar example

9c4f118
