Create features per fold #897

Open
ecsalomon opened this issue May 26, 2022 · 0 comments

ecsalomon commented May 26, 2022

Ideally, Triage would create features separately for each temporal fold, so that columns that would not have been available at training time are excluded from the training and test matrices for models built on that fold. This applies to both quantitative aggregates and categorical choice aggregates, including columns that would not yet have met the conditions of the choice query (e.g., at least 1000 examples of the choice). Implementing this behavior raises several questions:

  • How will model grouping be handled when the same feature configuration results in different columns over time?
  • What metadata should we store about model features? Should we include both the observed and configured features in model metadata?
  • Will we need additional metadata for matrices? Should they also have the configured features in their metadata?
  • Test matrices are built at a different as-of-date than the training matrices they are paired with. Should each test matrix be tied to a specific training matrix (i.e., this test matrix has the features available at training time X), or should test matrices be generic, with the feature list taken from the trained model? In the latter case, could a test matrix be missing a feature that was available at training time, and how would we handle that?
  • Alternatively, should we refactor the feature generation much more broadly and avoid some of these matrix questions altogether?
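
To make the per-fold behavior concrete, here is a minimal sketch in plain Python. It is not Triage code: `columns_available_at`, `align_to_training_columns`, the `(event_date, value)` event format, and the zero fill value are all hypothetical names and choices made for illustration. It shows how the set of categorical columns could be computed as of a fold's training date, and one possible way to force a test row onto the columns the model was trained with.

```python
from collections import Counter
from datetime import date


def columns_available_at(events, as_of_date, min_count=1000):
    """Return the categorical values that meet the threshold as of a date.

    `events` is an iterable of (event_date, value) pairs (a hypothetical
    format). Only events dated strictly before `as_of_date` are counted,
    so values that became frequent later are left out for this fold.
    """
    counts = Counter(
        value for event_date, value in events if event_date < as_of_date
    )
    return sorted(value for value, n in counts.items() if n >= min_count)


def align_to_training_columns(test_row, training_columns, fill_value=0):
    """Restrict a test row (a dict) to the columns seen at training time.

    Columns that appear only in the test row are dropped. Columns that are
    missing from the test row are filled with `fill_value`; filling with 0
    is an assumption here, not a settled answer to the question above.
    """
    return {col: test_row.get(col, fill_value) for col in training_columns}


if __name__ == "__main__":
    events = [
        (date(2019, 1, 1), "a"),
        (date(2019, 6, 1), "a"),
        (date(2019, 7, 1), "b"),
        (date(2020, 3, 1), "b"),
    ]
    # A small threshold keeps the toy data readable.
    train_cols = columns_available_at(events, date(2020, 1, 1), min_count=2)
    print(train_cols)
    print(align_to_training_columns({"a": 1, "b": 3}, train_cols))
```

With this approach the same feature configuration yields different column lists for different folds, which is exactly what makes the model grouping and metadata questions above necessary to answer.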

nanounanue self-assigned this Jun 8, 2022