Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VS-1470 Have Extract Cohort to PGEN write to cost_observability table #8958

Merged

Conversation

gbggrant
Copy link
Collaborator

@gbggrant gbggrant commented Aug 20, 2024

Fix a bug where ExtractCohortToPgen does not write to the cost_observability table.

  • Successful run here
  • Successful integration test here

Copy link

@koncheto-broad koncheto-broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gbggrant gbggrant merged commit 35601a1 into ah_var_store Aug 21, 2024
11 of 17 checks passed
@gbggrant gbggrant deleted the gg_VS-1470_HaveExtractCohortToPgenWriteToCOTable branch August 21, 2024 14:38
Copy link

@rsasch rsasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, although I wonder why some of the intervals in the test run have 0 BigQuery Query Scanned

@gbggrant
Copy link
Collaborator Author

Thanks! This is a good question. I had seen it too and looked into it much greater detail now. For Extract (both for VCF and PGEN) the only BigQuery activity that is being recorded is on the sample_info table. Each shard in a multi-sharded extract is querying the entire table to get a map of sample id to sample name (excluding withdrawn). Since every shard is running the same query (again, not sample-specific, it's the whole table!), BigQuery returns the cached result in almost all cases and so, the scanned value is 0. For the run in question, there were two shards which returned values of 54. I think they ran close enough in time that they weren't cached.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants