Add a few additional failures to our notes doc #8980

Draft
wants to merge 1 commit into
base: ah_var_store

RoriCremer
Contributor

No description provided.

@rsasch rsasch self-requested a review September 26, 2024 13:58

@rsasch rsasch left a comment

Some specific changes. Also, I think it would be useful to have a consistent way of attaching the failures to a specific workflow, sub-workflow, and task for easier use.

Comment on lines +12 to 15
1. GVS is running very slowly!
1. If your GVS workflow is running very slowly compared to the example runtimes in the workspace, you may have run GVS on GVCFs that have not been reblocked. Confirm your GVCFs are reblocked.
1. My workflow failed during ingestion, can I restart it?
1. If it fails during ingestion, yes, the GvsBeta workflow is restartable and will pick up where it left off.

It seems like these two cases are different, since they don't involve a particular error. Maybe something like "Ingestion-Specific Issues"

1. The GVS requires that sample names are unique because the sample names are used to name the samples in the VCF, and VCF format requires unique sample names.
2. After deleting or renaming the duplicate sample, you can restart the workflow without any clean up.
3. `BulkIngestGenomes/GvsBulkIngestGenomes/hash/call-ImportGenomes/GvsImportGenomes/hash/call-GetUningestedSampleIds/gvs_ids.csv Required file output '/cromwell_root/gvs_ids.csv' does not exist.`
1. Duplicate sample names error: ERROR: The input file ~{sample_names_file} contains the following duplicate entries:

I would bring back the formatting around the actual error message, as it's easier to scan the document to find the text that was in the log.

Suggested change
1. Duplicate sample names error: ERROR: The input file ~{sample_names_file} contains the following duplicate entries:
1. `Duplicate sample names error: ERROR: The input file ~{sample_names_file} contains the following duplicate entries:`
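For readers who hit the duplicate sample names error, a pre-flight check can find the offending names before the workflow is restarted. The following is a minimal sketch, not the check GVS itself performs: the function name `find_duplicate_sample_names` is hypothetical, and it assumes the sample names are available as plain strings, one name per entry.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_sample_names(names: Iterable[str]) -> List[str]:
    """Return the sample names that occur more than once, in first-seen order."""
    # Strip surrounding whitespace and drop blank entries so that trailing
    # newlines or empty lines are not counted as sample names.
    cleaned = [name.strip() for name in names if name.strip()]
    counts = Counter(cleaned)
    # Counter keeps insertion order, so duplicates come back in the order
    # in which each name was first seen.
    return [name for name, count in counts.items() if count > 1]


# Illustrative input only; these sample names are made up for the example.
example_names = ["NA12878", "NA12891", "NA12878", "NA12892"]
print(find_duplicate_sample_names(example_names))  # prints ['NA12878']
```

Because the function accepts any iterable of strings, an open text file with one sample name per line can be passed in directly, assuming that is how the sample names file is laid out.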

1. Duplicate sample names error: ERROR: The input file ~{sample_names_file} contains the following duplicate entries:
1. The GVS requires that sample names are unique because the sample names are used to name the samples in the VCF, and VCF format requires unique sample names.
1. After deleting or renaming the duplicate sample, you can restart the workflow without any clean up.
1. BulkIngestGenomes/GvsBulkIngestGenomes/hash/call-ImportGenomes/GvsImportGenomes/hash/call-GetUningestedSampleIds/gvs_ids.csv Required file output '/cromwell_root/gvs_ids.csv' does not exist.

similar to above

Suggested change
1. BulkIngestGenomes/GvsBulkIngestGenomes/hash/call-ImportGenomes/GvsImportGenomes/hash/call-GetUningestedSampleIds/gvs_ids.csv Required file output '/cromwell_root/gvs_ids.csv' does not exist.
1. During Ingest: `Required file output '/cromwell_root/gvs_ids.csv' does not exist.`

1. If you've attempted to run GVS more than once in the same BigQuery dataset, you may see this error. Please delete the dataset and create a new one. We recommend naming the new dataset something different than the one you deleted.
4. AssignIds failure with error message: `BigQuery error in mk operation: Not found: Dataset`
1. This is saying that GVS was unable to find the BigQuery dataset specified in the inputs. If you haven't created a BigQuery dataset prior to running the workflow, you can follow the steps in [the quickstart](./gvs-quickstart.md). If you created it and still see this error, check the naming of the dataset matches your input specified and that the google project in the inputs is correct. Lastly, confirm you have set up the correct permissions for your Terra proxy account following the instructions in the quickstart.
1. AssignIds failure with error message: BigQuery error in mk operation: Not found: Dataset

Suggested change
1. AssignIds failure with error message: BigQuery error in mk operation: Not found: Dataset
1. AssignIds failure with error message: `BigQuery error in mk operation: Not found: Dataset`

1. This is saying that GVS was unable to find the BigQuery dataset specified in the inputs. If you haven't created a BigQuery dataset prior to running the workflow, you can follow the steps in [the quickstart](./gvs-quickstart.md). If you created it and still see this error, check the naming of the dataset matches your input specified and that the google project in the inputs is correct. Lastly, confirm you have set up the correct permissions for your Terra proxy account following the instructions in the quickstart.
1. AssignIds failure with error message: BigQuery error in mk operation: Not found: Dataset
1. This is saying that GVS was unable to find the BigQuery dataset specified in the inputs. If you haven't created a BigQuery dataset prior to running the workflow, you can follow the steps in the quickstart. If you created it and still see this error, check the naming of the dataset matches your input specified and that the google project in the inputs is correct. Lastly, confirm you have set up the correct permissions for your Terra proxy account following the instructions in the quickstart.
1. Ingest failure with error message: raise ValueError("vcf column not in table")

Suggested change
1. Ingest failure with error message: raise ValueError("vcf column not in table")
1. Ingest failure with error message: `raise ValueError("vcf column not in table")`

1. (e.g. alternate_bases.AS_RAW_MQ, RAW_MQandDP or RAW_MQ)
1. This means that there is at least one incorrectly formatted sample in your data model. Confirm your GVCFs are reblocked. If the incorrectly formatted samples are a small portion of your callset and you wish to just ignore them, simply delete them from the data model and restart the workflow without them. There should be no issue with starting from here as none of these samples were loaded.
1. Extract failure with OSError: Is a directory. If you point your extract to a directory that doesn’t already exist, the extract fails with this error. Create the directory and run the workflow again.
1. Ingest failure with: Lock table error

Suggested change
1. Ingest failure with: Lock table error
1. Ingest failure with: `Lock table error`

this seems incomplete; what is the user to do if they run into this error?

1. It is important to verify that the data has ALL made it into the BQ dataset or not

this seems incomplete

2 participants