Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cram support #609

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Cram support #609

wants to merge 7 commits into from

Conversation

zlskidmore
Copy link
Collaborator

This should allow the rnaseq.cwl workflow be input agnostic in terms of input being sam/cram/bam. I've also incorporated the suggestion from @jasonwalker80 on using RevertSam which was not in here before.

The best solution I could come up with was to force conversion to bam format right at the start. If you were instead to try and pass SAM/CRAM/BAM to RevertSam the output file and more importantly the extension would be incorrect (though if someone can figure this out maybe that's a better solution?).

At any rate as it is now samtools is run and just converts to bam so while there are 3 input types only 1 output type averting this headache alluded to above.

Side note mgibio has 2 samtools docker images, updated at the same time. I don't know which one we use so if this is accepted those lines will have to be modified (right now it uses zlskidmore/samtools)

@zlskidmore zlskidmore changed the base branch from cram_support to master March 19, 2019 20:56
cram_reference: cram_reference
out: [bam_file]
revert_bam:
run: ../tools/revert_input.cwl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could revert_input.cwl be run on CRAMs directly? If so seems like input_to_bam could be removed. If not, seems like it should be named revert_bam.cwl instead of revert_input.cwl to denote its specificity.

(Is this operation time and space-efficient enough that we're willing to do it on all inputs instead of only those that need it?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the specificity, the tool in revert_input.cwl can be run on anything but you would have to use the input variable, currently bam, to grab the appropriate out file , basically would require some javascript i expect which i try to avoid in cwl. I think renaming to revert_bam.cwl is preferable from my perspective.

The flexibility in the PR does come at a computational cost for sure. I think we should run revert_input.cwl either way though so the cost is just converting whatever input to bam

@chrisamiller
Copy link
Collaborator

Does @tmooney's recent code - that allows for Fastq or Bam input to the alignment workflow - provide a path forward here? #741

Seems like could be possible to expand the sequence type to have 4 input types instead of 2:

unaligned_bam
fastq
aligned_bam
aligned_cram

Then we would add some downstream logic in the alignment steps to run RevertSam if needed. That would apply in (I think) just three different places: dna alignment, rnaseq alignment, bisulfite alignment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants