Picard SortVcf has issues reading from /dev/stdin and writing to /dev/stdout #1776

Open
clintval opened this issue Feb 14, 2022 · 0 comments


Bug Report

Affected tool

  • Picard SortVcf

Affected version(s)

  • 2.26.10 (presumably earlier versions too)

Description

Most of the time, but not always, Picard SortVcf raises an exception when it reads VCF data from a pipe (for example, /dev/stdin).

Picard SortVcf also raises an "Operation not permitted" exception when writing to places like /dev/stdout, because it tries to create an index file next to the output.

Steps to reproduce

Streaming a VCF into SortVcf through /dev/stdin may produce a cryptic error about a malformed line (too few columns):

cat dna02268.var2vcf_valid.temp.vcf | picard SortVcf -I /dev/stdin -O output.vcf -SD hs38DH.dict
19:59:36.163 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/cvalentine/miniconda3/share/picard-2.26.10-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Sun Feb 13 19:59:36 EST 2022] SortVcf --INPUT /dev/stdin --OUTPUT output.vcf --SEQUENCE_DICTIONARY hs38DH.dict --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Feb 13 19:59:36 EST 2022] Executing as [email protected] on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.9.1+1-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.10
INFO	2022-02-13 19:59:36	SortVcf	Reading entries from input file 1
[Sun Feb 13 19:59:36 EST 2022] picard.vcf.SortVcf done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=536870912
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.tribble.TribbleException: Line 63: there aren't enough columns for line ADJAF=0;SHIFT3=0;MSI=0;MSILEN=0;NM=0.9;HICNT=9166;HICOV=9166;LSEQ=0;RSEQ=0;DUPRATE=0;SPLITREAD=0;SPANPAIR=0	GT:DP:VD:AD:AF:RD:ALD	0/0:9232:0:9166:0:4385,4781:0,0 (we expected 9 tokens, and saw 3 ), for input source: file:///dev/stdin
	at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:381)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
	at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
	at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
	at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:375)
	at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.<init>(TribbleIndexedFeatureReader.java:342)
	at htsjdk.tribble.TribbleIndexedFeatureReader.iterator(TribbleIndexedFeatureReader.java:309)
	at htsjdk.variant.vcf.VCFFileReader.iterator(VCFFileReader.java:329)
	at picard.vcf.SortVcf.sortInputs(SortVcf.java:164)
	at picard.vcf.SortVcf.doWork(SortVcf.java:98)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
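One plausible explanation for this error, which this issue does not confirm, is that the input is opened more than once (for example, once to parse the header and again to iterate over records). A regular file can be reopened from its start, but a pipe cannot: whatever the first reader buffered is gone, so the second reader begins partway through the data. That would match the reported line, which starts with "ADJAF=0;..." and looks like the tail of an INFO column. The hypothetical function below reproduces the effect with two readers sharing one pipe; the 27-byte read is an arbitrary stand-in for a buffered header read.

```shell
# Illustration only (hypothesis, not confirmed in this issue): two readers share
# one pipe. The first consumes a fixed number of bytes, standing in for a
# buffered header read; the second then starts wherever the first stopped.
demo_reopened_pipe() {
  printf '##fileformat=VCFv4.2\nchr1\t100\t.\tA\tC\n' | {
    # First reader: exactly 27 bytes (the 21-byte header line plus "chr1<TAB>1").
    dd bs=1 count=27 of=/dev/null 2>/dev/null
    # Second reader: its first "line" is the remainder of a record.
    head -n 1
  }
}
```

Here the second reader sees "00<TAB>.<TAB>A<TAB>C" instead of a complete record, i.e. a line with too few columns. With a real pipe, how much the first reader consumes depends on timing, which could also explain why the failure happens most of the time but not always.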

However, everything works fine if it is read from a file:

picard SortVcf -I dna02268.var2vcf_valid.temp.vcf -O output.vcf -SD hs38DH.dict
20:00:21.375 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/cvalentine/miniconda3/share/picard-2.26.10-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Sun Feb 13 20:00:21 EST 2022] SortVcf --INPUT dna02268.var2vcf_valid.temp.vcf --OUTPUT output.vcf --SEQUENCE_DICTIONARY hs38DH.dict --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Feb 13 20:00:21 EST 2022] Executing as [email protected] on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.9.1+1-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.10
INFO	2022-02-13 20:00:21	SortVcf	Reading entries from input file 1
INFO	2022-02-13 20:00:22	SortVcf	read        25,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr8:116,854,353
INFO	2022-02-13 20:00:22	SortVcf	read        50,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr21:34,792,483
INFO	2022-02-13 20:00:22	SortVcf	wrote        25,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr8:116,854,353
INFO	2022-02-13 20:00:23	SortVcf	wrote        50,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr21:34,792,483
[Sun Feb 13 20:00:23 EST 2022] picard.vcf.SortVcf done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=934281216
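Since reading from a regular file works, one workaround until /dev/stdin is supported is to copy the stream into a regular file first and pass that path to SortVcf. This is a minimal sketch; the function name spool_stdin_to_vcf and the temporary-directory layout are illustrative and not part of Picard.

```shell
# Hypothetical helper (not part of Picard): copy all of standard input into a
# new temporary directory as input.vcf and print the resulting path. The caller
# is responsible for removing that directory afterwards.
spool_stdin_to_vcf() {
  local tmp_dir
  tmp_dir=$(mktemp -d) || return 1
  # Write the whole stream to a regular file, which can be reopened and re-read.
  cat > "$tmp_dir/input.vcf" || return 1
  printf '%s\n' "$tmp_dir/input.vcf"
}
```

Usage, assuming the same files as above:

input_vcf=$(cat dna02268.var2vcf_valid.temp.vcf | spool_stdin_to_vcf) && picard SortVcf -I "$input_vcf" -O output.vcf -SD hs38DH.dict
rm -r "$(dirname "$input_vcf")"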

Additionally, the tool fails when writing to locations like /dev/stdout or /dev/null, because the underlying writer also tries to write an index next to the output (here /dev/stdout.idx), and creating a file there is not permitted on most systems:

picard SortVcf -I dna02268.var2vcf_valid.temp.vcf -O /dev/stdout -SD hs38DH.dict > output.vcf
20:01:57.670 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/cvalentine/miniconda3/share/picard-2.26.10-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Sun Feb 13 20:01:57 EST 2022] SortVcf --INPUT dna02268.var2vcf_valid.temp.vcf --OUTPUT /dev/stdout --SEQUENCE_DICTIONARY hs38DH.dict --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Feb 13 20:01:57 EST 2022] Executing as [email protected] on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.9.1+1-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.10
INFO	2022-02-13 20:01:57	SortVcf	Reading entries from input file 1
INFO	2022-02-13 20:01:58	SortVcf	read        25,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr8:116,854,353
INFO	2022-02-13 20:01:58	SortVcf	read        50,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr21:34,792,483
INFO	2022-02-13 20:01:59	SortVcf	wrote        25,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr8:116,854,353
INFO	2022-02-13 20:01:59	SortVcf	wrote        50,000 records.  Elapsed time: 00:00:00s.  Time for last 25,000:    0s.  Last read position: chr21:34,792,483
[Sun Feb 13 20:01:59 EST 2022] picard.vcf.SortVcf done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1032847360
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Unable to close index for file:///dev/stdout
	at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:183)
	at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
	at picard.vcf.SortVcf.writeSortedOutput(SortVcf.java:186)
	at picard.vcf.SortVcf.doWork(SortVcf.java:101)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.nio.file.FileSystemException: /dev/stdout.idx: Operation not permitted
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
	at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
	at java.base/java.nio.file.Files.newOutputStream(Files.java:219)
	at htsjdk.tribble.index.AbstractIndex.write(AbstractIndex.java:381)
	at htsjdk.tribble.index.AbstractIndex.writeBasedOnFeaturePath(AbstractIndex.java:391)
	at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:178)
	... 6 more

An "Operation not permitted" exception is surprising here: the user only asked to write to a file handle, and the failure comes from the attempt to create a sibling file (the index) in a location where writing is disallowed.

The final example does succeed with --CREATE_INDEX false, but the user experience could be improved by not attempting to write an index when the output is a special file system device (a Java "non-regular file") or when the location of the output's sibling index file is not writable.
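Until Picard makes that decision itself, the suggested check can be approximated in a shell wrapper. The hypothetical create_index_flag function below prints the value to pass to --CREATE_INDEX: false when the output path already exists but is not a regular file (a character device such as /dev/null, a pipe behind /dev/stdout, and so on) or when the directory that would hold the sibling index is not writable, and true otherwise. It is a sketch of the proposed behavior, not how Picard decides today.

```shell
# Hypothetical helper (not part of Picard): print "true" or "false" for
# --CREATE_INDEX depending on whether an index can sensibly be written
# next to the given output path.
create_index_flag() {
  local out="$1"
  if [ -e "$out" ] && [ ! -f "$out" ]; then
    # Existing non-regular file (device, FIFO, ...): skip the index.
    echo false
  elif [ ! -w "$(dirname "$out")" ]; then
    # The index is written next to the output, so its directory must be writable.
    echo false
  else
    echo true
  fi
}
```

For example:

picard SortVcf -I dna02268.var2vcf_valid.temp.vcf -O /dev/stdout -SD hs38DH.dict --CREATE_INDEX "$(create_index_flag /dev/stdout)" > output.vcf

Inside the command substitution /dev/stdout refers to a pipe, so the helper prints false there as well.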

Expected behavior

  1. Picard SortVcf should be able to read from standard input (/dev/stdin)
  2. Picard SortVcf, when run with default settings, should not raise an exception when writing to places like /dev/stdout

Actual behavior

Exceptions are raised that are hard for casual users to debug.
