
Picard not finishing jobs on ArchLinux when using parallel - BUG #1798

Open
GACGAMA opened this issue Apr 17, 2022 · 0 comments
GACGAMA commented Apr 17, 2022

Hello everyone!
I've been running Picard on RNA-seq BAM files. I first processed one sample per step to make sure everything works correctly. When I run the same steps through GNU parallel, I hit a few issues.

I'm using Arch Linux with 32 threads (16 cores) and 256 GB of RAM; disk space is fine as well.

Some MarkDuplicates and FixMateInformation jobs never finish, even though they start.
For MarkDuplicates, some jobs don't produce a proper BAM file (some output BAMs shrank from about 5 GB to 160 MB, yet when I check the program's output it finishes too soon, without any errors). For FixMateInformation, I had to rerun roughly 20% of the samples each round until everything completed (again, no errors reported).

Let's focus on MarkDuplicates.

I'm using:

script /home/sripts/mark_duplicates_test1.txt
parallel --verbose --link -j 15 'java -XX:ParallelGCThreads=2 -Djava.io.tmpdir=`pwd`/tmp  -jar /home/picard.jar MarkDuplicates -I {1} -O /home/picard/mark_duplicates/{1/.}.markdups.bam -M /home/picard/mark_duplicates_txt/{1/.}.markdups.txt' ::: /home/picard/mate_information/*bam
exit
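To find out whether the missing jobs are crashing (for example, being killed by the kernel) rather than exiting cleanly, GNU parallel can record one line per job with `--joblog`, including each job's exit value and the signal that killed it. A minimal sketch of reading such a log; the `joblog.txt` contents below are made up for illustration, not taken from this run:

```shell
# Hypothetical joblog contents, tab-separated in the layout GNU parallel
# writes with --joblog: Seq Host Starttime JobRuntime Send Receive Exitval Signal Command.
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > joblog.txt
printf '1\t:\t1650200000\t3100.2\t0\t0\t0\t0\tjava ... sampleA.bam\n' >> joblog.txt
printf '2\t:\t1650200001\t812.5\t0\t0\t137\t9\tjava ... sampleB.bam\n'  >> joblog.txt

# List jobs whose exit value or signal is nonzero (exit 137 / signal 9
# typically indicates the process was SIGKILLed, e.g. by the OOM killer).
awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) {print $1, $7, $8}' joblog.txt
# prints: 2 137 9
```

Adding `--joblog joblog.txt` (and, on a rerun, `--resume-failed`) to the parallel invocation above should also let the failed jobs be retried without redoing the samples that already completed.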

Even though 55 BAM files were generated, I only got 24 metrics TXT files.
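Since only 24 of the 55 expected metrics files exist, a quick way to list exactly which samples never completed is to diff the input basenames against the produced metrics files. A sketch using throwaway demo directories; on the real system, point `indir` and `outdir` at `/home/picard/mate_information` and `/home/picard/mark_duplicates_txt` instead:

```shell
# Demo setup: two input BAMs, but a metrics file for only one of them.
# Replace these mktemp dirs with the real input/output directories.
indir=$(mktemp -d); outdir=$(mktemp -d)
touch "$indir/sampleA.bam" "$indir/sampleB.bam"
touch "$outdir/sampleA.markdups.txt"

# Sorted basenames of inputs and of completed metrics files.
(cd "$indir"  && printf '%s\n' *.bam)          | sed 's/\.bam$//'           | sort > bams.txt
(cd "$outdir" && printf '%s\n' *.markdups.txt) | sed 's/\.markdups\.txt$//' | sort > done.txt

# Samples present in the input list but missing a metrics file,
# i.e. the jobs that never finished.
comm -23 bams.txt done.txt
# prints: sampleB
```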

What I get from the script output:

[Sun Apr 17 13:44:49 GMT-03:00 2022] MarkDuplicates --INPUT /home/picard/mate_information/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.bam --OUTPUT /home/picard/mark_duplicates/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.bam --METRICS_FILE /home/picard/mar_duplicates_txt/96_FRAS202421986-1a_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.txt --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Apr 17 13:44:49 GMT-03:00 2022] Executing as gabriel.gama@tcg on Linux 5.13.13-arch1-1 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.0
INFO	2022-04-17 13:44:49	MarkDuplicates	Start of doWork freeMemory: 2036971400; totalMemory: 2058354688; maxMemory: 28631367680
INFO	2022-04-17 13:44:49	MarkDuplicates	Reading input file and constructing read end information.
INFO	2022-04-17 13:44:49	MarkDuplicates	Will retain up to 103736839 data points before spilling to disk.
INFO	2022-04-17 13:44:56	MarkDuplicates	Read     1,000,000 records.  Elapsed time: 00:00:06s.  Time for last 1,000,000:    6s.  Last read position: 1:12,013,314

and ends with:

INFO	2022-04-17 14:36:41	MarkDuplicates	Traversing fragment information and detecting duplicates.
INFO	2022-04-17 14:36:45	MarkDuplicates	Sorting list of duplicate records.
INFO	2022-04-17 14:36:50	MarkDuplicates	After generateDuplicateIndexes freeMemory: 24194363144; totalMemory: 31524388864; maxMemory: 31524388864
INFO	2022-04-17 14:36:50	MarkDuplicates	Marking 72796758 records as duplicates.
INFO	2022-04-17 14:36:50	MarkDuplicates	Found 362973 optical duplicate clusters.
INFO	2022-04-17 14:36:50	MarkDuplicates	Reads are assumed to be ordered by: coordinate
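One thing worth checking against these logs: each JVM reports a maxMemory of 28631367680 bytes (about 28.6 GB), and the run uses `-j 15`. That heap ceiling is only a limit, not a guarantee of use, but if several jobs approach it at once the worst case far exceeds the machine's 256 GB of physical RAM, in which case the kernel OOM killer would terminate JVMs without any Java-side error, consistent with logs that simply stop. A back-of-the-envelope check:

```shell
per_jvm=28631367680     # maxMemory reported by each MarkDuplicates JVM, in bytes
jobs=15                 # concurrency from parallel -j 15
total_ram=256000000000  # 256 GB of RAM, as described above

worst_case=$((per_jvm * jobs))
echo "$worst_case"      # 429470515200 bytes, roughly 429 GB
[ "$worst_case" -gt "$total_ram" ] && echo "worst case exceeds physical RAM"
```

If this is the cause, capping each heap explicitly (e.g. `java -Xmx12g ...`) or lowering `-j` so that jobs times heap stays under the available RAM would be the usual mitigation.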

One sample that did work:

[Sun Apr 17 14:16:14 GMT-03:00 2022] MarkDuplicates --INPUT /home/picard/mate_information/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.bam --OUTPUT /home/picard/mark_duplicates/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.bam --METRICS_FILE /home/picard/mark_duplicates_txt/9_FRAS202372575-2r_1.fqAligned.sortedByCoord.out.addOrReplace.fixedmate.markdups.txt --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Sun Apr 17 14:16:14 GMT-03:00 2022] Executing as gabriel.gama@tcg on Linux 5.13.13-arch1-1 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.0
INFO	2022-04-17 14:16:14	MarkDuplicates	Start of doWork freeMemory: 2036970664; totalMemory: 2058354688; maxMemory: 28631367680
INFO	2022-04-17 14:16:14	MarkDuplicates	Reading input file and constructing read end information.
INFO	2022-04-17 14:16:14	MarkDuplicates	Will retain up to 103736839 data points before spilling to disk.
INFO	2022-04-17 14:16:21	MarkDuplicates	Read     1,000,000 records.  Elapsed time: 00:00:06s.  Time for last 1,000,000:    6s.  Last read position: 1:25,484,661
INFO	2022-04-17 14:16:21	MarkDuplicates	Tracking 37883 as yet unmatched pairs. 37875 records in RAM.
INFO	2022-04-17 14:16:26	MarkDuplicates	Read     2,000,000 records.  Elapsed time: 00:00:11s.  Time for last 1,000,000:    5s.  Last read position: 1:66,121,764
INFO	2022-04-17 14:16:26	MarkDuplicates	Tracking 410 as yet unmatched pairs. 299 records in RAM.
INFO	2022-04-17 14:16:31	MarkDuplicates	Read     3,000,000 records.  Elapsed time: 00:00:16s.  Time for last 1,000,000:    4s.  Last read position: 1:116,625,993
INFO	2022-04-17 14:16:31	MarkDuplicates	Tracking 138 as yet unmatched pairs. 18 records in RAM.
INFO	2022-04-17 14:16:36	MarkDuplicates	Read     4,000,000 records.  Elapsed time: 00:00:21s.  Time for last 1,000,000:    5s.  Last read position: 1:161,191,238

It finishes with:

INFO	2022-04-17 15:01:54	MarkDuplicates	Writing complete. Closing input iterator.
INFO	2022-04-17 15:01:54	MarkDuplicates	Duplicate Index cleanup.
INFO	2022-04-17 15:01:54	MarkDuplicates	Getting Memory Stats.
INFO	2022-04-17 15:01:55	MarkDuplicates	Before output close freeMemory: 27596498792; totalMemory: 27786215424; maxMemory: 28631367680
INFO	2022-04-17 15:01:55	MarkDuplicates	Closed outputs. Getting more Memory Stats.
INFO	2022-04-17 15:01:55	MarkDuplicates	After output close freeMemory: 27830331240; totalMemory: 28020047872; 