WDLArrayBindingsJob can use a lot of memory #5085

Open
adamnovak opened this issue Sep 11, 2024 · 0 comments

I had a toil-wdl-runner leader Slurm job (together with its local workers) use over 8G of memory, and Slurm OOM-killed a child process of the leader that was running a WDLArrayBindingsJob.

The workflow was launched with something like:

```bash
# Write one inputs JSON per sample for the Giraffe workflow.
for SAMPLE_NAME in HG002.m84005_220827_014912_s1 HG002.m84005_220919_232112_s2 HG002.m84011_220902_175841_s1 HG003.m84010_220919_235306_s2 HG004.m84010_220919_232145_s1 NA12878.palladium ; do
cat >inputs-training-${SAMPLE_NAME}.json <<EOF
{
    "Giraffe.INPUT_READ_FILE_1": "https://storage.googleapis.com/brain-genomics/awcarroll/share/ucsc/pacbio_fastq/${SAMPLE_NAME}.fastq.gz",
    "Giraffe.SAMPLE_NAME": "${SAMPLE_NAME}",
    "Giraffe.PAIRED_READS": false,
    "Giraffe.HAPLOTYPE_SAMPLING": false,
    "Giraffe.GBZ_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.gbz",
    "Giraffe.MIN_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.k31.w50.W.withzip.min",
    "Giraffe.ZIPCODES_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.k31.w50.W.zipcodes",
    "Giraffe.DIST_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.dist",
    "Giraffe.VG_DOCKER": "quay.io/adamnovak/vg:beec239",
    "Giraffe.READS_PER_CHUNK": 150000,
    "Giraffe.GIRAFFE_PRESET": "hifi",
    "Giraffe.PRUNE_LOW_COMPLEXITY": true,
    "Giraffe.LEFTALIGN_BAM": true,
    "Giraffe.REALIGN_INDELS": false,
    "Giraffe.OUTPUT_SINGLE_BAM": true,
    "Giraffe.REFERENCE_PREFIX": "GRCh38#0#",
    "Giraffe.REFERENCE_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna",
    "Giraffe.CONTIGS": ["GRCh38#0#chr1", "GRCh38#0#chr2", "GRCh38#0#chr3", "GRCh38#0#chr4", "GRCh38#0#chr5", "GRCh38#0#chr6", "GRCh38#0#chr7", "GRCh38#0#chr8", "GRCh38#0#chr9", "GRCh38#0#chr10", "GRCh38#0#chr11", "GRCh38#0#chr12", "GRCh38#0#chr13", "GRCh38#0#chr14", "GRCh38#0#chr15", "GRCh38#0#chr16", "GRCh38#0#chr17", "GRCh38#0#chr18", "GRCh38#0#chr19", "GRCh38#0#chr20", "GRCh38#0#chr21", "GRCh38#0#chr22", "GRCh38#0#chrX", "GRCh38#0#chrY"]
}
EOF
done

mkdir -p ./output/training

# Submit the toil-wdl-runner leader for one sample as a 2-core, 8G Slurm job.
SAMPLE_NAME=HG002.m84005_220827_014912_s1
sbatch -c2 --mem 8G --partition long --time 7-00:00:00 --wrap "toil-wdl-runner https://raw.githubusercontent.com/vgteam/vg_wdl/9b3e4016b16d657a0a7c73e01e1b4c4410f5593e/workflows/giraffe.wdl ./inputs-training-${SAMPLE_NAME}.json --wdlOutputDirectory ./output/training/${SAMPLE_NAME} --wdlOutputFile ./output/training/${SAMPLE_NAME}.json --logFile ./output/training/${SAMPLE_NAME}.log --writeLogs ./output/training/log-${SAMPLE_NAME} --jobStore ./output/training/tree-${SAMPLE_NAME} --batchSystem slurm --slurmTime 11:59:59 --disableProgress --caching=False"
```

We should make sure the WDL interpreter isn't somehow passing very large Python objects around. Where possible, we might want to use MiniWDL's own JSON serialization instead.
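One way to start looking for oversized objects is to measure how big each job's bindings are when serialized. The sketch below is hypothetical: `Binding`, `plain_json`, `payload_sizes`, and `check_payload` are illustrative names, not Toil or MiniWDL APIs, and the 1 MiB threshold is an arbitrary assumption. It compares the pickled size of a bindings dict with the size of a plain-JSON rendering of the same values; in Toil itself, MiniWDL's `WDL.values_to_json` would presumably stand in for `plain_json`.

```python
import json
import pickle
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Binding:
    """Hypothetical stand-in for an interpreter value; not Toil's or MiniWDL's class."""
    type_name: str
    value: Any


def plain_json(bindings: Dict[str, Binding]) -> Dict[str, Any]:
    """Reduce bindings to JSON-compatible data (names mapped to raw values)."""
    return {name: b.value for name, b in bindings.items()}


def payload_sizes(bindings: Dict[str, Binding]) -> Dict[str, int]:
    """Return the byte size of the pickled objects and of their plain-JSON form."""
    return {
        "pickle": len(pickle.dumps(bindings)),
        "json": len(json.dumps(plain_json(bindings)).encode("utf-8")),
    }


def check_payload(bindings: Dict[str, Binding],
                  limit_bytes: int = 1 << 20) -> Tuple[Dict[str, int], bool]:
    """Return the sizes and whether the pickled payload exceeds limit_bytes (assumed 1 MiB)."""
    sizes = payload_sizes(bindings)
    return sizes, sizes["pickle"] > limit_bytes


if __name__ == "__main__":
    # Mirror the Giraffe.CONTIGS input from the reproduction above.
    contigs = [f"GRCh38#0#chr{c}" for c in [*range(1, 23), "X", "Y"]]
    bindings = {"Giraffe.CONTIGS": Binding("Array[String]", contigs)}
    sizes, too_big = check_payload(bindings)
    print(sizes, "too big" if too_big else "ok")
```

Logging these two numbers per job would show whether what gets pickled between jobs carries much more than the WDL values themselves.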

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1641

adamnovak added the wdl label Sep 11, 2024