How to Maximize 5hmC Calling Efficiency #326

Open
andrew-galbraith opened this issue Oct 18, 2022 · 2 comments

Comments

@andrew-galbraith

Hello, I am attempting to run Megalodon for 5hmC calling on hundreds of cancer nanopore samples, and so far I have gotten a couple of runs to work. However, running Megalodon on the whole genome has taken a week per run. What parameters would be ideal for improving Megalodon's efficiency? Here are the run parameters I am currently using:

#SBATCH --mem-per-cpu=64gb
#SBATCH --gres=gpu:2

megalodon <path_to_fast5s_folder> --guppy-server-path <path_to_guppy_6.38_server> --guppy-config dna_r9.4.1_450bps_sup_prom.cfg --reference <path_to_reference> --remora-modified-bases dna_r9.4.1_e8 sup 0.0.0 5hmc_5mc CG 0 --device 0 1 --outputs per_read_mods basecalls --chunk-size 500 --max-concurrent-chunks 100

Currently, I have tried using 1-10 fast5 files per run, separating each set of fast5 files into its own folder, to cover the whole genome. I am doing these runs on a local GPU cluster with fairly limited resources. Please let me know if you see any flaws in this approach and what I could change to improve efficiency. I know there are fast Remora models, but ideally I don't want to compromise on accuracy. Let me know your thoughts, thanks!
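
For readers who want to reproduce the folder-per-batch layout described above, here is a minimal Python sketch. It is not taken from this thread: the function name `batch_fast5_files`, the `batch_0000`-style folder names, and the default batch size are all hypothetical choices made for illustration, and the use of symlinks instead of copies is an assumption intended to avoid duplicating raw signal data.

```python
from pathlib import Path


def batch_fast5_files(src_dir, dest_root, batch_size=10):
    """Symlink the fast5 files in src_dir into numbered batch folders.

    Hypothetical helper: returns the list of batch directories it used.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    src = Path(src_dir).resolve()
    dest = Path(dest_root)
    # Sort so that batch membership is reproducible between runs.
    files = sorted(src.glob("*.fast5"))
    batch_dirs = []
    for start in range(0, len(files), batch_size):
        batch_dir = dest / f"batch_{start // batch_size:04d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for fast5 in files[start:start + batch_size]:
            link = batch_dir / fast5.name
            # Skip links left over from an earlier invocation.
            if not link.is_symlink() and not link.exists():
                # Symlink rather than copy to avoid duplicating raw data.
                link.symlink_to(fast5)
        batch_dirs.append(batch_dir)
    return batch_dirs
```

Each returned folder could then be passed as the fast5 directory of a separate Megalodon run, for example one cluster job per folder. Each run would presumably also need its own output directory so that results from different batches do not collide.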

@marcus1487
Collaborator

I would recommend using Guppy or Dorado for modified base calling going forward. Megalodon is no longer being supported, and you are likely to get much better performance from the production basecallers, where the Remora models have been integrated and optimized. If anything is missing from the outputs of the production basecallers in terms of modified base support, please raise those issues with those projects.

@andrew-galbraith
Author

Hello Marcus,

Sounds good! Thank you. We had tried Guppy but got subpar results; we'll try again with the newest version.
