Pangebin is a Nextflow pipeline that runs plasmid binning software (for example PlasBin-Flow or Gplas) on multiple assembly graphs at once, exploiting the capabilities of pangenome graphs.
The Pangebin pipeline takes as input two assembly graphs of a bacterial sample, built with Skesa and Unicycler, and builds an augmented pangenome that can then be used as input for PlasBin-Flow or Gplas.
Software required to run the pipeline:
- nextflow
- htslib
- R/Rscript
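
A quick way to confirm that the tools listed above are installed is to look for their executables on the PATH. The sketch below is only an illustration: the helper name missing_tools is hypothetical, and bgzip is used as a stand-in for detecting htslib (it ships with htslib, but whether the pipeline itself calls it is an assumption).

```shell
#!/usr/bin/env bash
# Print the name of every command in "$@" that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of Pangebin.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# bgzip stands in for htslib here; this is an assumption.
missing=$(missing_tools nextflow bgzip Rscript)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found."
fi
```

If anything is reported as missing, install it before launching the pipeline.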
The input graphs should be named short.gfa.gz for Unicycler graphs and skesa.gfa.gz for Skesa graphs. Both should be placed in a folder named after the sample_id.
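
The naming convention above can be set up with a small helper. This is a minimal sketch, assuming both graphs are already gzip-compressed; the function name stage_sample and the example paths in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# stage_sample SAMPLE_ID UNICYCLER_GRAPH SKESA_GRAPH [DEST_ROOT]
# Copies the two gzip-compressed graphs into DEST_ROOT/SAMPLE_ID using the
# file names short.gfa.gz (Unicycler) and skesa.gfa.gz (Skesa).
# (stage_sample is a hypothetical helper, not part of Pangebin.)
stage_sample() {
    local sample_id=$1 unicycler=$2 skesa=$3 root=${4:-.}
    mkdir -p "$root/$sample_id"
    cp "$unicycler" "$root/$sample_id/short.gfa.gz"
    cp "$skesa" "$root/$sample_id/skesa.gfa.gz"
}

# Example call (hypothetical paths):
#   stage_sample sample01 unicycler_out/assembly.gfa.gz skesa_out/graph.gfa.gz
```

The resulting folder, named after the sample, is what gets passed to the pipeline.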
The input consists of:
- Unicycler and Skesa graphs in .gfa or .gfa.gz format
- a pangenome graph built with nf-core/pangenome (PGGB).
nextflow run main.nf -profile mamba --db folder_sample_id --out output
will run the pipeline, placing the output GFA in the output folder.
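
When several samples are available, the same command can be generated once per sample folder. The sketch below only prints the commands so they can be reviewed first; the function name pangebin_commands, the one-output-folder-per-sample layout, and the assumption that paths contain no spaces are all illustrative choices, not part of the pipeline.

```shell
#!/usr/bin/env bash
# pangebin_commands INPUT_ROOT OUT_ROOT
# Prints one "nextflow run" command for each sample folder directly under
# INPUT_ROOT, writing each sample's output to OUT_ROOT/<sample_id>.
# (pangebin_commands is a hypothetical helper; paths must not contain spaces.)
pangebin_commands() {
    local input_root=$1 out_root=$2 dir sample_id
    for dir in "$input_root"/*/; do
        [ -d "$dir" ] || continue
        sample_id=$(basename "$dir")
        printf 'nextflow run main.nf -profile mamba --db %s --out %s\n' \
            "${dir%/}" "$out_root/$sample_id"
    done
}

# Example (hypothetical folder names): review the commands, then pipe to sh.
#   pangebin_commands samples results
#   pangebin_commands samples results | sh
```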