easy-phylogenomics

This repository contains scripts that facilitate the manipulation of large amounts of data for phylogenomic analysis.

Requirements

I) split_fasta_into_folders.py

Split a FASTA file into folders, one per sequence name, so that sequences sharing the same name end up in the same folder.

split_fasta_into_folders.py \
	-f sequences.fasta \
	-d /path/to/dir/
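
The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the code of split_fasta_into_folders.py: it assumes the "name" of a sequence is the first whitespace-delimited token of its header, and the function names `read_fasta` and `split_by_name` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_name(fasta_text, out_dir):
    """Create one folder per sequence name and write its records there."""
    groups = defaultdict(list)
    for header, seq in read_fasta(fasta_text):
        tokens = header.split()
        # Assumption: the name is the first token of the header line.
        # Names containing path separators are not handled in this sketch.
        name = tokens[0] if tokens else "unnamed"
        groups[name].append((header, seq))

    out_dir = Path(out_dir)
    for name, records in groups.items():
        folder = out_dir / name
        folder.mkdir(parents=True, exist_ok=True)
        with open(folder / f"{name}.fasta", "w") as handle:
            for header, seq in records:
                handle.write(f">{header}\n{seq}\n")
    return sorted(groups)
```

Calling `split_by_name(open("sequences.fasta").read(), "/path/to/dir/")` would create one subfolder per name under the target directory.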

II) blastp_with_diamond.py

Run BLASTP searches with DIAMOND against multiple proteomes, using the split sequences inside the folders as queries.

blastp_with_diamond.py  \
	-p path/to/folders \
	-e 1e-05 \
	-t 10
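
For orientation, a single DIAMOND search could be assembled from Python roughly as follows. This is a sketch under assumptions, not the script's actual code: it assumes `-e` maps to DIAMOND's `--evalue` and `-t` to `--threads`, that a DIAMOND database was already built with `diamond makedb`, and the function name and file paths are hypothetical.

```python
import subprocess


def diamond_blastp_cmd(query, db, out, evalue=1e-5, threads=10):
    """Build the argument list for one `diamond blastp` search."""
    return [
        "diamond", "blastp",
        "--query", str(query),     # FASTA file with the query sequences
        "--db", str(db),           # database created with `diamond makedb`
        "--out", str(out),         # where the hits are written
        "--evalue", str(evalue),   # maximum e-value to report
        "--threads", str(threads),
        "--outfmt", "6",           # BLAST tabular output
    ]


def run_search(query, db, out, **kwargs):
    """Run the search; raises CalledProcessError if DIAMOND fails."""
    subprocess.run(diamond_blastp_cmd(query, db, out, **kwargs), check=True)
```

Building the argument list separately from running it makes the command easy to log or inspect before launching many searches in a loop.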

III) extract_blast_hits.py

Extract the hit sequences listed in the BLASTP output.

-p|--proteome_dir

extract_blast_hits.py \
	-p path/to/folders
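
The idea behind this step can be illustrated with a short Python sketch. It is not the code of extract_blast_hits.py: it assumes the search output is in BLAST tabular format (subject ID in the second column) and that FASTA records are identified by the first token of their header. Both function names are hypothetical.

```python
def hit_ids(tabular_text):
    """Return the unique subject IDs of a BLAST tabular file, in order."""
    ids, seen = [], set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        subject = fields[1]  # column 2 holds the subject (hit) identifier
        if subject not in seen:
            seen.add(subject)
            ids.append(subject)
    return ids


def extract_records(fasta_text, wanted):
    """Return the FASTA records whose identifier is in `wanted`."""
    wanted = set(wanted)
    kept, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Assumption: the record ID is the first token of the header.
            keep = bool(tokens) and tokens[0] in wanted
        if keep and line.strip():
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

Passing the result of `hit_ids` for a search output to `extract_records` together with the corresponding proteome text gives the hit sequences as FASTA text.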

IV) run_mafft.py

Multiple alignment program for amino acid or nucleotide sequences.

Reference: Kazutaka Katoh, Daron M. Standley, MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability, Molecular Biology and Evolution, Volume 30, Issue 4, April 2013, Pages 772–780, https://doi.org/10.1093/molbev/mst010

run_mafft.py
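
Since no options are listed for this script, the following is only a sketch of how a single MAFFT alignment might be launched from Python. It assumes `mafft` is on the PATH and uses its `--auto` and `--thread` options; the function names and file paths are hypothetical and are not taken from run_mafft.py.

```python
import subprocess


def mafft_cmd(input_fasta, threads=1):
    """Build the argument list for aligning one FASTA file with MAFFT."""
    return ["mafft", "--auto", "--thread", str(threads), str(input_fasta)]


def align_with_mafft(input_fasta, output_path, threads=1):
    """Run MAFFT and save the alignment it prints to standard output."""
    with open(output_path, "w") as out:
        subprocess.run(mafft_cmd(input_fasta, threads), stdout=out, check=True)
```

MAFFT writes the alignment to standard output, which is why the sketch redirects it to a file instead of passing an output path as an argument.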