This repository contains scripts that facilitate the manipulation of large amounts of data for phylogenomic analysis.
I) split_fasta_into_folders.py
Split a FASTA file into folders, one folder per sequence name, each containing the sequences that share that name.
OPTIONS:-
-f|--fasta_file
-d|--output_directory (Default: current directory)
split_fasta_into_folders.py \
-f sequences.fasta \
-d /path/to/dir/
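The grouping step described above can be sketched as follows. This is only an illustration, not the script's actual code: it assumes that the "name" of a sequence is the first whitespace-delimited token of its header and that each group is written to <output_directory>/<name>/<name>.fasta. The function names and the output layout are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta_into_folders(fasta_file, output_directory="."):
    """Group records by name and write one folder per name."""
    groups = defaultdict(list)
    for header, sequence in read_fasta(fasta_file):
        tokens = header.split()
        # Assumption: the sequence name is the first token of the header.
        name = tokens[0] if tokens else "unnamed"
        # Avoid creating nested folders if the name contains a slash.
        name = name.replace("/", "_")
        groups[name].append((header, sequence))

    written = []
    for name, records in groups.items():
        folder = Path(output_directory) / name
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{name}.fasta"
        with open(target, "w") as out:
            for header, sequence in records:
                out.write(f">{header}\n{sequence}\n")
        written.append(target)
    return written
```

Sequences are written on a single line each, so any line wrapping in the input is not preserved in this sketch.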
II) blastp_with_diamond.py
Perform BLASTP searches against multiple proteomes with DIAMOND, using the split sequences inside the folders as queries.
OPTIONS:-
-p|--proteomes_dir Directory containing proteome files in FASTA format.
-e|--evalue E-value for BLAST search (Default: 1e-10)
-t|--threads Number of CPU threads (Default: all available virtual cores in the machine)
blastp_with_diamond.py \
-p path/to/folders \
-e 1e-05 \
-t 10
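For readers unfamiliar with DIAMOND, the sketch below shows how a single query/proteome pair might be searched. It is an assumption about the general approach, not the script's actual code: the function name, the file paths, and the choice of tabular output (format 6) are hypothetical. The DIAMOND options used (`makedb --in/--db`, `blastp --db/--query/--out/--outfmt/--evalue/--threads`) are standard ones.

```python
import os
import subprocess


def diamond_commands(proteome_fasta, query_fasta, out_tsv, evalue=1e-10, threads=None):
    """Build the two DIAMOND commands for one proteome and one query file."""
    # Default to all logical cores, mirroring the documented default of -t.
    threads = threads or os.cpu_count() or 1
    # Assumption: the database is stored next to the proteome, without extension.
    db = os.path.splitext(proteome_fasta)[0]
    makedb = ["diamond", "makedb", "--in", proteome_fasta, "--db", db]
    blastp = [
        "diamond", "blastp",
        "--db", db,
        "--query", query_fasta,
        "--out", out_tsv,
        "--outfmt", "6",  # tabular output, one hit per line
        "--evalue", str(evalue),
        "--threads", str(threads),
    ]
    return makedb, blastp


def run_search(proteome_fasta, query_fasta, out_tsv, evalue=1e-10, threads=None):
    """Run both commands; requires the diamond executable on PATH."""
    for command in diamond_commands(proteome_fasta, query_fasta, out_tsv, evalue, threads):
        subprocess.run(command, check=True)
```

Building the commands as lists, rather than as one shell string, avoids quoting problems with paths that contain spaces.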
III) extract_blast_hits.py
Extract the hit sequences from the BLASTP output.
OPTIONS:-
-p|--proteome_dir
extract_blast_hits.py \
-p path/to/folders
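The extraction step can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: it assumes the BLASTP results are in tab-separated tabular format with the subject (hit) identifier in the second column, and that this identifier matches the first token of a header in the proteome FASTA file. Both function names are hypothetical.

```python
def hit_ids(blast_lines):
    """Return the unique subject IDs (second column) in order of appearance."""
    ids, seen = [], set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        subject = fields[1]
        if subject not in seen:
            seen.add(subject)
            ids.append(subject)
    return ids


def extract_hits(fasta_lines, wanted):
    """Return FASTA text for the records whose ID is in `wanted`."""
    keep = False
    kept = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # A record is kept when the first header token is a hit ID.
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            kept.append(line.rstrip("\n"))
    return "\n".join(kept) + ("\n" if kept else "")
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `extract_hits(open("proteome.fasta"), set(ids))`.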
IV) run_mafft.py
Run MAFFT, a multiple alignment program for amino acid or nucleotide sequences.
run_mafft.py
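As a rough sketch of what a MAFFT wrapper might look like, the example below aligns one FASTA file. It is not the script's actual code: the function names, the use of `--auto`, and the file paths are assumptions. MAFFT writes the alignment to standard output, so the wrapper redirects it to a file.

```python
import subprocess


def mafft_command(input_fasta, threads=1):
    """Build a MAFFT command that lets MAFFT pick a strategy (--auto)."""
    return ["mafft", "--auto", "--thread", str(threads), input_fasta]


def run_mafft(input_fasta, output_fasta, threads=1):
    """Align input_fasta and save the result; requires mafft on PATH."""
    with open(output_fasta, "w") as out:
        subprocess.run(mafft_command(input_fasta, threads), stdout=out, check=True)
```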
Rafael Eberhardt