easy-phylogenomics

This repository contains scripts that facilitate the manipulation of large amounts of data for phylogenomic analysis.

Requirements

I) split_fasta_into_folders.py

    Split a FASTA file into folders, grouping sequences that share the same name into the same folder.

    OPTIONS:
      -f|--fasta_file Input FASTA file.
      -d|--output_directory (Default: current directory)
split_fasta_into_folders.py  \
	-f sequences.fasta \
	-d /path/to/dir/
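
The splitting step can be sketched roughly as follows. This is only an illustration, not the script's actual code: it assumes that the "name" of a sequence is the first word of its FASTA header and that each folder receives a single file called `<name>.fasta`. The function names `parse_fasta` and `split_fasta_into_folders` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta_into_folders(fasta_text, output_dir="."):
    """Create one folder per sequence name and write its records there."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(fasta_text):
        # Assumption: the sequence name is the first word of the header.
        name = header.split()[0] if header.split() else "unnamed"
        groups[name].append((header, seq))

    written = {}
    for name, records in groups.items():
        folder = Path(output_dir) / name
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{name}.fasta"  # hypothetical file naming
        target.write_text("".join(f">{h}\n{s}\n" for h, s in records))
        written[name] = target
    return written
```

Names containing path separators would need sanitizing before being used as folder names; the sketch leaves that out.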

II) blastp_with_diamond.py

    Run BLASTP searches against multiple proteomes, using the split sequences inside the folders as queries.

    OPTIONS:
      -p|--proteomes_dir Directory containing proteome files in FASTA format.
      -e|--evalue E-value for BLAST search (Default: 1e-10)
      -t|--threads Number of CPU threads (Default: all virtual cores available on the machine)
blastp_with_diamond.py  \
	-p path/to/folders \
	-e 1e-05 \
	-t 10
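
For orientation, a search like the one above could be assembled from two DIAMOND calls: `diamond makedb` to index a proteome and `diamond blastp` to query it. The sketch below is an assumption about how such a wrapper might look, not the script's real implementation. The helper names, the output file naming, and the choice of tabular output (`--outfmt 6`) are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def diamond_commands(proteome, query, out_dir, evalue=1e-10, threads=None):
    """Build the makedb and blastp command lines for one proteome/query pair."""
    # Default to all cores reported by the OS, mirroring the documented default.
    threads = threads or os.cpu_count() or 1
    proteome, query, out_dir = Path(proteome), Path(query), Path(out_dir)
    db = out_dir / proteome.stem
    hits = out_dir / f"{query.stem}_vs_{proteome.stem}.tsv"  # hypothetical naming

    makedb = ["diamond", "makedb", "--in", str(proteome), "--db", str(db)]
    blastp = [
        "diamond", "blastp",
        "--db", str(db),
        "--query", str(query),
        "--out", str(hits),
        "--outfmt", "6",  # assumption: BLAST-style tabular output
        "--evalue", str(evalue),
        "--threads", str(threads),
    ]
    return makedb, blastp


def run_diamond(proteome, query, out_dir, evalue=1e-10, threads=None):
    """Run both commands in order; requires `diamond` on the PATH."""
    for cmd in diamond_commands(proteome, query, out_dir, evalue, threads):
        subprocess.run(cmd, check=True)
```

Building the commands as lists, rather than as shell strings, avoids quoting problems when paths contain spaces.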

III) extract_blast_hits.py

    Extract the hit sequences from the BLASTP output.

    OPTIONS:
      -p|--proteome_dir
extract_blast_hits.py  \
	-p path/to/folders
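
One way to picture the extraction step: read the subject IDs from the search output, then pull the matching records out of the proteome FASTA. The sketch below assumes tab-separated BLAST-style output in which the second column is the subject ID, and it matches that ID against the first word of each FASTA header. Both assumptions, and the function names, are illustrative rather than taken from the script.

```python
def read_hit_ids(blast_tsv_text):
    """Return the unique subject IDs (column 2) in order of first appearance."""
    ids, seen = [], set()
    for line in blast_tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        subject = fields[1]
        if subject not in seen:
            seen.add(subject)
            ids.append(subject)
    return ids


def extract_hits(blast_tsv_text, proteome_fasta_text):
    """Return FASTA text containing only the proteome records that were hit."""
    wanted = set(read_hit_ids(blast_tsv_text))
    kept, keep = [], False
    for line in proteome_fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the record ID is the first word of the header.
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in wanted
        if keep and line.strip():
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```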

IV) run_mafft.py

    Run MAFFT, a multiple alignment program for amino acid or nucleotide sequences.

run_mafft.py
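
As a rough illustration of what wrapping MAFFT from Python can look like (the script's own options are not documented here, so everything below is an assumption): MAFFT writes the alignment to standard output, so the wrapper redirects it to a file. The helper names and the use of `--auto` are hypothetical choices.

```python
import subprocess


def mafft_command(input_fasta, threads=1):
    """Build a MAFFT command line; --auto lets MAFFT pick a strategy."""
    return ["mafft", "--auto", "--thread", str(threads), str(input_fasta)]


def run_mafft(input_fasta, output_fasta, threads=1):
    """Align input_fasta and save the result; requires `mafft` on the PATH."""
    with open(output_fasta, "w") as handle:
        # MAFFT prints the alignment to stdout, so capture it in the file.
        subprocess.run(mafft_command(input_fasta, threads), stdout=handle, check=True)
```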

Co-author


Rafael Eberhardt