nramme/easy-phylogenomics

easy-phylogenomics

This repository contains scripts that facilitate the manipulation of large amounts of data for phylogenomic analysis.

Requirements

I) split_fasta_into_folders.py

    Split a FASTA file into folders, each containing the sequences that share the same name.

    OPTIONS:
      -f|--fasta_file Input FASTA file to split.
      -d|--output_directory (Default: current directory)
split_fasta_into_folders.py \
	-f sequences.fasta \
	-d /path/to/dir/
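
For readers who want to see the idea behind this step, here is a minimal sketch of grouping FASTA records by name and writing one folder per name. It is not the code of split_fasta_into_folders.py. The helper names, the output file naming (`<name>/<name>.fasta`), and the rule that the "name" is the first whitespace-delimited token of the header are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta_into_folders(fasta_file, output_directory="."):
    """Write one folder per sequence name and return the created folders."""
    groups = defaultdict(list)
    for header, sequence in read_fasta(fasta_file):
        # Assumption: the sequence name is the first token of the header line.
        tokens = header.split()
        name = tokens[0] if tokens else "unnamed"
        # Avoid creating nested paths when a name contains a slash.
        name = name.replace("/", "_")
        groups[name].append((header, sequence))

    created = []
    for name, records in groups.items():
        folder = Path(output_directory) / name
        folder.mkdir(parents=True, exist_ok=True)
        # Hypothetical layout: <output_directory>/<name>/<name>.fasta
        with open(folder / f"{name}.fasta", "w") as out:
            for header, sequence in records:
                out.write(f">{header}\n{sequence}\n")
        created.append(folder)
    return created
```

Calling `split_fasta_into_folders("sequences.fasta", "/path/to/dir/")` would mirror the command above under these assumptions.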

II) blastp_with_diamond.py

    Perform BLASTP searches against multiple proteomes, using the split sequences inside the folders as queries.

    OPTIONS:
      -p|--proteomes_dir Directory containing proteome files in FASTA format.
      -e|--evalue E-value for BLAST search (Default: 1e-10)
      -t|--threads Number of CPU threads (Default: all available virtual cores in the machine)
blastp_with_diamond.py  \
	-p path/to/folders \
	-e 1e-05 \
	-t 10
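
As a rough illustration of what one such search might look like, the sketch below builds a `diamond blastp` call from Python. It is not taken from blastp_with_diamond.py. The function names and file paths are hypothetical, and it assumes a DIAMOND database was already built from each proteome (for example with `diamond makedb --in proteome.fasta --db proteome_db`). The defaults mirror the options listed above: an e-value of 1e-10 and all available cores.

```python
import os
import subprocess


def diamond_blastp_command(query_fasta, database, output_tsv,
                           evalue=1e-10, threads=None):
    """Build the argument list for a single DIAMOND blastp search."""
    if threads is None:
        # Default to every logical core the machine reports.
        threads = os.cpu_count() or 1
    return [
        "diamond", "blastp",
        "--query", str(query_fasta),
        "--db", str(database),
        "--out", str(output_tsv),
        "--evalue", str(evalue),
        "--threads", str(threads),
        "--outfmt", "6",  # BLAST-style tabular output
    ]


def run_diamond_blastp(query_fasta, database, output_tsv, **options):
    """Run the search; raises CalledProcessError if DIAMOND fails."""
    command = diamond_blastp_command(query_fasta, database, output_tsv, **options)
    subprocess.run(command, check=True)
```

Building the command separately from running it makes the arguments easy to inspect or log before launching many searches.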

III) extract_blast_hits.py

    Extract the hit sequences listed in the BLASTP output.

    OPTIONS:
      -p|--proteome_dir
extract_blast_hits.py \
	-p path/to/folders
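
The general technique can be sketched as follows. This is not the implementation in extract_blast_hits.py. It assumes the search results are in BLAST tabular format (outfmt 6), where the second column holds the subject sequence ID, and that this ID matches the first token of the header in the proteome FASTA. All function and file names are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hit_ids_from_table(table_path):
    """Collect unique subject IDs (column 2) in order of first appearance."""
    hit_ids, seen = [], set()
    with open(table_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            subject_id = fields[1]
            if subject_id not in seen:
                seen.add(subject_id)
                hit_ids.append(subject_id)
    return hit_ids


def extract_hits(table_path, proteome_fasta, output_fasta):
    """Write every proteome sequence that appears as a hit; return the count."""
    wanted = set(hit_ids_from_table(table_path))
    written = 0
    with open(output_fasta, "w") as out:
        for header, sequence in read_fasta(proteome_fasta):
            tokens = header.split()
            # Assumption: the hit ID equals the first token of the FASTA header.
            if tokens and tokens[0] in wanted:
                out.write(f">{header}\n{sequence}\n")
                written += 1
    return written
```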

IV) run_mafft.py

    Align amino acid or nucleotide sequences with MAFFT, a multiple sequence alignment program.

run_mafft.py
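
No options are listed for this script, so the sketch below only shows one plausible way to call MAFFT from Python, not what run_mafft.py actually does. It assumes `mafft` is on the PATH and uses its `--auto` mode, which lets MAFFT pick an alignment strategy. MAFFT prints the alignment to standard output, so the sketch redirects that to a file. The function names and file paths are hypothetical.

```python
import subprocess


def mafft_command(input_fasta, auto=True):
    """Build the argument list for a MAFFT run."""
    command = ["mafft"]
    if auto:
        command.append("--auto")
    command.append(str(input_fasta))
    return command


def run_mafft(input_fasta, output_fasta):
    """Align input_fasta and write the alignment to output_fasta."""
    with open(output_fasta, "w") as out:
        # MAFFT writes the alignment to stdout; progress messages go to stderr.
        subprocess.run(mafft_command(input_fasta), stdout=out, check=True)
```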

Co-author


Rafael Eberhardt
