Custom Genome and Visualizations #1519

singhbhavya · 2024-05-23T20:01:34Z

Hi there,

I apologize for the naive question, but I've been struggling with this for a week and would appreciate some help. I created a custom gene index from a FASTA file of fusion genes (each header line beginning with ">" is one gene) and then aligned reads to these fusion genes. I'd like to visualize the FASTA as a "genome" in IGV, along with the read alignments to the genes. When I try to load the FASTA in IGV as a genome, I get an error saying that IGV cannot set the starting chromosome. When I try to import the BAM alignments, I get the error "Invalid BAM file header: missing sequence name in file". Could you please help me understand what I'm missing? Do I also need to provide an annotation for this custom "genome" that describes what each gene is?

jrobinso · 2024-05-23T20:54:51Z

You should be able to load the FASTA from the "Genome" menu; I don't understand the error you are getting. What version of IGV are you using?

By "import" BAM alignments, I assume you mean loading the BAM file from the "File" menu, correct? The error message indicates there is something wrong with your BAM file.
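One way to see what IGV is objecting to is to inspect the BAM header text, for example the output of `samtools view -H your.bam` (`your.bam` is a placeholder). The sketch below is a hypothetical helper, `sq_lines_missing_name`, that flags `@SQ` lines with no non-empty `SN:` tag. It covers only one possible cause of the "missing sequence name" error and is not a full header validator.

```python
def sq_lines_missing_name(header_text):
    """Return the @SQ header lines that lack a non-empty SN: tag."""
    bad = []
    for line in header_text.splitlines():
        # Only reference-sequence dictionary lines are checked
        if not line.startswith("@SQ"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type
        fields = line.split("\t")[1:]
        names = [f[3:] for f in fields if f.startswith("SN:")]
        if not names or not names[0]:
            bad.append(line)
    return bad


header = "@HD\tVN:1.6\n@SQ\tSN:geneA\tLN:1200\n@SQ\tLN:900\n"
print(sq_lines_missing_name(header))
```

In practice the header text would come from `samtools view -H` rather than a literal string.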

If you are able to share these files (FASTA, FASTA index, BAM, and BAM index), email us at [email protected] and I can send you a secure Dropbox link. But first, confirm that you are using a recent version of IGV.

singhbhavya · 2024-05-23T21:07:14Z

Hi, thank you so much for the response! The version I am using is 2.3.98.

Yes, correct, I am loading the BAM file from the "File" menu. Please let me know whether or not I can email you - thank you again!

jrobinso · 2024-05-23T21:28:28Z

Sorry, I can't provide any help for that version; it was released in 2017. You might try the latest version, 2.17.4. If you would like me to look at your files, please send email to the address noted above for a Dropbox link, or share them in some other way.

singhbhavya · 2024-05-24T15:29:07Z

Hi there, I updated to version 2.17.4 and received the same error. Sending you an email! Thank you so much!

singhbhavya · 2024-05-28T17:55:04Z

Hi there! I identified the problems and fixed them. In case anyone else goes through the same thing, here they are:

There were unexpected characters in the FASTA headers. I replaced those characters in the FASTA and re-aligned the FASTQs to the corrected genome.
Those same characters were also why the genome wasn't loading correctly into IGV, so cleaning the headers fixed that as well.
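To find the offending headers before editing anything, a minimal sketch like the following could help. It assumes the reference-name restrictions of the SAM format specification as summarized here: names are printable ASCII, must not start with `*` or `=`, and must not contain backslash, commas, quotation marks, or the brackets `()[]{}<>`. It also assumes the name is the text after ">" up to the first whitespace. `invalid_reference_names` is a hypothetical helper, not part of IGV or samtools.

```python
import re

# Characters assumed to be disallowed in reference names (per the SAM spec summary above)
FORBIDDEN = set('\\,"\'`()[]{}<>')


def invalid_reference_names(lines):
    """Return (name, reason) pairs for FASTA header lines whose names look invalid."""
    problems = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not checked
        header = line[1:].rstrip("\n")
        # Name = everything before the first whitespace (may be empty)
        name = re.match(r"\S*", header).group(0)
        if not name:
            problems.append((header, "empty name"))
        elif name[0] in "*=":
            problems.append((name, "starts with * or ="))
        elif any(c in FORBIDDEN or not ("!" <= c <= "~") for c in name):
            problems.append((name, "forbidden character"))
    return problems
```

Usage: `with open("Genomic_sequences_fromFusion_batch1.fasta") as fh: print(invalid_reference_names(fh))`.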

I used a combination of these two scripts:

Python script to replace every ">" in a header line except the leading one with "-":

def replace_gt_with_dash_except_first(filename):
    with open(filename, 'r') as file:
        lines = file.readlines()
    
    with open(filename, 'w') as file:
        for line in lines:
            if line.startswith('>'):
                # Replace '>' with '-' except the first instance
                parts = line.split('>')
                line = parts[0] + '>' + '-'.join(parts[1:])
            file.write(line)

filename = 'Genomic_sequences_fromFusion_batch1.fasta'
replace_gt_with_dash_except_first(filename)

Bash script to replace dashes and parentheses in header lines with colons:

#!/bin/bash

# Function to replace dashes and parentheses with colons in sequence names
replace_specific_chars() {
    local input_file="$1"
    local output_file="$2"

    # Only header lines (starting with ">") are edited; sequence lines are untouched
    sed -E '/^>/ s/[-()]/:/g' "$input_file" > "$output_file"
}

input_file="Genomic_sequences_fromFusion_batch1.fasta"
output_file="output.fasta"

replace_specific_chars "$input_file" "$output_file"
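Because the Python step turns extra ">" characters into "-" and the sed step then turns "-", "(" and ")" into ":", the net effect on header lines is that each of those characters after the leading ">" becomes ":". The following is a sketch of doing both steps in one pass, under that reading of the two scripts; `clean_header_line` and `clean_fasta` are hypothetical names. Unlike the Python script above, it writes to a new file instead of overwriting the input.

```python
# Net mapping of the two scripts: '>' (after the first), '-', '(' and ')' all become ':'
_TO_COLON = str.maketrans({">": ":", "-": ":", "(": ":", ")": ":"})


def clean_header_line(line):
    """Rewrite a FASTA header line; return sequence lines unchanged."""
    if line.startswith(">"):
        # Keep the leading '>' and translate the rest of the header
        return ">" + line[1:].translate(_TO_COLON)
    return line


def clean_fasta(in_path, out_path):
    """Stream in_path to out_path, cleaning only the header lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(clean_header_line(line))
```

For example, `clean_fasta("Genomic_sequences_fromFusion_batch1.fasta", "output.fasta")`. After the headers change, the FASTA index (e.g. `samtools faidx output.fasta`) and the alignments need to be regenerated so the BAM's sequence names match the FASTA.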

jrobinso · 2024-06-01T02:05:13Z

@singhbhavya Thanks for this, I'm sure it will be helpful. If you could post one of the offending FASTA header lines here, I will see if we can improve the parser to load it without modification. The main rule is that the sequence name is the string between the initial ">" and the first whitespace; we should be able to change the parser to ignore everything after that.
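A minimal sketch of the naming rule described above, assuming it is applied to each header line on its own; `sequence_name` is a hypothetical helper, not IGV's actual parser. Note that this rule alone still keeps characters such as "(" or ">" if they appear before the first whitespace.

```python
import re


def sequence_name(header_line):
    """Return the text between the leading '>' and the first whitespace."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # \S* stops at the first whitespace character (space, tab, newline)
    return re.match(r"\S*", header_line[1:]).group(0)
```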
