Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'cpgplot' input sequence not recognized #47

Open
piatekmj opened this issue Feb 24, 2017 · 9 comments
Open

'cpgplot' input sequence not recognized #47

piatekmj opened this issue Feb 24, 2017 · 9 comments

Comments

@piatekmj
Copy link

Session info:
perl = v5.22.0
Mac OS = 10.11.6
BioPerl-Run-1.007001

From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot as a variable, I do get an error message saying:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.

Also occasionally I do get error message saying:
Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309

The above only applies for some cases not for all, I only get those messages for some sequences.

When I, however provide the input sequence with asis prefix like so: sequence => "asis:$sequence" program completes with output.

As long as we are on cpgplot, I would like to also mention that with plot => 'Y' provided, I do get another error message for sequences for which CpG islands have been identified:

*** PLPLOT ERROR ***
Unable to either (1) open/find or (2) allocate memory for the font file
Program aborted

This can be turned off by providing plot => 'N' if one is not interested in graphical output.
I do have plplot installed with my package manager.

Any assistance would be appreciated.

@bosborne
Copy link
Member

bosborne commented Feb 25, 2017 via email

@piatekmj
Copy link
Author

Hi Brian,

The code I'm testing is as follows:

use Bio::Factory::EMBOSS;
use Bio::SeqIO;

my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(    
    -file   => $genome,          
    -format => 'fasta',
    );

my %assembly;

while (my $seq = $inseq->next_seq) {
    $assembly{$seq->id} = $seq->seq;
}

my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');

foreach my $frag (keys %assembly){

    print "Processing $frag\n";
    my %input = (
	-sequence => "asis:$assembly{$frag}",
	-outfile => "${frag}.cpgplot",
	-window => 100,
	-minlen => 200,
	-minoe => 0.6,
	-minpc => 50,
	-plot => 'N',
    );

    print "$frag\tCheck 1\t";
    $prog->run(\%input);
    print "Check 2\n";

}

The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.

I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.

With -sequence => $assembly{$frag}, I get some failures reported:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
or
Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2

With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.

The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 or webserver2 (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)

Could there be something wrong with the installation maybe?

Thanks,
Marek

Additional session info:
print $factory->version(); # prints 6.5.7.0
print $factory->location(); # prints local

@bosborne
Copy link
Member

bosborne commented Feb 27, 2017 via email

@bosborne
Copy link
Member

bosborne commented Feb 27, 2017 via email

@piatekmj
Copy link
Author

piatekmj commented Feb 27, 2017

They are not very long. See the lengths of the sequences below:
1223259 unitig_9
1087134 unitig_0
934178 unitig_1
660843 unitig_2
453028 unitig_5
312397 unitig_6
114250 unitig_11
80635 unitig_3
66509 unitig_7
40100 unitig_4
6932 unitig_8
2384 unitig_10

Interestingly (or not), I only get reported results from 'unitig_11' down. Is there a sequence length limits imposed on cpgplot and other EMBOSS scripts?

@piatekmj
Copy link
Author

Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.

while (my $seq = $inseq->next_seq) {
    $assembly{$seq->id} = $seq;
}

Then later on using -sequence => $assembly{$frag} produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'.

It seems the major issue is now solved, leaving me wonder why was the message so cryptic?
If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.

@bosborne
Copy link
Member

bosborne commented Feb 27, 2017 via email

@piatekmj
Copy link
Author

pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.

@bosborne
Copy link
Member

bosborne commented Feb 27, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants