[Bioperl-l] Problems with Bio::DB::Fasta

Florent Angly florent.angly at gmail.com
Fri May 27 23:33:36 UTC 2011



On 28/05/11 05:07, Justin Chu wrote:
> Thanks for your reply. I think something is wrong with my installation 
> because I keep getting an error when running your script. I had 
> already tried reinstalling with the version on CPAN to make sure my 
> problem is not due to missing dependencies, but I still get the 
> following error:
>
> Can't locate Test/Exception.pm in @INC (@INC contains: t/lib 
> /home/justin/workspace/.metadata/.plugins/org.epic.debug 
> /home/justin/workspace/LocalTools/Testing /etc/perl 
> /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 
> /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 
> /usr/local/lib/site_perl .) at (eval 46) line 2.
> BEGIN failed--compilation aborted at (eval 46) line 2.
>
> BEGIN failed--compilation aborted at 
> /usr/local/share/perl/5.10.1/Bio/Root/Test.pm line 152.
> Compilation failed in require at 
> /home/justin/workspace/LocalTools/Testing/test.pl line 6.
> BEGIN failed--compilation aborted at 
> /home/justin/workspace/LocalTools/Testing/test.pl line 6.

Hi Justin,
Install the Test::Exception module this way (for Debian-like systems): 
sudo apt-get install libtest-exception-perl
Once it is installed, you should get error messages about the blank 
lines of your FASTA file when running the script. If you don't get 
errors for the blank lines, and the script continues happily, then that 
is very likely the reason why you get the wrong subsequences.
Florent
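
As a quick way to check a FASTA file for the blank lines discussed above, here is a minimal sketch in plain Perl. It does not use BioPerl, and blank_lines() is a hypothetical helper written for this illustration. It treats whitespace-only lines as blank, which is an assumption about what counts as a problem line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the 1-based line numbers of lines that are empty or whitespace-only.
# blank_lines() is a hypothetical helper, not part of BioPerl.
sub blank_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $. if $line =~ /^\s*$/;   # $. is the current line number
    }
    close $fh;
    return @hits;
}

# Report blank lines for every file named on the command line.
for my $file (@ARGV) {
    my @hits = blank_lines($file);
    if (@hits) {
        print "$file: blank lines at @hits\n";
    } else {
        print "$file: no blank lines\n";
    }
}
```

Saved under a name of your choice, it could be run on the files from this thread, for example: perl find_blank_lines.pl Test1.Fasta Test2.Fasta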



>
> However, I did post my problem somewhere else, and I found that other 
> people did get errors when trying to make an index with my files. The 
> weird thing is that I could make index files, but lines without 
> sequence would cause my sequence retrieval to be offset by one sequence 
> position for each empty line. I found that removing all the spaces 
> fixed the retrieval, but this still does not explain the lack of error 
> messages.
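
A shift of one position per empty line is what one would expect from any lookup that assumes every sequence line has the same width and computes byte offsets arithmetically, since each empty line adds one newline byte that the arithmetic does not account for. The sketch below is a toy illustration under that assumption, not the Bio::DB::Fasta code. fetch_range() and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Toy fixed-width lookup. Assumes every sequence line holds $width bases
# followed by exactly one newline byte. $start is the byte offset of the
# first base; $from and $to are 1-based sequence positions.
sub fetch_range {
    my ($data, $start, $width, $from, $to) = @_;
    my $byte_of = sub {
        my $n = $_[0] - 1;    # 0-based base index
        return $start + ($width + 1) * int($n / $width) + $n % $width;
    };
    my $first = $byte_of->($from);
    my $raw   = substr($data, $first, $byte_of->($to) - $first + 1);
    $raw =~ tr/\n//d;         # drop any newlines inside the range
    return $raw;
}

my $clean  = ">id\nAAAAAAAAAA\nCATGCATGCA\nGATTACAGGT\n";
my $broken = ">id\nAAAAAAAAAA\n\nCATGCATGCA\n\nGATTACAGGT\n";  # two empty lines
my $start  = length(">id\n");

for my $range ([14, 18], [24, 28]) {
    my ($from, $to) = @$range;
    my $good = fetch_range($clean,  $start, 10, $from, $to);
    my $bad  = fetch_range($broken, $start, 10, $from, $to);
    print "$from..$to clean=$good broken=$bad\n";
}
# prints:
# 14..18 clean=GCATG broken=TGCAT
# 24..28 clean=TACAG broken=ATTAC
```

In this toy record the range after one empty line comes back shifted by one base, and the range after two empty lines by two bases, so the error grows with the number of empty lines that precede the requested region.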
>
> Thanks for your help,
>
> Justin
>
> On Thu, May 26, 2011 at 8:55 PM, Florent Angly 
> <florent.angly at gmail.com> wrote:
>
>     Hi Justin,
>
>     I have been trying to reproduce your issue. A problem I ran into was
>     that there were some extra empty lines in your FASTA files. Then I
>     made a test script that gets the subsequences you mentioned using
>     three different methods: Bio::SeqIO+Bio::Seq, Bio::DB::Fasta, and
>     your InMemoryFastaAccess. These three methods return the same
>     answer, so I see no problem there.
>
>     My system is pretty similar to yours:
>     Bioperl-live from the BioPerl GitHub master branch from 27/5/11
>     Perl 5.12.3
>     Linux 2.6.38-2-amd64 (Linux Mint Debian Edition)
>
>     Can you run the attached script on the attached FASTA files and
>     see if all tests pass?
>
>     Thanks,
>
>     Florent
>
>
>
>
>     On 21/05/11 05:51, Justin Chu wrote:
>>     Hello:
>>
>>     I'm having trouble with Bio::DB::Fasta. The problem sometimes occurs when I use large
>>     FASTA files and retrieve sequence from a bit past the start of the file. I
>>     think some characters are being ignored, or a rounding error is occurring, or
>>     something similar, when using the offset to retrieve entries from the index file. I
>>     have attached the FASTA files I have been using, just in case my problem is
>>     due to improper formatting of my files.
>>
>>     For example:
>>
>>     my $refDB   = Bio::DB::Fasta->new('Test2.Fasta');
>>     my $queryDB = Bio::DB::Fasta->new('Test1.Fasta');
>>
>>     print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
>>     )."\n";
>>     print $queryDB->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>>
>>     output:
>>     GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG...
>>     GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGATAG...
>>
>>     my $refDB2  = InMemoryFastaAccess->new('Test2.Fasta');
>>     my $queryDB2 = InMemoryFastaAccess->new('Test1.Fasta');
>>
>>     print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788
>>     )."\n";
>>     print $queryDB2->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";
>>
>>     I get:
>>
>>     output:
>>     GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
>>     GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGAT...
>>
>>     Basically, sometimes the sequences retrieved are correct, but other times they
>>     are offset slightly by a few base pairs. Interestingly, it seems that the
>>     offset problem gets worse as you retrieve sequence chunks further and
>>     further down the sequence.
>>
>>     print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
>>     1515579)."\n";
>>
>>     output:
>>     CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT...
>>
>>     when it should be:
>>
>>     print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 1514858,
>>     1515579)."\n";
>>
>>     output:
>>     GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
>>
>>     This module is still way faster than what I have, so I want to keep using
>>     it. Do you think there is something I'm overlooking that could be the problem,
>>     or do you see a way to fix this?
>>
>>     I am currently running:
>>     Bioperl-live from the BioPerl GitHub master branch from 19/5/11
>>     Perl 5.10.1
>>     Debian 6.0.1
>>
>>     If you need any other information please let me know.
>>
>>     Thanks,
>>
>>     Justin Chu
>>
>>
>>
>>     _______________________________________________
>>     Bioperl-l mailing list
>>     Bioperl-l at lists.open-bio.org
>>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>



