[Bioperl-l] Problems with Bio::DB::Fasta

Justin Chu justinchu1989 at gmail.com
Fri May 20 19:51:25 UTC 2011


Hello:

I'm having trouble with Bio::DB::Fasta. The problem sometimes occurs when I
use large fasta files and retrieve sequence from a bit past the start of the
file. I think some characters are being ignored, or a rounding error is
occurring, when the offset is used to retrieve entries from the index file. I
have attached the Fasta files I have been using, just in case my problem is
due to improper formatting of my files.

For example:

use Bio::DB::Fasta;

my $refDB   = Bio::DB::Fasta->new('Test2.Fasta');
my $queryDB = Bio::DB::Fasta->new('Test1.Fasta');

print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788 )."\n";
print $queryDB->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";

output:
GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG...
GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGATAG...

With my own in-memory implementation (InMemoryFastaAccess.pm, attached), the
same calls:

use InMemoryFastaAccess;

my $refDB2   = InMemoryFastaAccess->new('Test2.Fasta');
my $queryDB2 = InMemoryFastaAccess->new('Test1.Fasta');

print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 161067, 161788 )."\n";
print $queryDB2->subseq( "gi|169245903|gb|EU376363.1|", 1, 722 )."\n";

give the output I expect:
GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
GTAGTCCCGGCCGTAAACGATGGATGCTAGCCGTCGGAT...
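
The size of such a shift can be measured instead of read off by eye. The
following is a minimal sketch in plain Perl; leading_shift is a hypothetical
helper (not part of BioPerl), and it only looks for extra bases at the start
of the first string, which is the pattern seen in the outputs above.

```perl
use strict;
use warnings;

# Return how many leading bases of $got must be skipped before it lines up
# with the start of $expected, or nothing if no shift up to $max_shift fits.
# Hypothetical helper for illustration; not part of BioPerl.
sub leading_shift {
    my ($got, $expected, $max_shift) = @_;
    $max_shift //= 50;
    for my $k (0 .. $max_shift) {
        last if $k >= length $got;
        my $tail    = substr($got, $k);
        my $overlap = length($tail) < length($expected)
                    ? length($tail)
                    : length($expected);
        next if $overlap < 10;    # too short an overlap to be meaningful
        return $k
            if substr($tail, 0, $overlap) eq substr($expected, 0, $overlap);
    }
    return;
}

# First reference output from each accessor, with the trailing "..." removed.
my $shift = leading_shift(
    'GGTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCAG',    # Bio::DB::Fasta
    'GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA',      # InMemoryFastaAccess
);
print defined $shift ? "shift: $shift\n" : "no shift found\n";  # prints "shift: 1"
```

A return value of 0 means the two strings already agree over their overlap.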

Basically, the sequences retrieved are sometimes correct, but at other times
they are offset by a few bases (one extra leading base in the first example
above). Interestingly, the offset seems to get worse the further down the
sequence you retrieve from:

print $refDB->subseq( "gi|294675557|ref|NC_014034.1|", 1514858, 1515579 )."\n";

output:
CCCTGGTAGTCCACGCCGTAAACGATGAATGCCAGTCGT...

when it should be:

print $refDB2->subseq( "gi|294675557|ref|NC_014034.1|", 1514858, 1515579 )."\n";

output:
GTAGTCCACGCCGTAAACGATGAATGCCAGTCGTCGGCA...
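
To narrow down where along the sequence the two accessors first disagree, a
windowed scan is one option. The sketch below is only an illustration:
first_divergent_window is a hypothetical helper that takes two code
references, so it does not depend on either module. The demo at the bottom
uses made-up in-memory data, not the attached files.

```perl
use strict;
use warnings;

# Scan coordinates 1..$seq_length in fixed-size windows and return the start
# of the first window where the two fetchers return different strings, or
# nothing if they always agree. Each fetcher takes (start, stop), 1-based
# and inclusive. Hypothetical helper for illustration.
sub first_divergent_window {
    my ($fetch_a, $fetch_b, $seq_length, $window) = @_;
    $window ||= 1000;
    for (my $start = 1; $start <= $seq_length; $start += $window) {
        my $stop = $start + $window - 1;
        $stop = $seq_length if $stop > $seq_length;
        return $start
            if $fetch_a->($start, $stop) ne $fetch_b->($start, $stop);
    }
    return;
}

# Made-up demo: a 100-base sequence and a fetcher that drifts by one base
# for any request starting after position 40.
my $truth = 'ACGT' x 25;
my $good  = sub { my ($s, $e) = @_; substr($truth, $s - 1, $e - $s + 1) };
my $bad   = sub {
    my ($s, $e) = @_;
    my $drift = $s > 40 ? 1 : 0;
    return substr($truth, $s - 1 - $drift, $e - $s + 1);
};
print first_divergent_window($good, $bad, length $truth, 10), "\n";  # prints 41

# With the objects from this post, the call would look something like:
#   my $id = "gi|294675557|ref|NC_014034.1|";
#   my $at = first_divergent_window(
#       sub { $refDB->subseq($id, @_) },
#       sub { $refDB2->subseq($id, @_) },
#       $refDB->length($id),
#   );
```

A window made up entirely of a short repeat could compare equal despite a
shift, so a reported position is an upper bound on where the drift begins.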

This module is still much faster than what I have, so I want to keep using
it. Is there something I'm overlooking that could be causing the problem,
or do you see a way to fix this?
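
One file-side cause that can be ruled out is irregular line lengths. This
assumes the index computes byte offsets from a fixed line width per record,
which is an assumption about how Bio::DB::Fasta works and is not confirmed
here. A minimal sketch of such a check in plain Perl follows;
irregular_fasta_lines is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Return a list of messages describing sequence lines whose length differs
# from the first line of their record (the last line of a record may be
# shorter), plus a note if the file mixes CRLF and LF line endings.
# Hypothetical helper for illustration; not part of BioPerl.
sub irregular_fasta_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my (@problems, $id, $width, $short_at);
    my ($crlf, $lf) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\n\z//;
        if ($line =~ s/\r\z//) { $crlf++ } else { $lf++ }
        if ($line =~ /^>(\S+)/) {
            # New record: reset the per-record state.
            ($id, $width, $short_at) = ($1, undef, undef);
            next;
        }
        next unless defined $id;    # ignore anything before the first header
        my $len = length $line;
        if (!defined $width) { $width = $len; next; }
        if (defined $short_at) {
            # A short line followed by more sequence was not the last line.
            push @problems,
                "line $short_at ($id): shorter than $width but not the last line";
            undef $short_at;
        }
        if    ($len > $width) { push @problems, "line $. ($id): length $len exceeds $width" }
        elsif ($len < $width) { $short_at = $. }
    }
    close $fh;
    push @problems, "mixed line endings: $crlf CRLF, $lf LF" if $crlf && $lf;
    return @problems;
}

# Usage: perl check_fasta.pl Test2.Fasta   (the script name is arbitrary)
if (@ARGV) {
    print "$_\n" for irregular_fasta_lines($ARGV[0]);
}
```

An empty result only means the file passes these two checks; it does not show
that the index itself is correct.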

I am currently running:
Bioperl-live from the BioPerl GitHub master branch, as of 19 May 2011
Perl 5.10.1
Debian 6.0.1

If you need any other information please let me know.

Thanks,

Justin Chu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test2.Fasta
Type: application/octet-stream
Size: 3798623 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test1.Fasta
Type: application/octet-stream
Size: 839 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InMemoryFastaAccess.pm
Type: application/x-perl
Size: 1111 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110520/317cce7f/attachment.pl>

