[BioSQL-l] Problem with Bio::DB::BioSQL::PrimarySeqAdapter
Hilmar Lapp
hlapp at gnf.org
Thu May 26 19:20:38 EDT 2005
It's not immediately obvious what's going on, but my suspicion is
that the sequence retrieval optimization is playing a role here.
The sequence of a db-retrieved entry is lazy-loaded, i.e., fetched
from the database only on demand. In theory, though, truncating or
revcom'ing the sequence should itself trigger that demand ...
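To make that suspicion concrete, here is a minimal, self-contained sketch of how a lazy-loaded attribute can look empty when some code path reads the stored value directly instead of going through the accessor. LazySeq, trunc_ok and trunc_bypass are hypothetical names made up for illustration. This is not the bioperl-db code, and whether anything like this actually happens there is only a guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a db-backed sequence object (NOT the real
# bioperl-db implementation). The sequence string is fetched on first use.
package LazySeq;

sub new {
    my ($class, %args) = @_;
    # 'loader' is a code ref standing in for the database query
    my $self = { id => $args{id}, loader => $args{loader}, seq => undef };
    return bless $self, $class;
}

# Accessor: triggers the load on the first call, then caches the result
sub seq {
    my $self = shift;
    $self->{seq} = $self->{loader}->() unless defined $self->{seq};
    return $self->{seq};
}

# Goes through the accessor, so it works whether or not seq was loaded
sub trunc_ok {
    my ($self, $start, $end) = @_;
    return substr($self->seq, $start - 1, $end - $start + 1);
}

# Reads the hash slot directly, so it sees nothing until seq() has run
sub trunc_bypass {
    my ($self, $start, $end) = @_;
    my $raw = $self->{seq};
    return '' unless defined $raw;
    return substr($raw, $start - 1, $end - $start + 1);
}

package main;

my $loads = 0;
my $lazy  = LazySeq->new(
    id     => 'test',
    loader => sub { $loads++; return 'ATGATGATGATGATG' },
);

my $before = $lazy->trunc_bypass(2, 5);   # nothing loaded yet
my $good   = $lazy->trunc_ok(2, 5);       # triggers the load
my $after  = $lazy->trunc_bypass(2, 5);   # slot is populated now

print "bypass before load: '$before'\n";
print "via accessor:       '$good'\n";
print "bypass after load:  '$after'\n";
print "loader calls:       $loads\n";
```

If something along these lines is going on, then anything that forces the load first (such as printing the object) would make a later trunc or revcom start working.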
Can you try printing $pseq in your test script before you truncate
and revcom it? I.e.,
my $pseq=$objadap->find_by_query($query)->next_object;
print "\$pseq isa Bio::PrimarySeq\n" if $pseq->isa('Bio::PrimarySeq');
print $out $pseq;
my $ptrunc=$pseq->trunc(100,120);
my $prc=$pseq->revcom;
print $out $ptrunc, $prc;
Does this yield a different result?
-hilmar
On May 26, 2005, at 9:26 AM, Roy Chaudhuri wrote:
> Hi.
>
> [Wasn't sure which list to post to, apologies if this is more
> appropriate for the BioPerl list]
>
> I'm having problems using the PrimarySeqAdapter to get a Bio::PrimarySeq
> object from a BioSQL database. The object appears to work okay, and
> will print out fine using SeqIO, but if I trunc() or revcom() the
> sequence information disappears. I can work around this by using the
> Bio::SeqI adapter instead of Bio::PrimarySeqI, but this is slow as I'm
> working with whole bacterial genome GenBank entries with lots of
> features. The problem isn't with PrimarySeq objects generally, as if I
> define one from scratch it will trunc and revcom correctly.
>
> Here's a test script that demonstrates the problem:
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::PrimarySeq;
> use Bio::SeqIO;
> use Bio::DB::Query::BioQuery;
> use Bio::DB::BioDB;
> my $out=Bio::SeqIO->newFh(-format=>'fasta');
> my $tinyseq=Bio::PrimarySeq->new(-seq=>'ATGATGATGATGATG',
> -display_id=>'test');
> my $tinytrunc=$tinyseq->trunc(2,5);
> my $tinyrc=$tinyseq->revcom;
> print "\$tinyseq isa Bio::PrimarySeq\n" if
> $tinyseq->isa('Bio::PrimarySeq');
> print $out $tinyseq, $tinytrunc, $tinyrc;
>
> my $dbadap= Bio::DB::BioDB->new(-database => 'biosql',
> -dbname => 'biosql',
> -user => 'username',
> -pass => 'password',
> -driver => 'mysql');
> my $query = Bio::DB::Query::BioQuery->new(-datacollections =>
> ["Bio::PrimarySeqI entry"],
> -where =>
> ["entry.accession_number='AE003850'"]
> );
>
> my $objadap = $dbadap->get_object_adaptor('Bio::PrimarySeqI');
> my $pseq=$objadap->find_by_query($query)->next_object;
> print "\$pseq isa Bio::PrimarySeq\n" if $pseq->isa('Bio::PrimarySeq');
> my $ptrunc=$pseq->trunc(100,120);
> my $prc=$pseq->revcom;
> print $out $pseq, $ptrunc, $prc;
>
> $objadap = $dbadap->get_object_adaptor('Bio::SeqI');
> my $seq=$objadap->find_by_query($query)->next_object;
> print "\$seq isa Bio::Seq\n" if $seq->isa('Bio::Seq');
> my $trunc=$seq->trunc(100,120);
> my $rc=$seq->revcom;
> print $out $seq, $trunc, $rc;
>
>
>
> This gives the following output:
> $tinyseq isa Bio::PrimarySeq
> >test
> ATGATGATGATGATG
> >test
> TGAT
> >test
> CATCATCATCATCAT
> $pseq isa Bio::PrimarySeq
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
> GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
> GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
> GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
> ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
> GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
> ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
> GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
> AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
> AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
> CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
> AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
> CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
> AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
> CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
> GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
> AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
> GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
> TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
> GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
> AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
> GGGCGTAAAAGCGCCTCCGCAGGGGN
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
>
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
>
> $seq isa Bio::Seq
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
> GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
> GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
> GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
> ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
> GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
> ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
> GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
> AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
> AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
> CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
> AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
> CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
> AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
> CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
> GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
> AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
> GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
> TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
> GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
> AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
> GGGCGTAAAAGCGCCTCCGCAGGGGN
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> GAAGCCAATGGCACGAGAACG
> >AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
> NCCCCTGCGGAGGCGCTTTTACGCCCTCTCCTTCGGTCCGTTATGTGGGGGCCAAATGAC
> CCCCAACACGAAGCCCTGATCTATTTCACGGATAACCGTTTTTACGCCCTGTGTCCCCTG
> GGTGGAAATCTAGGTTTCCAGTGGTCTCCAATTGGATGACATTCAGTTCAACCCCCTTGG
> ATTTAACGTCAAGTGATTGCTTGACAAATTGTCATTGGGGAATAGAAATCTCCCCCGCTT
> TTGAGCGAGCGAAGCGAGTGCGAAACGGGGGGGATTTCCATCCTCCCTTGATTAGTGGTT
> ACATATCTGTACGTTTTGTTAAGGTTGGGGGTTTTTCCTCCTTCCTGCATGCGGCAAAAA
> AATAACCCCGCGGTGTTACGAGCACCCGGGGTCGTGACCAGCTATCGGAGAGGCCGATGC
> CAGTAATTACAGTTTACCGTCATGGGGGAAAAGGTGGTGTTGCTCCCATGAACTCATCAC
> ATATCAGGACGCCACGAGGCGAGGTTCAGGGGTGGTCTCCTGGGGCTGTCCGTCGCAATA
> CAGAGTTCCTCATGTCCGTTCGTGAGGATCAGTTAACGGGCGCTGGTCTCGCTTTGACCC
> TTACCGTTCGTGATTGCCCTCCTACTGCTCAGGAGTGGCAGAAAATCAGGCGTGCGTGGG
> AAGCTCGCATGAGGCGTGCTGGTATGATCAGGGTTCACTGGGTGACGGAGTGGCAGCGTC
> GAGGTGTCCCGCATTTGCATTGTGCTATCTGGTTTTCTGGCACTGTTTACGATGTGCTTC
> TATGCGTCGATGCGTGGTTGGCTGTTGCGTCCTCCTGTGGTGCTGGTCTGCGTGGGCAGC
> ATGGTCGGATTATTGATGGGGTTGTCGGATGGTTTCAGTACGTGAGCAAGCACGCCGCCC
> GAGGCGTGCGCCATTACCAGCGTTGTTCTGAGAATCTCCCTGAAGGCTGGAAAGGGCTTA
> CGGGCCGTGTTTGGGGTAAGGGTGGCTATTGGCCTGTGTCTGATGCCCTTCGTATCGATC
> TACAGGATCATCGTGAGCGTGGTGATGGGGGGTATTTCGCTTATCGCCGTTTGGTGCGCT
> CCTGGCGCGTCTCAGACGCTCGCAGCTCGGGTGATCGTTATCGGCTCCGTAGTGCTCGTA
> GGATGCTCACCTGTAGCGATACCTCCCGTTCTCGTGCCATTGGCTTCATGGAGTGGGTTC
> CCCTAGAGGTCATGTTGGCCTTTTGCGCGAACCTAGCCGGTCGTGGGTACTCAGTTACGA
> GCGAGTAGGGGGGTGTGGGGGGTACC
>
> Any idea what's going on?
> Thanks.
> Roy.
>
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, UK
>
> http://colibase.bham.ac.uk
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------