[BioSQL-l] Problem with Bio::DB::BioSQL::PrimarySeqAdapter

Roy Chaudhuri roy at colibase.bham.ac.uk
Thu May 26 12:26:49 EDT 2005


Hi.

[I wasn't sure which list to post to; apologies if this is more
appropriate for the BioPerl list.]

I'm having problems using the PrimarySeqAdapter to get a Bio::PrimarySeq
object from a BioSQL database. The object itself appears to work: it
prints correctly using SeqIO. However, if I call trunc() or revcom() on
it, the resulting object has no sequence. I can work around this by
using the Bio::SeqI adaptor instead of the Bio::PrimarySeqI one, but
that is slow because I'm working with whole bacterial genome GenBank
entries that have many features. The problem isn't with PrimarySeq
objects in general: if I create one from scratch, trunc() and revcom()
work correctly.
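For comparison, here is a minimal plain-Perl sketch of what trunc() and revcom() are expected to return. It is not BioPerl's implementation: the helper names trunc_seq and revcom_seq are hypothetical, and the sketch assumes trunc() takes 1-based, inclusive coordinates, which is consistent with the in-memory results from the test script (trunc(2,5) on ATGATGATGATGATG gives TGAT).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the subsequence from $start to $end,
# assuming 1-based, inclusive coordinates (as trunc() appears to use).
sub trunc_seq {
    my ($seq, $start, $end) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length($seq);
    return substr($seq, $start - 1, $end - $start + 1);
}

# Hypothetical helper: reverse-complement a DNA string.
# IUPAC ambiguity codes are complemented too (N stays N, S and W map to themselves).
sub revcom_seq {
    my ($seq) = @_;
    my $rc = reverse $seq;    # scalar context: reverses the string
    $rc =~ tr/ACGTRYKMSWBDHVNacgtrykmswbdhvn/TGCAYRMKSWVHDBNtgcayrmkswvhdbn/;
    return $rc;
}

my $tiny = 'ATGATGATGATGATG';
print trunc_seq($tiny, 2, 5), "\n";   # TGAT
print revcom_seq($tiny), "\n";        # CATCATCATCATCAT
```

Running the same two checks against the string returned by $pseq->seq gives a reference result to compare with what $pseq->trunc() and $pseq->revcom() actually return.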

Here's a test script that demonstrates the problem:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::PrimarySeq;
use Bio::SeqIO;
use Bio::DB::Query::BioQuery;
use Bio::DB::BioDB;
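# Write sequences to STDOUT in FASTA format via a tied filehandle
my $out=Bio::SeqIO->newFh(-format=>'fasta');

# Control case: an in-memory Bio::PrimarySeq truncates and
# reverse-complements correctly
my $tinyseq=Bio::PrimarySeq->new(-seq=>'ATGATGATGATGATG',
                                 -display_id=>'test');
my $tinytrunc=$tinyseq->trunc(2,5);
my $tinyrc=$tinyseq->revcom;
print "\$tinyseq isa Bio::PrimarySeq\n" if $tinyseq->isa('Bio::PrimarySeq');
print $out $tinyseq, $tinytrunc, $tinyrc;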

# Connect to the BioSQL database
my $dbadap= Bio::DB::BioDB->new(-database => 'biosql',
                                -dbname => 'biosql',
                                -user => 'username',
                                -pass => 'password',
                                -driver => 'mysql');

# Look up the entry by accession number
my $query = Bio::DB::Query::BioQuery->new(
    -datacollections => ["Bio::PrimarySeqI entry"],
    -where           => ["entry.accession_number='AE003850'"]
);

# Fetch the entry through the Bio::PrimarySeqI adaptor; the object
# prints fine, but trunc() and revcom() return empty sequences
my $objadap = $dbadap->get_object_adaptor('Bio::PrimarySeqI');
my $pseq=$objadap->find_by_query($query)->next_object;
print "\$pseq isa Bio::PrimarySeq\n" if $pseq->isa('Bio::PrimarySeq');
my $ptrunc=$pseq->trunc(100,120);
my $prc=$pseq->revcom;
print $out $pseq, $ptrunc, $prc;

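# Fetch the same entry through the Bio::SeqI adaptor; trunc() and
# revcom() work here, but loading all the features is slow
$objadap = $dbadap->get_object_adaptor('Bio::SeqI');
my $seq=$objadap->find_by_query($query)->next_object;
print "\$seq isa Bio::Seq\n" if $seq->isa('Bio::Seq');
my $trunc=$seq->trunc(100,120);
my $rc=$seq->revcom;
print $out $seq, $trunc, $rc;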



This gives the following output:
$tinyseq isa Bio::PrimarySeq
>test
ATGATGATGATGATG
>test
TGAT
>test
CATCATCATCATCAT
$pseq isa Bio::PrimarySeq
>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
GGGCGTAAAAGCGCCTCCGCAGGGGN
>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.

>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.

$seq isa Bio::Seq
>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
GGTACCCCCCACACCCCCCTACTCGCTCGTAACTGAGTACCCACGACCGGCTAGGTTCGC
GCAAAAGGCCAACATGACCTCTAGGGGAACCCACTCCATGAAGCCAATGGCACGAGAACG
GGAGGTATCGCTACAGGTGAGCATCCTACGAGCACTACGGAGCCGATAACGATCACCCGA
GCTGCGAGCGTCTGAGACGCGCCAGGAGCGCACCAAACGGCGATAAGCGAAATACCCCCC
ATCACCACGCTCACGATGATCCTGTAGATCGATACGAAGGGCATCAGACACAGGCCAATA
GCCACCCTTACCCCAAACACGGCCCGTAAGCCCTTTCCAGCCTTCAGGGAGATTCTCAGA
ACAACGCTGGTAATGGCGCACGCCTCGGGCGGCGTGCTTGCTCACGTACTGAAACCATCC
GACAACCCCATCAATAATCCGACCATGCTGCCCACGCAGACCAGCACCACAGGAGGACGC
AACAGCCAACCACGCATCGACGCATAGAAGCACATCGTAAACAGTGCCAGAAAACCAGAT
AGCACAATGCAAATGCGGGACACCTCGACGCTGCCACTCCGTCACCCAGTGAACCCTGAT
CATACCAGCACGCCTCATGCGAGCTTCCCACGCACGCCTGATTTTCTGCCACTCCTGAGC
AGTAGGAGGGCAATCACGAACGGTAAGGGTCAAAGCGAGACCAGCGCCCGTTAACTGATC
CTCACGAACGGACATGAGGAACTCTGTATTGCGACGGACAGCCCCAGGAGACCACCCCTG
AACCTCGCCTCGTGGCGTCCTGATATGTGATGAGTTCATGGGAGCAACACCACCTTTTCC
CCCATGACGGTAAACTGTAATTACTGGCATCGGCCTCTCCGATAGCTGGTCACGACCCCG
GGTGCTCGTAACACCGCGGGGTTATTTTTTTGCCGCATGCAGGAAGGAGGAAAAACCCCC
AACCTTAACAAAACGTACAGATATGTAACCACTAATCAAGGGAGGATGGAAATCCCCCCC
GTTTCGCACTCGCTTCGCTCGCTCAAAAGCGGGGGAGATTTCTATTCCCCAATGACAATT
TGTCAAGCAATCACTTGACGTTAAATCCAAGGGGGTTGAACTGAATGTCATCCAATTGGA
GACCACTGGAAACCTAGATTTCCACCCAGGGGACACAGGGCGTAAAAACGGTTATCCGTG
AAATAGATCAGGGCTTCGTGTTGGGGGTCATTTGGCCCCCACATAACGGACCGAAGGAGA
GGGCGTAAAAGCGCCTCCGCAGGGGN
>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
GAAGCCAATGGCACGAGAACG
>AE003850 Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence.
NCCCCTGCGGAGGCGCTTTTACGCCCTCTCCTTCGGTCCGTTATGTGGGGGCCAAATGAC
CCCCAACACGAAGCCCTGATCTATTTCACGGATAACCGTTTTTACGCCCTGTGTCCCCTG
GGTGGAAATCTAGGTTTCCAGTGGTCTCCAATTGGATGACATTCAGTTCAACCCCCTTGG
ATTTAACGTCAAGTGATTGCTTGACAAATTGTCATTGGGGAATAGAAATCTCCCCCGCTT
TTGAGCGAGCGAAGCGAGTGCGAAACGGGGGGGATTTCCATCCTCCCTTGATTAGTGGTT
ACATATCTGTACGTTTTGTTAAGGTTGGGGGTTTTTCCTCCTTCCTGCATGCGGCAAAAA
AATAACCCCGCGGTGTTACGAGCACCCGGGGTCGTGACCAGCTATCGGAGAGGCCGATGC
CAGTAATTACAGTTTACCGTCATGGGGGAAAAGGTGGTGTTGCTCCCATGAACTCATCAC
ATATCAGGACGCCACGAGGCGAGGTTCAGGGGTGGTCTCCTGGGGCTGTCCGTCGCAATA
CAGAGTTCCTCATGTCCGTTCGTGAGGATCAGTTAACGGGCGCTGGTCTCGCTTTGACCC
TTACCGTTCGTGATTGCCCTCCTACTGCTCAGGAGTGGCAGAAAATCAGGCGTGCGTGGG
AAGCTCGCATGAGGCGTGCTGGTATGATCAGGGTTCACTGGGTGACGGAGTGGCAGCGTC
GAGGTGTCCCGCATTTGCATTGTGCTATCTGGTTTTCTGGCACTGTTTACGATGTGCTTC
TATGCGTCGATGCGTGGTTGGCTGTTGCGTCCTCCTGTGGTGCTGGTCTGCGTGGGCAGC
ATGGTCGGATTATTGATGGGGTTGTCGGATGGTTTCAGTACGTGAGCAAGCACGCCGCCC
GAGGCGTGCGCCATTACCAGCGTTGTTCTGAGAATCTCCCTGAAGGCTGGAAAGGGCTTA
CGGGCCGTGTTTGGGGTAAGGGTGGCTATTGGCCTGTGTCTGATGCCCTTCGTATCGATC
TACAGGATCATCGTGAGCGTGGTGATGGGGGGTATTTCGCTTATCGCCGTTTGGTGCGCT
CCTGGCGCGTCTCAGACGCTCGCAGCTCGGGTGATCGTTATCGGCTCCGTAGTGCTCGTA
GGATGCTCACCTGTAGCGATACCTCCCGTTCTCGTGCCATTGGCTTCATGGAGTGGGTTC
CCCTAGAGGTCATGTTGGCCTTTTGCGCGAACCTAGCCGGTCGTGGGTACTCAGTTACGA
GCGAGTAGGGGGGTGTGGGGGGTACC

Any idea what's going on?
Thanks.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, UK

http://colibase.bham.ac.uk

