[Bioperl-l] Writing and retrieving Genbank files from BioSQL

Roy Chaudhuri roy.chaudhuri at gmail.com
Wed Apr 23 16:18:37 UTC 2014


Just noticed that my replies to Rik only went to the Google group; I'm not 
sure whether those e-mails eventually filter through to bioperl-l. In case 
they don't, here are my combined messages:

Hi Rik,

See this discussion between myself and Peter:
http://bioperl.org/pipermail/bioperl-l/2011-July/035435.html

There isn't a "circular" column in the BioSQL schema (although there 
probably should be). However, you can hack around this by adding an 
annotation tag called "is_circular" to your sequence before storing it 
in BioSQL. The tag would have the value of $seq->is_circular (either 1 
or undef). The code which extracts the sequence from the database would 
need to be aware of this and convert the annotation tag back into the 
BioPerl is_circular value (or the BioPython equivalent). I think Peter 
was working on this in BioPython at the time so there may already be 
code to do this.
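
For the storing side, something along these lines should work. This is 
only a minimal sketch: tag_circularity is a hypothetical helper (not part 
of BioPerl), the demo sequence is made up, and I have not run it against 
a live BioSQL database. It assumes that a SimpleValue annotation is 
stored like any other annotation tag.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Bio::Seq;
use Bio::Annotation::SimpleValue;

# Hypothetical helper (not part of BioPerl): record circularity as a
# SimpleValue annotation so that it can travel through BioSQL.
sub tag_circularity {
    my ($seq) = @_;
    return $seq unless $seq->is_circular;    # linear: nothing to record
    my $tag = Bio::Annotation::SimpleValue->new(-value => 1);
    $seq->annotation->add_Annotation('is_circular', $tag);
    return $seq;
}

# Small in-memory demonstration; no database is needed for this part.
my $seq = Bio::Seq->new(-id       => 'demo',
                        -seq      => 'ACGTACGT',
                        -alphabet => 'dna');
$seq->is_circular(1);
tag_circularity($seq);

my @tags = $seq->annotation->get_Annotations('is_circular');
print "is_circular tags: ", scalar(@tags), "\n";

# To store it, pass the tagged sequence to create_persistent, as in
# Rik's script:
#   my $pobj = $db->create_persistent($seq);
#   $pobj->create();
#   $pobj->commit();
```

Only circular sequences get the tag, so on retrieval the mere presence 
of an 'is_circular' annotation is enough to restore the flag.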

Here is some BioPerl code to retrieve a sequence from BioSQL and print 
it to STDOUT:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
use Bio::DB::Query::BioQuery;
use Bio::DB::BioDB;
my $accession='L08752';
my $dbadap= Bio::DB::BioDB->new(-database => 'biosql',
                                 -dbname   => 'bioseqdb',
                                 -user => 'root',
                                 -pass => 'pass',
                                 -driver => 'mysql');
my $query = Bio::DB::Query::BioQuery->new(
    -datacollections => ["Bio::SeqI entry"],
    -where           => ["entry.accession_number='$accession'"],
);
my $objadap = $dbadap->get_object_adaptor('Bio::SeqI');
my $seq=$objadap->find_by_query($query)->next_object;
die "Accession $accession not found\n" if not defined $seq;
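# Take the 'is_circular' annotation back out (so it is not written as an
# ordinary annotation) and use its presence to restore the circular flag.
my($circular)=$seq->annotation->remove_Annotations('is_circular');
$seq->is_circular($circular ? 1 : undef);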
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

Cheers,
Roy.

On 21/04/2014 15:55, Rik Rademaker wrote:
> Dear all,
>
> I am a biologist trying to write GenBank files to BioSQL. I am comfortable
> writing Python scripts, but there is a problem with BioPython: the
> molecule type in the locus line is lost (e.g. 'circular DNA' becomes
> just 'DNA'). I am now trying to figure out how BioPerl handles this and
> how BioPerl writes this information to BioSQL.
>
> I have a BioSQL database (MySQL) and I can commit to BioSQL, e.g. via this
> program:
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::DB::BioDB;
> use Bio::DB::GenBank;
>
> # Fetch the GenBank record from NCBI
> my $genbank_id = 'L08752';
>
> my $genDB = Bio::DB::GenBank->new;
> my $sequence = $genDB->get_Seq_by_id($genbank_id);
>
> my $db=Bio::DB::BioDB->new(-database => 'BioSQL',
>                             -user => 'root',
>                             -dbname => 'bioseqdb',
>                             -host => 'localhost',
>                             -driver => 'mysql');
>
> my $pobj = $db->create_persistent($sequence);
> $pobj->create();
> $pobj->commit();
>
> This works and I see the data appearing in BioSQL.
>
> I am having a hard time figuring out how to retrieve the sequence object
> from BioSQL and write it back to a GenBank file on disk.
> Could someone give me some suggestions?
>
> I would like to know whether BioPerl is able to preserve the 'circular'
> property in the locus line after the data has been exported to and
> retrieved from BioSQL. So far, I have not identified any tables/fields that
> store the circular property. The next step would be to implement this
> behavior in BioPython, for which I have already contacted Peter Cock.
>
> Kind regards, Rik.