[BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq

Jochen Eisinger jochen at penguin-breeder.org
Tue Jun 8 04:42:58 EDT 2004


Hi,

thanks for your clarifying answer!

On Mon, Jun 07, 2004 at 04:52:26PM -0700, Hilmar Lapp wrote:
> >$seq = $seqadaptor->find_by_unique_key($seq);
> >
> ># make sure $seq isn't persistent anymore
> >my $buffer = IO::String->new;
> >my $out = Bio::SeqIO->new(-fh => $buffer, -format => 'embl');
> >$out->write_seq($seq);
> >$buffer->setpos(0);
> >my $in = Bio::SeqIO->new(-fh => $buffer, -format => 'embl');
> >$seq = $in->next_seq;
> >
> ># modify it a little
> >$seq->primary_id('NEW001');
> >
> ># create a new copy (fails, just overwrites the old one)
> >$seq->create();
> 
> With the above code this line needs to throw a perl error for calling a 
> non-existent function on an object. A sequence stream will never give 
> you a persistent object.

Ah, yes, I forgot

  $seq = $db->create_persistent($seq);

before the create() call in the above example.

> (accession_number,version,namespace) is a well-established uniqueness 
> constraint on sequences in order to guarantee a minimal amount of 
> sanity.

Why isn't this the primary key, by the way? I'm quite new to BioSQL and may
still be missing some points, but I'm rather surprised that you use
artificial integer id columns as primary keys and add unique constraints
to the table, instead of making the constrained columns the primary key
and dropping the integer-valued id columns.

> 
> >Even worse, $seq->create in most cases doesn't give an error if there 
> >is already a similar sequence, but just writes over the existing 
> >sequence:
> 
> It doesn't write over an existing sequence. It will update the 
> attributes of the object you wanted to create to match those of the 
> existing object in the database, unless you pass in an object factory 
> (-obj_factory => $myseqfactory).

It won't update the record in every case, though. If you change the
length of the sequence, for example, you get the error "tried to lie
about sequence length".

> >In Bio/DB/BioSQL/BasePersistenceAdaptor.pm, lines 196-213, you try to
> >insert the new object. If this fails, you conclude that the object
> >already exists and retrieve it from the DB. This behaviour is fine
> >for creating possibly missing foreign-key objects. However, if I
> >invoke create() on a sequence object, I'd expect either the object to
> >be newly created or an error.
> >
> 
> If that's what you expect then run a find_by_unique_key() first to make 
> sure it's not present already. (Note that this is still no guarantee 
> because between the time you get the negative result and the time you 
> commit the create() transaction somebody else may have inserted the 
> same sequence.)

That should not be possible; the DB's transaction system should take care
of this.
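A toy, in-memory sketch of the interleaving Hilmar describes may make the two positions concrete. Everything in it (the %committed hash, the session hashes, and the lookup(), insert() and commit() functions) is hypothetical and is not BioSQL or bioperl-db code. It assumes each session sees only committed rows plus its own pending inserts, and for simplicity it checks the unique constraint at commit time, whereas real RDBMSs usually check it at INSERT time, possibly after waiting for the other transaction.

```perl
use strict;
use warnings;

# Toy model only (not BioSQL/bioperl-db code): rows committed so far,
# keyed on the (accession_number, version, namespace) constraint.
my %committed;

sub unique_key {
    my ($row) = @_;
    return join '|', @{$row}{qw(accession version namespace)};
}

# A session sees committed rows plus its own uncommitted inserts.
sub lookup {
    my ($session, $row) = @_;
    my $k = unique_key($row);
    return $session->{pending}{$k} // $committed{$k};
}

# Queue an insert that only this session can see until it commits.
sub insert {
    my ($session, $row) = @_;
    $session->{pending}{ unique_key($row) } = {%$row};
}

# Simplification: the unique constraint is checked at commit time here;
# real RDBMSs usually check it at INSERT time (possibly after waiting for
# the other transaction), with the same outcome for the losing session.
sub commit {
    my ($session) = @_;
    for my $k (keys %{ $session->{pending} }) {
        die "unique constraint violated for $k\n" if exists $committed{$k};
    }
    %committed = (%committed, %{ $session->{pending} });
    $session->{pending} = {};
}

my $seq = { accession => 'X12345', version => 1, namespace => 'embl' };
my ($s1, $s2) = ({ pending => {} }, { pending => {} });

# Both sessions check first and find nothing ...
my $found1 = lookup($s1, $seq);
my $found2 = lookup($s2, $seq);

# ... so both go ahead and insert.
insert($s1, $seq) unless $found1;
insert($s2, $seq) unless $found2;

commit($s1);                         # first commit wins
my $ok = eval { commit($s2); 1 };    # second one hits the constraint
print $ok ? "second commit succeeded\n" : "second commit failed: $@";
```

In this model the duplicate is rejected, but only by the unique constraint at write time; the second session's earlier negative lookup was already stale when it committed.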

> Note that the method is named create(), not insert_or_fail(). The 
> purpose is that after the call returns successfully the object on which 
> you invoked create() has an equivalent entry in the database. It is not 
> an error if the respective row that you wanted to be present in the 
> database is already there.

I expected store() to do this, and create() to behave like insert_or_fail.
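To contrast the two readings of create() discussed here, the following is a toy, in-memory model. ensure_present() mimics the behaviour Hilmar describes (if the unique key already exists, the object adopts the stored row's attributes; the -obj_factory case is ignored), and insert_or_fail() is the semantics Jochen expected. Both functions, the %table hash and the field names are hypothetical; this is not bioperl-db code.

```perl
use strict;
use warnings;

# Toy model only: one "table" keyed on (accession_number, version, namespace).
my %table;
my $next_id = 1;

sub unique_key { join '|', @{ $_[0] }{qw(accession version namespace)} }

# create()-like: after a successful return, an equivalent row exists.
# If the key is already taken, nothing is inserted and the object
# adopts the stored row's attributes instead.
sub ensure_present {
    my ($obj) = @_;
    my $k = unique_key($obj);
    $table{$k} //= { %$obj, id => $next_id++ };   # insert only if absent
    %$obj = %{ $table{$k} };
    return $obj;
}

# insert_or_fail-like: a duplicate key is an error.
sub insert_or_fail {
    my ($obj) = @_;
    my $k = unique_key($obj);
    die "row with key $k already exists\n" if exists $table{$k};
    $table{$k} = { %$obj, id => $next_id++ };
    %$obj = %{ $table{$k} };
    return $obj;
}

my $orig = { accession => 'X12345', version => 1, namespace => 'embl',
             primary_id => 'OLD001' };
ensure_present($orig);

# A modified, non-persistent copy with the same unique key.
my %edited = (%$orig, primary_id => 'NEW001');
delete $edited{id};

my $via_create = { %edited };
ensure_present($via_create);   # no error; primary_id reverts to OLD001
print "create-like: primary_id=$via_create->{primary_id}, id=$via_create->{id}\n";

my $ok = eval { insert_or_fail({ %edited }); 1 };
print $ok ? "insert_or_fail: inserted\n" : "insert_or_fail: $@";
```

In the model, ensure_present() silently replaces the NEW001 edit with the stored values, while insert_or_fail() reports the clash instead.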

> Bioperl-db is not a SQL interface. It's an OR mapper. You use it if you 
> want to live and navigate in object land, not when you want to be close 
> to the RDBMS vibe. At least that's the goal ...

Ok

> I'm inclined to make the tuple of (identifier,namespace) the default 
> for the future; there seem to be too many subtle issues otherwise if 
> you're unsuspecting.

I guess that would be a good thing to do. Otherwise it's quite
impossible to have the same sequence in multiple versions in a single
database. 

In my case, I need to store sequences with several different annotations
in one database. Changing the primary id of the sequences is not an
option here.
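A small sketch of why the choice of uniqueness key matters for keeping several variants of the same entry: two hypothetical records share accession, version and namespace and differ only in primary_id. It assumes primary_id corresponds to the identifier column in the proposed (identifier,namespace) key; the records and the distinct_rows() helper are made up for illustration.

```perl
use strict;
use warnings;

# Two hypothetical records: the same entry (same accession, version and
# namespace) with different annotations, told apart only by primary_id.
my @records = (
    { accession => 'X12345', version => 1, namespace => 'embl', primary_id => 'OLD001' },
    { accession => 'X12345', version => 1, namespace => 'embl', primary_id => 'NEW001' },
);

# Count how many distinct rows a table could hold under a given key.
sub distinct_rows {
    my ($key_fields, @rows) = @_;
    my %seen;
    $seen{ join '|', @{$_}{@$key_fields} } = 1 for @rows;
    return scalar keys %seen;
}

my $current  = distinct_rows([qw(accession version namespace)], @records);
my $proposed = distinct_rows([qw(primary_id namespace)], @records);
print "(accession,version,namespace): $current row(s)\n";
print "(identifier,namespace): $proposed row(s)\n";
```

Under the current constraint both records collapse onto one row; under the proposed key they stay distinct, but only because their primary ids differ.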

kind regards
-- jochen

