[BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq

jochen jochen at penguin-breeder.org
Thu Jun 3 04:49:06 EDT 2004


Hi,

I have a similar problem: I want to modify some sequences and store
them back in the database without overwriting any of the original
sequences. Basically this:

# retrieve an existing sequence
my $seq = Bio::Seq::RichSeq->new( -display_id => 'something' );
$seq = $seqadaptor->find_by_unique_key($seq);

# make sure $seq isn't persistent anymore by round-tripping it
# through an in-memory EMBL buffer (needs IO::String and Bio::SeqIO)
my $buffer = IO::String->new;
my $out = Bio::SeqIO->new(-fh => $buffer, -format => 'embl');
$out->write_seq($seq);
$buffer->setpos(0);
my $in = Bio::SeqIO->new(-fh => $buffer, -format => 'embl');
$seq = $in->next_seq;

# modify it a little
$seq->primary_id('NEW001');

# create a new copy (fails, just overwrites the old one)
$seq->create();

A little debugging revealed that there are several unique constraints on
the bioentry table (using PostgreSQL here), which prevent me from
creating two objects if they have

o the same primary_id and/or
o the same (accession_number,version,namespace)

Isn't this an unnecessary restriction? In particular, why is the unique
constraint on primary_id alone and not on (primary_id,namespace)?
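
The two constraints, as I observed them, can be modelled with a small
self-contained sketch. It does not touch the real schema or bioperl-db;
the violated_constraints helper and the hash layout are hypothetical and
only mirror the behaviour described above.

```perl
use strict;
use warnings;

# Toy model of the two unique constraints described above.
# Hypothetical helper, not part of bioperl-db or the BioSQL schema.
sub violated_constraints {
    my ($existing, $new) = @_;
    my @violated;
    for my $row (@$existing) {
        # constraint 1: primary_id alone, regardless of namespace
        push @violated, 'primary_id'
            if $row->{primary_id} eq $new->{primary_id};
        # constraint 2: (accession_number, version, namespace)
        push @violated, 'accession_number,version,namespace'
            if $row->{accession_number} eq $new->{accession_number}
            && $row->{version} == $new->{version}
            && $row->{namespace} eq $new->{namespace};
    }
    return @violated;
}

# made-up example row standing in for an existing bioentry
my @table = (
    { primary_id => 'OLD001', accession_number => 'X12345',
      version => 1, namespace => 'embl' },
);

# Same primary_id in a different namespace still collides.
my %copy = ( primary_id => 'OLD001', accession_number => 'X12345',
             version => 1, namespace => 'mine' );
print join('; ', violated_constraints(\@table, \%copy)), "\n";
# prints "primary_id"
```

With a constraint on (primary_id,namespace) instead, the copy in the
second namespace would go through.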

Even worse, $seq->create in most cases doesn't give an error if there is
already a similar sequence, but just writes over the existing sequence:

In Bio/DB/BioSQL/BasePersistenceAdaptor.pm, lines 196-213, you try to
insert the new object. If this fails, you conclude that the object
already exists and retrieve it from the DB. This behaviour is fine for
creating foreign key objects that may be missing. However, if I invoke
create() on a sequence object, I'd expect this object to be newly
created or to receive an error.

What do you think about this? Did I miss something there?

I'd suggest fixing that by introducing two different create functions
(or a parameter) that control whether it's ok to retrieve an object that
may already exist (i.e. when creating the foreign key objects) or whether
the whole method should fail if the object already exists.
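
A minimal sketch of what I mean, using an in-memory hash instead of the
database. The ToyAdaptor class and its -lookup_on_conflict parameter are
names I made up for illustration; they are not part of bioperl-db.

```perl
use strict;
use warnings;

package ToyAdaptor;    # hypothetical stand-in, not a bioperl-db class

sub new {
    my ($class) = @_;
    return bless { rows => {} }, $class;
}

# create($obj, -lookup_on_conflict => 0|1)
# With the flag set, an existing row is returned (suitable for
# foreign key objects). Without it, a conflict is a hard error.
sub create {
    my ($self, $obj, %args) = @_;
    my $key = $obj->{primary_id};
    if (exists $self->{rows}{$key}) {
        return $self->{rows}{$key} if $args{-lookup_on_conflict};
        die "object with primary_id '$key' already exists\n";
    }
    $self->{rows}{$key} = $obj;
    return $obj;
}

package main;

my $adp = ToyAdaptor->new;
$adp->create({ primary_id => 'OLD001', desc => 'original' });

# lenient mode: hands back the stored object, as for foreign keys
my $found = $adp->create({ primary_id => 'OLD001', desc => 'copy' },
                         -lookup_on_conflict => 1);
print "lenient: $found->{desc}\n";

# strict mode: the caller gets an error instead of a silent reuse
my $ok = eval { $adp->create({ primary_id => 'OLD001', desc => 'copy' }); 1 };
print "strict: ", ($ok ? "created\n" : "failed: $@");
```

The strict mode would be the default for a create() called directly on a
sequence, and the lenient mode would be used internally for the foreign
key objects.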

> ...
> # trigger insert by making the object forget
> # its primary key
> $pseq->primary_key(undef);
> # we need to duplicate dependent objects
> # (children) too, like features
> foreach my $pfea ($pseq->get_SeqFeatures) {
> 	$pfea->primary_key(undef)
> 		if $pfea->isa("Bio::DB::PersistentObjectI");
> 	# features have locations
> 	$pfea->location->primary_key(undef)
> 		if $pfea->location->isa("Bio::DB::PersistentObjectI");
> }
> # do the insert
> $pseq->create();

Assuming you just changed the namespace, this code example won't work:
you didn't change the primary_id, so the insert violates the unique
constraint on primary_id.

kind regards
-- jochen

